New articles on Electrical Engineering and Systems Science


[1] 2602.10155

A Systematic Review on Data-Driven Brain Deformation Modeling for Image-Guided Neurosurgery

Accurate compensation of brain deformation is a critical challenge for reliable image-guided neurosurgery, as surgical manipulation and tumor resection induce tissue motion that misaligns preoperative planning images with intraoperative anatomy and longitudinal studies. In this systematic review, we synthesize recent AI-driven approaches developed between January 2020 and April 2025 for modeling and correcting brain deformation. A comprehensive literature search was conducted in PubMed, IEEE Xplore, Scopus, and Web of Science, with predefined inclusion and exclusion criteria focused on computational methods applied to brain deformation compensation for neurosurgical imaging, resulting in 41 studies meeting these criteria. We provide a unified analysis of methodological strategies, including deep learning-based image registration, direct deformation field regression, synthesis-driven multimodal alignment, resection-aware architectures addressing missing correspondences, and hybrid models that integrate biomechanical priors. We also examine dataset utilization, reported evaluation metrics, validation protocols, and how uncertainty and generalization have been assessed across studies. While AI-based deformation models demonstrate promising performance and computational efficiency, current approaches exhibit limitations in out-of-distribution robustness, standardized benchmarking, interpretability, and readiness for clinical deployment. Our review highlights these gaps and outlines opportunities for future research aimed at achieving more robust, generalizable, and clinically translatable deformation compensation solutions for neurosurgical guidance. By organizing recent advances and critically evaluating evaluation practices, this work provides a comprehensive foundation for researchers and clinicians engaged in developing and applying AI-based brain deformation methods.


[2] 2602.10167

Anatomy-Preserving Latent Diffusion for Generation of Brain Segmentation Masks with Ischemic Infarct

The scarcity of high-quality segmentation masks remains a major bottleneck for medical image analysis, particularly in non-contrast CT (NCCT) neuroimaging, where manual annotation is costly and variable. To address this limitation, we propose an anatomy-preserving generative framework for the unconditional synthesis of multi-class brain segmentation masks, including ischemic infarcts. The proposed approach combines a variational autoencoder trained exclusively on segmentation masks to learn an anatomical latent representation, with a diffusion model operating in this latent space to generate new samples from pure noise. At inference, synthetic masks are obtained by decoding denoised latent vectors through the frozen VAE decoder, with optional coarse control over lesion presence via a binary prompt. Qualitative results show that the generated masks preserve global brain anatomy, discrete tissue semantics, and realistic variability, while avoiding the structural artifacts commonly observed in pixel-space generative models. Overall, the proposed framework offers a simple and scalable solution for anatomy-aware mask generation in data-scarce medical imaging scenarios.


[3] 2602.10315

Uncertainty-Aware Ordinal Deep Learning for cross-Dataset Diabetic Retinopathy Grading

Diabetes mellitus is a chronic metabolic disorder characterized by persistent hyperglycemia due to insufficient insulin production or impaired insulin utilization. One of its most severe complications is diabetic retinopathy (DR), a progressive retinal disease caused by microvascular damage, leading to hemorrhages, exudates, and potential vision loss. Early and reliable detection of DR is therefore critical for preventing irreversible blindness. In this work, we propose an uncertainty-aware deep learning framework for automated DR severity grading that explicitly models the ordinal nature of disease progression. Our approach combines a convolutional backbone with lesion-query attention pooling and an evidential Dirichlet-based ordinal regression head, enabling both accurate severity prediction and principled estimation of predictive uncertainty. The model is trained using an ordinal evidential loss with annealed regularization to encourage calibrated confidence under domain shift. We evaluate the proposed method on a multi-domain training setup combining APTOS, Messidor-2, and a subset of EyePACS fundus datasets. Experimental results demonstrate strong cross-dataset generalization, achieving competitive classification accuracy and high quadratic weighted kappa on held-out test sets, while providing meaningful uncertainty estimates for low-confidence cases. These results suggest that ordinal evidential learning is a promising direction for robust and clinically reliable diabetic retinopathy grading.


[4] 2602.10336

Analyzing Model Misspecification in Quantitative MRI: Application to Perfusion ASL

Quantitative MRI (qMRI) involves parameter estimation governed by an explicit signal model. However, these models are often confounded and difficult to validate in vivo. A model is misspecified when the assumed signal model differs from the true data-generating process. Under misspecification, the variance of any unbiased estimator is lower-bounded by the misspecified Cramer-Rao bound (MCRB), and maximum-likelihood estimates (MLE) may exhibit bias and inconsistency. Based on these principles, we assess misspecification in qMRI using two tests: (i) examining whether empirical MCRB asymptotically approaches the CRB as repeated measurements increase; (ii) comparing MLE estimates from two equal-sized subsets and evaluating whether their empirical variance aligns with theoretical CRB predictions. We demonstrate the framework using arterial spin labeling (ASL) as an illustrative example. Our result shows the commonly used ASL signal model appears to be specified in the brain and moderately misspecified in the kidney. The proposed framework offers a general, theoretically grounded approach for assessing model validity in quantitative MRI.


[5] 2602.10355

Efficient Policy Adaptation for Voltage Control Under Unknown Topology Changes

Reinforcement learning (RL) has shown great potential for designing voltage control policies, but their performance often degrades under changing system conditions such as topology reconfigurations and load variations. We introduce a topology-aware online policy optimization framework that leverages data-driven estimation of voltage-reactive power sensitivities to achieve efficient policy adaptation. Exploiting the sparsity of topology-switching events, where only a few lines change at a time, our method efficiently detects topology changes and identifies the affected lines and parameters, enabling fast and accurate sensitivity updates without recomputing the full sensitivity matrix. The estimated sensitivity is subsequently used for online policy optimization of a pre-trained neural-network-based RL controller. Simulations on both the IEEE 13-bus and SCE 56-bus systems demonstrate over 90 percent line identification accuracy, using only 15 data points. The proposed method also significantly improves voltage regulation performance compared with non-adaptive policies and adaptive policies that rely on regression-based online optimization methods for sensitivity estimation.


[6] 2602.10359

Beyond Calibration: Confounding Pathology Limits Foundation Model Specificity in Abdominal Trauma CT

Purpose: Translating foundation models into clinical practice requires evaluating their performance under compound distribution shift, where severe class imbalance coexists with heterogeneous imaging appearances. This challenge is relevant for traumatic bowel injury, a rare but high-mortality diagnosis. We investigated whether specificity deficits in foundation models are associated with heterogeneity in the negative class. Methods: This retrospective study used the multi-institutional, RSNA Abdominal Traumatic Injury CT dataset (2019-2023), comprising scans from 23 centres. Two foundation models (MedCLIP, zero-shot; RadDINO, linear probe) were compared against three task-specific approaches (CNN, Transformer, Ensemble). Models were trained on 3,147 patients (2.3% bowel injury prevalence) and evaluated on an enriched 100-patient test set. To isolate negative-class effects, specificity was assessed in patients without bowel injury who had concurrent solid organ injury (n=58) versus no abdominal pathology (n=50). Results: Foundation models achieved equivalent discrimination to task-specific models (AUC, 0.64-0.68 versus 0.58-0.64) with higher sensitivity (79-91% vs 41-74%) but lower specificity (33-50% vs 50-88%). All models demonstrated high specificity in patients without abdominal pathology (84-100%). When solid organ injuries were present, specificity declined substantially for foundation models (50-51 percentage points) compared with smaller reductions of 12-41 percentage points for task-specific models. Conclusion: Foundation models matched task-specific discrimination without task-specific training, but their specificity deficits were driven primarily by confounding negative-class heterogeneity rather than prevalence alone. Susceptibility to negative-class heterogeneity decreased progressively with labelled training, suggesting adaptation is required before clinical implementation.


[7] 2602.10394

Boiling flow parameter estimation from boundary layer data

Atmospheric turbulence and aero-optic effects cause phase aberrations in propagating light waves, thereby reducing effectiveness in transmitting and receiving coherent light from an aircraft. Existing optical sensors can measure the resulting phase aberrations, but the physical experiments required to induce these aberrations are expensive and time-intensive. Simulation methods could provide a less expensive alternative. For example, an existing simulation algorithm called boiling flow, which generalizes the Taylor frozen-flow method, can generate synthetic phase aberration data (i.e., phase screens) induced by atmospheric turbulence. However, boiling flow depends on physical parameters, such as the Fried coherence length r0, which are not well-defined for aero-optic effects. In this paper, we introduce a method to estimate the parameters of boiling flow from measured aero-optic phase aberration data. Our algorithm estimates these parameters to fit the spatial and temporal statistics of the measured data. This method is computationally efficient and our experiments show that the temporal power spectral density of the slopes of the synthetic phase screens reasonably matches that of the measured phase aberrations from two turbulent boundary layer data sets, with errors between 8-9%. However, the Kolmogorov spatial structure function of the phase screens does not match that of the measured phase aberrations, with errors above 28%. This suggests that, while the parameters of boiling flow can reasonably fit the temporal statistics of highly convective data, they cannot fit the complex spatial statistics of aero-optic phase aberrations.


[8] 2602.10397

Resilient Voltage Estimation for Battery Packs Using Self-Learning Koopman Operator

Cloud-based battery management systems (BMSs) rely on real-time voltage measurement data to ensure coordinated bi-directional charging of electric vehicles (EVs) with vehicle-to-grid technology. Unfortunately, an adversary can corrupt the measurement data during transmission from the local-BMS to the cloud-BMS, leading to disrupted EV charging. Therefore, to ensure reliable voltage data under such sensor attacks, this paper proposes a two-stage error-corrected self-learning Koopman operator-based secure voltage estimation scheme for large-format battery packs. The first stage of correction compensates for the Koopman approximation error. The second stage aims to recover the error amassing from the lack of higher-order battery dynamics information in the self-learning feedback, using two alternative methods: an adaptable empirical strategy that uses cell-level knowledge of open circuit voltage to state-of-charge mapping for pack-level estimation, and a Gaussian process regression-based data-driven method that leverages minimal data-training. During our comprehensive case studies using the high-fidelity battery simulation package 'PyBaMM-liionpack', our proposed secure estimator reliably generated real-time voltage estimation with high accuracy under varying pack topologies, charging settings, battery age-levels, and attack policies. Thus, the scalable and adaptable algorithm can be easily employed to diverse battery configurations and operating conditions, without requiring significant modifications, excessive data or sensor redundancy, to ensure optimum charging of EVs under compromised sensing.


[9] 2602.10417

RadarEye: Robust Liquid Level Tracking Using mmWave Radar in Robotic Pouring

Transparent liquid manipulation in robotic pouring remains challenging for perception systems: specular/refraction effects and lighting variability degrade visual cues, undermining reliable level estimation. To address this challenge, we introduce RadarEye, a real-time mmWave radar signal processing pipeline for robust liquid level estimation and tracking during the whole pouring process. RadarEye integrates (i) a high-resolution range-angle beamforming module for liquid level sensing and (ii) a physics-informed mid-pour tracker that suppresses multipath to maintain lock on the liquid surface despite stream-induced clutter and source container reflections. The pipeline delivers sub-millisecond latency. In real-robot water-pouring experiments, RadarEye achieves a 0.35 cm median absolute height error at 0.62 ms per update, substantially outperforming vision and ultrasound baselines.


[10] 2602.10434

Benchmarking Deep Learning and Statistical Target Detection Methods for PFM-1 Landmine Detection in UAV Hyperspectral Imagery

In recent years, unmanned aerial vehicles (UAVs) equipped with imaging sensors and automated processing algorithms have emerged as a promising tool to accelerate large-area surveys while reducing risk to human operators. Although hyperspectral imaging (HSI) enables material discrimination using spectral signatures, standardized benchmarks for UAV-based landmine detection remain scarce. In this work, we present a systematic benchmark of four classical statistical detection algorithms, including Spectral Angle Mapper (SAM), Matched Filter (MF), Adaptive Cosine Estimator (ACE), and Constrained Energy Minimization (CEM), alongside a proposed lightweight Spectral Neural Network utilizing Parametric Mish activations for PFM-1 landmine detection. We also release pixel-level binary ground truth masks (target/background) to enable standardized, reproducible evaluation. Evaluations were conducted on inert PFM-1 targets across multiple scene crops using a recently released VNIR hyperspectral dataset. Metrics such as receiver operating characteristic (ROC) curve, area under the curve (AUC), precision-recall (PR) curve, and average precision (AP) were used. While all methods achieve high ROC-AUC on an independent test set, the ACE method observes the highest AUC of 0.989. However, because target pixels are extremely sparse relative to background, ROC-AUC alone can be misleading; under precision-focused evaluation (PR and AP), the Spectral-NN outperforms classical detectors, achieving the highest AP. These results emphasize the need for precision-focused evaluation, scene-aware benchmarking, and learning-based spectral models for reliable UAV-based hyperspectral landmine detection. The code and pixel-level annotations will be released.


[11] 2602.10438

Nonparametric Variational Bayesian Learning for Channel Estimation with OTFS Modulation

Orthogonal time frequency space (OTFS) modulation has demonstrated significant advantages in high-mobility scenarios in future 6G networks. However, existing channel estimation methods often overlook the structured sparsity and clustering characteristics inherent in realistic clustered delay line (CDL) channels, leading to degraded performance in practical systems. To address this issue, we propose a novel nonparametric Bayesian learning (NPBL) framework for OTFS channel estimation. Specifically, a stick-breaking process is introduced to automatically infer the number of multipath components and assign each path to its corresponding cluster. The channel coefficients within each cluster are modeled by a Gaussian mixture distribution to capture complex fading statistics. Furthermore, an effective pruning criterion is designed to eliminate spurious multipath components, thereby enhancing estimation accuracy and reducing computational complexity. Simulation results demonstrate that the proposed method achieves superior performance in terms of normalized mean squared error compared to existing methods.


[12] 2602.10523

Scale-Free delta-Level Coherent Output Synchronization of Multi-Agent Systems with Adaptive Protocols and Bounded Disturbances

In this paper, we investigate scale-free delta-level coherent output synchronization for multi-agent systems (MAS) operating under bounded disturbances or noises. We introduce an adaptive scale-free framework designed solely based on the knowledge of agent models and completely agnostic to both the communication topology and the size of the network. We define the level of coherency for each agent as the norm of the weighted sum of the disagreement dynamics with its neighbors. We define each agents coherency level as the norm of a weighted sum of its disagreement dynamics relative to its neighbors. The goal is to ensure that the networks coherency level remains below a prescribed threshold delta, without requiring any a priori knowledge of the disturbance.


[13] 2602.10537

Clutter-Aware Integrated Sensing and Communication: Models, Methods, and Future Directions

Integrated sensing and communication (ISAC) can substantially improve spectral, hardware, and energy efficiency by unifying radar sensing and data communications. In wideband and scattering-rich environments, clutter often dominates weak target reflections and becomes a fundamental bottleneck for reliable sensing. Practical ISAC clutter includes "cold" clutter arising from environmental backscatter of the probing waveform, and "hot" clutter induced by external interference and reflections from the environment whose statistics can vary rapidly over time. In this article, we develop a unified wideband multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) signal model that captures both clutter types across the space, time, and frequency domains. Building on this model, we review clutter characterization at multiple levels, including amplitude statistics, robust spherically invariant random vector (SIRV) modeling, and structured covariance representations suitable for limited-snapshot regimes. We then summarize receiver-side suppression methods in the temporal and spatial domains, together with extensions to space-time adaptive processing (STAP) and space-frequency-time adaptive processing (SFTAP), and we provide guidance on selecting techniques under different waveform and interference conditions. To move beyond reactive suppression, we discuss clutter-aware transceiver co-design that couples beamforming and waveform optimization with practical communication quality-of-service (QoS) constraints to enable proactive clutter avoidance. We conclude with open challenges and research directions toward environment-adaptive and clutter-resilient ISAC for next-generation networks.


[14] 2602.10567

Rapid Boundary Stabilization of Two-Dimensional Elastic Plates with In-Domain Aeroelastic Instabilities

Motivated by active wing flutter suppression in high-Mach-number flight, this paper presents a rapid boundary stabilization strategy for a two-dimensional PDE-modeled elastic plate with in-domain instabilities, where the exponential stability is achieved with a decay rate that can be arbitrarily assigned by the users. First, the aeroelastic system is modeled as two-dimensional coupled wave PDEs with internal anti-damping terms, derived by Piston theory and Hamilton's principle. Using Fourier series expansion, the 2-D problem is decomposed into a parameterized family of 1-D systems. For each mode, a full-state boundary feedback controller is designed via PDE backstepping transformation. To enable output-feedback implementation, a state observer is further designed to estimate the distributed states over the two-dimensional spatial domain. Through Lyapunov analysis, the exponential stability of the 2-D elastic plate PDE under the proposed boundary control is established with a designer-tunable decay rate. Numerical simulations verify the effectiveness of the control strategy in suppressing flow-induced vibrations.


[15] 2602.10656

AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval

Due to recent advancements in Large Audio-Language Models (LALMs) that demonstrate remarkable performance across a range of sound-, speech- and music-related tasks, there is a growing interest in proposing benchmarks to assess these models. Existing benchmarks generally focus only on reasoning with internal knowledge, neglecting real-world scenarios that require external information grounding. To bridge this gap, we introduce AudioRAG, a novel benchmark designed to evaluate audio-based reasoning augmented by information retrieval in realistic web environments. This benchmark comprises both LLM-generated and manually curated question-answer pairs. Our evaluations reveal that even the state-of-the-art LALMs struggle to answer these questions. We therefore propose an agentic pipeline that integrates audio reasoning with retrieval-augmented generation, providing a stronger baseline for future research.


[16] 2602.10666

From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks

Speech Enhancement (SE) in audio devices is often supported by auxiliary modules for Voice Activity Detection (VAD), SNR estimation, or Acoustic Scene Classification to ensure robust context-aware behavior and seamless user experience. Just like SE, these tasks often employ deep learning; however, deploying additional models on-device is computationally impractical, whereas cloud-based inference would introduce additional latency and compromise privacy. Prior work on SE employed Dynamic Channel Pruning (DynCP) to reduce computation by adaptively disabling specific channels based on the current input. In this work, we investigate whether useful signal properties can be estimated from these internal pruning masks, thus removing the need for separate models. We show that simple, interpretable predictors achieve up to 93% accuracy on VAD, 84% on noise classification, and an R2 of 0.86 on F0 estimation. With binary masks, predictions reduce to weighted sums, inducing negligible overhead. Our contribution is twofold: on one hand, we examine the emergent behavior of DynCP models through the lens of downstream prediction tasks, to reveal what they are learning; on the other, we repurpose and re-propose DynCP as a holistic solution for efficient SE and simultaneous estimation of signal properties.


[17] 2602.10716

RE-LLM: Refining Empathetic Speech-LLM Responses by Integrating Emotion Nuance

With generative AI advancing, empathy in human-AI interaction is essential. While prior work focuses on emotional reflection, emotional exploration, key to deeper engagement, remains overlooked. Existing LLMs rely on text which captures limited emotion nuances. To address this, we propose RE-LLM, a speech-LLM integrating dimensional emotion embeddings and auxiliary learning. Experiments show statistically significant gains in empathy metrics across three datasets. RE-LLM relatively improves the Emotional Reaction score by 14.79% and 6.76% compared to text-only and speech-LLM baselines on ESD. Notably, it raises the Exploration score by 35.42% and 3.91% on IEMOCAP, 139.28% and 9.83% on ESD, and 60.95% and 22.64% on MSP-PODCAST. It also boosts unweighted accuracy by 5.4% on IEMOCAP, 2.3% on ESD, and 6.9% on MSP-PODCAST in speech emotion recognition. These results highlight the enriched emotional understanding and improved empathetic response generation of RE-LLM.


[18] 2602.10724

Integrating Active Damping with Shaping-Filtered Reset Tracking Control for Piezo-Actuated Nanopositioning

Piezoelectric nanopositioning systems are often limited by lightly damped structural resonances and the gain--phase constraints of linear feedback, which restrict achievable bandwidth and tracking performance. This paper presents a dual-loop architecture that combines an inner-loop non-minimum-phase resonant controller (NRC) for active damping with an outer-loop tracking controller augmented by a constant-gain, lead-in-phase (CgLp) reset element to provide phase lead at the targeted crossover without increasing loop gain. We show that aggressively tuned CgLp designs with larger phase lead can introduce pronounced higher-order harmonics, degrading error sensitivity in specific frequency bands and causing multiple-reset behavior. To address this, a shaping filter is introduced in the reset-trigger path to regulate the reset action and suppress harmonic-induced effects while preserving the desired crossover-phase recovery. The proposed controllers are implemented in real time on an industrial piezo nanopositioner, demonstrating an experimental open-loop crossover increase of approximately 55~Hz and a closed-loop bandwidth improvement of about 34~Hz relative to a well-tuned linear baseline.


[19] 2602.10736

Transfer to Sky: Unveil Low-Altitude Route-Level Radio Maps via Ground Crowdsourced Data

The expansion of the low-altitude economy is contingent on reliable cellular connectivity for unmanned aerial vehicles (UAVs). A key challenge in pre-flight planning is predicting communication link quality along proposed and pre-defined routes, a task hampered by sparse measurements that render existing radio map methods ineffective. This paper introduces a transfer learning framework for high-fidelity route-level radio map prediction. Our key insight is to leverage abundant crowdsourced ground signals as auxiliary supervision. To bridge the significant domain gap between ground and aerial data and address spatial sparsity, our framework learns general propagation priors from simulation, performs adversarial alignment of the feature spaces, and is fine-tuned on limited real UAV measurements. Extensive experiments on a real-world dataset from Meituan show that our method achieves over 50% higher accuracy in predicting Route RSRP compared to state-of-the-art baselines.


[20] 2602.10742

Stochastic Design of Active RIS-Assisted Satellite Downlinks under Interference, Folded Noise, and EIRP Constraints

Active reconfigurable intelligent surfaces (RISs) can mitigate the double-fading loss of passive reflection in satellite downlinks. However, their gains are limited by random co-channel interference, gain-dependent amplifier noise, and regulatory emission constraints. This paper develops a stochastic reliability framework for active RIS-assisted satellite downlinks by modeling the desired and interfering channels, receiver noise, and RIS amplifier noise as random variables. The resulting instantaneous signal-to-interference-plus-noise ratio (SINR) model explicitly captures folded cascaded amplifier noise and reveals a finite high-gain SINR ceiling. To guarantee a target outage level, we formulate a chance-constrained max-SINR design that jointly optimizes the binary RIS configuration and a common amplification gain. The chance constraint is handled using a sample-average approximation (SAA) with a violation budget. The resulting feasibility problem is solved as a mixed-integer second-order cone program (MISOCP) within a bisection search over the SINR threshold. Practical implementation is enforced by restricting the gain to an admissible range determined by small-signal stability and effective isotropic radiated power (EIRP) limits. We also derive realization-wise SINR envelopes based on eigenvalue and l1-norm bounds, which provide interpretable performance limits and fast diagnostics. Monte Carlo results show that these envelopes tightly bound the simulated SINR, reproduce the predicted saturation behavior, and quantify performance degradation as interference increases. Overall, the paper provides a solver-ready, reliability-targeting design methodology whose achieved reliability is validated through out-of-sample Monte Carlo testing under realistic randomness and hardware constraints.


[21] 2602.10752

Improving CACC Robustness to Parametric Uncertainty via Plant Equivalent Controller Realizations

Cooperative Adaptive Cruise Control (CACC) enables vehicle platooning through inter-vehicle communication, improving traffic efficiency and safety. Conventional CACC relies on feedback linearization, assuming exact vehicle parameters; however, longitudinal vehicle dynamics are nonlinear and subject to parametric uncertainty. Applying feedback linearization with a nominal model yields imperfect cancellation, leading to model mismatch and degraded performance with off-the-shelf CACC controllers. To improve robustness without redesigning the CACC law, we explicitly model the mismatch between the ideal closed-loop dynamics assumed by the CACC design and the actual dynamics under parametric uncertainties. Robustness is formulated as an $\mathcal{L}_2$ trajectory-matching problem, minimizing the energy of this mismatch to make the uncertain system behave as closely as possible to the ideal model. This objective is addressed by optimizing over plant equivalent controller (PEC) realizations that preserve the nominal closed-loop behavior while mitigating the effects of parametric uncertainty. Stability and performance are enforced via linear matrix inequalities, yielding a convex optimization problem applicable to heterogeneous platoons. Experimental results demonstrate improved robustness and performance under parametric uncertainty while preserving nominal CACC behavior.


[22] 2602.10767

Constellation Design for Robust Interference Mitigation

This paper investigates symbol detection for single-carrier communication systems operating in the presence of additive interference with Nakagami-m statistics. Such interference departs from the assumptions underlying conventional detection methods based on Gaussian noise models and leads to detection mismatch that fundamentally affects symbol-level performance. In particular, the presence of random interference amplitude and non-uniform phase alters the structure of the optimal decision regions and renders standard Euclidean distance-based detectors suboptimal. To address this challenge, we develop the maximum-likelihood Gaussian-phase approximate (ML-G) detector, a low-complexity detection rule that accurately approximates maximum-likelihood detection while remaining suitable for practical implementation. The proposed detector explicitly incorporates the statistical properties of the interference and induces decision regions that differ significantly from those arising under conventional metrics. Building on the ML-G framework, we further investigate constellation design under interference-aware detection and formulate an optimization problem that seeks symbol placements that minimize the symbol error probability subject to an average energy constraint. The resulting constellations are obtained numerically and adapt to the interference environment, exhibiting non-standard and asymmetric structures as interference strength increases. Simulation results demonstrate clear symbol error probability gains over established benchmark schemes across a range of interference conditions, particularly in scenarios with dominant additive interference.


[23] 2602.10792

Bayesian Signal Component Decomposition via Diffusion-within-Gibbs Sampling

In signal processing, the data collected from sensing devices is often a noisy linear superposition of multiple components, and the estimation of components of interest constitutes a crucial pre-processing step. In this work, we develop a Bayesian framework for signal component decomposition, which combines Gibbs sampling with plug-and-play (PnP) diffusion priors to draw component samples from the posterior distribution. Unlike many existing methods, our framework supports incorporating model-driven and data-driven prior knowledge into the diffusion prior in a unified manner. Moreover, the proposed posterior sampler allows component priors to be learned separately and flexibly combined without retraining. Under suitable assumptions, the proposed DiG sampler provably produces samples from the posterior distribution. We also show that DiG can be interpreted as an extension of a class of recently proposed diffusion-based samplers, and that, for suitable classes of sensing operators, DiG better exploits the structure of the measurement model. Numerical experiments demonstrate the superior performance of our method over existing approaches.


[24] 2602.10829

Self-Supervised Learning for Speaker Recognition: A study and review

Deep learning models trained in a supervised setting have revolutionized audio and speech processing. However, their performance inherently depends on the quantity of human-annotated data, making them costly to scale and prone to poor generalization under unseen conditions. To address these challenges, Self-Supervised Learning (SSL) has emerged as a promising paradigm, leveraging vast amounts of unlabeled data to learn relevant representations. The application of SSL for Automatic Speech Recognition (ASR) has been extensively studied, but research on other downstream tasks, notably Speaker Recognition (SR), remains in its early stages. This work describes major SSL instance-invariance frameworks (e.g., SimCLR, MoCo, and DINO), initially developed for computer vision, along with their adaptation to SR. Various SSL methods for SR, proposed in the literature and built upon these frameworks, are also presented. An extensive review of these approaches is then conducted: (1) the effect of the main hyperparameters of SSL frameworks is investigated; (2) the role of SSL components is studied (e.g., data-augmentation, projector, positive sampling); and (3) SSL frameworks are evaluated on SR with in-domain and out-of-domain data, using a consistent experimental setup, and a comprehensive comparison of SSL methods from the literature is provided. Specifically, DINO achieves the best downstream performance and effectively models intra-speaker variability, although it is highly sensitive to hyperparameters and training conditions, while SimCLR and MoCo provide robust alternatives that effectively capture inter-speaker variability and are less prone to collapse. This work aims to highlight recent trends and advancements, identifying current challenges in the field.


[25] 2602.10835

Reference Output Tracking in Boolean Control Networks

In this paper, the problem of tracking a given reference output trajectory is investigated for the class of Boolean control networks, by resorting to their algebraic representation. First, the case of a finite-length reference trajectory is addressed, and the analysis and algorithm first proposed in [17] are extended to be able to deal with arbitrary initial conditions and to identify all possible solutions. The approach developed for the finite-length case is then adjusted to cope with periodic reference output trajectories. The results of the paper are illustrated through an example.


[26] 2602.10837

FPGA Implementation of Sketched LiDAR for a 192 x 128 SPAD Image Sensor

This study presents an efficient field-programmable gate array (FPGA) implementation of a polynomial spline function-based statistical compression algorithm designed to address the critical challenge of massive data transfer bandwidth in emerging high-spatial-resolution single-photon avalanche diode (SPAD) arrays, where data rates can reach tens of gigabytes per second. In our experiments, the proposed hardware implementation achieves a compression ratio of 512x compared with conventional histogram-based outputs, with the potential for further improvement. The algorithm is first optimized in software using fixed-point (FXP) arithmetic and look-up tables (LUTs) to eliminate explicit additions, multiplications, and non-linear operations. This enables a careful balance between accuracy and hardware resource utilization. Guided by this trade-off analysis, online sketch processing elements (SPEs) are implemented on an FPGA to directly process time-stamp streams from the SPAD sensor. The implementation is validated using a customized LiDAR setup with a 192 x 128-pixel SPAD array. This work demonstrates histogram-free online depth reconstruction with high fidelity, effectively alleviating the time-stamp transfer bottleneck of SPAD arrays and offering scalability as pixel counts continue to increase for future SPADs.


[27] 2602.10855

Singular Port-Hamiltonian Systems Beyond Passivity

In this paper, we study a class of port-Hamiltonian systems whose vector fields exhibit singularities. A representative example of this class has recently been employed in the power electronics literature to implement a grid-forming controller. We show that, under certain conditions, these port-Hamiltonian systems, when interconnected with passive systems, converge to a prescribed non-equilibrium steady state. At first glance, the apparently passive nature of the port-Hamiltonian system seems incompatible with the active power injection required to sustain this non-equilibrium condition. However, we demonstrate that the discontinuity inherent in the vector field provides the additional energy needed to maintain this operating point, indicating that the system is not globally passive. Moreover, when the discontinuity is replaced by a continuous approximation, the resulting system becomes cyclo-dissipative while still capable of supplying the required power.


[28] 2602.10876

Backstepping Control of PDEs on Domains with Graph-Monotone Boundaries

Despite the extensive body of work on backstepping for one-dimensional PDEs, results in higher dimensions remain comparatively limited. Most available methods either exploit particular symmetries of the PDE or address problems posed on parallelepiped domains. To the best of our knowledge, the only approach that enables the design of backstepping controllers on non-parallelepiped regions without symmetry assumptions is the domain extension technique. This method, however, presents several drawbacks. In particular, the control input at each time instant is obtained by simulating a PDE on an extended domain, from which the actual input on the original domain is approximated. By contrast, in the one-dimensional setting, once the time-independent backstepping gain kernel is known, the control input can be computed in closed form as a feedback depending solely on the state at that same instant. Moreover, problems such as output-feedback design or adaptive and robust control do not appear straightforward to address with the domain extension method, at least to the best of our knowledge. These considerations motivate the search, whenever possible, for alternatives that preserve the main advantages of one-dimensional backstepping. A motivating example for the domain extension method is the control of the heat equation on a piano-shaped domain, with actuation applied at the tail of the piano. In this extended abstract, we show through a simple calculation that the domain extension method is not required in this setting. Instead, a strategy akin to that used for parallelepiped domains can be adopted. This result constitutes a first instance of a broader framework for backstepping control of asymmetric PDEs posed on non-parallelepiped regions, which we refer to as domains with graph-monotone boundaries. The general framework is developed in a forthcoming paper.


[29] 2602.10888

Anomaly Detection with Machine Learning Algorithms in Large-Scale Power Grids

We apply several machine learning algorithms to the problem of anomaly detection in operational data for large-scale, high-voltage electric power grids. We observe important differences in the performance of the algorithms. Neural networks typically outperform classical algorithms such as k-nearest neighbors and support vector machines, which we explain by the strong contextual nature of the anomalies. We show that unsupervised learning algorithm work remarkably well and that their predictions are robust against simultaneous, concurring anomalies.


[30] 2602.10906

Training-Free Stimulus Encoding for Retinal Implants via Sparse Projected Gradient Descent

Retinal implants aim to restore functional vision despite photoreceptor degeneration, yet are fundamentally constrained by low resolution electrode arrays and patient-specific perceptual distortions. Most deployed encoders rely on task-agnostic downsampling and linear brightness-to-amplitude mappings, which are suboptimal under realistic perceptual models. While global inverse problems have been formulated as neural networks, such approaches can be fast at inference, and can achieve high reconstruction fidelity, but require training and have limited generalizability to arbitrary inputs. We cast stimulus encoding as a constrained sparse least-squares problem under a linearized perceptual forward model. Our key observation is that the resulting perception matrix can be highly sparse, depending on patient and implant configuration. Building on this, we apply an efficient projected residual norm steepest descent solver that exploits sparsity and supports stimulus bounds via projection. In silico experiments across four simulated patients and implant resolutions from $15\times15$ to $100\times100$ electrodes demonstrate improved reconstruction fidelity, with up to $+0.265$ SSIM increase, $+12.4\,\mathrm{dB}$ PSNR, and $81.4\%$ MAE reduction on Fashion-MNIST compared to Lanczos downsampling.


[31] 2602.10936

Trajectory-based data-driven predictive control and the state-space predictor

We define trajectory predictive control (TPC) as a family of output-feedback indirect data-driven predictive control (DDPC) methods that represent the output trajectory of a discrete-time system as a linear function of the recent input/output history and the planned input trajectory. This paper shows that for different choices of the trajectory predictor, TPC encompasses a wide variety of DDPC methods, including subspace predictive control (SPC), closed-loop SPC, $\gamma$-DDPC, causal-$\gamma$-DDPC, transient predictive control, and others. This paper introduces a trajectory predictor that corresponds to a linear state-space model with the recent input/output history as the state. With this state-space predictor, TPC is a special case of linear model predictive control and therefore inherits its mature theory. In numerical experiments, TPC performance approaches the limit of oracle $H_2$-optimal control with perfect knowledge of the underlying system model. For TPC with small training datasets, the state-space predictor outperforms other predictors because it has fewer parameters.


[32] 2602.10958

Fluid-Antenna-Enabled Integrated Bistatic Sensing and Backscatter Communication Systems

This paper studies a fluid-antenna-enabled integrated bistatic sensing and backscatter communication system for future networks where connectivity, power delivery, and environmental awareness are jointly supported by the same infrastructure. A multi-antenna base station (BS) with transmitting fluid antennas serves downlink users, energizes passive tags, and illuminates radar targets, while a spatially separated multi-antenna reader decodes tag backscatter and processes radar echoes to avoid the strong self-interference that would otherwise obscure weak returns at the BS. The coexistence of tags and targets, however, induces severe near--far disparities and multi-signal interference, which can be mitigated by fluid antennas through additional spatial degrees of freedom that reshape the multi-hop channels. We formulate a transmit-power minimization problem that jointly optimizes the BS information beamformers, sensing covariance matrix, reader receive beamformers, tag reflection coefficients, and fluid-antenna (FA) positions under heterogeneous quality of service constraints for communication, backscatter, and sensing, as well as energy-harvesting and FA geometry requirements. To tackle the resulting non-convex problem, we develop an alternating-optimization block-coordinate framework that solves four tractable subproblems using semidefinite relaxation, majorization--minimization, and successive convex approximation. Numerical results show consistent transmit-power savings over fixed-position antennas and zero-forcing baselines, achieving about 13.7% and 54.5% reductions, respectively.


[33] 2602.10963

Lie Group Variational Integrator for the Geometrically Exact Rod with Circular Cross-Section Incorporating Cross-Sectional Deformation

In this paper, we derive the continuous space-time equations of motion of a three-dimensional geometrically exact rod, or the Cosserat rod, incorporating planar cross-sectional deformation. We then adopt the Lie group variational integrator technique to obtain a discrete model of the rod incorporating both rotational motion and cross-sectional deformation as well. The resulting discrete model possesses several desirable features: it ensures volume conservation of the discrete elements by considering cross-sectional deformation through a local dilatation factor, it demonstrates the beneficial properties associated with the variational integrator technique, such as the preservation of the rotational configuration, and energy conservation with a bounded error. An exhaustive set of numerical results under various initial conditions of the rod demonstrates the efficacy of the model in replicating the physics of the system.


[34] 2602.10976

Physically Consistent Evaluation of Commonly Used Near-Field Models

Near-field multi-antenna wireless communication has attracted growing research interest in recent years. Despite this development, most of the current literature on antennas and reflecting structures relies on simplified models, whose validity for real systems remains unclear. In this paper, we introduce a physically consistent near-field model, which we use to evaluate commonly used models. Our results indicate that common models are sufficient for basic beamfocusing, but fail to accurately predict the sidelobes and frequency dependence of reflecting structures.


[35] 2602.11063

Deep Neural Network-Enhanced Frequency-Constrained Optimal Power Flow with Multi-Governor Dynamics

To ensure frequency security in power systems, both the rate of change of frequency (RoCoF) and the frequency nadir (FN) must be explicitly accounted for in real-time frequency-constrained optimal power flow (FCOPF). However, accurately modeling sys-tem frequency dynamics through analytical formulations is chal-lenging due to their inherent nonlinearity and complexity. To address this issue, deep neural networks (DNNs) are utilized to capture the nonlinear mapping between system operating condi-tions and key frequency performance metrics. In this paper, a DNN-based frequency prediction model is developed and trained using the high-fidelity time-domain simulation data generated in PSCAD/EMTDC. The trained DNN is subsequently transformed into an equivalent mixed-integer linear programming (MILP) form and embedded into the FCOPF problem as additional con-straints to explicitly enforce frequency security, leading to the proposed DNN-FCOPF formulation. For benchmarking, two alternative models are considered: a conventional optimal power flow without frequency constraints and a linearized FCOPF in-corporating system-level RoCoF and FN constraints. The effec-tiveness of the proposed method is demonstrated by comparing the solutions of these three models through extensive PSCAD/EMTDC time-domain simulations under various loading scenarios.


[36] 2602.11076

Interpretable Attention-Based Multi-Agent PPO for Latency Spike Resolution in 6G RAN Slicing

Sixth-generation (6G) radio access networks (RANs) must enforce strict service-level agreements (SLAs) for heterogeneous slices, yet sudden latency spikes remain difficult to diagnose and resolve with conventional deep reinforcement learning (DRL) or explainable RL (XRL). We propose \emph{Attention-Enhanced Multi-Agent Proximal Policy Optimization (AE-MAPPO)}, which integrates six specialized attention mechanisms into multi-agent slice control and surfaces them as zero-cost, faithful explanations. The framework operates across O-RAN timescales with a three-phase strategy: predictive, reactive, and inter-slice optimization. A URLLC case study shows AE-MAPPO resolves a latency spike in $18$ms, restores latency to $0.98$ms with $99.9999\%$ reliability, and reduces troubleshooting time by $93\%$ while maintaining eMBB and mMTC continuity. These results confirm AE-MAPPO's ability to combine SLA compliance with inherent interpretability, enabling trustworthy and real-time automation for 6G RAN slicing.


[37] 2602.11077

Credit-Based vs. Discount-Based Congestion Pricing: A Comparison Study

Credit-based congestion pricing (CBCP) and discount-based congestion pricing (DBCP), which respectively allot travel credits and toll discounts to subsidize low-income users' access to tolled roads, have emerged as promising policies for alleviating the societal inequity concerns of congestion pricing. However, since real-world implementations of CBCP and DBCP are nascent, their relative merits remain unclear. In this work, we compare the efficacy of deploying CBCP and DBCP in reducing user costs and increasing toll revenues. We first formulate a non-atomic congestion game in which low-income users receive a travel credit or toll discount for accessing tolled lanes. We establish that, in our formulation, Nash equilibrium flows always exist and can be computed or well approximated via convex programming. Our main result establishes a set of practically relevant conditions under which DBCP provably outperforms CBCP in inducing equilibrium outcomes that minimize a given societal cost, which encodes user cost reduction and toll revenue maximization. Finally, we validate our theoretical contributions via a case study of the 101 Express Lanes Project, a CBCP program implemented in the San Francisco Bay Area.


[38] 2602.11116

Multi-UAV Trajectory Optimization for Bearing-Only Localization in GPS Denied Environments

Accurate localization of maritime targets by unmanned aerial vehicles (UAVs) remains challenging in GPS-denied environments. UAVs equipped with gimballed electro-optical sensors are typically used to localize targets, however, reliance on these sensors increases mechanical complexity, cost, and susceptibility to single-point failures, limiting scalability and robustness in multi-UAV operations. This work presents a new trajectory optimization framework that enables cooperative target localization using UAVs with fixed, non-gimballed cameras operating in coordination with a surface vessel. This estimation-aware optimization generates dynamically feasible trajectories that explicitly account for mission constraints, platform dynamics, and out-of-frame events. Estimation-aware trajectories outperform heuristic paths by reducing localization error by more than a factor of two, motivating their use in cooperative operations. Results further demonstrate that coordinated UAVs with fixed, non-gimballed cameras achieve localization accuracy that meets or exceeds that of single gimballed systems, while substantially lowering system complexity and cost, enabling scalability, and enhancing mission resilience.


[39] 2602.11140

Reed-Muller Error-Correction Code Encoder for SFQ-to-CMOS Interface Circuits

Data transmission from superconducting digital electronics such as single flux quantum (SFQ) logic to semiconductor (CMOS) circuits is subject to bit errors due to, e.g., flux trapping, process parameter variations (PPV), and fabrication defects. In this paper, a lightweight hardware-efficient error-correction code encoder is designed and analyzed. Particularly, a Reed-Muller code RM(1,3) encoder is implemented with SFQ digital logic. The proposed RM(1,3) encoder converts a 4-bit message into an 8-bit codeword and can detect and correct up to 3- and 1-bit errors, respectively. This encoder circuit is designed using MIT-LL SFQ5ee process and SuperTools/ColdFlux RSFQ cell library. A simulation framework integrating JoSIM simulator and MATLAB script for automated data collection and analysis, is proposed to study the performance of RM(1,3) encoder. The proposed encoder improves the probability of having no bit errors by 6.7% as compared to an encoder-less design under $\pm20\%$ PPV. With $\pm15\%$ and lower PPV, the proposed encoder could correct all errors with at least 99.1% probability. The impact of fabrication defects such as open circuit faults on the encoder circuit is also studied using the proposed framework.


[40] 2602.10162

Limits of Residual-Based Detection for Physically Consistent False Data Injection

False data injection attacks (FDIAs) pose a persistent challenge to AC power system state estimation. In current practice, detection relies primarily on topology-aware residual-based tests that assume malicious measurements can be distinguished from normal operation through physical inconsistency reflected in abnormal residual behavior. This paper shows that this assumption does not always hold: when FDIA scenarios produce manipulated measurements that remain on the measurement manifold induced by AC power flow relations and measurement redundancy, residual-based detectors may fail to distinguish them from nominal data. The resulting detectability limitation is a property of the measurement manifold itself and does not depend on the attacker's detailed knowledge of the physical system model. To make this limitation observable in practice, we present a data-driven constructive mechanism that incorporates the generic functional structure of AC power flow to generate physically consistent, manifold-constrained perturbations, providing a concrete witness of how residual-based detectors can be bypassed. Numerical studies on multiple AC test systems characterize the conditions under which detection becomes challenging and illustrate its failure modes. The results highlight fundamental limits of residual-based detection in AC state estimation and motivate the need for complementary defenses beyond measurement consistency tests.


[41] 2602.10164

Emotion-Coherent Speech Data Augmentation and Self-Supervised Contrastive Style Training for Enhancing Kids's Story Speech Synthesis

Expressive speech synthesis requires vibrant prosody and well-timed pauses. We propose an effective strategy to augment a small dataset to train an expressive end-to-end Text-to-Speech model. We merge audios of emotionally congruent text using a text emotion recognizer, creating augmented expressive speech data. By training with two-sentence audio, our model learns natural breaks between lines. We further apply self-supervised contrastive training to improve the speaking style embedding extraction from speech. During inference, our model produces multi-sentence speech in one step, guided by the text-predicted speaking style. Evaluations showcase the effectiveness of our proposed approach when compared to a baseline model trained with consecutive two-sentence audio. Our synthesized speeches give a closer inter-sentence pause distribution to the ground truth speech. Subjective evaluations reveal our synthesized speech scored higher in naturalness and style suitability than the baseline.


[42] 2602.10166

MerkleSpeech: Public-Key Verifiable, Chunk-Localised Speech Provenance via Perceptual Fingerprints and Merkle Commitments

Speech provenance goes beyond detecting whether a watermark is present. Real workflows involve splicing, quoting, trimming, and platform-level transforms that may preserve some regions while altering others. Neural watermarking systems have made strides in robustness and localised detection, but most deployments produce outputs with no third-party verifiable cryptographic proof tying a time segment to an issuer-signed original. Provenance standards like C2PA adopt signed manifests and Merkle-based fragment validation, yet their bindings target encoded assets and break under re-encoding or routine processing. We propose MerkleSpeech, a system for public-key verifiable, chunk-localised speech provenance offering two tiers of assurance. The first, a robust watermark attribution layer (WM-only), survives common distribution transforms and answers "was this chunk issued by a known party?". The second, a strict cryptographic integrity layer (MSv1), verifies Merkle inclusion of the chunk's fingerprint under an issuer signature. The system computes perceptual fingerprints over short speech chunks, commits them in a Merkle tree whose root is signed with an issuer key, and embeds a compact in-band watermark payload carrying a random content identifier and chunk metadata sufficient to retrieve Merkle inclusion proofs from a repository. Once the payload is extracted, all subsequent verification steps (signature check, fingerprint recomputation, Merkle inclusion) use only public information. The result is a splice-aware timeline indicating which regions pass each tier and why any given region fails. We describe the protocol, provide pseudocode, and present experiments targeting very low false positive rates under resampling, bandpass filtering, and additive noise, informed by recent audits identifying neural codecs as a major stressor for post-hoc audio watermarks.


[43] 2602.10169

Non-Fungible Blockchain Tokens for Traceable Online-Quality Assurance of Milled Workpieces

This work presents a concept and implementation for the secure storage and transfer of quality-relevant data of milled workpieces from online-quality assurance processes enabled by real-time simulation models. It utilises Non-Fungible Tokens (NFT) to securely and interoperably store quality data in the form of an Asset Administration Shell (AAS) on a public Ethereum blockchain. Minted by a custom smart contract, the NFTs reference the metadata saved in the Interplanetary File System (IPFS), allowing new data from additional processing steps to be added in a flexible yet secure manner. The concept enables automated traceability throughout the value chain, minimising the need for time-consuming and costly repetitive manual quality checks.


[44] 2602.10219

Rethinking Security of Diffusion-based Generative Steganography

Generative image steganography is a technique that conceals secret messages within generated images, without relying on pre-existing cover images. Recently, a number of diffusion model-based generative image steganography (DM-GIS) methods have been introduced, which effectively combat traditional steganalysis techniques. In this paper, we identify the key factors that influence DM-GIS security and revisit the security of existing methods. Specifically, we first provide an overview of the general pipelines of current DM-GIS methods, finding that the noise space of diffusion models serves as the primary embedding domain. Further, we analyze the relationship between DM-GIS security and noise distribution of diffusion models, theoretically demonstrating that any steganographic operation that disrupts the noise distribution compromise DM-GIS security. Building on this insight, we propose a Noise Space-based Diffusion Steganalyzer (NS-DSer)-a simple yet effective steganalysis framework allowing for detecting DM-GIS generated images in the diffusion model noise space. We reevaluate the security of existing DM-GIS methods using NS-DSer across increasingly challenging detection scenarios. Experimental results validate our theoretical analysis of DM-GIS security and show the effectiveness of NS-DSer across diverse detection scenarios.


[45] 2602.10225

Quantum Integrated Sensing and Computation with Indefinite Causal Order

Quantum operations with indefinite causal order (ICO) represent a framework in quantum information processing where the relative order between two events can be indefinite. In this paper, we investigate whether sensing and computation, two canonical tasks in quantum information processing, can be carried out within the ICO framework. We propose a scheme for integrated sensing and computation that uses the same quantum state for both tasks. The quantum state is represented as an agent that performs state observation and learns a function of the state to make predictions via a parametric model. Under an ICO operation, the agent experiences a superposition of orders, one in which it performs state observation and then executes the required computation steps, and another in which the agent carries out the computation first and then performs state observation. This is distinct from prevailing information processing and machine intelligence paradigms where information acquisition and learning follow a strict causal order, with the former always preceding the latter. We provide experimental results and we show that the proposed scheme can achieve small training and testing losses on a representative task in magnetic navigation.


[46] 2602.10230

Frame-Level Internal Tool Use for Temporal Grounding in Audio LMs

Large audio language models are increasingly used for complex audio understanding tasks, but they struggle with temporal tasks that require precise temporal grounding, such as word alignment and speaker diarization. The standard approach, where we generate timestamps as sequences of text tokens, is computationally expensive and prone to hallucination, especially when processing audio lengths outside the model's training distribution. In this work, we propose frame-level internal tool use, a method that trains audio LMs to use their own internal audio representations to perform temporal grounding directly. We introduce a lightweight prediction mechanism trained via two objectives: a binary frame classifier and a novel inhomogeneous Poisson process (IHP) loss that models temporal event intensity. Across word localization, speaker diarization, and event localization tasks, our approach outperforms token-based baselines. Most notably, it achieves a >50x inference speedup and demonstrates robust length generalization, maintaining high accuracy on out-of-distribution audio durations where standard token-based models collapse completely.


[47] 2602.10294

Device Applications of Heterogeneously Integrated Strain-Switched Ferrimagnets/Topological Insulator/Piezoelectric Stacks

A family of ferrimagnets (CoV2O4, GdCo, TbCo) exhibits out-of-plane magnetic anisotropy when strained compressively and in-plane magnetic anisotropy when strained expansively (or vice versa). If such a ferrimagnetic thin film is placed on top of a topological insulator (TI) thin film and its magnetic anisotropy is modulated with strain, then interfacial exchange coupling between the ferrimagnet (FM) and the underlying TI will modulate the surface current flowing through the latter. If the strain is varied continuously, the current will also vary continuously and if the strain alternates in time, the current will also alternate with the frequency of the strain modulation, as long as the frequency is not so high that the period is smaller than the switching time of the FM. If the strain is generated with a gate voltage by integrating a piezoelectric underneath the FM/TI stack, then that can implement a transconductance amplifier or a synapse for neuromorphic computation.


[48] 2602.10401

Experimental Demonstration of Online Learning-Based Concept Drift Adaptation for Failure Detection in Optical Networks

We present a novel online learning-based approach for concept drift adaptation in optical network failure detection, achieving up to a 70% improvement in performance over conventional static models while maintaining low latency.


[49] 2602.10420

Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning

Flow matching has emerged as a powerful framework for generative modeling, with recent empirical successes highlighting the effectiveness of signal-space prediction ($x$-prediction). In this work, we investigate the transfer of this paradigm to binary manifolds, a fundamental setting for generative modeling of discrete data. While $x$-prediction remains effective, we identify a latent structural mismatch that arises when it is coupled with velocity-based objectives ($v$-loss), leading to a time-dependent singular weighting that amplifies gradient sensitivity to approximation errors. Motivated by this observation, we formalize prediction-loss alignment as a necessary condition for flow matching training. We prove that re-aligning the objective to the signal space ($x$-loss) eliminates the singular weighting, yielding uniformly bounded gradients and enabling robust training under uniform timestep sampling without reliance on heuristic schedules. Finally, with alignment secured, we examine design choices specific to binary data, revealing a topology-dependent distinction between probabilistic objectives (e.g., cross-entropy) and geometric losses (e.g., mean squared error). Together, these results provide theoretical foundations and practical guidelines for robust flow matching on binary -- and related discrete -- domains, positioning signal-space alignment as a key principle for robust diffusion learning.


[50] 2602.10439

AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning

Large Audio Language Models (LALMs) have demonstrated strong capabilities in audio understanding and reasoning. However, their performance on fine grained auditory perception remains unreliable, and existing approaches largely rely on data intensive training to internalize perceptual abilities. We propose AudioRouter, a reinforcement learning framework that enables LALMs to improve audio understanding by learning when and how to use external audio tools. Rather than tightly coupling tool usage with audio reasoning, AudioRouter formulates tool use as an explicit decision making problem and optimizes a lightweight routing policy while keeping the underlying reasoning model frozen. Experimental results show that AudioRouter achieves substantial improvements on standard audio understanding benchmarks while requiring up to 600x less training data to learn tool usage compared with conventional training paradigms. These findings suggest that learning effective tool usage offers a data efficient and scalable alternative to internalizing perceptual abilities in LALMs.


[51] 2602.10482

Robust Semantic Transmission for Low-Altitude UAVs: Predictive Channel-Aware Scheduling and Generative Reconstruction

Unmanned aerial vehicle (UAV) downlink transmission facilitates critical time-sensitive visual applications but is fundamentally constrained by bandwidth scarcity and dynamic channel impairments. The rapid fluctuation of the air-to-ground (A2G) link creates a regime where reliable transmission slots are intermittent and future channel quality can only be predicted with uncertainty. Conventional deep joint source-channel coding (DeepJSCC) methods transmit coupled feature streams, causing global reconstruction failure when specific time slots experience deep fading. Decoupling semantic content into a deterministic structure component and a stochastic texture component enables differentiated error protection strategies aligned with channel reliability. A predictive transmission framework is developed that utilizes a split-stream variational codec and a channel-aware scheduler to prioritize the delivery of structural layout over reliable slots. Experimental evaluations indicate that this approach achieves a 5.6 dB gain in peak signal-to-noise (SNR) ratio over single-stream baselines and maintains structural fidelity under significant prediction mismatch.


[52] 2602.10526

The Infrastructure Equation: Water, Energy, and Community Policy for Georgia's Data Center Boom

The rapid growth of data centers driven by cloud computing and artificial intelligence is reshaping infrastructure planning and environmental governance in the United States. Georgia has emerged as a major market for data center development, particularly in the Atlanta metropolitan region, creating economic opportunity alongside significant challenges. Data centers are water-intensive, energy-intensive, and land-intensive infrastructure whose cumulative impacts strain municipal water systems, electric grids, and local land-use frameworks. Unlike single industrial projects, data centers are often proposed in clusters, amplifying community and infrastructure impacts. This report draws on insights from a Georgia-based expert convening to describe the implications of data center growth for water management, energy reliability, ratepayer equity, zoning, and community engagement, identify potential gaps in transparency and regulatory coordination, and present a policy roadmap to help Georgia balance digital infrastructure growth with sustainability, equity, and community protection.


[53] 2602.10586

Enhancing Underwater Images via Adaptive Semantic-aware Codebook Learning

Underwater Image Enhancement (UIE) is an ill-posed problem where natural clean references are not available, and the degradation levels vary significantly across semantic regions. Existing UIE methods treat images with a single global model and ignore the inconsistent degradation of different scene components. This oversight leads to significant color distortions and loss of fine details in heterogeneous underwater scenes, especially where degradation varies significantly across different image regions. Therefore, we propose SUCode (Semantic-aware Underwater Codebook Network), which achieves adaptive UIE from semantic-aware discrete codebook representation. Compared with one-shot codebook-based methods, SUCode exploits semantic-aware, pixel-level codebook representation tailored to heterogeneous underwater degradation. A three-stage training paradigm is employed to represent raw underwater image features to avoid pseudo ground-truth contamination. Gated Channel Attention Module (GCAM) and Frequency-Aware Feature Fusion (FAFF) jointly integrate channel and frequency cues for faithful color restoration and texture recovery. Extensive experiments on multiple benchmarks demonstrate that SUCode achieves state-of-the-art performance, outperforming recent UIE methods on both reference and no-reference metrics. The code will be made public available at this https URL.


[54] 2602.10615

Supercharging Packet-level Network Simulation of Large Model Training via Memoization and Fast-Forwarding

Packet-level discrete-event simulation (PLDES) is a prevalent tool for evaluating detailed performance of large model training. Although PLDES offers high fidelity and generality, its slow performance has plagued networking practitioners. Existing optimization techniques either simplify the network model, resulting in large errors; or execute it in parallel using multiple processors, with an upper bound on speedup. This paper explores an alternative optimization direction that reduces the computational loads of PLDES while maintaining high fidelity. Our key insight is that, in distributed LLM training, packet-level traffic behaviors often exhibit repetitive contention patterns and steady-states where flow rates stabilize, ignoring these redundant discrete events speeds up the simulation considerably and the error is negligible. We realize this idea by proposing Wormhole, a user-transparent PLDES kernel capable of automatically memoization for unsteady-states and skipping for steady-states. Wormhole adopts network partitioning, state memoization and reuse, and rate-based steady-state identification to accurately determine the periods of each flow's steady-state, while maintaining simulation consistency after fast-forwarding. Experiments demonstrate that Wormhole can achieve a 744x speedup over the original ns-3 (510x for MoE workload), with a bounded error of <1%. Applying current multithreading parallel techniques and Wormhole together allows a 1012x speedup, reducing the simulation time for one GPT-13B training under 128 GPUs from 9 hours to 5 minutes.


[55] 2602.10754

Exploring the impact of adaptive rewiring in Graph Neural Networks

This paper explores sparsification methods as a form of regularization in Graph Neural Networks (GNNs) to address high memory usage and computational costs in large-scale graph applications. Using techniques from Network Science and Machine Learning, including Erdős-Rényi for model sparsification, we enhance the efficiency of GNNs for real-world applications. We demonstrate our approach on N-1 contingency assessment in electrical grids, a critical task for ensuring grid reliability. We apply our methods to three datasets of varying sizes, exploring Graph Convolutional Networks (GCN) and Graph Isomorphism Networks (GIN) with different degrees of sparsification and rewiring. Comparison across sparsification levels shows the potential of combining insights from both research fields to improve GNN performance and scalability. Our experiments highlight the importance of tuning sparsity parameters: while sparsity can improve generalization, excessive sparsity may hinder learning of complex patterns. Our adaptive rewiring approach, particularly when combined with early stopping, proves promising by allowing the model to adapt its connectivity structure during training. This research contributes to understanding how sparsity can be effectively leveraged in GNNs for critical applications like power grid reliability analysis.


[56] 2602.10770

LOREN: Low Rank-Based Code-Rate Adaptation in Neural Receivers

Neural network based receivers have recently demonstrated superior system-level performance compared to traditional receivers. However, their practicality is limited by high memory and power requirements, as separate weight sets must be stored for each code rate. To address this challenge, we propose LOREN, a Low Rank-Based Code-Rate Adaptation Neural Receiver that achieves adaptability with minimal overhead. LOREN integrates lightweight low rank adaptation adapters (LOREN adapters) into convolutional layers, freezing a shared base network while training only small adapters per code rate. An end-to-end training framework over 3GPP CDL channels ensures robustness across realistic wireless environments. LOREN achieves comparable or superior performance relative to fully retrained base neural receivers. The hardware implementation of LOREN in 22nm technology shows more than 65% savings in silicon area and up to 15% power reduction when supporting three code rates.


[57] 2602.10813

Dynamic Interference Management for TN-NTN Coexistence in the Upper Mid-Band

The coexistence of terrestrial networks (TN) and non-terrestrial networks (NTN) in the frequency range 3 (FR3) upper mid-band presents considerable interference concerns, as dense TN deployments can severely degrade NTN downlink performance. Existing studies rely on interference-nulling beamforming, precoding, or exclusion zones that require accurate channel state information (CSI) and static coordination, making them unsuitable for dynamic NTN scenarios. To overcome these limitations, we develop an optimization framework that jointly controls TN downlink power, uplink power, and antenna downtilt to protect NTN links while preserving terrestrial performance. The resultant non-convex coupling between TN and NTN parameters is addressed by a Proximal Policy Optimization (PPO)-based reinforcement learning method that develops adaptive power and tilt control strategies. Simulation results demonstrate a reduction up to 8 dB in the median interference-to-noise ratio (INR) while maintaining over 87% TN basestation activity, outperforming conventional baseline methods and validating the feasibility of the proposed strategy for FR3 coexistence.


[58] 2602.10878

Simple generators of rational function fields

Consider a subfield of the field of rational functions in several indeterminates. We present an algorithm that, given a set of generators of such a subfield, finds a simple generating set. We provide an implementation of the algorithm and show that it improves upon the state of the art both in efficiency and the quality of the results. Furthermore, we demonstrate the utility of simplified generators through several case studies from different application domains, such as structural parameter identifiability. The main algorithmic novelties include performing only partial Gröbner basis computation via sparse interpolation and efficient search for polynomials of a fixed degree in a subfield of the rational function field.


[59] 2602.10911

Tuning the burn-in phase in training recurrent neural networks improves their performance

Training recurrent neural networks (RNNs) with standard backpropagation through time (BPTT) can be challenging, especially in the presence of long input sequences. A practical alternative to reduce computational and memory overhead is to perform BPTT repeatedly over shorter segments of the training data set, corresponding to truncated BPTT. In this paper, we examine the training of RNNs when using such a truncated learning approach for time series tasks. Specifically, we establish theoretical bounds on the accuracy and performance loss when optimizing over subsequences instead of the full data sequence. This reveals that the burn-in phase of the RNN is an important tuning knob in its training, with significant impact on the performance guarantees. We validate our theoretical results through experiments on standard benchmarks from the fields of system identification and time series forecasting. In all experiments, we observe a strong influence of the burn-in phase on the training process, and proper tuning can lead to a reduction of the prediction error on the training and test data of more than 60% in some cases.


[60] 2602.10934

MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

Discrete audio tokenizers are fundamental to empowering large language models with native audio processing and generation capabilities. Despite recent progress, existing approaches often rely on pretrained encoders, semantic distillation, or heterogeneous CNN-based architectures. These designs introduce fixed inductive biases that limit reconstruction fidelity and hinder effective scaling. In this paper, we argue that discrete audio tokenization should be learned fully end-to-end using a homogeneous and scalable architecture. To this end, we first propose CAT (Causal Audio Tokenizer with Transformer), a purely Transformer-based architecture that jointly optimizes the encoder, quantizer, and decoder from scratch for high-fidelity reconstruction. Building on the CAT architecture, we develop MOSS-Audio-Tokenizer, a large-scale audio tokenizer featuring 1.6 billion parameters, pre-trained on 3 million hours of diverse, general audio data. We show that this simple, fully end-to-end approach built from homogeneous, causal Transformer blocks scales gracefully and supports high-fidelity reconstruction across diverse audio domains. Across speech, sound, and music, MOSS-Audio-Tokenizer consistently outperforms prior codecs over a wide range of bitrates, while exhibiting predictable improvements with increased scale. Notably, leveraging the discrete tokens from our model, we develop the first purely autoregressive TTS model that surpasses prior non-autoregressive and cascaded systems. Furthermore, MOSS-Audio-Tokenizer enables competitive ASR performance without auxiliary encoders. Our findings position the CAT architecture as a unified, scalable interface for the next generation of native audio foundation models.


[61] 2602.11004

Enhancing Predictability of Multi-Tenant DNN Inference for Autonomous Vehicles' Perception

Autonomous vehicles (AVs) rely on sensors and deep neural networks (DNNs) to perceive their surrounding environment and make maneuver decisions in real time. However, achieving real-time DNN inference in the AV's perception pipeline is challenging due to the large gap between the computation requirement and the AV's limited resources. Most, if not all, of existing studies focus on optimizing the DNN inference time to achieve faster perception by compressing the DNN model with pruning and quantization. In contrast, we present a Predictable Perception system with DNNs (PP-DNN) that reduce the amount of image data to be processed while maintaining the same level of accuracy for multi-tenant DNNs by dynamically selecting critical frames and regions of interest (ROIs). PP-DNN is based on our key insight that critical frames and ROIs for AVs vary with the AV's surrounding environment. However, it is challenging to identify and use critical frames and ROIs in multi-tenant DNNs for predictable inference. Given image-frame streams, PP-DNN leverages an ROI generator to identify critical frames and ROIs based on the similarities of consecutive frames and traffic scenarios. PP-DNN then leverages a FLOPs predictor to predict multiply-accumulate operations (MACs) from the dynamic critical frames and ROIs. The ROI scheduler coordinates the processing of critical frames and ROIs with multiple DNN models. Finally, we design a detection predictor for the perception of non-critical frames. We have implemented PP-DNN in an ROS-based AV pipeline and evaluated it with the BDD100K and the nuScenes dataset. PP-DNN is observed to significantly enhance perception predictability, increasing the number of fusion frames by up to 7.3x, reducing the fusion delay by >2.6x and fusion-delay variations by >2.3x, improving detection completeness by 75.4% and the cost-effectiveness by up to 98% over the baseline.


[62] 2602.11072

Simultaneous Speech-to-Speech Translation Without Aligned Data

Simultaneous speech translation requires translating source speech into a target language in real-time while handling non-monotonic word dependencies. Traditional approaches rely on supervised training with word-level aligned data, which is difficult to collect at scale and thus depends on synthetic alignments using language-specific heuristics that are suboptimal. We propose Hibiki-Zero, which eliminates the need for word-level alignments entirely. This fundamentally simplifies the training pipeline and enables seamless scaling to diverse languages with varying grammatical structures, removing the bottleneck of designing language-specific alignment heuristics. We first train on sentence-level aligned data to learn speech translation at high latency, then apply a novel reinforcement learning strategy using GRPO to optimize latency while preserving translation quality. Hibiki-Zero achieves state-of-the-art performance in translation accuracy, latency, voice transfer, and naturalness across five X-to-English tasks. Moreover, we demonstrate that our model can be adapted to support a new input language with less than 1000h of speech. We provide examples, model weights, inference code and we release a benchmark containing 45h of multilingual data for speech translation evaluation.


[63] 2602.11082

Digging for Data: Experiments in Rock Pile Characterization Using Only Proprioceptive Sensing in Excavation

Characterization of fragmented rock piles is a fundamental task in the mining and quarrying industries, where rock is fragmented by blasting, transported using wheel loaders, and then sent for further processing. This field report studies a novel method for estimating the relative particle size of fragmented rock piles from only proprioceptive data collected while digging with a wheel loader. Rather than employ exteroceptive sensors (e.g., cameras or LiDAR sensors) to estimate rock particle sizes, the studied method infers rock fragmentation from an excavator's inertial response during excavation. This paper expands on research that postulated the use of wavelet analysis to construct a unique feature that is proportional to the level of rock fragmentation. We demonstrate through extensive field experiments that the ratio of wavelet features, constructed from data obtained by excavating in different rock piles with different size distributions, approximates the ratio of the mean particle size of the two rock piles. Full-scale excavation experiments were performed with a battery electric, 18-tonne capacity, load-haul-dump (LHD) machine in representative conditions in an operating quarry. The relative particle size estimates generated with the proposed sensing methodology are compared with those obtained from both a vision-based fragmentation analysis tool and from sieving of sampled materials.


[64] 2602.11145

SCRAPL: Scattering Transform with Random Paths for Machine Learning

The Euclidean distance between wavelet scattering transform coefficients (known as paths) provides informative gradients for perceptual quality assessment of deep inverse problems in computer vision, speech, and audio processing. However, these transforms are computationally expensive when employed as differentiable loss functions for stochastic gradient descent due to their numerous paths, which significantly limits their use in neural network training. Against this problem, we propose "Scattering transform with Random Paths for machine Learning" (SCRAPL): a stochastic optimization scheme for efficient evaluation of multivariable scattering transforms. We implement SCRAPL for the joint time-frequency scattering transform (JTFS) which demodulates spectrotemporal patterns at multiple scales and rates, allowing a fine characterization of intermittent auditory textures. We apply SCRAPL to differentiable digital signal processing (DDSP), specifically, unsupervised sound matching of a granular synthesizer and the Roland TR-808 drum machine. We also propose an initialization heuristic based on importance sampling, which adapts SCRAPL to the perceptual content of the dataset, improving neural network convergence and evaluation performance. We make our code and audio samples available and provide SCRAPL as a Python package.


[65] 2202.11496

MITI: SLAM Benchmark for Laparoscopic Surgery

We propose a new benchmark for evaluating stereoscopic visual-inertial computer vision algorithms (SLAM/ SfM/ 3D Reconstruction/ Visual-Inertial Odometry) for minimally invasive surgical (MIS) interventions in the abdomen. Our MITI Dataset available at [this https URL] provides all the necessary data by a complete recording of a handheld surgical intervention at Research Hospital Rechts der Isar of TUM. It contains multimodal sensor information from IMU, stereoscopic video, and infrared (IR) tracking as ground truth for evaluation. Furthermore, calibration for the stereoscope, accelerometer, magnetometer, the rigid transformations in the sensor setup, and time-offsets are available. We wisely chose a suitable intervention that contains very few cutting and tissue deformation and shows a full scan of the abdomen with a handheld camera such that it is ideal for testing SLAM algorithms. Intending to promote the progress of visual-inertial algorithms designed for MIS application, we hope that our clinical training dataset helps and enables researchers to enhance algorithms.


[66] 2208.13969

Airway Tree Modeling Using Dual-channel 3D UNet 3+ with Vesselness Prior

The lung airway tree modeling is essential to work for the diagnosis of pulmonary diseases, especially for X-Ray computed tomography (CT). The airway tree modeling on CT images can provide the experts with 3-dimension measurements like wall thickness, etc. This information can tremendously aid the diagnosis of pulmonary diseases like chronic obstructive pulmonary disease [1-4]. Many scholars have attempted various ways to model the lung airway tree, which can be split into two major categories based on its nature. Namely, the model-based approach and the deep learning approach. The performance of a typical model-based approach usually depends on the manual tuning of the model parameter, which can be its advantages and disadvantages. The advantage is its don't require a large amount of training data which can be beneficial for a small dataset like medical imaging. On the other hand, the performance of model-based may be a misconcep-tion [5,6]. In recent years, deep learning has achieved good results in the field of medical image processing, and many scholars have used UNet-based methods in medical image segmentation [7-11]. Among all the variation of UNet, the UNet 3+ [11] have relatively good result compare to the rest of the variation of UNet. Therefor to further improve the accuracy of lung airway tree modeling, this study combines the Frangi filter [5] with UNet 3+ [11] to develop a dual-channel 3D UNet 3+. The Frangi filter is used to extracting vessel-like feature. The vessel-like feature then used as input to guide the dual-channel UNet 3+ training and testing procedures.


[67] 2407.07295

Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis

In medical imaging, the diffusion models have shown great potential for synthetic image generation tasks. However, these approaches often lack the interpretable connections between the generated and real images and can create anatomically implausible structures or illusions. To address these limitations, we propose the Deformation-Recovery Diffusion Model (DRDM), a novel diffusion-based generative model that emphasises morphological transformation through deformation fields rather than direct image synthesis. DRDM introduces a topology-preserving deformation field generation strategy, which randomly samples and integrates multi-scale Deformation Velocity Fields (DVFs). DRDM is trained to learn to recover unrealistic deformation components, thus restoring randomly deformed images to a realistic distribution. This formulation enables the generation of diverse yet anatomically plausible deformations that preserve structural integrity, thereby improving data augmentation and synthesis for downstream tasks such as few-shot learning and image registration. Experiments on cardiac Magnetic Resonance Imaging and pulmonary Computed Tomography show that DRDM is capable of creating diverse, large-scale deformations, while maintaining anatomical plausibility of deformation fields. Additional evaluations on 2D image segmentation and 3D image registration tasks indicate notable performance gains, underscoring DRDM's potential to enhance both image manipulation and generative modelling in medical imaging applications. Project page: this https URL


[68] 2501.17866

Advancing Brainwave-Based Biometrics: A Large-Scale, Multi-Session Evaluation

The field of brainwave-based biometrics has gained attention for its potential to revolutionize user authentication through hands-free interaction, resistance to shoulder surfing, continuous authentication, and revocability. However, current research often relies on single-session or limited-session datasets with fewer than 55 subjects, raising concerns about the generalizability of the findings. To address this gap, we conducted a large-scale study using a public brainwave dataset comprising 345 subjects and over 6,007 sessions (an average of 17 per subject) recorded over five years using three headsets. Our results reveal that deep learning approaches significantly outperform hand-crafted feature extraction methods. We also observe Equal Error Rates (EER) increases over time (e.g., from 6.7% after 1 day to 14.3% after a year). Therefore, it is necessary to reinforce the enrollment set after successful login attempts. Moreover, we demonstrate that fewer brainwave measurement sensors can be used, with an acceptable increase in EER, which is necessary for transitioning from medical-grade to affordable consumer-grade devices. Finally, we compared our results to prior work and existing biometric standards. While our performance is on par with or exceeds previous approaches, it still falls short of industrial benchmarks. Based on the results, we hypothesize that further improvements are possible with larger training sets. To support future research, we have open-sourced our analysis code.


[69] 2503.04681

Mixed Near-field and Far-field Localization in Extremely Large-scale MIMO Systems

In this paper, we study efficient \emph{mixed near-field and far-field} target localization methods in extremely large-scale multiple-input multiple-output (XL-MIMO) systems Compared with existing works, we address two new challenges in target localization of MIMO communication systems via using decoupled subspace methods, arising from the half-wavelength antenna spacing constraint and \emph{hybrid uniform planar array} (UPA) this http URL this end, we propose a new three-step mixed-field localization method. First, we reconstruct the equivalent signals received at UPA antennas by judiciously designing analog combining matrices over time with minimum recovery this http URL, based on recovered signals, we extend the modified multiple signal classification (MUSIC) algorithm to the UPA architectures by constructing a new covariance matrix of a virtual sparse UPA (S-UPA) to decouple the 2D angles and range this http URL to the structure of the S-UPA, there exist ambiguous angles when estimating true angles of this http URL the third step, we design an effective classification method to distinguish mixed-field targets, determine true angles of all targets, as well as estimate the ranges of near-field this http URL particular, angular ambiguity is resolved by showing an important fact that the three types of estimated angles (i.e., far-field, near-field, and ambiguous angles) exhibit significantly different patterns in the range-domain MUSIC this http URL, to characterize the estimation error lower-bound, we obtain a matrix closed-form Cramér-Rao bounds for mixed-field target this http URL, numerical results demonstrate the effectiveness of our proposed mixed-field localization method, which improves target-classification accuracy and achieves a lower root mean square error than various benchmark schemes.


[70] 2503.19925

Accurate, provable and fast polychromatic tomographic reconstruction: A variational inequality approach

We consider the problem of signal reconstruction for computed tomography (CT) under a nonlinear forward model that accounts for exponential signal attenuation, a polychromatic X-ray source, general measurement noise (e.g., Poisson shot noise), and observations acquired over multiple wavelength windows. We develop a simple iterative algorithm for single-material reconstruction, which we call EXACT (EXtragradient Algorithm for Computed Tomography), based on formulating our estimate as the fixed point of a monotone variational inequality. We prove guarantees on the statistical and computational performance of EXACT under realistic assumptions on the measurement process. We also consider a recently introduced variant of this model with Gaussian measurements and present sample and iteration complexity bounds for EXACT that improve upon those of existing algorithms. We apply our EXACT algorithm to a CT phantom image recovery task and show that it often requires fewer X-ray views, lower source intensity, and less computation time to achieve reconstruction quality similar to existing methods. Code is available at this https URL.


[71] 2505.21435

Expectation-maximization for low-SNR multi-reference alignment

We study the multi-reference alignment (MRA) problem of recovering a signal from noisy observations acted on by unknown random circular shifts. While the information-theoretic limits of MRA are well characterized in many settings, the algorithmic behavior at low signal-to-noise ratio (SNR), the regime of practical interest, remains poorly understood. In this paper, we analyze the expectation-maximization (EM) algorithm, a widely used method for MRA, and characterize its convergence dynamics and initialization dependence in the low-SNR limit. On the convergence side, we prove a two-phase phenomenon near the ground truth as $\mathrm{SNR}\to 0$: an initial contraction with error decaying as $\exp(-\, \mathrm{SNR} \cdot t)$ followed by a much slower phase scaling as $\exp(- \,\mathrm{SNR}^2 \cdot t)$, where $t$ is the iteration number. This yields an iteration-complexity lower bound $T \gtrsim \mathrm{SNR}^{-2}$ to reach a small fixed target accuracy, revealing a severe computational bottleneck at low SNR. We also identify a finite-sample instability, which we term \emph{Ghost of Newton}, in which EM initially approaches the ground truth but later diverges, degrading reconstruction quality. On the bias side, we analyze EM in the noise-only setting ($\mathrm{SNR}=0$), a regime referred to as Einstein from Noise, to highlight its pronounced sensitivity to initialization. We prove that the EM map preserves the Fourier phases of the initialization across all iterations, while the corresponding Fourier magnitudes contract toward zero at a slow rate of $(1+T)^{-1/2}$. Consequently, although the amplitudes vanish in the limit of $T \to \infty$ iterations, the reconstructed structure continues to reflect the geometry encoded by the template's Fourier phases. Together, these results expose fundamental computational and initialization-driven limitations of EM for MRA in the low-SNR regime.


[72] 2506.04392

SLM-S2ST: A multimodal language model for direct speech-to-speech translation

Speech-aware language models (LMs) have demonstrated capabilities in understanding spoken language while generating text-based responses. However, enabling them to produce speech output efficiently and effectively remains a challenge. In this paper, we present SLM-S2ST, a multimodal LM for direct speech-to-speech translation (S2ST), built on the open-source Phi4-MM model. SLM-S2ST extends its predecessor by generating translated speech using an audio transformer head that predicts audio tokens with a delay relative to text tokens, followed by a streaming vocoder for waveform synthesis. Our experimental results on the CVSS-C dataset demonstrate SLM-S2ST's superior performance, significantly surpassing existing baseline models trained on the same dataset. Furthermore, when we scale up the training data and the model size, SLM-S2ST reaches on-par performance with the current SOTA model.


[73] 2506.04518

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

Speech language models (Speech LMs) enable end-to-end speech-text modeling within a single model, offering a promising direction for spoken dialogue systems. The choice of speech-text jointly decoding paradigm plays a critical role in performance, efficiency, and alignment quality. In this work, we systematically compare representative joint speech-text decoding strategies, including the interleaved, and parallel generation paradigms, under a controlled experimental setup using the same base language model, speech tokenizer and training data. Our results show that the interleaved approach achieves the best alignment. However it suffers from slow inference due to long token sequence length. To address this, we propose a novel early-stop interleaved (ESI) pattern that not only significantly accelerates decoding but also yields slightly better performance. Additionally, we curate high-quality question answering (QA) datasets to further improve speech QA performance.


[74] 2506.22804

Online Coreset Selection for Learning Dynamic Systems

With the increasing availability of streaming data in dynamic systems, a critical challenge in data-driven modeling for control is how to efficiently select informative data to characterize system dynamics. In this work, we develop an online coreset selection method for set-membership identification in the presence of process disturbances, improving data efficiency while preserving convergence guarantees. Specifically, we derive a stacked polyhedral representation that over-approximates the feasible parameter set. Based on this representation, we propose a geometric selection criterion that retains a data point only if it induces a sufficient contraction of the feasible set. Theoretically, the feasible-set volume is shown to converge to zero almost surely under persistently exciting data and a tight disturbance bound. When the disturbance bound is mismatched, an explicit Hausdorff-distance bound is derived to quantify the resulting identification error. In addition, an upper bound on the expected coreset size is established and extensions to nonlinear systems with linear-in-the-parameter structures and to bounded measurement noise are discussed. The effectiveness of the proposed method is demonstrated through comprehensive simulation studies.


[75] 2509.01705

Predictive Communications for Low-Altitude Networks

The emergence of dense, mission-driven aerial networks supporting the low-altitude economy presents unique communication challenges, including extreme channel dynamics and severe cross-tier interference. Traditional reactive communication paradigms are ill-suited to these environments, as they fail to leverage the network's inherent predictability. This paper introduces predictive communication, a novel paradigm transforming network management from reactive adaptation to proactive optimization. The approach is enabled by fusing predictable mission trajectories with stable, large-scale radio environment models (e.g., radio maps). Specifically, we present a hierarchical framework that decomposes the predictive cross-layer resource allocation problem into three layers: strategic (routing), tactical (timing), and operational (power). This structure aligns decision-making timescales with the accuracy levels and ranges of available predictive information. We demonstrate that this foresight-driven framework achieves an order-of-magnitude reduction in cross-tier interference, laying the groundwork for robust and scalable low-altitude communication systems.


[76] 2509.17143

MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion With Increased Controllability via Multiple Guidances

We introduce MaskVCT, a zero-shot voice conversion (VC) model that offers multi-factor controllability through multiple classifier-free guidances (CFGs). While previous VC models rely on a fixed conditioning scheme, MaskVCT integrates diverse conditions in a single model. To further enhance robustness and control, the model can leverage continuous or quantized linguistic features to enhance intelligibility and speaker similarity, and can use or omit pitch contour to control prosody. These choices allow users to seamlessly balance speaker identity, linguistic content, and prosodic factors in a zero-shot VC setting. Extensive experiments demonstrate that MaskVCT achieves the best target speaker and accent similarities while obtaining competitive word and character error rates compared to existing baselines. Audio samples are available at this https URL.


[77] 2510.02700

A UAV-Based VNIR Hyperspectral Benchmark Dataset for Landmine and UXO Detection

This paper introduces a novel benchmark dataset of Visible and Near-Infrared (VNIR) hyperspectral imagery acquired via an unmanned aerial vehicle (UAV) platform for landmine and unexploded ordnance (UXO) detection research. The dataset was collected over a controlled test field seeded with 143 realistic surrogate landmine and UXO targets, including surface, partially buried, and fully buried configurations. Data acquisition was performed using a Headwall Nano-Hyperspec sensor mounted on a multi-sensor drone platform, flown at an altitude of approximately 20.6 m, capturing 270 contiguous spectral bands spanning 398-1002 nm. Radiometric calibration, orthorectification, and mosaicking were performed followed by reflectance retrieval using a two-point Empirical Line Method (ELM), with reference spectra acquired using an SVC spectroradiometer. Cross-validation against six reference objects yielded RMSE values below 1.0 and SAM values between 1 and 6 degrees in the 400-900 nm range, demonstrating high spectral fidelity. The dataset is released alongside raw radiance cubes, GCP/AeroPoint data, and reference spectra to support reproducible research. This contribution fills a critical gap in open-access UAV-based hyperspectral data for landmine detection and offers a multi-sensor benchmark when combined with previously published drone-based electromagnetic induction (EMI) data from the same test field.


[78] 2510.19775

Interpretable machine learning for cardiogram-based biometrics

This study investigates the role of electrocardiogram (ECG) and impedance cardiogram (ICG) features in biometric identification, emphasizing their discriminative capacity and robustness to emotional variability. A total of 29 features spanning four domains (temporal, amplitude, slope, and morphological) are evaluated using random forest (RF) models combined with multiple interpretability methods. Feature importance shows that both ECG- and ICG-derived features are consistently ranked among the top 10 by Gini importance, permutation importance, and SHAP values, with ECG features, particularly QRS-centric descriptors, occupying the highest positions. In parallel, ICG BCX features contribute complementary, however, with lower cross-method stability. Correlation analysis reveals substantial multicollinearity, where the RF distributes and diminishes importance across highly correlated pairs, confirming reduced independent contributions. Statistical analysis identifies 14 features with significant differences between baseline and anger, without a clear pattern by domain. Feature selection with recursive feature elimination and genetic algorithms converges on a subset (12 features) that attains accuracy within 1% of the full set (99%), improving efficiency in storage and computation. Proposed complementary analyses indicate that the individuality is primarily encoded in the QRS-related ECG features across all four domains. Meanwhile, BCX-derived ICG features contribute mainly through amplitude, providing supportive but less stable discriminatory cues. The confirmed resilience of QRS-centric descriptors to emotional variation, where stable inter-individual differences in the QRS complex could be traced to variations in ventricular mass, conduction pathways, and thoracic geometry, may indicate their central role in future models of cardiogram-based identity.


[79] 2511.00494

An Indoor Radio Mapping Dataset Combining 3D Point Clouds and RSSI

The growing number of smart devices supporting bandwidth-intensive and latency-sensitive applications, such as real-time video analytics, smart sensing, Extended Reality (XR), etc., necessitates reliable wireless connectivity in indoor environments. In such environments, accurate design of Radio Environment Maps (REMs) enables adaptive wireless network planning and optimization of Access Point (AP) placement. However, generating realistic REMs remains difficult due to the variability of indoor environments and the limitations of existing modeling approaches, which often rely on simplified layouts or fully synthetic data. These challenges are further amplified by the adoption of next-generation Wi-Fi standards, which operate at higher frequencies and suffer from limited range and wall penetration. To support the efforts in addressing these challenges, we collected a dataset that combines high-resolution 3D LiDAR scans with Wi-Fi RSSI measurements collected across 20 setups in a multi-room indoor environment. The dataset includes two measurement scenarios, the first without human presence in the environment, and the second with human presence, enabling the development and validation of REM estimation models that incorporate physical geometry and environmental dynamics. The described dataset supports research in data-driven wireless modeling and the development of high-capacity indoor communication networks.


[80] 2511.03220

Multimodal-Wireless: A Large-Scale Dataset for Sensing and Communication

This paper presents Multimodal-Wireless, a large-scale open-source dataset for multimodal sensing and communication research. The dataset is generated through an integrated and customizable data pipeline built upon the CARLA simulator and Sionna framework, and features high-resolution communication channel state information (CSI) fully synchronized with five other sensor modalities, namely LiDAR, RGB and depth camera, inertial measurement unit (IMU) and radar, all sampled at 100 Hz. It contains approximately 160,000 frames collected across four virtual towns, sixteen communication scenarios, and three weather conditions. This paper provides a comprehensive overview of the dataset, outlining its key features, overall framework, and technical implementation details. In addition, it explores potential research applications concerning communication and collaborative perception, exemplified by beam prediction using a multimodal large language model. The dataset is open in this https URL.


[81] 2512.11228

Mitigating Dynamic Tip-Over during Mobile Crane Slewing using Input Shaping

Payload swing during rapid slewing of mobile cranes poses a safety risk, as it generates overturning moments that can lead to tip-over accidents of mobile cranes. Currently, to limit the risk of tip-over, mobile crane operators are forced to either reduce the slewing speed (which lowers productivity) or reduce the load being carried to reduce the induced moments. Both of these approaches reduce productivity. This paper seeks to enable rapid slewing without compromising safety by applying input shaping to the crane-slewing commands generated by the operator. A key advantage of this approach is that the input shaper requires only the information about the rope length, and does not require detailed mobile crane dynamics. Simulations and experiments show that the proposed method reduces residual payload swing and enables significantly higher slewing speeds without tip over, reducing slewing completion time by at least 38% compared to unshaped control. Human control with input shaping improves task completion time by 13%, reduces the peak swing by 18%, and reduces the potential of collisions by 82% when compared to unshaped control. Moreover, shaped control with a human had no tip-over, whereas large swing led to tip-over without input shaping. Thereby, the proposed method substantially recovers the operational-safety envelope of mobile cranes (designed to avoid tip-over using static analysis) that would otherwise be lost in dynamic conditions. Videos and demonstrations are available at this https URL.


[82] 2601.08335

On Robust Fixed-Time Stabilization of the Cauchy Problem in Hilbert Spaces

This paper presents finite-time and fixed-time stabilization results for inhomogeneous abstract evolution problems, extending existing theories. We prove well-posedness for strong and weak solutions, and estimate upper bounds for settling times for both homogeneous and inhomogeneous systems. We generalize finite-dimensional results to infinite-dimensional systems and demonstrate partial state stabilization with actuation on a subset of the domain. The interest of these results are illustrated through an application of a heat equation with memory term.


[83] 2602.08484

Physics-Guided Variational Model for Unsupervised Sound Source Tracking

Sound source tracking is commonly performed using classical array-processing algorithms, while machine-learning approaches typically rely on precise source position labels that are expensive or impractical to obtain. This paper introduces a physics-guided variational model capable of fully unsupervised single-source sound source tracking. The method combines a variational encoder with a physics-based decoder that injects geometric constraints into the latent space through analytically derived pairwise time-delay likelihoods. Without requiring ground-truth labels, the model learns to estimate source directions directly from microphone array signals. Experiments on real-world data demonstrate that the proposed approach outperforms traditional baselines and achieves accuracy and computational complexity comparable to state-of-the-art supervised models. We further show that the method generalizes well to mismatched array geometries and exhibits strong robustness to corrupted microphone position metadata. Finally, we outline a natural extension of the approach to multi-source tracking and present the theoretical modifications required to support it.


[84] 2602.09427

Lateral tracking control of all-wheel steering vehicles with intelligent tires

The accurate characterization of tire dynamics is critical for advancing control strategies in autonomous road vehicles, as tire behavior significantly influences handling and stability through the generation of forces and moments at the tire-road interface. Smart tire technologies have emerged as a promising tool for sensing key variables such as road friction, tire pressure, and wear states, and for estimating kinematic and dynamic states like vehicle speed and tire forces. However, most existing estimation and control algorithms rely on empirical correlations or machine learning approaches, which require extensive calibration and can be sensitive to variations in operating conditions. In contrast, model-based techniques, which leverage infinite-dimensional representations of tire dynamics using partial differential equations (PDEs), offer a more robust approach. This paper proposes a novel model-based, output-feedback lateral tracking control strategy for all-wheel steering vehicles that integrates distributed tire dynamics with smart tire technologies. The primary contributions include the suppression of micro-shimmy phenomena at low speeds and path-following via force control, achieved through the estimation of tire slip angles, vehicle kinematics, and lateral tire forces. The proposed controller and observer are based on formulations using ODE-PDE systems, representing rigid body dynamics and distributed tire behavior. This work marks the first rigorous control strategy for vehicular systems equipped with distributed tire representations in conjunction with smart tire technologies.


[85] 2602.09429

First-order friction models with bristle dynamics: lumped and distributed formulations

Dynamic models, particularly rate-dependent models, have proven effective in capturing the key phenomenological features of frictional processes, whilst also possessing important mathematical properties that facilitate the design of control and estimation algorithms. However, many rate-dependent formulations are built on empirical considerations, whereas physical derivations may offer greater interpretability. In this context, starting from fundamental physical principles, this paper introduces a novel class of first-order dynamic friction models that approximate the dynamics of a bristle element by inverting the friction characteristic. Amongst the developed models, a specific formulation closely resembling the LuGre model is derived using a simple rheological equation for the bristle element. This model is rigorously analyzed in terms of stability and passivity -- important properties that support the synthesis of observers and controllers. Furthermore, a distributed version, formulated as a hyperbolic partial differential equation (PDE), is presented, which enables the modeling of frictional processes commonly encountered in rolling contact phenomena. The tribological behavior of the proposed description is evaluated through classical experiments and validated against the response predicted by the LuGre model, revealing both notable similarities and key differences.


[86] 2408.01253

Metareasoning in uncertain environments: a meta-BAMDP framework

\textit{Reasoning} may be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome. However, executing $P$ itself bears costs (time, energy, limited capacity, etc.) and needs to be considered alongside explicit utility obtained by making the choice in the underlying decision problem. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to Bernoulli bandit tasks. Owing to the meta problem's complexity, our solutions are necessarily approximate. However, we introduce two novel theorems that significantly enhance the tractability of the problem, enabling stronger approximations that are robust within a range of assumptions grounded in realistic human decision-making scenarios. These results offer a resource-rational perspective and a normative framework for understanding human exploration under cognitive constraints, as well as providing experimentally testable predictions about human behavior in Bernoulli Bandit tasks.


[87] 2412.03462

Multi-Momentum Observer Contact Estimation for Bipedal Robots

As bipedal robots become more and more popular in commercial and industrial settings, the ability to control them with a high degree of reliability is critical. To that end, this paper considers how to accurately estimate which feet are currently in contact with the ground so as to avoid improper control actions that could jeopardize the stability of the robot. Additionally, modern algorithms for estimating the position and orientation of a robot's base frame rely heavily on such contact mode estimates. Dedicated contact sensors on the feet can be used to estimate this contact mode, but these sensors are prone to noise, time delays, damage/yielding from repeated impacts with the ground, and are not available on every robot. To overcome these limitations, we propose a momentum observer based method for contact mode estimation that does not rely on such contact sensors. Often, momentum observers assume that the robot's base frame can be treated as an inertial frame. However, since many humanoids' legs represent a significant portion of the overall mass, the proposed method instead utilizes multiple simultaneous dynamic models. Each of these models assumes a different contact condition. A given contact assumption is then used to constrain the full dynamics in order to avoid assuming that either the body is an inertial frame or that a fully accurate estimate of body velocity is known. The (dis)agreement between each model's estimates and measurements is used to determine which contact mode is most likely using a Markov-style fusion method. The proposed method produces contact detection accuracy of up to 98.44% with a low noise simulation and 77.12% when utilizing data collect on the Sarcos Guardian XO robot (a hybrid humanoid/exoskeleton).


[88] 2502.04468

Iterative Importance Fine-tuning of Diffusion Models

Diffusion models are an important tool for generative modelling, serving as effective priors in applications such as imaging and protein design. A key challenge in applying diffusion models for downstream tasks is efficiently sampling from resulting posterior distributions, which can be addressed using Doob's $h$-transform. This work introduces a self-supervised algorithm for fine-tuning diffusion models by learning the optimal control, enabling amortised conditional sampling. Our method iteratively refines the control using a synthetic dataset resampled with path-based importance weights. We demonstrate the effectiveness of this framework on class-conditional sampling, inverse problems and reward fine-tuning for text-to-image diffusion models.


[89] 2503.04563

Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments

Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to enhance safety. Based on these constraints, the CMPC constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.


[90] 2503.06790

GenDR: Lighten Generative Detail Restoration

Although recent research applying text-to-image (T2I) diffusion models to real-world super-resolution (SR) has achieved remarkable progress, the misalignment of their targets leads to a suboptimal trade-off between inference speed and detail fidelity. Specifically, the T2I task requires multiple inference steps to synthesize images matching to prompts and reduces the latent dimension to lower generating difficulty. Contrariwise, SR can restore high-frequency details in fewer inference steps, but it necessitates a more reliable variational auto-encoder (VAE) to preserve input information. However, most diffusion-based SRs are multistep and use 4-channel VAEs, while existing models with 16-channel VAEs are overqualified diffusion transformers, e.g., FLUX (12B). To align the target, we present a one-step diffusion model for generative detail restoration, GenDR, distilled from a tailored diffusion model with a larger latent space. In detail, we train a new SD2.1-VAE16 (0.9B) via representation alignment to expand the latent space without increasing the model size. Regarding step distillation, we propose consistent score identity distillation (CiD) that incorporates SR task-specific loss into score distillation to leverage more SR priors and align the training target. Furthermore, we extend CiD with adversarial learning and representation alignment (CiDA) to enhance perceptual quality and accelerate training. We also polish the pipeline to achieve a more efficient inference. Experimental results demonstrate that GenDR achieves state-of-the-art performance in both quantitative metrics and visual fidelity.


[91] 2503.20184

Spectrum from Defocus: Fast Spectral Imaging with Chromatic Focal Stack

Hyperspectral cameras face harsh trade-offs between spatial, spectral, and temporal resolution in inherently low-photon conditions. Computational imaging systems break through these trade-offs with compressive sensing, but have required complex optics and/or extensive compute. We present Spectrum from Defocus (SfD), a chromatic focal sweep method that achieves state-of-the-art hyperspectral imaging with only two off-the-shelf lenses, a grayscale sensor, and less than one second of reconstruction time. By capturing a chromatically-aberrated focal stack that preserves nearly all incident light, and reconstructing it with a fast physics-based iterative algorithm, SfD delivers sharp, accurate hyperspectral images. The combination of photon efficiency, optical simplicity, and physical interpretability makes SfD a promising solution for fast, compact, interpretable hyperspectral imaging.


[92] 2507.10775

A New Dataset and Performance Benchmark for Real-time Spacecraft Segmentation in Onboard Computers

Spacecraft deployed in outer space are routinely subjected to various forms of damage due to exposure to hazardous environments. In addition, there are significant risks to the subsequent process of in-space repairs through human extravehicular activity or robotic manipulation, incurring substantial operational costs. Recent developments in image segmentation could enable the development of reliable and cost-effective autonomous inspection systems. While these models often require large amounts of training data to achieve satisfactory results, publicly available annotated spacecraft segmentation data are very scarce. Here, we present a new dataset of nearly 64k annotated spacecraft images that was created using real spacecraft models, superimposed on a mixture of real and synthetic backgrounds generated using NASA's TTALOS pipeline. To mimic camera distortions and noise in real-world image acquisition, we also added different types of noise and distortion to the images. Our dataset includes images with several real-world challenges, including noise, camera distortions, glare, varying lighting conditions, varying field of view, partial spacecraft visibility, brightly-lit city backgrounds, densely patterned and confounding backgrounds, aurora borealis, and a wide variety of spacecraft geometries. Finally, we finetuned YOLOv8 and YOLOv11 models for spacecraft segmentation to generate performance benchmarks for the dataset under well-defined hardware and inference time constraints to mimic real-world image segmentation challenges for real-time onboard applications in space on NASA's inspector spacecraft. The resulting models, when tested under these constraints, achieved a Dice score of 0.92, Hausdorff distance of 0.69, and an inference time of about 0.5 second. The dataset and models for performance benchmark are available at this https URL.


[93] 2508.14600

Energy Injection Identification enabled Disaggregation with Deep Multi-Task Learning

Non-Intrusive Load Monitoring (NILM) offers a cost-effective method to obtain fine-grained appliance-level energy consumption in smart homes and building applications. However, the increasing adoption of behind-the-meter (BTM) energy sources such as solar panels and battery storage poses new challenges for conventional NILM methods that rely solely on at-the-meter data. The energy injected from the BTM sources can obscure the power signatures of individual appliances, leading to a significant decrease in NILM performance. To address this challenge, we present DualNILM, a deep multi-task learning framework designed for the dual tasks of appliance state recognition and injected energy identification. Using a Transformer-based architecture that integrates sequence-to-point and sequence-to-sequence strategies, DualNILM effectively captures multiscale temporal dependencies in the aggregate power consumption patterns, allowing for accurate appliance state recognition and energy injection identification. Extensive evaluation on self-collected and synthesized datasets demonstrates that DualNILM maintains an excellent performance for dual tasks in NILM, much outperforming conventional methods. Our work underscores the framework's potential for robust energy disaggregation in modern energy systems with renewable penetration. Synthetic photovoltaic augmented datasets with realistic injection simulation methodology are open-sourced at this https URL.


[94] 2509.18046

HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba

End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. A concise six-term reward balances contact quality, swing smoothness, foot placement, posture, and body stability while implicitly promoting energy saving. On the JVRC-1 humanoid in mc-mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.


[95] 2509.25275

VoiceBridge: Designing Latent Bridge Models for General Speech Restoration at Scale

Bridge models have recently been explored for speech enhancement tasks such as denoising, dereverberation, and super-resolution, while these efforts are typically confined to a single task or small-scale datasets, with constrained general speech restoration (GSR) capability at scale. In this work, we introduce VoiceBridge, a GSR system rooted in latent bridge models (LBMs), capable of reconstructing high-fidelity speech at full-band (\textit{i.e.,} 48~kHz) from various distortions. By compressing speech waveform into continuous latent representations, VoiceBridge models the~\textit{diverse LQ-to-HQ tasks} (namely, low-quality to high-quality) in GSR with~\textit{a single latent-to-latent generative process} backed by a scalable transformer architecture. To better inherit the advantages of bridge models from the data domain to the latent space, we present an energy-preserving variational autoencoder, enhancing the alignment between the waveform and latent space over varying energy levels. Furthermore, to address the difficulty of HQ reconstruction from distinctively different LQ priors, we propose a joint neural prior, uniformly alleviating the reconstruction burden of LBM. At last, considering the key requirement of GSR systems, human perceptual quality, a perceptually aware fine-tuning stage is designed to mitigate the cascading mismatch in generation while improving perceptual alignment. Extensive validation across in-domain and out-of-domain tasks and datasets (\textit{e.g.}, refining recent zero-shot speech and podcast generation results) demonstrates the superior performance of VoiceBridge. Demo samples can be visited at: this https URL.


[96] 2510.03534

Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river representative use-case. We propose an energy - and communication - efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with scaling the number of agents both improving mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.


[97] 2510.15198

HyperAIRI: a plug-and-play algorithm for precise hyperspectral image reconstruction in radio interferometry

The next-generation radio-interferometric (RI) telescopes require imaging algorithms capable of forming high-resolution high-dynamic-range images from large data volumes spanning wide frequency bands. Recently, AIRI, a plug-and-play (PnP) approach taking the forward-backward algorithmic structure (FB), has demonstrated state-of-the-art performance in monochromatic RI imaging by alternating a data-fidelity step with a regularization step via learned denoisers. In this work, we introduce HyperAIRI, its hyperspectral extension, underpinned by learned hyperspectral denoisers enforcing a power-law spectral model. For each spectral channel, the HyperAIRI denoiser takes as input its current image estimate, alongside estimates of its two immediate neighboring channels and the spectral index map, and provides as output its associated denoised image. To ensure convergence of HyperAIRI, the denoisers are trained with a Jacobian regularization enforcing non-expansiveness. To accommodate varying dynamic ranges, we assemble a shelf of pre-trained denoisers, each tailored to a specific dynamic range. At each HyperAIRI iteration, the spectral channels of the target image cube are updated in parallel using dynamic-range-matched denoisers from the pre-trained shelf. The denoisers are also endowed with a spatial image faceting functionality, enabling scalability to varied image sizes. Additionally, we formally introduce Hyper-uSARA, a variant of the optimization-based algorithm HyperSARA, promoting joint sparsity across spectral channels via the $\ell_{2,1}$-norm, also adopting FB. We evaluate HyperAIRI's performance on simulated and real observations. We showcase its superior performance compared to its optimization-based counterpart Hyper-uSARA, CLEAN's hyperspectral variant in WSClean, and the monochromatic imaging algorithms AIRI and uSARA.


[98] 2510.18082

Provably Optimal Reinforcement Learning under Safety Filtering

Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated filtered MDP in which all actions result in safe effects, thanks to a safety filter that is considered to be a part of the environment. Our main theorem establishes that (i) learning in the filtered MDP is safe categorically, (ii) standard RL convergence carries over to the filtered MDP, and (iii) any policy that is optimal in the filtered MDP-when executed through the same filter-achieves the same asymptotic return as the best safe policy in the SC-MDP, yielding a complete separation between safety enforcement and performance optimization. We validate the theory on Safety Gymnasium with representative tasks and constraints, observing zero violations during training and final performance matching or exceeding unfiltered baselines. Together, these results shed light on a long-standing question in safety-filtered learning and provide a simple, principled recipe for safe RL: train and deploy RL policies with the most permissive safety filter that is available.


[99] 2511.06994

Experimental Validation of Reflective Near-Field Beamfocusing using a b-bit RIS

This paper presents the first experimental validation of reflective near-field beamfocusing using a reconfigurable intelligent surface (RIS). While beamfocusing has been theoretically established as a key feature of large-aperture RISs, its practical realization has remained unexplored. We derive new analytical expressions for the array gain achieved with a $b$-bit RIS in near-field line-of-sight scenarios, characterizing both the finite depth and angular width of the focal region. The theoretical results are validated through a series of measurements in an indoor office environment at 28 GHz using a one-bit 1024-element RIS. The experiments confirm that beamfocusing can be dynamically achieved and accurately predicted by the proposed simplified analytical model, despite the presence of hardware imperfections and multipath propagation. These findings demonstrate that near-field beamfocusing is a robust and practically viable feature of RIS-assisted wireless communications.


[100] 2512.01702

A unified framework for geometry-independent operator learning in cardiac electrophysiology simulations

Learning neural operators on heterogeneous and irregular geometries remains a fundamental challenge, as existing approaches typically rely on structured discretisations or explicit mappings to a shared reference domain. We propose a unified framework for geometry-independent operator learning that reformulates the learning problem in an intrinsic coordinate space defined on the underlying manifold. By expressing both inputs and outputs in this shared coordinate domain, the framework decouples operator learning from mesh discretisation and geometric variability, while preserving meaningful spatial organisation and enabling faithful reconstruction on the original geometry. We demonstrate the framework on cardiac electrophysiology, a particularly challenging setting due to extreme anatomical variability across heart geometries. Leveraging a GPU-accelerated simulation pipeline, we generate large-scale datasets of high-fidelity electrophysiology simulations across diverse patient-specific anatomies and train customised neural operators to predict full-field local activation time maps. The proposed approach outperforms established neural operators on both atrial and ventricular geometries. Beyond cardiac electrophysiology, we further show that the same representation enables operator learning in cardiac biomechanics, a distinct problem involving volumetric deformation, highlighting the generality of the proposed framework. Together, these results establish intrinsic coordinate representations as a principled and extensible pathway for neural operator learning on complex physical systems characterised by heterogeneous geometry.


[101] 2601.06081

First Multi-Constellation Observations of Navigation Satellite Signals in the Lunar Domain by Post-Processing L1/L5 IQ Snapshots

The use of Global Navigation Satellite Systems (GNSS) to increase spacecraft autonomy for orbit determination has gained renewed momentum following the Lunar GNSS Receiver Experiment (LuGRE), which demonstrated feasible onboard GPS and Galileo signal reception and tracking at lunar distances. This work processes in-phase and quadrature (IQ) snapshots collected by the LuGRE receiver in cis-lunar space and on the lunar surface to assess multi-frequency, multi-constellation signal availability. Signals from additional systems beyond GPS and Galileo, including RNSS and SBAS constellations, are observable and successfully acquired exclusively in the recorded IQ snapshots. These observations provide the first experimental evidence that signals from multiple constellations, including systems not supported by LuGRE realtime operations, are detectable at unprecedented distances from Earth. Useful observables can be extracted from the IQ snapshots, despite minimal sampling rates, 4-bit quantization, and short durations (200 ms-2 s), through a hybrid coherent/non-coherent acquisition stage compensating for code Doppler. These observations are exploited to tune simulation tools and to perform extended simulation campaigns, showing that the inclusion of additional constellations significantly improves availability; for a 26 dB-Hz acquisition threshold, the fraction of epochs with at least four visible satellites increases from 11% to 46% of the total epoch count. These findings indicate that BeiDou, RNSS, and SBAS signals can substantially enhance GNSS-based autonomy for lunar and cislunar missions.


[102] 2601.11827

Shortest-Path Flow Matching with Mixture-Conditioned Bases for OOD Generalization to Unseen Conditions

Robust generalization under distribution shift remains a key challenge for conditional generative modeling: conditional flow-based methods often fit the training conditions well but fail to extrapolate to unseen ones. We introduce SP-FM, a shortest-path flow-matching framework that improves out-of-distribution (OOD) generalization by conditioning both the base distribution and the flow field on the condition. Specifically, SP-FM learns a condition-dependent base distribution parameterized as a flexible, learnable mixture, together with a condition-dependent vector field trained via shortest-path flow matching. Conditioning the base allows the model to adapt its starting distribution across conditions, enabling smooth interpolation and more reliable extrapolation beyond the observed training range. We provide theoretical insights into the resulting conditional transport and show how mixture-conditioned bases enhance robustness under shift. Empirically, SP-FM is effective across heterogeneous domains, including predicting responses to unseen perturbations in single-cell transcriptomics and modeling treatment effects in high-content microscopy--based drug screening. Overall, SP-FM provides a simple yet effective plug-in strategy for improving conditional generative modeling and OOD generalization across diverse domains.


[103] 2601.18840

Analysis of Control Bellman Residual Minimization for Markov Decision Problem

Markov decision problems are most commonly solved via dynamic programming. Another approach is Bellman residual minimization, which directly minimizes the squared Bellman residual objective function. However, compared to dynamic programming, this approach has received relatively less attention, mainly because it is often less efficient in practice and can be more difficult to extend to model-free settings such as reinforcement learning. Nonetheless, Bellman residual minimization has several advantages that make it worth investigating, such as more stable convergence with function approximation for value functions. While Bellman residual methods for policy evaluation have been widely studied, methods for policy optimization (control tasks) have been scarcely explored. In this paper, we establish foundational results for the control Bellman residual minimization for policy optimization.