New articles on Electrical Engineering and Systems Science


[1] 2603.04438

CogGen: Cognitive-Load-Informed Fully Unsupervised Deep Generative Modeling for Compressively Sampled MRI Reconstruction

Fully unsupervised deep generative modeling (FU-DGM) is promising for compressively sampled MRI (CS-MRI) when training data or compute are limited. Classical FU-DGMs such as DIP and INR rely on architectural priors, but the ill-conditioned inverse problem often demands many iterations, during which the model easily overfits measurement noise. We propose CogGen, a cognitive-load-informed FU-DGM that casts CS-MRI as staged inversion and regulates task-side "cognitive load" by progressively scheduling intrinsic difficulty and extraneous interference. CogGen replaces uniform data fitting with an easy-to-hard k-space weighting/selection strategy: early iterations emphasize low-frequency, high-SNR, structure-dominant samples, while higher-frequency or noise-dominated measurements are introduced later. We realize this schedule via self-paced curriculum learning with complementary student-mode (what the model can currently learn) and teacher-mode (what it should follow) criteria, supporting both soft weighting and hard selection. Experiments and analysis show that CogGen-DIP and CogGen-INR improve fidelity and convergence over strong unsupervised baselines and competitive supervised pipelines.
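The easy-to-hard k-space scheduling can be sketched as follows (a hypothetical fixed radius schedule for illustration only; the paper's self-paced student/teacher criteria are data-driven, not a fixed schedule):

```python
import numpy as np

def kspace_weights(shape, iteration, total_iters, sharpness=200.0):
    # Soft easy-to-hard weighting over a 2-D k-space grid: low-frequency
    # bins are admitted first, and the admitted radius grows with the
    # iteration index (a fixed schedule standing in for self-paced criteria).
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    r_max = radius.max()
    frac = iteration / max(total_iters - 1, 1)
    r_t = r_max * (0.05 + 0.95 * frac)   # admitted radius, 5% -> 100%
    return 1.0 / (1.0 + np.exp(sharpness * (radius - r_t)))

w_early = kspace_weights((64, 64), iteration=0, total_iters=100)
w_late = kspace_weights((64, 64), iteration=99, total_iters=100)
```

Early in the run only the structure-dominant low frequencies carry weight; by the final iteration all measurements participate. Hard selection corresponds to thresholding these weights.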


[2] 2603.04603

Risk-Aware Rulebooks for Multi-Objective Trajectory Evaluation under Uncertainty

We present a risk-aware formalism for evaluating system trajectories in the presence of uncertain interactions between the system and its environment. The proposed formalism supports reasoning under uncertainty and systematically handles complex relationships among requirements and objectives, including hierarchical priorities and non-comparability. Rather than treating the environment as exogenous noise, we explicitly model how each system trajectory influences the environment and evaluate trajectories under the resulting distribution of environment responses. We prove that the formalism induces a preorder on the set of system trajectories, ensuring consistency and preventing cyclic preferences. Finally, we illustrate the approach with an autonomous driving example that demonstrates how the formalism enhances explainability by clarifying the rationale behind trajectory selection.


[3] 2603.04605

Temporal Pooling Strategies for Training-Free Anomalous Sound Detection with Self-Supervised Audio Embeddings

Training-free anomalous sound detection (ASD) based on pre-trained audio embedding models has recently garnered significant attention, as it enables the detection of anomalous sounds using only normal reference data while offering improved robustness under domain shifts. However, existing embedding-based approaches almost exclusively rely on temporal mean pooling, while alternative pooling strategies have so far only been explored for spectrogram-based representations. Consequently, the role of temporal pooling in training-free ASD with pre-trained embeddings remains insufficiently understood. In this paper, we present a systematic evaluation of temporal pooling strategies across multiple state-of-the-art audio embedding models. We propose relative deviation pooling (RDP), an adaptive pooling method that emphasizes informative temporal deviations, and introduce a hybrid pooling strategy that combines RDP with generalized mean pooling. Experiments on five benchmark datasets demonstrate that the proposed methods consistently outperform mean pooling and achieve state-of-the-art performance for training-free ASD, including results that surpass all previously reported trained systems and ensembles on the DCASE2025 ASD dataset.
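A rough sketch of the pooling strategies compared (the RDP formula and the hybrid combination below are plausible guesses for illustration, not the paper's exact definitions):

```python
import numpy as np

def mean_pool(E):
    # Conventional temporal mean pooling over frame embeddings E: (T, D).
    return E.mean(axis=0)

def gen_mean_pool(E, p=3.0):
    # Generalized mean over time: p = 1 recovers mean pooling,
    # p -> inf approaches max pooling over frames.
    return np.mean(np.abs(E) ** p, axis=0) ** (1.0 / p)

def relative_deviation_pool(E, eps=1e-8):
    # A plausible instantiation of relative deviation pooling: per-dimension
    # temporal standard deviation normalized by the mean magnitude, so
    # dimensions with informative temporal fluctuation are emphasized.
    mu = E.mean(axis=0)
    return E.std(axis=0) / (np.abs(mu) + eps)

rng = np.random.default_rng(0)
E = rng.normal(size=(100, 8))        # (T, D) frame embeddings
hybrid = np.concatenate([gen_mean_pool(E), relative_deviation_pool(E)])
```

The hybrid vector simply concatenates both statistics; a constant (uninformative) embedding sequence yields zero relative deviation.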


[4] 2603.04626

Joint Visible Light and RF Backscatter Communications for Ambient IoT Network: Fundamentals, Applications, and Opportunities

The rapid growth of Internet of Things (IoT) devices in sixth-generation (6G) wireless networks raises significant generality and scalability challenges due to energy consumption, deployment complexity, and environmental impact. Ambient IoT (A-IoT), leveraging ambient energy harvesting (EH) for batteryless device operation, has emerged as a promising solution to address these challenges. Among various EH and communication techniques, visible light communication (VLC) integrated with ambient backscatter communication (AmBC) offers remarkable advantages, including energy neutrality, high reliability, and enhanced security. In this paper, we propose a joint VLC-AmBC architecture, emphasizing fundamental concepts, system designs, and practical implementations. We explore potential applications in environmental monitoring, healthcare, smart logistics, and secure communications. We present proof-of-concept demonstrations for three distinct types of ambient backscatter devices (AmBDs): EH-Only, VLC-Relay, and VLC-Control. Experimental results demonstrate the feasibility of implementing joint VLC-AmBC systems, highlighting their practical viability across various deployment scenarios. Finally, we outline future research directions, including integrated sensing and communication, as well as optimized energy-efficient deployment. Open issues, such as large-scale deployment challenges, are also discussed, thereby providing a clear roadmap for future developments in joint VLC-AmBC-enabled A-IoT ecosystems.


[5] 2603.04661

On boundedness of solutions of three-state Moore-Greitzer compressor model with nonlinear proportional-integral controller for the surge subsystem

The work focuses on Lagrange stability of the origin for the three-state Moore-Greitzer compressor model in closed loop with a nonlinear PI controller, tuned only to stabilize a lower-dimensional invariant surge-dynamics subsystem. The linearization of the system is not stabilizable, but the static nonlinearity satisfies a sector condition, and together with a structural property of the stall-dynamics subsystem, this plays an essential role in the analysis. The main contribution provides explicit conditions on the controller parameters together with analytical arguments that guarantee boundedness of all solutions of the closed-loop system. The analysis employs a non-standard application of circle-criterion-based arguments. Together with the additional arguments developed in the work, this stability test also shows that the closed-loop system is robust to certain perturbations and model uncertainties.


[6] 2603.04684

Exploiting Segmented Waveguide-Enabled Pinching-Antenna Systems (SWANs) for Uplink Tri-Hybrid Beamforming

A segmented waveguide-enabled pinching-antenna system (SWAN)-based tri-hybrid beamforming architecture is proposed for uplink multi-user MIMO communications, which jointly optimizes digital, analog, and pinching beamforming. Both fully-connected (FC) and partially-connected (PC) structures between RF chains and segment feed points are considered. For the FC architecture, tri-hybrid beamforming is optimized using the weighted minimum mean-square error (WMMSE) and zero-forcing (ZF) approaches. Specifically, the digital, analog, and pinching beamforming components are optimized via a closed-form solution, Riemannian manifold optimization, and a Gauss-Seidel search, respectively. For the PC architecture, an interleaved topology tailored to the SWAN receiver is proposed, in which segments assigned to each RF chain (sub-array) are interleaved with those from other sub-arrays. Based on this structure, a WMMSE-based tri-hybrid design is developed, in which the Riemannian-manifold update used for the FC structure is replaced by element-wise phase calibration to exploit sparsity in analog beamforming. To gain insight into the performance of the proposed system, the rate-scaling laws with respect to the number of segments are derived for both the FC and PC structures. Our results demonstrate that: i) SWAN with the proposed tri-hybrid beamforming consistently outperforms conventional hybrid beamforming and conventional pinching-antenna systems with pinching beamforming for both the FC and PC structures; ii) the PC structure can strike a good balance between sum rate and energy consumption when the number of segments is large; and iii) the achievable rate does not necessarily increase with the number of segments.
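The ZF step for the digital combiner can be sketched generically (standard uplink zero-forcing; the joint optimization with the analog and pinching stages is the paper's contribution and is not reproduced here):

```python
import numpy as np

def zf_combiner(H):
    # Zero-forcing receive combiner for uplink MU-MIMO.
    # H: (N_rx, K) channel matrix with one column per user; the
    # combiner satisfies W^H H = I, removing inter-user interference.
    return H @ np.linalg.inv(H.conj().T @ H)

rng = np.random.default_rng(1)
H = rng.normal(size=(8, 3)) + 1j * rng.normal(size=(8, 3))
W = zf_combiner(H)
```

Applying `W.conj().T` to the received vector recovers each user's stream free of inter-user interference (at the cost of noise enhancement when the channel is ill-conditioned).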


[7] 2603.04699

Intensity Fluctuation Spectra as a Design Guide for Nonlinear-Tolerant Constellation Shaping

Nonlinearity in coherent fiber links is fundamentally driven by the temporal statistics and spectral structure of the signal intensity. This paper develops a unified framework that links block-level energy statistics of shaped constellations to the low-frequency features of the intensity-fluctuation power spectral density (PSD), thereby enabling spectral-temporal co-design for nonlinear mitigation. A semi-analytical PSD model is derived for finitely block-shaped symbols (including Constant Composition Distribution Matching (CCDM) and Enumerative Sphere Shaping (ESS)), explicitly exposing contributions from self-beating dependent on the symbol energy variance, inter-symbol beating dependent on the mean symbol energy, and block-induced energy variance terms. A compact expression for the spectral-dip width is obtained that captures the block length, symbol rate, pulse roll-off, and chromatic dispersion. This yields design rules for lowering the low-frequency content that most strongly drives the induced XPM. Resulting optimal symbol-rate laws are provided for shaped and unshaped systems, and are validated by Monte-Carlo simulations, which also confirm the distinct low-frequency behaviour of CCDM (suppressed DC) versus ESS (finite DC pedestal at moderate block lengths). The framework consolidates prior time- and frequency-domain views and supplies actionable guidance for choosing the block length, symbol rate, and shaping method to reduce nonlinear interference in high-capacity WDM systems.


[8] 2603.04764

MIMO Channel Prediction via Deep Learning-based Conformal Bayes Filter

Channel prediction has emerged as an effective solution for acquiring accurate channel state information (CSI) in the presence of channel aging. Existing methods have inherent limitations, with the conventional Kalman filter (KF)-based approach being vulnerable to model mismatch and deep learning (DL)-based approaches producing overconfident predictions. To address these issues, we propose a DL-based conformal Bayes filter (DCBF) that integrates DL-based prediction, conformal quantile regression (CQR), and Bayesian filtering. The proposed framework enables principled fusion of calibrated priors and observations, yielding reliable channel predictions with calibrated uncertainty. Simulation results demonstrate that DCBF significantly improves DL-based prediction and outperforms the KF-based method.
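The CQR ingredient can be sketched in its generic split-conformal form (illustrative only; the paper's DCBF embeds the calibrated intervals in a Bayesian filter, which is not shown here):

```python
import numpy as np

def cqr_interval(lo_pred, hi_pred, cal_lo, cal_hi, cal_y, alpha=0.1):
    # Conformalized quantile regression (generic sketch): widen the
    # predicted [lo, hi] interval by the (1 - alpha) empirical quantile
    # of calibration conformity scores, which yields marginal coverage
    # of at least 1 - alpha on exchangeable data.
    scores = np.maximum(cal_lo - cal_y, cal_y - cal_hi)
    n = len(cal_y)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return lo_pred - q, hi_pred + q
```

A positive score means a calibration target fell outside its predicted interval; the correction `q` widens (or, if scores are negative, shrinks) the test-time interval accordingly.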


[9] 2603.04813

Detection of GNSS Interference Using Reflected Signal Observations from the LEO Satellite Constellation

Radio Frequency Interference (RFI) is a growing concern for Global Navigation Satellite System (GNSS) reliability. The Cyclone GNSS (CYGNSS) constellation, designed for ocean wind retrieval via GNSS reflectometry (GNSS-R), provides Delay-Doppler Maps (DDMs) with noise floor metrics exploitable for spaceborne RFI detection. This study proposes a maximum-based DDM noise floor strategy that selects the highest noise floor value among four simultaneous GNSS reflections at each 0.5-second epoch, rather than their mean, preventing dilution of anomalous signals by unaffected channels. To suppress false alarms, a two-tier verification framework is introduced: (1) multi-satellite concurrent detection, confirming RFI when two or more CYGNSS satellites independently flag the same geographic region, and (2) temporal persistence verification, confirming a single-satellite detection only if threshold exceedance persists over a 10-second window. The physical basis for this criterion is established through slant-range geometry analysis between a ground-based jammer and the orbiting satellite. Performance is evaluated using CYGNSS Level 1 data from May 2025 in two regions: White Sands Missile Range, where NOTAM-announced GPS jamming tests were conducted, and the Middle East, where persistent RFI has been documented. The proposed method is compared against NASA's kurtosis-based RFI flags and a mean-based noise floor method. Results show that it detected RFI on three dates where the other methods produced negligible detections, and flagged 62% of total epochs in the Middle East compared to 46% (mean-based) and 33% (kurtosis-based). It also demonstrated capability to detect the early onset of gradually intensifying interference and atypical abnormal patterns not previously reported, highlighting the potential of maximum-based DDM noise floor analysis for sensitive and reliable spaceborne RFI detection.
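The max-based epoch statistic and the temporal-persistence tier can be sketched as follows (function names, thresholds, and the 20-epoch window are illustrative, following the description above):

```python
import numpy as np

def epoch_noise_floor(ddm_noise_floors):
    # Max-based per-epoch statistic: take the highest noise floor among
    # the four simultaneous DDM channels, so one affected channel is not
    # diluted by averaging with three unaffected ones.
    # ddm_noise_floors: (n_epochs, 4) array.
    return np.max(ddm_noise_floors, axis=-1)

def persistent_detection(stat, threshold, min_epochs=20):
    # Single-satellite tier: confirm RFI only if the threshold is
    # exceeded continuously for min_epochs epochs (20 x 0.5 s = 10 s).
    run = 0
    for above in stat > threshold:
        run = run + 1 if above else 0
        if run >= min_epochs:
            return True
    return False
```

The multi-satellite tier would additionally require two or more satellites to flag the same geographic region, which needs geolocated specular points and is omitted here.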


[10] 2603.04840

An Approach to Simultaneous Acquisition of Real-Time MRI Video, EEG, and Surface EMG for Articulatory, Brain, and Muscle Activity During Speech Production

Speech production is a complex process spanning neural planning, motor control, muscle activation, and articulatory kinematics. While the acoustic speech signal is the most accessible product of the speech production act, it does not directly reveal its causal neurophysiological substrates. We present the first simultaneous acquisition of real-time (dynamic) MRI, EEG, and surface EMG, capturing several key aspects of the speech production chain: brain signals, muscle activations, and articulatory movements. This multimodal acquisition paradigm presents substantial technical challenges, including MRI-induced electromagnetic interference and myogenic artifacts. To mitigate these, we introduce an artifact suppression pipeline tailored to this tri-modal setting. Once fully developed, this framework is poised to offer an unprecedented window into speech neuroscience and insights leading to brain-computer interface advances.


[11] 2603.04866

The Vertical Challenge of Low-Altitude Economy: Why We Need a Unified Height System?

The explosive growth of the low-altitude economy, driven by eVTOLs and UAVs, demands a unified digital infrastructure to ensure safety and scalability. However, the current aviation vertical references are dangerously fragmented: manned aviation relies on barometric pressure, cartography uses Mean Sea Level (MSL), and obstacle avoidance depends on Above Ground Level (AGL). This fragmentation creates significant ambiguity for autonomous systems and hinders cross-stakeholder interoperability. In this article, we propose Height Above Ellipsoid (HAE) as the standardized vertical reference for lower airspace. Unlike legacy systems prone to environmental drift and inconsistent datums, HAE provides a globally consistent, GNSS-native, and mathematically stable reference. We present a pragmatic bidirectional transformation framework to bridge HAE with legacy systems and demonstrate its efficacy through (1) real-world implementation in Shenzhen's partitioned airspace management, and (2) a probabilistic risk assessment driven by empirical flight logs from the PX4 ecosystem. Results show that transitioning to HAE reduces the required vertical separation minimum, effectively increasing dynamic airspace capacity while maintaining a target safety level. This work offers a roadmap for transitioning from analog height keeping to a digital-native vertical standard.


[12] 2603.04906

A Method to Derate the Rate-Dependency in the Pass-Band Droop of Comb Decimators

This paper presents a method to derate the dependency on the decimation factor, $M$, of the pass-band droop inherent to $N$-th order comb decimators. It is achieved by cascading a symmetric 3-tap FIR filter in the integrator stage of the corresponding comb decimator and choosing the coefficients only as a function of the order $N$. The proposed derating method, derived from the conventional comb decimator, can be readily applied to any recently developed comb decimator and droop-compensation filter design method.
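The rate dependency being derated can be reproduced numerically from the standard comb (CIC) magnitude response (a sketch of the problem, not of the paper's 3-tap compensator, whose coefficients are not given here):

```python
import numpy as np

def cic_droop_db(M, N, f):
    # Pass-band droop in dB of an N-th order comb (CIC) decimator with
    # decimation factor M at normalized input frequency f (cycles/sample):
    # |H(f)| = |sin(pi M f) / (M sin(pi f))|^N.
    H = (np.sin(np.pi * M * f) / (M * np.sin(np.pi * f))) ** N
    return -20.0 * np.log10(np.abs(H))

# Droop at a pass-band edge f = 1/(8M) for N = 3: it grows with M,
# saturating towards the M -> infinity (sinc) limit.
droops = {M: cic_droop_db(M, N=3, f=1.0 / (8 * M)) for M in (8, 16, 32)}
```

It is exactly this residual dependence of the droop on $M$ (at a fixed relative band edge) that a rate-independent compensator design must derate.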


[13] 2603.04926

HoloPASWIN: Robust Inline Holographic Reconstruction via Physics-Aware Swin Transformers

Digital in-line holography (DIH) is a widely used lensless imaging technique, valued for its simplicity and capability to image samples at high throughput. However, capturing only the intensity of the interference pattern during the recording process gives rise to unwanted terms such as the cross-term and the twin image. The cross-term can be suppressed by adjusting the intensity of the reference wave, but the twin-image problem remains. The twin image is a spectral artifact that superimposes a defocused conjugate wave onto the reconstructed object, severely degrading image quality. While deep learning has recently emerged as a powerful tool for phase retrieval, traditional convolutional neural networks (CNNs) are limited by their local receptive fields, making them less effective at capturing the global diffraction patterns inherent in holography. In this study, we introduce HoloPASWIN, a physics-aware deep learning framework based on the Swin Transformer architecture. By leveraging hierarchical shifted-window attention, our model efficiently captures both the local details and the long-range dependencies essential for accurate holographic reconstruction. We propose a comprehensive loss function that integrates frequency-domain constraints with physical consistency via a differentiable angular spectrum propagator, ensuring high spectral fidelity. Validated on a large-scale synthetic dataset of 25,000 samples with diverse noise configurations (speckle, shot, read, and dark noise), HoloPASWIN demonstrates effective twin-image suppression and robust reconstruction quality.
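The angular spectrum propagator used for the physical-consistency term can be sketched in plain NumPy (the paper's version is differentiable inside the training loss; wavelength, pitch, and distance below are illustrative):

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    # Angular spectrum propagation of a complex field over distance z.
    # field: (H, W) complex array; dx: pixel pitch [m]; evanescent
    # components (negative argument under the square root) are masked out.
    H, W = field.shape
    fx = np.fft.fftfreq(W, d=dx)
    fy = np.fft.fftfreq(H, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * z) * (arg > 0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

x = np.linspace(-1, 1, 64)
gauss = np.exp(-(x[:, None] ** 2 + x[None, :] ** 2) / 0.1).astype(complex)
prop = angular_spectrum_propagate(gauss, wavelength=633e-9, dx=5e-6, z=1e-3)
```

Because the transfer function is a pure phase for propagating components, forward propagation followed by back-propagation over $-z$ recovers the original field, which is the consistency a reconstruction loss can enforce.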


[14] 2603.04962

Design of Grid Forming Multi Timescale Coordinated Control Strategies for Dynamic Virtual Power Plants

As the penetration level of distributed energy resources (DERs) continues to rise, traditional frequency and voltage support from synchronous machines declines. This weakens grid stability and increases the need for fast and adaptive control, especially in weak grids. However, most virtual power plants (VPPs) rely on static aggregation and plan-based resource allocation strategies. These methods overlook differences in device response times and limit flexibility for ancillary services. To address this issue, we propose a dynamic virtual power plant (DVPP) that coordinates heterogeneous resources across multiple timescales using grid-forming control. We first contrast grid-following and grid-forming converters: grid-following designs rely on a phase-locked loop, which can undermine stability in weak grids, whereas our DVPP applies virtual synchronous generator control at the aggregate level to provide effective inertia and damping. Then, we introduce a dynamic participation factor framework that measures each device's contribution through the frequency-active power and voltage-reactive power loops. Exploiting device heterogeneity, we adopt a banded allocation strategy: slow resources manage steady-state and low-frequency regulation; intermediate resources smooth transitions; and fast resources deliver rapid response and high-frequency damping. Comparative simulations demonstrate that this coordinated, timescale-aware approach enhances stability and ancillary service performance compared to conventional VPPs.


[15] 2603.04988

A Unified Hybrid Control Architecture for Multi-DOF Robotic Manipulators

Multi-degree-of-freedom (DOF) robotic manipulators exhibit strongly nonlinear, high-dimensional, and coupled dynamics, posing significant challenges for controller design. To address these issues, this work proposes a unified hybrid control architecture that integrates model predictive control (MPC) with feedback regulation, together with a stability analysis of the proposed scheme. The proposed approach mitigates the optimization difficulty associated with high-dimensional nonlinear systems and enhances overall control performance. Furthermore, a hardware implementation scheme based on machine learning (ML) is proposed to achieve high computational efficiency while maintaining control accuracy. Finally, simulation and hardware experiments under external disturbances validate the proposed architecture, demonstrating its superior performance, hardware feasibility, and generalization capability for multi-DOF manipulation tasks.


[16] 2603.05011

Receding-Horizon Maximum-Likelihood Estimation of Neural-ODE Dynamics and Thresholds from Event Cameras

Event cameras emit asynchronous brightness-change events, where each pixel triggers an event when its accumulated log-brightness change since the last event exceeds a contrast threshold, yielding a history-dependent measurement model. We address online maximum-likelihood identification of continuous-time dynamics from such streams. The latent state follows a Neural ODE and is mapped to predicted log-intensity through a differentiable state-to-image model. We model events with a history-dependent marked point process whose conditional intensity is a smooth surrogate of contrast-threshold triggering, treating the contrast threshold as an unknown parameter. The resulting log-likelihood consists of an event term and a compensator integral. We propose a receding-horizon estimator that performs a few gradient steps per update over the current window. For streaming evaluation, we store two scalars per pixel (last-event time and estimated log-intensity at that time) and approximate the compensator via Monte Carlo pixel subsampling. Synthetic experiments demonstrate joint recovery of dynamics parameters and the contrast threshold, and characterize accuracy--latency trade-offs with respect to the window length.
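The event-term-minus-compensator log-likelihood with a Monte Carlo compensator can be sketched generically (a one-dimensional toy in time, ignoring the per-pixel marked structure and the threshold surrogate):

```python
import numpy as np

def point_process_loglik(event_times, intensity_fn, t0, t1,
                         n_mc=1000, rng=None):
    # Log-likelihood of an inhomogeneous point process on [t0, t1]:
    #   sum_i log lambda(t_i)  -  integral of lambda over [t0, t1],
    # with the compensator integral approximated by Monte Carlo sampling
    # (analogous to the paper's Monte Carlo pixel subsampling).
    rng = np.random.default_rng(rng)
    times = np.asarray(event_times, dtype=float)
    event_term = np.sum(np.log(intensity_fn(times)))
    t_mc = rng.uniform(t0, t1, size=n_mc)
    compensator = (t1 - t0) * np.mean(intensity_fn(t_mc))
    return event_term - compensator

# Constant intensity lambda = 2 on [0, 1]: the exact value is 2 ln 2 - 2.
ll = point_process_loglik([0.3, 0.7], lambda t: 2.0 + 0.0 * np.asarray(t),
                          0.0, 1.0)
```

Gradient steps of the receding-horizon estimator would ascend this objective with respect to the parameters of `intensity_fn`.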


[17] 2603.05021

Formal Entropy-Regularized Control of Stochastic Systems

Analyzing and controlling system entropy is a powerful tool for regulating the predictability of control systems. Applications benefiting from such approaches range from reinforcement learning and data security to human-robot collaboration. In continuous-state stochastic systems, accurate entropy analysis and control remain a challenge. In recent years, finite-state abstractions of continuous systems have enabled control synthesis with formal performance guarantees on objectives such as stage costs. However, these results do not extend to entropy-based performance measures. We solve this problem by first obtaining bounds on the entropy of system discretizations using traditional formal-abstraction results, and then obtaining an additional bound on the difference between the entropy of a continuous distribution and that of its discretization. The resulting theory enables formal entropy-aware controller synthesis that trades predictability against control performance while preserving formal guarantees for the original continuous system. More specifically, we focus on minimizing a linear combination of the KL divergence of the system trajectory distribution to uniform -- our system entropy metric -- and a generic cumulative cost. We note that the bound we derive on the difference between the KL divergence to uniform of a given continuous distribution and that of its discretization can also be relevant in more general information-theoretic contexts. A set of case studies illustrates the effectiveness of the method.
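The KL-divergence-to-uniform entropy metric is straightforward to compute for a discretized distribution (a minimal sketch; the paper's contribution is bounding the gap to the continuous distribution, which is not shown here):

```python
import numpy as np

def kl_to_uniform(p):
    # KL divergence of a discrete distribution p (e.g. from discretizing
    # the continuous state distribution) to the uniform distribution on
    # the same n cells: D(p || u) = log n - H(p).
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    n = len(p)
    nz = p > 0
    entropy = -np.sum(p[nz] * np.log(p[nz]))
    return np.log(n) - entropy

# A uniform distribution has zero divergence (maximally unpredictable);
# a point mass attains the maximum, log n (fully predictable).
d_uniform = kl_to_uniform([0.25, 0.25, 0.25, 0.25])
d_pointmass = kl_to_uniform([1.0, 0.0, 0.0, 0.0])
```

Minimizing a weighted sum of this quantity and a cumulative cost trades predictability against control performance, as described above.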


[18] 2603.05023

Label Hijacking in Track Consensus-Based Distributed Multi-Target Tracking

Distributed multi-target tracking (DMTT) in limited field-of-view (FoV) sensor networks commonly suffers from label inconsistency, whereby different nodes disagree on the identity of the same target. Recent track-consensus DMTT (TC-DMTT) strategies mitigate this issue by enforcing kinematic and label agreement through metric-based track matching. Nevertheless, their behavior under adversarial conditions remains largely unexplored. In this paper, we reveal identity-level vulnerabilities in TC-DMTT and introduce the concept of label hijacking: an attack in which an adversary injects spoofed tracks to corrupt target identities across the network. Drawing on an analogy to classical pull-off deception in radar, we formalize a notion of attack stealthiness and derive an optimization-based strategy for crafting such attacks. A three-sensor network case study demonstrates the impact of the proposed attack on label consistency and tracking accuracy, showing successful target impersonation. Overall, this work highlights the need to rethink robustness at the consensus layer in DMTT frameworks.


[19] 2603.05091

Voice Timbre Attribute Detection with Compact and Interpretable Training-Free Acoustic Parameters

Voice timbre attribute detection (vTAD) is the task of determining the relative intensity of timbre attributes between speech utterances. Voice timbre is a crucial yet inherently complex component of speech perception. While deep neural network (DNN) embeddings perform well in speaker modelling, they often act as black-box representations with limited physical interpretability and high computational cost. In this work, a compact acoustic parameter set is investigated for vTAD. The set captures important acoustic measures and their temporal dynamics, which are found to be crucial for the task. Despite its simplicity, the acoustic parameter set is competitive, outperforming conventional cepstral features and supervised DNN embeddings, and approaching state-of-the-art self-supervised models. Importantly, the studied set requires no trainable parameters, incurs negligible computation, and offers explicit interpretability for analysing the physical traits behind human timbre perception.


[20] 2603.05115

Trajectory Tracking for Uncrewed Surface Vessels with Input Saturation and Dynamic Motion Constraints

This work addresses the problem of constrained motion control of uncrewed surface vessels. The constraints are imposed on the states/inputs of the vehicles due to physical limitations, mission requirements, and safety considerations. We develop a nonlinear feedback controller utilizing log-type Barrier Lyapunov Functions to enforce static and dynamic motion constraints. The proposed scheme uniquely addresses asymmetric constraints on position and heading alongside symmetric constraints on surge, sway, and yaw rates. Additionally, a smooth input saturation model is incorporated in the design to guarantee stability even under actuator bounds, which, if unaccounted for, can lead to severe performance degradation and poor tracking. Rigorous Lyapunov stability analysis shows that the closed-loop system remains stable and that all state variables remain within their prescribed bounds at all times, provided the initial conditions also lie within those bounds. Numerical simulations demonstrate the effectiveness of the proposed strategies for surface vessels without violating the motion and actuator constraints.
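Log-type Barrier Lyapunov Functions of the kind used here have a standard symmetric form (a textbook sketch; the paper's design also covers asymmetric bounds):

```latex
V(e) = \frac{1}{2}\ln\!\frac{k_b^{2}}{k_b^{2}-e^{2}}, \qquad |e| < k_b
```

Since $V(e) \ge 0$ and $V(e) \to \infty$ as $|e| \to k_b$, keeping $V$ bounded along closed-loop trajectories keeps the tracking error $e$ strictly inside the bound $k_b$; asymmetric constraints replace $k_b$ with different bounds for $e > 0$ and $e < 0$.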


[21] 2603.05127

A Fully Open-source Implementation of an Analog 8-PAM Demapper for High-speed Communications

Spectrally-efficient communication systems rely on the use of multi-level modulation formats. At the receiver side, a demodulator is often used to extract soft information about the transmitted bits. Such a demodulator is typically implemented in the digital domain. However, analog implementations of such demodulators are also possible. In this paper, we design and simulate an analog 8-ary pulse-amplitude modulation (8-PAM) demapper in IHP SG13G2 SiGe BiCMOS technology. We generalize and improve a design available in the literature for 4-PAM. A fully MOSFET-based 8-PAM design is proposed. Our simulations and design are completely based on open-source IC design tools. Our results show an energy efficiency of 0.33 pJ/bit for a data rate of 1 Gbit/s.
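A digital reference for the soft information such a demapper produces is the max-log LLR of each bit (the Gray labeling below is a common choice and an assumption, not taken from the paper):

```python
import numpy as np

# 8-PAM constellation: levels -7..7 with an assumed Gray bit labeling.
LEVELS = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)
GRAY = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]

def maxlog_llrs(y, noise_var=1.0):
    # Max-log LLRs for the three bits of one received 8-PAM sample:
    # LLR_b = min_{x: bit b = 1} d(y, x) - min_{x: bit b = 0} d(y, x),
    # so a positive LLR favours bit value 0 under this sign convention.
    d2 = (y - LEVELS) ** 2 / noise_var
    llrs = []
    for b in range(3):                       # MSB first
        bit = np.array([(g >> (2 - b)) & 1 for g in GRAY])
        llrs.append(d2[bit == 1].min() - d2[bit == 0].min())
    return np.array(llrs)

llrs = maxlog_llrs(6.8)   # a sample near level +7 (Gray label 100)
```

An analog demapper approximates these piecewise-quadratic bit metrics with continuous circuit transfer characteristics.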


[22] 2603.05128

PolyBench: A Benchmark for Compositional Reasoning in Polyphonic Audio

Large Audio Language Models (LALMs) are increasingly capable of reasoning over audio. However, existing benchmarks provide limited coverage of reasoning in polyphonic audio, where multiple sound events co-occur and induce compositional structure. In this work, we introduce PolyBench, a benchmark designed to evaluate compositional reasoning in polyphonic audio. PolyBench comprises five evaluation subsets covering counting, classification, detection, concurrency, and duration estimation, requiring reasoning over multiple concurrent events and their relations. Evaluation of state-of-the-art LALMs reveals consistent performance degradation in polyphonic audio, indicating a fundamental bottleneck in current LALMs.


[23] 2603.05133

Anti-Aliasing Snapshot HDR Imaging Using Non-Regular Sensing

Snapshot HDR imaging captures the full dynamic range of a scene in a single exposure, making it essential for video and dynamic environments where motion prevents the use of multi-exposure techniques or complex hardware set-ups. This work presents a snapshot HDR imaging sensor based on spatially varying apertures, implemented by combining two differently sized prototype pixels. The different light integration areas physically extend the dynamic range towards the lower end, compared to a standard high-resolution sensor. A non-regular pixel arrangement is suggested to mitigate aliasing and to overcome the loss in spatial resolution associated with the increased light integration area of the larger prototype pixel. Subsequent reconstruction in the Fourier domain, where natural images can be sparsely represented, allows the image to be recovered with high detail. The image acquisition approach with the proposed non-regular HDR sensor is simulated and analysed with special emphasis on spatial resolution. The results suggest the snapshot HDR sensor layout to be an effective way to acquire images with high dynamic range, free from aliasing artefacts.


[24] 2603.05154

Revitalizing AR Process Simulation of Non-Gaussian Radar Clutter via Series-Based Analytic Continuation

Due to its conceptual simplicity, the linear filtering framework, notably the autoregressive (AR) process, has a long history in simulating clutter sequences with specified probability density functions (PDFs) and autocorrelation functions (ACFs). However, linear filtering inevitably distorts the input distribution, which may lead to inaccurate PDF reproduction or restrict applicability to very simple ACFs. To address these challenges, this study proposes a series-based analytic continuation strategy that revitalizes AR process clutter simulation by accurately precomputing the input pre-distortion required to compensate for AR filtering. First, the moments and cumulants of the AR input are derived based on the input-output relationship of the AR process, facilitating the moment and cumulant expansions of the Laplace transform (LT) and the logarithmic LT around zero, respectively. Second, both series expansions are analytically continued via the Padé approximation (PA) to recover the LT over the full complex plane. Notably, the PA-based continuation of the moment expansion, a conventional choice, can be highly inaccurate when the LT exhibits strong oscillations. By contrast, given that the logarithmic LT generally has a simpler structure, the continuation of the cumulant expansion provides a more stable and accurate alternative. Third, the LT recovered from the cumulant expansion facilitates fast simulation of the AR input non-Gaussian white sequence via a random variable transformation method, thereby enabling an efficient AR process. Finally, simulations demonstrate that the proposed strategy enables accurate and fast simulation of non-Gaussian correlated clutter sequences.
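The generic series-to-rational analytic-continuation step can be sketched as an [L/M] Padé approximant built from series coefficients (illustrated on exp(x); the paper applies PA to the moment and cumulant expansions of the LT):

```python
import numpy as np

def pade(coeffs, m):
    # [L/M] Pade approximant from Taylor coefficients c_0..c_{L+M}:
    # converts a truncated power series into a rational function that
    # remains accurate beyond the series' radius of convergence.
    # Returns numerator and denominator coefficients, ascending order.
    c = np.asarray(coeffs, dtype=float)
    L = len(c) - 1 - m
    # Denominator b_1..b_m from: sum_j b_j c_{L+k-j} = -c_{L+k}, k=1..m.
    A = np.array([[c[L + k - j] if L + k - j >= 0 else 0.0
                   for j in range(1, m + 1)] for k in range(1, m + 1)])
    b = np.concatenate(([1.0], np.linalg.solve(A, -c[L + 1:])))
    # Numerator a_k = sum_j b_j c_{k-j}, k = 0..L.
    a = np.array([sum(b[j] * c[k - j] for j in range(min(k, m) + 1))
                  for k in range(L + 1)])
    return a, b

# [2/2] Pade approximant of exp(x) from its first five Taylor coefficients.
num, den = pade([1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24], m=2)
```

The known closed form for this case is $(1 + x/2 + x^2/12)/(1 - x/2 + x^2/12)$; evaluating it at $x = 1$ already approximates $e$ to within about $4\times10^{-3}$, far better than the truncated series alone extrapolates.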


[25] 2603.05169

Uncertainty and Autarky: Cooperative Game Theory for Stable Local Energy Market Partitioning

Local energy markets empower prosumers to form coalitions for energy trading. However, the optimal partitioning of the distribution grid into such coalitions remains unclear, especially in constrained grids with stochastic production and consumption. This analysis must take into account the interests of both the grid operator and the constituent prosumers. In this work, we present a cooperative game theoretic framework to study distribution grid partitioning into local energy market coalitions under uncertain prosumption and grid constraints. We formulate the optimal stable partitioning problem to balance the interests of the grid operator with those of the prosumers. Under deterministic load and generation, we show that the largest market coalition is the optimal stable partition. For the case of stochastic loads and generation, we provide an algorithm to evaluate the optimal stable partition. Numerical experiments are performed on benchmark and real-world distribution grids. Our results help in understanding how uncertainty affects local energy market partitioning decisions in constrained distribution grids.


[26] 2603.05183

Limited-Angle CT Reconstruction Using Multi-Volume Latent Consistency Model

Limited-angle computed tomography (LACT) reconstruction is an inverse problem with severe ill-posedness arising from missing projection angles, and it is difficult to restore high-precision images without sufficient prior knowledge. In recent years, machine learning methods represented by diffusion models have demonstrated strong image generation capabilities. However, accurate restoration of three-dimensional structures of organs and vessels and preservation of contrast remain challenges, and the impact of differences in diverse clinical imaging conditions such as field of view (FOV) and projection angle range on reconstruction accuracy has not been sufficiently investigated. In this study, we propose a multi-volume latent diffusion model that uses three-dimensional latent representations obtained from multiple effective fields of view as guidance for LACT reconstruction in practical clinical settings. The proposed method achieves fast and stable inference by introducing consistency models into the latent space, and enables high-precision preservation of organ boundary information and internal structures under different FOV conditions through a multi-volume encoder that acquires latent variables from different scales of the global region and central region. The evaluation experiments demonstrated that the proposed method achieved high-precision synthetic CT image generation compared to existing methods. Under the limited-angle condition of 60 degrees, MAE of 10.12 HU and SSIM of 0.9677 were achieved, and under the extreme limited-angle condition of 30 degrees, MAE of 16.69 HU and SSIM of 0.9393 were achieved. Furthermore, stable reconstruction performance was demonstrated even for unknown projection angle conditions not included during training, confirming the applicability to diverse imaging conditions in clinical practice.


[27] 2603.05213

BabAR: from phoneme recognition to developmental measures of young children's speech production

Studying early speech development at scale requires automatic tools, yet automatic phoneme recognition, especially for young children, remains largely unsolved. Building on decades of data collection, we curate TinyVox, a corpus of more than half a million phonetically transcribed child vocalizations in English, French, Portuguese, German, and Spanish. We use TinyVox to train BabAR, a cross-linguistic phoneme recognition system for child speech. We find that pretraining the system on multilingual child-centered daylong recordings substantially outperforms alternatives, and that providing 20 seconds of surrounding audio context during fine-tuning further improves performance. Error analyses show that substitutions predominantly fall within the same broad phonetic categories, suggesting suitability for coarse-grained developmental analyses. We validate BabAR by showing that its automatic measures of speech maturity align with developmental estimates from the literature.


[28] 2603.05220

Adaptive Sampling for Storage of Progressive Images on DNA

The short lifespan of traditional data storage media, coupled with an exponential increase in storage demand, has made long-term archival a fundamental problem in the data storage industry and beyond. Consequently, researchers are looking for innovative media solutions that can store data over long time periods at a very low cost. DNA molecules, with their high density, long lifespan, and low energy needs, have emerged as a viable medium for long-term digital data archival. However, current DNA data storage technologies are facing challenges with respect to cost and reliability. Thus, coding rate and error robustness are critical to scale DNA storage and make it technologically and economically viable. Moreover, the DNA molecules that encode different files are often located in the same oligo pool. Without random access solutions at the oligo level, it is impractical to decode a specific file from these mixed pools, as all oligos need to first be sequenced and decoded before a target file can be retrieved, which greatly increases the read cost. This paper introduces a solution to efficiently encode and store images into DNA molecules that aims at reducing the read cost necessary to retrieve a resolution-reduced version of an image. This image storage system is based on the Progressive Decoding Functionality of the JPEG2000 codec but can be adapted to any conventional progressive codec. Each resolution layer is encoded into a set of oligos using the JPEG DNA VM codec, a DNA-based coder that aims at retrieving a file with high reliability. Depending on the desired resolution to be read, the set of oligos as well as the portion of the oligos to be sequenced and decoded are adjusted accordingly. These oligos are selected at sequencing time, with the help of the adaptive sampling method provided by Nanopore sequencers, making it a PCR-free random access solution.


[29] 2603.05239

Computing Scaled Relative Graphs of Discrete-time LTI Systems from Data

Graphical methods for system analysis have played a central role in control theory. A recently emerging tool in this field is the Scaled Relative Graph (SRG). In this paper, we further extend its applicability by showing how the SRG of discrete-time linear time-invariant (LTI) systems can be computed exactly from their state-space representations using linear matrix inequalities. We additionally propose a fully data-driven approach where we demonstrate how to compute the SRG exclusively from input-output data. Furthermore, we introduce a robust version of the SRG, which can be computed from noisy data trajectories and contains the SRG of the actual system.


[30] 2603.05247

ICHOR: A Robust Representation Learning Approach for ASL CBF Maps with Self-Supervised Masked Autoencoders

Arterial spin labeling (ASL) perfusion MRI allows direct quantification of regional cerebral blood flow (CBF) without exogenous contrast, enabling noninvasive measurements that can be repeated without constraints imposed by contrast injection. ASL is increasingly acquired in research studies and clinical MRI protocols. Building on successes in structural imaging, recent efforts have implemented deep learning-based methods to improve image quality, enable automated quality control, and derive robust quantitative and predictive biomarkers with ASL-derived CBF. However, progress has been limited by variable image quality, substantial inter-site, inter-vendor, and inter-protocol differences, and limited availability of labeled datasets needed to train models that generalize across cohorts. To address these challenges, we introduce ICHOR, a self-supervised pre-training approach for ASL CBF maps that learns transferable representations using 3D masked autoencoders. ICHOR is pretrained via masked image modeling using a Vision Transformer backbone and can be used as a general-purpose encoder for downstream ASL tasks. For pre-training, we curated one of the largest ASL datasets to date, comprising 11,405 ASL CBF scans from 14 studies spanning multiple sites and acquisition protocols. We evaluated the pre-trained ICHOR encoder on three downstream diagnostic classification tasks and one ASL CBF map quality prediction regression task. Across all evaluations, ICHOR outperformed existing neuroimaging self-supervised pre-training methods adapted to ASL. Pre-trained weights and code will be made publicly available.


[31] 2603.05251

On Dual-Fed Pinching Antenna Systems with In-Waveguide Attenuation

Pinching antenna systems (PAS) have recently emerged as a promising architecture for flexible and reconfigurable wireless communications. However, their performance is fundamentally constrained by in-waveguide attenuation, which is non-negligible in practical dielectric waveguides and can severely degrade the achievable data rate, particularly for long waveguides. To overcome this limitation, we propose a dual-fed PAS (DF-PAS), in which each waveguide is equipped with two feed points located at the two ends, enabling dynamic feed-point selection based on user locations. This design effectively shortens the in-waveguide propagation distance and mitigates attenuation-induced power loss without modifying the waveguide structure or the PA actuation mechanism. We investigate the DF-PAS in both single- and multi-waveguide scenarios. For the single-waveguide case, we derive closed-form high-SNR approximations of the ergodic rate and obtain closed-form solutions for the optimal PA position and feed-point selection under time-division multiple access (TDMA). We then extend DF-PAS to a multi-waveguide scenario, where we first derive closed-form high-SNR approximations of the ergodic rate and then formulate a joint optimization problem over feed-point selection, PA placement, and beamforming under general orthogonal multiple access (OMA). To solve this problem efficiently, we develop a two-phase optimization framework that integrates greedy feed-point switching, gradient-based PA placement, and WMMSE-based beamforming. Simulation results demonstrate that the proposed DF-PAS consistently outperforms conventional single-fed PAS (SF-PAS) across various network configurations, validating its effectiveness as a practical and scalable solution for mitigating in-waveguide attenuation in PAS-enabled wireless networks.


[32] 2603.05270

Visual-Informed Speech Enhancement Using Attention-Based Beamforming

Recent studies have demonstrated that incorporating auxiliary information, such as speaker voiceprint or visual cues, can substantially improve Speech Enhancement (SE) performance. However, single-channel methods often yield suboptimal results in low signal-to-noise ratio (SNR) conditions, when there is high reverberation, or in complex scenarios involving dynamic speakers, overlapping speech, or non-stationary noise. To address these issues, we propose a novel Visual-Informed Neural Beamforming Network (VI-NBFNet), which integrates microphone array signal processing and deep neural networks (DNNs) using multimodal input features. The proposed network leverages a pretrained visual speech recognition model to extract lip movements as input features, which serve for voice activity detection (VAD) and target speaker identification. The system is designed to handle both static and moving speakers by introducing a supervised end-to-end beamforming framework equipped with an attention mechanism. Experimental results demonstrate that the proposed audiovisual system achieves better SE performance and robustness in both stationary and dynamic speaker scenarios, compared to several baseline methods.


[33] 2603.05363

A Comprehensive Approach to Directly Addressing Estimation Delays in Stochastic Guidance

In realistic pursuit-evasion scenarios, abrupt target maneuvers generate unavoidable periods of elevated uncertainty that result in estimation delays. Such delays can degrade interception performance to the point of causing a miss. Existing delayed-information guidance laws fail to provide a complete remedy, as they typically assume constant and known delays. Moreover, in practice they are fed by filtered estimates, contrary to these laws' foundational assumptions. We present an overarching strategy for tracking and interception that explicitly accounts for time-varying estimation delays. We first devise a guidance law that incorporates two time-varying delays, thereby generalizing prior deterministic formulations. This law is driven by a particle-based fixed-lag smoother that provides it with appropriately delayed state estimates. Furthermore, using semi-Markov modeling of the target's maneuvers, the delays are estimated in real time, enabling adaptive adjustment of the guidance inputs during engagement. The resulting framework coherently integrates estimation, delay modeling, and guidance. Its effectiveness and superior robustness over existing delayed-information guidance laws are demonstrated via an extensive Monte Carlo study.


[34] 2603.05441

Near-Optimal Low-Complexity MIMO Detection via Structured Reduced-Search Enumeration

Maximum-likelihood (ML) detection in high-order MIMO systems is computationally prohibitive due to exponential complexity in the number of transmit layers and the constellation size. In this white paper, we demonstrate that for practical MIMO dimensions (up to 8x8) and modulation orders, near-ML hard-decision performance can be achieved using a structured reduced-search strategy with complexity linear in the constellation size. Extensive simulations over i.i.d. Rayleigh fading channels show that list sizes of 3|X| for 3x3, 4|X| for 4x4, and 8|X| for 8x8 systems, where |X| denotes the constellation size, closely match full ML performance, even under high channel condition numbers. In addition, we provide a trellis-based interpretation of the method. We further discuss implications for soft LLR generation and FEC interaction.
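To illustrate how a reduced-search list detector differs from exhaustive ML, the following sketch implements one plausible instantiation for a 2x2 QPSK system: enumerate the |X| hypotheses for the second layer, cancel each from the received vector, and slice the remaining layer by least squares. This generic per-candidate cancellation scheme is an assumption for illustration; it does not reproduce the paper's structured enumeration.

```python
import itertools

QPSK = [complex(a, b) for a in (1, -1) for b in (1, -1)]

def metric(H, y, s):
    """Euclidean ML metric ||y - H s||^2."""
    return sum(abs(y[i] - sum(H[i][j] * s[j] for j in range(len(s)))) ** 2
               for i in range(len(y)))

def ml_detect(H, y, X):
    """Exhaustive ML search: |X|^2 candidates for a 2x2 system."""
    return min(itertools.product(X, repeat=2), key=lambda s: metric(H, y, s))

def slice_to(X, z):
    """Nearest constellation point."""
    return min(X, key=lambda x: abs(x - z))

def reduced_search(H, y, X):
    """List detector with only |X| candidates: enumerate layer 2,
    cancel its contribution, slice layer 1 by least squares."""
    h1 = [H[0][0], H[1][0]]
    e1 = abs(h1[0]) ** 2 + abs(h1[1]) ** 2
    cands = []
    for s2 in X:
        r = [y[i] - H[i][1] * s2 for i in range(2)]
        z1 = (h1[0].conjugate() * r[0] + h1[1].conjugate() * r[1]) / e1
        cands.append((slice_to(X, z1), s2))
    return min(cands, key=lambda s: metric(H, y, s))

# Noise-free demo on a fixed, well-conditioned 2x2 channel: both
# detectors recover the transmitted pair, but reduced_search evaluates
# |X| candidates instead of |X|^2.
H = [[1.0 + 0.0j, 0.3 - 0.1j], [0.2 + 0.1j, 1.0 + 0.0j]]
tx = (QPSK[0], QPSK[3])
y = [sum(H[i][j] * tx[j] for j in range(2)) for i in range(2)]
```

The larger list sizes reported in the abstract (e.g. 8|X| for 8x8) would correspond to keeping several surviving candidates per enumerated layer rather than a single sliced decision.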


[35] 2603.04443

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Long-running LLM agents require persistent memory to preserve state across interactions, yet most deployed systems manage memory with age-based retention (e.g., TTL). While TTL bounds item lifetime, it does not bound the computational footprint of memory on the request path: as retained items accumulate, retrieval candidate sets and vector similarity scans can grow unpredictably, yielding heavy-tailed latency and unstable throughput. We present AMV-L (Adaptive Memory Value Lifecycle), a memory-management framework that treats agent memory as a managed systems resource. AMV-L assigns each memory item a continuously updated utility score and uses value-driven promotion, demotion, and eviction to maintain lifecycle tiers; retrieval is restricted to a bounded, tier-aware candidate set that decouples the request-path working set from total retained memory. We implement AMV-L in a full-stack LLM serving system and evaluate it under identical long-running workloads against two baselines: TTL and an LRU working-set policy, with fixed prompt-injection caps. Relative to TTL, AMV-L improves throughput by 3.1x and reduces latency by 4.2x (median), 4.7x (p95), and 4.4x (p99), while reducing the fraction of requests exceeding 2s from 13.8% to 0.007%. Compared to LRU, AMV-L trades a small regression in median/p95 latency (+26% / +3%) for improved extreme-tail behavior (-15% p99; -98% >2s) and lower token overhead (approximately 6% fewer tokens/request), while matching retrieval quality (value means within approximately 0-2%). The gains arise primarily from bounding retrieval-set size and vector-search work, not from shortening prompts. Our results show that predictable performance for long-running LLM agents requires explicit control of memory working-set size and value-driven lifecycle management, rather than retention time alone.
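The core systems idea, value-driven lifecycle tiers with a request-path candidate set whose size is bounded independently of total retained memory, can be sketched in a few lines. Everything below (class name, tier capacities, scoring) is a hypothetical simplification, not the AMV-L implementation.

```python
class LifecycleMemory:
    """Toy value-driven lifecycle tiers: hot/warm items form the bounded
    retrieval candidate set; cold items are retained but off the request
    path, so it can grow without affecting request-path work."""

    def __init__(self, hot_cap=4, warm_cap=8):
        self.hot, self.warm, self.cold = {}, {}, {}
        self.hot_cap, self.warm_cap = hot_cap, warm_cap

    def write(self, key, value, utility):
        self.hot[key] = (utility, value)
        self._rebalance()

    def touch(self, key, boost=1.0):
        """Accessing an item raises its utility and promotes it to hot."""
        for tier in (self.hot, self.warm, self.cold):
            if key in tier:
                u, v = tier.pop(key)
                self.hot[key] = (u + boost, v)
                break
        self._rebalance()

    def _rebalance(self):
        # Demote the lowest-utility items until tier capacities hold.
        while len(self.hot) > self.hot_cap:
            k = min(self.hot, key=lambda k: self.hot[k][0])
            self.warm[k] = self.hot.pop(k)
        while len(self.warm) > self.warm_cap:
            k = min(self.warm, key=lambda k: self.warm[k][0])
            self.cold[k] = self.warm.pop(k)

    def candidates(self):
        """Request-path working set, bounded by hot_cap + warm_cap."""
        return {**self.warm, **self.hot}

# 20 writes with increasing utility: cold grows, but the candidate set
# stays capped at 12 items.
mem = LifecycleMemory()
for i in range(20):
    mem.write(f"item{i}", f"text {i}", utility=float(i))
working_set = mem.candidates()
```

A TTL policy would instead let all 20 items sit on the request path until they age out, which is exactly the unbounded candidate-set growth the abstract identifies as the source of tail latency.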


[36] 2603.04696

When Denoising Becomes Unsigning: Theoretical and Empirical Analysis of Watermark Fragility Under Diffusion-Based Image Editing

Robust invisible watermarking systems aim to embed imperceptible payloads that remain decodable after common post-processing such as JPEG compression, cropping, and additive noise. In parallel, diffusion-based image editing has rapidly matured into a default transformation layer for modern content pipelines, enabling instruction-based editing, object insertion and composition, and interactive geometric manipulation. This paper studies a subtle but increasingly consequential interaction between these trends: diffusion-based editing procedures may unintentionally compromise, and in extreme cases practically bypass, robust watermarking mechanisms that were explicitly engineered to survive conventional distortions. We develop a unified view of diffusion editors that (i) inject substantial Gaussian noise in a latent space and (ii) project back to the natural image manifold via learned denoising dynamics. Under this view, watermark payloads behave as low-energy, high-frequency signals that are systematically attenuated by the forward diffusion step and then treated as nuisance variation by the reverse generative process. We formalize this degradation using information-theoretic tools, proving that for broad classes of pixel-level watermark encoders/decoders the mutual information between the watermark payload and the edited output decays toward zero as the editing strength increases, yielding decoding error close to random guessing. We complement the theory with a realistic hypothetical experimental protocol and tables spanning representative watermarking methods and representative diffusion editors. Finally, we discuss ethical implications, responsible disclosure norms, and concrete design guidelines for watermarking schemes that remain meaningful in the era of generative transformations.
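The information-theoretic argument can be made concrete with a toy experiment that models only the forward (noise-injection) half of a diffusion edit: a payload bit embedded as a low-energy sign pattern is decoded ever closer to chance as the injected Gaussian noise grows. All parameters here are illustrative assumptions; the reverse denoising projection, which the paper argues suppresses the payload further, is not modeled.

```python
import random

random.seed(2)

def embed(bit, n, eps=0.05):
    """Embed one payload bit as a low-energy sign pattern across n
    pixels (a toy pixel-level watermark)."""
    s = 1.0 if bit else -1.0
    return [s * eps for _ in range(n)]

def decode(x):
    """Correlation decoder: sign of the summed pattern."""
    return sum(x) > 0

def bit_accuracy(sigma, trials=2000, n=64, eps=0.05):
    """Decoding accuracy after adding Gaussian noise of scale sigma
    (standing in for the forward diffusion step at a given strength)."""
    ok = 0
    for _ in range(trials):
        bit = random.random() < 0.5
        noisy = [v + random.gauss(0.0, sigma) for v in embed(bit, n, eps)]
        ok += decode(noisy) == bit
    return ok / trials

acc_low = bit_accuracy(0.05)   # mild editing strength
acc_high = bit_accuracy(2.0)   # strong editing strength
```

As the theory predicts, the payload-output mutual information (and with it decoding accuracy) collapses toward random guessing as the editing strength increases, even though the decoder is matched to the embedding.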


[37] 2603.04734

Multistage Stochastic Programming for Rare Event Risk Mitigation in Power Systems Management

High penetration of intermittent renewables in the energy mix poses robustness challenges for the management of power system operation. If a tail realization of the weather distribution yields a prolonged period during which solar irradiation and wind speed are insufficient to satisfy energy demand, then it becomes critical to ramp up the generation of conventional power plants with adequate foresight. Triggering such a ramp-up is costly, and inaccurate forecasting can either be wasteful or lead to catastrophic undersupply. This calls for particular attention to accurate modeling of the noise and the resulting dynamics within the aforementioned scenario. In this work we present a method for rare-event-aware control of power systems using multi-stage scenario-based optimization. A Fleming-Viot particle approach is used to bias the scenario generation towards rare realizations of very low wind power, in order to obtain a cost-effective control of conventional power plants that is robust under prolonged renewable energy shortfalls.


[38] 2603.04761

Adaptive Policy Switching of Two-Wheeled Differential Robots for Traversing over Diverse Terrains

Exploring lunar lava tubes requires robots to traverse without human intervention. Because pre-trained policies cannot fully cover all possible terrain conditions, our goal is to enable adaptive policy switching, where the robot selects an appropriate terrain-specialized model based on its current terrain features. This study investigates whether terrain types can be estimated effectively using posture-related observations collected during navigation. We fine-tuned a pre-trained policy using Proximal Policy Optimization (PPO), and then collected the robot's 3D orientation data as it moved across flat and rough terrain in a simulated lava-tube environment. Our analysis revealed that the standard deviation of the robot's pitch data shows a clear difference between these two terrain types. Using Gaussian mixture models (GMM), we evaluated terrain classification across various window sizes. An accuracy of more than 98% was achieved when using a 70-step window. The result suggests that short-term orientation data are sufficient for reliable terrain estimation, providing a foundation for adaptive policy switching.
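The windowed-statistics idea is easy to reproduce on synthetic data: compute the standard deviation of pitch over 70-step windows and separate the two terrain classes on that single feature. The traces below are synthetic stand-ins for simulator logs, and a simple midpoint threshold replaces the paper's GMM; both are assumptions for illustration.

```python
import math
import random

random.seed(0)

def window_std(x, w):
    """Standard deviation of pitch over non-overlapping windows of length w."""
    feats = []
    for i in range(0, len(x) - w + 1, w):
        seg = x[i:i + w]
        m = sum(seg) / w
        feats.append(math.sqrt(sum((v - m) ** 2 for v in seg) / w))
    return feats

# Synthetic pitch traces (radians): flat terrain induces small jitter,
# rough terrain much larger jitter.
flat = [random.gauss(0.0, 0.01) for _ in range(700)]
rough = [random.gauss(0.0, 0.08) for _ in range(700)]

W = 70  # the 70-step window size reported in the abstract
f_flat, f_rough = window_std(flat, W), window_std(rough, W)

# Two-class decision boundary on the 1-D feature: midpoint of the class
# means (a threshold stand-in for a fitted two-component GMM).
thr = (sum(f_flat) / len(f_flat) + sum(f_rough) / len(f_rough)) / 2
pred = [s < thr for s in f_flat] + [s > thr for s in f_rough]
accuracy = sum(pred) / len(pred)
```

With well-separated jitter levels the single pitch-std feature classifies essentially every window correctly, consistent with the abstract's finding that short-term orientation statistics suffice for terrain estimation.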


[39] 2603.04787

Data-Driven Control of a Magnetically Actuated Fish-Like Robot

Magnetically actuated fish-like robots offer promising solutions for underwater exploration due to their miniaturization and agility; however, precise control remains a significant challenge because of nonlinear fluid dynamics, flexible fin hysteresis, and the variable-duration control steps inherent to the actuation mechanism. This paper proposes a comprehensive data-driven control framework to address these complexities without relying on analytical modeling. Our methodology comprises three core components: 1) developing a forward dynamics model (FDM) using a neural network trained on real-world experimental data to capture state transitions under varying time steps; 2) integrating this FDM into a gradient-based model predictive control (G-MPC) architecture to optimize control inputs for path following; and 3) applying imitation learning to approximate the G-MPC policy, thereby reducing the computational cost for real-time implementation. We validate the approach through simulations utilizing the identified dynamics model. The results demonstrate that the G-MPC framework achieves accurate path convergence with minimal root mean square error (RMSE), and the imitation learning controller (ILC) effectively replicates this performance. This study highlights the potential of data-driven control strategies for the precise navigation of miniature, fish-like soft robots.


[40] 2603.04843

Policy Optimization of Mixed H2/H-infinity Control: Benign Nonconvexity and Global Optimality

Mixed H2/H-infinity control balances performance and robustness by minimizing an H2 cost bound subject to an H-infinity constraint. However, classical Riccati/LMI solutions offer limited insight into the nonconvex optimization landscape and do not readily scale to large-scale or data-driven settings. In this paper, we revisit mixed H2/H-infinity control from a modern policy optimization viewpoint, including the general two-channel and single-channel cases. One central result is that both cases enjoy a benign nonconvex structure: every stationary point is globally optimal. We characterize the H-infinity-constrained feasible set, which is open, path-connected, with boundary given exactly by policies saturating the H-infinity constraint. We also show that the mixed objective is real analytic in the interior with explicit gradient formulas. Our key analysis builds on an Extended Convex Lifting (ECL) framework that bridges nonconvex policy optimization and convex reformulations. The ECL constructions rely on non-strict Riccati inequalities that allow us to characterize global optimality. These insights reveal hidden convexity in mixed H2/H-infinity control and facilitate the design of scalable policy iteration methods in large-scale settings.


[41] 2603.04914

U-OBCA: Uncertainty-Aware Optimization-Based Collision Avoidance via Wasserstein Distributionally Robust Chance Constraints

Uncertainties arising from localization errors, trajectory prediction errors of moving obstacles, and environmental disturbances pose significant challenges to a robot's safe navigation. Existing uncertainty-aware planners often approximate polygon-shaped robots and obstacles using simple geometric primitives such as circles or ellipses. Though computationally convenient, these approximations substantially shrink the feasible space, leading to overly conservative trajectories and even planning failure in narrow environments. In addition, many such methods rely on specific assumptions about noise distributions, which may not hold in practice and thus limit their performance guarantees. To address these limitations, we extend the Optimization-Based Collision Avoidance (OBCA) framework to an uncertainty-aware formulation, termed \emph{U-OBCA}. The proposed method explicitly accounts for the collision risk between polygon-shaped robots and obstacles by formulating OBCA-based chance constraints, thereby avoiding geometric simplifications and reducing unnecessary conservatism. These probabilistic constraints are further tightened into deterministic nonlinear constraints under mild distributional assumptions, which can be solved efficiently by standard numerical optimization solvers. The proposed approach is validated through theoretical analysis, numerical simulations, and real-world experiments. The results demonstrate that U-OBCA significantly mitigates the conservatism in trajectory planning and achieves higher navigation efficiency compared to existing baseline methods, particularly in narrow and cluttered environments.


[42] 2603.05058

A 360-degree Multi-camera System for Blue Emergency Light Detection Using Color Attention RT-DETR and the ABLDataset

This study presents an advanced system for detecting blue lights on emergency vehicles, developed using ABLDataset, a curated dataset that includes images of European emergency vehicles under various climatic and geographic conditions. The system employs a configuration of four fisheye cameras, each with a 180-degree horizontal field of view, mounted on the sides of the vehicle. A calibration process enables the azimuthal localization of the detections. Additionally, a comparative analysis of major deep neural network algorithms was conducted, including YOLO (v5, v8, and v10), RetinaNet, Faster R-CNN, and RT-DETR. RT-DETR was selected as the base model and enhanced through the incorporation of a color attention block, achieving an accuracy of 94.7 percent and a recall of 94.1 percent on the test set, with field test detections reaching up to 70 meters. Furthermore, the system estimates the approach angle of the emergency vehicle relative to the center of the car using geometric transformations. Designed for integration into a multimodal system that combines visual and acoustic data, this system has demonstrated high efficiency, offering a promising approach to enhancing Advanced Driver Assistance Systems (ADAS) and road safety.


[43] 2603.05157

The Impact of Preprocessing Methods on Racial Encoding and Model Robustness in CXR Diagnosis

Deep learning models can identify racial identity with high accuracy from chest X-ray (CXR) images. Thus, there is widespread concern about the potential for racial shortcut learning, where a model inadvertently learns to systematically bias its diagnostic predictions as a function of racial identity. Such racial biases threaten healthcare equity and model reliability, as models may systematically misdiagnose certain demographic groups. Since racial shortcuts are diffuse, i.e., non-localized and distributed throughout the whole CXR image, image preprocessing methods may influence racial shortcut learning, yet the potential of such methods for reducing biases remains underexplored. Here, we investigate the effects of image preprocessing methods including lung masking, lung cropping, and Contrast Limited Adaptive Histogram Equalization (CLAHE). These approaches aim to suppress spurious cues encoding racial information while preserving diagnostic accuracy. Our experiments reveal that simple bounding box-based lung cropping can be an effective strategy for reducing racial shortcut learning while maintaining diagnostic model performance, bypassing the frequently postulated fairness-accuracy trade-off.


[44] 2603.05268

Curve-Induced Dynamical Systems on Riemannian Manifolds and Lie Groups

Deploying robots in household environments requires safe, adaptable, and interpretable behaviors that respect the geometric structure of tasks. Such structure is often represented on Lie groups and Riemannian manifolds, including poses on SE(3) and symmetric positive definite matrices encoding stiffness or damping. In this context, dynamical system-based approaches offer a natural framework for generating such behavior, providing stability and convergence while remaining responsive to changes in the environment. We introduce Curve-induced Dynamical systems on Smooth Manifolds (CDSM), a real-time framework for constructing dynamical systems directly on Riemannian manifolds and Lie groups. The proposed approach constructs a nominal curve on the manifold, and generates a dynamical system which combines a tangential component that drives motion along the curve and a normal component that attracts the state toward the curve. We provide a stability analysis of the resulting dynamical system and validate the method quantitatively. On an S2 benchmark, CDSM demonstrates improved trajectory accuracy, reduced path deviation, and faster generation and query times compared to state-of-the-art methods. Finally, we demonstrate the practical applicability of the framework on both a robotic manipulator, where poses on SE(3) and damping matrices on SPD(n) are adapted online, and a mobile manipulator.


[45] 2603.05279

From Code to Road: A Vehicle-in-the-Loop and Digital Twin-Based Framework for Central Car Server Testing in Autonomous Driving

Simulation is one of the most essential parts of the development stage of automotive software. However, purely virtual simulations often struggle to accurately capture all real-world factors due to limitations in modeling. To address this challenge, this work presents a test framework for automotive software on a centralized E/E architecture, in our case a central car server, based on Vehicle-in-the-Loop (ViL) and digital twin technology. The framework couples a physical test vehicle on a dynamometer test bench with its synchronized virtual counterpart in a simulation environment. Our approach provides a safe, reproducible, realistic, and cost-effective platform for validating autonomous driving algorithms with a centralized architecture. This test method eliminates the need to test individual physical ECUs and their communication protocols separately. In contrast to traditional ViL methods, the proposed framework runs the full autonomous driving software directly on the vehicle hardware after the simulation process, eliminating flashing and intermediate layers while enabling seamless virtual-physical integration and accurately reflecting centralized E/E behavior. In addition, incorporating mixed testing in both simulated and physical environments reduces the need for full hardware integration during the early stages of automotive development. Experimental case studies demonstrate the effectiveness of the framework in different test scenarios. These findings highlight the potential to reduce development and integration efforts for testing autonomous driving pipelines in the future.


[46] 2603.05354

Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR

Model merging is a scalable alternative to multi-task training that combines the capabilities of multiple specialised models into a single model. This is particularly attractive for large speech foundation models, which are typically adapted through domain-specific fine-tuning, resulting in multiple customised checkpoints, for which repeating full fine-tuning when new data becomes available is computationally prohibitive. In this work, we study model merging for multi-domain ASR and benchmark 11 merging algorithms for 10 European Portuguese domains, evaluating in-domain accuracy, robustness under distribution shift, as well as English and multilingual performance. We further propose BoostedTSV-M, a new merging algorithm based on TSV-M that mitigates rank collapse via singular-value boosting and improves numerical stability. Overall, our approach outperforms full fine-tuning on European Portuguese while preserving out-of-distribution generalisation in a single model.


[47] 2603.05373

Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection

Neural codec language models enable high-quality discrete speech synthesis, yet their inference remains vulnerable to token-level artifacts and distributional drift that degrade perceptual realism. Rather than relying on preference optimization or retraining, we propose MSpoof-TTS, a training-free inference framework that improves zero-shot synthesis through multi-resolution spoof guidance. We introduce a Multi-Resolution Token-based Spoof Detection framework that evaluates codec sequences at different temporal granularities to detect locally inconsistent or unnatural patterns. We then integrate the spoof detectors into a hierarchical decoding strategy, progressively pruning low-quality candidates and re-ranking hypotheses. This discriminator-guided generation enhances robustness without modifying model parameters. Experiments validate the effectiveness of our framework for robust and high-quality codec-based speech generation.


[48] 2603.05385

Accelerating Sampling-Based Control via Learned Linear Koopman Dynamics

This paper presents an efficient model predictive path integral (MPPI) control framework for systems with complex nonlinear dynamics. To improve the computational efficiency of classic MPPI while preserving control performance, we replace the nonlinear dynamics used for trajectory propagation with a learned linear deep Koopman operator (DKO) model, enabling faster rollout and more efficient trajectory sampling. The DKO dynamics are learned directly from interaction data, eliminating the need for analytical system models. The resulting controller, termed MPPI-DK, is evaluated in simulation on pendulum balancing and surface vehicle navigation tasks, and validated on hardware through reference-tracking experiments on a quadruped robot. Experimental results demonstrate that MPPI-DK achieves control performance close to MPPI with true dynamics while substantially reducing computational cost, enabling efficient real-time control on robotic platforms.
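The computational advantage described above is that rollouts under a learned linear model are pure matrix arithmetic. A minimal sketch of one MPPI update with linear lifted dynamics z_{k+1} = A z_k + B u_k; the lifting function, cost, and all numerical values are illustrative assumptions, not MPPI-DK's actual design.

```python
import numpy as np

def mppi_linear_rollout(A, B, z0, u_nom, sigma, n_samples, lam, stage_cost):
    """One MPPI update where trajectory propagation uses learned linear
    (Koopman-style) dynamics z_{k+1} = A z_k + B u_k, so all sampled
    rollouts advance with a single batched matrix multiply per step."""
    H, m = u_nom.shape
    noise = sigma * np.random.default_rng(1).standard_normal((n_samples, H, m))
    U = u_nom[None] + noise                          # perturbed control seqs
    costs = np.zeros(n_samples)
    Z = np.tile(z0, (n_samples, 1))                  # batch of lifted states
    for k in range(H):
        costs += stage_cost(Z, U[:, k])
        Z = Z @ A.T + U[:, k] @ B.T                  # linear rollout step
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()                                     # MPPI importance weights
    return u_nom + np.einsum('i,ihm->hm', w, noise)  # weighted control update

A = np.array([[0.9, 0.1], [0.0, 0.95]]); B = np.array([[0.0], [0.1]])
u = mppi_linear_rollout(A, B, np.array([1.0, 0.0]), np.zeros((5, 1)),
                        0.5, 64, 1.0, lambda Z, U: (Z**2).sum(1) + (U**2).sum(1))
print(u.shape)  # (5, 1): updated nominal control sequence
```

In the full DKO setting, the state would first be lifted by a learned encoder before applying A and B; here the lifted state is used directly for brevity.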


[49] 2603.05489

NL2GDS: LLM-aided interface for Open Source Chip Design

The growing complexity of hardware design and the widening gap between high-level specifications and register-transfer level (RTL) implementation hinder rapid prototyping and system design. We introduce NL2GDS (Natural Language to Layout), a novel framework that leverages large language models (LLMs) to translate natural language hardware descriptions into synthesizable RTL and complete GDSII layouts via the open-source OpenLane ASIC flow. NL2GDS employs a modular pipeline that captures informal design intent, generates and verifies HDL using multiple LLM engines, and orchestrates automated synthesis and layout. Evaluations on ISCAS'85 and ISCAS'89 benchmark designs demonstrate up to 36% area reduction, 35% delay reduction, and 70% power savings compared to baseline designs, highlighting its potential to democratize ASIC design and accelerate hardware innovation.


[50] 2405.18995

Best Ergodic Averages via Optimal Graph Filters in Reversible Markov Chains

In this paper, we address the problem of finding the best ergodic or Birkhoff averages in the mean ergodic theorem to ensure rapid convergence to a desired value, using graph filters. Our approach begins by representing a function on the state space as a graph signal, where the (directed) graph is formed by the transition probabilities of a reversible Markov chain. We introduce a concept of graph variation, enabling the definition of the graph Fourier transform for graph signals on this directed graph. Viewing the iteration in the mean ergodic theorem as a graph filter, we recognize its non-optimality and propose three optimization problems aimed at determining optimal graph filters. These optimization problems yield the Bernstein, Chebyshev, and Legendre filters. Numerical testing reveals that while the Bernstein filter performs slightly better than the traditional ergodic average, the Chebyshev and Legendre filters significantly outperform the ergodic average, demonstrating rapid convergence to the desired value.
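The baseline the paper improves on, the plain Birkhoff average, is itself a graph filter: the polynomial h(P) = (1/N) Σ_{k<N} P^k applied to the signal f, whose output converges to the constant E_π[f]. A minimal numpy sketch on a small reversible chain (the Chebyshev/Legendre filters of the paper would replace the uniform polynomial coefficients with optimized ones; that part is not reproduced here):

```python
import numpy as np

# Reversible Markov chain: lazy random walk on a 3-node path graph.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])  # stationary distribution (prop. to degree)

def ergodic_filter(P, f, N):
    """Birkhoff average (1/N) * sum_{k<N} P^k f, viewed as the graph
    filter h(P) with uniform polynomial coefficients applied to f."""
    acc, Pk = np.zeros_like(f), f.copy()
    for _ in range(N):
        acc += Pk
        Pk = P @ Pk
    return acc / N

f = np.array([1.0, 0.0, -1.0])     # graph signal on the state space
out = ergodic_filter(P, f, 2000)
print(np.abs(out - pi @ f).max() < 1e-2)  # → True: converges to E_pi[f]
```

The O(1/N) convergence visible here is exactly the non-optimality the paper exploits: better polynomial coefficients (Chebyshev, Legendre) damp the non-unit eigenvalues of P much faster for the same filter order.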


[51] 2409.09769

Risk-Aware Autonomous Driving with Linear Temporal Logic Specifications

Human drivers naturally balance the risks of different concerns while driving, including traffic rule violations, minor accidents, and fatalities. However, achieving the same behavior in autonomous driving systems remains an open problem. This paper extends a risk metric that has been verified in human-like driving studies to encompass more complex driving scenarios specified by linear temporal logic (LTL) that go beyond just collision risks. This extension incorporates the timing and severity of events into LTL specifications, thereby reflecting a human-like risk awareness. Without sacrificing expressivity for traffic rules, we adopt LTL specifications composed of safety and co-safety formulas, allowing the control synthesis problem to be reformulated as a reachability problem. By leveraging occupation measures, we further formulate a linear programming (LP) problem for this LTL-based risk metric. Consequently, the synthesized policy balances different types of driving risks, including both collision risks and traffic rule violations. The effectiveness of the proposed approach is validated on three typical traffic scenarios in the CARLA simulator.


[52] 2501.05310

A Large-Scale Probing Analysis of Speaker-Specific Attributes in Self-Supervised Speech Representations

Enhancing explainability in speech self-supervised learning (SSL) is important for developing reliable SSL-based speech processing systems. This study probes how speech SSL models encode speaker-specific information via a large-scale probing analysis of 11 models, decomposing identity into acoustic, prosodic, and paralinguistic attributes. The results confirm a general hierarchy wherein initial layers encode fundamental acoustics and middle layers synthesise abstract traits. Crucially, the results challenge the consensus that final layers encode purely abstract linguistic content: larger models unexpectedly recover speaker identity in their deep layers. Furthermore, the intermediate representations of speech SSL models are found to capture dynamic prosody better than specialised speaker embeddings. These insights decode the complex internal mechanics of SSL models, providing guidelines for selecting interpretable and task-optimal representations.


[53] 2502.14401

MedFuncta: A Unified Framework for Learning Efficient Medical Neural Fields

Research in medical imaging primarily focuses on discrete data representations that poorly scale with grid resolution and fail to capture the often continuous nature of the underlying signal. Neural Fields (NFs) offer a powerful alternative by modeling data as continuous functions. While single-instance NFs have successfully been applied in medical contexts, extending them to large-scale medical datasets remains an open challenge. We therefore introduce MedFuncta, a unified framework for large-scale NF training on diverse medical signals. Building on Functa, our approach encodes data into a unified representation, namely a 1D latent vector, that modulates a shared, meta-learned NF, enabling generalization across a dataset. We revisit common design choices, introducing a non-constant frequency parameter $\omega$ in widely used SIREN activations, and establish a connection between this $\omega$-schedule and layer-wise learning rates, relating our findings to recent work in theoretical learning dynamics. We additionally introduce a scalable meta-learning strategy for shared network learning that employs sparse supervision during training, thereby reducing memory consumption and computational overhead while maintaining competitive performance. Finally, we evaluate MedFuncta across a diverse range of medical datasets and show how to solve relevant downstream tasks on our neural data representation. To promote further research in this direction, we release our code, model weights and the first large-scale dataset - MedNF - containing >500k latent vectors for multi-instance medical NFs.
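A non-constant $\omega$ in SIREN activations simply means each layer applies h = sin(omega_l * (W h + b)) with its own frequency. The sketch below shows the forward pass only; the decreasing schedule [30, 10, 1] and the weight initialization are illustrative assumptions, not the schedule the paper derives from layer-wise learning rates.

```python
import numpy as np

def siren_forward(x, weights, biases, omegas):
    """SIREN-style forward pass with a layer-wise frequency schedule:
    h = sin(omega_l * (W h + b)) at every layer l."""
    h = x
    for W, b, w0 in zip(weights, biases, omegas):
        h = np.sin(w0 * (h @ W + b))
    return h

rng = np.random.default_rng(0)
dims = [2, 16, 16, 1]                               # coords -> ... -> signal
Ws = [rng.uniform(-1, 1, (i, o)) / i for i, o in zip(dims, dims[1:])]
bs = [np.zeros(o) for o in dims[1:]]
y = siren_forward(rng.standard_normal((4, 2)), Ws, bs, omegas=[30.0, 10.0, 1.0])
print(y.shape, float(np.abs(y).max()) <= 1.0)  # (4, 1) True: sin keeps |h|<=1
```

In Functa-style training, a per-instance latent vector would additionally shift or scale the pre-activations; that modulation path is omitted here for brevity.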


[54] 2507.00831

Adiabatic Capacitive Neuron: An Energy-Efficient Functional Unit for Artificial Neural Networks

This paper introduces a new, highly energy-efficient, Adiabatic Capacitive Neuron (ACN) hardware implementation of an Artificial Neuron (AN) with improved functionality, accuracy, robustness and scalability over previous work. The paper describes the implementation of a \mbox{12-bit} single neuron, with positive and negative weight support, in a $\mathbf{0.18\mu m}$ CMOS technology. The paper also presents a new Threshold Logic (TL) design for a binary AN activation function that generates a low symmetrical offset across three process corners and five temperatures between $-55^\circ$C and $125^\circ$C. Post-layout simulations demonstrate a maximum rising and falling offset voltage of 9$mV$ compared to conventional TL, which has rising and falling offset voltages of 27$mV$ and 5$mV$ respectively, across temperature and process. Moreover, the proposed TL design shows a decrease in average energy of 1.5$\%$ at the SS corner and 2.3$\%$ at the FF corner compared to the conventional TL design. The total synapse energy saving for the proposed ACN was above 90$\%$ (over 12x improvement) when compared to a non-adiabatic CMOS Capacitive Neuron (CCN) benchmark for frequencies ranging from 500$kHz$ to 100$MHz$. A 1000-sample Monte Carlo simulation including process variation and mismatch confirms worst-case energy savings of $>$90$\%$ compared to CCN in the synapse energy profile. Finally, the impact of supply voltage scaling shows consistent energy savings of above 90$\%$ (except for all-zero inputs) without loss of functionality.


[55] 2507.09995

Graph-Based Multi-Modal Light-weight Network for Adaptive Brain Tumor Segmentation

Multi-modal brain tumor segmentation remains challenging for practical deployment due to the high computational costs of mainstream models. In this work, we propose GMLN-BTS, a Graph-based Multi-modal interaction Lightweight Network for brain tumor segmentation. Our architecture achieves high-precision, resource-efficient segmentation through three key components. First, a Modality-Aware Adaptive Encoder (M2AE) facilitates efficient multi-scale semantic extraction. Second, a Graph-based Multi-Modal Collaborative Interaction Module (G2MCIM) leverages graph structures to model complementary cross-modal relationships. Finally, a Voxel Refinement UpSampling Module (VRUM) integrates linear interpolation with multi-scale transposed convolutions to suppress artifacts and preserve boundary details. Experimental results on BraTS 2017, 2019, and 2021 benchmarks demonstrate that GMLN-BTS achieves state-of-the-art performance among lightweight models. With only 4.58M parameters, our method reduces parameter count by 98% compared to mainstream 3D Transformers while significantly outperforming existing compact approaches.


[56] 2508.12689

Multi-Domain Supervised Contrastive Learning for UAV Radio-Frequency Open-Set Recognition

5G-Advanced (5G-A) has enabled the vibrant development of low altitude integrated sensing and communication (LA-ISAC) networks. As a core component of these networks, unmanned aerial vehicles (UAVs) have witnessed rapid growth in recent years. However, due to the lag in traditional industry regulatory norms, unauthorized flight incidents occur frequently, posing a severe security threat to LA-ISAC networks. To surveil non-cooperative UAVs, in this paper, we propose a multi-domain supervised contrastive learning (MD-SupContrast) framework for UAV radio frequency (RF) open-set recognition. Specifically, the texture features and the time-frequency position features from the ResNet and the TransformerEncoder are first fused, and supervised contrastive learning is then applied to optimize the feature representation of the closed-set samples. Next, to surveil invasive UAVs encountered in practice, we propose an improved generative OpenMax (IG-OpenMax) algorithm and construct an open-set recognition model, namely Open-RFNet. For the unknown samples, we freeze the feature extraction layers and retrain only the classification layer, which achieves excellent performance in both closed-set and open-set recognition. We analyze the computational complexity of the proposed model. Experiments are conducted with a large-scale UAV open dataset. The results show that the proposed Open-RFNet outperforms the existing benchmark methods in recognition accuracy for both known and unknown UAVs, achieving 95.12% closed-set and 96.08% open-set accuracy across 25 UAV types.


[57] 2509.15001

BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings

Child-centered daylong recordings are essential for studying early language development, but existing speech models trained on clean adult data perform poorly due to acoustic and linguistic differences. We introduce BabyHuBERT, a self-supervised speech model trained on 13,000 hours of multilingual child-centered recordings spanning 40+ languages. Evaluated on voice type classification -- distinguishing target children from female adults, male adults, and other children, a key preprocessing step for analyzing naturalistic language experiences -- BabyHuBERT-VTC achieves F1-scores from 52.1% to 74.4% across six corpora, consistently outperforming W2V2-LL4300 (English daylongs) and HuBERT (clean adult speech). Notable gains include 13.2 and 15.9 absolute F1 points over HuBERT on Vanuatu and Solomon Islands, demonstrating effectiveness on underrepresented languages. We share code and model to support researchers working with child-centered recordings across diverse linguistic contexts.


[58] 2512.11556

ACCOR: Attention-Enhanced Complex-Valued Contrastive Learning for Occluded Object Classification Using mmWave Radar IQ Signals

Millimeter-wave (mmWave) radar provides robust sensing under adverse conditions and can penetrate thin materials for non-visual perception in industrial and robotic settings. Recent work with MIMO mmWave radar has demonstrated its ability to penetrate cardboard packaging for occluded object classification. However, existing models leave room for extensions and improvements across different sensing frequencies. Building on recent work with MIMO radar for occluded object classification, we propose ACCOR, an attention-enhanced complex-valued contrastive learning approach for radar, enabling robust occluded object classification. ACCOR processes complex-valued IQ radar signals via a complex-valued CNN backbone, a multi-head attention layer and a hybrid loss. The hybrid loss combines a weighted cross-entropy term with a supervised contrastive term. We extend an existing 64 GHz dataset with a new 67 GHz subset and evaluate performance across both bands. ACCOR achieves 96.60% accuracy at 64 GHz and 93.59% at 67 GHz on 10 objects, surpassing prior radar-specific and adapted image models. Results demonstrate the benefits of integrating complex-valued deep learning, attention, and contrastive learning for mmWave radar-based occluded object classification.


[59] 2601.04478

Prediction of Cellular Malignancy Using Electrical Impedance Signatures and Supervised Machine Learning

Bioelectrical properties of cells such as relative permittivity, conductivity, and characteristic time constants vary significantly between healthy and malignant cells across different frequencies. These distinctions provide a promising foundation for diagnostic and classification applications. This study systematically reviewed 33 scholarly articles to compile datasets of quantitative bioelectric parameters and evaluated their utility in predictive modeling. Three supervised machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), were implemented and tuned using key hyperparameters to assess classification performance. Model effectiveness was evaluated using accuracy and F1 score as performance metrics. Results demonstrate that Random Forest achieved the highest predictive accuracy of ~90% when configured with a maximum depth of 4 and 100 estimators, while the F1 scores of KNN and SVM peaked at approximately 78% and 76.5%, respectively. These findings highlight the potential of integrating bioelectrical property analysis with machine learning for improved diagnostic decision-making. Future work will explore incorporating additional discriminative features, leveraging simulated datasets, and optimizing hyperparameters through advanced search strategies. Ultimately, a hardware prototype with embedded micro-electrodes and real-time control systems could pave the path for practical diagnostic tools capable of in-situ cell classification.
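The classification setup described above can be sketched end to end with a plain k-nearest-neighbour vote on the three named feature types. The feature means and spreads below are invented for illustration, not values from the 33 reviewed studies, and the KNN is hand-rolled rather than a tuned library model.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote among the k nearest training points (Euclidean
    distance) for binary healthy-vs-malignant labels."""
    d = np.linalg.norm(X_test[:, None] - X_train[None], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Synthetic feature vectors: [relative permittivity, conductivity, time constant]
rng = np.random.default_rng(0)
healthy = rng.normal([50.0, 0.3, 1.0], 0.2, (100, 3))
malignant = rng.normal([70.0, 0.9, 2.5], 0.2, (100, 3))
X = np.vstack([healthy, malignant]); y = np.repeat([0, 1], 100)
idx = rng.permutation(200); tr, te = idx[:150], idx[150:]
pred = knn_predict(X[tr], y[tr], X[te])
acc = (pred == y[te]).mean()
print(acc > 0.9)  # True: the synthetic classes are well separated
```

Real data would of course overlap far more than this toy example, which is why the study's hyperparameter tuning (tree depth, number of estimators, k) matters.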


[60] 2602.13308

Learning to Select Like Humans: Explainable Active Learning for Medical Imaging

Medical image analysis requires substantial labeled data for model training, yet expert annotation is expensive and time-consuming. Active learning (AL) addresses this challenge by strategically selecting the most informative samples for annotation, but traditional methods rely solely on predictive uncertainty while ignoring whether models learn from clinically meaningful features, a critical requirement for clinical deployment. We propose an explainability-guided active learning framework that integrates spatial attention alignment into the sample acquisition process. Our approach advocates a dual-criterion selection strategy combining: (i) classification uncertainty to identify informative examples, and (ii) attention misalignment with radiologist-defined regions-of-interest (ROIs) to target samples where the model focuses on incorrect features. By measuring misalignment between Grad-CAM attention maps and expert annotations using Dice similarity, our acquisition function judiciously identifies samples that enhance both predictive performance and spatial interpretability. We evaluate the framework using three expert-annotated medical imaging datasets, namely, BraTS (MRI brain tumors), VinDr-CXR (chest X-rays), and SIIM-COVID-19 (chest X-rays). Using only 570 strategically selected samples, our explainability-guided approach consistently outperforms random sampling across all the datasets, achieving 77.22% accuracy on BraTS, 52.37% on VinDr-CXR, and 52.66% on SIIM-COVID. Grad-CAM visualizations confirm that the models trained by our dual-criterion selection focus on diagnostically relevant regions, demonstrating that incorporating explanation guidance into sample acquisition yields superior data efficiency while maintaining clinical interpretability.
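The dual-criterion idea reduces to a scalar acquisition score per candidate: predictive entropy plus attention misalignment, where misalignment is 1 minus the Dice overlap between a (thresholded) attention map and the expert ROI. The equal weighting `w=0.5` and the binary toy masks are assumptions; the paper's exact combination rule is not stated in the abstract.

```python
import numpy as np

def dice(a, b):
    """Dice similarity between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-8)

def acquisition_score(probs, attn_mask, roi_mask, w=0.5):
    """Dual-criterion score: predictive entropy plus attention
    misalignment (1 - Dice vs. the expert ROI). Higher = select first."""
    entropy = -(probs * np.log(probs + 1e-12)).sum()
    misalignment = 1.0 - dice(attn_mask, roi_mask)
    return w * entropy + (1 - w) * misalignment

roi = np.zeros((8, 8), bool); roi[2:5, 2:5] = True     # expert annotation
aligned = roi.copy()                                   # attention on the ROI
off = np.zeros((8, 8), bool); off[5:8, 5:8] = True     # attention elsewhere
s_bad = acquisition_score(np.array([0.5, 0.5]), off, roi)       # uncertain + misaligned
s_good = acquisition_score(np.array([0.99, 0.01]), aligned, roi)  # confident + aligned
print(s_bad > s_good)  # True: the uncertain, misaligned sample is selected
```

Ranking candidates by this score and annotating the top slice is then standard pool-based active learning.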


[61] 2602.20909

Continuous-Time Analysis of AFDM: Pulse-Shaping, Fundamental Bounds and Impact of Hardware Impairments

Affine frequency division multiplexing (AFDM) has recently emerged as a resilient waveform candidate for high-mobility next-generation wireless systems. However, current literature mostly focuses on discrete time (DT) models, often overlooking effects and hardware non-idealities of actual continuous time (CT) signal generation. In this paper, we bridge this gap by developing a CT-analytical framework based on the affine Fourier series (AFS) representation, which allows us to demonstrate that strictly bandlimited pulses and subcarrier suppression strategies are essential to maintain the multicarrier structure of the transmitted signal. In addition, we derive the analytical power spectral density of AFDM and compare its spectral characteristics with those of other multicarrier schemes, taking into account the impact of realistic truncated pulse-shaping. Furthermore, we analyze the sensitivity of the CT model to phase noise, carrier frequency offset, and sampling jitter, providing a theoretical analysis of communication performance. Finally, we derive closed-form Cramér-Rao bounds for channel parameter estimation, showing that the chirped modulation peculiar to AFDM increases estimation variance but enables the resolution of Doppler ambiguities. Our findings lay the necessary theoretical and practical foundations for the implementation of AFDM in realistic wireless transceivers.


[62] 2602.21525

Optimal Real-Time Fusion of Time-Series Data Under Rényi Differential Privacy

In this paper, we investigate the optimal real-time fusion of data collected by multiple sensors. In our set-up, the sensor measurements are considered to be private and are jointly correlated with an underlying process. A fusion center combines the private sensor measurements and releases its output to an honest-but-curious party, which is responsible for estimating the state of the underlying process based on the fusion center's output. The privacy leakage incurred by the fusion policy is quantified using Rényi differential privacy. We formulate the privacy-aware fusion design as a constrained finite-horizon optimization problem, in which the fusion policy and the state estimation are jointly optimized to minimize the state estimation error subject to a total privacy budget constraint. We derive the constrained optimality conditions for the proposed optimization problem and use them to characterize the structural properties of the optimal fusion policy. Unlike classical differential privacy mechanisms, the optimal fusion policy is shown to adaptively allocate the privacy budget and regulate the adversary's belief in a closed-loop manner. To reduce the computational burden of solving the resulting constrained optimality equations, we parameterize the fusion policy using a structured Gaussian distribution and show that the parameterized fusion policy satisfies the privacy constraint. We further develop a numerical algorithm to jointly optimize the fusion policy and state estimator. Finally, we demonstrate the effectiveness of the proposed fusion framework through a traffic density estimation case study.
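For readers unfamiliar with Rényi budget accounting: the standard Gaussian mechanism with sensitivity Δ and noise σ satisfies (α, αΔ²/(2σ²))-Rényi DP, and per-step guarantees compose additively in ε_α. The snippet below shows only this textbook accounting; the paper's contribution, adaptive closed-loop allocation of the budget, is not reproduced here, and the noise levels are made up.

```python
def gaussian_rdp(alpha, sensitivity, sigma):
    """Renyi DP of the Gaussian mechanism:
    eps_alpha = alpha * Delta^2 / (2 * sigma^2)."""
    return alpha * sensitivity**2 / (2 * sigma**2)

# Per-step budgets over a horizon compose additively under RDP:
alpha, delta = 2.0, 1.0
sigmas = [1.0, 1.5, 2.0]          # hypothetical per-time-step noise levels
total = sum(gaussian_rdp(alpha, delta, s) for s in sigmas)
print(round(total, 4))  # 1.6944 = 1.0 + 0.4444 + 0.25
```

Additive composition is what makes a "total privacy budget constraint" over a finite horizon a single scalar constraint on the sum of per-step leakages.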


[63] 2603.01270

VoxKnesset: A Large-Scale Longitudinal Hebrew Speech Dataset for Aging Speaker Modeling

Speech processing systems face a fundamental challenge: the human voice changes with age, yet few datasets support rigorous longitudinal evaluation. We introduce VoxKnesset, an open-access dataset of ~2,300 hours of Hebrew parliamentary speech spanning 2009-2025, comprising 393 speakers with recording spans of up to 15 years. Each segment includes aligned transcripts and verified demographic metadata from official parliamentary records. We benchmark modern speech embeddings (WavLM-Large, ECAPA-TDNN, Wav2Vec2-XLSR-1B) on age prediction and speaker verification under longitudinal conditions. Speaker verification EER rises from 2.15% to 4.58% over 15 years for the strongest model, and cross-sectionally trained age regressors fail to capture within-speaker aging, while longitudinally trained models recover a meaningful temporal signal. We publicly release the dataset and pipeline to support aging-robust speech systems and Hebrew speech processing.
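The equal error rate (EER) quoted above is the operating point where the false-reject rate on genuine trials equals the false-accept rate on impostor trials. A simple grid-sweep sketch on synthetic verification scores (the Gaussian score model is illustrative, not VoxKnesset data):

```python
import numpy as np

def eer(target_scores, impostor_scores):
    """Equal error rate: sweep thresholds over all observed scores and
    return the rate where FRR and FAR are closest."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best = (1.0, 0.5)                        # (|FRR - FAR|, rate)
    for t in thresholds:
        frr = (target_scores < t).mean()     # genuine trials rejected
        far = (impostor_scores >= t).mean()  # impostor trials accepted
        best = min(best, (abs(frr - far), (frr + far) / 2))
    return best[1]

rng = np.random.default_rng(0)
tgt = rng.normal(2.0, 1.0, 2000)   # genuine-trial similarity scores
imp = rng.normal(0.0, 1.0, 2000)   # impostor-trial similarity scores
print(abs(eer(tgt, imp) - 0.159) < 0.03)  # True: ~Phi(-1) for these Gaussians
```

A rise in EER over a 15-year gap, as reported, means the genuine and impostor score distributions drift closer together as voices age.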


[64] 2603.01972

A System-of-Systems Convergence Paradigm for Societal Challenges of the Anthropocene

Modern societal challenges, such as climate change, urbanization, and water resource management, demand integrated, multi-discipline, multi-problem approaches to frame and address their complexity. Unfortunately, current methodologies often operate within disciplinary silos, leading to fragmented insights and missed opportunities for convergence. A critical barrier to cross-disciplinary integration lies in the disparate ontologies that shape how different fields conceptualize and communicate knowledge. To address these limitations, this paper proposes a system-of-systems (SoS) convergence paradigm grounded in a meta-cognition map, a framework that integrates five complementary domains: real-world observations, systems thinking, visual modeling, mathematics, and computing. The paradigm is based on the Systems Modeling Language (SysML), offering a standardized, domain-neutral approach for representing and analyzing complex systems. The proposed methodology is demonstrated through a case study of the Chesapeake Bay Watershed, a socio-environmental system requiring coordination across land use, hydrology, economic and policy domains. By modeling this system with SysML, the study illustrates practical strategies for navigating interdisciplinary challenges and highlights the potential of agile SoS modeling to support large-scale, multi-dimensional decision-making.


[65] 2603.02813

Benchmarking Speech Systems for Frontline Health Conversations: The DISPLACE-M Challenge

The DIarization and Speech Processing for LAnguage understanding in Conversational Environments - Medical (DISPLACE-M) challenge introduces a conversational AI benchmark for understanding goal-oriented, real-world medical dialogues. The challenge addresses multi-speaker interactions between frontline health workers and care seekers, characterized by spontaneous, noisy and overlapping speech. As part of the challenge, a medical conversational dataset comprising 40 hours of development and 15 hours of blind evaluation recordings was released. We provided baseline systems across 4 tasks - speaker diarization, automatic speech recognition, topic identification and dialogue summarization - to enable consistent benchmarking. System performance is evaluated using diarization error rate (DER), time-constrained minimum-permutation word error rate (tcpWER) and ROUGE-L. This paper describes the Phase-I evaluation - data, tasks and baseline systems - along with a summary of the evaluation results.


[66] 2603.03471

The PARLO Dementia Corpus: A German Multi-Center Resource for Alzheimer's Disease

Early and accessible detection of Alzheimer's disease (AD) remains a major challenge, as current diagnostic methods often rely on costly and invasive biomarkers. Speech and language analysis has emerged as a promising non-invasive and scalable approach to detecting cognitive impairment, but research in this area is hindered by the lack of publicly available datasets, especially for languages other than English. This paper introduces the PARLO Dementia Corpus (PDC), a new multi-center, clinically validated German resource for AD collected across nine academic memory clinics in Germany. The dataset comprises speech recordings from individuals with AD-related mild cognitive impairment and mild to moderate dementia, as well as cognitively healthy controls. Speech was elicited using a standardized test battery of eight neuropsychological tasks, including confrontation naming, verbal fluency, word repetition, picture description, story reading, and recall tasks. In addition to audio recordings, the dataset includes manually verified transcriptions and detailed demographic, clinical, and biomarker metadata. Baseline experiments on ASR benchmarking, automated test evaluation, and LLM-based classification illustrate the feasibility of automatic, speech-based cognitive assessment and highlight the diagnostic value of recall-driven speech production. The PDC thus establishes the first publicly available German benchmark for multi-modal and cross-lingual research on neurodegenerative diseases.


[67] 2403.03455

Robust Control Lyapunov-Value Functions for Nonlinear Disturbed Systems

Control Lyapunov Functions (CLFs) have been extensively used in the control community. A well-known drawback is the absence of a systematic way to construct CLFs for general nonlinear systems, and the problem can become more complex with input or state constraints. Our preliminary work on constructing Control Lyapunov Value Functions (CLVFs) using Hamilton-Jacobi (HJ) reachability analysis provides a method for finding a non-smooth CLF. In this paper, we extend our work on CLVFs to systems with bounded disturbance and define the Robust CLVF (R-CLVF). The R-CLVF naturally inherits all properties of the CLVF; i.e., it first identifies the "smallest robust control invariant set (SRCIS)" and stabilizes the system to it with a user-specified exponential rate. The region from which the exponential rate can be met is called the "region of exponential stabilizability (ROES)." We provide clearer definitions of the SRCIS and more rigorous proofs of several important theorems. Since the computation of the R-CLVF suffers from the "curse of dimensionality," we also provide two techniques (warmstart and system decomposition) that mitigate it, along with necessary proofs. Three numerical examples are provided, validating our definition of SRCIS, illustrating the trade-off between a faster decay rate and a smaller ROES, and demonstrating the efficiency of computation using warmstart and decomposition.


[68] 2404.03740

Randomized Greedy Methods for Weak Submodular Sensor Selection with Robustness Considerations

We study a pair of budget- and performance-constrained weak-submodular maximization problems. For computational efficiency, we explore the use of stochastic greedy algorithms which limit the search space via random sampling instead of the standard greedy procedure which explores the entire feasible search space. We propose a pair of stochastic greedy algorithms, namely, Modified Randomized Greedy (MRG) and Dual Randomized Greedy (DRG) to approximately solve the budget- and performance-constrained problems, respectively. For both algorithms, we derive approximation guarantees that hold with high probability. We then examine the use of DRG in robust optimization problems wherein the objective is to maximize the worst-case of a number of weak submodular objectives and propose the Randomized Weak Submodular Saturation Algorithm (Random-WSSA). We further derive a high-probability guarantee for when Random-WSSA successfully constructs a robust solution. Finally, we showcase the effectiveness of these algorithms in a variety of relevant uses within the context of Earth-observing low Earth orbit satellite constellations which estimate atmospheric weather conditions and provide Earth coverage.
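The core speed-up in stochastic greedy methods is that each of the k selection steps evaluates marginal gains on a random subsample of roughly (n/k)·log(1/ε) elements rather than the whole ground set. A minimal sketch on a toy coverage objective (the exact MRG/DRG variants and their guarantees are in the paper; this is the generic template only):

```python
import numpy as np

def stochastic_greedy(ground, f, k, eps=0.1, seed=0):
    """Budget-constrained stochastic greedy: each step samples
    s = ceil((n/k) * log(1/eps)) candidates and adds the one with the
    largest marginal gain f(S + e) - f(S)."""
    rng = np.random.default_rng(seed)
    n = len(ground)
    s = min(n, int(np.ceil((n / k) * np.log(1 / eps))))
    S = []
    for _ in range(k):
        cand = [e for e in rng.choice(ground, size=s, replace=False)
                if e not in S]
        if not cand:
            continue
        gains = [f(S + [e]) - f(S) for e in cand]
        S.append(cand[int(np.argmax(gains))])
    return S

# Weighted coverage objective (monotone submodular), e.g. sensor footprints:
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}, 4: {6}}
f = lambda S: len(set().union(*(sets[e] for e in S))) if S else 0
S = stochastic_greedy(list(sets), f, k=2)
print(len(S), f(S))  # 2 5: picks element 3 ({1,4,5}) then element 1 ({2,3})
```

On this tiny instance the subsample covers the whole ground set, so the run matches plain greedy; the sampling only pays off when n is large relative to k.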


[69] 2404.03759

Localized Distributional Robustness in Submodular Multi-Task Subset Selection

In this work, we treat the problem of multi-task submodular optimization from the perspective of local distributional robustness within the neighborhood of a reference distribution that assigns an importance score to each task. We initially propose to introduce a relative-entropy regularization term to the standard multi-task objective. We then demonstrate through duality that this novel formulation itself is equivalent to the maximization of a monotone increasing function composed with a submodular function, which may be efficiently carried out through standard greedy selection methods. This approach bridges the existing gap in the optimization of performance-robustness trade-offs in multi-task subset selection. To numerically validate our theoretical results, we test the proposed method in two different settings, one on the selection of satellites in low Earth orbit constellations in the context of a sensor selection problem involving weak-submodular functions, and the other on an image summarization task using neural networks involving submodular functions. Our method is compared with two other algorithms focused on optimizing the performance of the worst-case task, and on directly optimizing the performance on the reference distribution itself. We conclude that our novel formulation produces a solution that is locally distributionally robust and computationally inexpensive.


[70] 2405.06754

Wall-Street: An Intelligent Vehicular Surface for Reliable mmWave Handover

mmWave networks promise high bandwidth but face significant challenges in maintaining reliable connections for users moving at high speed. Frequent handovers, complex beam alignment, and signal blockage from car bodies lead to service interruptions and degraded performance. We present Wall-Street, a vehicle-mounted smart surface that enhances mmWave connectivity for in-vehicle users. Wall-Street improves mobility management by (1) steering outdoor mmWave signals into the vehicle for shared coverage and providing a single, collective handover for all users; (2) performing neighbor-cell search without interrupting data transfer, ensuring seamless handovers; and (3) connecting users to a new cell before disconnecting from the old cell for reliable cell transitions. We implemented and integrated Wall-Street into the COSMOS testbed. We collected PHY traces from multiple base station nodes and in-vehicle user nodes, using a surface-mounted vehicle driving on a nearby road. Our trace-driven ns-3 simulation demonstrates a throughput improvement of up to 78% and a latency reduction of up to 34% over the standard Standalone handover scheme.


[71] 2506.07915

A Signal Contract for Online Language Grounding and Discovery in Decision-Making

Autonomous systems increasingly receive time-sensitive contextual updates from humans through natural language, yet embedding language understanding inside decision-makers couples grounding to learning or planning. This increases redeployment burden when language conventions or domain knowledge change and can hinder diagnosability by confounding grounding errors with control errors. We address online language grounding where messy, evolving verbal reports are converted into control-relevant signals during execution through an interface that localises language updates while keeping downstream decision-makers language-agnostic. We propose LUCIFER (Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement), an inference-only middleware that exposes a Signal Contract. The contract provides four outputs: policy priors, reward potentials, admissible-option constraints, and telemetry-based action prediction for efficient information gathering. We validate LUCIFER in a search-and-rescue (SAR)-inspired testbed using dual-phase, dual-client evaluation: (i) component benchmarks show reasoning-based extraction remains robust on self-correcting reports where pattern-matching baselines degrade, and (ii) system-level ablations with two structurally distinct clients (hierarchical RL and a hybrid A*+heuristics planner) show consistent necessity and synergy. Grounding improves safety, discovery improves information-collection efficiency, and only their combination achieves both.
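
The four contract outputs could be exposed as a plain data interface that keeps the downstream client language-agnostic. The sketch below is a hypothetical reading of the Signal Contract (field names and types are assumptions, not LUCIFER's actual API), with potential-based shaping shown for the reward-potential output:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SignalContract:
    """Hypothetical container for the four contract outputs."""
    policy_priors: Dict[str, float] = field(default_factory=dict)      # action -> prior
    reward_potentials: Dict[str, float] = field(default_factory=dict)  # state -> potential
    admissible_options: List[str] = field(default_factory=list)       # allowed options
    predicted_actions: List[str] = field(default_factory=list)        # from telemetry

def shaped_reward(contract, s, s_next, base_reward, gamma=0.99):
    """Potential-based shaping from the contract's reward potentials;
    this form provably leaves the optimal policy unchanged."""
    phi = contract.reward_potentials
    return base_reward + gamma * phi.get(s_next, 0.0) - phi.get(s, 0.0)
```

Because the client only reads these fields, the grounding layer can be swapped or retrained without touching the RL or planning code.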


[72] 2506.23036

Parameter Stress Analysis in Reinforcement Learning: Applying Synaptic Filtering to Policy Networks

This paper explores reinforcement learning (RL) policy robustness by systematically analyzing network parameters under internal and external stresses. We apply synaptic filtering methods using high-pass, low-pass, and pulse-wave filters (Pravin et al., 2024) as an internal stress by selectively perturbing parameters, while adversarial attacks apply external stress through modified agent observations. This dual approach enables the classification of parameters as fragile, robust, or antifragile, based on their influence on policy performance in clean and adversarial settings. Parameter scores are defined to quantify these characteristics, and the framework is validated on proximal policy optimization (PPO)-trained agents in MuJoCo continuous control environments. The results highlight the presence of antifragile parameters that enhance policy performance under stress, demonstrating the potential of targeted filtering techniques to improve RL policy adaptability. These insights provide a foundation for future advancements in the design of robust and antifragile RL systems.
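
One plausible reading of synaptic filtering is masking parameters by magnitude band; the band semantics and thresholds below are assumptions for illustration, not the cited filters' exact definitions:

```python
import numpy as np

def synaptic_filter(weights, mode, lo=None, hi=None):
    """Zero out parameters outside a magnitude band (illustrative reading of
    low-pass / high-pass / pulse-wave filtering over parameter magnitudes)."""
    mag = np.abs(weights)
    if mode == "low":      # keep only small-magnitude weights
        mask = mag <= hi
    elif mode == "high":   # keep only large-magnitude weights
        mask = mag >= lo
    elif mode == "pulse":  # keep a magnitude band
        mask = (mag >= lo) & (mag <= hi)
    else:
        raise ValueError(mode)
    return weights * mask
```

Sweeping the thresholds while measuring episodic return would then score each parameter group as fragile, robust, or antifragile depending on how performance responds to its removal.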


[73] 2507.09264

Overtone: Cyclic Patch Modulation for Clean, Efficient, and Flexible Physics Emulators

Transformer-based PDE surrogates achieve remarkable performance but face two key challenges: fixed patch sizes cause systematic error accumulation at harmonic frequencies, and computational costs remain inflexible regardless of problem complexity or available resources. We introduce Overtone, a unified solution through dynamic patch size control at inference. Overtone's key insight is that cyclically modulating patch sizes during autoregressive rollouts distributes errors across the frequency spectrum, mitigating the systematic harmonic artifact accumulation that plagues fixed-patch models. We implement this through two architecture-agnostic modules--CSM (using dynamic stride modulation) and CKM (using dynamic kernel resizing)--that together provide both harmonic mitigation and compute-adaptive deployment. This flexible tokenization lets users trade accuracy for speed dynamically based on computational constraints, and the cyclic rollout strategy yields up to 40% lower long rollout error in variance-normalised RMSE (VRMSE) compared to conventional, static-patch surrogates. Across challenging 2D and 3D PDE benchmarks, one Overtone model matches or exceeds fixed-patch baselines across inference compute budgets, when trained under a fixed total training budget.
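
The cyclic modulation idea can be sketched as a per-step patch-size schedule driving an autoregressive rollout; `step_fn` stands in for the surrogate with CSM/CKM re-sizing and is purely illustrative:

```python
from itertools import cycle, islice

def cyclic_patch_schedule(patch_sizes, num_steps):
    """Cycle patch sizes across rollout steps so no single patch harmonic
    accumulates systematically in the predicted fields."""
    return list(islice(cycle(patch_sizes), num_steps))

def rollout(state, step_fn, patch_sizes, num_steps):
    """Autoregressive rollout; step_fn(state, patch) emulates a surrogate whose
    tokenizer stride (CSM) or kernel (CKM) is re-sized at each step."""
    for patch in cyclic_patch_schedule(patch_sizes, num_steps):
        state = step_fn(state, patch)
    return state
```

Restricting the schedule to a single size recovers the conventional static-patch rollout, which is what makes the comparison in the paper architecture-agnostic.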


[74] 2509.05983

TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition

Code-switching (CS) presents a significant challenge for general automatic speech recognition (ASR) systems. Existing methods often fail to capture the subtle phonological shifts inherent in CS scenarios. The challenge is particularly difficult for language pairs like Vietnamese and English, where both distinct phonological features and the ambiguity arising from similar sound recognition are present. In this paper, we propose a novel architecture for Vietnamese-English CS ASR, a Two-Stage Phoneme-Centric model (TSPC). TSPC adopts a phoneme-centric approach based on an extended Vietnamese phoneme set as an intermediate representation for mixed-lingual modeling, while remaining efficient under low computational-resource constraints. Experimental results demonstrate that TSPC consistently outperforms existing baselines, including PhoWhisper-base, in Vietnamese-English CS ASR, achieving a significantly lower word error rate of 19.06% with reduced training resources. Furthermore, the phonetic-based two-stage architecture enables phoneme adaptation and language conversion to enhance ASR performance in complex CS Vietnamese-English ASR scenarios.


[75] 2509.15680

SAM: A Mamba-2 State-Space Audio-Language Model

We present SAM, a State-space Audio-language Model that integrates an audio encoder with a Mamba-2 backbone. SAM-2.7B achieves 21.1 mAP on AudioSet and 17.6 SPICE on AudioCaps, matching or surpassing larger 7B transformer-based models with fewer parameters. We further provide the first systematic, representation-level analysis of how SSMs interact with audio encoder outputs: (1) joint audio encoder finetuning is essential, supported by accuracy gains and observed adaptation of token representation rank and similarity across different SSM sizes; (2) despite linear scaling, SSMs benefit more from compact, information-rich audio token representations than from excessively long token sequences; and (3) incorporating instruction-following supervision substantially improves reasoning ability, boosting MMAU-Sound accuracy from 22.8 to 56.8. Through comprehensive experiments and analysis, we establish practical design principles for SSMs as strong, scalable backbones for audio-language models.


[76] 2509.20321

Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones

LLMs serve as the backbone in SpeechLLMs, yet their behavior on spontaneous conversational input remains poorly understood. Conversational speech contains pervasive disfluencies -- interjections, edits, and parentheticals -- that are rare in the written corpora used for pre-training. Because gold disfluency removal is a deletion-only task, it serves as a controlled probe to determine whether a model performs faithful structural repair or biased reinterpretation. Using the DRES evaluation framework, we evaluate proprietary and open-source LLMs across architectures and scales. We show that model performance clusters into stable precision-recall regimes reflecting distinct editing policies. Notably, reasoning models systematically over-delete fluent content, revealing a bias toward semantic abstraction over structural fidelity. While fine-tuning achieves SOTA results, it harms generalization. Our findings demonstrate that robustness to speech is shaped by specific training objectives.
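
Because gold disfluency removal is deletion-only, evaluation reduces to precision and recall over deleted token indices; a minimal scorer (not necessarily DRES's exact metric) might look like:

```python
def deletion_prf(gold_del, pred_del):
    """Precision/recall/F1 over deleted token indices for a deletion-only task."""
    gold, pred = set(gold_del), set(pred_del)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 1.0
    r = tp / len(gold) if gold else 1.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

In this framing, the "over-deletion" regime the paper attributes to reasoning models shows up as high recall with depressed precision, while conservative editors show the reverse.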


[77] 2509.21739

Noise-to-Notes: Diffusion-based Generation and Refinement for Automatic Drum Transcription

Automatic drum transcription (ADT) is traditionally formulated as a discriminative task to predict drum events from audio spectrograms. In this work, we redefine ADT as a conditional generative task and introduce Noise-to-Notes (N2N), a framework leveraging diffusion modeling to transform audio-conditioned Gaussian noise into drum events with associated velocities. This generative diffusion approach offers distinct advantages, including a flexible speed-accuracy trade-off and strong inpainting capabilities. However, the generation of binary onset and continuous velocity values presents a challenge for diffusion models, and to overcome this, we introduce an Annealed Pseudo-Huber loss to facilitate effective joint optimization. Finally, to augment low-level spectrogram features, we propose incorporating features extracted from music foundation models (MFMs), which capture high-level semantic information and enhance robustness to out-of-domain drum audio. Experimental results demonstrate that including MFM features significantly improves robustness and N2N establishes a new state-of-the-art performance across multiple ADT benchmarks.
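
A minimal sketch of an annealed pseudo-Huber loss, assuming a geometric schedule for the smoothing parameter (the paper's exact schedule and constants may differ):

```python
import numpy as np

def annealed_pseudo_huber(pred, target, step, total_steps, c_max=1.0, c_min=1e-3):
    """Pseudo-Huber loss sqrt(d^2 + c^2) - c with c annealed from c_max to c_min.
    Large c behaves like a scaled L2 (smooth, suits continuous velocities);
    small c approaches L1 (sharp, suits near-binary onsets)."""
    t = step / max(total_steps, 1)
    c = c_max * (c_min / c_max) ** t   # geometric annealing schedule (assumption)
    d = np.asarray(pred) - np.asarray(target)
    return np.mean(np.sqrt(d * d + c * c) - c)
```

Annealing lets one loss serve both target types: early training sees a smooth surface for the continuous velocities, and late training sharpens toward the binary onsets.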


[78] 2510.14959

CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions

Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, and (2) safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time rollouts. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
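
For a single affine CBF constraint, the safety-filter QP admits a closed form, which is what makes filtering cheap enough to run inside every training rollout; a toy version under that single-constraint assumption:

```python
import numpy as np

def cbf_filter(u_nom, h, Lfh, Lgh, alpha=1.0):
    """Closed-form solution of the single-constraint safety-filter QP:
    min ||u - u_nom||^2  s.t.  Lfh + Lgh @ u + alpha * h >= 0."""
    u_nom = np.asarray(u_nom, dtype=float)
    Lgh = np.asarray(Lgh, dtype=float)
    slack = Lfh + Lgh @ u_nom + alpha * h
    if slack >= 0:
        return u_nom                          # nominal action already safe
    return u_nom - slack * Lgh / (Lgh @ Lgh)  # minimal-norm correction
```

Applying this projection to each action during rollouts is the sense in which the CBF is "enforced in training": the policy only ever experiences safe transitions, so it learns to stay inside the safe set without a runtime filter.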


[79] 2510.16834

Schrödinger Bridge Mamba for One-Step Speech Enhancement

We present Schrödinger Bridge Mamba (SBM), a novel model for efficient speech enhancement by integrating the Schrödinger Bridge (SB) training paradigm and the Mamba architecture. Experiments on joint denoising and dereverberation tasks demonstrate that SBM outperforms strong generative and discriminative methods on multiple metrics with only one step of inference while achieving a competitive real-time factor for streaming feasibility. Ablation studies reveal that the SB paradigm consistently yields improved performance across diverse architectures over conventional mapping. Furthermore, Mamba exhibits a stronger performance under the SB paradigm compared to Multi-Head Self-Attention (MHSA) and Long Short-Term Memory (LSTM) backbones. These findings highlight the synergy between the Mamba architecture and the SB trajectory-based training, providing a high-quality solution for real-world speech enhancement. Demo page: this https URL


[80] 2510.17276

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

Control-flow hijacking attacks manipulate orchestration mechanisms in multi-agent systems into performing unsafe actions that compromise the system and exfiltrate sensitive information. Recently proposed defenses, such as LlamaFirewall, rely on alignment checks of inter-agent communications to ensure that all agent invocations are "related to" and "likely to further" the original objective. We start by demonstrating control-flow hijacking attacks that evade these defenses even if alignment checks are performed by advanced LLMs. We argue that the safety and functionality objectives of multi-agent systems fundamentally conflict with each other. This conflict is exacerbated by the brittle definitions of "alignment" and the checkers' incomplete visibility into the execution context. We then propose, implement, and evaluate ControlValve, a new defense inspired by the principles of control-flow integrity and least privilege. ControlValve (1) generates permitted control-flow graphs for multi-agent systems, and (2) enforces that all executions comply with these graphs, along with contextual rules (generated in a zero-shot manner) for each agent invocation.
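
Enforcing a permitted control-flow graph amounts to checking each agent-invocation edge against an allow-list before dispatch; the graph and agent names below are hypothetical, not ControlValve's actual policy format:

```python
# Hypothetical permitted control-flow graph: caller -> allowed callees.
PERMITTED = {
    "orchestrator": {"planner", "retriever"},
    "planner": {"executor"},
    "retriever": set(),
    "executor": set(),
}

def check_invocation(caller, callee, graph=PERMITTED):
    """Reject any agent invocation whose edge is absent from the permitted graph,
    analogous to control-flow integrity for program control transfers."""
    return callee in graph.get(caller, set())
```

Unlike alignment checks, this test needs no semantic judgment of whether a call "furthers the objective": an edge is either in the graph or it is not, which is what closes the hijacking attacks described above.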


[81] 2512.04551

Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention

Speech emotion recognition (SER) is an important technology in human-computer interaction. However, achieving high performance is challenging due to emotional complexity and scarce annotated data. To tackle these challenges, we propose a multi-loss learning (MLL) framework integrating an energy-adaptive mixup (EAM) method and a frame-level attention module (FLAM). The EAM method leverages SNR-based augmentation to generate diverse speech samples capturing subtle emotional variations. FLAM enhances frame-level feature extraction for multi-frame emotional cues. Our MLL strategy combines Kullback-Leibler divergence, focal, center, and supervised contrastive losses to optimize learning, address class imbalance, and improve feature separability. We evaluate our method on four widely used SER datasets: IEMOCAP, MSP-IMPROV, RAVDESS, and SAVEE. The results demonstrate our method achieves state-of-the-art performance, suggesting its effectiveness and robustness.
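
Two of the four loss terms can be sketched directly; the combination weights below are assumptions, and the center and supervised contrastive terms are omitted for brevity:

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Focal loss on predicted class probabilities: the (1 - p_t)^gamma factor
    down-weights easy examples, which helps with class imbalance."""
    p_t = probs[np.arange(len(labels)), labels]
    return np.mean(-((1 - p_t) ** gamma) * np.log(p_t + 1e-12))

def multi_loss(probs, labels, teacher_probs, weights=(1.0, 1.0)):
    """Toy two-term combination: focal + KL(teacher || student)."""
    kl = np.mean(np.sum(teacher_probs * (np.log(teacher_probs + 1e-12)
                                         - np.log(probs + 1e-12)), axis=1))
    return weights[0] * focal_loss(probs, labels) + weights[1] * kl
```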


[82] 2512.04772

TEMPO-VINE: A Multi-Temporal Sensor Fusion Dataset for Localization and Mapping in Vineyards

In recent years, precision agriculture has been introducing groundbreaking innovations in the field, with a strong focus on automation. However, research studies in robotics and autonomous navigation often rely on controlled simulations or isolated field trials. The absence of a realistic common benchmark represents a significant limitation for the diffusion of robust autonomous systems under real complex agricultural conditions. Vineyards pose significant challenges due to their dynamic nature, and they are increasingly drawing attention from both academic and industrial stakeholders interested in automation. In this context, we introduce the TEMPO-VINE dataset, a large-scale multi-temporal dataset specifically designed for evaluating sensor fusion, simultaneous localization and mapping (SLAM), and place recognition techniques within operational vineyard environments. TEMPO-VINE is the first multi-modal public dataset that brings together data from heterogeneous LiDARs of different price levels, AHRS, RTK-GPS, and cameras in real trellis and pergola vineyards, with multiple rows exceeding 100 m in length. In this work, we address a critical gap in the landscape of agricultural datasets by providing researchers with a comprehensive data collection and ground truth trajectories in different seasons, vegetation growth stages, terrain and weather conditions. The sequence paths with multiple runs and revisits will foster the development of sensor fusion, localization, mapping and place recognition solutions for agricultural fields. The dataset, the processing tools and the benchmarking results are available on the webpage.


[83] 2512.07718

Bimorph Lithium Niobate Piezoelectric Micromachined Ultrasonic Transducers

Piezoelectric micromachined ultrasonic transducers (PMUTs) are widely utilized in applications that demand mechanical resilience, thermal stability, and compact form factors. Recent efforts have sought to demonstrate that single-crystal lithium niobate (LN) is a promising PMUT material platform, offering high electromechanical coupling ($k^2$) and bidirectional performance. In addition, advances in LN film transfer technology have enabled high quality periodically poled piezoelectric films (P3F), facilitating a bimorph piezoelectric stack without intermediate electrodes. In this work, we showcase a bimorph PMUT incorporating a mechanically robust, 20 $\mu$m thick P3F LN active layer. We establish the motivation for LN PMUTs through a material comparison, followed by extensive membrane geometry optimization and subsequent enhancement of the PMUT's $k^2$. We demonstrate a 775 kHz flexural mode device with a quality factor (Q) of 200 and an extracted $k^2$ of 6.4%, yielding a high transmit efficiency of 65 nm/V with a mechanically robust active layer. We leverage the high performance to demonstrate extreme-temperature resilience, showcasing stable device operation up to 600 $^\circ$C and survival up to 900 $^\circ$C, highlighting LN's potential as a resilient PMUT platform.


[84] 2512.13183

Efficient Path Generation with Curvature Guarantees by Mollification

Path generation, the process of converting high-level mission specifications, such as sequences of waypoints from a path planner, into smooth, executable paths, is a fundamental challenge in mobile robotics. Most path following and trajectory tracking algorithms require the desired path to be defined by at least twice continuously differentiable functions to guarantee key properties such as global convergence, especially for nonholonomic robots like unicycles with speed constraints. Consequently, path generation methods must bridge the gap between convenient but non-differentiable planning outputs, such as piecewise linear segments, and the differentiability requirements imposed by downstream control algorithms. While techniques such as spline interpolation or optimization-based methods are commonly used to smooth non-differentiable paths or create feasible ones from sequences of waypoints, they either produce unnecessarily complex trajectories or are computationally expensive. In this work, we present a method to regularize non-differentiable functions and generate feasible paths through mollification. Specifically, we approximate an arbitrary path with a differentiable function that can converge to it with arbitrary precision. Additionally, we provide a systematic method for bounding the curvature of generated paths, which we demonstrate by applying it to paths resulting from linking a sequence of waypoints with segments. The proposed approach is analytically shown to be computationally more efficient than standard interpolation methods, enabling real-time implementation on microcontrollers, while remaining compatible with standard trajectory tracking and path following algorithms.
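
The core construction, convolving a densely sampled polyline with a compactly supported mollifier, can be sketched as follows (kernel width and sampling density are illustrative choices, not the paper's parameters):

```python
import numpy as np

def mollify_path(points, eps, n_samples=400):
    """Smooth a piecewise-linear waypoint path by convolving each coordinate,
    parameterized by arc length, with a compactly supported bump mollifier."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    u = np.linspace(0.0, s[-1], n_samples)
    dense = np.stack([np.interp(u, s, pts[:, k])
                      for k in range(pts.shape[1])], axis=1)
    du = u[1] - u[0]
    t = np.arange(-eps, eps + du, du)
    r = np.abs(t) / eps
    kernel = np.zeros_like(t)
    inside = r < 1.0
    kernel[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))  # standard mollifier
    kernel /= kernel.sum()                                  # normalize to unit mass
    return np.stack([np.convolve(dense[:, k], kernel, mode="same")
                     for k in range(dense.shape[1])], axis=1)
```

Shrinking `eps` makes the smoothed path converge to the original polyline, which is the "arbitrary precision" property; note that the zero-padded `mode="same"` convolution attenuates the path near its endpoints, a boundary effect a full implementation would handle explicitly.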


[85] 2602.10878

Simple generators of rational function fields

Consider a subfield of the field of rational functions in several indeterminates. We present an algorithm that, given a set of generators of such a subfield, finds a simple generating set. We provide an implementation of the algorithm and show that it improves upon the state of the art both in efficiency and the quality of the results. Furthermore, we demonstrate the utility of simplified generators through several case studies from different application domains, such as structural parameter identifiability. The main algorithmic novelties include performing only partial Gröbner basis computation via sparse interpolation and efficient search for polynomials of a fixed degree in a subfield of the rational function field.


[86] 2602.18452

RA-QA: A Benchmarking System for Respiratory Audio Question Answering Under Real-World Heterogeneity

As conversational multimodal AI tools are increasingly adopted to process patient data for health assessment, robust benchmarks are needed to measure progress and expose failure modes under realistic conditions. Despite the importance of respiratory audio for mobile health screening, respiratory audio question answering remains underexplored, with existing studies evaluated narrowly and lacking real-world heterogeneity across modalities, devices, and question types. We hence introduce the Respiratory-Audio Question-Answering (RA-QA) benchmark, including a standardized data generation pipeline, a comprehensive multimodal QA collection, and a unified evaluation protocol. RA-QA harmonizes public RA datasets into a collection of 9 million format-diverse QA pairs covering diagnostic and contextual attributes. We benchmark classical ML baselines alongside multimodal audio-language models, establishing reproducible reference points and showing how current approaches fail under heterogeneity.


[87] 2602.18655

Infinite-Dimensional Closed-Loop Inverse Kinematics for Soft Robots via Neural Operators

For fully actuated rigid robots, kinematic inversion is a purely geometric problem, efficiently solved by closed-loop inverse kinematics (CLIK) schemes that compute joint configurations to position the robot body in space. For underactuated soft robots, however, not all configurations are attainable through control action, making kinematic inversion extremely challenging. Extensions of CLIK address this by introducing end-to-end mappings from actuation to task space for the controller to operate on, but typically assume finite dimensions of the underlying virtual configuration space. In this work, we formulate CLIK in the infinite-dimensional domain to reason about the entire soft robot shape while solving tasks. We do this by composing an actuation-to-shape map with a shape-to-task map, deriving the differential end-to-end kinematics via an infinite-dimensional chain rule, and thereby obtaining a Jacobian-based CLIK algorithm. Since this actuation-to-shape mapping is rarely available in closed form, we propose to learn it using differentiable neural operator networks. We first present an analytical study on a constant-curvature segment, and then apply the neural version of the algorithm to a three-fiber soft robotic arm whose underlying model relies on morphoelasticity and active filament theory.
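
A finite-dimensional toy version of the Jacobian-based CLIK loop, with a 2-link planar arm standing in for the learned actuation-to-task composition (unit link lengths and the gains below are assumptions):

```python
import numpy as np

def fk(q):
    """Toy 2-link forward map standing in for the composed
    actuation-to-shape and shape-to-task mapping."""
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])

def jac(q):
    """Analytic Jacobian of fk (a neural operator would supply this by autodiff)."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-s1 - s12, -s12],
                     [ c1 + c12,  c12]])

def clik(q0, x_des, gain=1.0, dt=0.1, steps=500):
    """Closed-loop inverse kinematics: integrate q_dot = J^+ K (x_des - fk(q))."""
    q = np.array(q0, dtype=float)
    for _ in range(steps):
        q += dt * np.linalg.pinv(jac(q)) @ (gain * (x_des - fk(q)))
    return q
```

The paper's contribution is making this loop well-defined when `q` is an infinite-dimensional shape rather than a joint vector, with the learned neural operator supplying the actuation-to-shape map and its differential.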


[88] 2603.00395

Fine-grained Soundscape Control for Augmented Hearing

Hearables are becoming ubiquitous, yet their sound controls remain blunt: users can either enable global noise suppression or focus on a single target sound. Real-world acoustic scenes, however, contain many simultaneous sources that users may want to adjust independently. We introduce Aurchestra, the first system to provide fine-grained, real-time soundscape control on resource-constrained hearables. Our system has two key components: (1) a dynamic interface that surfaces only active sound classes and (2) a real-time, on-device multi-output extraction network that generates separate streams for each selected class, achieving robust performance for up to 5 overlapping target sounds, and letting users mix their environment by customizing per-class volumes, much like an audio engineer mixes tracks. We optimize the model architecture for multiple compute-limited platforms and demonstrate real-time performance on 6 ms streaming audio chunks. Across real-world environments in previously unseen indoor and outdoor scenarios, our system enables expressive per-class sound control and achieves substantial improvements in target-class enhancement and interference suppression. Our results show that the world need not be heard as a single, undifferentiated stream: with Aurchestra, the soundscape becomes truly programmable.
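
Once the network has extracted one stream per class, per-class volume control is just a gain-weighted sum of those streams; a minimal mixer sketch (names are illustrative):

```python
import numpy as np

def remix(streams, gains):
    """Mix per-class separated streams with user-set linear gains.
    streams: dict class_name -> 1-D audio array; gains: class_name -> gain
    (classes missing from gains pass through at unit gain)."""
    out = None
    for name, audio in streams.items():
        g = gains.get(name, 1.0)
        out = g * audio if out is None else out + g * audio
    return out
```

Setting a gain to 0 mutes that class entirely, and gains above 1 amplify it, which is the "audio engineer" interaction the abstract describes.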