New articles on Electrical Engineering and Systems Science


[1] 2606.03998

TGSD: Topology-Guided State-Space Diffusion for EEG Spatial Super-Resolution

Low-density EEG is more suitable for wearable and IoT-based brain sensing, but sparse electrode sampling often lacks sufficient spatial information to characterize cross-regional neural activity. EEG spatial super-resolution aims to recover dense-channel EEG from sparse recordings, yet remains challenging because channel missingness typically occurs at the whole-channel level, spatiotemporal dependencies over the full electrode layout are often underexplored, and the mapping from sparse to dense signals is inherently ambiguous. To address these issues, we propose TGSD, a topology-guided state-space diffusion framework for EEG spatial super-resolution. TGSD first employs a Hierarchical Spatial Prior Encoder to learn topology-aware priors over the complete electrode layout by integrating local geometric relationships with region-level contextual information. Based on these priors and sparse observations, a Conditional State-Space Diffusion Reconstructor progressively generates missing-channel signals through reverse diffusion, while alternating temporal and channel-wise state-space modeling captures long-range temporal dynamics and inter-channel dependencies in a unified framework. Experiments on the SEED and PhysioNet MM/I datasets show that TGSD consistently outperforms representative baselines under different super-resolution factors in both reconstruction fidelity and downstream classification performance. These results demonstrate the effectiveness of combining topology-aware spatial priors with conditional diffusion for enhancing practical low-density EEG sensing in wearable and IoT scenarios. The official implementation code is available at this https URL.


[2] 2606.03999

Airy Beam Dispersion in Near-Field Wideband Terahertz Communications

This letter investigates Airy beam dispersion in near-field wideband terahertz communications. Unlike conventional focusing beams, whose dispersion mainly appears as focal-point migration, Airy beams exhibit frequency-dependent shifts of both the reference focusing point and the self-bending main-lobe trajectory. Based on the Fresnel diffraction integral, a closed-form trajectory expression is derived to characterize the dispersion behavior across subcarriers. Furthermore, a true-time-delay (TTD)-assisted Airy beamforming structure is developed to actively control the trajectory dispersion. By properly designing the time delay parameters, the proposed scheme can either generate frequency-dependent curved trajectory clusters for sensing-oriented scanning or suppress trajectory drift for reliable communication.


[3] 2606.04001

Geometry-Structured Channel Reconstruction for Conventional and Fluid Antenna Systems: Bayesian Inference and Fundamental Limits

Accurate channel state information (CSI) acquisition is critical for exploiting the spatial flexibility of fluid antenna systems (FASs). However, port selection and transmission optimization require CSI over a large number of candidate port positions, making direct port-wise estimation prohibitively costly in terms of pilot overhead. This paper addresses this challenge through geometry-structured channel reconstruction, which exploits the fact that the port-domain CSI can be parameterized by a small number of dominant propagation paths. We first establish fundamental mean square error (MSE) and normalized MSE (NMSE) benchmarks for both geometry-structured and unstructured channel reconstruction, providing analytical references for evaluating the intrinsic benefit of geometric modeling in conventional antenna systems and FASs. Motivated by the strong spatial correlation induced by densely distributed fluid antenna ports, we further propose a Bayesian reconstruction framework, termed geometry-structured expectation-maximization approximate message passing (GS-EM-AMP). The proposed algorithm incorporates geometric channel structure into the EM-AMP procedure and adaptively learns unknown statistical parameters from noisy observations. Numerical results demonstrate that GS-EM-AMP achieves near-bound reconstruction accuracy while maintaining strong robustness against steering-domain correlation, thereby offering an efficient and reliable solution for large-scale CSI acquisition in FASs.


[4] 2606.04003

A sharp analysis of Root-MUSIC: locations of correct and extraneous roots

Root-MUSIC is a spectral estimation algorithm that approximates the unknown signal frequencies by constructing a high-degree polynomial and finding a subset of roots which are closest to the complex unit circle. Previous works found asymptotic expectation formulas for the performance of Root-MUSIC under the implicit assumption that the aforementioned root selection criterion does not select extraneous roots -- those which are unrelated to the correct parameters. This paper removes the need for this assumption by showing all extraneous roots lie outside an annulus of a certain thickness and therefore are not selected by the algorithm. This paper also provides sharp, non-asymptotic, and explicit error bounds for the correct roots in terms of fundamental model parameters. All results hold under a natural separation condition on the correct signal frequencies and are applicable in both the single- and multi-snapshot models. More specifically, in the multi-snapshot model, we prove that Root-MUSIC estimates the frequencies with error at most $O(\sigma /(m \sqrt n))$, where $\sigma^2$ is the noise variance, $m$ is the number of sensors, and $n$ is the number of snapshots. A novelty of this non-asymptotic bound is the explicit $1/m$ decay, which indicates that there is a significant advantage in utilizing additional sensors. Numerical simulations confirm our theory. The main mathematical insight of this paper is a geometric property of the Root-MUSIC polynomial: its correct roots are highly stable to noise while its extraneous roots must lie outside of an annulus.
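
For orientation, the textbook form of the Root-MUSIC polynomial can be written as below. The symbols $a(z)$, $U$, and $s$ are assumed here for illustration and may differ from the paper's notation; only the error rate on the last line is taken from the abstract.

```latex
% m sensors, s sources; the columns of U (size m x (m-s)) span the estimated noise subspace
a(z) = \begin{bmatrix} 1 & z & \cdots & z^{m-1} \end{bmatrix}^{\top},
\qquad
q(z) = z^{m-1}\, a(1/\bar{z})^{*}\, U U^{*}\, a(z).

% On the unit circle q reduces to the MUSIC null spectrum up to a unimodular factor
q(e^{i\theta}) = e^{i(m-1)\theta}\, \bigl\| U^{*} a(e^{i\theta}) \bigr\|_2^{2}.

% Multi-snapshot error rate stated in the abstract (noise variance \sigma^2, n snapshots)
\text{frequency error} = O\!\left( \frac{\sigma}{m \sqrt{n}} \right).
```

Since $q$ has degree at most $2m-2$ and its roots come in conjugate-reciprocal pairs $(z, 1/\bar{z})$, most roots are extraneous; the root-selection step keeps those nearest the unit circle, which is the step the annulus result justifies.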


[5] 2606.04008

Neural Radiated-Noise Fields for Unmanned Underwater Vehicle Noise Spectrum Prediction in Three-Dimensional Scenes

Radiated noise in unmanned underwater vehicles (UUVs) is an important indicator for characterizing acoustic signatures and evaluating platform performance. To address the strong dependence of traditional physics-based modeling and numerical simulation methods on target structural information and environmental boundary conditions, and their inability to achieve continuous spatial spectrum-response modeling in three-dimensional scenes, this paper proposes a neural radiated-noise field (NRNF). An NRNF represents the UUV radiated-noise spectrum as a continuous function of the three-dimensional UUV position, the three-dimensional hydrophone position, the UUV yaw angle, and the frequency, enabling query-based prediction at arbitrary spatial locations. The proposed method employs sinusoidal encoding for position and frequency, and introduces a learnable three-dimensional scene feature grid to explicitly represent environmental structure and propagation effects. A spectrum-prediction dataset is constructed from lake trials, and the proposed model is evaluated under three settings: horizontal extrapolation, depth extrapolation, and cross-run generalization. Results show that the NRNF achieves an average prediction error of 3.5 dB in the 50 to 5000 Hz band. Horizontal extrapolation is easiest, depth extrapolation is the most challenging, and cross-run generalization is of intermediate difficulty. Further ablation results demonstrate that the scene feature grid significantly improves the prediction stability and spatial generalization of the model.
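
The query encoding described above can be sketched as follows. The abstract only says "sinusoidal encoding", so this is an assumed NeRF-style form with octave-spaced bands. The function names, the number of bands, the raw sine/cosine treatment of yaw, and the assumption that inputs are pre-normalized to roughly [-1, 1] are all hypothetical.

```python
import math


def sinusoidal_encoding(value, num_bands):
    """Map a scalar to [sin(2^k*pi*v), cos(2^k*pi*v)] for k = 0..num_bands-1."""
    features = []
    for k in range(num_bands):
        angle = (2 ** k) * math.pi * value
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def encode_query(uuv_xyz, hydrophone_xyz, yaw_rad, freq_norm, num_bands=4):
    """Build the input feature vector for one (positions, yaw, frequency) query.

    Assumption: positions and frequency are already normalized; yaw is passed
    through as a plain (sin, cos) pair rather than a multi-band encoding.
    """
    features = []
    for coord in (*uuv_xyz, *hydrophone_xyz, freq_norm):
        features.extend(sinusoidal_encoding(coord, num_bands))
    features.extend([math.sin(yaw_rad), math.cos(yaw_rad)])
    return features


query = encode_query((0.1, 0.2, 0.3), (0.0, 0.5, -0.5), yaw_rad=0.3, freq_norm=0.7)
```

In a full model, this vector would be concatenated with features sampled from the learnable scene grid before being passed to the network; that part is omitted here.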


[6] 2606.04013

Distortion-Aware UAV Placement for Aerial Semantic Relay Communications: An Analytical Approach

Aerial semantic relay communications (SRC) employs an unmanned aerial vehicle (UAV) equipped with a semantic encoder as a relay, which not only extends the data acquisition coverage of the base station (BS) from resource-limited sensing devices (SDs) but also enhances communication efficiency through semantic feature transmission over the UAV-BS link. Existing works mainly focus on sum-rate maximization, overlooking the end-to-end reconstruction distortion of sensory data in UAV-assisted SRC systems. Optimizing the UAV placement is crucial for minimizing the end-to-end reconstruction distortion, as it fundamentally trades off the input perturbation at the UAV-side encoder against that at the BS-side decoder through the two-hop wireless channel conditions. In this paper, we propose an interpretable and efficient UAV placement policy by minimizing end-to-end reconstruction distortion in aerial SRC. This is a challenging task since the black-box nature of the DNN-based codecs and the intricate coupling between the heterogeneous codec sensitivities, along with two-hop channel impairments, render the end-to-end distortion analytically intractable. We first derive an analytical expression of the end-to-end distortion, explicitly revealing the impact of cross-hop perturbation coupling, wireless channels, and radio resources on the reconstruction error. Based on that, we develop a closed-form UAV placement strategy with fast adaptability across various aerial SRC system configurations. Numerical results demonstrate that the proposed distortion-aware UAV deployment closely tracks the empirical exhaustive-search optimum, while achieving lower distortion compared to representative capacity-based and curve-fitting benchmarks.


[7] 2606.04015

GenED-SC: Generative Editing Semantic Communication with Integrated Multi-Modal LLMs

Deep learning-based joint source-channel coding (JSCC) has recently demonstrated strong potential for semantic communication (SemComm). However, most existing approaches focus on optimizing visual-fidelity metrics, which can lead to reduced perceptual quality. Generative model-based SemComm leverages rich prior knowledge from large-scale pre-training to enhance perceptual quality, but often at the cost of increased distortion and unreliability. This paper addresses the above issues by proposing a two-stage semantic image transmission framework, integrating a multimodal large language model (MLLM) for generative editing. In the first stage, a JSCC-based discriminative transmission selectively prioritizes semantically important regions, preserving scene layout and object integrity under limited bandwidth. In the second stage, MLLM-driven generative editing refines missing details based on the textual descriptions, enhancing semantic fidelity and perceptual quality. Extensive experiments show that the proposed framework achieves state-of-the-art performance in semantic preservation, perceptual quality, and visual fidelity across a wide range of channel conditions, especially in low-SNR regimes.


[8] 2606.04019

Gravity-Aware Hierarchical Routing for Lightweight SensorLLM on Human Activity Recognition

Recent studies on sensor-language alignment have shown that two-stage frameworks can improve the semantic modeling ability of wearable-sensor human activity recognition (HAR), where SensorLLM-style methods first perform motion-to-language alignment and then fine-tune the model for downstream tasks. However, our experiments reveal a consistent failure mode when the Stage 2 backbone is compressed to a compact model such as TinyLlama: recognition of dynamic activities remains relatively strong, while the discrimination of low-motion static classes such as standing, sitting, and lying degrades substantially. To address this issue, we propose a gravity-aware hierarchical routing head as a lightweight post-alignment adaptation built on top of an already aligned model, rather than a new large-scale pretraining framework. The method uses the per-channel mean and standard deviation from the Chronos tokenizer state to extract statistical cues related to posture and gravity direction, and adaptively combines a static expert and a full expert through soft routing, together with a load-balancing loss for stable training. On the MHealth dataset, this design significantly improves macro-F1 with minimal parameter overhead, and the gains are concentrated mainly on static classes while preserving strong performance on dynamic activities. As a first arXiv disclosure, the current paper reports results on a single dataset only, with the goal of highlighting the core method and laying the groundwork for broader evaluation in future work.


[9] 2606.04076

SkySense: A Semi-Supervised Generative Framework for UAV Localization in ISAC Networks

Extreme data scarcity and inherent multipath spatial ambiguity severely limit existing deep learning-based channel state information (CSI) fingerprinting localization schemes for target unmanned aerial vehicles (UAVs). To overcome these challenges, we propose an end-to-end semi-supervised generative localization framework. First, by exploiting the temporal correlations inherent in continuous flight trajectories, a self-supervised encoder extracts robust spatial features from massive unlabeled CSI sequences to establish structured latent representations. Following this, we utilize a consistency model, a powerful derivative of diffusion architectures, as the core generative backbone to map the learned latent space to physical coordinates, jointly fine-tuning the pre-trained encoder with a strictly limited set of labeled CSI. This consistency formulation models the conditional distribution to resolve the mean collapse problem of discriminative models, while compressing the inference trajectory to 1-2 steps to avoid the latency bottleneck of traditional diffusion models. Furthermore, a lightweight distributed fusion mechanism is designed to aggregate spatial predictions across multiple base stations (BS) from a multi-view geometry perspective. Comprehensive evaluations on a real-world measurement dataset demonstrate that our framework achieves low latency and suppresses the mean localization error to 9.77 cm under a 3-BS fusion setup with only a 1\% label fraction, significantly outperforming existing fully supervised and semi-supervised discriminative baselines.


[10] 2606.04102

Spatial-Spectral Modeling of the Array Pattern of a Two-Element Dynamic Antenna Array with Differential Amplitude Modulation

We present a theoretical model for a two-element dynamic phased array and characterize the transfer of information as a function of angle. The array is based on a two-state switched structure with phase shifting to support beamsteering. Dynamic motion of the phase center of antenna arrays generates time-varying radiation patterns that, when appropriately designed, support directional modulation, or the transfer of information to regions of space that are narrower than that covered by the energy radiated by the array. We evaluate the impact of switching frequency and steering on the spatial width of the information beam, which is the region of space where information is recoverable. The concepts are evaluated through simulation and experiment using a 0.75$\lambda$ two-element array operating at 2.5 GHz.


[11] 2606.04156

How Many Bits Are Required for RIS Designs Without Far-Field Quantization Lobe?

Reconfigurable Intelligent Surface (RIS) designs with 1-bit phase resolution often suffer from strong quantization lobes in the far field, which significantly degrade wireless communication performance. This work investigates the minimum phase resolution required for an RIS to eliminate far-field quantization lobes. The analysis demonstrates that 2-bit phase discretization offers an optimal balance between performance and hardware complexity. A practical 2-bit RIS unit cell is designed, and a 20 x 20 array configuration is implemented to evaluate its performance. The quantization-lobe suppression capability is validated through full-wave radar cross-section (RCS) simulations under plane-wave illumination for the entire RIS array. The fabricated prototype is further characterized experimentally, achieving a -13.1 dB quantization-lobe level compared to -0.8 dB for its 1-bit counterpart, confirming both the analytical and full-wave simulation results.
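
The origin of the 1-bit quantization lobe can be illustrated with a one-dimensional toy model; this is not the paper's unit cell or its full-wave simulation. The sketch assumes ideal isotropic elements on a half-wavelength grid, normal plane-wave illumination, and a beam steered to sin(theta) = 1/3. The function names and parameters are hypothetical.

```python
import cmath
import math


def quantize_phase(phase, bits):
    """Round a phase in radians to the nearest of 2**bits uniform levels."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    index = round((phase % (2 * math.pi)) / step) % levels
    return index * step


def array_factor(weights, u, k_d):
    """Array factor of a uniform line at direction u = sin(theta)."""
    return sum(w * cmath.exp(1j * k_d * n * u) for n, w in enumerate(weights))


def mirror_lobe_db(bits, num_elements=24, u0=1 / 3, spacing_wl=0.5):
    """Level at -u0 relative to the steered beam at +u0, in dB."""
    k_d = 2 * math.pi * spacing_wl
    # Ideal steering phase is -k_d * n * u0; each element applies its quantized version.
    weights = [
        cmath.exp(1j * quantize_phase(-k_d * n * u0, bits))
        for n in range(num_elements)
    ]
    main = abs(array_factor(weights, u0, k_d))
    mirror = abs(array_factor(weights, -u0, k_d))
    return 20 * math.log10(mirror / main)


one_bit = mirror_lobe_db(bits=1)
two_bit = mirror_lobe_db(bits=2)
```

With 1-bit control every element weight is +1 or -1, so the pattern is symmetric about broadside and the mirror lobe is as strong as the steered beam (0 dB in this model). With 2-bit control the weights are complex, the symmetry is broken, and the level in the mirror direction comes out at about -11.4 dB for these particular parameters. These toy figures are not comparable to the measured -0.8 dB and -13.1 dB quoted above.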


[12] 2606.04163

Adaptive arrival cost update for improving Moving Horizon Estimation performance

Moving horizon estimation is an efficient technique for estimating the states and parameters of constrained dynamical systems. It relies on the solution of a finite-horizon optimization problem to compute the estimates, providing a natural framework for handling bounds and constraints on estimates, noises, and parameters. However, the approximation of the arrival cost and its updating mechanism remain an active research topic. The arrival cost is important because it provides a means to incorporate information from previous measurements into the current estimates, yet its true value is difficult to estimate. In this work, we exploit the features of adaptive estimation methods to update the parameters of the arrival cost. We show that, with a better approximation of the arrival cost, the size of the optimization problem can be significantly reduced while guaranteeing the stability and convergence of the estimates. These properties are illustrated through simulation studies.


[13] 2606.04174

Co-optimization of Diffusive and Tomographic Blur in Computed Axial Lithography via Experimental Kernel Identification

Computed Axial Lithography is a volumetric additive manufacturing method that selectively cures photosensitive resin through the 3D superposition of patterns of light, offering advantages over layer-based processes including rapid print times, reduced layer artifacts, and compatibility with high-viscosity materials. However, diffusive effects, primarily those of free-radical quenchers such as oxygen, blur the boundary between cured and uncured regions, limiting resolution and preventing the reproduction of sharp, high-spatial-frequency features. By comparing micro-CT data to computational dose models convolved with kernels across a range of diffusivities, we establish a framework for extracting a single diffusion kernel from any standard uncorrected print to account for all observed deviations from the target. In this work, we correct diffusion-induced blurring by co-optimizing for its effects alongside the inherent blur of the computed tomography reconstruction, demonstrating improved fidelity over previous approaches of pre-compensating the target geometry via deconvolution.


[14] 2606.04207

Integrated Real-Time Testbed for Wideband RFID and Wireless Power Transfer

This contribution presents an experimental integrated real-time 8 x 8 distributed MIMO (D-MIMO) testbed for wideband backscatter communication (BSC) and wireless power transfer (WPT). The testbed operates in the 2.45 GHz band with coherent sampling at 200 MS/s, employs a backscatter link frequency of 40 kHz, and uses wideband 5G NR reference signals for excitation. We evaluate the testbed by exploiting the estimated channel state information (CSI) in two target applications: wireless power transfer towards the backscatter device (BD) and real-time positioning of a BD in an indoor environment. In conjunction with the baseband processing chain introduced, the testbed requires less than 2 ms of total airtime to excite the system and acquire the signals for subsequent synchronization and CSI estimation on uplink BSC signals. With the CSI, we demonstrate effective energy harvesting gains of up to 12 dB.


[15] 2606.04210

Representation Matters in Randomized Smoothing for Audio Classification

Randomized smoothing (RS) certifies robustness in the vector space where Gaussian noise is added. In audio classification, this space is often not uniquely defined, as standard pipelines normalize, range-control, and transform waveforms into log-mel or other spectral features. We show that direct RS is therefore under-specified unless the certified object and preprocessing policy are explicit. On two audio benchmarks, keyword spotting and environmental-sound classification, we study waveform, feature-space, and post-processed smoothing. Our diagnostics show why representation-aware reporting is necessary: at the same smoothing level $\sigma=0.0025$, the two datasets share the same median raw radius $0.007996$, but different waveform energies yield different SNR-equivalent scales ($83.98$ vs. $90.97$ dB); log-mel smoothing gives higher positive-radius certified accuracy on environmental sounds ($68.42\%$ vs. $65.53\%$), certifying more examples with nonzero radius but over features rather than waveforms; and clipping or peak normalization changes the effective perturbation norm by roughly $230$--$351\times$. We therefore recommend that audio RS studies choose and report the task-specific certified object and perturbation model, including the perturbation location, gain policy, raw radius, and any post-noise geometry changes.
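
The raw radius quoted above can be related to the standard Gaussian-smoothing certificate, radius = sigma * Phi^{-1}(p_lower), with a short calculation. The sampling budget (10,000 noisy samples) and confidence level (alpha = 0.001) below are assumptions chosen for illustration; the abstract does not report them, and the function names are hypothetical.

```python
from statistics import NormalDist


def lower_bound_all_agree(num_samples, alpha):
    """One-sided Clopper-Pearson lower bound on the top-class probability
    when every one of num_samples noisy copies is assigned the same class."""
    return alpha ** (1.0 / num_samples)


def certified_l2_radius(sigma, p_lower):
    """Certified L2 radius sigma * Phi^{-1}(p_lower); zero if p_lower <= 1/2."""
    if p_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_lower)


# Assumed budget and confidence level (not stated in the abstract).
p_lower = lower_bound_all_agree(num_samples=10_000, alpha=0.001)
radius = certified_l2_radius(sigma=0.0025, p_lower=p_lower)
```

Under these assumptions the radius evaluates to roughly 0.007996, which is the largest value the certificate can return at this noise level and budget regardless of the input. A raw radius of this kind is expressed in the units of whatever space the noise was added to, which is why the same number can correspond to different SNR-equivalent scales.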


[16] 2606.04239

State Observers for Linear Systems with Prescribed Residual Bounds

This paper presents a state observer design for continuous linear time-invariant (LTI) systems subject to unknown bounded disturbances that enforces a prescribed bound on the observer residual. The proposed observer augments a continuous-time Luenberger observer with state resets, triggered when the norm of the residual equals a pre-specified bound. The reset map guarantees contraction of the residual at jump instants while preserving the uniform boundedness properties of a standard Luenberger observer. The paper also establishes forward invariance of the residual envelope and non-expansiveness of the estimation error in a Lyapunov metric. Simulation results confirm the analysis: under bounded disturbances, the residual stays within the prescribed bound, whereas a standard Luenberger observer with the same gains violates it.
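
The reset idea can be sketched on a scalar toy system with full state measurement (y = x), discretized with forward Euler. The reset map used here, which scales the residual by a factor gamma at each jump, is a hypothetical stand-in; the paper's actual reset map and its guarantees for general LTI systems are not reproduced. All parameter values are assumptions.

```python
def simulate(use_resets, bound=0.5, gamma=0.5, a=-0.5, gain=0.5,
             disturbance=1.0, dt=1e-3, steps=10_000):
    """Return the largest |residual| recorded along one forward-Euler run.

    Plant:    x'     = a * x + d        (d is a constant bounded disturbance)
    Observer: x_hat' = a * x_hat + gain * (y - x_hat),  with y = x
    """
    x, x_hat = 0.0, 0.0
    worst = 0.0
    for _ in range(steps):
        residual = x - x_hat
        x += dt * (a * x + disturbance)
        x_hat += dt * (a * x_hat + gain * residual)
        residual = x - x_hat
        if use_resets and abs(residual) >= bound:
            # Hypothetical reset map: jump x_hat so the residual shrinks by gamma.
            x_hat += (1.0 - gamma) * residual
            residual = x - x_hat
        # The residual is recorded after any reset at this step.
        worst = max(worst, abs(residual))
    return worst


with_resets = simulate(use_resets=True)
luenberger_only = simulate(use_resets=False)
```

In this toy setting the plain observer's residual settles near 1.0, above the 0.5 bound, while the reset version keeps the recorded residual within the bound. In the scalar case the state error and the residual coincide, so the example says nothing about unmeasured states, which is where the design of the reset map matters.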


[17] 2606.04361

When Freshness Is Not Enough: Distribution-Aware Age of Information for Networked LQR Control

Age of Information (AoI) has become a central metric for the design of wireless update systems, especially in applications where fresh measurements support tracking, estimation, and control. Despite its popularity, the use of mean AoI or peak AoI as a surrogate for closed-loop performance is often motivated by intuition rather than by a control-theoretic derivation. This paper examines whether minimizing the mean AoI is in fact optimal for networked control systems. For scalar linear time-invariant systems with delayed intermittent updates, we show that, under state-independent scheduling policies, the infinite-horizon LQR tracking problem reduces to an optimization over the distribution of inter-scheduling intervals. The resulting objective depends on higher-order statistical moments, and in unstable or correlated regimes on exponential moments, of the inter-scheduling process rather than only on its mean. Consequently, policies with identical mean AoI can induce substantially different tracking costs. We further extend the analysis to disturbances with exponentially decaying autocorrelation and derive equivalent cost formulations that expose the role of the full interval distribution. Finally, we validate the theory using real vehicle trajectories from the NGSIM US-101 dataset. The empirical results match the predicted performance trends, demonstrating that mean AoI alone is insufficient for control-oriented network design.
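
The claim that equal mean AoI does not imply equal cost can be checked on a simplified remote-estimation toy, which is not the paper's LQR formulation and ignores delay. The sketch assumes a scalar process x_{k+1} = a x_k + w_k with unit-variance noise, an estimator that predicts open-loop between updates, and i.i.d. update intervals. The two interval distributions and all names are hypothetical.

```python
def prediction_error_variance(age, a, noise_var=1.0):
    """Variance of x_t - a**age * x_{t-age} for x_{k+1} = a*x_k + w_k."""
    return noise_var * sum(a ** (2 * i) for i in range(age))


def renewal_averages(interval_pmf, a):
    """Time-averaged age and error variance for i.i.d. update intervals.

    interval_pmf maps an interval length T (in slots) to its probability.
    Within an interval the age takes the values 0, 1, ..., T-1.
    """
    mean_len = sum(p * T for T, p in interval_pmf.items())
    age_sum = sum(p * sum(range(T)) for T, p in interval_pmf.items())
    cost_sum = sum(
        p * sum(prediction_error_variance(s, a) for s in range(T))
        for T, p in interval_pmf.items()
    )
    return age_sum / mean_len, cost_sum / mean_len


periodic = {4: 1.0}            # an update every 4 slots
bursty = {1: 7 / 8, 7: 1 / 8}  # mostly back-to-back updates, occasional long gap

aoi_periodic, cost_periodic = renewal_averages(periodic, a=1.2)
aoi_bursty, cost_bursty = renewal_averages(bursty, a=1.2)
```

Both policies have a mean age of 1.5 slots, yet for the unstable pole a = 1.2 the bursty policy's average error variance is about 3.23 against about 1.99 for the periodic one. The gap comes from the long intervals, whose cost grows geometrically with age, which is the kind of dependence on the full interval distribution the abstract describes.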


[18] 2606.04370

Masked Wavelet Scattering Transform Neural Field for Sound Field Reconstruction

In this paper, we propose a reconstruction framework that leverages the Wavelet Scattering Transform (WST) as a multi-scale feature extractor to impose statistical priors under sparse observation conditions. The reconstruction problem is formulated as an optimization task and solved using a neural field, with the WST incorporated into the training loss function. As a proof of concept, we validate the proposed method on HRTF upsampling. A masking strategy is applied to the WST coefficients, resulting in a two-phase procedure. The first phase learns a binary mask from a small multi-subject dataset, while the second phase applies the learned mask to the WST coefficients of an individual HRTF to preserve informative statistical structures during reconstruction. Validation against baseline methods, which also serve as an ablation study of the different components of the framework, demonstrates the effectiveness of the proposed approach.


[19] 2606.04376

FUSE-Flow: A Decoupled Framework for Calibration and Stateless Real-Time Multi-View Point Cloud Fusion

Real-time multi-camera 3D reconstruction is a key foundation for immersive media, remote interaction and spatial computing. While synchronized camera arrays are widely adopted, achieving geometrically consistent and scalable real-time reconstruction remains challenging. A key challenge is the close linkage among extrinsic calibration, multi-view fusion and global optimization, which causes fluctuating reconstruction results, cumulative errors and poor system expandability. We propose a decoupled framework for calibration and stateless real-time multi-view point cloud fusion (FUSE-Flow), a framework with two collaborative components: geometry-aligned multi-view extrinsic calibration (GMAC) and reliability-guided multi-view point cloud fusion (FUSE). This split design avoids conflicting optimization objectives for targeted improvement. The GMAC module refines camera extrinsics via geometric constraints and multi-view reconstruction transformers, enabling accurate sparse-view calibration without calibration targets, dense images or global bundle adjustment. The FUSE module integrates confidence weighting and adaptive spatial hashing for stateless fusion, ensuring linear time and memory consumption. The two modules mutually reinforce each other: accurate camera poses boost fusion accuracy, and confidence-aware fusion corrects calibration biases. Validated on public datasets and real camera setups, FUSE-Flow outperforms mainstream real-time reconstruction methods in visual effect, dynamic stability and scalability, providing a practical solution for large-scale real-time 3D reconstruction.
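
Confidence-weighted fusion over a spatial hash can be sketched as follows. This is a minimal illustration with a fixed voxel size, whereas the abstract describes adaptive hashing; the function name, the point layout (x, y, z, confidence), and the weighted-centroid rule are assumptions rather than the FUSE module's actual design.

```python
import math
from collections import defaultdict


def fuse_points(points, voxel_size):
    """Fuse (x, y, z, confidence) points from all views into one point per voxel.

    Each voxel keeps a confidence-weighted centroid. The function holds no state
    between calls, and a single pass over the input gives time and memory that
    grow linearly with the number of points.
    """
    cells = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for x, y, z, conf in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        cell = cells[key]
        cell[0] += conf * x
        cell[1] += conf * y
        cell[2] += conf * z
        cell[3] += conf
    return {
        key: (sx / w, sy / w, sz / w, w)
        for key, (sx, sy, sz, w) in cells.items()
        if w > 0.0
    }


frame = [
    (0.01, 0.0, 0.0, 1.0),  # low-confidence sample from one camera
    (0.03, 0.0, 0.0, 3.0),  # higher-confidence sample of the same surface
    (1.00, 1.0, 1.0, 1.0),  # unrelated point in a distant voxel
]
fused = fuse_points(frame, voxel_size=0.05)
```

The two nearby samples collapse into one point pulled toward the more confident measurement (x = 0.025), and the distant point is left untouched.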


[20] 2606.04395

Input-to-State Stable Bundle Koopman Neural ODEs for Learning Controlled Dynamics under Environmental Constraints

We propose ISS-BKNO, a unified framework that integrates Koopman operator identification, Neural ordinary differential equations (ODEs), fiber bundle geometry, and input-to-state stability (ISS) certification. Unlike prior approaches that address stability, extrinsic inputs, or environmental constraints in isolation, the proposed framework simultaneously learns controlled nonlinear dynamics while guaranteeing global convergence and a computable ISS gain. The architecture introduces a three-stage lifting pipeline: a bundle-aware encoder that separates environment-specific fibers, an environment-conditioned Koopman backbone whose matrix spectrum is constrained to lie in the left half-plane, and a residual neural ODE correction whose Jacobian satisfies a quadratic sector bound. Lyapunov-based ISS regularization turns the stability requirement into a differentiable penalty that is jointly optimized with the prediction objective. Theoretical results establish fiber invariance, ISS with an explicit gain formula, and an approximation error bound that scales with the EDMD residual. Experiments on a pendulum, cart-pole, a unicycle-based navigation task, and a Franka Emika manipulator demonstrate substantially improved prediction accuracy and robustness under matched disturbances compared with existing Neural ODE and Koopman baselines.


[21] 2606.04419

L-TGVN: Leveraging Longitudinal Priors for Personalized Rapid MRI

MRI provides excellent soft-tissue contrast without ionizing radiation, but long acquisition times increase patient discomfort while also raising exam costs and limiting scanner throughput. A common approach to reduce scan time is to acquire fewer measurements, which yields an ill-posed linear inverse problem; recovering diagnostic-quality images therefore requires incorporating prior knowledge beyond the measured data. In follow-up exams, the most recent prior scan of a patient can provide a highly informative subject-specific context, but practical use is complicated by temporal changes (including pathology progression), misalignment between scans, and protocol drift across acquisitions. In this work, we introduce L-TGVN, a Longitudinal Trust-Guided Variational Network that leverages prior scans as side information to reconstruct the current scan from heavily undersampled measurements. Crucially, L-TGVN constrains the influence of prior scans to be consistent with the acquired measurements. Unlike many existing longitudinal reconstruction methods, it does not require explicit pre-registration between prior and current scans. It further accommodates differences in acquisition protocols across visits (e.g., changes in sequence parameters). We evaluate L-TGVN against matched-capacity baselines, including prior-guided methods and methods that do not use longitudinal priors, and observe consistent improvements in standard quantitative metrics together with better preservation of fine structures at challenging accelerations. Source code is available at this http URL.


[22] 2606.04444

Scaling Datasets for Multi-Sensor, Multi-Agent, and Multi-Domain Learning in Autonomous Systems

Existing datasets cannot support large-scale learning in multi-agent, multi-sensor, or multi-domain autonomy, where diversity and coordination are essential. We present a modular dataset generation pipeline that creates terabyte-scale, ground-truth-labeled data for ground, aerial, and infrastructure-based systems using the AVstack framework and CARLA simulator. Supporting single- and multi-agent configurations with flexible sensor suites, the pipeline enables controllable experimentation across challenging conditions. Representative perception and fusion studies show how generated data can support application-specific training and collaborative autonomy.


[23] 2606.04471

Self-Optimizing Control of Continuous Processes Based on Reinforcement Learning

This paper addresses the Self-Optimizing Control (SOC) problem in industrial continuous processes and proposes a Reinforcement-Learning (RL)-based SOC approach to improve dynamic performance under high-frequency disturbances. In the proposed framework, the SOC controlled variable structure is embedded in the Actor network, and reward functions are designed based on economic indicators. Through interaction with the environment, the RL agent optimizes controlled variables while implicitly considering implementability and steady-state uniqueness. Online fine-tuning is further introduced to alleviate model mismatch. Experiments on a continuous stirred-tank reactor with disturbances compare the proposed RL-based SOC method with the Objective-Guided Controlled Variable Learning Approach based on steady-state data. The results show that the RL method achieves improved dynamic performance under real-time disturbances, generates smooth controlled variable outputs without explicit regularization, reduces hyperparameter-tuning complexity, and enhances adaptability through online adjustment. Overall, the proposed RL-based SOC approach provides an effective solution for nonlinear process control and offers a promising reference for future studies involving multiple disturbances, multiple operating conditions, and model-free scenarios.


[24] 2606.04519

Bearing Only Distributed Circumnavigation with Limited Target Information for Asymmetric Dubins Vehicles

In this paper, we present a class of bearing-based distributed nonlinear guidance laws for the cooperative circumnavigation of a stationary target by a heterogeneous team of asymmetric Dubins vehicles. In such a vehicle, the maximal left and right turn capabilities are nonuniform. In the given framework, the location of the target is known only to a small subset of the vehicles, called the leaders. The uninformed vehicles, called the followers, use information from their out-neighbours in the communication graph, constructed using the nearest-neighbour rule. A class of guidance laws is formulated that relies solely on the heading angle and line-of-sight angles of a designated out-neighbour of the vehicle in the graph. Using Zubov's theorem, we prove that the proposed guidance laws achieve global asymptotic stability under angular-speed-only control and ensure the convergence of the trajectories of all the Dubins vehicles to a common centre. The proposed results are validated through numerical simulations.


[25] 2606.04531

Gaussian-Process Dynamics of Diagonal Expectation Propagation under Variance-Profile Gaussian Measurements

State-evolution analyses of approximate-message-passing and expectation-propagation-type algorithms rely on an effective-channel principle: after a suitable Onsager, orthogonal, or extrinsic correction, the nonlinear module receives a fresh scalar Gaussian observation. This paper studies this principle for diagonal expectation propagation under variance-profile Gaussian sensing matrices. The model preserves Gaussian conditioning, but removes the isotropy that supports the usual scalar decoupling arguments. We prove a finite-time large-system description in which the linear EP module remains Gaussian at the coordinate level, but is generally not a fresh scalar channel. Instead, the residuals form a coordinate-dependent Gaussian process whose covariance is shaped by the variance profile and by the finite linear history of the algorithm. The standard diagonal EP cavity cancels the instantaneous response of the incoming message, but may leave a component predictable from past residuals. We characterize this process through a conditioned matrix-Dyson-equation deterministic equivalent and a Schur-complement representation of the linear module. A Gaussian-regression decomposition then separates the predictable memory from the orthogonal innovation and yields an oracle state-evolution-level correction. Thus, under variance-profile measurements, the limiting object for diagonal EP is a Gaussian-process dynamics with profile-dependent memory rather than the conventional fresh-noise scalar state evolution.


[26] 2606.04532

Microwave Linear Analog Computers Aided Multiuser Communication: General Impedance Matching and Precoding Optimization

Microwave linear analog computers (MiLACs) have recently emerged as a hardware-efficient solution for implementing multi-antenna communication systems. Unlike existing MiLAC designs based on the ideal assumption of perfect impedance matching (PIM) with reflection-free transmission, this paper investigates MiLAC-aided precoding optimization under a general impedance matching (GIM) model, which enables more flexible precoder design at the cost of a potential reduction in radiated power. Specifically, we consider a downlink multiuser multiple-input single-output (MISO) communication system and aim to maximize the system sum rate by optimizing the MiLAC-enabled transmit precoding subject to physical circuit constraints. The formulated problem is challenging to solve due to the intricate coupling between the precoding and impedance parameters. To address this challenge, we first develop a singular value decomposition (SVD)-based parametric search framework for small or medium size systems. This framework exploits the feasible precoder structure and explicitly captures the tradeoff between power radiation efficiency and precoder design flexibility. We then propose a unified algorithm for solving the optimization problem based on the projected weighted minimum mean-square error (WMMSE) principle for arbitrary size systems with GIM- or PIM-based MiLAC precoding. Simulation results demonstrate that the GIM-based MiLAC design consistently outperforms its PIM counterpart as a special realization, especially in interference-limited scenarios, by allowing a moderate reduction in radiated power in exchange for additional precoder design flexibility and more effective interference mitigation. It is also shown that GIM-based MiLAC design achieves performance close to that of the baseline fully digital precoding system.


[27] 2606.04538

Small-Signal Analyses Using Analytical IBR Models and Frequency-Dependent Thévenin Equivalents

This paper investigates whether component-level studies can capture additional interactions through Small Signal Analysis (SSA) when the network connected to the Voltage Source Converter (VSC), typically modeled as a simple Thévenin equivalent, is a more complex IBR-based network. The research investigates cases ranging from basic analytical models to an IEEE 9-bus electromagnetic transient (EMT) model, with and without Inverter-Based Resources (IBRs), synthesized as state-space elements. The study finds that spurious poles at 50 Hz related to dq-frame conversion can hinder the accuracy of participation factor analysis. A potential remedy is a two-step process: first, applying Hankel reduction to remove most spurious poles, followed by manual elimination of any remaining ones.


[28] 2606.04553

Robust Set-Membership Diffusion Normalization Subband Adaptive Filtering Algorithms Over Distributed Networks

With the development of wireless sensor networks, distributed networks have received widespread attention. Depending on how the nodes are connected, distributed networks can be divided into different structures, of which the diffusion structure is the most commonly used because it is simple, stable, and reliable. To improve the robustness of the diffusion subband algorithm in distributed networks, the median absolute deviation (MAD) theorem is applied to the selection of the error bound, and this paper proposes a diffusion subband algorithm with a robust bound. Simulations verify that the proposed algorithm effectively reduces the update step size in the presence of outliers, so that it achieves good convergence performance and good robustness to impulsive noise.
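As a rough illustration of why a MAD-based quantity suits robust bound selection, the sketch below computes a MAD scale estimate over a window of error samples and compares it with the ordinary standard deviation. The function name `mad_scale`, the sample window, and the consistency factor 1.4826 (the usual Gaussian normalization) are assumptions for illustration, not the paper's definitions.

```python
from statistics import median, pstdev

def mad_scale(errors):
    """Robust scale estimate: 1.4826 * median(|e - median(e)|).

    The factor 1.4826 makes the estimate consistent with the standard
    deviation for Gaussian data (an assumed, conventional normalization).
    """
    centre = median(errors)
    deviations = [abs(e - centre) for e in errors]
    return 1.4826 * median(deviations)

# Hypothetical error window containing one impulsive outlier.
window = [0.1, -0.2, 0.15, -0.05, 50.0]
robust = mad_scale(window)   # barely affected by the outlier
naive = pstdev(window)       # dominated by the outlier
print(robust, naive)
```

A bound derived from `mad_scale` stays close to the scale of the nominal errors, whereas one derived from the standard deviation is inflated by a single impulse.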


[29] 2606.04565

Implementation of a Misalignment-Tolerant MIMO Near Field Wireless Power Transfer System

The efficiency of reactive near-field wireless power transfer (WPT) systems degrades rapidly with increasing separation distance and is highly sensitive to misalignment between transmitting and receiving coils. These limitations restrict the mobility of powered devices and confine many near-field WPT applications to static scenarios. To address these challenges, a multiple-input multiple-output (MIMO) WPT configuration is investigated due to its capability to shape the magnetic field distribution between the transmitter and receiver. Maximum power transfer efficiency can be achieved by appropriately setting the amplitude and phase of each transmitting coil; however, determining these optimal settings requires accurate knowledge of the system's S-parameters. This paper presents the use of the Nelder-Mead iterative optimization algorithm to estimate the input amplitude and phase settings that maximize transfer efficiency in a near-field WPT system. The implementation comprises a four-element transmitter and a two-element receiver. Based on measured S-parameters, the proposed approach significantly improves WPT efficiency under both aligned and misaligned conditions.


[30] 2606.04595

KD-NVC: A Search-and-Distill Framework to Accelerate Neural Video Coding

While neural video coding (NVC) has achieved remarkable rate-distortion performance, real-time decoding on edge devices has become an important demand but remains limited by high complexity. Knowledge distillation (KD) is widely used for model acceleration, yet its application to NVC faces critical challenges. Specifically, the heterogeneity of NVC sub-modules renders uniform architectural reduction suboptimal, necessitating a per-module design for better rate-distortion-speed trade-off. However, searching for diverse architectures via existing neural architecture search (NAS) algorithms is unaffordable due to the expensive training cost of neural video codecs. Moreover, after the lightweight architecture is determined, existing distillation methods overlook the feature-energy sparsity induced by the rate-constraint, which is essential for maintaining compression performance. To address these issues, we propose a two-stage distillation framework KD-NVC. In the first stage, we introduce an acceleration-efficiency-based neural architecture search (AE-NAS) algorithm. It explores the module-wise Pareto frontier to adaptively allocate the acceleration budget across heterogeneous modules. Also, it introduces the acceleration-efficiency metric to determine the final student architecture without practically training all architecture-level candidates. In the second stage, we design an energy-aware feature distillation (EFD) loss that aligns the spatially-aggregated feature-energy signatures between the teacher and student codecs, transferring the rate-induced sparsity patterns for better compression efficiency. Experimental results demonstrate that the proposed framework consistently outperforms existing codec-oriented distillation methods, and achieves 69 FPS decoding at 1080p on RTX 5060 while maintaining comparable RD performance to VTM-LDB.


[31] 2606.04600

Joint 3D Trajectory and Power Allocation for HAPs-UAV Bistatic ISARAC in Low-Altitude Networks

This paper investigates joint three-dimensional (3D) trajectory planning and resource allocation for a high-altitude platform (HAPs)-unmanned aerial vehicle (UAV) bistatic integrated synthetic aperture radar (SAR) and communication (ISARAC) system in low-altitude networks. In the proposed architecture, the HAPs provides persistent wide-area connectivity by transmitting ISARAC waveforms for ground-user communications, while a low-altitude UAV exploits its proximity and mobility to passively collect ground-target echoes for high-resolution SAR imaging. We formulate a sum-rate maximization problem for ground users subject to stringent SAR imaging signal-to-noise ratio (SNR) and resolution requirements, a total energy budget for ISARAC transmission, and UAV dynamic constraints. The resulting problem is inherently nonconvex. To tackle it, an alternating optimization (AO) framework is developed, where the power-allocation subproblem with fixed UAV states admits a closed-form water-filling solution, while the UAV trajectory optimization with fixed transmit powers is handled via successive convex approximation (SCA) and difference-of-convex (DC) programming. Simulation results verify the effectiveness of the proposed approach and demonstrate its capability to jointly support persistent communication coverage and high-resolution sensing in low-altitude network scenarios.
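For readers unfamiliar with water-filling, the following is a minimal sketch of the textbook form: maximize the sum of log2(1 + g_k p_k) subject to a total power budget, solved by bisection on the water level. The channel gains, the budget, and the function name are illustrative assumptions; the paper's subproblem includes further constraints (for example SAR SNR requirements) that are not modeled here.

```python
def water_filling(gains, total_power, iters=100):
    """Textbook water-filling: p_k = max(0, mu - 1/g_k), sum(p_k) = total_power.

    gains: positive effective channel gains (assumed already noise-normalized).
    The water level mu is found by bisection.
    """
    lo = 0.0
    hi = total_power + max(1.0 / g for g in gains)  # level that surely overspends
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - 1.0 / g) for g in gains)
        if used > total_power:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - 1.0 / g) for g in gains]

# Hypothetical example: the weakest channel receives no power.
powers = water_filling([2.0, 1.0, 0.1], total_power=1.0)
print(powers)
```

Stronger channels receive more power, and channels whose inverse gain lies above the water level are switched off.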


[32] 2606.04638

Mixed potential for nonlinear RLC circuits with memristors

In two seminal articles published in 1964, Brayton and Moser introduced the concept of a mixed potential as a fundamental theoretic tool to describe and analyze a class RLC of nonlinear circuits containing resistors, capacitors and inductors. In this paper, it is shown for the first time that a mixed potential can be introduced for a class RLCM of RLC circuits that also contain memristors. This is possible provided a memristor circuit is analyzed not in the traditional voltage-current domain but rather in the flux-charge domain. The flux-charge analysis method (FCAM) plays a crucial role in the extension. In particular, a key step is an equivalence principle established via FCAM between an RLCM circuit in the flux-charge domain and a nonlinear RLC circuit in the voltage-current domain. Several examples are discussed where the mixed potential is explicitly found. These include basic circuits with memristors, such as Chua's circuit with a memristor, and also large-scale memristor arrays with a neural architecture. This paper is mainly devoted to the introduction of a mixed potential for memristor circuits and the study of its main theoretic properties, such as the possibility of writing the circuit state equations in the flux-charge domain in an effective and compact form via the mixed potential. In a companion paper [1], the mixed potential is used to obtain in a systematic way Lyapunov-like results on convergence of RLCM circuits. Those results will extend existing results on convergence that do not cover the important case where capacitors and inductors are simultaneously present in a memristor circuit.


[33] 2606.04680

Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy

Automatic speech recognition systems commonly rely on reference transcriptions for evaluation, while reference-free approaches often depend on internal confidence estimation or auxiliary language models. We propose READ (Reference-free Hypothesis Evaluation with Acoustic Discrepancy), a novel metric that evaluates ASR hypotheses directly from the speech signal. READ emphasizes the acoustic grounding of hypotheses. It uses a pretrained auto-regressive TTS model to compute the conditional likelihood of speech tokens given a text hypothesis, to measure fine-grained acoustic discrepancy between speech and text. Without additional training, READ can be applied for hypothesis refinement. Experiments show that READ correlates with specific recognition errors and improves ASR outputs, achieving up to 20% relative error rate reduction, with particularly strong gains under noisy conditions.


[34] 2606.04698

Adaptive $c_2$-Perturbed AFDM Waveform Design for Integrated Sensing and Communication

Affine frequency division multiplexing (AFDM) is a promising waveform for integrated sensing and communication (ISAC) systems owing to its superior performance in time-frequency doubly dispersive channels. However, AFDM still faces two challenges: a high peak-to-average power ratio (PAPR), and imperfect autocorrelation sidelobes caused by random data symbols. To address these challenges, this paper proposes a real-time data-driven framework that optimizes the pre-chirp parameter $c_2$ to enhance AFDM-ISAC performance. Specifically, a side-information-free optimization problem is formulated to reduce PAPR and the weighted integrated sidelobe levels of both aperiodic and periodic autocorrelation functions, with complexity comparable to that of the conventional AFDM receiver. Furthermore, an efficient non-monotone line-search spectral projected-gradient algorithm is developed by exploiting closed-form gradients. Simulation results demonstrate that the proposed method achieves a superior sensing-communication trade-off and improves bit error rate performance in the presence of severe power amplifier nonlinearity.
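To make the PAPR objective concrete, the sketch below computes the peak-to-average power ratio of a block of complex time-domain samples. It covers only the metric, not AFDM modulation or the $c_2$ optimization; the function name and test signals are illustrative assumptions.

```python
import cmath
import math

def papr_db(samples):
    """Peak-to-average power ratio of a sample block, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)

# A constant-modulus signal has a PAPR of about 0 dB.
constant_modulus = [cmath.exp(1j * k) for k in range(8)]
# A single impulse in a block of 4 has a peak 4 times the mean power.
impulse = [1.0, 0.0, 0.0, 0.0]
print(papr_db(constant_modulus), papr_db(impulse))
```

A waveform design that lowers this metric reduces how far the power amplifier is driven into its nonlinear region for a given average power.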


[35] 2606.04725

GPU-Accelerated Direct Transcription-Based Nonlinear Model Predictive Control

In this paper, we present a GPU-accelerated framework for nonlinear model predictive control (NMPC) based on direct transcription and second-order interior-point methods. Many real-world systems exhibit nonlinear dynamics that cannot be accurately captured by linear models, motivating the use of NMPC. However, NMPC requires the repeated real-time solution of optimal control problems (OCP), which become computationally demanding large-scale nonlinear programs (NLPs) after transcription. Although GPU acceleration has emerged as a promising approach for nonlinear optimization, existing GPU-based NMPC workflows reconstruct structurally identical OCPs at each solve. This introduces substantial overhead even though successive solves differ only through updated system measurements or reference trajectories. To address this limitation, we introduce a parametric interior-point formulation that exploits the fixed structure of transcribed OCPs, enabling reuse of structure-dependent computations (e.g., symbolic factorization in sparse Cholesky) across re-solves. We evaluate the proposed framework on distillation column and 2D heated plate benchmarks against state-of-the-art CPU and GPU configurations. The results show that the framework achieves over an order-of-magnitude speedup in total NMPC run times. These improvements are primarily driven by reduced per-iteration solve times, with GPU execution achieving up to a 94% reduction compared to the baseline. Overall, the results demonstrate the effectiveness of exploiting repeated problem structure in GPU-accelerated NMPC and highlight the potential of the proposed framework to expand the envelope of real-time NMPC applications.


[36] 2606.04756

Ultra-precise TDoA-based Localization of Frequency Hopping LPWAN Transmitters

The Internet of Things (IoT) is a rapidly growing market. It serves as a key enabler for a variety of applications, such as digital twins and asset tracking in industrial scenarios, which often require precise position information. However, systems such as Global Navigation Satellite Systems (GNSS) are ruled out by their high energy cost and by indoor use cases. A variety of systems have been discussed to close this gap. To contribute to the investigation of possible gold standards, this paper discusses localization based on Low Power Wide Area Networks (LPWAN). To this end, a concept is presented that is based on Time Difference of Arrival (TDoA) measurements within the LPWAN standard ETSI TS 103 357. This paper addresses two major challenges. First, TDoA measurements require highly precise temporal synchronization of the receiving base stations. In this work, this issue is solved by exploiting Signals of Opportunity (SoO) as a synchronization source, enabling sub-meter synchronization accuracy. A further issue arises from the Frequency Hopping (FH) waveform of the transmitting endpoints, which results in a loss of phase information and thus of usable localization bandwidth. A method is introduced to overcome this limitation. This paper presents the system concept and demonstrates its functionality through theoretical investigations and simulations. Finally, real-world measurements verify the functionality and show a 2D localization accuracy of below 10 m in Line of Sight (LOS) scenarios.
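The link between timing and distance in TDoA can be illustrated with a short calculation: a synchronization error of dt seconds maps to a range error of c * dt, so sub-meter accuracy implies timing errors of a few nanoseconds. The sketch below assumes free-space propagation at the speed of light and uses hypothetical 2D coordinates; it is not the paper's estimator.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def timing_to_range_error(dt_seconds):
    """Range error (m) caused by a synchronization error of dt_seconds."""
    return C * dt_seconds

def tdoa_range_difference(tx, rx_a, rx_b):
    """Difference of the transmitter's distances to two receivers (m)."""
    return math.dist(tx, rx_a) - math.dist(tx, rx_b)

# A 1 ns timing error corresponds to roughly 0.3 m of range error.
print(timing_to_range_error(1e-9))

# Hypothetical geometry: the measured TDoA would be this difference divided by C.
diff_m = tdoa_range_difference((0.0, 0.0), (3.0, 4.0), (0.0, 1.0))
print(diff_m, diff_m / C)
```

Each measured range difference constrains the transmitter to a hyperbola with the two receivers as foci, which is why several receiver pairs are needed for a 2D fix.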


[37] 2606.04770

WiSER: A Wireless Scene Encoder for Geometry-Grounded Multi-View Wireless Prediction

Indoor wireless propagation is governed by the interaction among three-dimensional (3D) scene geometry, radiomaterial properties, and transmitter and receiver configuration, which jointly determine both aggregate coverage behavior and path-level multipath structure. However, most learning-based site-specific prediction methods are designed for a single wireless representation, such as radiomap estimation or channel impulse response (CIR) prediction, and therefore do not explicitly exploit the propagation structure shared across heterogeneous wireless views. This paper introduces WiSER, a Wireless Scene Encoder for joint radiomap and multipath CIR prediction. WiSER maps a sparse voxel representation of an indoor scene and a transmitter location into a transmitter-conditioned sparse 3D scene memory, which is queried by two structure-aware decoders: a ray-corridor decoder for dense receiver-plane path-gain prediction and a Detection Transformer (DETR)-style set decoder for variable cardinality delay and power tap prediction. To train and evaluate this setting, we construct a co-registered indoor scene and wireless dataset pipeline using ScanNet++ indoor scenes and Sionna Ray Tracing, producing aligned sparse voxel inputs, dense radiomap labels, and unordered multipath CIR tap sets under a common coordinate frame and propagation configuration. Experimental results show that WiSER outperforms scene-specific radiomap baselines and substantially improves matched delay and power prediction over reference CIR baselines. These results suggest that transmitter-conditioned sparse 3D scene representations can serve as reusable wireless scene encoders for heterogeneous propagation queries, providing a geometry-grounded step toward representation learning and foundation-model development for AI-native wireless systems.


[38] 2606.04787

Towards Guaranteed Optimal PID Tuning for Uncertain Nonlinear Systems

Despite the widespread use of PID controllers in engineering practice, designing optimal PID parameters has long been regarded as a challenging problem in both theory and practice, particularly when faced with uncertain nonlinear dynamical systems. Based on the authors' PID control theory established recently for MIMO nonlinear uncertain systems (Zhao and Guo, 2022), which provides a concrete PID parameter set for global stability of PID controlled systems, this paper further proposes a near-optimal PID tuning method, where only input-output (zeroth-order) data on the control performance is available. The tuning method is formulated as a constrained optimization problem and solved by an iterative learning algorithm, referred to as HRS-KW algorithm, that combines a hysteretic random search with the Kiefer-Wolfowitz algorithm, aiming at utilizing the advantages of both global exploration and local gradient acceleration. This method operates without requiring precise structural knowledge of the system dynamics, yet its almost sure convergence to an epsilon-optimal solution for the PID parameters can be guaranteed in theory while ensuring closed-loop system stability. Simulation results illustrate that our HRS-KW algorithm outperforms other related optimization methods, exhibiting better convergence to the prescribed epsilon-optimal performance set.
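The Kiefer-Wolfowitz component can be sketched as a zeroth-order stochastic approximation that estimates gradients from cost evaluations only. The sketch below shows the classical scheme with central finite differences and decaying gains; it omits the hysteretic random search, the stability-preserving parameter set, and any projection, and the gain sequences and the quadratic test cost are assumptions rather than the paper's choices.

```python
def kiefer_wolfowitz(cost, theta0, iters=200, a=0.5, c=0.1):
    """Classical Kiefer-Wolfowitz iteration using only cost evaluations.

    Step sizes a_n = a / n and perturbations c_n = c / n**(1/3) are a
    common choice satisfying the usual convergence conditions.
    """
    theta = list(theta0)
    for n in range(1, iters + 1):
        a_n = a / n
        c_n = c / n ** (1.0 / 3.0)
        grad = []
        for i in range(len(theta)):
            up = theta.copy()
            down = theta.copy()
            up[i] += c_n
            down[i] -= c_n
            # Central finite-difference estimate of the i-th partial derivative.
            grad.append((cost(up) - cost(down)) / (2.0 * c_n))
        theta = [t - a_n * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical performance index with its optimum at (kp, ki) = (2, 1).
def example_cost(params):
    kp, ki = params
    return (kp - 2.0) ** 2 + (ki - 1.0) ** 2

print(kiefer_wolfowitz(example_cost, [0.0, 0.0]))
```

In a tuning setting, `cost` would be replaced by a measured closed-loop performance index, which is typically noisy; the decaying gains are what average that noise out.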


[39] 2606.04790

A model-free approach to control barrier functions for higher-order systems

Control barrier functions (CBFs) are a widely applied modular tool to ensure safe operation of nonlinear dynamical control systems. However, their construction typically requires accurate knowledge of the system dynamics. This requirement was recently alleviated for relative-degree-one systems using techniques from prescribed performance control (PPC) or funnel control (FC). This article extends the model-free CBF design to nonlinear systems of arbitrary relative degree. Moreover, we show with a simple example that a straightforward extension of existing results for relative-degree-one systems fails. Instead, we utilize novel techniques from funnel control to characterize a subset of the controls satisfying a CBF condition without requiring a dynamic model or state measurement. Finally, we demonstrate the applicability of our results on a seven-degree-of-freedom robotic manipulator with relative degree two.


[40] 2606.04869

Source Side Mitigation of AI Datacenter Power Fluctuations with a Hybrid Energy Storage System and Residual Differentiable Predictive Control

The rapid growth of hyperscale AI datacenters introduces structured, workload-driven active-power fluctuations at the point of interconnection. These fluctuations appear to the grid as time-varying disturbance injections that cannot be captured by conventional peak- or average-load representations. To reduce the residual power disturbance before it propagates into the bulk power system, this paper proposes a hybrid energy storage system with differentiable predictive control (HESS-DPC) framework for datacenter-side power smoothing. A workload-driven disturbance model is first established, representing the point-of-interconnection load deviation as the superposition of training and fine-tuning workloads to capture the structured forcing inputs that can excite generator frequency dynamics. A frequency-based rule-based controller then allocates this deviation between a battery energy storage system (BESS) and a supercapacitor (SC), assigning the energy-dominant component to the BESS and the fast-varying component to the SC. To overcome the anticipation and constraint limitations of fixed-frequency decomposition, a residual differentiable predictive control policy is trained offline to compute finite-horizon command corrections around the rule-based baseline while enforcing a one-step safeguard. Simulations on the NPCC 140-bus system show that HESS-DPC reduces grid-side residual deviations during workload transitions, improves SC state-of-charge sustainability over extended operation, and reduces generator peak-to-peak frequency deviations by more than 80 percent across all monitored generators, with the worst-affected generator response falling from 15.1 mHz to 1.3 mHz. These results confirm that local active-power smoothing at the datacenter point of interconnection can substantially mitigate frequency disturbances caused by AI workloads.
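A frequency-based split of a power deviation between a slow and a fast storage device can be sketched with a first-order low-pass filter: the filtered component goes to the BESS and the remainder to the SC. The cutoff frequency, sample time, and function name below are illustrative assumptions, and the sketch does not include the residual predictive correction or any device limits.

```python
import math

def split_power(deviation, dt, cutoff_hz):
    """Split a sampled power deviation into BESS (slow) and SC (fast) commands.

    A discrete first-order low-pass filter extracts the slow component;
    the SC command is the remainder, so the two always sum to the input.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (dt + rc)
    low = deviation[0]
    bess, sc = [], []
    for p in deviation:
        low += alpha * (p - low)
        bess.append(low)
        sc.append(p - low)
    return bess, sc

# Hypothetical step in load deviation (per unit): 0 for 5 samples, then 1.
step = [0.0] * 5 + [1.0] * 200
bess_cmd, sc_cmd = split_power(step, dt=0.1, cutoff_hz=0.5)
print(sc_cmd[5], bess_cmd[-1])
```

At the step, the SC absorbs most of the change and then hands the load over to the BESS as the filter output settles.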


[41] 2606.04872

Consistent Distributed Cooperative Localization for Ultra Large-Scale Multi-agent Systems

Cooperative localization (CL) is fundamental in emerging multi-agent systems, where agents fuse local sensing data with exchanged information to estimate their own states. At a large scale, however, tracking cross-correlations becomes infeasible, preventing the use of optimal filters. Ignoring or underestimating these correlations leads to overconfident, and thus inconsistent, estimates. Existing CL algorithms achieve good performance and consistency typically at the expense of communication, computation, or memory that scales with the network size. This is incompatible with ultra large-scale systems (ULSS) - for example, satellite mega-constellations - where per-agent resources are limited and must remain independent of the number of agents. This reveals a critical gap: no existing CL method is simultaneously well-performing, consistent, and ULSS-scalable. This paper introduces a new CL framework that addresses this gap using the recently proposed overlapping covariance intersection methodology, which enables agents to exploit limited structural information about cross-correlations without compromising consistency. The resulting CL algorithm leads to optimal conservative covariance propagation using only locally available information. The method is fully distributed, scalable to an ultra large scale, and provably recursively consistent. Simulations demonstrate substantial performance improvement over state-of-the-art consistent CL approaches while preserving scalability.


[42] 2606.04913

Access Protocols for Segmented Waveguide-Enabled Pinching-Antenna Systems (SWANs)

This paper proposes an access protocol framework for segmented waveguide-enabled pinching-antenna systems (SWANs), which exploits SWAN-induced reconfigurable channel diversity as a protocol-level resource for uplink random access. The framework consists of two stages, a channel-oracle stage and an access stage, designed under three SWAN operating modes: (i) one-segment selection (OS), (ii) segment aggregation (SA), and (iii) segment multiplexing (SM). Specifically, in the channel-oracle stage, the OS mode is adopted to acquire sparse pilot observations and infer the channel responses across the SWAN configuration space. In this way, high-dimensional uplink channel acquisition is recast as a low-dimensional geometric localization problem, thereby reducing pilot overhead while preserving channel reconstruction accuracy. For the access stage, we construct two oracle-guided access codebooks under the SA and SM modes, respectively, which address the tradeoff between hardware complexity and multiuser access resolution. In particular, the SA-based scheme supports single radio frequency (RF) chain access through randomized segment-group activation, whereas the SM-based R-access scheme exploits multiple RF chains to construct deterministic access slots and enhance collision resolution. Finally, our numerical results demonstrate that (i) the proposed two-stage framework improves access performance under the same training overhead, (ii) anchor densification is more effective than aggressive segment aggregation for SA, and (iii) SM-based R-access achieves deterministic coverage and higher throughput in moderate- and high-load regimes, whereas SA-based access remains attractive for low-complexity implementations.


[43] 2606.04939

UAT: Unified Audio-Text Diffusion for Audio Generation, Editing, and Captioning

Audio generation and audio-to-text understanding remain largely separate, with diffusion models dominating high-fidelity synthesis and autoregressive (AR) language models driving captioning and semantic prediction. Existing unified approaches typically rely on either heterogeneous modules or AR-centric modeling, which can hinder joint optimization and limit acoustic fidelity. We present UAT, to our knowledge, the first diffusion-centric framework that supports unified audio generation, editing, and captioning. UAT couples continuous latent diffusion for audio with masked discrete diffusion for text, enabling bidirectional audio-text modeling within a shared dual-stream backbone. Experiments show that UAT preserves strong audio generation and editing capabilities while achieving competitive captioning performance, demonstrating a favorable balance between acoustic synthesis and semantic prediction. Demo samples are available at this https URL.


[44] 2606.04942

Encounter Geometry Effects on Space-Based Laser Debris Remediation and Estimation

The escalating accumulation of orbital debris poses a critical threat to future space operations. Space-based lasers leveraging laser ablation have emerged as a promising approach for mitigating debris proliferation and preserving the orbital environment. Current literature, however, treats space-based laser debris remediation as a deterministic problem, assuming that momentum transfer and the resulting debris perturbations are precisely known. In reality, laser-to-debris engagement outcomes are inherently stochastic due to partially known debris characteristics. Compounding this challenge, estimating critical laser-matter parameters in situ, such as the momentum coupling coefficient, requires ablation that consequently perturbs the debris trajectory. This establishes a coupled ablation-and-estimation problem in which the laser platform and target debris encounter geometry influences remediation effectiveness and estimation accuracy. To address this problem, we present a joint ablation-and-estimation methodology that provides insights into the driving factors that make different encounter geometries improve or degrade overall remediation and estimation performance. Results across multiple coplanar and out-of-plane encounter geometries demonstrate how periapsis-lowering capacity, linear system observability, and nonlinear estimation performance evolve as laser parameters and relative orbit geometry vary. By identifying the key drivers behind these metrics, this study highlights critical considerations for the safe and effective operation of space-based lasers under uncertainty.


[45] 2606.04943

Differentiable Articulatory Copy-Synthesis of Biphonic Singing

Sygyt is a Tuvan style of biphonic singing in which a low vocal drone is sustained while a high harmonic is selectively amplified in the 1-3 kHz region. Copy-synthesizing this effect remains challenging for articulatory models, since it requires fine control of narrowly focused resonances that standard low-dimensional tract parameterizations cannot easily reproduce. We address this problem with a differentiable Kelly-Lochbaum waveguide augmented with a sublingual second source, cubic B-spline tract parameterization, and spatially varying learnable damping, optimized end-to-end by gradient descent from audio. On 20 segments from two independent sygyt datasets (5 singers, 10 pitches), the proposed model reduces log-spectral distance by 30-38% relative to an articulatory baseline, with the largest gains concentrated in the overtone region. Cepstral-envelope analysis further shows more accurate recovery of the merged formant structure characteristic of sygyt production. The model also outperforms a DDSP harmonic-plus-noise baseline with direct per-harmonic spectral control, suggesting that explicit acoustic structure is a useful inductive bias for overtone-singing copy-synthesis.


[46] 2606.04975

A Survey of Smart Grid Emerging Use Cases and Relevant 5G and 6G Capabilities and Features

The growing complexity of modern energy systems has led to the adoption of the Smart Grid (SG), which uses advanced communication technologies to facilitate efficient, reliable, secure, and sustainable energy operation and management. Unlike existing surveys that often treat grid and communication domains separately, this work rigorously quantifies service requirements for high-complexity emerging scenarios. It provides a comprehensive overview of SG architecture that integrates digital communication infrastructure with distributed energy resources (DERs), microgrids, energy storage systems, and cybersecurity frameworks. Furthermore, emerging SG use cases such as smart distributed voltage control, real-time fault detection and self-healing, smart and autonomous monitoring, and predictive maintenance are identified, and, more importantly, the service performance requirements associated with these use cases are quantified. Additionally, key capabilities and emerging SG enablers of fifth-generation (5G) and sixth-generation (6G) networks are described. These capabilities and enablers include network slicing, edge computing, spectrum management, artificial intelligence (AI) driven optimization, digital twins, and Open-Radio Access Network (O-RAN). Finally, the paper discusses open challenges and future research directions for designing scalable, intelligent, and secure next-generation SG systems.


[47] 2606.04981

Peer-to-Peer Cloud Service Market for Data Centers Oriented to Computation-Electricity Coordination

Energy-intensive data centers (DCs) have emerged as substantial and flexible loads in modern power systems, underscoring the critical need for computation-electricity coordination. Harnessing the spatio-temporal flexibility of DC workloads is a promising approach to facilitate this coordination. However, existing studies overlook the collaborative potential of computational resource sharing among geo-distributed DCs, thereby failing to fully unlock this flexibility. In this paper, a bi-level computation-electricity coordination framework is proposed to explicitly capture the bidirectional interactions between DCs and the power grid. Firstly, a peer-to-peer cloud service market (P2P-CSM) for geo-distributed DCs is proposed, which enables bilateral cloud service transactions to leverage regional heterogeneities (e.g., electricity prices, cooling efficiency). Secondly, locational marginal prices are embedded into the framework to reflect network congestion and nodal price disparities. Thirdly, a dual consensus alternating direction method of multipliers (ADMM)-based decentralized algorithm is developed as the P2P market clearing algorithm, and a bisection-assisted iterative algorithm is proposed to ensure rigorous convergence of the framework. Case studies conducted on a modified IEEE 30-bus system validate that the P2P-CSM achieves a win-win computation-electricity coordination: it not only increases total DC operational profit by 22.8\%, but also effectively alleviates grid congestion and yields a 3.2\% reduction in total energy consumption.


[48] 2606.05041

A Compact Omnidirectional Meanderline Antenna Array for Wireless Security Using Dynamic Magnitude and Phase Pattern Modulation

A compact dynamic four-element array with omnidirectional H-plane coverage is presented for planar physical-layer security using antenna-level directional modulation. The proposed approach achieves angularly selective information transmission without phased-array beamforming or multiple RF chains by dynamically switching the excitation paths of a four-element array. The antenna comprises four printed meander-line monopole elements operating at 5.05 GHz with independently controlled differential power excitation, which introduces magnitude and phase pattern modulation and dynamic motion of the apparent element spacing, resulting in strongly angle-dependent signal distortion and bit error rate (BER) performance. Reliable information recovery is confined to a narrow broadside region in the E-plane, while significantly elevated BER is observed at off-broadside angles. In contrast, the H-plane radiation remains static and omnidirectional, enabling full 360-degree information-recoverable coverage in the orthogonal plane. The antenna is fabricated on a single-layer Rogers RO4350B substrate with a compact footprint of $0.55 \times 1.73\,\lambda_0^2$. A four-path switching network implemented using commercial RF components validates the concept experimentally. Communication measurements under high-SNR conditions above 19 dB using 16-QAM demonstrate a planar information beamwidth below 24 degrees, confirming effective antenna-level directional modulation with angle-dependent BER characteristics and omnidirectional H-plane coverage.


[49] 2606.05053

Deep Learning Based Multi-Step Channel Prediction for Adaptive Underwater Acoustic OFDM Systems

We develop an adaptive OFDM framework for underwater acoustic communications based on PatchCSI-T, a Transformer-based multistep channel prediction model with feature-independent modeling and parameter sharing. Combined with a greedy adaptive modulation and power allocation scheme, the proposed approach enables accurate, low-latency CSI forecasting and improves end-to-end BER and spectral efficiency on real-world UWA channel datasets.


[50] 2606.05084

A Cancellation Mechanism in AFDM Radar Sensing: Exact Fisher Information and Delay-Doppler Decoupling

We consider radar sensing with affine frequency division multiplexing (AFDM), a chirp-based waveform recently proposed for high-mobility integrated sensing and communication. While numerical Cramér-Rao bounds for AFDM radar are available in the literature, no closed-form Fisher information analysis has so far revealed how the waveform's chirp structure shapes delay-Doppler estimation. In this paper, we provide such an analysis. We identify a cancellation in the AFDM likelihood: the frequency drift introduced by the chirp modulation is exactly compensated by a discrete phase correction built into the chirp-periodic prefix, leaving only a small residual. Exploiting this cancellation, we derive an exact closed-form Fisher information matrix that depends on the AFDM chirp structure through a single scalar, and from it we obtain closed-form Cramér-Rao bounds for joint delay and Doppler estimation. Several consequences follow. AFDM is provably less delay-Doppler-coupled than OFDM for any nonzero chirp rate. The delay Cramér-Rao bound improves quadratically with the chirp rate, while the Doppler bound is unaffected by it. Finally, our framework reduces continuously to the classical OFDM result as the chirp vanishes, certifying it as a strict generalization of OFDM radar sensing. Overall, our work shows that the chirp-periodic prefix -- until now studied only as a channel-equalization device -- is the structural element that decouples delay and Doppler in AFDM sensing, and that AFDM's superior sensing performance can be characterized analytically rather than through numerical bounds alone. Numerical experiments at realistic vehicular and low-Earth-orbit parameters validate all closed-form expressions.


[51] 2606.05113

3D-GlioPREDICT: 3D Latent Diffusion for Post-Radiotherapy Brain MRI Prediction in Patients with Glioma

Radiotherapy is a cornerstone of glioma treatment, inducing complex structural changes in brain tissue that are difficult to anticipate. Predicting these changes from pretreatment data could improve understanding of treatment-related effects and support the development of image-based outcome prediction methods. Recent studies have shown that follow-up brain magnetic resonance imaging can be synthesized from baseline imaging and treatment information, but most existing approaches operate on single 2D slices and represent treatment as a global parameter rather than a spatially dynamic variable. In this work, we address both limitations with a 3D latent diffusion framework that conditions image generation on the spatially resolved voxel-wise dose distribution, alongside a pretreatment image and follow-up time. To make volumetric synthesis computationally feasible, the model combines latent-space compression with ControlNet-based spatial conditioning. The method was trained and evaluated on a public dataset comprising 257 scans from 25 glioma patients. Prediction quality was assessed using mean squared error, peak signal-to-noise ratio, and structural similarity index. Anatomical consistency was further evaluated using Dice scores for cerebrospinal fluid, gray matter, and white matter segmentations, together with hippocampus volume prediction error and deformation analysis based on log Jacobian determinant maps. Compared with our previously proposed 2D approach, the 3D model achieved improved image similarity while maintaining good agreement with ground truth anatomy and deformation patterns. Overall, these results support the feasibility of 3D treatment-aware generative modeling for predicting post-radiotherapy brain MRI using only pretreatment information. Code is available at this https URL.


[52] 2606.04040

Channel-Oriented Design for EEG-to-Music Reconstruction

Brain-computer interfaces aim to decode naturalistic stimuli from neural signals, yet most progress to date has focused on vision and language. In this article, we study a more challenging but far less explored setting, EEG-to-music reconstruction, where signals are weak, distributed, and highly susceptible to noise and channel variability. Our central finding is that early channel mixing destroys weak but discriminative EEG signals. To address this, we propose a channel-oriented design with three key components. Specifically, channel-wise tokenization treats each electrode as an explicit token to retain spatially localized neural evidence, channel-wise multi-view self-distillation enforces consistency across temporal crops and random channel subsets to learn robust and distributed representations, and channel-wise data augmentation introduces structured channel dropout to improve invariance to noise, artifacts, and missing electrodes. Together, these components preserve weak yet informative signals across channels and enable stable alignment to a semantic music representation space. We integrate this channel-oriented design within an encoding-alignment-decoding pipeline for EEG-to-music reconstruction. Theoretically, we characterize when preserving channel-level structure leads to improved alignment. Empirically, we compare with a range of state-of-the-art baselines and demonstrate consistent and significant performance gains.


[53] 2606.04072

CADET: A Modular Platform for Evaluating Distributed Cooperative Autonomy in Connected Autonomous Vehicles

Deep learning models are increasingly central to autonomous vehicle (AV) pipelines, yet their integration has traditionally followed a monolithic design where perception, planning, and control execute on a single onboard computer. This design overlooks the emerging paradigm of cooperative autonomy, where vehicles interact with roadside units (RSUs), edge servers, and cloud-hosted intelligence through vehicle-to-everything (V2X) connectivity. Cooperative perception and control improve safety and efficiency, but also introduce systems-level challenges: network latency, compute heterogeneity, and multi-tenant contention, all critically affect real-time decision-making. These challenges are further amplified by the increasing reliance on large foundation models, whose scale necessitates cloud deployment. We present CADET (Cooperative Autonomy through Distributed Experimentation Toolkit), a modular platform for systematic and reproducible evaluation of distributed cooperative autonomy systems under realistic deployment conditions. CADET decouples the AV stack into composable modules that can be flexibly deployed across vehicles, infrastructure, and edge/cloud tiers. The framework integrates state-of-the-art models, incorporates trace-driven network and workload emulation, and provides synchronized model-, system-, and task-level instrumentation. Through V2V and V2I experiments, we show that distributed deployment choices fundamentally shape safety, with V2V intent packets outperforming cloud-based perception and RSU-assisted perception sustaining safety until overloaded by concurrent requests. Although designed for AV pipelines, CADET also supports dataset-driven experimentation, enabling systems and ML researchers to benchmark distributed inference workloads independently of full vehicle simulation. CADET is open source, with code and demo available at this https URL.


[54] 2606.04103

The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing Aids

Conventional hearing aids rely on fixed, frequency-dependent amplification and compression to manage reduced sensitivity, which often fails to provide sufficient listening support in complex environments, such as situations with multiple speakers (the ``cocktail party'' problem). To more comprehensively address the underlying encoding dysfunctions of hearing loss, we introduce the Differentiable Auditory Loop (DAL), a new open-source framework for personalized hearing aid design and fitting. Our first implementation of DAL incorporates CARFAC, a differentiable model of human cochlear function, which we ported to JAX, to optimize a deep neural network to match impaired auditory neural activity patterns with a normal-hearing reference. To build a hearing aid with the fine-grained spectro-temporal signal processing required, we adopt SEANet, a waveform-to-waveform fully convolutional UNet generator. We fine-tune the network by comparing the outputs of a CARFAC model fitted to normal hearing with that of a CARFAC model fitted to match each subject's individual hearing impairment. The comparison is done using loss functions derived from the respective CARFAC neural activity pattern (NAP) outputs and stabilized auditory images (SAIs), the latter providing a 2D representation that captures phase-insensitive temporal structure in the auditory nerve output. Through gradient descent, the SEANet model learns to both denoise the input and compensate for the hearing loss modelled by the impaired CARFAC model. Across neural-representation and signal-fidelity metrics, the DAL-optimized SEANet model outperformed the tested master hearing aid (MHA) baselines. The DAL framework provides a practical path toward model-based, machine-learning-driven personalization of hearing aid signal processing. Next steps include hardware deployment to enable real-world clinical testing.


[55] 2606.04111

AgenticDiffusion: Agentic Diffusion-based Path Planning for Vision-Based UAV Navigation

Indoor UAV navigation requires efficient exploration, scene understanding, and reliable trajectory execution under limited field-of-view observations. Existing vision-based navigation frameworks typically rely on single-view observations, limiting their ability to reason about occlusions, target visibility, and global scene structure. In this work, we propose AgenticDiffusion, a multi-view UAV navigation framework that coordinates language-guided reasoning, open-vocabulary target grounding, vision-based diffusion planning, and NMPC within a unified aerial navigation pipeline. Given a natural language instruction and synchronized first-person-view (FPV) and top-view observations, the framework determines the most informative viewpoint for navigation and generates a mission plan prior to trajectory execution. The targets are localized using an open-vocabulary grounding model, after which viewpoint-specific diffusion planners generate navigation trajectories for UAV execution. Using complementary viewpoints, the proposed framework reduces repeated target exploration and improves navigation efficiency in cluttered indoor environments. The framework was validated in four real-world UAV navigation scenarios involving adaptive viewpoint selection, multi-stage mission execution, long-horizon navigation, and safe landing-site selection. The experimental results demonstrated an overall mission success rate of 80% in 40 real-world trials, while the diffusion planners achieved a trajectory generation success rate of 100%.


[56] 2606.04221

Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid

Hearing aids impose strict latency and power constraints that current DNN-based speech enhancement systems struggle to meet on embedded hardware. We characterize this gap by deploying both speech separation and denoising using the lightweight SuDoRM-RF++ architecture on the AMD-Xilinx Kria KV260, evaluated at FP32 and 16-bit fixed-point precision for each task. Across these configurations, first-sample latency tracks with on-chip parameter caching rather than arithmetic throughput, identifying data movement as the primary bottleneck. Precision reduction halves the model memory footprint without compromising objective speech quality. The fixed-point denoising accelerator achieves a first-sample latency of 9.7~ms, meeting the 10~ms clinical threshold, while speech separation reaches 16.0~ms. These measurements establish concrete resource requirements for embedded DNN-based speech enhancement and quantify the remaining gap to hearing aid deployment.
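
The claim that moving from FP32 to 16-bit fixed point halves the memory footprint can be illustrated with a small sketch. The Q-format below (12 fractional bits), the helper names, and the parameter count in the usage note are illustrative assumptions; the abstract does not state which fixed-point format or tooling was used.

```python
def to_fixed16(x: float, frac_bits: int = 12) -> int:
    """Quantize a float to a signed 16-bit fixed-point integer (assumed Q-format)."""
    q = round(x * (1 << frac_bits))
    # Saturate to the signed 16-bit range instead of wrapping around.
    return max(-32768, min(32767, q))


def from_fixed16(q: int, frac_bits: int = 12) -> float:
    """Convert a 16-bit fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def weight_bytes(n_params: int, bits_per_param: int) -> int:
    """Storage needed for n_params weights at the given precision."""
    return n_params * bits_per_param // 8
```

For a hypothetical model with 1,000,000 parameters, `weight_bytes(1_000_000, 32)` is twice `weight_bytes(1_000_000, 16)`, and the round-trip error of an in-range value is at most half of one quantization step.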


[57] 2606.04249

Prospective Dynamic 3D MRI Reconstruction via Latent-Space Motion Tracking from Single Measurement

Prospective reconstruction is crucial in many clinical applications such as MRI-guided radiotherapy, which demands accurate image reconstruction and fast motion estimation from currently acquired measurements. However, prospective reconstruction remains challenging due to ultra-sparse sampling and stringent latency requirements. In this work, we propose PDMR, a Prospective Dynamic 3D MRI Reconstruction framework with latent-space motion tracking. Our core idea is to learn an efficient and generalizable latent manifold of motion fields offline, enabling rapid online adaptation for prospective reconstruction. Specifically, we parameterize the deformation vector fields (DVFs) on a low-dimensional manifold, effectively reducing the search space for fast online adaptation, and employ a tri-plane representation to achieve geometry-aware and memory-efficient encoding of 3D motion. Experiments on both XCAT digital phantoms and in-house abdominal MRI datasets demonstrate that PDMR achieves high-fidelity and temporally consistent reconstruction across multiple prospective scenarios (Immediate and After-2min), outperforming state-of-the-art retrospective and online methods. Our results suggest a promising pathway toward ultra-fast, motion-aware prospective MRI reconstruction in clinical practice.


[58] 2606.04335

Policy Gradient for Continuous-Time Robust Markov Decision Processes

The framework of robust Markov decision processes (RMDPs) allows the design of reinforcement learning agents that satisfy performance guarantees under worst-case transition dynamics. Traditional RMDPs consider discrete-time dynamics and recently, sample-efficient policy gradient algorithms have been considered in this context. This paper investigates policy gradient algorithms within a continuous-time RMDP framework. Policy gradients and adversarial gradients are derived using pathwise and adjoint-based formulas for stochastic and ordinary differential equations. We propose double-loop optimisers to obtain linear convergence in the oracle-based setting and an $\tilde{\mathcal{O}}(\frac{1}{\epsilon^2})$ sample complexity in the sample-based setting in an analysis which also derives novel tools for the framework of undiscounted total cost MDPs. Additionally, we propose mean-field optimisers as distributional optimisers with an $\tilde{\mathcal{O}}(\frac{1}{K})$ oracle-based convergence rate and an $\tilde{\mathcal{O}}(\frac{N^2}{\epsilon})$ sample complexity under $N$-particle approximation. The effectiveness of continuous-time policy gradient algorithms is confirmed for both optimisers on continuous-time RMDPs with neural ordinary differential equation dynamics.


[59] 2606.04358

Gauss Circle Lattices with Geometric Convolutions for Synthesizing High Dimensional Image-Source Room Impulse Responses

The image-source model (ISM) is a widely adopted method for efficiently simulating acoustic room impulse responses (RIRs) under specular reflection assumptions. Acoustic paths between source and receiver are traced to lattice points computed from successive reflections over bounding planes of the room. Rectangular rooms bound the total number of image-sources to be polynomial in the RIR's duration or its distance equivalent $k$, with degree equal to the number of room dimensions $N$. Direct ISM simulations are therefore upper-bounded in compute by $O \left ( k^N \right )$, and consider only cases of $N \leq 3$ for tractability and real-world applications. This work proposes an alternative computational method that lowers the asymptotic compute bound to $O \left ( N k^2 \log k \right )$ for integer coordinates and room dimensions via reducing ISM lattice point counting to the classic Gauss circle problem (GCP). We extend the lattice counting model to frequency-dependent and reflection weighted image-sources in higher dimensions, relating solutions between successive dimensions via the convolution operator. Two constructions for realizing RIRs are presented, along with time-frequency controls, error and run-time analysis, and RIR statistics.
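
As a rough illustration of the counting idea, the sketch below counts integer lattice points inside a circle (the Gauss circle problem) and then rebuilds the same counts by convolving one-dimensional "sum of squares" histograms across dimensions. This is not the paper's algorithm: the function names are hypothetical, the lattice is unweighted, and the convolution is the naive quadratic one rather than an FFT-based version.

```python
import math


def gauss_circle_count(r: int) -> int:
    """Count integer points (x, y) with x*x + y*y <= r*r."""
    total = 0
    for x in range(-r, r + 1):
        y_max = math.isqrt(r * r - x * x)
        total += 2 * y_max + 1  # y ranges over -y_max..y_max
    return total


def squares_histogram(k: int) -> list:
    """h[n] = number of integers x with x*x == n, for 0 <= n <= k."""
    h = [0] * (k + 1)
    x = 0
    while x * x <= k:
        h[x * x] += 1 if x == 0 else 2  # +x and -x share the same square
        x += 1
    return h


def convolve_truncated(a: list, b: list, k: int) -> list:
    """Naive convolution of two histograms, keeping indices 0..k."""
    out = [0] * (k + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > k:
                break
            out[i + j] += ai * bj
    return out


def lattice_counts(k: int, n_dims: int) -> list:
    """c[n] = number of integer points in n_dims dimensions at squared distance n."""
    h1 = squares_histogram(k)
    counts = h1
    for _ in range(n_dims - 1):
        counts = convolve_truncated(counts, h1, k)
    return counts
```

Summing `lattice_counts(r * r, 2)` gives the same total as `gauss_circle_count(r)`, and raising `n_dims` adds one more convolution per dimension instead of enumerating every lattice point.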


[60] 2606.04418

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

Neural audio codecs are a key component of speech processing pipelines, compressing audio into discrete tokens for downstream modeling. However, existing codecs struggle to balance reconstruction quality with token efficiency, often encoding perceptually irrelevant information such as background noise and recording artifacts at the expense of linguistically and acoustically meaningful content. We reframe audio tokenization as a selective information bottleneck problem and propose CleanCodec, a denoising audio codec which learns to encode only perceptually important features and discard imperceptible information. At just 12.5 tokens per second, CleanCodec achieves state-of-the-art tokenization efficiency, substantially outperforming existing codecs in speaker similarity and speech intelligibility. Evaluations on downstream text-to-speech and voice conversion tasks further demonstrate improved performance and up to 17x faster inference, highlighting significant efficiency gains.


[61] 2606.04474

Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention

Speech Large Language Models (SLLMs) underperform their text counterparts on complex reasoning. We reveal that this modality gap is not a uniform cognitive deficit. Evaluating three diverse SLLMs, we show speech-to-text (S2T) matches or exceeds text-to-text (T2T) on spatial, syntactic, and factual tasks. However, on logical tasks requiring entity tracking, S2T accuracy collapses to chance. We diagnose this localized degradation as an entity binding failure: continuous speech features cause models to lose precise entity-property associations during implicit reasoning. To resolve this, we propose Entity-Aware Chain-of-Thought (EA-CoT), forcing SLLMs to explicitly enumerate entities and bind them to claims before reasoning. Strikingly, EA-CoT bridges the gap, even when spoken names are misrecognized, yielding up to a 24.4% absolute accuracy improvement. Ablations confirm these gains stem entirely from explicit semantic binding, reframing the gap as a resolvable bottleneck.


[62] 2606.04730

Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflected in IWSLT's Instruction Following Track, which this year introduced new tasks including an unknown surprise task, posing a genuine challenge against overfitting to known tasks. We present KIT's submission to the Long and Short Instruction Following tracks in the unconstrained setting. Our approach combines a general data augmentation pipeline that converts short-form corpora into long-form training data through segment concatenation, LLM-based label generation, and cross-lingual translation, yielding over 1M instances across six tasks and four languages. We further show that likelihood-based re-ranking, while highly effective for ASR, systematically degrades semantic tasks by spuriously selecting candidates generated from segmented audio processing rather than holistic long-form inference, a failure mode resolved by combining likelihood with Minimum Bayes Risk decoding.


[63] 2606.04775

Activation Steering of Video Generation Models via Reduced-Order Linear Optimal Control

Text-to-video (T2V) models trained on large-scale web data can generate undesired content, motivating interventions that reduce harmful outputs without sacrificing visual quality. Activation steering offers an attractive mechanistic alternative to finetuning and prompt filtering, but existing T2V steering methods remain limited, typically applying coarse, non-anticipative interventions that can lead to oversteering and content degradation. To close this gap, we propose Latent Activation Linear-Quadratic Regulator (LA-LQR), a reduced-order optimal control framework for minimally invasive T2V steering. LA-LQR formulates T2V inference as a dynamical system and computes closed-loop feedback interventions that steer activations toward desired feature setpoints while penalizing unnecessary perturbations. To make optimal control feasible for high-dimensional video activations, we project activations onto a low-dimensional, task-relevant subspace derived from contrastive prompt pairs, estimate local linear dynamics in this latent space, and solve a latent LQR problem to obtain timestep- and layer-specific steering signals. We provide theoretical bounds relating latent setpoint tracking to raw activation-space feature control, and empirically validate the fidelity of the reduced latent dynamics. On concept steering and video safety benchmarks, LA-LQR reduces unsafe generations relative to baselines, while preserving prompt fidelity and visual quality.
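
The closed-loop feedback idea can be sketched with a scalar finite-horizon LQR. The toy dynamics, cost weights, and function names here are assumptions for illustration only; the abstract describes estimated linear dynamics in a low-dimensional latent space, which this one-dimensional example does not reproduce. The state `e` stands for the deviation of a feature from its desired setpoint.

```python
def lqr_gains(a: float, b: float, q: float, r: float, horizon: int) -> list:
    """Backward Riccati recursion for e[t+1] = a*e[t] + b*u[t].

    Minimizes the sum of q*e[t]**2 + r*u[t]**2 and returns
    time-ordered feedback gains K[0..horizon-1] with u[t] = -K[t]*e[t].
    """
    p = q  # terminal cost-to-go weight
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # recursion runs backward in time
    return gains


def simulate(a: float, b: float, gains: list, e0: float) -> list:
    """Apply the feedback law and return the deviation trajectory."""
    e = e0
    trajectory = [e]
    for k in gains:
        u = -k * e  # a larger r in lqr_gains makes this intervention smaller
        e = a * e + b * u
        trajectory.append(e)
    return trajectory
```

With `a = b = q = r = 1`, the early gains settle near 0.618, so each step removes roughly 62% of the remaining deviation. Increasing `r` penalizes the control more heavily and yields gentler steering.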


[64] 2606.04849

Dynamic FDD for Spectrum Sharing in Non-Terrestrial Networks

Future 6G networks are envisioned to integrate low Earth orbit satellite mega-constellations to enable seamless global connectivity, particularly in underserved and remote areas. However, the deployment of dense mega-constellations introduces interference among satellites operating over shared frequency bands. This represents a rather new setup for studying spectrum sharing, which exacerbates the limited flexibility of conventional FDD systems based on fixed bands for downlink and uplink transmissions. We address this spectrum-sharing problem and propose dynamic re-assignment of FDD bands for improved interference management in dense deployments, as well as evaluate the performance gain of this approach. To this end, we formulate a joint optimization problem that incorporates dynamic band assignment, user scheduling, and power allocation in both directions. This non-convex mixed integer problem is solved using a combination of equivalence transforms, alternating optimization, and state-of-the-art industrial-grade mixed integer solvers. Numerical results demonstrate that the proposed approach of dynamic FDD band assignment significantly enhances system performance over conventional FDD, achieving up to 30\% improvement in throughput in dense deployments.


[65] 2606.04921

SURF: Separation via Unsupervised Remixing Flow

The goal of single-channel source separation is to reconstruct $K$ sources given their mixture. In supervised settings where vast amounts of clean source data are available, this challenging, ill-posed problem has been addressed successfully by generative diffusion and flow-based prior models. However, access to such clean source samples is often limited, and even when available, supervised models are vulnerable to domain shifts. To bridge this gap, we present Separation via Unsupervised Remixing Flow (SURF), an unsupervised flow matching approach for source separation that learns directly from observed mixtures. This method relies on a novel combination of state-of-the-art supervised flow matching and regression-based self-supervised techniques. At a high level, starting from a teacher model, we utilize a "remixing" step to bootstrap the learning of a student flow model from the teacher's estimates. We provide insights into the objectives optimized by this approach and draw a novel connection to the Wake-Sleep algorithm. Empirical evaluations on image and audio benchmarks demonstrate that SURF establishes a new state-of-the-art, significantly outperforming existing unsupervised methods. See our demo page for examples. this https URL


[66] 2606.05038

Dual Lyapunov-based Synchronization Control of Rössler System

This paper proposes a novel approach to the synchronization problem of nonlinear dynamical systems, integrating dual Lyapunov stability analysis with polynomial optimization. A comprehensive review of the relevant scientific literature on synchronization methods is conducted, with a particular focus on classical Lyapunov-based methods for chaotic systems. In this study, the Rössler system is synchronized by employing a dual Lyapunov-based closed-loop synchronization method. This method uses semidefinite programming and sum-of-squares polynomials to compute a nonlinear state feedback function that synchronizes a chaotic system to a selected reference model. The aim is to destroy the chaotic behavior so that a limit cycle becomes attracting instead. Simulations are performed for 100 randomly selected initial conditions to show that the synchronization process is performed successfully. Furthermore, bifurcation diagrams and phase portraits are evaluated to analyze the system dynamics. The paper discusses the results and how new constraints should be employed and adapted to more complex systems.


[67] 2606.05121

Audio Interaction Model

Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-decide-respond loop, listens to sound, environment, and instructions in real time and reacts on the fly. We formalize this regime as the Audio Interaction Model, and realize it with Audio-Interaction, a unified streaming model that retains offline task execution while adding online general audio instruction following, from dialogue to full voice chatting, deciding when to respond from the semantics of the stream. To enable this, we propose SoundFlow, a framework that instantiates the perceive-decide-respond loop end to end, from data to training to deployment, through streaming-native data construction, comprehension-aware training, and asynchronous low-latency inference for stable real-time interaction. We further construct StreamAudio-2M, a 2.6M-item streaming corpus spanning 7 fundamental abilities and 28 sub-tasks, and Proactive-Sound-Bench for evaluating proactive audio intervention. Across 8 benchmarks, Audio-Interaction preserves competitive performance on mainstream audio tasks while unlocking capabilities inaccessible to offline LALMs, including real-time ASR, streaming audio instruction following, and proactive help.


[68] 2606.05149

An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification using Vision Transformers

Vehicle body type is a significant determinant of cyclist injury severity in overtaking crashes, yet automated tools for classifying vehicles into injury-risk-relevant categories from naturalistic roadway video do not exist in the open literature. Standard object detection benchmarks provide only coarse vehicle labels (car, truck, bus, motorcycle), while existing fine-grained recognition systems are trained on controlled imagery and lack evaluation for deployment robustness across recording sites. This paper presents an open-source two-stage computer vision pipeline combining a pre-trained RT-DETR detector for coarse vehicle localization with a fine-tuned Vision Transformer (ViT-Base/16) for six-category body-type classification: passenger car, SUV, pickup truck, minivan, large van, and commercial truck. A confidence-based abstention mechanism withholds Stage 2 predictions when softmax output falls below 0.60, producing unknown labels rather than silent misclassifications. Evaluated on 3,805 annotated overtaking events from a bicycle-lane corridor in Ann Arbor, Michigan (in-distribution), the pipeline achieved 0.94 accuracy with per-class F1 scores from 0.91 (minivan) to 0.97 (SUV). On an independent out-of-distribution evaluation of 311 events from an open cycling dataset without retraining, accuracy was 0.89. Three of four well-represented categories maintained F1 at or above 0.90 under domain shift. The largest degradation was observed for minivan (F1 = 0.72), driven by abstention rate rising from 2.4% to 25.0% rather than active misclassification, consistent with the mechanism propagating genuine model uncertainty. The full pipeline, including inference scripts, training code, evaluation utilities, and model weights, is released as open-source software to support reproducibility and reuse across roadside video archives and cycling safety research.
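
The confidence-based abstention step can be sketched as follows. The six class names and the 0.60 threshold come from the abstract; the function names, the class ordering, and the reading of "softmax output" as the maximum class probability are assumptions, and the logits would in practice come from the fine-tuned classifier.

```python
import math

# Class order is an assumption; the abstract only lists the six categories.
CLASSES = [
    "passenger car", "SUV", "pickup truck",
    "minivan", "large van", "commercial truck",
]


def softmax(logits: list) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_abstention(logits: list, threshold: float = 0.60) -> tuple:
    """Return (label, confidence); the label is "unknown" below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return CLASSES[best], probs[best]
```

A confident prediction such as `classify_with_abstention([5.0, 0.0, 0.0, 0.0, 0.0, 0.0])` keeps its label, while near-tied logits such as `[1.0, 0.9, 0.0, 0.0, 0.0, 0.0]` produce `"unknown"` rather than a silent misclassification.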


[69] 2508.14623

A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References

This paper examines the implications of using the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) as both evaluation and training objective in supervised speech separation, when the training references contain noise, as is the case with the de facto benchmark WSJ0-2Mix. A derivation of the SI-SDR with noisy references reveals that noise limits the achievable SI-SDR, or leads to undesired noise in the separated outputs. To address this, a method is proposed to enhance references and augment the mixtures with WHAM!, aiming to train models that avoid learning noisy references. Two models trained on these enhanced datasets are evaluated with the non-intrusive NISQA.v2 metric. Results show reduced noise in separated speech but suggest that processing references may introduce artefacts, limiting overall quality gains. Negative correlation is found between SI-SDR and perceived noisiness across models on the WSJ0-2Mix and Libri2Mix test sets, underlining the conclusion from the derivation.
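For readers unfamiliar with the metric, the standard SI-SDR definition can be sketched as follows. This is not the paper's noisy-reference derivation. The function name `si_sdr` and the toy signals are illustrative, and the signals are assumed to be zero-mean and of equal length.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (standard definition, zero-mean signals assumed)."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                      # optimal scaling of the reference
    target = [alpha * r for r in reference]       # projection of the estimate onto the reference
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    # Note: residual_energy is zero for a perfect estimate; callers should guard that case.
    return 10.0 * math.log10(target_energy / residual_energy)


reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]
print(si_sdr(estimate, reference))
print(si_sdr([3.0 * e for e in estimate], reference))  # rescaling leaves the value unchanged
```

In this toy case the value is approximately 20 dB. Because the reference is treated as the target, any noise contained in the reference counts as signal to be reproduced, which is the situation the abstract analyzes.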


[70] 2509.00478

Manifold Optimization-based Pilot Allocation for Cell-Free Massive MIMO ISAC Systems

We address the challenge of pilot design in cell-free massive multiple input multiple output (CF-mMIMO) integrated sensing and communications (ISAC) systems. We propose a novel pilot allocation framework based on manifold optimization that maximizes the system sum rate by minimizing coherence among pilot sequences, while enforcing unimodularity constraints in the frequency domain to ensure pilots are suitable for both communication and sensing tasks. Simulation results demonstrate that the proposed pilot design achieves communication performance comparable to state-of-the-art (SotA) algorithms, while delivering superior sensing capabilities due to its unimodular structure. These results highlight the potential of manifold-based pilot design for practical CF-mMIMO ISAC deployment.


[71] 2509.21597

AUDDT: A Unified Benchmark Toolkit for Audio and Speech Deepfake Detectors

With the prevalence of artificial intelligence (AI)-generated content, such as audio deepfakes, a large body of recent work has focused on developing deepfake detection techniques. However, existing benchmarks employ a narrow set of datasets, leaving detector generalization to real-world conditions uncertain. In this paper, we systematically review 31 existing audio deepfake datasets and present an open-source benchmarking toolkit called AUDDT (this https URL). The goal of this toolkit is to automate the evaluation of pretrained detectors across a wide range of speech and non-speech audio datasets, giving users direct feedback on the advantages and shortcomings of their deepfake detectors under diverse manipulation types and recording conditions. We start by showcasing the usage of the developed toolkit, the composition of our benchmark, and the breakdown of different deepfake subgroups. Next, we highlight how AUDDT differs from existing benchmarking efforts by enabling large-scale, diverse evaluation across modern spoofing methods and richer attribute-level analysis through comprehensive metadata annotation. Using a widely adopted pretrained deepfake detector, we present in- and out-of-domain detection results, revealing notable performance variability across different conditions and audio manipulation types. Lastly, we also analyze the limitations of these existing datasets and their gaps relative to practical deployment scenarios.


[72] 2510.09810

Designing Control Barrier Functions Using a Dynamic Backup Policy

This paper presents a systematic approach to construct control barrier functions for nonlinear control affine systems subject to arbitrary state and input constraints. Taking inspiration from the reference governor literature, the proposed method defines a family of backup policies, parametrized by the equilibrium manifold of the system. The control barrier function is defined on the augmented state-and-reference space: given a state-reference pair, the approach quantifies the distance to constraint violation at any time in the future. The proposed method is applied to an inverted pendulum on a cart.


[73] 2510.11044

Dual-Waveguide Pinching Antennas for PLS: Parallel Placement or Orthogonal Placement?

Pinching antennas (PAs), an emerging flexible-antenna technology, can be moved along waveguides to customize channel conditions over a large scale. This paper investigates an application of PAs to enable physical-layer security (PLS) by enlarging the channel condition diversity between legitimate users (LUs) and eavesdroppers (Eves). In particular, we focus on the dual-waveguide scenario, where the two waveguides employ multiple PAs to serve multiple LUs in the presence of an Eve. Specifically, we consider two waveguide placement strategies, i.e., parallel placement and orthogonal placement. We also incorporate two channel models: one with in-waveguide phase shifts only, and one with both in-waveguide phase shifts and attenuation. We formulate the secure sum rate (SSR) and secure energy efficiency (SEE) maximization problems, and propose a two-stage algorithm to solve them. The first stage adopts a particle swarm optimization (PSO) method with an improved feasibility module, termed FeaPSO, for PA placement, and the second stage employs the successive convex approximation (SCA) method to optimize beamforming and artificial noise vectors. Furthermore, we conduct numerical comparisons between the two placement strategies in terms of average performance and a special case where an Eve is positioned in front of LUs. Numerical results validate the effectiveness of the proposed algorithm and demonstrate that PAs can significantly improve both SSR and SEE. Additionally, the necessity of orthogonal waveguide placement is explicitly verified.


[74] 2510.20116

Interpolatory Approximations of PMU Data: Dimension Reduction and Pilot Selection

This work investigates the reduction of phasor measurement unit (PMU) data through low-rank matrix approximations. To reconstruct a PMU data matrix from fewer measurements, we propose the framework of interpolatory matrix decompositions (IDs). In contrast to methods relying on principal component analysis or singular value decomposition, IDs recover the complete data matrix using only a few of its rows (PMU datastreams) and/or a few of its columns (snapshots in time). This row-/column-based compression enables real-time monitoring of power transmission systems using measurements from a smaller subset of pilot datastreams, thereby minimizing communication bandwidth. The ID perspective gives a rigorous error bound on the quality of the data compression. We propose selecting the pilot measurements used in an ID via the discrete empirical interpolation method (DEIM), a greedy algorithm that aims to control the error bound. This bound yields a computable estimate of the reconstruction error during online operations. A violation of this estimate suggests a change in the system's operating conditions and thus serves as a tool for fault detection. Following a disturbance, DEIM can be used to localize the event source across all buses with high accuracy. Numerical tests on synthetic PMU data demonstrate DEIM's excellent performance in data compression and validate the proposed DEIM-based fault-detection and localization method.


[75] 2510.20253

Neural Directional Filtering with Configurable Directivity Pattern at Inference

Spatial filtering with a desired directivity pattern is advantageous for many audio applications. In this work, we propose neural directional filtering with user-defined directivity patterns (UNDF), which enables spatial filtering based on directivity patterns that users can define during inference. To achieve this, we propose a DNN architecture that integrates feature-wise linear modulation (FiLM), allowing user-defined patterns to serve as conditioning inputs. Through analysis, we demonstrate that the FiLM-based architecture enables the UNDF to generalize to unseen user-defined patterns during inference with higher directivities, scaling variations, and different steering directions. Furthermore, we progressively refine training strategies to enhance pattern approximation and enable UNDF to approximate irregular shapes. Lastly, experimental comparisons show that UNDF outperforms conventional methods.
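As a rough illustration of the conditioning mechanism named in the abstract, FiLM scales and shifts each feature using parameters predicted from a conditioning vector. The sketch below shows only that generic operation, not the UNDF architecture. The function names, the weights, and the encoding of the directivity pattern as a plain vector `cond` are all assumptions.

```python
def linear(weights, bias, x):
    """Dense layer y = W x + b for small list-based vectors."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: out_i = gamma_i(cond) * features_i + beta_i(cond)."""
    gamma = linear(w_gamma, b_gamma, cond)  # per-feature scale predicted from the condition
    beta = linear(w_beta, b_beta, cond)     # per-feature shift predicted from the condition
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


# Hypothetical numbers: two features, a one-dimensional conditioning input.
features = [1.0, 2.0]
cond = [0.5]
out = film(features, cond,
           w_gamma=[[2.0], [0.0]], b_gamma=[0.0, 1.0],
           w_beta=[[0.0], [4.0]], b_beta=[0.0, 0.0])
print(out)
```

The condition enters only through `gamma` and `beta`, so a new pattern can be supplied at inference time without changing the network weights.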


[76] 2601.03387

SEP Analysis of a Low-Resolution SIMO System with M-PSK over Fading Channels

In this paper, the average symbol error probability (SEP) of a phase-quantized single-input multiple-output (SIMO) system with M-ary phase-shift keying (PSK) modulation is analyzed under Rayleigh fading and additive white Gaussian noise. By leveraging a novel method, we derive exact SEP expressions for a quadrature PSK (QPSK)-modulated n-bit phase-quantized SIMO system with maximum ratio combining (SIMO-MRC), along with the corresponding high signal-to-noise ratio (SNR) characterizations in terms of diversity and coding gains. For a QPSK-modulated 2-bit phase-quantized SIMO system with selection combining, the diversity and coding gains are further obtained for an arbitrary number of receive antennas, complementing existing results. Interestingly, the proposed method also reveals a duality between a SIMO-MRC system and a phase-quantized multiple-input single-output (MISO) system with maximum ratio transmission, when the modulation order, phase-quantization resolution, antenna configuration, and the channel state information (CSI) conditions are reciprocal. This duality enables direct inference to obtain the diversity of a general M-PSK-modulated n-bit phase-quantized SIMO-MRC system, and extends the results to its MISO counterpart. All the above results have been obtained assuming perfect CSI at the receiver (CSIR). Finally, the SEP analysis of a QPSK-modulated 2-bit phase-quantized SIMO system is extended to the limited CSIR case, where the CSI at each receive antenna is represented by only 2 bits of channel phase information. In this scenario, the diversity gain is shown to be further halved in general.


[77] 2602.17434

Multi-Agent Temporal Logic Planning via Penalty Functions and Block-Coordinate Optimization

Multi-agent planning under Signal Temporal Logic (STL) is often hindered by collaborative tasks that lead to computational challenges due to the inherent high dimensionality of the problem, preventing scalable synthesis with satisfaction guarantees. To address this, we formulate STL planning as an optimization program under multi-agent STL constraints and introduce a penalty-based unconstrained relaxation that can be efficiently solved via a Block-Coordinate Gradient Descent (BCGD) method, where each block corresponds to a single agent's decision variables, thereby mitigating complexity. By utilizing a quadratic penalty function defined via smooth STL semantics, we show that BCGD iterations converge to a stationary point of the penalized problem under standard regularity assumptions. To enforce feasibility, the BCGD solver is embedded within a two-layer optimization scheme: inner BCGD updates are performed for a fixed penalty parameter, which is then increased in an outer loop to progressively improve multi-agent STL robustness. The proposed framework enables scalable computations and is validated through various complex multi-robot planning scenarios.
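The two-layer scheme described above can be illustrated on a toy problem. This is a generic quadratic-penalty sketch, not the paper's STL formulation: two scalar "agents" minimize x1^2 + x2^2 under a hypothetical coupling constraint x1 + x2 >= 2, which stands in for the smooth STL robustness constraint. All names and numbers are assumptions.

```python
def penalized_grad(x, rho, demand=2.0):
    """Gradient of x1^2 + x2^2 + rho * max(0, demand - x1 - x2)^2."""
    violation = max(0.0, demand - x[0] - x[1])
    return [2.0 * x[0] - 2.0 * rho * violation,
            2.0 * x[1] - 2.0 * rho * violation]


def bcgd(x, rho, sweeps=500):
    """Inner loop: block-coordinate gradient descent with one block per agent."""
    step = 1.0 / (2.0 + 2.0 * rho)  # inverse of the per-block Lipschitz constant
    for _ in range(sweeps):
        for agent in range(2):
            # Recompute the gradient so each agent sees the other's latest update.
            x[agent] -= step * penalized_grad(x, rho)[agent]
    return x


def solve(rho_schedule=(1.0, 10.0, 100.0)):
    """Outer loop: increase the penalty parameter, warm-starting the inner solver."""
    x = [0.0, 0.0]
    for rho in rho_schedule:
        x = bcgd(x, rho)
    return x


print(solve())
```

For this toy problem the penalized minimizer is x1 = x2 = 2*rho / (1 + 2*rho), which approaches the constrained optimum (1, 1) as rho grows. This mirrors how the outer loop progressively tightens constraint satisfaction.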


[78] 2602.23526

Training with Hard Constraints: Learning Neural Certificates and Controllers for SDEs

Due to their expressive power, neural networks (NNs) are promising templates for functional optimization problems, particularly for reach-avoid certificate generation for systems governed by stochastic differential equations (SDEs). However, ensuring hard-constraint satisfaction remains a major challenge. In this work, we propose two constraint-driven training frameworks with guarantees for supermartingale-based neural certificate construction and controller synthesis for SDEs. The first approach enforces certificate inequalities via domain discretization and a bound-based loss, guaranteeing global validity once the loss reaches zero. We show that this method also enables joint NN controller-certificate synthesis with hard guarantees. For high-dimensional systems where discretization becomes prohibitive, we introduce a partition-free, scenario-based training method that provides arbitrarily tight PAC guarantees for certificate constraint satisfaction. Benchmarks demonstrate scalability of the bound-based method up to 5D, outperforming the state of the art, and scalability of the scenario-based approach to at least 10D with high-confidence guarantees.


[79] 2603.01781

Goal-Oriented Access Optimization for ISAC-Enabled Digital Twins

Digital twins (DTs) of physical systems enable real-time remote tracking, control, and learning, but must be updated with environmental sensory data to maintain alignment with their physical counterparts. In a network context, integrated sensing and communication (ISAC) capabilities can expand the DT's environmental awareness by linking received updates to the location where wireless sensors acquired them. Integrating localization services, however, increases the complexity of the communication system, and can only be supported through smart access optimization. To tackle this problem, we design a two-step goal-oriented approach: first, sensors with a high Value of Information (VoI) inform the network of their resource demands through a push-based random access; then, pull-based scheduled transmissions of the actual sensory data are optimized to satisfy ISAC performance constraints. This design maximizes the VoI of the information delivered to the DT while locating the transmitting nodes, significantly outperforming existing schemes.


[80] 2603.29792

Where to Put Safety? Control Barrier Function Placement in Networked Control Systems

Control barrier functions (CBFs) are widely used to enforce safety in autonomous systems, yet their placement within networked control architectures remains largely unexplored. In this work, we investigate where to enforce safety in a networked control system in which a remote model predictive controller (MPC) communicates with the plant over a delayed network. We compare two safety strategies: i) a local myopic CBF filter applied at the plant and ii) predictive CBF constraints embedded in the remote MPC. For both architectures, we derive state-dependent disturbance tolerance bounds and show that safety placement induces a fundamental trade-off: local CBFs provide higher disturbance tolerance due to access to fresh state measurements, whereas MPC-CBF enables improved performance through anticipatory behavior, but yields stricter admissible disturbance levels. Motivated by this insight, we propose a combined architecture that integrates predictive and local safety mechanisms. The theoretical findings are illustrated in simulations on a planar three-degree-of-freedom robot performing a collision-avoidance task.


[81] 2605.20657

Cooling Channel Design Optimization for High Power Multi-Chip Packages

Thermal management is a major challenge in next-generation high-performance computing systems, particularly for heterogeneous multi-chip packages such as the NVIDIA GB200 Grace Blackwell Superchip. In this work, a physics-based computational framework is developed to optimize embedded cooling channel layouts for high-power multi-chip modules. The model couples steady-state heat conduction with a porous media-based representation of coolant transport, coupled with a row-wise coolant energy balance, to estimate chip temperature fields within microchannel networks. Unlike conventional designs, an interdigitated cooling architecture is parameterized using geometric variables, including channel count, width, and expansion over chip regions, enabling systematic design exploration. To enable efficient optimization, a surrogate-based approach is employed to approximate the relationship between geometric parameters and temperature metrics. The resulting model is optimized using a mixed-integer quadratic programming algorithm to minimize a weighted objective based on peak and average chip temperatures. To improve physical relevance, channel placement is further constrained to increase cooling coverage near GPU regions, where thermal loads are highest. The framework is applied to a representative multi-chip configuration based on NVIDIA GB200 architecture, consisting of two graphics processing units and one central processing unit. The results demonstrate that the optimal design reduces the peak chip temperature by 140.45°C and the average chip temperature by 35.87°C compared to the baseline configuration.


[82] 2605.25217

Backstepping Control of First-Order Hyperbolic Equations in Arbitrary Dimensions with Non-Trapping Characteristics

This paper presents a backstepping approach for the boundary control of first-order hyperbolic equations with spatially varying coefficients posed on domains of arbitrary dimension. The method is based on a change of variables induced by the characteristic flow of the time-invariant transport operator, transforming the original multidimensional system into a continuum of decoupled one-dimensional hyperbolic equations evolving along individual characteristic curves. A backstepping controller is then designed for each equation in the decomposition, and the resulting control laws are reassembled in the original coordinates to achieve finite-time stabilization of the full system. The framework relies on the existence of characteristic curves foliating the spatial domain, with uniformly bounded transit times (non-trapping).


[83] 2605.30457

Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels

Regional accent classification in Brazilian Portuguese (pt-BR) suffers from the need for reliable labeling. While large self-supervised learning (SSL) speech models are powerful, their training pipelines dilute sociophonetic information, since accent labels are generally not reliable or are not used in training objectives. This work introduces a novel workflow for feature extraction using only acoustic labels. By isolating explicit regional accent landmarks and using a phoneme-based forced aligner (ZIPA), our targeted feature set captures dialectal variance more effectively than utterance embeddings, demonstrating that localized features can outperform general-purpose architectures on accent-related tasks using minimal and objective data labels.


[84] 2606.01804

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unrelated characteristics. Despite rapid progress in Speech Large Language Models (Speech LLMs), systematic evaluation of this capability remains challenging, as existing benchmarks are fragmented across isolated editing tasks. To bridge this gap, we introduce SpeechEditBench, a bilingual multi-attribute benchmark for instruction-guided speech editing. SpeechEditBench encompasses seven atomic editing tasks, as well as compositional editing tasks that integrate multiple operations within a single instruction. We propose an anchor-based evaluation protocol that separately assesses the edit success of target attributes and the preservation of untargeted attributes, leading to three metrics: target success, preservation success, and joint success. Using this benchmark, we evaluate mainstream Speech LLMs and specialized speech editing systems. The results reveal three key findings: (1) no single model performs well across all editing dimensions; (2) closed-source Speech LLMs generally outperform open-source models; (3) compositional editing remains highly challenging, with even the most advanced models struggling to achieve high joint success. SpeechEditBench provides a rigorous diagnostic framework to identify bottlenecks in Speech LLMs, thereby facilitating the development of next-generation Speech LLMs with more robust and precise instruction-guided editing capabilities. Data and code are available at this https URL.


[85] 2606.03283

SpeakerCard-1M: An Evidence-Grounded Speaker Card Corpus for In-the-Wild Speaker Verification

Modern speaker verification (SV) systems rely on speaker embeddings that are effective but difficult to interpret or query in natural language. Most existing speech-text corpora target controllable synthesis or utterance-level captioning, and provide limited speaker-level supervision for in-the-wild speaker recognition. This paper introduces SpeakerCard-1M, a bilingual speaker-centric resource for evidence-grounded SV, derived from VoxCeleb1/2 and CN-Celeb1/2, where the "-1M" suffix refers to the 1.78M utterance-level captions contained in the release. We adopt a tool-first, LLM-last approach: ten acoustic probes produce field-level evidence, the evidence is aggregated into speaker profiles under a schema that separates relatively stable traits from utterance-level states, and bilingual Speaker Cards are rendered by a constrained LLM that sees only the structured fields. The release includes 56.7K Speaker Card records over 10.2K speakers, 1.78M utterance-level captions, and speaker-ID-disjoint hard-negative triplets. We further define two SV-oriented cross-modal protocols, bidirectional Speaker-Text Retrieval (T2S-R / S2T-R) and Attribute-Conditioned Verification (AC-Verify), and compare a dual-encoder baseline against recent audio language models under a zero-shot forced-choice setting. Joint audio-text training increases VoxCeleb1-O EER by 0.31% absolute over the audio-only baseline. Under a style-symmetric LLM-generated counterfactual protocol, eight recent audio language models (7B-30B+ parameters, both open- and closed-source) score 49-77% on pitch-level AC-Verify under two-way forced choice, compared with 88.66% reached by our dual encoder.


[86] 2606.03372

Instantaneous Risk Minimization for Secure Integrated Sensing and Communication

To ensure worst-case physical layer security, this paper proposes a robust beamforming framework for secure integrated sensing and communication (ISAC) systems. Different from conventional designs that focus on maximizing the ergodic secrecy rate, the proposed method aims to minimize instantaneous information leakage risk. We formulate a multi-objective optimization problem that jointly suppresses the worst-case eavesdropper signal-to-interference-plus-noise ratio (SINR), improves sensing accuracy, and ensures the quality of service (QoS) for legitimate users. To address the resulting non-convex problem, we develop a hierarchical iterative algorithm, in which the outer loop refines the continuous uncertainty regions based on the updated sensing performance, and the inner loop optimizes beamforming under the refined uncertainty regions. Theoretical analysis and simulation results demonstrate that the proposed method achieves per-transmission security guarantees with practical complexity.


[87] 2304.10891

Transformer-Based Autonomous Driving Models and Deployment-Oriented Compression: A Survey

Transformer-based models are becoming a central paradigm in autonomous driving because they can capture long-range spatial dependencies, multi-agent interactions, and multimodal context across perception, prediction, and planning. At the same time, their deployment in real vehicles remains difficult because high-capacity attention-based architectures impose substantial latency, memory, and energy overhead. This survey reviews representative Transformer-based autonomous driving models and organizes them by task role, sensing configuration, and architectural design. More importantly, it examines these models from a deployment-oriented perspective and analyzes how efficiency constraints reshape model design choices in practice. We further review compression and acceleration strategies relevant to Transformer-based driving systems, including quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention, and discuss their benefits, limitations, and task-dependent applicability. Rather than treating compression as an isolated post-processing step, we highlight it as a system-level design consideration that directly affects deployability, robustness, and safety. Finally, we identify open challenges and future research directions toward standardized, safety-aware, and hardware-conscious evaluation of efficient autonomous driving systems.


[88] 2405.15454

LiSeCo: Linear Semantic Control for Language Generation

The prevalence of Large Language Models (LLMs) in critical applications highlights the need for controlled language generation methods that are both computationally efficient and backed by performance guarantees. To address this need, we use a common model of concept semantics as linearly represented in an LLM's latent space. In particular, we take the view that natural language generation traces a trajectory in this continuous semantic space, realized by the language model's hidden activations. This view permits a control-theoretic treatment of text generation in latent space, in which we propose Linear Semantic Control (LiSeCo), a lightweight, gradient-free intervention that dynamically steers trajectories away from regions corresponding to undesired meanings. In particular, we propose to directly intervene, in an online fashion, on the activations of the token that is being generated in embedding space. Crucially, LiSeCo does not simply steer activations towards a desirable region. Instead, it relies on classical techniques from control theory to precisely control activations in a context-dependent way, and guarantees that they are brought into a specific pre-defined region of embedding space that corresponds to allowed semantics. The intervention is computed in closed form according to an optimal controller formulation, minimally impacting generation time. This control of the activations in embedding space allows for fine-grained steering of attributes of the generated sequence. We demonstrate that our approach is effective on different tasks -- toxicity, sentiment, and language (English/Spanish) steering -- while maintaining text quality.


[89] 2411.09760

SpecPCM: A Low-power PCM-based In-Memory Computing Accelerator for Full-stack Mass Spectrometry Analysis

Mass spectrometry (MS) is essential for proteomics and metabolomics but faces impending challenges in efficiently processing the vast volumes of data. This paper introduces SpecPCM, an in-memory computing (IMC) accelerator designed to achieve substantial improvements in energy and delay efficiency for both MS spectral clustering and database (DB) search. SpecPCM employs analog processing with low-voltage swing and utilizes recently introduced phase change memory (PCM) devices based on superlattice materials, optimized for low-voltage and low-power programming. Our approach integrates contributions across multiple levels: application, algorithm, circuit, device, and instruction sets. We leverage a robust hyperdimensional computing (HD) algorithm with a novel dimension-packing method and develop specialized hardware for the end-to-end MS pipeline to overcome the non-ideal behavior of PCM devices. We further optimize multi-level PCM devices for different tasks by using different materials. We also perform a comprehensive design exploration to improve energy and delay efficiency while maintaining accuracy, exploring various combinations of hardware and software parameters controlled by the instruction set architecture (ISA). SpecPCM, with up to three bits per cell, achieves speedups of up to 82x and 143x for MS clustering and DB search tasks, respectively, along with a four-orders-of-magnitude improvement in energy efficiency compared with state-of-the-art CPU/GPU tools.


[90] 2502.03799

Enhancing Hallucination Detection through Noise Injection

Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations. Effectively detecting hallucinations is therefore crucial for the safe deployment of LLMs. Recent research has linked hallucinations to model uncertainty, suggesting that hallucinations can be detected by measuring dispersion over answer distributions obtained from multiple samples drawn from a model. While drawing from the distribution over tokens defined by the model is a natural way to obtain samples, in this work, we argue that it is suboptimal for the purpose of detecting hallucinations. We show that detection can be improved significantly by taking into account model uncertainty in the Bayesian sense. To this end, we propose a very simple, training-free approach based on perturbing an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling. We demonstrate that our approach significantly improves inference-time hallucination detection over standard sampling across diverse datasets, model architectures, and uncertainty metrics.


[91] 2505.03057

$\mathcal{H}_2$-optimal model reduction of linear quadratic-output systems by multivariate rational interpolation

This paper addresses the $\mathcal{H}_2$-optimal approximation of linear dynamical systems with quadratic-output functions, also known as linear quadratic-output systems. Our major contributions are threefold. First, we derive interpolatory first-order optimality conditions for the linear quadratic-output $\mathcal{H}_2$ minimization problem. These conditions correspond to the mixed-multipoint tangential interpolation of the full-order linear- and quadratic-output transfer functions, and generalize the Meier-Luenberger optimality framework for the $\mathcal{H}_2$-optimal model reduction of linear time-invariant systems. Second, given the optimal interpolation data, we show how to enforce the interpolatory optimality conditions explicitly by Petrov-Galerkin projection of the full-order model. Third, to find the optimal interpolation data, we build on this projection framework and propose a generalization of the iterative rational Krylov algorithm for the $\mathcal{H}_2$-optimal model reduction of linear quadratic-output systems, called LQO-IRKA. Upon convergence, LQO-IRKA produces reduced linear quadratic-output systems that satisfy the interpolatory optimality conditions. The method only requires solving shifted linear systems and matrix-vector products, thus making it suitable for large-scale problems. Numerical examples are included to illustrate the effectiveness of the proposed method.


[92] 2505.15497

Certified Neural Approximations of Nonlinear Dynamics

Neural networks hold great potential to act as approximate models of nonlinear dynamical systems, with the resulting neural approximations enabling verification and control of such systems. However, in safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system. To address this fundamental challenge, we propose a novel, adaptive, and parallelizable verification method based on certified first-order models. Our approach provides formal error bounds on the neural approximations of dynamical systems, allowing them to be safely employed as surrogates by interpreting the error bound as bounded disturbances acting on the approximated dynamics. We demonstrate the effectiveness and scalability of our method on a range of established benchmarks from the literature, showing that it significantly outperforms the state of the art. Furthermore, we show that our framework can successfully address additional scenarios previously intractable for existing methods -- neural network compression and an autoencoder-based deep learning architecture for training Koopman operators for the purpose of trajectory prediction.


[93] 2508.08237

VGGSounder: Audio-Visual Evaluations for Foundation Models

The emergence of audio-visual foundation models underscores the importance of reliably assessing their multi-modal understanding. The VGGSound dataset is commonly used as a benchmark for evaluating audio-visual classification. However, our analysis identifies several limitations of VGGSound, including incomplete labelling, partially overlapping classes, and misaligned modalities. These lead to distorted evaluations of auditory and visual capabilities. To address these limitations, we introduce VGGSounder, a comprehensively re-annotated, multi-label test set that extends VGGSound and is specifically designed to evaluate audio-visual foundation models. VGGSounder features detailed modality annotations, enabling precise analyses of modality-specific performance. Furthermore, we reveal model limitations by analysing performance degradation when adding another input modality with our new modality confusion metric.


[94] 2510.03511

Platonic Transformers: A Solid Choice For Equivariance

While widespread, Transformers lack inductive biases for geometric symmetries common in science and computer vision. Existing equivariant methods often sacrifice the efficiency and flexibility that make Transformers so effective through complex, computationally intensive designs. We introduce the Platonic Transformer to resolve this trade-off. By defining attention relative to reference frames from the Platonic solid symmetry groups, our method induces a principled weight-sharing scheme. This enables combined equivariance to continuous translations and Platonic symmetries, while preserving the exact architecture and computational cost of a standard Transformer. Furthermore, we show that this attention is formally equivalent to a dynamic group convolution, which reveals that the model learns adaptive geometric filters and enables a highly scalable, linear-time convolutional variant. Across diverse benchmarks in computer vision (CIFAR-10), 3D point clouds (ScanObjectNN), and molecular property prediction (QM9, OMol25), the Platonic Transformer achieves competitive performance by leveraging these geometric constraints at no additional cost.


[95] 2511.00569

Advancing Fluid Antenna-Assisted Non-Terrestrial Networks in 6G and Beyond: Fundamentals, State of the Art, and Future Directions

With the surging demand for ultra-reliable, low-latency, and ubiquitous connectivity in Sixth-Generation (6G) networks, Non-Terrestrial Networks (NTNs) emerge as a key complement to terrestrial networks by offering flexible access and global coverage. Despite the significant potential, NTNs still face critical challenges, including dynamic propagation environments, energy constraints, and dense interference. As a key 6G technology, Fluid Antennas (FAs) can reshape wireless channels by reconfiguring radiating elements within a limited space, such as their positions and rotations, to provide higher channel diversity and multiplexing gains. Compared to fixed-position antennas, FAs can present a promising integration path for NTNs to mitigate dynamic channel fading and optimize resource allocation. This paper provides a comprehensive review of FA-assisted NTNs. We begin with a brief overview of the classical structure and limitations of existing NTNs, the fundamentals and advantages of FAs, and the basic principles of FA-assisted NTNs. We then investigate the joint optimization solutions, detailing the adjustments of FA configurations, NTN platform motion modes, and resource allocations. We also discuss the combination with other emerging technologies and explore FA-assisted NTNs as a novel network architecture for intelligent function integrations. Furthermore, we delve into the physical layer security and covert communication in FA-assisted NTNs. Finally, we highlight the potential future directions to empower broader applications of FA-assisted NTNs.


[96] 2601.05686

Secure Multiuser Beamforming With Movable Antenna Arrays

A movable antenna (MA)-enabled secure multiuser transmission framework is developed to enhance physical-layer security. Novel expressions are derived to characterize the achievable sum secrecy rate based on the secure channel coding theorem. On this basis, a joint optimization algorithm for digital beamforming and MA placement is proposed to maximize the sum secrecy rate via fractional programming and block coordinate descent. In each iteration, every variable admits either a closed-form update or a low-complexity one-dimensional or bisection search, which yields an efficient implementation. Numerical results demonstrate the effectiveness of the proposed method and show that the MA-enabled design achieves higher secrecy rates than conventional fixed-position antenna arrays.


[97] 2601.18175

Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transformers -- yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing policy improvement subject to a $\chi^2$ divergence constraint whose radius is determined automatically by the data. This yields an identity: relative policy improvement, the magnitude of policy change, and a quantity we call action-influence -- measuring how random variation in action choices affects success rates -- are exactly equal at every state. Success conditioning thus emerges as a conservative improvement operator. Exact success conditioning cannot degrade performance or induce dangerous distribution shift, but when it fails, it does so observably, by hardly changing the policy at all. We apply our theory to the common practice of return thresholding, showing this can amplify improvement, but at the cost of potential misalignment with the true objective.
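
The identity stated in the abstract can be checked numerically in the simplest possible setting, a single state with a handful of actions. The sketch below is only an illustration under assumptions: the policy and success probabilities are made up, and "action-influence" is taken here to mean the variance of the per-action success probability under the policy, normalized by the squared success rate, which may differ from the paper's exact definition.

```python
import numpy as np

def success_condition(pi, q):
    """Reweight a policy by success probability: pi_plus(a) is proportional to pi(a) * q(a)."""
    J = float(pi @ q)              # success rate of the original policy
    return pi * q / J, J

# Hypothetical single-state example (values are not from the paper).
pi = np.array([0.5, 0.3, 0.2])     # behaviour policy over three actions
q = np.array([0.2, 0.6, 0.9])      # probability of success for each action

pi_plus, J = success_condition(pi, q)
J_plus = float(pi_plus @ q)        # success rate after imitating successes

relative_improvement = (J_plus - J) / J
chi2 = float(np.sum((pi_plus - pi) ** 2 / pi))          # chi^2 divergence of pi_plus from pi
action_influence = float(pi @ (q - J) ** 2) / J ** 2    # assumed definition, see lead-in

print(relative_improvement, chi2, action_influence)
```

In this one-step case all three quantities reduce to Var(q) / J^2, so they coincide. They vanish when every action has the same success probability, which matches the abstract's remark that failure shows up as an almost unchanged policy.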


[98] 2602.15202

Tomography by Design: An Algebraic Approach to Low-Rank Quantum States

We present an algebraic algorithm for quantum state tomography that leverages measurements of certain observables to estimate structured entries of the underlying density matrix. Under low-rank assumptions, the remaining entries can be obtained solely using standard numerical linear algebra operations. The proposed algebraic matrix completion framework applies to a broad class of generic, low-rank mixed quantum states and, compared with state-of-the-art methods, is computationally efficient while providing deterministic recovery guarantees.
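
To make the low-rank completion idea concrete, the sketch below uses a standard linear-algebra fact rather than the paper's algorithm: if a rank-r Hermitian matrix has an invertible leading r-by-r block A and off-diagonal block B, the remaining block equals B^H A^{-1} B. The random state, the block partition, and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 2                         # hypothetical dimension and rank

# Random rank-r density matrix: rho = X X^H / trace(X X^H).
X = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
rho = X @ X.conj().T
rho /= np.trace(rho).real

# Pretend only the first r rows (and, by Hermitian symmetry, columns) were estimated.
A = rho[:r, :r]                     # leading r x r block, assumed invertible
B = rho[:r, r:]                     # off-diagonal block

# Complete the unmeasured block with one linear solve.
C_completed = B.conj().T @ np.linalg.solve(A, B)

rho_completed = np.block([[A, B], [B.conj().T, C_completed]])
max_error = np.max(np.abs(rho_completed - rho))
print(max_error)                    # expected near round-off for exact data
```

With noisy measured entries, the solve against A would amplify errors when A is poorly conditioned, so a practical scheme would need to choose the measured entries with care.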


[99] 2602.23214

Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction

Plug-and-Play diffusion prior (PnPDP) frameworks have emerged as a powerful paradigm for solving imaging inverse problems by treating pretrained generative models as modular priors. However, we identify a critical flaw in prevailing PnP solvers (e.g., based on HQS or Proximal Gradient): they function as memoryless operators, updating estimates solely based on instantaneous gradients. This lack of historical tracking inevitably leads to non-vanishing steady-state bias, where the reconstruction fails to strictly satisfy physical measurements under heavy corruption. To resolve this, we propose Dual-Coupled PnP Diffusion (DC-PnPDP), which restores the classical dual variable to provide integral feedback, progressively enforcing agreement between data consistency and the prior. However, this rigorous geometric coupling introduces a secondary challenge: the accumulated dual residuals exhibit spectrally colored, structured artifacts that violate the Additive White Gaussian Noise (AWGN) assumption of diffusion priors, causing severe hallucinations. To bridge this gap, we introduce Spectral Homogenization (SH), a frequency-domain adaptation mechanism that modulates these structured residuals into statistically compliant pseudo-AWGN inputs. This effectively aligns the solver's rigorous optimization trajectory with the denoiser's valid statistical manifold. Extensive experiments on CT and MRI reconstruction demonstrate that our approach resolves the bias-hallucination trade-off, achieving state-of-the-art fidelity with significantly accelerated convergence. The code is available at this https URL.


[100] 2602.23312

Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction

Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies: prompt engineering and fine-tuning, studied under zero-shot and one-shot interaction modes, compared with an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming baseline and prompt-engineered approaches. However, results also indicate a performance degradation in one-shot modes, where increased context length challenges the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability on the edge.


[101] 2603.07584

Analysis-Driven Procedural Generation of an Engine Sound Dataset with Embedded Control Annotations

Computational engine sound modeling is central to the automotive audio industry, particularly for active sound design applications and virtual prototyping. Emerging data-driven engine sound synthesis methods require large volumes of standardized, clean audio recordings with precisely time-aligned operating-state annotations: data that is difficult to obtain due to high costs, specialized measurement equipment requirements, and inevitable noise contamination. We present an analysis-driven framework for generating engine audio with sample-accurate control annotations. The method extracts harmonic structures from real recordings through pitch-adaptive spectral analysis, which then drive an extended parametric harmonic-plus-noise synthesizer. With this framework, we augment 5-10 min of source audio per engine 15-30x via diverse control trajectories and parametric variation, producing the Procedural Engine Sounds Dataset (19.0 h, 5,935 files): a set of engine audio signals with sample-accurate RPM and torque annotations spanning a wide range of operating conditions, signal complexities, and harmonic profiles. Comparison against real recordings validates that the synthesized data preserves characteristic harmonic structures, and a baseline differentiable synthesis network trained on the dataset confirms its suitability for data-driven engine sound modeling. The dataset is released publicly to support research on engine timbre analysis, control parameter estimation, and neural generative synthesis.


[102] 2603.09391

Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis

Engine sounds originate from sequential exhaust pressure pulses rather than sustained harmonic oscillations. While neural synthesis methods typically aim to approximate the resulting spectral characteristics, we propose directly modeling the underlying pulse shapes and temporal structure. We present the Pulse-Train-Resonator (PTR) model, a differentiable synthesis architecture that generates engine audio as parameterized pulse trains aligned to engine firing patterns and propagates them through recursive Karplus-Strong resonators simulating exhaust acoustics. The architecture integrates physics-informed inductive biases including harmonic decay, thermodynamic pitch modulation, valve-dynamics envelopes, exhaust system resonances and derived engine operating modes such as throttle operation and Deceleration Fuel Cutoff (DFCO). Validated on three diverse engine types totaling 7.5 hours of audio, PTR achieves a 21% improvement in harmonic reconstruction and a 5.7% reduction in total loss over a harmonic-plus-noise baseline model, while providing interpretable parameters corresponding to physical phenomena. Complete code, model weights, and audio examples are openly available.
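
The signal path described in the abstract, a pulse train at the firing rate fed through a recursive Karplus-Strong resonator, can be sketched in a few lines. This is not the PTR model: it is non-differentiable, uses unit impulses instead of learned pulse shapes, and every parameter value (RPM, cylinder count, delay, feedback gain) is an assumption chosen for illustration.

```python
import numpy as np

def pulse_train(rpm, cylinders, sample_rate, duration):
    """Unit impulses at the firing rate of a four-stroke engine (assumed model)."""
    firing_hz = rpm / 60.0 * cylinders / 2.0   # each cylinder fires once per two revolutions
    n = int(sample_rate * duration)
    x = np.zeros(n)
    period = sample_rate / firing_hz           # samples between consecutive pulses
    x[np.arange(0, n, period).astype(int)] = 1.0
    return x

def karplus_strong_resonator(x, delay, feedback=0.9):
    """Recursive comb filter with a two-tap average in the feedback path:
    y[n] = x[n] + feedback * 0.5 * (y[n - delay] + y[n - delay - 1])."""
    y = np.zeros(len(x))
    for i in range(len(x)):
        a = y[i - delay] if i >= delay else 0.0
        b = y[i - delay - 1] if i >= delay + 1 else 0.0
        y[i] = x[i] + feedback * 0.5 * (a + b)
    return y

sample_rate = 16000
x = pulse_train(rpm=3000, cylinders=4, sample_rate=sample_rate, duration=0.25)
y = karplus_strong_resonator(x, delay=40, feedback=0.9)
```

The two-tap average acts as a gentle low-pass in the loop, so higher partials decay faster than lower ones. A feedback gain below one keeps the recursion stable.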


[103] 2604.12888

Advancing Network Digital Twin Framework for Generating Realistic Datasets

The integration of accurate and reproducible wireless network simulations is a key enabler for research on open, virtualized, and intelligent communication systems. Network Digital Twins (NDTs) provide a scalable alternative to costly and time-consuming measurement campaigns, while enabling controlled experimentation and data generation for data-driven network design. In this paper, we present an open and user-friendly NDT framework that integrates controllable vehicular mobility with the site-specific ray tracer Sionna and the discrete-event ns-3 network simulator, enabling virtualized end-to-end modeling of wireless networks across the radio, network, and application layers. The proposed framework is particularly well-suited for dynamic vehicular networks and urban deployments, supporting realistic mobility, traffic dynamics, and the extraction of cross-layer metrics. To promote open-source initiatives, we release both the NDT implementation and a representative dataset generated from realistic vehicular and urban scenarios. The framework and dataset facilitate reproducible experimentation and benchmarking of machine learning-based quality of service prediction, network optimization, and intelligent network management algorithms, lowering the entry barrier for research on virtual and open wireless network services.