New articles on Electrical Engineering and Systems Science


[1] 2606.06509

Which Anatomy Matters Under Limited Labels? A Data-Efficient Anatomy-Aware Benchmark for Cardiac Pathology Prediction

Many medical imaging problems must be solved with limited labels and constrained compute, yet it remains unclear whether performance gains are driven mainly by more expressive models or by better representation of clinically meaningful anatomy. We study this question through a low-data, anatomy-aware benchmark for 5-class cardiac pathology prediction on the public ACDC MRI dataset. Using segmentation-derived patient descriptors from the right ventricle, myocardium, and left ventricle, we compare anatomy-specific and multi-structure representations across linear, kernel, and tree-based classifiers. We find that in limited-label settings, representation dominates model complexity. These results suggest that in resource-constrained healthcare settings, identifying and representing the most informative anatomy may matter more than increasing model complexity alone.
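
As a rough illustration of segmentation-derived descriptors, the sketch below computes per-structure volumes and ventricular ejection fractions from end-diastolic (ED) and end-systolic (ES) label maps. It assumes the ACDC label convention 1 = RV, 2 = myocardium, 3 = LV cavity and a known voxel spacing in millimetres; the paper's exact descriptor set is not given here, and the names `LABELS` and `patient_descriptors` are hypothetical.

```python
import numpy as np

# Assumed ACDC-style label convention (hypothetical mapping for this sketch).
LABELS = {"RV": 1, "MYO": 2, "LV": 3}

def structure_volume_ml(mask, label, spacing_mm):
    """Volume of one labeled structure in millilitres (1 mL = 1000 mm^3)."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return np.count_nonzero(mask == label) * voxel_mm3 / 1000.0

def patient_descriptors(mask_ed, mask_es, spacing_mm):
    """Per-structure ED/ES volumes plus RV and LV ejection fractions."""
    feats = {}
    for name, label in LABELS.items():
        feats[f"{name}_EDV"] = structure_volume_ml(mask_ed, label, spacing_mm)
        feats[f"{name}_ESV"] = structure_volume_ml(mask_es, label, spacing_mm)
    for name in ("RV", "LV"):
        edv, esv = feats[f"{name}_EDV"], feats[f"{name}_ESV"]
        # Ejection fraction is undefined when the structure is absent at ED.
        feats[f"{name}_EF"] = (edv - esv) / edv if edv > 0 else float("nan")
    return feats
```

An anatomy-specific representation then corresponds to keeping only one structure's keys (for example the `LV_*` entries), while a multi-structure representation keeps all of them.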


[2] 2606.06512

Dilated Symmetric Difference for Binary Image Comparison

The comparison of two binary images is formulated in terms of mathematical morphology. A new operator, the dilated symmetric difference, is introduced. It is shown that the dilated symmetric difference effectively detects differences between binary images, provided that the residual alignment error is within specified bounds.
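
The abstract does not spell out the operator, so the following is only one plausible reading, stated as an assumption: each image is compared against a dilated copy of the other, so foreground pixels displaced by at most the dilation radius are not reported as differences. Images are represented as sets of foreground (row, col) pixels and the structuring element is a square; both choices are illustrative.

```python
def dilate(pixels, radius):
    """Binary dilation of a set of (row, col) foreground pixels by a (2r+1)x(2r+1) square."""
    return {(r + dr, c + dc)
            for (r, c) in pixels
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)}

def dilated_symmetric_difference(a, b, radius):
    """Assumed form: pixels of A outside dilated B, plus pixels of B outside dilated A."""
    return (a - dilate(b, radius)) | (b - dilate(a, radius))
```

With `radius = 0` this reduces to the ordinary symmetric difference; a larger radius tolerates a larger residual misalignment, which mirrors the abstract's condition that alignment error stay within specified bounds.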


[3] 2606.06524

Advanced Flood Prediction with Physics-Guided Deep Learning: Combining UNet, FNO, and SAR/Optical Imagery

Accurate and scalable flood mapping remains challenging due to limited ground observations, heterogeneous terrain conditions, and the difficulty of enforcing hydrodynamic consistency within data-driven models. This work introduces a physics-guided deep learning framework that integrates multi-modal remote sensing (Sentinel-1 SAR, Sentinel-2 optical imagery, and DEM-derived terrain features) with constraints from the depth-averaged shallow water equations (SWE). The proposed hybrid architecture combines a UNet to capture fine-scale spatial details with a Fourier Neural Operator (FNO) to model basin-scale hydraulic interactions, while physics-informed residual losses enforce mass and momentum consistency. Evaluated across diverse floodplain settings, the hybrid model achieves an Intersection over Union of 0.82 and an F1 score of 0.90 for flood extent prediction, outperforming UNet-only and FNO-only baselines. Using hydrodynamic simulations as reference data, the model achieves an RMSE of 0.21 m for water depth and 0.15 m/s for flow velocity. Physics consistency is maintained, with low residuals and mass imbalance below 2.1%. Ablation studies confirm that removing physics-based regularization significantly degrades performance, underscoring the value of physical constraints for stability and generalization. These results demonstrate that embedding hydrodynamic principles into deep learning yields more accurate, reliable, and physically coherent flood predictions, offering strong potential for operational monitoring and large-scale deployment.
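
A minimal sketch of a mass-conservation penalty of the kind described, assuming the continuity part of the depth-averaged SWE, dh/dt + d(hu)/dx + d(hv)/dy = 0, discretized with a forward difference in time and central differences in space on a regular grid. The momentum residuals and the UNet/FNO networks are omitted, and the function names are hypothetical.

```python
import numpy as np

def continuity_residual(h_prev, h_next, u, v, dt, dx, dy):
    """Finite-difference residual of dh/dt + d(hu)/dx + d(hv)/dy on a 2D grid.

    Arrays are indexed [y, x]; spatial derivatives use np.gradient (central
    differences in the interior) evaluated at the new time level.
    """
    dh_dt = (h_next - h_prev) / dt
    dqx_dx = np.gradient(h_next * u, dx, axis=1)
    dqy_dy = np.gradient(h_next * v, dy, axis=0)
    return dh_dt + dqx_dx + dqy_dy

def mass_loss(h_prev, h_next, u, v, dt, dx, dy):
    """Mean squared continuity residual, usable as a physics-informed penalty term."""
    r = continuity_residual(h_prev, h_next, u, v, dt, dx, dy)
    return float(np.mean(r ** 2))
```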


[4] 2606.06534

Attention Consistent Longitudinal Medical Visual Question Answering Guided by Vision Foundation Models

Longitudinal medical visual question answering (VQA) requires reasoning about anatomical differences between an image acquired at the current time point and one acquired at a reference time point. We propose an attention-guided encoder-decoder for this task on chest X-rays. Instead of contrasting the raw image pair directly, we include a lightweight affine registration module that reduces nuisance motion by co-registering the current image to the reference image under a small registration regularizer. The registered image pair is fed into the image encoder, followed by a frozen DINO-based mask generator and a trainable adaptive mask generator that produce masks applied to the original image pair. The masked image pair is fed into the image encoder again, and the resulting features are concatenated with text features as input to a multimodal transformer-based decoder that generates the final answers. To stabilize learning and clarify the change signal, and inspired by DINOv3, we add auxiliary objectives: a mask rebuilding loss, a pairwise Gram-style consistency loss, and a KoLeo uniformity loss, which together improve the geometry of the representation. On the Medical-Diff-VQA benchmark, the model delivers strong BLEU, ROUGE-L, CIDEr, and METEOR scores while offering intrinsic interpretability through the shared saliency mask. These results support saliency-conditioned generation with mild pre-alignment as a principled framework for longitudinal reasoning in medical VQA. Our training strategy also illustrates a promising paradigm for using image foundation models in biomedicine: optimizing supervised and unsupervised objectives simultaneously.
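
The KoLeo regularizer mentioned above (the Kozachenko-Leonenko entropy-based term also used in DINOv2) penalizes embeddings whose nearest neighbour is very close, pushing a batch toward a more uniform spread on the unit sphere. The NumPy sketch below computes it for one batch; how the paper weights it against the mask and Gram losses is not stated, and the function name is our own.

```python
import numpy as np

def koleo_loss(features, eps=1e-8):
    """KoLeo: -mean_i log(min_{j != i} ||z_i - z_j||) on L2-normalized features."""
    z = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    # Pairwise Euclidean distances; mask the diagonal so a point is not its own neighbour.
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nearest = d.min(axis=1)
    return float(-np.mean(np.log(nearest + eps)))
```

Tightly clustered embeddings have tiny nearest-neighbour distances and hence a large loss, so minimizing it spreads the representations apart.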


[5] 2606.06540

ErA: Error-Aware Deep Unrolling Network for Single Image Defocus Deblurring

We introduce ErA (Error-Aware Deep Unrolling Network), an end-to-end framework for single-image defocus deblurring. ErA jointly learns a compact kernel basis and per-pixel weights, while an error-aware term in Augmented Lagrangian unrolling corrects kernel estimation errors via alternating updates and ResUNet denoisers. It achieves state-of-the-art PSNR/SSIM on DPDD, RealDOF, and RTF, and shows strong generalization on CUHK, for which no ground truth is available.
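
The blur model in the abstract, a compact kernel basis mixed by per-pixel weights, can be written as B = sum_k W_k * (K_k applied to S), where S is the sharp image and W_k are per-pixel weight maps. The sketch below synthesizes such a spatially varying blur with NumPy. It shows only this forward model (not the error-aware unrolled solver), uses a naive zero-padded correlation (identical to convolution for symmetric defocus kernels), and all names are hypothetical.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 2D correlation with zero padding; output has img's shape (odd-sized kernel)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def spatially_varying_blur(sharp, basis, weights):
    """Blur = sum_k weights[k] * (basis[k] applied to sharp); weights are per-pixel maps."""
    return sum(w * conv2d_same(sharp, k) for k, w in zip(basis, weights))
```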


[6] 2606.06640

SEMIKHORN: Globally balanced affinities for mmWave Localization in MU mMIMO systems

This work conceives SEMIKHORN, a semi-supervised channel charting (CC) framework for mmWave localization, which leverages t-SNEkhorn, a doubly stochastic variant of t-distributed Stochastic Neighbor Embedding (t-SNE) that uses entropic optimal transport to construct pairwise similarities. Unlike standard t-SNE, which normalizes affinities independently for each data point, t-SNEkhorn generates globally balanced similarities, ensuring consistent neighborhood representation across all points. We consider wireless networks with distributed base stations (BSs) equipped with multiple antennas, where each BS constructs a local dissimilarity matrix from the channel state information (CSI). These local dissimilarity matrices are then fused into a single global dissimilarity matrix, which is processed through manifold learning to embed users onto a geometric map. The performance is evaluated in a simulated outdoor environment, and Bayesian optimization is applied to the framework hyperparameters to minimize the mean localization error (MLE). Experimental results demonstrate that the proposed framework achieves an MLE of 6.86% within a circular area of radius 100 m while requiring labels for less than 15% of the CSI samples.
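
To illustrate what "globally balanced" means, the sketch below rescales a Gaussian affinity kernel into a symmetric, doubly stochastic matrix with a damped symmetric Sinkhorn iteration, so that every user's affinities sum to one. This is a simplification: it assumes a single global bandwidth `eps` in place of t-SNEkhorn's entropy-constrained affinities, and fusing the per-BS matrices by an elementwise mean is likewise only an assumption.

```python
import numpy as np

def fuse_dissimilarities(local_mats):
    """Hypothetical fusion rule: elementwise mean of the per-BS dissimilarity matrices."""
    return np.mean(np.stack(local_mats), axis=0)

def symmetric_sinkhorn_affinities(dissim, eps, n_iter=500, tol=1e-10):
    """Doubly stochastic P = diag(a) K diag(a) with K = exp(-D / eps) and zero diagonal."""
    K = np.exp(-dissim / eps)
    np.fill_diagonal(K, 0.0)  # no self-affinity, as in t-SNE
    a = np.ones(K.shape[0])
    for _ in range(n_iter):
        # Damped symmetric update; its fixed point satisfies a_i * (K a)_i = 1 (unit row sums).
        a_new = np.sqrt(a / (K @ a))
        converged = np.max(np.abs(a_new - a)) < tol
        a = a_new
        if converged:
            break
    return a[:, None] * K * a[None, :]
```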


[7] 2606.06642

MPC for nonlinear systems: a comparative review of discretization methods

This work provides a comparative review of three numerical methods commonly used to discretize the continuous-time nonlinear equations appearing in model predictive control problems: direct multiple shooting, direct collocation, and successive linearizations. An overview of the characteristics of each method is given, and the performance of each method is evaluated through the simulation of two test cases.
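
Direct multiple shooting, one of the three methods compared, splits the horizon into intervals, integrates the dynamics separately on each, and adds continuity ("defect") constraints that the NLP solver drives to zero. The sketch below computes those defects with a fixed-step RK4 integrator; the damped pendulum model is only an illustrative assumption.

```python
import numpy as np

def pendulum(x, u):
    """Illustrative nonlinear dynamics (assumption): damped pendulum with torque input u."""
    theta, omega = x
    return np.array([omega, -9.81 * np.sin(theta) - 0.1 * omega + u])

def rk4_step(f, x, u, h):
    """One fixed-step Runge-Kutta 4 integration over [t, t + h] with constant input u."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def shooting_defects(f, nodes, controls, h):
    """Continuity constraints x_{k+1} - Phi(x_k, u_k) = 0 for every shooting interval."""
    return np.concatenate([nodes[k + 1] - rk4_step(f, nodes[k], controls[k], h)
                           for k in range(len(controls))])
```

Direct collocation would instead impose polynomial-interpolation conditions inside each interval, and successive linearization would replace `f` by its Jacobian-based approximation around a reference trajectory.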


[8] 2606.06672

Variational Bayes Estimation for Affine-Precoded Superimposed Pilots in Partially Connected Dual-Wideband Tera-Hertz MU-MIMO Systems

This work conceives two affine-precoding-based system models: common precoding with joint channel estimation (CP-JCE) and user-specific precoding for decoupled channel estimation (USPDCE). Considering a partially connected architecture subject to the dual-wideband effect, we rigorously model the terahertz (THz) multiple-input multiple-output (MIMO) channel for each subarray corresponding to each user by incorporating absorption, reflection, and free-space losses. Next, to address the significant bandwidth overhead associated with conventional pilot-based channel estimation, we employ superimposed pilots. Building on this, we formulate a structured sparse channel model and develop a variational Bayesian inference algorithm that jointly estimates the channel coefficients and learns the underlying sparsity structure through hyperparameter inference, thereby enabling robust, high-precision superimposed-pilot-based channel estimation under severe model uncertainty. Lastly, we compare the results for both systems and provide a trade-off analysis between them.
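
The sparsity-learning step can be illustrated with classic EM-based sparse Bayesian learning (SBL), a simpler relative of the variational Bayes algorithm described above, applied here to a real-valued linear model y = A x + n. This is a stand-in, not the paper's estimator: superimposed pilots, the THz channel structure, and complex-valued signals are all omitted, and the noise variance is assumed known.

```python
import numpy as np

def sbl_em(A, y, noise_var, n_iter=300):
    """EM sparse Bayesian learning for y = A x + n with prior x_i ~ N(0, gamma_i)."""
    m, n = A.shape
    gamma = np.ones(n)
    for _ in range(n_iter):
        G = gamma[:, None] * A.T                  # Gamma A^T, shape (n, m)
        S_y = noise_var * np.eye(m) + A @ G       # marginal covariance of y
        mu = G @ np.linalg.solve(S_y, y)          # posterior mean of x
        # Diagonal of the posterior covariance Gamma - Gamma A^T S_y^{-1} A Gamma.
        sigma_diag = gamma - np.sum(G * np.linalg.solve(S_y, G.T).T, axis=1)
        # EM hyperparameter update; the floor guards against tiny negative round-off.
        gamma = np.maximum(mu ** 2 + sigma_diag, 1e-12)
    return mu, gamma
```

Hyperparameters of inactive entries shrink toward zero over the iterations, which is how the sparsity structure is learned rather than fixed in advance.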


[9] 2606.06705

Estimating Evolving Functions with Dynamic Gaussian Processes

This paper develops the Dynamic Gaussian Process (DGP), a framework for estimating functions governed by integro-difference equations (IDEs). IDEs model continuous functions that evolve with discrete-time dynamics and arise naturally from time-discretization of linear partial differential equations (PDEs). The DGP extends Gaussian process regression to time-varying functions and extends Kalman filtering to infinite-dimensional states. The DGP posterior remains a Gaussian process with closed-form mean and covariance updates, and separable kernel structure reduces the problem to a finite-dimensional Kalman filter on basis function coefficients. This paper extends the DGP to vector-valued states, enabling the treatment of higher-order PDEs, and provides a stability and approximation error analysis for the basis function approximation. The functional L2 estimation error decomposes exactly into in-subspace and out-of-subspace contributions, and all approximation errors vanish as the number of basis functions grows. The framework is demonstrated on the heat equation and on the wave equation, the latter with a vector-valued state. Code is available at this https URL.
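
For the heat equation u_t = alpha u_xx on [0, L] with zero boundary values, a sine basis diagonalizes the dynamics: each coefficient decays by exp(-alpha (k pi / L)^2 dt) per step, so the function estimate reduces to a finite-dimensional Kalman filter on basis coefficients, as the abstract describes. The sketch below assumes this particular basis, point observations with i.i.d. noise, and an identity prior covariance; it is not the authors' released code.

```python
import numpy as np

def sine_basis(x, n_basis, length):
    """phi_k(x) = sin(k pi x / L), k = 1..n_basis (Dirichlet eigenfunctions of d^2/dx^2)."""
    k = np.arange(1, n_basis + 1)
    return np.sin(np.outer(x, k) * np.pi / length)

def heat_kalman(y_seq, x_obs, n_basis, length, alpha, dt, q, r):
    """Kalman filter on sine-basis coefficients of the 1D heat equation (a sketch)."""
    k = np.arange(1, n_basis + 1)
    F = np.diag(np.exp(-alpha * (k * np.pi / length) ** 2 * dt))  # exact modal decay
    H = sine_basis(x_obs, n_basis, length)                        # point observations
    m, P = np.zeros(n_basis), np.eye(n_basis)
    Q, R = q * np.eye(n_basis), r * np.eye(len(x_obs))
    for y in y_seq:
        m, P = F @ m, F @ P @ F.T + Q           # predict
        S = H @ P @ H.T + R
        K = np.linalg.solve(S, H @ P).T         # Kalman gain P H^T S^{-1}
        m = m + K @ (y - H @ m)                 # update mean
        P = (np.eye(n_basis) - K @ H) @ P       # update covariance
    return m, P
```

The estimated function at any x is then `sine_basis(x, n_basis, length) @ m`, which is where the in-subspace / out-of-subspace error split comes from: anything outside the span of the chosen basis is invisible to this filter.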


[10] 2606.06723

Deep Learning Based Sparse Array Design with Pre-Steering for Adaptive Beamforming

This paper investigates the use of convolutional neural networks (CNNs) for learning sparse array configurations that achieve near-optimal beamforming under varying source and interference angles. Unlike conventional or convex-optimization-based algorithms, the proposed deep learning approach enables rapid reconfiguration of sparse arrays in highly dynamic propagation environments. The paper considers a single desired source and a single interference signal at arbitrary angles, analyzing scenarios with both fixed and varying desired source directions. To avoid retraining for each possible source angle, an array pre-steering strategy is introduced, whereby the network is trained only at broadside, while test inputs are pre-steered to align with the broadside direction. To account for practical imperfections, the effect of pre-steering errors is examined, and a robust error-augmented training scheme is adopted. This scheme systematically incorporates small, structured pre-steering perturbations during training, enabling the network to maintain high classification accuracy and maximize the signal-to-interference-plus-noise ratio (SINR) even under angular uncertainty. The results demonstrate that the proposed method achieves over 90% test accuracy across wide ranges of source and interference angles, highlighting its potential for real-time, robust sparse array configuration in dynamic environments.
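
Pre-steering can be sketched for a uniform linear array (ULA): multiplying each element's samples by the conjugate of the desired-source steering phase makes that source appear at broadside (0 degrees), so a network trained only at broadside can be reused. The sketch assumes a ULA with half-wavelength spacing and steering phase 2 pi d n sin(theta); the function names are hypothetical.

```python
import numpy as np

def steering_vector(theta_deg, n_elements, spacing=0.5):
    """ULA steering vector; element spacing is given in wavelengths."""
    n = np.arange(n_elements)
    return np.exp(1j * 2 * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg)))

def pre_steer(snapshots, theta_s_deg, spacing=0.5):
    """Phase-align snapshots (elements x samples) so a source at theta_s looks like broadside."""
    a_s = steering_vector(theta_s_deg, snapshots.shape[0], spacing)
    return np.conj(a_s)[:, None] * snapshots
```

An interferer at theta_i keeps a residual phase progression proportional to sin(theta_i) - sin(theta_s) after pre-steering, and a mismatch in theta_s (a pre-steering error) leaves the desired source slightly off broadside, which is what error-augmented training is meant to absorb.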


[11] 2606.06725

Compute-Optimal Network Design for Echocardiography Myocardial Segmentation and Perfusion Quantification using Neural Scaling Laws

Myocardial perfusion quantification using contrast-enhanced ultrasound offers a bedside non-ionizing alternative to nuclear imaging modalities. However, its clinical adoption is hindered by time-consuming manual labelling. Automated segmentation has proved challenging due to a paucity of in-domain training data. Adapting strategies currently used to optimise large language models for large datasets, we apply neural scaling laws to predict network performance for myocardial segmentation. We extrapolate performance on subsets of the data to determine optimal network size on the CAMUS echocardiography dataset and a 25-patient contrast-enhanced ultrasound (CEUS) dataset. Finally, we validate the clinical utility of our models by comparing the final myocardial perfusion parameters with those obtained by a senior cardiologist. Extrapolation based on the scaling law is predictive of test loss at the full dataset size, allowing us to select two networks that obtained state-of-the-art performance on CAMUS with a 240-fold reduction in parameter count. We observe that the gradient of the scaling law transfers from CAMUS to the CEUS dataset, with a bias in the predicted losses. The automatically segmented masks perform equivalently to a senior cardiologist in myocardial perfusion quantification. These results establish neural scaling laws as a practical tool for data-driven compute-optimal model design for small imaging datasets.
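
A minimal sketch of the extrapolation step, assuming a pure power law in dataset size, loss = a * N^(-b), fitted by least squares in log-log space. The paper's exact functional form (for example an irreducible-loss offset or joint dependence on parameter count) is not given here, so treat this as illustrative only.

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Fit loss = a * N^(-b) by linear least squares in log-log space; returns (a, b)."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(-slope)

def predict_loss(a, b, n):
    """Extrapolate the fitted law to a new dataset size n."""
    return a * n ** (-b)
```

In use, one would fit on losses measured at small data fractions and compare `predict_loss` at the full dataset size across candidate network sizes; the slope b is the "gradient" whose transfer between datasets the abstract reports.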


[12] 2606.06732

Angular Sector-Based Sparse Array Design for Adaptive Beamforming Using Deep Learning

Efficient sparse array reconfigurability is essential for cognitive sensing in dynamic radio frequency environments, where rapid interference variations require both adaptability and stability. This work presents a framework for designing sparse arrays optimized over broad angular sectors, enabling near-optimal beamforming that maximizes the signal-to-interference-plus-noise ratio (SINR) across a range of interferer angles. Full data correlation matrices are computed for candidate configurations, and an angular-sector-based class reduction strategy is applied to merge adjacent sectors dominated by the same configuration, resulting in 56 representative classes. Controlled up- and down-sampling produce four dataset variants, combining high and low sample counts with balanced and unbalanced class distributions, to systematically evaluate the effects of dataset size and class distribution on neural network performance. A lightweight convolutional neural network (CNN) and a deeper ResNet-50 architecture are trained and evaluated on these datasets. Results demonstrate high classification accuracy, with ResNet-50 achieving up to 97.3%, while SINR deviations remain below 1% for most classes and below 5% even for challenging interference angles near broadside. The proposed approach enables robust sparse array selection, maintains strong SINR performance, reduces unnecessary reconfigurations, and provides an effective framework for real-time cognitive sensing and adaptive interference mitigation.
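
The per-configuration quantity being optimized can be sketched as the maximum (MVDR) SINR, SNR * a_s^H R_in^{-1} a_s, where R_in is the interference-plus-noise covariance. The code below assumes a single interferer, unit noise power, element positions in half-wavelength units, and exhaustive search over subsets of a uniform grid to produce the best configuration (the kind of class label a CNN would learn); all names and default powers are hypothetical.

```python
import numpy as np
from itertools import combinations

def steering(theta_deg, positions):
    """Steering vector for elements at the given positions (half-wavelength units)."""
    return np.exp(1j * np.pi * np.asarray(positions) * np.sin(np.deg2rad(theta_deg)))

def optimal_sinr(positions, theta_s, theta_i, snr=1.0, inr=100.0):
    """Max SINR of the MVDR beamformer: snr * a_s^H R_in^{-1} a_s (unit noise power)."""
    a_s = steering(theta_s, positions)
    a_i = steering(theta_i, positions)
    R_in = inr * np.outer(a_i, a_i.conj()) + np.eye(len(positions))
    return float(np.real(snr * a_s.conj() @ np.linalg.solve(R_in, a_s)))

def best_configuration(n_grid, n_active, theta_s, theta_i):
    """Exhaustive search over n_active-element subsets of an n_grid uniform grid."""
    return max(combinations(range(n_grid), n_active),
               key=lambda pos: optimal_sinr(pos, theta_s, theta_i))
```

Exhaustive search grows combinatorially with the grid size, which is the motivation for replacing it at run time with a trained classifier.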


[13] 2606.06792

Copula Function Parameter Regions in Analyzing Wireless Communications Performances

Copula functions have been widely employed in wireless communication analysis to model dependence structures and evaluate system performance. However, existing studies generally express performance metrics in terms of copula dependence parameters without explicitly characterizing their admissible regions. This letter introduces the concept of copula dependence parameter regions and investigates its significance in wireless communications. Considering a two-user wireless multiple access channel (MAC) with correlated Rayleigh fading modeled by the bivariate Farlie--Gumbel--Morgenstern (FGM) copula, explicit parameter regions are derived from communication-theoretic and probabilistic perspectives using outage probability and Pearson correlation coefficient (PCC) constraints. The results show that practical communication and statistical requirements can significantly shrink the classical copula admissible interval, rendering some theoretically admissible dependence structures infeasible. Numerical examples illustrate the proposed concept and its practical implications.
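
For the FGM copula C(u, v) = uv[1 + theta (1 - u)(1 - v)] with theta in [-1, 1], Hoeffding's covariance formula gives a Pearson correlation of theta/4 when both marginals are exponential, which is the case for channel power gains under Rayleigh fading. A PCC requirement therefore maps directly to a theta interval, as sketched below; the outage-probability constraints from the letter are not reproduced, and the function names are hypothetical.

```python
def fgm_copula(u, v, theta):
    """Bivariate FGM copula C(u, v) = u v [1 + theta (1 - u)(1 - v)], theta in [-1, 1]."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def fgm_pcc_exponential(theta):
    """Pearson correlation of two exponential (Rayleigh power) gains coupled by FGM."""
    return theta / 4.0

def theta_region_from_pcc(pcc_min, pcc_max):
    """Intersect the classical admissible interval [-1, 1] with a PCC requirement."""
    lo, hi = max(-1.0, 4.0 * pcc_min), min(1.0, 4.0 * pcc_max)
    return (lo, hi) if lo <= hi else None  # None: no admissible FGM dependence
```

A requirement such as PCC >= 0.3 is therefore infeasible for FGM with exponential marginals, which illustrates how practical constraints can shrink or empty the classical interval.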


[14] 2606.06795

BiEAR: A Human Auditory-Inspired Adaptive Binaural Front-end for Multi-Speaker Localisation and Distance Estimation

We present BiEAR, a human auditory-inspired adaptive binaural front-end for multi-speaker localisation and distance estimation. Inspired by medial olivocochlear (MOC) feedback in human hearing, BiEAR uses a neural controller to adaptively adjust the frequency selectivity of a binaural auditory filterbank during inference. This yields time-frequency adaptive representations for each ear, enabling the model to respond to changing acoustic conditions. We evaluate BiEAR on multi-speaker localisation and distance estimation in anechoic and real-room environments. Results show that the adaptive front-end improves localisation accuracy and robustness to unseen speakers and rooms compared with commonly used fixed binaural front-ends. Visualisation and analysis of the learned filter adaptations show that BiEAR emphasises informative frequency bands over time. These findings suggest that adaptive, biologically inspired binaural front-ends can improve machine hearing robustness in complex acoustic scenes.
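
To make "adjusting frequency selectivity" concrete, the sketch below builds a gammatone impulse response whose bandwidth is the Glasberg and Moore ERB scaled by a factor `bw_scale`. Treating that factor as the per-ear, per-band output of the neural controller is our assumption, not BiEAR's documented parameterization.

```python
import numpy as np

def erb_hz(fc):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990) in Hz."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4, bw_scale=1.0):
    """Gammatone impulse response; bw_scale > 1 broadens, < 1 sharpens the tuning.

    bw_scale stands in for a hypothetical controller output.
    """
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb_hz(fc) * bw_scale
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))
```

A sharper filter (smaller `bw_scale`) rings longer in time, so selectivity trades frequency resolution against temporal resolution, which is the trade-off an adaptive controller can exploit.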


[15] 2606.06837

SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails

Scripted-versus-spontaneous speech detection is appealing for interview guardrails, but benchmark performance can be inflated by shortcuts tied to corpus identity, channel conditions, and recording artifacts rather than speaking style itself. We present SEAM, a shortcut-aware framework for real-time scriptedness detection that combines uniform preprocessing, seam-aware sampling, non-speech augmentation, and a compact DistilHuBERT backbone. With 8 s windows, the model achieves 0.971 ± 0.004 ROC-AUC on an external interview-domain evaluation set. Removing the shortcut-prevention components improves internal held-out metrics but sharply reduces external performance, indicating shortcut learning. Post-training quantization reduces the model footprint to 41.8 MB with little loss in external performance. The results demonstrate that robust real-time scriptedness detection depends not only on the backbone but also on shortcut-aware data design and evaluation. We release code and model checkpoints.
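
The footprint reduction from post-training quantization can be sketched with symmetric per-tensor int8 quantization, which stores each weight in one byte instead of four. This generic scheme is assumed for illustration; the abstract does not say which quantization method SEAM uses.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 post-training quantization: w is approximated by scale * q."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 codes."""
    return q.astype(np.float32) * scale
```

The rounding error per weight is at most half a quantization step, which is why external performance can stay close to the float model.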


[16] 2606.06846

Variable-Length Finite-Rate CSI Feedback With Generative Priors

This letter studies variable-length finite-rate CSI feedback from a structural perspective and proposes CsiCoGen, a novel generative feedback structure with a transferable codebook mechanism without joint training. The UE maps $H_0$ into an ordered sequence of codebook indices, while the BS recursively recovers CSI from any received partial sequence of feedback indices using a shared denoising prior. This enables flexible control of feedback sequence length and per-step quantization precision through codebook size. CsiCoGen does not require jointly training a task-specific feedback encoder or codebook with the reconstructor, and the same online structure can be paired with different pretrained denoisers. In this work, we instantiate the decoder with a generative diffusion model. Simulation results on COST2100 show favorable rate-NMSE and rate-$\rho$ tradeoffs against representative baselines, with CsiCoGen reaching about -31 dB indoor NMSE and -20 dB outdoor NMSE in the high-rate regime while demonstrating scalable decoding complexity and adjustable per-step quantization precision.
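
The variable-length idea (an ordered index sequence that can be decoded from any prefix) can be sketched with a plain successive-refinement scalar quantizer, where each step quantizes the remaining residual on a grid twice as fine. This deliberately swaps in a simple decoder that just sums the received refinements: CsiCoGen's diffusion-based denoising prior is not modelled, the CSI is treated as a real vector, and `levels` and `step0` are hypothetical parameters.

```python
import numpy as np

def encode(h, n_steps, levels, step0):
    """Ordered index sequence: step k quantizes the current residual with step step0 / 2**k."""
    residual, indices = np.asarray(h, dtype=float).copy(), []
    for k in range(n_steps):
        step = step0 / 2 ** k
        q = np.clip(np.round(residual / step), -levels, levels).astype(int)
        indices.append(q)
        residual -= q * step
    return indices

def decode(prefix, step0):
    """Reconstruct from any received prefix of the index sequence (no learned prior)."""
    return sum(q * (step0 / 2 ** k) for k, q in enumerate(prefix))

def nmse_db(h, h_hat):
    """Normalized MSE in dB: 10 log10(||h - h_hat||^2 / ||h||^2)."""
    return 10 * np.log10(np.sum((h - h_hat) ** 2) / np.sum(h ** 2))
```

Feedback length and per-step precision are then two separate knobs: the number of indices sent and `levels` (the per-step codebook size).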


[17] 2606.06847

Physics-Driven Semantic Scattering Structure Understanding of Aircraft Target in SAR Images

Synthetic aperture radar (SAR) has become indispensable for target interpretation owing to its all-day, all-weather observation capability. In SAR target interpretation, electromagnetic scattering information provides a physically grounded cue beyond visual texture and has been widely exploited. However, existing methods remain dominated by local scattering center representations. Such unordered and component-agnostic representations are highly unstable for aircraft targets. As a result, physically existing components with weak scattering responses are often missed, leaving the reconstructed topology incomplete. To address this limitation, we establish Semantic Scattering Structure Understanding as a new paradigm for SAR aircraft interpretation. Semantic scattering keypoints are defined to associate local electromagnetic responses with physically meaningful aircraft components, while visibility-aware attributes are introduced to retain weakly observable yet physically existing components. The keypoints are further organized into a stable semantic scattering structure. Building on this, we propose S3U-SAR, a physics-driven framework that localizes semantic scattering keypoints and constructs the complete representation under multi-dimensional physical priors, including scattering heterogeneity, rigid-body topology, and speckle uncertainty. A confidence-gated joint supervision strategy is further introduced to alleviate optimization conflicts. We construct KP-SAR-Aircraft-1.0, the first fine-grained benchmark for semantic scattering structure understanding. Extensive experiments demonstrate that S3U-SAR achieves the best performance compared with baselines. Cross-category and cross-dataset evaluations further verify its robustness and transferability.


[18] 2606.06907

SpectCount: Spectrotemporal Counting via Synthetic Signals Improves Large Audio Language Models

Large audio language models (LALMs) extend large language models with an audio encoder and large-scale audio data. However, the scarcity of high-quality annotated audio data remains a fundamental bottleneck for scaling. Through a probing analysis of signal detectability, we identify fine-grained spectrotemporal perceptual weaknesses in a foundation LALM. To address these weaknesses, we propose Spectrotemporal Counting (SpectCount), a data-efficient fine-tuning approach based on fully synthetic audio signals generated on the fly, without relying on real-world audio, annotations, or pretrained generative models. SpectCount not only resolves the observed weaknesses but also improves performance on diverse auditory benchmarks spanning sound, music, and speech that were unseen during fine-tuning. These results suggest that weakness-targeted synthetic signals provide a data-efficient path toward enhanced auditory understanding in LALMs.
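
SpectCount's exact task set is not described in the abstract, so the generator below is a hypothetical example of an on-the-fly synthetic counting sample: a random number of Hann-windowed tone bursts placed in silence, paired with a question and its integer answer. No real audio, annotations, or generative model is involved.

```python
import numpy as np

def make_counting_example(rng, fs=16000, duration=2.0, max_count=5, tone_len=0.1):
    """Hypothetical sample: k non-overlapping tone bursts in silence, labeled with k."""
    n = int(fs * duration)
    seg = int(fs * tone_len)
    k = int(rng.integers(1, max_count + 1))
    audio = np.zeros(n)
    # Slots start every 2 * seg samples, so bursts are separated by at least seg of silence.
    slots = rng.choice(n // (2 * seg), size=k, replace=False)
    t = np.arange(seg) / fs
    for s in slots:
        f = rng.uniform(300, 4000)  # random burst frequency in Hz
        audio[2 * s * seg: 2 * s * seg + seg] = np.sin(2 * np.pi * f * t) * np.hanning(seg)
    question = "How many tones do you hear?"
    return audio, question, k
```

Because every sample is generated from a seed, the supply of training pairs is unlimited and the labels are exact by construction.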


[19] 2606.06932

Forecast and Model Predictive Control of Distributed Energy Resource Aggregators for Net-Demand Balancing

With rapidly growing energy demand, even the integration of bulk renewable energy sources is not sufficient on its own to meet demand, and it also adds supply uncertainty. Distributed Energy Resource Aggregators (DERAs) have the potential to address this uncertainty by aggregating and controlling decentralized distributed energy resources, thereby acting as virtual power plants. We present a new approach that combines forecasting and model predictive control (MPC) to dispatch DERAs so that they follow net-demand patterns, while accounting for the dynamics of the aggregated energy resources and their capacity limits. Each DERA is represented as a flexible "virtual battery" with state-of-charge and power constraints. The dispatch problem is formulated as a long-term MPC task that minimizes deviations from desired charge levels, output ramping, and net-load tracking errors. To keep real-time operation efficient, we implement a rolling-horizon MPC that regularly updates decisions using the latest marginal-demand forecasts. For forecasting, we present two models: linear regression and a long short-term memory (LSTM) neural network. Using high-resolution CAISO net-demand data and five typical DERA types, our simulations demonstrate how well our approach tracks marginal demand; in particular, we highlight the tradeoffs between forecasting horizon and MPC update rate, as well as the dependence on the choice of load forecasting model. Our results also indicate a slight edge for LSTM models over linear regression for the desired time shifts and horizon choices.
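
The receding-horizon loop can be sketched for a single DERA modelled as a virtual battery. As simplifying assumptions, each step solves an unconstrained quadratic plan that tracks the forecast net-demand deviation with a ramp penalty, applies only the first move, and then clips it to the power and state-of-charge limits. A real MPC, as in the paper, would place these limits and the charge-level target inside the optimization (typically via a QP solver); all names here are hypothetical.

```python
import numpy as np

def plan_horizon(forecast, p_prev, ramp_weight):
    """Unconstrained plan: min ||p - d||^2 + w * sum_k (p_k - p_{k-1})^2, with p_{-1} = p_prev."""
    H = len(forecast)
    D = np.eye(H) - np.eye(H, k=-1)            # first differences; row 0 is p_0 alone
    A = np.eye(H) + ramp_weight * D.T @ D
    b = np.asarray(forecast, dtype=float).copy()
    b[0] += ramp_weight * p_prev               # couples p_0 to the previously applied power
    return np.linalg.solve(A, b)

def rolling_horizon(forecasts, soc0, capacity, p_max, dt, ramp_weight=1.0):
    """Re-plan at every step, apply only the first move, and enforce virtual-battery limits."""
    soc, p_prev, applied, socs = soc0, 0.0, [], []
    for fc in forecasts:                       # fc: latest forecast over the horizon
        p = plan_horizon(fc, p_prev, ramp_weight)[0]
        # Power and state-of-charge limits (p > 0 discharges the virtual battery).
        p = np.clip(p, -p_max, p_max)
        p = min(p, soc / dt)
        p = max(p, -(capacity - soc) / dt)
        soc -= p * dt
        p_prev = p
        applied.append(p)
        socs.append(soc)
    return np.array(applied), np.array(socs)
```

The update-rate trade-off in the abstract corresponds to how often `plan_horizon` is re-run with fresh forecasts, and the horizon trade-off to the length of each `fc`.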


[20] 2606.06933

A 3D Formulation of the Extended Phaseless Rytov Approximation

The extended Phaseless Rytov Approximation (xPRA) is a recently proposed device-free RF imaging technique that provides high-resolution reconstructions of the imaging region using only phaseless measurements, such as received signal strength (RSS). Because of its phaseless formulation, it can be implemented straightforwardly using existing wireless communication infrastructure. It also outperforms well-known device-free phaseless RF imaging methods such as Radio Tomographic Imaging (RTI). The linear phaseless formulation used in xPRA (and RTI) makes these methods potentially useful for integrated sensing and communication (ISAC) systems in next generation wireless networks since they do not require wide bandwidths. However, so far, both xPRA and RTI have primarily been formulated in two dimensions (2D). This paper introduces a 3D extension of xPRA, which we call the extended three-dimensional phaseless Rytov approximation (x3DPRA). The novelty of our approach is that it preserves the straightforward implementation advantages of RTI and xPRA while enabling volumetric (3D) imaging. Simulation results show that x3DPRA provides good estimates of location and shape and can also reconstruct object material attenuation. We present the 3D formulation, validate it with a 2D model comparison, and report simulation results demonstrating its performance.
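
To show what a linear phaseless volumetric model looks like, the sketch below uses the classic RTI ellipse weight model extended to 3D ellipsoids (weight 1/sqrt(d) for each voxel whose distances to the two link ends satisfy d1 + d2 < d + lam), followed by Tikhonov-regularized inversion. This is RTI, not x3DPRA: the Rytov-based weights of x3DPRA are not reproduced, and the excess-path parameter `lam` is a hypothetical tuning value.

```python
import numpy as np

def ellipsoid_weights(tx, rx, voxel_centers, lam):
    """RTI-style link weights: 1/sqrt(d) inside the ellipsoid d1 + d2 < d + lam, else 0."""
    tx, rx = np.asarray(tx, dtype=float), np.asarray(rx, dtype=float)
    d = np.linalg.norm(tx - rx)
    d1 = np.linalg.norm(voxel_centers - tx, axis=1)
    d2 = np.linalg.norm(voxel_centers - rx, axis=1)
    return np.where(d1 + d2 < d + lam, 1.0 / np.sqrt(d), 0.0)

def reconstruct(W, rss_change, reg):
    """Tikhonov-regularized least squares: argmin ||W x - y||^2 + reg ||x||^2."""
    n = W.shape[1]
    return np.linalg.solve(W.T @ W + reg * np.eye(n), W.T @ rss_change)
```

Stacking one row of `ellipsoid_weights` per transmitter-receiver pair gives the linear system; placing nodes at several heights is what makes the volumetric unknowns observable.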


[21] 2606.06940

Beyond Semantic Dominance: Cognitive Affective Reasoning and Empathetic Response Alignment in Audio Language Models

While Audio Language Models (ALMs) demonstrate strong semantic understanding, they struggle with complex affective interactions. Specifically, textual semantic dominance often overshadows acoustic nuances, and a lack of cognitive depth leads to generic, emotion-agnostic responses. We propose CogAudio-LLM (this https URL), a novel cognitive affective reasoning framework. To mitigate semantic dominance, we build LIME-440K, a "lexically identical, multi-emotion" dataset designed to facilitate acoustic-semantic decoupling. We introduce EIPS, a 4-step Chain-of-Thought (CoT) mechanism incorporating psychological reasoning. For inference efficiency, multi-stage training first establishes EIPS explicitly via supervised fine-tuning and then distills this logic into an implicit generation process. Finally, we design DR-SAPO (Dual-Route Soft Adaptive Policy Optimization) to dynamically balance the logical rigor of the CoT with the empathetic quality of the direct response.


[22] 2606.06954

Learn to Access and Backhaul the Sky: Multi-Scale Radio Map Guided Multi-UAV Cooperation

Driven by the emerging low-altitude economy, uncrewed aerial vehicle (UAV) swarms offer flexible integrated air-ground access and backhaul. However, providing seamless connectivity is difficult due to the interdependent dynamics of user mobility and building blockages in these 3D scenarios. These factors create rapidly shifting bottlenecks in end-to-end paths. Furthermore, the multi-dimensional nature of joint control limits the effectiveness of traditional heuristics. To address these challenges, a Multi-Scale Radio Map-Guided (MRMG) framework is proposed. The MRMG framework handles heterogeneous dynamics by integrating three distinct levels of radio information: global-level maps provide regional coverage insights, local-level maps capture neighborhood-scale service conditions, and link-level maps characterize high-resolution channel features. This design effectively decouples macro-movement from micro-link adaptation. To yield long-term performance improvements, a multi-agent reinforcement learning (MARL) controller learns cooperative policies for UAV movement, next-hop selection, and transmit-power control. Simulation results show that the MRMG framework not only improves network throughput but also significantly bolsters cell-edge service, nearly doubling the 5th-percentile user rate.
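
A minimal sketch of deriving coarser map levels from a fine link-level radio map by block averaging. The pooling factors are hypothetical, and averaging values directly in dB is a simplification; the abstract does not state how the global- and local-level maps are constructed.

```python
import numpy as np

def block_mean(grid, block):
    """Average-pool a 2D map over non-overlapping block x block tiles (ragged edges cropped)."""
    h = (grid.shape[0] // block) * block
    w = (grid.shape[1] // block) * block
    g = grid[:h, :w]
    return g.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def multi_scale_maps(link_map_db, local_block=4, global_block=16):
    """Link-level map plus coarser local- and global-level summaries (hypothetical scales)."""
    return {"link": link_map_db,
            "local": block_mean(link_map_db, local_block),
            "global": block_mean(link_map_db, global_block)}
```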


[23] 2606.06962

FSC-Net: Integrating Fast Fourier Convolutions and Progressive Learning for Speech Bandwidth Extension

Speech bandwidth extension (BWE) aims to reconstruct high-fidelity wideband audio from narrowband inputs. While recent approaches have made significant progress, they often struggle to reconstruct realistic high-frequency phase and harmonic structures, leading to perceptual artifacts. In this paper, we propose FSC-Net (Full-Spectrum Context Network), a parameter-efficient architecture designed to explicitly model cross-band harmonic dependencies. By integrating Fast Fourier Convolutions (FFCs) into a complex spectral mapping framework, FSC-Net expands its receptive field to the entire spectrum, capturing long-range frequency interactions effectively. To address the ill-posed nature of high-frequency generation, our novel frequency-progressive learning curriculum guides the network to reconstruct spectral details from coarse to fine. Experimental results on the VCTK and unseen EARS datasets demonstrate that FSC-Net delivers consistently strong reconstruction quality and generalization, particularly in the challenging VCTK 4 kHz-to-48 kHz task. Compared to scaled-up baselines, our model attains leading LSD and PESQ scores while maintaining a highly compact parameter footprint (1.54 M parameters).
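
The global branch of a Fast Fourier Convolution (the "Fourier unit" of the original FFC design) can be sketched as: FFT along the frequency axis, a 1x1 channel-mixing convolution on stacked real and imaginary parts with a ReLU, then an inverse FFT. Because the mixing happens in the transformed domain, every output bin can depend on every input bin. The NumPy version below omits normalization layers and the local branch, and the weight layout is an assumption.

```python
import numpy as np

def fourier_unit(x, weight, relu=True):
    """Spectral (global) branch of a Fast Fourier Convolution, sketched in NumPy.

    x: real features of shape (C, T, F); weight: (2 * C_out, 2 * C) channel-mixing matrix.
    """
    X = np.fft.rfft(x, axis=-1)                       # FFT over the frequency axis
    z = np.concatenate([X.real, X.imag], axis=0)      # stack real/imag parts as channels
    z = np.einsum("oc,ctf->otf", weight, z)           # 1x1 convolution in the spectral domain
    if relu:
        z = np.maximum(z, 0.0)
    half = z.shape[0] // 2
    Y = z[:half] + 1j * z[half:]                      # back to C_out complex channels
    return np.fft.irfft(Y, n=x.shape[-1], axis=-1)    # output bins mix all input bins
```

With an identity weight and no ReLU, the unit reduces to an FFT round trip and returns its input unchanged; a trained weight instead couples distant frequency bins, such as a harmonic and its fundamental, within a single layer.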


[24] 2606.06983

DaX: Learning General Pathology Representations Across Scales

Computational pathology requires visual representations that transfer across diverse clinical endpoints and remain robust to variation in magnification, staining, scanner type, slide preparation, and input resolution. We present DaX, a pathology vision foundation model that adapts DINOv3-style self-supervised learning to whole-slide histopathology. DaX is initialized from natural-image DINOv3 weights and incorporates continuous magnification training, cross-scale tissue views, orientation-agnostic and acquisition-robust augmentation, multi-input-size training, and Gram-anchored dense consistency. These designs aim to connect local cellular morphology with global tissue architecture while stabilizing dense token-level representations across input scales. We further construct a WSI-level benchmark comprising 161 clinically meaningful tasks from 44 public datasets, covering 28,182 patients and 34,394 slides across four clinical domains and nine task categories. All models are evaluated under a fixed patient-level cross-validation protocol with fold-level statistical ranking, enabling reproducible comparisons that are less sensitive to split-dependent variation. Across this benchmark, DaX achieves the highest mean performance across tasks and consistently strong task-level ranking scores, with gains spanning diagnostic pathology, biomarker and molecular profiling, tissue/specimen context, and risk, response, and prognosis. These results support DaX as a transferable visual encoder for computational pathology and provide a standardized evaluation framework for future pathology foundation models. Project page: this https URL.
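
Gram anchoring, as introduced in DINOv3, compares the pairwise-similarity (Gram) matrix of the current model's L2-normalized patch features with that of an earlier anchor model, constraining patch-to-patch structure while leaving the features themselves free to rotate. The sketch below assumes that form and a 1/P^2 normalization over P patches; DaX's exact "Gram-anchored dense consistency" loss may differ.

```python
import numpy as np

def gram_anchor_loss(patch_feats, anchor_feats, eps=1e-8):
    """||Xn Xn^T - An An^T||_F^2 / P^2 with row-normalized (P x D) patch features."""
    def normalize(f):
        return f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)
    x, a = normalize(patch_feats), normalize(anchor_feats)
    diff = x @ x.T - a @ a.T
    return float(np.sum(diff ** 2) / x.shape[0] ** 2)
```

Because only cosine similarities between patches enter the loss, rescaling or orthogonally rotating the feature space leaves it unchanged, which is what lets dense features stay consistent across input scales without freezing them.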


[25] 2606.06995

Power Grid Topology Control

Power grids are facing major challenges from growing renewable integration and worsening climate impacts. While flexibility on both the demand and generation sides has been widely explored to address these challenges, network-side flexibility, especially in network topology, remains highly underutilized. Advances in communication, power electronics, and circuit breakers have made network topology increasingly controllable. However, leveraging this topological flexibility poses substantial challenges, primarily due to the inherent non-convexity and hybrid dynamics in associated optimization and control problems. This monograph surveys the development of power grid topology control in both early and recent years. It begins by discussing the fundamental topological constraints involved in topology control problems. Subsequently, it introduces steady-state topology control for transmission and distribution networks separately, covering fundamentals, a state-of-the-art review, and representative recent advances. Additionally, the network topology transition problem, which addresses the implementation of optimal topology solutions and has garnered increasing attention in recent years, is further modeled and analyzed. Beyond utilizing the flexibility of steady-state network topology, controlling network topology during transients can also contribute to system stabilization. Traditional approaches, such as intentional controlled islanding for transmission networks, as well as recently developed topology control methods for microgrid stabilization, exemplify this concept. Finally, a summary of this monograph is provided.
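
A common example of the fundamental topological constraints mentioned above is radiality in distribution-network reconfiguration: the closed lines must form a spanning tree. The union-find check below is a minimal sketch under that assumption (a single substation, so the tree must span all buses); formulations with multiple substations typically require a spanning forest instead.

```python
def is_radial(n_buses, closed_lines):
    """True if the closed lines form a spanning tree: n_buses - 1 lines and no cycle."""
    if len(closed_lines) != n_buses - 1:
        return False
    parent = list(range(n_buses))

    def find(i):
        # Follow parent links to the set representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for u, v in closed_lines:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # closing this line would create a loop
        parent[ru] = rv
    return True  # n - 1 acyclic edges on n buses imply a connected tree
```

In optimization formulations this combinatorial condition has to be encoded with binary variables, which is one source of the non-convexity the abstract mentions.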


[26] 2606.07026

A Novel Stripe-based RIS Optimization for UAV Communications and Sensing in Low-Altitude Wireless Networks

Low-altitude wireless networks (LAWN) envision a reconfigurable 3D network capable of supporting mission-critical aerial operations. This paper presents a reconfigurable intelligent surface (RIS)-assisted LAWN that establishes reliable communication with an unmanned aerial vehicle (UAV) across varying wireless channel conditions and signal blockages. A low-complexity stripe-based RIS phase-shift optimization framework is proposed to simultaneously enhance communication reliability and provide passive sensing capability for UAV tracking under 3D mobility. Unlike high-complexity optimization approaches, the proposed method exploits the inherent phase gradient across adjacent RIS elements to significantly reduce the search space for computing and updating the RIS configuration as the UAV moves. Analysis and simulation results demonstrate that the proposed framework outperforms conventional benchmarks in convergence speed and computational efficiency while maintaining robust, high signal-to-noise-ratio (SNR) connectivity even in the presence of phase estimation errors and in low-SNR regimes. In addition, measurement experiments with a real RIS prototype in an outdoor campus environment demonstrate the practical viability of the proposed approach.


[27] 2606.07048

Geometric Time-Domain Identification of Three-Phase Load Equivalents from Terminal Measurements

This paper presents a geometric time-domain method for identifying three-phase load equivalents from instantaneous voltage and current measurements at the point of common coupling. Measured waveforms are interpreted as trajectories in Euclidean signal spaces, and load-equivalent parameters are recovered from the geometry of those trajectories. The method extends a previously published single-phase geometric identification formulation to three- and four-wire systems and places special emphasis on the three-wire case, where no neutral voltage is measured and the terminal data must satisfy coupled Kirchhoff constraints. The main advance over the earlier analytical formulation is a sampled-data implementation based on local time windows, normalized matrix equations, harmonic-projection derivative and primitive coordinates, explicit geometric identifiability tests, passivity constraints, and energy/Kirchhoff residuals. The method does not force a model when the measured trajectory lacks enough information; instead, it reports low-rank or ill-conditioned windows as low-confidence evidence. Numerical simulations with clean data, measurement noise, window-length sweeps, and sensor delay show that the method accurately identifies informative three-phase trajectories and exposes structurally degenerate cases such as pure single-frequency excitation for higher-order three-wire models. For a given admissible topology, the identified circuit closes the instantaneous terminal energy balance of the measured load over the analysis window.


[28] 2606.07050

Optimized Sampling of Angle-Resolved Scatterometry Data Using End-to-End Compressed Learning Model for Nanograss Deficiency Detection

Reliable inspection of nanosurfaces is essential to ensure the quality of nanostructure manufacturing. Angle-resolved scatterometry (ARS) provides a non-invasive inspection method that can be used in-line but often suffers from long acquisition times due to dense angular sampling. This paper addresses the data acquisition challenge by proposing an end-to-end compressed learning framework for five-level vacancy deficiency detection in zinc oxide nanograss using ARS images. The proposed framework integrates a learnable latitude-based sampling layer with a convolutional neural network, allowing sampling and classification to be jointly optimized during training. The sampling layer exploits the physical structure of ARS patterns and learns informative latitudinal regions, which reduces the sampling search space and improves convergence. Evaluation results show that the proposed approach achieves high and stable deficiency-level classification performance under different noise conditions. Using full ARS images, the model achieves 94.2% accuracy for five-level deficiency classification and 98.6% accuracy for separating deficient from non-deficient nanosurfaces. The proposed sampling model matches full-image performance while using up to 90% fewer angular sampling points. Even when the number of sampling points is reduced by 99.7%, classification accuracy decreases by less than 10 percentage points. To further improve training with limited data, we also studied a GAN-based augmentation approach and used GAN-generated data for model pretraining. With this pretraining, the model converged within only a few fine-tuning epochs.


[29] 2606.07063

Beyond Universality: The GCC-FER Dataset and Culture-Aware Adaptation for Dynamic Facial Expression Recognition

Dynamic Facial Expression Recognition (DFER) is a key enabling technology in affective computing, human-computer interaction, and intelligent multimedia systems. Despite the significant influence of cultural nuances on FER performance, most existing FER systems assume that emotional expressions are universally consistent across populations. In practice, expressions vary across cultures, and this variation can be attributed to systematic differences in facial muscle activation patterns. A major challenge in advancing cross-cultural FER lies in the scarcity of culturally diverse benchmark datasets. To address this, a new hybrid multicultural video dataset termed Global Cross-Cultural Facial Expression Recognition (GCC-FER) is introduced. GCC-FER comprises 23,934 video samples spanning four cultural groups (African, Caucasian, East Asian, and South Asian) across seven basic expressions, combining psychologically supervised in-house data collection for underrepresented populations with rigorous ethnicity filtering of existing sources. To the best of our knowledge, GCC-FER is the first large-scale global cross-cultural DFER dataset designed to address these demographic gaps. Leveraging this dataset, behaviorally grounded cultural priors are derived for each cultural group, along with a global prior for practical deployment. A Culture-Aware FER (CA-FER) system is proposed to mitigate cultural bias by adaptively recalibrating latent facial representations. Extensive experiments on GCC-FER and DFEW demonstrate that the proposed system consistently improves FER performance across multicultural settings.


[30] 2606.07076

Branch-Level Energy Localization in Three-Phase Loads: Resolving Indeterminacy in Time-Domain

This paper develops a branch-level energy-localization framework for three-phase loads. The instantaneous terminal power of an admissible lumped equivalent is decomposed uniquely as Joule dissipation plus magnetic and electric stored-energy rates, branch by branch. Three formal results are established: a Branch-Level Localization Theorem (uniqueness given an admissible topology); a Topology-Indeterminacy Theorem (multiple admissible topologies reproduce identical terminal data with distinct localizations); and a Generalized Energetic Duality Theorem that organizes classical electrical dualities (Norton–Thevenin, series–parallel, L vs C, R vs G) as restrictions to Linear Time-Invariant (LTI) sinusoidal regimes of a single time-domain principle in which constant-parameter equivalence is replaced by time-varying parameters. The framework is exercised on six test cases, including the de Leon–Cohen open-phase paradox, switched-resistive loads, three-wire delta-versus-wye-virtual indeterminacy, fluctuating-phase loads, and a four-wire nonlinear load with hysteretic, linear, and switched branches. The framework is positioned as complementary to IEEE Std. 1459, CPC, instantaneous p-q, and Fryze-Buchholz-Depenbrock: each answers a different question, and the apparent paradoxes vanish once the question is posed precisely.
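For a single series R-L branch, the localization reduces to a familiar identity: with $v = Ri + L\,di/dt$, the terminal power $p = vi$ splits into Joule dissipation $Ri^2$ plus the magnetic stored-energy rate $\frac{d}{dt}\left(\tfrac{1}{2}Li^2\right) = Li\,di/dt$. The minimal NumPy sketch below checks this split sample by sample. It assumes constant R and L and a sinusoidal current, and the function name `series_rl_power_split` is hypothetical; it does not reproduce the paper's three-phase, time-varying-parameter framework.

```python
import numpy as np

def series_rl_power_split(R, L, I_peak, omega, t):
    """Split the terminal power of a series R-L branch into Joule and magnetic-rate terms.

    Assumes a sinusoidal branch current i(t) = I_peak * sin(omega * t) and constant R, L.
    """
    i = I_peak * np.sin(omega * t)
    di_dt = I_peak * omega * np.cos(omega * t)
    v = R * i + L * di_dt            # branch voltage from Kirchhoff's voltage law
    p_terminal = v * i               # instantaneous terminal power
    p_joule = R * i**2               # Joule dissipation
    p_magnetic = L * i * di_dt       # rate of change of stored magnetic energy 0.5 * L * i**2
    return p_terminal, p_joule, p_magnetic

t = np.linspace(0.0, 0.04, 2001)     # two cycles at 50 Hz
p, pj, pm = series_rl_power_split(R=2.0, L=0.01, I_peak=10.0, omega=2 * np.pi * 50, t=t)
print(np.allclose(p, pj + pm))       # True: the split closes the energy balance at every sample
```

Over whole cycles the magnetic term averages to zero, so only the Joule term contributes to average power, which is the usual LTI sinusoidal picture that the paper generalizes.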


[31] 2606.07091

Rate-Splitting–Inspired Uplink Near-Field ISAC

Integrated sensing and communication (ISAC) enables sensing and communication (S&C) functionalities to share spectrum, hardware, and signal-processing resources, but the resulting inter-functionality interference creates a fundamental receiver-design challenge, particularly in uplink operation. This paper develops a rate-splitting (RS)-inspired framework for uplink near-field ISAC. The framework generalizes the sensing-centric (S-C) and communication-centric (C-C) endpoint orders of non-orthogonal multiple access (NOMA)-inspired ISAC by splitting the communication message across the sensing operation. Closed-form expressions are derived for the communication rate (CR) and sensing rate (SR), accounting for residual sensing interference from target-response estimation uncertainty. The achievable CR-SR rate region is characterized under sensing-matched illumination, where the proposed single-frame RS-inspired boundary contains the NOMA-inspired time-sharing region. Unlike the classical Gaussian uplink multiple access channel, where RS recovers the time-sharing dominant face, the split factor in uplink ISAC also reshapes the sensing-stage interference, allowing the RS-inspired boundary to match or strictly enlarge the S&C tradeoff. High-SNR analysis shows that, for non-aligned S&C channels, residual sensing interference changes the rate offsets but not the leading S&C slopes, whereas in the fully aligned case it becomes slope-limiting. Using an aperture-aware near-field channel model, large-array limits are derived, showing that achievable rates remain finite as the array grows. Numerical results validate the analysis and demonstrate the benefits of the RS-inspired scheme, the impact of residual sensing interference, and the bounded large-array behaviour induced by physically consistent near-field modelling.


[32] 2606.07099

SABLE: GPU-Based Power Flow Accelerator for Sparsity-Aware Batched Learning

Recent studies have developed GPU-based approaches for solving AC power flow and successfully applied them to standalone power flow problems. However, integrating these approaches into modern differentiable learning frameworks while preserving sparsity remains challenging. To this end, we present SABLE, a GPU-based sparse batched power flow accelerator for differentiable learning via an implicit power flow layer. SABLE leverages a block-diagonal embedding that reformulates batched three-dimensional Jacobians as a fixed-pattern two-dimensional sparse template that is shared across PyTorch, CuPy, and cuDSS. This common template enables zero-copy interoperability and memory-efficient sparse reuse across the software stack. On top of this representation, SABLE accelerates repeated power flow computations through reusable sparse templates, custom GPU kernels, a cuDSS-based sparse-direct LU solver, and mixed-precision techniques. Extensive experiments show that SABLE improves standalone power flow solving throughput by up to 253.4$\times$ over pandapower and 5.7$\times$ over ExaPF. In end-to-end training, evaluated on AC optimal power flow learning models based on DC3 and DeepLDE, SABLE expands the feasible training batch range by up to 64$\times$ and improves training throughput by up to 206.7$\times$ over the corresponding baseline.
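The block-diagonal embedding can be illustrated with plain NumPy: if every sample in a batch shares the same Jacobian sparsity pattern, the index arrays of the batched matrix are just the per-sample indices shifted by a block offset, so the pattern is built once and only the values change between solves. This is a sketch under that assumption. The helper `batched_block_diag_template` and the toy 3-by-3 pattern are hypothetical; SABLE's actual template is shared across PyTorch, CuPy, and cuDSS and solved with a sparse-direct LU, whereas the check here assembles a dense matrix only for verification.

```python
import numpy as np

def batched_block_diag_template(rows, cols, n, batch):
    """Hypothetical helper: tile a per-sample sparsity pattern (rows, cols) of an n-by-n
    Jacobian into the index pattern of a (batch*n)-by-(batch*n) block-diagonal matrix."""
    offsets = np.repeat(np.arange(batch) * n, len(rows))  # block offset for each stored entry
    big_rows = np.tile(rows, batch) + offsets
    big_cols = np.tile(cols, batch) + offsets
    return big_rows, big_cols

# Toy per-sample pattern shared by every sample; only the numerical values differ per sample
rows = np.array([0, 0, 1, 1, 1, 2, 2])
cols = np.array([0, 1, 0, 1, 2, 1, 2])
n, batch = 3, 4
rng = np.random.default_rng(0)
values = rng.uniform(1.0, 2.0, size=(batch, len(rows)))
values[:, rows == cols] += 5.0  # make each block diagonally dominant, hence nonsingular

big_rows, big_cols = batched_block_diag_template(rows, cols, n, batch)

# Dense assembly only for checking; a sparse solver would consume (big_rows, big_cols, values.ravel())
J = np.zeros((batch * n, batch * n))
J[big_rows, big_cols] = values.ravel()
rhs = rng.normal(size=batch * n)
x_batched = np.linalg.solve(J, rhs)

# One block-diagonal solve is equivalent to independent per-sample solves
x_loop = np.concatenate([
    np.linalg.solve(J[b * n:(b + 1) * n, b * n:(b + 1) * n], rhs[b * n:(b + 1) * n])
    for b in range(batch)
])
print(np.allclose(x_batched, x_loop))  # True
```

Because the index arrays never change, they can be allocated once and reused across iterations and batches, which is the property the fixed-pattern template relies on.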


[33] 2606.07104

Robust Secure Beamforming for Movable Antenna Enhanced Integrated Sensing and Communications

In this letter, we investigate robust beamforming design for a movable antenna (MA)-enhanced secure integrated sensing and communications (ISAC) system with imperfect eavesdropping channel state information (CSI). To improve radar sensing performance, we formulate a radar signal-to-interference-plus-noise ratio (SINR) maximization problem by jointly optimizing the transmit beamforming and antenna placement while ensuring communication data security. However, the resulting optimization problem is inherently intractable due to the nonlinear mapping from antenna positions to channel coefficients, as well as the eavesdropper (Eve) channel uncertainty. To handle these challenges, we propose a block coordinate descent (BCD)-based algorithm incorporating successive convex approximation (SCA) and fractional programming (FP) techniques. Simulation results show that our proposed algorithm exhibits fast convergence and achieves a significant improvement in the radar SINR while guaranteeing communication security.


[34] 2606.07182

Audio Imitator: Controlling Timbre and Tempo in Video2Audio Synthesis with Audio Reference

Video-to-audio generation has made significant progress in achieving semantic consistency and temporal alignment from silent videos. However, audio contains rich stylistic attributes such as timbre and tempo that are difficult to infer from visual and textual inputs alone. While reference audio can serve as additional conditioning, it is typically treated as a holistic signal, limiting fine-grained style control. We propose AudioIM, an attribute-aware framework that explicitly models timbre and tempo as separate control factors rather than relying on holistic prompt conditioning. Dual encoders extract complementary timbre-related and tempo-related representations, which are injected through global conditioning. A masking-based training strategy enables effective latent prompt conditioning at inference. Experiments on VGGSound show improved style similarity while preserving semantic alignment and synchronization. Audio samples are available at: this https URL.


[35] 2606.07208

Unlocking feedforward capabilities in Model Predictive Control algorithms to deal with measurable disturbances

Disturbance rejection is a central objective in process control, particularly when measurable disturbances can be exploited through feedforward action. Although Model Predictive Control (MPC) naturally incorporates disturbance models and prediction capabilities, standard formulations cannot achieve complete disturbance rejection since the cost function penalises control effort. This limitation prevents MPC from reproducing the behaviour of classical feedforward compensators. This work proposes a novel framework to embed true feedforward capabilities within MPC without removing the control effort penalty. The approach introduces a dual-control structure in which two control actions are computed simultaneously: a tracking-oriented action addressing set-point tracking and robustness, and a feedforward-oriented action dedicated to disturbance rejection. Both contributions are combined into a single control signal on which the process constraints are explicitly enforced. The feedforward-oriented action is formulated without penalising control effort, enabling full compensation of measurable disturbances. The methodology is developed for Dynamic Matrix Control (DMC), Generalised Predictive Control (GPC), and state-space MPC. Its effectiveness is demonstrated through simulation studies, including comparisons with standard MPC and classical feedforward schemes. A case study based on a reverse osmosis process shows that the proposed approach improves disturbance rejection while preserving constraint handling and overall control performance.


[36] 2606.07259

Assessing True Generalisability of Audio-Visual Speech Recognisers

Current Audio-Visual Speech Recognition (AVSR) models achieve near-perfect performance on the standard LRS3 benchmark, raising concerns about adaptive overfitting. To systematically assess true generalisability, we construct a highly controlled, unseen evaluation set subsampled from the large-scale MultiVSR dataset. Unlike standard out-of-distribution benchmarks, our subset strictly matches the acoustic, visual, and demographic distributions of the LRS3 test set. Evaluating five state-of-the-art architectures reveals a universal performance collapse, indicating that current systems fail to generalise even under strictly aligned conditions. Through a fine-grained attribute analysis across seven factors, we isolate the specific drivers of this degradation. Furthermore, we uncover a strong lexical bias, expose distinct error patterns, and, surprisingly, find that audio-visual performance even lags behind audio-only performance. We release our matched test set for future benchmarking.


[37] 2606.07264

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track

Audio reasoning requires multi-step, evidence-grounded inference over temporally dynamic and acoustically mixed signals, going beyond conventional perception tasks such as ASR or captioning. We present VISA, our submission to the Interspeech 2026 Audio Reasoning Challenge (Agent Track), evaluated via the MMAR Rubrics for correctness and reasoning quality. Under a "LALM as a Tool" paradigm, VISA strengthens large audio language models (LALMs) with auxiliary multi-modal evidence while avoiding heavy orchestration. The system integrates three components: multi-modal feature extraction for complementary audio and acoustic-visual clues, model-voting inference with consistency checking for stable predictions, and fine-grained category-aware routing to resolve disagreements and select rubric-aligned reasoning chains. On the official Agent Track leaderboard, VISA ranks 2nd overall with a 66.23% Rubrics score. It also achieves 77.40% accuracy, the highest among all systems listed across both the Single Model and Agent tracks.


[38] 2606.07284

RSMA Enabled Hierarchical UAV Networks with Non Linear Energy Harvesting: Outage Probability Analysis and UAV Placement Optimization

Uncrewed aerial vehicles (UAVs) are expected to enhance connectivity, extend network coverage, and support advanced communication services in sixth-generation (6G) cellular networks, particularly in public and civil applications. Although multi-UAV systems offer greater efficiency and cost-effectiveness than single-UAV deployments, their implementation still faces several fundamental challenges that limit their reliability, sustainability, and scalability. The limited onboard energy restricts mission duration and communication continuity. Therefore, wireless energy harvesting (EH) emerges as a promising solution to overcome this limitation. However, energy from terrestrial sources suffers from path loss, making EH from surrounding UAVs more sustainable. Moreover, rate-splitting multiple access (RSMA) remains insufficiently explored in hierarchical UAV networks under hardware impairments (HWI) and imperfect channel state information (ICSI). This paper proposes a hierarchical ad hoc UAV network with non-linear EH and RSMA to enhance both energy and cost efficiency, where UAVs harvest energy from surrounding UAVs. For a practical scenario, we consider the effect of HWI and ICSI in our proposed system. To the best of the authors' knowledge, this study is the first to investigate such a scenario in the literature. The outage probability expressions for ground Internet of things (IoT) devices, each CMU, and the overall outage probability of the proposed system are derived over Nakagami-$m$ fading channels while considering practical constraints such as HWI, ICSI, and non-linear EH. Additionally, approximate outage probability expressions are derived for high transmit power regimes. Subsequently, we formulate two optimization problems to enhance reliability and performance. Our findings indicate that the proposed system outperforms all benchmarks in terms of outage probability.


[39] 2606.07328

Implementation and Calibration of 3GPP-Compliant ISAC Channel Simulator

Integrated sensing and communication (ISAC) has emerged as a key technology for 6G systems. To support the development of ISAC systems, accurate channel modeling and simulation for performance evaluation is essential. Recently, 3GPP introduced a standardized ISAC channel model and its associated calibration procedure for this purpose. However, due to the complexity of the modeling methodology and the lack of fully explicit implementation details in the 3GPP reports, different implementations may lead to inconsistent or unsynchronized simulation results. To address this issue, in this work, we implement the 3GPP ISAC channel model simulator specified in TR 38.901 and conduct a comprehensive calibration analysis. We compare the simulation results with the reference results reported by companies in 3GPP and discuss several key implementation details to provide insights into the implementation and calibration of the simulator. To facilitate reproducibility and further research, the developed simulator, together with the relevant datasets and calibration results, has been released as an open-source project on GitHub.


[40] 2606.07347

CSI Phase Averaging for High-Sensitivity Wi-Fi Sensing in Low-Multipath Environments

This paper presents a low-complexity motion detection method for outdoor Wi-Fi sensing based on a model-driven approach. The method exploits the structural characteristics of the phase components in channel state information (CSI) in low-multipath propagation environments, which are generally considered disadvantageous for Wi-Fi sensing, to mitigate the phase offset errors originating from wireless devices. In addition, phase averaging provides a processing gain that reduces random noise components, including quantization and thermal noise. The theoretical basis of the method is described, and its effectiveness is experimentally evaluated using Compressed Beamforming frames obtained from commercial IEEE 802.11ac devices. The experiments primarily focus on wild crows flying in an outdoor orchard environment. The experimental results demonstrate that the method can detect birds even when they fly several meters away from the direct line-of-sight path between the transmitter and receiver antennas. Furthermore, the results indicate that fluctuations caused by vegetation movement were negligible when the wind speed was less than 3 m/s. The proposed approach is expected to be applicable not only to orchard monitoring but also to other outdoor Wi-Fi sensing applications in low-multipath environments.
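The processing gain from phase averaging can be seen in a generic simulation: averaging $N$ independent phase samples reduces the noise standard deviation by about $\sqrt{N}$ (a power gain of $10\log_{10}N$ dB). The sketch below averages on the unit circle so that phases near $\pm\pi$ do not wrap incorrectly. The Gaussian noise model, $\sigma = 0.2$ rad, and $N = 64$ are assumptions chosen for illustration, and this is not the paper's offset-cancellation procedure for Compressed Beamforming frames.

```python
import numpy as np

def circular_mean_phase(phases, axis=-1):
    """Average phase angles on the unit circle so that wrapping at +/-pi does not bias the result."""
    return np.angle(np.mean(np.exp(1j * phases), axis=axis))

def wrapped_error(est, true_phase):
    """Phase error wrapped to (-pi, pi]."""
    return np.angle(np.exp(1j * (est - true_phase)))

rng = np.random.default_rng(1)
true_phase = 3.0        # rad, deliberately close to the +/-pi wrap point
sigma = 0.2             # rad, assumed per-sample Gaussian phase noise
n_avg = 64              # samples averaged per estimate
trials = 5000

noisy = true_phase + sigma * rng.standard_normal((trials, n_avg))
single = np.angle(np.exp(1j * noisy[:, 0]))        # one sample per trial, wrapped to (-pi, pi]
averaged = circular_mean_phase(noisy, axis=1)      # n_avg samples per trial

gain = np.std(wrapped_error(single, true_phase)) / np.std(wrapped_error(averaged, true_phase))
print(f"std reduction: {gain:.1f} (ideal sqrt(N) = {np.sqrt(n_avg):.1f})")
```

A plain arithmetic mean of the wrapped phases would fail here, because samples just above $\pi$ wrap to values near $-\pi$ and drag the average toward zero; averaging the unit phasors avoids that.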


[41] 2606.07374

Beyond Backscatter: InSAR coherence from detected SAR images

In this work, we propose a deep learning framework for coherence regression directly from detected SAR images, without the need for accurate coregistration. A Residual U-Net is trained using coherence maps derived from precisely coregistered Sentinel-1 SLC data to learn the relationship between backscatter magnitudes and coherence. The model is trained on 12-day SLC pairs and evaluated across different datasets, including coregistered SLC products and open-access analysis-ready data, covering diverse radiometric properties, geometries, and locations. Experimental results demonstrate that the proposed method achieves high-resolution coherence regression with improved accuracy compared to existing intensity-based approaches. The network generalizes well across diverse geographical locations and even across temporal baselines never seen during training. Additionally, the ability to operate on globally available analysis-ready data, such as ground range detected (GRD) data distributed through Google Earth Engine, enables large-scale application in mission design, change monitoring, and diverse mapping tasks.
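The training targets described above are coherence maps estimated from coregistered complex SLC pairs. A common estimator is the windowed sample coherence $\hat{\gamma} = |\sum s_1 s_2^*| / \sqrt{\sum |s_1|^2 \sum |s_2|^2}$. The NumPy sketch below implements it with a square boxcar window and checks it on synthetic circular-Gaussian speckle with a known coherence of 0.7. The window size, the helper names, and the synthetic data are assumptions, not the paper's processing chain.

```python
import numpy as np

def boxcar_sum(a, win):
    """Sum of `a` over every full win-by-win window (valid positions only), via 2-D cumulative sums."""
    c = np.cumsum(np.cumsum(a, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # c[i, j] now equals a[:i, :j].sum()
    return c[win:, win:] - c[:-win, win:] - c[win:, :-win] + c[:-win, :-win]

def coherence(s1, s2, win=5):
    """Sample InSAR coherence magnitude between two coregistered complex SLC images."""
    num = boxcar_sum(s1 * np.conj(s2), win)
    den = np.sqrt(boxcar_sum(np.abs(s1) ** 2, win) * boxcar_sum(np.abs(s2) ** 2, win))
    return np.abs(num) / den

# Synthetic speckle pair with a known true coherence
rng = np.random.default_rng(2)
shape = (200, 200)
gamma_true = 0.7
z1 = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
z2 = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
s1 = z1
s2 = gamma_true * z1 + np.sqrt(1 - gamma_true ** 2) * z2

gamma_hat = coherence(s1, s2, win=9)
print(f"mean estimated coherence: {gamma_hat.mean():.2f}")
```

Because the estimator needs the interferometric phase of both images, it cannot be applied to detected (amplitude-only) products directly, which is the gap the network is meant to bridge.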


[42] 2606.07375

An End-to-End Encrypted Control Pipeline for Multi-Agent Coordination via CKKS Homomorphic Encryption

Cloud-based coordination of multi-agent systems requires sharing state with a central server, creating a conflict between coordination and privacy. Fully homomorphic encryption (FHE) resolves this in principle, but its severe arithmetic constraints demand that every stage of the control loop be redesigned from first principles. We present an end-to-end encrypted control pipeline in which sensing, state estimation, state propagation, and consensus control all operate on CKKS-encrypted data using only addition, multiplication, and cyclic rotation. To overcome the computational challenges of FHE, we use steady-state Kalman gains instead of computing the gain matrices online, and we apply graph Laplacians via the diagonal method at a cost proportional to the number of nonzero cyclic diagonals, accommodating ring, torus, and complete-graph topologies within a unified framework. To quantify the cumulative effect of encryption noise, we use the separation principle to decouple controller and observer error dynamics and derive a periodic bootstrapping bound in which CKKS bootstrapping acts as an impulsive disturbance; the resulting steady-state error ball depends on the bootstrapping precision and the closed-loop spectral radius, providing a direct design equation for the privacy-accuracy tradeoff. The pipeline is validated on a multi-agent formation control scenario, confirming stable closed-loop operation under encryption with bounded tracking error.
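The diagonal method evaluates a matrix-vector product using only slot-wise multiplication, addition, and cyclic rotation, which are the operations CKKS provides. Writing the $k$-th cyclic diagonal as $d_k[j] = A_{j,(j+k) \bmod n}$, one has $Ax = \sum_k d_k \odot \mathrm{rot}_k(x)$, so only nonzero diagonals cost a rotation and a multiplication. The sketch below is a plaintext stand-in: `np.roll` plays the role of a CKKS rotation, no encryption is performed, and the function names are hypothetical.

```python
import numpy as np

def rotate(x, k):
    """Cyclic left rotation by k slots: rotate(x, k)[j] == x[(j + k) % n] (plaintext stand-in for CKKS)."""
    return np.roll(x, -k)

def diagonal_method_matvec(A, x):
    """Compute A @ x with slot-wise multiply, add, and rotation only (diagonal method).

    Zero diagonals are skipped, so the cost scales with the number of nonzero cyclic diagonals.
    """
    n = A.shape[0]
    idx = np.arange(n)
    y = np.zeros(n)
    used = 0
    for k in range(n):
        diag_k = A[idx, (idx + k) % n]        # k-th cyclic diagonal of A
        if np.any(diag_k):
            y = y + diag_k * rotate(x, k)     # one multiply and one rotation per nonzero diagonal
            used += 1
    return y, used

def ring_laplacian(n):
    """Graph Laplacian of an n-agent ring: degree 2, neighbours at j-1 and j+1 (mod n)."""
    L = 2.0 * np.eye(n)
    for j in range(n):
        L[j, (j + 1) % n] = -1.0
        L[j, (j - 1) % n] = -1.0
    return L

n = 8
x = np.arange(n, dtype=float)          # e.g. one state component per agent
Lap = ring_laplacian(n)
y, used = diagonal_method_matvec(Lap, x)
print(np.allclose(y, Lap @ x), used)   # True 3
```

For a ring Laplacian only three diagonals are nonzero regardless of the number of agents, whereas a complete graph's Laplacian has all $n$ diagonals nonzero, consistent with the cost statement above.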


[43] 2606.07381

Impact of Synthetic Lesional MR Images in Automated Focal Cortical Dysplasia Detection in Low-Data Scenarios

Background and Purpose: Automated detection of focal cortical dysplasia (FCD) requires large volumes of voxelwise lesion-delineated MRI data, which are difficult to acquire. This study aims to generate synthetic MRI data exhibiting FCD, assess their realism, and evaluate their impact on automated FCD detection, particularly in reducing the need for manual annotations. Methods: T1-weighted (T1w) and T2-weighted Fluid-Attenuated Inversion Recovery (FLAIR) MRI scans from 131 FCD patients and 90 healthy controls from three sites were retrospectively studied. Synthetic MRIs were generated by conditioning a generative network on binary FCD masks. Two neuroradiologists were asked to identify the real images in a random set of 14 real and 14 synthetic scans. Three nnU-Net models were trained to detect FCD using: (i) real-only data (35 FCD / 35 controls), (ii) real data (35 FCD / 35 controls) plus synthetic augmentation, and (iii) expanded real data (70 FCD / 70 controls). Results: Experts showed limited ability to distinguish real from synthetic images, with classification accuracy of 60% for T1w and 70% for FLAIR (inter-rater agreement kappa = 0.86). Augmenting automated FCD detection with synthetic data increased sensitivity by 8.14% (p = 0.12) and improved model confidence at true lesion sites (from 0.83 ± 0.11 to 0.89 ± 0.12; p = 0.02). The expanded real-data model further improved sensitivity to 73.8% (p < 0.001) and confidence to 0.90 ± 0.14 (p = 0.01). Conclusion: Conditional generative networks can generate realistic synthetic FCD-MRIs, reducing labeled data needs by approximately 20% while maintaining equivalent sensitivity. Equivalent amounts of real data, when available, remain more effective than synthetic augmentation.


[44] 2606.07449

On orbital stabilization of a circular motion primitive for a dynamic extension of the Dubins car model

This paper addresses orbital stabilization of a circular motion primitive for a dynamic extension of the Dubins car model within a transverse-linearization framework. We show that the corresponding transverse linearization is unstable and not stabilizable by linear state feedback. Therefore, the standard linearization-based approach to orbital stabilization cannot be applied directly. The main contribution is a set of explicit and verifiable conditions that characterize when a controller design based on transverse linearization remains applicable. These conditions rely on the specific structure of the dynamics in a neighborhood of the motion and on the use of non-standard transverse coordinates for controller design and analysis. Numerical simulations illustrate the proposed design procedure.


[45] 2606.07463

Amortized Neural Optimization for Pre-Layout Signal Integrity Design Space Exploration using Differentiable Surrogates

Pre-layout design space exploration (DSE) for high-speed signal integrity (SI) analysis is often limited by the computational cost of simulations and iterative optimization algorithms within modern electronic design automation (EDA) workflows. While machine learning surrogate models accelerate the simulation step, optimizing designs still requires iterative black-box search methods. This iterative nature scales poorly, making multi-corner sweeps computationally expensive. As a solution, this paper proposes amortized neural optimization (ANO) for pre-layout SI design. ANO eliminates iterative black-box search at inference time by using fully differentiable neural network surrogate models. ANO extracts analytical gradients from the surrogate to train a global optimization policy. Instead of solving the optimization problem repeatedly at inference, the optimization process is learned offline and therefore amortized. Once the ANO policy is trained, it maps different channel contexts directly to near-optimal design parameters in a single deterministic forward pass. The efficiency and accuracy of the ANO framework are demonstrated on three complex SI design scenarios: DDR5 decision feedback equalization (DFE), 9-dimensional SerDes Tx/Rx co-equalization, and DDR3 DQS differential pair routing to optimize eye diagram metrics under intra-pair skew constraints. By giving up roughly 10% in optimality compared to instance-specific black-box algorithms, ANO achieves speedups of three to four orders of magnitude. For a large-scale 320,000-instance multi-corner SerDes sweep optimization, ANO collapses what would have taken days of computation using iterative search algorithms into a single batched forward pass that completes in milliseconds. This transforms computationally expensive SI optimization into real-time and interactive pre-layout DSE.
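The amortization idea can be shown on a one-dimensional toy problem: instead of solving $\min_x f(x, c)$ separately for every context $c$, a policy $x = \pi_\theta(c)$ is trained once by backpropagating the surrogate's analytical gradient, after which new contexts need only a single forward pass. In this sketch the quadratic `surrogate_cost` stands in for a trained neural surrogate and a linear policy stands in for a neural network; both are hypothetical. Because the toy optimum is itself linear in $c$, the policy can match it exactly, so this toy shows no optimality gap, unlike the roughly 10% reported for the real design scenarios.

```python
import numpy as np

# Hypothetical differentiable surrogate: cost of design parameter x under channel context c.
# Its per-context optimum is x*(c) = (2c + 1) / 1.1, used below only to check the result.
def surrogate_cost(x, c):
    return (x - 2.0 * c - 1.0) ** 2 + 0.1 * x ** 2

def surrogate_grad_x(x, c):
    return 2.0 * (x - 2.0 * c - 1.0) + 0.2 * x  # analytical gradient d cost / d x

# Offline (amortized) training of a linear policy x = w * c + b over a batch of contexts
contexts = np.linspace(-1.0, 1.0, 101)
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    x = w * contexts + b
    g = surrogate_grad_x(x, contexts)      # gradient of the surrogate w.r.t. the policy output
    w -= lr * np.mean(g * contexts)        # chain rule: dx/dw = c
    b -= lr * np.mean(g)                   # chain rule: dx/db = 1

# Inference: one forward pass for new contexts, no per-instance search
new_contexts = np.array([-0.5, 0.0, 0.75])
x_pred = w * new_contexts + b
x_opt = (2.0 * new_contexts + 1.0) / 1.1
gap = surrogate_cost(x_pred, new_contexts) - surrogate_cost(x_opt, new_contexts)
print(np.max(np.abs(gap)))
```

In a real setting the per-context optimum is not in the policy class and the surrogate is only approximate, which is where an optimality gap such as the one reported above comes from.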


[46] 2606.07476

Physiologically Constrained Musculoskeletal Neural Network for Multi-DoF Joint Kinematics Estimation from Partially Observed sEMG

This paper investigates multi-degree-of-freedom (DoF) joint kinematics estimation under partially observed surface electromyography (sEMG), where only a subset of task-relevant muscles can be measured due to anatomical inaccessibility or sensor constraints. A novel musculoskeletal neural network (MSK-NN) is proposed to estimate multi-DoF joint angles while simultaneously inferring activations for both measured and unmeasured muscles. MSK-NN consists of a CNN-based muscle activation estimator and an embedded MSK forward dynamics module, forming a fully differentiable architecture. Unlike existing hybrid neural frameworks that require additional biomechanical labels (e.g., muscle-tendon forces, joint torques), MSK-NN is trained without direct supervision of internal biomechanical variables. A composite physics-physiology loss is designed by combining a joint kinematics loss, a data-driven muscle synergy loss, and an anatomy-guided trend loss. The proposed method is evaluated on two-DoF wrist kinematics estimation across three rhythmic motions with unconstrained speed and amplitude, and one random motion. Compared with CNN, Bi-LSTM, CNN-LSTM, and PET baselines, MSK-NN achieves lower normalized root mean square error (NRMSE) and higher coefficient of determination (R²), especially for the random motion. More importantly, the optimized MSK parameters remain within physiological limits, and the estimated activation of an input-excluded muscle exhibits strong temporal agreement with its recorded sEMG envelope, demonstrating that MSK-NN can recover physiologically plausible activations.


[47] 2606.07486

OPENPATH: A Supervisor–Specialist Agent System for Personalized, Accessible, and Multi-stop Urban Trip Planning

Urban trip-planning systems are commonly optimized for travel time and cost, but they offer limited support for the heterogeneous needs that real travelers bring, such as personalized preferences, multi-stop itinerary construction, and end-to-end wheelchair accessibility. We present openpaths, a supervisor-specialist multi-agent system that handles all of these tasks within a single architecture. openpaths adopts a deliberate division of labor: LLM agents parse natural-language input, classify request intent, and orchestrate execution, while classical algorithms perform route optimization over curated mobility and accessibility data. This design ensures that the resulting trip honors heterogeneous user preferences and enforces strict accessibility requirements when requested. Beyond per-user planning, openpaths doubles as a measurement instrument for city-scale accessibility analysis: applied to NYC, the system reveals substantial ADA infrastructure gaps and quantifies their effect on job accessibility for wheelchair users. Overall, this study shows how a supervisor-specialist LLM agentic framework can support heterogeneous trip planning and transparent, equitable transportation analysis in real urban environments.


[48] 2605.24649

On the Stability and Realizability of Recurrent Polynomial Surrogate Ternary Logic Gate Networks

Recurrent Neural Networks (RNNs) can learn to predict Signal Temporal Logic (STL) verdicts online from partial trajectories, but deploying them as runtime monitors in safety-critical systems demands more than predictive accuracy. Standard RNN architectures offer no structural guarantee that outputs degrade gracefully under sensor degradation; a dropped input can silently flip a verdict from safe to unsafe. We introduce the Recurrent Differentiable Ternary Logic Gate Network (R-DTLGN), a recurrent architecture that operates over Kleene's three-valued logic $\{-1, 0, +1\}$, where $0$ explicitly represents unknown. The R-DTLGN trains through continuous polynomial surrogates and hardens to a discrete ternary logic circuit at inference. We analyze the hardened circuit through two gate vocabularies derived from two orderings on the ternary domain: numerically monotone gates ensure stable recurrent dynamics, while information-monotone gates, when present, guarantee principled abstention (unknown inputs never produce wrong outputs) and monotonicity in input certainty (more information can only improve the verdict). We show that the recurrent connections required by bounded STL operators use exclusively AND and OR, which belong to both vocabularies, linking the monitoring task to the architecture's guarantees. A realizability bound derived from the STL formula's temporal operators directly sizes the network's hidden state, replacing hyperparameter search with a formula-driven specification. We evaluate on STL specifications over D4RL PointMaze navigation data, testing prediction accuracy, degradation under predicate dropout, and the accuracy-versus-safety tradeoff between two label construction pipelines. The R-DTLGN is, to our knowledge, the first recurrent architecture that couples learned temporal prediction with formal degradation guarantees rooted in three-valued logic.
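
The two gate vocabularies can be made concrete with a brute-force check. The sketch below assumes Kleene's strong logic, where AND is the minimum and OR the maximum under the numeric order −1 < 0 < +1, and the standard information order in which unknown (0) sits below both definite values. It checks that AND and OR are both numerically and information-monotone, and uses a hypothetical `pessimistic_and` gate (not from the paper) that is numerically monotone but breaks principled abstention. The paper's polynomial surrogates and hardening step are not shown.

```python
from itertools import product

VALUES = (-1, 0, 1)  # Kleene truth values: false, unknown, true


def k_and(a: int, b: int) -> int:
    """Kleene AND: minimum under the numeric order -1 < 0 < +1."""
    return min(a, b)


def k_or(a: int, b: int) -> int:
    """Kleene OR: maximum under the numeric order."""
    return max(a, b)


def pessimistic_and(a: int, b: int) -> int:
    """Hypothetical gate (not from the paper): treats any unknown input as false."""
    return -1 if 0 in (a, b) else min(a, b)


def refines(x: int, y: int) -> bool:
    """Information order: y carries at least as much information as x."""
    return x == 0 or x == y


def numerically_monotone(gate) -> bool:
    # Raising any input in the order -1 < 0 < +1 must never lower the output.
    return all(gate(a, b) <= gate(a2, b2)
               for a, b, a2, b2 in product(VALUES, repeat=4)
               if a <= a2 and b <= b2)


def information_monotone(gate) -> bool:
    # Refining the inputs may only refine the output, so an unknown input can make the
    # gate abstain (output 0) but never commit to a value that later information contradicts.
    return all(refines(gate(a, b), gate(a2, b2))
               for a, b, a2, b2 in product(VALUES, repeat=4)
               if refines(a, a2) and refines(b, b2))
```

Here `pessimistic_and(0, 1)` returns −1, yet resolving the unknown input to +1 yields +1: a confident wrong verdict of exactly the kind information monotonicity rules out.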


[49] 2606.06107

Deployed trusted-node quantum key distribution over 300 km with a multi-core fiber access link

Quantum key distribution (QKD) is increasingly considered for deployment in realistic communication networks, where long distances, heterogeneous fiber infrastructure, and coexistence with classical traffic present substantial challenges. Here, we demonstrate trusted-node QKD between Linköping University and the Stockholm hub of the Swedish national quantum communication infrastructure over 270 km of deployed single-mode fiber, extended by a 33 km multi-core fiber (MCF) segment emulating a metropolitan access link, for a total distance of 303 km. The two sub-links use commercial QKD systems whose receivers are interfaced with external superconducting nanowire single-photon detectors, enabling operation at losses beyond those supported by standard internal gated-mode detectors. We operate the link while actively switching the QKD channel between two MCF cores, with co-propagating Ethernet traffic and injected broadband optical noise in the other cores. The results demonstrate the integration of commercial QKD into demanding, dynamically reconfigurable fiber infrastructure relevant to future hybrid quantum-classical networks. Finally, using the generated secret keys, we illustrate how limited and time-varying QKD throughput affects one-time-pad-protected image transmission: image fidelity depends strongly on the available QKD-generated key budget and the choice of compression algorithm, highlighting application-level challenges for QKD-based encryption in realistic scenarios.
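
The link between key budget and application throughput can be illustrated with a minimal one-time-pad sketch. `KeyPool`, `fake_qkd_key`, and the byte-level XOR are illustrative assumptions, not the paper's software; `secrets.token_bytes` merely stands in for keys that would come from the QKD systems. The essential constraint is that every plaintext byte consumes one key byte that can never be reused, so a payload larger than the remaining budget cannot be sent.

```python
import secrets


def fake_qkd_key(n_bytes: int) -> bytes:
    """Stand-in for a block of QKD-generated secret key (hypothetical helper)."""
    return secrets.token_bytes(n_bytes)


class KeyPool:
    """Hypothetical key store held in identical copies by sender and receiver."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def deposit(self, key: bytes) -> None:
        self._buf.extend(key)

    def available(self) -> int:
        return len(self._buf)

    def take(self, n: int) -> bytes:
        if n > len(self._buf):
            raise ValueError(f"need {n} key bytes, only {len(self._buf)} available")
        out = bytes(self._buf[:n])
        del self._buf[:n]  # consumed key is discarded: a one-time pad must never be reused
        return out


def otp_xor(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt with a one-time pad (XOR is its own inverse)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must match the message length")
    return bytes(d ^ k for d, k in zip(data, key))
```

Under this model, compression directly trades against fidelity: a smaller encoded image fits into a smaller key budget, while a larger one must wait for more key or be truncated.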


[50] 2606.06537

DSU-Net: An Attention-Enhanced Dense Skip U-Net for Breast Lesion Segmentation in Mammographic Images

Breast cancer remains one of the leading causes of cancer-related mortality among women worldwide, making early detection essential for effective treatment. Mammography is the primary screening modality; however, accurate delineation of suspicious lesions remains challenging and subject to inter-observer variability. Automated segmentation methods can assist radiologists by providing consistent and efficient lesion localization. This study presents DSU-Net, an attention-enhanced Dense Skip U-Net architecture for automated breast lesion segmentation in mammographic images. The proposed framework integrates dense skip connections and attention mechanisms to improve feature propagation, preserve spatial information, and enhance lesion boundary delineation. Experiments were conducted using the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM). To address severe foreground-background imbalance, a composite loss function combining Dice loss, focal loss, and binary cross-entropy loss was employed during training. The proposed model achieved a Dice Similarity Coefficient of 0.9421, an Intersection over Union of 0.8905, an accuracy of 0.9711, and an AUC-ROC of 0.9878 on the validation set. Qualitative evaluation demonstrated accurate delineation of lesions with varying sizes and morphologies, while quantitative results confirmed robust discrimination between lesion and background regions. These findings demonstrate that DSU-Net provides accurate and reliable breast lesion segmentation in mammographic images and highlight the potential of attention-guided deep learning for computer-aided breast cancer screening and diagnosis.
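
A composite Dice + focal + BCE objective of the kind described can be sketched in NumPy as follows. The abstract does not give the term weights or focal-loss hyperparameters, so equal weights and the commonly used defaults γ = 2, α = 0.25 are assumptions; `p` holds predicted foreground probabilities and `y` the binary mask. The focal term down-weights pixels the model already classifies confidently, which is why it helps when background pixels dominate.

```python
import numpy as np


def dice_loss(p, y, smooth=1.0):
    """Soft Dice loss on probabilities p and a binary mask y of the same shape."""
    inter = np.sum(p * y)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(p) + np.sum(y) + smooth))


def bce_loss(p, y, eps=1e-7):
    """Pixel-averaged binary cross-entropy."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: (1 - p_t)^gamma shrinks the loss of well-classified pixels."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def composite_loss(p, y, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of the Dice, focal, and BCE terms."""
    w_dice, w_focal, w_bce = weights
    return w_dice * dice_loss(p, y) + w_focal * focal_loss(p, y) + w_bce * bce_loss(p, y)
```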


[51] 2606.06550

Geometric Second-Order Feature Correlation Learning for Self-Supervised Speech Emotion Recognition

Self-supervised learning (SSL) yields powerful, context-rich representations for speech emotion recognition (SER), yet aggregating these representations into holistic descriptors remains a bottleneck. Conventional first-order aggregation implicitly assumes feature independence, which overlooks the latent Riemannian geometry and discards higher-order relationships essential to the representational power of the backbone. To address this problem, this paper proposes a novel Second-Order Correlation (SOC) layer. Instead of treating features in isolation, SOC models feature correlations as covariance descriptors to capture synergistic co-occurrence patterns, which serve as discriminative signatures for robust emotion recognition. By mapping these descriptors from the Riemannian manifold to a Euclidean tangent space through Log-Euclidean mapping (LEM), the proposed method preserves geometric integrity while enabling direct linear discriminative learning. Extensive experiments on the ESD and RAVDESS datasets demonstrate that SOC recovers discriminative information lost in first-order pooling and effectively aggregates high-dimensional SSL features.
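
The covariance-plus-Log-Euclidean pipeline can be sketched for a single utterance as below. This is the generic construction (frame-level covariance, matrix logarithm via eigendecomposition, upper-triangle vectorization), not the paper's exact SOC layer; the regularizer `eps` and the √2 off-diagonal scaling are assumptions, the latter chosen so that the Euclidean norm of the vector matches the Frobenius norm of the log-covariance.

```python
import numpy as np


def second_order_pool(frames, eps=1e-5):
    """Map (T, D) frame-level SSL features to a Log-Euclidean vector of length D(D+1)/2."""
    centered = frames - frames.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(frames.shape[0] - 1, 1)
    cov += eps * np.eye(cov.shape[0])  # keep the covariance strictly positive definite
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Matrix logarithm maps the SPD manifold to the flat space of symmetric matrices.
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    rows, cols = np.triu_indices(cov.shape[0])
    # Off-diagonal entries appear twice in the symmetric matrix, hence the sqrt(2) factor.
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_cov[rows, cols] * scale
```

Because the output lives in a Euclidean space, an ordinary linear classifier can be trained on it directly, which is the property the abstract attributes to the LEM step.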


[52] 2606.06559

IRAF: Interference-Resilient Adaptive Fusion for Noise-Robust End-to-End Full-Duplex Spoken Dialogue Systems

Full-duplex spoken dialogue models allow voice agents to listen and speak concurrently, enabling natural interaction with real-time overlap. However, end-to-end dual-channel models that jointly encode user and agent streams may degrade in realistic acoustic environments: interfering speakers leaking into the user microphone can be encoded as part of the user query, corrupting the LLM's conditioning and causing unstable turn-taking and reduced response quality. We propose Interference-Resilient Adaptive Fusion (IRAF), a lightweight, streaming-compatible module that modulates the contribution of user audio to the LLM frame by frame. IRAF predicts a scalar reliability gate from target-speaker and user audio embeddings and rescales user representations before fusion with agent embeddings. Experiments on MS-MARCO and InstructS2S-200K show consistent gains in response quality and full-duplex interaction under interfering-speaker conditions.


[53] 2606.06573

Multiscale POD of Transformer Attention Fields: Scale-Selective Analysis via Morlet Scalogram

We introduce scale-selective Proper Orthogonal Decomposition (POD) for transformer attention fields, inspired by the use of POD for extracting energetically dominant modes from turbulent flow ensembles. The Morlet continuous wavelet transform identifies dominant temporal scales in the attention lag structure across a document ensemble; POD then extracts the energetically dominant modes at each scale from the ensemble of attention fields. The resulting modes reveal layer-dependent scale organisation, with early layers emphasising fine scales and later layers shifting toward coarser scales. We define a spectral concentration index from the POD eigenvalue decay rate and show empirically that it differentiates layers by their attention field complexity. By the classical POD optimality theorem, the extracted modes minimise the average L2 reconstruction error over the ensemble (Theorem 1), giving a data-driven effective rank for each layer. The method requires no architectural modification and no linguistic annotations: dominant attention patterns emerge from ensemble statistics alone. The turbulence analogy is structural rather than physical: we borrow ensemble covariance and modal analysis, not fluid dynamics itself.
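
The POD step can be sketched with a thin SVD of the mean-subtracted snapshot matrix. The sketch assumes each snapshot is one document's attention field at a fixed layer, flattened to a common length (and, for the scale-selective variant, already band-limited to one Morlet scale, which is not shown). The paper's spectral concentration index is derived from the eigenvalue decay rate and is not reproduced here; `effective_rank`, the number of modes holding a chosen energy fraction, is a hypothetical stand-in.

```python
import numpy as np


def pod(snapshots):
    """POD of an ensemble whose rows are flattened fields, shape (N, P).

    Returns the ensemble-mean field, the orthonormal POD modes (rows of Vt), and the
    modal energies (eigenvalues of the ensemble covariance) in decreasing order.
    """
    mean = snapshots.mean(axis=0)
    fluct = snapshots - mean
    _, s, Vt = np.linalg.svd(fluct, full_matrices=False)
    return mean, Vt, s**2 / snapshots.shape[0]


def effective_rank(energies, fraction=0.99):
    """Hypothetical concentration measure: modes needed to capture `fraction` of energy."""
    cumulative = np.cumsum(energies) / np.sum(energies)
    return int(np.searchsorted(cumulative, fraction) + 1)
```

With k modes retained, the average squared reconstruction error over the ensemble equals the sum of the discarded eigenvalues, which is the L2 optimality property the abstract cites.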


[54] 2606.06615

FIGMA: Towards FIne-Grained Music retrievAl

Retrieving music using natural language descriptions has improved with contrastive audio-text models such as CLAP, but current systems remain limited to coarse semantic queries. When descriptions specify fine-grained musical attributes such as tempo, key, chord progression, or rhythmic structure, existing models often fail to retrieve the correct audio. We show that this limitation stems from the contrastive learning objective itself: despite being trained on long captions, CLAP-based models effectively utilize only the first few tokens, discarding much of the information encoded in detailed prompts. We then propose FIGMA (FIne-Grained Music RetrievAl), a multi-view contrastive architecture that addresses this limitation by jointly optimizing global audio-text alignment and frame-level, token-wise alignment. This design enables FIGMA to capture both high-level semantic context and fine-grained musical attributes within a unified representation space. Moreover, we formalize the task of Fine-Grained Music Retrieval and construct the Fine-Grained Music Caption dataset (FGMCaps), a large-scale dataset of 380K music-caption pairs for training along with a 10K test set, both annotated with tempo, key, chord progression, beat count, genre, and mood. Extensive experiments demonstrate that FIGMA consistently outperforms existing CLAP-based music retrieval models across multiple music retrieval benchmarks, including out-of-domain evaluations, with relative improvements of up to 73.3%.


[55] 2606.06632

Smooth Hard-Thresholding for Singular Values with Stein's Unbiased Risk Estimate

Low-rank matrix denoising is a central primitive in patch-based image restoration and many other inverse problems. Classical SVD-based image denoising methods often choose a truncation rank by matching residual singular-value energy with an estimated noise energy, but this rule is not a finite-sample risk principle because a fitted low-rank approximation inevitably absorbs part of the noise. This paper develops a mathematically rigorous alternative based on Stein's unbiased risk estimate (SURE). Since singular value hard thresholding is discontinuous and does not satisfy the hypotheses of Stein's lemma, we introduce a logistic smooth hard-threshold spectral estimator. We prove that the smooth shrinker satisfies the regularity conditions required by a spectral-estimator version of Stein's lemma, and therefore admits an exactly unbiased fixed-threshold risk estimate under Gaussian noise. For a fixed observed matrix and a finite set of candidate thresholds separated from the observed singular values, the ordering of the fixed-threshold smooth SURE objective eventually agrees with a simple limiting score. The limiting score has the same algebraic form as the biased hard-threshold SURE formula, but here it is used only as a computational device for ranking finite candidates. Selecting the minimizing threshold is a data-adaptive tuning step; the selected SURE value should not be interpreted as an unbiased risk estimate of the finally selected estimator.
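
A minimal NumPy sketch of fixed-threshold smooth SURE follows. The logistic shrinker $f(s) = s\,\mathrm{sigmoid}((s-\tau)/\varepsilon)$ is an assumed form of the paper's smooth hard threshold, and the divergence uses the known closed form for spectral estimators of a real $m \times n$ matrix with distinct nonzero singular values (Candès, Sing-Long and Trzasko, 2013): $\sum_i [f'(s_i) + |m-n|\,f(s_i)/s_i] + 2\sum_{i \neq j} s_i f(s_i)/(s_i^2 - s_j^2)$. The noise standard deviation `sigma` is assumed known, and the grid of candidate thresholds is hypothetical.

```python
import numpy as np


def _gate(s, tau, eps):
    # Logistic switch rising from 0 to 1 around the threshold tau; eps sets its width.
    return 1.0 / (1.0 + np.exp(-(s - tau) / eps))


def smooth_hard_threshold(s, tau, eps):
    """Assumed smooth shrinker: f(s) = s * sigmoid((s - tau) / eps)."""
    return s * _gate(s, tau, eps)


def smooth_hard_threshold_deriv(s, tau, eps):
    g = _gate(s, tau, eps)
    return g + s * g * (1.0 - g) / eps


def spectral_estimate(Y, tau, eps):
    """Apply the shrinker to the singular values of Y: U diag(f(s)) V^T."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * smooth_hard_threshold(s, tau, eps)) @ Vt


def divergence(Y, tau, eps):
    """Divergence of the spectral estimator (real Y, distinct nonzero singular values)."""
    m, n = Y.shape
    s = np.linalg.svd(Y, compute_uv=False)
    f = smooth_hard_threshold(s, tau, eps)
    div = np.sum(smooth_hard_threshold_deriv(s, tau, eps) + abs(m - n) * f / s)
    gap = s[:, None] ** 2 - s[None, :] ** 2
    np.fill_diagonal(gap, np.inf)  # the double sum runs over i != j only
    div += 2.0 * np.sum((s * f)[:, None] / gap)
    return float(div)


def sure(Y, sigma, tau, eps):
    """Fixed-threshold SURE for i.i.d. N(0, sigma^2) noise on every entry of Y."""
    m, n = Y.shape
    resid = spectral_estimate(Y, tau, eps) - Y
    return float(-m * n * sigma**2 + np.sum(resid**2)
                 + 2.0 * sigma**2 * divergence(Y, tau, eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = 3.0 * rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
    Y = signal + rng.standard_normal(signal.shape)  # sigma = 1
    taus = np.linspace(1.0, 15.0, 29)
    best = min(taus, key=lambda t: sure(Y, 1.0, t, 0.3))
    print("selected tau:", best)
```

Picking `best` by minimizing SURE over the grid is the data-adaptive tuning step; as the abstract stresses, the minimized SURE value is then not an unbiased risk estimate for the selected estimator.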


[56] 2606.06652

Probabilistic Risk Sensitivity and Loss Aversion in Cumulative Prospect Theory

This paper develops a binary-gamble framework for characterizing risk sensitivity and loss aversion in Cumulative Prospect Theory (CPT). The proposed probabilistic risk-sensitivity metric is defined as a probability-threshold ratio that determines acceptance and preference thresholds in choice problems involving either a certain outcome and a binary gamble or two binary gambles. We show how standard notions of symmetric and non-symmetric bet aversion can be recovered within this framework, and we compare the resulting threshold-based conditions with utility premia, probability premia, and Arrow--Pratt curvature measures. The analysis clarifies when these criteria coincide and when they diverge, particularly for increasing aversion conditions, binary gambles with unequal probability distributions, and settings involving probability weighting functions. We also identify technical restrictions that arise when CPT-utility functions are used to represent loss aversion at the reference point. The resulting framework provides a decision-theoretic interpretation of risk sensitivity that is directly tied to probability thresholds and complements existing premium-based approaches.
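
A probability threshold of this kind can be computed numerically. The sketch below assumes the Tversky–Kahneman (1992) parametric forms and their published estimates (α = β = 0.88, λ = 2.25, weighting parameter 0.61 for gains and 0.69 for losses) and finds, by bisection, the win probability p* at which a mixed binary gamble (win G with probability p, else lose L) becomes as attractive as the status quo. It illustrates threshold-based acceptance only; it is not the paper's probabilistic risk-sensitivity metric, whose exact ratio definition is not given in the abstract.

```python
def tk_weight(p: float, gamma: float) -> float:
    """Tversky-Kahneman probability weighting function."""
    return p**gamma / (p**gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def tk_value(x: float, alpha: float = 0.88, beta: float = 0.88, lam: float = 2.25) -> float:
    """Power value function with loss aversion lam applied to losses."""
    return x**alpha if x >= 0 else -lam * (-x) ** beta


def cpt_mixed_gamble(p: float, gain: float, loss: float,
                     gamma_pos: float = 0.61, gamma_neg: float = 0.69) -> float:
    """CPT value of: win `gain` with probability p, otherwise lose `loss` (both > 0)."""
    return (tk_weight(p, gamma_pos) * tk_value(gain)
            + tk_weight(1.0 - p, gamma_neg) * tk_value(-loss))


def acceptance_threshold(gain: float, loss: float, tol: float = 1e-10) -> float:
    """Bisection for the p at which the gamble's CPT value crosses zero.

    Valid because the value is increasing in p: the gain weight grows with p while
    the (negative) loss term shrinks as 1 - p falls.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cpt_mixed_gamble(mid, gain, loss) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a symmetric 100/100 gamble the threshold lands well above 0.5, which is how loss aversion shows up as a probability requirement rather than a premium.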


[57] 2606.06687

Towards Serverless Semi-Decentralized Federated Learning with Heterogeneous Optimizers

We investigate cluster formation, involving the number and composition of clusters, in decentralized federated learning (FL) with heterogeneous machine learning (ML) optimizers. While clustering in centralized FL has enabled scalability and resource savings, its value and development in fully decentralized environments have yet to be explored. Optimizing cluster formation in such environments is challenging, especially due to the complex coupling between network graph structures, local data heterogeneity, and different local ML model optimizers. To address these challenges, we propose serverless semi-decentralized FL (SSD-FL), a methodology requiring no persistent server infrastructure. In SSD-FL, cluster formation occurs via a lightweight, one-time device-to-device (D2D) initialization phase, after which actual ML model training (alongside consensus and convergence processes) is fully serverless. Functionally, SSD-FL segments global rounds into intra-cluster and inter-cluster regimes, ensuring global convergence and consensus through novel "effective loss functions" that integrate device-specific ML optimizers with network graph-based regularization. Next, SSD-FL leverages the consensus gap via the Cheeger inequality to develop an iterative clustering algorithm evaluated against our derived convergence and consensus bounds, which incorporate a unique scoring metric to quantify data and optimizer heterogeneity across devices. Finally, experiments against three categories of decentralized FL methodologies validate that SSD-FL improves both convergence speed and communication efficiency across various network graphs, datasets, and local optimizer regimes.
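
The role of the Cheeger inequality can be illustrated with standard spectral bisection of a device graph. This is a generic sketch, not SSD-FL's iterative clustering algorithm or scoring metric: it computes the spectral gap λ₂ of the normalized Laplacian and splits devices by the sign of the Fiedler vector. Cheeger's inequality, λ₂/2 ≤ φ(G) ≤ √(2λ₂), ties λ₂, a common proxy for how quickly consensus spreads over a graph, to the conductance φ(G) of its sparsest cut. The graph is assumed connected.

```python
import numpy as np


def normalized_laplacian(adj):
    """I - D^{-1/2} A D^{-1/2}; assumes every device has at least one neighbour."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def spectral_bisection(adj):
    """Return lambda_2 and a boolean cluster mask from the sign of the Fiedler vector."""
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(adj))  # ascending order
    fiedler = eigvecs[:, 1] / np.sqrt(adj.sum(axis=1))
    return eigvals[1], fiedler >= 0


def conductance(adj, mask):
    """Cut weight divided by the smaller side's volume (sum of degrees)."""
    cut = adj[mask][:, ~mask].sum()
    deg = adj.sum(axis=1)
    return cut / min(deg[mask].sum(), deg[~mask].sum())
```

On two dense device groups joined by a single link, the sign split recovers the groups, and the lower Cheeger bound λ₂/2 never exceeds the conductance of the cut found.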


[58] 2606.06718

MSAIC-Net: A Multi-Scale Attention and Imbalance-Aware Contrastive Network for ECG-Based Myocardial Substrate Abnormality Detection

Myocardial substrate abnormalities, such as myocardial scar and myocardial infarction (MI), are associated with adverse cardiovascular outcomes. Electrocardiography (ECG) provides a low-cost and widely available tool for detecting these abnormalities, but ECG-based detection remains challenging due to heterogeneous lead-dependent manifestations, high-dimensional multi-lead signals, class imbalance, and the limited interpretability of deep learning models. We propose a multi-scale attention-enhanced convolutional network (MSAIC-Net) for ECG-based myocardial substrate abnormality detection. MSAIC-Net employs parallel atrous convolutional branches to extract ECG features across multiple temporal receptive fields. Channel attention is then used to adaptively reweight informative lead-wise and feature-channel representations. To address class imbalance and improve feature separability, we introduce a novel imbalance-aware supervised contrastive learning strategy that encourages samples from the same class to form compact representations while increasing separation between abnormal and normal samples. Lead-wise permutation importance is further incorporated to quantify the contribution of each ECG lead and improve model interpretability. The proposed method was evaluated on two complementary datasets: a low-data institutional cohort from the University of Virginia (UVA) Health System for myocardial scar classification and the large-scale public PTB-XL dataset from PhysioNet for MI identification. Experimental results show that MSAIC-Net outperforms baseline models, with particularly pronounced improvements in the low-data UVA cohort. Overall, the proposed framework provides an effective and interpretable approach for ECG-based detection of myocardial substrate abnormalities.


[59] 2606.06727

IDDMBSE: Integrating Data-Driven and Model-Based Systems Engineering for Trusted Autonomous Cyber-Physical Systems

Autonomous cyber-physical systems (CPS) sit at the intersection of Model-Based Systems Engineering (MBSE) and data-driven Machine Learning and Artificial Intelligence (ML/AI), yet no integrated Systems Engineering (SE) methodology natively spans both. We address this gap with IDDMBSE, an Integrated Data-Driven and Model-Based Systems Engineering methodology that extends the rigorous MBSE V-process with a data-driven loop at every step, anchored in SysML, the autonomy stack, and a hybrid model-based plus data-driven trade-off architecture. We instantiate IDDMBSE as an interoperable, open-source tool chain: PERFECT, which maps SysML system architectures to executable ROS autonomy stacks for scalable performance evaluation; TRADES-X, which decomposes design-space exploration into a model-based optimization stage followed by a data-driven evaluation stage; and VERITAS, which combines formal, data-driven, and runtime verification into a single assurance workflow. We demonstrate IDDMBSE on a Trusted Autonomous Ground Robot across its development lifecycle, spanning sensor-suite selection, risk-sensitive path planning, behavior-tree task verification, conformal-prediction-based robust perception, and assured multi-robot coordination, all exercised in a contested-terrain Isaac Sim test range that we release with the tool chain. We close by sketching how IDDMBSE is being re-formulated on SysML v2 / KerML foundations to enable language-native composability and tighter ML/AI integration.


[60] 2606.06770

Degrees of Freedom of Over-the-Air Computation over a MIMO Gaussian Network with Two Transmitters and Two Receivers

The fundamental limits of over-the-air computation (AirComp) are explored in a two-transmitter, two-receiver MIMO Gaussian network, where both receivers demand the same aggregation of source symbols originating at the two transmitters. An AirComp degrees of freedom (ACDoF) metric is defined, constrained by an asymptotic mean-squared error threshold. For a generic MIMO setting where the two transmitters are equipped with $M_1, M_2$ antennas, and the two receivers with $N_1, N_2$ antennas, the AirComp DoF value is shown to be almost surely equal to $\min\{M_1,M_2,N_1,N_2,(1/3)\max\{M_1+M_2,N_1+N_2\}\}$. For SISO settings, the results are extended beyond generic channels to arbitrary channel realizations. For finite signal-to-noise ratio (SNR) settings, an iterative alternating optimization algorithm is explored.
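
For quick evaluation, the generic ACDoF expression can be transcribed directly; exact rational arithmetic keeps the factor 1/3 from being rounded. The function name is hypothetical, and it covers only the almost-sure generic-channel result, not the SISO arbitrary-channel extension or the finite-SNR algorithm.

```python
from fractions import Fraction


def aircomp_dof(M1: int, M2: int, N1: int, N2: int) -> Fraction:
    """Generic ACDoF: min{M1, M2, N1, N2, (1/3) max{M1 + M2, N1 + N2}}."""
    return min(Fraction(M1), Fraction(M2), Fraction(N1), Fraction(N2),
               Fraction(max(M1 + M2, N1 + N2), 3))
```

For example, the generic SISO case $M_1=M_2=N_1=N_2=1$ gives 2/3, and four antennas at every node give 8/3, so in these symmetric cases the one-third term, not the antenna counts, is the binding constraint.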


[61] 2606.06790

Learning All-Terrain Locomotion for a Planetary Rover with Actively Articulated Suspension

This paper presents ERNEST, a four-wheeled planetary rover concept equipped with a two-degree-of-freedom Active Gimbal Suspension that combines yaw and roll actuation to enable wheel reconfiguration, steering, and active load redistribution. A single neural network controller, trained to track a desired path across challenging terrain, fully unlocks the capabilities of this actuated suspension system for autonomous obstacle negotiation. A reinforcement learning framework is developed using the high-fidelity DARTS simulation engine, which combines rigid-contact dynamics and Bekker-Wong terramechanics, enabling the emergence of locomotion strategies adapted to loose-soil conditions. To obtain a single unified controller across heterogeneous terrains, a policy consolidation strategy merges the experience of terrain-specialized agents into one neural network, eliminating the need for explicit terrain classification and controller switching. The resulting controller operates on a combination of proprioceptive and exteroceptive feedback, including sparse stereo-derived terrain elevation, chassis attitude, joint states, and force-torque measurements. Zero-shot transfer to the physical rover is achieved through domain randomization, sensor noise injection, and model-to-real system identification. Experimental results demonstrate autonomous traversal of rock fields, a bump trap, a wheel-high step, sand ripples, and sandy slopes. On a 20° sandy slope, the learned controller reduces the cost of transport by 37% on dry sand despite the additional actuation, and achieves superior performance on wet sand where the passive suspension becomes completely immobilized.


[62] 2606.06805

Lane Change Trajectory Planning for Personalized Driving Comfort and Mobility Efficiency

Lane changing entails simultaneous longitudinal and lateral motions that affect driving comfort and mobility efficiency. Because these motions are tightly coupled and subject to substantial inter-vehicle variability, lane-change trajectory planning is inherently personalized. This study proposes a neural network-driven planner that integrates a third-order polynomial trajectory generator with a learning module that infers optimal trajectory parameters across diverse driving conditions. The learning module uses a shared backbone with dual heads: one head ensures all-condition operational guarantees, while the other captures driver-specific preferences for comfort or mobility efficiency. A head-gated switching mechanism, realized through a statistical gate based on error-winner logistic regression, adaptively selects the appropriate head under varying driving conditions, which enables context-aware lane-change trajectory planning. Representative cases and Monte Carlo simulations show that the proposed planner achieves personalized comfort and mobility during lane changes, while the baseline head ensures feasible trajectories under driving conditions where personalized data are insufficient or inaccessible.
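
A third-order lateral profile with zero lateral speed at both ends can be sketched as follows. Treating the lateral offset W and maneuver duration T as the inferred trajectory parameters is an assumption (the abstract does not list them), as is ignoring the coupled longitudinal motion. The closed form makes the comfort-efficiency trade-off explicit: the peak lateral acceleration is $6W/T^2$, so shortening the maneuver costs comfort quadratically.

```python
from dataclasses import dataclass


@dataclass
class CubicLaneChange:
    """Lateral profile y(t) = a2 t^2 + a3 t^3 with y(0) = y'(0) = 0, y(T) = W, y'(T) = 0."""

    width: float     # lateral offset W in metres (hypothetical parameter)
    duration: float  # maneuver time T in seconds (hypothetical learned parameter)

    @property
    def a2(self) -> float:
        return 3.0 * self.width / self.duration**2

    @property
    def a3(self) -> float:
        return -2.0 * self.width / self.duration**3

    def position(self, t: float) -> float:
        return self.a2 * t**2 + self.a3 * t**3

    def velocity(self, t: float) -> float:
        return 2.0 * self.a2 * t + 3.0 * self.a3 * t**2

    def acceleration(self, t: float) -> float:
        return 2.0 * self.a2 + 6.0 * self.a3 * t

    def peak_abs_acceleration(self) -> float:
        # y'' is linear in t, so |y''| peaks at the endpoints: 6W/T^2.
        return 6.0 * self.width / self.duration**2
```

A cubic cannot also pin the endpoint accelerations to zero; planners that need that usually move to a quintic, at the cost of two more coefficients.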


[63] 2606.06806

Leveraging Soft Distributions of SSL-Derived Discrete Speech Tokens for Downstream Inference

Discrete speech tokens obtained from self-supervised learning (SSL) models provide efficient data compression while maintaining strong performance, and have been widely used as intermediate representations in various tasks. However, discretization inevitably causes information loss, leading to degraded performance compared with continuous SSL features. In this work, we propose to apply soft token assignment only during downstream inference. This approach preserves the efficiency of hard discretization during training while enhancing the expressiveness of the tokens at inference. The proposed method outperforms conventional hard assignment on both ASR and speech synthesis tasks, and exhibits particularly strong generalizability to out-of-domain data. For ASR of non-native speech, it even surpasses models using continuous SSL features. Moreover, analysis of the resulting representations shows they align more accurately with phonemes compared with conventional hard assignment.
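
The inference-time soft assignment can be sketched as follows. The abstract does not specify the distribution or how the downstream model consumes it; this sketch assumes a k-means codebook over SSL features (a common way to derive such tokens), a softmax over negative squared distances with a hypothetical `temperature`, and replaces each hard token embedding with its probability-weighted average over the downstream model's embedding table.

```python
import numpy as np


def _sq_dists(feats, centroids):
    # (T, D) frames vs (K, D) codebook -> (T, K) squared Euclidean distances
    return ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)


def hard_tokens(feats, centroids):
    """Conventional discretization: index of the nearest centroid per frame."""
    return _sq_dists(feats, centroids).argmin(axis=1)


def soft_assignment(feats, centroids, temperature=1.0):
    """Hypothetical soft distribution over tokens: softmax of -distance / temperature."""
    logits = -_sq_dists(feats, centroids) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def soft_token_embeddings(feats, centroids, emb_table, temperature=1.0):
    """Expected token embedding under the soft distribution, used only at inference."""
    return soft_assignment(feats, centroids, temperature) @ emb_table
```

Training still uses `hard_tokens`, so the downstream model sees discrete inputs; as the temperature approaches zero the soft embeddings collapse back to the hard-assignment ones.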


[64] 2606.06928

VoxCPM2 Technical Report

We present VoxCPM2, a fully open-source multilingual and controllable speech generation foundation model that extends the hierarchical diffusion-autoregressive modeling paradigm of VoxCPM. VoxCPM2 advances the framework in three key dimensions: (i) capability, by unifying 30 languages, 9 Chinese dialects, natural-language voice design, style-controllable voice cloning, and high-fidelity continuation cloning within a single backbone; (ii) quality, through an asymmetric AudioVAE that encodes at 16 kHz and reconstructs at 48 kHz, enabling implicit super-resolution with high encoding efficiency; and (iii) scale, by jointly scaling the model to 2B parameters and the training data to over 2 million hours of multilingual speech. To support these diverse capabilities within one model, we introduce a unified sequence organization that expresses all generation modes through different arrangements of the same input building blocks, allowing joint training under a single set of parameters and objective. VoxCPM2 achieves state-of-the-art or competitive performance on public zero-shot and instruction-following TTS benchmarks. On our internal 30-language evaluation set, it attains an average WER of 1.68%. These results demonstrate that hierarchical continuous-latent modeling, without relying on any external discrete speech tokenizer, offers a viable and powerful foundation for large-scale multilingual and controllable speech generation. The model weights, fine-tuning code, and inference tools are publicly released under the Apache 2.0 license to foster community research and development.


[65] 2606.06975

MyGardenBird: A Machine-Learning-Ready Bird Sound Dataset for Twelve Common Malaysian Birds

Bioacoustic datasets from tropical regions remain limited, in part due to the absence of reproducible workflows for aggregating recordings from public archives. We present MyGardenBird, a curated dataset of bird vocalisations representing twelve common species across Peninsular Malaysia and the Indo-Malayan region. Recordings were sourced from Xeno-canto and processed through species-level filtering, manual spectrogram segmentation, and quality control checks. The primary release comprises 7,200 manually validated audio clips (16 kHz, 16-bit PCM mono WAV), balanced at 600 three-second clips per species (6.0 hours total) derived from 1,381 distinct recordings. Metadata includes geospatial coordinates, vocalisation categories, and signal-to-noise ratio (SNR) values (range: 0.83–59.18 dB; mean: 15.80 dB). A supplementary 44.1 kHz version is also provided. To mitigate data leakage, dataset partitions are defined at the source-recording level. Baseline classification experiments using convolutional neural networks on Mel-spectrograms achieved test accuracies of 92–96%, indicating strong interspecies separability. Limitations include reliance on single-annotator curation; however, validation with BirdNET confirmed label consistency. MyGardenBird is openly available at this https URL under a CC BY-NC-SA 4.0 licence. Complete preprocessing code accompanies the release to support reproducibility and future expansion.
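
Recording-level partitioning can be sketched in a few lines: every clip inherits the ID of its source recording, and whole recordings are assigned to one split so that near-identical segments cannot appear on both sides. The tuple layout, `test_frac`, and seed are assumptions; the released preprocessing code may also stratify by species, which this sketch omits.

```python
import random
from collections import defaultdict


def split_by_recording(clips, test_frac=0.2, seed=0):
    """clips: list of (clip_id, recording_id) pairs; returns (train_ids, test_ids).

    All clips cut from one source recording land in the same split (group split).
    """
    by_rec = defaultdict(list)
    for clip_id, rec_id in clips:
        by_rec[rec_id].append(clip_id)
    recs = sorted(by_rec)                 # deterministic order before shuffling
    random.Random(seed).shuffle(recs)
    n_test = max(1, round(test_frac * len(recs)))
    test_recs = recs[:n_test]
    train = [c for r in recs[n_test:] for c in by_rec[r]]
    test = [c for r in test_recs for c in by_rec[r]]
    return train, test
```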


[66] 2606.06985

Contrastive Training with LLM-generated Near-Misses for Robust Code-Switching Speech Recognition

Code-switching (CS), the alternation between multiple languages within a single utterance, remains challenging for Automatic Speech Recognition (ASR). To address this issue, we propose a Point-of-Interest (POI)-aware contrastive training framework that improves recognition at CS-critical regions. We first identify CS spans by adopting a POI detection method from the literature, then construct acoustically plausible near-miss hypotheses by perturbing POIs in ASR N-best outputs and expanding candidates with a large language model. Hard but plausible negatives are retained through filtering with acoustic, phonemic, and textual constraints. Finally, we fine-tune Whisper-small with LoRA using a POI-weighted cross-entropy anchor objective together with a multi-negative contrastive ranking loss. Experiments on CS-FLEURS (cmn-eng) and ViMedCSS (vie-eng) show consistent reductions of over 2% in both general and CS-aware error rates compared to standard LoRA fine-tuning.
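
The two training terms can be sketched as follows, assuming each hypothesis (the reference and its near-miss negatives) is scored by a scalar such as its length-normalized log-likelihood under the model. The temperature `tau`, the POI up-weighting factor `poi_weight`, and how the two terms are combined are hypothetical; the abstract does not give them. The ranking term is a softmax cross-entropy that pushes the reference above all negatives at once.

```python
import numpy as np


def multi_negative_ranking_loss(pos_score, neg_scores, tau=1.0):
    """-log softmax probability of the reference among {reference} + near-miss negatives."""
    scores = np.concatenate(([pos_score], np.asarray(neg_scores, dtype=float))) / tau
    m = scores.max()
    log_norm = m + np.log(np.exp(scores - m).sum())  # stable log-sum-exp
    return float(log_norm - scores[0])


def poi_weighted_ce(token_nll, poi_mask, poi_weight=2.0):
    """Weighted mean of per-token negative log-likelihoods; POI tokens count more."""
    weights = np.where(poi_mask, poi_weight, 1.0)
    return float((weights * token_nll).sum() / weights.sum())
```

With K negatives scored equally to the reference, the ranking loss equals log(K + 1); it falls toward zero only as the reference's score separates from every near-miss.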


[67] 2606.07080

dots.tts Technical Report

We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that models speech in a continuous latent space. Compared with existing continuous autoregressive models, our key innovations are threefold. First, we train an AudioVAE with multiple objectives to build a semantically structured and prediction-friendly continuous speech space. Second, we use full-history conditioning in the flow-matching head to preserve long-range consistency and reduce drift during generation. Third, we apply reward-free self-corrective post-training to the flow-matching head to further improve robustness and acoustic quality. After being trained on a large-scale multilingual corpus, dots.tts achieves the best average performance on Seed-TTS-Eval, with WERs of 0.94%/1.30%/6.60% and SIM scores of 81.0/77.1/79.5 on the zh/en/zh-hard test sets, respectively. Across other benchmarks, dots.tts also consistently demonstrates open-source state-of-the-art performance, exhibiting strong generation stability, voice cloning ability, and emotional expressiveness. For efficient inference, we further apply CFG-aware MeanFlow distillation, enabling low-latency speech generation with first-packet latencies of 85/54 ms in output streaming and dual-streaming modes, respectively. To facilitate reproducible research and practical deployment, we release the training and inference code, together with the pretrained, post-trained, and MeanFlow-distilled checkpoints, under the Apache 2.0 license.


[68] 2606.07179

EvoGS: Constructing Continuous-Layered Gaussian Splatting with Evolution Tree for Scalable 3D Streaming

Streaming 3D Gaussian Splatting requires highly scalable, progressive representations. Existing progressive methods rely on discrete layering, accumulating separate splat sets for each level of detail. This structural independence between layers inherently leads to error accumulation, severe splat redundancy, and uncontrolled quality transitions. We propose EvoGS, the first continuous-layering representation. Organized as an Evolution Tree, EvoGS generates finer details via an explicit, wavelet-inspired parent-child refinement. This allows child nodes to structurally correct ancestral errors, yielding inherently sparse and highly compressible inter-layer signals. Extensive experiments show that EvoGS reduces splat redundancy from over 65% to under 25%. Compared to state-of-the-art baselines, it reduces transmission payload and GPU VRAM footprint by up to 2.4× and 5.5×, respectively, and achieves smooth quality transitions well suited to real-time adaptive streaming. Project page: this https URL


[69] 2606.07207

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Confidence-based loss weighting is usually avoided in generative models because it accelerates errors when the model is confidently wrong, but this intuition breaks down in supervised diffusion training. We introduce the Eisbach log-barrier, a parameter-free weight derived from the entropy of the DiT output's spatial energy distribution: high entropy damps the gradient, while low entropy preserves it. Applied to LoRA fine-tuning of Stable Audio 3 Medium on MusicCaps, it unexpectedly yields stronger thematic development, clearer acoustic differentiation, and higher textural diversity than unweighted training, the opposite of mode collapse. This works because in supervised diffusion the gradient direction is locked to ground truth, so confidence only scales the step size, and because temporal entropy downweights flat samples while preserving high-contrast ones. The result is an online, self-referential data curriculum that emerges purely from the forward pass, with analyzed noise-level dynamics and testable predictions.


[70] 2606.07437

Re-imagining ISO 26262 in the Age of Autonomous Vehicles: Enhancing Controllability through Transferability and Predictability

The ISO 26262 standard defines functional safety for road vehicles through risk assessments based on Severity, Exposure, and Controllability, grounded in a human-driven vehicle paradigm. In autonomous vehicles (AVs), the absence of a human driver necessitates revisiting these principles. This paper decomposes the Controllability placeholder into two auditable, measurable evidence dimensions of ISO 26262: Transferability and Predictability. Transferability extends Controllability to capture an AV system's ability to hand off control to dedicated fallback safety mechanisms, while Predictability captures how easily external agents can anticipate AV behavior. Predictability is formally defined using principles inspired by human-robot interaction, and a mathematical framework is provided to quantify it. A designed-versus-achievable gap is introduced to distinguish architectural fallback claims from the fallback capability actually achievable in a given scene. The proposed metrics align with ISO 26262 and ISO/PAS 21448 (SOTIF), rendering fallback and interaction claims falsifiable and traceable across operational design domain (ODD) slices. These dimensions complement rather than replace existing standards, and the enhancements preserve the structure of ISO 26262 while extending its applicability to driverless automated systems operating at SAE Levels 4 and 5.


[71] 2606.07457

Time series Foundation Models based on Physics-Informed Synthetic Histories for Cold-Start Photovoltaic Forecasting

At commissioning time, photovoltaic (PV) operators must forecast production before target-site observations are available, limiting the direct use of standard supervised forecasters. This cold-start setting is addressed with a zero-shot pipeline that generates a synthetic production history from plant metadata and meteorological covariates, enabling time-series foundation models (TSFMs) to forecast through inference-time conditioning. Five TSFMs are benchmarked against classical baselines under three strict strategies: Cold-Start Baseline, Real Feedback, and Self-Forecast Feedback. The evaluation spans $440$ PV sites across four datasets and diverse climate regimes. Covariate-aware foundation models outperform baselines by a factor of approximately $1.7$ to $2$: TabPFN-TS achieves the lowest error under Real Feedback (MAE $0.514$, RMSE $0.721$ kWh kWp$^{-1}$ d$^{-1}$), while Chronos-2 is most robust under Self-Forecast Feedback. Performance is largely insensitive to the synthetic-history source, indicating that accuracy is driven more by the availability of plausible temporal context than by the specific generator.
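The abstract does not describe its physics-informed generator. As an illustrative assumption only, the sketch below builds a daily synthetic history with a first-order performance-ratio model (peak power times daily plane-of-array irradiation divided by the 1 kW/m² reference, times a performance ratio and an optional linear temperature derating) and scores forecasts in kWh per kWp per day, the unit the abstract reports. The function names and default values such as `performance_ratio=0.8` are hypothetical.

```python
def synthetic_daily_yield(kwp, daily_irradiation_kwh_m2, performance_ratio=0.8,
                          temp_coeff_per_c=0.0, mean_cell_temp_c=25.0):
    """Synthetic daily PV energy [kWh] from plant metadata and irradiation.

    kwp: installed peak power [kWp].
    daily_irradiation_kwh_m2: iterable of daily plane-of-array irradiation [kWh/m^2];
    dividing by the 1 kW/m^2 reference irradiance gives equivalent peak-sun hours.
    temp_coeff_per_c: linear power temperature coefficient [1/degC] (0 disables it).
    """
    derate = 1.0 + temp_coeff_per_c * (mean_cell_temp_c - 25.0)
    return [kwp * h * performance_ratio * derate for h in daily_irradiation_kwh_m2]


def normalized_mae(pred_kwh, obs_kwh, kwp):
    """Mean absolute error in kWh per kWp per day."""
    if len(pred_kwh) != len(obs_kwh):
        raise ValueError("prediction and observation lengths differ")
    return sum(abs(p - o) for p, o in zip(pred_kwh, obs_kwh)) / (len(obs_kwh) * kwp)


history = synthetic_daily_yield(5.0, [4.0, 6.0])  # 5 kWp plant, two days
print(history)
```

Normalizing by peak power makes errors comparable across plants of different sizes, which matters when pooling hundreds of sites.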


[72] 2606.07494

Mitigating Proxy-to-Wild Domain Gap in Deepfake Speech

Recent neural audio codec-based speech generation (CodecFake) produces highly realistic audio, challenging existing deepfake countermeasure models. Using codec-resynthesized speech (CoRS) as proxy training data improves performance, but models trained this way often generalize poorly beyond the proxy data. We propose Domain-Shift Feature Augmentation (DSFA), which simulates "in-the-wild" variations by transforming deterministic feature statistics into stochastic distributions during fine-tuning. To evaluate generalization, we further introduce the Codec-based Speech Generation Extension Evaluation (CoSG ExtEval) dataset, a more challenging extension of the CoSG Eval dataset (from CodecFake+), featuring 40 unseen generative models and long-form audio. Experimental results demonstrate that combining a post-trained self-supervised learning (SSL) backbone with DSFA effectively narrows the proxy-to-wild domain gap. This approach achieves state-of-the-art performance across diverse CodecFake attacks on both CoSG Eval and CoSG ExtEval.
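The abstract does not spell out DSFA. One established way to turn deterministic feature statistics into stochastic ones, used in uncertainty-based domain generalization, is to treat each utterance's channel mean and standard deviation as Gaussian random variables whose spread is estimated across the batch, and then re-normalize the features with resampled statistics. The NumPy sketch below follows that idea under the assumption that features are `(batch, time, channel)` arrays; it is not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np


def stochastic_stat_augment(feats, rng, eps=1e-6):
    """Resample per-utterance feature statistics (uncertainty-style augmentation sketch).

    feats: array (B, T, C) of frame-level features for a batch of B utterances.
    rng: a numpy Generator used to draw the perturbations.
    """
    mu = feats.mean(axis=1, keepdims=True)            # (B, 1, C) per-utterance channel means
    sigma = feats.std(axis=1, keepdims=True) + eps    # (B, 1, C) per-utterance channel stds
    # Spread of the statistics across the batch, used as their uncertainty.
    mu_spread = mu.std(axis=0, keepdims=True)         # (1, 1, C)
    sigma_spread = sigma.std(axis=0, keepdims=True)   # (1, 1, C)
    new_mu = mu + rng.standard_normal(mu.shape) * mu_spread
    new_sigma = np.maximum(sigma + rng.standard_normal(sigma.shape) * sigma_spread, eps)
    # Normalize with the original statistics, then re-style with the sampled ones.
    return (feats - mu) / sigma * new_sigma + new_mu


rng = np.random.default_rng(0)
augmented = stochastic_stat_augment(rng.standard_normal((8, 50, 16)), rng)
```

Because the perturbation scale comes from the batch itself, no noise magnitude needs tuning; in practice such augmentation is usually applied only during training.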


[73] 2203.07904

Unsupervised Learning Based Focal Stack Camera Depth Estimation

We propose an unsupervised deep learning based method to estimate depth from focal stack camera images. On the NYU-v2 dataset, our method achieves much better depth estimation accuracy compared to single-image based methods.


[74] 2508.02357

Data-Driven Adaptive Second-Order Sliding Mode Control with Noisy Data

This paper proposes a data-driven approach to designing adaptive suboptimal second-order sliding mode (ASSOSM) controllers for a class of single-input nonlinear systems with partially unknown dynamics, subject to both matched and unmatched disturbances. We first view the system as comprising two coupled dynamics, referred to as the upper and lower dynamics, with the last state serving as a virtual input to the upper dynamics. The proposed control-design methodology then follows a two-stage procedure: (i) designing a virtual state-feedback control law for the upper dynamics and (ii) synthesizing an ASSOSM controller for the full-order system. To this end, we collect noise-corrupted data from the system during a finite-time experiment. We then formulate a data-dependent condition, whose feasibility enables the design of a virtual state-feedback control law that renders the closed-loop upper dynamics input-to-state stable with respect to the unmatched disturbance. Building on this virtual state-feedback control law, we subsequently propose a data-driven nonlinear sliding variable, based on which an ASSOSM controller is designed for the full-order system. The state trajectories of the resulting closed-loop system are semiglobally ultimately bounded (S-GUB), with the ultimate bound explicitly depending on the magnitude of the unmatched disturbance. In particular, the control design parameters can be selected for any prescribed bounded set of initial conditions so that the state trajectories of the closed-loop system are S-GUB. Moreover, the effect of the matched disturbance is completely rejected after a finite time. Simulation results demonstrate the effectiveness of the proposed method.


[75] 2508.15006

Structure-preserving Optimal Kron-based Reduction of Radial Distribution Networks

Network reduction simplifies complex electrical networks to address the computational challenges of large-scale transmission and distribution grids. Traditional network reduction methods are often based on a predefined set of nodes or lines to remain in the reduced network. This paper builds upon previous work on optimal Kron-based reduction of networks, which was formulated as a mixed-integer linear program, and enhances the framework in three aspects. First, scalability is improved via a cutting-plane restriction, tightened Big-M bounds, and a zero-injection node reduction stage. Second, we introduce a radiality-preservation step that identifies and recovers nodes whose restoration ensures radiality. Third, a linearized voltage magnitude error constraint is incorporated to explicitly bound the difference between the full and reduced networks. The model is validated on the 533-bus distribution test system and a 3499-bus utility feeder for a set of representative loading scenarios. In the 533-bus system, an 85% reduction was achieved with a maximum voltage error below 0.0025 per unit, while in the 3499-bus feeder, over 94% reduction was obtained with maximum voltage errors below 0.002 per unit. Additionally, we show that the radialization step reduces the runtime of optimal voltage control problems when applied to Kron-reduced networks.
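The elimination step behind Kron-based reduction is a Schur complement of the nodal admittance matrix, $Y_{\mathrm{red}} = Y_{kk} - Y_{ke} Y_{ee}^{-1} Y_{ek}$, which is exact when the eliminated buses carry no current injection. In the paper an MILP decides which buses to keep; this minimal sketch simply takes that set as given, and the helper name `kron_reduce` is hypothetical.

```python
import numpy as np


def kron_reduce(Y, keep):
    """Eliminate every bus not in `keep` from the nodal admittance matrix Y.

    Y_red = Y_kk - Y_ke @ inv(Y_ee) @ Y_ek  (Schur complement), exact for
    eliminated buses with zero current injection.
    """
    keep = list(keep)
    elim = [i for i in range(Y.shape[0]) if i not in keep]
    Ykk = Y[np.ix_(keep, keep)]
    Yke = Y[np.ix_(keep, elim)]
    Yek = Y[np.ix_(elim, keep)]
    Yee = Y[np.ix_(elim, elim)]
    # Solve instead of forming inv(Yee) explicitly, for numerical robustness.
    return Ykk - Yke @ np.linalg.solve(Yee, Yek)


# Three buses in series, each line with admittance y; eliminate the middle bus.
y = 1 - 2j
Y = np.array([[y, -y, 0], [-y, 2 * y, -y], [0, -y, y]])
print(kron_reduce(Y, [0, 2]))
```

Eliminating the middle bus of two identical series lines yields a single equivalent line with half the admittance, as expected for impedances in series.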


[76] 2508.18813

Recursive Experiment Design for Closed-Loop Identification of ARMAX Systems with Output Perturbation Limits

In many applications, system identification experiments must be performed in closed loop to ensure safety or to maintain system operation. In this paper, we consider the recursive design of informative experiments for ARMAX models by adding a bounded probing signal to the input generated by a fixed output feedback controller. The resulting output perturbations should be kept within user-specified limits. We analyze the identifiability and feasibility conditions of this setting and then proceed to derive a probing signal that can be efficiently computed in closed form. We demonstrate the effectiveness and properties of the design in numerical experiments.


[77] 2509.22685

VIRTUS-FPP: Virtual Sensor Modeling for Fringe Projection Profilometry in NVIDIA Isaac Sim

Fringe projection profilometry (FPP) is a high-precision structured-light sensing technique for 3D surface reconstruction, yet its practical deployment is often constrained by complex calibration procedures, sensitivity to environmental conditions, and the high cost of physical experimentation. At the same time, robotics research increasingly relies on simulation platforms such as NVIDIA Isaac Sim for scalable development and validation, but accurate virtual representations of optical metrology sensors such as FPP are not currently available. In this work, we present VIRTUS-FPP, the first end-to-end virtual sensor modeling framework for fringe projection profilometry implemented in NVIDIA Isaac Sim, enabling physically grounded simulation of the complete FPP pipeline, including structured light projection, image formation, calibration, and 3D reconstruction, without dependence on pre-calibrated physical systems. The framework leverages an inverse camera model for projector representation, ensuring geometric and photometric fidelity consistent with structured-light principles. By bridging optical metrology and robotics simulation, VIRTUS-FPP enables high-fidelity synthetic data generation, systematic evaluation of sensing pipelines, and digital twin replication of real-world FPP systems. Experimental results demonstrate sub-millimeter reconstruction accuracy and strong correspondence between simulated and physical measurements, highlighting the framework's effectiveness and its potential to advance perception-driven robotics, simulation-to-reality transfer, and scalable optical sensor design.
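The abstract does not say which fringe-analysis method VIRTUS-FPP simulates. N-step phase shifting is a common choice in FPP: assuming fringe images of the form $I_n = A + B\cos(\phi + 2\pi n/N)$ with $N \ge 3$, the wrapped phase can be recovered per pixel as in this sketch. Phase unwrapping and phase-to-height calibration are omitted, and the function name is hypothetical.

```python
import numpy as np


def wrapped_phase(images):
    """N-step phase-shifting phase retrieval.

    images: array (N, H, W) captured with fringe phase shifts 2*pi*n/N,
    modelled as I_n = A + B*cos(phi + 2*pi*n/N), N >= 3.
    Returns the wrapped phase phi in (-pi, pi].
    """
    n_steps = images.shape[0]
    deltas = 2 * np.pi * np.arange(n_steps) / n_steps
    # sum_n I_n sin(delta_n) = -(N/2) B sin(phi); sum_n I_n cos(delta_n) = (N/2) B cos(phi)
    num = -np.tensordot(np.sin(deltas), images, axes=1)
    den = np.tensordot(np.cos(deltas), images, axes=1)
    return np.arctan2(num, den)


phi = np.linspace(-3.0, 3.0, 7).reshape(1, 7)
imgs = np.stack([0.5 + 0.4 * np.cos(phi + 2 * np.pi * n / 4) for n in range(4)])
print(wrapped_phase(imgs))
```

The background $A$ and modulation $B$ cancel in the ratio, which is why phase shifting is robust to uneven illumination and surface reflectance.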


[78] 2510.17462

ORIX: Orchestration of RIS with xApps for Smart Wireless Factory Environments

The vision of a smart wireless factory (SWF) demands highly flexible, low-latency, and reliable connectivity that goes beyond conventional wireless solutions. Reconfigurable intelligent surface (RIS)-empowered communications, when integrated with the open radio access network (O-RAN) architectures, have emerged as a promising enabler to meet these challenging requirements. This article introduces the methodology for the orchestration of RIS with xApps (ORIX), bringing the RIS technology into the O-RAN ecosystem through xApp-based control for SWF environments. ORIX features three key components: an O-RAN-compliant RIS service model for dynamic configuration, an RIS channel simulator that supports 3GPP indoor factory models with multiple industrial scenarios, and practical RIS optimization strategies with finite-resolution control. Together, these elements provide a realistic end-to-end emulation platform for evaluating RIS placement, control, and performance in SWF environments prior to deployment. The presented case study demonstrates how ORIX enables the evaluation of achievable performance gains, exploration of trade-offs among key RIS design parameters, and identification of deployment strategies that balance system performance with practical implementation constraints. By bridging theoretical advances with industrial feasibility, ORIX lays the groundwork for RIS-assisted O-RAN networks to power next-generation wireless communication in industrial scenarios.
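ORIX's optimization strategies are not detailed in the abstract. A common single-user baseline for finite-resolution RIS control is to compute the continuous co-phasing solution, which aligns every cascaded path, and then round each element's phase to the nearest of $2^b$ levels. The sketch below assumes known narrowband channels `h` (BS to RIS) and `g` (RIS to UE); the function names are hypothetical.

```python
import numpy as np


def ris_phases(h, g, bits):
    """Co-phasing RIS configuration with b-bit phase quantization.

    h: (N,) BS->RIS channel, g: (N,) RIS->UE channel (complex).
    The continuous optimum aligns every cascaded path h_i*g_i to phase 0;
    each phase is then rounded to the nearest multiple of 2*pi / 2**bits.
    """
    ideal = -np.angle(h * g)              # continuous co-phasing solution
    step = 2 * np.pi / 2**bits
    return np.round(ideal / step) * step  # quantized phases (equivalent modulo 2*pi)


def received_gain(h, g, theta):
    """Received power |sum_i h_i g_i exp(j*theta_i)|^2 for an RIS phase vector."""
    return np.abs(np.sum(h * g * np.exp(1j * theta))) ** 2


rng = np.random.default_rng(0)
h = (rng.standard_normal(64) + 1j * rng.standard_normal(64)) / np.sqrt(2)
g = (rng.standard_normal(64) + 1j * rng.standard_normal(64)) / np.sqrt(2)
for b in (1, 2, 3):
    print(b, received_gain(h, g, ris_phases(h, g, b)))
```

With $b$ bits the per-element phase error is at most $\pi/2^b$, so the received power stays within a factor $\cos^2(\pi/2^b)$ of the continuous optimum (about 0.85 for 3 bits), which is one of the trade-offs an emulation platform like ORIX can expose.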


[79] 2511.19961

Toward Trustworthy Digital Twins in AI Agent-based Wireless Network Optimization: Challenges, Solutions, and Opportunities

Optimizing modern wireless networks is exceptionally challenging due to their high dynamism and complexity. While AI agents powered by reinforcement learning (RL) offer a promising solution, their practical application is limited by prohibitive exploration costs and potential risks in the real world. The emerging digital twin (DT) technology provides a safe and controlled virtual environment for agent training, but its effectiveness critically depends on the DT's reliability. Policies trained in an unreliable DT that does not accurately represent the physical network may experience severe performance degradation upon real-world deployment. In this article, we introduce a new DT evaluation framework to ensure trustworthy DTs in AI agent-based network optimization. This framework shifts the focus from model-level accuracy, such as the similarity of wireless channels and user trajectories, to a more holistic, task-centric DT assessment based on the Markov decision process that the agent actually perceives. We demonstrate that it serves as an effective guideline for the design, selection, and lifecycle management of wireless network DTs. A comprehensive case study on a real-world wireless network testbed shows how this evaluation framework is used to pre-filter candidate DTs, leading to a significant reduction in training and testing costs without sacrificing deployment performance. Finally, potential research opportunities are discussed.


[80] 2512.07353

Off-grid solar energy storage system with hybrid lithium iron phosphate (LFP) and lead-acid batteries in high mountains: a case report of Jiujiu Cabins in Taiwan

Mountain huts are high-altitude buildings that provide hikers with accommodation and shelter. Energy supply in mountain huts is still an open issue, and renewable energy could be an appropriate solution. Jiujiu Cabins, a famous mountain hut in Shei-Pa National Park, Taiwan, has operated an off-grid solar energy storage system (ESS) with lead-acid batteries. In 2021, a serious system failure left the hut without electricity. After a detailed on-site survey, a reorganization and repair project was implemented, and the energy system returned to normal operation. In addition, a more eco-friendly lithium iron phosphate (LFP) battery ESS replaced part of the lead-acid battery ESS, forming a hybrid ESS and yielding an improved, greener off-grid solar ESS. This case report describes the energy architecture of the system in detail along with its operating history, and presents the on-site survey of the failed system, the improvement project, and future plans.


[81] 2512.23294

Agentic AI-Enhanced Semantic Communications: Foundations, Architecture, and Applications

Semantic communications (SemCom), as one of the key technologies for 6G, is shifting networks from bit transmission to semantic information exchange. On this basis, introducing agentic artificial intelligence (AI) with perception, memory, reasoning, and action capabilities provides a practicable path to intelligent communications. This paper provides a systematic exposition of how agentic AI empowers SemCom from the perspectives of research foundations, system architecture, and application scenarios. We first provide a comprehensive review of existing studies by agent types, covering embedded agents, large language model (LLM)/large vision model (LVM) agents, and reinforcement learning (RL) agents. Additionally, we propose a unified agentic AI-enhanced SemCom framework covering the application layer, the semantic layer, and the cloud-edge collaboration layer, forming a closed loop from intent to encoding to transmission to decoding to action to evaluation. We also present several typical scenarios, including multi-vehicle collaborative perception, multi-robot cooperative rescue, and agentic operations for intellicise (intelligent and concise) networks. Furthermore, we introduce an agentic knowledge base (KB)-based joint source-channel coding case study, AKB-JSCC, where the source KB and channel KB are built by LLM/LVM agents and RL agents, respectively. Experimental results show that AKB-JSCC achieves higher information reconstruction quality under different channel conditions. Finally, we discuss future evolution and research directions, providing a reference for portable, verifiable, and controllable research and deployment of agentic SemCom.


[82] 2601.22096

Reformulating Energy Storage Capacity Accreditation Problem with Marginal Reliability Impact

To enhance the efficiency of capacity markets, many electricity markets in the U.S. are adopting or planning to implement marginal capacity accreditation reforms. This paper provides new insights into energy storage capacity accreditation using Marginal Reliability Impact (MRI). We reformulate the commonly used reliability-based storage dispatch model as an optimization problem, enabling direct calculation of the MRI from the Lagrange multipliers rather than through brute-force perturbation analysis. The analysis demonstrates that the expected unserved energy (EUE) is a piecewise linear function and that the storage MRI remains non-negative across various system scenarios. We further explore the influence of qualified capacity (QC), storage dispatch rules, and other key factors on storage accreditation, providing practical insights for system operators. Additionally, comparisons of storage capacity accreditation under different reliability criteria offer guidance for policymakers in setting future standards. Numerical results from a modified California system validate our findings and highlight several important phenomena associated with the MRI-based accreditation scheme.


[83] 2603.21510

Unregistered Spectral Image Fusion: Unmixing, Adversarial Learning, and Recoverability

This paper addresses the fusion of a spatially unregistered pair consisting of a hyperspectral image (HSI) and a multispectral image (MSI) that cover roughly overlapping regions. HSIs offer high spectral but low spatial resolution, while MSIs provide the opposite. The goal is to integrate their complementary information to enhance both the spatial resolution of the HSI and the spectral resolution of the MSI. While hyperspectral-multispectral fusion (HMF) has been widely studied, the unregistered setting remains challenging. Many existing methods focus solely on MSI super-resolution, leaving the HSI unchanged. Supervised deep learning approaches have been proposed for HSI super-resolution, but they rely on accurate training data, which is often unavailable. Moreover, theoretical analyses largely address the co-registered case, leaving unregistered HMF poorly understood. In this work, an unsupervised framework is proposed to simultaneously super-resolve both the MSI and the HSI. The method integrates coupled spectral unmixing for MSI super-resolution with latent-space adversarial learning for HSI super-resolution. Theoretical guarantees on the recoverability of the super-resolved MSI and HSI are established under reasonable generative models, providing, to the best of our knowledge, the first such insights for unregistered HMF. The approach is validated on semi-real and real HSI-MSI pairs across diverse conditions.


[84] 2604.23182

An Exponentially stable Extended Kalman Filter with Estimate dependent Process noise Covariance for Chemical Reaction Networks

Biomolecular systems are often modeled with partially known nonlinear stochastic dynamics, making state and parameter estimation a central challenge. While Kalman filtering techniques are widely used in this setting, their performance critically depends on the choice of the process noise covariance, which is typically assumed constant and heuristically tuned. Such assumptions are not justified for biomolecular systems, where intrinsic noise arises from the underlying reaction kinetics. In previous work, a process noise covariance update based on the Chemical Langevin Equation (CLE) was introduced for Extended Kalman Filter (EKF)-based estimation in Chemical Reaction Networks (CRNs). In this work, we analyze the stochastic stability of this filtering framework. In particular, we obtain a conservative upper bound on the sampling interval for discrete-time biomolecular systems that ensures mean-square exponential boundedness under stated assumptions. The proposed framework is validated through simulations on a nonlinear gene expression model. The analysis provides theoretical justification for CLE-based process noise covariance modeling in EKF design for biomolecular circuits, reducing reliance on heuristic covariance tuning.
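Under the Chemical Langevin Equation $dX = S\,a(X)\,dt + S\,\mathrm{diag}(\sqrt{a(X)})\,dW$, the diffusion accumulated over one sampling interval $\Delta t$ gives a state-dependent process noise covariance $Q(x) \approx S\,\mathrm{diag}(a(x))\,S^\top \Delta t$. The sketch below evaluates it for a hypothetical two-species mRNA/protein model with made-up rate constants; the exact discretization used in the cited works may differ.

```python
import numpy as np

# Stoichiometry of a hypothetical gene-expression model, species (mRNA, protein).
# Columns: transcription, mRNA decay, translation, protein decay.
S = np.array([[1, -1, 0, 0],
              [0, 0, 1, -1]], dtype=float)


def propensities(x, k_m=2.0, g_m=0.2, k_p=5.0, g_p=0.25):
    """Reaction propensities a(x) for state x = (mRNA, protein); rates are illustrative."""
    m, p = x
    return np.array([k_m, g_m * m, k_p * m, g_p * p])


def cle_process_noise(x, stoich, prop_fn, dt):
    """State-dependent EKF process noise Q(x) = S diag(a(x)) S^T dt from the CLE."""
    a = np.clip(prop_fn(x), 0.0, None)  # estimates can go negative; propensities cannot
    return stoich @ np.diag(a) @ stoich.T * dt


print(cle_process_noise(np.array([10.0, 100.0]), S, propensities, dt=0.1))
```

In the EKF prediction step, $Q$ would be re-evaluated at the current estimate each sampling interval, for example $P^- = F P F^\top + Q(\hat{x})$, instead of using a fixed, hand-tuned matrix.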


[85] 2604.24453

NL-COMM-Sat: Breaking the Direct Device-to-Satellite Communication Barrier via "Aggressive" Non-Orthogonal Transmissions and Non-Linear Processing

Direct Device-to-Satellite (D2S) communications, which enable direct satellite connectivity with unmodified user equipment (UE), not only expand global coverage but also reshape the evolution of future access networks. However, D2S links face fundamental challenges due to inherently low signal-to-noise ratios (SNRs) and limited spatial multiplexing gains arising from near line-of-sight propagation, both of which severely constrain achievable spectral efficiency. Despite the lack of spatial multiplexing, this work shows that aggressive non-orthogonal transmissions, where multiple users (e.g., four) transmit concurrently over the same frequency resources, even to a single receive antenna, can unlock substantial capacity gains that remain entirely unexploited by existing systems. Realizing these gains in practice, however, requires receiver architectures that, to the best of our knowledge, have not yet been developed. To this end, we introduce NL-COMM-Sat, an efficient and flexible framework that overcomes this limitation by enabling aggressive non-orthogonal signal transmissions. In contrast to conventional non-orthogonal multiple access (NOMA) schemes, NL-COMM-Sat supports more than two UEs per receive antenna on the same frequency resource. The framework revisits optimal receiver design principles and proposes computationally efficient processing schemes that translate previously unexplored theoretical gains into tangible throughput improvements, even under realistic channel estimation errors and high-mobility Doppler conditions. Our evaluation shows that NL-COMM-Sat achieves up to a 2x increase in spectral efficiency compared to orthogonal multiple access and NOMA baselines across all considered SNR and Doppler regimes, even with a single-antenna receiver and user speeds of up to 500 km/h.
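As a reference point for the "optimal receiver design principles" mentioned above, the sketch below performs exhaustive joint maximum-likelihood detection of $K$ users transmitting QPSK on the same resource to a single receive antenna, assuming perfectly known flat channel gains. It is the brute-force baseline, not NL-COMM-Sat's efficient processing; names and channel values are hypothetical.

```python
import itertools

import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)


def joint_ml_detect(y, h, constellation=QPSK):
    """Exhaustive joint ML detection of K users at one receive antenna.

    y: (T,) received samples; h: (K,) known per-user channel gains.
    Model: y[t] = sum_k h[k] * s_k[t] + noise. Searches all M**K hypotheses per sample.
    """
    candidates = np.array(list(itertools.product(constellation, repeat=len(h))))  # (M**K, K)
    superposed = candidates @ h                          # noiseless received points
    dist = np.abs(y[:, None] - superposed[None, :]) ** 2  # (T, M**K) squared distances
    return candidates[np.argmin(dist, axis=1)]           # (T, K) symbol decisions


rng = np.random.default_rng(0)
h = np.array([1.0, 0.5, 0.25, 0.125])                    # four users, distinct gains
symbols = QPSK[rng.integers(0, 4, size=(20, 4))]
noise = 0.01 * (rng.standard_normal(20) + 1j * rng.standard_normal(20))
decisions = joint_ml_detect(symbols @ h + noise, h)
```

Four QPSK users already require 256 hypotheses per sample, and the count grows as $M^K$, which is why practical receivers need smarter search than this enumeration.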


[86] 2605.04222

Safety by Invariance, Liveness through Refinement: Heterogeneous Contract Framework for Co-Design of Layered Control

Real-world control systems must achieve long-horizon objectives (liveness) while respecting continuous-time safety constraints, a combination that motivates hierarchical layered control architectures (LCAs). Existing LCA research, however, lacks (i) a uniform specification language across discrete planning and continuous execution, (ii) formal guarantees that specifications are preserved when interconnecting subsystems at heterogeneous time scales, and (iii) compositional separation between layers, owing to reliance on naive input-filtering laws. This paper addresses all three gaps by importing the safety--liveness decomposition into a heterogeneous assume--guarantee framework: \emph{safety is enforced by invariance} at the continuous-time layer, while \emph{liveness is achieved through refinement} at the discrete-time layer, with inter-layer coordination formalized via vertical refinement and timing-compatibility conditions. We instantiate this contract with a novel LCA combining an MPC planner, an input-to-state stabilizing (ISS) low-level controller, and a reference-governor bridge, and validate it on a Hybrid Energy Storage System (HESS) comprising a battery and a supercapacitor.


[87] 2605.16794

Modeling Coincident Peak Pricing in Electricity Markets: Challenges and Peak Shaving Effectiveness

Coincident Peak (CP) pricing is widely used in U.S. electricity markets to allocate capacity and transmission costs. This paper develops a behavioral game-theoretic framework for CP-driven load shifting that couples a nonlinear cost-allocation model with day-ahead (one-shot) and real-time (sequential-learning) decision processes. We examine two update rules, namely best-response dynamics (BRD) and fictitious-play dynamics (FPD), across continuous and finite action spaces to quantify how flexibility, action resolution, and participation influence peak outcomes. Using ERCOT peak-day data, we find that FPD reliably reduces system peaks, whereas BRD is more variable and can increase peaks under tight-capacity conditions. Finer action resolution improves peak shaving, while the number of participants is largely neutral when aggregate flexibility is fixed. Meanwhile, information-provider signals can induce herding, whereas response-aware or diverse signals improve peak shaving. These results highlight both the potential and limits of CP pricing: smoothing information and enabling granular control are as important as the amount of available flexibility. The framework offers practical guidance for system operators and consumers. For independent system operators (ISOs), broadcasting smoothed CP signals and setting minimum controllable-capacity thresholds enhance coordination. For consumers, greater flexibility and finer control resolution improve both cost savings and peak-shaving performance.


[88] 2605.23770

Reachability for Low-Thrust Trajectories via Maximum Initial Mass

Reachability analysis plays a central role in low-thrust spacecraft trajectory optimization by identifying which target states can be achieved under constraints on time, thrust, and propellant. Classical approaches construct reachable sets by solving many optimal control problems over grids of terminal states, requiring extensive forward simulations with fixed initial conditions. While effective, this approach is computationally expensive and becomes impractical for high-dimensional systems or strongly nonlinear dynamics, such as those encountered in cislunar environments or solar sail missions. This work introduces a dual formulation of the reachability problem. Instead of computing reachable sets directly, we determine, for fixed transfer time and boundary conditions, the maximum allowable initial mass (or, for solar sails, a scalar sail-strength parameter) that permits a successful transfer. A target is reachable if the spacecraft's initial mass does not exceed this threshold. This reformulation reduces reachability assessment to a scalar optimization problem for each target, producing a smooth scalar field that encodes equivalent feasibility information to classical reachable sets. We develop indirect maximum-initial-mass (MIM) formulations for both electric low-thrust and solar-sail dynamics and show how they can serve as efficient reachability oracles. Building on this formulation, we construct data-driven surrogate models to approximate the MIM-based reachability indicator. We investigate fully connected neural networks and demonstrate that residual networks provide the best trade-off between accuracy, training stability, and model complexity. The resulting surrogates enable rapid reachability evaluation while preserving the numerical advantages of the dual formulation, offering a practical tool for preliminary mission design and feasibility assessment.


[89] 2606.01446

Spatially Distributed Task-Oriented Compression for Multi-Emitter Localization and Characterization with Spectral Overlap

Radio frequency spectrum awareness requires the ability to detect, localize, and characterize emitters in dense and contested wireless environments. In this work, we propose a task-oriented distributed compression framework for joint multi-emitter localization and characterization using spatially distributed receivers. Each receiver observes a short window of complex IQ samples, converts the observation to a time--frequency representation, and encodes it into a compact latent vector. A central fusion decoder combines the receiver latents to estimate an unordered set of active emitters, including their locations, center-frequency offsets, occupied bandwidths, and waveform families. A permutation-invariant training objective is used to handle the arbitrary ordering of emitters and predictions. Experiments on synthetic multi-emitter scenes with spectral overlap show that even extremely compact receiver-side representations can preserve useful information for emitter counting and waveform-family estimation. However, accurate localization and spectral-parameter regression require larger latent dimensions. Increasing the receiver latent dimension from $d_{\mathrm{rx}}=1$ to $d_{\mathrm{rx}}=16$ provides the largest improvement, while further increasing to $d_{\mathrm{rx}}=64$ gives smaller gains. These results demonstrate the potential of learned task-oriented compression for communication-efficient distributed spectrum awareness.
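The abstract does not name its matching procedure. A common permutation-invariant objective takes the minimum loss over all assignments of predictions to ground-truth emitters; the brute-force sketch below assumes equal numbers of predicted and true emitters, each described by a fixed-length parameter vector, and the function name is hypothetical. For more than a handful of emitters, the Hungarian algorithm (for example SciPy's `scipy.optimize.linear_sum_assignment` on a pairwise cost matrix) finds the same minimum without enumerating all $K!$ permutations.

```python
import itertools

import numpy as np


def permutation_invariant_loss(pred, target):
    """Minimum mean squared error over all matchings of predictions to targets.

    pred, target: (K, D) arrays with K emitter slots and D parameters each.
    Brute force over K! permutations, so only suitable for small K.
    """
    best = np.inf
    for perm in itertools.permutations(range(len(target))):
        # Assign prediction perm[i] to ground-truth emitter i.
        loss = np.mean((pred[list(perm)] - target) ** 2)
        best = min(best, loss)
    return best


target = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(permutation_invariant_loss(target[[2, 0, 1]], target))  # reordered set, zero loss
```

Taking the minimum over orderings means the network is never penalized for listing the same set of emitters in a different order.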


[90] 2606.03960

SNF-PRP: A Covert Integrating Sensing and Communications Framework

Integrated sensing and communication (ISAC) enables simultaneous sensing and data transmission but exposes a critical vulnerability: probing signals may be intercepted, revealing both the transmitted information and the act of sensing itself. Existing physical layer security approaches mitigate interception yet operate with detectable signals, leaving sensing activity observable to a passive warden. This paper introduces sub-noise-floor pseudo-random probing (SNF-PRP), a covert sensing framework for OFDM-based ISAC systems under an energy-detection adversary model. SNF-PRP establishes an $\epsilon$-covertness guarantee via Kullback-Leibler (KL) divergence, exploits an $N_{\mathrm{sc}}$-fold spreading gain absent from prior wideband analyses, and derives in closed form the minimum integration length required to achieve a target Cramér-Rao bound (CRB). Simulations under 5G~NR n78 numerology confirm sub-0.5\,m range and sub-0.5\,m/s velocity accuracy with KL divergence $5.8\times$ below the covertness threshold, validating joint feasibility at $-12$\,dB and $-15$\,dB probing powers.
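As a rough illustration only, assume the warden sees $n$ i.i.d. complex samples distributed as $\mathcal{CN}(0,\sigma^2)$ without probing and $\mathcal{CN}(0,\sigma^2(1+\rho))$ with a Gaussian-like probe at per-sample SNR $\rho$. The KL divergence is then $n(\rho - \ln(1+\rho)) \approx n\rho^2/2$ for small $\rho$, so the allowed probe SNR shrinks like $1/\sqrt{n}$, while a sensing receiver that coherently integrates $N_{\mathrm{sc}}$ subcarriers over $L$ symbols gains a factor $N_{\mathrm{sc}} L$. How $\epsilon$ maps to a KL budget depends on the covertness definition, so the sketch takes the budget as an input; it is not the paper's closed-form derivation, and the function names are hypothetical.

```python
import math


def kl_per_sample(snr):
    """KL divergence D(CN(0, s2*(1+snr)) || CN(0, s2)) in nats for one complex sample."""
    return snr - math.log1p(snr)


def max_covert_snr(n_samples, kl_budget, tol=1e-12):
    """Largest per-sample SNR with n_samples * kl_per_sample(snr) <= kl_budget (bisection)."""
    lo, hi = 0.0, 1.0
    while n_samples * kl_per_sample(hi) <= kl_budget:  # grow bracket until infeasible
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_samples * kl_per_sample(mid) <= kl_budget:
            lo = mid
        else:
            hi = mid
    return lo


def integrated_snr_db(snr, n_subcarriers, n_symbols):
    """Sensing SNR in dB after coherent integration (processing gain N_sc * L)."""
    return 10.0 * math.log10(snr * n_subcarriers * n_symbols)


rho = max_covert_snr(n_samples=1000, kl_budget=0.01)
print(10 * math.log10(rho), integrated_snr_db(rho, 1024, 14))
```

The tension is visible directly: a longer observation tightens the warden's detection and forces a weaker probe, while spreading over many subcarriers and symbols recovers SNR at the sensing receiver.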


[91] 2606.04003

A sharp analysis of Root-MUSIC: locations of correct and extraneous roots

Root-MUSIC is a spectral estimation algorithm that approximates the unknown signal frequencies by constructing a high-degree polynomial and finding a subset of roots which are closest to the complex unit circle. Previous works found asymptotic expectation formulas for the performance of Root-MUSIC under the implicit assumption that the aforementioned root selection criterion does not select extraneous roots -- those which are unrelated to the correct parameters. This paper removes the need for this assumption by showing all extraneous roots lie outside an annulus of a certain thickness and therefore are not selected by the algorithm. This paper also provides sharp, non-asymptotic, and explicit error bounds for the correct roots in terms of fundamental model parameters. All results hold under a natural separation condition on the correct signal frequencies and are applicable in both the single- and multi-snapshot models. More specifically, in the multi-snapshot model, we prove that Root-MUSIC estimates the frequencies with error at most $O(\sigma /(m \sqrt n))$, where $\sigma^2$ is the noise variance, $m$ is the number of sensors, and $n$ is the number of snapshots. A novelty of this non-asymptotic bound is the explicit $1/m$ decay, which indicates that there is a significant advantage in utilizing additional sensors. Numerical simulations confirm our theory. The main mathematical insight of this paper is a geometric property of the Root-MUSIC polynomial: its correct roots are highly stable to noise while its extraneous roots must lie outside of an annulus.
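A compact NumPy sketch of the Root-MUSIC procedure described above, assuming the standard model $x_\ell = \sum_j c_j e^{\mathrm{i}\omega_j \ell} + \text{noise}$ for sensors $\ell = 0,\dots,m-1$ and a known number of sources $k$. It keeps the roots inside the unit circle (one from each conjugate-reciprocal pair) and returns the angles of the $k$ roots closest to the circle; it does not reproduce the paper's error analysis, and the function name is hypothetical.

```python
import numpy as np


def root_music(X, k):
    """Estimate k frequencies (radians per sensor) from snapshots X of shape (m, n)."""
    m, n = X.shape
    R = X @ X.conj().T / n                     # sample covariance
    _, vecs = np.linalg.eigh(R)                # eigenvalues in ascending order
    En = vecs[:, : m - k]                      # noise subspace
    C = En @ En.conj().T
    # z^(m-1) * a(1/z)^T C a(z): coefficient of z^(l+m-1) is the sum of C's l-th diagonal.
    coeffs = [np.trace(C, offset=l) for l in range(m - 1, -m, -1)]
    roots = np.roots(coeffs)
    roots = roots[np.abs(roots) < 1.0]         # one root from each reciprocal pair
    closest = roots[np.argsort(1.0 - np.abs(roots))[:k]]
    return np.sort(np.angle(closest))


rng = np.random.default_rng(1)
m, n, w = 8, 200, np.array([0.5, 1.5])
A = np.exp(1j * np.outer(np.arange(m), w))
S_sig = (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))) / np.sqrt(2)
noise = 0.01 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
print(root_music(A @ S_sig + noise, 2))
```

The selection step "closest to the unit circle" is exactly where an extraneous root could be picked by mistake; the paper's annulus result is what justifies this rule under its separation condition.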


[92] 2606.05763

M2S-AVSR: Modality-aware Multi-view Self-supervised Representation for Robust Audio-Visual Speech Recognition

Audio-Visual Speech Recognition (AVSR) improves speech recognition robustness by leveraging visual cues, but real-world scenarios remain challenging because viewpoint variation, audio distortion, and visual occlusion degrade modality quality and increase audio-visual asynchrony. In this paper, we propose a novel Modality-aware Multi-view Self-supervised representation framework for robust Audio-Visual Speech Recognition (M2S-AVSR). First, we introduce a multi-view representation learning encoder to learn view-invariant visual speech representations. Next, we employ a modality-aware module that explicitly models modality quality and cross-modal synchrony to perform modality-aware fusion, enabling fine-grained injection of visual information during decoding. In addition, we release AISHELL8-RealScene, a public multi-scenario, multi-view conversational audio-visual dataset recorded in real-world environments, and establish a speech recognition benchmark on it. Experiments on English and Mandarin benchmarks demonstrate the effectiveness of the proposed method under challenging conditions. On LRS3, M2S-AVSR achieves up to 29.4% relative improvement under viewpoint perturbation and visual degradation settings. Our method also achieves new state-of-the-art performance on the MISP2021-AVSR test set. On AISHELL8-RealScene, it achieves the best result in outdoor scenes. The proposed method and dataset provide useful support for future research on robust speech and multimodal tasks under realistic conditions.


[93] 2501.15768

Error-State LQR Formulation for Quadrotor UAV Trajectory Tracking

This article presents an error-state Linear Quadratic Regulator (LQR) formulation for robust trajectory tracking in quadrotor Unmanned Aerial Vehicles (UAVs). The proposed approach leverages error-state dynamics and employs exponential coordinates to represent orientation errors, enabling a linearized system representation for real-time control. The control strategy integrates an LQR-based full-state feedback controller for trajectory tracking, combined with a cascaded body-rate controller to handle actuator dynamics. Detailed derivations of the error-state dynamics, the linearization process, and the controller design are provided, highlighting the applicability of the method to precise and stable quadrotor control in dynamic environments.
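The following minimal NumPy sketch illustrates the two ingredients named above: exponential coordinates for the attitude error and an LQR gain on linearized error dynamics. It assumes an attitude error $R_d^\top R$ mapped through the SO(3) logarithm, which is valid for rotation angles below $\pi$. For the LQR part, it uses a hypothetical one-axis double-integrator error model with illustrative weights. It is not the article's full quadrotor model.

```python
import numpy as np


def so3_log(R):
    """Exponential coordinates (axis * angle) of a rotation matrix, for angles < pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-9:
        return np.zeros(3)
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * vee


def attitude_error(R_desired, R):
    """Orientation error R_d^T R expressed in exponential coordinates."""
    return so3_log(R_desired.T @ R)


def dlqr(A, B, Q, R, iters=10000):
    """Discrete-time LQR gain via fixed-point iteration of the Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)      # P = Q + A'PA - A'PB(R + B'PB)^-1 B'PA
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


# Hypothetical one-axis error model: e = [position error, velocity error], u = acceleration.
dt = 0.02
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
K = dlqr(A, B, Q=np.diag([10.0, 1.0]), R=np.array([[0.1]]))
```

The feedback is then `u = -K @ e`. Working on the error state means the same linearization applies along the whole reference trajectory.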


[94] 2502.16531

Efficient Coordination and Synchronization of Multi-Robot Systems Under Recurring Linear Temporal Logic

We consider multi-robot systems under recurring tasks formalized as linear temporal logic (LTL) specifications. To solve the planning problem efficiently, we propose a bottom-up approach combining offline plan synthesis with online coordination, dynamically adjusting plans via real-time communication. To address action delays, we introduce a synchronization mechanism ensuring coordinated task execution, leading to a multi-agent coordination and synchronization framework that is adaptable to a wide range of multi-robot applications. The software package is developed in Python and ROS2 for broad deployment. We validate our findings through lab experiments involving nine robots showing enhanced adaptability compared to previous methods. Additionally, we conduct simulations with up to ninety agents to demonstrate the reduced computational complexity and the scalability features of our work.


[95] 2504.10102

A Human-Sensitive Controller: Adapting to Human Musculoskeletal Disorder-Related Constraints via Reinforcement Learning

Work-Related Musculoskeletal Disorders continue to be a major challenge in industrial environments, leading to reduced workforce participation, increased healthcare costs, and long-term disability. This study introduces a human-sensitive robotic system aimed at reintegrating individuals with a history of musculoskeletal disorders into standard job roles, while simultaneously optimizing ergonomic conditions for the broader workforce. This research leverages reinforcement learning (RL) to develop a human-aware control strategy for collaborative robots, focusing on optimizing ergonomic conditions and preventing pain during task execution. Two RL approaches, Q-Learning and Deep Q-Network (DQN), were implemented and tested to personalize control strategies based on individual user characteristics. Although experimental results revealed a simulation-to-real gap, a fine-tuning phase successfully adapted the policies to real-world conditions. DQN outperformed Q-Learning by completing tasks faster while maintaining zero pain risk and safe ergonomic levels, achieving on average 38% shorter task completion times across all tested anthropometries. The structured testing protocol confirmed the system's adaptability to diverse human anthropometries, underscoring the potential of RL-driven cobots to enable safer, more inclusive workplaces.


[96] 2601.07622

Clipped Affine Policy: Low-Complexity Near-Optimal Online Power Control for Energy Harvesting Communications over Fading Channels

This paper studies online power control for battery-limited point-to-point energy harvesting communications over slow block-fading channels. A linear-policy-based approximation is developed for the relative-value function in the Bellman equation of the power control problem. This approximation leads to two fundamental parameterized clipped affine policies: an optimistic policy derived from a certainty-equivalence-type approximation and a robust policy derived from worst-case analysis. For independent and identically distributed energy arrivals and channel states, two families of power control schemes are developed based on the optimistic clipped affine (OCA) and robust clipped affine (RCA) policies, respectively. The proposed adaptive RCA policy based on reinforcement learning (RCA-RL) is further extended to address four scenarios with contextual information: one-step energy lookahead, one-step channel lookahead, one-step joint energy-channel lookahead, and Markov energy arrivals. Extensive simulation results show that the proposed schemes provide a favorable tradeoff between computational complexity and performance. The adaptive RCA policy based on the maximin optimal linear-policy-slope approximation (RCA-OLA-A) and the RCA-RL scheme achieve the best overall performance, while the RCA policy based on the maximin optimal linear policy (RCA-OL) is the best-performing closed-form policy. In particular, RCA-OLA-A, RCA-RL, and the aforementioned RCA-RL extensions achieve less than 2% performance loss relative to the optimal policy across a range of scenarios, consistently outperforming the considered benchmark approaches, including generic reinforcement learning baselines. The RCA-OL policy also performs well with less than 4% performance loss.
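The following sketch gives a rough illustration of a clipped affine power policy. The transmit power is an affine function of the stored energy, clipped to the feasible range $[0, b_t]$, and the battery has finite capacity. The slope and intercept, the battery capacity, the uniform energy arrivals, the Rayleigh fading gains, and the $\log_2(1 + g p)$ rate are all hypothetical choices for illustration. This toy policy ignores the channel gain when choosing power. The paper's OCA and RCA parameterizations are not reproduced here.

```python
import numpy as np


def clipped_affine_power(battery, slope, intercept):
    """Transmit power: an affine function of stored energy, clipped to [0, battery]."""
    return min(max(slope * battery + intercept, 0.0), battery)


def run_policy(slope, intercept, capacity=5.0, horizon=1000, seed=0):
    """Average log2(1 + g p) over a toy i.i.d. energy-harvesting and fading model."""
    rng = np.random.default_rng(seed)
    battery, throughput, log = capacity / 2, 0.0, []
    for _ in range(horizon):
        gain = rng.exponential(1.0)              # Rayleigh fading power gain (not used by the policy)
        p = clipped_affine_power(battery, slope, intercept)
        log.append((p, battery))                 # record power and battery level before transmitting
        throughput += np.log2(1.0 + gain * p)
        harvested = rng.uniform(0.0, 1.0)        # i.i.d. energy arrival per slot
        battery = min(capacity, battery - p + harvested)
    return throughput / horizon, log


avg_rate, _ = run_policy(slope=0.5, intercept=-0.2)
print(f"average rate: {avg_rate:.3f} bit/s/Hz")
```

Because of the clipping, the policy never spends more energy than is stored, so the battery stays within $[0, \text{capacity}]$.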


[97] 2602.16073

ScenicRules: An Autonomous Driving Benchmark with Multi-Objective Specifications and Abstract Scenarios

Developing autonomous driving systems for complex traffic environments requires balancing multiple objectives, such as avoiding collisions, obeying traffic rules, and making efficient progress. In many situations, these objectives cannot be satisfied simultaneously, and explicit priority relations naturally arise. Also, driving rules require context, so it is important to formally model the environment scenarios within which such rules apply. Existing benchmarks for evaluating autonomous vehicles lack such combinations of multi-objective prioritized rules and formal environment models. In this work, we introduce ScenicRules, a benchmark for evaluating autonomous driving systems in stochastic environments under prioritized multi-objective specifications. We first formalize a diverse set of objectives to serve as quantitative evaluation metrics. Next, we design a Hierarchical Rulebook framework that encodes multiple objectives and their priority relations in an interpretable and adaptable manner. We then construct a compact yet representative collection of scenarios spanning diverse driving contexts and near-accident situations, formally modeled in the Scenic language. Experimental results show that our formalized objectives and Hierarchical Rulebooks align well with human driving judgments and that our benchmark effectively exposes agent failures with respect to the prioritized objectives. Our benchmark can be accessed at this https URL.
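The priority relations described above can be illustrated, in a deliberately simplified form, by ranking trajectories lexicographically over their rule-violation scores, from the highest- to the lowest-priority rule. This sketch assumes a total order of rules. The Hierarchical Rulebook framework is more general, and the rule names and scores below are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical rules, listed from highest to lowest priority.
RULE_PRIORITY = ["collision", "red_light", "lane_keeping", "progress"]


@dataclass
class Evaluation:
    name: str
    violations: dict = field(default_factory=dict)  # rule -> violation score (0 = satisfied)

    def key(self):
        # Tuples compare lexicographically, so a lower-priority rule only matters
        # when all higher-priority violation scores are tied.
        return tuple(self.violations.get(rule, 0.0) for rule in RULE_PRIORITY)


def best(evaluations):
    """Return the evaluation with the lexicographically smallest violation vector."""
    return min(evaluations, key=Evaluation.key)


candidates = [
    Evaluation("swerve", {"lane_keeping": 0.8, "progress": 0.1}),
    Evaluation("brake", {"progress": 2.0}),
    Evaluation("continue", {"collision": 1.0}),
]
print(best(candidates).name)  # brake
```

In this example, braking wins because it only sacrifices progress, the lowest-priority objective.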


[98] 2603.02536

Semantic Forwarding and Codebook-Enhanced Model Division Multiple Access for Satellite-Terrestrial Networks

Satellite-terrestrial communications are severely constrained by high path loss, limited spectrum resources, and time-varying channel conditions, rendering conventional bit-level transmission schemes inefficient and fragile, particularly in low signal-to-noise ratio (SNR) regimes. Semantic communication has emerged as a promising paradigm to address these challenges by prioritizing task-relevant information over exact bit recovery. In this paper, we propose a semantic forwarding-based semantic communication (SFSC) framework optimized for satellite-terrestrial networks. Specifically, we develop a vector-quantized joint semantic coding and modulation scheme, in which the semantic encoder and semantic codebook are jointly optimized to shape the constellation symbol distribution, improving channel adaptability and semantic compression efficiency. To mitigate noise accumulation and reduce on-board computational burden, we introduce a satellite semantic forwarding mechanism, enabling relay satellites to forward signals directly at the semantic level without full decoding and re-encoding. Furthermore, we design a channel-aware semantic reconstruction scheme based on feature-wise linear modulation (FiLM) to fuse the received SNR with semantic features, enhancing robustness under dynamic channel conditions. To support multi-user access, we further propose a codebook split-enhanced model division multiple access (CS-MDMA) method to improve spectral efficiency. Simulation results show that the proposed SFSC framework achieves a peak signal-to-noise ratio (PSNR) gain of approximately 7.9 dB over existing benchmarks in the low-SNR regime, demonstrating its effectiveness for robust and spectrum-efficient semantic transmission in satellite-terrestrial networks.
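Feature-wise linear modulation (FiLM) conditions features through a per-channel scale and shift predicted from side information. The following minimal NumPy sketch shows SNR conditioning. It assumes a single randomly initialized linear layer as the FiLM generator and illustrative tensor shapes, and it is not the paper's reconstruction network.

```python
import numpy as np


class SNRFiLM:
    """Scale and shift each feature channel by amounts predicted from the received SNR."""

    def __init__(self, num_channels, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical generator: one linear layer mapping [snr_db, 1] -> (gamma, beta).
        self.W = 0.01 * rng.standard_normal((2, 2 * num_channels))
        self.num_channels = num_channels

    def __call__(self, features, snr_db):
        """features: (batch, channels, length); snr_db: (batch,) float array."""
        cond = np.stack([snr_db, np.ones_like(snr_db)], axis=1)  # (batch, 2)
        params = cond @ self.W                                     # (batch, 2 * channels)
        gamma = 1.0 + params[:, : self.num_channels]               # scale, centred at identity
        beta = params[:, self.num_channels :]                      # shift
        return gamma[:, :, None] * features + beta[:, :, None]


film = SNRFiLM(num_channels=8)
x = np.random.default_rng(1).standard_normal((4, 8, 16))
out = film(x, snr_db=np.array([-5.0, 0.0, 5.0, 10.0]))
print(out.shape)  # (4, 8, 16)
```

The scale is offset by 1 so that an untrained generator leaves the semantic features almost unchanged.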


[99] 2603.08683

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work for practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16kHz-48kHz), and bit depths (8, 16, 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full-resolution audio, improving vocabulary scaling from $O(2^{b})$ to $O(1)$ and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe that compression gains become more modest as bit depth increases beyond 8-bit.
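The following sketch shows why byte-level tokenization keeps the vocabulary fixed. It shifts each signed $b$-bit sample into an unsigned range and splits it into $b/8$ bytes, most significant first. The vocabulary is therefore always 256 symbols, while sequences grow by a factor of $b/8$. Sample-level tokens would instead need $2^{16} = 65{,}536$ or $2^{24} = 16{,}777{,}216$ symbols. This is a generic illustration under these assumptions, not the Trilobyte schema itself, whose exact layout is not described here.

```python
import numpy as np


def to_byte_tokens(samples, bit_depth):
    """Split signed integer samples into unsigned byte tokens (vocabulary of 256)."""
    assert bit_depth % 8 == 0
    n_bytes = bit_depth // 8
    unsigned = samples.astype(np.int64) + (1 << (bit_depth - 1))  # offset-binary shift
    shifts = 8 * np.arange(n_bytes - 1, -1, -1)                    # most significant byte first
    return ((unsigned[:, None] >> shifts) & 0xFF).reshape(-1)


def from_byte_tokens(tokens, bit_depth):
    """Invert to_byte_tokens, recovering the signed samples exactly."""
    n_bytes = bit_depth // 8
    shifts = 8 * np.arange(n_bytes - 1, -1, -1)
    groups = tokens.reshape(-1, n_bytes).astype(np.int64)
    return (groups << shifts).sum(axis=1) - (1 << (bit_depth - 1))


samples = np.array([-8388608, -1, 0, 1, 8388607])  # 24-bit extremes and neighbours
tokens = to_byte_tokens(samples, 24)
print(tokens.reshape(-1, 3))
```

The mapping is lossless, so the model's per-token probabilities can drive an entropy coder without any quantization error.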


[100] 2603.17003

Constricting Tubes for Prescribed-Time Safe Control

We propose a constricting Control Barrier Function (CBF) framework for prescribed-time control of control-affine systems with input constraints. Given a system starting outside a target safe set, we construct a time-varying safety tube that shrinks from a relaxed set containing the initial condition to the target set at a user-specified deadline. Any controller rendering this tube forward invariant guarantees prescribed-time recovery by construction. The constriction schedule is bounded and tunable by design, in contrast to prescribed-time methods where control effort diverges near the deadline. Feasibility under input constraints reduces to a single verifiable condition on the constriction rate, yielding a closed-form minimum recovery time as a function of control authority and initial violation. The framework imposes a single affine constraint per timestep regardless of state dimension, scaling to settings where grid-based reachability methods are intractable. We validate the framework on an 18-dimensional multi-agent system, demonstrating scalability and prescribed-time recovery with bounded control effort.
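A one-dimensional toy makes the constriction idea concrete. The sketch assumes $\dot x = u$ with $|u| \le u_{\max}$, a target set $\{x \le 0\}$, a start at $x_0 > 0$, and a linearly shrinking tube $h(x,t) = \gamma(t) - x$ with $\gamma(t) = x_0(1 - t/T)$. A safety filter enforces $\dot h \ge -\alpha h$ around a nominal controller that does nothing. In this toy, the constriction rate $x_0/T$ must not exceed $u_{\max}$. This illustrates the kind of feasibility condition on the constriction rate mentioned above, but it is not the paper's closed-form result. The schedule and all parameters are hypothetical.

```python
import numpy as np


def simulate_tube(x0=2.0, T=4.0, u_max=1.0, alpha=5.0, steps=4000):
    """Euler simulation of a CBF safety filter on a linearly constricting tube."""
    rate = -x0 / T                        # d gamma / dt for gamma(t) = x0 * (1 - t/T)
    if abs(rate) > u_max:                 # toy feasibility check on the constriction rate
        raise ValueError("constriction faster than the available control authority")
    dt = T / steps
    x, max_abs_u = x0, 0.0
    for k in range(steps):
        gamma = x0 * (1.0 - k / steps)    # tube boundary at t = k * dt
        h = gamma - x                     # h >= 0 means x is inside the tube
        u_nominal = 0.0                   # nominal controller: do nothing
        u_cbf = rate + alpha * h          # largest u with dh/dt = rate - u >= -alpha * h
        u = float(np.clip(min(u_nominal, u_cbf), -u_max, u_max))
        x += dt * u
        max_abs_u = max(max_abs_u, abs(u))
    return x, max_abs_u


x_T, peak_u = simulate_tube()
print(x_T, peak_u)
```

Since $\gamma(T) = 0$, staying inside the tube forces $x(T) \le 0$ at the deadline, and the control effort stays bounded by the constriction rate.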


[101] 2606.05739

Do speech foundation models perceive speaker similarity as humans do?

This study presents a comparative analysis between the speaker embeddings of speech foundation models and human subjective perception of speaker similarity. Human listeners can judge speaker similarity on a continuous scale, discerning how similar two voices are. Speech foundation models, in contrast, encode speaker characteristics as numerical representations. This raises a question: does the numerical distance between speaker embeddings in these models truly align with the similarity perceived by humans? To address this, we conduct a comprehensive investigation using more than 40 models to compare model-derived distances with human-perceived similarity scores. Furthermore, we identify which factors in model configuration contribute most to speaker embeddings that mirror human perception. Our findings provide insights for the development of more perceptually grounded speech foundation models.
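One common way to run the comparison described above is to score each trial by the similarity between its two speaker embeddings and then rank-correlate those scores with human ratings. The following minimal NumPy sketch uses cosine similarity and a Spearman rank correlation, assuming no tied values. The embeddings and ratings are synthetic placeholders, and the study's actual distance and agreement metrics may differ.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (trials, dim) embedding arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def spearman(x, y):
    """Spearman rank correlation, assuming no tied values."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


rng = np.random.default_rng(0)
emb_a = rng.standard_normal((50, 192))            # synthetic embedding of voice A per trial
emb_b = emb_a + rng.standard_normal((50, 192))    # synthetic embedding of voice B per trial
model_sim = cosine_similarity(emb_a, emb_b)
human_scores = model_sim + 0.05 * rng.standard_normal(50)  # synthetic placeholder ratings
print(spearman(model_sim, human_scores))
```

A rank correlation only asks whether the model orders voice pairs the way listeners do, which suits subjective rating scales whose spacing is not linear.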