New articles on Electrical Engineering and Systems Science


[1] 2604.16435

Beyond the Flat-Spike: Adaptive Sparse CCA for Decaying and Unbalanced Signals

Sparse Canonical Correlation Analysis (SCCA) is a fundamental statistical tool for identifying linear relationships in high-dimensional, multi-view data. While minimax theory establishes an optimal sample complexity scaling additively with the sparsity levels of the canonical vectors, computationally efficient algorithms typically suffer from a suboptimal multiplicative dependence. This computational-statistical gap is intrinsically tied to worst-case "flat" signal assumptions. In practice, however, multi-view signals frequently exhibit structured energy concentration, such as a power-law decay. To exploit this structural concentration and bypass the worst-case bottleneck, we propose Bilateral Spectral Energy Pursuit (Bi-SEP). Operating directly on the cross-covariance matrix, Bi-SEP is a stagewise adaptive algorithm that utilizes a proxy refinement step to dynamically track and capture cross-view signal energy. Theoretically, we establish a profile-adaptive sample complexity bound governed by the coupled energy profiles of the two views. Notably, under power-law decay models, we reveal a synergistic phase transition: the optimal linear sample complexity is attainable provided that the aggregate decay rate of the two views is sufficiently large. This result demonstrates that a highly concentrated signal in one view allows the model to accommodate a completely flat signal in its partner. Numerical experiments validate our theoretical findings, illustrating the advantages of Bi-SEP in structured, non-flat signal regimes.


[2] 2604.16437

Sampling Matters: The Effect of ECG Frequency on Deep Learning-Based Atrial Fibrillation Detection

Deep learning models for atrial fibrillation (AF) detection are increasingly trained on heterogeneous electrocardiogram (ECG) datasets with varying sampling frequencies, yet the specific consequences of these discrepancies on model performance, calibration, and robustness remain insufficiently characterized. To address this, we conducted a systematic benchmark using 12-lead, 10-second recordings from the PTB-XL dataset, resampled to target frequencies of 62, 100, 250, and 500 Hz, to evaluate a standard 1-D Convolutional Neural Network (CNN) and a hybrid CNN-Long Short-Term Memory (LSTM) architecture under a rigorous patient-safe cross-validation framework. Our analysis reveals that sampling frequency significantly impacts detection metrics in an architecture-dependent manner; the hybrid CNN-LSTM model demonstrated optimal performance and consistent calibration at intermediate frequencies (100-250 Hz), whereas the 1-D CNN baseline exhibited marked degradation in accuracy and sensitivity at 500 Hz, suggesting increased susceptibility to high-frequency noise. We conclude that ECG sampling frequency is a critical, underappreciated factor in arrhythmia detection, and future foundation models must explicitly control for temporal resolution to ensure clinical reliability and reproducibility.
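The resampling step at the heart of this benchmark can be illustrated with a minimal sketch that converts a 12-lead, 10-second recording between the target frequencies. Plain linear interpolation is used for brevity; an anti-aliased polyphase resampler (e.g. scipy.signal.resample_poly) would be the practical choice. The function name and random data are illustrative, not the paper's pipeline.

```python
import numpy as np

def resample_ecg(sig, fs_in, fs_out):
    """Resample a (leads, samples) ECG array by linear interpolation.

    A simple stand-in for the anti-aliased polyphase resampling one
    would use in practice (e.g. scipy.signal.resample_poly)."""
    n_in = sig.shape[1]
    duration = n_in / fs_in
    n_out = int(round(duration * fs_out))
    t_in = np.arange(n_in) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.stack([np.interp(t_out, t_in, lead) for lead in sig])

# A 12-lead, 10-second recording at the PTB-XL native 500 Hz
ecg = np.random.randn(12, 5000)
for f in (62, 100, 250):
    print(f, resample_ecg(ecg, 500, f).shape)
```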


[3] 2604.16442

The Breakthrough of Sleep: A Contactless Approach for Accurate Sleep Stage Detection Using the Sleepal AI Lamp

Sleep staging is essential for the assessment of sleep quality and the diagnosis of sleep-related disorders. Conventional polysomnography (PSG), while considered the gold standard, is intrusive, labor-intensive, and unsuitable for long-term monitoring. This study evaluates the performance of the Sleepal AI Lamp, a contactless, radar-based consumer-grade sleep tracker, against gold-standard PSG, using a large-scale dataset comprising 1022 overnight recordings. We extract multi-scale respiratory and motion-related features from radar signals to train a frequency-augmented deep learning model. For the binary sleep-wake classification task, experimental results demonstrated that the model achieved an accuracy of 92.8% alongside a macro-averaged F1 score of 0.895. For four-stage classification (wake, light NREM (N1 + N2), deep NREM (N3), REM), the model achieved an accuracy of 78.5% with a Cohen's kappa coefficient of 0.695 in healthy individuals and maintained a stable accuracy of 77.2% with a kappa of 0.677 in a heterogeneous population including patients with varying severities of obstructive sleep apnea (OSA). These experimental results demonstrate that the sleep staging performance of the contactless Sleepal AI Lamp is in high agreement with expert-labeled PSG sleep stages. Our findings suggest that non-contact radar sensing, combined with advanced temporal modeling, can provide reliable sleep staging performance without requiring physical contact or wearable devices. Owing to its unobtrusive nature, ease of deployment, and robustness to long-term use, the contactless Sleepal AI Lamp shows strong potential for clinical screening, home-based sleep assessment, and continuous longitudinal sleep monitoring in real-world medical and healthcare applications.


[4] 2604.16443

Thermal-GEMs: Generalized Models for Building Thermal Dynamics

Data-driven models for building thermal dynamics are a scalable approach for enabling energy-efficient operation through fault detection & diagnosis or advanced control. To obtain accurate models, measurement data from a target building spanning months to years are required. Transfer Learning (TL) mitigates this challenge by employing pretrained models based on single or multiple source buildings. General multi-source TL models promise to outperform single-source TL, but alternative multi-source modeling architectures remain to be explored, and evaluation on real-world data is missing. Moreover, time series foundation models (TSFM) have emerged as candidates for the best-performing general models. Hence, we conduct a first, comprehensive assessment of general modeling approaches for building thermal dynamics, including multi-source TL and TSFMs. Our assessment includes ablations using four state-of-the-art multi-source TL architectures and evaluations on synthetic as well as real-world data. We demonstrate that multi-source TL models are highly effective in accurately modeling buildings in real-world applications, yielding up to 63% lower forecasting errors compared to single-source TL. Moreover, our results suggest a trade-off between multi-source TL models exclusively pretrained with building data and TSFMs pretrained with a multitude of different time series, revealing that data from 16-32 source buildings must be available over 1 year for pretraining multi-source TL models to consistently outperform TSFMs as evaluated using the mean absolute error. These findings provide practical guidance for selecting modeling strategies based on the number of source buildings available for pretraining multi-source TL models.


[5] 2604.16445

SAND: The Challenge on Speech Analysis for Neurodegenerative Disease Assessment

Recent advances in Artificial Intelligence (AI) and the exploration of noninvasive, objective biomarkers, such as speech signals, have encouraged the development of algorithms to support the early diagnosis of neurodegenerative diseases, including Amyotrophic Lateral Sclerosis (ALS). Voice changes in subjects suffering from ALS typically manifest as progressive dysarthria, which is a prominent neurodegenerative symptom because it affects patients as the disease progresses. Since voice signals are complex data, the development and use of advanced AI techniques are fundamental to extracting distinctive patterns from them. Validating AI algorithms for ALS diagnosis and monitoring using voice signals is challenging, particularly due to the lack of annotated reference datasets. In this work, we present the outcome of a collaboration between a multidisciplinary team of clinicians and Machine Learning experts to create both a clinically annotated validation dataset and the "Speech Analysis for Neurodegenerative Diseases" (SAND) challenge based on it. Specifically, by analyzing voice disorders, the SAND challenge provides an opportunity to develop, test, and evaluate AI models for the automatic early identification and prediction of ALS disease progression.


[6] 2604.16447

Distributionally Robust Tolls for Traffic Networks with Affine Latency Functions

In network congestion games, system operators often utilize latency models, estimated from real-world traffic flow and travel time data, to design monetary incentives which steer equilibrium user behaviors towards lowering system-wide latency. This work studies the impact of latency model uncertainty when designing incentives in non-atomic network congestion games. Our approach leverages distributionally robust optimization (DRO), which captures data-driven uncertainty in latency models by considering worst-case distribution shifts. We prove that, under mild and practically relevant assumptions, the distributionally robust tolling problem in single origin-destination, affine-latency congestion games can be solved via convex programming. Numerical simulations illustrate that tolls designed to be distributionally robust against unknown disturbances can outperform tolls designed using fixed, nominal disturbance models in minimizing system-wide latency.


[7] 2604.16448

FM-CAC: Carbon-Aware Control for Battery-Buffered Edge AI via Time-Series Foundation Models

As edge AI deployments scale to billions of devices running always-on, real-time compound AI pipelines, they represent a massive and largely unmanaged source of energy consumption and carbon emissions. To reduce carbon emissions while maximizing Quality-of-Service (QoS), this paper proposes FM-CAC, a proactive carbon-aware control framework that leverages a battery as an active temporal buffer. By decoupling energy acquisition from energy consumption, FM-CAC can maximize the use of low-carbon energy, substantially reducing carbon emissions. At each control step, FM-CAC jointly optimizes the software pipeline variant, the hardware operating point, and the battery charging and discharging actions. To support this decision process, FM-CAC leverages edge-friendly Time-Series Foundation Models (TSFMs) for zero-shot carbon forecasting and integrates these forecasts into a dynamic programming solver with deferred cost attribution to prevent myopic battery depletion. Results show that FM-CAC reduces carbon emissions by up to 65.6% while maintaining near-maximum inference accuracy.
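The core idea of using a battery as a temporal buffer, buying energy in low-carbon hours and discharging in high-carbon ones, can be sketched as a tiny dynamic program. This is not the FM-CAC solver; the integer state grid, rate limits, and cost figures are illustrative assumptions.

```python
def min_carbon(carbon, load, cap=4, max_rate=2):
    """DP over integer battery levels: buy grid energy when the carbon
    intensity forecast is low, discharge the battery when it is high.
    All units (energy steps, carbon intensity) are illustrative."""
    INF = float('inf')
    cost = {0: 0.0}                      # battery level -> best total carbon
    for ci, lt in zip(carbon, load):
        nxt = {}
        for soc, c in cost.items():
            for grid in range(lt + max_rate + 1):
                new = soc + grid - lt    # surplus charges, deficit discharges
                if 0 <= new <= cap and abs(grid - lt) <= max_rate:
                    nxt[new] = min(nxt.get(new, INF), c + ci * grid)
        cost = nxt
    return min(cost.values())

# High-carbon hours in the middle; the DP pre-charges in the clean hours.
# Buying greedily at each step would cost 1+1+10+10+1 = 23.
print(min_carbon([1, 1, 10, 10, 1], [1, 1, 1, 1, 1]))
```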


[8] 2604.16458

A Unified Control Theory Derivation of Discrete-Time Linear Ensemble Kalman Filters

The ensemble Kalman filter (EnKF) has become a standard methodology for state estimation in high-dimensional systems, yet its various stochastic and deterministic formulations often appear conceptually disconnected. In this paper, a unified derivation framework for EnKF algorithms is established by leveraging the classical duality between estimation and optimal control, the key concept underlying the derivation of the Kalman filter. By recasting the minimum-variance estimation problem as a second-order moment problem for the ensembles, we demonstrate that seemingly distinct EnKF variants -- both with and without perturbed observations -- can be systematically classified. Specifically, the duality-based framework reveals that the operational differences among this variety of EnKF algorithms reduce to specific choices of hyperparameters. Ultimately, this perspective not only covers existing EnKF variants but also provides a systematic foundation for designing novel hybrid filters using a control-theoretic approach.


[9] 2604.16459

Deep Hierarchical Knowledge Loss for Fault Intensity Diagnosis

Fault intensity diagnosis (FID) plays a pivotal role in intelligent manufacturing, yet neglecting dependencies among target classes hinders its practical deployment. This paper introduces a novel and general framework with a deep hierarchical knowledge (DHK) loss to achieve hierarchically consistent representation and prediction. We develop a novel hierarchical tree loss to enable a holistic mapping of same-attribute classes, leveraging tree-based positive and negative hierarchical knowledge constraints. We further design a focal hierarchical tree loss to enhance its extensibility and devise two adaptive weighting schemes based on tree height. In addition, we propose a group tree triplet loss with a hierarchical dynamic margin by incorporating hierarchical group concepts and tree distance to model boundary structural knowledge across classes. Jointly, the two losses significantly improve the recognition of subtle faults. Extensive experiments are performed on four real-world datasets from various industrial domains (three cavitation datasets from SAMSON AG and one publicly available dataset) for FID, all showing superior results and outperforming recent state-of-the-art FID methods.


[10] 2604.16466

Projected Variational Quantum Extragradient for Zero-Sum Games

We propose a projected variational quantum extragradient (VQEG) framework for computing approximate Nash equilibria in two-player zero-sum matrix games. Mixed strategies are parameterized as Born distributions of parameterized quantum circuits (PQCs), transforming the classical bilinear saddle point problem into a smooth but generally nonconvex minimax optimization in circuit-parameter space. The expected payoff is expressed as the expectation of a diagonal observable, enabling gradient evaluation via the parameter-shift rule and compatibility with shot-based quantum hardware. To support arbitrary game sizes, we introduce a dominated embedding that maps (m,n) games to qubit-compatible power-of-two dimensions while preserving equilibrium structure. We then develop a projected extragradient method using stochastic gradient estimates derived from finite measurement shots, and establish variance bounds scaling as O(1/S) with respect to the number of measurement shots S, along with convergence to approximate first-order stationarity under standard assumptions. Since stationarity does not guarantee equilibrium optimality, we evaluate performance using the game-space Nash gap. Numerical results demonstrate high-precision solutions on structured instances up to 32x32, while highlighting challenges in unstructured settings.
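The classical core of the method, projected extragradient on a bilinear saddle point over probability simplices, can be sketched as follows. The quantum parameterization and shot noise are omitted; the step size and iteration count are illustrative.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def extragradient(A, steps=2000, eta=0.1):
    """Projected extragradient for min_x max_y x^T A y on simplices."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    for _ in range(steps):
        x_h = proj_simplex(x - eta * A @ y)      # extrapolation half step
        y_h = proj_simplex(y + eta * A.T @ x)
        x = proj_simplex(x - eta * A @ y_h)      # full step at half point
        y = proj_simplex(y + eta * A.T @ x_h)
    return x, y

def nash_gap(A, x, y):
    """Exploitability: best pure response gap for either player."""
    return (A.T @ x).max() - (A @ y).min()

# Rock-paper-scissors: the unique equilibrium is uniform play
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x, y = extragradient(A)
print(nash_gap(A, x, y))
```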


[11] 2604.16556

Goal-oriented Resource Allocation for Collaborative Integrated Sensing and Communication

In this paper, we consider resource allocation for a collaborative integrated sensing and communication (ISAC) scenario, in which distributed smart devices can be scheduled to perform sensing and transmit their sensing features to a fusion center. The fusion center aims to perform classification tasks on the environment based on received features. A scalable network-sensing framework is proposed to balance the performance of the sensing service with that of the classical enhanced Mobile Broadband (eMBB) service. We adopt a tractable theoretical metric, the discriminant gain, as a proxy for the classification goal. We formulate cross-layer optimization problems to maximize discriminant gain under constraints on energy consumption and eMBB communication quality for the independent and joint scheduling policies. The joint scheduling policy has considerably higher complexity than the independent scheduling policy, in exchange for better collaborative sensing performance. A simplified gain model is proposed to reduce the complexity and improve the practicality of the joint scheduling policy. Both policies are obtained via successive convex approximation and parametric convex optimization. Extensive experiments are conducted to verify the goal-oriented framework and the two policies. It is demonstrated that the two policies outperform the baseline policies with both synthetic and realistic radar simulation datasets. The joint scheduling policy can exploit device correlations and thus performs better than the independent scheduling policy under strong correlations and strict communication constraints.


[12] 2604.16614

CVaR-Guided Decision-Focused Learning and Risk-Triggered Re-Optimization for Two-Stage Robust Microgrid Operation

Microgrid operation is highly vulnerable to short-term load uncertainty, while conventional predict-then-optimize pipelines cannot fully align probabilistic forecasting quality with downstream robust scheduling performance. This paper proposes a CVaR-guided decision-focused learning and risk-triggered re-optimization framework for two-stage robust microgrid operation. A probabilistic load forecasting model first generates multi-quantile outputs, which are converted into prediction intervals to parameterize the load uncertainty set of the downstream two-stage robust optimization (TSRO) model. To improve forecasting reliability under difficult and high-risk operating conditions, a CVaR-guided forecasting objective is introduced to emphasize tail-sensitive samples. To further close the forecast-decision gap, a convex regularized surrogate TSRO model and a smooth regret loss are developed, enabling downstream operational feedback to be propagated to the forecasting model through KKT-based implicit differentiation. For online deployment, a risk-triggered re-optimization mechanism selectively re-solves the remaining-horizon TSRO only when the schedule mismatch becomes significant, avoiding unnecessary online computation. Case studies on modified IEEE 33-bus and 69-bus microgrids demonstrate superior probabilistic forecasting accuracy, operational economy, and tail-risk mitigation over benchmark methods, while preserving near-full-re-optimization performance with less than 0.5% higher operating cost and up to 91% lower daily solution time.
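The two forecasting ingredients named above, the quantile (pinball) loss behind multi-quantile outputs and a CVaR-style emphasis on the worst samples, can be sketched as follows. The weighting form and the alpha level are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss for quantile level tau."""
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def cvar_weighted_loss(per_sample_loss, alpha=0.2):
    """Average loss over the worst alpha-fraction of samples (CVaR_alpha),
    emphasizing tail-sensitive, high-risk samples."""
    k = max(1, int(np.ceil(alpha * len(per_sample_loss))))
    return np.sort(per_sample_loss)[-k:].mean()

rng = np.random.default_rng(0)
y = rng.normal(size=1000)
for tau in (0.1, 0.5, 0.9):
    # a constant forecast at the empirical tau-quantile
    q = np.quantile(y, tau) * np.ones_like(y)
    print(tau, pinball_loss(y, q, tau))
```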


[13] 2604.16627

Scaling and Analytical Approximation of Porous Electrode Theory for Reaction-limited Batteries

Porous electrode theory (PET) provides essential insights into electrochemical states, but its computational complexity hinders real-time control and obscures scaling relations. To bridge the gap between high-fidelity simulations and reduced-order models, we present a framework of scaling analysis and analytical approximations. By assuming high-performance electrodes minimize transport limitations and overpotentials, we derive a simplified "lean model" governed by four dimensionless numbers: (i) a traditional Damköhler number, Da, scaling the characteristic reaction rate to the diffusion rate in the electrolyte-filled pores; (ii) the "process Damköhler number," Da_p, scaling the reaction rate to the applied capacity utilization rate (C-rate); (iii) the "wiring Damköhler number," Da_w, scaling the reaction rate to an effective electromigration rate for ions in the pores in series with electrons in the conducting matrix; and (iv) the "capacitive Damköhler number," Da_c, comparing the rates of Faradaic reactions and double-layer charging. For batteries, we derive analytical solutions for standard protocols, including galvanostatic discharge, chronoamperometry, and electrochemical impedance spectroscopy. Validated against numerical simulations of a practical NMC half-cell, our formulae show excellent agreement at negligible computational cost. This interpretable, physics-based framework accelerates battery design and state estimation while unifying the modeling of batteries, supercapacitors, fuel cells, and other porous electrode systems.


[14] 2604.16655

A Two-Stage Multi-Modal MRI Framework for Lifespan Brain Age Prediction

The accurate quantification of brain age from MRI has emerged as an important biomarker of brain health. However, existing approaches are often restricted to narrow age ranges and single-modality MRI data, limiting their capacity to capture the coordinated macro- and microstructural changes that unfold across the human lifespan. To address these limitations, we developed a multi-modal brain age framework to characterize the integrated evolution of brain morphology and white matter organization. Our model adopts a two-stage architecture, where modalities are processed independently and integrated via late fusion in both stages: first to classify each subject into one of six developmental stages, and then to estimate age within the predicted stage. This design enables a unified and lifespan-spanning assessment of brain maturity across diverse developmental periods.


[15] 2604.16700

Neural Encoding Detection is Not All You Need for Synthetic Speech Detection

This paper reviews the current state and emerging trends in synthetic speech detection. It outlines the main data-driven approaches, discusses the advantages and drawbacks of focusing future research solely on neural encoding detection, and offers recommendations for promising research directions. Unlike works that introduce new detection methods or datasets, this paper aims to guide future state-of-the-art research in the field and to highlight the risk of overcommitting to approaches that may not stand the test of time.


[16] 2604.16705

Synchronization-Safe Dynamic Microgrid Formation for DER-Led Distribution System Restoration With Constraint-Aware Graph Learning

Prolonged blackouts in distribution systems (DSs) with high penetration of distributed energy resources (DERs) necessitate novel restoration strategies to rapidly restore loads. However, the resulting complex optimization problem significantly limits scalability. This paper proposes a synchronization-safe dynamic microgrid (MG) formation (SSDMGF)-enabled restoration framework, in which a constraint-aware graph learning approach is developed to enhance solution efficiency. To characterize the restoration status of systems with evolving boundaries, the concepts of system mode and system class are defined. To ensure synchronization safety during restoration, the transitions of system mode and class for dynamically formed MGs are explicitly restricted. To further accelerate the solution process, a constraint-aware spatio-temporal graph convolutional network (STGCN) is designed to partially generate high-quality warm-start solutions, where synchronization-related constraints are embedded into a differentiable feasibility-resolving layer based on the straight-through estimator (STE). Case studies on a modified IEEE 123-node feeder validate that the proposed method ensures synchronization-safe MG formation and improves restoration performance. Meanwhile, the proposed acceleration framework achieves significant computational speed-ups without compromising final optimality.


[17] 2604.16708

Knowledge Distillation for Lightweight Multimodal Sensing-Aided mmWave Beam Tracking

Beam training and prediction in real-world millimeter-wave (mmWave) communications systems are challenging due to rapidly time-varying channels and strong interference from surrounding objects. In this context, widely available sensors, such as cameras and radars, can capture rich environmental information, enabling efficient beam management. This paper proposes a knowledge-distillation (KD)-enabled learning framework for developing lightweight and low-complexity models for beam prediction and tracking using real-world camera and radar data from the DeepSense 6G dataset. Specifically, a powerful teacher network based on convolutional neural networks (CNNs) and gated recurrent units (GRUs) is first designed to predict current and future beams from historical sensor observations. Then, a compact student model is constructed and trained via KD to transfer the predictive capability of the teacher model to a lightweight architecture. Simulation results demonstrate that jointly leveraging radar and image modalities significantly outperforms single-modality approaches. Moreover, the proposed student model achieves over 96% Top-5 beam prediction accuracy while reducing computational complexity by more than 4 times and the number of parameters by over 27 times compared with the teacher model.
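The distillation step can be illustrated with the standard temperature-scaled KD loss (Hinton-style soft targets plus hard-label cross-entropy). The temperature, mixing weight, and beam count below are illustrative, not the paper's settings.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * CE(student, labels) + (1-alpha) * T^2 * KL(teacher || student),
    with both soft distributions taken at temperature T."""
    p_s = softmax(student_logits)
    ce = -np.mean(np.log(p_s[np.arange(len(labels)), labels] + 1e-12))
    p_t_T = softmax(teacher_logits, T)
    p_s_T = softmax(student_logits, T)
    kl = np.mean(np.sum(p_t_T * (np.log(p_t_T + 1e-12)
                                 - np.log(p_s_T + 1e-12)), axis=-1))
    return alpha * ce + (1 - alpha) * T * T * kl

rng = np.random.default_rng(1)
teacher = rng.normal(size=(8, 64))               # e.g. 64 candidate beams
student = teacher + 0.1 * rng.normal(size=(8, 64))
labels = teacher.argmax(axis=1)                  # teacher's top beam as label
print(kd_loss(student, teacher, labels))
```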


[18] 2604.16709

A Universal Systematic Method to Generate Error Patterns on Memoryless Channels

The high computational cost of approaching the performance of Maximum-likelihood (ML) decoding has limited its practical use for decades. Because the complexity grows exponentially with the message length, researchers have spent years developing algorithms like Ordered Statistics Decoding (OSD), Partial Ordered Statistics Decoding (POSD) and Guessing Random Additive Noise Decoding (GRAND) which try to approach ML performance. OSD, POSD and GRAND work by trying to guess the error patterns affecting the received signals. However, there does not exist a systematic method to extend the error pattern guesses to novel channels. This work introduces a systematic method that uses the Probability Density Function (PDF) of a memoryless channel to generate a set of error patterns that can be applied to any future received signal on this channel. Simulation results show that our proposed method, applied on GRAND, OSD and POSD, generally matches or outperforms current pre-generated error patterns on the additive white Gaussian noise (AWGN) channel, mixture-of-Gaussians channels, the Rayleigh fading channel with perfect knowledge of Channel State Information (CSI), and the Rayleigh fading channel without perfect CSI (NCSI).
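A minimal illustration of likelihood-ordered pattern generation for a memoryless binary channel: rank candidate error patterns by the sum of per-bit log-likelihood ratios derived from the channel's flip probabilities. The BSC-style model and the weight cutoff are illustrative assumptions, not the paper's construction.

```python
import itertools
import numpy as np

def ordered_error_patterns(flip_probs, max_weight=3):
    """Enumerate error patterns for a memoryless binary channel in
    decreasing likelihood. flip_probs[i] is the channel-derived
    probability (from its PDF) that bit i is received in error."""
    n = len(flip_probs)
    p = np.asarray(flip_probs, dtype=float)
    llr = np.log(p) - np.log1p(-p)       # log(p / (1 - p)) per bit
    patterns = [()]                      # all-zero (no-error) pattern first
    scored = []
    for w in range(1, max_weight + 1):
        for idx in itertools.combinations(range(n), w):
            scored.append((sum(llr[i] for i in idx), idx))
    scored.sort(key=lambda t: -t[0])     # most likely first
    patterns += [idx for _, idx in scored]
    return patterns

# Uniform BSC: the ordering reduces to increasing Hamming weight
pats = ordered_error_patterns([0.1] * 8, max_weight=2)
print(pats[:5])
```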


[19] 2604.16710

Timescale Limits of Linear-Threshold Networks

Linear-threshold networks (LTNs) capture the mesoscale behavior of interacting populations of neurons and are of particular interest to control theorists due to their dynamical richness and relative ease of analysis. The aim of this paper is to advance the study of global asymptotic stability in LTNs with asymmetric neural interactions and heterogeneous dissipation under the structural Lyapunov diagonal stability (LDS) condition. To this end, we introduce a one-parameter family of LTNs that preserves the LDS condition and has a parameter-independent equilibrium set. In the fast limit, this family converges to a projected dynamical system (PDS), while in the slow limit, it converges to a discontinuous hard-selector system (HSS). Under LDS, we prove that the fast PDS limit is globally exponentially stable and that the HSS limit is globally asymptotically stable. This alignment suggests that the limiting systems capture essential mechanisms governing stability across the entire LTN family. Together with numerical evidence, these findings indicate that resolving stability at the fast and slow endpoints provides a promising and structurally grounded path toward establishing global stability for LTNs with biologically plausible recurrence and diagonal dissipation.


[20] 2604.16761

A Control-Oriented Framework for Coupling Physics-Based and Data-Driven Models

Design, control, and estimation for dynamic systems require accurate and analytically tractable models. However, modern engineered systems contain components that are described with heterogeneous modeling paradigms, as well as subsystems that are challenging to model from physics alone. There have been significant efforts to address this through heterogeneous coupling frameworks and data-driven modeling. However, these two paths have been pursued in parallel. This work bridges this gap by introducing a control-oriented framework to couple physics-based and data-driven models. A physics-based microgrid with a data-driven data center load model is used to demonstrate the proposed four-step methodology. Application of the framework yields a coupled system that allows for rigorous assessment of control properties. Equilibrium and stability tests are conducted, and they both reveal that the coupling structure and functions play a critical role in determining physically meaningful equilibrium points and stability of the integrated system. This information could only be accessed through the proposed framework, highlighting its importance.


[21] 2604.16769

Experimental Characterization Data for Battery Modules with Parallel-Connected Cells across Diverse Module-Level State of Health and Cell-to-Cell Variations

This experimental dataset presents both module-level and cell-level characterization data for lithium-ion battery modules composed of three parallel-connected inhomogeneous cells across a wide range of module-level state of health (M-SoH) and cell-to-cell variation (CtCV). First, 70 cells are aged to establish an inventory with cell-level state of health (C-SoH) ranging approximately from 100% to 80% (80% is considered as the end-of-life for automotive applications). From this inventory, 78 battery modules are then assembled, each exhibiting a distinct M-SoH value (from 100% to 80.98%) and a unique CtCV value (from 0% to 9.31%, defined as population standard deviation of C-SoH within each module). Module-level characterization data are collected at 25°C under 0.5C and 0.25C conditions, enabling extraction of module-level capacities and supporting diagnostic analyses such as incremental capacity analysis and differential voltage analysis. Before a module is assembled and tested, cell-level characterization tests are conducted for every individual cell within that module under 1C conditions, enabling direct quantification of CtCV and providing accurate labels for cell-level capacities and internal resistances. The dataset is organized with both raw time-series data and processed summary information such as C-SoH, M-SoH, and CtCV for all modules. With the paired module-level and cell-level characterization data, this dataset enables understanding and development of advanced degradation monitoring mechanisms for battery modules with parallel-connected cells in the presence of CtCVs.
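The CtCV definition quoted above (population standard deviation of C-SoH within a module) can be computed directly. Note that taking M-SoH as the mean of the three C-SoH values below is a simplifying assumption for illustration; the dataset derives M-SoH from module-level capacity tests.

```python
import numpy as np

def module_stats(c_soh):
    """M-SoH and CtCV for one module, given its cells' C-SoH values (%).
    CtCV is the population (not sample) standard deviation of C-SoH."""
    c = np.asarray(c_soh, dtype=float)
    m_soh = c.mean()          # illustrative proxy; the dataset measures
                              # M-SoH from module capacity directly
    ctcv = c.std(ddof=0)      # population standard deviation
    return m_soh, ctcv

# three parallel-connected cells with inhomogeneous aging
m, v = module_stats([98.0, 90.5, 84.2])
print(round(m, 2), round(v, 2))
```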


[22] 2604.16819

Online Reinforcement Learning for Safe Gain Scheduling in Nonlinear Quadrotor Control

This paper presents an online reinforcement-learning framework for safe gain scheduling of a nonlinear quadcopter controller. Rather than learning thrust and torque commands directly, the proposed method selects gain vectors online from a finite library of pre-certified stabilizing controllers, thereby preserving the structure of the underlying snap-based control law. Safety is enforced by restricting the policy to admissible gains that maintain forward invariance of a prescribed safe state set, while dwell-time constraints prevent excessively fast switching. To reduce the action-space dimension, translational gains are shared across spatial axes by exploiting the isotropic structure of the translational dynamics, whereas yaw gains are scheduled independently. A deep Q-network learns to adjust feedback authority according to the current flight condition, using aggressive gains during large transients and milder gains near hover. High-fidelity nonlinear simulations demonstrate accurate trajectory tracking, bounded attitude motion, reduced control effort near convergence, and stable hover regulation under online safe gain scheduling.


[23] 2604.16906

Nesterov Accelerated Distributed Optimization with Efficient Quantized Communication

In modern large-scale networked systems, rapidly solving optimization problems while utilizing communication resources efficiently is critical for addressing complex tasks. In this paper, we consider an unconstrained distributed optimization problem in which information exchange among nodes is governed by a directed communication graph. In our setup we focus on two key challenges. The first is the zigzag phenomenon caused by the objective functions of individual nodes having significantly different curvature along different directions. The second is that the communication channels among nodes are subject to limited bandwidth, which motivates the use of compressed (quantized) messages. To address both challenges simultaneously, we propose QANM, a distributed optimization algorithm that combines Nesterov-accelerated gradient descent with a distributed finite-time quantized consensus protocol, enabling accelerated convergence. Under strong convexity and smoothness assumptions, we show that our proposed algorithm converges linearly to a neighborhood of the optimal solution. Finally, we validate our algorithm on a distributed sensor fusion application for multi-dimensional target parameter estimation, where simulations across two distinct scenarios confirm the convergence guarantees and demonstrate clear acceleration benefits over non-momentum baselines.
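The zigzag phenomenon and the benefit of Nesterov momentum can be seen on a two-dimensional ill-conditioned quadratic. This is the centralized, unquantized core of the idea only; the consensus and quantization machinery of QANM is omitted, and all constants are illustrative.

```python
import numpy as np

def gd(H, x0, eta, steps):
    """Plain gradient descent on f(x) = x^T H x / 2."""
    x = x0.copy()
    for _ in range(steps):
        x -= eta * H @ x
    return x

def nesterov(H, x0, eta, steps):
    """Nesterov acceleration with the standard strongly convex momentum."""
    mu, L = np.linalg.eigvalsh(H)[[0, -1]]
    beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(steps):
        y = x + beta * (x - x_prev)      # look-ahead point
        x_prev = x
        x = y - eta * H @ y
    return x

# Curvature 100x larger in one direction: GD zigzags, momentum does not
H = np.diag([1.0, 100.0])
x0 = np.array([10.0, 10.0])
eta = 1.0 / 100.0                        # 1/L
print(np.linalg.norm(gd(H, x0, eta, 200)),
      np.linalg.norm(nesterov(H, x0, eta, 200)))
```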


[24] 2604.16908

End-to-End ILC for Repetitive Untrackable Tasks: A Cooperative Game Perspective

An inherent assumption of perfect tracking in iterative learning control (ILC) is that there exists an ILC input such that the generated output can track the desired trajectory reference. This assumption may fail in practice, which gives rise to desired but untrackable tasks. This paper gives an end-to-end ILC design for repetitive untrackable tasks in closed-loop systems. The reference input is updated from trial to trial, together with the ILC feedforward input, based on measurement data. This two-player behavior of the closed-loop ILC system is investigated from a cooperative game perspective. A sufficient condition for the two-player end-to-end ILC to have a lower cost than the one-player norm optimal ILC (NOILC) is discovered. Finally, a numerical example is given to verify the effectiveness of the developed method.



[25] 2604.16947

Structured 3D-SVD: A Practical Framework for the Compression and Reconstruction of Biological Volumetric Images

This work introduces Structured 3D-SVD as a practical framework for the reconstruction, compression, and analysis of biological volumetric data. Inspired by the logic of matrix singular value decomposition (SVD), the proposed approach represents third-order volumetric data in the spatial domain and supports progressive reconstruction through ordered quasi-singular coefficients. The experimental evaluation was carried out on two biological volumetric datasets: one full-volume scan of a fish and another of a brain. The results show that Structured 3D-SVD achieves reconstruction quality close to that of Tucker decomposition while requiring shorter computation times and outperforms canonical polyadic decomposition (CPD) in both accuracy and runtime. In addition, a progressive reconstruction analysis shows that relatively low truncation levels are sufficient to preserve the main volumetric structures, while higher truncation levels lead to more detailed reconstructions.
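The progressive-truncation behaviour described above can be mimicked with a generic mode-1 unfolding followed by a matrix SVD. This is a stand-in illustration of SVD-style truncation on a third-order volume, not the Structured 3D-SVD algorithm itself.

```python
import numpy as np

def truncated_svd_volume(vol, rank):
    """Low-rank reconstruction of a 3-D volume via SVD of its
    mode-1 unfolding (generic stand-in, not the paper's method)."""
    d0, d1, d2 = vol.shape
    mat = vol.reshape(d0, d1 * d2)                 # unfold volume to a matrix
    U, s, Vt = np.linalg.svd(mat, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # keep leading components
    return approx.reshape(d0, d1, d2)

rng = np.random.default_rng(0)
vol = rng.standard_normal((8, 8, 8))
# Reconstruction error is non-increasing as the truncation level grows:
errs = [np.linalg.norm(vol - truncated_svd_volume(vol, r)) for r in (1, 4, 8)]
```

By the Eckart-Young theorem each truncation level gives the best approximation of its rank, which is why low truncation already preserves the dominant structures.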


[26] 2604.16970

A state-space representation of the boundary integral equation for room acoustic modelling

We introduce a new framework for room acoustics modelling based on a state-space model of the boundary integral equation representing the sound field in a room. Whereas state-space models of linear time-invariant systems are traditionally constructed by means of a state vector and a 4-tuple of system matrices, the state-space representation introduced in this work consists of a state function representing the pressure distribution at the room boundary, and a 4-tuple of integral operators. We refer to this representation as a boundary integral operator state-space (BIOSS) model and provide a physical interpretation for each of the integral operators. As many mathematical operations on vectors and matrices translate to functions and operators, the BIOSS representation can be manipulated to obtain two transfer function representations, having either a feedback or a parallel feedforward structure. Consequently, various equivalent representations for room acoustics are obtained in the BIOSS framework, in the time or frequency domain, and in continuous or discrete space. We discuss two future directions for how the proposed framework can be fertile ground for research on room acoustics modelling. Firstly, we identify equivalences between the BIOSS framework and various existing room acoustics models (boundary element models, delay networks, geometric models), which may be used to establish relations between existing models and to develop novel room acoustics models. Secondly, we consider how concepts from state-space theory, such as observability, controllability, and state realization, can be used for developing new inference and control methods for room acoustics.


[27] 2604.16983

Graph-Guided Adaptive Channel Elimination for KV Cache Compression

Large Language Models have revolutionized natural language processing, achieving unprecedented success across a vast range of tasks. However, their practical application in long-context scenarios is severely hampered by the formidable memory footprint of the Key-Value cache. While channel pruning has emerged as a promising compression strategy, existing methods evaluate channel importance in isolation, fundamentally ignoring the inter-channel interactions that collectively dictate model performance. This oversight leads to suboptimal pruning decisions. To address this, we introduce \textbf{GRACE} (\textbf{GR}aph-guided \textbf{A}daptive \textbf{C}hannel \textbf{E}limination), a novel framework that reframes KV cache compression as a graph-based optimization problem. GRACE models channels as nodes and their interactions as weighted edges, enabling the identification of a near-optimal channel subset for pruning by minimizing the reconstruction error of the attention weight matrix. Furthermore, GRACE incorporates an adaptive protection mechanism that shields salient key channels from removal, ensuring a robust autoregressive decoding process. Extensive experiments show that GRACE can reduce KV cache size by 60\% with negligible performance degradation, consistently outperforming the state-of-the-art method.


[28] 2604.17000

Anonymization, Not Elimination: Utility-Preserved Speech Anonymization

The growing reliance on large-scale speech data has made privacy protection a critical concern. However, existing anonymization approaches often degrade data utility, for example by disrupting acoustic continuity or reducing vocal diversity, which compromises the value of speech data for downstream tasks such as Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Speech Emotion Recognition (SER). Current evaluation practices are also limited, as they mainly rely on direct testing of anonymized speech with pretrained models, providing only a partial view of utility. To address these issues, we propose a novel two-stage framework that protects both linguistic content and acoustic identity while maintaining usability. For content privacy, we employ a generative speech editing model to seamlessly replace personally identifiable information (PII), and for voice privacy, we introduce F3-VA, a flow-matching-based anonymization framework with a three-stage design that produces diverse and distinct anonymized speakers. To enable a more comprehensive assessment, we evaluate privacy using both acoustic- and content-based speaker verification metrics, and assess utility by training ASR, TTS, and SER models from scratch. Experimental results show that our framework achieves stronger privacy protection with minimal utility degradation compared to baselines from the VoicePrivacy Challenge, while the proposed evaluation protocol provides a more realistic reflection of the utility of anonymized speech under privacy protection.


[29] 2604.17012

Net Load Forecasting Using Machine Learning with Growing Renewable Power Capacity Features: A Comparative Study of Direct and Indirect Methods

Renewable energy adoption has increased significantly over the past few years, but forecasting the net load has in turn become a major challenge due to the inherent uncertainty associated with renewable sources. To mitigate the impact of these uncertainties, this study utilizes a long short-term memory (LSTM) model and a fully connected neural network (FCNN) to predict net load based on two independent approaches: the direct method and the indirect method. While the conventional direct method directly forecasts the target net load, the indirect approach derives it by separately predicting total load and renewable energy generation. Furthermore, this study innovatively incorporates renewable energy capacity as an input feature to train the forecasting model. The indirect method for the FCNN provided a better estimate than the direct method, and the indirect method for the LSTM model gave the best prediction. These findings suggest that recurrent architectures like the LSTM are particularly well-suited for net load forecasting applications, while the choice between direct and indirect methods depends on the specific neural network architecture employed. By advancing reliable forecasting tools for renewable energy integration, this work enhances grid resilience and accelerates the transition toward renewable-dominant power systems.
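The direct/indirect distinction reduces to whether one model predicts the net-load series itself, or two models predict its components and the net load is formed by subtraction. A minimal sketch, with hypothetical hourly forecast values standing in for trained model outputs:

```python
import numpy as np

# Hypothetical hourly forecasts in MW (illustrative numbers, not from the paper).
total_load_pred = np.array([950.0, 900.0, 870.0, 910.0])  # load forecaster output
renewable_pred  = np.array([120.0, 180.0, 240.0, 200.0])  # PV/wind forecaster output

# Indirect method: derive net load from the two component forecasts.
net_load_indirect = total_load_pred - renewable_pred

# The direct method would instead train a single model on the historical
# net-load series and emit the net-load forecast in one step.
```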


[30] 2604.17026

Learning a Non-linear Surrogate Model for Multistage Stochastic Transmission Planning

Transmission expansion planning (TEP) plays a critical role in ensuring power system reliability and facilitating the integration of renewable energy resources. However, this process requires planners to constantly deal with significant uncertainty. While multistage stochastic TEP models provide a robust framework for identifying investment plans under uncertainty, the rapid growth in problem size hinders their computational tractability. To address this challenge, this paper develops a hybrid machine learning-optimisation framework for stochastic TEP. The proposed approach uses investment decisions and uncertainty scenarios as input features to train surrogate neural networks, which are then reformulated as mixed-integer linear constraints and embedded within an optimisation model. The surrogate model approximates expected operational costs to inform TEP decisions, reducing the burden arising from large operational problems. Case study applications on IEEE test systems demonstrate that, after training, the proposed approach achieves near-optimal investment costs while reducing total computational time by up to a factor of around 13 compared to a single full-optimisation stochastic formulation. This enables performing extensive multi-scenario analysis and stress testing that would otherwise be computationally prohibitive at scale.


[31] 2604.17027

Trapping Regions for Quadratic Systems with Generalized Lossless Nonlinearities

A trapping region is a compact set that is forward invariant with respect to the dynamics. Existence of a trapping region certifies boundedness of trajectories, and the size of the set provides an estimate of the ultimate bound. Prior work on trapping region analysis has focused on quadratic systems with energy-preserving (lossless) nonlinearities. In this work, we focus on a generalization of the lossless property and present an efficient parameterization that enables optimal trapping region computation for a broader class of quadratic systems than afforded by existing methods. We also formulate conditions for ellipsoidal trapping regions, whereas spherical regions have been the focus of prior works. Three numerical examples are used to demonstrate the proposed framework: (1) a four-dimensional system for which the prior state of the art is incapable of identifying a trapping region; (2) a low-order unsteady aerodynamics model for which the proposed approach yields trapping regions approximately an order of magnitude smaller than prevailing methods; and (3) a two-state academic example in which the proposed approach correctly identifies a globally asymptotically stable equilibrium point.


[32] 2604.17032

Enabling Safety-Critical Wireless Communications via Safe Reinforcement Learning

Ensuring strict safety guarantees is the paramount challenge for emerging 5G/6G wireless systems, particularly as they increasingly govern mission-critical applications ranging from autonomous UAV swarms to industrial automation. While deep reinforcement learning (DRL) offers a promising solution for complex resource allocation, standard algorithms frequently violate essential constraints, such as QoS mandates and power limits, posing unacceptable risks of system failure and regulatory non-compliance. We propose Safe-Deep Q-Learning, a novel algorithm that simultaneously addresses all three challenges: it handles mixed-integer nonconvex problems by approximating the Q-function, adapts to stochastic dynamics, and enforces dual-timescale constraints using integrated Lagrangian methods. Our framework features adaptive penalty scaling and constraint violation tracking, specifically tailored for wireless environments, and is designed to operate in both distributed and centralized architectural modes. We prove convergence to optimal constraint-satisfying policies under mild conditions and demonstrate robustness through dual variable stabilization. Validation on an unmanned aerial vehicle (UAV) swarm control network and on post-disaster emergency communications applications shows that Safe-Deep Q-Learning achieves stringent adherence to safety bounds with near-zero violation rates, significantly outperforming existing constrained RL baselines and establishing its effectiveness for safety-critical wireless deployments.


[33] 2604.17034

A Hybrid STFT-Based Machine Learning Framework for Physically Interpretable Arc Stability Classification in Electric Arc Welding Systems

This study presents a physically informed hybrid time-frequency and machine learning (STFT-ML) framework for arc stability monitoring in electric arc welding systems. The primary current signal is modeled as a stochastic representation of plasma dynamics and transformed into a structured feature space using localized spectral energy distributions. Within this framework, the Arc Stability Index (ASI), spectral entropy (Hs), and harmonic distortion (THDarc) are defined as energy-based descriptors and integrated with complementary time-domain features to capture both spectral redistribution and temporal variability. Experimental evaluation demonstrates that the SVM-RBF classifier achieves a hold-out accuracy of 94.4%. However, cross-validation results (85.6% for Leave-One-Out and 87.5% +/- 9.4 for 10-fold) and a 95% confidence interval of [81.65%, 92.50%] provide a more realistic assessment of generalization performance. Receiver Operating Characteristic (ROC) and Precision-Recall (PR) analyses further confirm strong class separability, particularly for stable and extinction regimes, while transient states remain more challenging due to their non-stationary nature. Compared to high-dimensional deep learning approaches, the proposed framework significantly reduces computational complexity and inference latency, enabling real-time deployment in resource-constrained environments. The results indicate that spectral energy redistribution around the fundamental frequency serves as a reliable precursor to arc instability. The main contribution of this work lies in the development of a computationally efficient and physically interpretable feature representation framework that bridges time-frequency analysis and machine learning-based classification for industrial diagnostic applications.


[34] 2604.17047

E2E-WAVE: End-to-End Learned Waveform Generation for Underwater Video Multicasting

We present E2E-WAVE, the first end-to-end learned waveform generation system for underwater video multicasting. Acoustic channels exhibit 20--46% bit error rates where forward error correction becomes counterproductive -- LDPC increases rather than decreases errors beyond its decoding threshold. E2E-WAVE addresses this by embedding semantic similarity directly into physical layer waveforms: when decoding errors are unavoidable, the system preferentially selects semantically similar tokens rather than arbitrary corruption. Combining VideoGPT tokenization (1024x compression) with a trainable waveform bank and fully differentiable OFDM transmission, E2E-WAVE achieves +5 dB (19.26%) PSNR and +0.10 (14.28%) SSIM over the strongest FEC-protected baseline in a less challenging underwater channel (NOF1) while delivering real-time 16 FPS video at 128x128 resolution over 2.3 kbps channels -- impossible for conventional digital modulation. The performance gap only increases in harsher channels (BCH1, NCS1). Trained on a single channel, E2E-WAVE generalizes to unseen underwater environments without retraining, while HEVC fails at sub-5 kbps rates and SoftCast's AWGN assumptions collapse on frequency-selective channels.


[35] 2604.17081

Coordinated Dynamic Operating Envelopes for Unlocking Additional Flexibility at Grid Edge

Dynamic operating envelopes (DOEs) provide a systematic framework to integrate the flexibility of distribution grid resources while safeguarding network limits such as line ratings and voltage bounds. However, the flexibility derived from individual DOEs is often restricted and conservative, especially when some resources can coordinate via communication with an aggregator. This paper presents a convex, geometry-aware framework for constructing DOEs for distribution grid customers under partial coordination, with coordinated customers modeled through polytopal flexibility sets and non-coordinated customers through hyperrectangles. The framework additionally incorporates fairness constraints for export and import headroom allocated to the customers within the DOE design. To account for forecast uncertainty in inelastic injections, the DOE design is extended to a robust formulation for bounded uncertainty sets. Case studies on the European Low Voltage Test Feeder indicate that the proposed DOE construction expands total harnessed flexibility while remaining consistent with network limits and export/import fairness constraints, and robust to forecast uncertainty. Specifically, coordinating 30% of customers increased the achievable aggregate active-power injection range by approximately 25% relative to the non-coordinated baseline.


[36] 2604.17099

Movable Antenna Optimization for Multi-User MIMO Systems in Realistic Ray-Traced Propagation Environments

To meet the growing data traffic demand in future wireless systems, novel transmission architectures capable of adapting to complex propagation environments are required. Movable antenna (MA) systems have recently emerged as a promising approach, enabling the physical repositioning of antenna elements to exploit spatial degrees of freedom. However, existing studies largely rely on idealized or simplistic channel models, leaving open the question of whether the performance gains of MA systems persist under realistic propagation conditions. This paper investigates the performance of downlink multi-user MIMO systems with movable antennas using deterministic ray-traced channel models. A simulation framework combining three-dimensional ray tracing and field-response channel modeling is developed, and antenna positions are optimized using particle swarm optimization and genetic algorithms. Simulation results reveal that while simplified distance-based channel models predict large performance disparities between competing array configurations, realistic ray-traced channels significantly compress these differences, indicating that propagation effects dominate over pure array geometry optimization. Nevertheless, movable antenna systems retain strong effectiveness over conventional fixed arrays across different user distributions, array sizes, and multipath conditions, even in geometry-constrained propagation environments.


[37] 2604.17118

A Two-Stage Deep Learning Framework for Segmentation of Ten Gastrointestinal Organs from Coronal MR Enterography

Accurate segmentation of gastrointestinal (GI) organs in magnetic resonance enterography (MRE) is critical for diagnosing inflammatory bowel disease (IBD). However, anatomical variability, class imbalance, and low tissue contrast hinder reliable automation. This study proposes a dual-stage deep learning framework for organ-specific segmentation of GI structures from coronal MRE images to address these challenges. A publicly available MRE dataset of 3,195 coronal T2-weighted HASTE slices from 114 IBD patients was used. Initially, a DenseNet201-UNet++ model generated coarse masks for ROI extraction. A DenseNet121-SelfONN-UNet model was then trained on organ-specific patches. Extensive data augmentation, normalization, five-fold cross-validation, and class-specific weighting were applied to mitigate severe class imbalance, particularly for the appendix. The initial stage achieved strong organ localization but underperformed for the appendix; class weighting improved its DSC from 6.76% to 85.76%. The second-stage DenseNet121-SelfONN-UNet significantly enhanced segmentation across all GI structures, with notable DSC gains (cecum +23.62%, sigmoid +18.57%, rectum +17.99%, small intestine +16.06%). Overall, the framework achieved mDSC of 88.99%, mIoU of 84.76%, and mHD95 of 6.94 mm, outperforming all baselines. This framework demonstrates the effectiveness of a coarse-to-fine, organ-aware segmentation strategy for intestinal MRE. Despite higher computational cost, it shows strong potential for clinical translation and enables anatomically informed diagnostic tools in gastroenterology.


[38] 2604.17165

On the Unification of Optimal Current Reference Theory for Wound Rotor Synchronous Machines

Controllers for motor drives typically require a current reference which will satisfy the requested torque subject to system constraints. This work generalizes existing current reference theory to the case of the Wound Rotor Synchronous Machine (WRSM). By incorporating the additional rotor-current degree-of-freedom, along with magnetic saturation, cross-coupling, and speed-dependent core losses, the problem of finding an optimal current reference is formulated within affine flux regions as a quadratically constrained quadratic program using a piecewise-affine approximation derived from finite-element data. The solution is characterized according to the active constraint regime, yielding closed-form or low-dimensional polynomial solutions in several cases, and a small semidefinite program in the voltage constrained regime. The proposed framework extends unified optimal current reference theory beyond the permanent-magnet setting to three degree-of-freedom WRSMs while remaining computationally tractable. Results on a physical WRSM prototype illustrate the effectiveness of the approach across the torque-speed operating envelope.


[39] 2604.17169

Two-Tier High Altitude Platform Stations (HAPS) for Exploring Wireless Energy Harvesting

In sixth-generation (6G) cellular networks and beyond, aerial platforms, such as uncrewed aerial vehicles (UAVs) and high-altitude platform stations (HAPS), are anticipated to play a crucial role in enhancing connectivity, expanding network coverage, and supporting advanced communication services. However, the deployment of energy-efficient onboard communication systems is essential for their widespread adoption and effectiveness. The integration of energy harvesting (EH) into aerial platforms is envisioned to be pivotal in promoting both energy and cost efficiency. In this paper, we propose a new paradigm for aerial platforms in which they can collect energy from the transmitted signals of nearby aerial platforms. The paper employs a two-tier architecture with HAPS super-macro base stations (HAPS-SMBS) system: regular HAPS-SMBS nodes serve as base stations, while a "mother" HAPS-SMBS node acts as a manager to coordinate communications between regular HAPS-SMBS and the ground station, thus enabling wireless energy transfer. Specifically, we analyze the characteristics of EH-enabled HAPS-SMBS and compare their performance with those without EH. Additionally, we derive the optimal regular HAPS-SMBS positioning to mitigate signal attenuation and power loss. Subsequently, we formulate a joint optimization problem for regular HAPS-SMBS positioning and the EH factor. We solve the problem using the iterative distance and EH factor algorithm (IDFA); however, we employ $Q$-learning to verify its effectiveness. Our findings indicate that, compared to conventional EH systems, IDFA and $Q$-learning exhibit higher data rate performance. Moreover, $Q$-learning outperforms IDFA systems in linear models with intensive training in approximating optimal values. Furthermore, maximizing transmit power achieves higher gains than systems without EH.


[40] 2604.17176

Intent-aligned Autonomous Spacecraft Guidance via Reasoning Models

Future spacecraft operations require autonomy that can interpret high-level mission intent while preserving safety. However, existing trajectory optimization still relies heavily on expert-crafted formulations and does not support intent-conditioned decision-making. This paper proposes an intent-aligned spacecraft guidance framework that links high-level reasoning and safe trajectory optimization through explicit intermediate abstractions, based on behavior sequences and waypoint constraints. A foundation model first predicts an intent-aligned behavior plan, a waypoint generation model then converts it into waypoint constraints, and the safe trajectory is computed via optimization. This decomposition enables scalable supervision without sacrificing safety. Numerical experiments in close-proximity operation scenarios demonstrate that the proposed pipeline achieves over 90\% SCP convergence and yields a $1.5\times$ higher rate of generating trajectories that satisfy the top intent-prioritized performance criteria than heuristic decision-making. These results support the use of intermediate behavior abstraction as a practical interface between foundation-model reasoning and safety-critical onboard spacecraft autonomy.


[41] 2604.17192

CADRE: Card-Agnostic Domain-Aligned RF Embeddings for Virtual PIN Pads on Passive NFC Cards

Near Field Communication (NFC) cards are widely used for identification, but their passive nature often limits the ability to incorporate additional security mechanisms. As a result, anyone holding the card may be incorrectly recognized as an authenticated user. To overcome this limitation, this paper presents a secure manual password input framework using a virtual PIN pad for passive NFC cards. Users input passwords by pressing designated regions on the card, which induces measurable impedance variations in the NFC antenna. These variations change the RF signals subtly, and a deep learning model is used to infer the intended password from the resulting signal patterns. A key challenge is that identical press interactions can produce significantly different responses across NFC cards, which yields unreliable recognition. To address this, we introduce a lightweight recognition approach that operates directly within the RF feature space at the penultimate layer of a temporal neural encoder. An adversarial domain-alignment module reshapes virtual PIN pad press-response embeddings into compact, card-invariant clusters, which enables stable and consistent recognition across heterogeneous cards. To support model training and evaluation, a reconfigurable software-defined radio (SDR) testbed is developed, and PIN pad press-response data are collected from commercially available ISO/IEC 15693 cards. Recognition is performed using a Mahalanobis distance metric derived from a calibration-based covariance model that captures feature correlations. Experimental results show that the proposed system achieves a 98.20\% recognition acceptance rate and remains robust under substantial noise degradation. The framework is fully card-agnostic and can be seamlessly integrated into existing NFC infrastructures.
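The final recognition step described above is a standard Mahalanobis-distance test against a class cluster whose covariance captures feature correlations. A minimal sketch with synthetic stand-in embeddings (the dimensions and data are hypothetical, not from the paper's testbed):

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of an embedding x from a class cluster,
    using a covariance model of the feature correlations."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(1)
# Hypothetical press-response embeddings for one PIN-pad region,
# with anisotropic per-dimension spread:
cluster = rng.standard_normal((200, 4)) * np.array([1.0, 0.5, 2.0, 1.0])
mean, cov = cluster.mean(axis=0), np.cov(cluster, rowvar=False)

d_in  = mahalanobis(cluster[0], mean, cov)   # sample drawn from the cluster
d_out = mahalanobis(mean + 10.0, mean, cov)  # far-away probe
```

Recognition then accepts a probe when its distance to the nearest region cluster falls below a calibrated threshold.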


[42] 2604.17201

Robust Resource Allocation in RIS-Assisted Wireless Networks Integrating NOMA and Over-the-Air Federated Learning

This paper addresses the critical issue of spectrum scarcity and the need to support diverse services, including communication and learning tasks, by presenting a reconfigurable intelligent surface (RIS)-aided wireless network framework that integrates non-orthogonal multiple access (NOMA) with over-the-air federated learning (AirFL). The proposed system leverages the ability of RIS to adaptively shape wireless channels, aiming to enhance overall network performance for both communication and learning through concurrent uplink transmissions. To tackle critical challenges such as co-channel interference, imperfect channel state information (CSI), and successive interference cancellation (SIC), we develop an optimization framework that focuses on minimizing the optimality gap. This joint optimization is formulated as a non-convex problem, complicated by the intricate interactions between NOMA and AirFL users as well as the impact of imperfect CSI and SIC. To overcome these challenges and reduce the optimality gap, we reformulate the optimization problem as a Markov decision process and solve it using a long short-term memory deep deterministic policy gradient (LSTM-DDPG) algorithm, a memory-based approach within deep reinforcement learning (DRL). Simulation results demonstrate that the proposed approach achieves faster convergence, lower variance, and improved robustness under channel uncertainty, outperforming baseline DRL algorithms such as DDPG, soft actor-critic (SAC), and advantage actor-critic (A2C).


[43] 2604.17205

Power Flow Solvability with Volt-Var Controlled Inverter-Based Resources

This paper establishes a sufficient condition for guaranteeing power flow solvability in distribution grids with inverter-based resources (IBRs) operating under IEEE 1547 compliant Volt-Var control. While designed to improve voltage profiles, reactive power injection can drive the system toward its operational limits. Under these stressed conditions, any further incremental reactive power injection can trigger voltage collapse, the point at which a power flow solution ceases to exist. In this paper, by leveraging a phasor-based voltage representation, the power flow equations with Volt-Var control are developed in the complex fixed point form, enabling a compact formulation and the rigorous application of fixed-point theorems. Addressing the challenges posed by the non-holomorphicity of the complex power flow equations due to the Volt-Var function's dependence on voltage magnitude, the solvability conditions are then developed using the Brouwer fixed-point theorem. The proposed conditions are validated through simulations on distribution test feeders, with a primary focus on their application to real-time decision-making for voltage regulation services.
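What a fixed-point solvability certificate buys in practice is that the iteration of the power flow map converges to a solution. The toy below iterates a contractive complex map of the same flavor as a fixed-point voltage update; the map and constants are illustrative only, not the paper's Volt-Var power flow equations.

```python
import numpy as np

def fixed_point(F, z0, tol=1e-10, max_iter=500):
    """Generic fixed-point iteration z <- F(z), stopping when the
    update is below tol (illustration of the fixed-point viewpoint)."""
    z = z0
    for _ in range(max_iter):
        z_next = F(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Toy complex-voltage-style map z = 1 + 0.2 / conj(z); its real fixed
# point solves z^2 - z - 0.2 = 0, i.e. z* = (1 + sqrt(1.8)) / 2.
z = fixed_point(lambda z: 1.0 + 0.2 / np.conj(z), np.array([1.0 + 0j]))
```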


[44] 2604.17221

Bilinear Input Modulation for Mamba: Koopman Bilinear Forms for Memory Retention and Multiplicative Computation

Selective State Space Models (SSMs), notably Mamba, employ diagonal state transitions that limit both memory retention and bilinear computational capacity. We propose a factorized bilinear input modulation that augments the SSM with a state-input product, interpretable as a finite-dimensional Koopman bilinear form. After introducing a shared state across channels (Coupled SSM), the modulation admits two implementations. Coupled Bilinear Input Modulation (Coupled-BIM) retains the full bilinear product at the cost of sequential computation, while Coupled Gated Modulation (Coupled-GM) linearizes it into a gate modulation that is compatible with the parallel scan. Experiments on a multiple input-delay pendulum (memory retention) and NARMA-10 (bilinear computation) reveal a clear dissociation. Coupled-GM substantially improves memory retention but not bilinear computation, while Coupled-BIM improves both. A pathway ablation confirms that the two downstream routes of the bilinear signal serve complementary roles. The improvement is statistically robust, with Coupled-BIM consistently outperforming all other variants on bilinear computation. Furthermore, only Coupled-BIM benefits from increasing the SSM state dimension, while coupling or gate modulation alone show no improvement, establishing the bilinear mechanism as uniquely capable of exploiting larger state spaces.
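The core augmentation is a state-input product added to a diagonal SSM recurrence. A toy scan illustrating the idea (not the paper's Coupled-BIM, whose factorization and coupling across channels are more involved); note the bilinear term only fires when state and input are simultaneously nonzero:

```python
import numpy as np

def ssm_scan(A, B, u, W=None):
    """Diagonal SSM recurrence h <- A*h + B*u_t, optionally augmented
    with a bilinear state-input product W*(h*u_t). Toy illustration."""
    h = np.zeros_like(A)
    hist = []
    for u_t in u:
        bilinear = (W * h * u_t) if W is not None else 0.0
        h = A * h + B * u_t + bilinear  # multiplicative interaction enters here
        hist.append(h.copy())
    return np.array(hist)

A = np.array([0.9, 0.5])   # diagonal state transition
B = np.array([1.0, 1.0])   # input map
u = [1.0, 0.0, 1.0]        # toy input sequence
linear = ssm_scan(A, B, u)
bilin  = ssm_scan(A, B, u, W=np.array([0.1, 0.1]))
```

With this input, the two scans agree on the first two steps (the state or input is zero, so the product vanishes) and diverge at the third, where state and input interact multiplicatively.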


[45] 2604.17248

VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech

Large Audio-Language Models (LALMs) are increasingly integrated into daily applications, yet their generative biases remain underexplored. Existing speech fairness benchmarks rely on synthetic speech and Multiple-Choice Questions (MCQs), both offering a fragmented view of fairness. We propose VIBE, a framework that evaluates generative bias through open-ended tasks such as personalized recommendations, using real-world human recordings. Unlike MCQs, our method allows stereotypical associations to manifest organically without predefined options, making it easily extensible to new tasks. Evaluating 11 state-of-the-art LALMs reveals systematic biases in realistic scenarios. We find that gender cues often trigger larger distributional shifts than accent cues, indicating that current LALMs reproduce social stereotypes.


[46] 2604.17300

Chaos-Enhanced Prototypical Networks for Few-Shot Medical Image Classification

The scarcity of labeled clinical data in oncology makes Few-Shot Learning (FSL) a critical framework for Computer-Aided Diagnostics, but we observed that standard Prototypical Networks often struggle with the "prototype instability" caused by morphological noise and high intra-class variance in brain tumor scans. Our work addresses this by integrating a non-linear Logistic Chaos Module into a fine-tuned ResNet-18 backbone, creating the Chaos-Enhanced ProtoNet (CE-ProtoNet). Using the deterministic ergodicity of the logistic chaos map, we inject controlled perturbations into support features during episodic training, essentially "stress testing" the embedding space. This process drives the model to converge on noise-invariant representations without increasing computational overhead. Testing this on a 4-way 5-shot brain tumor classification task, we found that a 15% chaotic injection level effectively stabilized high-dimensional clusters and reduced class dispersion. Our method achieved a peak test accuracy of 84.52%, outperforming the standard ProtoNet. Our results support the idea of using chaotic perturbation as an efficient, low-overhead regularization tool for data-scarce regimes.
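The logistic-map injection can be illustrated with a minimal numpy sketch (not the authors' code: only the 15% injection level comes from the abstract, while r = 4 is the standard fully chaotic setting, and the seed x0, burn-in length, and amplitude scaling are our hypothetical choices):

```python
import numpy as np

def logistic_chaos_perturb(features, level=0.15, r=4.0, x0=0.37, steps=16):
    """Inject a deterministic logistic-map perturbation into support
    features.  Hypothetical sketch: x0, the burn-in length, and the
    amplitude scaling are illustrative assumptions."""
    x = x0
    for _ in range(steps):               # burn-in: let the orbit spread over (0, 1)
        x = r * x * (1.0 - x)
    seq = np.empty(features.size)
    for i in range(features.size):       # one chaotic sample per feature entry
        x = r * x * (1.0 - x)
        seq[i] = x
    noise = (seq.reshape(features.shape) - 0.5) * 2.0   # map orbit to (-1, 1)
    return features + level * np.abs(features).mean() * noise

feats = np.random.default_rng(0).normal(size=(5, 8))   # toy support features
out = logistic_chaos_perturb(feats)
```

Because the perturbation is deterministic, the same episode always receives the same "stress test", unlike Gaussian noise injection.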


[47] 2604.17311

Distributed Nesterov Flows for Multi-agent Optimization

Various distributed gradient descent algorithms for multi-agent optimization have incorporated the Nesterov accelerated gradient method, where the use of momentum enhances convergence rates. These algorithms have found broad applications in large-scale machine learning and optimization owing to their simplicity and low communication complexity. In this paper, we establish a continuous-time approximation of distributed Nesterov gradient descent. The convergence properties and convergence rate of the resulting distributed Nesterov flow are analyzed using Lyapunov methods. Building on these insights, we design new parameter choices within the flow, from which we derive flow-inspired discrete-time algorithms for multi-agent optimization. Surprisingly, the resulting algorithms achieve faster convergence compared to existing distributed gradient descent methods: they require fewer iterations to reach the same accuracy for strongly convex functions and exhibit an improved convergence rate for general convex functions without incurring additional communication rounds. Furthermore, we investigate the influence of the network topology on algorithm performance and derive an explicit relationship between the convergence rate and the graph condition number. Numerical simulations are presented to validate the effectiveness of the proposed approach.
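For orientation, the centralized analogue of such a flow is the well-known continuous-time limit of Nesterov's accelerated gradient; a distributed version would add a consensus coupling through the graph Laplacian. The forms below are a plausible sketch for context, not necessarily the paper's exact flow or parameter choices:

```latex
% Centralized continuous-time limit of Nesterov's accelerated gradient:
\ddot{x}(t) + \frac{3}{t}\,\dot{x}(t) + \nabla f\bigl(x(t)\bigr) = 0
% One plausible distributed form for agent i on a graph with Laplacian L
% (illustrative; the paper's flow and parameter choices may differ):
\ddot{x}_i(t) + \frac{3}{t}\,\dot{x}_i(t) + \nabla f_i\bigl(x_i(t)\bigr)
  + \beta \sum_{j=1}^{N} L_{ij}\, x_j(t) = 0
```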


[48] 2604.17343

CAR-EnKF: A Covariance-Adaptive and Recalibrated Ensemble Kalman Filter Framework

The ensemble Kalman filter (EnKF) is widely used for nonlinear and high-dimensional state estimation because it replaces complex covariance propagation with simple ensemble statistics. However, conventional EnKF implementations can become overconfident in the presence of measurement nonlinearity. The commonly used covariance inflation technique only partially alleviates this issue. This paper proposes a covariance-adaptive and recalibrated ensemble Kalman filter (CAR-EnKF) framework for nonlinear state estimation. The framework introduces two improvements that are only active for nonlinear measurements and reduce to the conventional EnKF framework without covariance inflation in the linear case: (i) a recalibration mechanism that reassesses the effect of the chosen Kalman gain after updating the ensemble mean, and (ii) a positive semidefinite covariance compensation term that accounts for measurement nonlinearity. An adaptive update law based on the normalized innovation squared further tunes the compensation magnitude online. The framework is algorithmically general and is specialized here to the stochastic EnKF and the ensemble transform Kalman filter (ETKF). Experiments on feature-based SLAM and the Lorenz--96 system show that CAR-EnKF consistently reduces RMSE relative to conventional EnKF baselines, with especially large improvements at low measurement-noise levels. The related code is available at this https URL.
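The normalized innovation squared (NIS) that drives the online tuning can be sketched in a few lines; the specific update law and gain below are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def nis(innovation, S):
    """Normalized innovation squared: d = nu^T S^{-1} nu, whose expected
    value equals the measurement dimension for a consistent filter."""
    return float(innovation @ np.linalg.solve(S, innovation))

def adapt_compensation(alpha, innovation, S, meas_dim, gain=0.1):
    """Hypothetical online update law: grow the compensation magnitude
    when the NIS exceeds its expectation, shrink it otherwise.  The
    functional form and gain are our assumptions, not the paper's."""
    return max(0.0, alpha + gain * (nis(innovation, S) - meas_dim))

S = np.eye(2)                      # innovation covariance
nu = np.array([3.0, 0.0])          # overconfident filter: NIS = 9 >> 2
alpha = adapt_compensation(1.0, nu, S, meas_dim=2)
```

A persistently large NIS is the classic symptom of the overconfidence the paper targets, which is why it is a natural signal for scaling the compensation term.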


[49] 2604.17350

SPaRSe-TIME: Saliency-Projected Low-Rank Temporal Modeling for Efficient and Interpretable Time Series Prediction

Time series forecasting is traditionally dominated by sequence-based architectures such as recurrent neural networks and attention mechanisms, which process all time steps uniformly and often incur substantial computational cost. However, real-world temporal signals typically exhibit heterogeneous structure, where informative patterns are sparsely distributed and interspersed with redundant observations. This work introduces \textbf{SPaRSe-TIME}, a structured and computationally efficient framework that models time series through a decomposition into three complementary components: saliency, memory, and trend. The proposed approach reformulates temporal modeling as a projection onto informative subspaces, where saliency acts as a data-dependent sparsification operator, memory captures dominant low-rank temporal patterns, and trend encodes low-frequency dynamics. These components are integrated through a lightweight, adaptive mapping that enables simplified, selective, and interpretable temporal reasoning. Extensive experiments on diverse real-world datasets demonstrate that SPaRSe-TIME achieves competitive predictive performance compared to recurrent and attention-based architectures, while significantly reducing computational complexity. The model is particularly effective in structured time series with clear temporal components and provides explicit interpretability through component-wise contributions. Furthermore, analysis reveals both the strengths and limitations of decomposition-based modeling, highlighting challenges in highly stochastic and complex multivariate settings. Overall, SPaRSe-TIME offers a principled alternative to monolithic sequence models, bridging efficiency, interpretability, and performance, and providing a scalable framework for time series learning.


[50] 2604.17362

FARM: Foundational Aerial Radio Map for Intelligent Low-Altitude Networking

Precise aerial radio environment characterization is vital for low-altitude planning. However, existing datasets and estimation methods lack the high-resolution granularity required for complex aerial spaces. Additionally, current schemes suffer from poor generalization and heavy reliance on environmental priors. To address these gaps, this paper introduces FARM, a pioneering foundation model for unified aerial radio map estimation. This model is supported by a newly curated, high-resolution dataset featuring multi-band and multi-antenna configurations specifically for low-altitude environments. FARM utilizes a masked autoencoder to extract deep latent representations of the aerial radio environment, which then guide a diffusion-based decoder to generate high-fidelity signal distributions through iterative refinement. Extensive experiments demonstrate that FARM significantly outperforms state-of-the-art benchmarks and exhibits superior generalization capabilities across unseen scenarios. Ultimately, FARM serves as a critical infrastructure for low-altitude economy by enabling autonomous aerial logistics and intelligent urban networking.


[51] 2604.17368

Stochastic Delayed Dynamics of Rumor Propagation with Awareness and Fact-Checking

This paper presents a stochastic delayed differential model for rumor propagation during an infodemic that incorporates human behavioral responses, public skepticism, and fact-checking mechanisms. A discrete time delay is introduced to model natural lags in information processing and institutional response. Additionally, we adopt additive stochastic perturbations to model random fluctuations in social interaction and exposure. We present a rigorous stability analysis of the proposed rumor transmission model and derive convergence guarantees under reproduction number conditions. We also validate the model through numerical simulations, analyzing outbreak severity and quantifying uncertainty under variable information-processing delays. The results highlight the importance of timely awareness and fact-checking interventions for mitigating misinformation spread during pandemics.


[52] 2604.17371

Leveraging Kernel Symmetry for Joint Compression and Error Mitigation in Edge Model Transfer

This paper investigates communication-efficient neural network transmission by exploiting structured symmetry constraints in convolutional kernels. Instead of transmitting all model parameters, we propose a degrees-of-freedom (DoF) based codec that sends only the unique coefficients implied by a chosen symmetry group, enabling deterministic reconstruction of the full weight tensor at the receiver. The proposed framework is evaluated under quantization and noisy channel conditions across multiple symmetry patterns, signal-to-noise ratios, and bit-widths. To improve robustness against transmission impairments, a projection step is further applied at the receiver to enforce consistency with the symmetry-invariant subspace, effectively denoising corrupted parameters. Experimental results on MNIST and CIFAR-10 using a DeepCNN architecture demonstrate that DoF-based transmission achieves substantial bandwidth reduction while preserving significantly higher accuracy than pruning-based baselines, which often suffer catastrophic degradation. Among the tested symmetries, \textit{central-skew symmetry} consistently provides the best accuracy-compression tradeoff, confirming that structured redundancy can be leveraged for reliable and efficient neural model delivery over constrained links.
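The receiver-side projection can be sketched for the central-skew case: the closest central-skew-symmetric kernel in the Frobenius norm is obtained by antisymmetrizing under a 180-degree rotation. This is a minimal illustration of the described denoising step; the function naming is ours:

```python
import numpy as np

def project_central_skew(K):
    """Orthogonal projection of a conv kernel onto the subspace
    K[i, j] = -K[n-1-i, m-1-j] (central skew symmetry): the closest
    such kernel in Frobenius norm is (K - rot180(K)) / 2.  Illustrative
    sketch of the receiver-side consistency projection."""
    return 0.5 * (K - np.rot90(K, 2))

noisy = np.random.default_rng(1).normal(size=(3, 3))   # channel-corrupted kernel
denoised = project_central_skew(noisy)
```

For a 3x3 kernel this symmetry zeroes the center coefficient and pairs the remaining eight entries, leaving four free coefficients out of nine, which is the kind of DoF reduction the codec transmits.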


[53] 2604.17414

Physics-Aware Query-Conditioned Graph Attention Networks for Radio Map Estimation

Radio map estimation from sparse measurements is fundamental to wireless network planning, optimization, and localized map updating. Most recent learning-based approaches formulate the problem as dense map completion over a predefined grid, whereas many practical deployments require estimating transmitter-specific received signal strength only at queried locations or refining an existing map after local changes. This paper proposes a physics-aware query-conditioned hierarchical graph attention network for transmitter-resolved point-wise radio map estimation. For each queried target--transmitter pair, the proposed encoder constructs a bounded local graph over sampled reference observations and aggregates reference-to-query evidence through transmitter-referenced geometric descriptors. A global graph then exchanges representation-level context among nearby target locations to improve neighborhood consistency without revisiting a large number of reference measurements. On top of this shared architecture, we instantiate three operating regimes: direct RSS estimation, prior-conditioned residual correction, and post-hoc gated attenuation of the learned correction. The framework uses only measurement-side quantities and does not rely on environment-side inputs. Simulations on the DeepMIMO scenario show that, in the direct regime, the proposed HGAT achieves the lowest RMSE and MAE among the evaluated learning-based baselines on all reported sites. When a conventional prior estimate is available, the residual and gated regimes further reduce the prior error.


[54] 2604.17421

The structure of technological learning: insights from water electrolysis for cost forecasting, policy, and strategy

Forecasting the cost evolution of emerging clean technologies is crucial for informed policy, investment, and decarbonization decisions, yet it remains deeply uncertain. Learning curves, which link cost declines to cumulative deployment, are widely used for technological cost forecasting. However, applying them to emerging technologies is challenging due to parametric uncertainty in learning rates, which are scarce and highly uncertain, and structural uncertainty stemming from multiple plausible learning frameworks. Using water electrolysis as a case study, we evaluate how different learning structures, from shared to fragmented learning across technology variants and regions, alter expected cost paths. We interrogate model assumptions that represent contrasting industrial realities, including competition among electrolyzer variants and supply chain fragmentation associated with protectionism and industrial policy. We find that plausible modeling choices generate widely different trajectories, with materially different implications for policy design and technology strategy. We argue for routinely applying multiple learning frameworks to explore decision spaces and stress-test conclusions for scale-up planning, national industrial strategy, and energy-systems modeling.


[55] 2604.17434

Time-Delay Compensators for Linear Systems with Delayed Output Measurements

This paper provides a comprehensive framework for designing functional observers for linear systems subject to delayed output measurements. Moving beyond traditional methodologies, the proposed observer generates an estimate $\hat{z}(t)$ that predicts the current state functional $z(t)=Fx(t)$ using delayed data. By neutralizing sensing latency, the observer serves as a potent time-delay compensator, effectively expanding the practical utility of functional observer theory. The proposed observer architecture offers greater robustness and versatility than traditional Luenberger-type observers by leveraging multiple delayed components to preserve accuracy despite latency. A key contribution of this work is a novel method for extending the maximum allowable measurement delay while maintaining the asymptotic stability of the estimation-error system. Existence conditions are established together with constructive synthesis procedures. Extensive numerical examples are given to illustrate the proposed theory.


[56] 2604.17440

WirelessAgent: A Unified Agent Design for General Wireless Resource Allocation Problem without Current Channel State Information

This paper investigates the agent design for solving the wireless resource allocation problem without sufficient channel state information (CSI), which cannot be effectively solved via conventional methods. In the considered wireless agent design, we provide a general sense-repair-decide-act workflow, which can be used to intelligently solve general wireless resource allocation problems. A multi-objective optimization problem is formulated to adaptively satisfy different user requirements, including both spectrum and energy efficiency. This work addresses the challenge of incomplete CSI for multiple optimization objectives. To solve this problem, we use an artificial intelligence (AI) model to predict missing channel data and construct an agent on the Coze platform, allowing network operators to optimize multiple objectives through natural language conversations. To tackle resource scheduling under different objectives, we develop adaptive algorithms. Simulation results validate the effectiveness of our proposed design, demonstrating that the proposed AI method reduces the root mean square error by up to approximately 67\% compared to the traditional approach. Moreover, the data-driven scheduling achieves a more balanced system performance than conventional baseline approaches.


[57] 2604.17442

BreathAI: Transfer Learning-Based Thermal Imaging for Automated Breathing Pattern Recognition

This study presents an Adaptive Transfer Learning and Thresholding-based Deep Learning Model (ATL-TDLM) for automated breathing pattern recognition using thermal imaging. Unlike conventional methods that rely on sound-based respiratory data, our approach leverages hierarchical deep feature extraction and adaptive multi-thresholding (AMT) to enhance feature segmentation. The model integrates knowledge distillation-based fine-tuning (KD-FT) to optimize learning transfer and contrastive representation learning (CRL) to improve inter-class separability between inhalation (INH) and exhalation (EXH) phases. The ATL-TDLM framework achieves an accuracy of 98.8%, significantly outperforming state-of-the-art models while ensuring computational efficiency. This approach has potential applications in respiratory disorder detection, including sleep apnea and asthma monitoring.


[58] 2604.17444

System representations in subspaces of finite-sample signals and their application to data-driven fault detection

This paper deals with system representations in finite-sample signal subspaces and their application to data-driven fault detection. The first part addresses concepts of finite-sample image and kernel system representations and, associated with them, image and residual subspaces of finite-sample signals. On this basis, the equivalence between the fundamental lemma and finite-sample image subspace is demonstrated. While the image representation models the nominal system dynamics, the residual representation describes uncertainties in the input-output data and is essential for fault detection. This result extends the fundamental lemma and builds the basis for exploring data-driven fault detection. In the second part, a data-driven projection-based fault detection approach is developed. By means of a singular value decomposition, orthogonal projections onto the image and residual subspaces are realized in the context of a low-rank matrix approximation, leading to projection-based residual generation and evaluation. Finally, analysis of detection performance in the framework of matrix perturbation theory and comparison with existing data-driven fault detection methods are explored.
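The projection step can be sketched with a plain SVD: a rank-r truncation of the data matrix yields the orthogonal projector onto the residual subspace. The sketch below leaves out the part the paper actually analyzes, namely choosing the rank from the singular-value decay under noisy data:

```python
import numpy as np

def residual_projector(H, rank):
    """Orthogonal projector onto the residual subspace, i.e. the
    orthogonal complement of the (rank-truncated) image of the data
    matrix H.  Nominal input-output data lies in the image, so the
    projected residual is (near) zero unless a fault is present."""
    U = np.linalg.svd(H, full_matrices=True)[0]
    Ur = U[:, :rank]
    return np.eye(H.shape[0]) - Ur @ Ur.T

H = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])        # toy data matrix of rank 2
P = residual_projector(H, rank=2)
```

On this toy matrix the projector annihilates the nominal data exactly, while any signal with a component outside the image of H produces a nonzero residual.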


[59] 2604.17453

Learned Nonlocal Feature Matching and Filtering for RAW Image Denoising

Being one of the oldest and most basic problems in image processing, image denoising has seen a resurgence spurred by rapid advances in deep learning. Yet, most modern denoising architectures make limited use of the technical knowledge acquired from research on the classical denoisers that preceded the mainstream use of neural networks, instead relying on depth and large parameter counts. This poses a challenge not only for understanding the properties of such networks, but also for deploying them on real devices which may present resource constraints and diverse noise profiles. Tackling both issues, we propose an architecture dedicated to RAW-to-RAW denoising that incorporates the interpretable structure of classical self-similarity-based denoisers into a fully learnable neural network. Our design centers on a novel nonlocal block that parallels the established pipeline of neighbor matching, collaborative filtering, and aggregation popularized by nonlocal patch-based methods, operating on learned multiscale feature representations. This built-in nonlocality efficiently expands the receptive field, so that a single block per scale with a moderate number of neighbors suffices to obtain high-quality results. Training the network on a curated dataset with clean real RAW data and modeled synthetic noise, while conditioning it on a noise level map, yields a sensor-agnostic denoiser that generalizes effectively to unseen devices. Both quantitative and visual results on benchmarks and in-the-wild photographs position our method as a practical and interpretable solution for real-world RAW denoising, achieving results competitive with state-of-the-art convolutional and transformer-based denoisers while using significantly fewer parameters. The code is available at this https URL.
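The classical pipeline the nonlocal block parallels can be reduced to a few lines. This is an illustrative sketch of plain nonlocal matching and averaging, not the learned network itself:

```python
import numpy as np

def nonlocal_average(patches, query_idx, k=4):
    """Classical nonlocal pipeline in miniature: match the k nearest
    patches to the query by squared L2 distance, then filter by plain
    averaging.  The learned block replaces both stages with trainable
    feature-space operations on multiscale representations."""
    d2 = ((patches - patches[query_idx]) ** 2).sum(axis=1)  # neighbor matching
    nn = np.argsort(d2)[:k]                                 # k neighbors (incl. self)
    return patches[nn].mean(axis=0)                         # collaborative filtering

patches = np.tile(np.arange(8.0), (10, 1))   # toy patch stack, all identical
out = nonlocal_average(patches, query_idx=0)
```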


[60] 2604.17474

Integrated Sensing, User Location and Orientation Estimation in RIS-Assisted Dynamic Rich Scattering Environment

This paper investigates an uplink user equipment (UE) location and orientation estimation problem in an indoor rich-scattering environment (RSE) for a multiple-input-multiple-output (MIMO) narrowband reconfigurable intelligent surface (RIS)-assisted communication system. The localization problem in an RSE is challenging as the uplink pilot signal undergoes multiple interactions with the RIS and dynamic scattering objects (SOs). This paper proposes an approach where the base station (BS) adaptively senses the environment with the help of the RIS. Based on this sensing, it sequentially designs the RIS configuration, BS beamforming, and UE beamforming vectors, using the sequence of pilot transmissions from the UE to the BS, with the objective of progressively focusing them onto the UE. Towards this end, we train a bidirectional long-short term memory (biLSTM) network based controller to capture the temporal dependencies between measurements, first adaptively sensing the RSE and then designing the RIS, BS, and UE beamforming vectors to localize the UE. We evaluate the proposed approach under various RSE conditions, such as different distributed RIS installations and varying numbers of randomly moving SOs and sensing RIS elements. Simulation results illustrate that it effectively enables adaptive sensing to achieve low localization error with robustness in various RSEs.


[61] 2604.17485

Adaptive RIS Configuration Design with Environmental Sensing for User Localization in Dynamic Rich Scattering Environment

This paper addresses the problem of adaptive reconfigurable intelligent surface (RIS) configuration design for user localization in a rich-scattering environment (RSE), where electromagnetic waves undergo multiple interactions with dynamic scatterers and RIS elements. We propose an adaptive learning-based localization approach for a distributed RIS-assisted network in an RSE using a bidirectional long-short term memory (biLSTM) model that captures temporal correlations between observations. The proposed approach actively senses the environment using sequential pilot transmissions from the base station (BS), accounting for scattering effects, and adaptively updates the RIS configuration based on prior measurements to accurately estimate the user location and minimize the localization error. The proposed model comprises two neural sub-networks: a Scattering Estimation Network (Bi-SEN) for estimating scattering in the environment, and an Adaptive RIS-Assisted User Localization Network (Bi-ARULN) for RIS configuration and localization. Bayesian optimization is used for hyperparameter tuning of the model. The simulation results demonstrate the effectiveness of the proposed approach, achieving significantly lower localization root mean squared error (RMSE) compared to random configurations, prestored codebook look-ups, and adaptive baselines in both single-input-single-output (SISO) and multiple-input-multiple-output (MIMO) RIS-assisted networks in an RSE. The design generalizes across configurations and scales with RIS size and network dimensions. The results highlight the strong potential of RIS deployment and of the proposed approach to enable reliable location services in an RSE.


[62] 2604.17525

VIDS: A Verified Imaging Dataset Standard for Medical AI

Medical imaging AI development is fundamentally dependent on annotated datasets, yet no existing standard provides machine-enforceable validation across dataset structure, annotation provenance, quality documentation, and ML readiness within a single framework. DICOM standardizes image acquisition, storage, and communication at the individual study level. BIDS organizes neuroimaging research datasets with consistent naming conventions. Neither addresses the curation layer, viz., who annotated what, when, with what tool, and to what quality standard. This paper presents VIDS (Verified Imaging Dataset Standard), an open specification that defines folder layout, file naming, annotation provenance schemas, quality documentation, and 21 machine-enforceable validation rules across two compliance profiles. VIDS uses NIfTI as a canonical working format while preserving full DICOM metadata in sidecars for traceability, and supports export to any downstream ML framework (nnU-Net, MONAI, COCO, flat NIfTI) without loss of provenance. Twenty-two compliance dimensions are defined and four major public datasets -- LIDC-IDRI, BraTS, CheXpert, and the Medical Segmentation Decathlon -- are benchmarked against these dimensions. Even widely used datasets satisfy only 20--39% of these dimensions, with provenance and quality documentation as the largest systematic gaps. LIDC-Hybrid-100 is released as a 100-subject VIDS-compliant reference CT dataset with consensus segmentation masks from four radiologist annotations (mean pairwise Dice 0.7765), validating 21/21 on the Full compliance profile. VIDS is fully open source: the specification is CC BY 4.0, all tools are Apache 2.0, the reference validator is available on PyPI (pip install vids-validator), and LIDC-Hybrid-100 is published on Zenodo (this https URL).


[63] 2604.17533

A Novel 3D Antenna Architecture with Spatial Resource Allocation for Massive MIMO HAPS

Spatial correlation poses a significant challenge in massive multiple-input multiple-output (MIMO) high-altitude platform station (HAPS) systems. The inherent spatial correlation among antenna elements on the HAPS induces high correlation and interference among users' channel gains. To mitigate this issue, we propose an integrated approach that combines spatial resource allocation and user clustering. In our proposed solution, we assign the same resource blocks to users with orthogonal channel gains, while users with non-orthogonal channel gains receive different resource blocks. Additionally, we propose a sectorized antenna architecture for the massive MIMO HAPS base station, specifically designed to directly transmit three-dimensional beams to users and reduce spatial correlation among antenna elements. This work addresses the joint optimization problem of power allocation and resource allocation to maximize the overall data rate of the massive MIMO HAPS system. Simulation results revealed the role of spatial resource allocation in managing spatial correlation and interference among users.


[64] 2604.17566

Target Parameterization in Diffusion Models for Nonlinear Spatiotemporal System Identification

Machine learning is becoming increasingly important for nonlinear system identification, including dynamical systems with spatially distributed outputs. However, classical identification and forecasting approaches become markedly less reliable in turbulent-flow regimes, where the dynamics are high-dimensional, strongly nonlinear, and highly sensitive to compounding rollout errors. Diffusion-based models have recently shown improved robustness in this setting and offer probabilistic inference capabilities, but many current implementations inherit target parameterizations from image generation, most commonly noise or velocity prediction. In this work, we revisit this design choice in the context of nonlinear spatiotemporal system identification. We consider a simple, self-contained patch-based transformer that operates directly on physical fields and use turbulent flow simulation as a representative testbed. Our results show that clean-state prediction consistently improves rollout stability and reduces long-horizon error relative to velocity- and noise-based objectives, with the advantage becoming more pronounced as the per-token dimensionality increases. These findings identify target parameterization as a key modeling choice in diffusion-based identification of nonlinear systems with spatial outputs in turbulent regimes.
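The three targets (clean state, noise, velocity) are related by exact linear identities under the standard variance-preserving noising, so the choice affects training and rollout behavior rather than what can in principle be represented. A quick numerical check (the schedule values here are arbitrary, with alpha^2 + sigma^2 = 1):

```python
import numpy as np

# Variance-preserving noising: z = alpha * x0 + sigma * eps, with the
# velocity target v = alpha * eps - sigma * x0.  The identities below
# show the parameterizations are linearly interchangeable.
alpha, sigma = 0.8, 0.6
x0 = np.array([1.0, -2.0])          # clean state
eps = np.array([0.5, 0.25])         # noise sample
z = alpha * x0 + sigma * eps        # noised state
v = alpha * eps - sigma * x0        # velocity target

x0_rec = alpha * z - sigma * v      # recover clean state from a v-prediction
eps_rec = sigma * z + alpha * v     # recover noise from a v-prediction
```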


[65] 2604.17572

An Innovation-Based Approach to Detect Stealthy Disturbance Attacks in Maritime Monitoring

Modern maritime navigation and control systems rely on digital sensing, estimation, and communication pipelines that fuse GNSS, radar, inertial, and AIS data through approaches such as Kalman-filter-based estimators. While these technologies are essential for safety and efficiency, their growing interconnection also exposes vessels to faults and cyber-physical anomalies. This paper introduces a Statistical Detection Suite (SDS) to detect malicious stealthy disturbances. Specifically, the SDS operates directly on the innovations of Kalman filters, providing a lightweight yet statistically grounded layer of anomaly monitoring within maritime estimation frameworks. The SDS jointly evaluates whitened innovations through four complementary checks: (i) bias, (ii) covariance consistency via the normalized innovation squared (NIS), (iii) Gaussianity, and (iv) temporal independence via portmanteau statistics. The analysis further examines how an adversary can craft stealthy finite-impulse-response (FIR) Gaussian disturbances that can evade classical chi-square checks, formulating an optimization-based design that balances stealth and trajectory impact. An evaluation in maritime navigation scenarios illustrates how the SDS exposes colored spoofing attacks that bypass traditional methods, highlighting the role of innovation-based monitoring in strengthening maritime resilience against cyber-physical threats.
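Two of the four checks are easy to sketch on a whitened innovation sequence; the statistics below follow the standard bias and Ljung-Box forms, as an illustration rather than the SDS's exact implementation (thresholds omitted):

```python
import numpy as np

def bias_stat(e):
    """Standardized mean of whitened innovations; values far from 0
    flag a bias (check (i))."""
    return float(np.sqrt(len(e)) * e.mean())

def portmanteau_stat(e, lags=5):
    """Ljung-Box-style portmanteau statistic (check (iv)): large values
    flag serial correlation, the signature of a colored FIR disturbance
    that a per-step chi-square (NIS) check can miss."""
    n = len(e)
    e = e - e.mean()
    denom = float(e @ e)
    q = 0.0
    for k in range(1, lags + 1):
        r = float(e[:-k] @ e[k:]) / denom   # lag-k autocorrelation
        q += r * r / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(0)
white = rng.normal(size=500)       # nominal whitened innovations
colored = np.cumsum(white)         # strongly correlated, "stealthy" residue
```

A Gaussian FIR-shaped attack can keep each innovation individually plausible while leaving strong serial correlation, which is exactly what the portmanteau check exposes.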


[66] 2604.17582

Active MIMO Sensing With Exploration-Exploitation Tradeoff

This paper develops an active sensing framework for designing the transmit and receive beamformers of a multiple-input multiple-output (MIMO) radar system. In the proposed technique, the beamformers are adaptively designed in each sensing stage based on the measurements made in the previous sensing stages. The beamformers are determined by minimizing the Bayesian Cramér-Rao bound (BCRB) for the estimation of the unknown sensing parameters at each stage via Lagrangian dual optimization. To address the exploration-exploitation tradeoff that is inherent to such an adaptive design, this paper proposes two variants of the BCRB optimization problem: an exploration-centric variant, that ensures that multiple orthogonal beamforming directions are probed in each sensing stage, and an exploitation-centric variant, that does not restrict the number of optimal beamformers. Each variant of the optimization problem is solved via an alternating optimization algorithm that alternates between solving for the transmit beamformers and solving for the receive beamformers. The algorithm is shown to converge to a stationary point provided that each optimization problem is solved to global optimality. Moreover, this paper studies each of the two BCRB optimization sub-problems in the Lagrangian dual domain and shows that despite the non-convexity, global optimality is guaranteed provided that certain sufficient conditions hold. The conditions pertain to the multiplicity of the eigenvalues of a specific direction matrix that can be analytically written in terms of the optimal dual variables. These conditions further imply the tightness of the semidefinite relaxation of the optimization problems. Simulation results demonstrate the benefits of the proposed BCRB-based design compared to state-of-the-art adaptive beamforming strategies.


[67] 2604.17586

Structural Misalignment in Financial Transmission Rights

Financial Transmission Rights (FTRs) enable electricity market participants to hedge congestion risk in Day Ahead Market (DAM) operations, but for the market to be solvent, Independent System Operators (ISOs) must ensure that FTR payouts do not exceed the collected DAM merchandising surplus that funds them. We show that FTR underfunding (or conversely, hedging efficiency) can arise structurally from misalignment between the network models used in the FTR auction and the DAM, independent of bidding behavior. We develop a geometric framework in which both DAM merchandising surplus and the maximum supportable FTR payout are expressed as support functions of network-feasible injection polytopes. The resulting dual representation assigns nonnegative weights to transmission element-contingency constraints, enabling constraint-level attribution of model misalignment. Using this framework, we derive sharp implications for canonical FTR network modeling choices like uniform transmission element derates, and for structural sources of underfunding like unplanned DAM outages. We further show that multi-interval FTR products impose an intrinsic hedging inefficiency when DAM shadow prices vary over time, even under perfect model alignment. These results provide ISOs with rigorous tools to diagnose underfunding and quantify the efficiency cost of conservative FTR network modeling choices.
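The support-function machinery can be grounded with the simplest possible polytope: for an axis-aligned box the maximization h_P(c) = max over x in P of c^T x decouples per coordinate. This is a toy illustration only, far simpler than the network-feasible injection polytopes the paper works with:

```python
import numpy as np

def box_support(c, lo, hi):
    """Support function of the axis-aligned box P = {x : lo <= x <= hi}:
    each coordinate independently picks hi_i when c_i >= 0 and lo_i
    otherwise.  Toy stand-in for the injection polytopes in the paper."""
    c = np.asarray(c, dtype=float)
    return float(np.sum(np.where(c >= 0, np.asarray(hi), np.asarray(lo)) * c))
```

For example, with c = (1, -2) on the box [-1, 1]^2 the maximizer is x = (1, -1), giving h = 3. Misalignment between two polytopes then shows up as a gap between their support functions along some price direction c.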


[68] 2604.17631

Conjugate Beamforming Variants for Multicasting in Cell-Free Massive MIMO Systems

This paper studies scalable conjugate beamforming (CB) variants for physical-layer multicasting in cell-free massive multiple-input multiple-output (CF-mMIMO) systems. Focusing on fully distributed precoding, we analyze classical CB, normalized CB (NCB), and enhanced CB (ECB) within a subgroup-centric multicast framework. Multicast users are partitioned into subgroups based on large-scale fading similarity, which enables composite channel estimation, pilot reuse, and distributed precoding with low complexity. The performance of the different CB variants is evaluated in terms of aggregated spectral efficiency (ASE) under representative user geometries, including uniformly distributed users, spatially clustered deployments, and heterogeneous scenarios combining hotspots with more dispersed users. Monte Carlo simulations reveal a strong spatial geometry-dependent behavior: unicast transmission is preferable in uniform deployments, while subgroup-based multicasting becomes essential in clustered and heterogeneous scenarios. Among the CB-based precoders, NCB offers a robust performance-complexity trade-off across most scenarios, whereas ECB provides additional gains only when sufficient channel hardening is present. These results provide practical insights into the selection of low-complexity distributed precoders and multicast transmission modes in CF-mMIMO systems supporting broadband and multimedia services.


[69] 2604.17634

RIS-Assisted Cell-Free Massive MIMO: RIS-MS Selection in FR1 and FR3

This paper explores the integration of reconfigurable intelligent surfaces (RISs) into cell-free massive multiple-input-multiple-output (CF-mMIMO) networks operating in FR1 and FR3 frequency bands. We present a comprehensive framework for analyzing RIS-assisted CF-mMIMO systems under realistic propagation conditions, accounting for frequency-dependent characteristics and RIS configurations. A novel RIS-user association algorithm is proposed to optimize phase-shift settings by assigning each RIS to a single user based on line of sight (LoS) connectivity. The system model incorporates spatially correlated Ricean fading channels and employs scalable partial-minimum mean square error (P-MMSE) combining. The numerical results demonstrate that the proposed RIS-user selection strategy significantly improves the spectral efficiency compared to random or exhaustive RIS configurations, particularly when the number of RISs is moderate. We also analyze the trade-off between training overhead and performance gains, showing that excessive pilot requirements can offset benefits when RIS density or element count increases. The results highlight the potential of the FR3 bands for RIS-assisted CF-mMIMO, provided advanced channel estimation techniques are adopted to mitigate overhead. These findings emphasize the importance of intelligent RIS-user pairing and scalable estimation methods for future 6G deployments.


[70] 2604.17642

HCFD: A Benchmark for Audio Deepfake Detection in Healthcare

In this study, we present Healthcare Codec-Fake Detection (HCFD), a new task for detecting codec-fakes under pathological speech conditions. We intentionally focus on codec-based synthetic speech in this work, since neural codec decoding forms a core building block in modern speech generation pipelines. First, we release Healthcare CodecFake, the first pathology-aware dataset containing paired real and NAC-synthesized speech across multiple clinical conditions and codec families. Our evaluations show that SOTA codec-fake detectors trained primarily on healthy speech perform poorly on Healthcare CodecFake, highlighting the need for HCFD-specific models. Second, we demonstrate that PaSST outperforms existing speech-based models for HCFD, benefiting from its patch-based spectro-temporal representation. Finally, we propose PHOENIX-Mamba, a geometry-aware framework that models codec-fakes as multiple self-discovered modes in hyperbolic space and achieves the strongest performance on HCFD across clinical conditions and codecs. Experiments on HCFK show that PHOENIX-Mamba (PaSST) achieves the best overall performance, reaching 97.04 accuracy on E-Dep, 96.73 on E-Alz, and 96.57 on E-Dys, while maintaining strong results on Chinese with 94.41 (Dep), 94.40 (Alz), and 93.20 (Dys). This geometry-aware formulation enables self-discovered clustering of heterogeneous codec-fake modes in hyperbolic space, facilitating robust discrimination under pathological speech variability.


[71] 2604.17647

Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition

In this work, we introduce a paralinguistic supervision paradigm for low-resource multilingual speech emotion recognition (LRM-SER) that leverages non-verbal vocalizations to exploit prosody-centric emotion cues. Unlike conventional SER systems that rely heavily on labeled verbal speech and suffer from poor cross-lingual transfer, our approach reformulates LRM-SER as non-verbal-to-verbal transfer, where supervision from a labeled non-verbal source domain is adapted to unlabeled verbal speech across multiple target languages. To this end, we propose NOVA-ARC, a geometry-aware framework that models affective structure in the Poincaré ball, discretizes paralinguistic patterns via a hyperbolic vector-quantized prosody codebook, and captures emotion intensity through a hyperbolic emotion lens. For unsupervised adaptation, NOVA-ARC performs optimal transport-based prototype alignment between source emotion prototypes and target utterances, inducing soft supervision for unlabeled speech while being stabilized through consistency regularization. Experiments show that NOVA-ARC delivers the strongest performance under both non-verbal-to-verbal adaptation and the complementary verbal-to-verbal transfer setting, consistently outperforming Euclidean counterparts and strong SSL baselines. To the best of our knowledge, this work is the first to move beyond verbal-speech-centric supervision by introducing a non-verbal-to-verbal transfer paradigm for SER.


[72] 2604.17690

Path-Based Quantum Meta-Learning for Adaptive Optimization of Reconfigurable Intelligent Surfaces

Reconfigurable intelligent surfaces (RISs) modify signal reflections to enhance wireless communication capabilities. Classical RIS phase optimization is highly non-convex and challenging in dynamic environments due to high interference and user mobility. Here we propose a hierarchical multi-objective quantum meta-learning algorithm that switches among specific quantum paths based on historical success, energy cost, and current data rate. Candidate RIS control directions are arranged as switch paths between quantum neural network layers to minimize inference cost, and a scoring mechanism selects the top-performing paths per layer. Instead of merely storing past successful settings of the RIS and picking the closest match when a new problem is encountered, the algorithm learns how to select and recombine the best parts of different solutions to solve new scenarios. In our model, high-dimensional RIS scenario features are compressed into a quantum state using the tensor product, then superimposed during quantum path selection, significantly improving the quantum computational advantage. Results demonstrate efficient performance with enhanced spectral efficiency, convergence rate, and adaptability.


[73] 2604.17744

Input-Side Variance Suppression under Non-Normal Transient Amplification in Continuous-Control Reinforcement Learning

Continuous-control reinforcement learning (RL) often exhibits large closed-loop variance, high-frequency control jitter, and sensitivity to disturbance injection. Existing explanations usually emphasize disturbance sources such as action noise, exploration perturbations, or policy nonsmoothness. This letter studies a complementary amplifier-side perspective: in nominally stable yet strongly non-normal closed loops, small input perturbations can undergo transient amplification and lead to disproportionately large state covariance. Motivated by this source--amplifier decomposition, we introduce an input-side variance suppression layer that operates between the learned policy and the plant input to reduce applied-input variance and step-to-step jitter. To separate mechanism from correlation, we use two control-theoretic interventions: one varies only eigenvector geometry under fixed eigenvalues and spectral radius, and the other varies only applied-input statistics under fixed strongly non-normal geometry. We then provide mechanism-consistent external validation on planar quadrotor tasks. Throughout, Koopman/ALE surrogates are used only as analysis and certification tools, not as direct performance paths. Taken together, the results support a narrower claim: in the studied settings, non-normal transient amplification is an important and under-emphasized contributor to execution-time closed-loop variance, and source-side suppression can reduce downstream covariance without changing the structural peak gain.


[74] 2604.17776

Trajectory-Based Optimization for Air Traffic Control in the Terminal Maneuvering Area

We present a trajectory-based optimization framework for arrival sequencing and scheduling in the terminal maneuvering area (TMA). Unlike node-link scheduling models that reduce trajectories to time-delay variables, the proposed method computes implementable per-aircraft speed profiles and path extensions that achieve required landing separation through terminal air traffic control actions. The framework combines an analytic TMA path model, consisting of a tangent leg, a radius-to-fix turn, and a final-approach segment, with a nonlinear program (NLP) that jointly optimizes path stretch and segment speeds under a weighted objective. Three landing-order policies are examined: First-Entry-First-Serve (FEFS), First-on-Final-First-Serve (FOFFS), and FOFFS with Constrained Position Shifting (CPS) up to $k$ positions. CPS is implemented through a two-phase approach coupling mixed-integer linear programming (MILP) with NLP to select an optimized landing order before trajectory optimization. The aircraft population follows a realistic weight-class fleet mix with pair-specific wake-turbulence separation, and each scenario is perturbed by a Gaussian wind sample projected onto each segment to convert commanded airspeeds into ground speeds. An online rolling-horizon formulation commits each aircraft trajectory irrevocably upon entry, enabling real-time decision-making. Monte Carlo experiments on the simplified A80 TMA show that: (i) FOFFS consistently outperforms FEFS in delay, path stretch, and fuel burn by exploiting geometric asymmetries among arrival streams; (ii) CPS further reduces separation violations and path stretch, though with diminishing returns and rapidly increasing solver cost; (iii) fuel estimates from BADA 3 and OpenAP show consistent qualitative trends; and (iv) per-entry optimization completes in near real-time, supporting practical deployment.
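The CPS policy can be made concrete with a toy enumeration (our illustration; the paper instead couples a MILP with the NLP to select the order): for a queue of n aircraft, CPS admits exactly those landing orders in which no aircraft moves more than k positions from its nominal FOFFS slot. A minimal sketch in Python:

```python
from itertools import permutations

def cps_orders(n, max_shift):
    """Enumerate landing orders in which no aircraft moves more than
    `max_shift` positions away from its nominal sequence slot."""
    return [p for p in permutations(range(n))
            if all(abs(pos - ac) <= max_shift for pos, ac in enumerate(p))]

# For a queue of 4 aircraft with k = 1, only 5 of the 24 orders are feasible.
orders = cps_orders(4, 1)
print(len(orders))  # 5
```

The feasible set grows rapidly with k (for k = 1 the counts follow the Fibonacci numbers), consistent with the reported diminishing returns and rapidly increasing solver cost of larger position shifts.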


[75] 2604.17781

Building Low-Altitude Communication Networks: A Digital Twin-Based Optimization Framework

Low-altitude communication networks (LACNs) serve as the critical infrastructure of the emerging low-altitude economy (LAE), supporting services such as drone delivery and infrastructure inspection. However, LACNs operate in highly dynamic three-dimensional (3D) environments characterized by high mobility and predominantly line-of-sight (LoS) propagation, creating strong coupling among key performance objectives including coverage, interference mitigation, handover management, and sensing capability. Isolated tuning of individual objectives cannot capture these cross-objective interactions, rendering conventional approaches based on experience-driven tuning and repeated field trials inefficient and costly. To address these challenges, we propose DT-MOO, a Digital Twin-based Multi-Objective Optimization framework for LACNs. By constructing a high-fidelity virtual replica that integrates realistic environmental models, electromagnetic (EM) propagation, and traffic dynamics within a unified environment, DT-MOO enables joint evaluation and systematic optimization of interdependent network parameters, scoring candidate configurations by their combined effect on multiple objectives. As the foundational validation of the framework, we report real-world experiments in a 5G-enabled LACN focusing on coverage-interference co-optimization, where DT-MOO increases the high-quality coverage rate from 14.0% to 52.9% across all evaluated altitudes compared to an operator-provisioned, experience-based baseline, while achieving a net SINR gain under stringent criteria despite local spatial trade-offs, confirming its ability to handle coupled objectives in practical LACN deployment.


[76] 2604.17791

Movable-Antenna Enabled Robust Vehicular Consumer Networks Under Imperfect CSI

The accelerating advancement of intelligent transportation systems has established consumer-oriented vehicular networks (CVNs) as a critical infrastructure for next-generation connected mobility. However, the high mobility of vehicular users (VUs) introduces significant channel state information (CSI) uncertainty, which severely undermines the performance of conventional fixed-position antenna systems. To address this, this paper explores the deployment of movable-antennas (MAs) to enhance communication robustness in CVNs under imperfect CSI conditions. We develop a joint optimization framework that dynamically coordinates the spatial positioning of MAs and transmit beamforming at the base station, with the objective of maximizing the worst-case sum rate across all VUs. The problem is formulated as a non-convex max-min optimization problem, subject to bounded CSI estimation errors, transmit power limits, and physical constraints on antenna displacement. By adopting an alternating optimization strategy, the original problem is decomposed into tractable subproblems, solved via techniques including the S-Procedure, Schur complement, and successive convex approximation. Numerical evaluations confirm that the proposed approach achieves substantial gains over existing benchmarks in terms of worst-case throughput.


[77] 2604.17802

Optimally Bridging Semantics and Data: Generative Semantic Communication via Schrödinger Bridge

Generative Semantic Communication (GSC) is a promising solution for image transmission over narrow-band and high-noise channels. However, existing GSC methods rely on long, indirect transport trajectories from a Gaussian to an image distribution guided by semantics, causing severe hallucination and high computational cost. To address this, we propose a general framework named Schrödinger Bridge-based GSC (SBGSC). By leveraging the Schrödinger Bridge (SB) to construct optimal transport trajectories between arbitrary distributions, SBGSC breaks Gaussian limitations and enables direct generative decoding from semantics to images. Within this framework, we design Diffusion SB-based GSC (DSBGSC). DSBGSC reconstructs the nonlinear drift term of diffusion models using Schrödinger potentials, achieving direct optimal distribution transport to reduce hallucinations and computational overhead. To further accelerate generation, we propose a self-consistency-based objective guiding the model to learn a nonlinear velocity field pointing directly toward the image, bypassing Markovian noise prediction to significantly reduce sampling steps. Simulation results demonstrate that DSBGSC outperforms state-of-the-art GSC methods, improving FID by at least 38% and SSIM by 49.3%, while accelerating inference speed by over 8 times.


[78] 2604.17811

Kill-Probability-Maximization Guidance: Breaking from the Miss-Distance-Minimization Paradigm

Classical guidance laws aim at minimizing the miss distance, thus implicitly determining the minimum warhead lethality radius required against nominal targets. However, nonnominal targets or scenarios might render the designed warhead insufficient, causing a significant degradation in the single-shot kill probability (SSKP). We propose a guidance methodology that shifts the interceptor's objective from minimizing the miss distance to directly maximizing the SSKP, while taking into account the warhead's probabilistic lethality model. Complying with the generalized separation theorem, the new paradigm is based on modifying deterministic differential-game-based guidance laws using Bayesian decision theory. Extensive Monte Carlo simulations demonstrate consistent SSKP improvement over the standard and recently introduced estimation-aware guidance laws, when tested against nominal and nonnominal evasively maneuvering targets.
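The gap between minimizing miss distance and maximizing SSKP can be seen in a toy one-dimensional example (our illustration, not the paper's model): assume a Gaussian lethality profile P_kill(m) = exp(−m²/(2R²)) with lethal radius R, and compare two terminal-miss distributions. A strategy with a small bias but low dispersion can beat an unbiased strategy with high dispersion:

```python
import numpy as np

def sskp(mu, sigma, R, n=200_000, seed=0):
    """Monte Carlo single-shot kill probability for miss ~ N(mu, sigma^2)
    under a Gaussian lethality profile with lethal radius R (toy model)."""
    rng = np.random.default_rng(seed)
    miss = rng.normal(mu, sigma, n)
    return np.exp(-miss**2 / (2 * R**2)).mean()

R = 5.0
p_unbiased = sskp(mu=0.0, sigma=10.0, R=R)  # zero-mean miss, large spread
p_biased   = sskp(mu=3.0, sigma=2.0,  R=R)  # nonzero-mean miss, small spread
print(p_unbiased < p_biased)  # True
```

A miss-distance minimizer would prefer the first strategy; SSKP maximization prefers the second. The paper's Bayesian-decision-theoretic modification of differential-game guidance formalizes exactly this kind of trade-off under a probabilistic lethality model.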


[79] 2604.17845

Low-Complexity Learning-Based Beamforming for Ultra-Massive MIMO THz Communications

Terahertz (THz) communications have emerged as a key technology for escalating data rates in future generation wireless networks. However, severe propagation losses at THz frequencies pose significant challenges, which can be mitigated via ultra-massive multiple-input multiple-output (UM-MIMO) systems employing highly directional transmissions. To this end, codebook-based analog beamforming constitutes a realistic solution, eliminating the need for explicit channel estimation. However, in UM-MIMO systems, the use of extremely narrow beams makes beam training and alignment increasingly challenging, leading to a substantial increase in the number of codewords to be tested and, thus, to high computational complexity. In this paper, a novel artificial neural network architecture for low-complexity beam training in UM-MIMO THz systems is presented, which does not require a constant feedback link between transmitter and receiver to obtain the best beamformer and combiner pair. An inception and residual network, which is trained based on the received signal powers using the transmit and receive codewords generated from predefined hierarchical codebooks, is designed. Our numerical investigations demonstrate that the proposed machine learning approach significantly reduces the complexity of UM-MIMO transmit and receive beamforming design, as compared to the standard exhaustive and hierarchical beam searching methods.


[80] 2604.17853

Symbol-Level Mask-Compliant Hybrid Precoding for Multi-User MIMO-OFDM Systems

Millimeter-wave (mmWave) technology is a crucial enabler for next-generation networks because it offers substantially greater available bandwidth. mmWave multiple-input multiple-output (MIMO) systems cannot rely solely on fully digital precoding due to hardware costs. As a result, hybrid precoding, which combines digital baseband processing with RF precoding, has emerged as a practical solution that balances performance and implementation complexity. As mmWave links typically operate over wideband, frequency-selective channels, orthogonal frequency-division multiplexing (OFDM) is commonly used to mitigate dispersive effects, yet OFDM introduces practical drawbacks, including out-of-band (OOB) emissions from abrupt spectral transitions among subcarriers and additional spectral leakage induced by windowing. Moreover, nonideal phase shifters (PS) in the RF transmit precoder and the user combiner impose inherent implementation limits that result in phase errors. We investigate robust joint digital--RF precoder design for minimizing the downlink sum mean-squared error (MSE) in hybrid multi-user (MU) MIMO--OFDM systems subject to maximum transmit-power, clipping, and OOB spectral-mask constraints. The resulting optimization is nonconvex and challenging to solve. To address this, we develop a minimum mean-squared error (MMSE) based block coordinate descent (BCD) algorithm that alternates between updating the transmitter-side digital--RF precoders and the user-side digital--RF combiners. For each BCD subproblem, we propose computationally efficient and scalable, closed-form solution strategies suitable for practical implementation. Extensive simulations validate the proposed methods and show clear performance improvements over established benchmark schemes.


[81] 2604.17858

Joint Phase Noise and Off-Grid Channel Estimation for AFDM Systems via Sparse Bayesian Learning

In practical affine frequency division multiplexing (AFDM) systems, the intricate coupling of oscillator phase noise (PN) and off-grid fractional shifts traps conventional estimators in a severe high-SNR error floor. To address these challenges, we propose a joint PN and channel estimation method based on sparse Bayesian learning (JPNCE-SBL). Specifically, a reduced-rank subspace projection is first introduced to capture the dominant eigen-energy of the Wiener PN process. Concurrently, a dynamic grid evolution strategy is designed to iteratively eliminate off-grid errors without requiring computationally prohibitive global grid densification. Both components are integrated into a unified Expectation-Maximization (EM) framework, where the channel and PN estimates are jointly updated at each iteration to prevent error propagation. Simulation results demonstrate that JPNCE-SBL significantly outperforms existing benchmarks in both NMSE and BER, closely approaching the perfect channel state information case under practical PN conditions.


[82] 2604.17902

Quantitative Verification of Constrained Occupation Time for Stochastic Discrete-time Systems

This paper addresses the quantitative verification of constrained occupation time in stochastic discrete-time systems, focusing on the probability of visiting a target set at least $k$ times while maintaining safety. Such cumulative properties are essential for certifying repeated behaviors like surveillance and periodic charging. To address this, we present the first barrier certificate framework capable of certifying these behaviors. We introduce multiplicative stochastic barrier functions that encode visitation counts implicitly within the algebraic structure of a scalar barrier. By adopting a switched-system reformulation to handle safety, we derive rigorous probabilistic bounds for both finite and infinite horizons. Specifically, we show that dissipative barriers establish upper bounds ensuring the exponential decay of frequent visits, while attractive barriers provide lower bounds via submartingale analysis. The efficacy of the proposed framework is demonstrated through numerical examples.
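For intuition, the certified quantity here (the probability of visiting a target set at least k times while remaining safe) can be estimated by plain Monte Carlo for a toy scalar system; a minimal sketch (our illustration; the paper's barrier certificates bound this probability without simulation):

```python
import random

def occupation_prob(k=2, horizon=30, trials=20_000, seed=1):
    """P(visit target [0.8, 1.2] at least k times while staying in the
    safe set [-2, 2]) for the toy stochastic system x+ = 0.9 x + w."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, visits, safe = 0.0, 0, True
        for _ in range(horizon):
            x = 0.9 * x + rng.gauss(0, 0.3)
            if not -2.0 <= x <= 2.0:
                safe = False   # safety violated: trajectory does not count
                break
            if 0.8 <= x <= 1.2:
                visits += 1
        if safe and visits >= k:
            hits += 1
    return hits / trials

print(occupation_prob())
```

The probability is monotonically nonincreasing in k, which is the kind of structure the paper's dissipative (upper) and attractive (lower) barrier bounds exploit analytically.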


[83] 2604.17929

Ray Tracing-Enabled Digital Twin for RIS Phase Optimization: Implementation and Experimental Validation

Determining the optimal phase configurations of reconfigurable intelligent surface (RIS) elements typically requires complex channel estimation procedures with high pilot overhead, creating a bottleneck for real-time deployment in time-varying wireless environments. In this paper, we propose a digital twin (DT)-driven framework for RIS phase shift optimization that eliminates extensive signaling overhead associated with estimating high-dimensional RIS channels. Leveraging the NVIDIA Sionna ray-tracing library, we construct a DT of the physical environment based on a three-dimensional map. The proposed system utilizes the location information of the transceivers to compute the optimal RIS phase shift configurations within the DT. These computationally generated configurations are then transferred to a physical RIS prototype. Experimental results demonstrate that the phase configurations obtained from the DT significantly enhance the received signal power in the physical environment, validating the fidelity of the ray-tracing model and the feasibility of the proposed optimization strategy.
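For a single user and perfectly known ray-traced channels, the DT-side phase optimization reduces to the classical co-phasing rule: with BS-to-RIS gains h_n and RIS-to-user gains g_n, the phase shifts θ_n = −arg(h_n g_n) align every cascaded path. A minimal sketch of this rule (our illustration; the paper's pipeline additionally accounts for geometry, hardware, and quantized phase states):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                             # RIS elements
h = rng.normal(size=N) + 1j * rng.normal(size=N)   # BS -> RIS channel
g = rng.normal(size=N) + 1j * rng.normal(size=N)   # RIS -> user channel

cascade = h * g
theta = -np.angle(cascade)                 # co-phasing configuration
aligned = np.abs(np.sum(cascade * np.exp(1j * theta)))
zero_cfg = np.abs(np.sum(cascade))         # all phase shifts left at zero

print(aligned >= zero_cfg)                 # True: co-phasing maximizes |sum|
```

With continuous phases the aligned sum equals the sum of cascaded-path magnitudes, the triangle-inequality maximum; the experimental gain reported in the paper reflects how closely the DT-computed configuration approaches this ideal on real hardware.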


[84] 2604.17934

Robust Distributed Sub-Optimal Coordination of Linear Agents with Uncertain Input Nonlinearities

In this paper, we study robust distributed sub-optimal coordination of linear agents subject to input nonlinearities. Inspired by the robust agreement literature, we formulate a bounded distributed sub-optimal coordination problem, in which each agent converges to a neighborhood of the optimizer of a global optimization problem defined over a communication network. We propose a novel control protocol, and analyze convergence by employing a robust control approach, in which both the input nonlinearities and the gradients of the objective functions are treated in a unified manner via sector conditions. In particular, we derive sufficient conditions for the solvability of the considered problem and characterize them in terms of matrix inequalities. The effectiveness of the proposed method is demonstrated through a numerical simulation.


[85] 2604.17938

A Novel CSI-RS Reporting Scheme for RIS Optimization in O-RAN-based NextG Networks

Reconfigurable intelligent surface (RIS) technology is a promising enabler for next-generation (NextG) wireless systems, capable of dynamically shaping the propagation environment. Integrating RIS within the open radio access network (O-RAN) architecture enables flexible and intelligent control of wireless links. However, practical RIS-assisted operation requires efficient acquisition and reporting of channel state information (CSI) to support real-time control from the base station side. This paper proposes a CSI reference signal (CSI-RS)-based reporting scheme for downlink complex channel information (CCI) to facilitate RIS optimization in an O-RAN-compliant environment. The proposed framework, which establishes the CCI extraction and CSI-RS reporting procedures, is experimentally validated on a real-world testbed integrating an open-source O-RAN system with an RIS prototype operating in the n78 frequency band. Existing channel estimation-based RIS optimization algorithms, including Hadamard and orthogonal matching pursuit (OMP), are tailored for integration into the O-RAN architecture. Experimental results demonstrate notable improvements in received signal power for both near and far users, highlighting the effectiveness and practical viability of the proposed scheme.


[86] 2604.17958

MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech

Instruction-following text-to-speech (TTS) has emerged as an important capability for controllable and expressive speech generation, yet its evaluation remains underdeveloped due to limited benchmark coverage, weak diagnostic granularity, and insufficient multilingual support. We present \textbf{MINT-Bench}, a comprehensive multilingual benchmark for instruction-following TTS. MINT-Bench is built upon a hierarchical multi-axis taxonomy, a scalable multi-stage data construction pipeline, and a hierarchical hybrid evaluation protocol that jointly assesses content consistency, instruction following, and perceptual quality. Experiments across ten languages show that the task remains far from solved: frontier commercial systems lead overall, while leading open-source models are highly competitive and can even outperform commercial counterparts in localized settings such as Chinese. The benchmark further reveals that harder compositional and paralinguistic controls remain major bottlenecks for current systems. We release MINT-Bench together with the data construction and evaluation toolkit to support future research on controllable, multilingual, and diagnostically grounded TTS evaluation. The leaderboard and demo are available at this https URL


[87] 2604.17991

EcoTIM: Fuel-saving multi-brand tillage with ISO 11783 TIM

Tillage operations account for a large share of on-farm diesel consumption, yet the fuel efficiency of the combined tractor-implement system is not optimised in current practice. Modern continuously variable transmission (CVT) tractors minimise engine fuel consumption internally, but they treat the implement as an unknown load and do not account for the effect of vehicle speed on implement draft force. This paper presents EcoTIM, a distributed fuel-optimisation concept in which the tractor and tillage implement cooperate through the extended ISO 11783 (ISOBUS) Tractor Implement Management (TIM) interface to minimise fuel consumption per hectare in real time. In the EcoTIM concept, the tractor electronic control unit (ECU) fuses its internal engine, transmission, and traction efficiencies into a single combined efficiency value and its derivative with respect to vehicle speed, and broadcasts both to the implement at the standard 100 ms CAN bus cycle. The implement ECU combines these two received scalars with its own analytically known draft force model to evaluate the fuel-consumption gradient, and commands the optimal speed and, as a novel TIM extension, the desired acceleration back to the tractor. Because only two scalar values are exchanged and neither party discloses proprietary subsystem models, the architecture is inherently multi-brand and plug-and-play. The required data exchange is realised with three new messages and one backward-compatible byte-level extension to the standard TIM speed command, and this paper proposes that these messages be standardised within ISO 11783. The acceleration command enables feed-forward torque and CVT ratio planning on the tractor side, improving transient response compared with speed-only TIM commands. This paper also contains a proof-of-concept simulation with six tillage scenarios and a spatially varying 1 km test track for initial concept validation.
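The gradient exchanged in EcoTIM can be sketched numerically. Below, a quadratic ASAE-style draft model and a peaked combined-efficiency curve stand in for the implement-side and tractor-side models; all coefficients are illustrative, not from the paper. Since fuel energy per hectare is proportional to F(v)/η(v), its derivative (F′η − Fη′)/η² is the quantity the implement ECU would drive to zero:

```python
W, D = 3.0, 20.0          # working width [m], tillage depth [cm] (illustrative)
A, C = 650.0, 5.1         # quadratic draft coefficients (illustrative)

def draft(v):             # implement-side draft model F(v) [N], v in km/h
    return W * D * (A + C * v**2)

def eta(v):               # tractor-side combined efficiency, peaked at 7 km/h
    return 0.36 - 0.004 * (v - 7.0)**2

def d_eta(v):             # its derivative, as broadcast over the TIM interface
    return -0.008 * (v - 7.0)

def fuel_grad(v):
    """d/dv of the fuel-per-hectare objective F(v)/eta(v), up to a constant."""
    Fp = W * D * 2 * C * v
    return (Fp * eta(v) - draft(v) * d_eta(v)) / eta(v)**2

# The implement scans speeds and commands the one minimising fuel per hectare.
vs = [2.0 + 0.01 * i for i in range(701)]   # 2..9 km/h
v_opt = min(vs, key=lambda v: draft(v) / eta(v))
print(round(v_opt, 1))
```

The optimum is interior: at low speed the combined efficiency is poor, at high speed the quadratic draft dominates, and the gradient changes sign in between, which is what makes a two-scalar exchange sufficient for cooperative optimisation.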


[88] 2604.18021

Paradigm Shift from Statistical Channel Modeling to Digital Twin Prediction: An Environment-Generalizable ChannelLM for 6G AI-enabled Air Interface

As 6G advances, the ubiquitous connectivity and higher capacity requirements of the air interface pose substantial challenges for accurate and real-time wireless channel acquisition in diverse environments. Conventional statistical channel modeling relies on offline measurement data from limited environments, struggling to support online applications facing diverse environments. To this end, the digital twin channel (DTC) has emerged as a novel paradigm that constructs a digital replica of the physical environment through high-fidelity sensing and predicts the corresponding channel in real time utilizing artificial intelligence (AI) models. As the engine of the DTC, existing AI models struggle to simultaneously achieve strong environmental generalization in the real world and end-to-end channel prediction for real-time tasks. Therefore, this paper proposes a channel large model (ChannelLM)-driven DTC architecture comprising three modules: low-complexity and high-accuracy environment reconstruction based on dynamic object detection and multimodal alignment of image and point cloud data, physically interpretable environment feature extraction, and a ChannelLM core that maps these features into generalized environment representations for multi-task channel prediction. Simulation results demonstrate that, in unseen test environments, compared with small-scale AI models, ChannelLM reduces prediction errors by 4.23 dB in channel state information prediction while achieving an end-to-end inference latency of 70 milliseconds in the real world.


[89] 2604.18040

User Mobility Demands Near-Field Communications in Terahertz Band Wireless Networks Beyond 6G

Near-field propagation is often unavoidable at terahertz (THz) frequencies due to the large apertures needed for sufficient array gain, yet near-field operation complicates practical system design, especially under user mobility. This paper asks whether a mobile THz link can remain broadband and achieve the desired high rates and coverage while operating exclusively in the radiative far field. To answer this question, we develop a proof-by-contradiction feasibility framework that jointly enforces (i) a far-field requirement based on the Fraunhofer distance and (ii) a reliability requirement specified by a target SNR at the worst-case link distance. We derive closed-form upper bounds on the far-field-feasible bandwidth for stationary and mobile links. We further incorporate practical misalignment through several UE rotation and mobility scenarios. Numerical results show that stationary THz links can remain far-field-only with physically realizable apertures while supporting extremely large bandwidths, whereas practical mobile THz systems cannot. In practically relevant mobile THz access settings, the far-field-feasible bandwidth becomes a severe limiting factor: achieving tens-of-GHz targets would require unrealistically high UE transmit power. A cross-band comparison further shows that far-field-only operation is largely attainable at sub-6 GHz and, to a significant extent, at mmWave for moderate bandwidths, while near-field-aware designs become essential for mobile THz access.
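The far-field requirement pivots on the Fraunhofer distance d_F = 2D²/λ, which grows with the square of the aperture D and inversely with wavelength. A quick numeric check (our numbers, chosen only for illustration) shows why THz apertures push typical access distances into the near field while sub-6 GHz apertures do not:

```python
C = 3e8  # speed of light [m/s], approximated for clean numbers

def fraunhofer(aperture_m, freq_hz):
    """Fraunhofer (far-field) distance d_F = 2 D^2 / lambda, in metres."""
    return 2 * aperture_m**2 / (C / freq_hz)

# A 10 cm array at 300 GHz: far field only beyond 20 m.
print(round(fraunhofer(0.10, 300e9), 1))   # 20.0
# A 50 cm array at 3.5 GHz: far field beyond roughly 5.8 m.
print(round(fraunhofer(0.50, 3.5e9), 1))   # 5.8
```

A user at 10 m from the 300 GHz aperture is thus well inside the radiative near field, consistent with the paper's conclusion that mobile THz access cannot in general remain far-field-only.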


[90] 2604.18056

Joint Detection and Velocity Estimation in OFDM-ISAC Cell-Free Massive MIMO Networks

This paper develops a Doppler-aware sensing framework for cell-free massive MIMO (CF-mMIMO) networks operating under OFDM-based integrated sensing and communication (ISAC). The framework explicitly incorporates the 3D-bistatic Doppler geometry across distributed access points (APs) into a generalized likelihood ratio test (GLRT) detector. To address scalability, a user-target-centric AP association approach is utilized. The 3D tangential components of the target's velocity vector are estimated, and several search and optimization strategies, including coarse grid search, gradient-based refinement, and particle swarm optimization (PSO), are developed and evaluated. The Doppler-aware GLRT statistic and receive sensing signal-to-noise ratio (SNR) are derived. Simulation results demonstrate that the proposed PSO-aided detector achieves the most favorable accuracy-complexity trade-off, while Doppler mismatch can cause substantial sensing-SNR degradation in high-mobility scenarios. Additionally, leveraging more OFDM subcarriers enhances frequency-domain diversity and yields further sensing-SNR gains.
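For a single Tx/Rx pair, the 3D-bistatic Doppler geometry referenced above reduces to projecting the target velocity onto the sum of the unit vectors from the target toward transmitter and receiver. A hedged numpy sketch of this textbook relation (not the paper's GLRT statistic or AP-association scheme):

```python
import numpy as np

def bistatic_doppler(p_tx, p_rx, p_tgt, v_tgt, f_c, c=3e8):
    """Bistatic Doppler shift: f_d = (1/lambda) * v . (u_tx + u_rx),
    where u_tx, u_rx are unit vectors from the target toward Tx and Rx."""
    lam = c / f_c
    u_tx = (p_tx - p_tgt) / np.linalg.norm(p_tx - p_tgt)
    u_rx = (p_rx - p_tgt) / np.linalg.norm(p_rx - p_tgt)
    return float(np.dot(v_tgt, u_tx + u_rx) / lam)
```

In the monostatic limit (Tx and Rx colocated) this collapses to the familiar 2v/lambda for a radially closing target.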


[91] 2604.18060

Low-Complexity Tone Injection via Candidate Ranking for PAPR Reduction in OFDM and AFDM Systems

Tone injection (TI) is a promising distortionless PAPR reduction technique that incurs no spectral efficiency loss. However, state-of-the-art TI schemes based on random candidate generation or clipping noise spectrum suffer from fundamental limitations in PAPR performance. In this paper, we propose novel TI schemes compatible with both OFDM and AFDM systems. The proposed schemes iteratively update the TI sequence via a candidate ranking procedure guided by time-domain local peaks. This accurately selects effective candidates while achieving a complexity comparable to that of the fast Fourier transform. Depth-first search is further integrated to enhance PAPR performance by exploiting the tree structure of the process. Simulations demonstrate that the proposed schemes achieve over 1 dB PAPR gain over baseline TI schemes at comparable complexity. The gain is consistent across various numbers of subcarriers under controlled per-iteration complexities, confirming a superior performance-complexity trade-off for both OFDM and AFDM.
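For context, the quantity being reduced is the peak-to-average power ratio of the (oversampled) time-domain signal. A minimal computation of that baseline metric (the proposed candidate-ranking TI scheme itself is not reproduced here):

```python
import numpy as np

def papr_db(freq_symbols, oversample=4):
    """PAPR (in dB) of the oversampled OFDM time-domain signal obtained by
    zero-padding the spectrum in the middle and taking an IFFT."""
    n = len(freq_symbols)
    padded = np.zeros(n * oversample, dtype=complex)
    padded[:n // 2] = freq_symbols[:n // 2]
    padded[-(n // 2):] = freq_symbols[n // 2:]
    x = np.fft.ifft(padded)
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())
```

A single active subcarrier yields a constant-modulus exponential and hence 0 dB PAPR; realistic multicarrier symbols sit well above that, which is what TI sequences are designed to pull down.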


[92] 2604.18105

NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR

Integrating large language models (LLMs) into automatic speech recognition (ASR) has become a mainstream paradigm in recent years. Although existing LLM-based ASR models demonstrate impressive performance on public benchmarks, their training remains predominantly data-driven, leaving key practical challenges insufficiently addressed -- particularly limited downward scalability in resource-constrained deployments and hallucinations under acoustically challenging conditions. To address these issues, we present NIM4-ASR, a production-oriented LLM-based ASR framework optimized for both efficiency and robustness. Grounded in a principled delineation of functional roles between the encoder and the LLM, we redesign the multi-stage training paradigm to align each module with its intended capability boundary. Specifically, we reformulate the pre-training architecture and objective to mitigate the modality gap and improve parameter efficiency; introduce an iterative asynchronous SFT stage to preserve acoustic fidelity and constrain representation drift; and design an ASR-specialized reinforcement learning stage to further enhance recognition quality and robustness. We additionally incorporate a suite of production-oriented optimizations, including robustness under noisy and silent conditions, real-time streaming inference, and hotword customization via retrieval-augmented generation (RAG). Experiments show that NIM4-ASR achieves state-of-the-art performance on multiple public benchmarks with merely 2.3B parameters, while substantially outperforming larger-scale competitors on internal benchmarks -- particularly in entity-intensive real-world scenarios. NIM4-ASR further supports million-scale hotword customization via RAG with sub-millisecond retrieval latency, enabling efficient adaptation to emerging entities and personalized user requirements.


[93] 2604.18138

Semi-Blind Receivers for RIS-Aided Fluid Antenna Systems

Reconfigurable intelligent surfaces (RISs) and fluid antennas (FAs) are key technologies for enhancing spatial degrees of freedom in future wireless networks. However, channel acquisition in RIS-aided FA systems is challenging as cascaded links depend on time-varying antenna-port selections and RIS configurations, leading to high training overhead in conventional pilot-based methods. We propose a semi-blind estimation framework for this joint architecture to estimate channels and symbols concurrently. Two hierarchical transmission protocols are introduced, resulting in distinct tensor models. Protocol 1 uses a two-time-scale structure yielding a PARAFAC (PF) model, while Protocol 2 employs a single-time-scale structure with blockwise spatial variations, leading to a Nested PARAFAC2 (NPF) model. For both, we develop semi-blind receivers based on trilinear alternating least squares to jointly estimate user-to-RIS channels, RIS-to-BS channels, and transmitted symbols by exploiting spatio-temporal diversity from FA and RIS reconfiguration. We derive identifiability conditions and computational complexity, revealing a fundamental trade-off: the PF receiver (Protocol 1) more aggressively exploits joint RIS/FA reconfiguration for stronger robustness, whereas the NPF receiver (Protocol 2) offers a flexible, lower-complexity alternative. Simulations show the proposed receivers achieve accurate recovery with significantly reduced training overhead, demonstrating the effectiveness of tensor-based semi-blind processing for RIS-aided fluid antenna communications.


[94] 2604.18140

Leader-Follower Formation Control Using Differential Drag and Effective Surface Regulation

The growing interest in space activities has led to the emergence of new space operators and innovative mission concepts. Small satellites such as CubeSats reduce mission costs and are typically deployed in constellations or formation flights. Since they are often propulsionless, passive orbital control strategies are the standard, primarily through differential drag achieved via attitude control maneuvers. This work develops a control system to achieve a generic relative positioning between two small satellites in a virtual-leader, real-follower formation flight, relying entirely on differential drag achieved through attitude maneuvers. We propose a control law based on the integrator backstepping technique and, using Lyapunov theory, demonstrate the asymptotic stability of the equilibrium points of the closed-loop rotational dynamics; a numerical simulation assesses the effectiveness and accuracy of the control strategy.


[95] 2604.18141

Frugal Geofencing via Energy-aware Sensing and Reporting

Timely and accurate monitoring in geofencing scenarios is challenging when relying on ultra-low power Internet of Things devices (IoTDs) powered by energy harvesting (EH). This is mainly because frequent wake-ups for data acquisition and data uploading may quickly deplete their limited energy buffer. Conventional grid-like IoT deployments overlook these limitations and merely rely on continuously powered sensing. Herein, we propose an energy-aware geofencing framework for camera-equipped EH IoTDs deployed around a protected area and its surrounding perimeter zone. The framework integrates a directional sensing power model with an operational representation of EH, sensing, sleeping, and reporting, accounting for the limited field-of-view (FoV) and distance-dependent detection confidence of the IoTDs. Device activity is controlled by the coverage-providing access point, which hosts a mobile edge host and a facility geofencing system to ensure timely and reliable detection under tight energy constraints. Reinforcement learning is used to determine IoTD placement, enabling earlier intruder detection than uniform grid-based deployments. Numerical results show that the proposed coordinated sensing and reporting configuration achieves frugal geofencing with fewer devices, while concurrently improving detection timeliness and dependability.


[96] 2604.18149

Informativity of Data-Knowledge Pairs for Lyapunov Equations

In the past few years, data informativity with prior knowledge has attracted increasing attention. This line of research aims to characterize a dataset on a dynamical system that enables system analysis or design using only the dataset and given prior knowledge on the system. In this paper, we investigate such a characterization for the data-driven problem of computing a unique solution to Lyapunov equations. First, we introduce a notion of joint informativity for data-knowledge pairs as an extension of the standard informativity concept. Second, we derive an algebraic equivalent condition for the joint informativity. Finally, we provide further insights into the joint informativity by considering a special case of prior knowledge. The characterization presented in this paper is developed for a wide class of prior knowledge, enabling the incorporation of various forms of system information.


[97] 2604.18156

Geometry-Aware Networking for Low-Altitude Economy: Movable Antennas in Space-Air-Ground Integrated Systems

Space-air-ground integrated networks (SAGINs) are emerging as a key foundation for future non-terrestrial networks (NTNs) and low-altitude economy services. However, their performance is increasingly limited not only by communication resources, but by the inability to adapt to rapidly changing spatial geometry. Here, spatial geometry refers to the relative configuration among network nodes, obstacles, and targets, which directly determines propagation conditions, blockage states, interference patterns, and sensing performance. This trend becomes more pronounced as low-altitude operations grow in density and complexity, causing the dominant bottleneck to shift from static resource allocation toward real-time maintenance of favorable spatial geometry across the network. In this article, we argue that movable antenna (MA) technology provides a fundamentally new perspective for SAGIN design. By enabling controlled antenna displacement, MA introduces a spatial degree of freedom that allows the network to directly adapt local spatial geometry at fine granularity, rather than passively reacting to it through beamforming or platform mobility. We present a geometry-aware, layered SAGIN architecture, where Low-Earth-Orbit (LEO) satellites provide macro-scale coverage and coordination, High-Altitude Platform Stations (HAPS) enable regional continuity and backhaul support, and MA is incorporated into the layered design to enable fine-grained geometry adaptation, particularly at the unmanned aerial vehicle (UAV) and terrestrial layers, where local channel dynamics are most pronounced. We further discuss how such geometry control enhances robustness, supports multi-functional operation spanning communication, sensing, control, and navigation, and enables more flexible spatial cooperation across layers.


[98] 2604.18166

Cramér-Rao Bound Optimization for Near-Field ISAC with Extended Targets

Near-field integrated sensing and communication (ISAC) requires target models beyond the point-target abstraction when the target has a non-negligible spatial extent. In this letter, a geometry-aware transmit design is developed for a parametric extended target (ET) described by its center, orientation, and size under spherical-wave propagation. The Cramér-Rao bound (CRB) for the geometric parameters is formulated around a nominal ET state, an exact ET-aware reduced subspace is identified for the lifted covariance formulation, and a reduced-dimensional semidefinite relaxation (SDR) is developed under signal-to-interference-plus-noise ratio (SINR) and power constraints. Simulation results show lower CRB values than point-target and geometry-agnostic baselines together with substantially reduced runtime for large arrays.


[99] 2604.18207

A Novel Piecewise Atmospheric Attenuation Model for Free Space Optical Links in Vertical Heterogeneous Networks

Free-space optical (FSO) communication is emerging as a key backhaul technology for next-generation vertical heterogeneous networks (VHetNets), whose architecture spans satellites, high-altitude platform stations (HAPS), unmanned aerial vehicles (UAVs), and terrestrial nodes. Along these vertical and slant paths, optical beams traverse successive atmospheric layers that may contain clouds, fog, rain, and aerosols, conditions that conventional single-coefficient Beer-Lambert models typically handle only in isolation. Instead of such simplified formulas, we present a unified attenuation model that incorporates aerosols, fog, rain, cloud layers, and drizzle, accounts for the zenith angle, and provides a holistic estimate of the cumulative power loss across atmospheric layers. Numerical results show several-decibel attenuation variations across representative weather scenarios, while the difference between the proposed model predictions and the layer-resolved MODTRAN simulations remains within 1 dB, thereby validating the accuracy of the proposed model and its practical relevance for VHetNet link-budget studies.
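In its simplest flat-layer form, the cumulative slant-path loss of such a layered model is the sum of each layer's specific attenuation times the slant distance through it, scaled by the secant of the zenith angle. A simplified sketch with hypothetical per-layer coefficients (the paper's piecewise model adds weather-specific terms per layer):

```python
import math

def slant_attenuation_db(layers, zenith_deg):
    """Total slant-path loss over stacked atmospheric layers.

    layers: list of (thickness_km, gamma_db_per_km) tuples, ground up.
    Flat-layer approximation: slant length = thickness / cos(zenith).
    """
    sec = 1.0 / math.cos(math.radians(zenith_deg))
    return sum(t * g * sec for t, g in layers)

# hypothetical profile: 1 km fog, 2 km cloud, clear aerosol above
loss = slant_attenuation_db([(1.0, 20.0), (2.0, 5.0), (7.0, 0.1)], zenith_deg=30.0)
```

At zenith (0 degrees) this reduces to a plain per-layer Beer-Lambert sum; larger zenith angles lengthen the path through every layer.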


[100] 2604.18255

WiFo-MiSAC: A Wireless Foundation Model for Multimodal Sensing and Communication Integration via Synesthesia of Machines (SoM)

Current learning-based wireless methods struggle with generalization due to the fragmented processing of communication and sensing data. WiFo-MiSAC addresses this as a task-agnostic foundation model that tokenizes heterogeneous signals into a unified space for self-supervised pre-training. A shared-specific disentangled mixture-of-experts (SS-DMoE) architecture is employed to decouple modality-shared and modality-specific representations, facilitating interaction without cross-modal interference. By combining masked reconstruction with contrastive alignment, the model achieves state-of-the-art performance across downstream tasks, including beam prediction and channel estimation. Experimental results demonstrate robust few-shot adaptation and seamless integration of new modalities, positioning WiFo-MiSAC as a scalable backbone for future integrated sensing and communication systems.


[101] 2604.18263

Passive RIS Is Not Silent: Revisiting Performance Limits Under Thermal Noise

Reconfigurable intelligent surfaces (RISs) have emerged as a promising solution for enabling energy-efficient and flexible spectrum usage in wireless communication, particularly in the context of sixth-generation (6G) networks. While passive RIS architectures are widely regarded as virtually noiseless due to the lack of active components, this idealized assumption can lead to misleading performance evaluations. In this paper, we revisit this assumption and demonstrate that the thermal noise generated by passive RIS elements, though often neglected, can significantly affect system performance. We propose a tractable approximated analytical framework that incorporates RIS-induced thermal noise into the system and derive closed-form expressions for key performance metrics, such as outage probability and throughput. Simulation results validate our approximated analysis and highlight the substantial performance discrepancies that arise when RIS thermal noise is ignored. Our results offer valuable insights into the trade-offs between receiver and RIS noise, guiding the development of robust and efficient 6G communication systems.


[102] 2604.18268

Scenario-Based Stochastic MPC for Energy Hubs with EV Fleets Under Persistent Grid Outages

Emissions reduction and resilience to outages motivate the adoption of renewable microgrids. Surprisingly, research integrating both probabilistic grid outages and electric vehicle (EV) charging requirements remains limited. This paper addresses this gap by developing a scenario-based stochastic model predictive controller (SMPC) for a microgrid energy hub comprising solar generation, battery storage, diesel backup, and an EV fleet connected to a weak grid. Grid outage and campus load scenarios are generated from a continuous-time Markov chain and a Gaussian Process, respectively. Using 2023 operational data from the Ashesi University Energy Hub in Ghana, we demonstrate that the SMPC achieves performance within 1\% of a perfect-forecast benchmark. In contrast, a naive MPC that assumes continuous grid availability offers no economic or sustainability advantage over rule-based control, with both incurring significantly higher costs and emissions than the SMPC. These results highlight that outage anticipation is essential for economic viability. Finally, we show that incorporating a deterministic buffer against EV consumption uncertainty eliminates over 90\% of state-of-charge violations with negligible impact on total operating costs.
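The outage scenario generator described above can be sketched as a two-state (grid-up/grid-down) continuous-time Markov chain with exponential holding times. A minimal sampler under that assumption (the MTTF/MTTR values in the example are placeholders, not the Ashesi data):

```python
import random

def sample_outages(horizon_h, mttf_h, mttr_h, seed=0):
    """Sample alternating up/down intervals from a two-state CTMC.

    mttf_h: mean time to failure (hours), mttr_h: mean time to repair.
    Returns a list of (start, end, grid_up) intervals covering the horizon.
    """
    rng = random.Random(seed)
    t, up, intervals = 0.0, True, []
    while t < horizon_h:
        mean = mttf_h if up else mttr_h
        dwell = rng.expovariate(1.0 / mean)     # exponential holding time
        end = min(t + dwell, horizon_h)
        intervals.append((t, end, up))
        t, up = end, not up
    return intervals
```

Each call with a different seed yields one outage scenario for the scenario-based SMPC.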


[103] 2604.18269

Impact of CSIR, SIC, and Hardware Impairments on the Ergodic Rate of Downlink RSMA

This work investigates the ergodic rate performance analysis of rate-splitting multiple access (RSMA) in a downlink communication system under practical impairments. Closed-form expressions are derived for key performance metrics such as ergodic rate, energy efficiency, sum-rate, and Jain's fairness index, capturing the joint effects of imperfect channel state information at the receiver (CSIR), imperfect successive interference cancellation (SIC), and hardware impairments. Numerical simulations validate the accuracy of the analytical expressions and reveal several insightful trends. At low transmit powers, imperfect CSIR is the dominant performance-limiting factor, followed by hardware impairments and imperfect SIC. However, as the transmit power increases, hardware impairments become the primary bottleneck, with the impact of imperfect CSIR gradually diminishing, and imperfect SIC becoming a more prominent bottleneck. Moreover, RSMA consistently outperforms non-orthogonal multiple access (NOMA) in terms of ergodic rate, fairness, and sum-rate, even under severe non-idealities. These findings underscore the importance of incorporating fairness as a core design objective alongside rate and energy efficiency, positioning RSMA as a robust and strong multiple access candidate for next-generation wireless networks.
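One of the metrics above, Jain's fairness index, has a simple closed form: J = (sum x)^2 / (n * sum x^2), ranging from 1/n (one user takes everything) to 1 (perfectly equal rates):

```python
def jain_index(rates):
    """Jain's fairness index over per-user rates; 1.0 means perfect fairness."""
    n = len(rates)
    s = sum(rates)
    return s * s / (n * sum(r * r for r in rates))
```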


[104] 2604.18270

Incremental learning for audio classification with Hebbian Deep Neural Networks

The human ability for lifelong learning is an inspiration for deep learning methods, and in particular for continual learning. In this work, we apply Hebbian learning, a biologically inspired learning process, to sound classification. We propose a kernel plasticity approach that selectively modulates network kernels during incremental learning, acting on selected kernels to learn new information and on others to retain previous knowledge. Using the ESC-50 dataset, the proposed method achieves 76.3% overall accuracy over five incremental steps, outperforming a baseline without kernel plasticity (68.7%) and demonstrating significantly greater stability across tasks.


[105] 2604.18279

RSMA-Aided Full-Duplex Networks Under Imperfect CSI and SIC: Performance Evaluation

This work investigates a full-duplex (FD)-enhanced Rate-Splitting Multiple Access (RSMA) system under practical constraints, including imperfect channel state information (CSI) and successive interference cancellation (SIC). We derive closed-form expressions for key performance metrics, such as outage probability and throughput, for both uplink and downlink users. The analysis considers co-channel interference (CCI) from uplink to downlink users and models the self-interference (SI) channel as a random variable. Monte Carlo simulations validate the analytical results and highlight the impact of system imperfections on RSMA-FD performance. At low transmit power, imperfect CSI significantly affects the system, though this effect weakens as power increases. In contrast, imperfect SIC becomes more detrimental at high transmit power, causing severe degradation. Additionally, neglecting CCI and assuming perfect SI cancellation leads to substantial overestimation of performance. Lastly, we demonstrate that the SI cancellation factor must be carefully selected to suppress interference effectively. Otherwise, a poor choice limits the full potential of FD technology.


[106] 2604.18392

Composite Control of Grid-Following Inverters for Stabilizing AI-Induced Fast Power Disturbances

AI data center loads create query-driven power transients on millisecond timescales. Such loads can violate the timescale separation assumptions underlying internal inverter control of grid-following resources collocated with data centers as supplementary generation. This paper develops a singular perturbation-based modeling and control framework for stabilizing fast power imbalances. We show that a physically implementable droop control law is derived and validated by requiring reduced-system stability rather than being imposed a priori, and that AI workloads satisfy a bounded-rate disturbance class due to physical filtering in power delivery hardware. The analysis yields explicit gain bounds linking inverter parameters to disturbance rejection performance, a modulation admissibility condition ensuring physical realizability of the feedback linearizing control, and a feasibility condition identifying the maximum tolerable load ramp rate. Numerical simulations validate the theoretical predictions under stochastic AI transients.


[107] 2604.18409

Far-Field Absolute Gain Antenna Measurements at Sub-THz Frequencies: A New Interpretation

The evolution of large-aperture antennas and arrays in the sub-THz band (100-300 GHz) means that traditional far-field (FF) gain measurements require large distances at these high frequencies, making them impractical in many laboratory environments. In the presented work, absolute antenna gain measurements are performed in localized distance clusters for commercial horn antennas in the sub-THz range of 145-170 GHz using the three-antenna method, leveraging a theoretically derived modified FF equation along with the Friis transmission equation to enable a compact measurement setup. By applying the proposed modified FF formulation, the approach aims to redefine the FF distance by considering the combined effects of both the transmitting and receiving antennas, accounting for their aperture sizes and radiation characteristics. This allows precise gain characterization within a compact measurement footprint. The proposed theoretical model was validated through radiated measurements and simulations, demonstrating its effectiveness in this case study. Also, measurements were performed using dissimilar antenna pair combinations due to inventory constraints, a common challenge both in research and in industry. Despite the mismatches, the presented work demonstrates that reliable and sufficiently accurate measurement results can still be achieved. This highlights the practical feasibility of the compact cluster measurement technique without compromising measurement integrity. The compact setup ensures efficiency in the measurement time and cost, making it a robust solution for both research and industrial needs in sub-THz antenna characterization for applications including 6G, high frequency sensing, and imaging systems.
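The three-antenna method behind these measurements recovers absolute gains from three pairwise Friis measurements: each antenna pair yields the sum G_i + G_j in dB, giving a 3x3 linear system in the three unknown gains. A textbook sketch using the standard (unmodified) FF Friis equation; the paper's modified FF formulation is its contribution and is not reproduced here:

```python
import math
import numpy as np

def three_antenna_gains(pr_over_pt_db, freq_hz, dist_m):
    """Solve the classic three-antenna method.

    pr_over_pt_db: dict with keys (1,2), (1,3), (2,3) giving measured
    Pr/Pt in dB for each antenna pairing at the same distance.
    Returns [G1, G2, G3] in dBi.
    """
    lam = 3e8 / freq_hz
    fspl = 20 * math.log10(4 * math.pi * dist_m / lam)
    # Friis: Pr/Pt|dB = G_i + G_j - FSPL  =>  G_i + G_j = Pr/Pt|dB + FSPL
    b = np.array([pr_over_pt_db[(1, 2)] + fspl,
                  pr_over_pt_db[(1, 3)] + fspl,
                  pr_over_pt_db[(2, 3)] + fspl])
    A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
    return np.linalg.solve(A, b)
```

Because the system is fully determined, no reference antenna with known gain is needed, which is what makes the method "absolute".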


[108] 2604.18411

Grid-Supporting Equipment Supply Chains Constrain the Feasible Pace of Power System Expansion

Power system expansion depends on the equipment required to connect, convert, regulate, and condition electricity, yet grid-supporting equipment (GSE) is rarely modeled as an explicit constraint. We develop a framework integrating dynamic stock-flow modeling, bill-of-materials accounting, multi-regional supply-use analysis, and expansion optimization to quantify GSE deployment requirements and upstream material dependence. Because manufacturing data are often fragmented or proprietary, we use critical material requirements as a physically grounded proxy for GSE supply constraints. In a U.S. case study, GSE shortages reach 269.6--274.1 GVA (28.5%--28.6%) by 2030 under high-growth conditions. Copper becomes fully binding, with steel and nickel forming additional constraints. Trade disruption intensifies shortages, while grid-enhancing technologies provide limited relief. These results show that grid expansion depends on the timely manufacturability, replacement, and material support of GSE, motivating planning frameworks that explicitly incorporate deliverability, supply chain exposure, and resilience strategies.


[109] 2604.18435

Quasi-Constant Modulus Design for Nonlinearity-Tolerant Geometric Shaped Four Dimensional Modulation Format

In this paper, the quasi-constant modulus (QCM) property is analyzed and leveraged in the design of nonlinearity-tolerant four-dimensional (4D) modulation formats. Accordingly, we propose a family of QCM-based quadrature amplitude modulation (QCM-QAM) constellations with high spectral efficiencies (SEs) of 9, 11, and 13 bit/4D-sym, respectively. The quasi-constant modulus design theoretically enhances tolerance to fiber nonlinearities. Meanwhile, QCM-QAM is evaluated in an unrepeatered wavelength-division multiplexing (WDM) system over both standard single-mode fiber (SSMF) and non-zero dispersion-shifted fiber (NZDSF). Across all SEs, QCM-QAM demonstrates robust nonlinear tolerance in both SSMF and NZDSF. This is evidenced by a consistent shift of the optimal launch power toward higher values and a significant improvement in effective signal-to-noise ratio (SNR). QCM-QAM also delivers generalized mutual information (GMI) gains of 0.22, 0.09, and 0.21 bit/4D-sym in SSMF, and 0.24, 0.10, and 0.22 bit/4D-sym in NZDSF at the optimal transmission power, corresponding to the SEs of 9, 11, and 13 bit/4D-sym. Furthermore, QCM-QAM achieves transmission reach extensions of 1.6%, 0.9%, and 1.7% in SSMF, and 1.7%, 1.5%, and 1.8% in NZDSF, respectively, for the three SE levels.


[110] 2604.18453

On the Effect of Quadratic Regularization in Direct Data-Driven LQR

This paper proposes an explainability concept for direct data-driven linear quadratic regulation (LQR) with quadratic regularization. Our perspective follows the parametric effect of regularization, an analysis approach that translates regularization costs from auxiliary variables to system quantities, enabling intuitive interpretations. The framework further enables the elimination of auxiliary variables, thereby reducing computational complexity. We demonstrate the effectiveness of our approach and the identified effect of regularization via simulations.


[111] 2604.18479

Warm-Start Quantum Approximate Optimization Algorithm for QAM MIMO Data Detection

Data detection in large-scale multiple-input multiple-output (MIMO) systems with higher-order quadrature amplitude modulation (QAM) remains a challenging problem due to the exponential complexity of the classical maximum likelihood (ML) detector. This challenge is further amplified by Gray-coded modulation, which introduces nonlinear symbol-to-bit mappings and transforms the problem into a higher-order unconstrained binary optimization (HUBO) formulation. To address this problem, this paper presents a hybrid quantum-classical detection framework that leverages a warm-start linear-ramp Quantum Approximate Optimization Algorithm (WSLR-QAOA) for solving the resulting HUBO problem. A structured warm-start based on a low-rank semidefinite relaxation, solved via a block coordinate descent (BCD) method, provides an efficient and high-quality initialization, while a linear ramp parameterization guides the QAOA optimization. Simulation results show that the proposed framework outperforms classical methods in terms of symbol error rate (SER) and converges faster than standard QAOA, while achieving performance close to the optimal ML detector. Furthermore, the WSLR-QAOA algorithm is validated on actual IBM quantum hardware, where it achieves near-ML performance at low SNR and maintains competitive accuracy at higher SNR despite moderate degradation due to hardware noise. This demonstrates the practical potential of the HUBO-based WSLR-QAOA algorithm for large-scale MIMO data detection.


[112] 2604.18482

Safe Control using Learned Safety Filters and Adaptive Conformal Inference

Safety filters have been shown to be effective tools to ensure the safety of control systems with unsafe nominal policies. To address scalability challenges in traditional synthesis methods, learning-based approaches have been proposed for designing safety filters for systems with high-dimensional state and control spaces. However, the inevitable errors in the decisions of these models raise concerns about their reliability and the safety guarantees they offer. This paper presents Adaptive Conformal Filtering (ACoFi), a method that combines learned Hamilton-Jacobi reachability-based safety filters with adaptive conformal inference. Under ACoFi, the filter dynamically adjusts its switching criteria based on the observed errors in its predictions of the safety of actions. The range of possible safety values of the nominal policy's output is used to quantify uncertainty in safety assessment. The filter switches from the nominal policy to the learned safe one when that range suggests it might be unsafe. We show that ACoFi guarantees that the rate of incorrectly quantifying uncertainty in the predicted safety of the nominal policy is asymptotically upper bounded by a user-defined parameter. This gives a soft safety guarantee rather than a hard safety guarantee. We evaluate ACoFi in a Dubins car simulation and a Safety Gymnasium environment, empirically demonstrating that it significantly outperforms the baseline method that uses a fixed switching threshold by achieving higher learned safety values and fewer safety violations, especially in out-of-distribution scenarios.
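The adaptive adjustment in ACoFi builds on the standard adaptive conformal inference recursion, which nudges the working miscoverage level toward the target after each observed error. A minimal sketch of that generic update (not ACoFi's safety-value machinery; the step size gamma is an illustrative choice):

```python
def aci_step(alpha_t, miscovered, target_alpha, gamma=0.01):
    """One adaptive conformal inference update:
    alpha_{t+1} = alpha_t + gamma * (target_alpha - err_t),
    with err_t = 1 on a miscoverage event and 0 otherwise."""
    return alpha_t + gamma * (target_alpha - (1.0 if miscovered else 0.0))

def run_aci(errors, target_alpha=0.1, gamma=0.01, alpha0=0.1):
    """Iterate the update over a binary miscoverage sequence."""
    alpha = alpha0
    for e in errors:
        alpha = aci_step(alpha, e, target_alpha, gamma)
    return alpha
```

Repeated miscoverage drives alpha down (wider, more conservative sets), and the long-run miscoverage rate is asymptotically pinned to the user-defined target, which is the kind of soft guarantee the abstract describes.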


[113] 2604.18520

Joint Scheduling of Multi-Band Radar Sensing and DNN Inference for Cross-Stage Parallelism

This paper studies end-to-end latency minimization for a multi-band radar sensing and deep neural network (DNN) inference pipeline. Unlike conventional stage-wise designs that treat radar sensing and DNN inference as two sequential stages, the proposed framework exploits cross-stage parallelism by allowing the inference branch associated with a sensed band to start as soon as that band completes sensing, without waiting for all bands to finish. To characterize this interaction, we formulate a joint scheduling problem that couples sensing-time allocation, branch release timing, and non-preemptive multi-core execution of a directed acyclic graph (DAG) under sensing-feasibility, precedence, and core-capacity constraints. Since the resulting problem is combinatorial and strongly time-coupled, we further develop a release-aware heuristic that evaluates each sensing decision according to its downstream impact on the DAG makespan, together with a greedy list scheduler for multi-core DAG execution under release times. Simulation results show that the proposed design can effectively exploit cross-stage parallelism and reduce end-to-end latency relative to a decoupled baseline in many heterogeneous sensing scenarios, while also clarifying the operating regimes in which the latency gain becomes limited.
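The greedy list scheduler mentioned above, restricted here to release times and core capacity (DAG precedence constraints omitted for brevity), can be sketched as:

```python
import heapq

def list_schedule(tasks, num_cores):
    """Non-preemptive greedy list scheduling under release times.

    tasks: list of (release_time, duration) pairs.
    Each task goes to the earliest-free core, starting no earlier than
    its release time. Returns the makespan.
    """
    cores = [0.0] * num_cores          # time each core becomes free
    heapq.heapify(cores)
    makespan = 0.0
    for release, dur in sorted(tasks):  # earliest-release first
        free = heapq.heappop(cores)
        start = max(free, release)      # wait for the band to finish sensing
        end = start + dur
        makespan = max(makespan, end)
        heapq.heappush(cores, end)
    return makespan
```

In the paper's setting, a branch's release time is the completion of its band's sensing, which is exactly how cross-stage parallelism enters the schedule.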


[114] 2509.14804

Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages

Speech large language models (SLLMs) built on speech encoders, adapters, and LLMs demonstrate remarkable multitask understanding performance in high-resource languages such as English and Chinese. However, their effectiveness substantially degrades in low-resource languages such as Thai. This limitation arises from three factors: (1) existing commonly used speech encoders, like the Whisper family, underperform in low-resource languages and lack support for broader spoken language understanding tasks; (2) the ASR-based alignment paradigm requires training the entire SLLM, leading to high computational cost; (3) paired speech-text data in low-resource languages is scarce. To overcome these challenges in the low-resource language Thai, we introduce XLSR-Thai, the first self-supervised learning (SSL) speech encoder for Thai. It is obtained by continuously training the standard SSL XLSR model on 36,000 hours of Thai speech data. Furthermore, we propose U-Align, a speech-text alignment method that is more resource-efficient and multitask-effective than typical ASR-based alignment. Finally, we present Thai-SUP, a pipeline for generating Thai spoken language understanding data from high-resource languages, yielding the first Thai spoken language understanding dataset of over 1,000 hours. Multiple experiments demonstrate the effectiveness of our methods in building a Thai multitask-understanding SLLM. We open-source XLSR-Thai and Thai-SUP to facilitate future research.


[115] 2604.16446

A High-Accuracy Optical Music Recognition Method Based on Bottleneck Residual Convolutions

Optical Music Recognition (OMR) aims to convert printed or handwritten music score images into editable symbolic representations. This paper presents an end-to-end OMR framework that combines residual bottleneck convolutions with bidirectional gated recurrent unit (BiGRU)-based sequence modeling. A convolutional neural network with ResNet-v2-style residual bottleneck blocks and multi-scale dilated convolutions is used to extract features that encode both fine-grained symbol details and global staff-line structures. The extracted feature sequences are then fed into a BiGRU network to model temporal dependencies among musical symbols. The model is trained using the Connectionist Temporal Classification loss, enabling end-to-end prediction without explicit alignment annotations. Experimental results on the Camera-PrIMuS and PrIMuS datasets demonstrate the effectiveness of the proposed framework. On Camera-PrIMuS, the proposed method achieves a sequence error rate (SeER) of $7.52\%$ and a symbol error rate (SyER) of $0.45\%$, with pitch, type, and note accuracies of $99.33\%$, $99.60\%$, and $99.28\%$, respectively. The average training time is 1.74~s per epoch, demonstrating high computational efficiency while maintaining strong recognition performance. On PrIMuS, the method achieves a SeER of $8.11\%$ and a SyER of $0.49\%$, with pitch, type, and note accuracies of $99.27\%$, $99.58\%$, and $99.21\%$, respectively. A fine-grained error analysis further confirms the effectiveness of the proposed model.
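As background, the decoding rule implied by CTC training above (collapse consecutive repeats of the best-path labels, then remove blanks) can be sketched in a few lines; the blank index and label ids here are hypothetical, not the paper's symbol vocabulary.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse a per-frame best-path label sequence into a CTC output:
    merge consecutive repeated labels, then drop the blank symbol."""
    out, prev = [], None
    for k in frame_ids:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out
```

Note that a blank between two identical labels keeps them distinct (e.g. two repeated notes), which is why CTC needs the blank symbol at all.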


[116] 2604.16452

Compiling OpenSCENARIO 2.1 for Scenario-Based Testing in CARLA

While the ASAM OpenSCENARIO 2.1 Domain-Specific Language (DSL) enables declarative, intent-driven authoring for Scenario-Based Testing (SBT), its integration into open-source simulators like CARLA remains limited by legacy parsers. We propose a modern multi-pass compiler architecture that translates the OpenSCENARIO 2.1 DSL directly into executable CARLA behaviors. The pipeline features an ANTLR4 frontend for Abstract Syntax Tree (AST) generation, a semantic middle-end, and a runtime backend that synthesizes deterministic py_trees behavior trees. Mapping the standardized domain ontology directly to CARLA's procedural API via a custom method registry eliminates the need for external logic solvers. A demonstrative multi-actor cut-in and evasive maneuver, selected from a wider suite of validated scenarios, confirms the compiler's ability to process concurrent actions, dynamic mathematical expressions, and asynchronous signaling. This framework establishes a functional baseline for reproducible, large-scale SBT, paving the way for future C++ optimizations to mitigate current Python-based computational overhead.
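The "custom method registry" idea, mapping ontology action names directly onto procedural backend calls, can be sketched with a plain decorator-based registry. The action names and callables below are hypothetical stand-ins, not CARLA's actual API or the paper's implementation.

```python
# Hypothetical registry mapping scenario-DSL action names to backend
# callables; a sketch of the registry pattern, not the paper's compiler.
REGISTRY = {}

def register(name):
    """Decorator that records a backend callable under a DSL action name."""
    def deco(fn):
        REGISTRY[name] = fn
        return fn
    return deco

@register("drive.change_lane")
def change_lane(actor, target_lane):
    return f"{actor}: lane -> {target_lane}"

@register("drive.set_speed")
def set_speed(actor, mps):
    return f"{actor}: speed -> {mps} m/s"

def emit(action, **kwargs):
    """Dispatch a compiled action node to its registered backend call."""
    return REGISTRY[action](**kwargs)
```

The backend then walks the semantic AST and calls `emit` per action node; because every ontology verb resolves through the table, no external logic solver is needed to bind intent to simulator primitives.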


[117] 2604.16662

Resource-Efficient Quantum-Enhanced Compressive Imaging via Quantum-Classical Co-Design

Quantum sensing can enhance imaging performance by reducing measurement noise below the classical limit, thereby improving the signal-to-noise ratio (SNR) of acquired data. In conventional quantum imaging schemes, squeezing is applied independently to each pixel or spatial mode, leading to a quantum resource cost that scales linearly with image dimension. This approach implicitly separates quantum enhancement from classical post-processing, treating them as independent layers. In this work, we demonstrate that integrating quantum resource allocation with guidance from classical compressive imaging, via co-design between the quantum hardware layer and the classical software layer, substantially reduces the required quantum resources. We employ principal component analysis (PCA) to identify a low-dimensional principal component subspace for measurement and apply squeezing selectively to the most informative spatial modes corresponding to these principal components. Our numerical experiments show that high-accuracy image classification and high-fidelity image reconstruction can be achieved with significantly fewer squeezed modes compared to pixel-wise squeezing. Our results establish a joint quantum-classical co-design framework for resource-efficient quantum-enhanced imaging.
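The classical half of the co-design, selecting a low-dimensional principal subspace whose modes then receive squeezing, can be sketched with a plain PCA step via the SVD. The synthetic scene below (200 samples dominated by 3 spatial patterns) is purely illustrative.

```python
import numpy as np

def principal_modes(images, k):
    """Return the top-k PCA modes (rows) of a stack of flattened images
    and the fraction of total variance they capture."""
    X = images - images.mean(axis=0)            # center the data
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = s**2                                   # per-mode variance
    return Vt[:k], var[:k].sum() / var.sum()

rng = np.random.default_rng(0)
# synthetic scene: 200 samples dominated by 3 spatial patterns + weak noise
basis = rng.normal(size=(3, 64))
imgs = rng.normal(size=(200, 3)) @ basis + 0.01 * rng.normal(size=(200, 64))
modes, frac = principal_modes(imgs, 3)
```

When a few modes carry almost all the variance, as here, squeezing only those modes (rather than all 64 pixels) is exactly the resource saving the abstract describes.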


[118] 2604.16668

Distance characteristics for incremental quantities

We derive distance relay characteristics in terms of incremental quantities. The characteristics are operating-point independent in that they depend on the network structure and types of sources, but not their real-time voltages or current injections.


[119] 2604.16696

LOD-Net: Locality-Aware 3D Object Detection Using Multi-Scale Transformer Network

3D object detection in point cloud data remains a challenging task due to the sparsity and lack of global structure inherent in the input. In this work, we propose a novel Multi-Scale Attention (MSA) mechanism integrated into the 3DETR architecture to better capture both local geometry and global context. Our method introduces an upsampling operation that generates high-resolution feature maps, enabling the network to better detect smaller and semantically related objects. Experiments conducted on the ScanNetv2 dataset demonstrate that our 3DETR + MSA model improves detection performance, achieving a gain of almost 1% in mAP@25 and 4.78% in mAP@50 over the baseline. While applying MSA to the 3DETR-m variant shows limited improvement, our analysis reveals the importance of adapting the upsampling strategy for lightweight models. These results highlight the effectiveness of combining hierarchical feature extraction with attention mechanisms in enhancing 3D scene understanding.


[120] 2604.16749

ICLAD: In-Context Learning with Comparison-Guidance for Audio Deepfake Detection

Audio deepfakes pose a significant security threat, yet current state-of-the-art (SOTA) detection systems do not generalize well to realistic in-the-wild deepfakes. We introduce a novel \textbf{I}n-\textbf{C}ontext \textbf{L}earning paradigm with comparison-guidance for \textbf{A}udio \textbf{D}eepfake detection (\textbf{ICLAD}). The framework enables the use of audio language models (ALMs) for training-free generalization to unseen deepfakes and provides textual rationales on the detection outcome. At the core of ICLAD is a pairwise comparative reasoning strategy that guides the ALM to discover and filter hallucinations and deepfake-irrelevant acoustic attributes. The ALM works alongside a specialized deepfake detector, whereby a routing mechanism feeds out-of-distribution samples to the ALM. On in-the-wild datasets, ICLAD improves macro F1 over the specialized detector, with up to $2\times$ relative improvement. Further analysis demonstrates the flexibility of ICLAD and its potential for deployment on recent open-source ALMs.


[121] 2604.16781

Zak-OTFS: A Predictable Physical Layer for Communications and Sensing

This tutorial derives the mathematical foundations of what it means for a carrier waveform to be predictable and non-selective. We focus on Zak-OTFS, where each carrier waveform is a pulse in the delay-Doppler (DD) domain, formally a quasi-periodic localized function with specific periods along delay and Doppler. Viewed in the time domain, the Zak-OTFS carrier is realized as a pulse train modulated by a tone (termed a pulsone). We start by providing physical intuition, describing what it means for the Zak-OTFS carrier waveforms to be geometric modes of the Heisenberg-Weyl (HW) group of discrete delay and Doppler shifts that define the discrete-time communication model. In fact, we show that these geometric modes are common eigenvectors of a maximal commutative subgroup of our discrete HW group. When the channel delay spread is less than the delay period, and the channel Doppler spread is less than the Doppler period, we show that the Zak-OTFS input-output (I/O) relation is predictable and non-selective. Given the I/O response at one DD point in a frame, it is possible to predict the I/O response at all other points, without recourse to some mathematical model of the channel. While it may be intuitive that geometric modes of the HW group are predictable and non-selective wireless carriers, this is not a requirement. We provide a necessary and sufficient condition that depends on the ambiguity properties of the basis of carrier waveforms. In fact, we show that the structure of a pulse train modulated by a Hadamard matrix is common to several families of waveforms proposed for 6G, including Zak-OTFS, AFDM, OTSM and ODDM.
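For reference, the quasi-periodicity that makes a delay-Doppler function a valid Zak-OTFS carrier is commonly written as follows, with $\tau_p$ and $\nu_p$ the delay and Doppler periods (phase-factor conventions vary by author):

```latex
a(\tau + \tau_p,\, \nu) = e^{\,j 2\pi \nu \tau_p}\, a(\tau, \nu), \qquad
a(\tau,\, \nu + \nu_p) = a(\tau, \nu), \qquad \tau_p\, \nu_p = 1 .
```

As stated in the abstract, predictability and non-selectivity of the I/O relation then require the channel delay spread to be less than $\tau_p$ and the Doppler spread to be less than $\nu_p$.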


[122] 2604.16796

Generative Semantic Communication via Alternating Dual-Domain Posterior Sampling

Generative semantic communication (SemCom) harnesses pretrained generative priors to improve the perceptual quality of wireless image transmission. Existing generative SemCom receivers, however, rely on maximum a posteriori (MAP) estimation, which fundamentally cannot preserve the data distribution and thus limits achievable perceptual quality. Moreover, current diffusion-based approaches using single-domain guidance face significant limitations: latent-domain guidance is sensitive to channel noise, while image-domain guidance inherits decoder bias. Simply combining both domains simultaneously yields an overconfident pseudo-posterior. In this paper, we formulate semantic decoding as a Bayesian inverse problem and prove that posterior sampling achieves optimal perceptual quality by preserving the data distribution. Building on this insight, we propose alternating dual-domain posterior sampling (ADDPS), a diffusion-based SemCom receiver that alternately enforces latent-domain and image-domain consistency during the sampling process. This alternating strategy decomposes joint posterior sampling into simpler subproblems, avoiding gradient conflicts while retaining the complementary strengths of both domains. Experiments on FFHQ demonstrate that the proposed ADDPS achieves superior perceptual quality compared with existing methods.


[123] 2604.16802

A Stackelberg Game Framework with Drainability Guardrails for Pricing and Scaling in Multi-Tenant GPU Cloud Platforms

Modern Graphics Processing Unit (GPU)-backed services must satisfy strict latency service-level objectives (SLOs) while controlling spare-capacity cost. In multi-tenant GPU cloud platforms, this trade-off is inherently dynamic because workload demand is endogenous; specifically, pricing shapes the submissions of heterogeneous tenants, which subsequently impact congestion and delay. We formulate the joint pricing-and-scaling problem as a large-population Stackelberg game problem, and we derive an explicit equilibrium demand map. The resulting closed-loop model reveals a structural failure mode in which delay-insensitive workloads sustain a residual demand floor, making the backlog undrainable under bounded price and service capacity. This observation motivates a computable drainability guardrail that certifies uniformly negative drift in the residual-demand regime. For any fixed price-capacity pair satisfying the drainability guardrail, we establish a unique operating point and global convergence towards it under a checkable step-size condition. Building on this fixed-pair analysis, we further develop an optimizer-agnostic action shield for the full dynamic problem and show empirically that it improves safety and robustness for model-free reinforcement learning (RL) in this setting.


[124] 2604.16850

Refinement of Accelerated Demonstrations via Incremental Iterative Reference Learning Control for Fast Contact-Rich Imitation Learning

Fast execution of contact-rich manipulation is critical for practical deployment, yet providing fast demonstrations for imitation learning (IL) remains challenging: humans cannot demonstrate at high speed, and naively accelerating demonstrations alters contact dynamics and induces large tracking errors. We present a method to autonomously refine time-accelerated demonstrations by repurposing Iterative Reference Learning Control (IRLC) to iteratively update the reference trajectory from observed tracking errors. However, applying IRLC directly at high speed tends to produce larger early-iteration errors and less stable transients. To address this issue, we propose Incremental Iterative Reference Learning Control (I2RLC), which gradually increases the speed while updating the reference, yielding high-fidelity trajectories. We validate on real-robot whiteboard erasing and peg-in-hole tasks using a teleoperation setup with a compliance-controlled follower and a 3D-printed haptic leader. Both IRLC and I2RLC achieve up to 10x faster demonstrations with reduced tracking error; moreover, I2RLC improves spatial similarity to the original trajectories by 22.5% on average over IRLC across three tasks and multiple speeds (3x-10x). We then use the refined trajectories to train IL policies; the resulting policies execute faster than the demonstrations and achieve 100% success rates in the peg-in-hole task at both seen and unseen positions, with I2RLC-trained policies exhibiting lower contact forces than those trained on IRLC-refined demonstrations. These results indicate that gradual speed scheduling coupled with reference adaptation provides a practical path to fast, contact-rich IL.
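The core reference-update idea behind IRLC, adjusting the commanded trajectory by the observed tracking error across repeated executions, can be sketched on a toy first-order plant. The plant, gain, and shift compensation below are hypothetical choices for illustration, and the sketch omits the incremental speed scheduling that distinguishes I2RLC from IRLC.

```python
import numpy as np

def track(pole, r):
    """Toy first-order plant: y[t+1] = pole*y[t] + (1-pole)*r[t]."""
    y = np.zeros_like(r)
    for t in range(len(r) - 1):
        y[t + 1] = pole * y[t] + (1 - pole) * r[t]
    return y

def refine_reference(y_des, gain=0.8, iters=60, pole=0.5):
    """ILC-style refinement: r <- r + gain * e, with the error shifted by
    one step because r[t] influences y[t+1], not y[t]."""
    r = y_des.copy()
    for _ in range(iters):
        e = y_des - track(pole, r)
        r[:-1] += gain * e[1:]          # shift-compensated update
    return r, np.abs(y_des - track(pole, r)).max()

y_des = np.sin(np.linspace(0.0, np.pi, 50))   # desired trajectory
r_ref, err = refine_reference(y_des)
```

After a few dozen iterations the refined reference `r_ref` drives the lagging plant almost exactly onto `y_des`, whereas commanding `y_des` directly leaves a visible tracking error; speeding up the trajectory between iterations, as in I2RLC, keeps each refinement step in this well-behaved regime.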


[125] 2604.16914

Unified Ultrasound Intelligence Toward an End-to-End Agentic System

Clinical ultrasound analysis demands models that generalize across heterogeneous organs, views, and devices, while supporting interpretable workflow-level analysis. Existing methods often rely on task-wise adaptation, and joint learning may be unstable due to cross-task interference, making it hard to deliver workflow-level outputs in practice. To address these challenges, we present USTri, a tri-stage ultrasound intelligence pipeline for unified multi-organ, multi-task analysis. Stage I trains a universal generalist USGen on different domains to learn broad, transferable priors that are robust to device and protocol variability. To better handle domain shifts and reach task-aligned performance while preserving ultrasound shared knowledge, Stage II builds USpec by keeping USGen frozen and finetuning dataset-specific heads. Stage III introduces USAgent, which mimics clinician workflows by orchestrating USpec specialists for multi-step inference and deterministic structured reports. On the FMC\_UIA validation set, our model achieves the best overall performance across 4 task types and 27 datasets, outperforming state-of-the-art methods. Moreover, qualitative results show that USAgent produces clinically structured reports with high accuracy and interpretability. Our study suggests a scalable path to ultrasound intelligence that generalizes across heterogeneous ultrasound tasks and supports consistent end-to-end clinical workflows.


[126] 2604.16926

Test-Time Adaptation for EEG Foundation Models: A Systematic Study under Real-World Distribution Shifts

Electroencephalography (EEG) foundation models have shown strong potential for learning generalizable representations from large-scale neural data, yet their clinical deployment is hindered by distribution shifts across clinical settings, devices, and populations. Test-time adaptation (TTA) offers a promising solution by enabling models to adapt to unlabeled target data during inference without access to source data, a valuable property in healthcare settings constrained by privacy regulations and limited labeled data. However, its effectiveness for EEG remains largely underexplored. In this work, we introduce NeuroAdapt-Bench, a systematic benchmark for evaluating test-time adaptation methods on EEG foundation models under realistic distribution shifts. We evaluate representative TTA approaches from other domains across multiple pretrained foundation models, diverse downstream tasks, and heterogeneous datasets spanning in-distribution, out-of-distribution, and extreme modality shifts (e.g., Ear-EEG). Our results show that standard TTA methods yield inconsistent gains and often degrade performance, with gradient-based approaches particularly prone to heavy degradation. In contrast, optimization-free methods demonstrate greater stability and more reliable improvements. These findings highlight the limitations of existing TTA techniques in EEG, provide guidance for future development, and underscore the need for domain-specific adaptation strategies.


[127] 2604.16949

L1 Regularization Paths in Linear Models by Parametric Gaussian Message Passing

The paper considers the computation of L1 regularization paths in a state space setting, which includes L1-regularized Kalman smoothing, linear SVM, LASSO, and more. The paper proposes two new algorithms, which are duals of each other: the first applies to L1 regularization of independent variables, while the second applies to L1 regularization of dependent variables. The heart of the proposed algorithms is parametric Gaussian message passing (i.e., Kalman-type forward-backward recursions) in the pertinent factor graphs. The proposed methods are broadly applicable; they (usually) require only matrix multiplications, and their complexity can be competitive with prior methods in some cases.
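For context, the object any such path algorithm traces is the L1-regularized solution as a function of the regularization weight. A generic warm-started coordinate-descent sketch of a LASSO path (not the parametric message-passing method of the paper) looks like:

```python
import numpy as np

def soft(x, t):
    """Soft-threshold operator, the proximal map of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_path(A, y, lams, iters=200):
    """Coordinate-descent LASSO (0.5*||y - A x||^2 + lam*||x||_1),
    warm-started along a decreasing grid of lambda values."""
    n, p = A.shape
    col2 = (A**2).sum(axis=0)
    x = np.zeros(p)
    path = []
    for lam in lams:                     # decreasing regularization grid
        for _ in range(iters):
            for j in range(p):
                r = y - A @ x + A[:, j] * x[j]   # partial residual
                x[j] = soft(A[:, j] @ r, lam) / col2[j]
        path.append(x.copy())
    return np.array(path)
```

For an orthonormal design the path reduces to coordinate-wise soft-thresholding of `y`, which makes the sketch easy to sanity-check; the paper's contribution is to trace such paths exactly via Kalman-type recursions rather than by re-solving at each grid point.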


[128] 2604.16969

Hyperspectral Unmixing Hierarchies

Unmixing reveals the spatial distribution and spectral details of different constituents, called endmembers, in a hyperspectral image. Because unmixing has limited ground truth requirements, can accommodate mixed pixels, and is closely tied to light propagation, it is a uniquely powerful tool for analyzing hyperspectral images. However, spectral variability inhibits unmixing performance, the proper way to determine the number of endmembers is ambiguous, and the clarity of the endmembers degrades as more are included. Hierarchical structure is a possible solution to all three problems. Here, hierarchical unmixing is defined by imposing a hierarchical abundance sum constraint on Deep Nonnegative Matrix Factorization. Binary Linear Unmixing Tactile Hierarchies (BLUTHs) solve the hierarchical unmixing problem with a simple network architecture. Sparsity modulation unmixing growth tailors the topology of a BLUTH to each scene. The structure imposed by BLUTHs allows endmembers with varying levels of spectral contrast to be revealed, mitigating the challenge of spectral variability. The performance of BLUTHs exceeds state-of-the-art unmixing algorithms on laboratory scenes, particularly with regard to abundance estimation, while their performance remains competitive on remote sensing scenes. In addition, ocean color unmixing by BLUTHs is demonstrated on hyperspectral scenes from the HYPSO and PACE satellites.


[129] 2604.16991

Semi-definite programs for online control of nonlinear systems with stability guarantees

This paper develops a semidefinite-programming-based method for online feedback control of nonlinear systems using a state-dependent representation. We formulate sequences of time-varying SDPs whose optimal solutions jointly yield a stabilizing feedback controller and a Lyapunov certificate satisfying stability conditions and quadratic performance specifications. We further establish compact conditions certifying recursive feasibility of the resulting SDP sequences and derive estimates of the region of attraction. Numerical examples on representative nonlinear systems illustrate the flexibility and effectiveness of the proposed method.


[130] 2604.17109

A fully parallel densely connected probabilistic Ising machine with inertia for real-time applications

Ising machines -- special-purpose hardware for heuristically solving Ising optimization problems -- based on probabilistic bits (p-bits) have been established as a promising alternative to heuristic optimization algorithms run on conventional computers. However, it has -- until now -- been thought that Ising spins that are connected in probabilistic Ising machines cannot be updated in parallel without ruining the machine's solving ability. This has been a major challenge for using probabilistic Ising machines as fast solvers for densely connected problems. Here, we circumvent this by introducing a modified Ising spin dynamics with an added inertia term, and verify in algorithm simulations, FPGA hardware emulation, and FPGA experiments that it enables fully parallel, synchronous updates while improving rather than degrading success probability. We evaluated on various types of abstract (Max-Cut and Sherrington-Kirkpatrick-model) and application-derived (MIMO wireless detection) dense Ising benchmark instances. Performing fully parallel updates results in a speed advantage that grows faster than linearly with the number of spins, giving rise to large time-to-solution improvements for practical problem sizes. For both Max-Cut and the SK-1 model at a problem size of 200, our approach achieved an average speedup of $\approx 35\times$, with the best single-instance speedup reaching $150\times$. As an example of the practical utility of our approach in an application where speed is critical, we further show by co-designing the algorithm dynamics with the hardware implementation -- co-optimizing for solving ability and silicon resource usage -- that probabilistic Ising machines based on our approach satisfy the stringent solution quality and latency/throughput requirements for real-time MIMO detection in modern 5G cellular wireless networks while using a practically reasonable silicon area.
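The inertia idea can be illustrated by adding a self-bias term to fully synchronous p-bit updates: each spin's local field is augmented by a term proportional to its current state, discouraging the mass simultaneous flips that destabilize parallel dynamics. The inverse temperature, inertia strength, and tiny antiferromagnetic-ring instance below are hypothetical illustration choices, not the paper's hardware dynamics.

```python
import numpy as np

def parallel_pbits(J, beta=2.0, inertia=1.0, steps=3000, seed=1):
    """Fully synchronous p-bit dynamics for the Ising energy
    E(s) = -1/2 s^T J s, with an inertia term biasing each spin toward
    its current state so that parallel updates stay stable."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    s = rng.choice([-1.0, 1.0], size=n)
    best, best_e = s.copy(), -0.5 * s @ J @ s
    for _ in range(steps):
        field = J @ s + inertia * s                  # local field + inertia
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
        s = np.where(rng.random(n) < p_up, 1.0, -1.0)  # update ALL spins at once
        e = -0.5 * s @ J @ s
        if e < best_e:
            best, best_e = s.copy(), e
    return best, best_e
```

On an antiferromagnetic 4-cycle ($J_{ij}=-1$ on edges), the synchronous dynamics with inertia settle into the alternating ground state of energy $-4$; without the inertia term, synchronous updates on coupled spins are prone to the oscillatory behavior the abstract alludes to.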


[131] 2604.17199

Modeling, Control and Self-sensing of Dielectric Elastomer Soft Actuators: A Review

Dielectric elastomer actuators (DEAs) have garnered extensive attention, especially in soft robotic applications, over the past few decades owing to their light weight, large strain, fast response, and high energy density. However, because DEAs suffer from nonlinear elasticity, inherent viscoelastic creep, hysteresis, and vibrational dynamics, their modeling, control, and self-sensing are challenging, which hinders practical applications. Numerous studies have been conducted to address these challenges. In this review, various physics-based and phenomenological modeling methods for predicting the electromechanical response of DEAs are presented and discussed. Different control methods for DEAs are reviewed, classified into open-loop feedforward control, feedback control, feedforward-feedback control, and adaptive feedforward control. Physics-based and data-driven self-sensing methods for reconstructing the DEA displacement without the need for additional sensors are discussed. Finally, the existing problems and new opportunities for further study are summarized.


[132] 2604.17209

DREAM: Dynamic Retinal Enhancement with Adaptive Multi-modal Fusion for Expert Precision Medical Report Generation

Automating medical reports for retinal images requires a sophisticated blend of visual pattern recognition and deep clinical knowledge. Current Large Vision-Language Models (LVLMs) often struggle in specialized medical fields where data is scarce, leading to models that overfit and miss subtle but critical pathologies. To address this, we introduce DREAM (Dynamic Retinal Enhancement with Adaptive Multi-modal Fusion), a novel framework for high-fidelity medical report generation that excels even with limited data. DREAM employs a unique two-stage fusion mechanism that intelligently integrates visual data with clinical keywords curated by ophthalmologists. First, the Abstractor module maps image and keyword features into a shared space, enhancing visual data with pathology-relevant insights. Next, the Adaptor performs adaptive multi-modal fusion, dynamically weighting the importance of each modality using learnable parameters to create a unified representation. To ensure the model's outputs are semantically grounded in clinical reality, a Contrastive Alignment module aligns these fused representations with ground-truth medical reports during training. By combining medical expertise with an efficient fusion strategy, DREAM sets a new state-of-the-art on the DeepEyeNet benchmark, achieving a BLEU-4 score of 0.241, and further demonstrates strong generalization to the ROCO dataset.


[133] 2604.17213

Symplectic Inductive Bias for Data-Driven Target Reachability in Hamiltonian Systems

Inductive bias refers to restrictions on the hypothesis class that enable a learning method to generalize effectively from limited data. A canonical example in control is linearity, which underpins low sample-complexity guarantees for stabilization and optimal control. For general nonlinear dynamics, by contrast, guarantees often rely on smoothness assumptions (e.g., Lipschitz continuity) which, when combined with covering arguments, can lead to data requirements that grow exponentially with the ambient dimension. In this paper we argue that data-efficient nonlinear control demands exploiting inductive bias embedded in nature itself, namely, structure imposed by physical laws. Focusing on Hamiltonian systems, we leverage symplectic geometry and intrinsic recurrence on energy level sets to solve target reachability problems. Our approach combines the recurrence property with a recently proposed class of policies, called chain policies, which composes locally certified trajectory segments extracted from demonstrations to achieve target reachability. We provide sufficient conditions for reachability under this construction and show that the resulting data requirements depend on explicit geometric and recurrence properties of the Hamiltonian rather than the state dimension.


[134] 2604.17222

Region-Affinity Attention for Whole-Slide Breast Cancer Classification in Deep Ultraviolet Imaging

Breast cancer diagnosis demands rapid and precise tools, yet traditional histopathological methods often fall short in intra-operative settings. Deep Ultraviolet (DUV) fluorescence imaging emerges as a transformative approach, offering high-contrast, label-free visualization of whole-slide images (WSIs) with unprecedented detail, surpassing conventional hematoxylin and eosin (H&E) staining in speed and resolution. However, existing deep learning methods for breast cancer classification, predominantly patch-based, fragment spatial context and incur significant preprocessing overhead, limiting their clinical utility. Moreover, standard attention mechanisms, such as Spatial, Squeeze-and-Excitation, Global Context and Guided Context Gating, fail to fully exploit the rich, multi-scale regional relationships inherent in DUV-WSI data, often prioritizing generic feature recalibration over diagnostic specificity. This study introduces a novel Region-Affinity Attention mechanism tailored for DUV-WSI breast cancer classification, processing entire slides without patching to preserve spatial integrity. By modeling local neighbor distances and constructing a full affinity matrix, our method dynamically highlights diagnostically relevant regions, augmented by a contrastive loss to enhance feature discriminability. Evaluated on a dataset of 136 DUV-WSI samples, our approach achieves an accuracy of 92.67 +/- 0.73% and an AUC of 95.97%, outperforming existing attention methods.


[135] 2604.17376

Towards Generalizable Deepfake Image Detection with Vision Transformers

Detecting deepfake images is increasingly challenging because of the fast evolution of modern generative models and the poor generalization capability of existing methods. In this paper, we use an ensemble of fine-tuned vision transformers (DINOv2, AIMv2, and OpenCLIP's ViT-L/14) to create a generalizable deepfake detection method. We use the DF-Wild dataset released as part of the IEEE SP Cup 2025 because it covers a challenging and diverse set of manipulations and generation techniques. We began our experiments with CNN classifiers trained on spatial features. Experimental results show that our ensemble outperforms individual models and strong CNN baselines, achieving an AUC of 96.77% and an Equal Error Rate (EER) of just 9% on the DF-Wild test set, beating the state-of-the-art deepfake detection algorithm Effort by 7.05% and 8% in AUC and EER, respectively. This was the winning solution for the IEEE SP Cup, presented at ICASSP 2025.
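For reference, the EER reported above is the operating point where the false-acceptance and false-rejection rates cross. A generic threshold-sweep sketch (with hypothetical scores, not the competition data):

```python
def equal_error_rate(scores, labels):
    """EER via a threshold sweep: start with an accept-everything
    threshold, raise it past each score, and keep the point where
    the false-acceptance and false-rejection rates are closest.
    labels: 1 = genuine/positive, 0 = fake/negative (both classes
    must be non-empty)."""
    pairs = sorted(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    fa, fr = neg, 0                       # accept-all starting point
    eer = (fa / neg + fr / pos) / 2
    gap = abs(fa / neg - fr / pos)
    for s, y in pairs:                    # raise threshold past score s
        if y == 1:
            fr += 1                       # a positive is now rejected
        else:
            fa -= 1                       # a negative is now rejected
        g = abs(fa / neg - fr / pos)
        if g < gap:
            gap, eer = g, (fa / neg + fr / pos) / 2
    return eer
```

With one negative scored above one positive out of three each, the crossing point gives FAR = FRR = 1/3, i.e. an EER of about 33%.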


[136] 2604.17417

Project resilience as network robustness

Engineering projects are the result of the combined effort of their members. Yet, it has been documented that labor division within projects is unevenly distributed: some project members are specialists undertaking only a few tasks, whereas others are generalists responsible for the success of many tasks. Moreover, the latter are often facilitators of project integration. Such a workload distribution prompts one question: how resilient is a project to the loss of key personnel? Far from being a theoretical problem, the reliance of a project on a few key people can lead to severe economic losses and delays. We argue that current methods to estimate such a risk are unsatisfactory: some offer a best-case estimate and are therefore too optimistic; others fail to capture project fragmentation, leading to biased estimates and unrealistic consequences in many settings. In this paper, we develop a novel method to assess project vulnerability through the lens of network robustness. We compare our method against existing alternatives and show that it offers better and more consistent estimates of project resilience to personnel loss.
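The network-robustness view can be illustrated by removing members in descending-degree order (generalists first) and tracking the size of the largest connected component of the collaboration graph. The star-shaped toy project below is hypothetical and the metric is a generic robustness profile, not the paper's proposed measure.

```python
from collections import defaultdict

def largest_component(nodes, edges):
    """Size of the largest connected component of the subgraph
    induced by `nodes`, via iterative depth-first search."""
    adj = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def robustness_profile(nodes, edges):
    """Remove members in descending-degree order and record the
    largest-component fraction after each removal."""
    deg = {u: 0 for u in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    order = sorted(nodes, key=lambda u: -deg[u])
    n, alive, profile = len(nodes), set(nodes), []
    for u in order:
        alive.remove(u)
        profile.append(largest_component(alive, edges) / n)
    return profile
```

In a star-shaped project (one generalist hub, four specialists), removing the hub first immediately shatters the graph into isolated members, which is precisely the fragmentation effect the abstract argues existing estimates miss.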


[137] 2604.17435

MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation

Recent Speech-to-Speech Translation (S2ST) systems achieve strong semantic accuracy yet consistently strip away non-verbal vocalizations (NVs) such as laughter and crying that convey pragmatic intent, severely limiting real-world utility. We address this via three contributions. First, we propose a synthesis pipeline for building scalable expressive datasets to overcome the data scarcity limitation. Second, we propose MoVE, a Mixture-of-LoRA-Experts architecture with expressive-specialized adapters and a soft-weighting router that blends experts for capturing hybrid expressive states. Third, we show pretrained AudioLLMs enable striking data efficiency: 30 minutes of curated data is enough for strong performance. On English-Chinese S2ST, compared with strong baselines, MoVE reproduces target NVs in 76% of cases and achieves the highest human-rated naturalness and emotional fidelity among all compared systems, whereas existing S2ST systems preserve at most 14% of NVs.


[138] 2604.17457

Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

Dynamic programming is one of the most fundamental methodologies for solving Markov decision problems. Among its many variants, Q-value iteration (Q-VI) is particularly important due to its conceptual simplicity and its classical contraction-based convergence guarantee. Despite the central role of this contraction property, it does not fully reveal the geometric structure of the Q-VI trajectory. In particular, when one is interested not only in the final limit $Q^*$ but also in when the induced greedy policy becomes effectively optimal, the standard contraction argument provides only a coarse characterization. To formalize this notion, we denote by $\mathcal X^*$ the set of $Q$-functions whose corresponding tie-broken greedy policies are optimal, referred to as the practically optimal solution set (POS). In this paper, we revisit discounted Q-VI through the lens of switching system theory and derive new geometric insights into its behavior. In particular, we show that although Q-VI does not reach $Q^*$ in finite time in general, it identifies the optimal action class in finite time. Furthermore, we prove that the distance from the iterate to a particular subset $\mathcal X_1$ of $\mathcal X^*$ decays exponentially at a rate governed by the joint spectral radius (JSR) of a restricted switching family. This rate can be strictly faster than the standard $\gamma$ rate when the restricted JSR is strictly smaller than $\gamma$, while the convergence of the entire $Q$-function to $Q^*$ can still be dominated by the slower $\gamma$ mode, where $\gamma$ denotes the discount factor. These results reveal a two-stage geometric behavior of Q-VI: a fast convergence toward $\mathcal X_1$, followed by a slower convergence toward $Q^*$ in general.
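The two-stage behavior, with the greedy policy fixed early while the value function is still converging at rate $\gamma$, is easy to observe on a toy MDP. A minimal Q-VI sketch (the two-state chain below is a hypothetical illustration, not an example from the paper):

```python
import numpy as np

def q_vi(P, R, gamma, iters):
    """Q-value iteration Q <- R + gamma * P V, V = max_a Q, recording the
    first iteration at which the greedy policy stops changing.
    P: (S, A, S) transition tensor, R: (S, A) rewards."""
    S, A = R.shape
    Q = np.zeros((S, A))
    greedy_fixed_at, prev_pi = None, None
    for k in range(1, iters + 1):
        V = Q.max(axis=1)
        Q = R + gamma * (P @ V)                   # Bellman optimality update
        pi = Q.argmax(axis=1)                     # tie-broken greedy policy
        if prev_pi is not None and np.array_equal(pi, prev_pi):
            if greedy_fixed_at is None:
                greedy_fixed_at = k               # policy stopped changing here
        else:
            greedy_fixed_at = None
        prev_pi = pi
    return Q, pi, greedy_fixed_at

# toy 2-state, 2-action MDP: action 0 in state 0 self-loops (reward 1),
# action 1 in state 1 jumps to state 0 (reward 0.5); optimal pi = (0, 1)
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
R = np.array([[1.0, 0.0], [0.0, 0.5]])
Q, pi, fixed_at = q_vi(P, R, gamma=0.9, iters=50)
```

Here the greedy policy is already optimal after the second sweep, while $Q(0,0)$ still needs dozens of iterations to approach its limit $1/(1-\gamma) = 10$, illustrating the fast policy-identification stage followed by the slower $\gamma$-rate value convergence.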


[139] 2604.17476

Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading

Multi-user virtual reality enables immersive interaction. However, rendering avatars for numerous participants on each headset incurs prohibitive computational overhead, limiting scalability. We introduce Privatar, a framework that offloads avatar reconstruction from the headset to untrusted devices within the same local network while safeguarding against adversaries capable of intercepting offloaded data. Privatar's key insight is that domain-specific knowledge of avatar reconstruction enables provably private offloading at minimal cost. (1) System level. We observe that avatar reconstruction is frequency-domain decomposable via the block discrete cosine transform (BDCT) with negligible quality drop, and propose Horizontal Partitioning (HP), which keeps high-energy frequency components on-device and offloads only low-energy components. HP thus offloads local computation while restricting information leakage to low-energy subsets. (2) Privacy level. For individually offloaded, multi-dimensional signals without aggregation, worst-case local differential privacy requires prohibitive noise, ruining utility. We observe that each user's expression distribution changes slowly over time and is trackable online, and hence propose Distribution-Aware Minimal Perturbation (DAMP). DAMP minimizes noise based on each user's expression distribution, significantly reducing its effect on utility while retaining a formal privacy guarantee. Combined, HP provides empirical privacy against expression-identification attacks, and DAMP augments it with a formal guarantee against arbitrary adversaries. On a Meta Quest Pro, Privatar supports 2.37x more concurrent users at 6.5% higher reconstruction loss and 9% energy overhead, providing a better throughput-loss Pareto frontier than quantization, sparsity, and local-reconstruction baselines, and stays robust against both empirical and NN-based attacks.
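The Horizontal Partitioning idea can be sketched with a plain 8x8 block DCT: low-frequency coefficients, which carry most of the energy for smooth signals, stay on-device, while the low-energy remainder is offloaded. The block size, split threshold, and test signal below are illustrative assumptions, not Privatar's actual parameters:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are frequency vectors)."""
    k = np.arange(n)[:, None]
    C = np.cos(np.pi * (2 * np.arange(n) + 1) * k / (2 * n))
    C[0] *= np.sqrt(1 / n)
    C[1:] *= np.sqrt(2 / n)
    return C

def horizontal_partition(block, keep=2):
    """Split a block's DCT coefficients into a local (low-frequency,
    high-energy) part and an offloaded (high-frequency, low-energy) part."""
    C = dct_matrix(block.shape[0])
    coeff = C @ block @ C.T
    mask = np.zeros_like(coeff, dtype=bool)
    mask[:keep, :keep] = True                 # top-left = lowest frequencies
    local = np.where(mask, coeff, 0.0)
    offload = np.where(mask, 0.0, coeff)
    recon = C.T @ (local + offload) @ C       # exact: DCT is orthonormal
    return local, offload, recon

blk = np.add.outer(np.arange(8.0), np.arange(8.0))   # smooth ramp signal
local, offload, recon = horizontal_partition(blk)
print(np.allclose(recon, blk),
      (local**2).sum() > (offload**2).sum())
```

For smooth inputs almost all the energy lands in the retained low-frequency corner, which is exactly why offloading only the remainder limits what an interceptor can learn.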


[140] 2604.17567

Multi-Camera Self-Calibration in Sports Motion Capture: Leveraging Human and Stick Poses

Multi-camera systems are widely employed in sports to capture the 3D motion of athletes and equipment, yet calibrating their extrinsic parameters remains costly and labor-intensive. We introduce an efficient, tool-free method for multi-camera extrinsic calibration tailored to sports involving stick-like implements (e.g., golf clubs, bats, hockey sticks). Our approach jointly exploits two complementary cues from synchronized multi-camera videos: (i) human body keypoints with unknown metric scale and (ii) a rigid stick-like implement of known length. We formulate a three-stage optimization pipeline that refines camera extrinsics, reconstructs human and stick trajectories, and resolves global scale via the stick-length constraint. Our method achieves accurate extrinsic calibration without dedicated calibration tools. To benchmark this task, we present the first dataset for multi-camera self-calibration in stick-based sports, consisting of synthetic sequences across four sports categories with 3 to 10 cameras. Comprehensive experiments demonstrate that our method delivers SOTA performance, achieving low rotation and translation errors. Our project page: this https URL.


[141] 2604.17603

Decentralized Stability-Constrained Optimal Power Flow for Inverter-Based Power Systems

Future inverter-dominated power systems feature higher variability and more stressed operating conditions, which motivates the consideration of stability in operational settings. Existing approaches to stability-constrained OPF often rely on eigenvalue calculation, global model information, or dynamic evaluation inside the optimization formulation, which are computationally intensive and difficult to scale. This paper proposes the first decentralized stability-constrained OPF framework for inverter-based power systems. The key novelty lies in the incorporation of a class of algebraic decentralized small-signal stability criteria that admits tractable representations in steady-state variables and is therefore suitable for optimization. The decentralized stability condition is based on local voltage differences and enables a clear theoretical and practical economic interpretation of the stability contribution from each inverter. We define a Nodal Stability Shadow Price (NSSP) for each inverter, and characterize the role of these stability constraints through their associated shadow prices, enabling a nodal interpretation of their economic impacts. It is proved that under active-power-only objectives in lossless networks, binding stability constraints may occur but will admit zero shadow prices if all other operational constraints are inactive. Most importantly, we reveal the importance of considering the opportunity cost of reactive power for inverter-based resources (IBRs) with limited capacity. When reactive power costs are considered, stability constraints can carry strictly positive shadow prices and have meaningful economic impacts.


[142] 2604.17686

Steady-state Based Approach to Online Non-stochastic Control

We study the problem of online non-stochastic control (ONC), which is the control of a linear system under adversarial disturbances and adversarial cost functions, with the aim of minimizing the total cost incurred. A recent line of literature in ONC develops algorithms that enjoy sublinear regret with respect to a benchmark based on the set of steady-states that are attainable by a constant input. In this work, we extend this research direction by giving an algorithm that enjoys $\mathcal{O}(\sqrt{T})$ regret with respect to a richer benchmark set, namely the set of steady-states attainable under an \emph{affine controller}. Since this benchmark substantially broadens the comparison class, it provides significantly stronger performance guarantees. Our proposed algorithm combines a Follow-The-Perturbed-Leader-style online non-convex optimization approach with a batching method that maintains stability despite changing policies. Although our proposed algorithm requires solving non-convex subproblems, we show that an approximate solution to this subproblem is sufficient to ensure $\mathcal{O}(\sqrt{T})$ regret. Furthermore, numerical experiments show that our algorithm enjoys lower total cost and similar computation to existing methods in certain settings.


[143] 2604.18289

Relative State Estimation using Event-Based Propeller Sensing

Autonomous swarms of multiple Unmanned Aerial Vehicles (UAVs) require accurate and fast relative state estimation. Although monocular frame-based camera methods perform well in ideal conditions, they are slow, suffer from scale ambiguity, and often struggle in visually challenging conditions. Event cameras address these challenges by providing low latency, high dynamic range, and microsecond-level temporal resolution. This paper proposes a framework for relative state estimation of quadrotors using event-based propeller sensing. The propellers in the event stream are tracked by detection to extract regions of interest, and the event streams in these regions are processed in temporal chunks to estimate per-propeller frequencies. These frequency measurements drive a kinematic state estimation module as a thrust input, while camera-derived position measurements provide the update step. Additionally, we use geometric primitives derived from the event streams to estimate the quadrotor's orientation, fitting an ellipse over a propeller and backprojecting it to recover the body-frame tilt axis. Existing event-based approaches to quadrotor state estimation use propeller frequency only in simulated flight sequences. Our approach estimates propeller frequency with under 3% error on a test dataset of five real-world outdoor flight sequences, providing a method for decentralized relative localization in multi-robot systems using event cameras.
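Recovering a propeller's frequency from an event stream can be sketched by binning event timestamps into a count signal and locating the dominant spectral peak. The synthetic two-blade propeller, burst size, and jitter below are illustrative assumptions, not the paper's processing chain:

```python
import numpy as np

# Synthetic event stream: a two-blade propeller at 85 Hz rotation triggers
# event bursts at the blade-passage frequency, 2 x 85 = 170 Hz.
rng = np.random.default_rng(0)
rot_hz, blades, duration = 85.0, 2, 0.5
passage_hz = rot_hz * blades
base = np.arange(0, duration, 1 / passage_hz)
t_events = np.repeat(base, 20) + rng.normal(0, 2e-4, base.size * 20)

def propeller_frequency(t_events, duration, bin_dt=1e-3):
    """Estimate blade-passage frequency from event timestamps: bin the
    stream into a count signal and take the dominant FFT peak."""
    counts, _ = np.histogram(t_events, bins=int(duration / bin_dt),
                             range=(0.0, duration))
    spec = np.abs(np.fft.rfft(counts - counts.mean()))  # remove DC first
    freqs = np.fft.rfftfreq(len(counts), d=bin_dt)
    return freqs[spec.argmax()]

est = propeller_frequency(t_events, duration)
print(est)  # close to 170 Hz
```

Dividing the recovered blade-passage frequency by the blade count then gives the rotation rate used as the thrust input.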


[144] 2604.18343

DAG-STL: A Hierarchical Framework for Zero-Shot Trajectory Planning under Signal Temporal Logic Specifications

Signal Temporal Logic (STL) is a powerful language for specifying temporally structured robotic tasks. Planning executable trajectories under STL constraints remains difficult when system dynamics and environment structure are not analytically available. Existing methods typically either assume explicit models or learn task-specific behaviors, limiting zero-shot generalization to unseen STL tasks. In this work, we study offline STL planning under unknown dynamics using only task-agnostic trajectory data. Our central design philosophy is to separate logical reasoning from trajectory realization. We instantiate this idea in DAG-STL, a hierarchical framework that converts long-horizon STL planning into three stages. It first decomposes an STL formula into reachability and invariance progress conditions linked by shared timing constraints. It then allocates timed waypoints using learned reachability-time estimates. Finally, it synthesizes trajectories between these waypoints with a diffusion-based generator. This decomposition--allocation--generation pipeline reduces global planning to shorter, better-supported subproblems. To bridge the gap between planning-level correctness and execution-level feasibility, we further introduce a rollout-free dynamic consistency metric, an anytime refinement search procedure for improving multiple allocation hypotheses under finite budgets, and a hierarchical online replanning mechanism for execution-time recovery. Experiments in Maze2D, OGBench AntMaze, and the Cube domain show that DAG-STL substantially outperforms direct robustness-guided diffusion on complex long-horizon STL tasks and generalizes across navigation and manipulation settings. In a custom environment with an optimization-based reference, DAG-STL recovers most model-solvable tasks while retaining a clear computational advantage over direct optimization based on the explicit system model.


[145] 2604.18379

Forecasting Ionospheric Irregularities on GNSS Lines of Sight Using Dynamic Graphs with Ephemeris Conditioning

Most data-driven ionospheric forecasting models operate on gridded products, which do not preserve the time-varying sampling structure of satellite-based sensing. We instead model the ionosphere as a dynamic graph over ionospheric pierce points (IPPs), with connectivity that evolves as satellite positions change. Because satellite trajectories are predictable, the graph topology over the forecast horizon can be constructed in advance. We exploit this property to condition forecasts on the future graph structure, which we term ephemeris conditioning. This enables prediction on lines of sight that appear only in the forecast horizon. We evaluate our framework on multi-GNSS (Global Navigation Satellite System) data from a co-located receiver pair in Singapore spanning January 2023 through April 2025. The task is to forecast Rate of TEC Index (ROTI)-defined irregularities at 5-minute cadence up to 2 hours ahead as binary probabilistic classification per node. The resulting model, IonoDGNN, achieves a Brier Skill Score (BSS) of 0.49 and a precision-recall area under the curve (PR-AUC) of 0.75, improving over persistence by 35\% in BSS and 52\% in PR-AUC, with larger gains at longer lead times. Ablations confirm that graph structure and ephemeris conditioning each contribute meaningfully, with conditioning proving essential for satellites that rise during the forecast horizon (receiver operating characteristic AUC: 0.95 vs.\ 0.52 without). Under simulated coverage dropout, the model retains predictive skill on affected nodes through spatial message passing from observed neighbors. These results suggest that dynamic graph forecasting on evolving lines of sight is a viable alternative to grid-based representations for ionospheric irregularity forecasting. The model and evaluation code will be released upon publication.
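The reported Brier Skill Score compares the model's Brier score against a reference forecast (persistence, in the paper's evaluation). A minimal sketch with hypothetical probabilities:

```python
import numpy as np

def brier_skill_score(p_model, p_ref, y):
    """BSS = 1 - BS_model / BS_ref; positive values beat the reference."""
    bs_model = np.mean((p_model - y) ** 2)
    bs_ref = np.mean((p_ref - y) ** 2)
    return 1.0 - bs_model / bs_ref

# Toy check: a sharp, well-placed forecast scores a positive BSS
# against an uninformative 0.5 reference.
y = np.array([1, 0, 1, 1, 0, 0], dtype=float)
p_model = np.array([0.9, 0.2, 0.8, 0.7, 0.1, 0.3])
p_ref = np.full(6, 0.5)
print(round(brier_skill_score(p_model, p_ref, y), 3))  # 0.813
```

A BSS of 0.49, as reported, means the model roughly halves the reference's Brier score.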


[146] 2604.18391

Feedforward Phase Noise Compensation for Intersymbol Interference Channels

A non-iterative phase noise compensation method based on the sum-product algorithm (SPA) is applied to the outputs of intersymbol interference (ISI) channels. The outputs are modeled as independent Gaussian random variables, and the receiver applies mismatched processing with von Mises statistics. The performance is compared with that of linear minimum-mean-square-error filtering. The SPA achieves higher information rates at similar complexity for three channel types: ISI-free, standard single-mode fiber, and multipath channels with orthogonal frequency-division multiplexing.


[147] 2604.18438

Scalable Physics-Informed Neural Differential Equations and Data-Driven Algorithms for HVAC Systems

We present a scalable, data-driven simulation framework for large-scale heating, ventilation, and air conditioning (HVAC) systems that couples physics-informed neural ordinary differential equations (PINODEs) with differential-algebraic equation (DAE) solvers. At the component level, we learn heat-exchanger dynamics using an implicit PINODE formulation that predicts conserved quantities (refrigerant mass $M_r$ and internal energy $E_\text{hx}$) as outputs, enabling physics-informed training via automatic differentiation of mass/energy balances. Stable long-horizon prediction is achieved through gradient-stabilized latent evolution with gated architectures and layer normalization. At the system level, we integrate learned components with DAE solvers (IDA and DASSL) that explicitly enforce junction constraints (pressure equilibrium and mass-flow consistency), and we use Bayesian optimization to tune solver parameters for accuracy--efficiency trade-offs. To reduce residual system-level bias, we introduce a lightweight corrector network trained on short trajectory segments. Across dual-compressor and scaled network studies, the proposed approach attains multi-fold speedups over high-fidelity simulation while keeping errors low (MAPE below a few percent) and scales to systems with up to 32 compressor--condenser pairs.


[148] 2604.18489

Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints

Large Language Models (LLMs) show promise in lyric-to-melody generation, but models trained with Supervised Fine-Tuning (SFT) often produce musically implausible melodies with issues like poor rhythm and unsuitable vocal ranges, a phenomenon we term "constraint violation". To address this, we propose a novel alignment framework that instills musical knowledge without human annotation. We define rule-based musical constraints to automatically generate a preference dataset from an SFT model's outputs. The model is then aligned through a sequential process, first using Direct Preference Optimization (DPO) on paired preference data, followed by Kahneman-Tversky Optimization (KTO) on unpaired negative samples. Experimental results demonstrate that our aligned model substantially reduces rule violations and outperforms strong baselines in both objective and subjective evaluations, generating melodies with substantially improved musicality and coherence. An interactive demo with audio comparisons is available at this https URL.
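The annotation-free preference construction can be sketched as rule checks that sort an SFT model's samples into chosen/rejected pairs. The melody encoding, vocal-range limits, and leap rule below are hypothetical stand-ins for the paper's actual musical constraints:

```python
# Melodies as (MIDI pitch, duration-in-beats) tuples; all rules illustrative.
VOCAL_RANGE = (55, 79)          # roughly G3-G5, an assumed singable range

def violations(melody):
    """Count rule violations: out-of-range pitches and leaps over an octave."""
    v = sum(not (VOCAL_RANGE[0] <= p <= VOCAL_RANGE[1]) for p, _ in melody)
    v += sum(abs(b[0] - a[0]) > 12 for a, b in zip(melody, melody[1:]))
    return v

def make_preference_pairs(samples):
    """Pair rule-satisfying samples (chosen) with violating ones (rejected),
    yielding DPO-style preference data without human annotation."""
    good = [m for m in samples if violations(m) == 0]
    bad = [m for m in samples if violations(m) > 0]
    return list(zip(good, bad))

s1 = [(60, 1.0), (62, 0.5), (64, 0.5)]   # in range, stepwise motion
s2 = [(60, 1.0), (90, 0.5)]              # out of range plus a huge leap
pairs = make_preference_pairs([s1, s2])
print(len(pairs))  # 1
```

Unpaired violating samples with no matching clean counterpart are exactly the kind of negatives the KTO stage can still exploit.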


[149] 2604.18492

Barrier-enforced multi-objective optimization for direct point and sharp interval forecasting

This paper proposes a multi-step probabilistic forecasting framework that uses a single neural-network-based model to generate simultaneous point and interval forecasts. Our approach ensures non-crossing prediction intervals (PIs) through a model structure that strictly satisfies a target prediction interval coverage probability (PICP) while maximizing sharpness. Unlike existing methods that rely on manual weight tuning for scalarized loss functions, we treat point and PI forecasting as a multi-objective optimization problem, utilizing multi-gradient descent to adaptively select optimal weights. Key innovations include a new PI loss function based on an extended log-barrier with an adaptive hyperparameter to guarantee coverage, a hybrid architecture featuring a shared temporal model with horizon-specific submodels, and a tailored training strategy. The proposed loss is scale-independent and universally applicable; combined with our training algorithm, the framework eliminates trial-and-error hyperparameter tuning for balancing multiple objectives. Validated on an intra-day solar irradiance forecasting application, results demonstrate that our proposed loss consistently outperforms those in the current literature by achieving target coverage with the narrowest PI widths. Furthermore, when compared against LSTM encoder-decoder and Transformer architectures, including those augmented with Chronos foundation models, our method remains highly competitive and can be seamlessly adapted to any deep learning architecture.
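The shape of such a barrier-enforced PI loss can be sketched as mean width plus an extended log-barrier on the coverage deficit. The barrier form below is one common "relaxed" construction and an assumption on our part, not necessarily the paper's exact loss:

```python
import numpy as np

def relaxed_log_barrier(z, t=5.0):
    """Extended log-barrier: -log(-z)/t on the feasible side (z < 0),
    with a matching linear extension near/above zero so gradients stay
    finite even when the constraint is violated."""
    thresh = -1.0 / t**2
    zc = np.minimum(z, thresh)                  # keep the log argument valid
    return np.where(z <= thresh,
                    -np.log(-zc) / t,
                    t * z - np.log(1.0 / t**2) / t + 1.0 / t)

def pi_loss(lower, upper, y, target_picp=0.9, lam=1.0):
    """Sharpness (mean PI width) plus a barrier on the coverage deficit."""
    width = np.mean(upper - lower)
    covered = np.mean((y >= lower) & (y <= upper))
    return width + lam * relaxed_log_barrier(target_picp - covered)

y = np.array([0.0, 1.0, 2.0])
lo, hi = y - 0.5, y + 0.5
good = pi_loss(lo, hi, y)             # full coverage: only width + small barrier
bad = pi_loss(lo + 2.0, hi + 2.0, y)  # zero coverage: the barrier dominates
print(good < bad)  # True
```

The barrier grows steeply as empirical coverage falls below the target, which is what pushes training toward intervals that meet the PICP before narrowing them.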


[150] 2604.18546

Wasserstein Distributionally Robust Risk-Sensitive Estimation via Conditional Value-at-Risk

We propose a distributionally robust approach to risk-sensitive estimation of an unknown signal x from an observed signal y. The unknown signal and observation are modeled as random vectors whose joint probability distribution is unknown, but assumed to belong to a given type-2 Wasserstein ball of distributions, termed the ambiguity set. The performance of an estimator is measured according to the conditional value-at-risk (CVaR) of the squared estimation error. Within this framework, we study the problem of computing affine estimators that minimize the worst-case CVaR over all distributions in the given ambiguity set. As our main result, we show that, when the nominal distribution at the center of the Wasserstein ball is finitely supported, such estimators can be exactly computed by solving a tractable semidefinite program. We evaluate the proposed estimators on a wholesale electricity price forecasting task using real market data and show that they deliver lower out-of-sample CVaR of squared error compared to existing methods.
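The CVaR criterion used to score estimators has a simple empirical form: the average of the losses at or above the value-at-risk. The sketch below uses a common tail-average estimator on hypothetical squared errors, not the paper's semidefinite program:

```python
import numpy as np

def cvar(losses, alpha=0.9):
    """Empirical CVaR_alpha: mean of the losses at or above the empirical
    alpha-quantile (the value-at-risk)."""
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

# Squared errors of two hypothetical estimators; the lower-variance one
# should carry lower tail risk under this criterion.
rng = np.random.default_rng(0)
err_a = rng.normal(0.0, 1.0, 10_000) ** 2
err_b = rng.normal(0.0, 1.2, 10_000) ** 2
print(cvar(err_a) < cvar(err_b))  # True
```

Minimizing the worst case of this quantity over a Wasserstein ball is what the paper shows reduces to a tractable semidefinite program for affine estimators.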


[151] 2108.05287

Semantic Mobile Base Station Placement

Location of Base Stations (BS) in mobile networks plays an important role in coverage and received signal strength. As the Internet of Things (IoT), autonomous vehicles, and smart cities evolve, wireless network coverage will play an important role in ensuring seamless connectivity. Due to the use of higher carrier frequencies, blockages cause communication to be primarily Line of Sight (LoS), increasing the importance of base station placement. In this paper, we propose a novel placement pipeline in which we perform semantic segmentation of aerial drone imagery using DeepLabv3+ and create its 2.5D model with the help of a Digital Surface Model (DSM). This is used along with the Vienna simulator to find the best locations for deploying base stations by formulating the problem as a multi-objective function and solving it with the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). Cases with and without previously deployed base stations are considered. We evaluate base station deployment based on Signal to Interference Noise Ratio (SINR) coverage probability and user downlink throughput, followed by a comparison with other base station placement methods and the benefits offered by our approach. Our work is novel in that it considers scenarios with high ground elevation and building-density variation, and shows that irregular BS placement improves coverage.


[152] 2310.07464

Multi-Beholder: Biomarker Prediction for Low-Grade Glioma with Multiple Instance Learning and One-Class Classification

Biomarker detection is an indispensable part of the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, named Multi-Biomarker Histomorphology Discoverer (Multi-Beholder), to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images. Specifically, Multi-Beholder incorporates one-class classification into the multiple instance learning framework to achieve accurate instance-level pseudo-labeling, thereby complementing slide-level labels and improving prediction performance. Multi-Beholder demonstrates high performance on two LGG cohorts with diverse races and scanning protocols, with area under the receiver operating characteristic curve up to 0.973 on the internal-validated TCGA-LGG dataset and 0.820 on the external-validated Xiangya cohort. Moreover, the interpretability of Multi-Beholder allows for discovering quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression. Code can be accessed at this https URL.


[153] 2404.06752

A Necessary and Sufficient Condition for Local Synchronization in Nonlinear Oscillator Networks

Determining conditions on the coupling strength for synchronization in networks of interconnected oscillators is a challenging problem in nonlinear dynamics. While sophisticated mathematical methods have been used to derive such conditions, they are usually only sufficient and/or based on numerical methods. We address the gap between the sufficient coupling strength and numerically observed thresholds using Lyapunov-Floquet theory and the Master Stability Function framework. We show that a positive coupling strength is a necessary and sufficient condition for local synchronization in a network of identical oscillators coupled linearly and in full-state fashion. For partial-state coupling, we show that a positive coupling constant results in an asymptotic contraction of the trajectories in state space, which yields synchronization for two-dimensional oscillators. We extend the results to networks with non-identical coupling over directed graphs and show that positive coupling constants are a sufficient condition for synchronization. These theoretical results are validated using numerical simulations and experimental implementations. Our results contribute to bridging the gap between theoretically derived sufficient coupling strengths and numerically observed ones.


[154] 2407.11256

Controlled Invariant Sets for Gaussian Process State Space Models

We compute probabilistic controlled invariant sets for nonlinear systems using Gaussian process state space models, which are data-driven models that account for unmodeled and unknown nonlinear dynamics. We propose a semidefinite programming scheme for designing state-feedback controllers that maximize the probability of the trajectories staying within a probabilistic controlled invariant set while satisfying input constraints. The results are validated on a quadrotor, both in simulation and on a physical platform.


[155] 2410.21160

KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation

Background and Objective: In the realm of ophthalmic imaging, accurate vascular segmentation is paramount for diagnosing and managing various eye diseases. Contemporary deep learning-based vascular segmentation models rival human accuracy but still face substantial challenges in accurately segmenting minuscule blood vessels in neural network applications. Due to the necessity of multiple downsampling operations in CNN models, fine details from high-resolution images are inevitably lost. The objective of this study is to design a structure to capture delicate and small blood vessels. Methods: To address these issues, we propose a novel network (KaLDeX) for vascular segmentation leveraging a Kalman filter based linear deformable cross attention (LDCA) module, integrated within a UNet++ framework. Our approach is based on two key components: Kalman filter (KF) based linear deformable convolution (LD) and cross-attention (CA) modules. The LD module is designed to adaptively adjust the focus on thin vessels that might be overlooked in standard convolution. The CA module improves the global understanding of vascular structures by aggregating the detailed features from the LD module with the high level features from the UNet++ architecture. Finally, we adopt a topological loss function based on persistent homology to constrain the topological continuity of the segmentation. Results: The proposed method is evaluated on retinal fundus image datasets (DRIVE, CHASE_DB1, and STARE) as well as the 3mm and 6mm subsets of the OCTA-500 dataset, achieving an average accuracy (ACC) of 97.25%, 97.77%, 97.85%, 98.89%, and 98.21%, respectively. Conclusions: Empirical evidence shows that our method outperforms the current best models on different vessel segmentation datasets. Our source code is available at: this https URL.


[156] 2411.05824

Navigating Distribution Shifts in Medical Image Analysis: A Survey

Medical Image Analysis (MedIA) has become indispensable in modern healthcare, enhancing clinical diagnostics and personalized treatment. Despite the remarkable advancements supported by deep learning (DL) technologies, their practical deployment faces challenges posed by distribution shifts, where models trained on specific datasets underperform on others from varying hospitals, or patient populations. To address this issue, researchers have been actively developing strategies to increase the adaptability of DL models, enabling their effective use in unfamiliar environments. This paper systematically reviews approaches that apply DL techniques to MedIA systems affected by distribution shifts. Rather than organizing existing methods by technical characteristics, we explicitly bridge real-world clinical constraints -- such as limited data accessibility, strict privacy requirements, and heterogeneous collaboration protocols -- with the technical paradigms able to address them. By establishing this connection between operational constraints and methodological evolution, we categorize existing works into Joint Training, Federated Learning, Fine-tuning, and Domain Generalization, each aligned with specific healthcare scenarios. Beyond this taxonomy, our empirical analysis suggests that, as domain information becomes progressively less accessible across these paradigms, performance improvements become increasingly constrained, and further uncovers a gradual shift in methodological focus from explicit distribution alignment toward uncertainty-aware modeling, ultimately pointing to the need for more deployability-aware design in real-world MedIA.


[157] 2411.09593

SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms

The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15.
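The reported Dice scores follow the standard overlap definition for binary segmentation masks; a minimal sketch on toy masks:

```python
import numpy as np

def dice(pred, truth, eps=1e-8):
    """Dice coefficient: 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + eps)

truth = np.zeros((8, 8), dtype=int); truth[2:6, 2:6] = 1   # 16 pixels
pred = np.zeros((8, 8), dtype=int);  pred[3:7, 3:7] = 1    # shifted by 1
print(round(dice(pred, truth), 4))  # 0.5625
```

For thin vessels even a one-voxel misalignment sharply reduces overlap, which is why Dice in the 0.7-0.84 range counts as reliable on this task.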


[158] 2502.05762

Non-invasive electromyographic speech neuroprosthesis: a geometric perspective

We present a neuromuscular speech interface that translates silently voiced articulations directly into text. We record surface electromyographic (EMG) signals from multiple articulatory sites on the face and neck as participants silently articulate speech, enabling direct EMG-to-text translation. Such an interface has the potential to restore communication for individuals who have lost the ability to produce intelligible speech due to laryngectomy, neuromuscular disease, stroke, or trauma-induced damage (e.g., radiotherapy toxicity) to the speech articulators. Prior work has largely focused on mapping EMG collected during audible articulation to time-aligned audio targets or transferring these targets to silent EMG recordings, which inherently requires audio and limits applicability to patients who can no longer speak. In contrast, we propose an efficient representation of high-dimensional EMG signals and demonstrate direct sequence-to-sequence EMG-to-text conversion at the phonemic level without relying on time-aligned audio.


[159] 2504.03293

Chance-Constrained Neural MPC under Uncontrollable Agents via Sequential Convex Programming

This work investigates the challenge of ensuring safety guarantees in the presence of uncontrollable agents, whose behaviors are stochastic and depend on both their own and the system's states. We present a neural model predictive control (MPC) framework that predicts the trajectory of the uncontrollable agent using a predictor learned from offline data. To provide formal probabilistic guarantees on prediction errors despite policy-induced distribution shifts, we propose a region-wise robust conformal prediction scheme to construct time-dependent uncertainty bounds, which are integrated into the MPC formulation. To solve the resulting non-convex, discontinuous optimization problem, we propose a two-loop iterative sequential convex programming algorithm. The inner loop solves convexified subproblems with fixed error bounds, while the outer loop refines these bounds based on updated control sequences. We establish convergence guarantees and analyze the optimality of the algorithm. We illustrate our method with an autonomous driving scenario involving interactive pedestrians. Experimental results demonstrate that our approach achieves superior safety and efficiency compared to baseline methods, with success rates exceeding 99.5% while maintaining higher average speeds in multi-pedestrian scenarios.
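The region-wise idea behind the conformal bounds can be sketched as split conformal prediction computed per bin of a state feature: each region gets its own finite-sample-corrected error quantile. The scalar binning and data below are a simplified assumption, not the paper's exact construction:

```python
import numpy as np

def regionwise_conformal_bounds(cal_x, cal_err, alpha=0.1, n_regions=4):
    """Per-region conformal error bounds: partition a scalar state feature
    into quantile bins and take the corrected (1-alpha) quantile of the
    calibration errors within each bin."""
    edges = np.quantile(cal_x, np.linspace(0, 1, n_regions + 1))
    bounds = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        e = cal_err[(cal_x >= lo) & (cal_x <= hi)]
        n = len(e)
        q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample correction
        bounds.append(np.quantile(e, q))
    return edges, np.asarray(bounds)

# Hypothetical calibration data where prediction error grows with the state.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 2000)
err = np.abs(rng.normal(0, 0.1 + 0.5 * x))
edges, bounds = regionwise_conformal_bounds(x, err)
print(bounds.round(2))  # bounds widen where errors are larger
```

A single global quantile would be needlessly loose in the low-error regions, which is the motivation for region-wise bounds inside the MPC constraints.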


[160] 2504.04814

Explaining Uncertainty in Multiple Sclerosis Cortical Lesion Segmentation Beyond Prediction Errors

Trustworthy artificial intelligence (AI) is essential in healthcare, particularly for high-stakes tasks like medical image segmentation. Explainable AI and uncertainty quantification significantly enhance AI reliability by addressing key attributes such as robustness, usability, and explainability. Despite extensive technical advances in uncertainty quantification for medical imaging, understanding the clinical informativeness and interpretability of uncertainty remains limited. This study presents an interpretability framework for analyzing lesion-scale predictive uncertainty in cortical lesion segmentation in multiple sclerosis using deep ensembles. The analysis shifts the focus from the uncertainty--error relationship towards clinically relevant medical and engineering factors. Our findings reveal that instance-wise uncertainty is strongly related to lesion size, shape, and cortical involvement. Expert rater feedback confirms that similar factors impede annotator confidence. Evaluations conducted on two datasets (206 patients, almost 2000 lesions) under both in-domain and distribution-shift conditions highlight the utility of the framework in different scenarios.


[161] 2504.08644

Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation

Sound event localization and detection (SELD) involves predicting active sound event classes over time while estimating their positions. The localization subtask in SELD is usually treated as a direction of arrival estimation problem, ignoring source distance. Only recently, SELD was extended to 3D by incorporating distance estimation, enabling the prediction of sound event positions in 3D space (3D SELD). However, existing methods lack input features specifically designed for distance estimation. We address this gap by introducing two novel reverberation-based feature formats: one using the direct-to-reverberant ratio (DRR) and another leveraging signal autocorrelation to capture early reflections. We extensively evaluate and benchmark these features on the STARSS23 dataset, combining them with established SELD features for sound event detection (SED) and direction-of-arrival estimation (DOAE), and testing across different network architectures. Our proposed features, applicable to both FOA and MIC formats, achieve state-of-the-art distance estimation, enhancing overall 3D SELD performance.


[162] 2506.00720

Bi-Level optimization for interpolation-based parameter estimation of differential equations

Parameter estimation for ordinary differential equations (ODEs), i.e., the inverse problem of minimizing the mismatch between model-predicted and experimental states by tuning parameter values within an optimization formulation, is commonplace in chemical engineering applications. A popular method for parameter estimation is sequential optimization (single-shooting), which numerically integrates the ODE in each iteration. However, computing the gradients for the optimization steps requires calculating sensitivities, i.e., the derivatives of states with respect to the parameters, through the numerical integrator, which can be computationally expensive. In this work, we use interpolation to reduce the cost of these sensitivity calculations. Leveraging this interpolation, we also propose a bi-level optimization framework that exploits the structure of the differential equations and solves a convex inner problem. We apply this framework to examples spanning conventional parameter estimation and the emerging concept of data-driven dynamic model discovery. We show that our approach not only estimates the correct parameters for benchmark problems, but can also be readily extended to delay, stiff, and partially observed differential equations without major modifications.


[163] 2506.03178

LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning

Automated radiology report generation holds significant potential to reduce radiologists' workload and enhance diagnostic accuracy. However, generating precise and clinically meaningful reports from chest radiographs remains challenging due to the complexity of medical language and the need for contextual understanding. Existing models often struggle with maintaining both accuracy and contextual relevance. In this paper, we present LLaMA-XR, a novel framework that integrates LLaMA 3.1 with DenseNet-121-based image embeddings and Quantized Low-Rank Adaptation (QLoRA) fine-tuning. LLaMA-XR achieves improved coherence and clinical accuracy while maintaining computational efficiency. This efficiency is driven by an optimization strategy that enhances parameter utilization and reduces memory overhead, enabling faster report generation with lower computational resource demands. Extensive experiments conducted on the IU X-ray benchmark dataset demonstrate that LLaMA-XR outperforms a range of state-of-the-art methods. Our model achieves a ROUGE-L score of 0.433 and a METEOR score of 0.336, establishing new performance benchmarks in the domain. These results underscore LLaMA-XR's potential as an effective and efficient AI system for automated radiology reporting, offering enhanced clinical utility and reliability.


[164] 2507.17869

Integrating Feature Selection and Machine Learning for Nitrogen Assessment in Grapevine Leaves using In-Field Hyperspectral Imaging

Nitrogen (N) is one of the most critical nutrients in winegrape production, influencing vine vigor, fruit composition, and wine quality. Because soil N availability varies spatially and temporally, accurate estimation of leaf N concentration is essential for optimizing fertilization at the individual plant level. In this study, in-field hyperspectral images (400-1000 nm) were collected from four grapevine cultivars (Chardonnay, Pinot Noir, Concord, and Syrah) across two growth stages (bloom and veraison) during the 2022 and 2023 growing seasons at both the leaf and canopy levels. An ensemble feature selection framework was developed to identify the most informative spectral bands for N estimation within individual cultivars, effectively reducing redundancy and selecting compact, physiologically meaningful band combinations spanning the visible, red-edge, and near-infrared regions. At the leaf level, models achieved the highest predictive accuracy for Chardonnay (R^2 = 0.82, RMSE = 0.19 %DW) and Pinot Noir (R^2 = 0.69, RMSE = 0.20 %DW). Canopy-level predictions also performed well, with R^2 values of 0.65, 0.72, and 0.70 for Chardonnay, Concord, and Syrah, respectively. White cultivars exhibited balanced spectral contributions across the visible, red-edge, and near-infrared regions, whereas red cultivars relied more heavily on visible bands due to anthocyanin-chlorophyll interactions. Leaf-level N-sensitive bands selected for Chardonnay and Pinot Noir were successfully transferred to the canopy level, improving or maintaining prediction accuracy across cultivars. These results confirm that ensemble feature selection captures spectrally robust, scale-consistent bands transferable across measurement levels and cultivars, demonstrating the potential of integrating in-field hyperspectral imaging with machine learning for vineyard N status monitoring.


[165] 2508.16457

Wide-Area Power System Oscillations from Large-Scale AI Workloads

This paper develops a new dynamic power profiling approach for modeling AI-centric datacenter loads and analyzing their impact on grid operations, particularly their potential to induce wide-area grid oscillations. We characterize the periodic stochastic power fluctuations inherent to large-scale AI workloads during both the training and fine-tuning stages, driven by the state-of-the-art graphics processing unit (GPU) computing architecture design. These sustained, large power fluctuations, unlike conventional load ramping, act as persistent forcing inputs capable of interacting with and amplifying local and inter-area oscillation modes. Using the WECC 179-bus system and the NPCC 140-bus system, we have numerically studied the amplitude and variability of oscillatory responses under different factors. These factors include system strength, penetration level, fluctuation frequency range, individual datacenter size, geographical deployment, fluctuation suppression level, and workload ratio. Simulation results show that, notably, narrower fluctuation bands, larger single-site capacities, or dispersed siting can intensify oscillations across multiple modes. Our models and numerical studies provide a quantitative basis for integrating AI-dominant electricity demand into grid oscillation studies and further support the development of new planning and operational measures to power the growth of AI/computing load demands.


[166] 2509.03070

CWT-Enhanced Vibration Sensing With Spatial Fault Localization Using YOLO

This letter presents a CWT-enhanced vibration sensing framework for bearing fault monitoring through spatial localization on time-frequency spectrograms. Vibration signals are transformed into continuous wavelet transform (CWT) spectrograms to improve the observability of weak and non-stationary fault signatures, and YOLOv9, YOLOv10, and YOLOv11 are employed to localize and identify fault-related energy regions. Experiments on the CWRU, PU, and IMS datasets show that the proposed framework improves the detectability and robustness of fault-related sensing patterns compared with conventional time-series models, modern vision backbones, and short-time Fourier transform (STFT)-based representations, achieving mAP values up to 99.4%, 97.8%, and 99.5%, respectively. In addition, the region-aware localization provides a more interpretable connection between time-frequency energy distributions and bearing fault characteristics. These results demonstrate that spatial localization on CWT spectrograms offers an effective and generalizable approach for enhancing vibration sensing capability in non-stationary environments.
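The CWT front end can be sketched with a minimal complex-Morlet scalogram computed by direct convolution. The wavelet parameters and test signal below are illustrative, and the YOLO localization stage is not reproduced here:

```python
import numpy as np

def morlet_cwt(x, scales, w0=6.0):
    """|CWT| scalogram with a complex Morlet wavelet, via direct convolution.
    The center frequency at scale s is roughly w0 / (2*pi*s) cycles/sample."""
    S = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1)
        psi = np.exp(1j * w0 * t / s - 0.5 * (t / s) ** 2) / np.sqrt(s)
        S[i] = np.abs(np.convolve(x, np.conj(psi), mode="same"))
    return S

fs = 1000                                   # Hz
t = np.arange(0, 1, 1 / fs)
sig = np.sin(2 * np.pi * 50 * t)            # a 50 Hz stand-in for a fault tone
scales = np.arange(5, 41)
S = morlet_cwt(sig, scales)
ridge = scales[np.argmax(S.mean(axis=1))]   # scale of peak energy, near w0*fs/(2*pi*50)
```

In the framework above, such scalograms would be rendered as images and passed to a YOLO detector to localize fault-related energy regions.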


[167] 2509.19275

A Novel Site-Specific Inference Model for Urban Canyon Channels: From Measurements to Modeling

With the rapid development of intelligent transportation and smart city applications, urban canyon has become a critical scenario for the design and evaluation of wireless communication systems. Due to its unique environmental layout, the channel characteristics in urban canyons are strongly shaped by street geometry and building distribution, thereby exhibiting significant site-specific channel conditions. However, this feature has not been well captured in existing channel models. In this paper, we propose a site-specific channel inference model based on environmental geometry, parameterized using sub-6GHz channel measurements. Multipath components (MPCs) are extracted and clustered according to their geometric propagation mechanisms, which are explicitly derived from the influence of canyon width, thereby establishing an interpretable mapping between the physical environment and the statistical characteristics of MPCs. A step-by-step implementation scheme is presented. Subsequently, the proposed site-specific channel inference model is validated by comparing second-order channel statistics derived from the model and from measurements. The results show that the proposed model achieves high accuracy and robustness in different urban canyon scenarios.


[168] 2510.02556

Multi-Source Position and Direction-of-Arrival Estimation Based on Euclidean Distance Matrices

A popular method to estimate the positions or directions-of-arrival (DOAs) of multiple sound sources using an array of microphones is based on steered-response power (SRP) beamforming. For a three-dimensional scenario, SRP-based methods require joint optimization of three continuous variables for position estimation or two continuous variables for DOA estimation, which can be computationally expensive when high localization accuracy is desired. In this paper, we propose novel methods for multi-source position and DOA estimation by exploiting properties of Euclidean distance matrices (EDMs) and their respective Gram matrices. All methods require estimated time-differences of arrival (TDOAs) between the microphones. In the proposed multi-source position estimation method, only a single continuous variable per source, representing the distance to a reference microphone, needs to be optimized. For each source, the optimal distance variable and set of candidate TDOA estimates are determined by minimizing a cost function defined using the eigenvalues of the Gram matrix. The estimated relative source positions are then mapped to absolute source positions by solving an orthogonal Procrustes problem. The proposed multi-source DOA estimation method eliminates the need for continuous variable optimization. The optimal set of candidate TDOA estimates is determined by minimizing a cost function defined using the eigenvalues of a rank-reduced Gram matrix. For two sources in a noisy and reverberant environment, experimental results for different source and microphone configurations with six microphones show that the proposed EDM-based method consistently outperforms the SRP-based method in terms of position and DOA estimation accuracy and run time.
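The EDM-Gram machinery the method builds on can be illustrated with classical multidimensional scaling: a valid squared-distance EDM of points in 3-D yields, after double centering, a Gram matrix of rank at most 3, and the geometry is recoverable from its eigendecomposition up to a rigid transform. This is a generic sketch of that rank structure, not the paper's TDOA-driven cost function:

```python
import numpy as np

def gram_from_edm(D):
    """Double centering of a squared-distance EDM: G = -0.5 * J @ D @ J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J

def positions_from_gram(G, dim=3):
    """Classical MDS: relative positions from the top eigenpairs of G."""
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

rng = np.random.default_rng(1)
P = rng.normal(size=(7, 3))                          # e.g. 6 mics + 1 source
D = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # squared-distance EDM
G = gram_from_edm(D)
eigs = np.linalg.eigvalsh(G)                         # at most 3 nonzero eigenvalues
Q = positions_from_gram(G)                           # geometry up to rigid transform
D_rec = ((Q[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
```

Roughly, the paper scores candidate TDOA sets and the unknown source-to-reference distance via the eigenvalues of such Gram matrices, then maps the recovered relative geometry to absolute positions with an orthogonal Procrustes step.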


[169] 2510.02636

Guaranteed Time Control using Linear Matrix Inequalities

This paper presents a synthesis approach aiming to guarantee a minimum upper-bound for the time taken to reach a target set of non-zero measure that encompasses the origin, while taking into account uncertainties and input and state constraints. This approach is based on a harmonic transformation of the Lyapunov function and a novel piecewise quadratic representation of this transformed Lyapunov function over a simplicial partition of the state space. The problem is solved in a policy iteration fashion, whereas the evaluation and improvement steps are formulated as linear matrix inequalities employing the structural relaxation approach. Though initially formulated for uncertain polytopic systems, extensions to piecewise and nonlinear systems are discussed. Three examples illustrate the effectiveness of the proposed approach in different scenarios.


[170] 2510.08047

Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition

Robust ASR under domain shift is crucial because real-world systems encounter unseen accents and domains with limited labeled data. Although pseudo-labeling offers a practical workaround, it often introduces systematic, accent-specific errors that filtering fails to fix. We ask: How can we correct these recurring biases without target ground truth? We propose a simple parameter-space correction: in a source domain containing both real and pseudo-labeled data, two ASR models are fine-tuned from the same initialization, one on ground-truth labels and the other on pseudo-labels, and their weight difference forms a correction vector that captures pseudo-label biases. When applied to a pseudo-labeled target model, this vector enhances recognition, achieving up to a 35% relative Word Error Rate (WER) reduction on AfriSpeech-200 across ten African accents with the Whisper tiny model.
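The correction vector itself is plain weight arithmetic; a minimal sketch with dictionaries of numpy arrays standing in for ASR checkpoints (the toy "drift", layer names, and scale are illustrative):

```python
import numpy as np

def correction_vector(w_gt, w_pseudo):
    """Weight difference between ground-truth- and pseudo-label-finetuned
    models that share an initialization; captures pseudo-label bias."""
    return {k: w_gt[k] - w_pseudo[k] for k in w_gt}

def apply_correction(w_target, v, scale=1.0):
    """Add the correction vector to a pseudo-labeled target-domain model."""
    return {k: w_target[k] + scale * v[k] for k in w_target}

rng = np.random.default_rng(0)
init = {"enc": rng.normal(size=(4, 4)), "dec": rng.normal(size=(4,))}
drift = {k: 0.1 * np.ones_like(p) for k, p in init.items()}   # toy pseudo-label bias
w_gt = init                                    # stand-in: fine-tuned on ground truth
w_pl = {k: init[k] + drift[k] for k in init}   # stand-in: fine-tuned on pseudo-labels
v = correction_vector(w_gt, w_pl)              # equals -drift in this toy setup
w_tgt = {k: init[k] + drift[k] for k in init}  # target model inherits the same bias
w_fix = apply_correction(w_tgt, v)             # bias cancelled
```

With real checkpoints the dictionaries would be the models' state dicts, and the vector learned in the source domain is added to the pseudo-labeled target model.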


[171] 2510.09943

Modeling the Impact of Communication and Human Uncertainties on Runway Capacity in Terminal Airspace

We investigate the potential impact of communication and human performance uncertainties on runway operations. Specifically, we consider these impacts within the context of an arrival scenario with two converging flows: a straight-in approach stream and a downwind stream merging into it. Both arrival streams are modeled using a modified Poisson distribution that incorporates the separation minima as well as the runway occupancy time. Various system-level uncertainties are addressed in this process, including communication link- and human-related uncertainties. In this research, we first build a Monte Carlo-based discrete-time simulation, where aircraft arrivals are generated by modified Poisson processes subject to minimum separation constraints, simulating various traffic operations. The merging logic incorporates standard bank-angle continuous turn-to-final, pilot response delays, and dynamic gap availability in real time. Then, we investigate an automated final approach vectoring model (i.e., Auto-ATC), in which inverse optimal control is used to learn decision advisories from human expert records. By augmenting trajectories and incorporating the aforementioned uncertainties into the planning scenario, we create a setup analogous to the discrete-event simulation. For both studies, runway capacity is measured by runway throughput, the fraction of downwind arrivals that merge immediately without holding, and the average delay (i.e., holding time/distance) experienced on the downwind leg. This research provides a method for runway capacity estimation in merging scenarios, and demonstrates that aeronautical communication link uncertainties significantly affect runway capacity in current voice-based operations, whereas this impact can be mitigated in autonomous operational settings.
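The modified Poisson arrival model with a separation floor can be sketched as shifted-exponential inter-arrival gaps, one common way to impose separation minima. The rate, minimum separation, and horizon below are illustrative, and runway occupancy and merging logic are omitted:

```python
import numpy as np

def arrivals_with_min_separation(rate, t_min, horizon, rng):
    """Arrival times with shifted-exponential inter-arrival gaps:
    gap = t_min + Exponential(mean 1/rate), so the separation minimum
    t_min is always respected."""
    times, t = [], 0.0
    while True:
        t += t_min + rng.exponential(1.0 / rate)
        if t > horizon:
            return np.array(times)
        times.append(t)

rng = np.random.default_rng(0)
times = arrivals_with_min_separation(rate=0.5, t_min=1.5, horizon=10_000.0, rng=rng)
gaps = np.diff(times)            # mean gap ~ t_min + 1/rate = 3.5 time units
```

Two such streams, fed into merging logic with pilot delays and gap checks, would reproduce the skeleton of the discrete-time simulation described above.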


[172] 2511.10526

Evaluation of Grid-based Uncertainty Propagation for Collaborative Self-Calibration in Indoor Positioning Systems

Radio-based localization systems conventionally require stationary reference points (e.g. anchors) with precisely surveyed positions, making deployment time-consuming and costly. This paper presents an empirical evaluation of collaborative self-calibration for Ultra-Wideband (UWB) networks, extending a discrete Bayesian approach based on grid-based uncertainty propagation. The enhanced algorithm reduces measurement availability requirements while maintaining positioning accuracy through probabilistic state estimation. We validate the approach using real-world data from controlled indoor UWB network experiments with 12 nodes in a static environment. Experimental evaluation demonstrates 0.28 m mean ranging error under line-of-sight conditions and 1.11 m overall ranging error across mixed propagation scenarios, achieving sub-meter positioning accuracy. Results demonstrate the algorithm's robustness to measurement noise and partial connectivity scenarios typical in industrial deployments. The findings contribute to automated UWB network initialization for indoor positioning applications, reducing infrastructure dependency compared to manual anchor calibration procedures.


[173] 2511.11308

Policy Optimization for Unknown Systems using Differentiable Model Predictive Control

Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.


[174] 2511.13546

On the controller form for linear hyperbolic MIMO systems with dynamic boundary conditions

This contribution develops an algebraic approach to obtain a controller form for a class of linear hyperbolic MIMO systems, bidirectionally coupled with a linear ODE system at the unactuated boundary. After a short summary of established controller forms for SISO and MIMO ODE as well as SISO hyperbolic PDE systems, it is shown that the approach to state a controller form for SISO systems cannot easily be transferred to the MIMO case as it already fails for a very simple example. Next, a generalised hyperbolic controller form with different variants is proposed and a new flatness-based scheme to compute said form is presented. Therein, the system is treated in an algebraic setting where quasipolynomials are used to express the predictions and delays in the system. The proposed algorithm is then applied to the motivating example.


[175] 2512.10738

Conformal Prediction-Based MPC for Stochastic Linear Systems

We propose a stochastic model predictive control (MPC) framework for linear systems subject to joint-in-time chance constraints under unknown disturbance distributions. Unlike existing approaches that rely on parametric or Gaussian assumptions, or require expensive offline computation, the method uses conformal prediction to construct finite-sample confidence regions for the system's error trajectories with minimal computational effort. These probabilistic sets enable relaxation of the joint-in-time chance constraints into a deterministic closed-loop formulation based on indirect feedback, ensuring recursive feasibility and chance constraint satisfaction. Further, we extend to the output feedback setting and establish analogous guarantees from output measurements alone, given access to noise samples. Numerical examples demonstrate the effectiveness and advantages compared to existing approaches.


[176] 2512.12638

Electric Road Systems for Smart Cities: A Scalable Infrastructure Framework for Dynamic Wireless Charging

The transition to electric transportation is a key enabler for intelligent and sustainable cities; however, inadequate charging infrastructure remains a major barrier to large-scale electric vehicle (EV) adoption. This paper presents a scalable Electric Road System (ERS) architecture that enables Dynamic Wireless Charging (DWC) of EVs during motion. The proposed framework integrates inductive charging coils embedded in road pavement, real-time vehicle-to-infrastructure (V2I) communication, and adaptive energy management coordinated with smart grid systems. Modular road segments with a standardized charging process are employed to ensure scalability across urban corridors and interoperability among different EV platforms. System performance is evaluated using a co-simulation framework combining MATLAB-based power analysis with traffic inputs generated in SUMO. Key performance metrics include charging efficiency, energy cost per kilometer, and battery lifecycle improvement. Simulation results indicate a potential reduction in range anxiety and an increase in battery lifespan due to frequent shallow charging cycles. The study further discusses deployment challenges, policy considerations, and energy distribution strategies aligned with climate-resilient urban development. A case study of a tier-1 Indian city is presented to analyze the cost-benefit trade-offs of retrofitting high-density urban corridors with ERS. The proposed framework provides a practical foundation for next-generation EV infrastructure planning in smart cities.


[177] 2601.03065

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

Modeling fine-grained speaking styles remains challenging for language-speech representation pre-training, as existing speech-text models are typically trained with coarse captions or task-specific supervision, and scalable fine-grained style annotations are unavailable. We present FCaps, a large-scale dataset with fine-grained free-text style descriptions, encompassing 47k hours of speech and 19M fine-grained captions annotated via a novel end-to-end pipeline that directly grounds detailed captions in audio, thereby avoiding the error propagation caused by LLM-based rewriting in existing cascaded pipelines. Evaluations using LLM-as-a-judge demonstrate that our annotations surpass existing cascaded annotations in terms of correctness, coverage, and naturalness. Building on FCaps, we propose CLSP, a contrastive language-speech pre-trained model that integrates global and fine-grained supervision, enabling unified representations across multiple granularities. Extensive experiments demonstrate that CLSP learns fine-grained and multi-granular speech-text representations that perform reliably across global and fine-grained speech-text retrieval, zero-shot paralinguistic classification, and speech style similarity scoring, with strong alignment to human judgments. Code and dataset are publicly available at this https URL.


[178] 2601.03632

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis

Zero-shot text-to-speech models can clone a speaker's timbre from a short reference audio, but they also strongly inherit the speaking style present in the reference. As a result, synthesizing speech with a desired style often requires carefully selecting reference audio, which is impractical when only limited or mismatched references are available. While recent controllable TTS methods attempt to address this issue, they typically rely on absolute style targets and discrete textual prompts, and therefore do not support continuous and reference-relative style control. We propose ReStyle-TTS, a framework that enables continuous and reference-relative style control in zero-shot TTS. Our key insight is that effective style control requires first reducing the model's implicit dependence on reference style before introducing explicit control mechanisms. To this end, we introduce Decoupled Classifier-Free Guidance (DCFG), which independently controls text and reference guidance, reducing reliance on reference style while preserving text fidelity. On top of this, we apply style-specific LoRAs together with Orthogonal LoRA Fusion to enable continuous and disentangled multi-attribute control, and introduce a Timbre Consistency Optimization module to mitigate timbre drift caused by weakened reference guidance. Experiments show that ReStyle-TTS enables user-friendly, continuous, and relative control over pitch, energy, and multiple emotions while maintaining intelligibility and speaker timbre, and performs robustly in challenging mismatched reference-target style scenarios.


[179] 2601.04198

Identification of a Kalman filter: consistency of local solutions

Prediction error and maximum likelihood methods are powerful tools for identifying linear dynamical systems and, in particular, enable the joint estimation of model parameters and the Kalman filter used for state estimation. A key limitation, however, is that these methods require solving a generally non-convex optimization problem to global optimality. This paper analyzes the statistical behavior of local minimizers in the special case where only the Kalman gain is estimated. We prove that these local solutions are statistically consistent estimates of the true Kalman gain. This follows from asymptotic unimodality: as the dataset grows, the objective function converges to a limit with a unique local (and therefore global) minimizer. We further provide guidelines for designing the optimization problem for Kalman filter tuning and discuss extensions to the joint estimation of additional linear parameters and noise covariances. Finally, the theoretical results are illustrated using three examples of increasing complexity. The main practical takeaway of this paper is that difficulties caused by local minimizers in system identification are, at least, not attributable to the tuning of the Kalman gain.
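The single-gain setting analyzed here can be reproduced numerically in a scalar toy example: simulate a linear state-space system, evaluate the prediction error cost of an innovation-form predictor over a grid of candidate gains, and compare the minimizer with the steady-state Kalman gain from the Riccati recursion. This is an illustrative sketch, not the paper's estimator or analysis:

```python
import numpy as np

# Toy scalar system x+ = a*x + w, y = c*x + v (noise variances q, r)
a, c, q, r = 0.9, 1.0, 0.5, 1.0
rng = np.random.default_rng(0)
T = 5000
y = np.empty(T)
x = 0.0
for k in range(T):
    y[k] = c * x + rng.normal(scale=np.sqrt(r))
    x = a * x + rng.normal(scale=np.sqrt(q))

def pem_cost(K):
    """Mean squared innovation of the predictor xhat+ = a*xhat + K*(y - c*xhat)."""
    xhat, cost = 0.0, 0.0
    for yk in y:
        e = yk - c * xhat
        cost += e * e
        xhat = a * xhat + K * e
    return cost / T

grid = np.arange(0.0, 0.92, 0.02)
K_hat = grid[np.argmin([pem_cost(K) for K in grid])]

# Steady-state Kalman (predictor) gain from the Riccati recursion, for reference
P = 1.0
for _ in range(500):
    P = a * a * P - (a * P * c) ** 2 / (c * c * P + r) + q
K_star = a * P * c / (c * c * P + r)
```

On a long enough record the prediction error cost is minimized near the true stationary gain, which is the consistency phenomenon the paper establishes for local minimizers.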


[180] 2601.12695

From Noise to Knowledge: System Identification with Systematic Polytope Construction via Cyclic Reformulation

Model-based robust control requires not only accurate nominal models but also systematic uncertainty representations to guarantee stability and performance. However, constructing polytopic uncertainty models typically demands multiple experiments or a priori structural assumptions. This paper proposes an identification framework based on intentional periodicity induction, in which cyclic reformulation with period $N$ is applied to a linear time-invariant system to interpret noise-induced parameter fluctuations as a structured manifestation of estimation uncertainty. The $N$ parameter sets obtained from a single identification experiment -- which would coincide in the noise-free case -- are used as polytope vertices, providing systematic control over the granularity of the uncertainty description through the choice of $N$. The practical utility of the constructed polytope is demonstrated through robust $H_\infty$ state-feedback synthesis via LMI optimization at the polytope vertices; the synthesis uses only noisy identification data and is shown across Monte Carlo trials to stabilize the true plant with only marginal conservatism. Complementarily, a diagnostic assessment based on the best in-polytope point confirms that the polytope captures meaningful uncertainty information. For a third-order system under Gaussian and uniform noise, a comparison with bootstrap-inspired resampling baselines indicates that cyclic reformulation provides a competitive or favorable trade-off by utilizing the full data record; the construction is further validated on a fourth-order MIMO system.


[181] 2601.20178

Coverage Performance Analysis of FAS-enhanced LoRa Wide Area Networks under both Co-SF and Inter-SF Interference

This paper presents an analytical framework for evaluating the coverage performance of the fluid antenna system (FAS)-enhanced LoRa wide-area networks (LoRaWANs). We investigate the effects of large-scale pathloss in LoRaWAN, small-scale fading characterized by FAS, and dense interference (i.e., packet collisions under the ALOHA protocol) arising from randomly deployed end devices (EDs). Both co-spreading factor (co-SF) interference (with the same SF) and inter-SF interference (with different SFs) are introduced into the network, and their differences in physical characteristics are also considered in the analysis. Additionally, simple yet accurate statistical approximations of the FAS channel envelope and power are derived using the extreme-value theorem. Based on the approximated channel expression, the theoretical coverage probability of the proposed FAS-enhanced LoRaWAN is derived. Numerical results validate our analytical approximations by exhibiting close agreement with the exact correlation model. Notably, it is revealed that a FAS with a normalized aperture of 1 × 1 can greatly enhance network performance, in terms of both ED numbers and coverage range.


[182] 2602.08129

Adjustment of Cluster-Then-Predict Framework for Multiport Scatterer Load Prediction

Predicting interdependent load values in multiport scatterers is challenging due to high dimensionality and complex dependence between impedance and scattering ability, yet this prediction remains crucial for the design of communication and measurement systems. In this paper, we propose a two-stage cluster-then-predict framework for multiple load values prediction task in multiport scatterers. The proposed cluster-then-predict approach effectively captures the underlying functional relation between S-parameters and corresponding load impedances, achieving up to a 46% reduction in Root Mean Square Error (RMSE) compared to the baseline when applied to gradient boosting (GB). This improvement is consistent across various clustering and regression methods. Furthermore, we introduce the Real-world Unified Index (RUI), a metric for quantitative analysis of trade-offs among multiple metrics with conflicting objectives and different scales, suitable for performance assessment in realistic scenarios. Based on RUI, the combination of K-means clustering and k-nearest neighbors (KNN) is identified as the optimal setup for the analyzed multiport scatterer.
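The two-stage idea can be sketched end to end in a numpy-only toy version: cluster the inputs with K-means, then fit an independent predictor per cluster (here 1-nearest-neighbor regression on synthetic data rather than S-parameters; the paper evaluates several clustering/regressor combinations):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns centroids and final labels."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(lab == j):
                C[j] = X[lab == j].mean(axis=0)
    lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
    return C, lab

def cluster_then_predict(Xtr, ytr, Xte, k=2):
    """Stage 1: cluster inputs. Stage 2: 1-NN regression within the
    cluster whose centroid is nearest to each test point."""
    C, lab = kmeans(Xtr, k)
    preds = np.empty(len(Xte))
    for i, xq in enumerate(Xte):
        j = np.argmin(((C - xq) ** 2).sum(-1))
        Xc, yc = Xtr[lab == j], ytr[lab == j]
        preds[i] = yc[np.argmin(((Xc - xq) ** 2).sum(-1))]
    return preds

rng = np.random.default_rng(0)
X1 = rng.normal(loc=-3.0, size=(100, 2)); y1 = X1.sum(axis=1)   # regime 1
X2 = rng.normal(loc=+3.0, size=(100, 2)); y2 = X2.prod(axis=1)  # regime 2
Xtr = np.vstack([X1, X2]); ytr = np.concatenate([y1, y2])
Xte, yte = Xtr[::10], ytr[::10]
preds = cluster_then_predict(Xtr, ytr, Xte)
rmse = float(np.sqrt(np.mean((preds - yte) ** 2)))
```

The per-cluster stage lets each regressor specialize to one regime of the input-output relation, which is the mechanism behind the RMSE reductions reported above.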


[183] 2602.20039

On the Spatial Consistency of Sub-Terahertz Channel Characteristics for Beyond-6G Systems

Ray tracing is a versatile approach for precise sub-terahertz (sub-THz, 100-300 GHz) channel modeling when designing new mechanisms for beyond-6G cellular systems. Theoretically, wireless channels may exhibit variations over wavelength distances. In the sub-THz band, close-to-millimeter wavelengths thus require extremely large computational efforts for ray-tracing modeling. However, in practice, channel characteristics may remain quantitatively similar over much larger distances, which can drastically decrease computational efforts. The aim of this study is to experimentally characterize the degree of spatial consistency in sub-THz channel characteristics. To this end, we performed a large-scale measurement campaign in the 140-150 GHz frequency band in an indoor-hall (InH) environment and characterized the channel at separation distances from 2.5 mm up to 1 m. Our results show that channel characteristics including delay spread, angular delay spread, and K-factor change only slightly over multiple tens of centimeter distances. This implies that, in the considered InH environment, the mesh grid can be in the range of 10-50 wavelengths (at 145 GHz) along stable line-of-sight (LoS) directions, while a finer resolution is needed in regions not dominated by LoS. For coarser grids, advanced interpolation is required to capture rapidly varying scattered components.


[184] 2603.16126

Wireless Digital Twin Calibration: Refining DFT-Domain Channel Information

Wireless digital twins can be leveraged to provide site-specific synthetic channel information through precise physical modeling and signal propagation simulations. This can help reduce the overhead of channel state information (CSI) acquisition, particularly needed for large-scale MIMO systems. For high-quality digital twin channels, the classical approach is to increase the digital twin fidelity via more accurate modeling of the environment, propagation, and hardware. This, however, comes with high computational cost, making it unsuitable for real-time applications. In this paper, we propose a new framework that, instead of calibrating the digital twin model itself, calibrates the DFT-domain channel information to reduce the gap between the low-fidelity digital twin and its high-fidelity counterpart or the real world. This allows systems to leverage a low-complexity digital twin for generating real-time channel information without compromising quality. To evaluate the effectiveness of the proposed approach, we adopt codebook-based CSI feedback as a case study, where refined synthetic channel information is used to identify the most relevant DFT codewords for each user. Simulation results demonstrate the effectiveness of the proposed digital twin calibration approach in achieving high CSI acquisition accuracy while reducing the computational overhead of the digital twin. This paves the way for realizing digital twin assisted wireless systems.
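The codebook-based CSI feedback case study reduces to ranking DFT codewords by their correlation with the (refined) digital-twin channel. A minimal sketch of that selection step, assuming a standard DFT beam codebook over a uniform linear array (the array size and number of selected codewords are placeholders, not the paper's settings):

```python
import numpy as np

def dft_codebook(n_antennas):
    """Unitary DFT codebook: column k is the k-th DFT beamforming vector."""
    n = np.arange(n_antennas)
    return np.exp(2j * np.pi * np.outer(n, n) / n_antennas) / np.sqrt(n_antennas)

def top_codewords(h, codebook, k=4):
    """Return indices of the k codewords with largest correlation |w^H h|."""
    gains = np.abs(codebook.conj().T @ h)
    return np.argsort(gains)[::-1][:k]
```

In the proposed framework, `h` would be the calibrated DFT-domain channel produced from the low-fidelity digital twin, and only the selected codeword indices need to be fed back.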


[185] 2603.21713

Simple Trajectory Smoothing for UAV Reference Path Planning Based on Decoupling, Spatial Modeling and Linear Programming

A method for trajectory smoothing for UAV reference path planning is presented. It is derived from the dynamics of a Dubins airplane model and involves a decoupling step, spatial modeling, and linear programming. The decoupling step enables algebraic control laws for flight-path-angle and speed control; an optimization step, the solution of a small linear program, is applied only for roll-angle control. Two variations are discussed, differing in reference centerline tracking and the introduction of a path-shaping constraint. The benefit of natural dimensionality reduction for spatial modeling is discussed, and the simplicity of the overall method is highlighted. An extension to aerobatic flight is outlined, which comes at the cost of a model approximation but retains the general model structure. An extension of the method to tractor path planning along 3D terrain is also discussed. The method is validated in simulations.


[186] 2603.22460

Data-Driven Synthesis of Robust Positively Invariant Sets from Noisy Data

This paper develops a method to construct robust positively invariant (RPI) tube sets from finite noisy input-state data of an unknown linear time-invariant (LTI) system, yielding tubes that can be directly embedded in tube-based robust data-driven predictive control. Data-consistency uncertainty sets are constructed under process/measurement noise with polytopic/ellipsoidal bounds. In the measurement-noise case, we provide a deterministic and data-consistent procedure to certify the induced residual bound from data. Based on these sets, a robustly stabilizing state-feedback gain is certified via a common quadratic contraction, which in turn enables constructive polyhedral/ellipsoidal RPI tube computation. Numerical examples quantify the conservatism induced by noisy data and the employed certification step.


[187] 2603.23017

Modelling Emotions is an Elusive Pursuit in Affective Computing

Affective computing - combining sensor technology, machine learning, and psychology - has been studied for over three decades and is employed in AI-powered technologies to enhance emotional awareness in AI systems and to detect symptoms of mental health disorders such as anxiety and depression. However, the uncertainty in such systems remains high, and the application areas are limited by categorical definitions of emotions and emotional concepts. This paper argues that categorical emotion labels obscure emotional nuance in affective computing, and that continuous dimensional definitions are therefore needed to advance the field, increase application usefulness, and lower uncertainties.


[188] 2603.23727

End-to-End Optical Propagation Modeling for Water-to-Air Channels under Sea Surface and UAV Effects

Underwater observatories have recently emerged as an efficient solution for marine biodiversity monitoring. The primary objective of this work is to enable efficient and cost-effective data muling from underwater sensors by investigating the use of optical wireless communications to transmit data from the underwater sensors to an aerial node close to the water surface, such as an unmanned aerial vehicle (UAV). More specifically, we utilize a direct water-to-air (W2A) optical communication link between the sensor node equipped with an LED emitter and the UAV equipped with an ultra-sensitive receiver, i.e., a silicon photo-multiplier. As a main contribution, we develop a comprehensive Monte Carlo-based ray-tracing algorithm to characterize this complex channel. This framework rigorously incorporates the impact of air bubbles modeled through the Mie scattering theory, a realistic sea surface representation derived from the JONSWAP spectrum, and an analytical derivation of the channel loss resulting from UAV instability under wind-induced perturbations. Furthermore, we conduct a comprehensive analysis of the W2A channel, examining the influence of key parameters such as wind speed, transmitter configurations, and receiver characteristics. The end-to-end performance evaluation demonstrates the practical feasibility of the proposed approach, achieving a bit-error rate of $10^{-3}$ at a data rate of 1 Mbps for a transmitter depth of 47 m and wind speeds up to 13 m/s.


[189] 2603.23748

Data-driven online control for real-time optimal economic dispatch and temperature regulation in district heating systems

District heating systems (DHSs) require coordinated economic dispatch and temperature regulation under uncertain operating conditions. Existing DHS operation strategies often rely on disturbance forecasts and nominal models, so their economic and thermal performance may degrade when predictive information or model knowledge is inaccurate. This paper develops a data-driven online control framework for DHS operation by embedding steady-state economic optimality conditions into the temperature dynamics, so that the closed-loop system converges to the economically optimal operating point without relying on disturbance forecasts. Based on this formulation, we develop a Data-Enabled Policy Optimization (DeePO)-based online learning controller and incorporate Adaptive Moment Estimation (ADAM) to improve closed-loop performance. We further establish convergence and performance guarantees for the resulting closed-loop system. Simulations on an industrial-park DHS in Northern China show that the proposed method achieves stable near-optimal operation and strong empirical robustness to both static and time-varying model mismatch under practical disturbance conditions.


[190] 2604.01338

A Comprehensive Test System for Transmission Expansion Planning: Modeling N-1 Contingencies and Multi-Loading Scenarios

This paper presents a high-voltage test system designed specifically for transmission expansion planning (TEP) and explores multiple TEP studies using this test system. The network incorporates long transmission lines that are accurately modeled, with line parameters calculated using the equivalent π-circuit model for long lines to account for the distributed nature of line parameters. The paper provides detailed load flow analyses for both normal and all contingency conditions under three different loading conditions (peak load, dominant load, and light load), demonstrating that the proposed test system offers technically feasible load flow solutions at these loading scenarios. As the real power system is subject to various loading scenarios and should be effectively operable under all conditions, this test system accurately replicates the properties of real power systems. Furthermore, this paper presents multiple TEP cases to supply the load at a new location. TEP cases are conducted with different numbers of transmission line connections, and each case is characterized by its maximum capacity satisfying all technical requirements for normal operation and all single contingencies under the three scenarios. The cost of TEP for each case is calculated and compared in terms of the average cost per MW of power delivered to the new bus.
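The equivalent π model mentioned above is a textbook construction: the distributed line is replaced by a lumped series branch Zc·sinh(γl) and two shunt branches tanh(γl/2)/Zc. A minimal sketch of that standard calculation (the per-km parameters in the usage below are illustrative values, not the test system's data):

```python
import numpy as np

def long_line_equivalent_pi(z, y, length_km):
    """Equivalent pi circuit of a long transmission line.
    z: series impedance per km (ohm/km), y: shunt admittance per km (S/km).
    Returns (series impedance, each shunt half-admittance)."""
    gamma = np.sqrt(z * y)              # propagation constant per km
    Zc = np.sqrt(z / y)                 # characteristic (surge) impedance
    gl = gamma * length_km
    Z_series = Zc * np.sinh(gl)         # lumped series branch
    Y_half = np.tanh(gl / 2) / Zc       # each of the two shunt branches
    return Z_series, Y_half
```

For short lines the hyperbolic factors approach 1, recovering the nominal π model; for a 300 km line the correction is already noticeable.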


[191] 2604.06371

Multiobjective optimization-based design and dispatch of islanded, hybrid microgrids for remote, off-grid communities in sub-Saharan Africa

Reliable, affordable electricity remains inaccessible to over 600 million people in sub-Saharan Africa (SSA), where islanded hybrid microgrids combining renewable generation, battery storage, and diesel backup offer a viable electrification pathway. This paper presents a multiobjective, multiperiod optimization framework for the design, sizing, and dispatch of such systems, with a case study for a remote community in Kenya. System sizing is optimized over a one-year horizon and dispatch over a representative day, both at hourly resolution. The formulation jointly minimizes lifecycle levelized cost of energy (LCOE), emissions, lost load, and dumped energy, while maximizing renewable penetration. Seven optimization algorithms are benchmarked; particle swarm optimization (PSO) achieves the best trade-off between runtime (63 s) and solution quality (normalized objective 0.146) and is used for subsequent analyses. The optimal configuration of solar PV, wind, lithium-ion battery storage, and diesel backup achieves a normalized LCOE of 0.46 USD per kWh with over 94 percent renewable penetration, outperforming alternatives. Pareto fronts highlight trade-offs between cost, emissions, and reliability, showing that cost-only optimization yields inferior outcomes. Sensitivity analyses identify fuel prices and discount rates as the most influential parameters in SSA contexts. A break-even distance analysis shows microgrids are economically competitive with grid extension at the study site. The dispatch model produces day-ahead schedules that are robust to short-term uncertainty, though extended wind lulls increase diesel reliance. This work fills a critical gap by providing a comprehensive multiobjective design and dispatch framework tailored to SSA resource, economic, and operational conditions.


[192] 2604.12297

Modular Drive Architecture for Software-defined Vehicles Enabled by Power-packet-based Sensorless Control

The transition toward software-defined vehicles requires standardization and modularization of hardware decoupled from software, along with centralized electrical/electronic architectures. While electrified drive units, such as integrated in-wheel drives, are expected to realize the hardware standardization and unprecedented flexibility in vehicle design, their implementation remains constrained by complex signal wiring between the module and the vehicle body and by control units decentralized across them. This paper proposes a modular drive architecture that achieves complete hardware-software separation by leveraging the power packet dispatching system. We introduce a sensorless control method that estimates motor internal states, specifically winding current and rotor angle, solely from physical quantities measured on the vehicle side. This completely eliminates the need for physical sensors in the drive module, reducing it to a passive actuator governed by the vehicle-side power system via a standardized packet protocol. The proposed architecture significantly reduces wiring complexity and centralizes control logic, advancing fully standardized, plug-and-play platforms for next-generation electrified mobility.


[193] 2604.12527

Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models

Recent advances in reasoning models have driven significant progress in text and multimodal domains, yet audio reasoning remains relatively limited. Only a few Large Audio Language Models (LALMs) incorporate explicit Chain-of-Thought (CoT) reasoning, and their capabilities are often inconsistent and insufficient for complex tasks. To bridge this gap, we introduce Audio-Cogito, a fully open-source solution for deep audio reasoning. We develop Cogito-pipe for high-quality audio reasoning data curation, producing 545k reasoning samples that will be released after review. Based on this dataset, we adopt a self-distillation strategy for model fine-tuning. Experiments on the MMAR benchmark, the only audio benchmark evaluating the CoT process, show that our model achieves the best performance among open-source models and matches or surpasses certain closed-source models in specific metrics. Our approach also ranks among the top-tier systems in the Interspeech 2026 Audio Reasoning Challenge.


[194] 2604.13807

Uplink Single-Snapshot Frugal SLAM in Phase-Coherent Distributed MIMO Systems

We consider uplink frugal simultaneous localization and mapping (SLAM) in phase-coherent distributed MIMO (D-MIMO) systems, where a network of spatially separated single-antenna access points (APs) coherently receives narrowband, single-snapshot pilot signals from a single-antenna user equipment (UE). In contrast to existing phase-coherent localization and SLAM methods that rely on wideband measurements and/or multi-antenna APs, the proposed frugal setting operates with the minimum possible localization resources: a single subcarrier and a single snapshot at each single-antenna AP. In this paper, we formulate phase-coherent frugal SLAM as a coherent imaging problem, constructing a spatial image over a region of interest by treating the distributed AP observations as coming from a large synthetic aperture. Based on the coherent image, we develop a detection and localization framework that jointly identifies the UE, reflective surfaces, and scatterers. Simulation results validate the proposed framework and provide insights into the impact of grid resolution and off-grid error on detection and localization performance.
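Treating the distributed APs as one large synthetic aperture, the coherent image at a candidate point is the magnitude of the phase-aligned sum of the single-snapshot, single-subcarrier measurements. A minimal narrowband sketch under a free-space phase model (the paper's detection and localization framework operates on top of such an image; geometry and wavelength below are placeholders):

```python
import numpy as np

def coherent_image(ap_positions, measurements, grid, wavelength):
    """Backproject narrowband single-snapshot measurements onto a spatial grid:
    image(x) = | sum_m y_m * exp(+j * 2*pi * d_m(x) / lambda) |,
    i.e. conjugate-phase matching of the free-space propagation delay."""
    img = np.zeros(len(grid))
    for i, x in enumerate(grid):
        d = np.linalg.norm(ap_positions - x, axis=1)  # AP-to-pixel distances
        img[i] = np.abs(np.sum(measurements * np.exp(2j * np.pi * d / wavelength)))
    return img
```

A strong peak at a grid point indicates a source (the UE) or, after removing the direct path, a scatterer image.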


[195] 2604.13815

Dynamic Heartbeat Modeling with Recurrent Neural Networks and Inverse Gaussian Point Process

Heart rate variability (HRV) analysis is important for the assessment of autonomic cardiovascular regulation. The inverse Gaussian process (IGP) has been widely used for beat-to-beat HRV modeling, as it gives a physiologically relevant interpretation of the heart depolarization process. A key challenge in IGP-based heartbeat modeling is the accurate estimation of time-varying parameters. In this study, we investigated whether recurrent neural networks (RNNs) can be used for IGP parameter identification and thereby enhance probabilistic modeling of R-R dynamics. Specifically, four representative RNN architectures, namely GRU, LSTM, the Structured State Space sequence model (S4), and Mamba, were evaluated using the Kolmogorov-Smirnov statistic. The results demonstrate the possibility of combining neural sequence models with the IGP framework for beat-wise R-R series modeling. This approach provides a flexible basis for probabilistic HRV modeling and for future incorporation of more complex physiological mechanisms and dynamic conditions.
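The inverse Gaussian interval density, and the CDF on which Kolmogorov-Smirnov goodness-of-fit evaluation relies, can be written in closed form. In the paper's setting the per-beat mean (and possibly the shape) would be predicted by the RNN rather than fixed as in this sketch:

```python
import math

def inverse_gaussian_pdf(w, mu, lam):
    """Density of the inter-beat interval w under an IG model
    with mean mu and shape lam (per-beat values in the RNN setting)."""
    return math.sqrt(lam / (2 * math.pi * w ** 3)) * \
        math.exp(-lam * (w - mu) ** 2 / (2 * mu ** 2 * w))

def inverse_gaussian_cdf(w, mu, lam):
    """Closed-form IG CDF, usable for time-rescaling / KS evaluation."""
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    a = math.sqrt(lam / w)
    return phi(a * (w / mu - 1)) + math.exp(2 * lam / mu) * phi(-a * (w / mu + 1))
```

For the KS test, observed intervals are pushed through the model CDF and the resulting values are compared against the uniform distribution.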


[196] 2604.16229

Simulating Arbitrage Optimization for Market Monitoring in Gas and Electricity Transmission Networks

We examine market outcomes in energy transport networks with a focus on gas-fired generators, which are producers in a wholesale electricity market and consumers in the natural gas market. Market administrators monitor bids to determine whether a participant wields market power to manipulate the price of energy, reserves, or financial transmission rights. If economic or physical withholding of generation from the market is detected, mitigation is imposed by replacing excessive bids with reference level bids to prevent artificial supply shortages. We review market monitoring processes in the power grid, and present scenarios in small interpretable test networks to show how gas-fired generators can bid in the gas market to alter outcomes in a power market. We develop a framework based on DC optimal power flow (OPF) and steady-state optimal gas flow (OGF) formulations to represent two interacting markets with structured exchange of price and quantity bids. We formulate optimization-based methods to identify market power in a power grid, as well as to identify market conditions that indicate market power being exerted by a generator using gas market bids.


[197] 2308.01802

Multi-Carrier Modulation: An Evolution from Time-Frequency Domain to Delay-Doppler Domain

The recently proposed orthogonal delay-Doppler division multiplexing (ODDM) modulation, which is a delay-Doppler (DD) domain multi-carrier (DDMC) modulation scheme based on the DD domain orthogonal pulse (DDOP), is studied. We first revisit the linear time-varying (LTV) channel model for the wireless channel, and review the conventional multi-carrier (MC) modulation schemes and their design guidelines for both linear time-invariant (LTI) and LTV channels. We then focus on the representation of the LTV channel in an equivalent sampled DD (ESDD) domain, and propose an impulse-function-based transmission strategy for the ESDD channel. Next, we take an in-depth look into the DDOP and show that it achieves orthogonality with respect to the fine time and frequency resolutions in the ESDD domain and thus behaves like an impulse function. This allows us to unveil the unique input-output relation of the resultant ODDM modulation over the ESDD channel. We point out that the conventional MC modulation design guidelines based on the Weyl-Heisenberg (WH) frame theory can be relaxed without compromising orthogonality or violating the WH frame theory. More specifically, for a practical communication system with bandwidth and duration constraints, MC modulation signals can be designed considering so-called local or sufficient (bi)orthogonality, which refers to the (bi)orthogonality among a WH subset for the MC signal within a specific bandwidth and duration. This novel design guideline could potentially open up opportunities for developing future waveforms required by new applications such as communication systems associated with high delay and/or Doppler shifts, as well as integrated sensing and communications.


[198] 2401.10747

Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach

Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most existing research assumes that all modalities are available during both training and testing, which makes such algorithms susceptible to missing-modality scenarios. In this paper, we propose a novel knowledge-transfer network that translates between different modalities to reconstruct the missing audio features. Moreover, we develop a cross-modality attention mechanism to maximize the information extracted from the reconstructed and observed modalities for sentiment prediction. Extensive experiments on three publicly available datasets demonstrate significant improvements over baseline methods and achieve results comparable to previous methods trained with complete multimodal supervision.


[199] 2406.06543

SparrowSNN: A Hardware/software Co-design for Energy Efficient ECG Classification

Deep learning has driven significant technological advancements, but its high energy consumption limits its use on battery-operated edge devices. Spiking Neural Networks (SNNs) offer promising reductions in inference-time energy consumption. However, existing neuromorphic architectures are optimized for scalable, many-core NoC execution, which suits large models but is mismatched to edge devices, and their prevalent integrate-and-fire neurons re-read weights across $T$ timesteps, inflating data-movement and dynamic-control energy. To address this challenge, we propose SparrowSNN, an optimized end-to-end design tailored for edge applications. SparrowSNN proposes: (1) a hardware-friendly spike activation function SSF (Sum-Spike-and-Fire); (2) a customizable $\mu$W-level-power quantized hybrid ANN-SNN model that can be designed per application; (3) a compact and low-power reconfigurable ASIC architecture, supporting the aforementioned designs. Evaluated on the biomedical MIT-BIH ECG and DEAP EEG datasets, SparrowSNN achieves state-of-the-art accuracy with $20\times$ to $100\times$ lower energy consumption, significantly outperforming existing ultra-low power solutions.
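The abstract does not define SSF precisely, so the following is a hypothetical reading of a sum-then-fire activation: pre-activations are accumulated across all $T$ timesteps first, so the weight matrix is read once rather than once per timestep, and a spike count is emitted from the aggregate.

```python
import numpy as np

def sum_spike_and_fire(inputs, threshold=1.0):
    """Hypothetical sketch of an SSF-style activation (assumed, not the
    paper's exact definition): sum pre-activations over the T timesteps,
    then emit a spike count from the aggregate, so weights need to be
    read only once per inference instead of once per timestep."""
    total = inputs.sum(axis=0)                      # aggregate over T timesteps
    return np.floor(np.maximum(total, 0.0) / threshold)
```

Under this reading, the T-fold weight re-reads of integrate-and-fire neurons collapse into a single matrix multiply followed by the cheap thresholding above.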


[200] 2408.02786

City-Wide Low-Altitude Urban Air Mobility: A Scalable Global Path Planning Approach via Risk-Aware Multi-Scale Cell Decomposition

The realization of Urban Air Mobility (UAM) necessitates scalable global path planning algorithms capable of ensuring safe navigation within complex urban environments. This paper proposes a multi-scale risk-aware cell decomposition method that efficiently partitions city-scale airspace into variable-granularity sectors, assigning each cell an analytically estimated risk value based on obstacle proximity and expected risk. Unlike uniform grid approaches or sampling-based methods, our approach dynamically balances resolution with computational speed by bounding cell risk via Mahalanobis distance projections, eliminating exhaustive field sampling. Comparative experiments against classical A*, Artificial Potential Fields (APF), and Informed RRT* across five diverse urban topologies demonstrate that our method generates safer paths with lower cumulative risk while reducing computation time by orders of magnitude. The proposed framework, Larp Path Planner, is open-sourced and supports any map provider via its modified GeoJSON internal representation, with experiments conducted using OpenStreetMap data to facilitate reproducible research in city-wide aerial navigation.
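One way to bound a cell's risk without field sampling, consistent with the Mahalanobis-projection idea described above (the paper's exact bound may differ), is to shrink the cell center's Mahalanobis distance to a Gaussian obstacle by the worst-case step achievable inside the cell:

```python
import numpy as np

def cell_risk_bound(cell_center, cell_radius, obs_mean, obs_cov):
    """Upper-bound the (unnormalized) Gaussian obstacle risk over a cell:
    take the center's Mahalanobis distance, subtract the largest Mahalanobis
    step reachable within a Euclidean ball of cell_radius, and evaluate the
    Gaussian at that closest point. Illustrative sketch of the projection idea."""
    cov_inv = np.linalg.inv(obs_cov)
    diff = cell_center - obs_mean
    d_center = np.sqrt(diff @ cov_inv @ diff)
    # worst-case Mahalanobis length of a Euclidean step of size cell_radius
    step = cell_radius * np.sqrt(np.linalg.eigvalsh(cov_inv).max())
    d_min = max(d_center - step, 0.0)
    return np.exp(-0.5 * d_min ** 2)
```

Because the bound is analytic, large low-risk cells can be scored as cheaply as small high-resolution ones, which is what makes the multi-scale decomposition fast.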


[201] 2411.17690

Mechanisms of Multimodal Synchronization: Insights from Decoder-Based Video-Text-to-Speech Synthesis

Unified decoder-only transformers have shown promise for multimodal generation, yet the mechanisms by which they synchronize modalities with heterogeneous sampling rates remain underexplored. We investigate these mechanisms through video-text-to-speech (VTTS) synthesis, a controlled task requiring fine-grained temporal alignment between sparse text, video, and continuous speech. Using a unified decoder-only transformer, dubbed Visatronic, trained on VoxCeleb2, we study: (i) how modalities contribute complementary information, (ii) how positional encoding strategies enable synchronization across heterogeneous rates, (iii) how modality ordering shapes the trade-off between in-domain performance and cross-domain transfer, and (iv) how phoneme-level synchronization metrics provide diagnostic insight into per-phoneme timing errors. Our findings reveal that both "global sequential indexing" (unique position IDs across modalities) and "co-temporal ordered indexing" (identical IDs for temporally corresponding tokens) achieve strong synchronization performance, with co-temporal ordered indexing providing a simple mechanism without explicit timestamp metadata. Both text and video contribute complementary signals: text ensures intelligibility while video provides temporal cues and emotional expressiveness. Modality ordering reveals a consistent trade-off: video-first ordering achieves stronger in-domain performance while text-first ordering generalizes more robustly to unseen domains. Our findings also reveal that diverse large-scale training enables transferable synchronization strategies. To enable fine-grained analysis, we also introduce TimeSync, a phoneme-level metric that reveals temporal misalignments overlooked by frame-level metrics. These insights establish VTTS as a valuable testbed for understanding temporal synchronization in unified multimodal decoders.
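The two indexing schemes can be made concrete for streams of different rates sharing one duration. This is a toy sketch of the two conventions as described in the abstract; the actual tokenizers and rates used by Visatronic are not reproduced here.

```python
def global_sequential_ids(lengths):
    """Unique position IDs across concatenated modality streams."""
    ids, offset = [], 0
    for n in lengths:
        ids.append(list(range(offset, offset + n)))
        offset += n
    return ids

def co_temporal_ids(lengths, duration):
    """Identical IDs for temporally corresponding tokens: each token gets
    the index of its time slot, so streams with different sampling rates
    share one clock without explicit timestamp metadata."""
    return [[int(i * duration / n) for i in range(n)] for n in lengths]
```

In the co-temporal scheme, a video token and a speech token covering the same instant receive the same position ID even though the streams have different lengths.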


[202] 2506.00955

Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection

Sarcasm fundamentally alters meaning through tone and context, yet detecting it in speech remains a challenge due to data scarcity. In addition, existing detection systems often rely on multimodal data, limiting their applicability in contexts where only speech is available. To address this, we propose an annotation pipeline that leverages large language models (LLMs) to generate a sarcasm dataset. Using a publicly available sarcasm-focused podcast, we employ GPT-4o and LLaMA 3 for initial sarcasm annotations, followed by human verification to resolve disagreements. We validate this approach by comparing annotation quality and detection performance on a publicly available sarcasm dataset using a collaborative gating architecture. Finally, we introduce PodSarc, a large-scale sarcastic speech dataset created through this pipeline. The detection model achieves a 73.63% F1 score, demonstrating the dataset's potential as a benchmark for sarcasm detection research.


[203] 2508.08468

Audio-Visual Speech Enhancement: Architectural Design and Deployment Strategies

Real-time audio-visual speech enhancement (AVSE) is a key enabler for immersive and interactive multimedia services, yet its performance is tightly constrained by network latency, uplink capacity, and computational delay. This paper presents the design, deployment, and evaluation of a complete cloud-edge-assisted AVSE system operating over a public 5G edge network. The system integrates CNN-based acoustic enhancement and OpenCV-based facial feature extraction with an LSTM fusion network to preserve temporal coherence, and is deployed on a Vodafone-compatible AWS Wavelength edge cloud. Through extensive stress testing, we analyze end-to-end performance under varying network load and adaptive multimedia profiles. Results show that compute placement at the network edge is critical for meeting real-time coherence constraints, and that uplink capacity is often the dominant bottleneck for interactive AVSE services. Only 5G and wired Ethernet consistently satisfied the required communication delay bound for uncompressed audio-video chunks, while aggressive compression reduced payload sizes by up to 80% with negligible perceptual degradation, enabling robust operation under constrained conditions. We further demonstrate a fundamental trade-off between processing latency and enhancement quality, where reduced model complexity lowers delay but degrades reconstruction performance in low-SNR scenarios. Our findings indicate that public 5G edge environments can sustain real-time, interactive AVSE workloads when network and compute resources are carefully orchestrated, although performance margins remain tighter than in dedicated infrastructures. The architectural insights derived from this study provide practical guidelines for the design of delay-sensitive multimedia and perceptual enhancement services on emerging 5G edge-cloud platforms.


[204] 2508.18025

Adaptive Quantized Planetary Crater Detection System for Autonomous Space Exploration

Autonomous planetary exploration demands real-time, high-fidelity environmental perception. Standard deep learning models require massive computational resources. Conversely, space-qualified onboard computers operate under strict power, thermal, and memory limits. This disparity creates a severe engineering bottleneck, preventing the deployment of highly capable perception architectures on extraterrestrial exploration platforms. In this foundational concept paper, we propose the theoretical architecture for the Adaptive Quantized Planetary Crater Detection System (AQ-PCDSys) to resolve this bottleneck. We present a mathematical blueprint integrating an INT8 Quantized Neural Network (QNN) designed specifically for Quantization Aware Training (QAT). To address sensor fragility, we mathematically formalize an Adaptive Multi-Sensor Fusion (AMF) module. By deriving the exact integer requantization multiplier required for spatial attention gating, this module actively selects and fuses Optical Imagery (OI) and Digital Elevation Models (DEMs) at the feature level, ensuring reliable perception during extreme cross-illuminations and optical hardware dropouts. Furthermore, the architecture introduces anchor-free, center-to-edge regression heads, protected by a localized FP16 coordinate conversion, to accurately frame asymmetrical lunar craters without catastrophic integer truncation. Rather than presenting physical hardware telemetry, this manuscript establishes the theoretical bounds, structural logic, and mathematical justifications for the architecture. We outline a rigorous Hardware-in-the-Loop (HITL) evaluation protocol to define the exact testing criteria required for future empirical validation, paving the way for next-generation space-mission software design.
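The integer requantization multiplier referred to above is conventionally represented as a 31-bit fixed-point mantissa plus a right shift, following the gemmlowp-style decomposition used in INT8 inference. The sketch below shows that generic convention, not the paper's exact derivation:

```python
def quantize_multiplier(real_multiplier):
    """Decompose M = M0 * 2^-shift with M0 a 31-bit fixed-point value
    in [0.5, 1), for integer-only requantization (M is typically
    S_in * S_weight / S_out, each S a float quantization scale)."""
    assert 0.0 < real_multiplier < 1.0
    shift = 0
    while real_multiplier < 0.5:
        real_multiplier *= 2.0
        shift += 1
    q = int(round(real_multiplier * (1 << 31)))
    if q == (1 << 31):            # rounding pushed the mantissa to 1.0
        q //= 2
        shift -= 1
    return q, shift

def requantize(acc, q, shift):
    """Apply y = round(acc * M) with integer ops only: (acc*q) >> (31+shift).
    acc is assumed non-negative here for simplicity."""
    total_shift = 31 + shift
    rounding = 1 << (total_shift - 1)  # add half an LSB before truncating
    return (acc * q + rounding) >> total_shift
```

This keeps the accumulator-to-INT8 rescaling free of floating-point hardware, which is the property the AQ-PCDSys architecture relies on for its attention-gating path.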


[205] 2509.18272

StereoFoley: Object-Aware Stereo Audio Generation from Video

We present StereoFoley, a video-to-audio generation framework that produces semantically aligned, temporally synchronized, and spatially accurate stereo sound at 48 kHz. While recent generative video-to-audio models achieve strong semantic and temporal fidelity, they largely remain limited to mono or fail to deliver object-aware stereo imaging, constrained by the lack of professionally mixed, spatially accurate video-to-audio datasets. First, we develop a base model that generates stereo audio from video, achieving performance on par with state-of-the-art V2A models in both semantic accuracy and synchronization. Next, to overcome dataset limitations, we introduce a synthetic data generation pipeline that combines video analysis, object tracking, and audio synthesis with dynamic panning and distance-based loudness controls, enabling spatially accurate object-aware sound. Finally, we fine-tune the base model on this synthetic dataset, yielding clear object-audio correspondence. Since no established metrics exist, we introduce a stereo object-awareness metric and report it alongside a human listening study; the two evaluations exhibit consistent trends. This work establishes the first end-to-end framework for stereo object-aware video-to-audio generation, addressing a critical gap in the field.
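The dynamic panning and distance-based loudness controls in the synthetic pipeline can be illustrated with a constant-power pan law plus inverse-distance attenuation. This parameterization is assumed for illustration; the paper does not specify its exact control laws.

```python
import math

def stereo_gains(azimuth_deg, distance, ref_distance=1.0):
    """Constant-power stereo pan from an object's azimuth
    (-90 = hard left, +90 = hard right) combined with inverse-distance
    loudness, clamped at a reference distance. Illustrative sketch only."""
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2)  # map to [0, pi/2]
    att = ref_distance / max(distance, ref_distance)       # distance loudness
    return att * math.cos(theta), att * math.sin(theta)    # (left, right)
```

Driving these gains from per-frame object-tracking output yields the object-aware stereo targets used to fine-tune the base model.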


[206] 2510.08878

ControlAudio: Tackling Text-Guided, Timing-Indicated and Intelligible Audio Generation via Progressive Diffusion Modeling

Text-to-audio (TTA) generation with fine-grained control signals, e.g., precise timing control or intelligible speech content, has been explored in recent works. However, constrained by data scarcity, their generation performance at scale is still compromised. In this study, we recast controllable TTA generation as a multi-task learning problem and introduce a progressive diffusion modeling approach, ControlAudio. Our method adeptly fits distributions conditioned on more fine-grained information, including text, timing, and phoneme features, through a step-by-step strategy. First, we propose a data construction method spanning both annotation and simulation, augmenting condition information in the sequence of text, timing, and phoneme. Second, at the model training stage, we pretrain a diffusion transformer (DiT) on large-scale text-audio pairs, achieving scalable TTA generation, and then incrementally integrate the timing and phoneme features with unified semantic representations, expanding controllability. Finally, at the inference stage, we propose progressively guided generation, which sequentially emphasizes more fine-grained information, aligning inherently with the coarse-to-fine sampling nature of DiT. Extensive experiments show that ControlAudio achieves state-of-the-art performance in terms of temporal accuracy and speech clarity, significantly outperforming existing methods on both objective and subjective evaluations. Demo samples are available at: this https URL.


[207] 2510.14858

Non-Diffracting Beams for Near-Field Millimeter-Wave Communications: Advantage Regimes Under Aperture and Blockage Constraints

Near-field blockage changes the beam-design objective in millimeter-wave links: maximizing the unblocked on-axis gain does not necessarily maximize blocked-link performance. This paper studies when phase-only, aperture-constrained non-diffracting (ND) beams provide a blocked-link advantage over equal-aperture, equal-power conventional reference beams. We develop a unified annular-spectrum framework that generates isotropic Bessel-like and anisotropic Mathieu-like beams under discrete phased-array constraints, and a geometry-aware analysis centered on three propagation landmarks: the peak-intensity distance, the crossover distance, and an effective post-blockage recovery distance. Their relationship yields a recovery-before-crossover condition linking blockage size, depth, cone angle, and usable ND range, and motivates a blocked-link gain ratio that maps directly onto an achievable-rate gap at every operating SNR. The analysis also explains why anisotropic Mathieu-like beams can outperform isotropic ones under direction-dependent blockage. Monte Carlo simulations verify the predicted advantage regimes, an auxiliary comparison against a near-field focusing baseline confirms that the advantage persists against an unblocked-optimal array, and sensitivity studies over cone-angle choice and partial-transmission blockers show that the opaque-screen picture is a conservative reading of the underlying physics. The results identify Bessel-like and Mathieu-like beams as practical candidates for blockage-resilient near-field communications.


[208] 2510.16756

End-to-end Listen, Look, Speak and Act

Human interaction is inherently multimodal and full-duplex: we listen while watching, speak while acting, and fluidly adapt to turn-taking and interruptions. Realizing these capabilities is essential for building models that simulate human interaction. We present ELLSA (End-to-end Listen, Look, Speak and Act), which, to our knowledge, is the first full-duplex, end-to-end model that simultaneously perceives and generates across vision, text, speech, and action within a single architecture, enabling interaction patterns previously out of reach and yielding more natural, human-like behaviors. At its core is a novel SA-MoE architecture (Self-Attention Mixture-of-Experts) that routes each modality to specialized experts and fuses them through a unified attention backbone. This provides a generalizable solution for joint multimodal perception and concurrent generation, leveraging strong pre-trained components while enabling efficient modality integration and mitigating modality interference. On speech-interaction and robot-manipulation benchmarks, ELLSA matches modality-specific baselines, while uniquely supporting advanced multimodal and full-duplex behaviors such as dialogue and action turn-taking, defective instruction rejection, speaking-while-acting, context-grounded visual question answering, and action barge-ins. We contend that ELLSA represents a step toward more natural and general interactive intelligence, contributing to the broader pursuit of artificial general intelligence. All data, code and model checkpoints will be released at this https URL.


[209] 2510.23969

emg2speech: Synthesizing speech from electromyography using self-supervised speech models

We present a neuromuscular speech interface that translates electromyographic (EMG) signals recorded from orofacial muscles during speech articulation directly into audio. We find that self-supervised speech (S3) representations are strongly linearly related to the electrical power of muscle activity: a simple linear mapping predicts EMG power from S3 representations with a correlation of r = 0.85. In addition, EMG power vectors associated with distinct articulatory gestures form structured, separable clusters. Together, these observations suggest that S3 models implicitly encode articulatory mechanisms, as reflected in EMG activity. Leveraging this structure, we map EMG signals into the S3 representation space and synthesize speech, enabling end-to-end EMG-to-speech generation without explicit articulatory modeling or vocoder training. We demonstrate this system with a participant with amyotrophic lateral sclerosis (ALS), converting orofacial EMG recorded while she silently articulated speech into audio.
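The linear-probe analysis described above (a simple linear map predicting EMG power from S3 representations, evaluated by Pearson correlation) can be sketched on synthetic stand-in data; the feature dimension, noise level, and data below are illustrative, not the paper's recordings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: S3-style features (T frames x D dims) and EMG power
# generated as a noisy linear function of them (hypothetical data, for
# illustration only).
T, D = 2000, 64
s3 = rng.normal(size=(T, D))
w_true = rng.normal(size=D)
emg_power = s3 @ w_true + 0.5 * rng.normal(size=T)

# Simple linear probe: least-squares fit on a train split,
# Pearson correlation on a held-out split.
split = T // 2
w, *_ = np.linalg.lstsq(s3[:split], emg_power[:split], rcond=None)
pred = s3[split:] @ w
r = np.corrcoef(pred, emg_power[split:])[0, 1]
print(f"held-out Pearson r = {r:.2f}")
```

A high held-out correlation under this probe is the kind of evidence the abstract cites for a strong linear relationship between the two signal spaces.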


[210] 2511.20853

MODEST: Multi-Optics Depth-of-Field Stereo Dataset

Reliable depth estimation under real optical conditions remains a core challenge for camera vision in systems such as autonomous robotics and augmented reality. Despite recent progress in depth estimation and depth-of-field rendering, research remains constrained by the lack of large-scale, high-fidelity, real stereo DSLR datasets, limiting real-world generalization and evaluation of models trained on synthetic data, as shown extensively in the literature. We present the first high-resolution (5472$\times$3648px) stereo DSLR dataset with 18000 images, systematically varying focal length and aperture across complex real scenes and capturing the optical realism and complexity of professional camera systems. For 9 scenes with varying scene complexity, lighting and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70mm) and 5 apertures (f/2.8-f/22), spanning 50 optical configurations in 2000 images per scene. This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction and novel view synthesis. Each focal configuration has a dedicated calibration image set, supporting evaluation of classical and learning-based methods for intrinsic and extrinsic calibration. The dataset features challenging visual elements such as multi-scale optical illusions, reflective surfaces, mirrors, transparent glass walls, fine-grained details, and natural/artificial ambient light variations. This work attempts to bridge the realism gap between synthetic training data and real camera optics, and demonstrates challenges with the current state-of-the-art monocular, stereo depth and depth-of-field methods. We release the dataset, calibration files, and evaluation code to support reproducible research on real-world optical generalization.


[211] 2512.10906

Distributionally Robust Regret Optimal Control Under Moment-Based Ambiguity Sets

We consider a class of finite-horizon, linear-quadratic stochastic control problems, where the probability distribution governing the noise process is unknown but assumed to belong to an ambiguity set consisting of all distributions whose mean and covariance lie within norm balls centered at given nominal values. To cope with this ambiguity, we design causal affine control policies to minimize the worst-case expected regret over all distributions in the ambiguity set. The resulting minimax optimal control problem is shown to admit an equivalent reformulation as a tractable convex program, which can be interpreted as a regularized version of the nominal linear-quadratic stochastic control problem. Based on the dual of this convex reformulation, we develop a scalable projected subgradient method for computing optimal controllers to arbitrary accuracy. Numerical experiments are provided to compare the proposed method with state-of-the-art data-driven control design methods.


[212] 2512.17890

Spectro-temporal unitary transformations for coherent modulation: design trade-offs and practical considerations

This paper analyzes the performance of spectro-temporal unitary transforms for coherent optical modulation. Unlike conventional IQ modulation, such transforms are based on a cascade of phase modulators and dispersive elements, so are theoretically lossless and not limited by the bandwidth of the constituent modulators. We analyze the performance limits and design trade-offs of this scheme, estimating how the number of stages, amount of dispersion, modulator bandwidth, symbol block length, and electrical signal power impact the achievable signal-to-distortion ratio (SDR). Importantly, we show that high (>30 dB) SDRs suitable for modern >200 GBd class coherent optical communications are achievable with a low (<6) number of stages and reasonable parameters for driver power, modulator bandwidth, and on-chip dispersion. Finally, we address the SDR penalties associated with potential phase, amplitude, or dispersion errors, and limited DAC resolution.


[213] 2512.20249

Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion

Multimodal brain decoding aims to reconstruct semantic information that is consistent with visual stimuli from brain activity signals such as fMRI, and then generate readable natural language descriptions. However, multimodal brain decoding still faces key challenges in cross-subject generalization and interpretability. We propose the BrainROI model, which achieves leading results in brain-captioning evaluation on the NSD dataset. Under the cross-subject setting, compared with recent state-of-the-art methods and representative baselines, metrics such as BLEU-4 and CIDEr show clear improvements. First, to address the heterogeneity of functional brain topology across subjects, we design a new fMRI encoder. We use multi-atlas soft functional parcellations (soft-ROI) as a shared space. We extend the discrete ROI Concatenation strategy in MINDLLM to a voxel-wise gated fusion mechanism (Voxel-gate). We also ensure consistent ROI mapping through global label alignment, which enhances cross-subject transferability. Second, to overcome the limitations of manual and black-box prompting methods in stability and transparency, we introduce an interpretable prompt optimization process. In a small-sample closed loop, we use a locally deployed Qwen model to iteratively generate and select human-readable prompts. This process improves the stability of prompt design and preserves an auditable optimization trajectory. Finally, we impose parameterized decoding constraints during inference to further improve the stability and quality of the generated descriptions.


[214] 2601.05543

Closing the Modality Reasoning Gap for Speech Large Language Models

Although Speech Large Language Models have achieved notable progress, a substantial modality reasoning gap remains: their reasoning performance on speech inputs is markedly weaker than on text. This gap could be associated with representational drift across Transformer layers and behavior deviations in long-chain reasoning. To address this issue, we introduce TARS, a reinforcement-learning framework that aligns text-conditioned and speech-conditioned trajectories through an asymmetric reward design. The framework employs two dense and complementary signals: representation alignment, which measures layer-wise hidden-state similarity between speech- and text-conditioned trajectories, and behavior alignment, which evaluates semantic consistency between generated outputs and reference text completions. Experiments on challenging reasoning benchmarks, including MMSU and OBQA, show that our approach significantly narrows the modality reasoning gap and achieves state-of-the-art performance among 7B-scale Speech LLMs.


[215] 2601.11929

Indoor Occupancy Classification using a Compact Hybrid Quantum-Classical Model Enabled by a Physics-Informed Radar Digital Twin

Indoor occupancy classification enables privacy-preserving monitoring in settings such as remote elder care, where presence information helps triage alarms without cameras or wearables. Radar suits this role by sensing motion through occlusions and in darkness. Modern deep-learning pipelines are the standard for interpreting radar returns effectively; however, they are often parameter-heavy and brittle at low signal-to-noise ratios (SNR), motivating compact alternatives like Hybrid Quantum Neural Networks (HQNNs). A two-qubit HQNN is benchmarked against convolutional neural networks (CNNs) using a physics-informed 60 GHz digital twin and real radar measurements under matched training protocols. In clean conditions, the HQNN achieves high accuracy (99.7% synthetic; 97.0% real) with up to 170x fewer parameters (0.066M). Its parameter efficiency is shown to be structural, as an ablation of the parameterized quantum circuit (PQC) causes sharp performance drops on real data (to 68.5% and 31.5% for the control heads). A domain-dependent sensitivity emerges under additive-noise evaluation, where the HQNN begins recovery earlier in synthetic data while CNNs recover more steeply and peak higher on real measurements. In label-fraction ablations, CNNs prove more sample-efficient on real Range-Doppler Maps (RDMs), with the performance gap most pronounced at 50% labels (CNN balanced accuracy 0.89-0.99 vs. HQNN 0.75). On synthetic data, this gap narrows significantly, largely vanishing by the 50% label mark. Overall, the HQNN's value lies in parameter efficiency and a compact inductive bias that shapes its distinct sensitivity profile; this work establishes a rigorous baseline for hybrid quantum models in privacy-preserving radar occupancy sensing.


[216] 2601.14075

Utilizing the Perceived Age to Maximize Freshness in Query-Based Update Systems

Query-based sampling has become an increasingly popular technique for monitoring Markov sources in pull-based update systems. However, most of the contemporary literature on this topic assumes an exponential distribution for the query delay and often relies on the assumption that the feedback or replies to the queries are instantaneous. In this work, we relax both of these assumptions and find optimal sampling policies for monitoring continuous-time Markov chains (CTMCs) under generic delay distributions. In particular, we show that one can obtain significant gains in terms of mean binary freshness (MBF) by employing a waiting-based strategy for query-based sampling.


[217] 2601.17262

Unsupervised segmentation and clustering workflow for efficient processing of 4D-STEM and 5D-STEM data

Four-dimensional scanning transmission electron microscopy (4D-STEM) enables mapping of diffraction information with nanometer-scale spatial resolution, offering detailed insight into local structure, orientation, and strain. However, as data dimensionality and sampling density increase, particularly for in situ scanning diffraction experiments (5D-STEM), robust segmentation of structurally consistent behavior across sequential measurements becomes essential for efficient and physically meaningful analysis. Here, we introduce a clustering framework that identifies crystallographically distinct domains from 4D-STEM datasets. By using local diffraction-pattern similarity as a metric, the method extracts closed contours delineating spatially contiguous regions. This approach produces cluster-averaged diffraction patterns that improve signal quality while reducing data volume by orders of magnitude, enabling rapid and accurate orientation, phase, and strain mapping. We demonstrate the applicability of this approach to in situ liquid-cell 4D-STEM data of gold nanoparticle growth. Our method provides a scalable and generalizable route for spatially coherent segmentation, data compression, and quantitative structure-strain mapping across diverse 4D-STEM modalities. The full analysis code and example workflows are publicly available to support reproducibility and reuse.
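A minimal sketch of the similarity-based segmentation idea, on a synthetic stand-in for a 4D-STEM scan (the similarity metric, threshold, and greedy assignment below are illustrative simplifications of the paper's workflow, and the zero-mean toy patterns are not real, nonnegative diffraction data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 64 probe positions, each holding a 16x16 "diffraction
# pattern" drawn from one of two crystallographic domains, plus noise.
templates = rng.normal(size=(2, 16, 16))
labels_true = np.repeat(np.arange(2), 32)
stack = templates[labels_true] + 0.5 * rng.normal(size=(64, 16, 16))

def cluster(patterns, thresh=0.4):
    """Greedy clustering by cosine similarity of diffraction patterns.

    Each pattern joins the most similar existing cluster, or spawns a new
    one when no similarity exceeds the (hypothetical) threshold.  Cluster
    references stay frozen at their first member for simplicity.
    """
    refs, assign = [], []
    for p in patterns.reshape(len(patterns), -1):
        sims = [p @ r / (np.linalg.norm(p) * np.linalg.norm(r)) for r in refs]
        if sims and max(sims) > thresh:
            assign.append(int(np.argmax(sims)))
        else:
            refs.append(p)
            assign.append(len(refs) - 1)
    return np.array(assign)

assign = cluster(stack)

# Cluster-averaged patterns: noise shrinks roughly as 1/sqrt(cluster size),
# which is the signal-quality gain the abstract describes.
avg0 = stack[assign == 0].mean(axis=0)
err_single = np.linalg.norm(stack[0] - templates[0])
err_avg = np.linalg.norm(avg0 - templates[0])
```

The cluster-averaged pattern `avg0` is far closer to the underlying template than any single noisy pattern, while the assignment map itself delineates the two domains.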


[218] 2601.20867

Generalizable Prompt Tuning for Audio-Language Models via Semantic Expansion

Prompt tuning has achieved remarkable progress in vision-language models (VLMs) and is recently being adopted for audio-language models (ALMs). However, its generalization ability in ALMs remains largely underexplored. We observe that conventional prompt tuning for ALMs also suffers from the Base-New Tradeoff, and we identify that this issue stems from the disrupted semantic structure of the embedding space. To address this issue, we propose Semantically Expanded Prompt Tuning (SEPT), a plug-and-play framework that explicitly regularizes the prompt embedding space by incorporating semantic neighbors generated by large language models. SEPT introduces a novel semantic expansion loss with margin constraints that promote intra-class compactness and inter-class separability, thereby enhancing the semantic structure of the prompt embedding space. For comprehensive evaluation, we establish the first benchmark setup for prompt generalization in ALMs, covering both base-to-new generalization and cross-dataset transferability. Extensive experiments demonstrate that SEPT consistently improves generalization performance across multiple prompt tuning baselines, while maintaining computational cost during inference.


[219] 2602.03986

eCP: Equivariant Conformal Prediction with pre-trained models

Conformal prediction (CP) is a post-hoc, distribution-free, finite-sample method of uncertainty quantification that offers formal coverage guarantees under the assumption of data exchangeability. Unfortunately, the resulting uncertainty regions can grow significantly in long-horizon missions, rendering the statistical guarantees uninformative. To that end, we propose infusing CP with geometric information via group-averaging of the pretrained predictor, distributing the non-conformity mass across the orbits. Each sample is now treated as a representative of an orbit, so its uncertainty can be mitigated by the other samples coupled to it through the orbit-inducing elements of the symmetry group. Our approach provably yields contracted non-conformity scores in the increasing convex order, implying improved exponential-tail bounds and sharper conformal prediction sets in expectation, especially at high confidence levels. We then propose an experimental design to test these theoretical claims on pedestrian trajectory prediction.
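A toy instance of the proposed group-averaging, assuming a reflection group G = {x -> x, x -> -x} and split-conformal absolute-residual scores (the setup is illustrative; the paper targets pretrained trajectory predictors):

```python
import numpy as np

rng = np.random.default_rng(2)

# Ground truth is even in x; a hypothetical pretrained predictor picked up
# a spurious odd component (0.8*x) during training.
f = lambda x: x**2 + 0.8 * x
f_sym = lambda x: 0.5 * (f(x) + f(-x))   # group-average over G = {id, x -> -x}

def conformal_quantile(model, x, y, alpha=0.1):
    """Split-conformal radius from absolute-residual non-conformity scores."""
    scores = np.abs(y - model(x))
    k = int(np.ceil((len(x) + 1) * (1 - alpha)))
    return np.sort(scores)[k - 1]

x_cal = rng.uniform(-2, 2, 500)
y_cal = x_cal**2 + 0.2 * rng.normal(size=500)

q_plain = conformal_quantile(f, x_cal, y_cal)      # raw predictor
q_eq = conformal_quantile(f_sym, x_cal, y_cal)     # group-averaged predictor
print(q_plain, q_eq)
```

Because the averaged predictor restores the symmetry of the target, its non-conformity scores contract, and the conformal radius (hence the prediction set) shrinks at the same nominal coverage, which is the mechanism the theoretical claims formalize.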


[220] 2603.02378

Authenticated Contradictions from Desynchronized Provenance and Watermarking

Cryptographic provenance standards such as C2PA and invisible watermarking are positioned as complementary defenses for content authentication, yet the two verification layers are technically independent: neither conditions on the output of the other. This work formalizes and empirically demonstrates the $\textit{Integrity Clash}$, a condition in which a digital asset carries a cryptographically valid C2PA manifest asserting human authorship while its pixels simultaneously carry a watermark identifying it as AI-generated, with both signals passing their respective verification checks in isolation. We construct metadata washing workflows that produce these authenticated fakes through standard editing pipelines, requiring no cryptographic compromise, only the semantic omission of a single assertion field permitted by the current C2PA specification. To close this gap, we propose a cross-layer audit protocol that jointly evaluates provenance metadata and watermark detection status, achieving 100% classification accuracy across 3,500 test images spanning four conflict-matrix states and three realistic perturbation conditions. Our results demonstrate that the gap between these verification layers is unnecessary and technically straightforward to close.
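The proposed cross-layer audit reduces to a joint decision over both verification layers; a schematic sketch with hypothetical state labels (the paper's four conflict-matrix states may be named differently):

```python
def cross_layer_audit(manifest_valid: bool, manifest_says_ai: bool,
                      watermark_detected: bool) -> str:
    """Jointly evaluate provenance metadata and watermark detection status.

    Labels are illustrative, not the paper's exact taxonomy.
    """
    if not manifest_valid:
        return "untrusted: invalid or missing manifest"
    if manifest_says_ai == watermark_detected:
        return "consistent"                       # both layers agree
    if watermark_detected and not manifest_says_ai:
        # Cryptographically valid human-authorship claim over AI-watermarked
        # pixels: the "Integrity Clash" each check misses in isolation.
        return "authenticated contradiction"
    return "inconsistent: AI claim without watermark"

print(cross_layer_audit(True, False, True))
```

The point of the abstract is precisely that each layer passes its own check in the third case, so only a joint rule of this shape can flag it.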


[221] 2603.22924

Positive Observers Revisited

The paper shows that positive linear systems can be stabilized using positive Luenberger-type observers. This is achieved by structuring the observer as monotonically converging upper and lower bounds on the state. Analysis of the closed-loop properties under linear observer feedback gives conditions that cover a larger class than previous observer designs. The results are applied to nonpositive systems by enforcing positivity of the dynamics using feedback from the upper bound observer. The setting is expanded to include stochastic noise, giving conditions for convergence in expectation using feedback from positive observers.


[222] 2603.26475

Foundation Model for Cardiac Time Series via Masked Latent Attention

Electrocardiograms (ECGs) are among the most widely available clinical signals and play a central role in cardiovascular diagnosis. While recent foundation models (FMs) have shown promise for learning transferable ECG representations, most existing pretraining approaches treat leads as independent channels and fail to explicitly leverage their strong structural redundancy. We introduce the latent attention masked autoencoder (LAMAE) FM that directly exploits this structure by learning cross-lead connection mechanisms during self-supervised pretraining. Our approach models higher-order interactions across leads through latent attention, enabling permutation-invariant aggregation and adaptive weighting of lead-specific representations. We provide empirical evidence on the MIMIC-IV-ECG database that leveraging the cross-lead connection constitutes an effective form of structural supervision, improving representation quality and transferability. Our method shows strong performance in predicting ICD-10 codes, outperforming independent-lead masked modeling and alignment-based baselines.
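The permutation-invariant aggregation over leads can be illustrated with a single attention-pooling step in NumPy (the shapes and random projections are illustrative, not LAMAE's exact parameterization):

```python
import numpy as np

rng = np.random.default_rng(3)

def lead_attention_pool(leads, q, Wk, Wv):
    """Permutation-invariant aggregation over per-lead embeddings.

    leads: (L, d), one embedding per ECG lead; q: (d,) learned query;
    Wk, Wv: (d, d) key/value projections.  Shapes are illustrative.
    """
    keys, values = leads @ Wk, leads @ Wv
    logits = keys @ q / np.sqrt(len(q))
    w = np.exp(logits - logits.max())
    w /= w.sum()                        # softmax attention over leads
    return w @ values                   # adaptive weighting of leads

d = 8
leads = rng.normal(size=(12, d))        # 12-lead embeddings
q = rng.normal(size=d)
Wk, Wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))

out = lead_attention_pool(leads, q, Wk, Wv)
out_perm = lead_attention_pool(leads[rng.permutation(12)], q, Wk, Wv)
# Reordering the leads leaves the pooled representation unchanged.
```

Because the softmax weights permute together with the leads, the weighted sum is identical for any lead ordering, which is what makes the aggregation permutation-invariant.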


[223] 2604.01897

FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection

Recent advances in AudioLLMs have enabled spoken dialogue systems to move beyond turn-based interaction toward real-time full-duplex communication, where the agent must decide when to speak, yield, or interrupt while the user is still talking. Existing full-duplex approaches either rely on voice activity cues, which lack semantic understanding, or on ASR-based modules, which introduce latency and degrade under overlapping speech and noise. Moreover, available datasets rarely capture realistic interaction dynamics, limiting evaluation and deployment. To mitigate these problems, we propose \textbf{FastTurn}, a unified framework for low-latency and robust turn detection. To reduce latency while maintaining performance, FastTurn combines streaming CTC decoding with acoustic features, enabling early decisions from partial observations while preserving semantic cues. We also release a test set based on real human dialogue, capturing authentic turn transitions, overlapping speech, backchannels, pauses, pitch variation, and environmental noise. Experiments show FastTurn achieves higher decision accuracy with lower interruption latency than representative baselines and remains robust under challenging acoustic conditions, demonstrating its effectiveness for practical full-duplex dialogue systems.


[224] 2604.02846

Adaptive Local Frequency Filtering for Fourier-Encoded Implicit Neural Representations

Fourier-encoded implicit neural representations (INRs) have shown strong capability in modeling continuous signals from discrete samples. However, conventional Fourier feature mappings use a fixed set of frequencies over the entire spatial domain, making them poorly suited to signals with spatially varying local spectra and often leading to slow convergence of high-frequency details. To address this issue, we propose an adaptive local frequency filtering method for Fourier-encoded INRs. The proposed method introduces a spatially varying parameter $\alpha(\mathbf{x})$ to modulate encoded Fourier components, enabling a smooth transition among low-pass, band-pass, and high-pass behaviors at different spatial locations. We further analyze the effect of the proposed filter from the neural tangent kernel (NTK) perspective and provide an NTK-inspired interpretation of how it reshapes the effective kernel spectrum. Experiments on 2D image fitting, 3D shape representation, and sparse data reconstruction demonstrate that the proposed method consistently improves reconstruction quality and leads to faster optimization compared with fixed-frequency baselines. In addition, the learned $\alpha(\mathbf{x})$ provides an intuitive visualization of spatially varying frequency preferences, which helps explain the behavior of the model on non-stationary signals. These results indicate that adaptive local frequency modulation is a practical enhancement for Fourier-encoded INRs.
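One way to realize the spatially varying modulation $\alpha(\mathbf{x})$: weight each fixed Fourier frequency by a Gaussian bump in log-frequency centered at $\alpha(\mathbf{x})$, giving a smooth transition among low-, band-, and high-pass behaviors (the specific filter shape is our illustrative choice, not necessarily the paper's exact parameterization):

```python
import numpy as np

# Fixed Fourier frequencies (octaves), as in standard positional encoding.
freqs = 2.0 ** np.arange(6)             # 1, 2, 4, ..., 32

def encode(x, alpha, width=1.0):
    """Fourier features with an adaptive local frequency filter.

    alpha selects, per location, which log-frequencies pass: a Gaussian
    weight centered at alpha in log2-frequency modulates the fixed
    sin/cos components (illustrative filter shape).
    """
    w = np.exp(-0.5 * ((np.log2(freqs) - alpha) / width) ** 2)
    phase = 2 * np.pi * freqs * x
    return np.concatenate([w * np.sin(phase), w * np.cos(phase)])

lowpass = encode(0.3, alpha=0.0)        # weight concentrated on low frequencies
highpass = encode(0.3, alpha=5.0)       # weight concentrated on high frequencies
```

Sliding `alpha` continuously moves the pass-band, so a learned $\alpha(\mathbf{x})$ can devote high-frequency capacity only where the signal's local spectrum demands it.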


[225] 2604.03634

Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations

We establish that temporal averaging over multiple observations is the degenerate case of algebraic group action with the trivial group $G=\{e\}$. A General Replacement Theorem proves that a group-averaged estimator from one snapshot achieves equivalent subspace decomposition to multi-snapshot covariance estimation. The Trivial Group Embedding Theorem proves that the sample covariance is the accumulation of trivial-group estimates, with variance governed by a $(G,L)$ continuum as $1/(|G|\cdot L)$. The processing gain $10\log_{10}(M)$ dB equals the classical beamforming gain, establishing that this gain is a property of group order, not sensor count. The DFT, DCT, and KLT are unified as group-matched special cases. We conjecture a General Algebraic Averaging Theorem extending these results to arbitrary statistics, with variance governed by the effective group order $d_{\mathrm{eff}}$. Monte Carlo experiments on the first four sample moments across five group types confirm the conjecture to four-digit precision. The framework exploits the $structure$ of information (representation-theoretic symmetry of the data object) rather than the content, complementing Shannon's theory. Five applications are demonstrated: single-snapshot MUSIC, massive MIMO with 64% throughput gain, single-pulse waveform classification at 90% accuracy, graph signal processing with non-abelian groups, and algebraic analysis of transformer LLMs.
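The claimed $|G|$-fold variance reduction, and with it the $10\log_{10}(M)$ dB processing gain, can be checked with a one-snapshot Monte Carlo for the cyclic group $C_M$ acting on a shift-invariant signal (a deliberately minimal instance of the framework):

```python
import numpy as np

rng = np.random.default_rng(4)

# A signal invariant under the cyclic group C_M (here: a constant vector),
# observed in a single snapshot with i.i.d. noise.  Reading one entry after
# each of the M cyclic shifts enumerates the orbit, so the group-averaged
# estimator of one entry is simply the snapshot mean.
M, trials, c, sigma = 16, 20000, 1.0, 1.0
snapshots = c + sigma * rng.normal(size=(trials, M))

trivial = snapshots[:, 0]               # G = {e}: one raw sample per snapshot
group_avg = snapshots.mean(axis=1)      # average over the orbit of C_M

gain = trivial.var() / group_avg.var()
print(f"variance reduction ~ {gain:.1f} "
      f"(theory: |G| = {M}, i.e. {10 * np.log10(M):.1f} dB)")
```

The empirical variance ratio concentrates near $|G| = M$, matching the $1/(|G|\cdot L)$ scaling with $L = 1$ snapshot and illustrating that the gain comes from group order rather than from collecting more data.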


[226] 2604.03728

Carbon-Driven Incentive Mechanism for Renewable Power-to-Ammonia Production in Coupled Carbon and Ammonia Markets

Renewable power-to-ammonia (ReP2A) production offers a promising pathway to decarbonize the power, transport, and chemical sectors, yet its competitiveness remains limited by high costs and fragmented carbon-policy frameworks. In particular, a unified mechanism that links ReP2A producers with fossil-based gray ammonia (GA) competitors in carbon and ammonia markets, while coordinating incentives among renewable generation, hydrogen production, and ammonia synthesis stakeholders in the ReP2A process chain, is still lacking. To address this gap, this paper proposes a hierarchical carbon-driven incentive mechanism (PCIM) that integrates carbon policy with multi-energy market interactions. A two-layer trading framework is developed, where ReP2A and GA compete in carbon allowance (CA) and ammonia markets (outer layer), while electricity and hydrogen transactions coordinate the ReP2A chain (inner layer). The resulting interactions are modeled as a hierarchical equilibrium, where the inner layer is reformulated as a tractable equivalent optimization problem, and the outer layer is solved as a mixed-integer linear program (MILP) derived from Karush-Kuhn-Tucker conditions. Based on equilibrium analysis, the carbon-related revenue of ReP2A is quantified, and a CA allocation mechanism (PCAM) is proposed to ensure individual rationality among stakeholders. Results show that the proposed mechanism reduces carbon emissions by 12.9% with only a 1.8% decrease in sector-wide revenue. Moreover, carbon pricing under the proposed framework redistributes profits between green and gray ammonia without reducing total welfare, and the PCAM further enhances stakeholders' willingness to participate in ReP2A production.


[227] 2604.11807

Physics-Informed State Space Models for Reliable Solar Irradiance Forecasting in Off-Grid Systems

The stable operation of off-grid photovoltaic systems requires accurate, computationally efficient solar forecasting. Contemporary deep learning models often suffer from massive computational overhead and physical blindness, generating physically impossible predictions. This paper introduces the Physics-Informed State Space Model (PISSM) to bridge the gap between efficiency and physical accuracy for edge-deployed microcontrollers. PISSM utilizes a dynamic Hankel matrix embedding to filter stochastic sensor noise by transforming raw meteorological sequences into a robust state space. A Linear State Space Model replaces heavy attention mechanisms, efficiently modeling temporal dependencies for parallel processing. Crucially, a novel Physics-Informed Gating mechanism leverages the Solar Zenith Angle and Clearness Index to structurally bound outputs, ensuring predictions strictly obey diurnal cycles and preventing nocturnal errors. Validated on a multi-year dataset for Omdurman, Sudan, PISSM achieves superior accuracy with fewer than 40,000 parameters, establishing an ultra-lightweight benchmark for real-time off-grid control.
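The Hankel-embedding step can be sketched in a few lines; the rank-2 SVD truncation below stands in for the noise filtering the abstract attributes to the embedding (depth and rank are illustrative choices, not PISSM's configuration):

```python
import numpy as np

def hankel_embed(series, depth):
    """Map a 1-D sequence into a Hankel matrix of lagged windows
    (a delay-embedded state space): H[j, i] = series[i + j]."""
    n = len(series) - depth + 1
    return np.stack([series[i:i + depth] for i in range(n)], axis=1)

rng = np.random.default_rng(5)
t = np.linspace(0, 4 * np.pi, 400)
noisy = np.sin(t) + 0.3 * rng.normal(size=t.size)

H = hankel_embed(noisy, depth=32)          # shape (32, 369)

# A clean sinusoid's Hankel matrix has rank 2, so projecting onto the top
# two singular directions filters much of the stochastic noise.
U, s, Vt = np.linalg.svd(H, full_matrices=False)
H_denoised = (U[:, :2] * s[:2]) @ Vt[:2]
denoised = H_denoised[0]                   # first row ~ filtered series
```

Comparing `denoised` against the clean sinusoid shows a markedly lower error than the raw samples, which is the "robust state space" effect the abstract describes.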


[228] 2604.11909

Thermodynamic Liquid Manifold Networks: Physics-Bounded Deep Learning for Solar Forecasting in Autonomous Off-Grid Microgrids

The stable operation of autonomous off-grid photovoltaic systems requires solar forecasting algorithms that respect atmospheric thermodynamics. Contemporary deep learning models consistently exhibit critical anomalies, primarily severe temporal phase lags during cloud transients and physically impossible nocturnal power generation. To resolve this divergence between data-driven modeling and deterministic celestial mechanics, this research introduces the Thermodynamic Liquid Manifold Network. The methodology projects 22 meteorological and geometric variables into a Koopman-linearized Riemannian manifold to systematically map complex climatic dynamics. The architecture integrates a Spectral Calibration unit and a multiplicative Thermodynamic Alpha-Gate. This system synthesizes real-time atmospheric opacity with theoretical clear-sky boundary models, structurally enforcing strict celestial geometry compliance. This completely neutralizes phantom nocturnal generation while maintaining zero-lag synchronization during rapid weather shifts. Validated against a rigorous five-year testing horizon in a severe semi-arid climate, the framework achieves an RMSE of 18.31 Wh/m^2 and a Pearson correlation of 0.988. The model strictly maintains a zero-magnitude nocturnal error across all 1826 testing days and exhibits a sub-30-minute phase response during high-frequency optical transients. Comprising exactly 63,458 trainable parameters, this ultra-lightweight design establishes a robust, thermodynamically consistent standard for edge-deployable microgrid controllers.
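The multiplicative alpha-gate idea, scaling a learned atmospheric-opacity factor by a clear-sky bound so that nocturnal output is structurally zero, can be sketched as follows (the gate form and clear-sky model are simplified stand-ins for the paper's architecture):

```python
import numpy as np

def clear_sky_ghi(zenith_deg, I0=1000.0):
    """Toy clear-sky bound: proportional to cos(zenith), and exactly zero
    once the sun is below the horizon (zenith > 90 degrees)."""
    mu = np.cos(np.radians(zenith_deg))
    return I0 * np.clip(mu, 0.0, None)

def alpha_gate(opacity_logit, zenith_deg):
    """Multiplicative thermodynamic gate (illustrative form): the network's
    raw opacity logit is squashed to [0, 1] transmittance and scaled by the
    clear-sky bound, so predictions can never exceed it or fire at night."""
    alpha = 1.0 / (1.0 + np.exp(-opacity_logit))
    return alpha * clear_sky_ghi(zenith_deg)

day = alpha_gate(opacity_logit=2.0, zenith_deg=30.0)    # bounded by clear sky
night = alpha_gate(opacity_logit=2.0, zenith_deg=110.0)  # structurally zero
```

Because the bound multiplies the learned factor rather than being learned itself, zero nocturnal error holds by construction, not as a trained behavior.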


[229] 2604.13366

Diffusion Sequence Models for Generative In-Context Meta-Learning of Robot Dynamics

Accurate modeling of robot dynamics is essential for model-based control, yet remains challenging under distributional shifts and real-time constraints. In this work, we formulate system identification as an in-context meta-learning problem and compare deterministic and generative sequence models for forward dynamics prediction. We take a Transformer-based meta-model as a strong deterministic baseline, and introduce to this setting two complementary diffusion-based approaches: (i) inpainting diffusion (Diffuser), which learns the joint input-observation distribution, and (ii) conditioned diffusion models (CNN and Transformer), which generate future observations conditioned on control inputs. Through large-scale randomized simulations, we analyze performance across in-distribution and out-of-distribution regimes, as well as computational trade-offs relevant for control. We show that diffusion models significantly improve robustness under distribution shift, with inpainting diffusion achieving the best performance in our experiments. Finally, we demonstrate that warm-started sampling enables diffusion models to operate within real-time constraints, making them viable for control applications. These results highlight generative meta-models as a promising direction for robust system identification in robotics.


[230] 2604.13455

Outperforming Self-Attention Mechanisms in Solar Irradiance Forecasting via Physics-Guided Neural Networks

Accurate Global Horizontal Irradiance (GHI) forecasting is critical for grid stability, particularly in arid regions characterized by rapid aerosol fluctuations. While recent trends favor computationally expensive Transformer-based architectures, this paper challenges the prevailing "complexity-first" paradigm. We propose a lightweight, Physics-Informed Hybrid CNN-BiLSTM framework that prioritizes domain knowledge over architectural depth. The model integrates a Convolutional Neural Network (CNN) for spatial feature extraction with a Bi-Directional LSTM for capturing temporal dependencies. Unlike standard data-driven approaches, our model is explicitly guided by a vector of 15 engineered features, including Clear-Sky indices and the Solar Zenith Angle, rather than relying solely on raw historical data. Hyperparameters are rigorously tuned using Bayesian Optimization to approach global optimality. Experimental validation using NASA POWER data in Sudan demonstrates that our physics-guided approach achieves a Root Mean Square Error (RMSE) of 19.53 W/m^2, significantly outperforming complex attention-based baselines (RMSE 30.64 W/m^2). These results confirm a "Complexity Paradox": in high-noise meteorological tasks, explicit physical constraints offer a more efficient and accurate alternative to self-attention mechanisms. The findings advocate for a shift towards hybrid, physics-aware AI for real-time renewable energy management.
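Two of the physics-guided features named in this abstract have standard textbook definitions, sketched below: the solar zenith angle from the spherical-astronomy relation cos(theta_z) = sin(lat)sin(decl) + cos(lat)cos(decl)cos(H), and the clear-sky index as the ratio of measured GHI to a clear-sky model's GHI. This is a hypothetical illustration of 2 of the 15 features; the paper's exact feature definitions are not given in the abstract.

```python
import math

def solar_zenith_deg(lat_deg, decl_deg, hour_angle_deg):
    """Solar zenith angle (degrees) from latitude, solar declination,
    and hour angle: cos(theta_z) = sin(lat)sin(decl) + cos(lat)cos(decl)cos(H)."""
    lat, dec, h = map(math.radians, (lat_deg, decl_deg, hour_angle_deg))
    cz = (math.sin(lat) * math.sin(dec)
          + math.cos(lat) * math.cos(dec) * math.cos(h))
    return math.degrees(math.acos(max(-1.0, min(1.0, cz))))

def clear_sky_index(ghi, ghi_clear, eps=1e-6):
    """Clear-sky index k_c = measured GHI / modeled clear-sky GHI,
    guarded against division by near-zero values at sunrise/sunset."""
    return ghi / max(ghi_clear, eps)

# Equator, equinox, solar noon: the sun is directly overhead.
print(round(solar_zenith_deg(0.0, 0.0, 0.0), 6))  # 0.0
print(clear_sky_index(500.0, 1000.0))             # 0.5 (partly cloudy)
```

Feeding such normalized, geometry-aware quantities to the network is what lets a small CNN-BiLSTM compete with attention-based models: the physics does the hard part of the representation.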


[231] 2604.13459

Asymmetric-Loss-Guided Hybrid CNN-BiLSTM-Attention Model for Industrial RUL Prediction with Interpretable Failure Heatmaps

Turbofan engine degradation under sustained operational stress necessitates robust prognostic systems capable of accurately estimating the Remaining Useful Life (RUL) of critical components. Existing deep learning approaches frequently fail to simultaneously capture multi-sensor spatial correlations and long-range temporal dependencies, while standard symmetric loss functions inadequately penalize the safety-critical error of over-estimating residual life. This study proposes a hybrid architecture integrating Twin-Stage One-Dimensional Convolutional Neural Networks (1D-CNN), a Bidirectional Long Short-Term Memory (BiLSTM) network, and a custom Bahdanau Additive Attention mechanism. The model was trained and evaluated on the NASA Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) FD001 sub-dataset employing a zero-leakage preprocessing pipeline, piecewise-linear RUL labeling capped at 130 cycles, and the NASA-specified asymmetric exponential loss function that disproportionately penalizes over-estimation to enforce industrial safety constraints. Experiments on 100 test engines achieved a Root Mean Squared Error (RMSE) of 17.52 cycles and a NASA S-Score of 922.06. Furthermore, extracted attention weight heatmaps provide interpretable, per-engine insights into the temporal progression of degradation, supporting informed maintenance decision-making. The proposed framework demonstrates competitive performance against established baselines and offers a principled approach to safe, interpretable prognostics in industrial settings.
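The two labeling/scoring conventions this abstract relies on are standard for C-MAPSS and can be sketched directly: piecewise-linear RUL labels capped at 130 cycles, and the asymmetric exponential S-score in which over-estimating remaining life (the safety-critical error) is penalised with divisor 10 versus divisor 13 for under-estimates. The function names are illustrative; the formulas follow the widely used C-MAPSS scoring definition.

```python
import math

def piecewise_rul(true_rul, cap=130):
    """Piecewise-linear RUL label: early-life degradation is not
    observable in the sensors, so labels are capped (here at 130 cycles,
    as stated in the abstract)."""
    return min(true_rul, cap)

def nasa_s_score(y_true, y_pred):
    """C-MAPSS asymmetric scoring function. With d = predicted - true:
    d >= 0 (over-estimate) contributes exp(d/10) - 1, d < 0
    (under-estimate) contributes exp(-d/13) - 1, so predicting life that
    is not there costs more."""
    score = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t
        score += math.exp(d / 10.0) - 1.0 if d >= 0 else math.exp(-d / 13.0) - 1.0
    return score

print(piecewise_rul(200))                 # 130
print(nasa_s_score([50], [60]))           # over-estimate by 10 cycles
print(nasa_s_score([50], [40]))           # under-estimate by 10: smaller penalty
```

A symmetric loss such as MSE treats these two 10-cycle errors identically, which is exactly the inadequacy the proposed asymmetric training objective addresses.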


[232] 2604.14548

VoxSafeBench: Not Just What Is Said, but Who, How, and Where

As speech language models (SLMs) transition from personal devices into shared, multi-user environments, their responses must account for far more than the words alone. Who is speaking, how they sound, and where the conversation takes place can each turn an otherwise benign request into one that is unsafe, unfair, or privacy-violating. Existing benchmarks, however, largely focus on basic audio comprehension, study individual risks in isolation, or conflate content that is inherently harmful with content that only becomes problematic due to its acoustic context. We introduce VoxSafeBench, among the first benchmarks to jointly evaluate social alignment in SLMs across three dimensions: safety, fairness, and privacy. VoxSafeBench adopts a two-tier design: Tier 1 evaluates content-centric risks using matched text and audio inputs, while Tier 2 targets audio-conditioned risks in which the transcript is benign but the appropriate response hinges on the speaker, paralinguistic cues, or the surrounding environment. To validate Tier 2, we include intermediate perception probes and confirm that frontier SLMs can successfully detect these acoustic cues yet still fail to act on them appropriately. Across 22 tasks with bilingual coverage, we find that safeguards appearing robust on text often degrade in speech: safety awareness drops for speaker- and scene-conditioned risks, fairness erodes when demographic differences are conveyed vocally, and privacy protections falter when contextual cues arrive acoustically. Together, these results expose a pervasive speech grounding gap: current SLMs frequently recognize the relevant social norm in text but fail to apply it when the decisive cue must be grounded in speech. Code and data are publicly available at: this https URL


[233] 2604.14654

ClariCodec: Optimising Neural Speech Codes for 200 bps Communication using Reinforcement Learning

In bandwidth-constrained communication such as satellite and underwater channels, speech must often be transmitted at ultra-low bitrates where intelligibility is the primary objective. At such extreme compression levels, codecs trained with acoustic reconstruction losses tend to allocate bits to perceptual detail, leading to substantial degradation in word error rate (WER). This paper proposes ClariCodec, a neural speech codec operating at 200 bits per second (bps) that reformulates quantisation as a stochastic policy, enabling reinforcement learning (RL)-based optimisation of intelligibility. Specifically, the encoder is fine-tuned using WER-driven rewards while the acoustic reconstruction pipeline remains frozen. Even without RL, ClariCodec achieves 3.68% WER on the LibriSpeech test-clean set at 200 bps, already competitive with codecs operating at higher bitrates. Further RL fine-tuning reduces WER to 3.20% on test-clean and 8.93% on test-other, corresponding to a 13% relative reduction while preserving perceptual quality.
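The reformulation of quantisation as a stochastic policy can be sketched with a REINFORCE update: code indices are sampled from a softmax over encoder logits, a reward (e.g. the negative WER of the decoded speech) is observed, and the log-probability of the sampled code is pushed up in proportion to its advantage. Everything below (the toy reward, codebook size, learning rate) is an illustrative assumption standing in for the paper's actual WER-driven training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(logits, reward_fn, lr=0.5, baseline=0.0):
    """One REINFORCE update on a stochastic codebook-selection policy:
    sample code index k ~ softmax(logits), observe a reward, and nudge
    logits along advantage * d log pi(k) / d logits."""
    probs = softmax(logits)
    k = rng.choice(len(logits), p=probs)
    advantage = reward_fn(k) - baseline
    grad = -probs
    grad[k] += 1.0          # gradient of log pi(k) w.r.t. the logits
    return logits + lr * advantage * grad, k

# Toy stand-in for -WER: pretend code 2 yields the most intelligible decode.
reward = lambda k: 1.0 if k == 2 else 0.0
logits = np.zeros(4)
for _ in range(200):
    logits, _ = reinforce_step(logits, reward)
print(int(np.argmax(softmax(logits))))  # the policy concentrates on code 2
```

Sampling makes the discrete codebook choice differentiable in expectation, which is what allows a non-differentiable objective like WER to drive the encoder while the frozen decoder preserves perceptual quality.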