New articles on Electrical Engineering and Systems Science


[1] 2603.10138

Data-Driven Successive Linearization for Optimal Voltage Control

Power distribution systems are increasingly exposed to large voltage fluctuations driven by intermittent solar photovoltaic generation and rapidly varying loads (e.g., electric vehicles and storage). To address this challenge, a number of advanced controllers have been proposed for voltage regulation. However, these controllers typically rely on fixed linear approximations of voltage dynamics. As a result, the solutions may become infeasible when applied to the actual voltage behavior governed by nonlinear power flow equations, particularly under heavy power injection from distributed energy resources. This paper proposes a data-driven successive linearization approach for voltage control under nonlinear power flow constraints. By leveraging the fact that the deviation between the nonlinear power flow solution and its linearization is bounded by the distance from the operating point, we perform data-driven linearization around the most recent operating point. Convergence of the proposed method to a neighborhood of KKT points is established by exploiting the convexity of the objective function and the structural properties of the nonlinear constraints. Case studies show that the proposed approach achieves fast convergence and adapts quickly to changes in net load.
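The successive-linearization idea can be illustrated on a scalar toy problem (the quadratic voltage map h, the penalty lam, and the reference v_ref below are invented for illustration and are not the paper's model): at each iteration the nonlinear voltage map is linearized around the most recent operating point and the resulting convex subproblem is solved in closed form.

```python
import numpy as np

# Toy nonlinear "power flow": voltage as a function of a power injection p.
def h(p):
    return 1.0 + 0.05 * p - 0.004 * p ** 2

def h_jac(p):            # sensitivity of the voltage at the operating point
    return 0.05 - 0.008 * p

v_ref = 1.02             # desired voltage
lam = 1e-3               # control-effort penalty

p = 0.0                  # initial operating point
for _ in range(50):
    # Linearize h around the latest operating point: v ~ h(p) + J*(q - p),
    # then minimize (h(p) + J*(q - p) - v_ref)^2 + lam*q^2 in closed form.
    J = h_jac(p)
    q = J * (v_ref - h(p) + J * p) / (J ** 2 + lam)
    converged = abs(q - p) < 1e-10
    p = q
    if converged:
        break

print(round(p, 4), round(h(p), 4))
```

At a fixed point the linearization is exact at the operating point itself, so the iterate satisfies the stationarity condition lam*p + h'(p)*(h(p) - v_ref) = 0 of the true nonlinear problem, mirroring the convergence-to-KKT-points argument in the abstract.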


[2] 2603.10175

Calibration-Reasoning Framework for Descriptive Speech Quality Assessment

Explainable speech quality assessment requires moving beyond Mean Opinion Scores (MOS) to analyze underlying perceptual dimensions. To address this, we introduce a novel post-training method that tailors a foundational Audio Large Language Model for multidimensional reasoning, detection, and classification of audio artifacts. First, a calibration stage aligns the model to predict predefined perceptual dimensions. Second, a reinforcement learning stage leverages Group Relative Policy Optimization (GRPO) with dimension-specific rewards to substantially enhance the accuracy of descriptions and the temporal localization of quality issues. With this approach we reach state-of-the-art results: a mean PCC of 0.71 on the multidimensional QualiSpeech benchmark and a 13% improvement in MOS prediction driven by RL-based reasoning. Furthermore, our fine-grained GRPO rewards substantially advance the model's ability to pinpoint and classify audio artifacts in time.
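As a minimal sketch of the group-relative normalization at the core of GRPO (the reward values are invented, and the paper's dimension-specific reward design is more involved than this), rewards for a group of responses sampled from the same prompt are converted to advantages by standardizing within the group:

```python
import numpy as np

# Rewards for one group of sampled responses (hypothetical values).
rewards = np.array([0.2, 0.9, 0.5, 0.4])

# Group-relative advantage: center and scale within the group, so each
# response is scored against its siblings rather than a learned critic.
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
print(adv.round(3))
```

These per-response advantages then weight the policy-gradient update; no value network is needed.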


[3] 2603.10188

ARCHE: Autoregressive Residual Compression with Hyperprior and Excitation

Recent progress in learning-based image compression has demonstrated that end-to-end optimization can substantially outperform traditional codecs by jointly learning compact latent representations and probabilistic entropy models. However, many existing approaches achieve high rate-distortion efficiency at the expense of increased computational cost and limited parallelism. This paper presents ARCHE - Autoregressive Residual Compression with Hyperprior and Excitation, an end-to-end learned image compression framework that balances modeling accuracy and computational efficiency. The proposed architecture unifies hierarchical, spatial, and channel-based priors within a single probabilistic framework, capturing both global and local dependencies in the latent representation of the image, while employing adaptive feature recalibration and residual refinement to enhance latent representation quality. Without relying on recurrent or transformer-based components, ARCHE attains state-of-the-art rate-distortion efficiency: on the Kodak benchmark dataset, it reduces the BD-Rate by approximately 48% relative to the commonly used benchmark model of Ballé et al., 30% relative to the channel-wise autoregressive model of Minnen & Singh, and 5% relative to the VVC Intra codec. The framework maintains computational efficiency with 95M parameters and 222 ms of runtime per image. Visual comparisons confirm sharper textures and improved color fidelity, particularly at lower bit rates, demonstrating that accurate entropy modeling can be achieved through efficient convolutional designs suitable for practical deployment.


[4] 2603.10222

In-Situ Timing Diagnosis of PDN and Configuration-Upset-Induced Routing Delay Degradation in SRAM-based FPGAs

Timing degradation in SRAM-based FPGAs arises from multiple physical mechanisms that manifest differently in the routing fabric, most notably power-distribution-network (PDN) marginality and configuration-induced routing perturbations. Existing in-situ timing monitors provide limited insight into the physical origin, spatial structure, or statistical characteristics of the degradation. This paper presents a scalable in-situ timing diagnosis architecture that enables fine-grained, routing-aware characterization of timing behavior directly within the FPGA fabric during normal operation. The proposed approach combines non-intrusive delay taps placed at routing switch-matrix boundaries with distributed phase-swept delay monitoring elements and centralized statistical analysis. By extracting probabilistic delay distributions rather than binary timing margins, the framework captures both mean delay shifts and timing variability across spatially distributed routing locations. Experimental results obtained on a modern SRAM-based FPGA show that PDN-induced timing degradation produces globally correlated delay shifts with minimal change in variance, whereas routing-induced perturbations exhibit localized, topology-dependent delay growth and increased timing dispersion. Spatial correlation analysis and two-dimensional correlation heatmaps further reveal distinct signatures that enable systematic differentiation between these mechanisms. The presented architecture operates concurrently with an active user design and does not require external instrumentation, radiation sources, or design modification. These results establish a practical foundation for in-situ timing diagnosis, reliability assessment, and architecture-aware timing management in large FPGA-based systems.


[5] 2603.10245

Over-the-Air Consensus-based Formation Control of Heterogeneous Agents: Communication-Rate and Geometry-Aware Convergence Guarantees

This paper investigates the formation control problem of heterogeneous, autonomous agents that communicate over a wireless multiple access channel. Instead of avoiding interference through orthogonal node-to-node transmissions, we exploit the superposition property of the wireless channel to compute, at each receiver, normalized convex combinations of simultaneously broadcast neighbor signals. At every communication instant, agents update their reference positions from these aggregates, and track the references in continuous time between updates. The only assumption on the agent dynamics is that each agent tracks constant reference positions exponentially, which accommodates a broad class of platforms. Under this assumption, we analyze the resulting jump-flow system under time-varying communication graphs and unknown channel coefficients. We derive a communication-rate based sufficient condition that guarantees convergence to a prescribed formation. We then provide a geometry-aware refinement showing how favorable tracking transients can relax the required condition. Simulations with unicycle agents illustrate the theoretical results and demonstrate a substantial reduction in the number of required orthogonal transmissions compared to interference-avoiding node-to-node communication protocols.


[6] 2603.10262

High-Fidelity Digital Twin Dataset Generation for Inverter-Based Microgrids Under Multi-Scenario Disturbances

Public power-system datasets often lack electromagnetic transient (EMT) waveforms, inverter control dynamics, and diverse disturbance coverage, which limits their usefulness for training surrogate models and studying cyber-physical behavior in inverter-based microgrids. This paper presents a high-fidelity digital twin dataset generated from a MATLAB/Simulink EMT model of a low-voltage AC microgrid with ten inverter-based distributed generators. The dataset records synchronized three-phase PCC voltages and currents, per-DG active power, reactive power, and frequency, together with embedded scenario labels, producing 38 aligned channels sampled at $\Delta t = 2~\mu$s over $T = 1$~s ($N = 500{,}001$ samples) per scenario. Eleven operating and disturbance scenarios are included: normal operation, load step, voltage sag (temporary three-phase fault), load ramp, frequency ramp, DG trip, tie-line trip, reactive power step, single-line-to-ground faults, measurement noise injection, and communication delay. To ensure numerical stability without altering sequence length, invalid samples (NaN, Inf, and extreme outliers) are repaired using linear interpolation. Each scenario is further validated using system-level evidence from mean frequency, PCC voltage magnitude, total active power, voltage unbalance, and zero-sequence current to confirm physical observability and correct timing. The resulting dataset provides a consistent, labeled EMT benchmark for surrogate modeling, disturbance classification, robustness testing under noise and delay, and cyber-physical resilience analysis in inverter-dominated microgrids. The dataset and processing scripts will be released upon acceptance.
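The invalid-sample repair step can be sketched as follows; this is a simplified stand-in (the helper repair and its sigma-rule outlier threshold are illustrative assumptions, not the authors' exact procedure), but it shows how NaN/Inf/outlier samples can be replaced by linear interpolation without changing the sequence length:

```python
import numpy as np

def repair(x, outlier_sigma=8.0):
    """Replace NaN/Inf and extreme outliers by linear interpolation."""
    x = np.asarray(x, dtype=float).copy()
    bad = ~np.isfinite(x)                       # NaN and +/-Inf samples
    finite = np.where(bad, np.nan, x)
    mu, sd = np.nanmean(finite), np.nanstd(finite)
    bad |= np.abs(x - mu) > outlier_sigma * sd  # crude sigma-rule outliers
    idx = np.arange(len(x))
    # Interpolate the bad samples from the surrounding valid ones;
    # the array keeps its original length.
    x[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return x

sig = np.array([1.0, 2.0, np.nan, 4.0, np.inf, 6.0, 7.0])
print(repair(sig))  # -> [1. 2. 3. 4. 5. 6. 7.]
```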


[7] 2603.10275

Optimal Control Synthesis of Closed-Loop Recommendation Systems over Social Networks

This paper addresses the problem of designing recommendation systems for social networks and e-commerce platforms from a control-theoretic perspective. We treat the design of recommendation systems as a state-feedback infinite-horizon optimal control problem with a performance index that (i) rewards alignment and engagement, (ii) penalizes polarization and large deviations from an uncontrolled baseline, and (iii) regularizes exposure across neighboring users. The recommendation entries are fed to the platform users, who are assumed to follow a networked, multi-topic, continuous-time opinion dynamics. We show that the designed control yields a stabilizing recommendation system under simple algebraic spectral conditions on the weights that encode the platform's preference for engagement, stability of preferences, polarization, and cross-user diversity. Conversely, we show that when ill-posed weights are selected in the optimal control problem (namely, when engagement is excessively rewarded), the closed-loop system can exhibit destabilizing, pathological behaviors that conflict with the design objectives.


[8] 2603.10292

Inverse Learning-Based Output Feedback Control of Nonlinear Systems with Verifiable Guarantees

In this paper, we present a data-driven output feedback controller for nonlinear systems that achieves practical output regulation using noise-free input/output measurement data. The proposed controller is based on (i) an inverse model of the system identified via kernel interpolation, which maps a desired output and the current state to the corresponding desired control input; and (ii) a data-driven reference selection framework that actively chooses a suitable desired output from the dataset used for identification. We establish a verifiable sufficient condition on the dataset under which the proposed controller guarantees practical output regulation. Numerical simulations demonstrate the effectiveness of the proposed controller, with additional evaluations in the presence of output measurement noise to assess its robustness empirically.
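A minimal sketch of the kernel-interpolation idea behind such an inverse model, on a hypothetical scalar map (the Gaussian kernel, its width, and the toy map u = arctan(2y) are assumptions for illustration, not the paper's construction): given recorded output/input pairs, the interpolant maps a desired output back to the input that should produce it.

```python
import numpy as np

# Gaussian (RBF) kernel; the shape parameter is an arbitrary choice here.
def rbf(a, b, eps=20.0):
    return np.exp(-eps * (a[:, None] - b[None, :]) ** 2)

y_data = np.linspace(-1.0, 1.0, 15)   # outputs observed in the dataset
u_data = np.arctan(2.0 * y_data)      # hypothetical inputs that produced them

# Fit interpolation coefficients (tiny ridge term for numerical stability).
K = rbf(y_data, y_data)
coef = np.linalg.solve(K + 1e-8 * np.eye(len(y_data)), u_data)

def inverse_model(y_des):
    """Map a desired output to the corresponding desired input."""
    return rbf(np.atleast_1d(np.asarray(y_des, dtype=float)), y_data) @ coef

print(inverse_model(0.3)[0], np.arctan(0.6))
```

Between data points the interpolant stays close to the underlying map, which is the property the reference-selection step then exploits by picking desired outputs well covered by the dataset.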


[9] 2603.10294

Simulation-in-the-Reasoning (SiR): A Conceptual Framework for Empirically Grounded AI in Autonomous Transportation

Large Language Models (LLMs) have advanced reasoning through techniques like Chain-of-Thought (CoT). However, their reasoning largely remains textual and hypothetical, lacking empirical grounding in complex, dynamic domains like transportation. This paper introduces Simulation-in-the-Reasoning (SiR), a novel conceptual framework that embeds domain-specific simulators directly into the LLM reasoning loop. By treating intermediate reasoning steps as executable simulation experiments, SiR transforms LLM reasoning from narrative plausibility into a falsifiable hypothesize-simulate-analyze workflow. We discuss applications in which an LLM can formulate Intelligent Transport System (ITS) strategy hypotheses, invoke a traffic simulator via the Model Context Protocol (MCP), evaluate results under different demand patterns, and refine strategies through verification and aggregation. While implementing the framework is part of our ongoing work, this paper primarily establishes the conceptual foundation, discusses design considerations such as API granularity, and outlines the vision of SiR as a cornerstone for interactive transportation digital twins. We argue that SiR represents a critical step towards trustworthy, empirically validated AI for autonomous transportation systems.


[10] 2603.10343

Multi-Modal Intelligent Channel Modeling: From Fine-tuned LLMs to Pre-trained Foundation Models

To meet the evolving demands of sixth-generation (6G) wireless channel modeling, such as precise prediction, extension, and system-participation capabilities, multi-modal intelligent channel modeling (MMICM) has been proposed based on Synesthesia of Machines (SoM), which explores the mapping relationship between multi-modal sensing in the physical environment and channel characteristics in electromagnetic space. Furthermore, to integrate heterogeneous sensing, reason across scales, and generalize to complex air-space-ground-sea communication environments, two new paradigms of MMICM are explored: fine-tuned large language models (LLMs) for channel modeling (LLM4CM) and the Wireless Channel Foundation Model (WiCo). LLM4CM leverages pre-trained LLMs on channel representations for cross-modal alignment and lightweight adaptation, enabling flexible channel modeling for 6G multi-band and multi-scenario communication systems. WiCo, which is pre-trained on physically valid channel realizations and their associated environmental and modal observations, embeds electromagnetic equations for physical interpretability and uses parameterized adapters for scalability. This article details the architectures and features of LLM4CM and WiCo, laying a foundation for artificial intelligence (AI)-native 6G wireless communication systems. We then conduct a comparative analysis of the two emerging paradigms, focusing on their distinct characteristics, relative advantages, inherent limitations, and performance attributes. Finally, we discuss future research directions.


[11] 2603.10362

UAV-Based 3D Spectrum Sensing: Insights on Altitude, Bandwidth, Trajectory, and Effective Antenna Patterns on REM Reconstruction

Spectrum sensing and the generation of 3D Radio Environment Maps (REMs) are essential for enabling spectrum sharing within cognitive radio networks. While Uncrewed Aerial Vehicles (UAVs) offer high-mobility 3D sensing, REM accuracy is challenged by dynamic flight behaviors, where fluctuations in UAV speed and direction introduce measurement inconsistencies. Furthermore, the structural influence of the airframe itself impacts the onboard antenna's radiation characteristics. In this paper, we present a comprehensive analysis of REM reconstruction at various altitudes, using real-world data from a fixed base station tower and a ground-vehicle source. We evaluate diverse reconstruction methodologies, including Kriging (simple, ordinary, and trans-Gaussian), matrix completion, and Gaussian process regression (GPR) for recovery from sparse samples. Our results indicate that simple Kriging and GPR remain more robust under extreme sample sparsity. We also propose a framework to enhance reconstruction accuracy in deep-shadowed regions by decomposing the REM into distinct smooth and deep-shadowed spatial components. We further investigate how REM reconstruction performance is influenced by physical and UAV-related external parameters. First, we demonstrate that the impact of UAV altitude on accuracy follows a tri-phasic trend: an initial performance gain up to $h_1$, a performance dip between $h_1$ and $h_2$, and a final stage of increasing accuracy. Additionally, we show that performance improves with increased spectrum bandwidth. Second, our analysis of UAV trajectories reveals that the variance of shadow fading exhibits a non-monotonic trend, peaking at both very low and mid-high elevation angles. Finally, we demonstrate that antenna pattern calibration from in-field measurements significantly enhances REM reconstruction accuracy by accounting for shadowing induced by the UAV airframe.


[12] 2603.10371

Speech Codec Probing from Semantic and Phonetic Perspectives

Speech tokenizers are essential for connecting speech to large language models (LLMs) in multimodal systems. These tokenizers are expected to preserve both semantic and acoustic information for downstream understanding and generation. However, emerging evidence suggests that what is termed "semantic" in speech representations does not align with text-derived semantics: a mismatch that can degrade multimodal LLM performance. In this paper, we systematically analyze the information encoded by several widely used speech tokenizers, disentangling their semantic and phonetic content through word-level probing tasks, layerwise representation analysis, and cross-modal alignment metrics such as CKA. Our results show that current tokenizers primarily capture phonetic rather than lexical-semantic structure, and we derive practical implications for the design of next-generation speech tokenization methods.
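Linear CKA, one of the cross-modal alignment metrics mentioned above, can be computed directly from two representation matrices (samples x features); the data below is synthetic and only illustrates the metric's behavior, not the paper's experiments:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representations."""
    X = X - X.mean(axis=0)                      # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2  # cross-covariance energy
    return hsic / (np.linalg.norm(X.T @ X, "fro") *
                   np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 16))        # e.g. a speech-tokenizer layer
B = A @ rng.normal(size=(16, 8))      # a linear transform of A (related)
C = rng.normal(size=(100, 8))         # an unrelated representation

print(linear_cka(A, B), linear_cka(A, C))
```

Higher values indicate that the two spaces share (sub)space structure, which is how phonetic versus text-semantic alignment can be compared layerwise.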


[13] 2603.10383

Optimal Movable Antenna Placement for Near-Field Wireless Sensing

Movable antennas (MAs) have emerged as a promising technology for wireless sensing by reconfiguring antenna positions to exploit additional spatial degrees of freedom (DoFs). This paper investigates a robust movable antenna placement strategy for near-field wireless sensing to minimize the worst-case squared position error bound (SPEB). By temporarily relaxing the minimum inter-element spacing constraint, we first establish the optimality of centro-symmetric antenna position distribution, which simplifies the identification of the worst-case source, locating it at the array broadside on the Rayleigh boundary. Moreover, by leveraging moment-based analysis with the Richter-Tchakaloff theorem, we derive a closed-form optimal solution with three points supported on the center and two edges of the array. Guided by this structural insight, we finally develop an efficient three-point discrete deployment strategy to ensure the minimum inter-element spacing. Simulations demonstrate that the proposed design consistently outperforms conventional fixed antenna arrays and matches the exhaustive search benchmark at negligible computational complexity.


[14] 2603.10420

FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System

We present FireRedASR2S, a state-of-the-art industrial-grade all-in-one automatic speech recognition (ASR) system. It integrates four modules in a unified pipeline: ASR, Voice Activity Detection (VAD), Spoken Language Identification (LID), and Punctuation Prediction (Punc). All modules achieve SOTA performance on the evaluated benchmarks:

FireRedASR2: An ASR module with two variants, FireRedASR2-LLM (8B+ parameters) and FireRedASR2-AED (1B+ parameters), supporting speech and singing transcription for Mandarin, Chinese dialects and accents, English, and code-switching. Compared to FireRedASR, FireRedASR2 delivers improved recognition accuracy and broader dialect and accent coverage. FireRedASR2-LLM achieves 2.89% average CER on 4 public Mandarin benchmarks and 11.55% on 19 public Chinese dialect and accent benchmarks, outperforming competitive baselines including Doubao-ASR, Qwen3-ASR, and Fun-ASR.

FireRedVAD: An ultra-lightweight module (0.6M parameters) based on the Deep Feedforward Sequential Memory Network (DFSMN), supporting streaming VAD, non-streaming VAD, and multi-label VAD (mVAD). On the FLEURS-VAD-102 benchmark, it achieves 97.57% frame-level F1 and 99.60% AUC-ROC, outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD.

FireRedLID: An Encoder-Decoder LID module supporting 100+ languages and 20+ Chinese dialects and accents. On FLEURS (82 languages), it achieves 97.18% utterance-level accuracy, outperforming Whisper and SpeechBrain.

FireRedPunc: A BERT-style punctuation prediction module for Chinese and English. On multi-domain benchmarks, it achieves 78.90% average F1, outperforming FunASR-Punc (62.77%).

To advance research in speech processing, we release model weights and code at this https URL.


[15] 2603.10421

Spyglass: Directional Spectrum Sensing with Single-shot AoA Estimation and Virtual Arrays

In this paper, we introduce Spyglass, a spectrum sensor designed to address the challenges of effective spectrum usage in dense wireless environments. Spyglass is capable of observing a frequency band and accurately estimating the Angle of Arrival (AoA) of any signal during a single transmission. This includes additional signal context such as center frequency, bandwidth, and I/Q samples. We overcome challenges such as the clutter of fleeting transmissions in common bands, the high cost of array processing for AoA estimation, and the difficulty of detecting and estimating channels for unknown signals. Our first contribution is the development of Searchlite, a protocol-agnostic signal detection and separation algorithm. We use a switched array to reduce cost and processing complexity, and we develop SSFP, a signal processing technique using Fourier transforms that is synchronized to switching boundaries. Spyglass performs multi-channel blind AoA estimation synchronized with the array. Implemented using commercially available hardware, Spyglass demonstrates a median AoA accuracy of 1.4$^\circ$ and the ability to separate simultaneous signals from multiple devices in an unconstrained RF environment, providing valuable tools for large-scale RF data collection and analysis.


[16] 2603.10443

3D Spectrum Awareness for Radio Dynamic Zones Using Kriging and Matrix Completion

Radio Dynamic Zones (RDZs) are geographically defined areas specifically allocated for testing new wireless technologies. It is essential to safeguard regular spectrum users outside these zones from interference caused by the equipment deployed within them. Previous works have utilized sparse reference signal received power (RSRP) measurements collected by unmanned aerial vehicles (UAVs) to construct a dense 3D radio map through ordinary Kriging. In this work, we illustrate that matrix completion can outperform ordinary Kriging. We partition a 2D area of interest into small square grids, where each grid corresponds to a single entry of a matrix. The matrix completion algorithm learns the global structure of the radio environment map by leveraging the low-rank property of propagation maps. Additionally, we illustrate that simple Kriging and trans-Gaussian Kriging yield better results when the density of known measurements is lower. Earlier work on RSRP prediction used a training dataset at a single altitude. In this work, we also show that performance can be improved by utilizing a combined dataset from multiple altitudes.
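The low-rank completion idea can be sketched with an iterative hard-thresholding ("hard-impute") loop on a synthetic rank-2 map; the rank, grid size, and sampling density below are invented for illustration, and this is not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic rank-2 "radio map" over a 40x40 grid of square cells.
M = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 40))

# Sparse observations (e.g. UAV RSRP measurements) at ~30% of the cells.
mask = rng.random(M.shape) < 0.3

# Alternate between a rank-2 truncated SVD and re-imposing the observed
# entries, exploiting the low-rank structure of propagation maps.
Z = np.where(mask, M, 0.0)
for _ in range(100):
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s[2:] = 0.0                    # truncate to the assumed rank
    Z = (U * s) @ Vt
    Z = np.where(mask, M, Z)       # keep the measurements fixed

rel_err = np.linalg.norm(Z - M) / np.linalg.norm(M)
print(rel_err)
```

Because the low-rank factors are global, each recovered cell borrows information from the whole map, in contrast to Kriging's locally weighted interpolation.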


[17] 2603.10468

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

We study timestamped speaker-attributed ASR for long-form, multi-party speech with overlap, where chunk-wise inference must preserve meeting-level speaker identity consistency while producing time-stamped, speaker-labeled transcripts. Previous Speech-LLM systems tend to prioritize either local diarization or global labeling, but often lack the ability to capture fine-grained temporal boundaries or robust cross-chunk identity linking. We propose G-STAR, an end-to-end system that couples a time-aware speaker-tracking module with a Speech-LLM transcription backbone. The tracker provides structured speaker cues with temporal grounding, and the LLM generates attributed text conditioned on these cues. G-STAR supports both component-wise optimization and joint end-to-end training, enabling flexible learning under heterogeneous supervision and domain shift. Experiments analyze cue fusion, local versus long-context trade-offs and hierarchical objectives.


[18] 2603.10515

A Harmony Composition-Inspired Tensor Modalization Method for Near-Field IRS Channel Estimation

Intelligent reflecting surfaces (IRSs) are poised to revolutionize next-generation wireless communication systems by enhancing channel quality and spectrum efficiency through advanced wave manipulation. However, extremely large-scale IRS (XL-IRS) deployments face significant challenges in channel estimation due to multiplicative path loss and near-field (NF) effects, where spherical wavefronts couple distance and angle parameters. Existing polar-domain codebook-based compressive sensing methods for NF channel estimation suffer from low accuracy and high complexity, caused by the need for high-resolution grids of both distance and angle parameters. To address this, we propose a harmonic processing-inspired channel estimation framework for NF XL-IRS systems by leveraging tensor modalization to decouple channel parameters. Drawing an analogy to musical harmonic analysis, our approach decomposes the high-dimensional NF channel tensor into independent factor matrices, modeled as "chords," representing distance and angle parameters. Through harmonic analysis-inspired distance parameter decoupling, we design a compact, distance-dependent codebook that enables high-resolution NF channel parameter estimation. This approach significantly reduces the codebook size compared to polar-domain methods. Then, we derive the Cramér-Rao lower bound (CRLB) to evaluate the estimators. Finally, simulation results show an 8.5 dB improvement in normalized mean square error (NMSE) compared to conventional methods, underscoring its low complexity and high accuracy.


[19] 2603.10557

Suppressing Acoustomigration and Temperature Rise for High-power Robust Acoustics

High-frequency acoustic wave transducers, vibrating at gigahertz (GHz) frequencies and favored for their compact size, not only dominate the front-end of mobile handsets but are also expanding into various interdisciplinary fields, including quantum acoustics, acousto-optics, acoustofluidics, acoustoelectrics, and sustainable power conversion systems. However, just as strong vibration can "shake off" material and generate heat, a long-standing bottleneck has been the ability to harness acoustics under high-power vibration loads while simultaneously suppressing temperature rise, especially for IDT-based surface acoustic wave (SAW) systems. Here, we propose a layered acoustic wave (LAW) platform, utilizing a quasi-infinite multifunctional top layer, that redefines mechanical and thermal boundary conditions to overcome three fundamental challenges in high-power acoustic wave vibration: self-heating, thermal instability, and acoustomigration. By leveraging a simplified, thick single-material overlayer to achieve electro-thermo-mechanical co-design, this acoustic platform moves beyond prior substrate-focused thermal management in SAW technology. It demonstrates, for the first time from the top boundary, simultaneous redistribution of the von Mises stress field and the creation of an efficient vertical thermal dissipation path. The LAW transducer, vibrating at over 2 GHz, achieves a 70% reduction in temperature rise under identical power loads, a first-order temperature coefficient of frequency (TCF) of -13 ppm/°C with minimal dispersion, and an unprecedented threshold power density of 45.61 dBm/mm² - over one order of magnitude higher than that of state-of-the-art thin-film surface acoustic wave (TF-SAW) counterparts at the same wavelength.


[20] 2603.10585

Path Planning for Sound Speed Profile Estimation

Accurate estimation of the sound speed profile (SSP) is essential for underwater acoustic communication, sonar performance, and navigation, as the acoustic wave propagation depends strongly on the SSP. This work considers SSP estimation in a region of interest using an autonomous underwater vehicle (AUV) equipped with a conductivity-temperature-depth (CTD) sensor and an acoustic receiver measuring transmission loss (TL) from a sonar transmitter. The SSP is modeled using a linear basis-function expansion and is sequentially estimated with an unscented Kalman filter that fuses local CTD measurements with TL measurements. A receding-horizon path planning scheme is also employed to select future AUV positions by minimizing the predicted total sound speed variance. Simulations using the Bellhop acoustic wave propagation solver show that CTD measurements provide accurate local SSP estimates, whereas TL measurements are seen to capture the global characteristics of the SSP, with their joint use improving the reconstruction of both local variations and large-scale SSP behavior. The results also indicate that the proposed path planning strategy reduces the estimation uncertainty compared to constant-velocity motion, thereby enabling improved environmental characterization for underwater acoustic systems.
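A minimal sketch of the sequential basis-coefficient estimation (the quadratic basis, noise levels, and "true" coefficients below are invented for illustration; note that with a measurement model that is linear in the coefficients, as direct CTD sound-speed samples are, the unscented filter reduces to an ordinary Kalman update):

```python
import numpy as np

# SSP modeled as a linear basis-function expansion: c(z) = sum_k w_k phi_k(z).
def basis(z):
    return np.array([1.0, z / 100.0, (z / 100.0) ** 2])  # hypothetical basis

w_true = np.array([1500.0, -20.0, 15.0])  # hypothetical "true" coefficients

w = np.zeros(3)          # coefficient estimate
P = np.eye(3) * 1e6      # diffuse prior covariance
R = 0.25                 # CTD measurement-noise variance

rng = np.random.default_rng(1)
for z in rng.uniform(0.0, 100.0, size=50):   # depths visited by the AUV
    H = basis(z)[None, :]                    # 1x3 linear measurement row
    y = H @ w_true + rng.normal(0.0, 0.5)    # noisy local sound-speed sample
    S = H @ P @ H.T + R                      # innovation variance
    K = P @ H.T / S                          # Kalman gain
    w = w + (K * (y - H @ w)).ravel()
    P = P - K @ H @ P

print(w.round(2))
```

The trace (or predicted variance) of P is exactly the kind of uncertainty measure a receding-horizon planner can minimize when choosing where the AUV samples next.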


[21] 2603.10623

Geo-ATBench: A Benchmark for Geospatial Audio Tagging with Geospatial Semantic Context

Environmental sound understanding in computational auditory scene analysis (CASA) is often formulated as an audio-only recognition problem. This formulation leaves a persistent drawback in multi-label audio tagging (AT): acoustic similarity can make certain events difficult to separate from waveforms alone. In such cases, disambiguating cues often lie outside the waveform. Geospatial semantic context (GSC), derived from geographic information system data, e.g., points of interest (POI), provides location-tied environmental priors that can help reduce this ambiguity. A systematic study of this direction is enabled through the proposed geospatial audio tagging (Geo-AT) task, which conditions multi-label sound event tagging on GSC alongside audio. To benchmark Geo-AT, Geo-ATBench is introduced as a polyphonic audio benchmark with geographical annotations, containing 10.71 hours of audio across 28 event categories; each clip is paired with a GSC representation from 11 semantic context categories. GeoFusion-AT is proposed as a unified geo-audio fusion framework that evaluates feature-, representation-, and decision-level fusion on representative audio backbones, with audio- and GSC-only baselines. Results show that incorporating GSC improves AT performance, especially on acoustically confounded labels, indicating geospatial semantics provide effective priors beyond audio alone. A crowdsourced listening study with 10 participants on 579 samples shows that there is no significant difference in performance between models on Geo-ATBench labels and aggregated human labels, supporting Geo-ATBench as a human-aligned benchmark. The Geo-AT task, the Geo-ATBench benchmark, and the reproducible geo-audio fusion framework GeoFusion-AT provide a foundation for studying AT with geospatial semantic context within the CASA community. Dataset, code, and models are available on the homepage (this https URL).
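Decision-level fusion, the simplest of the three fusion levels evaluated, can be sketched as a weighted combination of per-label scores from the two branches (the labels, logits, and weight alpha below are invented for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical per-label logits, e.g. for [traffic, birdsong, siren].
audio_logits = np.array([2.0, -1.0, 0.2])   # from the audio backbone
gsc_logits = np.array([1.5, -2.0, 1.0])     # prior from nearby POI categories

# Late fusion: blend the two branches' multi-label scores.
alpha = 0.7                                 # weight on the audio branch
fused = alpha * sigmoid(audio_logits) + (1 - alpha) * sigmoid(gsc_logits)
tags = fused > 0.5
print(fused.round(3), tags)
```

Here the location prior pushes the acoustically ambiguous third label over the decision threshold, which is the kind of disambiguation GSC is meant to provide.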


[22] 2603.10629

Flexible Multi-Target Angular Emulation for Over-the-Air Testing of Large-Scale ISAC Base Stations: Principle and Experimental Verification

Over-the-air (OTA) emulation of diverse sensing target characteristics in a controlled laboratory environment is pivotal for advancing integrated sensing and communication (ISAC) technology, as it facilitates the non-invasive performance evaluation of ISAC base stations (BSs) across complex scenarios. In this work, a flexible multi-target OTA emulation framework based on a wireless cable method is proposed to evaluate the sensing performance of large-scale ISAC BSs. The core concept leverages an amplitude and phase modulation (APM) network to simultaneously establish wireless cables and simulate target spatial characteristics without consuming additional resources on costly radar target emulators. For the wireless cable method, the condition number increases as the number of antennas scales up, which degrades the performance of the wireless cables. Although the wireless cable concept has been established for devices-under-test (DUTs) with a limited number of antenna ports, establishing wireless cables for large-scale DUTs remains an open question in the community. We address this problem by optimizing the OTA probe array configuration based on the theoretical properties of strictly diagonally dominant matrices. Experimental results validate the proposed framework, demonstrating high-isolation wireless cables for a 32-element DUT and an extremely low condition number for a 128-element synthetic array. Furthermore, the OTA emulation of a dynamic dual-drone scenario confirms the method's effectiveness and practicality in reproducing complex sensing environments.


[23] 2603.10635

Propagation and Rate-Aware Cell Switching Optimization in HAPS-Assisted Wireless Networks

Cell switching is a promising approach for improving energy efficiency in wireless networks; however, existing studies largely rely on simplified models and energy-centric formulations that overlook key performance-limiting factors. This paper revisits the cell switching concept by redefining its modeling assumptions and mathematical formulation, explicitly incorporating realistic propagation effects such as building entry loss (BEL) and atmospheric losses relevant to non-terrestrial networks (NTN), particularly high-altitude platform station (HAPS). Beyond proposing a new cell switching strategy, the conventional energy-focused problem is reformulated as a multi-objective optimization framework that jointly minimizes power consumption, unconnected users, and data rate degradation. Through this reformulation, the proposed methods ensure that energy-efficient operation is achieved without compromising user connectivity and data rate performance, thereby inherently supporting sustainability objectives for sixth-generation (6G) networks. To solve this reformulated problem, two complementary approaches are employed: the weighted sum method (WSM), which enables a flexible and adaptive weighting mechanism, and the $\epsilon$-constraint-inspired method ($\epsilon$CM), which converts connectivity and rate-related objectives into constraints within the conventional energy-focused problem. Moreover, unlike prior work relying only on simulations, this study combines system-level simulations with Sionna-OpenAirInterface (OAI) based emulation on a smaller network to validate the proposed cell switching concept under realistic conditions. The results show that, compared to the conventional approach, WSM reduces rate degradation by up to 70% for high-loss indoor users and eliminates the 44% drop for low-loss indoor users.
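
The WSM scalarization can be sketched in a few lines; the weights and the normalized per-pattern costs below are illustrative assumptions, not values from the paper. ($\epsilon$CM would instead keep power as the sole objective and move the connectivity and rate terms into hard constraints.)

```python
def weighted_sum_cost(power, unconnected, rate_loss, w=(0.5, 0.3, 0.2)):
    """WSM: scalarize the three normalized objectives with fixed weights
    (weights are hypothetical, chosen only for illustration)."""
    return w[0] * power + w[1] * unconnected + w[2] * rate_loss

# Hypothetical normalized costs for three candidate on/off patterns.
candidates = {
    "all_on":  (1.0, 0.00, 0.00),   # max power, perfect service
    "half_on": (0.6, 0.05, 0.10),   # balanced operation
    "all_off": (0.1, 0.60, 0.80),   # energy-minimal, poor service
}
# Select the switching pattern minimizing the scalarized cost.
best = min(candidates, key=lambda k: weighted_sum_cost(*candidates[k]))
```

With these toy numbers the balanced pattern wins, reflecting the paper's point that pure energy minimization ("all_off") sacrifices connectivity and rate.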


[24] 2603.10656

Distributed State Estimation of Discrete-Time LTI Systems via Jordan Canonical Representation

In this paper, we address the problem of distributed state estimation for a discrete-time, linear time-invariant system. Building on the framework proposed in [2], we exploit the Jordan canonical form of the system matrix to develop a distributed estimation scheme that ensures the asymptotic convergence of the local state estimates to the true system state. The proposed approach relies on the idea that each node reconstructs the components of the system state that are detectable for it through a local Luenberger observer, while employing a consensus-based strategy to estimate the undetectable components. Necessary and sufficient conditions for the existence of a distributed observer that guarantees asymptotic estimation accuracy are derived. Compared with the previous work [2], the proposed design offers greater flexibility in the selection of the coupling gains and leads to a less restrictive set of conditions for solvability.


[25] 2603.10723

MOS-Bias: From Hidden Gender Bias to Gender-Aware Speech Quality Assessment

The Mean Opinion Score (MOS) serves as the standard metric for speech quality assessment, yet biases in human annotations remain underexplored. We conduct the first systematic analysis of gender bias in MOS, revealing that male listeners consistently assign higher scores than female listeners--a gap that is most pronounced in low-quality speech and gradually diminishes as quality improves. This quality-dependent structure proves difficult to eliminate through simple calibration. We further demonstrate that automated MOS models trained on aggregated labels exhibit predictions skewed toward male standards of perception. To address this, we propose a gender-aware model that learns gender-specific scoring patterns by abstracting binary group embeddings, thereby improving overall and gender-specific prediction accuracy. This study establishes that gender bias in MOS constitutes a systematic, learnable pattern demanding attention in equitable speech evaluation.


[26] 2603.10743

Scaling and Trade-offs in Multi-agent Autonomous Systems

Designing autonomous drone swarms is hampered by a vast design space spanning platform, algorithmic, and numerical-strength choices. We perform large-scale agent-based simulations in three canonical scenarios: swarm-on-swarm battle, cooperative area search with attrition, and pursuit of scattering targets. We demonstrate that dimensional-analysis and data-scaling, established techniques in physical sciences, can be leveraged to collapse performance data onto scaling functions that are mathematically simple, yet counterintuitive and therefore difficult to predict a priori. These scaling laws reveal success-failure boundaries, including sharp break points. Additionally, we show how this technique can be used to quantify trade-offs between agent count and platform parameters such as velocity, sensing or weapon range, and attrition rate. Furthermore, we show the benefits of embedding an optimal path planning loop within this framework, which can qualitatively improve the scaling laws that govern the outcome. The methods we demonstrate are highly flexible and would enable rapid, budget-aware sizing and algorithm selection for large autonomous swarms.


[27] 2603.10779

A Control-Theoretic Foundation for Agentic Systems

This paper develops a control-theoretic framework for analyzing agentic systems embedded within feedback control loops. In such systems, an AI agent may adapt controller parameters, select among control strategies, invoke tools, reconfigure decision architectures, or modify control objectives during operation. We formalize these capabilities by interpreting agency as hierarchical decision authority over the control architecture. A unified dynamical representation is introduced that incorporates memory, learning, tool activation, interaction signals, and goal descriptors within a single closed-loop structure. Based on this representation, we define a five-level hierarchy of agency ranging from reactive rule-based control to the synthesis of control objectives and controller architectures. The framework is presented in both nonlinear and linear settings, allowing agentic behaviors to be interpreted using standard control-theoretic constructs such as feedback gains, switching signals, parameter adaptation laws, and quadratic cost functions. The analysis shows that increasing agency introduces dynamical mechanisms including time-varying adaptation, endogenous switching, decision-induced delays, and structural reconfiguration of the control pipeline. This perspective provides a mathematical foundation for analyzing stability, safety, and performance of AI-enabled control systems.


[28] 2603.10791

Semantic Satellite Communications for Synchronized Audiovisual Reconstruction

Satellite communications face severe bottlenecks in supporting high-fidelity synchronized audiovisual services, as conventional schemes struggle with cross-modal coherence under fluctuating channel conditions, limited bandwidth, and long propagation delays. To address these limitations, this paper proposes an adaptive multimodal semantic transmission system tailored for satellite scenarios, aiming for high-quality synchronized audiovisual reconstruction under bandwidth constraints. Unlike static schemes with fixed modal priorities, our framework features a dual-stream generative architecture that flexibly switches between video-driven audio generation and audio-driven video generation. This allows the system to dynamically decouple semantics, transmitting only the most important modality while employing cross-modal generation to recover the other. To balance reconstruction quality and transmission overhead, a dynamic keyframe update mechanism adaptively maintains the shared knowledge base according to wireless scenarios and user requirements. Furthermore, a large language model based decision module is introduced to enhance system adaptability. By integrating satellite-specific knowledge, this module jointly considers task requirements and channel factors such as weather-induced fading to proactively adjust transmission paths and generation workflows. Simulation results demonstrate that the proposed system significantly reduces bandwidth consumption while achieving high-fidelity audiovisual synchronization, improving transmission efficiency and robustness in challenging satellite scenarios.


[29] 2603.10836

Distributed Safety Critical Control among Uncontrollable Agents using Reconstructed Control Barrier Functions

This paper investigates distributed safety-critical control for multi-agent systems (MASs) in the presence of uncontrollable agents with uncertain behaviors. To ensure system safety, the control barrier function (CBF) is employed in this paper. However, a key challenge is that the CBF constraints are coupled when MASs perform collaborative tasks, which depend on information from multiple agents and impede the design of a fully distributed safe control scheme. To overcome this, a novel reconstructed CBF approach is proposed. In this method, the coupled CBF is reconstructed by leveraging state estimates of other agents obtained from a distributed adaptive observer. Furthermore, a prescribed performance adaptive parameter is designed to modify this reconstruction, ensuring that satisfying the reconstructed CBF constraint is sufficient to meet the original coupled one. Based on the reconstructed CBF, we design a safety-critical quadratic programming (QP) controller and prove that the proposed distributed control scheme rigorously guarantees the safety of the MAS, even in uncertain dynamic environments involving uncontrollable agents. The effectiveness of the proposed method is illustrated through simulations.


[30] 2603.10845

Human Presence Detection via Wi-Fi Range-Filtered Doppler Spectrum on Commodity Laptops

Human Presence Detection (HPD) is key to enable intelligent power management and security features in everyday devices. In this paper we propose the first HPD solution that leverages monostatic Wi-Fi sensing and detects user position using only the built-in Wi-Fi hardware of a device, with no need for external devices, access points, or additional sensors. In contrast, existing HPD solutions for laptops require external dedicated sensors which add cost and complexity, or rely on camera-based approaches that introduce significant privacy concerns. We herewith introduce the Range-Filtered Doppler Spectrum (RF-DS), a novel Wi-Fi sensing technique for presence estimation that enables both range-selective and temporally windowed detection of user presence. By applying targeted range-area filtering in the Channel Impulse Response (CIR) domain before Doppler analysis, our method focuses processing on task-relevant spatial zones, significantly reducing computational complexity. In addition, the use of temporal windows in the spectrum domain provides greater estimator stability compared to conventional 2D Range-Doppler detectors. Furthermore, we propose an adaptive multi-rate processing framework that dynamically adjusts Channel State Information (CSI) sampling rates, operating at low frame rates (10 Hz) during idle periods and at high rates (100 Hz) only when motion is detected. To our knowledge, this is the first low-complexity solution for occupancy detection using monostatic Wi-Fi sensing on a built-in Wi-Fi network interface controller (NIC) of a commercial off-the-shelf laptop that requires no external network infrastructure or specialized sensors. Our solution can scale across different environments and devices without calibration or retraining.
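
The adaptive multi-rate idea can be sketched as a small state machine that switches the CSI sampling rate on motion and holds the high rate briefly to avoid rapid toggling. The hold-off duration below is an assumed smoothing parameter, not a detail from the paper:

```python
class AdaptiveRate:
    """Toy multi-rate controller: sample CSI at a low rate while idle,
    jump to a high rate on motion, and hold the high rate for `hold`
    frames before falling back (hold value is an assumption)."""

    def __init__(self, idle_hz=10, active_hz=100, hold=5):
        self.idle_hz, self.active_hz, self.hold = idle_hz, active_hz, hold
        self._countdown = 0

    def step(self, motion: bool) -> int:
        # Refresh the hold timer on motion; otherwise let it decay.
        if motion:
            self._countdown = self.hold
        elif self._countdown > 0:
            self._countdown -= 1
        return self.active_hz if self._countdown > 0 else self.idle_hz

ctrl = AdaptiveRate()
# One idle frame, one motion frame, then five idle frames.
rates = [ctrl.step(m) for m in [False, True, False, False, False, False, False]]
```

After the single motion event the controller stays at 100 Hz for the hold window and then drops back to the power-saving 10 Hz rate.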


[31] 2603.10880

The potential and viability of V2G for California BEV drivers

Vehicle-to-Grid (V2G) adoption is hindered by uncertainties regarding its effects on battery lifetime and vehicle usability. These uncertainties are compounded by limited insight into real-world vehicle usage. Here, we leverage real-world Californian BEV usage data to design and evaluate a user-centric V2G strategy. We identified four clustered driver profiles for V2G assessment, ranging from "Daily Chargers" to "Public Chargers". We show that V2G participation is most feasible for "Daily Chargers," and that the effects on battery lifetime depend on calendar aging sensitivity. For batteries with low sensitivity, V2G participation increases capacity loss for all drivers. However, for batteries with high sensitivity, V2G participation can lead to negligible changes in capacity or even improved capacity retention, particularly for drivers who tend to keep their batteries at high states of charge. Our findings enable stakeholders to better assess the potential and viability of V2G adoption.


[32] 2603.10901

Phase Selection and Analysis for Multi-frequency Multi-user RIS Systems Employing Subsurfaces

In this paper, we analyse the performance of a reconfigurable intelligent surface (RIS) aided system where the RIS is divided into subsurfaces. Each subsurface is designed specifically for one user, who is served on their own frequency band. The other subsurfaces (those not designed for this user) provide additional uncontrolled scattering. A new subsurface RIS design is developed based on the optimal single-user design for a pure line-of-sight (LoS) base station (BS) to RIS channel. This is also extended to arbitrary BS-RIS channels. For our method, exact closed-form solutions for the mean SNR and a mean rate upper bound are derived for the BS-RIS LoS scenario. For each user, the designed subsurface performs optimally in LoS conditions and is remarkably robust to non-LoS conditions. The system design drives down complexity to extremely low levels, reducing RIS design and receiver processing complexity and reducing the channel estimation requirements. We also quantify the complexity-performance trade-off for the new design relative to multi-user approaches.


[33] 2603.10906

Towards Polynomial Immersion of Port-Hamiltonian Systems

Port-Hamiltonian (pH) systems offer a highly structured and energy-based modular framework for control systems. Many pH systems exhibit non-polynomial non-linearities. We consider the problem of immersing such systems into a higher-dimensional polynomial representation. We prove that, along system trajectories, important features of the non-polynomial pH system are preserved, such as the internal interconnection geometry, the energy balance relation with passivity supply rate, as well as energy dissipation. We illustrate how the lifted system enables the design of stabilizing feedback laws by combining sum-of-squares optimization with concepts from passivity-based control. We draw upon several examples to illustrate our findings.


[34] 2603.10909

Level Crossing Rate Analysis for Optimal Single-user RIS Systems

We analyse the level crossing rate (LCR) of an uplink single-user (SU) reconfigurable intelligent surface (RIS) aided system. It is assumed that the RIS to base station (RIS-BS) channel is deployed as line-of-sight (LoS), and the user (UE)-RIS and UE-BS channels are correlated Rayleigh. For the optimal RIS reflection matrix, we derive a novel and exact analytical LCR expression for when the direct (UE-BS) channel is blocked, i.e. the RIS-only channel. Also, the existing exact expression for the direct-only channel (equivalent to classical maximal-ratio-combining (MRC)) suffers from extreme numerical precision problems when the BS has many elements. Therefore, we propose a new stable and accurate approximation to the LCR of the direct channel. The approximation is based on replacing any small similar eigenvalues of the channel correlation matrix by their average. We show that increasing the number of elements at the RIS or BS and decreasing channel correlation makes the LCR drop more rapidly for thresholds away from the mean SNR. Crucially, we find that RIS systems do not significantly amplify temporal variations in the channel. This is particularly beneficial for RIS systems considering the difficulty in acquiring channel state information (CSI).
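
The stabilizing approximation, replacing nearly equal eigenvalues of the channel correlation matrix by their average, can be sketched as follows. The clustering tolerance is an assumed knob, not a value from the paper:

```python
import numpy as np

def average_close_eigenvalues(eigs, rel_tol=1e-6):
    """Replace each run of nearly equal eigenvalues by its common mean,
    avoiding numerical blow-ups caused by tiny eigenvalue gaps.
    rel_tol is a hypothetical clustering tolerance."""
    eigs = np.sort(np.asarray(eigs, dtype=float))
    clusters, current = [], [eigs[0]]
    for lam in eigs[1:]:
        if abs(lam - current[-1]) <= rel_tol * max(abs(current[-1]), 1.0):
            current.append(lam)          # extend the current cluster
        else:
            clusters.append(current)     # close it and start a new one
            current = [lam]
    clusters.append(current)
    out = []
    for c in clusters:
        out.extend([float(np.mean(c))] * len(c))
    return np.array(out)

# Two eigenvalues differing only at the 7th decimal get merged.
eigs = [2.0, 1.0 + 3e-7, 1.0, 0.5]
stabilized = average_close_eigenvalues(eigs)
```

Merging near-degenerate eigenvalues leaves the trace (essentially) unchanged while removing the ill-conditioned small differences that plague the exact direct-channel LCR expression at large array sizes.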


[35] 2603.10947

Regularizing INR with diffusion prior for self-supervised 3D reconstruction of neutron computed tomography data

Recently, generative diffusion priors have made huge strides as inverse problem solvers, including the ability to be adapted for inference on out-of-distribution data. Concurrently, implicit neural representations (INRs) have emerged as fast and lightweight inverse imaging solvers that are amenable to hybrid approaches that combine learned priors with traditional inverse problem formulations. In this paper, we present a diffusive computed tomography (CT) inversion framework for regularizing INRs called Diffusive INR (DINR), designed to enable high-quality reconstruction from sparse-view neutron CT. Pretrained purely on synthetic data, DINR is evaluated on simulated and experimentally obtained observations of concrete microstructures, where traditional reconstruction methods suffer substantial degradation when the number of views is reduced. Compared to state-of-the-art sparse-view reconstruction techniques, our approach delivers superior performance, reduces reconstruction artifacts, and achieves gains in PSNR and SSIM, enabling accurate microstructural characterization even under extreme data limitations.


[36] 2603.10958

Distortion Is Not Noise: On the Limits of the Kappa Model for Monostatic ISAC

Monostatic ISAC sensing differs from communication because the transmitter can monitor its distorted transmit waveform. Thus, the aggregate $\kappa$ distortion model, which treats impairments as unknown noise, is appropriate for communication but pessimistic for monostatic sensing. We derive power amplifier (PA)-aware sensing Cramér--Rao bounds (CRBs) and a phase noise (PN)-aware CRB that reveals an irreducible velocity-error floor, and quantify when $\kappa$-based bounds overestimate sensing degradation. Simulations validate the analysis and show robustness to practical digital predistortion (DPD) template errors (less than 1~dB overhead at a typical $-25$~dB NMSE).


[37] 2603.11030

Exploiting Spatial Modulation for Strong Phase Noise Mitigation in mmWave Massive MIMO

This letter investigates phase noise (PN) mitigation in generalized receiver spatial modulation (GRSM) massive MIMO systems at mmWave under a common local oscillator (CLO). Under CLO, the received energy remains invariant relative to the no-PN scenario, enabling reliable energy-based spatial detection using the no-PN threshold. PN-sensitivity and geometry-based metrics are introduced to design compact, PN-resilient MQAM symbol pools with low detection complexity. PN robustness is further improved through an enhanced PN-aware GRSM-MQAM system that exploits spatial modulation (SM) to recover part of the MQAM bits and strategically maps spatial-pattern Hamming weights to reduce the effective PN impact. In addition, a practical single-stage PN estimation/compensation architecture is proposed, while a benchmark double-stage compensation is adopted to quantify the upper bound achievable via separate Tx/Rx PN mitigation. Results show that under PN, the overall BER is mainly dominated by MQAM symbol detection errors, especially for denser constellations, whereas spatial detection remains robust. The proposed single-stage compensation improves PN resilience, while the benchmark double-stage compensation approaches near PN-free performance.


[38] 2603.10065

The Epistemic Support-Point Filter: Jaynesian Maximum Entropy Meets Popperian Falsification

The Epistemic Support-Point Filter (ESPF) was designed around a single epistemological commitment: be quick to embrace ignorance and slow to assert certainty. This paper proves that this commitment has a precise mathematical form and that the ESPF is the unique optimal filter implementing it within the class of epistemically admissible evidence-only filters. The ESPF synthesizes two complementary principles acting at different phases of the recursion. In propagation, it enacts Jaynesian maximum entropy: the support spreads as widely as the dynamics allow, assuming maximal ignorance consistent with known constraints. In the measurement update, it enacts Popperian falsification: hypotheses are eliminated by evidence alone. Any rule incorporating prior possibility is strictly suboptimal and risks race-to-bottom bias. The optimality criterion is possibilistic minimax entropy: among all evidence-only selection rules, minimum-q selection minimizes log det(MVEE), the worst-case possibilistic entropy. Three lemmas establish the result: the Possibilistic Entropy Lemma identifies the ignorance functional; the Possibilistic Cramér-Rao Lemma bounds entropy reduction per measurement; the Evidence-Optimality Lemma proves minimum-q selection is the unique minimizer. The ESPF differs from Bayesian filters by minimizing worst-case epistemic ignorance rather than expected uncertainty. The Kalman filter is recovered in the Gaussian limit. Numerical validation over a 2-day 877-step Smolyak Level-3 orbital tracking run confirms the regime structure under both nominal and stress conditions.


[39] 2603.10149

A neural operator for predicting vibration frequency response curves from limited data

In the design of engineered components, rigorous vibration testing is essential for performance validation and identification of resonant frequencies and amplitudes encountered during operation. Performing this evaluation numerically via machine learning has great potential to accelerate design iteration and make testing workflows more efficient. However, dynamical systems are conventionally difficult to solve via machine learning methods without physics-based regularizing loss functions. To perform this forecasting task properly, a structure whose adherence to the underlying physics can be inspected is devised, without the use of regularizing terms derived from first principles. The method employed in this work is a neural operator integrated with an implicit numerical scheme. This architecture enables operators to learn the underlying state-space dynamics from limited data, allowing generalization to untested driving frequencies and initial conditions. This network can infer the system's global frequency response by training on a small set of input conditions. As a foundational proof of concept, this investigation verifies the machine learning algorithm with a linear, single-degree-of-freedom system, demonstrating implicit adherence to the dynamics. This approach demonstrates 99.87% accuracy in predicting the Frequency Response Curve (FRC), forecasting the frequency and amplitude of the linear resonance while training on only 7% of the bandwidth of the solution. By training machine learning models to internalize physics information rather than trajectories, better generalization accuracy can be realized, vastly improving the timeframe for vibration studies on engineered components.


[40] 2603.10239

Learning from Radio using Variational Quantum RF Sensing

In modern wireless networks, radio channels serve a dual role. Whilst their primary function is to carry bits of information from a transmitter to a receiver, the intrinsic sensitivity of transmitted signals to the physical structure of the environment makes the channel a powerful source of knowledge about the world. In this paper, we consider an agent that learns about its environment using a quantum sensing probe, optimised using a quantum circuit, which interacts with the radio-frequency (RF) electromagnetic field. We use data obtained from a ray-tracer to train the quantum circuit and learning model and we provide extensive experiments under realistic conditions on a localisation task. We show that using quantum sensors to learn from radio signals can enable intelligent systems that require no channel measurements at deployment, remain sensitive to weak and obstructed RF signals, and can learn about the world despite operating with strictly less information than classical baselines.


[41] 2603.10240

nlm: Real-Time Non-linear Modal Synthesis in Max

We present \texttt{nlm}, a set of Max externals that enable efficient real-time non-linear modal synthesis for strings, membranes, and plates. The externals, implemented in C++, offer interactive control of physical parameters, allow the loading of custom modal data, and provide multichannel output. By integrating interactive physical-modelling capabilities into a familiar environment, \texttt{nlm} lowers the barrier for composers, performers, and sound designers to explore the expressive potential of non-linear modal synthesis. The externals are available as open-source software at this https URL.


[42] 2603.10280

Avoiding Semi-Infinite Programming in Distributionally Robust Control Based on Mean-Variance Metrics

Conventional stochastic control methods have several limitations. They focus on optimizing the average performance and, in some cases, performance variability; however, their problem settings still require an explicit specification of the probability distributions that determine the system's stochastic behavior. Distributionally robust control (DRC) methods have recently been developed to address these challenges. However, many DRC approaches involve handling infinitely many inequalities. For instance, DRC problems based on the Wasserstein distance are commonly obtained by solving semi-infinite programming (SIP) problems. Our proposed method eliminates the need for SIP when solving discrete-time, discounted, distributionally robust optimal control problems. By introducing a penalty term based on a specific distributional distance, we establish upper bounds, and under appropriate conditions, demonstrate the equivalence between distributionally robust optimization problems and mean-variance minimization problems. This reformulation reduces the original DRC problem to a discounted mean-variance cost optimization problem. In linear-quadratic regulator settings, the corresponding control laws are obtained by solving the Riccati equation. Numerical experiments demonstrate that the theoretical maximum value of the discounted cumulative cost for the proposed method is lower than that for the conventional method.
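
In the linear-quadratic regulator setting the abstract mentions, the discounted gain can be obtained by iterating the Riccati recursion. A minimal sketch with an illustrative scalar plant; the discount factor, system matrices, and weights are assumptions chosen for demonstration, not values from the paper:

```python
import numpy as np

def discounted_dlqr(A, B, Q, R, gamma=0.95, iters=1000):
    """Value-iteration solve of the discounted discrete-time Riccati
    equation; the returned gain K yields the feedback u = -K x."""
    P = np.array(Q, dtype=float)
    for _ in range(iters):
        S = R + gamma * B.T @ P @ B
        K = np.linalg.solve(S, gamma * B.T @ P @ A)
        P = Q + gamma * A.T @ P @ (A - B @ K)
    return K, P

# Hypothetical scalar unstable plant x_{k+1} = 1.1 x_k + u_k.
A, B = np.array([[1.1]]), np.array([[1.0]])
Q, R = np.array([[1.0]]), np.array([[1.0]])
K, P = discounted_dlqr(A, B, Q, R)
closed_loop = A - B @ K  # closed-loop dynamics matrix
```

The iteration converges for discount factors below one, and the resulting feedback stabilizes the unstable open-loop pole, illustrating the "solve a Riccati equation instead of a semi-infinite program" route the paper takes.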


[43] 2603.10325

Geo-ADAPT-VQE: Quantum Information Metric-Aware Circuit Optimization for Quantum Chemistry

Adaptive ansatz construction has emerged as a powerful technique for reducing circuit depth and improving optimization efficiency in variational quantum eigensolvers. However, existing adaptive methods, including ADAPT-VQE, rely solely on first-order gradients and therefore ignore the underlying geometry of the quantum state space, limiting both convergence behavior and operator-selection efficiency. We introduce Geo-ADAPT-VQE, a geometry-aware adaptive VQE algorithm that selects operators from a pool using the natural gradient rule. The geometric operator-selection rule enables the ansatz to grow along directions aligned with the underlying quantum-state geometry, thereby improving convergence and reducing the algorithm's susceptibility to shallow local minima and saddle-point regions. We further provide an asymptotic convergence result. We present numerical simulations involving five molecules, which demonstrate that Geo-ADAPT-VQE achieves faster and more stable convergence compared to existing methods, while producing significantly shorter ansätze. In particular, Geo-ADAPT achieves up to 100-fold reduction in energy error compared to existing methods.


[44] 2603.10340

Overcoming Visual Clutter in Vision Language Action Models via Concept-Gated Visual Distillation

Vision-Language-Action (VLA) models demonstrate impressive zero-shot generalization but frequently suffer from a "Precision-Reasoning Gap" in cluttered environments. This failure is driven by background-induced feature dilution, where high-frequency semantic noise corrupts the geometric grounding required for precise manipulation. To bridge this gap, we propose Concept-Gated Visual Distillation (CGVD), a training-free, model-agnostic inference framework that stabilizes VLA policies. CGVD operates by parsing instructions into safe and distractor sets, utilizing a two-layer target refinement process--combining cross-validation and spatial disambiguation--to explicitly penalize false positives and isolate genuine manipulation targets. We then process the scene via Fourier-based inpainting, generating a clean observation that actively suppresses semantic distractors while preserving critical spatial geometry and visual proprioception. Extensive evaluations in highly cluttered manipulation tasks demonstrate that CGVD prevents performance collapse. In environments with dense semantic distractors, our method significantly outperforms state-of-the-art baselines, achieving a 77.5% success rate compared to the baseline's 43.0%. By enforcing strict attribute adherence, CGVD establishes inference-time visual distillation as a critical prerequisite for robust robotic manipulation in clutter.


[45] 2603.10426

3-D Trajectory Optimization for Robust Direction Sensing in Movable Antenna Systems

This paper presents a novel wireless sensing system where a movable antenna (MA) continuously moves and receives sensing signals within a three-dimensional (3-D) region to enhance sensing performance compared with conventional fixed-position antenna (FPA)-based sensing. We show that the performance of direction vector estimation for a target is fundamentally related to the 3-D MA trajectory in terms of the mean square angular error lower-bound (MSAEB), which is adopted as a coordinate-invariant performance metric. In particular, the closed-form expression of the MSAEB is derived as a function of the trajectory covariance matrix. Theoretical analysis shows that two-dimensional (2-D) antenna movement suffers from performance divergence for target direction close to the endfire direction of the 2-D MA plane, whereas 3-D movement can achieve isotropic sensing performance over the entire angular region. To achieve robust sensing performance, we formulate a min-max optimization problem to minimize the maximum (worst-case) MSAEB over a given continuous angular region wherein the target is located. An efficient successive convex approximation (SCA) algorithm is developed to optimize the 3-D MA trajectory and obtain a locally optimal solution. Numerical results demonstrate that the proposed 3-D MA sensing scheme is able to significantly reduce the worst-case mean square angular error (MSAE) compared with conventional arrays with FPAs and MA systems with 2-D movement only, thus achieving more accurate and robust direction estimation over the entire angular region.


[46] 2603.10527

World Model for Battery Degradation Prediction Under Non-Stationary Aging

Degradation prognosis for lithium-ion cells requires forecasting the state-of-health (SOH) trajectory over future cycles. Existing data-driven approaches can produce trajectory outputs through direct regression, but lack a mechanism to propagate degradation dynamics forward in time. This paper formulates battery degradation prognosis as a world model problem, encoding raw voltage, current, and temperature time-series from each cycle into a latent state and propagating it forward via a learned dynamics transition to produce a future trajectory spanning 80 cycles. To investigate whether electrochemical knowledge improves the learned dynamics, a Single Particle Model (SPM) constraint is incorporated into the training loss. Three configurations are evaluated on the Severson LiFePO4 (LFP) dataset of 138 cells. Iterative rollout halves the trajectory forecast error compared to direct regression from the same encoder. The SPM constraint improves prediction at the degradation knee where the resistance to SOH relationship is most applicable, without changing aggregate accuracy.
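
The iterative-rollout idea, propagating a latent state one cycle at a time instead of regressing the whole trajectory, can be sketched with toy stand-ins for the learned transition and decoder (the per-cycle fade rate and latent dimensions below are purely illustrative):

```python
import numpy as np

def rollout(z0, transition, decode, horizon=80):
    """World-model style forecast: iteratively apply a learned latent
    transition and decode a state-of-health (SOH) value each cycle."""
    z, traj = np.asarray(z0, dtype=float), []
    for _ in range(horizon):
        z = transition(z)        # one-step latent dynamics
        traj.append(decode(z))   # read out SOH for this future cycle
    return np.array(traj)

# Toy stand-ins for the learned components: mild capacity fade per cycle.
transition = lambda z: np.array([z[0] * 0.999, z[1]])
decode = lambda z: z[0]

soh_trajectory = rollout([1.0, 0.0], transition, decode)
```

Because each step feeds back into the next, the forecast carries degradation dynamics forward in time, which is the mechanism direct regression from a single encoder output lacks.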


[47] 2603.10549

Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues

Active infrared thermography (AIRT) is currently witnessing a surge of artificial intelligence (AI) methodologies being deployed for automated subsurface defect analysis of high-performance carbon fiber-reinforced polymers (CFRP). Deploying AI-based AIRT methodologies for inspecting CFRPs requires the creation of time-consuming and expensive datasets of CFRP inspection sequences to train neural networks. To address this challenge, this work introduces a novel language-guided framework for cognitive defect analysis in CFRPs using AIRT and vision-language models (VLMs). Unlike conventional learning-based approaches, the proposed framework does not require developing training datasets for extensive training of defect detectors; instead, it relies solely on pretrained multimodal VLM encoders coupled with a lightweight adapter to enable generative zero-shot understanding of thermographic patterns and automatic localization of subsurface defects. Given the domain gap between thermographic data and the natural images used to train VLMs, an AIRT-VLM Adapter is proposed to enhance the visibility of defects while aligning the thermographic domain with the learned representations of VLMs. The proposed framework is validated using three representative VLMs: GroundingDINO, Qwen-VL-Chat, and CogVLM. Validation is performed on 25 CFRP inspection sequences with impacts introduced at different energy levels, reflecting realistic defects encountered in industrial scenarios. Experimental results demonstrate that the AIRT-VLM adapter achieves signal-to-noise ratio (SNR) gains exceeding 10 dB compared with conventional thermographic dimensionality-reduction methods, while enabling zero-shot defect detection with intersection-over-union values reaching 70%.


[48] 2603.10562

Quantization Robustness of Monotone Operator Equilibrium Networks

Monotone operator equilibrium networks are implicit-layer models whose output is the unique equilibrium of a monotone operator, guaranteeing existence, uniqueness, and convergence. When deployed on low-precision hardware, weights are quantized, potentially destroying these guarantees. We analyze weight quantization as a spectral perturbation of the underlying monotone inclusion. Convergence of the quantized solver is guaranteed whenever the spectral-norm weight perturbation is smaller than the monotonicity margin; the displacement between quantized and full-precision equilibria is bounded in terms of the perturbation size and margin; and a condition number characterizing the ratio of the operator norm to the margin links quantization precision to forward error. MNIST experiments confirm a phase transition at the predicted threshold: three- and four-bit post-training quantization diverge, while five-bit and above converge. The backward-pass guarantee enables quantization-aware training, which recovers provable convergence at four bits.
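The convergence condition stated above can be checked directly: quantize the weights, measure the spectral-norm perturbation, and compare it against the monotonicity margin. A sketch with a randomly drawn weight matrix and an illustrative margin value (not the paper's trained network or its actual margin):

```python
import numpy as np

def quantize(W, bits):
    """Uniform symmetric post-training quantization of a weight matrix."""
    scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / scale) * scale

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64)) / np.sqrt(64)  # stand-in equilibrium-layer weights
margin = 0.5                                 # monotonicity margin (illustrative)

# Spectral-norm perturbation ||W - W_q||_2 per bit width; convergence of the
# quantized fixed-point solver is guaranteed when it stays below the margin.
deltas = {b: np.linalg.norm(W - quantize(W, b), 2) for b in (3, 4, 5, 6, 8)}
for b, d in deltas.items():
    print(b, "bits: delta =", round(d, 3),
          "guaranteed" if d < margin else "no guarantee")
```

Increasing the bit width shrinks the perturbation, reproducing the qualitative phase transition the paper reports at the predicted threshold.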


[49] 2603.10670

Dynamic Modeling and Attitude Control of a Reaction-Wheel-Based Low-Gravity Bipedal Hopper

Planetary bodies characterized by low gravitational acceleration, such as the Moon and near-Earth asteroids, impose unique locomotion constraints due to diminished contact forces and extended airborne intervals. Among traversal strategies, hopping locomotion offers high energy efficiency but is prone to mid-flight attitude instability caused by asymmetric thrust generation and uneven terrain interactions. This paper presents an underactuated bipedal hopping robot that employs an internal reaction wheel to regulate body posture during the ballistic flight phase. The system is modeled as a gyrostat, enabling analysis of the dynamic coupling between torso rotation and reaction wheel momentum. The locomotion cycle comprises three phases: a leg-driven propulsive jump, mid-air attitude stabilization via an active momentum exchange controller, and a shock-absorbing landing. A reduced-order model is developed to capture the critical coupling between torso rotation and reaction wheel dynamics. The proposed framework is evaluated in MuJoCo-based simulations under lunar gravity conditions (g = 1.625 m/s^2). Results demonstrate that activation of the reaction wheel controller reduces peak mid-air angular deviation by more than 65% and constrains landing attitude error to within 3.5 degrees at touchdown. Additionally, actuator saturation per hop cycle is reduced, ensuring sufficient control authority. Overall, the approach significantly mitigates in-flight attitude excursions and enables consistent upright landings, providing a practical and control-efficient solution for locomotion on irregular extraterrestrial terrains.


[50] 2603.10671

An FPGA Implementation of Displacement Vector Search for Intra Pattern Copy in JPEG XS

Recently, progress has been made on the Intra Pattern Copy (IPC) tool for JPEG XS, an image compression standard designed for low-latency and low-complexity coding. IPC performs wavelet-domain intra compensation predictions to reduce spatial redundancy in screen content. A key module of IPC is the displacement vector (DV) search, which aims to solve the optimal prediction reference offset. However, the DV search process is computationally intensive, posing challenges for practical hardware deployment. In this paper, we propose an efficient pipelined FPGA architecture design for the DV search module to promote the practical deployment of IPC. Optimized memory organization, which leverages the IPC computational characteristics and data inherent reuse patterns, is further introduced to enhance the performance. Experimental results show that our proposed architecture achieves a throughput of 38.3 Mpixels/s with a power consumption of 277 mW, demonstrating its feasibility for practical hardware implementation in IPC and other predictive coding tools, and providing a promising foundation for ASIC deployment.


[51] 2603.10711

Parallel-in-Time Nonlinear Optimal Control via GPU-native Sequential Convex Programming

Real-time trajectory optimization for nonlinear constrained autonomous systems is critical and typically performed by CPU-based sequential solvers. Specifically, reliance on global sparse linear algebra or the serial nature of dynamic programming algorithms restricts the utilization of massively parallel computing architectures like GPUs. To bridge this gap, we introduce a fully GPU-native trajectory optimization framework that combines sequential convex programming with a consensus-based alternating direction method of multipliers. By applying a temporal splitting strategy, our algorithm decouples the optimization horizon into independent, per-node subproblems that execute massively in parallel. The entire process runs fully on the GPU, eliminating costly memory transfers and large-scale sparse factorizations. This architecture naturally scales to multi-trajectory optimization. We validate the solver on a quadrotor agile flight task and a Mars powered descent problem using an on-board edge computing platform. Benchmarks reveal a sustained 4x throughput speedup and a 51% reduction in energy consumption over a heavily optimized 12-core CPU baseline. Crucially, the framework saturates the hardware, maintaining over 96% active GPU utilization to achieve planning rates exceeding 100 Hz. Furthermore, we demonstrate the solver's extensibility to robust Model Predictive Control by jointly optimizing dynamically coupled scenarios under stochastic disturbances, enabling scalable and safe autonomy.


[52] 2603.10763

Prioritizing Gradient Sign Over Modulus: An Importance-Aware Framework for Wireless Federated Learning

Wireless federated learning (FL) facilitates collaborative training of artificial intelligence (AI) models to support ubiquitous intelligent applications at the wireless edge. However, the inherent constraints of limited wireless resources inevitably lead to unreliable communication, which poses a significant challenge to wireless FL. To overcome this challenge, we propose Sign-Prioritized FL (SP-FL), a novel framework that improves wireless FL by prioritizing the transmission of important gradient information through uneven resource allocation. Specifically, recognizing the importance of descent direction in model updating, we transmit gradient signs in individual packets and allow their reuse for gradient descent if the remaining gradient modulus cannot be correctly recovered. To further improve the transmission reliability of important information, we formulate a hierarchical resource allocation problem based on the importance disparity at both the packet and device levels, optimizing bandwidth allocation across multiple devices and power allocation between sign and modulus packets. To make the problem tractable, the one-step convergence behavior of SP-FL, which characterizes data importance at both levels in an explicit form, is analyzed. We then propose an alternating optimization algorithm to solve this problem using the Newton-Raphson method and successive convex approximation (SCA). Simulation results confirm the superiority of SP-FL, especially in resource-constrained scenarios, demonstrating up to 9.96% higher testing accuracy on the CIFAR-10 dataset compared to existing methods.
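The sign/modulus packet split admits a compact illustration. In this hedged sketch (the scalar fallback scale is our illustrative choice, not the paper's exact reuse rule), a lost modulus packet is replaced by a single scale applied to the surviving sign packet, so the descent direction is preserved:

```python
import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=1000)  # a local gradient at one device

sign_pkt = np.sign(grad)      # high-priority packet: descent direction
mod_pkt = np.abs(grad)        # lower-priority packet: gradient magnitudes

# If the modulus packet cannot be recovered, reuse the sign packet with a
# scalar step (a signSGD-style fallback; the mean-magnitude scale is an
# illustrative assumption).
mod_lost = True
recovered = sign_pkt * (mod_pkt if not mod_lost else np.mean(mod_pkt))

# The sign of every coordinate of the update still matches the true gradient.
print(np.all(np.sign(recovered) == np.sign(grad)))  # True
```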


[53] 2603.10800

AI-Enhanced Spatial Cellular Traffic Demand Prediction with Contextual Clustering and Error Correction for 5G/6G Planning

Accurate spatial prediction of cellular traffic demand is essential for 5G NR capacity planning, network densification, and data-driven 6G planning. Although machine learning can fuse heterogeneous geospatial and socio-economic layers to estimate fine-grained demand maps, spatial autocorrelation can cause neighborhood leakage under naive train/test splits, inflating accuracy and weakening planning reliability. This paper presents an AI-driven framework that reduces leakage and improves spatial generalization via a context-aware two-stage splitting strategy with residual spatial error correction. Experiments using crowdsourced usage indicators across five major Canadian cities show consistent mean absolute error (MAE) reductions relative to location-only clustering, supporting more reliable bandwidth provisioning and evidence-based spectrum planning and sharing assessments.
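A minimal sketch of leakage-aware spatial splitting, using a simple tile-based hold-out in place of the paper's context-aware two-stage strategy (tile size and counts are illustrative): entire spatial blocks are withheld, so no test location shares a tile with a training location.

```python
import numpy as np

rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(500, 2))  # cell-site coordinates (km, synthetic)

# Assign points to coarse 2 km x 2 km tiles and hold out whole tiles, so test
# locations are never immediate neighbors of training ones (reducing the
# neighborhood leakage caused by spatial autocorrelation).
tile = (xy // 2).astype(int)
tile_id = tile[:, 0] * 100 + tile[:, 1]
held_out = rng.choice(np.unique(tile_id), size=5, replace=False)
test_mask = np.isin(tile_id, held_out)

print(test_mask.sum(), "test points,", (~test_mask).sum(), "train points")
# No tile appears in both splits:
print(set(tile_id[test_mask]) & set(tile_id[~test_mask]))  # set()
```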


[54] 2603.10802

Towards Intelligent Spectrum Management: Spectrum Demand Estimation Using Graph Neural Networks

The growing demand for wireless connectivity, combined with limited spectrum resources, calls for more efficient spectrum management. Spectrum sharing is a promising approach; however, regulators need accurate methods to characterize demand dynamics and guide allocation decisions. This paper builds and validates a spectrum demand proxy from public deployment records and uses a graph attention network in a hierarchical, multi-resolution setup (HR-GAT) to estimate spectrum demand at fine spatial scales. The model captures both neighborhood effects and cross-scale patterns, reducing spatial autocorrelation and improving generalization. Evaluated across five Canadian cities and against eight competitive baselines, HR-GAT reduces median RMSE by roughly 21% relative to the best alternative and lowers residual spatial bias. The resulting demand maps are regulator-accessible and support spectrum sharing and spectrum allocation in wireless networks.


[55] 2603.10812

Distributed Stability Certification and Control from Local Data

Most data-driven analysis and control methods rely on centralized access to system measurements. In contrast, we consider a setting in which the measurements are distributed across multiple agents and raw data are not shared. Each agent has access only to locally held samples, possibly as little as a single measurement, and agents exchange only locally computed signals. Consequently, no individual agent possesses sufficient information to identify the entire system or synthesize a controller independently. To address this limitation, we develop distributed dynamical algorithms that enable the agents to collectively compute global system certificates from local data. Two problems are addressed. First, for stable linear time-invariant (LTI) systems, the agents compute a Lyapunov certificate by solving the Lyapunov equation in a fully distributed manner. Second, for general LTI systems, they compute the stabilizing solution of the algebraic Riccati equation and hence the optimal linear-quadratic regulator (LQR). An initially proposed scheme guarantees practical convergence, while a subsequent augmented PI-type algorithm achieves exact convergence to the desired solution. We further establish robustness of the resulting LQR controller to uncertainty and measurement noise. The approach is illustrated through distributed Lyapunov certification of a quadruple-tank process and distributed LQR design for helicopter dynamics.
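For reference, the Lyapunov certificate that the agents compute distributively can be obtained centrally by vectorizing the Lyapunov equation with the standard Kronecker identity; the sketch below is this textbook centralized computation, not the paper's distributed algorithm:

```python
import numpy as np

def lyapunov(A, Q):
    """Solve A^T P + P A = -Q via (I (x) A^T + A^T (x) I) vec(P) = -vec(Q)."""
    n = A.shape[0]
    K = np.kron(np.eye(n), A.T) + np.kron(A.T, np.eye(n))
    return np.linalg.solve(K, -Q.flatten()).reshape(n, n)

A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])  # Hurwitz: eigenvalues -1 and -3
Q = np.eye(2)
P = lyapunov(A, Q)

# P is a Lyapunov certificate: symmetric positive definite, zero residual.
print(np.allclose(A.T @ P + P @ A, -Q))   # True
print(np.all(np.linalg.eigvalsh(P) > 0))  # True
```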


[56] 2603.10890

A gripper for flap separation and opening of sealed bags

Separating thin, flexible layers that must be individually grasped is a common but challenging manipulation primitive for most off-the-shelf grippers. A prominent example arises in clinical settings: the opening of sterile flat pouches for the preparation of the operating room, where the first step is to separate and grasp the flaps. We present a novel gripper design and opening strategy that enables reliable flap separation and robust seal opening. This capability addresses a high-volume repetitive hospital procedure in which nurses manually open up to 240 bags per shift, a physically demanding task linked to musculoskeletal injuries. Our design combines an active dented-roller fingertip with compliant fingers that exploit environmental constraints to robustly grasp thin flexible flaps. Experiments demonstrate that the proposed gripper reliably grasps and separates sealed bag flaps and other thin-layered materials from the hospital, the most sensitive variable affecting performance being the normal force applied. When two copies of the gripper grasp both flaps, the system withstands the forces needed to open the seals robustly. To our knowledge, this is one of the first demonstrations of robotic assistance to automate this repetitive, low-value, but critical hospital task.


[57] 2603.10970

Reference Architecture of a Quantum-Centric Supercomputer

Quantum computers have demonstrated utility in simulating quantum systems beyond brute-force classical approaches. As the community builds on these demonstrations to explore using quantum computing for applied research, algorithms and workflows have emerged that require leveraging both quantum computers and classical high-performance computing (HPC) systems to scale applications, especially in chemistry and materials, beyond what either system can simulate alone. Today, these disparate systems operate in isolation, forcing users to manually orchestrate workloads, coordinate job scheduling, and transfer data between systems -- a cumbersome process that hinders productivity and severely limits rapid algorithmic exploration. These challenges motivate the need for flexible and high-performance Quantum-Centric Supercomputing (QCSC) systems that integrate Quantum Processing Units (QPUs), Graphics Processing Units (GPUs), and Central Processing Units (CPUs) to accelerate discovery of such algorithms across applications. These systems will be co-designed across quantum and classical HPC infrastructure, middleware, and application layers to accelerate the adoption of quantum computing for solving critical computational problems. We envision QCSC evolution through three distinct phases: (1) quantum systems as specialized compute offload engines within existing HPC complexes; (2) heterogeneous quantum and classical HPC systems coupled through advanced middleware, enabling seamless execution of hybrid quantum-classical algorithms; and (3) fully co-designed heterogeneous quantum-HPC systems for hybrid computational workflows. This article presents a reference architecture and roadmap for these QCSC systems.


[58] 2001.08480

Segmentation of Retinal Low-Cost Optical Coherence Tomography Images using Deep Learning

The treatment of age-related macular degeneration (AMD) requires continuous eye exams using optical coherence tomography (OCT). The need for treatment is determined by the presence or change of disease-specific OCT-based biomarkers. Therefore, the monitoring frequency has a significant influence on the success of AMD therapy. However, the monitoring frequency of current treatment schemes is not individually adapted to the patient and therefore often insufficient. While a higher monitoring frequency would have a positive effect on the success of treatment, in practice it can only be achieved with a home monitoring solution. One of the key requirements of a home monitoring OCT system is a computer-aided diagnosis to automatically detect and quantify pathological changes using specific OCT-based biomarkers. In this paper, for the first time, retinal scans of a novel self-examination low-cost full-field OCT (SELF-OCT) are segmented using a deep learning-based approach. A convolutional neural network (CNN) is utilized to segment the total retina as well as pigment epithelial detachments (PED). It is shown that the CNN-based approach can segment the retina with high accuracy, whereas the segmentation of the PED proves to be challenging. In addition, a convolutional denoising autoencoder (CDAE) refines the CNN prediction, which has previously learned retinal shape information. It is shown that the CDAE refinement can correct segmentation errors caused by artifacts in the OCT image.


[59] 2211.01720

Response time central-limit and failure rate estimation for stationary periodic rate monotonic real-time systems

Real-time systems consist of a set of tasks, a scheduling policy, and a system architecture, all constrained by timing requirements. Many everyday embedded systems, within devices such as airplanes, cars, trains, and space probes, operate as real-time systems. To ensure safe failure rates, response times (the time required to execute a task) must be bounded. Rate Monotonic real-time systems prioritize tasks according to their arrival rate. This paper builds on the central limit of response times established in [zagalo2022] and approximates their distribution with an inverse Gaussian mixture distribution. The distribution parameters and their associated failure rates are estimated through a suitable re-parameterization of the inverse Gaussian distribution and an adapted Expectation-Maximization algorithm. Extensive simulations demonstrate that the method is well-suited for the approximation of failure rates. We discuss extending this method to a chi-squared independence test adapted to real-time systems.
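The inverse Gaussian building block is straightforward to reproduce. The sketch below uses the closed-form single-component MLE and the inverse Gaussian CDF to estimate a deadline-miss probability; the paper fits a mixture with EM, and the parameter and deadline values here are illustrative:

```python
import numpy as np
from math import erf, exp, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def fit_inverse_gaussian(x):
    """Closed-form MLE for the inverse Gaussian (Wald) distribution."""
    mu = x.mean()
    lam = len(x) / np.sum(1.0 / x - 1.0 / mu)
    return mu, lam

def tail_prob(t, mu, lam):
    """P(response time > t): one minus the inverse Gaussian CDF."""
    a = sqrt(lam / t)
    cdf = Phi(a * (t / mu - 1)) + exp(2 * lam / mu) * Phi(-a * (t / mu + 1))
    return 1 - cdf

rng = np.random.default_rng(0)
x = rng.wald(mean=2.0, scale=8.0, size=20000)  # synthetic response times
mu, lam = fit_inverse_gaussian(x)              # should be close to (2.0, 8.0)
p_miss = tail_prob(5.0, mu, lam)               # failure rate for deadline 5.0
print(p_miss < 0.05)  # True
```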


[60] 2402.18719

Max-Consensus with Deterministic Convergence in Directed Graphs with Unreliable Communication Links

We present DMaC, a novel distributed, finite-time algorithm that guarantees max-consensus in directed networks with unreliable communication links experiencing packet drops. Unlike existing methods, DMaC ensures all nodes compute the exact maximum state under arbitrary packet loss patterns. It incorporates a fully distributed termination mechanism, enabling nodes to autonomously determine whether convergence has occurred. Our algorithm leverages narrowband error-free feedback channels to acknowledge successful (single-bit) transmissions with minimal communication overhead. We analyze our algorithm's operation, and we provide a convergence proof establishing explicit bounds on the required time steps. We validate its correctness in a wireless sensor network for environmental monitoring, and finally, we compare against existing approaches highlighting our algorithm's operational advantages.
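A toy simulation of max-consensus under random packet drops illustrates the setting. This is a hedged sketch, not DMaC itself: the global termination check below is for brevity only (DMaC's termination mechanism is fully distributed), and the drop probability is an illustrative choice.

```python
import random

def max_consensus(edges, x0, drop_prob=0.3, seed=0, max_rounds=100):
    """Synchronously propagate the max over directed edges with packet drops."""
    rng = random.Random(seed)
    x = dict(x0)
    for _ in range(max_rounds):
        updates = {v: x[v] for v in x}
        for u, v in edges:                 # directed edge u -> v
            if rng.random() > drop_prob:   # packet delivered
                updates[v] = max(updates[v], x[u])
        x = updates
        if len(set(x.values())) == 1:      # global check, for brevity only
            return x, True
    return x, False

# Directed ring of 5 nodes; node 1 holds the network maximum.
edges = [(i, (i + 1) % 5) for i in range(5)]
x, done = max_consensus(edges, {0: 3.0, 1: 9.0, 2: 1.0, 3: 7.0, 4: 5.0})
print(done, x[0])  # all nodes reach the network maximum despite drops
```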


[61] 2411.00143

Enhancing Brain Source Reconstruction by Initializing 3D Neural Networks with Physical Inverse Solutions

Reconstructing brain sources is a fundamental challenge in neuroscience, crucial for understanding brain function and dysfunction. Electroencephalography (EEG) signals have a high temporal resolution. However, identifying the correct spatial location of brain sources from these signals remains difficult due to the ill-posed structure of the problem. Traditional methods predominantly rely on manually crafted priors, missing the flexibility of data-driven learning, while recent deep learning approaches focus on end-to-end learning, typically using the physical information of the forward model only for generating training data. We propose the novel hybrid method 3D-PIUNet for EEG source localization that effectively integrates the strengths of traditional and deep learning techniques. 3D-PIUNet starts from an initial physics-informed estimate by using the pseudo inverse to map from measurements to source space. Secondly, by viewing the brain as a 3D volume, we use a 3D convolutional U-Net to capture spatial dependencies and refine the solution according to the learned data prior. Training the model relies on simulated pseudo-realistic brain source data, covering different source distributions. Trained on this data, our model significantly improves spatial accuracy, demonstrating superior performance over both traditional and end-to-end data-driven methods. Additionally, we validate our findings with real EEG data from a visual task, where 3D-PIUNet successfully identifies the visual cortex and reconstructs the expected temporal behavior, thereby showcasing its practical applicability.


[62] 2411.15965

Phase Selection and Analysis for Multi-frequency Multi-user RIS Systems Employing Subsurfaces in Correlated Ricean and Rayleigh Environments

Phase selection design for reconfigurable intelligent surfaces (RISs) is a significant research challenge, as a closed-form optimal solution for a multi-user (MU) system is believed to be intractable. While existing methods achieve strong near-optimal performance, they typically entail high computational complexity. In this work, we take a different approach and propose a practical method that achieves competitive performance while substantially reducing computational complexity. To do so, we consider a RIS divided into subsurfaces. Each subsurface is designed specifically for one user, who is served on their own frequency band. The other subsurfaces (those not designed for this user) provide additional uncontrolled scattering. We derive the exact closed-form expression for the mean signal-to-noise ratio (SNR) for the proposed subsurface design (SD) when all channels experience correlated Ricean fading. We simplify this to find the mean SNR for line-of-sight (LoS) channels and channels experiencing correlated Rayleigh fading. An iterative SD (ISD) process is proposed, where subsurfaces are designed sequentially, and the phases that are already set are used to enhance the design of the remaining subsurfaces. This is extended to a converged ISD (CISD), where the ISD process is repeated multiple times until the SNR increases by less than a specified tolerance. The ISD and CISD both provide a performance improvement over SD, which increases as the number of RIS elements increases. The SD is significantly simpler than the lowest complexity MU method we know of, and despite each user having less bandwidth, the SD outperforms the existing method in some key scenarios. The SD is more robust to strongly LoS channels and clustered users, as it does not rely on spatial multiplexing like other MU methods. Combined with the complexity reduction, this makes the SD an attractive phase selection method.


[63] 2412.00462

Signal Processing over Time-Varying Graphs: A Systematic Review

As irregularly structured data representations, graphs have received a large amount of attention in recent years and have been widely applied to various real-world scenarios such as social, traffic, and energy settings. Compared to non-graph algorithms, numerous graph-based methodologies benefit from the strong power of graphs for representing high-dimensional and non-Euclidean data. In the field of Graph Signal Processing (GSP), analogies of classical signal processing concepts, such as shifting, convolution, filtering, and transformations, have been developed. However, many GSP techniques postulate that the graph is static in both its signals and its topology. This assumption limits the effectiveness of GSP methodologies, as it ignores the time-varying properties of numerous real-world systems. For example, in a traffic network, the signal on each node varies over time and contains underlying temporal correlations and patterns worthy of analysis. To tackle this challenge, a growing body of recent work investigates the processing of time-varying graph signals. These works address time-varying challenges from three main directions: 1) graph time-spectral filtering, 2) multivariate time-series forecasting, and 3) spatiotemporal graph data mining by neural networks, where non-negligible progress has been achieved. Despite the success of signal processing and learning over time-varying graphs, no survey compares and summarizes the current methodologies for GSP and graph learning. To fill this gap, in this paper we review the development and recent progress of signal processing and learning over time-varying graphs, compare their advantages and disadvantages from both the methodological and experimental sides, and outline challenges and potential directions for future research.
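The graph-filtering building block underlying the first direction can be sketched in a few lines. Here a Tikhonov low-pass filter h(L) = (I + tau L)^(-1) on a small path graph (an illustrative choice of filter and graph, not any particular surveyed method) reduces the total variation of an oscillatory signal:

```python
import numpy as np

# Path graph on 6 nodes: adjacency and combinatorial Laplacian L = D - W.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

def smoothness(x):
    """Graph total variation x^T L x (small = smooth over the graph)."""
    return float(x @ L @ x)

# Low-pass graph filter h(L) = (I + tau L)^(-1), a Tikhonov smoother; in the
# time-varying setting such a filter is applied to each snapshot x_t.
tau = 2.0
H = np.linalg.inv(np.eye(n) + tau * L)

x = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # highly oscillatory signal
y = H @ x

print(smoothness(y) < smoothness(x))  # True: filtering reduces variation
```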


[64] 2504.07496

Modular Control of Discrete Event System for Modeling and Mitigating Power System Cascading Failures

Cascading failures in power systems caused by sequential tripping of components are a serious concern as they can lead to complete or partial shutdowns, disrupting vital services and causing damage and inconvenience. In prior work, we developed a new approach for identifying and preventing cascading failures in power systems. The approach uses the supervisory control technique of discrete event systems (DES), incorporating both on-line lookahead control and forcible events. In this paper, we use modular supervisory control of DES to reduce computation complexity and increase the robustness and reliability of control. Modular supervisory control allows us to predict and mitigate cascading failures in power systems more effectively. We implemented the proposed control technique on a simulation platform developed in MATLAB and applied the proposed DES controller. The calculations of modular supervisory control of DES are performed using an external tool and imported into the MATLAB platform. We conduct simulation studies for the IEEE 30-bus, 118-bus and 300-bus systems, and the results demonstrate the effectiveness of our proposed approach.


[65] 2506.01925

Platform-Aware Channel Knowledge Mapping via Mutual Antenna Pattern Learning in 3D Wireless Links

This letter proposes a platform-aware framework to characterize wireless links by empirically modeling the 'near-platform' scattering and reflections induced by the hardware mounting structures of both endpoints. We model the link characteristics as a novel mutual antenna pattern: a joint function of the angle of arrival (AoA) and angle of departure (AoD). We demonstrate that while individual platform-aware patterns are mathematically unidentifiable from power measurements, the coupled mutual pattern can be effectively estimated in a least-squares sense. Our framework is evaluated using noisy measurement data, revealing that as few as 10 measurements per joint-angular bin are sufficient. The proposed methodology is validated through cross-validation of experimental subsets, demonstrating that the learned mutual radiation pattern reduces path loss estimation errors by up to 10 dB compared to traditional models using isolated anechoic chamber antenna gains.


[66] 2508.09942

Beam Cross Sections Create Mixtures: Improving Feature Localization in Secondary Electron Imaging

Secondary electron (SE) imaging techniques, such as scanning electron microscopy and helium ion microscopy (HIM), use electrons emitted by a sample in response to a focused beam of charged particles incident at a grid of raster scan positions. Spot size -- the diameter of the incident beam's spatial profile -- is one of the limiting factors for resolution, along with various sources of noise in the SE signal. The effect of the beam spatial profile is commonly understood as convolutional. We show that under a simple and plausible physical abstraction for the beam, though convolution describes the mean of the SE counts, the full distribution of SE counts is a mixture. We demonstrate that this more detailed modeling can enable resolution improvements over conventional estimators through a stylized application inspired by semiconductor inspection: localizing the edge in a two-valued sample. We derive Fisher information about edge location in conventional and time-resolved measurements (TRM) and also derive the maximum likelihood estimate (MLE) from the latter. Empirically, the MLE computed from TRM is approximately efficient except at very low beam diameter, so Fisher information comparisons are predictive of performance and can be used to optimize the beam diameter relative to the raster scan spacing. Monte Carlo simulations provide an example of the MLE giving a 5-fold reduction in root mean-squared error (RMSE) of edge localization as compared to conventional interpolation-based estimation. The RMSE is substantially below both the beam diameter and the raster scan spacing and thus sub-pixel localization is demonstrated. Applied to three real HIM datasets, the average RMSE reduction factor is 5.4.


[67] 2508.16107

Constant-Envelope ISAC via FM-OFDM: Analytical Framework and Receiver Design

Integrated Sensing and Communication (ISAC) systems face stringent hardware constraints, particularly regarding the high Peak-to-Average Power Ratio (PAPR) of standard OFDM, which necessitates power amplifier (PA) back-off and reduces sensing range. This paper investigates Frequency Modulated OFDM (FM-OFDM) as a constant-envelope solution capable of operating in the PA saturation region, thereby maximizing output power without the non-linear distortion penalties typical of conventional waveforms. We derive a comprehensive analytical framework for FM-OFDM in doubly dispersive channels, explicitly quantifying the inter-carrier interference (ICI) dynamics and effective channel gains in the discriminator domain. To address the unique phase structure of the waveform, we propose a tailored sensing receiver architecture utilizing slow time phase differencing for robust velocity estimation. Unlike prior works, we evaluate performance under a strictly normalized bandwidth constraint (B99), ensuring a fair comparison against CP-OFDM and Constant-Envelope OFDM (CE-OFDM). Simulation results demonstrate that FM-OFDM maintains superior detection accuracy and low BER even under fully saturated PA conditions and high Doppler shifts, validating its suitability for hardware-constrained ISAC transceivers.
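The constant-envelope property and the discriminator-domain receiver are easy to verify numerically. The sketch below is a hedged illustration of the FM-OFDM principle, not the paper's exact waveform parameterization: the modulation index and subcarrier loading are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
# Real-valued OFDM message via Hermitian-symmetric QPSK subcarrier loading.
sym = rng.choice([-1.0, 1.0], N // 2 - 1) + 1j * rng.choice([-1.0, 1.0], N // 2 - 1)
X = np.zeros(N, dtype=complex)
X[1:N // 2] = sym
X[N // 2 + 1:] = np.conj(sym[::-1])
m = np.fft.ifft(X).real                  # multicarrier message signal

h = 0.3                                  # modulation index (illustrative)
phase = 2 * np.pi * h * np.cumsum(m)     # FM: integrate the message
s = np.exp(1j * phase)                   # constant-envelope FM-OFDM signal

papr = np.max(np.abs(s) ** 2) / np.mean(np.abs(s) ** 2)  # 1.0, i.e. 0 dB

# Discriminator-domain receiver: phase differencing recovers the message.
m_hat = np.diff(np.unwrap(np.angle(s))) / (2 * np.pi * h)
print(abs(papr - 1.0) < 1e-9, np.allclose(m_hat, m[1:]))  # True True
```

A unit-modulus signal can be driven through a fully saturated PA without clipping distortion, which is the motivation for the waveform.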


[68] 2509.00782

Deep Unfolding with Approximated Computations for Rapid Optimization

Optimization-based solvers play a central role in a wide range of signal processing and communication tasks. However, their applicability in latency-sensitive systems is limited by the sequential nature of iterative methods and the high computational cost per iteration. While deep unfolding has emerged as a powerful paradigm for converting iterative algorithms into learned models that operate with a fixed number of iterations, it does not inherently address the cost of each iteration. In this paper, we introduce a learned optimization framework that jointly tackles iteration count and per-iteration complexity. Our approach is based on unfolding a fixed number of optimization steps, replacing selected iterations with low-complexity approximated computations, and learning extended hyperparameters from data to compensate for the introduced approximations. We demonstrate the effectiveness of our method on two representative problems: (i) hybrid beamforming; and (ii) robust principal component analysis. These fundamental case studies show that our learned approximated optimizers can achieve state-of-the-art performance while reducing computational complexity by over three orders of magnitude. Our results highlight the potential of our approach to enable rapid, interpretable, and efficient decision-making in real-time systems.
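The unfolding idea (a fixed iteration count, with per-iteration parameters that would be learned from data) can be sketched with ISTA for sparse recovery. The step sizes and thresholds below are hand-set stand-ins for learned values, and the problem instance is synthetic; this is an illustration of unfolding, not the paper's learned approximated optimizer:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def unfolded_ista(A, y, steps, thresholds):
    """ISTA truncated to a fixed number of unfolded iterations; in a learned
    optimizer the per-iteration (step, threshold) pairs are trained from data."""
    x = np.zeros(A.shape[1])
    for mu, theta in zip(steps, thresholds):
        x = soft(x - mu * A.T @ (A @ x - y), theta)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100)) / np.sqrt(40)  # wide sensing matrix
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.5, -2.0, 1.0]        # sparse ground truth
y = A @ x_true

K = 50                                        # fixed unfolding depth
mu0 = 1.0 / np.linalg.norm(A, 2) ** 2         # classical safe step size
x_hat = unfolded_ista(A, y, [mu0] * K, [0.01 * mu0] * K)

err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(err < 1.0)  # the truncated solver already reduces the estimation error
```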


[69] 2509.12583

Robust Audio-Visual Target Speaker Extraction with Emotion-Aware Multiple Enrollment Fusion

Audio-Visual Target Speaker Extraction (AVTSE) is crucial for cocktail party scenarios. Leveraging multiple cues, such as utterance-level speaker embeddings or steady face images, and frame-level lip motion or facial expression features, can significantly improve performance. However, real-world applications often suffer from intermittent signal loss, especially for frame-level cues. This paper systematically investigates the robustness of multi-enrollment fusion under varying degrees of missing modalities. Results show that while full multimodal fusion excels under ideal conditions, its performance degrades sharply when it encounters missing-modality conditions at test time that were unseen during training. Crucially, training with a high missing rate dramatically enhances robustness, maintaining stable performance even under severe test-time modality loss. We demonstrate that fusing a single complementary face-image frame with frame-level lip features achieves both strong performance and robustness for the AVTSE task. The model and code are shared.


[70] 2509.20246

Reciprocal Beyond-Diagonal Reconfigurable Intelligent Surface (BD-RIS): Scattering Matrix Design via Manifold Optimization

Beyond-diagonal reconfigurable intelligent surfaces (BD-RISs) are emerging as a transformative technology in wireless communications, enabling enhanced performance and quality of service (QoS) of wireless systems in harsh urban environments due to their relatively low cost and advanced signal processing capabilities. Generally, BD-RIS systems are employed to improve robustness, increase achievable rates, and enhance energy efficiency of wireless systems in both direct and indirect ways. The direct way is to produce a favorable propagation environment via the design of optimized scattering matrices, while the indirect way is to reap additional improvements via the design of multiple-input multiple-output (MIMO) beamformers that further exploit the latter "engineered" medium. In this article, the problem of sum-rate maximization via BD-RIS is examined, with a focus on feasibility, namely low-complexity physical implementation, by enforcing reciprocity in the BD-RIS design in a manner that adheres to the geometry of the manifold of symmetric matrices. To that end, the sum-rate objective is transformed into a quadratic function via fractional programming (FP), augmented via the also quadratic reciprocity constraint in the form of a regularization term, while the unitary constraint is dealt with via a manifold optimization framework. Simulation results demonstrate the effectiveness of the proposed method in outperforming current state-of-the-art (SotA) approaches in terms of sum-rate maximization.


[71] 2509.24799

Event-Based Control via Sparsity-Promoting Regularization: A Rollout Approach with Performance Guarantees

This paper presents a controller design framework aiming to balance control performance and actuation rate. Control performance is evaluated by an infinite-horizon average cost, and the number of control actions is penalized via sparsity-promoting regularization. Since the formulated optimal control problem has a combinatorial nature, we employ a rollout algorithm to obtain a tractable suboptimal solution. In the proposed scheme, actuation timings are determined through a multistage minimization procedure based on a receding-horizon approach, and the corresponding control inputs are computed online. We establish theoretical performance guarantees with respect to periodic control and prove the stability of the closed-loop system. The effectiveness of the proposed method is demonstrated through a numerical example.
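The performance-versus-actuation trade-off can be written down directly on a toy problem. The scalar system, one-step deadbeat input, and brute-force enumeration below are illustrative assumptions; the paper's rollout scheme exists precisely to avoid this combinatorial search over actuation patterns.

```python
import itertools

# Toy sparsity-penalized actuation: scalar system x[t+1] = a*x[t] + u[t],
# quadratic stage cost, plus lam per control action (an l0-style penalty).
# We brute-force all actuation patterns over a short horizon; all numbers
# are illustrative assumptions, not from the paper.
a, x0, H, lam = 1.2, 1.0, 6, 0.3

def cost(pattern):
    x, total = x0, 0.0
    for act in pattern:
        # When we actuate, apply u = -a*x, which deadbeats the scalar
        # state to zero in one step (the unconstrained optimal input here).
        u = -a * x if act else 0.0
        total += x**2 + u**2 + (lam if act else 0.0)
        x = a * x + u
    return total + x**2  # terminal cost

best = min(itertools.product([0, 1], repeat=H), key=cost)
print("best actuation pattern:", best, "cost:", round(cost(best), 3))
```

With an unstable `a` and a moderate penalty `lam`, a single early actuation dominates both never acting (the state grows) and acting repeatedly (each action pays the penalty), which is the qualitative trade-off the regularization is meant to capture.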


[72] 2510.00676

Formation Control via Rotation Symmetry Constraints

This work introduces a distributed formation control strategy for multi-agent systems based solely on rotation symmetry constraints. We propose a potential function that enforces inter-agent \textbf{rotational} symmetries, whose gradient defines a control law that drives the agents toward a desired planar symmetric configuration. We show that only $n-1$ edges (the minimal connectivity requirement) are sufficient to implement the strategy, where $n$ is the number of agents. We further augment the design to address the \textbf{maneuvering problem}, enabling the formation to undergo coordinated translations, rotations, and scaling along a predefined virtual trajectory. Simulation examples are provided to validate the effectiveness of the proposed method.
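One concrete form such a rotation-symmetry potential could take (an assumption for illustration, not the paper's exact construction): require consecutive agents along a chain of n-1 edges to satisfy p[i+1] = R p[i] for a fixed rotation R by 2*pi/n, and let each agent descend the gradient of the resulting quadratic potential.

```python
import numpy as np

# Sketch of a gradient control law from a rotation-symmetry potential.
# Assumed constraint (illustrative): consecutive agents on a chain of n-1
# edges satisfy p[i+1] = R p[i], with R a rotation by 2*pi/n about the
# origin, so equilibria form a regular n-gon pattern.
n = 5
theta = 2 * np.pi / n
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def control(p):
    """u = -grad of V(p) = 0.5 * sum_i ||p[i+1] - R p[i]||^2."""
    u = np.zeros_like(p)
    for i in range(n - 1):           # only n-1 edges are used
        e = p[i + 1] - R @ p[i]      # violation of the symmetry constraint
        u[i + 1] -= e                # -dV/dp[i+1]
        u[i] += R.T @ e              # -dV/dp[i]
    return u

rng = np.random.default_rng(1)
p = rng.standard_normal((n, 2))
for _ in range(400):                 # single-integrator agents, Euler steps
    p = p + 0.1 * control(p)

residual = sum(np.linalg.norm(p[i + 1] - R @ p[i]) for i in range(n - 1))
print("symmetry residual:", residual)
```

Because the potential is quadratic and each constraint couples only neighboring agents, the control is distributed: agent i needs only its own state and those of its chain neighbors.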


[73] 2510.12947

HyWA: Hypernetwork Weight Adapting Personalized Voice Activity Detection

Personalized Voice Activity Detection (PVAD) systems activate only in response to a specific target speaker. Speaker-conditioning methods are employed to inject information about the target speaker into a VAD pipeline to achieve personalization. Existing speaker-conditioning methods typically modify the inputs or activations of a VAD model. We propose an alternative perspective on speaker conditioning. Our approach, HyWA, employs a hypernetwork to generate personalized weights for a few selected layers of a standard VAD model. We evaluate HyWA against multiple baseline speaker-conditioning techniques using a fixed backbone VAD. Our comparison shows consistent improvements in PVAD performance. This new approach improves on current speaker-conditioning techniques in two ways: i) it increases the mean average precision, and ii) it facilitates deployment by reusing the same VAD architecture.


[74] 2510.15613

A Predictive Flexibility Aggregation Method for Low Voltage Distribution System Control

This paper presents a method for predictive aggregation of the available flexibility at the residential unit level into a flexibility chart that represents the admissible active and reactive powers, along with the associated flexibility value. The method is also combined with centralized optimization to design a predictive privacy-preserving control scheme to manage low-voltage distribution systems in real-time. Similarly to hierarchical control strategies, this approach divides the optimization horizon into a real-time stage, responsible for decisions in the current market period, and an operational planning stage, which deals with decisions outside of this interval. First, a multiparametric optimization problem is solved offline at the residential unit level. Then, an operational planning problem, also formulated as a parametric optimization problem, is solved to account for the forecasts. The method generates the desired flexibility chart by combining the results of these two problems with measurements. The resulting approach is compatible with real-time control requirements, as heavy computations are performed offline in a decentralized manner. By linking real-time flexibility assessment with energy scheduling, our approach enables efficient and cost-effective management of low-voltage distribution systems. We validate this method on a low-voltage network of 43 buses by comparing it with a fully centralized optimization formulation with perfect foresight and a future-agnostic aggregation method.


[75] 2510.21556

System-Theoretic Analysis of Dynamic Generalized Nash Equilibria -- Turnpikes and Dissipativity

Generalized Nash equilibria (GNE) are used in multi-agent control applications to model strategic interactions between agents that are coupled in the cost, dynamics, and constraints, and provide the foundations for game-theoretic MPC (Receding Horizon Games). We study properties of finite-horizon dynamic GNE trajectories from a system-theoretic perspective. We show how strict dissipativity generates the turnpike phenomenon in GNE solutions. Moreover, we establish a converse turnpike result, i.e., the implication from turnpike to strict dissipativity. We derive conditions under which the steady-state GNE is the optimal operating point and, using a game value function, we give a local characterization of the geometry of storage functions. Finally, we design linear terminal penalties that ensure dynamic GNE trajectories applied in open loop converge to and remain at the steady-state GNE. These connections provide the foundation for future system-theoretic analysis of GNEs similar to that existing in optimal control, as well as for recursive feasibility and closed-loop stability results for game-theoretic MPC.


[76] 2510.27306

Simplifying Preference Elicitation in Local Energy Markets: Combinatorial Clock Exchange

As distributed energy resources (DERs) proliferate, future power systems will need new market platforms enabling prosumers to trade various electricity and grid-support products. However, prosumers often exhibit complex, product-interdependent preferences and face limited cognitive and computational resources, hindering engagement with complex market structures and bid formats. We address this challenge by introducing a multi-product market that allows prosumers to express complex preferences through an intuitive format, fusing a combinatorial clock exchange with machine learning (ML) techniques. The iterative mechanism only requires prosumers to report their preferred package of products at posted prices, eliminating the need to forecast product prices or adhere to complex bid formats, while ML-aided price discovery speeds up convergence. The linear pricing rule further enhances transparency and interpretability. Finally, numerical simulations demonstrate convergence to clearing prices in approximately 15 clock iterations.
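The clock-exchange loop itself is simple to sketch: the auctioneer posts prices, each prosumer reports only its preferred package at those prices, and prices rise on over-demanded products until the market (approximately) clears. The quasilinear valuations, unit demands, and fixed price increment below are illustrative assumptions, and the paper's ML-aided price discovery is replaced by a plain excess-demand update.

```python
import numpy as np

# Toy clock-exchange iteration over two products. Valuations, supply, and
# the price increment are made-up numbers for illustration.
values = np.array([[4.0, 1.0],    # prosumer valuations per product
                   [1.0, 3.0],
                   [2.5, 2.5]])
supply = np.array([2.0, 2.0])     # units available of each product

def preferred_package(v, prices):
    # Demand one unit of every product with positive surplus at the posted
    # prices -- the only information a prosumer has to report each round.
    return (v > prices).astype(float)

prices = np.zeros(2)
for it in range(200):
    demand = sum(preferred_package(v, prices) for v in values)
    excess = demand - supply
    if np.all(excess <= 0):
        break                               # market clears: stop the clock
    prices = prices + 0.05 * np.maximum(excess, 0)  # raise over-demanded prices

print("clearing prices:", prices.round(2), "after", it + 1, "iterations")
```

Since prices only rise and demand is weakly decreasing in price, the clock terminates; the number of rounds depends on the increment size, which is where a learned price-discovery step can accelerate convergence.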


[77] 2601.09006

GOUHFI 2.0: A Next-Generation Toolbox for Brain Segmentation and Cortex Parcellation at Ultra-High Field MRI

Ultra-High Field MRI (UHF-MRI) is increasingly used in large-scale neuroimaging studies, yet automatic brain segmentation and cortical parcellation remain challenging due to signal inhomogeneities, heterogeneous contrasts and resolutions, and the limited availability of tools optimized for UHF data. Standard software packages such as FastSurferVINN and SynthSeg+ often yield suboptimal results when applied directly to UHF images, thereby restricting region-based quantitative analyses. To address this need, we introduce GOUHFI 2.0, an updated implementation of GOUHFI that incorporates increased training data variability and additional functionalities, including cortical parcellation and volumetry. GOUHFI 2.0 preserves the contrast- and resolution-agnostic design of the original toolbox while introducing two independently trained 3D U-Net segmentation tasks. The first performs whole-brain segmentation into 35 labels across contrasts, resolutions, field strengths and populations, using a domain-randomization strategy and a training dataset of 238 subjects. Using the same training data, the second network performs cortical parcellation into 62 labels following the Desikan-Killiany-Tourville (DKT) protocol. Across multiple datasets, GOUHFI 2.0 demonstrated improved segmentation accuracy relative to the original toolbox, particularly in heterogeneous cohorts, and produced reliable cortical parcellations. In addition, the integrated volumetry pipeline yielded results consistent with standard volumetric workflows. Overall, GOUHFI 2.0 provides a comprehensive solution for brain segmentation, parcellation and volumetry across field strengths, and constitutes the first deep-learning toolbox enabling robust cortical parcellation at UHF-MRI.


[78] 2603.02530

Contractor-Expander and Universal Inverse Optimal Positive Nonlinear Control

For general control-affine nonlinear systems in the positive orthant, and with positive controls, we show how strict CLFs can be utilized for inverse optimal stabilization. Conventional ``LgV'' inverse optimal feedback laws, for systems with unconstrained states and controls, assume sign-unconstrained inputs and input penalties that are class-K in the input magnitude, hence symmetric about zero. Such techniques do not extend to positive-state-and-control systems. Major customizations are needed, and introduced in this paper, for positive systems where highly asymmetric (or unconventionally symmetric) costs not only on the state but also on control are necessary. With the predator-prey positive-state positive-input benchmark system as inspiration, using a strict CLF built in our previous paper, we prototype two general inverse optimal methodological frameworks that employ particular ``contractor and expander functions.'' One framework (A) employs a triple consisting of a CLF, a stabilizing feedback, and an expander, whereas the other framework (B) employs a pair of a CLF and a contractor function. Both frameworks yield inverse optimal stabilizer constructions, on positive orthants of arbitrary dimensions. A stronger construction results from a stronger CLF condition. Biological interpretation for the predator-prey model illuminates that such inverse optimal control constructions are bio-ecologically meaningful. In addition to general frameworks, we present two fully explicit designs: two Sontag-like universal formulae for stabilization of positive-orthant systems by positive feedback, one of them with inverse optimality.


[79] 2603.04203

Security-Constrained Substation Reconfiguration Considering Busbar and Coupler Contingencies

Substation reconfiguration via busbar splitting can mitigate transmission grid congestion and reduce operational costs. However, existing approaches neglect the security of substation topology, particularly for substations without busbar splitting (i.e., closed couplers), which can lead to severe consequences. Additionally, the computational complexity of optimizing substation topology remains a challenge. This paper introduces a mixed-integer linear programming (MILP) formulation for security-constrained substation reconfiguration (SC-SR), considering N-1 line, coupler, and busbar contingencies to ensure a secure substation topology. To efficiently solve this problem, we propose a heuristic approach with multiple master problems (HMMP). A central master problem optimizes dispatch, while independent substation master problems determine individual substation topologies in parallel. Linear AC power flow equations ensure power flow accuracy, while feasibility and optimality sub-problems evaluate contingency cases. The proposed HMMP significantly reduces computational complexity and enables scalability to large-scale power systems. Case studies on the IEEE 14-bus, 118-bus, and PEGASE 1354-bus systems show the effectiveness of the approach in mitigating the impact of coupler and busbar tripping, balancing system security against cost, and maintaining computational efficiency.


[80] 2603.07696

Multi-View Based Audio Visual Target Speaker Extraction

Audio-Visual Target Speaker Extraction (AVTSE) aims to separate a target speaker's voice from a mixed audio signal using the corresponding visual cues. Most existing AVTSE methods rely exclusively on frontal-view videos, which restricts their robustness in real-world scenarios where non-frontal views are prevalent. Such visual perspectives often contain complementary articulatory information that could enhance speech extraction. In this work, we propose Multi-View Tensor Fusion (MVTF), a novel framework that transforms multi-view learning into single-view performance gains. During the training stage, we leverage synchronized multi-perspective lip videos to learn cross-view correlations through MVTF, where pairwise outer products explicitly model multiplicative interactions between different views of the input lip embeddings. At the inference stage, the system supports both single-view and multi-view inputs. Experimental results show that with single-view inputs, our framework leverages multi-view knowledge to achieve significant performance gains, while in multi-view mode it further improves overall performance and enhances robustness. Our demo, code, and data are available at this https URL


[81] 2603.07907

Robust control synthesis for uncertain linear systems with input saturation using mixed IQCs

This paper develops a robust control synthesis method for uncertain linear systems with input saturation in the framework of integral quadratic constraints (IQCs). The system is reformulated as a linear fractional representation (LFR) that captures both dead-zone nonlinearity and time-varying uncertainties. By combining mixed IQC-based dissipation inequalities with quadratic Lyapunov functions, sufficient conditions for robust stabilization are established. Compared with conventional approaches based on a single static sector condition for the dead-zone nonlinearity, the proposed method yields improved $\mathcal{L}_2$-gain performance through the use of scaled mixed IQCs. For systems subject to time-varying structured uncertainties, a new scaled bounded real lemma is further developed based on the IQC characterization. The resulting $\mathcal{H}_\infty$ synthesis conditions are expressed as linear matrix inequalities (LMIs), which are numerically tractable in all decision variables, including the scaling factors in the IQC multipliers. The proposed method is validated using a second-order uncertain system in linear fractional form, and its superiority over an anti-windup design is further illustrated by a cart-pendulum example.


[82] 2310.07649

Automated Layout and Control Co-Design of Robust Multi-UAV Transportation Systems

The joint optimization of physical parameters and controllers in robotic systems is challenging, owing to the difficulty of predicting the effect that changes in physical parameters have on final performance. At the same time, physical and morphological modifications can improve robot capabilities, perhaps completely unlocking new skills and tasks. We present a novel approach to co-optimize the physical layout and the control of a cooperative aerial transportation system. The goal is to achieve the most precise and robust flight when carrying a payload. We assume the agents are connected to the payload through rigid attachments, essentially transforming the whole system into a larger flying object with ``thrust modules'' at the attachment locations of the quadcopters. We investigate the optimal arrangement of the thrust modules around the payload, so that the resulting system achieves the best disturbance-rejection capabilities. We propose a novel metric of robustness inspired by H2 control, and an algorithm to optimize the layout of the vehicles around the object and their controller altogether. We experimentally validate the effectiveness of our approach using fleets of three and four quadcopters and payloads of diverse shapes.


[83] 2410.05406

Synthesizing Interpretable Control Policies through Large Language Model Guided Search

The combination of Large Language Models (LLMs), systematic evaluation, and evolutionary algorithms has enabled breakthroughs in combinatorial optimization and scientific discovery. We propose to extend this powerful combination to the control of dynamical systems, generating interpretable control policies capable of complex behaviors. With our novel method, we represent control policies as programs in standard languages like Python. We evaluate candidate controllers in simulation and evolve them using a pre-trained LLM. Unlike conventional learning-based control techniques, which rely on black-box neural networks to encode control policies, our approach enhances transparency and interpretability. We still take advantage of the power of large AI models, but only at the policy design phase, ensuring that all system components remain interpretable and easily verifiable at runtime. Additionally, the use of standard programming languages makes it straightforward for humans to finetune or adapt the controllers based on their expertise and intuition. We illustrate our method through its application to the synthesis of an interpretable control policy for the \textit{pendulum swing-up} and the \textit{ball in cup} tasks. We make the code available at this https URL.


[84] 2412.20426

Robust targeted exploration for systems with non-stochastic disturbances

We propose a novel targeted exploration strategy designed specifically for uncertain linear time-invariant systems with energy-bounded disturbances, i.e., without any assumptions on the distribution of the disturbances. We use classical results characterising the set of non-falsified parameters consistent with energy-bounded disturbances. We derive a semidefinite program which computes an exploration strategy that guarantees a desired accuracy of the parameter estimate. This design is based on sufficient conditions on the spectral content of the exploration data that robustly account for initial parametric uncertainty. Finally, we highlight the applicability of the exploration strategy through a numerical example involving a nonlinear system.


[85] 2503.11627

Are Deep Speech Denoising Models Robust to Adversarial Noise?

Deep noise suppression (DNS) models enjoy widespread use throughout a variety of high-stakes speech applications. However, we show that four recent DNS models can each be reduced to outputting unintelligible gibberish through the addition of psychoacoustically hidden adversarial noise, even in low-background-noise and simulated over-the-air settings. For three of the models, a small transcription study with audio and multimedia experts confirms unintelligibility of the attacked audio; simultaneously, an ABX study shows that the adversarial noise is generally imperceptible, with some variance between participants and samples. While we also establish several negative results around targeted attacks and model transfer, our results nevertheless highlight the need for practical countermeasures before open-source DNS systems can be used in safety-critical applications.


[86] 2504.08937

Rethinking Few-Shot Image Fusion: Granular Ball Priors Enable General-Purpose Deep Fusion

In image fusion tasks, the absence of real fused images as supervision signals poses significant challenges for supervised learning. Existing deep learning methods typically address this issue either by designing handcrafted priors or by relying on large-scale datasets to learn model parameters. Departing from previous approaches, this paper introduces the concept of incomplete priors, which formally describe handcrafted priors at the algorithmic level and estimate their confidence. Based on this idea, we couple incomplete priors with the neural network through a sample-level adaptive loss function, enabling the network to learn and re-infer fusion rules under conditions that approximate the real fusion process. To generate incomplete priors, we propose a Granular Ball Pixel Computation (GBPC) algorithm based on the principles of granular computing. The algorithm models fused-image pixels as information units, estimating pixel weights at a fine-grained level while statistically evaluating prior reliability at a coarse-grained level. This design enables the algorithm to perceive cross-modal discrepancies and perform adaptive fusion. Experimental results demonstrate that even under few-shot conditions, a lightweight neural network can still learn effective fusion rules by training only on image patches extracted from ten image pairs. Extensive experiments across multiple fusion tasks and datasets further show that the proposed method achieves superior performance in both visual quality and model compactness. The code is available at: this https URL


[87] 2504.09836

Score Matching Diffusion Based Feedback Control and Planning of Nonlinear Systems

In this paper, we propose a deterministic diffusion-based framework for controlling the probability density of nonlinear control-affine systems, with theoretical guarantees for drift-free and linear time-invariant (LTI) dynamics. The central idea is to first excite the system with white noise so that a forward diffusion process explores the reachable regions of state space, and then to design a deterministic feedback law that acts as a denoising mechanism driving the system back toward a desired target distribution supported on the target set. This denoising phase provides a feedback controller that steers the control system to the target set. In this framework, control synthesis reduces to constructing a deterministic reverse process that reproduces the desired evolution of state densities. We derive existence conditions ensuring such deterministic realizations of time-reversals for controllable drift-free and LTI systems, and show that the resulting feedback laws provide a tractable alternative to nonlinear control by viewing density control as a relaxation of controlling a system to target sets. Numerical studies on a unicycle model with obstacles, a five-dimensional driftless system, and a four-dimensional LTI system demonstrate reliable diffusion-inspired density control.


[88] 2505.14973

Customized Interior-Point Methods Solver for Embedded Real-Time Convex Optimization

This paper presents a customized second-order cone programming (SOCP) solver tailored for embedded real-time optimization, which frequently arises in modern guidance and control (G&C) applications. The solver employs a practically efficient predictor-corrector type primal-dual interior-point method (PDIPM) combined with a homogeneous embedding framework for infeasibility detection. Unlike conventional homogeneous self-dual embedding formulations, the adopted approach can directly handle quadratic cost functions without requiring problem reformulation. This capability allows the solver to directly address quadratic objective SOCP problems, while avoiding unnecessary performance degradation caused by the loss of sparsity due to problem reformulation. To support a systematic workflow, we also develop a code generation tool that analyzes the sparsity pattern of the problem to be solved and generates customized solver code using a predefined code template. The generated solver code is written in C with no external dependencies other than the standard library math.h, and it supports complete static allocation of all data. Additionally, it provides parsing information to facilitate the use of the solver by end users. Finally, benchmark and numerical experiments on an embedded platform demonstrate that the developed solver outperforms the existing solvers on problem scales typical of G&C applications.


[89] 2508.19075

Universal Dynamics with Globally Controlled Analog Quantum Simulators

Analog quantum simulators with global control fields have emerged as powerful platforms for exploring complex quantum phenomena. Despite these advances, a fundamental theoretical question remains unresolved: to what extent can such systems realize universal quantum dynamics under global control? Here we establish a necessary and sufficient condition for universal quantum computation using only global pulse control, proving that a broad class of analog quantum simulators is, in fact, universal. We further extend this framework to fermionic and bosonic systems, including modern platforms such as ultracold atoms in optical superlattices. Moreover, we observe that analog simulators driven by random global pulses exhibit information scrambling comparable to random unitary circuits. In a dual-species neutral-atom array setup, the measurement outcomes anti-concentrate on a $\log N$ timescale despite the presence of only temporal randomness, opening opportunities for efficient randomness generation. To bridge theoretical possibility with experimental reality, we introduce \emph{direct quantum optimal control}, a control framework that enables the synthesis of complex effective Hamiltonians while incorporating realistic hardware constraints. Using this approach, we experimentally engineer three-body interactions outside the blockade regime and demonstrate topological dynamics on a Rydberg-atom array. Experimental measurements reveal dynamical signatures of symmetry-protected-topological edge modes, confirming both the expressivity and feasibility of our method. Our work opens a new avenue for quantum simulation beyond native hardware Hamiltonians, enabling the engineering of effective multi-body interactions and advancing the frontier of quantum information processing with globally-controlled analog platforms.


[90] 2509.14053

Trade-offs between structural richness and communication efficiency in music network representations

Music is a structured and perceptually rich sequence of sounds in time, whose perception is shaped by the interplay of expectation and uncertainty about what comes next. Yet the uncertainty we infer from music depends on how the musical piece is encoded as an event sequence. In this work, we use network representations, in which event types are nodes and observed transitions are directed edges, to compare how different feature encodings shape the transition structure we recover, and what this implies for descriptive uncertainty and for expectation under imperfect memory and noise. We systematically analyse eight encodings of piano music, from single-feature vocabularies to richer multi-feature combinations. These representational choices reorganize the state space and fundamentally reshape network topology, shifting how uncertainty is distributed across transitions. To connect these descriptive differences to perception, we adopt a perceptual-constraint model that captures imperfect access to transition statistics. Overall, compressed single-feature representations yield dense transition structures with higher entropy rates, corresponding to higher average uncertainty per step, yet low model error, indicating that the constrained estimate stays close to the corpus transitions. In contrast, richer multi-feature representations preserve finer distinctions but expand the state space, sharpen transition profiles, lower entropy rates, and increase model error. Finally, across representations, uncertainty concentrates in diffusion-central nodes while model error remains low there, suggesting an informational landscape in which predictable flow coexists with localized surprise. Overall, our results show that feature choice shapes not only the networks we reconstruct, but also whether their resulting uncertainty is a plausible proxy for the expectations listeners can realistically learn and use.
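The entropy-rate computation underlying this comparison can be sketched on a toy event sequence. The "pitch" stream below is an assumption, standing in for the paper's feature encodings of piano music; the recipe is the standard one: count transitions, row-normalize, find the stationary distribution, and average the per-state transition entropies.

```python
import numpy as np

# Build a transition network from an event sequence and compute its entropy
# rate (average uncertainty per step under the stationary distribution).
sequence = list("CDECDECDEFG" * 10)          # toy symbolic event stream
states = sorted(set(sequence))
idx = {s: i for i, s in enumerate(states)}

# Count observed transitions (directed edges) and row-normalize.
# (Assumes every state has at least one outgoing transition, as here.)
counts = np.zeros((len(states), len(states)))
for a, b in zip(sequence, sequence[1:]):
    counts[idx[a], idx[b]] += 1
P = counts / counts.sum(axis=1, keepdims=True)

# Stationary distribution: leading left eigenvector of P.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Entropy rate H = -sum_i pi_i sum_j P_ij log2 P_ij.
logP = np.log2(np.where(P > 0, P, 1.0))      # zeros contribute nothing
H = -np.sum(pi[:, None] * P * logP)
print(f"entropy rate: {H:.3f} bits/step")
```

In this toy stream only one state (E) has an uncertain continuation, so the entropy rate is that state's transition entropy weighted by how often the chain visits it; richer encodings change both factors at once, which is the trade-off the abstract describes.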


[91] 2509.18149

Tensor Train Completion from Fiberwise Observations Along a Single Mode

Tensor completion is an extension of matrix completion aimed at recovering a multiway data tensor by leveraging a given subset of its entries (observations) and the pattern of observation. The low-rank assumption is key in establishing a relationship between the observed and unobserved entries of the tensor. The low-rank tensor completion problem is typically solved using numerical optimization techniques, where the rank information is used either implicitly (in the rank minimization approach) or explicitly (in the error minimization approach). Current theories concerning these techniques often study probabilistic recovery guarantees under conditions such as random uniform observations and incoherence requirements. However, if an observation pattern exhibits some low-rank structure that can be exploited, more efficient algorithms with deterministic recovery guarantees can be designed by leveraging this structure. This work shows how to use only standard linear algebra operations to compute the tensor train decomposition of a specific type of ``fiber-wise'' observed tensor, where some of the fibers of a tensor (along a single specific mode) are either fully observed or entirely missing, unlike the usual entry-wise observations. From an application viewpoint, this setting is relevant when it is easier to sample or collect a multiway data tensor along a specific mode (e.g., temporal). The proposed completion method is fast and is guaranteed to work under reasonable deterministic conditions on the observation pattern. Through numerical experiments, we showcase interesting applications and use cases that illustrate the effectiveness of the proposed approach.


[92] 2510.21490

Analysis and Synthesis of Switched Optimization Algorithms

Deployment of optimization algorithms over communication networks faces challenges associated with time delays and corruptions. Fixed time delays can destabilize popular gradient-based algorithms, and this degradation is exacerbated by time-varying delays that may arise from packet drops. This work concentrates on the analysis and synthesis of discrete-time optimization algorithms with certified exponential convergence rates that are robust against switched network dynamics between the optimizer and the gradient oracle. Analysis is accomplished by solving linear matrix inequalities under bisection in the exponential convergence rate, searching over Zames-Falb filter coefficients that can certify convergence. Synthesis is performed by alternating between a search over filter coefficients for a fixed controller, and a search over controllers for a fixed filter. Effectiveness is demonstrated by the synthesis of convergent optimization algorithms over networks with time-varying delays, and networks with unstable channel dynamics.
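The destabilizing effect of a fixed delay that motivates this line of work can be seen on a scalar quadratic: the same step size that converges with an up-to-date gradient diverges when the oracle returns a one-step-stale gradient. This is a toy illustration of the phenomenon, not the paper's LMI machinery; the function and parameter names are ours.

```python
def delayed_gradient_descent(alpha, delay, steps=200, x0=1.0):
    """Gradient descent on f(x) = x^2 / 2 (so grad f(x) = x) where the
    oracle returns the gradient evaluated `delay` steps in the past,
    mimicking a fixed network delay between optimizer and oracle."""
    history = [x0] * (delay + 1)        # iterates x_{-delay}, ..., x_0
    for _ in range(steps):
        x_next = history[-1] - alpha * history[-1 - delay]
        history.append(x_next)
    return abs(history[-1])

# Step size alpha = 1.5 is stable with a fresh gradient (delay = 0) ...
print(delayed_gradient_descent(alpha=1.5, delay=0))   # ~0: converges
# ... but the identical update diverges once the gradient is one step late,
# since delay shrinks the stable step-size range (from alpha < 2 to alpha < 1).
print(delayed_gradient_descent(alpha=1.5, delay=1))   # large: diverges
```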


[93] 2510.26299

Modeling strategies for speech enhancement in the latent space of a neural audio codec

Neural audio codecs (NACs) provide compact latent speech representations in the form of sequences of continuous vectors or discrete tokens. In this work, we investigate how these two types of speech representations compare when used as training targets for supervised speech enhancement. We consider both autoregressive and non-autoregressive speech enhancement models based on the Conformer architecture, as well as a simple baseline in which the NAC encoder is directly fine-tuned for speech enhancement. Our experiments reveal three key findings: predicting continuous latent representations consistently outperforms discrete token prediction; autoregressive models achieve higher quality but at the expense of intelligibility and efficiency, making non-autoregressive models more attractive in practice; and adding encoder fine-tuning yields the strongest enhancement metrics overall, though at the cost of degraded codec reconstruction. The code and audio samples are available online.


[94] 2511.08502

Safe and Optimal Learning from Preferences via Weighted Temporal Logic with Applications in Robotics and Formula 1

Autonomous systems increasingly rely on human feedback to align their behavior, expressed as pairwise comparisons, rankings, or demonstrations. While existing methods can adapt behaviors, they often fail to guarantee safety in safety-critical domains. We propose a safety-guaranteed, optimal, and efficient approach for solving the learning problem from preferences, rankings, or demonstrations using Weighted Signal Temporal Logic (WSTL). WSTL learning problems, when implemented naively, lead to multi-linear constraints in the weights to be learned. By introducing structural pruning and log-transform procedures, we reduce the problem size and recast it as a Mixed-Integer Linear Program while preserving safety guarantees. Experiments on robotic navigation and real-world Formula 1 data demonstrate that the method captures nuanced preferences and models complex task objectives.
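The log-transform step mentioned above can be illustrated on a single multi-linear constraint: a product of positive weights such as w1 * w2 * w3 >= c becomes linear in the variables v_i = log(w_i), since log turns the product into a sum. This sketch only shows the transform's exactness on positive weights; the MILP recasting and pruning in the paper are not reproduced, and the function names are ours.

```python
import math

def multilinear_satisfied(weights, c):
    """Original multi-linear constraint: prod(weights) >= c."""
    prod = 1.0
    for w in weights:
        prod *= w
    return prod >= c

def log_linear_satisfied(weights, c):
    """Log-transformed constraint, linear in v_i = log(w_i):
    sum_i log(w_i) >= log(c). Valid because all weights are positive."""
    return sum(math.log(w) for w in weights) >= math.log(c)

weights, c = [0.8, 1.5, 2.0], 2.0
print(multilinear_satisfied(weights, c), log_linear_satisfied(weights, c))
# Both tests agree for any strictly positive weights, so the linearized
# feasible set is exactly the original one (log is monotone).
```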


[95] 2601.01772

Design and Quantitative Evaluation of an Embedded EEG Instrumentation Platform for Real-Time SSVEP Decoding

This paper presents an embedded EEG instrumentation platform for real-time steady-state visually evoked potential (SSVEP) decoding based on an ESP32-S3 microcontroller and an ADS1299 analog front end. The system performs $8$-channel EEG acquisition, zero-phase bandpass filtering, and canonical correlation analysis entirely on-device, while supporting wireless communication and closed-loop operation without external computation. A central contribution is the quantitative characterization of the platform's measurement integrity. Reported results demonstrate a stable shorted-input noise floor ($\approx 0.08~\mu\text{V}_{\text{RMS}}$), tightly bounded sampling jitter ($0.56~\mu\text{s}$ standard deviation), and negligible long-term drift ($< 1~\text{ppm}$). Numerical fidelity analysis shows $100\%$ decision agreement between the mixed-precision embedded pipeline and a $64$-bit double-precision reference. Effective common-mode attenuation exceeded $112~\text{dB}$ under balanced conditions, with a localized $26.9~\text{dB}$ degradation observed under source-impedance mismatch. Closed-loop validation achieved $99.17\%$ online accuracy and an information transfer rate of $27.66~\text{bits/min}$. These results position the proposed system as a quantitatively characterized embedded EEG measurement and processing platform for real-time SSVEP decoding.
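The frequency-detection core of an SSVEP decoder can be sketched with a single-channel simplification of canonical correlation analysis: project the signal onto sine/cosine references at each candidate flicker frequency and pick the frequency with the largest projection power. This is a didactic stand-in for the multi-channel CCA pipeline the paper runs on-device, with synthetic data and names of our own choosing.

```python
import math

def reference_power(signal, freq, fs):
    """Projection power of the signal onto sin/cos references at `freq`,
    a single-channel simplification of the canonical correlation step."""
    s = sum(x * math.sin(2 * math.pi * freq * k / fs) for k, x in enumerate(signal))
    c = sum(x * math.cos(2 * math.pi * freq * k / fs) for k, x in enumerate(signal))
    return (s * s + c * c) / len(signal)

def detect_ssvep(signal, candidate_freqs, fs):
    """Return the candidate flicker frequency with maximal power."""
    return max(candidate_freqs, key=lambda f: reference_power(signal, f, fs))

fs = 250                                    # Hz, a typical ADS1299 sampling rate
stim = 12.0                                 # flicker frequency to detect
eeg = [math.sin(2 * math.pi * stim * k / fs) + 0.5 * math.sin(0.7 * k)
       for k in range(fs)]                  # 1 s of synthetic "EEG" plus interference
print(detect_ssvep(eeg, [8.0, 10.0, 12.0, 15.0], fs))   # -> 12.0
```

In the real system these dot products run over 8 channels with multi-harmonic references, which is where CCA's joint spatial weighting earns its keep.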


[96] 2601.03410

Inferring Clinically Relevant Molecular Subtypes of Pancreatic Cancer from Routine Histopathology Using Deep Learning

Molecular subtyping of pancreatic ductal adenocarcinoma (PDAC) into basal-like and classical subtypes has established prognostic and predictive value. However, its use in clinical practice is limited by cost, turnaround time, and tissue requirements, thereby restricting its application in the management of PDAC. We introduce PanSubNet, an interpretable deep learning framework that predicts therapy-relevant molecular subtypes directly from standard H&E-stained whole-slide images (WSIs). PanSubNet was developed using data from 1,055 patients across two multi-institutional cohorts (PANCAN, n=846; TCGA, n=209) with paired histology and RNA-seq data. Ground-truth labels were derived using the validated Moffitt 50-gene signature refined by GATA6 expression. The model employs a dual-scale architecture that fuses cellular-level morphology with tissue-level architecture, leveraging attention mechanisms for multi-scale representation learning and transparent feature attribution. On internal validation within PANCAN using five-fold cross-validation, PanSubNet achieved a mean AUC of 88.5% with balanced sensitivity and specificity. External validation on the independent TCGA cohort without fine-tuning demonstrated robust generalizability (AUC 84.0%). PanSubNet preserved and, in metastatic disease, strengthened prognostic stratification compared to RNA-seq based labels. Prediction uncertainty was linked to intermediate transcriptional states rather than classification noise. Model predictions align with established transcriptomic programs, differentiation markers, and DNA damage repair signatures. By enabling rapid, cost-effective molecular stratification from routine H&E-stained slides, PanSubNet offers a clinically deployable and interpretable tool for genetic subtyping. We are gathering data from two institutions to validate and assess real-world performance, supporting integration into digital pathology workflows and advancing precision oncology for PDAC.


[97] 2602.17312

LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy

Offline safe reinforcement learning (RL) is increasingly important for cyber-physical systems (CPS), where safety violations during training are unacceptable and only pre-collected data are available. Existing offline safe RL methods typically balance reward-safety tradeoffs through constraint relaxation or joint optimization, but they often lack structural mechanisms to prevent safety drift. We propose LexiSafe, a lexicographic offline RL framework designed to preserve safety-aligned behavior. We first develop LexiSafe-SC, a single-cost formulation for standard offline safe RL, and derive safety-violation and performance-suboptimality bounds that together yield sample-complexity guarantees. We then extend the framework to hierarchical safety requirements with LexiSafe-MC, which supports multiple safety costs and admits its own sample-complexity analysis. Empirically, LexiSafe demonstrates reduced safety violations and improved task performance compared to constrained offline baselines. By unifying lexicographic prioritization with structural bias, LexiSafe offers a practical and theoretically grounded approach for safety-critical CPS decision-making.
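The lexicographic safety-before-reward ordering at the heart of this framing can be illustrated with plain tuple comparison: minimize safety cost first, and only break ties by reward. This is a toy selection rule over pre-evaluated candidates, not the LexiSafe algorithm; the policy names and numbers are invented for illustration.

```python
# Hypothetical candidate policies evaluated offline: (name, safety_cost, reward).
policies = [
    ("aggressive", 3.0, 10.0),
    ("cautious",   0.0,  6.0),
    ("balanced",   0.0,  8.0),
]

def lexicographic_select(policies):
    """Pick the policy with the lowest safety cost, breaking ties by the
    highest reward. Python's tuple ordering implements the lexicographic
    safety-before-reward hierarchy directly: (cost, -reward) compares the
    cost first, and only consults the reward when costs are equal."""
    return min(policies, key=lambda p: (p[1], -p[2]))

# "aggressive" has the best reward but loses on the prioritized safety cost;
# among the two safe policies, the higher-reward one wins the tie-break.
print(lexicographic_select(policies)[0])   # -> balanced
```

The structural point is that no amount of reward can compensate for a worse safety cost, which is what distinguishes a lexicographic hierarchy from a weighted tradeoff.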


[98] 2602.17929

ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging

Vision Transformers rely on positional embeddings and class tokens that encode fixed spatial priors. While effective for natural images, these priors may be suboptimal when spatial layout is weakly informative, a frequent condition in medical imaging. We introduce ZACH-ViT (Zero-token Adaptive Compact Hierarchical Vision Transformer), a compact Vision Transformer that removes positional embeddings and the [CLS] token, achieving permutation-invariant patch processing via global average pooling. Zero-token denotes removal of the dedicated aggregation token and positional encodings; patch tokens remain unchanged. Adaptive residual projections preserve training stability under strict parameter constraints. We evaluate ZACH-ViT across seven MedMNIST datasets under a strict few-shot protocol (50 samples/class, fixed hyperparameters, five seeds). Results reveal regime-dependent behavior: ZACH-ViT (0.25M parameters, trained from scratch) achieves its strongest advantage on BloodMNIST and remains competitive on PathMNIST, while its relative advantage decreases on datasets with stronger anatomical priors (OCTMNIST, OrganAMNIST), consistent with our hypothesis. Component and pooling ablations show that positional support becomes mildly beneficial as spatial structure increases, whereas reintroducing a [CLS] token is consistently unfavorable. These findings suggest that architectural alignment with data structure can outweigh universal benchmark dominance. Despite its minimal size and lack of pretraining, ZACH-ViT achieves competitive performance under data-scarce conditions, which is relevant for compact medical imaging and low-resource settings. Code: this https URL
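The permutation invariance claimed for global average pooling is easy to verify concretely: with no positional embeddings and no [CLS] token, averaging patch tokens per feature dimension cannot depend on patch order. The sketch below demonstrates this property on toy token vectors; it is an illustration of the aggregation step only, not the ZACH-ViT model.

```python
import random

def global_average_pool(tokens):
    """Aggregate patch tokens by averaging each feature dimension.
    With no [CLS] token and no positional embeddings, the result is
    invariant to any permutation of the patches."""
    dim = len(tokens[0])
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(dim)]

tokens = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]   # three 2-d patch tokens
pooled = global_average_pool(tokens)

shuffled = tokens[:]
random.shuffle(shuffled)
# Shuffling the patches leaves the pooled representation unchanged.
print(pooled == global_average_pool(shuffled))   # -> True
```

A [CLS] token, by contrast, attends to patches through position-dependent interactions, which is one reason reintroducing it can reimpose a spatial prior the data does not support.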


[99] 2602.22248

Machine Learning on Heterogeneous, Edge, and Quantum Hardware for Particle Physics (ML-HEQUPP)

The next generation of particle physics experiments will face a new era of challenges in data acquisition, due to unprecedented data rates and volumes along with extreme environments and operational constraints. Harnessing this data for scientific discovery demands real-time inference and decision-making, intelligent data reduction, and efficient processing architectures beyond current capabilities. Crucial to the success of this experimental paradigm are several emerging technologies, such as artificial intelligence and machine learning (AI/ML), silicon microelectronics, and the advent of quantum algorithms and processing. Their intersection includes areas of research such as low-power and low-latency devices for edge computing, heterogeneous accelerator systems, reconfigurable hardware, novel codesign and synthesis strategies, readout for cryogenic or high-radiation environments, and analog computing. This white paper presents a community-driven vision to identify and prioritize research and development opportunities in hardware-based ML systems and corresponding physics applications, contributing towards a successful transition to the new data frontier of fundamental science.