New articles on Electrical Engineering and Systems Science


[1] 2606.07580

Quantifying Uncertainty in Space Debris Capture with Active Tether-Net Systems Caused by Noisy Observations

As Low Earth Orbit grows more crowded with space debris, the need for reliable and efficient debris removal solutions becomes more urgent. An active tether-net system with maneuverable units is one promising solution, and its success depends on the robustness of the net maneuver and closing decisions. These in turn are impacted by the uncertainties attributed to i) noisy observation of the target debris state (e.g., sensing errors), and ii) imperfect simulations of the complex net dynamics and net/debris interaction behavior, over which the decision system is trained. This paper focuses on the first of these two uncertainty sources, and presents a pipeline to propagate and quantify the resulting uncertainty in the debris capture performance expressed in terms of Capture Quality Index (CQI). This quantification is uniquely performed for both an active tether-net using a fixed baseline control and one using a trained neuro-control policy to guide the net maneuver during the deployment phase. Two different uncertainty quantification (UQ) techniques, namely Sobol's variance-based sensitivity analysis and a perturbation-based method, are exploited. A high-fidelity simulator and a lower-fidelity surrogate-based environment are used to demonstrate trade-offs between prediction accuracy and ease of resolving uncertainties.


[2] 2606.07655

FADRW: A Feature-Aware Modulated and Dynamically Reweighted Loss for Few-Shot Linguistic Steganalysis

The ubiquity of social media platforms facilitates malicious linguistic steganography, posing significant security risks. However, detection is severely hampered by two fundamental issues during model training. Firstly, extreme class imbalance (less than 1% steganographic samples) induces a strong decision bias. Secondly, the invisibility of generative steganography means its features are nearly indistinguishable from benign text; this similarity, compounded by their extreme rarity, leads to severe feature marginalization, where faint steganographic signals are completely overwhelmed. To directly address these optimization-level challenges, we propose FADRW (Feature-Aware Modulated and Dynamically Reweighted Loss), a novel loss function framework engineered for few-shot steganalysis. FADRW employs Dynamic Reweighting to progressively counteract decision bias, and a Feature-Aware Modulation module to structurally reshape the feature space, preventing feature marginalization by enhancing the separability of these subtle features. Extensive experiments on datasets from three real-world social platforms demonstrate that FADRW significantly outperforms state-of-the-art methods, particularly in the challenging few-shot steganographic sample scenario.


[3] 2606.07675

The Need for Neural ISP in the Small-Pixel Era: How Shrinking Pixels Push Optics to the Limit and Neural Restoration Pushes Back

Smartphone telephoto cameras are approaching a "telephoto physics wall": as pixel pitches shrink toward sub-0.5 micron, the optics remain limited by geometric aberrations, leading to diminishing returns on resolution. Traditional Image Signal Processors (ISPs) cannot eliminate these aberrations, because they operate through local, stage-wise processing with no explicit model of the underlying point spread function (PSF). We demonstrate how a learning-based Neural ISP for image restoration, trained on the underlying degradations, inverts what stage-wise pipelines cannot, turning small-pixel designs into a net advantage. We investigate this through a controlled simulation of a representative telephoto module, evaluating five configurations (0.35--0.75 micron pixel pitch). The aperture is scaled proportionally to keep per-pixel SNR and diffraction spot size fixed, thereby isolating geometric aberration and spatial sampling. While the traditional ISP improves only modestly with smaller pixels, the Neural ISP scales substantially: at 0.35 micron it reaches 745 cycles/mm MTF50 (vertical), a 2.5--3x resolution improvement over the traditional ISP, and LPIPS improves significantly from 0.244 to 0.151 while traditional results stay comparatively flat. In a low-SNR extension (15 dB per-frame bursts at 0.35 micron), a multi-frame Neural ISP recovers performance close to the bright-light single-frame baseline, whereas a multi-frame traditional ISP shows no meaningful improvement -- indicating that traditional pipelines at small pixels are bottlenecked by uncorrected PSF blur rather than by noise. These results point to a design philosophy in which Neural ISPs enable high-resolution telephoto modules by correcting residual optical aberrations rather than requiring increasingly complex optics.


[4] 2606.07717

Multi-planar 2D-U-Net Segmentation of 3D-CT Abdominal Organs augmented by Spatial Occurrence Maps

This work proposes a lightweight 2D-U-Net-based framework for segmenting five abdominal organs in large field-of-view 3D CT scans. The method combines coarse-to-fine segmentation, predictions from multiple anatomical planes, and additional fuzzy 3D spatial maps that provide anatomical location cues to improve segmentation accuracy. We combine multi-planar 2D-U-Net models augmented by a spatial occurrence map. The approach involves two main stages. First, the abdominal volume of interest is detected by traversing the whole scan axially with a 2D-U-Net and determining the x-y-z-minimum and -maximum extents of the five abdominal organs of interest. Second, we use spatial occurrence maps to enhance our multi-planar 2D-U-Net architecture inside the bounds from the first stage. The method is evaluated on 80 CT scans from various public sources. The results show Dice improvements of up to about 4% compared to the same model trained without spatial occurrence maps.


[5] 2606.07719

Hessian-matching Based Weighting for Attitude Determination Using Short-Range DoA Measurements with IMU Assistance

Accurate and reliable attitude determination (AD) is essential for unmanned vehicles operating in Global Navigation Satellite System (GNSS)-denied environments. Short-range wireless arrays can provide direction-of-arrival (DoA) measurements from multiple anchors, enabling AD by aligning corresponding direction vectors (DVs) expressed in the body and navigation frames. In short-range scenarios, navigation-frame DVs inherit non-negligible uncertainty induced by anchor/vehicle position errors in addition to DoA-induced errors in body-frame DVs. Moreover, due to projection and unit-norm normalization, the DV errors are generally anisotropic, which motivates a total least squares (TLS) viewpoint. This paper identifies the key modeling distinction in short-range AD, develops a TLS-consistent formulation based on the total DV error and solves the resulting covariance-weighted orthogonal Procrustes problem via a manifold Gauss--Newton method. To retain the efficiency and numerical robustness of the closed-form weighted Wahba solution, we further propose Hessian-matching based scalar weighting strategies that approximate the Hessian of Wahba formulation to the TLS formulation, including a full-attitude strategy for overall accuracy and a direction-of-interest (DOI) strategy for prioritizing a selected attitude component. Finally, we incorporate IMU-derived gravity as an additional DV pair for static initialization, leading to extended Wahba and extended TLS formulations. Simulation results demonstrate that the proposed Hessian-matching weighting improves accuracy and robustness compared with existing baselines, and that gravity-DV augmentation further reduces attitude errors and improves solution availability under limited anchor availability.


[6] 2606.07758

Koopman meets input-output data: Data-driven output-feedback control of nonlinear systems with closed-loop guarantees

Data-driven control of nonlinear systems from input-output measurements remains a fundamental challenge, as existing approaches with rigorous closed-loop guarantees predominantly require access to full state measurements. In this paper, we address this gap by proposing a data-driven output-feedback controller design method for nonlinear systems that provides provable closed-loop guarantees while operating solely on measured input-output data. Our approach combines Koopman operator theory with an extended state representation of the nonlinear system constructed from input-output trajectories. This allows us to obtain a bilinear surrogate model directly from data, on which robust state-feedback design methods can be applied. By exploiting the observability of the underlying nonlinear system, we establish exponential stability of the extended state, which in turn implies exponential convergence of the original system state to the origin. Finally, we validate our theoretical findings in numerical simulations.


[7] 2606.07803

Stability Without Safety: Gain Manipulation Attacks on Agentic Cyber-Physical Systems

Agentic cyber-physical systems (CPS), where autonomous AI agents participate in runtime control decision-making, introduce agent-driven parameter-update pathways absent from conventional feedback architectures. These pathways form a parameter channel structurally distinct from classical sensor and actuator channels. Among these parameters, feedback gains are the highest-leverage target: a single gain matrix determines closed-loop eigenvalue placement for the entire system, and malicious updates can directly alter closed-loop dynamics while evading residual-based monitors. We formalize this attack surface through a three-axis attacker model and a taxonomy of Gain Manipulation Attacks (GMA). Two impact classes are identified: stability-margin erosion under sustained gain drift, and transient amplification under one-shot gain replacement. A stability-preserving gain replacement can still produce transient amplification far exceeding safe operating limits, and stability verification alone is insufficient to bound the physical impact of such attacks. Stealthiness conditions and worst-case impact certificates are derived for each class via Bauer--Fike eigenvalue bounds and the Kreiss matrix theorem, with preliminary detection directions and a vehicle lateral dynamics example provided.
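The claim that a stability-preserving gain replacement can still cause large transients can be illustrated numerically. The sketch below is a minimal illustration and not the paper's vehicle lateral dynamics example: the two discrete-time closed-loop matrices `A_nominal` and `A_replaced` are hypothetical, chosen so that both have the same eigenvalues (0.9, 0.9) while only the second is strongly non-normal.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude; below 1 means the discrete-time loop is stable."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def peak_transient_gain(M, horizon=200):
    """Maximum of ||M^k||_2 over k = 0..horizon, i.e. worst-case state amplification."""
    return max(
        float(np.linalg.norm(np.linalg.matrix_power(M, k), 2))
        for k in range(horizon + 1)
    )

# Hypothetical closed-loop matrices (assumed for illustration only).
A_nominal = np.array([[0.9, 0.0],
                      [0.0, 0.9]])
A_replaced = np.array([[0.9, 10.0],
                       [0.0, 0.9]])

for name, M in (("nominal", A_nominal), ("replaced", A_replaced)):
    print(name, "spectral radius:", spectral_radius(M),
          "peak gain:", peak_transient_gain(M))
```

An eigenvalue-only stability check cannot tell the two loops apart, yet the second amplifies some initial states by more than an order of magnitude before decaying. That is the gap between stability and safety that the abstract describes.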


[8] 2606.07876

Optimal Wiener-Filter Solutions for Denoising of Graph Signals on Directed Graphs

Graph signal processing has opened new avenues to the canonical denoising problem in interesting settings. Specifically, here we propose a Wiener-filter solution for graph signals on directed graphs. Under various stationarity assumptions combining uncorrelated and correlated noise conditions, we show optimal solutions, including a successful proof-of-concept for a temperature graph.


[9] 2606.07900

CellSense: A Sub-6 GHz Cellular ISAC System for Clutter-Robust Passive Sensing

Future wireless networks demand capabilities beyond traditional communication, driving the development of Integrated Sensing and Communication (ISAC) for environmental awareness, localization, and tracking. Ubiquitous cellular deployment allows ISAC to maximize spectral efficiency, lower costs, and expand sensing coverage. However, sub-6 GHz research has heavily favored communication, leaving sensing capabilities largely underexplored. To bridge this gap, we introduce CellSense, a novel sub-6 GHz ISAC architecture natively integrated into the 5G cellular protocol stack for real-world target tracking. We validate the system via Sionna-based orthogonal frequency-division multiplexing (OFDM) link-level simulations and an experimental USRP hardware prototype using the OpenAirInterface (OAI) stack. Furthermore, we analyze the communication-sensing tradeoff by quantifying how pilot symbol density impacts throughput versus sensing accuracy. Simulations show that CellSense achieves a 74 percent detection probability with a 1.43 m localization error in an indoor warehouse environment, which improves to 94 percent detection and a sub-meter error of 0.33 m in the outdoor environment of the Oval area at the NCSU Centennial campus. Hardware experiments in a highly cluttered indoor laboratory confirm a 1.28 m localization accuracy and 76 percent detection probability, proving its efficacy for practical ISAC deployments.


[10] 2606.07906

Extremum Seeking Control Based Adaptive Compensation of Position Sensor Harmonics in PMSM Drives

Permanent Magnet Synchronous Machines (PMSMs) have become one of the preferred forms of electromechanical energy converters, owing to their high efficiency, torque density, and other unique advantages. However, because commutation and field orientation require proper rotor position measurement, accurate rotor position sensing is of paramount importance. When rotor position is measured with a sensor, harmonic errors that arise in the sensing subsystem lead to undesirable torque ripple. Thus, this paper presents an adaptive, extremum seeking control based approach capable of mitigating position signal harmonics in PMSMs. The proposed approach is experimentally validated under varying torque, speed, and harmonic conditions. Its harmonic compensation performance is comparatively evaluated against the look-up table based method. Furthermore, the accuracy of the proposed approach is analyzed, highlighting its effectiveness.


[11] 2606.07912

On Improved Statistical Accuracy of Low-Order Polynomial Chaos Approximations

Polynomial chaos expansions provide surrogate models for stochastic systems, with coefficients typically derived using Galerkin projection, stochastic collocation, or least squares approximation. These traditional approaches often fail to accurately capture statistical moments without resorting to high-order approximations. We propose a constrained optimization framework that modifies standard techniques to determine polynomial chaos coefficients that precisely recover the first two statistical moments. The effectiveness of our approach is demonstrated on several candidate algebraic functions of random variables, showing significant improvements in statistical accuracy even with low-order approximations.


[12] 2606.07961

Feedback Linearization and Control of a Grid-Forming Power Converter in an Islanded Microgrid

In an islanded setting, grid-forming inverters must regulate their terminal voltage without support from an external grid, even though the load current depends directly on that voltage. The usual approach is a cascaded proportional--integral (PI) controller, built on a fast inner current loop and a slower outer voltage loop, with feedforward terms used to compensate dq rotational coupling. However, this compensation is only exact at the operating point where the controller is tuned. This tutorial presents an alternative based on full-state feedback linearization. It is shown that the islanded inverter model has full relative degree, which allows exact state-space linearization with no internal or zero dynamics. A single feedback law cancels the main nonlinear effects (rotational coupling, resistive drops, and load conductance), so that the closed-loop system behaves like two independent double integrators. A standard pole-placement design is then used to shape the response. The controller is tested in MATLAB against a cascaded PI baseline under identical conditions at a 20 MW operating point, including reference tracking, load step disturbances, and parameter mismatch. The feedback-linearizing controller settles a reference step in 0.76 ms, while the PI controller does not reach the 2 % band within 50 ms. The cascaded PI controller shows better robustness to filter parameter mismatch due to its inner-loop integral action, which reduces steady-state errors under modeling uncertainty. Overall, the performance improvement and the robustness trade-off both come directly from the controller structures, rather than from tuning choices.
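Once feedback linearization has reduced each axis to a double integrator, the pole-placement step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's MATLAB design: it simulates a single normalized double integrator with a hypothetical repeated real pole at `-pole`, uses explicit Euler integration, and does not model the inverter, filter, or load.

```python
def simulate_double_integrator(pole, reference=1.0, dt=1e-4, t_end=1.0):
    """Step response of x'' = v with v = -k_pos*(x - r) - k_vel*x'.

    Gains place both closed-loop poles at s = -pole (critically damped):
    s^2 + k_vel*s + k_pos = (s + pole)^2.
    Returns the final position and the 2 % settling time estimate.
    """
    k_pos = pole ** 2
    k_vel = 2.0 * pole
    x, x_dot = 0.0, 0.0
    settle_time = 0.0
    for step in range(int(round(t_end / dt))):
        accel = -k_pos * (x - reference) - k_vel * x_dot
        x += dt * x_dot        # explicit Euler update of position
        x_dot += dt * accel    # explicit Euler update of velocity
        if abs(x - reference) > 0.02 * abs(reference):
            # Still outside the 2 % band; settling happens no earlier than here.
            settle_time = (step + 1) * dt
    return x, settle_time

final_x, settle_time = simulate_double_integrator(pole=10.0)
print(final_x, settle_time)
```

For a repeated pole the 2 % settling time is roughly 5.8 divided by the pole magnitude, so in this idealized model the response speed is set by the chosen poles rather than by the operating point.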


[13] 2606.08062

Multidimensional Resilience for Electrical Power Systems: Systematic Review, Integrated Index, and Validation under Real-World Cyber-Physical Attack Scenarios

The accelerating decarbonization of energy systems has transformed electrical power systems into complex infrastructures exposed to threats whose interactions generate systemic vulnerabilities that conventional resilience approaches fail to capture. Although resilience assessment has expanded across multiple dimensions, existing studies largely examine them in isolation or adjacent pairs, leaving cross-dimensional couplings insufficiently explored. This study demonstrates i) that single-dimension assessments fail to capture the degradation produced by simultaneous cross-dimensional failures, ii) the nonlinear amplification emerging when physical, operational, and digital-cyber dimensions are jointly compromised, and iii) the intensification imposed by climatic and economic-regulatory stressors. To this end, we leverage a hybrid quantitative methodology. A PRISMA 2020 review with backward and forward snowballing identifies methodological gaps and unresolved dependencies across five resilience dimensions: physical, operational, digital-cyber, climatic-external, and economic-regulatory. Following this analysis, a Multidimensional Resilience Index (MDRI) is developed to capture endogenous couplings and exogenous amplification effects and is validated under escalating cyber-physical attack scenarios inspired by the December 2025 attack on Polish energy infrastructure. Results show that degradation under cascading and simultaneous failures is nearly eight times greater than under isolated stress, while exogenous conditions amplify degradation by an additional factor approaching six, with 72% of this amplification driven by exogenous stressors. Combined, these mechanisms produce a 46-fold increase in resilience loss compared to a single-vector reference.


[14] 2606.08082

When Can Phasor-Domain Device Models Be Trusted for Electromechanical Stability Analysis of Grid-Forming Converter-Dominated Microgrids?

Grid-forming (GFM) converter-dominated microgrids are often analyzed using reduced-order phasor-domain electromechanical GFM models, but the validity of these models is often taken for granted. Assuming ideal inner-loop tracking (IILT) of terminal-voltage references, these models neglect the inner-loop and filter dynamics at the electromagnetic-transient (EMT) timescale to simplify stability analysis. This paper argues that such neglected dynamics can destabilize the system, invalidating the stability conclusions drawn from the IILT model. To address this cross-timescale stability issue, we formulate the validity of the IILT stability conclusion as a robust-stability certification problem. The EMT-induced model mismatch between the reduced-order converter model and the actual converter model is represented as a structured uncertainty embedded around the IILT feedback loop. This yields a frequency-resolved interaction index and a structured singular-value sufficient certificate for determining when the stability conclusion of the IILT model can be certified with respect to a prescribed EMT uncertainty weight. The uncertainty weight can be obtained from detailed EMT models or terminal reference-response measurements. Case studies confirm that the proposed certificate correctly certifies model validity and identifies the loss of trustworthiness. We also demonstrate that the measurement-based uncertainty weights closely match the model-based ones, which enables deployment without accessing inner-loop models.


[15] 2606.08112

A Global Convergence Analysis of Consensus ALADIN for Convex Optimization

Distributed optimization problems are pervasive in machine learning and optimal control. In this paper, we study smooth strongly convex distributed consensus optimization problems. We present a distributed optimization algorithm for consensus problems based on the Consensus Augmented Lagrangian Alternating Direction Inexact Newton (C-ALADIN) framework. Our algorithm uses an auxiliary variable to decide when to update second-order information, enabling curvature exploitation without sacrificing global convergence. This contrasts with existing C-ALADIN methods, which require constant Hessian approximations and thus lose numerical advantages. Under smooth strong convexity, the algorithm converges globally, and the auxiliary variable converges sublinearly. Numerical experiments on logistic regression show that our algorithm outperforms baseline methods that use either fixed or updated Hessian information.


[16] 2606.08137

A Barrier-Modulated Architecture for Safe Affine Formation Control in Second-Order Multi-Agent Systems

Affine formation control offers immense flexibility for coordinating multi-agent maneuvers, but guaranteeing the safety of agents under parametric uncertainties remains an open challenge. This paper proposes a novel safe affine formation control framework for second-order multi-agent systems by integrating Higher-Order Control Barrier Functions (HOCBFs) with Adaptive Dynamic Programming (ADP). We introduce a barrier-modulated control architecture that smoothly attenuates the nominal formation tracking objective when agents approach safety boundaries, preventing conflicting control inputs. Within this architecture, two distinct safety controllers are developed: (1) an analytical barrier-gradient repulsive controller that provides a computationally efficient, rigorous mathematical baseline, and (2) a data-driven optimal safety controller. The data-driven approach utilizes an actor-critic neural network to solve the Hamilton-Jacobi-Bellman (HJB) equation online, enabling optimal collision avoidance even in the presence of unknown system parameters. Using Nagumo's theorem and Lyapunov stability analysis, we formally prove that both controllers guarantee the forward invariance of the safe set ensuring absolute collision avoidance while maintaining Uniformly Ultimately Bounded (UUB) formation tracking errors. Finally, simulations validate the theoretical findings and demonstrate the robustness of the proposed controllers in dynamic obstacle avoidance scenarios.


[17] 2606.08171

Predictive Fixed-Filter Active Noise Control (PFANC) Using Convolutional Recurrent Neural Networks for Dynamic Noises

The existing Generative Fixed-Filter Active Noise Control (GFANC) method generates a suitable control filter based on the current noise frame. This reactive design aims to estimate a control filter that is optimal for the present frame rather than the upcoming one. Consequently, it suffers from an inherent tracking lag and lacks the predictive capability to handle rapidly varying noises. To address this limitation, we propose the Predictive Fixed-Filter Active Noise Control (PFANC) method with a proactive control paradigm in this paper. In the PFANC method, multiple consecutive noise frames are processed by a Convolutional Recurrent Neural Network (CRNN) to predict the next-frame control filter. By utilizing temporal correlations across noise frames to anticipate the control filter in advance, the PFANC method can effectively track dynamic noise changes. Furthermore, the theoretical analysis based on a high-order Markov chain shows that incorporating multiple noise frames enhances the prediction of the control filter. Numerical simulations with linear and logarithmic chirp signals, as well as real-world dynamic noises, validate the effectiveness of the PFANC method and its superiority over GFANC and its variations. The PFANC method also exhibits good transferability across different acoustic paths.


[18] 2606.08208

Risk-Aware Control of Systems with Quasi-Cone-Bounded Nonlinearities

We develop a tractable, rigorous approach to risk-aware control for a class of nonlinear systems. While many classical control methods reduce uncertainty to a simple average or a worst-case outcome, risk-aware control aims to equip systems with a refined awareness of uncertainty. Efficient methods for risk-aware control of linear systems are available, but there is a paucity of tools for tractable, risk-aware control of nonlinear systems. To bridge this gap, we develop an analytical, suboptimal controller with respect to a risk-aware performance criterion for systems with nonlinearities characterized by cone-like bounds. Numerical examples demonstrate benefits of the characterization of nonlinearities and risk that we consider.


[19] 2606.08210

Paediatric-HGNN: A Hybrid Heterogeneous Graph Neural Network for Detecting Disfluency in Children's Speech via Multiscale Acoustic Fusion

Automated stuttering detection (ASD) systems struggle with paediatric speech due to high acoustic variability in developing voices and the subtle distinction between pathological stuttering and typical developmental disfluencies. We introduce Paediatric-HGNN, a framework using a Context-aware Part-whole Interaction Network (CaPIN) tailored for paediatric data. Instead of conventional 1D signal modelling, our approach builds a heterogeneous graph capturing hierarchical relationships between lexical units (word nodes) and fine-grained acoustic segments (frame nodes). Trained on curated paediatric corpora (UCLASS and FluencyBank), Paediatric-HGNN achieves 82.4% weighted accuracy and a Typical Disfluency F1-score of 0.386. Modelling hierarchical lexical-acoustic interactions captures developmental "searching" behaviour, offering a more robust and interpretable tool for early clinical intervention.


[20] 2606.08225

A Double Proportionate Sparse Adaptive Filter for Impulsive Noise Environments

Sparse adaptive filters and impulsive noise robust algorithms have largely been developed along separate tracks, leaving a gap when both properties are needed simultaneously. This letter proposes the double proportionate sparse adaptive filter (DP-SAF), which closes this gap within a single $\mathcal{O}(M)$ update. Two independent diagonal gain matrices are introduced; one scales the adaptation step proportionately to coefficient magnitudes, and the other applies a magnitude-dependent zero-attraction that is strongest for inactive taps. A sign-error update provides robustness against impulsive corruptions. Both gain matrices are derived from a minimum-norm optimization framework. Simulations under a Bernoulli impulsive noise model show that DP-SAF consistently achieves a better steady-state MSD than the competing algorithms while matching or exceeding their convergence speeds.


[21] 2606.08226

CG-MambaNet: A spatiotemporal framework for cross-patient epileptic seizure prediction using CNN-GCN-Mamba-BiLSTM with event-level clinical evaluation

Epileptic seizure prediction from scalp EEG is critical for closed-loop neurostimulation therapy. Existing deep-learning methods share two architectural limitations: they model EEG channels independently, neglecting inter-channel spatial synchrony, and process raw time-domain samples without frequency decomposition. A methodological limitation also affects the field: most studies use data splits that permit patient-level information leakage, yielding optimistic estimates that do not generalise to unseen patients. We present CG-MambaNet, a spatiotemporal seizure prediction framework addressing all three limitations. A depthwise separable CNN front-end decomposes each EEG patch into multi-scale spectro-temporal features, capturing delta-to-gamma band dynamics before sequence modelling. A two-layer graph convolutional network with a learnable adjacency matrix captures inter-channel functional synchrony without montage-specific coordinates, applicable to bipolar (CHB-MIT) and referential (SIENA) montages. A bidirectional Mamba encoder followed by a bidirectional LSTM models long- and short-range temporal dynamics, and a two-layer MLP produces the final seizure probability. This serial hierarchy ensures frequency decomposition precedes spatial mixing, which precedes temporal integration. Under strict leave-one-patient-out cross-validation with five independent random seeds, CG-MambaNet achieves AUC-ROC of 0.8152+/-0.0176 on CHB-MIT (n=22) and 0.7104+/-0.0261 on SIENA (n=6), surpassing all published cross-patient methods without domain adaptation. An event-level evaluation framework merging consecutive alarmed windows via a persistence filter reduces false predictions to 0.32 alarms/hour on CHB-MIT, demonstrating clinically meaningful alarm burden.


[22] 2606.08240

A dual-system approach for epilepsy diagnosis: integrating mamba-Bi-LSTM architecture with SHAP-based verification

This study develops a medical AI-assisted diagnosis system based on deep learning, which provides intelligent diagnostic solutions for epilepsy, a disease that seriously threatens the life and health of patients. Epileptic seizures are sudden and unpredictable. Traditional diagnostic methods mainly rely on doctors' manual interpretation of EEG, which is time-consuming and dependent on experience. In response to these challenges, this study designs a dual-system intelligent diagnosis framework, which includes two core components: the main discrimination system and the verification system. The main discrimination system uses a deep learning model that combines the innovative Mamba architecture with the Bi-LSTM structure to integrate and analyze heterogeneous data to achieve extremely high diagnostic accuracy; the verification system provides an explainable diagnostic basis through the SHAP method to enhance the credibility of the results. This system establishes a cross-modal database to realize intelligent analysis of multi-source heterogeneous data, fusing EEG signals and clinical text data for epilepsy. The system outputs results based on diagnostic consistency and confidence levels, and high-confidence predictions can also be used as automatic feedback sources to optimize the model. The experimental results show that the accuracy of the main discriminant model of the intelligent diagnosis system for epilepsy has increased from 92.6% to 98.7% and the F1 score has increased from 0.895 to 0.992, both of which exceed the existing optimal methods; the average processing time for verification system feedback integration is only 220 ms, which increases the overall diagnostic accuracy by 5.1%.


[23] 2606.08246

Spatio-Sequential Recurrent Network for 3-D Tunnel Propagation Modeling

Fine-mesh parabolic wave equation (PWE) simulations are high-fidelity but time-consuming, which limits real-time tunnel propagation analysis and motivates coarse-to-fine reconstruction. Existing machine learning (ML)-assisted tunnel models typically provide only one-dimensional (1-D) longitudinal refinement or two-dimensional (2-D) cross-sectional refinement, rather than joint 3-D enhancement. Motivated by this gap, this letter proposes a U-shaped gated spatio-sequential recurrent neural network (UG-SSRNN), a spatio-sequential reconstruction model for tunnel electromagnetic fields. UG-SSRNN jointly super-resolves transverse slices and models longitudinal evolution. It uses sliding-window context encoding and a K-layer convolutional recurrent backbone with a shared propagation-context state and diagonal feedback. A prediction-aware upsampling head leverages the previous prediction to improve slice-to-slice consistency. Experiments on four tunnel cross sections, unseen-material and unseen-frequency tests, and validation in the Massif Central tunnel show close agreement with fine-mesh PWE references. The proposed approach significantly reduces tunnel electromagnetic modeling time.


[24] 2606.08247

AeroSpectra Sentinel: An Auditable LLM Prompt-Chaining Decision-Support Workflow for Acute Asthma Risk Assessment from Respiratory Sounds and Clinical Signals

Acute asthma risk assessment requires rapid interpretation of respiratory sounds, oxygenation, airflow limitation, speech ability, work of breathing, mental status, and response to reliever therapy. Conventional audio-only classifiers can detect wheeze-like patterns but often lack transparent clinical reasoning and safe escalation logic. This paper presents AeroSpectra Sentinel, a client-side research prototype and decision-support workflow that combines short-time Fourier transform (STFT) respiratory sound analysis, lightweight machine-learning screening, clinical feature fusion, and a five-stage large language model (LLM) prompt-chaining process. The workflow separates signal acquisition, preprocessing, acoustic feature extraction, ML screening, clinical guardrails, and FHIR-ready reporting. We evaluated the audio screening component on a public respiratory sound dataset containing 1,211 WAV recordings from five labels. Using a stratified subset of 584 recordings, a random forest achieved 91.10% binary accuracy and 78.69% F1-score for asthma-vs-non-asthma screening, while a feature-based multilayer perceptron achieved 89.73% accuracy and 78.26% F1-score. A compact log-spectrogram CNN achieved 73.29% accuracy and 55.17% F1-score. Multiclass classification achieved 77.40% accuracy and 77.23% macro-F1. To evaluate the LLM workflow, we conducted a scenario-based audit on 40 simulated clinical vignettes comparing one-shot prompting, prompt chaining, prompt chaining with guardrails, and prompt chaining with guardrails plus FHIR schema validation. The guardrail-plus-schema variant achieved the strongest simulated safety and documentation consistency. AeroSpectra Sentinel is intended as a research prototype, not as a diagnostic medical device or clinically validated risk-assessment product.


[25] 2606.08255

Exact Optimization-Free Safety Filters for Control Barrier Functions

For control-affine systems, standard and high-order control barrier function conditions are affine in the control input and are commonly enforced through quadratic-program-based safety filters. Although convex, these optimization problems may be undesirable in embedded, high-rate, or resource-limited implementations. This letter studies when the corresponding Euclidean projection can be computed exactly without solving a quadratic program. Given a nominal control input, we form the set of affine inequalities violated by that input and compute the minimum-norm correction that enforces those inequalities with equality. This correction need not equal the exact Euclidean projection onto the full feasible set. The main result gives structural conditions under which it coincides with the Euclidean projection onto the feasible set. These conditions are interpreted through interactions between affine-inequality normals and are expressed using a Gram matrix. Finally, an online certification procedure is given for determining whether the optimization-free update is exact.
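
The violated-set correction and an exactness check can be sketched as follows. This is a minimal illustration, not the letter's procedure: it assumes the constraints are written as `A @ u >= b`, that the violated normals are linearly independent, and it uses the standard KKT conditions of the projection problem (nonnegative multipliers plus feasibility) as the certificate. The function name `active_set_correction` is hypothetical.

```python
import numpy as np

def active_set_correction(u_nom, A, b, tol=1e-9):
    """Minimum-norm correction that turns violated rows of A @ u >= b into equalities.

    Returns (u, exact), where `exact` reports whether a KKT check certifies that
    u is the Euclidean projection of u_nom onto the full feasible set.
    Assumption: the violated rows of A are linearly independent.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)

    violated = A @ u_nom < b - tol
    if not violated.any():
        return u_nom.copy(), True  # nominal input is already safe

    Av, bv = A[violated], b[violated]
    gram = Av @ Av.T                      # Gram matrix of the violated normals
    lam = np.linalg.solve(gram, bv - Av @ u_nom)
    u = u_nom + Av.T @ lam                # enforce violated rows with equality

    # KKT check: multipliers nonnegative and all constraints satisfied.
    exact = bool(np.all(lam >= -tol) and np.all(A @ u >= b - tol))
    return u, exact
```

A negative multiplier signals that one of the enforced constraints would be inactive at the true projection, so the closed-form update is not exact for that input.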


[26] 2606.08313

Feedforward Nonlinear Equalizer for Short- to Medium-Reach Wireline Links

This paper presents a feedforward nonlinear equalizer (FFNE) framework for short- to medium-reach wireline links that removes the feedback-timing bottleneck of decision-feedback equalizers (DFEs) while approaching their noise-margin advantage within a characterized operating region. The proposed FFNE reduces short-window maximum-likelihood sequence estimation to a compact binary decision rule, enabling a low-complexity feedforward realization without transmitter-side encoding. For the single-postcursor NRZ case, the mathematical foundation, hardware implementation, tap adaptation, statistical analysis, and equalization limit relative to an ideal 1-tap DFE are established. A window-length-3 FFNE quantifies the performance-complexity tradeoff of longer sequence windows. The framework is further extended to PAM-4 modulation and simultaneous precursor/postcursor equalization through a pattern-detection-based FFNE (PD-FFNE), which outperforms conventional FFE+DFE baselines under representative channel conditions.


[27] 2606.08315

Benchmarking Sequential Feedback Optimization for Wind Farm Power Maximization

This paper benchmarks sequential feedback optimization (SFO) for wind farm power maximization using a medium-fidelity dynamic flow model. We compare SFO with two well-established approaches, adjoint-based economic model predictive control (AMPC) and extremum seeking control (ESC), under a common nine-turbine layout and identical operating constraints. The comparison focuses on steady-state power production and computational efficiency, both relevant for real-time implementation. The simulation results illustrate that SFO achieves higher steady-state power while preserving real-time feasibility, AMPC provides a better transient performance at a higher online computational cost and without guarantees of convergence to the steady-state optimum, and ESC offers a computationally inexpensive model-free baseline that may converge to locally optimal solutions. These results provide a practical reference for selecting wind farm control strategies and for designing scalable, real-time optimization methods.


[28] 2606.08320

Enhanced Wide-Angle Steering with Multi-Mode Multi-Port Aperture Antenna Arrays

A novel concept for wide-angle scanning is proposed based on multi-mode multi-port antennas. The theory of multi-mode multi-port antennas based on aperture radiators is developed and applied to the design of an antenna array consisting of multi-mode aperture radiators. An advanced beamforming algorithm is developed and implemented, making use of the additional degrees of freedom available to multi-mode multi-port antennas. The manufactured antenna array is measured and compared to the expected performance. Wide-angle steering up to $\pm77^\circ$ from broadside at a scan loss of $3\,\mathrm{dB}$ is achieved in both the horizontal and vertical planes with no visible grating lobe.


[29] 2606.08370

Programmable Silicon Retina on Pixel Processor Array

Standard dynamic vision sensors approximate retinal processing by detecting temporal contrast changes, offering high speed and high dynamic range. In this work, we explore whether incorporating additional biologically inspired processing stages, specifically spatial filtering and gain control, can offer advantages for certain downstream tasks such as saliency prediction. We present the first implementation of a multi-stage Silicon Retina model on the SCAMP-5 Pixel Processor Array, along with a GPU-based simulation framework. We evaluate the performance of our model on Video Intensity Reconstruction and Video Saliency Prediction. While the bio-inspired model is less effective at reconstructing absolute intensity frames, it achieves a 13% reduction in saliency prediction loss in comparison to the standard DVS event representation, while reducing the event rate by approximately 47%. Both figures are obtained using a lightweight $\approx 100$k-parameter FireNet-style network, adapted from event-based reconstruction to saliency prediction. These results suggest that the silicon retina's "information distillation" mechanism can achieve a more efficient representation for downstream neural networks, particularly in bandwidth-constrained edge applications.


[30] 2606.08374

Predictive Coding with Bayesian Priors via Proximal Gradients

We recast predictive coding as continuous-time proximal gradient descent applied to a regularized maximum-a-posteriori (MAP) objective. We study first a single-level problem and then a multi-level hierarchy. For the single-level problem, we show that proximal gradient descent is precisely a leaky firing-rate network: the membrane leak, the effective recurrent matrix, the local synaptic drive, and the static nonlinearity all follow from one optimization principle, and the resulting circuit is the one proposed by Rao and Ballard. The prior selects the nonlinearity through its proximal operator, and the likelihood precision sets the gain on the observation. For the hierarchy, we show that a classical variable-splitting relaxation of the deep MAP problem yields hierarchical predictive coding as the interconnection of local and distributed solvers. In probabilistic modeling terms, this relaxation replaces the directed generative chain by an undirected Markov random field whose node potentials are the level-wise priors. Each level then applies its own activation function, namely the proximal operator of its prior.
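
The single-level case can be illustrated with a small numerical sketch. It assumes a Gaussian likelihood with scalar precision and a Laplace (sparsity) prior, so the proximal operator is soft-thresholding; the prior, the Euler discretization, and all names (`proximal_map`, `soft_threshold`, `precision`, `lam`, `dt`) are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||x||_1 (assumed Laplace prior)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def proximal_map(y, W, precision=1.0, lam=0.1, dt=0.5, steps=2000):
    """Euler-discretized leaky dynamics  dx/dt = -x + prox(x - eta * grad f(x)).

    f(x) = 0.5 * precision * ||y - W x||^2 is the negative log-likelihood and
    lam * ||x||_1 is the negative log-prior. The leak term (-x) and the static
    nonlinearity (the prox) both come from the same MAP objective.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step size 1/L, with L the Lipschitz constant of grad f.
    eta = 1.0 / (precision * np.linalg.norm(W, 2) ** 2)
    x = np.zeros(W.shape[1])
    for _ in range(steps):
        drive = x - eta * precision * W.T @ (W @ x - y)  # local synaptic drive
        x = x + dt * (-x + soft_threshold(drive, eta * lam))
    return x
```

With `W` equal to the identity, the fixed point is simply the soft-thresholded observation, which makes the role of the prior as the activation function easy to see.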


[31] 2606.08385

A Switching Beamformer for Highly Non-Stationary Environments

Adaptive beamforming is a cornerstone of array signal processing, yet its performance often collapses in the face of complex, rapidly changing interference. When interferers appear or move unpredictably, conventional estimators encounter a fundamental memory trade-off: short windows enable rapid tracking but suffer from high estimation variance, while long windows provide stable rejection but fail to adapt to shifts. This challenge is resolved by introducing the Universal Switching Beamformer (USB), which integrates competitive sequential prediction into the beamforming architecture. By employing a linear transition diagram, the USB implicitly maintains an exponentially large family of candidate covariance histories and dynamically re-weights them based on their cumulative output power. This mechanism allows the beamformer to automatically vary its effective memory length without explicit change detection or heuristic parameter tuning. A theoretical upper bound is proven on the regret relative to an omniscient oracle that selects the best piecewise-stationary covariance model in hindsight. Extensive simulations and experiments on the SwellEx-96 dataset demonstrate that the USB achieves the agility of short-window estimators and the precision of long-term integration, providing a principled solution for tracking highly non-stationary scenes.


[32] 2606.08393

SMC-ITA: Sequential Monte Carlo Inference-Time Alignment for Video-to-Audio Generation

Video-to-audio (V2A) generation must jointly satisfy audiovisual alignment, semantic consistency, temporal synchronization, and perceptual quality. While prior work has mainly focused on model architecture, multimodal conditioning, and training objectives, inference-time alignment for V2A remains underexplored. In this paper, we study inference-time alignment for flow-matching-based V2A generation and formulate it as a search problem. We propose Sequential Monte Carlo Inference-Time Alignment (SMC-ITA), which combines lookahead-based reward estimation and sequential Monte Carlo resampling to reallocate computation adaptively using multi-dimensional cross-modal rewards. SMC-ITA improves over naive single-trajectory sampling, achieving a 55.67% relative reduction in DeSync, a 20.23% improvement in IB-score, and a 15.44% improvement in Audio Quality. Under matched NFE budgets, it also achieves the best overall trade-off among the compared search baselines, outperforming Best-of-N and Beam Search. Ablation studies further show that lookahead improves the reliability of intermediate reward estimates and that systematic resampling is a strong practical default for V2A inference-time alignment.
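
Systematic resampling, the scheme the ablations single out, is a standard particle-filter routine and can be sketched in a few lines. The weights here are hypothetical stand-ins for reward-derived particle weights; how SMC-ITA computes them from its lookahead rewards is not shown.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling.

    A single uniform offset is shared by n evenly spaced positions in [0, 1),
    so a particle with normalized weight w is copied either floor(n * w) or
    ceil(n * w) times.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    # side="right" keeps zero-weight particles from being selected at position 0.
    return np.searchsorted(cumulative, positions, side="right")
```

For example, `systematic_resample([0.1, 0.2, 0.7], np.random.default_rng(0))` returns three indices, with the third particle duplicated because its weight exceeds two thirds.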


[33] 2606.08431

Control-Theoretic View of Neural ODEs: Empirical Controllability and Observability

This paper studies neural ordinary differential equations (neural ODEs) from a control-theoretic perspective using controllability and observability concepts. The neural ODE is represented in a control-affine form to facilitate analysis using tools from nonlinear and linear time-varying (LTV) systems. Controllability is examined through trajectory linearization, where the LTV controllability Gramian provides a local, first-order measure of input influence along a nominal trajectory. Observability is analyzed through output linearization, where the LTV observability Gramian characterizes the local ability to reconstruct system states from output measurements. Koopman-based lifting is considered to extend the analysis to a higher-dimensional representation, and its limitations under multiple equilibria and basin-dependent behavior are discussed. The proposed framework is illustrated on a series RLC circuit. The learned neural ODE reproduces system trajectories and generalizes to unseen initial conditions. The computed Gramians are numerically full rank along the tested trajectories, indicating local controllability and observability of the linearized dynamics.
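
As a point of reference for the Gramian rank test, the sketch below computes finite-horizon controllability and observability Gramians for the time-invariant series RLC circuit itself, not for a learned neural ODE or an LTV linearization. The state ordering (inductor current, capacitor voltage), the unit component values, and the choice of capacitor voltage as output are all assumptions made for illustration.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Matrix exponential by truncated Taylor series (adequate for small ||M||)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def finite_horizon_gramians(A, B, C, horizon, steps=2000):
    """Trapezoidal approximation of the Gramians over [0, horizon]."""
    A, B, C = (np.asarray(M, dtype=float) for M in (A, B, C))
    n = A.shape[0]
    dt = horizon / steps
    phi_step = expm_taylor(A * dt)
    phi = np.eye(n)                      # phi = exp(A * t) at the current t
    Wc = np.zeros((n, n))
    Wo = np.zeros((n, n))
    for k in range(steps + 1):
        weight = 0.5 * dt if k in (0, steps) else dt
        Wc += weight * (phi @ B @ B.T @ phi.T)
        Wo += weight * (phi.T @ C.T @ C @ phi)
        phi = phi_step @ phi
    return Wc, Wo

# Series RLC with hypothetical values R = L = C = 1; state x = [i_L, v_C].
R, L, Cap = 1.0, 1.0, 1.0
A = np.array([[-R / L, -1.0 / L], [1.0 / Cap, 0.0]])
B = np.array([[1.0 / L], [0.0]])
C = np.array([[0.0, 1.0]])
Wc, Wo = finite_horizon_gramians(A, B, C, horizon=5.0)
```

Both Gramians come out full rank for these values, which matches the Kalman rank conditions for this circuit.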


[34] 2606.08434

A Unified Framework for Contraction Stability Analysis of Heterogeneous Grid-Forming Inverters

The shift to renewable-dominated power systems has produced low-inertia grids, undermining system stability. In this context, grid-forming inverters (GFMs) have emerged as a promising solution. However, GFMs challenge conventional analysis techniques, especially those relying on small-signal or root-mean-square (RMS) models. Such models rely on linearization and sinusoidal steady-state assumptions, which fail in large-signal cases. Stability of GFM-based systems therefore becomes operating-point dependent, and a feasible operating point may not even exist. While large-signal analyses are available, decentralized certification of operating-point convergence with explicit transient guarantees, such as rate and overshoot, remains rare. This paper proposes an algebraic, decentralized contraction-based framework. The proposed contraction stability analysis certifies system stability and convergence to desired operating points. The method works in the time domain and captures nonlinear, large-signal behavior of synchronization and power-sharing mechanisms. Moreover, the contraction rate provides an explicit bound on transient time: trajectories converge exponentially to the new operating point at a controlled rate, yielding computable contraction regions that certify stability and large-signal convergence across operating-point changes. These regions directly guide parameter tuning for heterogeneous GFMs.


[35] 2606.08435

Sound Field Interpolation Using Physics-Informed Extreme Learning Machine with Pre-Training

Numerous machine learning-based sound field interpolation methods have been proposed. In particular, physics-informed neural networks (PINNs) can accurately interpolate sound fields from a small number of microphones. However, their high computational cost and long training time pose practical challenges for applications requiring real-time processing or online learning. To address this, we propose a hybrid framework that combines PINN-based pre-training with a physics-informed extreme learning machine (PIELM) tailored for acoustic fields. By replacing iterative PINN fine-tuning for each target sound field with closed-form output-layer adaptation using hidden-layer weights pre-trained by PINN, the proposed method efficiently interpolates unknown sound fields from limited observations. Simulation results under simplified one-dimensional free-field conditions demonstrate that, given a pre-trained model, the proposed method achieves interpolation accuracy comparable to that of PINN-based fine-tuning while reducing the adaptation time by more than three orders of magnitude.


[36] 2606.08437

X-Palm: Paired Multispectral-to-Smartphone Dataset for Cross-Domain Palmprint Authentication

The palmprint modality offers a privacy-preserving biometric solution, yet its deployment is hindered by the domain gap between controlled enrollment and unconstrained authentication. Existing datasets are largely restricted to controlled setups and fail to capture the compound variability of real-world environments. In this paper, we introduce X-Palm, a cross-domain dataset comprising 6,006 palm images from 103 individuals (206 hands). To the best of our knowledge, X-Palm is the first palmprint dataset providing paired-identity acquisition specifically designed to bridge the gap between reliably controlled multispectral enrollment and unconstrained mobile authentication while encompassing a broad spectrum of in-the-wild variability. Unlike existing datasets that focus on one or a few variations, X-Palm addresses the massive modality and environmental shifts encountered in practical deployments by capturing paired data for identities across two distinct domains: (1) a controlled Multispectral Palmprint setting using our custom-developed scanner, and (2) an unconstrained smartphone palmprint setting that is participant-driven, incorporating simultaneous variations in hardware, hand pose, illumination, background, camera-to-hand distance, perspective, and palm surface conditions (e.g., moisture and occlusions). Our extensive benchmarks of 12 SOTA models reveal that while existing methods achieve high performance on controlled data, they experience severe performance collapse on X-Palm. Conversely, models trained on X-Palm demonstrate consistent robustness across domains, positioning X-Palm as a valuable resource for training models towards real-world, cross-domain generalization. Data access instructions and the related benchmarking code are publicly available at: this https URL


[37] 2606.08439

RadioDiff-Inv2: Differentiable Diffusion Inversion under Location Drift from Sparse Noisy Measurements for Radio Map Estimation

Radio map (RM) estimation is a key enabler for environment-aware optimization in 6G wireless networks. In practice, RM construction increasingly relies on crowdsourced received signal strength (RSS) feedback that is inherently sparse and noisy. A further and often overlooked challenge is location drift, whereby privacy constraints and user mobility cause reported sampling coordinates to deviate from the true measurement locations. Unlike additive measurement noise, location drift perturbs the sensing operator itself, since each RSS sample effectively queries the underlying RM at an incorrect spatial coordinate. This operator uncertainty, compounded with sparse noisy sensing, renders the inverse problem severely ill-posed and limits conventional estimators that rely on analytically specified priors. This paper proposes RadioDiff-Inv2, a differentiable diffusion inversion framework that estimates RMs from sparse noisy measurements under location drift. A Gaussian resampling scheme is introduced to construct a differentiable, drift-aware measurement operator on grid-based maps, and the probability-flow ordinary differential equation (ODE) is exploited to cast the diffusion sampler as a deterministic, differentiable mapping from an initial noise code to the estimated RM. By optimizing the noise code via backpropagation against a drift-marginalized data-fidelity objective, RadioDiff-Inv2 produces reconstructions that are both prior-plausible and measurement-consistent without costly posterior sampling. Extensive experiments show that RadioDiff-Inv2 outperforms the best competing baseline by 4 to 14 dB in PSNR across varying sparsity and drift levels. The advantage is most pronounced in low-SNR regimes, where the learned diffusion prior maintains near-constant reconstruction fidelity while conventional methods degrade severely.


[38] 2606.08505

Fast and Robust On-Device Speaker Diarization: Relative Minimum Cluster Size for Stride-Accelerated Pipelines

Speech applications such as meeting transcription and voice agents would benefit from on-device speaker diarization, but practical adoption is limited by inference cost. We study how far a Pyannote 3.1-based pipeline can be accelerated on consumer hardware (an RTX 5070 Ti GPU and an Apple M4 laptop) while preserving diarization error rate (DER). A simple recipe, a coarser segmentation stride combined with per-chunk embedding, yields multi-fold speedups and is DER-neutral on AMI, but degrades sharply on in-the-wild data: on VoxConverse, DER rises from 0.075 to 0.113. We trace the failure to speaker under-counting in the clustering stage, caused by a fixed minimum cluster size interacting with the reduced number of embeddings per speaker. We propose a relative minimum cluster size, mcs = round(f * n) with f = 0.01, which adapts to the embedding budget per recording. A single value of f recovers VoxConverse DER to 0.079 (about 89% of the lost accuracy) while keeping AMI flat, and the accelerated pipeline reaches up to 12.2x speedup on AMI (MPS) over our CAM++ baseline.
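
The relative minimum cluster size and the quoted recovery figure can be reproduced with a short sketch. The `floor` argument is an assumption added here so that short recordings never get a cluster size of zero; the abstract does not say how that case is handled. Note also that Python's `round` rounds halves to the nearest even integer, which may differ from the rounding used in the paper.

```python
def relative_min_cluster_size(n_embeddings, f=0.01, floor=1):
    """mcs = round(f * n), clamped below by `floor` (the clamp is an assumption)."""
    return max(floor, round(f * n_embeddings))

# Share of the lost VoxConverse accuracy that is recovered, from the DER values
# quoted in the abstract: baseline 0.075, accelerated 0.113, with relative mcs 0.079.
baseline_der, degraded_der, recovered_der = 0.075, 0.113, 0.079
recovered_fraction = (degraded_der - recovered_der) / (degraded_der - baseline_der)
```

Here `recovered_fraction` evaluates to roughly 0.89, consistent with the "about 89%" stated in the abstract.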


[39] 2606.08580

G-MaP-SE: Guided Speech Enhancement via GMM-Based Prior Matching

Using speaker embeddings as conditioning can strengthen speech enhancement, but most methods either require clean enrollment audio or rely on embeddings extracted from noisy speech, which are fragile under noise and domain shift. We propose G-MaP-SE, a guided enhancement framework that builds a clean-speech embedding prior with a Gaussian Mixture Model (GMM) and refines a noisy conditioning embedding by matching it to this prior. The matched prior embedding is then injected into a time-frequency enhancement backbone via a lightweight gated fusion module. Experiments on VoiceBank+DEMAND and DNS Challenge 2020 datasets show that the proposed prior matching consistently outperforms noisy conditioning and substantially narrows the gap to an oracle clean-conditioning upper bound, while requiring no enrollment audio at inference time. The code, audio samples, and checkpoint are available.


[40] 2606.08611

Bayesian Optimization of a Multi-Product Chemical Reactor Using Composite Models and Partial Physics Knowledge

We study data-driven real-time economic optimization of a multi-product chemical reactor when no reliable first-principles model is available beyond a steady-state energy balance. Instead of learning the economic objective directly as a black-box function, we use a composite formulation in which Gaussian process (GP) models predict physically meaningful outputs, including product concentrations and reactor temperature, while profit is computed analytically from these predictions together with raw-material, product, and utility prices. This preserves the structure of the economic objective, makes it parametric in changing prices without needing retraining, and allows candidate operating points to be checked against the available energy balance through a physics residual. The GPs also provide predictive uncertainty, which is exploited in a Bayesian optimization (BO) framework both for data-efficient exploration and for conservative enforcement of the reactor temperature constraint through an upper confidence bound. The acquisition function additionally penalizes large energy-balance mismatch obtained by substituting the GP-predicted outputs and candidate inputs into the available steady-state energy balance. The approach is demonstrated on a benchmark simulation of a non-isothermal multi-product reactor. Relative to a trust-region safe BO implementation, the proposed method achieves better simulated economic performance within the available iteration budget. Relative to a purely data-driven BO approach that does not use the available physics information, it avoids reactor temperature constraint violations.


[41] 2606.08636

Cooperative Guidance and Control for Active Asset Protection with Time-Varying Agent Speeds

Protecting an asset against threats is a challenging problem in an era of continuously evolving intelligent attacks. This requires cooperation between the asset and the defender to share information and jointly maneuver. To address this problem, this work proposes a cooperative guidance and control strategy for active asset protection against a maneuvering threat. This work develops a joint maneuver strategy where both the defender and the asset coordinate their time-varying speeds and courses to neutralize/capture the attacker. The control strategy is formulated around three coupled geometric and temporal objectives. The first objective is to set the line-of-sight rate between the asset and the attacker to zero, putting the attacker on a collision course and reducing its maneuvering. The second objective is to maintain the defender on the line-of-sight between the asset and the attacker. This ensures that the attacker faces the defender first before reaching the vicinity of the asset. Lastly, the defender is also guided to pursue the attacker based on the time-to-go estimates between the defender and the attacker. While keeping these objectives in mind, the control actions for the asset and the defender are jointly designed, fostering cooperation between the two. The stability of the proposed strategy is established using a Lyapunov-based approach. Numerical simulations show the effectiveness of the proposed cooperative strategy in ensuring the successful capture of a maneuvering threat.


[42] 2606.08714

Hybrid Neural Network and Conventional Controller Approach for Robust Control of Highly Unstable Systems: Application to Tilt-Rotor Control

Multirotors are widely used in applications ranging from surveillance to precision agriculture, yet conventional designs remain limited by their under-actuation. Tilt-rotor configurations overcome this limitation by enabling full actuation. This paper investigates neural-network-based control strategies for a fully actuated tilt-rotor system with four thrust-vectoring inputs. Our work is structured in two parts. First, we deliberately present a negative result by evaluating a direct input-output control approach. In this method, multilayer perceptrons (MLPs), long short-term memory (LSTM) networks, and transformer models are trained to map system states and their desired values directly to control signals. We show that this strategy fails to stabilize the system, highlighting the inherent difficulty of applying direct input-output learning to highly unstable plants. Second, as the main contribution, we propose a neural-network-enhanced sliding mode controller (SMC). The method decomposes the system dynamics into input-independent and input-dependent components, with the former learned from a small dataset using lightweight networks, thereby reducing real-time computational demands. Moreover, the proposed method can be trained using flight logs collected from low-performance controllers, and the resulting dynamic model learned from real-world data can be used in simulation. We further compare MLP- and LSTM-based implementations under model uncertainties and external disturbances, demonstrating the robustness and effectiveness of the proposed approach; in particular, the controller with the LSTM plant dynamics predictor achieves superior performance to its MLP-based counterpart while also exhibiting lower runtime.


[43] 2606.08749

Active Source-free Domain Adaptation in Open-set Medical Image Segmentation via Decomposed Uncertainty and Prototype Discrepancy

Deep learning (DL) methods struggle to perform robustly across different segmentation datasets due to domain shifts, but active domain adaptation techniques enhance their generalization performance by querying a few samples from target domains for adaptation training. However, in clinical practice, target domains often include private classes of new anatomical structures or pathologies that are not present in the source data, whereas existing methods implement closed-set segmentation where source and target domains have the same segmentation classes. Additionally, source data are often inaccessible during adaptation due to strict data privacy regulations. To address these limitations, we propose an Active Source-free Open-set Domain Adaptation (ASFOSDA) method, which is the first work to implement active learning for adapting DL models in open-set medical image segmentation without access to source data. This method employs an active open-set query strategy to select the most informative target samples for training models based on Class-aware Decomposed Uncertainty (CDU) and Class-agnostic Prototype Discrepancy (CPD). CDU measures sample aleatoric uncertainty and model epistemic uncertainty by employing test time augmentation in stochastic processes. CPD measures cross-domain and self-domain discrepancy for selecting diverse samples. Subsequently, to boost the adaptation performance by enhancing training samples, a Target-refined Self-training strategy is proposed to generate high-quality pseudo labels for unselected samples, thus combining them with labeled samples for semi-supervised training. We evaluated our method on cross-domain open-set volumetric medical image segmentation tasks, and it outperformed state-of-the-art adaptation methods.


[44] 2606.08803

Some Essential Constructive Foundations for Systems and Control

This work develops several constructive foundations for systems and control within Bishop-style constructive mathematics. For an engineer, the guiding principle is that an object claimed to exist, such as a trajectory, an optimal control law, a selector, or a viable solution, should come with finite data and an operation computing approximations to any prescribed precision. The style remains close to classical analysis, but existential statements are organized so that their computational content is visible. The paper begins with elementary geometric data in finite-dimensional Euclidean spaces: blocks, multiblocks, representable sets, regular functions, and certified integrals. This set-first integration route is meant to complement, rather than replace, abstract constructive integration theories such as Daniell-type or integration-space approaches. The developed apparatus is then applied to a constructive functional extremum-value theorem, selector extraction for multifunctions, Filippov-type and viable solutions of differential inclusions, regular probability densities, controlled Markov chains, and empirical density certificates. A short account of resolvent projectors and linear stability is included for completeness.


[45] 2606.08808

Energy Storage as a Multi-Use Asset: Applications Across the Power System

The energy transition in power systems requires flexible assets to offset renewable generation variability across multiple time scales, while supporting the integration of renewables and the electrification of demand without requiring costly grid reinforcement. Energy storage occupies a unique position among these assets: depending on the technology, it can provide short-duration grid services at high ramping rates, such as frequency regulation and voltage support, longer-duration functions such as intra-day peak shaving, or inter-seasonal energy buffering. This multi-service character, combined with the declining costs of energy storage technologies (most notably that of battery energy storage systems), is central to the economic viability of storage investments. The value of a given installation depends strongly on its grid connection point and intended use case: an asset-coupled battery serving a consumer or generation plant faces a different service landscape, and therefore a different business case, than a network-coupled system operating as an independent grid resource. This paper presents a structured taxonomy of grid-connected energy storage applications, discusses the principal application domains, and describes the key challenges that must be addressed to integrate storage effectively into power systems. Services are discussed with special emphasis on the Swiss regulatory context. Finally, the STORE flagship project supported by the Swiss Innovation Agency (Innosuisse), where some of the critical challenges of energy storage integration in power grids are addressed, is introduced.


[46] 2606.08836

Adaptive Model Predictive Control of Nonlinear Generic Urban Air Mobility Using Linear Parameter-Varying Systems

This paper presents an adaptive model predictive control (MPC) framework for nonlinear urban air mobility (UAM) vehicles operating across the full flight envelope. The proposed approach leverages a linear parameter-varying (LPV) representation to update the predictive model online, enabling accurate capture of strongly nonlinear and time-varying dynamics associated with distributed electric propulsion (DEP) eVTOL aircraft. To systematically address the high-dimensional and coupled nature of MPC tuning, a multi-objective evolutionary optimization strategy based on NSGA-II is employed, incorporating proper normalization of states and control inputs to ensure balanced weighting and meaningful exploration of the design space. The resulting controller explicitly accounts for actuator constraints and enables reconfigurable control allocation for fault-tolerant operation. The framework is evaluated in nonlinear simulations using NASA's Generic Urban Air Mobility (GUAM) model and benchmarked against a robust servomechanism linear quadratic regulator (RSLQR). Results demonstrate that the proposed adaptive MPC achieves improved trajectory tracking and enhanced robustness under both nominal conditions and actuator degradation scenarios, including partial motor failure, while maintaining constraint satisfaction throughout all flight regimes.


[47] 2606.08880

Direct Data-driven Predictive Control: A Computationally Efficient Alternative to DeePC for Eco-driving in Mixed Traffic Flows

Improving energy efficiency in the transportation sector is critical for achieving sustainable mobility, with eco-driving emerging as a key strategy. However, implementing effective eco-driving for connected and automated vehicles (CAVs) in mixed traffic presents a significant control challenge due to the heterogeneous, uncertain behavior of human-driven vehicles (HDVs). Data-enabled Predictive Control (DeePC) offers a promising model-free approach but is often hindered by a high computational burden, limiting its real-time feasibility. This paper introduces a novel Direct Data-driven Predictive Control (D3PC) framework to address this limitation. By reformulating the data-driven prediction mechanism, the D3PC significantly reduces computational complexity, making its computation time nearly invariant to historical data size. This computational efficiency directly enables the formulation of a sophisticated eco-driving controller that can solve the complex energy optimization problem in real time, even within diverse and stochastic mixed-traffic environments. Comprehensive simulations demonstrate that the D3PC is orders of magnitude faster than existing DeePC-based methods while achieving superior energy efficiency. Specifically, it reduces total platoon energy consumption by up to 10.71% compared to rule-based cruise control baselines and 3.80% compared to the original DeePC, confirming its effectiveness for real-time, energy-efficient control.


[48] 2606.08898

Few-shot Class-variable Incremental Audio Classification via Prototype Adaptation and Pseudo Class-variable Training

In the task of few-shot class-incremental audio classification, the number of classes is assumed to always increase without considering the possibility of decrease. However, the number of classes generally increases or decreases in practice. In this paper, we investigate a problem of Few-shot Class-variable Incremental Audio Classification (FCIAC), in which the number of classes increases or decreases. We propose an FCIAC method using prototype adaptation and pseudo class-variable training. The model in our method consists of an encoder and a classifier. The classifier is initialized by a class-variable prototype adaptation network, whose structure dynamically changes with the change of classes. In addition, we design a pseudo class-variable training strategy to enhance the model's adaptability to changing classes. Experiments on three public datasets show that our method exceeds previous methods in average accuracy. The code is at: this https URL.


[49] 2606.09042

Seamless Contraction-Control Framework for Unplanned Grid-Connected/Stand-Alone Transitions of Grid-Forming Inverters

Unplanned grid-connected (GC)/stand-alone (SA) transitions commonly occur in AC microgrids during protection trips, manual breaker operation, or low-bandwidth supervisory communication. Under such unplanned transitions, a grid-forming inverter must support the local-load voltage in stand-alone operation and regulate the desired power/current injection in grid-connected operation. Existing P-Q droop-based seamless-transfer methods often rely on planned transition commands, supervisory islanding detection, or a pre-synchronization interval, which may prevent timely voltage/current support during unplanned bidirectional transitions. To address this problem, this paper proposes a seamless contraction-control (SCC) framework for target dynamics. Using the SCC, contraction-based grid-connected current-control and stand-alone voltage-control laws are proposed. With the new control laws, the inverter achieves transient stability and converges to the target trajectory with a prescribed convergence rate. Furthermore, a breaker-status observer is proposed to infer the grid-connected/stand-alone mode from voltage measurements on both sides of the breaker, eliminating the need for a dedicated pre-synchronization interval or supervisory islanding detection process and enabling timely voltage/current support during unplanned transitions. Experimental results validate that the proposed method achieves stand-alone voltage support, stable grid-connected current injection under symmetrical/unsymmetrical grid-voltage sag and phase-jump disturbances, and unplanned bidirectional transitions.


[50] 2606.09047

Families of Control-Cost-Parametrized Inverse-Optimal Universal Stabilizers

A classical universal stabilization formula offers the practitioner no design freedom: it is a single, parameter-free object. We introduce a cost-parametrized family of stabilizing feedback laws, where (1) the user chooses a function that serves as the running cost on control in an inverse-optimal cost functional, and (2) obtains, through a formula, a nonlinear "expander" of a pre-existing universal controller, which solves an infinite-horizon optimal control problem with a meaningful cost on the state. The cost-to-expander formula is a three-step construction, involving, inter alia, cost differentiation and function inversion; overall, it is a nonlinear infinite-dimensional operator. The cost-to-expander operator is proven Lipschitz, which enables uniform neural operator approximation of the entire family and supports both offline performance exploration and online adaptation. Semiglobal practical asymptotic stability and second-order suboptimality bounds are established under the approximation. The operator learning and its use in semiglobal stabilization are illustrated numerically. We call the result 'half-direct-optimal' because the paper's design is less than a general 'direct optimal' (HJB-inducing) control, but more than the fully inverse optimal, since the user performs minimization for an arbitrary given cost on control. The dual to the half-direct problem we solve is the problem in which the cost on the state is arbitrary and given. This dual problem is easier and outside of the scope of the paper.


[51] 2606.09048

BareWave: Waveform-Native Flow-Matching Text-to-Speech

Removing intermediate representations and separately trained decoding stages has become an important direction in generative modeling. In text-to-speech, however, high-quality systems are still commonly built through an intermediate acoustic representation before waveform synthesis. In this work, we present BareWave, a fully waveform-native framework for direct text-to-wave generation in flow-matching TTS. We consider this setting to raise three training challenges: raw-waveform modeling lacks a strong pretrained representational scaffold, different stages of training benefit from different noise schedules, and data-space perceptual objectives do not automatically share the temporal structure of the velocity-space flow objective. As a result, direct waveform training is hard to optimize efficiently, hard to push toward a strong final operating point with a fixed recipe, and hard to integrate effective perceptual refinement. Guided by this view, we develop a direct text-to-wave training framework that combines training-time representation alignment, staged noise scheduling, and velocity-aware perceptual alignment (VAPA), while preserving a single waveform-native inference path without pretrained components at test time. Experiments on zero-shot voice cloning show that strong intelligibility, speaker similarity, and naturalness can be achieved under a fully waveform-native inference path, supporting waveform-native flow-matching TTS as a practical direction. Project page with audio demos is available at this https URL.


[52] 2606.09050

MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice Conversion

Streaming zero-shot voice conversion (VC) has become increasingly popular due to its potential for real-time applications. The recently proposed MeanVC achieves lightweight streaming zero-shot VC, but it has several limitations: its chunk-wise autoregressive denoising doubles the effective training sequence length, conversion quality degrades under small-chunk settings, and its timbre encoder directly relies on reference mel-spectrograms, making it sensitive to reference audio quality. To address these limitations we propose MeanVC 2. We introduce future-receptive chunking (FRC), which explicitly schedules past and future receptive fields across diffusion transformer decoder layers and removes clean-chunk teacher forcing. By incorporating bounded future context, FRC enables stable conversion with a 40 ms chunk size. We further introduce a universal timbre token encoder, which constructs a timbre representation from a global speaker embedding and retrieves fine-grained timbre cues via cross-attention, improving robustness to low-quality references and enhancing zero-shot speaker similarity. Experimental results show that MeanVC 2 significantly outperforms MeanVC, while reducing latency from 211 ms to 110 ms. Audio samples are publicly available. The source code will be publicly released.


[53] 2606.09085

Mixture-of-Experts Transformer for Automatic Modulation Recognition

Automatic Modulation Recognition (AMR) is a key enabling technology for cognitive radio and intelligent spectrum management in next-generation wireless systems. However, current deep learning-based AMR methods predominantly rely on static multi-scale fusion strategies, which lack the flexibility to adapt to the highly dynamic temporal variations of modulation signals. To address this limitation, we propose MoEformer, an adaptive Multi-Scale Mixture-of-Experts Transformer network that directly processes I/Q signals to preserve their temporal and phase structures. Specifically, MoEformer constructs multi-scale expert views through temporal resampling, employs an input-dependent gating mechanism for dynamic expert fusion, and integrates Rotary Position Embeddings (RoPE) within Transformer encoders to capture both local and global temporal dependencies. Comprehensive evaluations on three widely adopted benchmarks (RadioML2016.10a, RadioML2016.10b, and RadioML2018.01A) demonstrate that MoEformer outperforms the competitive baselines, achieving superior average recognition accuracies of 63.74%, 66.24%, and 64.22%, respectively. In addition, the proposed method strikes an optimal trade-off between recognition performance and model complexity.


[54] 2606.09095

RSMA Technique for Multi-User Downlink Single-Waveguide Multi-Pinching Antenna Systems

Pinching antennas have recently emerged as a promising technology for reconfigurable wireless systems due to their ability to dynamically radiate signals from flexible positions along a waveguide. This letter investigates a multi-user communication framework by integrating rate-splitting multiple access (RSMA) into a single-input single-output (SISO) single-waveguide architecture equipped with multiple pinching antennas. Multiple antennas are activated along a shared waveguide to radiate a common guided signal toward distributed users, enabling strong near-field line-of-sight (LoS) links with low hardware complexity and a single radiofrequency (RF) chain. To manage multi-user interference, RSMA is employed within the proposed architecture. Simulation results show that the proposed framework improves system sum-rate, enhances user rate fairness, and achieves lower bit error rate (BER) while preserving the low-cost and scalable characteristics of pinching antenna systems (PASS).


[55] 2606.09098

HoliDubber: Holistic Video Dubbing for Complex Acoustic Scenes via Text-Guided Audio Synthesis

Video dubbing is a cornerstone of multimedia content creation, aiming to synthesize synchronized acoustic sequences for visual streams. While Text-to-Speech (TTS) and Text-to-Audio (TTA) generation have each achieved remarkable progress, existing dubbing systems remain confined to isolated speech synthesis without incorporating sound effects and ambient audio, forcing practitioners to rely on fragmented workflows and laborious manual post-mixing. To address this limitation, we present HoliDubber, a holistic video dubbing framework that moves beyond speech-only generation by enabling the joint synthesis of speech and sound effects from a single text prompt. Specifically, HoliDubber adopts a patch-based autoregressive diffusion transformer architecture, where a causal language model autoregressively models aggregated patch embeddings to capture global temporal structure, and a Diffusion Transformer decoder generates high-fidelity continuous tokens within each patch, following a divide-and-conquer strategy. To achieve cross-modal alignment, visual features are encoded into patch-level representations and fused with audio patches via cross-attention, enabling the model to ground speech generation in the speaker's visual articulation dynamics. In addition, we introduce HoliDub-Bench, a benchmark curated from established datasets with synchronized video-text-audio triplets designed for holistic dubbing evaluation. Extensive experiments demonstrate that HoliDubber significantly outperforms existing methods across multiple benchmarks in speech quality, synchronization, and speaker similarity. Furthermore, results on HoliDub-Bench validate the effectiveness of joint speech-and-sound generation, establishing a new paradigm for holistic video dubbing in complex acoustic scenes. The demo page of the project is this https URL.


[56] 2606.09113

Towards Intelligent Wireless Networks: The Synergy of Generative AI and Digital Twins

This paper proposes a generative AI (GenAI)-enabled digital twin (DT) framework for proactive and energy-aware wireless optimization in future 6G ecosystems. Most existing AI-assisted DT approaches remain fundamentally reactive, adjusting network parameters only after performance degradation occurs or restricting GenAI to isolated signal-level tasks such as channel estimation. This work adopts a proactive approach. Instead of responding to problems after they appear, the proposed framework continuously synchronizes channel states, mobility dynamics, traffic conditions, and energy information within a real-time DT environment, enabling the system to anticipate congestion, interference, and energy demand before they materialize. The result is a closed-loop proactive architecture that operates at the system level, jointly managing communication, mobility, and resource dynamics for autonomous wireless control. Evaluations on a UAV-assisted non-terrestrial network (NTN) scenario show approximately 69.2% energy savings over reactive baselines while maintaining reliable quality-of-service (QoS) under dense and mobility-intensive conditions. Beyond this specific scenario, the framework offers a scalable foundation for broader AI-native 6G applications, including aerial platforms, autonomous systems, extended reality (XR), industrial automation, and space-air-ground-sea (SAGS) integrated infrastructures.


[57] 2606.09137

Modeling of Spinning Plates: Geometric Stiffening and Modal Approximation for GNC Applications

This work presents a modal formulation for flexible rectangular plates, accounting for nonlinear geometric effects arising from in-plane foreshortening and centrifugal stiffening. The model is linearized with respect to elastic deformations while retaining the full dependence on spacecraft angular velocities and accelerations. System matrices depend nonlinearly on spacecraft states through squared and cross-product terms, capturing gyroscopic coupling and dynamic stiffening phenomena for arbitrary rotational maneuvers. Polynomial approximation of mode shapes enables efficient computation while preserving accuracy. Model predictions are validated against finite element simulations and literature data for transient response under prescribed hub motion.


[58] 2606.09141

FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Recent progress in speech dialogue systems requires Text-to-Speech (TTS) models to be faster and more responsive. Modern speech dialogue systems impose two primary requirements on TTS models: low latency and support for streaming inputs and outputs. However, most existing single-codebook LLM-based TTS methods rely on multi-stage pipelines that lack native streaming capabilities. These systems typically suffer from high end-to-end latency due to slow autoregressive prediction and multi-step flow matching. To address these limitations, we propose FlashTTS, an open-source and low-latency streaming TTS framework. FlashTTS introduces a lagged multi-track architecture that natively processes streaming text and speech inputs, thereby eliminating the need for sentence-level buffering. To accelerate acoustic generation, we integrate parallel Multi-Token Prediction (MTP) with an X-pred mean flow matching decoder. This configuration achieves high-fidelity token-to-mel generation in exactly two function evaluations (2-NFE). By jointly optimizing input processing and decoding efficiency, FlashTTS offers a practical foundation for real-time speech dialogue systems. Experiments show that FlashTTS substantially reduces First-Packet Latency to 325 ms compared to robust streaming baselines, all while preserving strong zero-shot voice cloning and cross-lingual intelligibility. Speech samples are available. The model code and checkpoints will be released as open source.


[59] 2606.09264

Block-Term Decomposition Approach to Blind Multi-trial Functional Ultrasound Unmixing

Functional ultrasound (fUS) has emerged as a powerful neuroimaging modality due to its high resolution in both space and time, low cost and potential portability. Nevertheless, fUS signals provide only indirect observations of neuronal activity through the neurovascular coupling, and hence require the blind separation of latent neuronal sources while also deconvolving their hemodynamic responses. In this work, we propose a data-driven convolutive block-term tensor decomposition-based model for multi-trial fUS measurements, where each source has a spatiotemporal representation comprising a low-rank spatial map and a piecewise-constant neuronal activation signal convolved with a trial- and source-dependent hemodynamic response function (HRF) with a physiologically plausible shape. We propose a constrained optimization framework for the model computation, which consists of alternating projected gradient descent iterations. Simulation results are reported that demonstrate accurate recovery of spatial maps and reliable estimation of activation temporal profiles across various noise levels, while confirming that HRF estimation remains the most challenging part of the problem.


[60] 2606.09282

Revisiting mesoscopic traffic flow simulation in SUMO: Limitations, analysis, and an alternative

Mesoscopic traffic flow models combine the merits of both macroscopic and microscopic models by capturing individual vehicle behavior in great detail while retaining computational efficiency. At the time of this study, the mesoscopic model proposed by Eissfeldt (2004) is used in Simulation of Urban MObility (SUMO). The movement of vehicles is governed by dynamic headways between edges. However, the model does not fully comply with the principle of the Lighthill-Whitham-Richards (LWR) model. Several problems are identified, including the incomplete consideration of queue dynamics and the limited implementation of backward traveling spaces. Two case study scenarios demonstrate that these problems lead to unrealistic onset and recovery patterns of congestion. The magnitude of congestion is generally underestimated with this model. To address these drawbacks, a proper mesoscopic discrete-time implementation of the link transmission model, which follows the LWR principle, is proposed. By explicitly incorporating backward traveling spaces to capture queue spillback phenomena, the proposed model provides a more precise representation of congestion dynamics. The link density outputs are consistent with the kinematic wave theory and the microscopic traffic simulation in SUMO, thus verifying its theoretical accuracy.


[61] 2606.09317

A Comparative Study of Pre-trained Speech Encoders and Training Objectives for Large-Scale Indic Spoken Language Identification

Spoken language identification (LID) for Indian languages is a challenging problem due to the large number of languages, significant phonetic overlap among related varieties, and the scarcity of labeled data for many low-resource languages. In this work, we present a systematic comparative study of two pre-trained speech encoders (Whisper and FastConformer) combined with a linear classifier for large-scale Indic LID spanning 42 languages across four linguistic families. We evaluate both encoders in frozen (linear probing) and fine-tuned settings, and compare three training objectives: cross-entropy (CE), supervised contrastive loss with cross-entropy (CE+SupCon), and hierarchical softmax (HSM). Models are trained on the Vaani dataset and evaluated in a cross-corpus setting on Vaani-Test (held-out), FLEURS, and Kathbath, providing insights into domain generalization. The frozen FastConformer encoder achieves over 90% macro accuracy on FLEURS and Kathbath without any task-specific adaptation, substantially outperforming Whisper on out-of-domain benchmarks, while fine-tuned Whisper yields stronger in-domain performance. HSM consistently outperforms CE and CE+SupCon for both encoders across all benchmarks, with the largest gains on out-of-domain test sets. CE+SupCon degrades FastConformer's cross-corpus generalization, suggesting that the contrastive objective over-specializes representations to in-domain conditions. Per-family analysis shows that Central Indo-Aryan varieties are the hardest to discriminate, with Hindi-Urdu and the Sadri-Chhattisgarhi-Surgujia cluster being the dominant confusion pairs.
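
The abstract does not spell out how the hierarchical softmax is structured. As a minimal sketch only, assuming a two-level factorization P(language) = P(family) * P(language | family), the idea can be illustrated as follows. The function names, the grouping of languages, and the logit values are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_probs(family_logits, language_logits):
    """Combine family-level and within-family softmaxes (assumed structure).

    family_logits:   {family: logit}
    language_logits: {family: {language: logit}}
    Returns {language: P(family) * P(language | family)}.
    """
    families = list(family_logits)
    family_p = dict(zip(families, softmax([family_logits[f] for f in families])))
    probs = {}
    for family in families:
        languages = list(language_logits[family])
        within = softmax([language_logits[family][lang] for lang in languages])
        for lang, p in zip(languages, within):
            probs[lang] = family_p[family] * p
    return probs


if __name__ == "__main__":
    # Illustrative logits only; a real model would produce these from audio.
    example = hierarchical_probs(
        {"Indo-Aryan": 1.2, "Dravidian": 0.3},
        {
            "Indo-Aryan": {"Hindi": 0.9, "Urdu": 0.7},
            "Dravidian": {"Kannada": 0.1, "Telugu": 0.4},
        },
    )
    for language, p in example.items():
        print(f"{language}: {p:.3f}")
```

Under this assumed structure, the language probabilities still sum to one, and confusable languages within one family compete only with each other at the second level.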


[62] 2606.09330

Dynamic XR Rendering Offloading Based on Feature-Based Quality Assessment

Extended Reality (XR) applications demand intensive computation and low latency, especially for real-time rendering tasks. In this letter, we present an edge-aided XR rendering testbed that dynamically offloads rendering workloads between the XR client and the edge server based on network conditions and latency constraints. The testbed integrates a Microsoft HoloLens 2 headset, a GPU-enabled edge server, and a customized remote rendering toolkit based on the HOLO Stream SDK, enabling seamless switching between local and edge rendering modes in real time. To overcome the limitations of pixel-level quality metrics under head movements and asynchronous frame arrivals, we propose a perceptual evaluation metric based on deep feature embeddings and cosine similarity, which remains robust to spatial and temporal misalignments. Furthermore, we design a contextual bandit learning controller to adapt rendering placement decisions in real time by jointly optimizing perceptual quality and latency. Experimental results demonstrate the feasibility and performance of our testbed, validating its effectiveness in delivering high-quality and interactive XR experiences.
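
As a rough illustration of a feature-based score of the kind described above, the sketch below computes the cosine similarity between two embedding vectors and averages it over frame pairs. The embeddings here are hand-written placeholder lists; the feature extractor, the frame pairing, and the averaging step are assumptions and are not described in the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_frame_similarity(reference_embeddings, rendered_embeddings):
    """Average cosine similarity over paired frame embeddings (assumed scoring)."""
    pairs = list(zip(reference_embeddings, rendered_embeddings))
    if not pairs:
        raise ValueError("at least one frame pair is required")
    return sum(cosine_similarity(r, s) for r, s in pairs) / len(pairs)


if __name__ == "__main__":
    # Placeholder embeddings standing in for deep features of two frame streams.
    reference = [[0.2, 0.9, 0.1], [0.4, 0.4, 0.8]]
    rendered = [[0.25, 0.85, 0.1], [0.5, 0.3, 0.8]]
    print(mean_frame_similarity(reference, rendered))
```

Because cosine similarity depends only on the direction of the feature vectors, a score of this kind is insensitive to uniform scaling of the embeddings, which is one reason it is often preferred over pixel-wise differences.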


[63] 2606.09332

Wearable Single-Lead ECG Detects Fine-Grained Structural Heart Disease Through Echo-Report Supervision

Structural heart disease (SHD) is a primary driver of heart failure and cardiovascular mortality, yet early detection remains constrained by the limited accessibility of echocardiography. While single-lead electrocardiogram (ECG) is ubiquitous through wearables, existing AI screening models often depend on 12-lead inputs, generalize poorly across institutions, or require massive, condition-specific labeled datasets. Recent work has demonstrated the feasibility of contrastive pre-training between single-lead ECGs and echocardiography reports within a single health system. Here, we present AnyECG-Echo, a framework that advances this paradigm toward clinical translation through three key developments: (1) evaluation in a geographically independent external cohort (n = 16,621); (2) diagnostic coverage of 13 fine-grained SHD subtypes spanning myocardial, chamber, valvular, and great-vessel pathologies; and (3) dual-axis mechanistic interpretability combining electrophysiology-grounded Shapley attribution with emergent correlations to quantitative measurements. Across validation cohorts totaling n = 25,222, the model demonstrated high AUROC for high-impact subtypes, including reduced left ventricular systolic function (AUROC 0.866-0.924), global heart enlargement (0.877-0.931), and mitral stenosis (0.836-0.906). Furthermore, we successfully validated the alignment of model outputs with established medical physiological traits, thereby enhancing interpretability. Notably, we discovered that AnyECG-Echo's outputs function as physiologically grounded digital biomarkers that accurately track objective metrics such as LVEF and myocardial wall thickness. These findings prove that wearable single-lead ECGs can effectively detect fine-grained structural heart disease, offering a practical solution for population-scale screening.


[64] 2606.09335

Factors affecting ASR performance: A study using state of the art ASR models in Indic Languages

ASR performance varies across languages, speakers, and recording conditions, yet systematic analysis for Indic languages remains limited. We present a large-scale study of decoded outputs from multiple open-source ASR models evaluated on diverse Indian speech datasets in zero-shot settings. We analyze linguistic, speaker-level, and acoustic factors across Hindi, Bengali, Kannada, Telugu, and Marathi. We examine correlations between WER and speaker traits such as average word length, speaking rate, and utterance duration across multiple model-dataset pairs. For Hindi, we further analyze audio factors including telephone codecs, bit depth, resampling, and background noise. Results reveal both cross-lingual patterns and language-specific sensitivities, showing how speaker behavior and signal processing choices affect ASR robustness in real-world Indic scenarios.


[65] 2606.09342

Parameter-Efficient Continual Learning for Automatic Speech Recognition

Speech foundation models enable strong general-purpose ASR and are attractive for downstream adaptation. However, their size and the catastrophic forgetting induced by sequential fine-tuning demand parameter-efficient and regularized training methods, motivating parameter-efficient continual learning (PECL). While PECL has been widely studied in NLP and vision, it has received less attention in ASR. In this paper, we propose a simple yet effective PECL method based on recent advances in parameter-efficient fine-tuning for ASR. We partition pretrained weight matrices into head and tail subspaces according to singular values and restrict adaptation to approximate rotations within the low-energy tail subspace, preserving dominant components and reducing forgetting. For subsequent tasks, rotations are combined via weight averaging to further improve retention. Experiments on two benchmarks demonstrate reduced forgetting and superior overall performance compared to recent PECL baselines.


[66] 2606.09345

A study on the impact of region specific data on the performance of Indic ASR

Automatic Speech Recognition (ASR) systems are widely deployed across linguistically diverse regions, yet their ability to generalize across fine-grained geographic variation remains underexplored. We present a systematic study of cross-district ASR generalization for Indian languages, analyzing the impact of regional variation on performance. Using fine-tuning as a controlled probe, we train models on speech from a single district and evaluate them on other districts within the same language. We examine trends across multiple train-test district pairs and quantify performance differences. To assess geographic effects, we analyze the correlation between WER and inter-district distance using two distance measures. Our results show consistent correlations between geographic distance and WER, highlighting the challenges of regional generalization and the need for geographically diverse speech data in ASR development and evaluation in India.


[67] 2606.09357

Rethinking Depth: A study of the Recursive-Transformer for Speech Recognition

Transformer-based architectures have led to significant improvements in Automatic Speech Recognition (ASR), often at the cost of substantially increased model sizes. A promising approach to address this issue is layer sharing through depth recursion, commonly referred to as the Recursive-Transformer, which involves repeatedly applying the same layers within the model. Despite its potential shown in other fields, this technique remains relatively unexplored in ASR. In this paper, we present an experimental study of the Recursive-Transformer applied to ASR encoder architectures. We systematically investigate the impact of recursion depth and layer allocation within the Recursive-based Transformer. Our results demonstrate that the Recursive-Transformer is a viable alternative, especially when recurrence is applied in the latent space with a restricted number of loops, obtaining comparable performance while reducing the parameter count by 66%.


[68] 2606.09406

Advanced simulation framework for AC/MTDC power systems

Alternating current (AC)/multi-terminal direct current (MTDC) hybrid power systems (HPSs) play a crucial role in enabling long-distance power transmission and flexible interconnections between AC grids. However, HPSs face numerous challenges, with stability and harmonic issues being particularly prominent. Traditional electromagnetic transient (EMT) tools have struggled to accommodate small-signal stability problems and the potential issues of the optimal interactions among converters. To address this gap, HARMONY ("HARMONic stabilitY assessment of PE-penetrated power systems") has been developed for the advanced simulation and analysis of interconnected AC/MTDC HPSs as a comprehensive mathematical framework based on the C++ programming language. The primary goals of HARMONY are to provide faster and trusted stability analyses, and to address the analytical difficulties associated with converter control dynamics, converter-driven stability, and interoperability in HPSs. This framework is intended to be open source, thereby broadening collaboration among researchers and contributing to the community of power systems engineers. In this paper, we demonstrate two core functionalities featured in HARMONY, namely optimal power flow (OPF) and harmonic stability analysis (HAS). The underlying analysis models and computational methodologies for both functionalities are presented in detail to help future readers and users gain a clear understanding of the mathematical fundamentals of HARMONY. Furthermore, we introduce the integrated framework of OPF and HAS designed in HARMONY, along with representative printed analysis results, to demonstrate the appealing capabilities of HARMONY.


[69] 2606.09407

Delayed Functional Observers for Output-Delayed Linear Systems

This paper introduces a novel class of delayed functional observers specifically designed to reconstruct delayed control laws under severe output measurement lags, directly complementing recent literature \cite{trinhnn26, trinhnam26}. By systematically mitigating simultaneous, unequal delays across both the actuator and sensor channels, the proposed architecture resolves dual-channel latency without requiring full-state estimation or computationally intensive real-time distributed integration. Ultimately, this work provides a powerful, low-order framework that bridges the gap between idealized control theory and the practical constraints of modern networked engineering systems.


[70] 2606.09436

Leveraging Optimal Information-Power Flow for Transmission Switching in AC/MTDC Grids

The emerging AC/multi-terminal DC grids are regarded as a promising solution for accommodating the increasing integration of renewable energy sources. This work proposes an optimization framework to address transmission switching (TS) problems arising in practical operational scenarios, such as maintenance scheduling, contingency management, and fault restoration. Unlike most existing studies, the proposed framework considers the role of communication networks in TS operations and develops an optimal information-power flow (OIPF) model. The OIPF model captures the impact of information flows on circuit breaker actions while incorporating communication-related costs, thereby better reflecting practical operational decision-making processes. To ensure computational tractability, the resulting optimization problem is formulated as a mixed-integer second-order cone programming (MISOCP) model through convex relaxations, polygonal approximations, and Big-M reformulations. Numerical case studies illustrate the applicability of the proposed OIPF model and indicate its potential in supporting transmission switching decisions.


[71] 2606.09439

Tracking the Effective Surface Area of Non-Convex Satellites

This paper presents a novel framework to track the effective surface area of non-convex satellites, enabling the use of aerodynamic drag in low Earth orbit for orbital control. The proposed framework enables the satellite to track the effective surface area while simultaneously performing other maneuvers. We introduce this framework through a backstepping control algorithm, and exemplify its advantages with an extension, to simultaneously maximize solar panel exposure. The equilibria of the closed-loop systems are shown to be asymptotically stable, and simulation results confirm the effectiveness of the proposed framework.


[72] 2606.09444

Vendor-agnostic 4D Phase Contrast MRI: a complete open-source pipeline for velocities, displacement, and strain analysis

Phase contrast MRI (PC MRI) enables quantitative assessment of tissue motion and strain. Although it is increasingly used, standardized, vendor-agnostic pipelines for accelerated acquisitions remain scarce. We present a fully open-source 4D flow PC-MRI pipeline integrating a compressed sensing-accelerated sequence implemented in PyPulseq, BART-based reconstruction, and strain analysis. Additionally, a gradient probing sequence was developed to ensure correct velocity sign assignment across scanner orientations and vendors. The pipeline was validated across two Siemens MRI systems (3T MAGNETOM Prisma and 3T Vida Fit) in two anatomical applications: forearm (Flexor Digitorum Superficialis, n=9) and thigh (Vastus Lateralis, n=10) during Neuromuscular Electrical Stimulation (NMES)-induced contractions. Compressed sensing reduced acquisition times from 35 and 80 minutes to 5 and 11 minutes for the arm and leg acquisitions, respectively. Muscle strain maps and sigmoid-fitted strain curves enabled extraction of peak strain, mean strain, and buildup rate. Strains in the Vastus Lateralis were approximately one order of magnitude higher than in the Flexor Digitorum Superficialis (median peak strain 0.49 vs. 0.063, mean strain 0.31 vs. 0.031). The pipeline demonstrates multi-platform compatibility and provides a reproducible, open framework for quantitative muscle imaging.


[73] 2606.09496

Orbital Plane Geometry and Information Conditioning for Doppler-Only LEO Positioning

We study an idealized information model for Doppler-only positioning with low Earth orbit (LEO) signals of opportunity from a stationary receiver. Motivated by the observation that Doppler measurements from a satellite pass provide information primarily within the associated orbital plane, we model each satellite contribution as a weighted projection onto that plane. Under this model, the combined information matrix from multiple satellites is a sum of orbital-plane projection operators. Closed-form expressions are derived for the eigenvalues, condition number, and worst-case Cramér-Rao lower bound. For two satellites, the conditioning is governed by the dihedral angle between orbital planes and the relative information strengths of the two links. Monte Carlo evaluation of pass-integrated Doppler Fisher information matrices demonstrates that the proposed surrogate captures the dominant conditioning trends associated with orbital-plane diversity. The results provide a simple geometric framework for understanding the role of constellation geometry in Doppler-only positioning systems.
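
The projection model described in this abstract can be sketched numerically. The snippet below is a minimal illustration, not the paper's code: it assumes a 3-D receiver position, represents each orbital plane by a unit normal, and builds the combined information matrix as a weighted sum of projectors `I - n n^T`. The function names, the choice of normals, and the equal weights are all hypothetical. For two equally weighted planes this toy model gives eigenvalues `1 - |cos θ|`, `1 + |cos θ|`, and `2`, where θ is the dihedral angle; the paper's own closed-form expressions may be stated differently.

```python
import numpy as np

def plane_projector(normal):
    """Orthogonal projector onto the plane with the given normal vector."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return np.eye(3) - np.outer(n, n)

def combined_information(normals, weights):
    """Weighted sum of orbital-plane projectors (toy information matrix)."""
    return sum(w * plane_projector(n) for n, w in zip(normals, weights))

def condition_number(info):
    """Ratio of largest to smallest eigenvalue of a symmetric matrix."""
    eig = np.linalg.eigvalsh(info)  # eigenvalues in ascending order
    return eig[-1] / eig[0]

def two_plane_normals(dihedral_deg):
    """Two unit normals whose planes meet at the given dihedral angle."""
    t = np.radians(dihedral_deg)
    return [np.array([0.0, 0.0, 1.0]), np.array([0.0, np.sin(t), np.cos(t)])]

if __name__ == "__main__":
    for angle in (10.0, 30.0, 60.0, 90.0):
        info = combined_information(two_plane_normals(angle), [1.0, 1.0])
        print(angle, np.linalg.eigvalsh(info), condition_number(info))
```

In this sketch the conditioning degrades as the two planes approach coplanarity, because the smallest eigenvalue tends to zero as θ tends to zero.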


[74] 2606.09504

Hierarchical Federated Learning for Unsupervised Waveform Classification over Tactical MANETs

Distributed radio frequency sensing in contested tactical environments demands collaborative learning across mobile nodes. In ad-hoc networks, learning must occur without persistent backhaul, ground truth labels, or reliable communication links. Traditional federated learning approaches assume either ideal link conditions or supervised training objectives, neither of which holds in practice for deployed MANET platforms. This paper presents a hierarchical federated learning framework for unsupervised waveform classification over tactical MANETs subject to Rayleigh fading, random waypoint mobility, and multi-hop routing loss. Each node trains a local denoising convolutional autoencoder on raw IQ observations without label exchange, learning compact representations through a self-supervised reconstruction objective. A two-stage aggregation protocol elects connectivity-based relay aggregators consistent with OLSR multipoint relay selection, compressing cluster-level model updates before forwarding to a mobile server proxy. Simulation results demonstrate that in-network aggregation reduces attempted transmission bits relative to relay-forward federated averaging by around 12% at equivalent classification performance. Notably, stochastic channel-driven subsampling under non-IID data acts as an implicit regularizer, with both MANET conditions matching or exceeding ideal federated averaging on unsupervised representation quality. This suggests that moderate link loss can partially compensate for client drift in heterogeneous networks. Performance is assessed on analysis of the learned latent embeddings using KMeans normalized mutual information and linear probe accuracy.


[75] 2606.09505

Guaranteed Fast Implementation of the Split Covariance Intersection Filter: Nested Newton Method Thanks to the Fourth-Order Convexity of w-Optimization

The split covariance intersection filter (Split CIF) is a useful tool for general data fusion and has the potential to be applied in a variety of engineering tasks. The w-optimization problem at the core of the Split CIF determines its performance and implementation efficiency. It is known that the w-optimization problem enjoys the desirable property of convexity (more precisely, second-order convexity in this paper's context). This paper proves that the w-optimization problem further enjoys a stronger property, namely fourth-order convexity, thanks to which a guaranteed fast implementation of the Split CIF can be realized. The new implementation, termed the nested Newton method, is also presented in this paper.


[76] 2606.09534

A Continuification Approach to CAV Control in Mixed Traffic via Variable Speed Limits

This paper presents a method for controlling traffic via the use of connected and automated vehicles (CAVs) acting as moving bottlenecks. Current methods for moving bottleneck control use a coupled PDE-ODE model, based on the Lighthill-Whitham-Richards (LWR) model, to represent the influence of the CAV. Control of the CAV is normally achieved by designing the control on the ODE which models the speed of the moving bottleneck. The proposed method in this paper instead looks to reduce the computational burden of controlling multiple CAVs by designing the moving bottleneck controller first on the PDE. The original control designed on the PDE is a linear quadratic regulator (LQR) that determines the optimal variable speed limit (VSL) for the entire length of freeway in order to regulate density to a desired setpoint. Then, a continuification approach is utilized to determine the input speed for each CAV. Results show that multiple CAVs can be controlled via this method, with minimal computational burden, and that as the number of CAVs increases the solution approaches the global optimal solution determined by the LQR.


[77] 2606.09557

Your U-Net Dereverberation Model is Secretly an RIR Encoder

In this work, we analyze the ability of NCSN++ U-Net based audio dereverberation models to capture global room characteristics in their intermediate representations. Through an empirical study of both a state-of-the-art diffusion-based model and a discriminative counterpart, we show that deeper layers encode structured room impulse response (RIR)-dependent embeddings. Moreover, the discriminative ability of this implicit room representation correlates with dereverberation performance across objective metrics. Motivated by this observation, we propose a training strategy that explicitly conditions the network on pre-trained RIR embeddings, obtained via self-supervised contrastive learning. Incorporating RIR conditioning improves representation quality, accelerates convergence, and enhances dereverberation performance, while significantly reducing the number of reverse diffusion steps required by the diffusion-based model during inference.


[78] 2606.09573

Bernoulli Filtering for Multi-Sensor Tracking with Thresholded Measurements

Target tracking is challenging when sensor detection thresholds cause state-dependent missed detections, particularly in multi-sensor scenarios with clutter and uncertain target existence. A recently developed missed detection framework models detection probability as a function of target state, sensor characteristics, and detection threshold, but it is limited to individual measurements and does not address the recursive tracking problem. This work extends the framework using a Bernoulli filter formulation to jointly handle recursive target tracking, clutter, and target existence uncertainty. A Bernoulli particle filter is evaluated in a simulated 2D multi-sensor tracking scenario with nonlinear measurements, clutter, and detection uncertainty. Incorporating accurate detection threshold knowledge reduces the generalized optimal subpattern assignment (GOSPA) metric by 62.4% compared to a conventional Bernoulli filter with fixed detection probability, while better balancing missed detections and false alarms.


[79] 2606.09652

Throughput Analysis for Near-Field Mobile Communications: Beamfocusing or Caustic Beamforming?

The migration to the Terahertz (THz) band and the deployment of extremely large antenna arrays (ELAAs) are transitioning wireless communications into the radiative near-field regime, fundamentally evolving conventional angular beam steering to beamfocusing (BF). However, the combination of the extremely narrow beamwidth and the mobility of the users necessitates frequent beamfocusing reconfigurations, incurring a significant switching overhead that degrades the system achievable throughput. In this regard, caustic beamforming (CB) is a promising alternative based on the synthesis of a continuous curved beam, which eliminates the need for beam tracking at the expense of a distributed beamforming gain. By leveraging the Airy beam as a canonical model, this paper develops an analytical framework to compare the throughputs achieved by CB and BF. Our main results include closed-form throughput expressions for both beamforming strategies and a performance boundary for paradigm selection. First, we derive the BF throughput by modeling a defocusing penalty induced by continuous user movement. The optimal beam dwell time that maximizes the throughput is analytically determined, and the impact of user speed and switching overhead on the throughput is quantified. For the CB scheme, we demonstrate that its throughput is determined by the signal-to-noise ratio (SNR) and the geometry of the trajectory of the user, yet invariant to the user speed. Finally, we analytically establish a threshold for the switching overhead to define the crossover point of the achievable throughput of both beamformers. Crucially, this threshold asymptotically vanishes at extremely high frequencies, positioning the continuous CB scheme as the preferred beam design paradigm for high-mobility THz communications.


[80] 2606.09667

Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading

Speech restoration through silent speech interfaces (SSIs) has emerged as a promising assistive technology for individuals with impaired or absent laryngeal voice production. Among non-invasive SSI modalities, surface electromyography (sEMG) and video-based lipreading provide complementary articulatory information, yet their integration for continuous speech synthesis remains underexplored. Moreover, existing multimodal approaches rarely address robustness to modality degradation or temporary sensor failure, limiting their applicability in realistic scenarios. In this work, we propose a masked multimodal speech synthesis framework that jointly leverages sEMG and lipreading signals through modality masking during training. Under multispeaker settings, the proposed approach reduces word error rate by up to 14 absolute percentage points compared to the strongest unimodal baseline. Experimental results not only show that masking strategies are critical for these performance gains and robustness under low-bitrate conditions, but also that they generalize better than degradation-specific data augmentations in the presence of modality absence conditions. Phone-level analyses further reveal complementary contributions across modalities, with particularly strong benefits for vowels and for specific consonant groups. Overall, these findings demonstrate the effectiveness and robustness of masked multimodal integration for silent speech synthesis, although adaptation to laryngectomized speakers remains an open research challenge.


[81] 2606.09677

MeCo: One-Step MeanFlow-based Corrector for Multi-Channel Speech Separation

While discriminative models for multi-channel speech separation excel in reference-based metrics, they often exhibit suboptimal human listening quality. To address this, we propose a novel MeanFlow-based one-step generative corrector (MeCo). MeCo learns a conditional average velocity field to map discriminative estimates directly onto the clean speech manifold in a single step. To maximize one-step generation performance, we introduce Data-Space Optimization (DSO). DSO integrates an $\mathbf{x}_r$-loss, which penalizes prediction errors on longer displacement intervals to serve as a generative objective for human listening quality, with an Endpoint SI-SDR loss that directly optimizes terminal signal fidelity. Experiments demonstrate that MeCo achieves state-of-the-art (SOTA) performance with minimal computational overhead, simultaneously achieving superior signal fidelity and human listening quality in both in-domain and out-of-domain scenarios.


[82] 2606.09753

Jamming-Resilient Sparse Delay-Doppler NOMA: Unitary Precoding, Randomized Active Sets, and Superincreasing Power Allocation

We propose a sparse delay-Doppler NOMA scheme resilient to intentional jamming. The transmitter places user data on a small random subset of delay-Doppler bins, spreads the result through a unitary precoder, and re-draws the active subset per frame from a pseudo-random seed shared with the receiver. The receiver detects and discards jammed bins, recovers the sparse signal by least squares, and decodes per bin via SIC. Hadamard, DFT, and Haar-random precoders all yield essentially the same BER, because a Marchenko-Pastur conditioning argument controls any random unitary submatrix. The closed-form BER has no jammer-induced floor, unlike the well-known partial-band floor of conventional OTFS-NOMA. The same argument shows that compromising the shared seed does not break the system: random unitary submatrices remain well-conditioned, so BER stays within the unjammed envelope. For more than two users we use a superincreasing power allocation (Merkle-Hellman knapsack) and prove the resulting low-complexity SIC matches maximum-likelihood detection exactly, removing the usual SIC propagation ceiling. For more than four users we partition them into pairs assigned to disjoint bin subsets; this OMA-friendly NOMA rule reaches floor BER at eight users by SNR around 20 dB. We extend the framework to Rician fading and show the jammer-independence property holds for arbitrary Rician K-factor. Monte Carlo simulations track the analytical predictions within 3 dB and show at least a 40 dB BER-ratio improvement against pattern-aware jammers, with about 24 dB of cumulative gain over conventional OTFS-NOMA under oracle jamming.


[83] 2606.09829

Adaptive Derivative Estimation via Stein's Unbiased Risk

Estimating derivatives from noisy sampled data is fundamental to control, human--computer interaction, and biomedical engineering. Causal FIR derivative filters offer a natural approach for this challenge, yet their performance depends on their length. While short filters amplify noise, long filters introduce smoothing bias. We present SURDE (SURE Derivative Estimator), which addresses this tradeoff at each time step by evaluating a data-driven cost derived from Stein's Unbiased Risk Estimator (SURE) across a bank of candidate lengths and soft-combining their outputs via exponential weighting. We prove a minimax-optimal oracle inequality for the soft-combined estimator and use it to derive the optimal weighting temperature in closed form. Thus, the only tuning parameter for SURDE is the noise variance. Via numerical simulations we show that SURDE consistently outperforms alternative adaptive methods (the Intersection of Confidence Intervals (ICI) rule and the Adaptive Windowing Velocity Estimator (AWVE)) for first-derivative estimation. We further show that SURDE is robust to noise-variance misspecification (9\% degradation over a $4\times$ range), and that it also outperforms ICI and AWVE on real data (the EuRoC MAV dataset). SURDE is causal, computationally light, and requires only a rough estimate of the noise variance.
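
The soft-combination step mentioned in this abstract can be illustrated with generic exponential weighting. The sketch below is an assumption-laden illustration rather than the SURDE algorithm itself: the per-candidate costs are taken as given inputs (the abstract derives them from SURE, which is not reproduced here), the `temperature` is a free parameter rather than the closed-form value from the paper, and `soft_combine` is a hypothetical name.

```python
import numpy as np

def soft_combine(estimates, costs, temperature):
    """Combine candidate estimates with weights proportional to exp(-cost / temperature).

    estimates:   one derivative estimate per candidate filter length
    costs:       one estimated risk per candidate (assumed given, e.g. from SURE)
    temperature: positive scalar; small values approach hard selection of the
                 lowest-cost candidate, large values approach a plain average
    """
    estimates = np.asarray(estimates, dtype=float)
    costs = np.asarray(costs, dtype=float)
    # Subtract the minimum cost before exponentiating for numerical stability.
    logits = -(costs - costs.min()) / temperature
    weights = np.exp(logits)
    weights /= weights.sum()
    return float(weights @ estimates), weights

if __name__ == "__main__":
    candidates = [1.0, 2.0, 3.0]   # hypothetical outputs of three filter lengths
    risks = [5.0, 1.0, 9.0]        # hypothetical risk estimates
    for temp in (0.1, 1.0, 10.0):
        print(temp, soft_combine(candidates, risks, temp))
```

The temperature controls how sharply the combination concentrates on the candidate with the lowest estimated risk, which is why a principled choice of it matters.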


[84] 2606.07577

OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs

Audio-visual large language models (LLMs) hold strong promise for long-form video understanding, yet their long-video inference is fundamentally limited by the linear growth of video tokens and key-value (KV) caches. We present OmniMem, a memory-efficient streaming framework designed specifically for audio-visual LLMs. Unlike existing compression methods that treat all tokens uniformly, OmniMem introduces a modality-aware memory allocation strategy that separately manages visual and audio contexts, addressing the severe token imbalance between the two modalities. OmniMem further preserves informative and non-redundant KV states through perturbation-aware memory selection, enabling compact memory without sacrificing long-range understanding. To strengthen compression under realistic deployment constraints, we also explore budget-aware fine-tuning, which encourages the model to consolidate useful information into retained memory. Experiments on VideoMME Long, LVBench, and LVOmniBench with video-SALMONN 2+ and Qwen-2.5-Omni show that OmniMem consistently improves over strong training-free compression baselines by 2-4% absolute accuracy under the same memory budgets, with an additional 1-2% gain after fine-tuning.


[85] 2606.07643

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs

Recent advances in Omni-Multimodal Large Language Models (Omni-MLLMs) have enabled strong integration of vision, audio, and language. However, their audio-visual intelligence (AVI) remains insufficiently evaluated due to the lack of systematic and comprehensive benchmarks. We introduce AVI-Bench, a cognitively inspired benchmark that evaluates Omni-MLLMs across three stages, perception, understanding, and reasoning, through cross-modal tasks requiring joint audio-visual interpretation. This design enables fine-grained diagnosis of model capabilities and failure modes. To further assess robustness beyond familiar domains, we propose AVI-Bench-PriSe, an extension that probes models' primitive audio-visual sensation using unfamiliar, low-semantic stimuli, testing generalization beyond common training distributions. Extensive experiments on both open-source and closed-source models reveal substantial limitations in current Omni-MLLMs. Based on these findings, we present a four-level AVI taxonomy. Overall, AVI-Bench provides a principled evaluation framework to guide the development of more robust and generalizable AVI. Project website: this https URL


[86] 2606.07659

Real-Time Industrial Defect Detection on Edge Hardware Using Fine-Tuned YOLOv8: A Systematic Benchmark on the NEU Surface Defect Database and MVTec AD with Automotive & Battery Manufacturing Extensions

Automated surface defect detection is critical for ensuring rigorous quality control in high-speed manufacturing environments. While deep learning models offer remarkable accuracy, deploying them on resource-constrained edge hardware without introducing significant latency remains a persistent challenge. This paper presents Industrial-YOLO, an edge-optimized framework built upon a fine-tuned YOLOv8 architecture specifically engineered for real-time industrial defect detection. We conduct a systematic benchmark utilizing the NEU surface defect database for steel sheets and the MVTec AD dataset, supplemented with custom automotive manufacturing extensions representing real-world structural anomalies (scratches, pits, and inclusions). To bridge the gap between algorithmic complexity and edge hardware constraints, target-specific optimizations are introduced via TensorRT and OpenVINO acceleration engines. Experimental results demonstrate that Industrial-YOLO achieves a high-velocity inference speed exceeding 120 FPS on the NVIDIA Jetson Orin platform while maintaining an exceptional mean Average Precision (mAP) of 98.5%. The proposed framework showcases highly robust, zero-latency performance when deployed directly onto an active automotive assembly line, offering a scalable blueprint for next-generation automated optical inspection (AOI) systems.


[87] 2606.07683

Review the Code, Not the Story: A Vision and Protocol for Code-First Peer Review

Peer review in computational fields remains centered on author-written manuscripts, even though the decisive evidence for many claims resides in executable code, data, configurations, and experiment pipelines. This manuscript-first workflow gives authors substantial control over narrative framing while leaving reviewers with limited time to inspect implementation details, reproduce results, or detect unsupported claims. This vision and protocol paper proposes code-first peer review: authors submit executable research artifacts and minimal claim manifests; a venue-controlled AI system builds the environment, executes experiments, audits code paths, maps claims to evidence, and generates a standardized Review Package for human reviewers. The goal is not to replace reviewers or to give authors an automatic writing assistant. Instead, AI serves as review infrastructure that shifts the target of peer review from polished narratives to executable evidence. We formalize a claim-evidence contract, define the Generated Review View and Review Package abstractions, give a worked example, outline a system architecture, and analyze evaluation and governance challenges including AI bias, prompt injection, model instability, auditability, and author appeal.


[88] 2606.07873

Adverse Effects of V2V Adoption on Road Safety

Vehicle-to-vehicle (V2V) communication is expected to improve road safety and reduce congestion. However, prior work shows that V2V information sharing under partial adoption may increase congestion and decrease safety. We study whether increasing V2V adoption itself affects road safety. We propose a corrected version of an existing model and analyze its behavior under varying adoption levels. We show that, in some cases, increased V2V adoption can increase accident probability. Moreover, under an optimal signaling policy, the system can ensure that accident probability is non-increasing in the adoption level.


[89] 2606.07932

LEGS: Laplacian-Enhanced Gaussian Splatting with a Nonlinear Weighted Loss

3D Gaussian Splatting (3DGS) has become an efficient explicit representation for radiance field reconstruction and real-time novel view synthesis. However, its standard photometric loss treats flat and structure-rich regions similarly, which may limit the recovery of sharp contours and fine details. Edge-Guided Gaussian Splatting (EGGS) improves structure awareness through edge-guided weighting, but mainly relies on first-order gradient responses and linear weighting. In this paper, we propose LEGS, a Laplacian-Enhanced Gaussian Splatting method with a nonlinearly weighted loss. LEGS replaces first-order gradient guidance with second-order Laplacian structural guidance and maps the normalized Laplacian response into pixel-wise weights through nonlinear response-to-weight functions. The proposed loss improves structure-aware Gaussian optimization while keeping the original 3DGS rendering pipeline unchanged. Experiments on the full Tanks\&Temples and Mip-NeRF360 datasets show that LEGS improves peak signal-to-noise ratio (PSNR) by up to 1.68 dB over 3DGS and up to 0.52 dB over EGGS. Incorporating the proposed second-order nonlinear weighting strategy into FastGS and FasterGS further improves PSNR by up to 1.69 dB, demonstrating its effectiveness as a general loss-level extension for Gaussian Splatting pipelines with potential applications in AR/VR, immersive visualization, and real-time 3D content generation.
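
The idea of mapping a normalized Laplacian response into pixel-wise loss weights can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the LEGS implementation: the 4-neighbour Laplacian stencil, the power-law mapping `1 + alpha * r**gamma`, the parameters `alpha` and `gamma`, and the weighted L1 form are all hypothetical choices, since the abstract does not specify the response-to-weight functions.

```python
import numpy as np

def laplacian_response(image):
    """Absolute 4-neighbour discrete Laplacian of a 2-D grayscale image."""
    p = np.pad(np.asarray(image, dtype=float), 1, mode="edge")
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return np.abs(lap)

def laplacian_weights(image, alpha=2.0, gamma=0.5, eps=1e-8):
    """Map the normalized Laplacian response to weights >= 1 (assumed mapping)."""
    resp = laplacian_response(image)
    resp = resp / (resp.max() + eps)      # normalize to [0, 1]
    return 1.0 + alpha * resp ** gamma    # nonlinear response-to-weight function

def weighted_l1(rendered, target, alpha=2.0, gamma=0.5):
    """Weighted L1 photometric loss with weights taken from the target image."""
    w = laplacian_weights(target, alpha, gamma)
    diff = np.abs(np.asarray(rendered, float) - np.asarray(target, float))
    return float((w * diff).sum() / w.sum())

if __name__ == "__main__":
    target = np.zeros((4, 6))
    target[:, 3:] = 1.0                   # vertical step edge
    print(laplacian_weights(target))
    print(weighted_l1(target + 0.1, target))
```

On a flat target the weights are all one and the loss reduces to a plain mean absolute error, so a weighting of this kind only changes the objective where second-order structure is present.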


[90] 2606.07938

DAL-PCQA: Enabling Distortion-Level and Language-Driven Reasoning for Point Cloud Quality Assessment

Point Cloud Quality Assessment (PCQA) methods typically predict scalar Mean Opinion Scores (MOS), which quantify overall perceptual degradation but do not reveal its causes. In contrast, human observers naturally reason in terms of specific distortions such as blur, color shifts, point density changes, missing regions, and geometric deformations. To close this gap, we introduce DAL-PCQA, a distortion-aware, language-annotated dataset for PCQA. DAL-PCQA augments benchmark point clouds with multi-level distortion severity labels, discrete quality categories, and structured natural language descriptions aligned with human perception. We define a point-cloud-specific distortion taxonomy that covers both photometric and geometric artifacts. Statistical analysis reveals characteristic degradation patterns across distortion types and quality levels. To assess the utility of these annotations, we compare zero-shot and fine-tuned multimodal models for generating perceptual quality descriptions. Experiments show that distortion-aware supervision substantially improves lexical and semantic alignment with ground-truth descriptions. By enabling interpretable, distortion-level reasoning, DAL-PCQA facilitates language-driven, explainable point cloud quality assessment. The dataset is publicly available at this https URL.


[91] 2606.07949

Feasibility to detect rapid change and disappearance of seagrass: Lessons from nearly 80 years of vegetation change in the Ako, Seto Inland Sea, Japan

This study analyses the Ako tidal flat in the Seto Inland Sea, Japan, where nearly all Zostera marina disappeared within a single year in 2025. Using aerial photographs from the 1940s onward, high-resolution satellite imagery, GRUS images (2.5-5 m), and monthly Sentinel-2 composites (10 m), we reconstructed approximately 80 years of seagrass distribution. YOLO-based segmentation using deep learning achieved high accuracy (overall accuracy >= 0.9) across these datasets; although species could not be discriminated, the models captured the major temporal dynamics in vegetation area. The long-term mean seagrass area was 6.8 ha, but values fluctuated widely, ranging from 3.5 ha in 1974 to 41.3 ha in 1989, apart from the 0.2 ha recorded in 2025. Sentinel-2 composites from 2019 to 2026 revealed clear seasonality, with vegetation increasing in early summer and declining from autumn. In 2025, however, the area decreased sharply after summer and remained anomalously low throughout the winter of 2025-2026. Our results indicate that the 2025 event was not a normal fluctuation but a rapid ecosystem shift involving the loss of the dominant canopy-forming species, most plausibly driven by regionally elevated summer water temperatures. The findings also have implications for seagrass Essential Ocean Variables (EOVs) and the State of Nature (SoN) metrics used in TNFD-aligned nature-related disclosures. Unlike forests, seagrass meadows require finer temporal resolution because both pronounced seasonality and abrupt collapse strongly influence area-based indicators. Therefore, in addition to previously noted issues such as species-level classification accuracy, we recommend that (1) baselines be defined over the longest available record and justified ecologically, (2) seasonal standardization be applied before inter-annual comparisons, and (3) years with extreme area anomalies be flagged rather than used as reference points.


[92] 2606.08094

vla.cpp: A Unified Inference Runtime for Vision-Language-Action Models

Vision-Language-Action (VLA) policies are typically shipped as Python/PyTorch stacks that assume a workstation-class GPU, a mismatch for the hardware on which robots actually run. We present vla.cpp, a portable C++ inference runtime built on this http URL. To our knowledge, it is the first ggml-class engine to natively serve the flow-matching and diffusion VLA inference pattern, in which a cached vision-language prefix is consumed by a cross-attending action expert integrated over several solver steps. A single runtime serves seven architectures spanning five backbone and four action-head families behind one request/response protocol, with each model packaged as a self-contained bundle. On LIBERO-Object, the engine matches a state-of-the-art checkpoint to within one episode out of 200, and runs BitVLA at 100% success in 1.3 GiB of memory. The same bundle runs unchanged across three hardware tiers, from a consumer GPU down to an 8 GB embedded module. A cross-hardware roofline analysis shows that batch-1 VLA inference is compute-bound, so utilization rather than bandwidth is the deployment lever; an IMMA ladder GEMM derived from this analysis cuts BitVLA per-step latency by 4.5x. We then frame an on-robot stress test on an ALOHA arm that isolates the latency constraint under which a learned VLA must replan against a moving target on the hardware it was trained for. Code, demo videos, and the reproducible benchmark scaffold are available at this https URL.


[93] 2606.08281

Impedance MPC for Physical Human-Robot Interaction: Predictive Disturbance Rejection with Joint-Limit Safety

Physical human-robot interaction (pHRI) demands simultaneous trajectory accuracy and compliant safety under unplanned contact. Classical impedance control incurs a nonzero steady-state position error under sustained human force -- the applied force divided by the task stiffness -- which integral action reduces only within a narrow stable-gain budget. We present a two-layer Impedance MPC that resolves this tension. Layer~1 analytically cancels gravity, Coriolis, and task-space inertia, reducing the residual plant to a configuration-independent double integrator with a constant state-transition matrix. Layer~2 solves a 30-variable convex QP at 100\,Hz, exploiting this constant structure so the free-response matrix is precomputed once; an augmented Kalman filter estimates the persistent disturbance state, giving a formal zero-steady-state-error guarantee. A null-space inverse-barrier potential and a task-space workspace projection enforce joint-limit safety across the tested workspace. On a 7-DOF Franka FR3, Impedance MPC with Kalman augmentation attains sub-0.05\,mm steady-state error versus 44.8\,mm for classical impedance (a $>$800-fold reduction) under a sustained 15\,N force, sub-millimeter tracking on four 3-D circles, and graceful robustness to measurement noise and inertial mismatch up to 30\%.
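
The steady-state relation quoted in this abstract (position error equals applied force divided by task stiffness) can be checked with a short back-of-envelope calculation. The sketch below is illustrative only: the task stiffness is not stated in the abstract, so the value computed here is merely what the reported 44.8 mm error under 15 N would imply if that error were exactly the force-over-stiffness term, and the function names are hypothetical.

```python
def steady_state_error_m(force_n, stiffness_n_per_m):
    """Steady-state position error of classical impedance control: e = F / K."""
    return force_n / stiffness_n_per_m

def implied_stiffness_n_per_m(force_n, error_m):
    """Stiffness implied by a measured steady-state error (assumes e = F / K exactly)."""
    return force_n / error_m

force = 15.0               # sustained force from the abstract, in newtons
classical_error = 44.8e-3  # reported classical-impedance error, in metres
mpc_error_bound = 0.05e-3  # reported upper bound for Impedance MPC, in metres

# Assumption: the 44.8 mm figure is purely the F / K offset.
stiffness = implied_stiffness_n_per_m(force, classical_error)
reduction = classical_error / mpc_error_bound

if __name__ == "__main__":
    print(f"implied stiffness: {stiffness:.1f} N/m")
    print(f"error reduction factor: at least {reduction:.0f}")
```

Under that assumption the implied stiffness is roughly 335 N/m, and the ratio of the two reported errors is consistent with the stated reduction of more than 800-fold.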


[94] 2606.08425

TinyGiantALM: A Compact Audio-Language Model for Intent-Aware Reasoning under Resource Constraints

Current advancements in Audio Reasoning rely on massive Large Audio-Language Models (LALMs), hindering deployment in resource-constrained environments. We introduce TinyGiantALM, a compact 1.5B efficiency-oriented alternative. Instead of brute-force scaling, we propose an Instruction-Aware Feature Refinement framework using a Query-guided Projector and Semantic Gating to filter acoustic signals based on user intent. On the MMAR benchmark, TinyGiantALM achieves 46.4% zero-shot accuracy, significantly outperforming 7B-13B baselines. While a reasoning gap in logical narrative remains versus 30B+ models and certain trade-offs exist in overly dense or spatial scenes, our approach notably surpasses models up to 8x larger in disentangling mixed-modality environments. These findings demonstrate that architectural precision offers a tangible pathway to secure robust perception capabilities on edge-friendly scales.


[95] 2606.08463

Simplest Nontrivial Maxwellian Random Field Models for Stochastic LoS MIMO Using the Dyadic Green's Function

This letter introduces a novel, full-wave, physics-compliant stochastic dyadic Green's function (SDGF) framework for modeling electromagnetic (EM) multiple-input-multiple-output (MIMO) channels under wavenumber uncertainty. Unlike conventional phenomenological fading models, the proposed approach provides what appear to be the simplest exact random field models of electromagnetic line-of-sight (LoS) propagation that are also exact solutions of Maxwell's equations. Hence, we dub them Maxwellian random field theoretic models. These physically consistent stochastic models, including an analytically tractable wavenumber Gaussian model and a more general stochastic plane wave (SPW) model, serve as fundamental baseline models for stochastic LoS channel characterization. By preserving the vectorial structure of Maxwell's equations and the dispersion relation, the framework naturally incorporates both propagating and evanescent modes. Our analysis of ergodic capacity and degrees of freedom (DoF) reveals that the key results of the complex SPW model can be reproduced by the simpler Gaussian model with limited variance. Furthermore, we provide examples using 2D continuous MIMO systems, illustrating how the model's Maxwell-consistent stochasticity explains observed increases in channel capacity and DoF over the deterministic MIMO capacity baseline. These idealized Maxwellian random field theoretic models offer a physically grounded reference point for understanding fundamental limits in stochastic LoS propagation environments.


[96] 2606.08513

Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

Autonomous Underwater Vehicles (AUVs) traditionally rely on complex, heavily engineered pipelines for perception, path planning, and motion control. This paper explores the feasibility of an end-to-end Deep Reinforcement Learning (DRL) approach that maps raw sensor data directly to thruster commands, reducing manual engineering. We propose a hierarchical reinforcement learning (HRL) architecture splitting the problem into two Markov Decision Processes. A High-Level (HL) policy operating at 2\,Hz processes raw $84 \times 84$ pixel monocular camera frames, stacked $100 \times 100$ pixel forward-looking imaging sonar, and proprioceptive data to generate spatial subgoals. Simultaneously, a Low-Level (LL) policy operating at 10\,Hz converts these subgoals into thruster commands. The HL policy is trained using Reinforcement Learning from Prior Demonstrations (RLPD) within a modified Sample-Efficient Robotic Reinforcement Learning (SERL) framework, while the LL policy utilizes Soft Actor-Critic (SAC) combined with Hindsight Experience Replay (HER). Evaluated in the high-fidelity HoloOcean simulator, our method demonstrates successful obstacle avoidance, achieving trajectory lengths closely approximating (within 4% to 6% of) an $\text{RRT}^*$ planning baseline. Furthermore, the learned policy exhibits strong robustness to simulated sensor noise and decreased visibility. While the system navigates familiar geometries effectively, experiments reveal generalization limitations when encountering unvisited areas with novel obstacle shapes. Ultimately, this work demonstrates the promise of sample-efficient, end-to-end DRL for underwater navigation using minimal computational hardware.
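
The two-rate hierarchy described above (a high-level policy at 2 Hz feeding subgoals to a low-level policy at 10 Hz) can be sketched as a single control loop. This is a minimal illustration, not the authors' implementation: `run_episode`, `hl_policy`, `ll_policy`, `observe`, and `act` are hypothetical names, and the policies here are trivial placeholders rather than trained networks.

```python
HL_RATE_HZ = 2    # high-level policy rate from the abstract
LL_RATE_HZ = 10   # low-level policy rate from the abstract
LL_STEPS_PER_SUBGOAL = LL_RATE_HZ // HL_RATE_HZ  # 5 low-level steps per subgoal


def run_episode(hl_policy, ll_policy, observe, act, seconds):
    """Step the loop at the low-level rate, refreshing the subgoal every 5th tick."""
    subgoal = None
    log = []
    for tick in range(seconds * LL_RATE_HZ):
        obs = observe(tick)
        if tick % LL_STEPS_PER_SUBGOAL == 0:
            subgoal = hl_policy(obs)        # hypothetical: sensors -> spatial subgoal
        command = ll_policy(obs, subgoal)   # hypothetical: subgoal -> thruster command
        act(command)
        log.append((tick, subgoal, command))
    return log


# Placeholder policies, used only to exercise the scheduling logic.
hl_calls = []
log = run_episode(
    hl_policy=lambda obs: hl_calls.append(obs) or ("subgoal", obs),
    ll_policy=lambda obs, subgoal: ("thrust", subgoal[1]),
    observe=lambda tick: tick,
    act=lambda command: None,
    seconds=2,
)
```

Holding the subgoal fixed between high-level updates is one simple design choice; the abstract does not say how the two policies are synchronized in practice.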


[97] 2606.08524

Acoustic disguising: a unified framework for cloaking and holography

Cloaking and holography -- usually treated as distinct problems -- are two limits of a single operation that we call acoustic disguising, realized here using immersive boundary conditions on a closed surface. Driving the boundary with homogeneous Green's functions suppresses any incident field inside the enclosed volume and cloaks unknown objects broadband; driving it with scattering Green's functions synthesizes a holographic scatterer indistinguishable from a target for arbitrary illuminations. Combining the two, using heterogeneous Green's functions, replaces the scattering signature of one object with that of another, transforming its acoustic identity. We demonstrate the framework in three-dimensional FDTD simulations driven by impulsive Green's functions, complemented by data-driven Green's-function retrieval, establishing a direct route to real-time 3D acoustic cloaking, holography, cloning, and disguising.


[98] 2606.08583

A spectral audit framework reveals task-dependent aperiodic reliance across EEG and ECG deep learning

Deep learning on physiological time series is interpreted through domain-specific features -- oscillatory rhythms in EEG, morphological complexes in ECG -- yet these signals sit atop a broadband aperiodic 1/f-like envelope that covaries with arousal, age, and pathology. We introduce a spectral audit framework combining aperiodic/periodic decomposition, phase-preserving Fourier interventions, sham controls, and simulation validation. Aperiodic reliance was task-dependent and architecture-general: across six neural architectures, flattening drops exceeded 0.42 balanced-accuracy points for sleep-wake classification, reached 0.07-0.13 for clinical abnormality detection, and remained minimal for motor imagery. Six of seven EEG foundation models showed FDR-significant aperiodic reliance on clinical EEG; age/sex and recording-era controls reduced but did not eliminate the effect. Applying the audit to PTB-XL ECG revealed neural drops of 0.32--0.36 persisting after demographic matching, confirming this confound class extends beyond EEG. Aperiodic controls should become standard for interpretable physiological time-series deep learning.


[99] 2606.08594

How Much Capacity Does EEG Denoising Need? Ultra-Compact Networks reveal Benchmark Saturation and Metric-Utility Gap

Deep learning EEG denoising architectures have scaled from tens of thousands to tens of millions of parameters, yet no prior study has isolated model capacity as the experimental variable or tested whether reconstruction metrics predict downstream neural-signal utility. We address both gaps by fixing architecture, loss, data split, and training recipe while sweeping only channel width from 1.05K to 40.26K parameters in a minimal depthwise-separable convolutional U-Net. Models were evaluated on the EEGDenoiseNet benchmark, cross-dataset BCI transfer tests, controlled baseline retraining, and downstream motor-imagery classification with five decoder families across all nine BCI Competition IV-2a subjects. Reconstruction performance saturated by 3-6.5K parameters, with post-elbow gains of at most 0.015 correlation coefficient per log10-parameter unit. An 8.46M-parameter baseline retrained under the same pipeline matched the 40.26K compact variant on EOG--a 200x parameter gap yielding no advantage--while a Patch-Transformer control reproduced the same diminishing-return shape. Downstream evaluation exposed a classifier-dependent metric-utility gap: reconstruction-optimized denoising significantly degraded CSP+LDA classification across all nine subjects and three artifact types (best denoised accuracy 0.547 vs. 0.612 noisy baseline; Bonferroni p=0.0488), persisting on naturally recorded trials (Delta=-0.047; BH-FDR q=0.0049). End-to-end neural decoders showed variable or neutral effects. Standard EEG denoising benchmarks are saturated far below current model capacity, and reconstruction metrics do not predict BCI utility. Ultra-compact models at 33-46 KB and 1.27-2.61M FLOPs/segment are practical for edge deployment. These findings argue for capacity-controlled evaluation, harder task-aware benchmarks, and mandatory downstream validation.


[100] 2606.08614

Acoustic Cloning

Cloning refers to producing identical copies of existing objects. Here, we experimentally show how to clone acoustic scattering objects. We acquire a digital twin and bring it back to life in a simple two-step process. First, we use broadband speakers to illuminate the scattering object within a closed receiver aperture. From these recorded reverberant data, we retrieve the object's scattering Green's functions using multidimensional deconvolution. In the second step, the acoustic scatterer is holographically reconstructed using the acquired scattering Green's functions. The hologram scatters any wavefield in real time exactly like the original object would. Low-latency feedback reproduces all orders of interactions between the physical wavefield and the numerically defined hologram. This two-step process is demonstrated by cloning and modifying several rigid scatterers in a two-dimensional acoustic waveguide. Applications range from fully realistic digital scattering models to efficient metamaterial experimentation.


[101] 2606.08618

Coil-Integrated Alignment Sensor for Real-Time Feedback of Coil-Scalp Contact Point and Angle During Transcranial Magnetic Stimulation (TMS)

Whereas coil positioning in transcranial magnetic stimulation (TMS) to reach a specific cortical target with modern focal stimulation coils has been intensively studied, the alignment and contact of the coil with the head are often ignored. Focal figure-of-eight coils have a point on their surface where they generate the largest induced electric field. This point should touch the head first, and the coil should be approximately tangential to the head at this point. Previous research has demonstrated the large impact of the coil not touching the head at the right point, and that many operators struggle with establishing or maintaining the correct coil-scalp alignment. This paper presents a support technology that monitors the exact position of the contact point as well as the contact pressure to provide feedback to users. As the system exclusively uses components from consumer electronics, the sensor is low-cost. Through proper design, we achieved sufficient robustness so that the sensor neither resets during TMS pulses nor shows any detectable degradation.


[102] 2606.08645

Nonlocal Teams and Information Structures

We look at Bell inequalities through the lens of information structures in stochastic teams. We consider the usual CHSH game and a dynamic variant of it to study how various classes of strategies (classical, projective, and quantum) behave under team-theoretic solution concepts. We find that projective strategies (where each player performs projective measurements) enjoy important properties in the usual CHSH game, but these properties do not carry over to its dynamic version. These results shed light on the delicate interplay of information structure in quantum strategies and the fragility of some well-known ideas under changes of information structure.


[103] 2606.08663

Probing Token Spaces under Generator Shift in AI-Generated Music Detection

AI-generated music detectors can appear robust on standard benchmark splits, yet deployment requires transfer to generator sources absent during training. We study this problem with source-restricted evaluation on \textsc{MoM-open}, an open reconstruction of MoM-CLAM that replaces the non-redistributable real corpus with FMA and MTG-Jamendo while preserving the fake-generator protocol. To isolate the role of representation, we introduce \textsc{CoMoE}, a compact fixed classifier for comparing heterogeneous audio token spaces while keeping the downstream architecture and training recipe unchanged. Experiments show that standard and real-source-restricted splits are nearly saturated, whereas fake-source restriction exposes large differences between token spaces: X-Codec tokens are strongest when training on Udio alone, while MERT-derived tokens are stronger when training on Suno-v3.5 alone. These results suggest that codec-style discrete token spaces should be treated as a primary experimental axis under generator shift in AI-generated music detection. Our code and data are available at this https URL.


[104] 2606.08725

Real-Time and Accurate Collision-Free Teleoperation via Differentiable Constraint-Based Trajectory Planning

In teleoperation, the human operator typically controls only the end-effector pose, which often leads to self-collisions of the manipulator and collisions with environmental obstacles, since joints and links are not controlled individually. A common strategy to mitigate this issue is to enhance the operator's input using optimal-control-based trajectory planning. As derivative-based solvers require differentiable constraints, existing approaches either approximate robots and obstacles with spheres, reducing geometric accuracy, or approximate derivatives, degrading convergence and increasing computation times. We address these limitations by adapting a recent formulation of differentiable collision-avoidance constraints, based on duality in convex optimization, to the teleoperation setting. The robot is approximated with capsules and the environment with polytopes. We compare the resulting trajectory planning method against state-of-the-art techniques in simulation with varying numbers of obstacles and evaluate it on a UR5e manipulator in a real-world teleoperation test. Results show that our approach achieves lower computation times while enabling more accurate obstacle modeling, leading to smoother and collision-free end-effector teleoperation.


[105] 2606.08856

Optimal Control and Dissipativity of Linear Hermitian Matrix-Valued Dynamical Systems

We develop a unified framework for linear-cost optimal control, finite-time optimal steering, dissipativity analysis, and zero-sum differential games for linear impulsive systems whose state is a Hermitian matrix evolving in $\mathbb{H}^{n+m}_{\succeq0}$, a class that encompasses continuous- and discrete-time linear systems and switched systems as degenerate cases, and includes the second-order moment dynamics of linear (stochastic) hybrid systems. The entire theory rests on three tools: a single \emph{key identity} relating cost, trajectory, and a dual variable, an Extended Schur complement lemma, and a Schur inner-product decomposition, applied identically to the flow integral and to each jump. These yield structurally uniform sufficient and necessary conditions, dual linear matrix inequality (LMI) characterizations, and explicit optimal policies for every problem class, on both finite and infinite horizons under time-varying assumptions (without time invariance or periodicity), together with causal dwell-time policies for the problems that admit them.


[106] 2606.08984

Not All Warm Starts Help: Benchmarking Primal-Dual Initializations for ACOPF Algorithms

Warm starts are widely used to accelerate AC optimal power flow (ACOPF) solves, but the impact of different initialization strategies has received limited systematic study, particularly for the primal-dual interior-point methods that dominate large-scale ACOPF algorithms. This paper benchmarks initialization strategies for ACOPF solved with the interior-point solver IPOPT on 19 PGLib-OPF instances (5 to 30,000 buses), testing all 15 non-empty subsets of the primal blocks $\{P_g, Q_g, V_m, V_a\}$ under oracle conditions and three DC-seeded combinations in a practical setting. The experiments show that most partial primal-plus-dual restarts increase solve time or reduce convergence reliability. Among the oracle primal-plus-dual (O-PD) configurations, only the complete restart reliably converges on every baseline-convergent case, reaching a $47.6\%$ median solve-time speedup. Twelve of the 14 partial O-PD combinations have negative median speedups, and several fail repeatedly on larger networks. Decomposing the dual into constraint and bound multipliers shows that \emph{coverage}, not the presence of duals per se, governs robustness: the full bound-multiplier vector reaches 90.7\% convergence and a $+26.8$\% median speedup, whereas block-matched coverage (oracle multipliers on some bounds, defaults on the rest) drops to 70.4\% and $-31.1$\%. Practical DC seeding sometimes helps the AC solve, but the benefit is no longer statistically significant once the DCOPF presolve cost is included in the end-to-end comparison ($p = 0.4171$). For learned warm-start methods, the results support the following target ordering: predict the full primal vector first; if only partial coverage is possible, prioritize voltage variables; and avoid partial or inconsistent dual predictions unless the primal estimate is nearly complete.
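
The counts quoted above (15 non-empty subsets of the four primal blocks, of which 14 are partial) can be reproduced with a short enumeration. This is only an illustrative sketch: the string labels and the helper name `nonempty_subsets` are placeholders introduced here, not part of the benchmark code.

```python
from itertools import combinations

# Labels for the four primal blocks named in the abstract.
PRIMAL_BLOCKS = ("P_g", "Q_g", "V_m", "V_a")


def nonempty_subsets(blocks):
    """Return every non-empty subset of `blocks`, ordered by increasing size."""
    return [
        subset
        for size in range(1, len(blocks) + 1)
        for subset in combinations(blocks, size)
    ]


subsets = nonempty_subsets(PRIMAL_BLOCKS)
# "Partial" restarts are all subsets except the complete one.
partial = [s for s in subsets if len(s) < len(PRIMAL_BLOCKS)]
print(len(subsets), len(partial))
```

The totals follow from $2^4 - 1 = 15$ non-empty subsets, with the single complete subset excluded to leave 14 partial combinations.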


[107] 2606.08993

LEAF: A Learning-Enabled ADMM Framework for Accelerated Convex Optimization

We propose LEAF, a learning-enabled ADMM framework for accelerated convex optimization. The key idea is to approximate the Moreau envelope of the objective function using an Input Convex Neural Network (ICNN), resulting in a learned model that preserves convexity and smoothness. This leads to the proposed Moreau Envelope Learning ADMM (MEL-ADMM) and its splitting variant sMEL-ADMM. Unlike existing approaches that learn high-dimensional operators directly, LEAF learns a scalar-valued Moreau envelope, significantly reducing model complexity and improving data efficiency. The framework accommodates a broad class of convex problems with smooth and non-smooth objectives. By embedding convexity explicitly through the ICNN architecture, the proposed approach maintains high approximation accuracy while preserving key structural properties of the optimization problem. Both MEL-ADMM and sMEL-ADMM are developed with theoretical guarantees of convergence and feasibility under the learned model. Rigorous analysis shows that the proposed methods achieve convergence rates comparable to classical ADMM while reducing per-iteration computational cost. Numerical experiments demonstrate up to an order-of-magnitude speedup over state-of-the-art solvers while maintaining low optimality gaps.
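
For readers unfamiliar with the object being learned, the standard definition of the Moreau envelope is given below. This is textbook material rather than anything specific to LEAF: the symbols $f$, $\lambda$, and $M_{\lambda f}$ are notation chosen here, with $f$ assumed proper, closed, and convex and $\lambda > 0$.

\[
M_{\lambda f}(x) = \min_{y}\Big\{ f(y) + \tfrac{1}{2\lambda}\,\|x - y\|_2^2 \Big\},
\qquad
\nabla M_{\lambda f}(x) = \tfrac{1}{\lambda}\big(x - \operatorname{prox}_{\lambda f}(x)\big).
\]

The envelope is scalar-valued, convex, and differentiable even when $f$ is non-smooth, and its gradient recovers the proximal operator, which is consistent with the abstract's point about learning a scalar function instead of a high-dimensional operator.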


[108] 2606.09237

Can we stabilize an inverted pendulum with feedback from a time-of-flight camera?

Time-of-flight cameras are popular in robotics for providing direct depth information while being compact, inexpensive, and robust to lighting conditions, but their low spatial resolution and depth noise are widely believed to preclude precise feedback control. In this paper, we show that an inexpensive, low-resolution time-of-flight camera provides sufficient feedback to reliably and precisely balance an inverted pendulum on a cart--a canonical benchmark for fast, unstable dynamics.


[109] 2606.09292

Dual Quaternion-Based Unscented Kalman Filter with Visual Inertial Odometry for Navigation in GPS-Denied Environments

Reliable navigation in GPS-denied environments remains a fundamental challenge in robotics, aerospace, and autonomous vehicle applications. This paper presents a Dual Quaternion-Based Unscented Kalman Filter (DQUKF) equipped with a Visual Inertial Odometry (VIO) algorithm for accurate state estimation, enabling navigation in GPS-denied locations. The proposed framework formulates the DQUKF in error-state form, where the nominal pose is represented by a unit dual quaternion and the local pose error is represented by a 6-dimensional twistor parameterization used for sigma point generation, covariance propagation, and measurement correction. In parallel, the VIO algorithm tracks features across image frames, synchronizes measurements between the IMU and camera, and provides visual constraints that complement inertial propagation. Simulation results on the EuRoC MAV dataset show that the proposed DQUKF converges under high initialization uncertainty and achieves a position RMSE of 0.2584~m in the difficult flight sequence, outperforming the benchmark filters.


[110] 2606.09366

Is Text All You Need? Text as a Universal Information Bottleneck for Speech LLMs

Large language models (LLMs) provide a powerful reasoning backbone for speech understanding, but integrating continuous acoustic signals into a frozen LLM remains challenging. Existing speech-to-LLM interfaces typically operate at two extremes: either enforcing near-discrete token alignment, which benefits transcription but loses paralinguistic information, or learning unconstrained continuous representations, which can drift away from the LLM's input space and degrade autoregressive decoding. In this work, we propose Convex Gate (C-Gate), a speech-to-LLM bridge that constrains all speech representations to lie within the LLM's input embedding manifold with an architectural convex-hull constraint. Concretely, each frame is represented as a convex combination of token embeddings, ensuring compatibility with the pretrained LLM while preserving continuous expressivity. Across automatic speech recognition (ASR) and emotion recognition, C-Gate achieves strong joint performance, improving LibriSpeech WER by up to 48.7% relative while matching or exceeding single-task emotion accuracy. Beyond performance, our analysis reveals a key insight: information is not carried by discrete token identities, but by time-resolved trajectories in the embedding space. Causal interventions confirm that both the trajectory structure and alignment to the pretrained embedding manifold are critical for performance. These results suggest that geometry, rather than token discreteness, is the fundamental design factor in speech-to-LLM interfaces, and provide a controlled regime for studying multimodal integration in frozen LLMs. We release the checkpoint, per-sample outputs, mechanism dumps, and intervention suite for replication.
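
The convex-hull constraint described above, where each speech frame is a convex combination of token embeddings, can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions, not the C-Gate architecture: the abstract does not say how the combination weights are produced, so a softmax over per-frame logits is assumed here, and `convex_gate`, the vocabulary size, and the embedding dimension are all hypothetical.

```python
import numpy as np


def convex_gate(frame_logits, token_embeddings):
    """Map per-frame logits to convex combinations of the token embeddings.

    frame_logits:     (num_frames, vocab_size) unnormalized scores (assumed input)
    token_embeddings: (vocab_size, embed_dim) embedding table, treated as frozen
    """
    # Numerically stable softmax: weights are non-negative and sum to 1 per frame.
    shifted = frame_logits - frame_logits.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of embedding rows, so it lies in their convex hull.
    return weights @ token_embeddings, weights


rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(50, 8))  # toy table: 50 tokens, 8 dimensions
frame_logits = rng.normal(size=(4, 50))      # toy logits for 4 speech frames
frames, weights = convex_gate(frame_logits, token_embeddings)
```

Because the weights are non-negative and sum to one, every coordinate of a frame stays within the range spanned by the token embeddings, while the frame itself can still vary continuously between tokens.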


[111] 2606.09617

Powering the Future of AI: Navigating the Trade-offs for Europe's Energy Transition and Net-Zero Goals

The rapid expansion of AI globally has led to the proliferation of energy-intensive hyperscale data centres (DCs), making them a structurally challenging component in power system planning and operation. Using a spatially explicit optimisation model of Europe across 21 AI growth scenarios, we systematically quantify additional demand, capacity requirements, emissions, and operational impacts of DCs. Results indicate that AI could drive 73-723 TWh of extra demand by 2050, risking cumulative emissions overshoots of 67-181 MtCO2 between 2030 and 2050. Our analysis indicates that after 2030, the geography of AI infrastructure will be shaped more by firm power and system flexibility than by the mere abundance of clean energy. In moderate scenarios, AI requires an additional 200 hours of firm generation, which increases LCOE by 35 EUR/MWh in key hubs. We show that even under the pessimistic scenarios, existing infrastructure would require 70 GW additional capacity, while under managed growth pathways, this expansion could reach 226 GW. We further find that DC workload dynamics strongly shape energy dispatch, system flexibility, and emissions, while improved efficiency significantly reduces capacity needs and system peaks. While our findings suggest that net-zero targets for 2050 may be achieved, critical emission risks may appear in the intermediate years, and the EU may compromise its carbon-neutral goals unless policies adapt to this accelerating digital transformation.


[112] 2606.09620

Motion planning for hundreds of floating robots

Planning collision-free motion for large robot fleets is difficult because collision avoidance induces strong inter-agent coupling that grows rapidly with team size. We consider omnidirectional floating robots on water, where choreographies are specified by sparse keyframes and an interactive tool must generate trajectories within seconds, even when transitions span minutes and thousands of time steps. We propose a scalable pipeline that builds a collision graph from an initialization, decomposes the coupled problem into interaction clusters, and solves clusters independently (and in parallel) with robustness mechanisms for common decomposition pathologies. We validate the approach in simulations with up to 500 robots. The synthesized trajectories have also been deployed in two real-world demonstrations, on Lake Zürich with a fleet of 24 Way of Water crafts and at the Time Space Existence 2025 Venice Biennale.


[113] 2606.09698

Optimal Feedback Communication with Information Maximization and Distortion Minimization

We study the problem of optimally sending a real-valued source through multiple uses of a channel with feedback. First, we state a set of conditions that are sufficient for an encoder to achieve maximal mutual information between the source and all the channel outputs. This set of conditions is also necessary when the channel is input-identifiable, a condition widely satisfied by common channel models. More notably, we further study the information maximization-distortion minimization problem, where the mutual information between the source and all channel outputs still needs to be maximized, while at each step, the MMSE of estimating the source from the channel outputs so far also needs to be minimized. We derive a solution to this problem for discrete channels with certain symmetries, e.g., $k$-ary symmetric or $k$-ary erasure channels. We show that for such channels the famous posterior matching scheme, while not necessary for information maximization alone, is sufficient and essentially necessary for achieving both information maximization and distortion minimization. This work also provides a new perspective on regularizing distortion-minimizing feedback communication through information maximization, which enables us to find an optimal solution that would otherwise be intractable.


[114] 2606.09717

What Makes Synthetic Speech Sound Sarcastic? A Prosody-Controlled Perception Study

Prosody plays a central role in sarcasm perception, yet previous studies have relied on naturally produced speech that lacks fine-grained control over individual acoustic dimensions. As prosodic cues co-vary in natural data, isolating their independent contributions remains challenging. We introduce a controlled framework using neural text-to-speech (TTS) with prompt-based prosodic conditioning to manipulate speech rate, pitch variation, and loudness. An orthogonal stimulus set was constructed to enable causal testing of prosodic cue effects. Human listeners rated sarcasm and naturalness, and their judgments were compared with predictions from a foundation model capable of processing audio input. Results show that loudness primarily drives human sarcasm perception, whereas the model assigns greater weight to speech rate, leading to distinct cue-weighting patterns. This study shows how controllable neural TTS enables investigation of prosodic cue weighting in speech perception.


[115] 2606.09825

An Agency-Transferring Model-Free Policy Enhancement Technique

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a baseline into the RL training process, simultaneously improving training efficiency relative to from-scratch methods and producing a learning policy that outperforms the baseline. At each step, the method arbitrates between the baseline policy and a trainable learning policy, initially relying strongly on the baseline policy and then progressively transferring agency to the learning policy. By the end of training, the learning policy is a standalone neural network that operates without baseline policy support. The paper formalizes what it means for the baseline policy to be functional: under this policy, the agent reaches a goal set and remains there with high probability. The proposed arbitration mechanism is designed to exploit this property during training, yielding high goal-reaching rates right from the beginning of training. A theoretical analysis provides a formal interpretation of this behavior under stated assumptions and extends it to the final baseline-free regime, where explicit lower bounds are derived for the goal-reaching probability of the standalone learning policy. Empirical results on continuous-control benchmarks show that the proposed method achieves returns that match or exceed those of competitive approaches, while maintaining the highest goal-reaching rates throughout training among the compared methods -- including in the final stage, where the learning policy operates without any baseline support.


[116] 2311.17484

Notes on data-driven output-feedback control of linear MIMO systems

Recent works have approached the data-driven design of dynamic output-feedback controllers for discrete-time LTI systems by constructing non-minimal state vectors composed of past inputs and outputs. Depending on the system's complexity (order $n$, lag $\ell$, and number of outputs $p$), several works have observed that such an approach presents significant limitations. In particular, many works restrict the class of LTI systems to those satisfying the relation $p\ell=n$. In this note, we show how to address the general MIMO case (for which $p\ell\geq n$ in general) by constructing an alternative non-minimal state vector from data. Unlike the existing literature, our method guarantees the satisfaction of certain rank conditions when the system is persistently excited, thereby facilitating the direct data-driven dynamic output-feedback control of MIMO systems by applying methods that were originally developed for the input-state data setting.


[117] 2312.13683

Joint Channel Estimation and Cooperative Localization for Near-Field Ultra-Massive MIMO

Next-generation wireless networks are envisioned to jointly support high-rate communications and ubiquitous sensing. Ultra-Massive Multiple-Input Multiple-Output (UM-MIMO) offers abundant spatial Degrees of Freedom (DoFs) for both functions, yet its large aperture shifts electromagnetic propagation into the near field, invalidating conventional far-field (plane-wave) assumptions. While near-field channel modeling has been studied, existing channel estimation methods are inadequate: on-grid designs suffer from non-orthogonal codebooks, and off-grid methods lack convergence guarantees, yielding unreliable estimates. Moreover, channel estimation and localization are typically designed in isolation, preventing the exchange of information that could otherwise enable mutual performance improvement. To address these difficulties, we propose a unified framework that exploits near-field characteristics to jointly design channel estimation and cooperative localization. Specifically, we develop a Variational Newtonized Near-field Channel Estimation (VNNCE) algorithm that extracts position-aware soft information from the channel, and a Gaussian Fusion Cooperative Localization (GFCL) method that leverages this information across multiple Base Stations (BSs) for enhanced accuracy.


[118] 2406.19749

SPIRONet: Spatial-Frequency Learning and Graph-based Channel Interaction Network for Vessel Segmentation

Automatic vessel segmentation plays a pivotal role in the development of next-generation interventional navigation systems for surgical robotics. However, current approaches still suffer from suboptimal segmentation performance under challenging intraoperative conditions, such as low signal-to-noise ratio (SNR), small or slender vessels, and strong interference. In this study, a novel spatial-frequency learning and graph-based channel interaction network (SPIRONet) is proposed to address these issues. To handle low-SNR vessel appearance and small or slender branches, dual spatial-frequency encoders are utilized, where the frequency encoder captures global vessel continuity that is less affected by local noise fluctuations, while the spatial encoder preserves fine vessel details. A cross-attention fusion module is further introduced to adaptively integrate this complementary spatial and frequency information. Moreover, to suppress interference from non-target vessels and vessel-like structures, a graph-based channel interaction module is designed to model channel-wise correlations, enhancing consistent vessel-related responses while suppressing task-irrelevant activations. Extensive experimental results on five challenging datasets demonstrate that the proposed method achieves competitive and consistently strong performance compared with existing methods. For example, SPIRONet achieves IoU improvements of +0.87%, +0.52%, +0.23%, +1.39%, and +2.22% over the strongest competing methods on CADSA, CAXF, DCA1, XCAD, and ARCADE, respectively. Moreover, SPIRONet achieves an inference speed of 21 FPS with a 512x512 input size, meeting the real-time requirements of interventional scenarios (6-12 FPS). These promising results indicate SPIRONet's potential for integration into interventional navigation systems. Code is available at this https URL.


[119] 2407.20135

Trade-offs in Reliability and Performance Using Selective Beamforming for Ultra-Massive MIMO

This paper addresses the optimization challenges in Ultra-Massive MIMO communication systems, focusing on array selection and beamforming in dynamic and diverse operational contexts. We introduce a novel array selection criterion that incorporates antenna health information into the optimization process, distinguishing our approach from traditional methods. Our methodology employs dual proximal-gradient ascent to effectively tackle the constrained non-convex and non-smooth nature of sparse array selection problems. A central feature of our strategy is the implementation of proportional fairness among communication users, aligning with system resource limitations while ensuring minimum rate requirements for all users. This approach not only enhances system efficiency and responsiveness but also ensures equitable resource distribution. Extensive simulations validate the effectiveness of the proposed solutions in optimizing Ultra-Massive MIMO system performance, demonstrating their applicability in complex communication scenarios. Our findings reveal key trade-offs influenced by the sparsity promotion weight (\(\gamma\)). As \(\gamma\) increases, spectral efficiency (SE) and communication rate (Ri) decrease, while beamforming matrix density (BMD) reduces and antenna reliability (RL) significantly improves. These results highlight the critical balance between performance and reliability, essential for the practical deployment of Ultra-Massive MIMO systems. This work advances the field by providing innovative solutions and new insights into array selection and beamforming optimization, setting a foundation for future research in Ultra-Massive MIMO communication systems.


[120] 2408.06516

Exposing Barriers to Flexibility Aggregation in Unbalanced Distribution Networks

The increasing integration of distributed energy resources (DER) offers new opportunities for distribution system operators (DSO) to improve network operation through flexibility services. To utilise flexible resources, various DER flexibility aggregation methods have been proposed, such as the concept of aggregated P-Q flexibility areas. Yet, many existing studies assume perfect coordination among DER and rely on single-phase power flow analysis, thus overlooking barriers to flexibility aggregation in real unbalanced systems. To quantify the impact of these barriers, this paper proposes a three-phase optimal power flow (OPF) framework for P-Q flexibility assessment, implemented as an open-source Julia tool this http URL. The framework explicitly accounts for voltage unbalance and imperfect coordination among DER in low voltage (LV) distribution networks. Simulations on an illustrative 5-bus system and a real 221-bus LV network in the UK reveal that over 30% of the theoretical aggregated flexibility potential can be lost due to phase unbalance and lack of coordination across phases. These findings highlight the need for improved flexibility aggregation tools applicable to real unbalanced distribution networks.


[121] 2409.18867

Robust and efficient data-driven predictive control

We propose a robust and efficient data-driven predictive control (eDDPC) scheme which is more sample efficient (requires less offline data) than existing schemes, and is also computationally efficient. This scheme employs a recently proposed data-based representation of linear time-invariant (LTI) systems as a predictor. Such a representation serves as an alternative to Hankel-based predictors obtained from, e.g., the so-called fundamental lemma, and can be derived by exploiting the kernel structure of shallow Hankel matrices of data. This allows for application of our proposed scheme using very short (and potentially irregularly measured) noisy input-output data, the amount of which is independent of the prediction horizon. To account for measurement noise, we provide a novel result that quantifies the uncertainty between the true (unknown) restricted behavior of the system and the estimated one from noisy data. Furthermore, we show that the robust eDDPC scheme is recursively feasible and that the resulting closed-loop system is practically exponentially stable. Finally, we compare the performance of this scheme to existing ones on a case study of a four-tank system.


[122] 2409.19187

Efficient Dual-Blind Deconvolution for Joint Radar-Communication Systems Using ADMM: Enhancing Channel Estimation and Signal Recovery in 5G mmWave Networks

This paper introduces a novel framework for jointly estimating unknown radar channels and transmit signals in millimeter-wave (mmWave) Joint Radar-Communication (JRC) systems, a problem often referred to as dual-blind deconvolution. The proposed method employs the Alternating Direction Method of Multipliers (ADMM) to iteratively refine the radar channel G (or H) and the transmitted signal X under convex constraints, incorporating both smooth and non-smooth penalty terms via proximal operators. By enforcing a bounded perturbation model for the radar channel and a strict transmit power budget, the algorithm aligns well with practical hardware limits. Extensive simulations demonstrate that the proposed approach reliably addresses the dual-blind deconvolution challenge, resulting in effective radar channel estimation and robust communication performance. Notably, the framework's iterative structure readily accommodates hardware considerations and different system configurations, making it well-suited for emerging mmWave JRC scenarios. Its adaptability and computational efficiency highlight the potential for wider adoption in next-generation wireless networks, where radar detection and communications increasingly share bandwidth and hardware resources.
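
The abstract above does not state which penalties or constraint sets its ADMM iterations use. As a minimal sketch of what a proximal step and a power-budget projection can look like, assuming a real-valued l1 penalty and an l2 power constraint (both choices, and the function names, are illustrative assumptions rather than the authors' formulation):

```python
import math

def soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_1 (assumed penalty): shrink each entry toward zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]

def project_power(x, budget):
    """Project x onto {x : sum(x_i^2) <= budget} (assumed form of the power constraint)."""
    energy = sum(xi * xi for xi in x)
    if energy <= budget:
        return list(x)
    scale = math.sqrt(budget / energy)
    return [xi * scale for xi in x]

# Entries smaller than lam in magnitude are zeroed; the rest move toward zero by lam.
shrunk = soft_threshold([3.0, -0.5, -2.0], 1.0)
# A vector exceeding the budget is rescaled onto the boundary of the feasible ball.
clipped = project_power([3.0, 4.0], 1.0)
```

Both operators have closed forms, which is what keeps each ADMM sub-step cheap in schemes of this kind.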


[123] 2412.17880

Antenna Health-Aware Selective Beamforming for Hardware-Constrained DFRC Systems I

This paper addresses the optimization challenges in dual-functional radar-communication (DFRC) systems with a focus on array selection and beamforming in dynamic and heterogeneous operational contexts. We propose a novel array selection criterion that integrates antenna health information into the optimization process, distinguishing our approach from traditional methods. Our methodology employs gradient dual ascent and dual proximal-gradient ascent for tackling the constrained non-convex and non-smooth nature of sparse array selection problems. A key feature of our strategy is the implementation of proportional fairness among communication users, which aligns with system resource limitations while meeting the minimum rate requirements for all users. This facet of our method not only enhances system efficiency and responsiveness but also ensures a fair distribution of resources. Through extensive simulations, the efficacy of the proposed solutions in optimizing DFRC system performance is validated, illustrating their applicability in integrated sensing and communication (ISAC) scenarios. Our findings contribute to the evolving field of DFRC systems, offering new perspectives and solutions for the challenges in array selection and beamforming optimisation.


[124] 2412.17881

Antenna Health-Aware Selective Beamforming for Hardware-Constrained DFRC Systems II

This study introduces an innovative beamforming design approach that incorporates the reliability of antenna array elements into the optimization process, termed "antenna health-aware selective beamforming". This method strategically focuses transmission power on more reliable antenna elements, thus enhancing system resilience and operational integrity. By integrating antenna health information and individual power constraints, our research leverages advanced optimization techniques such as the Group Proximal-Gradient Dual Ascent (GPGDA) to efficiently address nonconvex challenges in sparse array selection. Applying the proposed technique to a Dual-Functional Radar-Communication (DFRC) system, our findings highlight that increasing the sparsity promotion weight ($\rho_s$) generally boosts spectral efficiency and communication data rate, achieving perfect system reliability at higher $\rho_s$ values but also revealing a performance threshold beyond which further sparsity is detrimental. This underscores the importance of balanced sparsity in beamforming for optimizing performance, particularly in critical communication and defense applications where uninterrupted operation is crucial. Additionally, our analysis of the time complexity and power consumption associated with GPGDA underscores the need for optimizing computational resources in practical implementations.


[125] 2501.11755

A generalizable 3D framework and model for self-supervised learning in medical imaging

Current self-supervised learning methods for 3D medical imaging rely on simple pretext formulations and organ- or modality-specific datasets, limiting their generalizability and scalability. We present 3DINO, a cutting-edge SSL method adapted to 3D datasets, and use it to pretrain 3DINO-ViT: a general-purpose medical imaging model, on an exceptionally large, multimodal, and multi-organ dataset of ~100,000 3D medical imaging scans from over 10 organs. We validate 3DINO-ViT using extensive experiments on numerous medical imaging segmentation and classification tasks. Our results demonstrate that 3DINO-ViT generalizes across modalities and organs, including out-of-distribution tasks and datasets, outperforming state-of-the-art methods on the majority of evaluation metrics and labeled dataset sizes. Our 3DINO framework and 3DINO-ViT will be made available to enable research on 3D foundation models or further finetuning for a wide range of medical imaging applications.


[126] 2506.19357

Revisiting Power System Stabilizers with Increased Inverter-Based Generation: A Case Study

As power systems evolve with increasing production from Inverter-Based Resources (IBRs), their underlying dynamics are undergoing significant changes that can jeopardize system operation, leading to poorly damped oscillations or small-signal rotor angle instability. In this work, we investigate whether Power System Stabilizer (PSS) setting adjustments can effectively restore system stability and provide adequate damping in systems with increased IBR penetration, using the benchmark Kundur Two-Area System as a case study. Specifically, we evaluate the model-based Residues and P-Vref PSS tuning methods to examine their effectiveness under evolving grid conditions. Our findings indicate that the effectiveness of these tuning methods is not guaranteed, particularly when coordination is limited. Consequently, our case study motivates local and adaptive online PSS tuning methods.


[127] 2507.07805

Tunable Real-Time Safety Filters via Set-Based Control Barrier Functions

Safety filters for industrial constrained systems are required to combine certified constraint satisfaction, predictable online computation, and a transparent tuning interface. Existing set-based filters are based on a well-established control invariant set design that scales favorably with state and input constraints, but typically intervene only at the set boundary. Control barrier function (CBF)-based filters, by contrast, provide tunable intervention but require a scalar barrier construction. This paper proposes a set-based CBF safety filter that turns a convex control invariant set directly into a tunable barrier via its Minkowski functional. The resulting filter is formulated as a single-level quadratic program (QP) in which one class-$\mathcal{K}^e$ parameter sets the intervention aggressiveness. Explicit convex formulations are derived for polytopic, zonotopic, and MPC-based invariant sets. Under standard bounded-disturbance assumptions, the resulting safety filter guarantees constraint satisfaction and asymptotic recovery into the invariant set. For tight real-time budgets, a learning-based approximation enables online acceleration, while the formal safety guarantees remain tied to the exact formulation. The method is validated in numerical studies and on a permanent-magnet synchronous motor drive, where an explicit QP implementation evaluates within a 150 microseconds sampling window and has a worst-case execution time of 28.04 microseconds.
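
For the polytopic case mentioned above, the Minkowski functional has a simple closed form. The sketch below assumes the invariant set is written as {x : Ax <= b} with every b_i > 0, so the origin lies in its interior. The function names and the box example are hypothetical, and the paper's QP and class-K^e condition are not reproduced here.

```python
def minkowski_gauge(A, b, x):
    """Gauge of {x : A x <= b}, assuming every b_i > 0.

    Returns the smallest t >= 0 such that x lies in the set scaled by t.
    """
    ratios = [sum(a * xi for a, xi in zip(row, x)) / bi for row, bi in zip(A, b)]
    return max(0.0, max(ratios))

def set_barrier(A, b, x):
    """Candidate barrier h(x) = 1 - gauge(x): h >= 0 exactly when x is in the set."""
    return 1.0 - minkowski_gauge(A, b, x)

# Hypothetical box |x1| <= 2, |x2| <= 1 written in half-space form.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [2.0, 2.0, 1.0, 1.0]

inside = set_barrier(A, b, [1.0, 0.5])    # positive: strictly inside
boundary = set_barrier(A, b, [2.0, 0.0])  # zero: on the boundary
outside = set_barrier(A, b, [3.0, 0.0])   # negative: outside
```

Because h varies continuously from 1 at the origin to 0 on the boundary, a filter built on it can begin intervening before the state reaches the boundary of the set.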


[128] 2508.00724

Petri Net Modeling and Deadlock-Free Scheduling of Attachable Heterogeneous AGV Systems

The increasing demand for flexible automation has accelerated the adoption of heterogeneous automated guided vehicles (AGVs). This work investigates a new scheduling problem in a material transportation system consisting of attachable heterogeneous AGVs, including carriers and shuttles, that flexibly attach and detach for cooperative task execution. While such collaboration enhances operational efficiency, the attachment-induced synchronization renders the system highly coupled and susceptible to deadlocks. To address this, we propose a Petri net (PN)-based deadlock-free scheduling framework integrated into an adaptive large neighborhood search (ALNS) algorithm. The PN is introduced to map candidate solutions from static permutations into dynamic collaborative processes, enabling performance evaluation via state evolution and proactive deadlock prevention through structural analysis. Extensive experiments on real-world and synthetic instances demonstrate that the proposed framework significantly improves computational efficiency, with the developed ALNS outperforming the current on-site policy, exact solvers, and state-of-the-art metaheuristics. Finally, sensitivity analysis yields managerial insights for optimal fleet sizing.


[129] 2508.05163

Resilience metrics to guide back-up investments in the power system during extreme weather

Security of supply is a common and important concern when integrating renewables in net-zero power systems. Extreme weather affects both demand and supply leading to power system stress; in Europe this stress spreads continentally beyond the meteorological root cause. We use an approach based on shadow prices to identify periods of elevated stress called system-defining events and analyse their impact on the power system. By classifying different types of system-defining events, we identify challenges to power system operation and planning. Crucially, we find the need for sufficient resilience back-up (power) capacities whose financial viability is precarious due to weather variability and weather-induced risk. Furthermore, we disentangle short- and long-term resilience challenges (from multi-day to annual scale) with distinct metrics and stress tests to incorporate both into future energy modelling assessments. Our methodology and implementation in an open energy system model (PyPSA-Eur) can be re-applied to other systems and help researchers and policymakers in building more resilient and adequate energy systems.


[130] 2508.08217

Autonomous Air-Ground Vehicle Operations Optimization in Hazardous Environments: A Multi-Armed Bandit Approach

Hazardous environments such as chemical spills, radiological zones, and bio-contaminated sites pose significant threats to human safety and public infrastructure. Rapid and reliable hazard mitigation in these settings is often unsafe for humans, calling for autonomous systems that can adaptively sense and respond to evolving risks. This paper presents a decision-making framework for autonomous vehicle dispatch in hazardous environments with uncertain and evolving risk levels. The system integrates a Bayesian Upper Confidence Bound (BUCB) sensing strategy with task-specific vehicle routing problems with profits (VRPP), enabling adaptive coordination of unmanned aerial vehicles (UAVs) for hazard sensing and unmanned ground vehicles (UGVs) for cleaning. Using VRPP allows selective site visits under resource constraints by assigning each site a visit value that reflects sensing or cleaning priorities. Site-level hazard beliefs are maintained through a time-weighted Bayesian update. BUCB scores guide UAV routing to balance exploration and exploitation under uncertainty, while UGV routes are optimized to maximize expected hazard reduction under resource constraints. Simulation results demonstrate that our framework reduces the number of dispatch cycles to resolve hazards by around 30% on average compared to uninformed baseline dispatch strategies, underscoring the value of uncertainty-aware vehicle dispatch for reliable hazard mitigation.
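
The abstract above does not give its belief model or quantile schedule. A minimal sketch of how a Bayesian UCB score can rank sites, assuming a Gaussian belief per site and a quantile level of 1 - 1/(t + 1) at dispatch cycle t (both are assumptions, as are the site names and numbers):

```python
from statistics import NormalDist

def bucb_score(mean, std, t):
    """Upper posterior quantile of a Gaussian hazard belief at cycle t >= 1."""
    level = 1.0 - 1.0 / (t + 1)  # assumed schedule; rises toward 1 as t grows
    return mean + std * NormalDist().inv_cdf(level)

def rank_sites(beliefs, t):
    """Return site ids ordered from highest to lowest BUCB score."""
    return sorted(beliefs, key=lambda s: bucb_score(*beliefs[s], t), reverse=True)

# Hypothetical beliefs: site -> (posterior mean hazard, posterior std).
beliefs = {"A": (0.5, 0.05), "B": (0.4, 0.3)}

early = rank_sites(beliefs, t=1)  # level 0.5: ranking by posterior mean only
later = rank_sites(beliefs, t=9)  # level 0.9: uncertain site B gets an exploration bonus
```

The score rewards both a high expected hazard and high uncertainty, which is how a quantile-based rule trades exploitation against exploration.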


[131] 2510.04593

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Large language models (LLMs) have demonstrated promising performance in both automatic speech recognition (ASR) and text-to-speech (TTS) systems, gradually becoming the mainstream approach. However, most current approaches address these tasks separately rather than through a unified framework. This work aims to integrate these two tasks into one unified model. Although discrete speech tokenization enables joint modeling, its inherent information loss limits performance in both recognition and generation. In this work, we present UniVoice, a unified LLM framework through continuous representations that seamlessly integrates speech recognition and synthesis within a single model. Our approach combines the strengths of autoregressive modeling for speech recognition with flow matching for high-quality generation. To mitigate the inherent divergence between autoregressive and flow-matching models, we further design a dual attention mechanism, which switches between a causal mask for recognition and a bidirectional attention mask for synthesis. Furthermore, the proposed text-prefix-conditioned speech infilling method enables high-fidelity zero-shot voice cloning. Experimental results demonstrate that our method can achieve or exceed current single-task modeling methods in both ASR and zero-shot TTS tasks. This work explores new possibilities for end-to-end speech understanding and generation. Code is available at this https URL.
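
The two mask types named above differ only in which positions each token may attend to. A minimal sketch follows; how UniVoice switches or combines them inside the model is not described in the abstract, so the helper below is purely illustrative.

```python
def attention_mask(seq_len, causal):
    """Boolean mask where mask[i][j] is True if position i may attend to position j."""
    return [[(j <= i) if causal else True for j in range(seq_len)]
            for i in range(seq_len)]

recognition_mask = attention_mask(3, causal=True)   # lower-triangular: no look-ahead
synthesis_mask = attention_mask(3, causal=False)    # all True: full bidirectional context
```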


[132] 2510.05478

AQA-TTRL: Self-Adaptation in Audio Question Answering with Test-Time Reinforcement Learning

Large Audio Language Models (LALMs) exhibit strong capabilities in general audio understanding but remain static after deployment, limiting their adaptability to real-world data. Since supervised fine-tuning is costly, we propose AQA-TTRL, a novel framework for audio understanding that enables on-the-fly evolution via test-time reinforcement learning using only unlabeled test data. It generates pseudo-labels via majority voting and optimizes the model through reinforcement learning. To address the noise in self-generated labels, we introduce confidence weighting to adjust training signals. Furthermore, multiple-attempt sampling mitigates advantage collapse and stabilizes training. Across MMAU, MMAR, and MMSU, AQA-TTRL achieves significant average improvements of 4.42% for Qwen2.5-Omni 7B and 11.04% for the 3B model. Notably, the adapted 3B model outperforms direct inference of the unadapted 7B model, highlighting the effectiveness of test-time adaptation in audio understanding.
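
A minimal sketch of majority-vote pseudo-labeling with a confidence weight, assuming the confidence is simply the vote share of the winning answer and the reward is a confidence-scaled exact match. The abstract specifies neither, so both are assumptions, as are the function names.

```python
from collections import Counter

def pseudo_label(answers):
    """Return (majority answer, vote share) for one question's sampled answers.

    Ties go to the answer that appeared first among the samples.
    """
    label, votes = Counter(answers).most_common(1)[0]
    return label, votes / len(answers)

def weighted_rewards(answers):
    """Reward each sample by agreement with the pseudo-label, scaled by confidence."""
    label, confidence = pseudo_label(answers)
    return [confidence if a == label else 0.0 for a in answers]

samples = ["B", "A", "B", "C"]        # hypothetical sampled answers to one question
label, confidence = pseudo_label(samples)
rewards = weighted_rewards(samples)
```

Scaling by the vote share shrinks the training signal on questions where the model's own samples disagree, which is one simple way to limit the effect of noisy pseudo-labels.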


[133] 2510.19608

Optimal Kron-based Reduction of Networks (Opti-KRON) for Three-phase Distribution Feeders

This paper presents a novel structure-preserving, Kron-based reduction framework for unbalanced distribution feeders. The method aggregates electrically similar nodes within a mixed-integer optimization (MIP) problem to produce reduced networks that optimally reproduce the voltage profiles of the original full network. To overcome computational bottlenecks of MIP formulations, we propose an exhaustive-search formulation to identify optimal aggregation decisions while enforcing voltage margin limits. The proposed exhaustive network reduction algorithm is parallelizable on GPUs, which enables scalable network reduction. The resulting reduced networks approximate the full system's voltage profiles with low errors and are suitable for steady-state analysis and optimal power flow studies. The framework is validated on two real utility distribution feeders with 5,991 and 8,381 nodes. The reduced models achieve up to 90% and 80% network reduction, respectively, while the maximum voltage-magnitude error remains below 0.003 p.u. Furthermore, on a 1000-node version of the network, the GPU-accelerated reduction algorithm runs up to 15x faster than its CPU-based counterpart.
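
For context, the classical Kron elimination step that such frameworks build on removes a zero-injection node from an admittance matrix via a Schur complement. The sketch below shows only that textbook step on a hypothetical single-phase three-node chain. The paper's optimization over which nodes to aggregate, and its three-phase treatment, are not reproduced.

```python
def kron_eliminate(Y, k):
    """Eliminate node k (assumed zero current injection) from admittance matrix Y.

    Uses Y_red[i][j] = Y[i][j] - Y[i][k] * Y[k][j] / Y[k][k] over the kept nodes.
    Works for real or complex entries.
    """
    keep = [i for i in range(len(Y)) if i != k]
    return [[Y[i][j] - Y[i][k] * Y[k][j] / Y[k][k] for j in keep] for i in keep]

# Hypothetical chain 0 - 1 - 2 with two 1 S branches; node 1 carries no injection.
Y = [
    [1.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 1.0],
]
Y_reduced = kron_eliminate(Y, 1)
```

Eliminating the middle node leaves a single equivalent branch of 0.5 S between nodes 0 and 2, matching two 1 S branches in series.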


[134] 2511.03603

Artificial-reference tracking MPC with probabilistically validated performance on industrial embedded systems

Industrial embedded systems are typically used to execute simple control algorithms due to their low computational resources. Despite these limitations, the implementation of advanced control techniques such as Model Predictive Control (MPC) has been explored by the control community in recent years, typically considering simple linear formulations or explicit ones to facilitate the online computation of the control input. These simplifications often lack features and properties that are desirable in real-world environments. This article presents an efficient implementation for embedded systems of MPC for tracking with artificial reference, solved via a recently developed structure-exploiting ADMM-based algorithm. This formulation is tailored to a wide range of applications by incorporating essential practical features at a small computational cost, including integration with an offset-free scheme, back-off parameters that enable constraint tightening, and soft constraints that preserve feasibility under disturbances or plant-model mismatch. This is accompanied by a framework for probabilistic performance validation of the closed-loop system over long-term operation. The applicability of the approach is illustrated on a Programmable Logic Controller (PLC), incorporated in a hardware-in-the-loop setup to control a nonlinear continuous stirred-tank reactor. The behavior of the closed-loop system is probabilistically validated with respect to constraint violations and the number of iterations required at each time step by the MPC optimization algorithm.


[135] 2511.18493

SAGE: Shape-Adapting Gated Experts for Adaptive Histopathology Image Segmentation

The significant variability in cell size and shape continues to pose a major obstacle in computer-assisted cancer detection on gigapixel Whole Slide Images (WSIs), due to cellular heterogeneity. Current CNN-Transformer hybrids use static computation graphs with fixed routing. This leads to extra computation and makes it harder to adapt to changes in input. We propose Shape-Adapting Gated Experts (SAGE), an input-adaptive framework that enables dynamic expert routing in heterogeneous visual networks. SAGE reconfigures static backbones into dynamically routed expert architectures via a dual-path design with hierarchical gating and a Shape-Adapting Hub (SA-Hub) that harmonizes feature representations across convolutional and transformer modules. Embodied as SAGE with ConvNeXt and Vision Transformer UNet (SAGE-ConvNeXt+ViT-UNet), our model achieves a Dice score of 95.23% on EBHI, DSC scores of 92.78% and 91.42% on GlaS Test A and Test B, respectively, and 91.26% DSC at the WSI level on DigestPath, while exhibiting robust generalization under distribution shifts by adaptively balancing local refinement and global context. SAGE establishes a scalable foundation for dynamic expert routing in visual networks, thereby facilitating flexible visual reasoning. Project page: this https URL


[136] 2511.22836

Power System Robust State Estimation As a Layer: An Optimization-embedded End-to-end Learning Approach

Serving as an essential prerequisite for modern power system operation, robust state estimation (RSE) can effectively resist noise and outliers in measurements. The emerging neural network (NN) based end-to-end (E2E) learning framework enables real-time application of RSE but potentially yields solutions that are statistically accurate yet physically inconsistent. To bridge this gap, this work proposes a novel E2E learning based RSE framework, where the convex-relaxed RSE problem is, as a first attempt, constructed as an explicit differentiable layer within an NN. This optimization-embedded layer (termed `Opt-Layer` in our work) serves as a solver of the RSE problem. Then, the relaxed solutions are recovered through post-processing layers. By seamlessly embedding the underlying KKT conditions into the gradients during backward propagation, the physical consistency of the estimated states can be significantly enhanced, realizing lower measurement residuals. Also, the measurement weights are treated as learnable parameters of the NN to enhance estimation robustness, enabling the Opt-Layer to actively denoise. A hybrid loss function is formulated to pursue accurate and physically consistent solutions. Extensive simulations on eight test systems demonstrate that the proposed framework can significantly improve the SE performance, especially in terms of physical consistency, in comparison to classical E2E learning models, physics-informed NN (PINN) models, graph-based learning models, and conventional optimization-based approaches. The estimation performance under partial observability and severe noise contamination is systematically evaluated. Computational complexity and runtime analysis are also comprehensively demonstrated.


[137] 2512.00707

Urban Macro/Microcellular Channel Characterization at 4.85 GHz With Literature-Referenced Upper-FR1-to-FR3 Cross-Band Analysis

The transition from 5G to 6G requires frequency-dependent, physically consistent radio channel models across the upper-FR1/FR3 transition region, particularly in the underexplored 4--8 GHz region targeted in the current WRC-27 studies, where outdoor urban channel measurements and characterizations remain scarce. This paper presents a 4.85 GHz measurement-anchored study of urban channels and a literature-referenced cross-band analysis. Double-directional measurements were conducted at 4.85 GHz in urban macrocell (UMa) and urban microcell (UMi) routes in Yokohama, Japan, from which path loss, delay spread (DS), azimuth spread of arrival/departure (ASA/ASD), K-factor, and route-dependent spatial-consistency statistics were extracted. To align these results in a broader cross-band context, the measured 4.85 GHz large-scale parameter (LSP) means were combined with scenario-matched literature anchors to derive log-log trends for DS, ASA, and ASD over an approximately 4--28 GHz range around the 7.125 GHz upper-FR1/FR3 cross-band boundary. The resulting trends were compared with 3GPP UMa/UMi reference parameterizations over the same interval, and the sensitivity of the UMi DS fit was examined via leave-one-out analysis. Because the cross-band analysis still relies on a single in-house measurement band alongside heterogeneous anchors from different campaigns, it is presented as measurement-informed and indicative rather than as a definitive multi-band model. The paper therefore contributes both a detailed, parameterized 4.85 GHz urban measurement reference and a bounded literature-referenced view of channel behavior near the upper-FR1/FR3 transition.


[138] 2512.03703

Pixel-based Reconfigurable Beamforming Networks Emulating Physical Movement in FAS

The concept of Fluid Antenna Systems (FAS) has emerged as an attractive new system technology for use in sixth-generation (6G) wireless systems. However, most FAS implementations rely on mechanical antenna movement and thus are too slow to be useful. In this paper, a novel pixel-based reconfigurable beamforming network (PRBFN) is used to emulate movement in Fluid Antenna Systems (FASs). Using the insight that changing an antenna's physical position is equivalent to changing radiation patterns that satisfy the desired pattern correlation, the PRBFN is used to control the excitation current vectors of a multi-port antenna, thereby governing the pattern correlation. Key novelties of our work involve the selection of current vectors, and the methodology for scaling the PRBFN to realize large-aperture FAS. Results are provided for our PRBFN combined with an FAS (denoted as a PRBFN-FAS) when the equivalent physical movement is set to 1.5 wavelengths. Measurements demonstrate that the PRBFN-FAS provides the desired spatial correlation, including the Bessel function relation from Clarke's model across a 5\% bandwidth, satisfying FAS requirements. System-level experiments confirm the viability of the PRBFN-FAS in communication scenarios.


[139] 2512.03767

CaFTRA: Frequency-Domain Correlation-Aware Feedback-Free MIMO Transmission and Resource Allocation for 6G and Beyond

The fundamental designs of wireless systems toward AI-Native 6G and beyond are driven by ever-increasing mobile data traffic and by the need for extreme spectral efficiency and adaptability across diverse service scenarios. To overcome the limitations posed by feedback-based multiple-input and multiple-output (MIMO) transmission, we propose a novel frequency-domain Correlation-aware Feedback-free MIMO Transmission and Resource Allocation (CaFTRA) framework tailored for fully-decoupled radio access networks (FD-RAN) to meet the emerging requirements of AI-Native 6G and beyond. By leveraging artificial intelligence (AI), CaFTRA effectively eliminates real-time uplink feedback by predicting channel state information (CSI) based solely on user geolocation. We introduce a Learnable Queries-driven Transformer Network for CSI mapping from user geolocation, which utilizes multi-head attention and learnable query embeddings to accurately capture frequency-domain correlations among resource blocks (RBs), thereby significantly improving the precision of CSI prediction. Once base stations (BSs) adopt feedback-free transmission, their downlink transmission coverage can be significantly expanded due to the elimination of frequent uplink feedback. To enable efficient resource scheduling under such extensive-coverage scenarios, we apply a low-complexity many-to-one matching theory-based algorithm for efficient multi-BS association and multi-RB resource allocation, which is proven to converge to a stable matching within limited iterations. Simulation results demonstrate that CaFTRA achieves stable matching convergence and significant gains in spectral efficiency and user fairness compared to 5G, underscoring its potential value for 6G standardization efforts.


[140] 2512.20978

GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model

Language Model (LM)-based generative modeling has emerged as a promising direction for TSE, offering potential for improved generalization and high-fidelity speech. We propose GenTSE, a two-stage decoder-only generative LM for TSE: Stage-1 predicts coarse semantic tokens, and Stage-2 generates fine acoustic tokens. Separating semantics and acoustics stabilizes decoding and yields more accurate target speech. Both stages use continuous SSL or codec embeddings, offering richer context than discretized-prompt methods. To reduce exposure bias, we employ a Frozen-LM Conditioning training strategy that conditions the LMs on predicted tokens from earlier checkpoints to reduce the gap between teacher-forcing training and autoregressive inference. We further apply DPO to better align outputs with perceptual preferences. Experiments on Libri2Mix show that GenTSE surpasses previous LM-based systems in speech quality, intelligibility, and speaker consistency.


[141] 2601.04069

Hybrid Downlink Beamforming with Outage Constraints under Imperfect CSI using Model-Driven Deep Learning

We consider energy-efficient multi-user hybrid downlink beamforming (BF) and power allocation under imperfect channel state information (CSI) and probabilistic outage constraints. In this domain, classical optimization methods resort to computationally costly conic optimization problems. Meanwhile, generic deep network (DN) architectures lack interpretability and require large training data sets to generalize well. In this paper, we therefore propose a lightweight model-aided deep learning architecture based on a greedy selection algorithm for analog beam codewords. The architecture relies on an instance-adaptive augmentation of the signal model to estimate the impact of the CSI error. To learn the DN parameters, we derive a novel and efficient implicit representation of the nested constrained BF problem and prove sufficient conditions for the existence of the corresponding gradient. In the loss function, we utilize an annealing-based approximation of the outage compared to conventional quantile-based loss terms. This approximation adaptively anneals towards the exact probabilistic constraint depending on the current level of quality of service (QoS) violation. Simulations validate that the proposed DN can achieve the nominal outage level under CSI error due to channel estimation and channel compression, while allocating less power than benchmarks. Thereby, a single trained model generalizes to different numbers of users, QoS requirements and levels of CSI quality. We further show that the adaptive annealing-based loss function can accelerate the training and yield a better power-outage trade-off.


[142] 2601.04178

Sound Event Detection with Boundary-Aware Optimization and Inference

Temporal detection problems appear in many fields including time-series estimation, activity recognition and sound event detection (SED). In this work, we propose a new approach to temporal event modeling by explicitly modeling event onsets and offsets, and by introducing boundary-aware optimization and inference strategies that substantially enhance temporal event detection. The presented methodology incorporates new temporal modeling layers - Recurrent Event Detection (RED) and Event Proposal Network (EPN) - which, together with tailored loss functions, enable more effective and precise temporal event detection. We evaluate the proposed method in the SED domain using a subset of the temporally-strongly annotated portion of AudioSet. Experimental results show that our approach not only outperforms traditional frame-wise SED models with state-of-the-art post-processing, but also removes the need for post-processing hyperparameter tuning, and scales to achieve new state-of-the-art performance across all AudioSet Strong classes.


[143] 2601.15790

Adaptive Non-Uniform Sampling of Bandlimited Signals via Algorithm-Encoder Co-Design

We propose an adaptive non-uniform sampling framework for bandlimited signals based on an algorithm-encoder co-design perspective. By revisiting the convergence analysis of iterative reconstruction algorithms for non-uniform measurements, we derive a local, energy-based sufficient condition that governs reconstruction behavior as a function of the signal and derivative energies within each sampling interval. Unlike classical approaches that impose a global Nyquist-type bound on the inter-sample spacing, the proposed condition permits large gaps in slowly varying regions while enforcing denser sampling only where the signal exhibits rapid temporal variation. Building on this theoretical insight, we design a variable-bias, variable-threshold integrate-and-fire time encoding machine (VBT-IF-TEM) whose firing mechanism is explicitly shaped to enforce the derived local convergence condition. To ensure robustness, a shifted-signal formulation is introduced to suppress excessive firing in regions where the magnitude of the signal amplitude is close to zero or the local signal energy approaches zero. Using the proposed encoder, an analog signal is discretely represented by time encodings and signal averages, enabling perfect reconstruction via a standard iterative algorithm even when the local sampling rate falls below the Nyquist rate. Simulation results on synthetic signals and experiments on ultrasonic guided-wave and ECG signals demonstrate that the proposed framework achieves substantial reductions in sampling density compared to uniform sampling and conventional IF-TEMs, while maintaining accurate reconstruction. The results further highlight a controllable tradeoff between sampling density, reconstruction accuracy, and convergence behavior, which can be navigated through adaptive parameter selection.
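For orientation, the following is a minimal sketch of a conventional integrate-and-fire time encoder with a constant bias and a constant threshold, i.e. the baseline the abstract compares against, not the proposed VBT-IF-TEM. The function name `if_tem_encode`, its parameters, and the rectangular-rule discretization are assumptions made for illustration only.

```python
from typing import List, Sequence


def if_tem_encode(samples: Sequence[float], dt: float,
                  bias: float, kappa: float, threshold: float) -> List[float]:
    """Return firing times of a basic integrate-and-fire time encoder.

    The integrator accumulates (x(t) + bias) / kappa. Each time it reaches
    `threshold`, a firing time is recorded and the threshold is subtracted.
    The bias is assumed large enough that x(t) + bias stays positive.
    """
    firing_times: List[float] = []
    integral = 0.0
    for n, x in enumerate(samples):
        integral += (x + bias) * dt / kappa  # rectangular-rule integration
        if integral >= threshold:
            firing_times.append((n + 1) * dt)
            integral -= threshold
    return firing_times


# Hypothetical usage: a zero input sampled at 1 kHz for one second.
times = if_tem_encode([0.0] * 1000, dt=0.001, bias=1.0, kappa=1.0, threshold=0.1)
```

In continuous time, a constant input c makes this encoder fire at intervals of kappa * threshold / (c + bias), so the firing density of the conventional encoder tracks signal amplitude rather than local variation.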


[144] 2601.23231

Solving Inverse Problems with Flow-based Models via Model Predictive Control

Flow-based generative models provide strong unconditional priors for inverse problems, but guiding their dynamics for conditional generation remains challenging. Recent work casts training-free conditional generation in flow models as an optimal control problem; however, solving the resulting trajectory optimisation is computationally and memory intensive, requiring differentiation through the flow dynamics or adjoint solves. We propose MPC-Flow, a model predictive control framework that formulates inverse problem solving with flow-based generative models as a sequence of control sub-problems, enabling practical optimal control-based guidance at inference time. We provide theoretical analysis linking MPC-Flow to the underlying optimal control objective and show how different algorithmic choices yield a spectrum of guidance algorithms, including regimes that avoid backpropagation through the generative model trajectory. We evaluate MPC-Flow on benchmark image restoration tasks, spanning linear and non-linear settings such as in-painting, deblurring, and super-resolution, and demonstrate strong performance and scalability to massive state-of-the-art architectures via training-free guidance of FLUX.2 (32B) in a quantised setting on consumer hardware.


[145] 2602.05483

Toward Operationalizing Rasmussen: Drift Observability on the Simplex for Evolving Systems

Software operations increasingly rely on SLOs, traces, deployment specifications, and change events, yet dashboards and thresholding practices often expose share-like operational signals as separate scalar panels or baseline distances. This can create false alarms under benign redistribution and miss movement toward policy boundaries. Rasmussen's dynamic safety model motivates drift under competing pressures, but operationalizing it for software is difficult because relevant state variables (remaining margin, engineering effort, and risk/impact) are often compositional and their parts evolve. We formulate an automated, artifact-derived drift-monitor design that maps changing software artifacts into a stable compositional monitoring state: it extracts a current part inventory and policy constraints, maps telemetry to a positive composition, stabilizes splits, merges, and renames through lineage-aware canonical groups, and analyzes boundary-directed drift in log-ratio coordinates. The proposed monitor would report drift direction, step-to-boundary, balance-level attribution, and model-health indicators under architectural churn. We specify the approach, identify its zero/noise/lineage assumptions, and report a reproducible synthetic sanity check of boundary-aware drift and controlled part churn.


[146] 2602.07977

Detect, Attend and Extract: Keyword Guided Target Speaker Extraction

Target speaker extraction (TSE) aims to extract the speech of a target speaker from mixtures containing multiple competing speakers. Conventional TSE systems predominantly rely on speaker cues, such as pre-enrolled speech, to identify and isolate the target speaker. However, in many practical scenarios, clean enrollment utterances are unavailable, limiting the applicability of existing approaches. In this work, we propose DAE-TSE, a keyword-guided TSE framework that specifies the target speaker through distinct keywords they utter. By leveraging keywords (i.e., partial transcriptions) as cues, our approach provides a flexible and practical alternative to enrollment-based TSE. DAE-TSE follows the Detect-Attend-Extract (DAE) paradigm: it first detects the presence of the given keywords, then attends to the corresponding speaker based on the keyword content, and finally extracts the target speech. Experimental results demonstrate that DAE-TSE outperforms standard TSE systems that rely on clean enrollment speech. To the best of our knowledge, this is the first study to utilize partial transcription as a cue for specifying the target speaker in TSE, offering a flexible and practical solution for real-world scenarios. Our code and demo page are now publicly available.


[147] 2602.08273

Pitot-Aided Attitude and Air Velocity Estimation with Almost Global Asymptotic Stability Guarantees

This paper investigates the problem of attitude and air velocity estimation for fixed-wing unmanned aerial vehicles (UAVs) using IMU measurements and at least one Pitot tube measurement, with almost global asymptotic stability (AGAS) guarantees. A cascade observer architecture is developed, in which a Riccati/Kalman-type filter estimates the body-fixed frame air velocity and the vehicle's tilt using IMU data as inputs and Pitot measurements as outputs. Under mild excitation conditions, the resulting air velocity and tilt estimation error dynamics are shown to be uniformly observable. The estimated tilt is then combined with magnetometer measurements in a nonlinear observer on SO(3) to recover the full attitude. Rigorous analysis establishes AGAS of the overall cascade structure under the uniform observability (UO) condition. The effectiveness of the proposed approach is demonstrated through validation on real flight data.


[148] 2602.14590

Learning Dirac Spectral Transforms for Topological Signals

The Dirac operator provides a unified framework for processing signals defined over topological domains of different order, such as node and edge signals. Its eigenmodes define a spectral representation that inherently captures cross-domain interactions, in contrast to conventional Hodge-Laplacian eigenmodes that operate within a single topological dimension. In this paper, we compare the two alternatives in terms of the distortion/sparsity trade-off and we show how an overcomplete basis built by concatenating the two dictionaries can provide better performance than either approach alone. Then, we propose a parameterized nonredundant transform whose eigenmodes incorporate a mode-specific mass parameter that captures the interplay between node and edge modes. Interestingly, we show that learning the mass parameters from data enables the proposed transform to achieve the best distortion-sparsity tradeoff with respect to both complete and overcomplete bases.


[149] 2602.15519

Enroll-on-Wakeup: A First Comparative Study of Target Speech Extraction for Seamless Interaction in Real Noisy Human-Machine Dialogue Scenarios

Target speech extraction (TSE) typically relies on pre-recorded high-quality enrollment speech, which disrupts user experience and limits feasibility in spontaneous interaction. In this paper, we propose Enroll-on-Wakeup (EoW), a novel framework where the wake-word segment, captured naturally during human-machine interaction, is automatically utilized as the enrollment reference. This eliminates the need for pre-collected speech to enable a seamless experience. We perform the first systematic study of EoW-TSE, evaluating advanced discriminative and generative models under real diverse acoustic conditions. Given the short and noisy nature of wake-word segments, we investigate enrollment augmentation using LLM-based TTS. Results show that while current TSE models face performance degradation in EoW-TSE, TTS-based assistance significantly enhances the listening experience, though gaps remain in speech recognition accuracy.


[150] 2602.18777

Mind the Gap: Detecting Cluster Exits for Robust Local Density-Based Score Normalization in Anomalous Sound Detection

Local density-based score normalization is an effective component of distance-based embedding methods for anomalous sound detection, particularly when data densities vary across conditions or domains. In practice, however, performance depends strongly on neighborhood size. Increasing it can degrade detection accuracy when neighborhood expansion crosses cluster boundaries, violating the locality assumption of local density estimation. This observation motivates adapting the neighborhood size based on locality preservation rather than fixing it in advance. We realize this by proposing cluster exit detection, a lightweight mechanism that identifies distance discontinuities and selects neighborhood sizes accordingly. Experiments across multiple embedding models and datasets show improved robustness to neighborhood-size selection and consistent performance gains.
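One way to picture the idea of selecting a neighborhood size at a distance discontinuity is sketched below. This is an illustrative heuristic under stated assumptions, not the authors' algorithm: the function name `select_neighborhood_size`, the ratio test, and the default `jump_ratio` are all hypothetical.

```python
from typing import Sequence


def select_neighborhood_size(distances: Sequence[float],
                             min_k: int = 1,
                             jump_ratio: float = 2.0) -> int:
    """Pick how many nearest neighbors to keep before a large distance jump.

    `distances` are distances from a query embedding to reference embeddings.
    The neighbors are scanned in ascending order; the scan stops at the first
    position where the next distance exceeds `jump_ratio` times the previous
    one, which is treated here as leaving the local cluster.
    """
    ordered = sorted(distances)
    for k in range(max(min_k, 1), len(ordered)):
        prev, nxt = ordered[k - 1], ordered[k]
        if prev > 0 and nxt / prev > jump_ratio:
            return k  # keep the k neighbors before the discontinuity
    return len(ordered)  # no discontinuity found: keep all neighbors


# Hypothetical distances: three close neighbors, then a clear gap.
k = select_neighborhood_size([0.10, 0.12, 0.11, 0.90, 1.00])
```

A ratio test is only one possible discontinuity criterion; an absolute-gap or statistical test would fit the same interface.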


[151] 2602.20967

Training-Free Intelligibility-Guided Observation Addition for Noisy ASR

Automatic speech recognition (ASR) degrades severely in noisy environments. Although speech enhancement (SE) front-ends effectively suppress background noise, they often introduce artifacts that harm recognition. Observation addition (OA) addresses this issue by fusing noisy and SE-enhanced speech, improving recognition without modifying the parameters of the SE or ASR models. This paper proposes an intelligibility-guided OA method, where fusion weights are derived from intelligibility estimates obtained directly from the backend ASR. Unlike prior OA methods based on trained neural predictors, the proposed method is training-free, reducing complexity and enhancing generalization. Extensive experiments across diverse SE-ASR combinations and datasets demonstrate strong robustness and improvements over existing OA baselines. Additional analyses of intelligibility-guided switching-based alternatives and frame- versus utterance-level OA further validate the proposed design.
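The fusion step of observation addition can be sketched as a sample-wise convex combination of the noisy and enhanced waveforms. The snippet assumes a single utterance-level weight; the function name `observation_addition` is hypothetical, and since the abstract does not specify how the weight is derived from the ASR back-end, the weight is simply passed in as a parameter.

```python
from typing import List, Sequence


def observation_addition(noisy: Sequence[float],
                         enhanced: Sequence[float],
                         weight: float) -> List[float]:
    """Fuse two waveforms as weight * enhanced + (1 - weight) * noisy."""
    if len(noisy) != len(enhanced):
        raise ValueError("signals must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * e + (1.0 - weight) * n for n, e in zip(noisy, enhanced)]


# Hypothetical usage with an equal mix of both signals.
fused = observation_addition([1.0, 2.0], [3.0, 4.0], weight=0.5)
```

With `weight=1.0` the ASR sees only the enhanced signal and with `weight=0.0` only the noisy one; a frame-level variant would replace the scalar weight with one weight per frame.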


[152] 2602.23958

An Empirical Analysis of Task-Induced Encoder Bias in Fréchet Audio Distance

Fréchet Audio Distance (FAD) is the de facto standard for evaluating text-to-audio generation, yet its scores depend on the underlying encoder's embedding space. An encoder's training task dictates which acoustic features are preserved or discarded, causing FAD to inherit systematic task-induced biases. We decompose evaluation into Recall, Precision, and Alignment (split into semantic and structural dimensions), using log-scale normalization for fair cross-encoder comparison. Controlled experiments on six encoders across two datasets reveal a four-axis trade-off: reconstruction-based AudioMAE leads precision sensitivity; ASR-trained Whisper dominates structural detection but is blind to signal degradation; classification-trained VGGish maximizes semantic detection but penalizes legitimate intra-class variation. Since no single encoder is a universal evaluator, future metrics must shift toward evaluation-native encoders intrinsically aligned with human perception.
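For reference, FAD is conventionally computed as the Fréchet distance between two multivariate Gaussians fitted to encoder embeddings of reference and generated audio. The symbols below are notation chosen here, not taken from the abstract: $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ denote the embedding mean and covariance of the reference and generated sets.

```latex
\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Both moments are computed in the encoder's embedding space, which is why the choice of encoder directly shapes the score.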


[153] 2603.08977

Universal Speech Content Factorization

We propose Universal Speech Content Factorization (USCF), a simple and invertible linear method for extracting a low-rank speech representation in which speaker timbre is suppressed while phonetic content is preserved. USCF extends Speech Content Factorization, a closed-set voice conversion (VC) method, to an open-set setting by learning a universal speech-to-content mapping via least-squares optimization and deriving speaker-specific transformations from only a few seconds of target speech. We show through embedding analysis that USCF effectively removes speaker-dependent variation. As a zero-shot VC system, USCF achieves competitive intelligibility, naturalness, and speaker similarity compared to methods that require substantially more target-speaker data or additional neural training. Finally, we demonstrate that as a training-efficient timbre-disentangled speech feature, USCF features can serve as the acoustic representation for training timbre-prompted text-to-speech models. Speech samples and code are publicly available.


[154] 2603.11669

SEMamba++: A General Speech Restoration Framework Leveraging Global, Local, and Periodic Spectral Patterns

General speech restoration demands techniques that can interpret complex speech structures under various distortions. While State-Space Models like SEMamba have advanced the state-of-the-art in speech denoising, they are not inherently optimized for critical speech characteristics, such as spectral periodicity or multi-resolution frequency analysis. In this work, we introduce an architecture tailored to incorporate speech-specific features as inductive biases. In particular, we propose the Global, Local, and Periodic (GLP) module, a frequency feature extraction block that effectively and efficiently leverages the properties of frequency bins. Then, we design a multi-resolution parallel time-frequency dual-processing block to capture diverse spectral patterns, and a learnable mapping to further enhance model performance. With all our ideas combined, the proposed SEMamba++ achieves the best performance among multiple baseline models while remaining computationally efficient.


[155] 2603.12046

Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition

Audio-Visual Speech Recognition (AVSR) leverages both acoustic and visual information for robust recognition under noise. However, how models balance these modalities remains unclear. We present Dr. SHAP-AV, a framework using Shapley values to analyze modality contributions in AVSR. Through experiments on six models across two benchmarks and varying SNR levels, we introduce three analyses: Global SHAP for overall modality balance, Generative SHAP for contribution dynamics during decoding, and Temporal Alignment SHAP for input-output correspondence. Our findings reveal that models shift toward visual reliance under noise yet maintain high audio contributions even under severe degradation. Modality balance evolves during generation, temporal alignment holds under noise, and SNR is the dominant factor driving modality weighting. These findings expose a persistent audio bias, motivating ad-hoc modality-weighting mechanisms and Shapley-based attribution as a standard AVSR diagnostic.
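To make the attribution idea concrete, the sketch below computes exact Shapley values for a two-player game in which the players are the audio and visual modalities. It is a generic textbook computation, not the Dr. SHAP-AV implementation: the function name `shapley_values` is hypothetical and the accuracy numbers are invented toy values.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical recognition accuracy for each subset of available modalities.
toy_accuracy = {
    frozenset(): 0.0,
    frozenset({"audio"}): 0.6,
    frozenset({"video"}): 0.25,
    frozenset({"audio", "video"}): 0.75,
}
contributions = shapley_values(["audio", "video"], toy_accuracy.__getitem__)
```

The contributions always sum to the gap between the full-input value and the empty-input value, which is what allows them to be read as relative modality shares. Enumerating all orderings is only feasible for a handful of players; finer-grained inputs would need a sampling approximation.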


[156] 2603.18949

Heart Artifact Removal in Electrohysterography Measurements Using Algebraic Differentiators

Electrohysterography (EHG) enables non-invasive monitoring of uterine contractions but can be contaminated by electrocardiogram (ECG) artifacts. This work presents an ECG removal method using algebraic differentiators, a control-theoretic tool for model-free derivative estimation, that preserves signal shape outside the detected cardiac pulse locations. The differentiator parameters are designed to simultaneously suppress slow physiological artifacts and powerline interference. Cross-channel clustering distinguishes cardiac pulses from localized artifacts, enabling accurate pulse subtraction without auxiliary ECG references. Implemented as a causal FIR filter, the method is validated as a proof of concept on multichannel EHG recordings from one female and one male healthy volunteer and compared to the template subtraction method.


[157] 2603.29001

A Unified Algebraic Framework for Subspace Pruning in Koopman Operator Approximation via Principal Vectors

Finite-dimensional approximations of the Koopman operator rely critically on identifying nearly invariant subspaces. This invariance proximity can be rigorously quantified via the principal angles between a candidate subspace and its image under the operator. To systematically minimize this error, we propose an algebraic framework for subspace pruning utilizing principal vectors. We establish the equivalence of this approach to existing consistency-based methods while providing a foundation for broader generalizations. To ensure scalability, we introduce an efficient numerical update scheme based on rank-one modifications, reducing the computational complexity of tracking principal angles by an order of magnitude. Finally, we demonstrate the effectiveness of our framework through numerical simulations.
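As background, principal angles in the finite-dimensional Euclidean setting can be written in terms of singular values. The notation is assumed here rather than taken from the abstract: $Q_S$ and $Q_{\mathcal{K}S}$ are matrices whose columns form orthonormal bases of the candidate subspace and of its image under the operator, and $\sigma_i(\cdot)$ denotes the $i$-th singular value.

```latex
\cos\theta_i = \sigma_i\!\left( Q_S^{\top} Q_{\mathcal{K}S} \right),
\qquad
\theta_{\max} = \arccos\!\left( \min_i \, \sigma_i\!\left( Q_S^{\top} Q_{\mathcal{K}S} \right) \right)
```

A subspace that is exactly invariant has all principal angles equal to zero, so the largest angle gives a natural measure of how far a candidate subspace is from invariance.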


[158] 2604.01459

Koopman Subspace Pruning in Reproducing Kernel Hilbert Spaces via Principal Vectors

Data-driven approximations of the infinite-dimensional Koopman operator rely on finite-dimensional projections, where the predictive accuracy of the resulting models hinges heavily on the invariance of the chosen subspace. Subspace pruning systematically discards geometrically misaligned directions to enhance this invariance proximity, which formally corresponds to the largest principal angle between the subspace and its image under the operator. Yet, existing techniques are largely restricted to Euclidean settings. To bridge this gap, this paper presents an approach for computing principal angles and vectors to enable Koopman subspace pruning within a Reproducing Kernel Hilbert Space (RKHS) geometry. We first outline an exact computational routine, which is subsequently scaled for large datasets using randomized Nystrom approximations. Based on these foundations, we introduce the Kernel-SPV and Approximate Kernel-SPV algorithms for targeted subspace refinement via principal vectors. Simulation results validate our approach.


[159] 2604.12804

Grid-Forming Characterization in DC Microgrids

DC microgrids are converter-based electrical networks that are increasingly being used in various applications, including data centers and industrial distribution systems. A central challenge in their operation is maintaining the DC-bus voltage within predefined limits while ensuring overall system stability. Although a wide variety of converter control algorithms has been proposed to achieve these objectives, the literature lacks a clear and physically interpretable framework for evaluating their effectiveness and for classifying and comparing them. Moreover, the grid-forming versus grid-following distinction that exists in AC systems has largely been unexplored in DC microgrids. To address this gap, this paper introduces three novel impedance-based indices that can be used to quantify the voltage-forming and current-forming behavior of a converter. The indices also provide a basis for defining the desired converter behavior that yields superior DC-bus voltage regulation performance. Simulation results illustrate the application of the framework to several representative control strategies and highlight the strengths and limitations of these control algorithms.


[160] 2604.14413

Comprehensive Review of Doppler Shift Localization Methods: Advances, Limitations, and Research Opportunities

Reliable geolocation of non-cooperative emitters in environments where Global Navigation Satellite Systems (GNSS) are unavailable or degraded is a key enabler for spectrum regulation, emergency response, autonomous mobility, and Integrated Sensing and Communication (ISAC) services in 5G/6G systems. Doppler-based techniques - from single-receiver Signal Doppler Frequency (SDF) fixes through multi-node Frequency Difference of Arrival (FDOA) and Direct Position Determination (DPD) to derivative-enhanced and learning-assisted hybrids - exploit radial-velocity-induced frequency shifts as a passive, high-resolution localization cue accessible with commodity software-defined radios, millimeter-wave access points, or acoustic sensors. This review consolidates over a decade of research across radio, acoustic, and satellite domains. It introduces a unifying taxonomy that divides the field into five technique families, outlining their evolution, measurement models, and estimator archetypes. It then compares algebraic, Bayesian, convex, and neural inference frameworks under realistic impairments such as oscillator drift, multipath, and asynchronous clocks, highlighting conditions where derivative Doppler metrics tighten the Cramer-Rao bound with minimal hardware cost. Environment-specific deployments are examined, from urban canyons and GNSS-denied tunnels to underwater, radar, UAV-swarm, and multi-orbit satellite scenarios, with prototype accuracies reaching meter scale using low-size, weight, and power payloads. Finally, the survey distils design recommendations for mobile and tactical operations and identifies open research challenges in frequency-reference integrity, multipath-aware modelling, edge-constrained computation, and trajectory-aware sensing.


[161] 2604.14441

Batch Effects In Brain Foundation Model Embeddings

Foundation models show strong potential for large-scale, high-dimensional biomedical applications, yet their ability to capture relevant neurobiological characteristics remains underexplored. We systematically evaluate embeddings from two neuroimaging foundation models, BrainLM and SwiFT, across multi-site fMRI datasets using a comprehensive evaluation framework. Our results show that foundation model embeddings encode substantial batch-related variability, often dominating diagnosis-related information across heterogeneous datasets. We further investigate how harmonization, applied to reduce batch effects, influences these embeddings. In addition, we find that BrainLM prefers to capture fine-grained regional activity, whereas SwiFT tends to represent interactions between regions, consistent with their respective model architectures. Our study highlights the importance of accounting for batch effects in foundation models and motivates future work on disentangling biologically meaningful signals from acquisition-related variability.


[162] 2604.23595

Adaptive Plug-and-Play Channel Estimation with Consistency Models for MIMO Systems

This paper proposes a consistency-model-based channel estimation algorithm for multiple-input multiple-output (MIMO) systems. The proposed algorithm employs a consistency model (CM) to learn the angle-domain channel distribution and uses the trained CM as a plug-and-play (PnP) generative prior for MIMO channel estimation. The proposed algorithm alternates between a pilot-observation-based data-consistency update and a CM-prior-based denoising update. In addition, the proposed algorithm adaptively selects the penalty parameter according to residual energy and residual whiteness, and adjusts the CM denoising level according to the observed signal-to-noise ratio (SNR), thereby avoiding the performance degradation caused by fixed inference schedules under varying observation conditions. Simulation results show that the proposed algorithm not only reduces the number of inference steps by 50%--90%, but also achieves high estimation accuracy and favorable cross-dataset performance.


[163] 2604.26595

Exploring Converter Control Duality in Microgrids: AC Grid-Forming vs DC Droop Control

Power electronic converters are fundamental building blocks of both AC and DC microgrids, enabling the integration of renewable energy sources, energy storage systems, electronic loads, and electric vehicles. In contrast, converter control in DC microgrids has developed along the path of droop control, which is widely adopted for decentralized DC-bus voltage regulation and power sharing. Although these control strategies share certain characteristics, their similarities remain largely unexplored due to the distinct physical domains in which they operate. To bridge this gap, we introduce a novel perspective based on the concept of duality to reveal the underlying isomorphism between the two control approaches. We show that AC grid-forming and DC I--V droop control are duals of each other in several aspects, including: (i) the small-signal model of the converter; (ii) the inner current control structure; (iii) power-sharing mechanisms based on the AC swing equation and DC capacitor power balance; and (iv) disturbance signals and dynamic response. Theoretical analysis, validated through simulations on simple converter setups, illustrates these dualities and provides new insights towards a unified control design.


[164] 2605.13135

Subspace Pruning via Principal Vectors for Accurate Koopman-Based Approximations

The accuracy of Koopman operator approximations over finite-dimensional spaces relies critically on their invariance properties. These can be rigorously quantified via the principal angles between a candidate subspace and its image under the Koopman operator. This paper proposes a unified algebraic framework for subspace pruning designed to systematically refine the invariance error. We establish the geometric equivalence between consistency-based methods and principal-vector pruning, and build on this insight to introduce a hybrid strategy that balances between multiple and single principal vector pruning for improved numerical stability and scalability. We derive error bounds for the retention of approximate and external eigenfunctions, demonstrating that the multi-vector approach mitigates the numerical drift inherent to sequential pruning. To ensure scalability, we develop an efficient numerical update scheme based on rank-one modifications that reduces the computational complexity of tracking principal angles by an order of magnitude. Finally, we exploit the subspace obtained from the pruning algorithms to build a lifted linear model for state prediction that accounts for the trade-offs between improving invariance and minimizing state reconstruction error. Simulations demonstrate the effectiveness of our approach.


[165] 2605.14285

ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing

Data assimilation (DA) estimates the state of an evolving dynamical system from noisy, partial observations, and is widely used in scientific simulation as well as weather and climate science. In practice, filtering methods rely on frame-to-frame transition models. However, these models are fragile when observations are non-Markovian (when they form only a partial slice of a higher-dimensional latent state as in real-world weather data): they tend to accumulate errors over long horizons. At the same time, learned DA methods typically commit to a single regime, either filtering (nowcasting, real-time forecasting) or smoothing (retrospective reanalysis), which splits what should be a shared prior across application-specific pipelines. To address both issues, we introduce ForcingDAS, a unified and robust DA framework. Built on Diffusion Forcing with an independent noise level assigned to each frame, ForcingDAS learns a joint-trajectory prior instead of frame-to-frame transitions. This allows it to capture long-horizon temporal dependencies and reduce error accumulation. In addition, the same trained model spans the full filtering to smoothing spectrum at inference time. Specifically, nowcasting, fixed-lag smoothing, and batch reanalysis are selected through the inference schedule alone, without retraining. We evaluate ForcingDAS on 2D Navier-Stokes vorticity, precipitation nowcasting, and global atmospheric state estimation. Across all settings, a single model is competitive with or outperforms both learned and classical baselines that are specialized for individual regimes, with the largest gains observed on real-world weather benchmarks.


[166] 2605.21895

Rotatable Antenna-Enhanced Wireless Sensing with Uniform Sparse Array via Tensor Decomposition

In this letter, we propose a new wireless sensing system equipped with a rotatable antenna (RA) array to enhance the sensing performance of a uniform sparse array (USA). To tackle the severe spatial undersampling issues, we propose a novel tensor decomposition-based direction-of-arrival (DOA) estimation algorithm. Specifically, we introduce a synchronous multiple rotation pattern for active target probing such that the received signals across multiple rotations capture diverse spatial degrees of freedom. Subsequently, we mathematically formulate the received signals across successive rotations as a third-order tensor, and leverage the canonical polyadic decomposition to obtain the factor matrices incorporating the DOA of targets. By analyzing the extrema distribution laws of array steering vector correlation (SVC) and gain SVC of RAs, we propose to combine the array and gain factor matrices via the Kronecker product, which theoretically guarantees unambiguous DOA estimation. Simulation results demonstrate that the proposed RA-enhanced tensor decomposition-based algorithm achieves high-precision and unambiguous sensing performance compared to conventional uniform dense arrays and omnidirectional antenna systems.


[167] 2605.25531

From Denoising to Decision Making: A Survey on Diffusion Model-Enabled Deep Reinforcement Learning for Wireless Networks

Deep reinforcement learning (DRL) has long been a promising solution for sequential resource management in wireless networks. However, conventional DRL methods are fundamentally limited by their reliance on unimodal policy distributions, inefficient exploration in high-dimensional action spaces, and poor adaptability to dynamic and heterogeneous environments. Meanwhile, diffusion models (DMs), as one of the most powerful families of generative AI, have demonstrated remarkable capabilities in modeling complex, multi-modal data distributions across diverse domains. The integration of DMs and DRL has opened a new and rapidly growing research direction, in which DM-enabled policies substantially enhance decision quality by capturing the complex, discontinuous, and multimodal action structures inherent in wireless resource management. In this paper, we present a comprehensive survey of DM-enabled DRL algorithms and their applications to various issues in wireless networks. In particular, we first provide the theoretical background of DMs and present different DM-enabled DRL algorithms. We then systematically review applications of DM-enabled DRL across computation offloading in mobile edge computing, UAV-assisted, vehicular, and AIGC-driven systems, as well as wireless resource allocation, physical-layer security, and robotics and UAV planning. We conclude the paper by highlighting future research directions.


[168] 2606.07048

Geometric Time-Domain Identification of Three-Phase Load Equivalents from Terminal Measurements

This paper presents a geometric time-domain method for identifying three-phase load equivalents from instantaneous voltage and current measurements at the point of common coupling. Measured waveforms are interpreted as trajectories in Euclidean signal spaces, and load-equivalent parameters are recovered from the geometry of those trajectories. The method extends a previously published single-phase geometric identification formulation to three- and four-wire systems and places special emphasis on the three-wire case, where no neutral voltage is measured and the terminal data must satisfy coupled Kirchhoff constraints. The main advance over the earlier analytical formulation is a sampled-data implementation based on local time windows, normalized matrix equations, harmonic-projection derivative and primitive coordinates, explicit geometric identifiability tests, passivity constraints, and energy/Kirchhoff residuals. The method does not force a model when the measured trajectory lacks enough information; instead, it reports low-rank or ill-conditioned windows as low-confidence evidence. Numerical simulations with clean data, measurement noise, window-length sweeps, and sensor delay show that the method accurately identifies informative three-phase trajectories and exposes structurally degenerate cases such as pure single-frequency excitation for higher-order three-wire models. For a given admissible topology the identified circuit closes the instantaneous terminal energy balance of the measured load over the analysis window.


[169] 2606.07076

Branch-Level Energy Localization in Three-Phase Loads: Resolving Indeterminacy in Time-Domain

This paper develops a branch-level energy-localization framework for three-phase loads. The instantaneous terminal power of an admissible lumped equivalent is decomposed uniquely as Joule dissipation plus magnetic and electric stored-energy rates, branch by branch. Three formal results are established: a Branch-Level Localization Theorem (uniqueness given an admissible topology); a Topology-Indeterminacy Theorem (multiple admissible topologies reproduce identical terminal data with distinct localizations); and a Generalized Energetic Duality Theorem that organizes classical electrical dualities (Norton-Thevenin, series--parallel, L vs C, R vs G) as restrictions to Linear Time Invariant (LTI) sinusoidal regimes of a single time-domain principle in which constant-parameter equivalence is replaced by time-varying parameters. The framework is exercised on six test cases including the de Leon--Cohen open-phase paradox, switched-resistive loads, three-wire delta-versus-wye-virtual indeterminacy, fluctuating-phase loads, and a four-wire nonlinear load with hysteretic, linear, and switched branches. The framework is positioned as complementary to IEEE Std. 1459, CPC, instantaneous p-q, and Fryze-Buchholz-Depenbrock: each answers a different question, and the apparent paradoxes vanish once the question is posed precisely.


[170] 2308.07822

Deep reinforcement learning for process design: Review and perspective

The transformation towards renewable energy and feedstock supply in the chemical industry requires new conceptual process design approaches. Recently, breakthroughs in artificial intelligence offer opportunities to accelerate this transition. Specifically, deep reinforcement learning, a subclass of machine learning, has shown the potential to solve complex decision-making problems and aid sustainable process design. We survey state-of-the-art research in reinforcement learning for process design through three major elements: (i) information representation, (ii) agent architecture, and (iii) environment and reward. Moreover, we discuss perspectives on underlying challenges and promising future works to unfold the full potential of reinforcement learning for process design in chemical engineering.


[171] 2312.15946

EnchantDance: Unveiling the Potential of Music-Driven Dance Movement

The task of music-driven dance generation involves creating coherent dance movements that correspond to the given music. While existing methods can produce physically plausible dances, they often struggle to generalize to out-of-set data. The challenge arises from three aspects: 1) the high diversity of dance movements and significant differences in the distribution of music modalities, which make it difficult to generate music-aligned dance movements; 2) the lack of a large-scale music-dance dataset, which hinders the generation of generalized dance movements from music; and 3) the protracted nature of dance movements, which makes it difficult to maintain a consistent dance style. In this work, we introduce the EnchantDance framework, a state-of-the-art method for dance generation. Due to the redundancy of the original dance sequence along the time axis, EnchantDance first constructs a strong dance latent space and then trains a dance diffusion model on the dance latent space. To address the data gap, we construct a large-scale music-dance dataset, ChoreoSpectrum3D Dataset, which includes four dance genres and has a total duration of 70.32 hours, making it the largest reported music-dance dataset to date. To enhance consistency between music genre and dance style, we pre-train a music genre prediction network using transfer learning and incorporate music genre as extra conditional information in the training of the dance diffusion model. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance on dance quality, diversity, and consistency.


[172] 2406.07318

Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs

The utilisation of event cameras represents an important and swiftly evolving trend aimed at addressing the constraints of traditional video systems. Particularly within the automotive domain, these cameras find significant relevance for their integration into embedded real-time systems due to lower latency and power consumption. One effective approach to ensure the necessary throughput and latency for event processing is through the utilisation of graph convolutional networks (GCNs). In this study, we introduce a custom EFGCN (Event-based FPGA-accelerated Graph Convolutional Network) designed with a series of hardware-aware optimisations tailored for PointNetConv, a graph convolution designed for point cloud processing. The proposed techniques result in up to 100-fold reduction in model size compared to Asynchronous Event-based GNN (AEGNN), one of the most recent works in the field, with a relatively small decrease in accuracy (2.9% for the N-Caltech101 classification task, 2.2% for the N-Cars classification task), thus following the TinyML trend. We implemented EFGCN on a ZCU104 SoC FPGA platform without any off-chip external memory resources, achieving a throughput of 13.3 million events per second (MEPS) and real-time partially asynchronous processing with low latency. Across multiple event-based classification benchmarks, our approach achieves competitive accuracy while providing state-of-the-art computational efficiency per event, small model size, and high scalability, customisability and resource efficiency. We publish both software and hardware source code in an open repository: this https URL.


[173] 2406.20003

Hyperuniformity and non-hyperuniformity of zeros of Gaussian Weyl-Heisenberg Functions

We study zero sets of twisted stationary Gaussian random functions on the complex plane, i.e., Gaussian random functions that are stochastically invariant under the action of the Weyl-Heisenberg group. This model includes translation-invariant Gaussian entire functions (GEFs), and also many other non-analytic examples, in which case winding numbers around zeros can be either positive or negative. We investigate zero statistics both when zeros are weighted with their winding numbers (charged zero set) and when they are not (uncharged zero set). We show that the variance of the charged zero statistic always grows linearly with the radius of the observation disk (hyperuniformity). Importantly, this holds for functions with possibly non-zero means and without assuming additional symmetries such as radiality. With respect to uncharged zero statistics, we provide an example for which the variance grows with the area of the observation disk (non-hyperuniformity). This is used to show that, while the zeros of GEFs are hyperuniform, the set of their critical points fails to be so. Our work contributes to recent developments in statistical signal processing, where the time-frequency profile of a non-stationary signal embedded into noise is revealed by performing a statistical test on the zeros of its spectrogram ("silent points"). We show that empirical spectrogram zero counts enjoy moderate deviations from their ensemble averages over large observation windows (something that was previously known only for pure noise). In contrast, we also show that spectrogram maxima ("loud points") fail to enjoy a similar property. This gives the first formal evidence for the statistical superiority of silent points over the competing feature of loud points, a fact that has been noted by practitioners.


[174] 2411.11350

Zero and Few Shot Load Forecasting with Large Language Models

Deep learning models have shown strong performance in load forecasting, but they generally require large amounts of data for model training before being applied to new scenarios, which limits their effectiveness in data-scarce scenarios. Inspired by the great success of pre-trained language models (LLMs) in natural language processing, this paper proposes a zero and few shot load forecasting approach using an advanced LLM framework denoted as the Chronos model. By utilizing its extensive pre-trained knowledge, the Chronos model enables accurate load forecasting in data-scarce scenarios. Simulation results across five real-world datasets demonstrate that the Chronos model significantly outperforms nine popular baseline models for both deterministic and probabilistic load forecasting with various forecast horizons (e.g., 1 to 48 hours), even though the Chronos model is neither tailored nor fine-tuned to these specific load datasets. Notably, Chronos reduces root mean squared error (RMSE), continuous ranked probability score (CRPS), and quantile score (QS) by approximately 7.34%-84.30%, 19.63%-60.06%, and 22.83%-54.49%, respectively, compared to baseline models. These results highlight the superiority and flexibility of the Chronos model, positioning it as an effective solution in data-scarce scenarios.
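
The deterministic and probabilistic metrics named in this abstract can be illustrated with a minimal sketch. It assumes the quantile score is the standard pinball loss averaged over samples; the function names and the toy load values are hypothetical and are not taken from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error of a point forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def quantile_score(y_true, y_quantile, q):
    """Pinball loss at level q (assumed definition of the quantile score)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_quantile, dtype=float)
    # Under-forecasts are weighted by q, over-forecasts by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

# Hypothetical hourly loads and forecasts, for illustration only.
load = np.array([100.0, 120.0, 90.0])
point_forecast = np.array([110.0, 115.0, 95.0])
p90_forecast = np.array([130.0, 125.0, 85.0])

print(rmse(load, point_forecast))
print(quantile_score(load, p90_forecast, q=0.9))
```

Averaging the pinball loss over a grid of quantile levels gives a common discrete approximation of CRPS, which is one reason the two probabilistic metrics tend to move together.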


[175] 2501.08238

CodecFake+: Codec-Based Resynthesized Data as a Proxy for Detecting CodecFake Speech

With the rapid advancement of neural audio codecs, codec-based speech generation (CoSG) systems have become highly powerful. Unfortunately, CoSG also enables the creation of highly realistic deepfake speech, making it easier to mimic an individual's voice and spread misinformation. We refer to this emerging deepfake speech generated by CoSG systems as CodecFake. Detecting such CodecFake is an urgent challenge, yet most existing systems primarily focus on detecting fake speech generated by traditional speech synthesis models. In this paper, we introduce CodecFake+, a large-scale dataset designed to advance CodecFake detection. To our knowledge, CodecFake+ is the largest dataset encompassing the most diverse range of codec architectures. The training set is generated through re-synthesis using 31 publicly available open-source codec models, while the evaluation set includes web-sourced data from 17 advanced CoSG models. We also propose a comprehensive taxonomy that categorizes codecs by their root components: vector quantizer, auxiliary objectives, and decoder types. Our proposed dataset and taxonomy enable detailed analysis at multiple levels to discern the key factors for successful CodecFake detection. At the individual codec level, we validate the effectiveness of using codec re-synthesized speech (CoRS) as training data for large-scale CodecFake detection. At the taxonomy level, we show that detection performance is strongest when the re-synthesis model incorporates disentanglement auxiliary objectives or a frequency-domain decoder. Furthermore, from the perspective of using all the CoRS training data, we show that our proposed taxonomy can be used to select better training data for improving detection performance. Overall, we envision that CodecFake+ will be a valuable resource for both general and fine-grained exploration to develop better anti-spoofing models against CodecFake.


[176] 2501.11842

Harnessing Rydberg Atomic Receivers: From Quantum Physics to Wireless Communications

The intrinsic integration of Rydberg atomic receivers into wireless communication systems is proposed, by harnessing the principles of quantum physics in wireless communications. More particularly, we conceive a pair of Rydberg atomic receivers: one incorporates a local oscillator (LO) and is referred to as an LO-dressed receiver, while the other operates without an LO and is termed an LO-free receiver. The appropriate wireless model is developed for each configuration, elaborating on the receiver's responses to the radio frequency (RF) signal, on the potential noise sources, and on the signal-to-noise ratio (SNR) performance. The developed wireless model conforms to the classical RF framework, facilitating compatibility with established signal processing methodologies. Next, we investigate the associated distortion effects that might occur, specifically identifying the conditions under which distortion arises and demonstrating the boundaries of linear dynamic ranges. This provides critical insights into the practical implementation of these receivers in wireless systems. Finally, extensive simulation results are provided for characterizing the performance of wireless systems harnessing this pair of Rydberg atomic receivers. Our results demonstrate that LO-dressed systems achieve a significant SNR gain of approximately 40 to 50 dB over conventional RF receivers in the standard quantum limit regime. This SNR head-room translates into reduced symbol error rates, enabling efficient and reliable transmission with higher-order constellations.


[177] 2502.01332

A two-disk approach to the synthesis of coherent passive equalizers for linear quantum systems

The coherent equalization problem consists in designing a quantum system acting as a mean-square near-optimal filter for a given quantum communication channel. The paper develops an improved method for the synthesis of transfer functions for such equalizing filters, based on a linear quantum system model of the channel. The method draws on a connection with the two-disk problem of ${H}_{\infty}$ control for classical (i.e., non-quantum) linear uncertain systems. Compared with the previous methods, the proposed method applies to a broader class of linear quantum communication channels.


[178] 2502.15917

Qubit-Efficient Quantum Annealing for Stochastic Unit Commitment

Stochastic Unit Commitment (SUC) has been proposed to manage the uncertainties driven by renewable integration, but it leads to significant computational complexity. When accelerated by Benders Decomposition (BD), the master problem becomes a binary integer program, which is still NP-hard and computationally demanding for classical methods. Quantum Annealing (QA), known for efficiently solving Quadratic Unconstrained Binary Optimization (QUBO) problems, presents a potential solution. However, existing quantum algorithms rely on slack variables to handle linear binary inequality constraints, leading to increased qubit consumption and reduced computational efficiency. To address this, this paper introduces the Powell-Hestenes-Rockafellar Augmented Lagrangian Multiplier (PHR-ALM) method to eliminate the need for slack variables, making qubit consumption independent of the increasing number of Benders cuts. To further reduce the qubit overhead, quantum ADMM is applied to break large-scale SUC into smaller blocks that are solved sequentially, so that the qubit requirement does not scale with the number of generators. Finally, simulation results on both a 4-generator system and the IEEE 118-bus system demonstrate the feasibility and scalability of the proposed algorithm, indicating its superior qubit and runtime efficiency over classical and baseline quantum approaches on the D-Wave QPU platform.


[179] 2502.16584

Audio-FLAN: An Instruction-Following Dataset for Unified Audio Understanding and Generation of Speech, Music, and Sound

Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the development of truly unified audio-language models. While instruction tuning has demonstrated remarkable success in improving generalization and zero-shot learning across text and vision, its application to audio remains largely unexplored. A major obstacle is the lack of comprehensive datasets that unify audio understanding and generation. To address this, we introduce Audio-FLAN, a large-scale instruction-tuning dataset covering 80 diverse tasks across speech, music, and sound domains, with over 100 million instances. Audio-FLAN lays the foundation for unified audio-language models that can seamlessly handle both understanding (e.g., transcription, comprehension) and generation (e.g., speech, music, sound) tasks across a wide range of audio domains in a zero-shot manner. The Audio-FLAN dataset is available on HuggingFace and GitHub.


[180] 2505.04753

Hybrid-Field 6D Movable Antenna for Terahertz Communications: Channel Modeling and Estimation

In this work, we study a six-dimensional movable antenna (6DMA)-enhanced Terahertz (THz) network that supports a large number of users with a few antennas by controlling the three-dimensional (3D) positions and 3D rotations of antenna surfaces/subarrays at the base station (BS). However, the short wavelength of THz signals combined with a large 6DMA movement range extends the near-field region. As a result, a user can be in the far-field region relative to the antennas on one 6DMA surface, while simultaneously residing in the near-field region relative to other 6DMA surfaces. Moreover, 6DMA THz channel estimation suffers from increased computational complexity and pilot overhead due to uneven power distribution across the large number of candidate position-rotation pairs, as well as the limited number of radio frequency (RF) chains in THz bands. To address these issues, we propose an efficient hybrid-field generalized 6DMA THz channel model, which accounts for planar wave propagation within individual 6DMA surfaces and spherical waves among different 6DMA surfaces. Furthermore, we propose a low-overhead channel estimation algorithm that leverages directional sparsity to construct a complete channel map for all potential antenna position-rotation pairs. Numerical results show that the proposed hybrid-field channel model achieves a sum rate close to that of the ground-truth near-field channel model and confirm that the channel estimation method yields accurate results with low complexity.


[181] 2509.22497

UAV-Enabled Fluid Antenna Systems for Multi-Target Wireless Sensing over LAWCNs

Fluid antenna system (FAS) is emerging as a key technology for enhancing spatial flexibility and sensing accuracy in future wireless systems. This paper investigates an unmanned aerial vehicle (UAV)-enabled FAS for multi-target wireless sensing in low-altitude wireless consumer networks (LAWCNs) for achieving the low-altitude economy (LAE) missions. We formulate an optimization problem aimed at minimizing the average Cramér-Rao bound (CRB) for multiple target estimations. To tackle this non-convex problem, an efficient alternating optimization (AO) algorithm is proposed, which jointly optimizes the UAV trajectory, the antenna position of the transmit fluid antennas (FAs) and the receive FAs, and the transmit beamforming at the UAV. Simulation results demonstrate significant performance improvements in estimation accuracy and sensing reliability compared to conventional schemes, e.g., the fixed position antenna scheme. The proposed system achieves enhanced sensing performance through adaptive trajectory design and beamforming, alongside effective interference suppression via the flexible FAS antenna repositioning, underscoring its practical potential for precision sensing in the UAV-enabled LAWCNs.


[182] 2510.16028

TAO: Tolerance-Aware Optimistic Verification for Floating-Point Neural Networks

Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present TAO: a Tolerance Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. TAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement TAO as a PyTorch-compatible runtime and a contract layer currently deployed on Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers and diffusion models on A100, H100, RTX6000, RTX4090, empirical thresholds are $10^2-10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. Together, TAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
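
The idea of accepting outputs inside an operator-level tolerance region, rather than demanding bitwise equality, can be sketched for a single FP32 dot product. This is not TAO's implementation: it uses the textbook worst-case bound for recursive summation, |fl(x·y) - x·y| <= gamma_n * sum(|x_i||y_i|) with gamma_n = n*u / (1 - n*u) and u = 2^-24, and the names `dot_error_bound` and `accept` are hypothetical.

```python
import numpy as np

U32 = 2.0 ** -24  # unit roundoff of IEEE-754 binary32 (round to nearest)

def dot_error_bound(x, y):
    """Worst-case absolute error of an FP32 dot product, any summation order."""
    n = x.size
    gamma = n * U32 / (1.0 - n * U32)
    # Evaluate the bound itself in float64 so it is not polluted by FP32 error.
    magnitude = np.sum(np.abs(x.astype(np.float64)) * np.abs(y.astype(np.float64)))
    return gamma * float(magnitude)

def accept(claimed, reference, x, y):
    """Accept if two FP32 results could both be honest evaluations of x.y."""
    # Each honest result is within one bound of the exact value, so two
    # honest results differ by at most twice the bound.
    return abs(float(claimed) - float(reference)) <= 2.0 * dot_error_bound(x, y)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000).astype(np.float32)
y = rng.standard_normal(1000).astype(np.float32)

reference = np.dot(x, y)           # library kernel, unspecified summation order
sequential = np.float32(0.0)
for xi, yi in zip(x, y):           # a different, strictly left-to-right order
    sequential = np.float32(sequential + xi * yi)

print(accept(sequential, reference, x, y))                  # reordered but honest
print(accept(reference + np.float32(1.0), reference, x, y)) # tampered result
```

A worst-case bound of this kind is sound but loose, which is consistent with the abstract's observation that empirically calibrated thresholds are far tighter than theoretical ones.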


[183] 2511.05355

SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning

Flow matching (FM) has shown promising results in data-driven planning. However, it inherently lacks formal guarantees for satisfying state and action constraints, which is a fundamental requirement for the safety and admissibility of planned trajectories on various systems. Moreover, existing FM planners do not ensure dynamical consistency, which can render trajectories inexecutable. We address these shortcomings by proposing SAD-Flower, a novel framework for generating Safe, Admissible, and Dynamically consistent trajectories. Our approach relies on an augmentation of the flow with a virtual control input. Thereby, principled guidance can be derived using techniques from nonlinear control theory, providing formal guarantees for state constraints, action constraints, and dynamic consistency. Crucially, SAD-Flower operates without retraining, enabling test-time satisfaction of unseen constraints. Through extensive experiments across several tasks, we demonstrate that SAD-Flower outperforms various generative-model-based baselines in ensuring constraint satisfaction.


[184] 2511.07938

Decision-Focused Continual Learning for Seaport Power-Logistics Scheduling: Generalization across Varying Tasks

Power-logistics scheduling in modern seaports typically follows a predict-then-optimize pipeline. To enhance the decision quality of predictions, decision-focused learning has been proposed, which aligns the training of forecasting models with downstream decision outcomes. However, this end-to-end design inherently restricts the value of forecasting models to a specific task structure and therefore generalizes poorly to evolving tasks induced by varying vessel arrivals. We address this gap with a decision-focused continual learning framework that adapts online to a stream of scheduling tasks. Specifically, we introduce Fisher-information-based regularization to enhance cross-task generalization by preserving parameters critical to prior tasks. A differentiable convex surrogate is also developed to stabilize gradient backpropagation. The proposed approach enables learning a decision-aligned forecasting model across a varying task stream with sustainable long-term computational and memory requirements. Experiments calibrated to Jurong Port show improved decision performance and cross-task generalization over existing methods, together with reduced computational cost and a bounded memory footprint.
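
A Fisher-information-based regularizer of the kind mentioned above can be sketched as follows. The sketch assumes a diagonal empirical Fisher estimate and a quadratic penalty in the style of elastic weight consolidation; whether the paper uses exactly this form is not stated in the abstract, and all names and numbers are hypothetical.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Diagonal empirical Fisher: mean of squared per-sample gradients."""
    g = np.asarray(per_sample_grads, dtype=float)
    return np.mean(g ** 2, axis=0)

def fisher_penalty(theta, theta_prev, fisher, strength=1.0):
    """Quadratic penalty that anchors parameters important to earlier tasks."""
    delta = np.asarray(theta, dtype=float) - np.asarray(theta_prev, dtype=float)
    return 0.5 * strength * float(np.sum(fisher * delta ** 2))

# Toy example: the first parameter mattered for the previous task, the second did not.
grads_prev_task = [[1.0, 0.0], [3.0, 0.0]]
fisher = diagonal_fisher(grads_prev_task)        # array([5., 0.])
theta_prev = np.array([0.0, 0.0])
theta_new = np.array([1.0, 10.0])

print(fisher_penalty(theta_new, theta_prev, fisher))
```

In this toy case the large move in the second parameter costs nothing, while the small move in the first is penalized, which is the mechanism by which such a term preserves parameters critical to prior tasks. Storing only one Fisher vector and one parameter snapshot also keeps memory bounded as tasks accumulate.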


[185] 2601.18840

Bellman Residual Minimization for Control: Geometry, Stationarity, and Convergence

Markov decision problems are most commonly solved via dynamic programming. Another approach is Bellman residual minimization, which directly minimizes the squared Bellman residual objective function. However, compared to dynamic programming, this approach has received relatively little attention, mainly because it is often less efficient in practice and can be more difficult to extend to model-free settings such as reinforcement learning. Nonetheless, Bellman residual minimization has several advantages that make it worth investigating, such as more stable convergence with function approximation for value functions. While Bellman residual methods for policy evaluation have been widely studied, methods for policy optimization (control tasks) have been scarcely explored. In this paper, we establish foundational results on Bellman residual minimization for policy optimization (control).


[186] 2602.05458

Emergence-as-Code as a Foundation for Self-Governing Reliable Systems

Service-level objective (SLO)-as-code tools make per-service reliability declarative, but users experience journeys: end-to-end executions whose availability and tail latency emerge from topology, routing, redundancy, timeouts/fallbacks, shared failure domains, and tail amplification. Journey objectives are therefore often maintained outside code and drift away from the effective runtime graph. We propose Emergence-as-Code (EmaC), a declarative contract that compiles journey-level SLI bounds and governance artifacts for declared SLOs from intent and evidence. An EmaC specification defines a typed journey expression, leaf bindings to atomic SLOs and telemetry, failure-domain assumptions, and guarded actions. Model Discovery proposes evidence-backed deltas for edges, branch probabilities, redundancy groups, and failure-domain hypotheses; each delta carries provenance and confidence. The compiler derives optimistic and pessimistic journey bounds and emits reviewable governance artifacts. An executable checkout replay shows that local SLOs can remain green while evidence-backed discovery changes the failure-domain model, collapses the pessimistic payment-race bound, and changes the rollout decision from pass to fail or review.
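
How optimistic and pessimistic journey bounds can diverge while per-service SLOs stay green can be shown with a small availability calculation. This is an illustrative sketch only, not the EmaC compiler: it assumes serial steps fail independently, that the optimistic bound treats redundant replicas as independent, and that the pessimistic bound treats them as sharing one failure domain. The service names and numbers are hypothetical.

```python
from math import prod

def series(availabilities):
    """Availability of steps that must all succeed (independence assumed)."""
    return prod(availabilities)

def redundant_optimistic(availabilities):
    """Redundancy group with independent replicas: fails only if all fail."""
    return 1.0 - prod(1.0 - a for a in availabilities)

def redundant_pessimistic(availabilities):
    """Redundancy group in one shared failure domain: no better than the best replica."""
    return max(availabilities)

gateway = 0.999
payment_replicas = [0.99, 0.99]
journey_target = 0.995  # hypothetical journey-level objective

optimistic = series([gateway, redundant_optimistic(payment_replicas)])
pessimistic = series([gateway, redundant_pessimistic(payment_replicas)])

print(optimistic, optimistic >= journey_target)
print(pessimistic, pessimistic >= journey_target)
```

With these numbers the optimistic bound clears the target and the pessimistic one does not, so evidence that the two replicas share a failure domain would flip a rollout decision even though no individual SLO changed.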


[187] 2603.14106

Chaos-Free Networks are Stable Recurrent Neural Networks

Gated Recurrent Neural Networks (RNNs) are widely used for nonlinear system identification due to their high accuracy, although they often exhibit complex, chaotic dynamics that are difficult to analyze. This paper investigates the system-theoretic properties of the Chaos-Free Network (CFN), an architecture originally proposed to eliminate the chaotic behavior found in standard gated RNNs. First, we formally prove that the CFN satisfies Input-to-State Stability (ISS) by design. However, we demonstrate that the CFN architecture does not intrinsically guarantee Incremental ISS (delta-ISS), as ensuring this property relies on specific parametric constraints. To address this, we introduce the Decoupled-Gate Network (DGN), a novel structural variant of the CFN that removes internal state connections in the gating mechanisms. Finally, we prove that the DGN unconditionally satisfies the delta-ISS property, providing an incrementally stable architecture for identifying nonlinear dynamical systems without requiring complex network training modifications. Numerical results confirm that the DGN maintains the modeling capabilities of standard architectures while adhering to these rigorous stability guarantees.
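
The two recurrences discussed above can be sketched side by side. The CFN update is written as it is commonly stated in the original CFN work, h_t = theta_t * tanh(h_{t-1}) + eta_t * tanh(W x_t) with sigmoid gates; the DGN variant here is only one reading of the abstract, obtained by dropping the state-dependent terms from both gates. Parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cfn_step(h, x, p):
    """One CFN update: both gates depend on the previous state h and the input x."""
    theta = sigmoid(p["U_theta"] @ h + p["V_theta"] @ x + p["b_theta"])
    eta = sigmoid(p["U_eta"] @ h + p["V_eta"] @ x + p["b_eta"])
    return theta * np.tanh(h) + eta * np.tanh(p["W"] @ x)

def dgn_step(h, x, p):
    """Assumed DGN update: gates depend on the input only (no state feedback)."""
    theta = sigmoid(p["V_theta"] @ x + p["b_theta"])
    eta = sigmoid(p["V_eta"] @ x + p["b_eta"])
    return theta * np.tanh(h) + eta * np.tanh(p["W"] @ x)

def max_state_magnitude(step, p, inputs):
    """Roll the recurrence from h = 0 and return the largest |h_i| seen."""
    h = np.zeros(p["W"].shape[0])
    worst = 0.0
    for x in inputs:
        h = step(h, x, p)
        worst = max(worst, float(np.max(np.abs(h))))
    return worst

rng = np.random.default_rng(0)
n_h, n_x = 4, 3
params = {
    "U_theta": rng.standard_normal((n_h, n_h)),
    "U_eta": rng.standard_normal((n_h, n_h)),
    "V_theta": rng.standard_normal((n_h, n_x)),
    "V_eta": rng.standard_normal((n_h, n_x)),
    "b_theta": rng.standard_normal(n_h),
    "b_eta": rng.standard_normal(n_h),
    "W": rng.standard_normal((n_h, n_x)),
}
inputs = 3.0 * rng.standard_normal((200, n_x))

print(max_state_magnitude(cfn_step, params, inputs))
print(max_state_magnitude(dgn_step, params, inputs))
```

Because each gate lies in (0, 1) and tanh is bounded by 1, every state component stays within [-2, 2] for any input sequence. The check above only exercises that elementary bound; it is not a proof of ISS or delta-ISS, which the paper establishes formally.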


[188] 2603.14374

A Systematic Comparison and Evaluation of Building Ontologies for Deploying Data-Driven Analytics in Smart Buildings

Ontologies play a critical role in data exchange, information integration, and knowledge sharing across diverse smart building applications. Yet, semantic differences between the prevailing building ontologies hamper their purpose of enabling data interoperability and restrict the ability to reuse building ontologies in real-world applications. In this paper, we propose and adopt a framework to conduct a systematic comparison and evaluation of four popular building ontologies (Brick Schema, RealEstateCore, Project Haystack and Google's Digital Buildings) from the perspectives of both axiomatic design and assertions in a use case, namely the Terminological Box (TBox) evaluation and the Assertion Box (ABox) evaluation. In the TBox evaluation, we use the SQuaRE-based Ontology Quality Evaluation (OQuaRE) Framework and conclude that Project Haystack and Brick Schema are more compact with respect to the ontology axiomatic design. In the ABox evaluation, we conduct an empirical study with sample building data, which suggests that Brick Schema and RealEstateCore have greater completeness and expressiveness in capturing the main concepts and relations within the building domain. The results implicitly indicate that there is no universal building ontology for integrating Linked Building Data (LBD). We discuss ontology compatibility and investigate building ontology design patterns (ODPs) to support ontology matching, alignment, and harmonisation.


[189] 2603.14762

Online Learning for Supervisory Switching Control

We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy a suitable controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions that are incompatible with a control setting, such as system stability, which preclude testing potentially unstable controllers. To bridge this gap, we propose a novel, non-asymptotic analysis of supervisory control that adapts multi-armed bandit algorithms to a control-theoretic setting. The proposed data-driven algorithm evaluates candidate controllers via scoring criteria that leverage system observability to isolate the effects of state history, enabling both detection of destabilizing controllers and accurate system identification. We present two algorithmic variants with dimension-free, finite-time guarantees, each of which identifies the matching controller in $O(N \log^2 N)$ steps while simultaneously achieving finite $L_2$-gain with respect to system disturbances.


[190] 2603.26763

A Camera-Native Talking-Head Video Dataset for Various Computer Vision Tasks

Talking-head videos constitute a predominant content type in real-time communication, yet publicly available datasets for video processing research in this domain remain scarce and limited in signal fidelity. In this paper, we open-source a camera-native dataset of 847 talking-head recordings (approximately 212 minutes), each 15s in duration, captured from 805 participants using 446 unique consumer webcam devices in their natural environments. All recordings are stored using the FFV1 lossless codec, preserving the camera-native signal -- uncompressed (24.4%) or MJPEG-encoded (75.6%) -- without additional lossy processing. Each recording is annotated with a Mean Opinion Score (MOS) and ten perceptual quality tokens that jointly explain 64.4% of the MOS variance. From this corpus, we curate a stratified benchmarking subset of 120 clips in three content conditions: original, background blur, and background replacement. Codec efficiency evaluation across four datasets and four codecs, namely H.264, H.265, H.266, and AV1, yields VMAF BD-rate savings up to $-71.3\%$ (H.266) relative to H.264, with significant encoder$\times$dataset ($\eta_p^2 = .112$) and encoder$\times$content condition ($\eta_p^2 = .149$) interactions, demonstrating that both content type and background processing affect compression efficiency. A preliminary super-resolution evaluation with four SR models confirms that the dataset significantly affects absolute performance while preserving model rankings, demonstrating applicability beyond codec benchmarking. The dataset offers 5$\times$ the scale of the largest prior talking-head webcam dataset (847 vs. 160 clips) with lossless signal fidelity, establishing a resource for benchmarking video compression, super-resolution, quality assessment, and enhancement models in real-time communication.


[191] 2604.24199

Speech Enhancement Based on Drifting Models

We propose Speech Enhancement based on Drifting Models (DriftSE), a novel generative framework that formulates denoising as an equilibrium problem. Rather than relying on iterative sampling, DriftSE natively achieves one-step inference by evolving the pushforward distribution of a mapping function to directly match the clean speech distribution. This evolution is driven by a Drifting Field, a learned correction vector that guides samples toward the high-density regions of the clean distribution, which naturally facilitates training on unpaired data by matching distributions rather than paired samples. We investigate the framework under two formulations: a direct mapping from the noisy observation, and a stochastic conditional generative model from a Gaussian prior. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DriftSE achieves high-fidelity enhancement in a single step, outperforming multi-step diffusion baselines and establishing a new paradigm for speech enhancement.


[192] 2604.27460

Solution Sets for Inverse Infinite-Horizon Linear-Quadratic Descriptor Differential Games

In this letter, we study a model-based inverse problem for infinite-horizon linear-quadratic differential games with descriptor dynamics. Given an observed feedback strategy profile, we seek to identify all cost functions that rationalize it as a feedback Nash equilibrium; this collection is referred to as the solution set. We characterize the solution set, show that it is rectangular and convex, and provide an algorithm for computing an admissible realization whenever it is nonempty. We also show that, compared with the corresponding inverse problem for standard state-space dynamics, descriptor dynamics modify the geometry of the solution set and may reduce identifiability. Finally, we illustrate the results with numerical examples.


[193] 2605.14610

Parametrically Adaptive Transition Polynomial: a Signed-Parity Continuous-alpha Extension of Kunchenko Stochastic Polynomials

Kunchenko's method of polynomial maximization provides a semiparametric apparatus for parameter estimation under non-Gaussian errors, but its classical power basis relies on finite higher-order integer moments. This paper introduces the Parametrically Adaptive Transition Polynomial (PATP), a signed-parity fractional-power family controlled by a continuous parameter alpha in [0,1]. The quadratic exponent map p_i(alpha) connects the fractal regime p_i(0)=1/i, the degenerate linear point p_i(1/2)=1, and the signed-parity integer-power regime p_i(1)=i. For the degree-S=2 case we derive a closed-form variance-reduction coefficient g_2(alpha) in terms of signed and absolute fractional moments, identify the singular behavior at alpha=1/2, and state the moment and regularity conditions under which the formula is meaningful. The construction should be read as a Form-B PATP analogue within Kunchenko's generalized apparatus, not as an exact recovery of the canonical even-power PMM basis at alpha=1. Numerical illustrations on canonical distributions are used to examine the finite-sample behavior of the signed-parity estimator and to mark the boundary of applicability for extremely heavy-tailed cases such as Cauchy.


[194] 2605.26452

Robust Koopman Control Barrier Filters for Safe Actor-Critic Reinforcement Learning

Safe reinforcement learning (RL) for robotic systems requires policies that improve task performance while satisfying state and input constraints during both training and deployment. Control barrier functions (CBFs) provide a principled mechanism for enforcing forward invariance through minimally invasive safety filters, but their use in model-free RL is limited by the need for accurate dynamics and hand-designed barrier certificates. We propose Robust Koopman-CBF SAC, a safety-filtered actor--critic framework that learns a finite-dimensional Koopman predictor from data, constructs affine CBF constraints in the lifted space, and enforces them through a quadratic-program safety layer. To account for finite-dimensional Koopman approximation error, the CBF condition is tightened using a projected residual margin estimated from held-out rollout data. The critic is trained on the executed safe action, while the actor is regularized toward the Koopman-CBF feasible set, reducing dependence on the filter over training. Across safe-control benchmarks, the method achieves zero constraint violations on CartPole stabilization and tracking while matching or exceeding unconstrained SAC returns. On high-dimensional Safety Gymnasium locomotion tasks, the method reduces violations in some settings but also exposes important limitations of first-order velocity barriers and linear EDMD models, motivating high-order and multi-step Koopman-CBF extensions. These results suggest that robust Koopman-CBF filters are a promising bridge between model-free RL and certifiable safety, while clarifying the structural conditions under which such filters remain effective.


[195] 2606.01478

Crazyflow: An Accurate, GPU-Accelerated, Differentiable Drone Simulator in JAX

High-quality, large-scale synthetic data from simulations is becoming a cornerstone for pushing the capabilities of robot algorithms. While aerial robotics simulators have evolved to support specialized needs such as fidelity, differentiability, and swarms independently, a unified platform that can synthesize data across all these domains is missing. In this work, we propose Crazyflow, a simulator designed to push the limits of aerial-robotics algorithm development, from model-based to data-driven methods, gradient-based to sampling-based approaches, and single-agent to multi-agent systems. Compared to existing state-of-the-art drone simulators, it achieves speeds more than an order of magnitude faster for a single drone and can simulate thousands of swarms of 4000 drones each. Real-world experiments show Crazyflow supports both analytical-gradient-based policy learning, achieving sub-centimeter trajectory tracking accuracy without domain randomization, and sampling-based obstacle avoidance at speeds exceeding half a billion steps per second. Breaking the traditional train-then-deploy paradigm, we show that its unprecedented speed even enables in-flight reinforcement learning; we demonstrate this by throwing a physical drone into the air and training a recovery policy from scratch in 0.38 seconds, successfully stabilizing the drone. Crazyflow supports multiple levels of simulation abstraction, is directly compatible with all open-source Crazyflie models, and enables rapid reconfiguration across custom drone platforms and applications by providing a light-weight system identification pipeline. By pushing accuracy, speed, and differentiability simultaneously, Crazyflow serves as an open-source resource for synthetic data generation, with emerging capabilities for large-scale parallelization for online, in-execution learning and optimization, opening the door to novel algorithm development.


[196] 2606.02383

A Game-Theoretic Decision Framework for Optimal Selection of Coordination Detection Methods in Multi-UAV Fleet Operations

Detecting coordination among unmanned aerial vehicle (UAV) fleets operating in shared airspace and identifying the route-lead aircraft whose navigation decisions govern fleet behavior presents a fundamental speed--accuracy trade-off: fast methods enable real-time traffic management but sacrifice detection fidelity, while accurate methods may exceed the time budget for actionable airspace deconfliction. This paper presents a game-theoretic decision framework that resolves this trade-off by formulating method selection as a two-player zero-sum game between a Monitor (selecting computational methods and parameters) and Nature (selecting the unknown traffic scenario). We construct an end-to-end pipeline from trajectory surveillance data through eight candidate detection algorithms, a Monte Carlo sensitivity analysis characterizing their stochastic performance, and finally a multi-objective optimization layer that identifies Pareto-optimal method portfolios. The minimax solution provides a robust mixed strategy with a probability distribution over methods that guarantees worst-case performance regardless of scenario uncertainty. Experimental evaluation across 200 randomized configurations spanning 5--50 aircraft demonstrates that the framework recommends distinct method portfolios depending on operational priority: Koopman Phase dominates balanced (70.6%) and speed-priority (79.7%) profiles, while CRQA emerges as primary (47.4%) when route-lead identification is prioritized. The framework achieves a guaranteed game value of 0.29--0.53 (normalized utility) across all tested preference profiles, providing the first principled, scenario-adaptive methodology for computational method selection in UTM fleet monitoring operations.
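
As a minimal illustration of the minimax mixed strategy mentioned in the abstract above, the following sketch solves a two-method, two-scenario zero-sum game in closed form. The function name `solve_2x2_zero_sum` and the payoff numbers are hypothetical and chosen only for illustration; they are not taken from the paper, which considers eight methods and would need a linear program rather than this 2x2 formula.

```python
def solve_2x2_zero_sum(payoff):
    """Return (row_strategy, game_value) for a 2x2 zero-sum game.

    payoff[i][j] is the utility to the row player (the "Monitor") when
    it picks method i and the column player ("Nature") picks scenario j.
    The row player maximizes, the column player minimizes.
    """
    (a, b), (c, d) = payoff

    # Pure-strategy check: a saddle point exists when maximin == minimax.
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        best_row = 0 if min(a, b) >= min(c, d) else 1
        strategy = [1.0, 0.0] if best_row == 0 else [0.0, 1.0]
        return strategy, float(maximin)

    # No saddle point: mix so that both columns give the same expected payoff.
    denom = a - b - c + d
    p = (d - c) / denom              # probability of playing row 0
    value = (a * d - b * c) / denom  # guaranteed expected utility
    return [p, 1.0 - p], value


# Hypothetical normalized utilities: rows are methods, columns are scenarios.
example_payoff = [[0.2, 0.8],
                  [0.6, 0.3]]
strategy, value = solve_2x2_zero_sum(example_payoff)
print(strategy, value)
```

With these illustrative numbers the row player mixes roughly one third on the first method and two thirds on the second, and the expected utility is the same (about 0.47) whichever scenario occurs, which is what makes the guarantee independent of scenario uncertainty.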


[197] 2606.03632

Optimal Finite-Horizon LQR Control for Traffic Flow via Variable Speed Limits

This article presents a finite-horizon linear quadratic regulator (LQR) for the control of the first-order Lighthill-Whitham-Richards traffic model with a triangular fundamental diagram. The in-domain control action is realized through variable speed limits implemented as a source term in the governing hyperbolic partial differential equation (PDE). Unlike prior studies on infinite-horizon formulations, this article develops a finite-horizon LQR framework, deriving a space- and time-varying state feedback function for hyperbolic PDEs. The solution to the finite-time optimal control problem relies on the solution of another PDE, called the Riccati PDE. The resulting nonlinear Riccati PDE is solved analytically via the parametric method of characteristics. The Riccati PDE solution is a function of both time and space, as well as the traffic regime. A sensitivity analysis demonstrates the effects of the LQR parameters for both the infinite and finite time horizon problems in different traffic situations, while simulations validate the finite-horizon LQR's ability to guarantee finite-time convergence. Compared to the infinite-horizon LQR, the proposed approach achieves significantly improved control performance across various scenarios, making it particularly suitable for time-sensitive traffic management applications.
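
To make the contrast between finite- and infinite-horizon LQR concrete, the sketch below uses a much simpler stand-in than the paper's setting: a scalar, discrete-time system $x_{k+1} = a x_k + b u_k$ with a backward Riccati recursion. This is an assumption made purely for illustration; the paper solves a Riccati PDE for a hyperbolic traffic model, which this code does not attempt. The function name and parameter values are hypothetical.

```python
def finite_horizon_lqr_gains(a, b, q, r, q_f, horizon):
    """Backward Riccati recursion for a scalar discrete-time LQR problem.

    Minimizes sum_{k<N} (q*x_k^2 + r*u_k^2) + q_f*x_N^2 subject to
    x_{k+1} = a*x_k + b*u_k. Returns the time-varying gains
    [K_0, ..., K_{N-1}], where the optimal input is u_k = -K_k * x_k.
    """
    p = q_f       # terminal cost-to-go weight P_N
    gains = []
    for _ in range(horizon):
        k_gain = (a * b * p) / (r + b * b * p)
        # Riccati update: P_k = q + a^2 P_{k+1} - (a b P_{k+1})^2 / (r + b^2 P_{k+1})
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        gains.append(k_gain)
    gains.reverse()  # computed backward in time; return in forward order
    return gains


gains = finite_horizon_lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, q_f=0.0, horizon=50)
print(gains[0], gains[-2], gains[-1])
```

In this toy case the gain far from the terminal time settles near the stationary (infinite-horizon) value, while the gains near the end of the horizon drop toward zero because no terminal cost is imposed. The time dependence of the feedback is the feature that the finite-horizon formulation adds.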


[198] 2606.06037

SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech

Large audio language models (LALMs) are increasingly deployed in real-world applications, yet their safety alignment is still primarily evaluated on monolingual, text-based harmful prompts. This leaves their generalizability to multilingual and spoken settings, particularly code-switched speech, largely underexplored. To address this gap, we introduce SpeechJBB, an audio jailbreak dataset for benchmarking multiple state-of-the-art LALMs. The extent of safety weaknesses is further probed by introducing an augmented setting where phonologically plausible pseudo-words are inserted around safety-critical terms to simulate localized obfuscation. Across models, code-switched harmful audio yields notably high jailbreak success rates (JSR), with non-English monolingual and non-English code-switched pairs exhibiting the highest attack success. Pseudo-word insertion further reduces refusal rates, which demonstrates that natural-sounding obfuscation can effectively bypass safety policies.