New articles on Electrical Engineering and Systems Science


[1] 2604.01229

Interpretable Battery Aging without Extra Tests via Neural-Assisted Physics-based Modelling

State of health (SoH) is widely used for battery management, but it is a single scalar and offers limited interpretability. Two batteries with similar SoH can exhibit very different degradation behaviors and the lack of interpretability hinders optimal battery operation. In this paper, we propose IBAM for interpretable battery aging modelling with a neural-assisted physics-based framework. IBAM outputs a 2-D aging fingerprint without extra diagnostic tests and uses only routine logs from the battery management system. The fingerprint offers great interpretability by capturing a battery's curve-wide polarization voltage loss and the tail loss near the end-of-discharge. IBAM first creates a physics-based battery model based on a fractional-order equivalent circuit model, and then extracts per-cycle fingerprints from the model using a two-stage least-squares method. IBAM further anchors fingerprints on the SoH axis with physics-guided regression, where the per-cycle SoH is estimated via a bidirectional gated recurrent unit with customized multi-channel voltage features. Across batteries with short-, medium-, and long-lifespans, IBAM consistently yields the best physics model fidelity at different aging stages, and provides clear interpretations of degradation mechanisms and fingerprint patterns about batteries of different lifespans. The resulting fingerprints support interpretable battery health assessment and can inform battery control choices.


[2] 2604.01264

OkanNet: A Lightweight Deep Learning Architecture for Classification of Brain Tumor from MRI Images

Medical imaging techniques, especially Magnetic Resonance Imaging (MRI), are accepted as the gold standard in the diagnosis and treatment planning of neurological diseases. However, the manual analysis of MRI images is a time-consuming process for radiologists and is prone to human error due to fatigue. In this study, two different Deep Learning approaches were developed and analyzed comparatively for the automatic detection and classification of brain tumors (Glioma, Meningioma, Pituitary, and No Tumor). In the first approach, a custom Convolutional Neural Network (CNN) architecture named "OkanNet", which has a low computational cost and fast training time, was designed from scratch. In the second approach, the Transfer Learning method was applied using the 50-layer ResNet-50 [1] architecture, pre-trained on the ImageNet dataset. In experiments conducted on an extended dataset compiled by Masoud Nickparvar containing a total of $7,023$ MRI images, the Transfer Learning-based ResNet-50 model exhibited superior classification performance, achieving $96.49\%$ Accuracy and $0.963$ Precision. In contrast, the custom OkanNet architecture reached an accuracy rate of $88.10\%$; however, it proved to be a strong alternative for mobile and embedded systems with limited computational power by yielding results approximately $3.2$ times faster ($311$ seconds) than ResNet-50 in terms of training time. This study demonstrates the trade-off between model depth and computational efficiency in medical image analysis through experimental data.


[3] 2604.01338

A High Voltage Test System Meeting Requirements Under Normal and All Single Contingencies Conditions of Peak, Dominant, and Light Loadings for Transmission Expansion Planning Studies (TEP) and TEP Case Studies

This paper presents a high-voltage test system designed specifically for transmission expansion planning (TEP) and explores multiple TEP studies using this test system. The network incorporates long transmission lines, whose parameters are calculated using the equivalent $\pi$ circuit model for long transmission lines to account for the distributed nature of line parameters. The paper provides detailed load flow analyses for both normal and all contingency conditions for three different loading conditions (peak load, dominant load, and light load), demonstrating that the proposed test system offers technically feasible load flow solutions under these loading scenarios. As a real power system is subject to various loading scenarios and should be effectively operable under all conditions, this test system accurately replicates the properties of real power systems. Furthermore, this paper presents multiple TEP cases to supply the load at a new location. TEP cases are conducted with different numbers of transmission line connections, and each case is characterized by its respective maximum capacity satisfying all technical requirements for normal and all single contingencies under three different scenarios. The cost of TEP for each case is calculated and compared in terms of the average cost per MW of power delivered to the new bus.
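The equivalent $\pi$ circuit mentioned in this abstract can be illustrated with the standard textbook formulas $Z' = Z_c \sinh(\gamma l)$ and $Y'/2 = \tanh(\gamma l / 2) / Z_c$. The following is a minimal sketch, not the paper's implementation; the function name `equivalent_pi` and the per-kilometre line parameters are assumptions chosen only for illustration.

```python
import cmath

def equivalent_pi(z_per_km, y_per_km, length_km):
    """Return (series impedance, shunt admittance at each end) of the equivalent pi circuit."""
    gamma = cmath.sqrt(z_per_km * y_per_km)   # propagation constant, 1/km
    z_c = cmath.sqrt(z_per_km / y_per_km)     # characteristic impedance, ohm
    gl = gamma * length_km
    z_series = z_c * cmath.sinh(gl)           # Z' = Zc * sinh(gamma * l)
    y_shunt = cmath.tanh(gl / 2) / z_c        # Y'/2 = tanh(gamma * l / 2) / Zc
    return z_series, y_shunt

# Hypothetical line parameters (illustrative values, not taken from the paper).
z = complex(0.02, 0.33)     # series impedance, ohm/km
y = complex(0.0, 4.7e-6)    # shunt admittance, S/km

z_eq, y_eq = equivalent_pi(z, y, 400.0)   # a long, 400 km line
z_nominal = z * 400.0                     # lumped (nominal pi) series impedance
```

For a short line the two models nearly coincide, while for the 400 km example the distributed correction reduces the series impedance magnitude relative to the lumped value.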


[4] 2604.01372

Temporal Logic Control of Nonlinear Stochastic Systems with Online Performance Optimization

The deployment of autonomous systems in safety-critical environments requires control policies that guarantee satisfaction of complex control specifications. These systems are commonly modeled as nonlinear discrete-time stochastic systems. A popular approach to computing a policy that provably satisfies a complex control specification is to construct a finite-state abstraction, often represented as a Markov decision process (MDP) with intervals of transition probabilities, i.e., an interval MDP (IMDP). However, existing abstraction techniques compute a single policy, thus leaving no room for online cost or performance optimization, e.g., of energy consumption. To overcome this limitation, we propose a novel IMDP abstraction technique that yields a set of policies, each of which satisfies the control specification with a certain minimum probability. We can thus use any online control algorithm to search through this set of verified policies while retaining the guaranteed satisfaction probability of the entire policy set. In particular, we employ model predictive control (MPC) to minimize a desired cost function that is independent of the control specification considered in the abstraction. Our experiments demonstrate that our approach yields better control performance than state-of-the-art single-policy abstraction techniques, with a small degradation of the guarantees.


[5] 2604.01373

Dissipativity Analysis of Nonlinear Systems: A Linear--Radial Kernel-based Approach

Estimating the dissipativity of nonlinear systems from empirical data is useful for the analysis and control of nonlinear systems, especially when an accurate model is unavailable. Based on a Koopman operator model of the nonlinear system on a reproducing kernel Hilbert space (RKHS), the storage function and supply rate functions are expressed as kernel quadratic forms, through which the dissipative inequality is expressed as a linear operator inequality. The RKHS is specified by a linear--radial kernel, which inherently encodes the information of the equilibrium point, thus ensuring that all functions in the RKHS are locally at least linear around the origin and that kernel quadratic forms are locally at least quadratic; these forms expressively generalize conventional quadratic forms, including sum-of-squares polynomials. Based on the kernel matrices of the sampled data, the dissipativity estimation can be posed as a finite-dimensional convex optimization problem, and a statistical learning bound can be derived on the kernel quadratic form for the probabilistic approximate correctness of dissipativity estimation.


[6] 2604.01392

Safe Policy Optimization via Control Barrier Function-based Safety Filters

Control barrier function (CBF)-based safety filters provide a systematic way to enforce state constraints, but they can significantly alter the closed-loop dynamics induced by a nominal, stabilizing controller. In particular, the resulting safety-filtered system may exhibit undesirable behaviors including limit cycles, unbounded trajectories, and undesired equilibria. This paper develops a policy optimization framework to maximally enhance the stability properties of safety-filtered controllers. Focusing on linear systems with linear nominal controllers, we jointly parameterize the nominal feedback gain and safety-filter components, and optimize them using trajectory-based objectives computed from closed-loop rollouts. To ensure that the nominal controller remains stabilizing throughout training, we encode Lyapunov-based stability conditions as smooth scalar constraints and enforce them using robust safe gradient flow. This guarantees feasibility of the stability constraints along the optimization iterates and therefore avoids instability during training. Numerical experiments on obstacle-avoidance problems show that the proposed approach can remove asymptotically stable undesired equilibria and improve convergence behavior while maintaining forward invariance of the safe set.
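The effect of a CBF-based safety filter on a nominal controller can be illustrated with a deliberately simple case. The sketch below assumes a scalar single integrator $\dot{x} = u$, a safe set $x \ge x_{\min}$ with barrier $h(x) = x - x_{\min}$, and a nominal linear controller whose target lies outside the safe set; none of these choices, nor the function names, come from the paper. In this scalar setting the usual quadratic-program filter reduces to the closed form $u = \max(u_{\text{nom}}, -\alpha h(x))$.

```python
def cbf_filter(x, u_nom, x_min=0.0, alpha=5.0):
    """Minimally modify u_nom so that dh/dt + alpha * h >= 0 with h = x - x_min."""
    h = x - x_min
    return max(u_nom, -alpha * h)

def simulate(x0=2.0, x_goal=-1.0, k=1.0, dt=0.01, steps=500):
    """Forward-Euler rollout of the safety-filtered closed loop."""
    x, traj = x0, [x0]
    for _ in range(steps):
        u_nom = -k * (x - x_goal)   # nominal stabilizing feedback toward x_goal
        u = cbf_filter(x, u_nom)    # safety filter overrides it near the boundary
        x += dt * u
        traj.append(x)
    return traj

trajectory = simulate()
```

Because the nominal target is unsafe, the filtered trajectory stays in the safe set but settles at the boundary rather than at the target, a small instance of the undesired equilibria that the abstract describes.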


[7] 2604.01406

Causal Optimal Coupling for Gaussian Input-Output Distributional Data

We study the problem of identifying an optimal coupling between input-output distributional data generated by a causal dynamical system. The coupling is required to satisfy prescribed marginal distributions and a causality constraint reflecting the temporal structure of the system. We formulate this problem as a Schrödinger Bridge, which seeks the coupling closest, in Kullback-Leibler divergence, to a given prior while enforcing both marginal and causality constraints. For the case of Gaussian marginals and general time-dependent quadratic cost functions, we derive a fully tractable characterization of the Sinkhorn iterations that converge to the optimal solution. Beyond its theoretical contribution, the proposed framework provides a principled foundation for applying causal optimal transport methods to system identification from distributional data.
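As a point of reference for the Sinkhorn iterations mentioned in this abstract, the sketch below shows the classical iteration for a static, discrete Schrödinger bridge: it rescales a positive prior matrix until its row and column sums match the prescribed marginals. This is a simplified stand-in under stated assumptions; it uses finite discrete marginals rather than Gaussians and omits the causality constraint entirely, and the function name and numbers are illustrative.

```python
def sinkhorn(prior, p, q, iters=500):
    """Scale a positive prior matrix so its row sums equal p and column sums equal q."""
    n, m = len(p), len(q)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate between matching the row marginal and the column marginal.
        u = [p[i] / sum(prior[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(prior[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * prior[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical prior coupling and marginals (illustrative values only).
prior = [[0.4, 0.1], [0.1, 0.4]]
p = [0.7, 0.3]
q = [0.5, 0.5]
coupling = sinkhorn(prior, p, q)
```

The result has the form diag(u) · prior · diag(v), which is the structure of the Kullback-Leibler projection of the prior onto the set of couplings with the given marginals.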


[8] 2604.01409

Semantic MIMO: Revisiting Linear Precoding in the Generative AI Era

This paper revisits linear precoding, namely matched-filter (MF) and zero-forcing (ZF), in a semantic multiple-input multiple-output (MIMO) system empowered by generative AI. The aim is to examine whether interference, channel state information (CSI) accuracy, and scalability limitations in conventional MIMO systems remain critical. Theoretical analysis, which is based on the generative inference model and Lipschitz continuity assumptions, reveals reduced sensitivity to interference and channel imperfections, as well as performance inferiority in high-SINR regimes compared to conventional MIMO systems. Simulation results validate the analysis and show that MF achieves semantic performance comparable to ZF under both perfect and imperfect CSI. These findings suggest that semantic MIMO relaxes the need for aggressive interference mitigation and highly accurate CSI, while improving scalability with reduced computational and implementation complexity.
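For reference, the two precoders compared in this abstract are usually written as follows in the textbook downlink setting. The notation is an assumption, not the paper's: $\mathbf{H} \in \mathbb{C}^{K \times N}$ is the channel from $N$ transmit antennas to $K$ single-antenna users with $K \le N$ and full row rank, and power normalization is omitted.

```latex
\mathbf{W}_{\mathrm{MF}} = \mathbf{H}^{\mathsf{H}},
\qquad
\mathbf{W}_{\mathrm{ZF}} = \mathbf{H}^{\mathsf{H}} \left( \mathbf{H} \mathbf{H}^{\mathsf{H}} \right)^{-1}
```

ZF requires inverting the $K \times K$ Gram matrix and yields $\mathbf{H} \mathbf{W}_{\mathrm{ZF}} = \mathbf{I}$, whereas MF avoids the inversion and leaves residual inter-user interference, which is the complexity trade-off behind the scalability claim.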


[9] 2604.01441

Generative Profiling for Soft Real-Time Systems and its Applications to Resource Allocation

Modern real-time systems require accurate characterization of task timing behavior to ensure predictable performance, particularly on complex hardware architectures. Existing methods, such as worst-case execution time analysis, often fail to capture the fine-grained timing behaviors of a task under varying resource contexts (e.g., an allocation of cache, memory bandwidth, and CPU frequency), which is necessary to achieve efficient resource utilization. In this paper, we introduce a novel generative profiling approach that synthesizes context-dependent, fine-grained timing profiles for real-time tasks, including those for unmeasured resource allocations. Our approach leverages a nonparametric, conditional multi-marginal Schrödinger Bridge (MSB) formulation to generate accurate execution profiles for unseen resource contexts, with maximum likelihood guarantees. We demonstrate the efficiency and effectiveness of our approach through real-world benchmarks, and showcase its practical utility in a representative case study of adaptive multicore resource allocation for real-time systems.


[10] 2604.01448

Neural Robust Control on Lie Groups Using Contraction Methods (Extended Version)

In this paper, we propose a learning framework for synthesizing a robust controller for dynamical systems evolving on a Lie group. A robust control contraction metric (RCCM) and a neural feedback controller are jointly trained to enforce contraction conditions on the Lie group manifold. Sufficient conditions are derived for the existence of such an RCCM and neural controller, ensuring that the geometric constraints imposed by the manifold structure are respected while establishing a disturbance-dependent tube that bounds the output trajectories. As a case study, a feedback controller for a quadrotor is designed using the proposed framework. Its performance is evaluated using numerical simulations and compared with a geometric controller.


[11] 2604.01459

Koopman Subspace Pruning in Reproducing Kernel Hilbert Spaces via Principal Vectors

Data-driven approximations of the infinite-dimensional Koopman operator rely on finite-dimensional projections, where the predictive accuracy of the resulting models hinges heavily on the invariance of the chosen subspace. Subspace pruning systematically discards geometrically misaligned directions to enhance this invariance proximity, which formally corresponds to the largest principal angle between the subspace and its image under the operator. Yet, existing techniques are largely restricted to Euclidean settings. To bridge this gap, this paper presents an approach for computing principal angles and vectors to enable Koopman subspace pruning within a Reproducing Kernel Hilbert Space (RKHS) geometry. We first outline an exact computational routine, which is subsequently scaled for large datasets using randomized Nyström approximations. Based on these foundations, we introduce the Kernel-SPV and Approximate Kernel-SPV algorithms for targeted subspace refinement via principal vectors. Simulation results validate our approach.


[12] 2604.01509

Feedforward Density-Driven Optimal Control for Tracking Time-Varying Distributions with Guaranteed Stability

This paper addresses the spatiotemporal mismatch in multi-agent distribution tracking within time-varying environments. While recent advancements in Density-Driven Optimal Control (D$^2$OC) have enabled finite-time distribution matching using Optimal Transport theory, existing formulations primarily assume a stationary reference density. In dynamic scenarios, such as tracking evolving wildfires or moving plumes, this assumption leads to a structural tracking lag where the agent configuration inevitably falls behind the shifting reference flow. To resolve this, we propose a feedforward-augmented D$^2$OC framework that explicitly incorporates the reference velocity field, modeled via the continuity equation, into the control law. We provide a formal mathematical quantification of the induced tracking lag and analytically prove that the proposed predictive mechanism effectively reduces the cumulative tracking error. Furthermore, an analytical ultimate bound for the local Wasserstein distance is established under discretization errors and transport jitter. Theoretical analysis and numerical results demonstrate that our approach significantly mitigates tracking latency, ensuring robust and high-fidelity tracking performance in rapidly changing environments.


[13] 2604.01517

MorphoGuard: A Morphology-Based Whole-Body Interactive Motion Controller

Whole-body control (WBC) has demonstrated significant advantages in complex interactive movements of high-dimensional robotic systems. However, when a robot is required to handle dynamic multi-contact combinations along a single kinematic chain, such as pushing open a door with its elbow while grasping an object, it faces major obstacles in terms of complex contact representation and joint configuration coupling. To address this, we propose a new control approach that explicitly manages arbitrary contact combinations, aiming to endow robots with whole-body interactive capabilities. We develop a morphology-constrained WBC network (MorphoGuard), which is trained on a self-constructed dual-arm physical and simulation platform. A series of model recommendation experiments are designed to systematically investigate the impact of backbone architecture, fusion strategy, and model scale on network performance. To evaluate the control performance, we adopt a multi-object interaction task as the benchmark, requiring the model to simultaneously manipulate multiple target objects to specified positions. Experimental results show that the proposed method achieves a contact point management error of approximately 1 cm, demonstrating its effectiveness in whole-body interactive control.


[14] 2604.01524

Reverberation-Robust Localization of Speakers Using Distinct Speech Onsets and Multi-channel Cross-Correlations

Many speaker localization methods can be found in the literature. However, speaker localization under strong reverberation remains a major challenge in real-world applications. This paper proposes two algorithms for localizing speakers using microphone array recordings of reverberated sounds. To separate concurrent speakers, the first algorithm decomposes microphone signals spectrotemporally into subbands via an auditory filterbank. To suppress reverberation, we propose a novel speech onset detection approach derived from the speech signal and impulse response models, and further propose to formulate the multi-channel cross-correlation coefficient (MCCC) of encoded speech onsets in each subband. The subband results are combined to estimate the directions-of-arrival (DOAs) of speakers. The second algorithm extends the generalized cross-correlation phase transform (GCC-PHAT) method by using redundant information of multiple microphones to address the reverberation problem. The proposed methods have been evaluated under adverse conditions using not only simulated signals (reverberation time $T_{60}$ of up to $1$ s) but also recordings in a real reverberant room ($T_{60} \approx 0.65$ s). Compared with some state-of-the-art localization methods, experimental results confirm that the proposed methods can reliably locate static and moving speakers in the presence of reverberation.
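The baseline that the second algorithm builds on, two-channel GCC-PHAT, can be sketched in a few lines. The example below is the classical method only, not the paper's multi-microphone extension; it uses a naive $O(N^2)$ DFT to stay dependency-free, assumes a circular delay, and all names and signal parameters are illustrative.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def gcc_phat_delay(x1, x2):
    """Estimate the circular delay, in samples, of x2 relative to x1."""
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]   # cross-power spectrum
    whitened = [c / (abs(c) + 1e-12) for c in cross]      # PHAT: keep phase only
    r = dft(whitened, inverse=True)                       # generalized cross-correlation
    n = len(r)
    peak = max(range(n), key=lambda i: r[i].real)
    return peak if peak <= n // 2 else peak - n           # map to a signed lag

# Synthetic test signal: white noise and a copy delayed by 5 samples.
random.seed(0)
n, true_delay = 64, 5
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [x1[(t - true_delay) % n] for t in range(n)]
estimated = gcc_phat_delay(x1, x2)
```

The phase-only weighting sharpens the correlation peak, which is why GCC-PHAT is a common starting point under reverberation; in a real array the estimated lag would then be converted to a direction of arrival from the microphone spacing.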


[15] 2604.01531

A Conditional Denoising Diffusion Probabilistic Model for RFI Mitigation in Synthetic Aperture Interferometric Radiometer

In Earth remote sensing, spatial-frequency domain visibility samples are inversely transformed into spatial-domain brightness temperature (BT) images through the signal processing pipeline of synthetic aperture interferometric radiometers (SAIR). However, L-band radio-frequency interference (RFI) contaminates the measured visibilities and severely degrades BT image quality, thereby impairing geophysical parameter retrieval. To address this issue, we propose VFDM, a Visibility-Function Diffusion Model based on Denoising Diffusion Probabilistic Models (DDPM), to mitigate RFI in the spatial-frequency domain while preserving fine-scale structures consistent with natural scene statistics. Furthermore, we construct a comprehensive dataset comprising more than ten thousand pairs of RFI-free natural scene visibility sample sets and their corresponding simulated contaminated counterparts, categorized by varying RFI intensities, numbers, and distributions. Finally, comprehensive experiments on both simulated and real-world data demonstrate the effectiveness and robustness of the proposed VFDM-based approach.


[16] 2604.01533

Validating Computational Markers of Depressive Behavior: Cross-Linguistic Speech-Based Depression Detection with Neurophysiological Validation

Speech-based depression detection has shown promise as an objective diagnostic tool, yet the cross-linguistic robustness of acoustic markers and their neurobiological underpinnings remain underexplored. This study extends the Cross-Data Multilevel Attention (CDMA) framework, initially validated on Italian, to investigate these dimensions using a Chinese Mandarin dataset with Electroencephalography (EEG) recordings. We systematically fuse read speech with spontaneous speech across different emotional valences (positive, neutral, negative) to investigate whether emotional arousal is a more critical factor than valence polarity in enhancing detection performance in speech. Additionally, we establish the first neurophysiological validation for a speech-based depression model by correlating its predictions with neural oscillatory patterns during emotional face processing. Our results demonstrate strong cross-linguistic generalizability of the CDMA framework, achieving state-of-the-art performance (F1-score up to 89.6%) on the Chinese dataset, which is comparable to the previous Italian validation. Critically, emotionally valenced speech (both positive and negative) significantly outperformed neutral speech. This comparable performance between positive and negative tasks supports the emotional arousal hypothesis. Most importantly, EEG analysis revealed significant correlations between the model's speech-derived depression estimates and neural oscillatory patterns (theta and alpha bands), demonstrating alignment with established neural markers of emotional dysregulation in depression. This alignment, combined with the model's cross-linguistic robustness, not only supports the CDMA framework's approach as a universally applicable and neurobiologically validated strategy but also establishes a novel paradigm for the neurophysiological validation of computational mental health models.


[17] 2604.01539

Toward Single-Step MPPI via Differentiable Predictive Control

Model predictive path integral (MPPI) is a sampling-based method for solving complex model predictive control (MPC) problems, but its real-time implementation faces two key challenges: the computational cost and sample requirements grow with the prediction horizon, and manually tuning the sampling covariance requires balancing exploration and noise. To address these issues, we propose Step-MPPI, a framework that learns a sampling distribution for efficient single-step lookahead MPPI implementation. Specifically, we use a neural network to parameterize the MPPI proposal distribution at each time step, and train it in a self-supervised manner over a long horizon using the MPC cost, constraint penalties, and a maximum-entropy regularization term. By embedding long-horizon objectives into training the neural distribution policy, Step-MPPI achieves the foresight of a multi-step optimizer with the millisecond-level latency of single-step lookahead. We demonstrate the efficiency of Step-MPPI across multiple challenging tasks in which MPPI suffers from high dimensionality and/or long control horizons.
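The single-step lookahead update that Step-MPPI builds on can be illustrated with a plain MPPI step: sample candidate controls, score each with a one-step cost, and average them with exponentiated-cost weights. The sketch below is a generic MPPI update under stated assumptions, not the paper's method; in particular it samples from a fixed Gaussian instead of a learned neural proposal, and the scalar dynamics, cost, and all parameter values are hypothetical.

```python
import math
import random

def mppi_step(x, u_nominal, dynamics, cost, sigma=1.0, lam=0.1, samples=256, rng=random):
    """One MPPI update with single-step lookahead and a fixed Gaussian proposal."""
    candidates = [u_nominal + rng.gauss(0.0, sigma) for _ in range(samples)]
    costs = [cost(dynamics(x, u), u) for u in candidates]
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    return sum(w * u for w, u in zip(weights, candidates)) / total

def dynamics(x, u, dt=0.1):
    """Hypothetical scalar single-integrator model."""
    return x + dt * u

def cost(x_next, u, target=1.0):
    """Hypothetical one-step cost: tracking error plus a small control penalty."""
    return (x_next - target) ** 2 + 0.001 * u ** 2

random.seed(0)
x = 0.0
for _ in range(100):
    u = mppi_step(x, 0.0, dynamics, cost, sigma=2.0)
    x = dynamics(x, u)
```

With a fixed proposal the quality of each step depends heavily on `sigma` and `lam`, which hints at why learning the sampling distribution, as the abstract proposes, could reduce manual tuning.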


[18] 2604.01541

Robust Pitch Estimation and Tracking for Speakers Based on Subband Encoding and the Generalized Labeled Multi-Bernoulli Filter

This paper proposes a new pitch estimator and a novel pitch tracker for speakers. We first decompose the sound signal into subbands using an auditory filterbank, assuming time-frequency sparsity of human speech. Instead of directly selecting the number of subbands according to experience, we propose a novel frequency coverage metric to derive the number of subbands and the center frequencies of the filterbank. The subband signals are then encoded inspired by the computational auditory scene analysis (CASA) approach, and the normalized autocorrelations are calculated for pitch estimation. To suppress spurious errors and track the speaker identity, the temporal continuity constraint is exploited and a Generalized Labeled Multi-Bernoulli (GLMB) filter is adapted for pitch tracking, where we use a novel pitch state transition model based on the Ornstein-Uhlenbeck process, and the measurement driven birth model for adaptive new births of pitch targets. Experimental evaluations with various additive noises demonstrate that the proposed methods have achieved better accuracy compared with several state-of-the-art pitch estimation methods in most studied scenarios. Tests using real recordings in a reverberant room also show that the proposed method is robust against reverberation.


[19] 2604.01547

Data-Driven Covariance Steering with Output Feedback

This paper addresses the problem of output-feedback covariance steering for stochastic, discrete-time, linear, time-invariant systems without knowledge of the system model. We employ a controllable, non-minimal state representation constructed from past inputs and outputs and convert the problem to one in state-feedback form. In this representation, the induced disturbance becomes temporally correlated, which requires explicit propagation of the cross-covariance between the state and disturbance processes. To handle the lack of a system model, we leverage persistently exciting data collected offline and formulate the mean and covariance steering problems using an indirect and a direct approach, respectively. The indirect formulation requires an estimate of the mean dynamics model, while the direct formulation relies on an estimate of the noise realization in the collected data. To this end, we present an estimation method suitable to handle temporally correlated noise, enabling consistent identification of both components. Using a convex relaxation, we convert the covariance steering problem to a semidefinite program that can be solved efficiently. We conduct numerical simulations to evaluate the performance of the developed framework.


[20] 2604.01582

Air-to-Air Channel Characterization for UAV Communications at 3.4 GHz

Uncrewed Aerial Vehicle (UAV) networks require accurate Air-to-Air (A2A) channel models, but most existing work focuses on Air-to-Ground links and leaves the sub-6 GHz A2A channel poorly characterized. We present preliminary 3.4 GHz A2A channel measurements collected with a lightweight, reconfigurable, open-source channel sounder built from USRP B210 software-defined radios and a high-precision GNSS-disciplined oscillator mounted on two UAVs. Measurements were conducted at the AERPAW Lake Wheeler testbed using a spherical flight trajectory around a second drone to capture channel behavior over varying altitudes, elevation angles, and relative headings. From these data, we analyze fundamental channel properties, extract channel impulse responses, model fading behavior as a function of link geometry, and characterize fading statistics including RMS delay spread. The resulting dataset and analysis provide a more realistic basis for the design, emulation, and evaluation of physical-layer and MAC protocols for next-generation UAV communication networks.
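The RMS delay spread mentioned in this abstract is conventionally computed as the power-weighted standard deviation of the path delays in a power delay profile. The following minimal sketch uses that standard definition; the function name and the two-path profile are illustrative assumptions, not measurement data from the paper.

```python
import math

def rms_delay_spread(delays_s, powers_linear):
    """Power-weighted standard deviation of path delays (linear power, seconds)."""
    total = sum(powers_linear)
    mean_delay = sum(p * t for p, t in zip(powers_linear, delays_s)) / total
    second_moment = sum(p * t * t for p, t in zip(powers_linear, delays_s)) / total
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(second_moment - mean_delay ** 2, 0.0))

# Hypothetical profile: two equal-power paths separated by 1 microsecond.
delays = [0.0, 1.0e-6]
powers = [1.0, 1.0]
spread = rms_delay_spread(delays, powers)
```

For two equal-power paths the spread is half the path separation, and a single-path (pure line-of-sight) profile gives zero.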


[21] 2604.01590

PhiNet: Speaker Verification with Phonetic Interpretability

Despite remarkable progress, automatic speaker verification (ASV) systems typically lack the transparency required for high-accountability applications. Motivated by how human experts perform forensic speaker comparison (FSC), we propose a speaker verification network with phonetic interpretability, PhiNet, designed to enhance both local and global interpretability by leveraging phonetic evidence in decision-making. For users, PhiNet provides detailed phonetic-level comparisons that enable manual inspection of speaker-specific features and facilitate a more critical evaluation of verification outcomes. For developers, it offers explicit reasoning behind verification decisions, simplifying error tracing and informing hyperparameter selection. In our experiments, we demonstrate PhiNet's interpretability with practical examples, including its application in analyzing the impact of different hyperparameters. We conduct both qualitative and quantitative evaluations of the proposed interpretability methods and assess speaker verification performance across multiple benchmark datasets, including VoxCeleb, SITW, and LibriSpeech. Results show that PhiNet achieves performance comparable to traditional black-box ASV models while offering meaningful, interpretable explanations for its decisions, bridging the gap between ASV and forensic analysis.


[22] 2604.01623

Frequency-switching Coherent Reception for Hardware-efficient High-baud-rate Optical Transmission Experiments

Signal gating combined with local-oscillator-frequency switching enables bandwidth scaling of offline coherent reception without costly receiver parallelization. We experimentally verify this concept at symbol rates of up to 288 GBaud.


[23] 2604.01656

Steady-state response assignment for a given disturbance and reference: Sylvester equation rather than regulator equations

Conventionally, the concept of moment has been primarily employed in model order reduction to approximate a system by matching its moments, which are merely a specific set of steady-state responses. In this paper, we propose a novel design framework that extends this concept from "moment matching" for approximation to "moment assignment" for the active control of the steady state. The key observation is that the closed-loop moment of an interconnected linear system can be decomposed into the open-loop moment and a term linearly parameterized by the moment of the compensator. Based on this observation, we provide necessary and sufficient conditions for the assignability of a desired moment and a canonical form of the dynamic compensator, followed by a constructive synthesis procedure for the compensator. This covers both output regulation and closed-loop interpolation, and further suggests using only the Sylvester equation, rather than regulator equations.


[24] 2604.01704

Breaking Near-Field Communication Barriers: Focused, Curved, or Airy Beamforming?

To meet the requirements for high data rates and ubiquitous connectivity in 6G networks, higher frequencies and larger array apertures are employed to enhance spatial resolution and spectral efficiency. This evolution leads to an expansion of the near-field region, where spherical-wave focusing can significantly enhance received power. However, the pervasive presence of obstacles in near-field environments makes communication in obstructed scenarios a critical challenge, particularly for sensitive high-frequency links with high penetration losses. In this paper, we propose a new waveform, termed the near-field Airy beam, which is tailored to the amplitude and phase characteristics of obstructed near-field channels. By integrating non-uniform amplitude response with non-linear phase profile, the proposed Airy beam forms specific curved trajectories, energy distributions, and focal points, enabling energy concentration at the user even after circumventing obstacles. An Airy beamforming algorithm is also developed for hybrid beamformer architectures. Considering practical conditions with unknown obstacle and user locations, we design an Airy beam codebook and a low-overhead hierarchical search scheme to identify the optimal user-aligned beam. Simulation results demonstrate that in obstructed environments, the near-field Airy beam achieves a received power gain of over 3 dB compared to conventional waveforms like focused and curved beams, closely approaching the theoretical upper bound. Across the mmWave to THz bands and various obstacle dimensions, the proposed beam training scheme consistently outperforms traditional methods in terms of spectral efficiency while maintaining a comparable training overhead.


[25] 2604.01721

Phase-Shifted Pilot Design for NOMA-Empowered Uplink ISAC Systems

The deployment of multiple transmitters (TXs) in integrated sensing and communication (ISAC) networks necessitates efficient resource sharing to overcome the limitations of orthogonal allocation. While conventional interleaved (CI) pilots combined with non-orthogonal multiple access (NOMA) improve spectral efficiency (SE), they inherently compromise sensing resolution due to spectral sparsity, rendering the CI nulling (CIN) extension a strictly limited remedy. This paper proposes a phase-shifted (PS) pilot design and its novel PS nulling (PSN) variant to integrate a communication TX (CTX) over the PS-ISAC framework. The PSN variant strategically punctures sensing signals at CTX pilot locations to preserve initial channel estimates, enabling a dense data overlay. To resolve the resulting multi-TX interference, joint iterative interference cancellation (IIC) is adapted for non-nulling configurations and sequential IIC is adapted for nulling variants, optimizing for both detection robustness and convergence speed. Simulation results across varying STX densities and modulation orders demonstrate that the phase-shifted frameworks maintain sensing integrity while explicitly reducing receiver-side computational complexities by $18.8\%$ and $21.0\%$ against their respective interleaved baselines.


[26] 2604.01760

T5Gemma-TTS Technical Report

Autoregressive neural codec language models have shown strong zero-shot voice cloning ability, but decoder-only architectures treat input text as a prefix that competes with the growing audio sequence for positional capacity, weakening text conditioning over long utterances. We present T5Gemma-TTS, an encoder-decoder codec language model that maintains persistent text conditioning by routing bidirectional text representations through cross-attention at every decoder layer. Built on the T5Gemma pretrained encoder-decoder backbone (2B encoder + 2B decoder; 4B parameters), it inherits rich linguistic knowledge without phoneme conversion and processes text directly at the subword level. To improve duration control, we introduce Progress-Monitoring Rotary Position Embedding (PM-RoPE) in all 26 cross-attention layers, injecting normalized progress signals that help the decoder track target speech length. Trained on 170,000 hours of multilingual speech in English, Chinese, and Japanese, T5Gemma-TTS achieves a statistically significant speaker-similarity gain on Japanese over XTTSv2 (0.677 vs. 0.622; non-overlapping 95% confidence intervals) and the highest numerical Korean speaker similarity (0.747) despite Korean not being included in training, although this margin over XTTSv2 (0.741) is not statistically conclusive. It also attains the lowest numerical Japanese character error rate among five baselines (0.126), though this ranking should be interpreted cautiously because of partial confidence-interval overlap with Kokoro. English results on LibriSpeech should be viewed as an upper-bound estimate because LibriHeavy is a superset of LibriSpeech. Using the same checkpoint, disabling PM-RoPE at inference causes near-complete synthesis failure: CER degrades from 0.129 to 0.982 and duration accuracy drops from 79% to 46%. Code and weights are available at this https URL.


[27] 2604.01767

Channel Measurements and Modeling based on Composite Environmental Factor for Urban Street-Canyon Intersections

In urban environments, vehicle-to-everything (V2X) communications require accurate wireless channel characterization. This requirement is particularly critical at street-canyon intersections, where building blockage and rich multipath propagation can severely degrade link reliability. Because of the unique environmental layout, the channel characteristics in urban canyons are influenced by building distribution. However, this feature has not been well captured in existing channel models. In this paper, we propose an environment-related statistical channel model based on 5.8 GHz channel measurements. We construct a composite environmental factor to characterize environmental differences among intersections. Then, the factor is incorporated into the 3GPP path-loss model and further linked to small-scale channel parameters. Finally, the accuracy of the proposed model is validated using second-order channel statistics. The results show that the proposed model can effectively characterize the propagation properties of urban street-canyon intersection channels with different building conditions. The proposed model provides a physically interpretable and statistically effective framework for channel simulation and performance evaluation in urban vehicular scenarios.


[28] 2604.01780

1-bit Quantized Continuous Aperture Arrays

Continuous aperture arrays (CAPAs) have emerged as a promising physical-layer paradigm for sixth generation (6G) systems, offering spatial degrees of freedom beyond those of conventional discrete antenna arrays. This paper investigates the interaction between the CAPA receive architecture and low-cost 1-bit analog-to-digital converters (ADCs), which impose a severe nonlinear distortion penalty in conventional discrete systems. For Rayleigh fading, we derive a closed-form symbol error probability (SEP) approximation using a moment matching approximation (MMA), namely Gamma moment matching of the spatial eigenvalue distribution, and show that CAPAs incur a diversity-order penalty governed by Jensen's inequality on the mode eigenvalues. For line-of-sight (LoS) propagation, we prove that the CAPA achieves exactly the unquantized additive white Gaussian noise (AWGN) performance bound under perfect spatial and phase alignment, completely eliminating the 1-bit penalty that forces discrete systems to double their antenna count. Monte Carlo simulations under Rayleigh, Rician, and LoS conditions validate all analytical results.
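
As a point of reference for the unquantized AWGN bound mentioned above, the sketch below evaluates the textbook symbol error probability of BPSK in AWGN, Q(sqrt(2*SNR)). The modulation is an assumption made for illustration, since the abstract does not state one, and the function names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bpsk_sep_awgn(snr_linear: float) -> float:
    """Textbook BPSK symbol error probability in AWGN (assumed modulation)."""
    return q_function(math.sqrt(2.0 * snr_linear))


if __name__ == "__main__":
    for snr_db in (0, 4, 8):
        snr = 10.0 ** (snr_db / 10.0)
        print(f"SNR {snr_db} dB -> SEP {bpsk_sep_awgn(snr):.3e}")
```

Any quantization penalty can then be read as the extra SNR needed to reach the same error probability on this curve.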


[29] 2604.01786

MIMO Capacity Enhancement by Grating Walls: A Physics-Based Proof of Principle

This paper investigates the passive enhancement of MIMO spectral efficiency through boundary engineering in a simplified two-dimensional indoor proof-of-principle model. The propagation channel is constructed from the electromagnetic Green's function of a room with boundaries modeled as free space, drywall, perfect electric conductor (PEC), or binary gratings. Within this framework, grating-coated walls enrich the non-line-of-sight (NLoS) multipath field, reduce channel correlation, and enhance spatial multiplexing over a broad range of receiver locations. Comparisons with the drywall and PEC reference cases further reveal that the observed capacity enhancement arises not from diffraction alone, but from the combined effects of effective wall reflectivity, which confines and reradiates energy within the room, and diffraction-induced angular redistribution, which enriches the channel eigenstructure.


[30] 2604.01790

Set-Theoretic Receding Horizon Control for Obstacle Avoidance and Overtaking in Autonomous Highway Driving

This article addresses obstacle avoidance motion planning for autonomous vehicles, specifically focusing on highway overtaking maneuvers. The control design challenge is handled by considering a mathematical vehicle model that captures both lateral and longitudinal dynamics. Unlike existing numerical optimization methods that suffer from significant online computational overhead, this work extends the state-of-the-art by leveraging a fast set-theoretic ellipsoidal Model Predictive Control (Fast-MPC) technique. While originally restricted to stabilization tasks, the proposed framework is successfully adapted to handle motion planning for vehicles modeled as uncertain polytopic discrete-time linear systems. The control action is computed online via a set-membership evaluation against a structured sequence of nested inner ellipsoidal approximations of the exact one-step ahead controllable set within a receding horizon framework. A six-degrees-of-freedom (6-DOF) nonlinear model characterizes the vehicle dynamics, while a polytopic embedding approximates the nonlinearities within a linear framework with parameter uncertainties. Finally, to assess performance and real-time feasibility, comparative co-simulations against a baseline Non-Linear MPC (NLMPC) were conducted. Using the high-fidelity CARLA 3D simulator, results demonstrate that the proposed approach seamlessly rejects dynamic traffic disturbances while reducing online computational time by over 90% compared to standard optimization-based approaches.


[31] 2604.01795

Cooperative Adaptive Cruise Control with Variable Time Headway for Graceful Degradation under Fluctuating Network Quality of Service

This paper proposes a dynamic distance adaptation for Cooperative Adaptive Cruise Control (CACC) under time-varying network conditions. When the Quality of Service (QoS) drops below a level required to maintain desired inter-vehicle distances, an online adaptation of the reference distances, reflected by a change of the time headway factor, becomes necessary. We present a control design algorithm realizing a graceful degradation, for which a distance control to a virtual preceding vehicle is introduced. Furthermore, the Integral Quadratic Constraints (IQC) framework is applied to guarantee robust stability of the time-varying system. The concept is validated in simulation and experimentally using small-scale test vehicles.


[32] 2604.01805

Neural Network-Assisted Model Predictive Control for Implicit Balancing

In Europe, balance responsible parties can deliberately take out-of-balance positions to support transmission system operators (TSOs) in maintaining grid stability and earn profit, a practice called implicit balancing. Model predictive control (MPC) is widely adopted as an effective approach for implicit balancing. The balancing market model accuracy in MPC is critical to decision quality. Previous studies modeled this market using either (i) a convex market clearing approximation, ignoring proactive manual actions by TSOs and the market sub-quarter-hour dynamics, or (ii) machine learning methods, which cannot be directly integrated into MPC. To address these shortcomings, we propose a data-driven balancing market model integrated into MPC using an input convex neural network to ensure convexity while capturing uncertainties. To keep the core network computationally efficient, we incorporate attention-based input gating mechanisms to remove irrelevant data. Evaluating on Belgian data shows that the proposed model both improves MPC decisions and reduces computational time.


[33] 2604.01814

Empirical and Statistical Characterisation of 28 GHz mmWave Propagation in Office Environments

Millimeter wave (mmWave) technology at 28 GHz is vital for beyond-5G systems, but indoor deployment remains challenging due to limited statistical evidence on propagation. This study investigates path loss, material penetration, and coverage enhancement using TMYTEK-based measurements. Statistical tests and confidence interval analysis show that path loss aligns with free-space theory, with an exponent of n = 2.07 ± 0.073 (p = 0.385), confirming the suitability of classical models. Material analysis reveals significant variation: desk dividers introduce 3.4 dB more attenuation than display boards (95% CI: 1.81 to 4.98 dB, p < 0.01), contradicting thickness-based assumptions. Reflector optimisation yields a significant mean gain of 2.17 ± 2.33 dB (p < 0.05), enhancing coverage. The results provide new empirical benchmarks and practical design insights for reliable indoor mmWave deployment.
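
A path-loss exponent such as the one reported above is typically obtained by fitting the log-distance model PL(d) = PL0 + 10 n log10(d) to measurements. The sketch below shows an ordinary least-squares version of that fit on synthetic free-space data; it is not the study's analysis pipeline, and the function names and sample distances are assumptions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_path_loss_db(distance_m: float, freq_hz: float) -> float:
    """Friis free-space path loss in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def fit_path_loss_exponent(distances_m, path_loss_db):
    """Least-squares fit of PL(d) = PL0 + 10*n*log10(d); returns (n, PL0 at 1 m)."""
    x = [10.0 * math.log10(d) for d in distances_m]
    mean_x = sum(x) / len(x)
    mean_y = sum(path_loss_db) / len(path_loss_db)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, path_loss_db))
    n = sxy / sxx
    return n, mean_y - n * mean_x


if __name__ == "__main__":
    # Synthetic, noise-free samples at 28 GHz (illustrative distances only).
    distances = [1.0, 2.0, 4.0, 8.0]
    losses = [free_space_path_loss_db(d, 28e9) for d in distances]
    exponent, intercept = fit_path_loss_exponent(distances, losses)
    print(f"n = {exponent:.3f}, PL0 = {intercept:.2f} dB")
```

On ideal free-space data the fit returns an exponent of 2, so a measured exponent close to 2 indicates propagation that is close to free space.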


[34] 2604.01832

GAP-URGENet: A Generative-Predictive Fusion Framework for Universal Speech Enhancement

We introduce GAP-URGENet, a generative-predictive fusion framework developed for Track 1 of the ICASSP 2026 URGENT Challenge. The system integrates a generative branch, which performs full-stack speech restoration in a self-supervised representation domain and reconstructs the waveform via a neural vocoder, along with a predictive branch that performs spectrogram-domain enhancement, providing complementary cues. Outputs from both branches are fused by a post-processing module, which also performs bandwidth extension to generate the enhanced waveform at 48 kHz, later downsampled to the original sampling rate. This generative-predictive fusion improves robustness and perceptual quality, achieving top performance in the blind-test phase and ranking 1st in the objective evaluation. Audio examples are available at this https URL.


[35] 2604.01912

Global Geometry of Orthogonal Foliations in the Control Allocation of Signed-Quadratic Systems

This work formalizes the differential topology of redundancy resolution for systems governed by signed-quadratic actuation maps. By analyzing the minimally redundant case, the global topology of the continuous fiber bundle defining the nonlinear actuation null-space is established. The distribution orthogonal to these fibers is proven to be globally integrable and governed by an exact logarithmic potential field. This field foliates the actuator space, inducing a structural stratification of all orthants into transverse layers whose combinatorial sizes follow a strictly binomial progression. Within these layers, adjacent orthants are continuously connected via lower-dimensional strata termed reciprocal hinges, while the layers themselves are separated by boundary hyperplanes, or portals, that act as global sections of the fibers. This partition formally distinguishes extremal and transitional layers, which exhibit fundamentally distinct fiber topologies and foliation properties. Through this geometric framework, classical pseudo-linear static allocation strategies are shown to inevitably intersect singular boundary hyperplanes, triggering infinite-derivative kinetic singularities and fragmenting the task space into an exponential number of singularity-separated sectors. In contrast, allocators derived from the orthogonal manifolds yield continuously differentiable global sections with only a linear number of sectors for transversal layers, or can even form a single global diffeomorphism to the task space in the case of the two extremal layers, thus completely avoiding geometric rank-loss and boundary-crossing singularities. These theoretical results directly apply to the control allocation of propeller-driven architectures, including multirotor UAVs, marine, and underwater vehicles.


[36] 2604.01914

A Weak Notion of Symmetry for Dynamical Systems

Many nonlinear dynamical systems exhibit symmetry, affording substantial benefits for control design, observer architecture, and data-driven control. While the classical notion of group invariance enables a cascade decomposition of the system into highly structured subsystems, it demands very rigid structure in the original system. Conversely, much more general notions (e.g., partial symmetry) have been shown to be sufficient for obtaining less-structured decompositions. In this work, we propose a middle ground termed "weak invariance", studying diffeomorphisms (resp., vector fields) that are group invariant up to a diffeomorphism of (resp., vector field on) the symmetry group. Remarkably, we prove that weak invariance implies that this diffeomorphism of (resp., vector field on) the symmetry group must be an automorphism (resp., group linear). Additionally, we demonstrate that a vector field is weakly invariant if and only if its flow is weakly invariant, where the associated group linear vector field generates the associated automorphisms. Finally, we show that weakly invariant systems admit a cascade decomposition in which the dynamics are group affine along the orbits. Weak invariance thus generalizes both classical invariance and the important class of group affine dynamical systems on Lie groups, laying a foundation for new methods of symmetry-informed control and observer design.


[37] 2604.01923

PLL Based Sub-/Super-synchronous Resonance Damping Controller for D-PMSG Wind Farm Integrated Power Systems

Existing sub-/super-synchronous (SSO) suppression methods for direct-drive permanent magnet synchronous generator (D-PMSG) integrated power systems mainly rely on external devices or a sub-synchronous resonance damping controller (SSRDC) at the converters, and face challenges of considerable control cost, complex parameter tuning, or poor adaptability to various operating conditions. To address these problems, this paper proposes an adaptive SSRDC based on the phase-locked loop (PLL) for D-PMSG integrated power systems. First, a comprehensive sensitivity analysis of the dominant poles of the impedance closed-loop transfer function shows that the PLL parameter is critical to SSO suppression. Motivated by this finding, this paper then designs a PLL-based SSRDC, which features a simple structure, easy parameter tuning, and flexible adaptability to various operating modes. The structure is kept simple by avoiding phase compensation, so only one key parameter needs to be tuned. Moreover, two principles of parameter tuning are proposed to enhance the efficiency, robustness, and adaptability of the proposed SSRDC. Controller-hardware-in-the-loop (CHIL) tests verify the validity of the proposed SSRDC under various operating conditions. Finally, concerns about this method, such as frequency estimation, computational efficiency, and potential impacts on the PLL, are analyzed and clarified.


[38] 2604.01928

A Data-Aided Power Transformer Differential Protection without Inrush Blocking Module

When a transformer with a slight fault is energized without load, the current waveform contains both inrush and fault current. In this case, the inrush blocking module blocks the relay, which may delay the removal of the slight fault and lead to more serious faults. To address this problem, this paper proposes a data-aided power transformer differential protection without an inrush blocking module. The key to eliminating the negative influence of inrush current is to extract the fundamental component from the non-inrush part of the current waveform, which corresponds to the unsaturated period of the transformer core. First, a data-aided module, namely an Attention-module-embedded Fully Convolutional Network (A-FCN), is built to distinguish the inrush and non-inrush parts of the current waveform. Then, a physical model of the current waveform is built for the non-inrush part, and the fundamental component is extracted by the nonlinear least squares (NLS) algorithm. The proposed method avoids blocking the differential protection when inrush current occurs, which improves the sensitivity and speed of the relay, especially in the case of a weak internal fault hidden in inrush current. Finally, simulation and experimental data verify the effectiveness and generalization of the proposed method.


[39] 2604.01945

Model-Free Fast Frequency Support of Wind Farms for Tracking Optimal Frequency Trajectory

Fast frequency support (FFS) based on frequency trajectory optimization provides a system-level view of the frequency regulation of wind farms (WFs). However, existing frequency trajectory optimization-based FFS generally relies on an accurate governor dynamics model of synchronous generators (SGs), which increases the difficulty of controller implementation. In this paper, a proportional-integral (PI) based FFS of WFs is designed for tracking the optimal frequency trajectory, which removes the dependence on the governor model. First, the prototypical PI-based FFS of WFs is proposed, and its feasibility for tracking the optimal frequency trajectory is analyzed and demonstrated. Then, based on the "frequency-RoCoF" form of the optimal frequency trajectory, a more practical PI controller is constructed, avoiding the time dependence of the prototypical PI controller. In addition, an adaptive gain associated with the PI parameters is designed for multi-WF coordination. Finally, the validity of the proposed method is verified in both a single-WF system and a multi-WF system.


[40] 2604.01956

Receding-Horizon Nonlinear Optimal Control With Safety Constraints Using Constrained Approximate Dynamic Programming

We present a receding-horizon optimal control for nonlinear continuous-time systems subject to state constraints. The cost is a quadratic finite-horizon integral. The key enabling technique is a new constrained approximate dynamic programming (C-ADP) approach for finite-horizon nonlinear optimal control with constraints that are affine in the control. The C-ADP approach is intuitive because it uses a quadratic approximation of the cost-to-go function at each backward step. This method yields a sequence of analytic closed-form optimal control functions, which share an identical structure and whose parameters are obtained from two Riccati-like difference equations. This C-ADP method is well suited for real-time implementation. Thus, we use the C-ADP approach in combination with control barrier functions to obtain a continuous-time receding-horizon optimal control that is farsighted in the sense that it optimizes the integral cost subject to state constraints along the entire prediction horizon. Lastly, receding-horizon C-ADP control is demonstrated in simulation of a nonholonomic ground robot subject to velocity and no-collision constraints. We compare performance with three other approaches.
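
To give a feel for what a backward Riccati-like difference equation looks like, the sketch below runs the classical recursion for an unconstrained, scalar, discrete-time linear-quadratic problem. This is the standard LQR special case, not the paper's C-ADP equations, which handle nonlinear dynamics and constraints; all names and numbers are assumptions.

```python
def scalar_riccati_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for x[k+1] = a*x[k] + b*u[k] with stage cost
    q*x^2 + r*u^2 and terminal cost q_final*x^2.

    Returns (gains, costs): gains[k] is K_k in u[k] = -K_k * x[k], and
    costs[k] is P_k in the quadratic cost-to-go P_k * x^2.
    """
    p = q_final
    costs, gains = [p], []
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)      # feedback gain from next-step cost
        p = q + a * a * p - a * b * p * k    # Riccati difference equation
        gains.append(k)
        costs.append(p)
    gains.reverse()   # computed backward in time, returned forward in time
    costs.reverse()
    return gains, costs


if __name__ == "__main__":
    gains, costs = scalar_riccati_gains(1.0, 1.0, 1.0, 1.0, 1.0, horizon=10)
    print("first gain:", gains[0], "initial cost-to-go weight:", costs[0])
```

Because the recursion is closed form at every step, the control law can be evaluated without an online numerical optimizer, which is the property that makes such schemes attractive for real-time use.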


[41] 2604.02105

DenOiS: Dual-Domain Denoising of Observation and Solution in Ultrasound Image Reconstruction

Medical imaging aims to recover underlying tissue properties, using inexact (simplified/linearized) imaging models and often from inaccurate and incomplete measurements. Analytical reconstruction methods rely on hand-crafted regularization, sensitive to noise assumptions and parameter tuning. Among deep learning alternatives, plug-and-play (PnP) approaches learn regularization while incorporating imaging physics during inference, outperforming purely data-driven methods. The performance of all these approaches, however, still strongly depends on measurement quality and imaging model accuracy. In this work, we propose DenOiS, a framework that denoises both input observations and resulting solution in their respective domains. It consists of an observation refinement strategy that corrects degraded measurements while compensating for imaging model simplifications, and a diffusion-based PnP reconstruction approach that remains robust under missing measurements. DenOiS enables generalization to real data from training only in simulations, resulting in high-fidelity image reconstruction with noisy observations and inexact imaging models. We demonstrate this for speed-of-sound imaging as a challenging setting of quantitative ultrasound image reconstruction.


[42] 2604.02157

Transformer-Accelerated Interpolated Data-Driven Reachability Analysis from Noisy Data

Data-driven reachability analysis provides guaranteed outer approximations of reachable sets from input-state measurements, yet each propagation step requires a matrix-zonotope multiplication whose cost grows with the horizon length, limiting scalability. We observe that data-driven propagation is inherently step-size sensitive, in the sense that set-valued operators at different discretization resolutions yield non-equivalent reachable sets at the same physical time, a property absent in model-based propagation. Exploiting this multi-resolution structure, we propose Interpolated Reachability Analysis (IRA), which computes a sparse chain of coarse anchor sets sequentially and reconstructs fine-resolution intermediate sets in parallel across coarse intervals. We derive a fully data-driven coarse-noise over-approximation that removes the need for continuous-time system knowledge, prove deterministic outer-approximation guarantees for all interpolated sets, and establish conditional tightness relative to the fine-resolution chain. To replace the remaining matrix-zonotope multiplications in the fine phase, we further develop Transformer-Accelerated IRA (TA-IRA), where an encoder-decoder Transformer is calibrated via split conformal prediction to provide finite-sample pointwise and path-wise coverage certificates. Numerical experiments on a five-dimensional linear system confirm the theoretical guarantees and demonstrate significant computational savings.
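
The split conformal calibration step mentioned above reduces, in its simplest form, to taking a finite-sample-corrected quantile of nonconformity scores on a held-out calibration set. The sketch below shows that generic step for scalar scores; how TA-IRA defines its scores and its path-wise certificates is not specified here, so the score values and function name are assumptions.

```python
import math


def conformal_quantile(calibration_scores, alpha):
    """Split conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score.

    With exchangeable data, a new score falls at or below this threshold with
    probability at least 1 - alpha. Returns infinity when n is too small.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_scores)[rank - 1]


if __name__ == "__main__":
    # Hypothetical calibration scores, e.g. prediction errors of a learned set predictor.
    scores = [0.12, 0.05, 0.30, 0.08, 0.21, 0.17, 0.02, 0.26, 0.10, 0.15,
              0.19, 0.07, 0.23, 0.04, 0.28, 0.11, 0.14, 0.09, 0.25]
    print("inflation margin at alpha = 0.25:", conformal_quantile(scores, 0.25))
```

Inflating each predicted set by the returned margin is what turns an uncertified learned predictor into one with a finite-sample coverage statement.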


[43] 2604.02170

Dynamic resource coordination can increase grid hosting capacity to support more renewables, storage, and electrified load growth

We show that dynamic coordination of distributed energy resources (DERs) can increase the capacity of low- and medium-voltage grids, improve reliability and power quality, and reduce solar curtailment. We develop three approaches to compute hosting capacity on a representative distribution grid with realistic scenarios. A deterministic iterative method provides insight into how dynamic operation and DER interactions enhance capacity and affect power flows, demonstrating clear gains over static methods even with low-to-moderate levels of storage and flexible demand. A stochastic programming approach jointly optimizes DER siting and sizing, showing that nodal colocation and complementary effects expand the feasible region of solar, heat pump, and battery penetrations by over 22X. This enables up to 200% solar, 100% battery, and 90% heat pump penetration. Batteries emerge as the most critical technology, followed by heat pumps and electric vehicles. A Monte Carlo-based extension shows that uncertainty significantly impacts hosting capacity and grid metrics, with 46% higher volatility under dynamic operation.


[44] 2604.02173

Transformer-Enhanced Data-Driven Output Reachability with Conformal Coverage Guarantees

This paper considers output reachability analysis for linear time-invariant systems with unknown state-space matrices and unknown observation map, given only noisy input-output measurements. The Cayley--Hamilton theorem is applied to eliminate the latent state algebraically, producing an autoregressive input-output model whose parameter uncertainty is enclosed in a matrix zonotope. Set-valued propagation of this model yields output reachable sets with deterministic containment guarantees under a bounded aggregated residual assumption. The conservatism inherent in the lifted matrix-zonotope product is then mitigated by a decoder-only Transformer trained on labels obtained through directional contraction of the formal envelope via an exterior non-reachability certificate. Split conformal prediction restores distribution-free coverage at both per-step and trajectory levels without access to the true reachable-set hull. The framework is validated on a five-dimensional system with multiple unknown observation matrices.
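
The algebraic elimination of the latent state can be illustrated on a two-state, single-input, single-output system. By Cayley-Hamilton, A^2 = tr(A) A - det(A) I, which gives y[k+2] = tr(A) y[k+1] - det(A) y[k] + CB u[k+1] + (CAB - tr(A) CB) u[k]. The sketch below checks this identity numerically with known matrices; the paper instead identifies such a model from noisy data, and the example values are assumptions.

```python
def simulate(A, B, C, x0, inputs):
    """Simulate x[k+1] = A x[k] + B u[k], y[k] = C x[k] for a 2-state SISO system."""
    x, outputs = list(x0), []
    for u in inputs:
        outputs.append(C[0] * x[0] + C[1] * x[1])
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]
    return outputs


def arx_coefficients(A, B, C):
    """Return (a1, a2, b1, b2) such that
    y[k+2] = a1*y[k+1] + a2*y[k] + b1*u[k+1] + b2*u[k]."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    cb = C[0] * B[0] + C[1] * B[1]
    ab = [A[0][0] * B[0] + A[0][1] * B[1], A[1][0] * B[0] + A[1][1] * B[1]]
    cab = C[0] * ab[0] + C[1] * ab[1]
    return trace, -det, cb, cab - trace * cb


if __name__ == "__main__":
    A, B, C = [[0.5, 0.2], [-0.1, 0.8]], [1.0, 0.5], [1.0, -1.0]
    u = [1.0, 0.0, -1.0, 2.0, 0.5, 0.0]
    y = simulate(A, B, C, [1.0, 2.0], u)
    a1, a2, b1, b2 = arx_coefficients(A, B, C)
    for k in range(len(u) - 2):
        predicted = a1 * y[k + 1] + a2 * y[k] + b1 * u[k + 1] + b2 * u[k]
        print(k, abs(predicted - y[k + 2]) < 1e-9)
```

The resulting recursion involves only inputs and outputs, so set-valued propagation can proceed without ever reconstructing the state.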


[45] 2604.02181

Grey-Box Bayesian Optimization for ISAC in Fluid-Antenna Assisted Air-Ground Network

Fluid antenna systems (FAS) provide extra position-agile spatial diversity for integrated sensing and communication (ISAC) by jointly optimizing the port selection and precoding. However, this optimization is challenging in air-ground networks due to the intricate dual-objective Pareto frontier, complex self-interference, and prohibitive channel state information overhead. To overcome these bottlenecks, this work proposes a novel grey-box multi-objective Bayesian optimization framework to address the joint design of discrete port selection and ISAC precoding. Unlike black-box methods, this architecture explicitly leverages known physical system models to learn unknown channel constituents, dramatically reducing sample complexity. To navigate high-dimensional combinatorial spaces, an adaptive trust-region mechanism powered by expected hypervolume improvement (EHI) acquisition is implemented. Furthermore, the framework incorporates a spatio-temporal tracking strategy to handle the continuous mobility of users and targets, robustly capturing the drifting optimum in time-varying environments. Simulations demonstrate that this framework achieves significantly faster convergence and discovers superior Pareto-optimal configurations, validating its efficiency for dynamic real-time FAS-ISAC deployments.


[46] 2604.02196

Computing the Exact Pareto Front in Average-Cost Multi-Objective Markov Decision Processes

Many communication and control problems are cast as multi-objective Markov decision processes (MOMDPs). The complete solution to an MOMDP is the Pareto front. Much of the literature approximates this front via scalarization into single-objective MDPs. Recent work has begun to characterize the full front in discounted or simple bi-objective settings by exploiting its geometry. In this work, we characterize the exact front in average-cost MOMDPs. We show that the front is a continuous, piecewise-linear surface lying on the boundary of a convex polytope. Each vertex corresponds to a deterministic policy, and adjacent vertices differ in exactly one state. Each edge is realized as a convex combination of the policies at its endpoints, with the mixing coefficient given in closed form. We apply these results to a remote state estimation problem, where each vertex on the front corresponds to a threshold policy. The exact Pareto front and solutions to certain non-convex MDPs can be obtained without explicitly solving any MDP.
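
For contrast with the geometric characterization above, the sketch below shows the brute-force notion of a Pareto front: given a finite list of cost vectors (for instance, the average-cost vectors of enumerated deterministic policies), keep the non-dominated ones. This is a generic filter and not the paper's method, which avoids such enumeration; the sample points are assumptions.

```python
def pareto_front(points):
    """Return the non-dominated cost vectors (lower is better in every objective)."""
    def dominates(p, q):
        # p dominates q if it is no worse everywhere and strictly better somewhere.
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (cost_1, cost_2) pairs for five policies.
    costs = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
    print(pareto_front(costs))
```

The filter is quadratic in the number of candidates, and the number of deterministic policies grows exponentially with the number of states, which is the cost that an exact geometric description of the front sidesteps.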


[47] 2604.02205

Evaluation of gNB Monostatic Sensing for UAV Use Case

3GPP Release 19 has initiated the standardization of integrated sensing and communications (ISAC), including a channel model for monostatic sensing, evaluation scenarios, and performance assessment methodologies. These common assumptions provide an important basis for ISAC evaluation, but reproducible end-to-end studies still require a transparent sensing implementation. This paper evaluates 5G New Radio (NR) base station (gNB)-based monostatic sensing for the Unmanned Aerial Vehicle (UAV) use case using a 5G NR downlink Cyclic Prefix-Orthogonal Frequency Division Multiplexing (CP-OFDM) waveform and positioning reference signals (PRS), following 3GPP Urban Macro-Aerial Vehicle (UMa-AV) scenario assumptions. We present an end-to-end processing chain for multi-target detection and 3D localization, achieving more than 70% detection probability with less than 5% false alarm rate in the considered scenario. For correctly detected targets, localization errors are on the order of a few meters, with a 90th-percentile error of 4 m and 6 m in the vertical and horizontal directions, respectively. To support reproducible baseline studies and further research, we release the simulator 5GNRad, which reproduces our evaluation


[48] 2604.02225

Stochastic Control for Organ Donations: A Review

We review the literature on individual patient organ acceptance decision making by presenting a Markov Decision Process (MDP) model to formulate the organ acceptance decision process as a stochastic control problem. Under the umbrella of the MDP framework, we classify and summarize the major research streams and contributions. In particular, we focus on control limit-type policies, which are shown to be optimal under certain conditions and easy to implement in practice. Finally, we briefly discuss open problems and directions for future research.


[49] 2604.02227

Sensitivity analysis for stopping criteria with application to organ transplantations

We consider a stopping problem and its application to the decision-making process regarding the optimal timing of organ transplantation for individual patients. At each decision period, the patient state is inspected and a decision is made whether to transplant. If the organ is transplanted, the process terminates; otherwise, the process continues until a transplant happens or the patient dies. Under suitable conditions, we show that there exists a control limit optimal policy. We propose a smoothed perturbation analysis (SPA) estimator for the gradient of the total expected discounted reward with respect to the control limit. Moreover, we show that the SPA estimator is asymptotically unbiased.


[50] 2604.02251

Data-Driven Koopman Predictive Control for Frequency Regulation of Power Systems using Black-Box IBRs

Model uncertainty of inverter-based resources (IBRs) presents significant challenges for power system control and stability. This work studies secondary frequency regulation in inverter-based power systems using a Data-driven Koopman Predictive Control (DKPC) framework. The method employs Koopman theory to lift the nonlinear system dynamics into a higher-dimensional space where they can be approximated as linear. Based on Willems' fundamental lemma, a behavioral model is constructed directly from lifted input-output data. A receding-horizon predictive control formulation is then provided that operates entirely using observed data, without requiring a parametric model, while satisfying explicit constraints on the control input and system output. The proposed approach is particularly suited for IBRs with complex or uncertain dynamics. Numerical results demonstrate its effectiveness for frequency control as benchmarked against the Data-enabled Predictive Control (DeePC). The trade-off between tracking performance and control effort is illustrated through tuning of the weighting parameters.
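
Behavioral models based on Willems' fundamental lemma are built from Hankel matrices of recorded data, where each column is a window of consecutive samples. The sketch below constructs such a matrix for a scalar signal; in a DKPC-style scheme the signal would be lifted input-output data, which is not reproduced here, and the function name is an assumption.

```python
def hankel_matrix(signal, depth):
    """Depth-row Hankel matrix of a scalar sequence: entry (i, j) is signal[i + j].

    Each column is one length-`depth` window of the recorded trajectory.
    """
    columns = len(signal) - depth + 1
    if columns < 1:
        raise ValueError("signal is shorter than the requested depth")
    return [[signal[i + j] for j in range(columns)] for i in range(depth)]


if __name__ == "__main__":
    for row in hankel_matrix([1, 2, 3, 4, 5], depth=3):
        print(row)
```

Stacking the input and output Hankel matrices and requiring any predicted trajectory to be a linear combination of their columns is what lets the controller operate without a parametric model, provided the recorded input is sufficiently exciting.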


[51] 2604.02266

Real-Time and Scalable Zak-OTFS Receiver Processing on GPUs

Orthogonal time frequency space (OTFS) modulation offers superior robustness to high-mobility channels compared to conventional orthogonal frequency-division multiplexing (OFDM) waveforms. However, its explicit delay-Doppler (DD) domain representation incurs substantial signal processing complexity, especially with increased DD domain grid sizes. To address this challenge, we present a scalable, real-time Zak-OTFS receiver architecture on GPUs through hardware-algorithm co-design that exploits DD-domain channel sparsity. Our design leverages compact matrix operations for key processing stages, a branchless iterative equalizer, and a structured sparse representation of the DD-domain channel matrix to significantly reduce computational and memory overhead. These optimizations enable low-latency processing that consistently meets the 99.9th percentile real-time processing deadline. The proposed system achieves up to 906.52 Mbps throughput with a DD grid size of (16384,32) using 16QAM modulation over 245.76 MHz bandwidth. Extensive evaluations under a Vehicular-A channel model demonstrate strong scalability and robust performance across CPU (Intel Xeon) and multiple GPU platforms (NVIDIA Jetson Orin, RTX 6000 Ada, A100, and H200), highlighting the effectiveness of compute-aware Zak-OTFS receiver design for next-generation (NextG) high-mobility communication systems.


[52] 2604.02273

Selective State-Space Models for Koopman-based Data-driven Distribution System State Estimation

Distribution System State Estimation (DSSE) plays an increasingly important role in modern power grids due to the integration of distributed energy resources (DERs). The inherent characteristics of distribution systems make classical estimation methods struggle, and recent advancements in data-driven learning methods, although promising, exhibit systematic failure in generalization and scalability that limits their applicability. In this work, we propose MambaDSSE, a model-free data-driven framework that incorporates Koopman-theoretic probabilistic filtering with a selective state-space model that learns to infer the underlying time-varying behavior of the system from data. We evaluate the model across a variety of test systems and scenarios, and demonstrate that the proposed method outperforms machine learning baselines on scalability, resilience to DER penetration levels, and robustness to data sampling rate irregularities. We further highlight the Mamba-based SSM's ability to capture long-range dependencies from data, improving performance on the DSSE task.


[53] 2604.02326

ReVAR: A Data-Driven Algorithm for Generating Aero-Optic Phase Screens

The propagation of light through a turbulent flow field around an aircraft results in optical distortions commonly known as aero-optic effects. The development of methods to mitigate these effects requires large amounts of realistic aero-optic data. However, methods for obtaining this data, including experiment, computational fluid dynamics, and simple phase screen algorithms (e.g., boiling flow), each have significant drawbacks such as high cost, high computation, limited quantity, and/or inaccurate statistics. More recently, data-driven algorithms have been proposed that are computationally efficient and can synthesize aero-optic data to match the statistics of measured data, but these approaches still have drawbacks including limited quality, inaccurate statistics, and the use of complicated algorithms. In this paper, we introduce ReVAR (Re-whitened Vector AutoRegression), a data-driven algorithm for generating synthetic aero-optic data that matches the statistics of measured data. A key contribution in this algorithm is Long-Range AutoRegression, a linear predictive model that combines a standard autoregression with a set of low-pass filters of the data to fit both short-range and long-range temporal statistics. ReVAR uses Long-Range AR together with a spatial re-whitening step to convert measured aero-optic data to temporally and spatially un-correlated white noise. ReVAR can then generate synthetic aero-optic data by reversing this process using white noise input. Using two measured turbulent boundary layer data sets, we demonstrate that ReVAR better matches the measured data's temporal power spectrum and other key metrics than do two conventional phase screen generation methods and an existing single time-lag autoregressive model.


[54] 2401.15855

Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing

Remote sensing images present unique challenges to image analysis due to the extensive geographic coverage, hardware limitations, and misaligned multi-scale images. This paper revisits the classical multi-scale representation learning problem but under the general framework of self-supervised learning for remote sensing image understanding. We present Cross-Scale MAE, a self-supervised model built upon the Masked Auto-Encoder (MAE). During pre-training, Cross-Scale MAE employs scale augmentation techniques and enforces cross-scale consistency constraints through both contrastive and generative losses to ensure consistent and meaningful representations well-suited for a wide range of downstream tasks. Further, our implementation leverages the xFormers library to accelerate network pre-training on a single GPU while maintaining the quality of learned representations. Experimental evaluations demonstrate that Cross-Scale MAE exhibits superior performance compared to standard MAE and other state-of-the-art remote sensing MAE methods.


[55] 2603.17219

SA-CycleGAN-2.5D: Self-Attention CycleGAN with Tri-Planar Context for Multi-Site MRI Harmonization

Multi-site neuroimaging analysis is fundamentally confounded by scanner-induced covariate shifts, where the marginal distribution of voxel intensities $P(\mathbf{x})$ varies non-linearly across acquisition protocols while the conditional anatomy $P(\mathbf{y}|\mathbf{x})$ remains constant. This is particularly detrimental to radiomic reproducibility, where acquisition variance often exceeds biological pathology variance. Existing statistical harmonization methods (e.g., ComBat) operate in feature space, precluding spatial downstream tasks, while standard deep learning approaches are theoretically bounded by local effective receptive fields (ERF), failing to model the global intensity correlations characteristic of field-strength bias. We propose SA-CycleGAN-2.5D, a domain adaptation framework motivated by the $H\Delta H$-divergence bound of Ben-David et al., integrating three architectural innovations: (1) A 2.5D tri-planar manifold injection preserving through-plane gradients $\nabla_z$ at $O(HW)$ complexity; (2) A U-ResNet generator with dense voxel-to-voxel self-attention, surpassing the $O(\sqrt{L})$ receptive field limit of CNNs to model global scanner field biases; and (3) A spectrally-normalized discriminator constraining the Lipschitz constant ($K_D \le 1$) for stable adversarial optimization. Evaluated on 654 glioma patients across two institutional domains (BraTS and UPenn-GBM), our method reduces Maximum Mean Discrepancy (MMD) by 99.1% ($1.729 \to 0.015$) and degrades domain classifier accuracy to near-chance (59.7%). Ablation confirms that global attention is statistically essential (Cohen's $d = 1.32$, $p < 0.001$) for the harder heterogeneous-to-homogeneous translation direction. By bridging 2D efficiency and 3D consistency, our framework yields voxel-level harmonized images that preserve tumor pathophysiology, enabling reproducible multi-center radiomic analysis.
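For readers unfamiliar with Maximum Mean Discrepancy, the following is a minimal sketch of a biased squared-MMD estimate between two sets of scalar intensities. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions made for illustration; the abstract does not state which kernel or estimator the authors use.

```python
import math


def rbf_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)],
    with each expectation replaced by a plain sample mean.
    """
    k_xx = sum(rbf_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


# Usage: identical samples give zero; well-separated samples approach 2.
same = mmd2_biased([0.0, 1.0], [0.0, 1.0])
far = mmd2_biased([0.0], [10.0])
```

A value near zero indicates that the two intensity distributions are hard to tell apart under the chosen kernel, which is the sense in which a lower MMD signals better harmonization.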


[56] 2604.01234

CLPIPS: A Personalized Metric for AI-Generated Image Similarity

Iterative prompt refinement is central to reproducing target images with text to image generative models. Previous studies have incorporated image similarity metrics (ISMs) as additional feedback to human users. Existing ISMs such as LPIPS and CLIP provide objective measures of image likeness but often fail to align with human judgments, particularly in context specific or user driven tasks. In this paper, we introduce Customized Learned Perceptual Image Patch Similarity (CLPIPS), a customized extension of LPIPS that adapts a metric's notion of similarity directly to human judgments. We aim to explore whether lightweight, human augmented fine tuning can meaningfully improve perceptual alignment, positioning similarity metrics as adaptive components for human in the loop workflows with text to image tools. We evaluate CLPIPS on a human subject dataset in which participants iteratively regenerate target images and rank generated outputs by perceived similarity. Using margin ranking loss on human ranked image pairs, we fine tune only the LPIPS layer combination weights and assess alignment via Spearman rank correlation and Intraclass Correlation Coefficient. Our results show that CLPIPS achieves stronger correlation and agreement with human judgments than baseline LPIPS. Rather than optimizing absolute metric performance, our work emphasizes improving alignment consistency between metric predictions and human ranks, demonstrating that even limited human specific fine tuning can meaningfully enhance perceptual alignment in human in the loop text to image workflows.
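The fine-tuning step described above can be illustrated with a small sketch. It assumes that the metric is a weighted sum of per-layer distances and that a lower distance means "more similar"; the function names, the margin value, and the example numbers are hypothetical and not taken from the paper.

```python
def weighted_distance(layer_distances, weights):
    """Combine per-layer perceptual distances with learnable layer weights."""
    return sum(w * d for w, d in zip(weights, layer_distances))


def margin_ranking_loss(d_preferred, d_other, margin=0.1):
    """Hinge-style ranking loss for one human-ranked pair.

    The loss is zero once the image that humans ranked as more similar
    is closer to the target than the other image by at least `margin`.
    """
    return max(0.0, margin + d_preferred - d_other)


# Usage: two candidate images with per-layer distances to the same target.
weights = [0.5, 0.5]
d_a = weighted_distance([0.2, 0.4], weights)  # humans ranked A as closer
d_b = weighted_distance([0.6, 0.8], weights)
loss_agree = margin_ranking_loss(d_a, d_b)     # metric agrees with humans
loss_disagree = margin_ranking_loss(d_b, d_a)  # metric disagrees
```

Only the entries of `weights` would be updated during such fine-tuning, which keeps the adaptation lightweight.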


[57] 2604.01247

Combining Masked Language Modeling and Cross-Modal Contrastive Learning for Prosody-Aware TTS

We investigate multi-stage pretraining for prosody modeling in diffusion-based TTS. A speaker-conditioned dual-stream encoder is trained with masked language modeling followed by SigLIP-style cross-modal contrastive learning using mixed-phoneme batches, with an additional same-phoneme refinement stage studied separately. We evaluate intrinsic text-audio retrieval and downstream synthesis in Grad-TTS and a latent diffusion TTS system. The two-stage curriculum (MLM + mixed-phoneme contrastive learning) achieves the best overall synthesis quality in terms of intelligibility, speaker similarity, and perceptual measures. Although same-phoneme refinement improves prosodic retrieval, it reduces phoneme discrimination and degrades synthesis. These findings indicate that improvements in embedding-space metrics do not necessarily translate to better generative performance and highlight the need to balance phoneme discrimination and prosodic sensitivity in TTS pretraining.


[58] 2604.01251

Camouflage-aware Image-Text Retrieval via Expert Collaboration

Camouflaged scene understanding (CSU) has attracted significant attention due to its broad practical implications. However, in this field, robust image-text cross-modal alignment remains under-explored, hindering deeper understanding of camouflaged scenarios and their related applications. To this end, we focus on the typical image-text retrieval task, and formulate a new task dubbed "camouflage-aware image-text retrieval" (CA-ITR). We first construct a dedicated camouflage image-text retrieval dataset (CamoIT), comprising $\sim$10.5K samples with multi-granularity textual annotations. Benchmark results conducted on CamoIT reveal the underlying challenges of CA-ITR for existing cutting-edge retrieval techniques, which are mainly caused by objects' camouflage properties as well as those complex image contents. As a solution, we propose a camouflage-expert collaborative network (CECNet), which features a dual-branch visual encoder: one branch captures holistic image representations, while the other incorporates a dedicated model to inject representations of camouflaged objects. A novel confidence-conditioned graph attention (C$^2$GA) mechanism is incorporated to exploit the complementarity across branches. Comparative experiments show that CECNet achieves $\sim$29% overall CA-ITR accuracy boost, surpassing seven representative retrieval models. The dataset and code will be available at this https URL.


[59] 2604.01254

Simulating Realistic LiDAR Data Under Adverse Weather for Autonomous Vehicles: A Physics-Informed Learning Approach

Accurate LiDAR simulation is crucial for autonomous driving, especially under adverse weather conditions. Existing methods struggle to capture the complex interactions between LiDAR signals and atmospheric phenomena, leading to unrealistic representations. This paper presents a physics-informed learning framework (PICWGAN) for generating realistic LiDAR data under adverse weather conditions. By integrating physics-driven constraints for modeling signal attenuation and geometry-consistent degradations into a physics-informed learning pipeline, the proposed method reduces the sim-to-real gap. Evaluations on real-world datasets (CADC for snow, Boreas for rain) and the VoxelScape dataset show that our approach closely mimics real-world intensity patterns. Quantitative metrics, including MSE, SSIM, KL divergence, and Wasserstein distance, demonstrate statistically consistent intensity distributions. Additionally, models trained on data enhanced by our framework outperform baselines in downstream 3D object detection, achieving performance comparable to models trained on real-world data. These results highlight the effectiveness of the proposed approach in improving the realism of LiDAR data and enabling robust perception under adverse weather conditions.


[60] 2604.01321

Risk Control of Traffic Flow Through Chance Constraints and Large Deviation Approximation

Existing macroscopic traffic control methods often struggle to strictly regulate rare, safety-critical extreme events under stochastic disturbances. In this paper, we develop a rare chance-constrained optimal control framework for autonomous traffic management. To efficiently enforce these probabilistic safety specifications, we exploit a large deviation theory (LDT) based approximation method, which converts the original highly non-convex, sampling-heavy optimization problem into a tractable deterministic nonlinear programming problem. In addition, the proposed LDT-based reformulation exhibits superior computational scalability, as it maintains a constant computational burden regardless of the target violation probability level, effectively bypassing the extreme scaling bottlenecks of traditional sampling-based methods. The effectiveness of the proposed framework in achieving precise near-target probability control and superior computational efficiency over risk-averse baselines is illustrated through extensive numerical simulations across diverse traffic risk measures.


[61] 2604.01355

Sterile mosquito release via intelligent proportional controllers

The Sterile Insect Technique (SIT) against insect pests and insect vectors consists of releasing males that have been previously sterilized in order to reduce or eliminate a specific wild population. We study this complex control question via model-free control, ultra-local models, and intelligent proportional controllers that have already proven their effectiveness in various fields. They permit addressing, perhaps for the first time, the essential sampling question. Computer simulations are displayed and discussed.


[62] 2604.01362

Multipath Channel Metrics and Detection in Vascular Molecular Communication: A Wireless-Inspired Perspective

Motivated by classical communications engineering, early works in molecular communication (MC) largely adopted established modeling and signal processing concepts from wireless electromagnetic communication systems. In the context of the human cardiovascular system (CVS), MC channel models evolved from simple unbounded and single-duct environments mimicking individual blood vessels to complex vessel network (VN) topologies, generally at the expense of analytical tractability. Up until now, this has largely prohibited rigorous communication-theoretic analysis of large-scale VNs. In this work, we leverage a recently established closed-form analytical channel model for VNs, named mixture of inverse Gaussians for hemodynamic transport (MIGHT), to conduct the first systematic communication-theoretic study of MC in complex, large-scale VNs. Based on MIGHT, we derive a Poisson channel noise model and unveil structural analogies between multipath wireless communications (MWC) and advective-diffusive MC in VNs. In particular, we establish classical MWC metrics, namely the root mean squared (RMS) delay spread, the mean excess delay, and the coherence bandwidth, for MC in VNs and derive closed-form expressions for the channel frequency response and power delay profile (PDP). Building on this characterization, we propose a VN-adapted, coherent decision-feedback (DF) detector and show how the derived multipath metrics can inform the choice of critical system parameters like the symbol duration, the sampling time, and the memory length. Additionally, we evaluate the detector's performance in different VNs exhibiting inter-symbol interference (ISI). Together, these contributions open the door to a systematic, MWC-inspired MC system design for large-scale VNs.
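The multipath metrics named above have standard definitions in wireless communications. As a minimal sketch, assuming a discrete power delay profile given as lists of path delays (measured from the first arrival) and path powers, the mean excess delay and RMS delay spread can be computed as below; the function name is illustrative, and the paper's closed-form expressions for vessel networks are not reproduced here.

```python
import math


def delay_metrics(delays, powers):
    """Return (mean excess delay, RMS delay spread) of a discrete PDP.

    mean = sum(P_i * tau_i) / sum(P_i)
    rms  = sqrt(sum(P_i * tau_i^2) / sum(P_i) - mean^2)
    """
    total_power = sum(powers)
    if total_power <= 0:
        raise ValueError("total power must be positive")
    mean = sum(p * t for p, t in zip(powers, delays)) / total_power
    second_moment = sum(p * t * t for p, t in zip(powers, delays)) / total_power
    # Guard against tiny negative values caused by floating-point rounding.
    rms = math.sqrt(max(second_moment - mean * mean, 0.0))
    return mean, rms


# Usage: two equally strong paths arriving one time unit apart.
mean_delay, rms_spread = delay_metrics([0.0, 1.0], [1.0, 1.0])
```

A larger RMS delay spread relative to the symbol duration generally means stronger inter-symbol interference, which is why such metrics can guide the choice of symbol duration and detector memory.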


[63] 2604.01371

AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction

Surgical action automation has progressed rapidly toward achieving surgeon-like dexterous control, driven primarily by advances in learning from demonstration and vision-language-action models. While these have demonstrated success in table-top experiments, translating them to clinical deployment remains challenging: current methods offer limited predictability on where instruments will interact on tissue surfaces and lack explicit conditioning inputs to enforce tool-action-specific safe interaction regions. Addressing this gap, we introduce AffordTissue, a multimodal framework for predicting tool-action specific tissue affordance regions as dense heatmaps during cholecystectomy. Our approach combines a temporal vision encoder capturing tool motion and tissue dynamics across multiple viewpoints, language conditioning enabling generalization across diverse instrument-action pairs, and a DiT-style decoder for dense affordance prediction. We establish the first tissue affordance benchmark by curating and annotating 15,638 video clips across 103 cholecystectomy procedures, covering six unique tool-action pairs involving four instruments (hook, grasper, scissors, clipper) and their associated tasks: dissection, grasping, clipping, and cutting. Experiments demonstrate substantial improvement over vision-language model baselines (20.6 px ASSD vs. 60.2 px for Molmo-VLM), showing that our task-specific architecture outperforms large-scale foundation models for dense surgical affordance prediction. By predicting tool-action specific tissue affordance regions, AffordTissue provides explicit spatial reasoning for safe surgical automation, potentially unlocking explicit policy guidance toward appropriate tissue regions and early safe stop when instruments deviate outside predicted safe zones.


[64] 2604.01403

Concentration of Stochastic System Trajectories with Time-varying Contraction Conditions

We establish two concentration inequalities for nonlinear stochastic systems under time-varying contraction conditions. The key to our approach is an energy function termed Averaged Moment Generating Function (AMGF). By combining it with incremental stability analysis, we develop a concentration inequality that bounds the deviation between the stochastic system state and its deterministic counterpart. As this inequality is restricted to a single time instant, we further combine AMGF with martingale-based methods to derive a concentration inequality that bounds the fluctuation of the entire stochastic trajectory. Additionally, by synthesizing the two results, we significantly improve the trajectory-level concentration inequality for strongly contractive systems. Given the probability level $1-\delta$, the derived inequalities ensure an $\mathcal{O}(\sqrt{\log(1/\delta)})$ bound on the deviation of stochastic trajectories, which is tight under our assumptions. Our results are exemplified through a case study on stochastic safe control.


[65] 2604.01433

Semantically Annotated Multimodal Dataset for RF Interpretation and Prediction

Current limitations in wireless modeling and radio frequency (RF)-based AI are primarily driven by a lack of high-quality, measurement-based datasets that connect RF signals to their physical environments. RF heatmaps, the typical form of such data, are high-dimensional and complex but lack the geometric and semantic context needed for interpretation, constraining the development of supervised machine learning models. To address this bottleneck, we propose a new class of multimodal datasets that combines RF measurements with auxiliary modalities like high-resolution cameras and lidar to bridge the gap between RF signals and their physical causes. The proposed data collection will span diverse indoor and outdoor environments, featuring both static and dynamic scenarios, including human activities ranging from walking to subtle gestures. By achieving precise spatial and temporal co-registration and creating digital replicas for voxel-level annotation, this dataset will enable transformative AI research. Key tasks include the forward problem of predicting RF heatmaps from visual data to revolutionize wireless system design, and the inverse problem of inferring scene semantics from RF signals, creating a new form of RF-based perception.


[66] 2604.01450

Discrete-Time Event-Triggered Extremum Seeking

This paper proposes a discrete-time event-triggered extremum seeking control scheme for real-time optimization of nonlinear systems. Unlike conventional discrete-time implementations relying on periodic updates, the proposed approach updates the control input only when a state-dependent triggering condition is satisfied, reducing unnecessary actuation and communication. The resulting closed-loop system combines extremum seeking with an event-triggering mechanism that adaptively determines the input update instants. Using discrete-time averaging and Lyapunov analysis, we establish practical convergence of the trajectories to a neighborhood of the unknown extremum point and show exponential stability of the associated average dynamics. The proposed method preserves the optimization capability of classical extremum seeking while significantly reducing the number of input updates. Simulation results illustrate the effectiveness of the approach for resource-aware real-time optimization.


[67] 2604.01477

Soft MPCritic: Amortized Model Predictive Value Iteration

Reinforcement learning (RL) and model predictive control (MPC) offer complementary strengths, yet combining them at scale remains computationally challenging. We propose soft MPCritic, an RL-MPC framework that learns in (soft) value space while using sample-based planning for both online control and value target generation. soft MPCritic instantiates MPC through model predictive path integral control (MPPI) and trains a terminal Q-function with fitted value iteration, aligning the learned value function with the planner and implicitly extending the effective planning horizon. We introduce an amortized warm-start strategy that recycles planned open-loop action sequences from online observations when computing batched MPPI-based value targets. This makes soft MPCritic computationally practical, while preserving solution quality. soft MPCritic plans in a scenario-based fashion with an ensemble of dynamic models trained for next-step prediction accuracy. Together, these ingredients enable soft MPCritic to learn effectively through robust, short-horizon planning on classic and complex control tasks. These results establish soft MPCritic as a practical and scalable blueprint for synthesizing MPC policies in settings where policy extraction and direct, long-horizon planning may fail.
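The sample-based planner mentioned above, MPPI, weights sampled action sequences by their exponentiated negative cost. The following is a minimal sketch of that weighting step for scalar actions only; the function name and the `temperature` parameter are illustrative assumptions, and the learned terminal Q-function, model ensemble, and warm-start strategy from the abstract are not modeled.

```python
import math


def mppi_first_action(first_actions, costs, temperature=1.0):
    """Cost-weighted average of the first action of each sampled rollout.

    Each rollout i has total cost costs[i]; its weight is proportional
    to exp(-costs[i] / temperature), so cheaper rollouts dominate.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best = min(costs)
    # Subtract the minimum cost before exponentiating for numerical stability.
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, first_actions)) / total


# Usage: equal costs average the actions; a much cheaper rollout wins outright.
balanced = mppi_first_action([1.0, 3.0], [0.0, 0.0])
greedy = mppi_first_action([1.0, 3.0], [0.0, 1000.0])
```

In a value-based variant, each rollout cost would additionally include a terminal value estimate, which is one way a short planning horizon can be extended implicitly.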


[68] 2604.01573

When is cumulative dose response monotonic? Analysis of incoherent feedforward motifs

We study the monotonicity of the cumulative dose response (cDR) for a class of incoherent feedforward motif (IFFM) systems with linear intermediate dynamics and nonlinear output dynamics. While the instantaneous dose response (DR) may be non-monotone with respect to the input, the cDR can still be monotone. To analyze this phenomenon, we derive an integral representation of the sensitivity of cDR with respect to the input and establish general sufficient conditions for both monotonicity and non-monotonicity. These results reduce the problem to verifying qualitative sign properties along system trajectories. We apply this framework to four canonical IFFM systems and obtain a complete characterization of their behavior. In particular, IFFM1 and IFFM3 exhibit monotone cDR despite potentially non-monotone DR, while IFFM2 is monotone already at the level of DR, which implies monotonicity of cDR. In contrast, IFFM4 violates these conditions, leading to a loss of monotonicity. Numerical simulations indicate that these properties persist beyond the structured initial conditions used in the analysis. Overall, our results provide a unified framework for understanding how network structure governs monotonicity in cumulative input-output responses.


[69] 2604.01712

Transformer self-attention encoder-decoder with multimodal deep learning for response time series forecasting and digital twin support in wind structural health monitoring

The wind-induced structural response forecasting capabilities of a novel transformer methodology are examined here. The model also provides a digital twin component for bridge structural health monitoring. Firstly, the approach uses the temporal characteristics of the system to train a forecasting model. Secondly, the vibration predictions are compared to the measured ones to detect large deviations. Finally, the identified cases are used as an early-warning indicator of structural change. The artificial intelligence-based model outperforms approaches for response forecasting as no assumption on wind stationarity or on structural normal vibration behavior is needed. Specifically, wind-excited dynamic behavior suffers from uncertainty related to obtaining poor predictions when the environmental or traffic conditions change. This results in a hard distinction of what constitutes normal vibration behavior. To this end, a framework is rigorously examined on real-world measurements from the Hardanger Bridge monitored by the Norwegian University of Science and Technology. The approach captures accurate structural behavior in realistic conditions, and with respect to the changes in the system excitation. The results, importantly, highlight the potential of transformer-based digital twin components to serve as next-generation tools for resilient infrastructure management, continuous learning, and adaptive monitoring over the system's lifecycle with respect to temporal characteristics.


[70] 2604.01755

Day-Ahead Offering for Virtual Power Plants: A Stochastic Linear Programming Reformulation and Projected Subgradient Method

Virtual power plants (VPPs) are an emerging paradigm that aggregates distributed energy resources (DERs) for coordinated participation in power systems, including bidding as a single dispatchable entity in the wholesale market. In this paper, we address a critical operational challenge for VPPs: the day-ahead offering problem under highly intermittent and uncertain DER outputs and market prices. The day-ahead offering problem determines the price-quantity pairs submitted by VPPs while balancing profit opportunities against operational uncertainties. First, we formulate the problem as a scenario-based two-stage stochastic adaptive robust optimization problem, where the uncertainty of the locational marginal prices follows a Markov process and DER uncertainty is characterized by static uncertainty sets. Then, motivated by the outer approximation principle of the column-and-constraint generation (CC&G) algorithm, we propose a novel inner approximation-based projected subgradient method. By exploiting the problem structure, we propose two novel approaches to improve computational tractability. First, we show that under mild modeling assumptions, the robust second-stage problem can be equivalently reformulated as a linear program (LP) with a nested resource allocation structure that is amenable to an efficient greedy algorithm. Furthermore, motivated by the computational efficiency of solving the reformulated primal second-stage problem and the isotonic structure of the first-stage feasible region, we propose an efficient projected subgradient algorithm to solve the overall stochastic LP problem. Extensive computational experiments using real-world data demonstrate that the overall projected subgradient descent method achieves about two orders of magnitude speedup over CC&G while maintaining solution quality.


[71] 2604.01778

Multi-Mode Pinching-Antenna Systems: Polarization-Aware Full-Wave Modeling and Optimization

Millimeter-wave and terahertz communications face a fundamental challenge: overcoming severe path loss without sacrificing spectral efficiency. Pinching antenna systems (PASS) address this by bringing radiators physically close to users, yet existing frameworks treat the waveguide as a mere transmission line, overlooking its inherent multi-mode capabilities and the critical role of polarization. This paper develops the first polarization-aware, full-wave electromagnetic model for multi-mode PASS (MMPASS), capturing spatial radiation patterns, modal polarization states, and polarization matching efficiency from first principles. Leveraging this physically grounded model, we reveal fundamental trade-offs among waveguide attenuation, atmospheric absorption, and geometric spreading, yielding closed-form solutions for optimal PA placement and orientation in single-user scenarios. Extending to multi-user settings, we propose a modular optimization framework that integrates fractional programming with closed-form polarization updates, scaling gracefully to arbitrary numbers of waveguides, PAs, and users. Numerical results show that MMPASS achieves up to a 167% increase in spectral efficiency compared with single-mode PASS. Moreover, when comparing MMPASS with its polarization-ignorant counterpart, polarization awareness alone improves the sum rate by up to 23%. By bridging rigorous electromagnetic theory with scalable optimization, MMPASS establishes a physically complete and practically viable foundation for future high-frequency wireless networks.


[72] 2604.01830

Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids

Topology control for power grid operation is a challenging sequential decision-making problem because the action space grows combinatorially with the size of the grid and action evaluation through simulation is computationally expensive. We propose a physics-informed Reinforcement Learning framework that combines semi-Markov control with a Gibbs prior over the action space that encodes the system's physics. A decision is taken only when the grid enters a hazardous regime, while a graph neural network surrogate predicts the post-action overload risk of feasible topology actions. These predictions are used to construct a physics-informed Gibbs prior that both selects a small state-dependent candidate set and reweights policy logits before action selection. In this way, our method reduces exploration difficulty and online simulation cost while preserving the flexibility of a learned policy. We evaluate the approach in three realistic benchmark environments of increasing difficulty. Across all settings, the proposed method achieves a strong balance between control quality and computational efficiency: it matches oracle-level performance while being approximately $6\times$ faster on the first benchmark, reaches $94.6\%$ of oracle reward with roughly $200\times$ lower decision time on the second one, and on the most challenging benchmark improves over a PPO baseline by up to $255\%$ in reward and $284\%$ in survived steps while remaining about $2.5\times$ faster than a strong specialized engineering baseline. These results show that our method provides an effective mechanism for topology control in power grids.
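One way to read the two roles of the Gibbs prior (candidate selection and logit reweighting) is sketched below. This is an illustration under stated assumptions, not the authors' implementation: the prior is taken as proportional to exp(-beta * risk), the candidate set is the k lowest-risk actions, and the prior is combined with the policy by adding its log to the logits. The names `gibbs_reweight`, `beta`, and `k` are hypothetical.

```python
import math


def gibbs_reweight(logits, risks, beta, k):
    """Return action probabilities over the k lowest-risk candidate actions.

    Adding log(exp(-beta * risk)) = -beta * risk to each policy logit
    biases the policy toward actions the surrogate considers safer.
    """
    # Candidate set: indices of the k actions with the lowest predicted risk.
    candidates = sorted(range(len(risks)), key=lambda a: risks[a])[:k]
    scores = {a: logits[a] - beta * risks[a] for a in candidates}
    # Softmax over the candidates, shifted by the maximum for stability.
    top = max(scores.values())
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Usage: three actions with equal policy logits but different predicted risks.
probs = gibbs_reweight(logits=[0.0, 0.0, 0.0], risks=[0.1, 0.9, 0.5], beta=2.0, k=2)
```

With equal logits, the result depends only on the predicted risks, so the riskiest action is pruned and the safest one receives the largest probability.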


[73] 2604.01870

Towards Intrinsically Calibrated Uncertainty Quantification in Industrial Data-Driven Models via Diffusion Sampler

In modern process industries, data-driven models are important tools for real-time monitoring when key performance indicators are difficult to measure directly. While accurate predictions are essential, reliable uncertainty quantification (UQ) is equally critical for safety, reliability, and decision-making, but remains a major challenge in current data-driven approaches. In this work, we introduce a diffusion-based posterior sampling framework that inherently produces well-calibrated predictive uncertainty via faithful posterior sampling, eliminating the need for post-hoc calibration. In extensive evaluations on synthetic distributions, the Raman-based phenylacetic acid soft sensor benchmark, and a real ammonia synthesis case study, our method achieves practical improvements over existing UQ techniques in both uncertainty calibration and predictive accuracy. These results highlight diffusion samplers as a principled and scalable paradigm for advancing uncertainty-aware modeling in industrial applications.


[74] 2604.01873

Scaled Relative Graphs and Dynamic Integral Quadratic Constraints: Connections and Computations for Nonlinear Systems

Scaled relative graphs (SRGs) enable graphical analysis and design of nonlinear systems. In this paper, we present a systematic approach for computing both soft and hard SRGs of nonlinear systems using dynamic integral quadratic constraints (IQCs). These constraints are exploited via application of the S-procedure to compute tractable SRG overbounds. In particular, we show that the multipliers associated with the IQCs define regions in the complex plane. Soft SRG computations are formulated through frequency-domain conditions, while hard SRGs are obtained via hard factorizations of multipliers and linear matrix inequalities. The overbounds are used to derive an SRG-based feedback stability result for Lur'e-type systems, providing a new graphical interpretation of classical IQC stability results with dynamic multipliers.


[75] 2604.01910

Quantum Networking Fundamentals: From Physical Protocols to Network Engineering

The realization of the Quantum Internet promises transformative capabilities in secure communication, distributed quantum computing, and high-precision metrology. However, transitioning from laboratory experiments to a scalable, multi-tenant network utility introduces deep orchestration challenges. Current development is often siloed within physics communities, prioritizing hardware, while the classical networking community lacks architectural models to manage fragile quantum resources. This tutorial bridges this divide by providing a network-centric view of quantum networking. We dismantle idealized assumptions in current simulators to address the "simulation-reality gap," recasting them as explicit control-plane constraints. To bridge this gap, we establish Software-Defined Quantum Networking (SDQN) as a prerequisite for scale, prioritizing a symbiotic, dual-plane architecture where classical control dictates quantum data flow. Specifically, we synthesize reference models for SDQN and the Quantum Network Operating System (QNOS) for hardware abstraction, and adapt a Quantum Network Utility Maximization (Q-NUM) framework as a unifying mathematical lens for engineers to reason about trade-offs between entanglement routing, scheduling, and fidelity. Furthermore, we analyze Distributed Quantum AI (DQAI) over imperfect networks as a case study, illustrating how physical constraints such as probabilistic stragglers and decoherence dictate application-layer viability. Ultimately, this tutorial equips network engineers with the tools required to transition quantum networking from a bespoke physics experiment into a programmable, multi-tenant global infrastructure.


[76] 2604.01937

Architectural Implications of the UK Cyber Security and Resilience Bill

The UK Cyber Security and Resilience (CS&R) Bill represents the most significant reform of UK cyber legislation since the Network and Information Systems (NIS) Regulations 2018. While existing analysis has addressed the Bill's regulatory requirements, there is a critical gap in guidance on the architectural implications for organisations that must achieve and demonstrate compliance. This paper argues that the CS&R Bill's provisions (expanded scope to managed service providers (MSPs), data centres, and critical suppliers; mandatory 24/72-hour dual incident reporting; supply chain security duties; and Secretary of State powers of direction) collectively constitute an architectural forcing function that renders perimeter-centric and point-solution security postures structurally non-compliant. We present a systematic mapping of the Bill's key provisions to specific architectural requirements, demonstrate that Zero Trust Architecture (ZTA) provides the most coherent technical foundation for meeting these obligations, and propose a reference architecture and maturity-based adoption pathway for CISOs and security architects. The paper further addresses the cross-regulatory challenge facing UK financial services firms operating under simultaneous CS&R, DORA, and NIS2 obligations, and maps the architectural framework against the NCSC Cyber Assessment Framework v4.0. This work extends a companion practitioner guide to the Bill by translating regulatory analysis into actionable architectural strategy. Keywords: Cyber Security and Resilience Bill, Zero Trust Architecture, Security Architecture, Critical National Infrastructure, NIS Regulations, DORA, Supply Chain Security, NCSC CAF v4.0


[77] 2604.01959

Output Corridor Impulsive Control of First-order Continuous System with Non-local Attractivity Analysis

This paper addresses the design of an impulsive controller for a continuous scalar time-invariant linear plant that constitutes the simplest conceivable model of chemical kinetics. The model is ubiquitous in process control as well as pharmacometrics and readily generalizes to systems of Wiener structure. Given the impulsive nature of the feedback, the control problem formulation is particularly suited to discrete dosing applications in engineering and medicine, where both doses and inter-dose intervals are manipulated. Since the feedback controller acts at discrete time instants and employs both amplitude and frequency modulation, whereas the plant is continuous, the closed-loop system exhibits hybrid dynamics featuring complex nonlinear phenomena. The problem of confining the plant output to a predefined corridor of values is considered. The method at the heart of the proposed approach is to design a stable periodic solution, called a 1-cycle, whose one-dimensional orbit coincides with the predefined corridor. Conditions ensuring local and global attractivity of the 1-cycle are established. As a numerical illustration of the proposed approach, the problem of intravenous paracetamol dosing is considered.
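
The corridor idea can be made concrete with a small sketch. It assumes the plant is $\dot{x} = -a x$ between impulses and that each impulse adds a dose $d$ to the state instantaneously. For the periodic orbit to span exactly the corridor $[x_{lo}, x_{hi}]$, the dose must be $d = x_{hi} - x_{lo}$ and the inter-dose interval must satisfy $x_{hi} e^{-aT} = x_{lo}$. The sketch applies a fixed dose and interval in open loop, which is a simplification of the feedback controller in the paper; all names and numbers are hypothetical.

```python
import math

def one_cycle(a, x_lo, x_hi):
    """Dose and interval whose periodic orbit spans [x_lo, x_hi] for dx/dt = -a*x."""
    dose = x_hi - x_lo                    # jump from the lower to the upper bound
    interval = math.log(x_hi / x_lo) / a  # time to decay from x_hi back to x_lo
    return dose, interval

def simulate(a, dose, interval, x0, n_doses):
    """Return post-dose peaks and pre-dose troughs under fixed, open-loop dosing."""
    x = x0
    peaks, troughs = [], []
    for _ in range(n_doses):
        x += dose                         # impulsive jump
        peaks.append(x)
        x *= math.exp(-a * interval)      # exponential decay until the next dose
        troughs.append(x)
    return peaks, troughs

# Hypothetical elimination rate and corridor, in arbitrary units.
a, x_lo, x_hi = 0.3, 5.0, 20.0
dose, interval = one_cycle(a, x_lo, x_hi)
peaks, troughs = simulate(a, dose, interval, x0=0.0, n_doses=40)
```

Starting from an empty state, the troughs converge to `x_lo` and the peaks to `x_hi`, because the dose-to-dose map $x \mapsto (x + d)e^{-aT}$ is a contraction with fixed point $x_{lo}$.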


[78] 2604.01991

Integrated Identification of Collaborative Robots for Robot Assisted 3D Printing Processes

In recent years, the integration of additive manufacturing (AM) and industrial robotics has opened new perspectives for the production of complex components, particularly in the automotive sector. Robot-assisted additive manufacturing processes overcome the dimensional and kinematic limitations of traditional Cartesian systems, enabling non-planar deposition and greater geometric flexibility. However, the increasing dynamic complexity of robotic manipulators introduces challenges related to precision, control, and error prediction. This work proposes a model-based approach equipped with an integrated procedure for identifying the system's parameters, including those of the robot, the actuators, and the controllers. We show that the integrated modeling procedure makes it possible to obtain a reliable dynamic model even in the presence of the sensory and programming limitations typical of collaborative robots. The manipulator's dynamic model is identified through an integrated five-step methodology: it starts with geometric and inertial analysis, continues with the identification of friction and controller parameters, and ends with the identification of the remaining parameters. The proposed procedure intrinsically ensures the physical consistency of the identified parameters. The identification approach is validated on a real-world case study involving a 6-degree-of-freedom (DoF) collaborative robot used in a thermoplastic extrusion process. The very good match between the experimental results obtained from the actual robot and those given by the identified model shows the potential for enhanced precision, control, and error prediction in Robot Assisted 3D Printing Processes.


[79] 2604.02025

Systematic Analyses of Reinforcement Learning Controllers in Signalized Urban Corridors

In this work, we extend our systematic capacity region perspective to multi-junction traffic networks, focussing on the special case of an urban corridor network. In particular, we train and evaluate centralized, fully decentralized, and parameter-sharing decentralized RL controllers, and compare their capacity regions and ATTs with those of a classical baseline MaxPressure controller. Further, we show how the parameter-sharing controller may be generalised for deployment on a larger network than the one it was originally trained on. In this setting, we present initial findings suggesting that, even though the junctions are not formally coordinated, traffic may self-organise into "green waves".


[80] 2604.02043

Tracking the emergence of linguistic structure in self-supervised models learning from speech

Self-supervised speech models learn effective representations of spoken language, which have been shown to reflect various aspects of linguistic structure. But when does such structure emerge in model training? We study the encoding of a wide range of linguistic structures, across layers and intermediate checkpoints of six Wav2Vec2 and HuBERT models trained on spoken Dutch. We find that different levels of linguistic structure show notably distinct layerwise patterns as well as learning trajectories, which can partially be explained by differences in their degree of abstraction from the acoustic signal and the timescale at which information from the input is integrated. Moreover, we find that the level at which pre-training objectives are defined strongly affects both the layerwise organization and the learning trajectories of linguistic structures, with greater parallelism induced by higher-order prediction tasks (i.e. iteratively refined pseudo-labels).


[81] 2604.02069

Fixed-time-stable ODE Representation of Lasso

Lasso problems arise in many areas, including signal processing, machine learning, and control, and are closely connected to sparse coding mechanisms observed in neuroscience. A continuous-time ordinary differential equation (ODE) representation of the Lasso problem not only enables its solution on analog computers but also provides a framework for interpreting neurophysiological phenomena. This article proposes a fixed-time-stable ODE representation of the Lasso problem by first transforming it into a smooth nonnegative quadratic program (QP) and then designing a projection-free Newton-based fixed-time-stable ODE system for solving the corresponding Karush-Kuhn-Tucker (KKT) conditions. Moreover, the settling time of the ODE is independent of the problem data and can be arbitrarily prescribed. Numerical experiments verify that the trajectory reaches the optimal solution within the prescribed time.
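
For readers unfamiliar with the transformation step, the following is the standard variable-splitting reformulation of Lasso as a smooth nonnegative QP, together with its KKT conditions. It is given only as an illustration under the usual notation ($A$, $b$, $\lambda$, $u$, $v$, $z$ are assumed symbols); the abstract does not state which reformulation the article uses, so the article's QP may differ.

```latex
\begin{align}
  % Lasso in its usual nonsmooth form
  &\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1,
    \qquad \lambda > 0 \\
  % Split x into nonnegative parts so that the l1 term becomes linear
  &x = u - v, \quad u \ge 0, \; v \ge 0, \quad
    z = \begin{bmatrix} u \\ v \end{bmatrix} \\
  % Equivalent smooth nonnegative QP (the constant b^T b / 2 is dropped)
  &\min_{z \ge 0} \; \tfrac{1}{2} z^\top Q z + c^\top z, \quad
    Q = \begin{bmatrix} A^\top A & -A^\top A \\ -A^\top A & A^\top A \end{bmatrix}, \quad
    c = \lambda \mathbf{1} + \begin{bmatrix} -A^\top b \\ A^\top b \end{bmatrix} \\
  % KKT conditions of the nonnegative QP
  &z \ge 0, \qquad Qz + c \ge 0, \qquad z^\top (Qz + c) = 0
\end{align}
```

The last line is a complementarity system in $z$ alone, which is the kind of condition an ODE solver for the KKT system would drive to zero.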


[82] 2604.02102

Prosodic ABX: A Language-Agnostic Method for Measuring Prosodic Contrast in Speech Representations

Speech representations from self-supervised speech models (S3Ms) are known to be sensitive to phonemic contrasts, but their sensitivity to prosodic contrasts has not been directly measured. The ABX discrimination task has been used to measure phonemic contrast in S3M representations via minimal pairs. We introduce prosodic ABX, an extension of this framework to evaluate prosodic contrast with only a handful of examples and no explicit labels. We also build and release a dataset of English and Japanese minimal pairs and use it along with a Mandarin dataset to evaluate contrast in English stress, Japanese pitch accent, and Mandarin tone. Finally, we show that model and layer rankings are often preserved across several experimental conditions, making the method practical for low-resource settings.


[83] 2604.02132

Safe Control of Feedback-Interconnected Systems via Singular Perturbations

Control Barrier Functions (CBFs) have emerged as a powerful tool in the design of safety-critical controllers for nonlinear systems. In modern applications, complex systems often involve the feedback interconnection of subsystems evolving at different timescales, e.g., two parts from different physical domains (e.g., the electrical and mechanical parts of robotic systems) or a physical plant and an (optimization or control) algorithm. In these scenarios, safety constraints often involve only a portion of the overall system. Inspired by singular perturbations for stability analysis, we develop a formal procedure to lift a safety certificate designed on a reduced-order model to the overall feedback-interconnected system. Specifically, we show that under a sufficient timescale separation between slow and fast dynamics, a composite CBF can be designed to certify the forward invariance of the safe set for the interconnected system. As a result, the online safety filter only needs to be solved for the lower-dimensional, reduced-order model. We numerically test the proposed approach on: (i) a robotic arm with joint motor dynamics, and (ii) a physical plant driven by an optimization algorithm.
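
As background for the online safety filter mentioned in this abstract, the sketch below shows a generic CBF filter for a scalar input. It solves $\min_u (u - u_{nom})^2$ subject to $L_f h + L_g h\,u + \alpha h \ge 0$ in closed form. It does not implement the composite CBF or the singular-perturbation lifting from the paper; the single-integrator model, the barrier $h(x) = x_{max} - x$, and all parameter values are hypothetical.

```python
def cbf_filter(u_nom, lf_h, lg_h, h, alpha):
    """Closest input to u_nom satisfying lf_h + lg_h*u + alpha*h >= 0 (scalar u)."""
    slack = lf_h + alpha * h + lg_h * u_nom
    if slack >= 0.0 or lg_h == 0.0:
        # Nominal input is already safe (or the input cannot affect h at all).
        return u_nom
    # Project onto the constraint boundary: lf_h + lg_h*u + alpha*h = 0.
    return u_nom - slack / lg_h

def simulate(x0, x_max, u_nom, alpha, dt, steps):
    """Single integrator dx/dt = u with barrier h(x) = x_max - x, forward Euler."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        h = x_max - x
        # For dx/dt = u and h = x_max - x: L_f h = 0 and L_g h = -1.
        u = cbf_filter(u_nom, lf_h=0.0, lg_h=-1.0, h=h, alpha=alpha)
        x = x + dt * u
        xs.append(x)
    return xs

# The nominal controller pushes toward the boundary; the filter slows it down.
xs = simulate(x0=0.0, x_max=1.0, u_nom=2.0, alpha=1.0, dt=0.01, steps=1000)
```

In this example the state approaches `x_max` but never crosses it, since the filtered input is capped at `alpha * (x_max - x)`.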


[84] 2604.02177

Explicit Distributed MPC: Reducing Computation and Communication Load by Exploiting Facet Properties

Classical Distributed Model Predictive Control (DiMPC) requires multiple iterations to achieve convergence, leading to high computational and communication burdens. This work focuses on the improvement of an iteration-free distributed MPC methodology that minimizes computational effort and communication load. This methodology leverages multiparametric programming to compute explicit control laws offline for each subsystem, enabling real-time control without iterative data exchanges between subsystems. Extending our previous work on iteration-free DiMPC, here we introduce a FAcet-based Critical region Exploration Technique for iteration-free DiMPC (FACET-DiMPC) that further reduces computational complexity by leveraging facet properties to perform targeted critical region exploration. Simulation results demonstrate that the developed method achieves comparable control performance to centralized methods, while significantly reducing communication overhead and computation time. In particular, the proposed methodology reduces the average computation time by 98% compared to classic iterative DiMPC methods and by 42% compared to iteration-free DiMPC methods, making it well-suited for real-time control applications with tight latency and computation constraints.


[85] 2604.02199

A unified framework for synchronization optimization in directed multiplex networks

The multiplex network paradigm has been instrumental in revealing many unexpected phenomena and dynamical regimes in complex interacting systems. Nevertheless, most of the current research focuses on undirected multiplex structures, whereas real-world systems predominantly involve directed interactions. Here, we present an analytical framework for attaining optimal synchronization in directed multiplex networks composed of phase oscillators, considering both frustrated and non-frustrated regimes. A multiplex synchrony alignment function (MSAF) is introduced for this purpose, whose formulation integrates structural properties and dynamical characteristics of the individual directed layers. Using this function, we derive two classes of frequency distributions: one that yields perfect synchronization at a prescribed coupling strength in the presence of phase-lag, and another that optimizes synchronization over a broad range of coupling strengths. Numerical simulations on various directed duplex topologies demonstrate that both frequency sets substantially outperform conventional distributions. We also explore network optimization through a directed link rewiring strategy aimed at minimizing the MSAF, along with a swapping algorithm for optimally assigning fixed frequencies on both layers of a given directed duplex network. Examination of synchrony-optimized directed networks uncovers three notable correlations: a positive relationship between frequency and out-degree, a negative correlation between neighboring frequencies, and an anti-correlation between mirror node frequencies across directed layers.


[86] 2604.02256

A virtual-variable-length method for robust inverse kinematics of multi-segment continuum robots

This paper proposes a new, robust method to solve the inverse kinematics (IK) of multi-segment continuum manipulators. Conventional Jacobian-based solvers, especially when initialized from neutral/rest configurations, often exhibit slow convergence and, in certain conditions, may fail to converge (deadlock). The Virtual-Variable-Length (VVL) method proposed here introduces fictitious variations of the segments' lengths during the solution iteration, conferring virtual axial degrees of freedom that alleviate adverse behaviors and constraints, thus enabling or accelerating convergence. Comprehensive numerical experiments were conducted to compare the VVL method against benchmark Jacobian-based and Damped Least Squares IK solvers. Across more than $1.8\times 10^6$ randomized trials covering manipulators with two to seven segments, the proposed approach achieved up to a 20$\%$ increase in convergence success rate over the benchmark and a 40-80$\%$ reduction in average iteration count under equivalent accuracy thresholds ($10^{-4}-10^{-8}$). While deadlocks are not restricted to workspace boundaries and may occur at arbitrary poses, our empirical study identifies boundary-proximal configurations as a frequent cause of failed convergence, and the VVL method mitigates such occurrences over a statistical sample of test cases.


[87] 2301.05351

Data-driven Moving Horizon Estimation for Angular Velocity of Space Noncooperative Target in Eddy Current De-tumbling Mission

Angular velocity estimation is critical for eddy current de-tumbling of noncooperative space targets. However, the unknown model of the noncooperative target and the scarcity of observation data pose challenges for model-based estimation methods. In this paper, a Data-driven Moving Horizon Estimation method is proposed to estimate the angular velocity of the noncooperative target under de-tumbling torque. In this method, model-free state estimation of the angular velocity can be achieved using only a single historical trajectory that satisfies the rank condition. With local linear approximation, the Willems fundamental lemma is extended to nonlinear autonomous systems, and the rank condition for the historical trajectory data is deduced. Then, a data-driven moving horizon estimation algorithm based on the M-step Lyapunov function is designed, and the time-discount robust stability of the algorithm is given. To illustrate the effectiveness of the proposed algorithm, experiments and simulations are performed to estimate the angular velocity in eddy current de-tumbling with only de-tumbling torque measurements.


[88] 2410.06083

Characterizing simulation relations through control architectures in abstraction-based control

Abstraction-based control design is a promising approach for ensuring safety-critical control of complex cyber-physical systems. A key aspect of this methodology is the relation between the original and abstract systems, which ensures that the abstract controller can be transformed into a valid controller for the original system through a concretization procedure. In this paper, we provide a comprehensive and systematic framework that characterizes various simulation relations, through their associated concretization procedures. We introduce the concept of interfaced system, which universally enables a feedback refinement relation with the abstract system. This interfaced system encapsulates the specific characteristics of each simulation relation within an interface, enabling a plug-and-play control architecture. Our results demonstrate that the existence of a particular simulation relation between the concrete and abstract systems is equivalent to the implementability of a specific control architecture, which depends on the considered simulation relation. This allows us to introduce new types of relations, and to establish the advantages and drawbacks of different relations, which we exhibit through detailed examples.


[89] 2412.04727

Learning to Translate Noise for Robust Image Denoising

Deep learning-based image denoising techniques often struggle with poor generalization performance to out-of-distribution real-world noise. To tackle this challenge, we propose a novel noise translation framework that performs denoising on an image with translated noise rather than directly denoising an original noisy image. Specifically, our approach translates complex, unknown real-world noise into Gaussian noise, which is spatially uncorrelated and independent of image content, through a noise translation network. The translated noisy images are then processed by an image denoising network pretrained to effectively remove Gaussian noise, enabling robust and consistent denoising performance. We also design well-motivated loss functions and architectures for the noise translation network by leveraging the mathematical properties of Gaussian noise. Experimental results demonstrate that the proposed method substantially improves robustness and generalizability, outperforming state-of-the-art methods across diverse benchmarks. Visualized denoising results and the source code are available on our project page.


[90] 2501.15764

RIFT: Entropy-Optimised Fractional Wavelet Constellations for Ideal Time-Frequency Estimation

We introduce a new method for estimating the Ideal Time-Frequency Representation (ITFR) of complex nonstationary signals. The Reconstructive Ideal Fractional Transform (RIFT) computes a constellation of Continuous Fractional Wavelet Transforms (CFWTs) aligned to different local time-frequency curvatures. This constellation is combined into a single optimised time-frequency energy representation via a localised entropy-based sparsity measure, designed to resolve auto-terms and attenuate cross-terms. Finally, a positivity-constrained Lucy-Richardson deconvolution with total-variation regularisation is applied to estimate the ITFR, achieving auto-term resolution comparable to that of the Wigner-Ville Distribution (WVD), yielding the high-resolution RIFT representation. The required Cohen's class convolutional kernels are fully derived in the paper for the chosen CFWT constellations. Additionally, the optimisation yields an Instantaneous Phase Direction (IPD) field, which allows the localised curvature in speech or music extracts to be visualised and utilised within a Kalman tracking scheme, enabling the extraction of signal component trajectories and the construction of the Spline-RIFT variant. Evaluation on synthetic and real-world signals demonstrates the algorithm's ability to effectively suppress cross-terms and achieve superior time-frequency precision relative to competing methods. This advance holds significant potential for a wide range of applications requiring high-resolution cross-term-free time-frequency analysis.


[91] 2507.14194

Boosted Enhanced Quantile Regression Neural Networks with Spatiotemporal Permutation Entropy for Complex System Prognostics

This paper presents a novel framework for pattern prediction and system prognostics centered on Spatiotemporal Permutation Entropy analysis integrated with Boosted Enhanced Quantile Regression Neural Networks (BEQRNNs). We address the challenge of understanding complex dynamical patterns in multidimensional systems through an approach that combines entropy-based complexity measures with advanced neural architectures. The system leverages dual computational stages: first implementing spatiotemporal entropy extraction optimized for multiscale temporal and spatial data streams, followed by an integrated BEQRNN layer that enables probabilistic pattern prediction with uncertainty quantification. This architecture achieves 81.17% accuracy in spatiotemporal pattern classification with prediction horizons up to 200 time steps and maintains robust performance across diverse regimes. Field testing across chaotic attractors, reaction-diffusion systems, and industrial datasets shows a 79% increase in critical transition detection accuracy and 81.22% improvement in long-term prediction reliability. The framework's effectiveness in processing complex, multimodal entropy features demonstrates significant potential for real-time prognostic applications.
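
As a point of reference for the entropy-extraction stage, the sketch below computes ordinary (purely temporal) permutation entropy in the sense of Bandt and Pompe for a one-dimensional series. The spatiotemporal variant used in the paper is not specified in the abstract and is not reproduced here; the function name, parameters, and example series are illustrative assumptions.

```python
import math
from collections import Counter

def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Shannon entropy of ordinal patterns of length `order` in a 1-D series."""
    n = len(series) - (order - 1) * delay
    if n <= 0:
        raise ValueError("series too short for the given order and delay")
    counts = Counter()
    for i in range(n):
        window = [series[i + j * delay] for j in range(order)]
        # Ordinal pattern: indices of the window sorted by value (ties keep order).
        pattern = tuple(sorted(range(order), key=lambda j: window[j]))
        counts[pattern] += 1
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    if normalize:
        h /= math.log(math.factorial(order))  # scale to the range [0, 1]
    return h

pe_monotone = permutation_entropy(list(range(20)), order=3)
pe_example = permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=2)
```

A monotone series contains a single ordinal pattern and therefore has zero entropy, while the short example series mixes four rising and two falling pairs and gives a normalised value close to 0.918.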


[92] 2508.02441

Computationally efficient Gauss-Newton reinforcement learning for model predictive control

Model predictive control (MPC) is widely used in process control due to its interpretability and ability to handle constraints. As a parametric policy in reinforcement learning (RL), MPC offers strong initial performance and low data requirements compared to black-box policies like neural networks. However, most RL methods rely on first-order updates, which scale well to large parameter spaces but converge at most linearly, making them inefficient when each policy update requires solving an optimal control problem, as is the case with MPC. While MPC policies typically have few parameters and are thus amenable to second-order approaches, existing second-order methods demand second-order policy derivatives, which can be computationally intractable. This work introduces a Gauss-Newton approximation of the deterministic policy Hessian that eliminates the need for second-order policy derivatives, enabling superlinear convergence with minimal computational overhead. To further improve robustness, we propose a momentum-based Hessian averaging scheme for stable training under noisy estimates, coupled with an adaptive trust region. We demonstrate the effectiveness of the approach on a nonlinear continuously stirred tank reactor (CSTR), showing faster convergence and improved data efficiency over state-of-the-art first-order methods and deep RL approaches.


[93] 2509.09812

EDMD-Based Robust Observer Synthesis for Nonlinear Systems

This paper presents a data-driven approach for designing state observers for continuous-time nonlinear systems, where an extended dynamic mode decomposition (EDMD) procedure is used to identify an approximate linear lifted model. Since such a model on a finite-dimensional space spanned by the dictionary functions has an inevitable mismatch, we first establish, based on our theory of reproducing kernel Hilbert space with a linear--radial kernel, that the nonlinear error magnitude in the approximate linear model is sectorially bounded by the lifted state. The sector bound comprises a deterministic part due to the finite dictionary and a stochastic part due to the random data samples, and the observer design needs to account for both of these errors in a robust formulation. Hence, the observer synthesis is performed using linear matrix inequalities (LMIs), specified by the desired exponential decay rate of the observation error (when the system is asymptotically stable) or the L2 gain from the modeling error to the observation error. Numerical studies demonstrate the effectiveness and flexibility of the proposed method. As such, this work entails an explicit elementary use of linear systems theory for nonlinear state observation in a Koopman operator-theoretic framework.


[94] 2510.13714

DeDelayed: Deleting Remote Inference Delay via On-Device Correction

Video comprises the vast majority of bits that are generated daily, and is the primary signal driving current innovations in robotics, remote sensing, and wearable technology. Yet, the most powerful video understanding models are too expensive for the resource-constrained platforms used in these applications. One approach is to offload inference to the cloud; this gives access to GPUs capable of processing high-resolution videos in real time. But even with reliable, high-bandwidth communication channels, the combined latency of video encoding, model inference, and round-trip communication prohibits use for certain real-time applications. The alternative is to use fully local inference; but this places extreme constraints on computational and power costs, requiring smaller models and lower resolution, leading to degraded accuracy. To address these challenges, we propose Dedelayed, a real-time inference system that divides computation between a remote model operating on delayed video frames and a local model with access to the current frame. The remote model is trained to make predictions on anticipated future frames, which the local model incorporates into its prediction for the current frame. The local and remote models are jointly optimized with an autoencoder that limits the transmission bitrate required by the available downlink communication channel. We evaluate Dedelayed on the task of real-time streaming video segmentation using the BDD100k driving dataset. For a round-trip delay of 100 ms, Dedelayed improves performance by 6.4 mIoU compared to fully local inference and 9.8 mIoU compared to remote inference -- an improvement equivalent to using a model ten times larger. We release our training code, pretrained models, and Python library at this https URL .


[95] 2511.06131

A Multi-Criterion Approach to Smart EV Charging with CO2 Emissions and Cost Minimization

We study carbon-aware smart charging in a fossil-dominated grid by coupling a simplified hydro-thermal-renewable dispatch model with a tractable linear charging scheduler. The case study is informed by Vietnam's regional data. Thermal units remain dominant, renewables are time-varying, and hydropower is modeled through a single reservoir budget. From the day-ahead dispatch we derive hourly carbon intensity and a corresponding carbon-cost signal; these are combined with a local time-of-use tariff in the EV charging problem. The resulting weighted-sum linear program is multi-objective: by sweeping the trade-off coefficient, we recover the supported Pareto frontier between electricity cost and charging-associated emissions. In a 300-EV public-charging scenario with a 0.8 MW feeder cap, the proposed carbon-aware scheduler preserves the 19.8% bill reduction of a cost-only optimizer while lowering charging-associated emissions by 7.3%; a more carbon-focused tuning still remains 12.6% cheaper and 9.3% cleaner than a FIFO baseline. A hydro-sensitivity study shows that changing the reservoir budget by +/- 20% moves the mean grid carbon intensity from 360 to 466 g/kWh, yet the carbon-aware scheduler remains consistently cheaper and cleaner than FIFO. The dispatch and charging LPs solve in a few milliseconds on a standard desktop computer, showing that the framework is lightweight enough for repeated day-ahead studies.
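
The weighted-sum trade-off can be illustrated with a deliberately reduced sketch: one aggregate charging load, a per-hour power cap, and a fixed energy requirement. Under these assumptions the linear program $\min \sum_t s_t p_t$ with $0 \le p_t \le P$ and $\sum_t p_t = E$ is solved exactly by filling the hours with the lowest signal first, so no LP solver is needed. The paper's scheduler also handles many EVs under a shared feeder cap, which this sketch omits; the prices, carbon signal, and function names are hypothetical.

```python
def schedule(price, carbon, w, energy, cap):
    """Fill the hours with the lowest blended signal (1-w)*price + w*carbon first."""
    signal = [(1.0 - w) * p + w * c for p, c in zip(price, carbon)]
    plan = [0.0] * len(price)
    remaining = energy
    for t in sorted(range(len(price)), key=lambda t: signal[t]):
        plan[t] = min(cap, remaining)
        remaining -= plan[t]
        if remaining <= 0.0:
            break
    if remaining > 1e-9:
        raise ValueError("energy requirement exceeds total capacity")
    return plan

def total(signal, plan):
    return sum(s * p for s, p in zip(signal, plan))

# Hypothetical hourly tariff and carbon-cost signal over four hours.
price = [0.10, 0.10, 0.20, 0.30]
carbon = [0.30, 0.10, 0.05, 0.20]

# Sweep the trade-off coefficient to trace cost versus emissions.
frontier = {}
for w in (0.0, 0.5, 1.0):
    plan = schedule(price, carbon, w, energy=2.0, cap=1.0)
    frontier[w] = (total(price, plan), total(carbon, plan), plan)
```

With `w = 0` the plan charges in the two cheapest hours, and with `w = 1` it shifts to the two cleanest hours, which costs more but lowers the carbon total.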


[96] 2512.01023

Approximating Analytically-Intractable Likelihood Densities with Deterministic Arithmetic for Optimal Particle Filtering

Particle filtering algorithms have enabled practical solutions to problems in autonomous robotics (self-driving cars, UAVs, warehouse robots), target tracking, and econometrics, with further applications in speech processing and medicine (patient monitoring). Yet, their inherent weakness at representing the likelihood of the observation (which often leads to particle degeneracy) remains unaddressed for real-time resource-constrained systems. Improvements such as the optimal proposal and auxiliary particle filter mitigate this issue under specific circumstances and with increased computational cost. This work presents a new particle filtering method and its implementation, which enables tunably-approximative representation of arbitrary likelihood densities as program transformations of parametric distributions. Our method leverages a recent computing platform that can perform deterministic computation on probability distribution representations (UxHw) without relying on stochastic methods. For non-Gaussian non-linear systems and with an optimal-auxiliary particle filter, we benchmark the likelihood evaluation error and speed for a total of 294840 evaluation points. For such models, the results show that the UxHw method leads to as much as 37.7x speedup compared to the Monte Carlo alternative. For narrow uniform measurement uncertainty, the particle filter falsely assigns zero likelihood as much as 81.89% of the time whereas UxHw achieves 1.52% false-zero rate. The UxHw approach achieves filter RMSE improvement of as much as 18.9% (average 3.3%) over the Monte Carlo alternative.


[97] 2512.11556

ACCOR: Attention-Enhanced Complex-Valued Contrastive Learning for Occluded Object Classification Using mmWave Radar IQ Signals

Millimeter-wave (mmWave) radar provides robust sensing under adverse conditions and can penetrate thin materials for non-visual perception in industrial and robotic settings. Recent work with MIMO mmWave radar has demonstrated its ability to penetrate cardboard packaging for occluded object classification. However, existing models leave room for improvement and extensions across different sensing frequencies. Building on recent work with MIMO radar for occluded object classification, we propose ACCOR, an attention-enhanced complex-valued contrastive learning approach for radar, enabling robust occluded object classification. ACCOR processes complex-valued IQ radar signals via a complex-valued CNN backbone, a multi-head attention layer and a hybrid loss. The hybrid loss combines a weighted cross-entropy term with a supervised contrastive term. We extend an existing 64 GHz dataset with a new 67 GHz subset and evaluate performance across both bands. ACCOR achieves 96.60 % accuracy at 64 GHz and 93.59 % at 67 GHz on 10 objects, surpassing prior radar-specific and adapted image models. Results demonstrate the benefits of integrating complex-valued deep learning, attention, and contrastive learning for mmWave radar-based occluded object classification.


[98] 2512.17466

Linear Attention for Joint Power Optimization and User-Centric Clustering in Cell-Free Networks

Optimal AP clustering and power allocation are critical in user-centric cell-free massive MIMO systems. Existing deep learning models lack flexibility to handle dynamic network configurations. Furthermore, many approaches overlook pilot contamination and suffer from high computational complexity. In this paper, we propose a lightweight transformer model that overcomes these limitations by jointly predicting AP clusters and powers solely from spatial coordinates of user devices and APs. Our model is architecture-agnostic to user load, handles both clustering and power allocation without channel estimation overhead, and eliminates pilot contamination by assigning users to APs within a pilot reuse constraint. We also incorporate a customized linear attention mechanism to capture user-AP interactions efficiently and enable linear scalability with respect to the number of users. Numerical results confirm the model's effectiveness in maximizing the minimum spectral efficiency and providing near-optimal performance while ensuring adaptability and scalability in dynamic scenarios.


[99] 2512.17585

SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation in Melanoma Diagnosis

This work introduces SkinGenBench, a systematic biomedical imaging benchmark that investigates how preprocessing complexity interacts with generative model choice for synthetic dermoscopic image augmentation and downstream melanoma diagnosis. Using a curated dataset of $14,116$ dermoscopic images from HAM10000 and MILK10K across five lesion classes, we evaluate the two representative generative paradigms: StyleGAN2-ADA and Denoising Diffusion Probabilistic Models (DDPMs) under basic geometric augmentation and advanced artifact removal pipelines. Synthetic melanoma images are assessed using established perceptual and distributional metrics (FID, KID, IS), feature space analysis, and their impact on diagnostic performance across five downstream classifiers. Experimental results demonstrate that generative architecture choice has a stronger influence on both image fidelity and diagnostic utility than preprocessing complexity. StyleGAN2-ADA consistently produced synthetic images more closely aligned with real data distributions, achieving the lowest FID ($\approx 65.5$) and KID ($\approx 0.05$), while diffusion models generated higher variance samples at the cost of reduced perceptual fidelity and class anchoring. Advanced artifact removal yielded only marginal improvements in generative metrics and provided limited downstream diagnostic gains, suggesting possible suppression of clinically relevant texture cues. In contrast, synthetic data augmentation substantially improved melanoma detection with $8$-$15$\% absolute gains in melanoma F1-score, and ViT-B/16 achieving F1 $\approx 0.88$ and ROC-AUC $\approx 0.98$, representing an improvement of approximately $14\%$ over non-augmented baselines. Our code can be found at this https URL


[100] 2601.03282

New Formulations and Discretization Insights for the Electric Autonomous Dial-a-Ride Problem

The Electric Autonomous Dial-a-Ride Problem (E-ADARP) involves routing and scheduling electric autonomous vehicles under battery capacity and partial recharging constraints, aiming to minimize total travel cost and excess ride time. In practice, operational data for time and state-of-charge (SoC) are often available only at a coarse granularity. This raises a natural question: can discretization be exploited to improve computational performance by enabling alternative formulation structures? To investigate this question, we develop three formulations reflecting different levels of discretization. The first is an improved event-based formulation (IEBF) with arc-flow SoC variables for the continuous-parameter E-ADARP, serving as a strengthened baseline. The latter two are fragment-based formulations designed for discretized inputs. The second is a time-space fragment-based formulation with continuous SoC arc-flow variables (TSFFCS), which discretizes time while keeping SoC continuous. The third is a battery-time-space fragment-based formulation (BTSFF), which discretizes both time and SoC. Here, an event denotes a tuple consisting of a location and a set of onboard customers, while a fragment denotes a partial path. Computational results show that IEBF improves upon the existing event-based formulation for the original E-ADARP. Under discretized settings, TSFFCS tends to outperform IEBF, particularly when recharging is frequent and time discretization is relatively coarse, indicating that time discretization can improve computational performance across a wide range of settings. In contrast, BTSFF rarely outperforms TSFFCS unless the number of reachable SoC levels is limited, suggesting that explicit SoC discretization is beneficial only in relatively restricted settings.


[101] 2601.19462

Physical Human-Robot Interaction: A Critical Review of Safety Constraints

This paper aims to provide a clear and rigorous understanding of commonly recognized safety constraints in physical human-robot interaction, particularly regarding ISO/TS 15066. We investigate the derivation of these constraints, critically examine the underlying assumptions, and evaluate their practical implications for system-level safety and performance in industrially relevant scenarios. Key design parameters within safety-critical control architectures are identified, and numerical examples are provided to quantify performance degradation arising from typical approximations and design decisions in manufacturing environments. Within this analysis, the fundamental role of energy in safety assessment is emphasized, providing focused insights into energy-based safety methodologies for collaborative industrial robot systems.


[102] 2602.23791

FluoCLIP: Stain-Aware Focus Quality Assessment in Fluorescence Microscopy

Accurate focus quality assessment (FQA) in fluorescence microscopy is challenging due to stain-dependent optical variations that induce heterogeneous focus behavior across images. Existing methods, however, treat focus quality as a stain-agnostic problem, assuming a shared global ordering. We formulate stain-aware FQA for fluorescence microscopy, showing that focus-rank relationships vary substantially across stains due to stain-dependent imaging characteristics and invalidate this assumption. To support this formulation, we introduce FluoMix, the first dataset for stain-aware FQA spanning multiple tissues, fluorescent stains, and focus levels. We further propose FluoCLIP, a two-stage vision-language framework that grounds stain semantics and enables stain-conditioned ordinal reasoning for focus prediction, effectively decoupling stain representation from ordinal structure. By explicitly modeling stain-dependent focus behavior, FluoCLIP consistently outperforms both conventional FQA methods and recent vision-language baselines, demonstrating strong generalization across diverse fluorescence microscopy conditions. Code and dataset are publicly available at this https URL.


[103] 2603.01316

Inter-Speaker Relative Cues for Two-Stage Text-Guided Target Speech Extraction

This paper investigates the use of relative cues for text-based target speech extraction (TSE). We first provide a theoretical justification for relative cues from the perspectives of human perception and label quantization, showing that relative cues preserve fine-grained distinctions that are often lost in absolute categorical representations for continuous-valued attributes. Building on this analysis, we propose a two-stage TSE framework in which a speech separation model first generates candidate sources, followed by a text-guided classifier that selects the target speaker based on embedding similarity. Within this framework, we train two separate classification models to evaluate the advantages of relative cues over independent cues in case of continuous-valued attributes, considering both classification accuracy and TSE performance. Experimental results demonstrate that (i) relative cues achieve higher overall classification accuracy and improved TSE performance compared with independent cues; (ii) the proposed two-stage framework substantially outperforms single-stage text-conditioned extraction methods on both signal-level and objective perceptual metrics; and (iii) several relative cues, including language, loudness, distance, temporal order, speaking duration, random cues, and all cues, can even surpass the performance of an enrollment-audio-based TSE system. Further analysis reveals notable differences in discriminative power across cue types, providing insights into the effectiveness of different relative cues for TSE.


[104] 2603.16880

NeuroNarrator: A Generalist EEG-to-Text Foundation Model for Clinical Interpretation via Spectro-Spatial Grounding and Temporal State-Space Reasoning

Electroencephalography (EEG) provides a non-invasive window into neural dynamics at high temporal resolution and plays a pivotal role in clinical neuroscience research. Despite this potential, prevailing computational approaches to EEG analysis remain largely confined to task-specific classification objectives or coarse-grained pattern recognition, offering limited support for clinically meaningful interpretation. To address these limitations, we introduce NeuroNarrator, the first generalist EEG-to-text foundation model designed to translate electrophysiological segments into precise clinical narratives. A cornerstone of this framework is the curation of NeuroCorpus-160K, the first harmonized large-scale resource pairing over 160,000 EEG segments with structured, clinically grounded natural-language descriptions. Our architecture first aligns temporal EEG waveforms with spatial topographic maps via a rigorous contrastive objective, establishing spectro-spatially grounded representations. Building on this grounding, we condition a Large Language Model through a state-space-inspired formulation that integrates historical temporal and spectral context to support coherent clinical narrative generation. This approach establishes a principled bridge between continuous signal dynamics and discrete clinical language, enabling interpretable narrative generation that facilitates expert interpretation and supports clinical reporting workflows. Extensive evaluations across diverse benchmarks and zero-shot transfer tasks highlight NeuroNarrator's capacity to integrate temporal, spectral, and spatial dynamics, positioning it as a foundational framework for time-frequency-aware, open-ended clinical interpretation of electrophysiological data.


[105] 2604.00277

Hybrid Energy-Based Models for Physical AI: Provably Stable Identification of Port-Hamiltonian Dynamics

Energy-based models (EBMs) implement inference as gradient descent on a learned Lyapunov function, yielding interpretable, structure-preserving alternatives to black-box neural ODEs and aligning naturally with physical AI. Yet their use in system identification remains limited, and existing architectures lack formal stability guarantees that globally preclude unstable modes. We address this gap by introducing an EBM framework for system identification with stable, dissipative, absorbing invariant dynamics. Unlike classical global Lyapunov stability, absorbing invariance expands the class of stability-preserving architectures, enabling more flexible and expressive EBMs. We extend EBM theory to nonsmooth activations by establishing negative energy dissipation via Clarke derivatives and deriving new conditions for radial unboundedness, exposing a stability-expressivity tradeoff in standard EBMs. To overcome this, we introduce a hybrid architecture with a dynamical visible layer and static hidden layers, prove absorbing invariance under mild assumptions, and show that these guarantees extend to port-Hamiltonian EBMs. Experiments on metric-deformed multi-well and ring systems validate the approach, showcasing how our hybrid EBM architecture combines expressivity with sound and provable safety guarantees by design.


[106] 2407.15828

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling

Spoken dialogue is essential for human-AI interactions, providing expressive capabilities beyond text. Developing effective spoken dialogue systems (SDSs) requires large-scale, high-quality, and diverse spoken dialogue corpora. However, existing datasets are often limited in size, spontaneity, or linguistic coherence. To address these limitations, we introduce J-CHAT, a 76,000-hour open-source Japanese spoken dialogue corpus. Constructed using an automated, language-independent methodology, J-CHAT ensures acoustic cleanliness, diversity, and natural spontaneity. The corpus is built from YouTube and podcast data, with extensive filtering and denoising to enhance quality. Experimental results with generative spoken dialogue language models trained on J-CHAT demonstrate its effectiveness for SDS development. By providing a robust foundation for training advanced dialogue models, we anticipate that J-CHAT will drive progress in human-AI dialogue research and applications.


[107] 2412.13748

On the Performance of Physical Layer Security for Continuous-Aperture Array (CAPA) Systems

A continuous-aperture array (CAPA)-based secure transmission framework is proposed to enhance physical layer security. Continuous current distributions, or beamformers, are designed to maximize the secrecy transmission rate under a power constraint and to minimize the required transmission power for achieving a specific target secrecy rate. On this basis, the fundamental secrecy performance limits achieved by CAPAs are analyzed by deriving closed-form expressions for the maximum secrecy rate (MSR) and minimum required power (MRP), along with the corresponding optimal current distributions. To provide further insights, asymptotic analyses are performed for the MSR and MRP, which reveal that i) for the MSR, the optimal current distribution simplifies to maximal ratio transmission (MRT) beamforming in the low-SNR regime and to zero-forcing (ZF) beamforming in the high-SNR regime; ii) for the MRP, the optimal current distribution simplifies to ZF beamforming in the high-SNR regime. The derived results are specialized to the typical array structures, e.g., planar CAPAs and planar spatially discrete arrays (SPDAs). The rate and power scaling laws are further analyzed by assuming an infinitely large CAPA. Numerical results demonstrate that: i) the proposed secure continuous beamforming design outperforms MRT and ZF beamforming in terms of both achievable secrecy rate and power efficiency; ii) CAPAs achieve superior secrecy performance compared to conventional SPDAs.


[108] 2504.04665

A Simultaneous Approach for Training Neural Differential-Algebraic Systems of Equations

Scientific machine learning is an emerging field that broadly describes the combination of scientific computing and machine learning to address challenges in science and engineering. Within the context of differential equations, this has produced highly influential methods, such as neural ordinary differential equations (NODEs). Recent works extend this line of research to consider neural differential-algebraic systems of equations (DAEs), where some unknown relationships within the DAE are learned from data. Training neural DAEs, similarly to neural ODEs, is computationally expensive, as it requires the solution of a DAE for every parameter update. Further, the rigorous consideration of algebraic constraints is difficult within common deep learning training algorithms such as stochastic gradient descent. In this work, we apply the simultaneous approach to neural DAE problems, resulting in a fully discretized nonlinear optimization problem, which is solved to local optimality and simultaneously obtains the neural network parameters and the solution to the corresponding DAE. We extend recent work demonstrating the simultaneous approach for neural ODEs, by presenting a general framework to solve neural DAEs, with explicit consideration of hybrid models, where some components of the DAE are known, e.g. physics-informed constraints. Furthermore, we present a general strategy for improving the performance and convergence of the nonlinear programming solver, based on solving an auxiliary problem for initialization and approximating Hessian terms. We achieve promising results in terms of accuracy, model generalizability and computational cost, across different problem settings such as sparse data, unobserved states and multiple trajectories. Lastly, we provide several promising future directions to improve the scalability and robustness of our approach.


[109] 2504.10739

HippoMM: Hippocampal-inspired Multimodal Memory for Long Audiovisual Event Understanding

Comprehending extended audiovisual experiences remains challenging for computational systems, particularly temporal integration and cross-modal associations fundamental to human episodic memory. We introduce HippoMM, a computational cognitive architecture that maps hippocampal mechanisms to solve these challenges. Rather than relying on scaling or architectural sophistication, HippoMM implements three integrated components: (i) Episodic Segmentation detects audiovisual input changes to split videos into discrete episodes, mirroring dentate gyrus pattern separation; (ii) Memory Consolidation compresses episodes into summaries with key features preserved, analogous to hippocampal memory formation; and (iii) Hierarchical Memory Retrieval first searches semantic summaries, then escalates via temporal window expansion around seed segments for cross-modal queries, mimicking CA3 pattern completion. These components jointly create an integrated system exceeding the sum of its parts. On our HippoVlog benchmark testing associative memory, HippoMM achieves state-of-the-art 78.2% accuracy while operating 5x faster than retrieval-augmented baselines. Our results demonstrate that cognitive architectures provide blueprints for next-generation multimodal understanding. The code and benchmark dataset are publicly available at this https URL.


[110] 2507.09681

Seamless High-Resolution Terrain Reconstruction: A Prior-Based Vision Transformer Approach

High-resolution elevation data is essential for hydrological modeling, hazard assessment, and environmental monitoring; however, globally consistent, fine-scale Digital Elevation Models (DEMs) remain unavailable. Very high-resolution single-view imagery enables the extraction of topographic information at the pixel level, allowing the reconstruction of fine terrain details over large spatial extents. In this paper, we present single-view-based DEM reconstruction shown to support practical analysis in GIS environments across multiple sub-national jurisdictions. Specifically, we produce high-resolution DEMs for large-scale basins, representing a substantial improvement over the 30 m resolution of globally available Shuttle Radar Topography Mission (SRTM) data. The DEMs are generated using a prior-based monocular depth foundation (MDE) model, extended in this work to the remote sensing height domain for high-resolution, globally consistent elevation reconstruction. We fine-tune the model by integrating low-resolution SRTM data as a global prior with high-resolution RGB imagery from the National Agriculture Imagery Program (NAIP), producing DEMs with near LiDAR-level accuracy. Our method achieves a 100x resolution enhancement (from 30 m to 30 cm), exceeding existing super-resolution approaches by an order of magnitude. Across two diverse landscapes, the model generalizes robustly, resolving fine-scale terrain features with a mean absolute error of less than 5 m relative to LiDAR and improving upon SRTM by up to 18 %. Hydrological analyses at both catchment and hillslope scales confirm the method's utility for hazard assessment and environmental monitoring, demonstrating improved streamflow representation and catchment delineation. Finally, we demonstrate the scalability of the framework by applying it across large geographic regions.


[111] 2510.00463

On the Adversarial Robustness of Learning-based Conformal Novelty Detection

This paper studies the adversarial robustness of conformal novelty detection. In particular, we focus on two powerful learning-based frameworks that come with finite-sample false discovery rate (FDR) control: one is AdaDetect (by Marandon et al., 2024) that is based on the positive-unlabeled classifier, and the other is a one-class classifier-based approach (by Bates et al., 2023). While they provide rigorous statistical guarantees under benign conditions, their behavior under adversarial perturbations remains underexplored. We first formulate an oracle attack setup, under the AdaDetect formulation, that quantifies the worst-case degradation of FDR, deriving an upper bound that characterizes the statistical cost of attacks. This idealized formulation directly motivates a practical and effective attack scheme that only requires query access to the output labels of both frameworks. Coupling these formulations with two popular and complementary black-box adversarial algorithms, we systematically evaluate the vulnerability of both frameworks on synthetic and real-world datasets. Our results show that adversarial perturbations can significantly increase the FDR while maintaining high detection power, exposing fundamental limitations of current error-controlled novelty detection methods and motivating the development of more robust alternatives.


[112] 2510.11491

Constraint-Aware Reinforcement Learning via Adaptive Action Scaling

Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.


[113] 2511.19336

Nonlinear MPC for Feedback-Interconnected Systems: a Suboptimal and Reduced-Order Model Approach

In this paper, we propose a suboptimal and reduced-order Model Predictive Control (MPC) architecture for discrete-time feedback-interconnected systems. The numerical MPC solver: (i) acts suboptimally, performing only a finite number of optimization iterations at each sampling instant, and (ii) relies only on a reduced-order model that neglects part of the system dynamics, either due to unmodeled effects or the presence of a low-level compensator. We prove that the closed-loop system resulting from the interconnection of the suboptimal and reduced-order MPC optimizer with the full-order plant has a globally exponentially stable equilibrium point. Specifically, we employ timescale separation arguments to characterize the interaction between the components of the feedback-interconnected system. The analysis relies on an appropriately tuned timescale parameter accounting for how fast the system dynamics are sampled. The theoretical results are validated through numerical simulations on a mechatronic system consisting of a pendulum actuated by a DC motor.


[114] 2512.14870

HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering

Video Large Language Models (Video-LLMs) are improving rapidly, yet current Video Question Answering (VideoQA) benchmarks often admit single-cue shortcuts, under-testing reasoning that must integrate evidence across time. We introduce HERBench, a benchmark designed to make multi-evidence integration unavoidable: each question requires at least three non-overlapping cues drawn from distinct video segments. HERBench contains 26,806 five-way multiple-choice questions across 12 compositional tasks. To make evidential demand measurable, we introduce the Minimum Required Frame-Set (MRFS), the smallest number of frames a model must fuse to answer correctly, and show that HERBench imposes higher evidential demand than prior benchmarks. Evaluating 13 state-of-the-art Video-LLMs yields only 31-42% accuracy, only modestly above the 20% random-guess baseline. We disentangle this failure into two critical bottlenecks: (1) a retrieval deficit, where frame selectors overlook key evidence, and (2) a fusion deficit, where models fail to integrate information even when all necessary evidence is provided. HERBench thus provides a principled benchmark for studying robust multi-evidence video understanding.


[115] 2601.17641

RPNT: Robust Pre-trained Neural Transformer -- A Pathway for Generalized Motor Decoding

Brain motor decoding aims to interpret and translate neural activity into behaviors. Decoding models that generalize across variations, such as recordings from different brain sites, experimental sessions, behavior types, and subjects, will be critical for real-world applications. Current decoding models only partially address these challenges. In this work, we develop a pretrained neural transformer model, RPNT - Robust Pretrained Neural Transformer, designed to achieve robust generalization through pretraining, which in turn enables effective finetuning for downstream motor decoding tasks. We arrived at the proposed RPNT architecture by systematically investigating which transformer building blocks could be suitable for neural spike activity modeling, since components from models developed for other modalities, such as text and images, do not transfer directly to neural data. The final RPNT architecture incorporates three unique enabling components: 1) Multidimensional rotary positional embedding to aggregate experimental metadata such as site coordinates, session IDs and behavior types; 2) Context-based attention mechanism via convolution kernels operating on global attention to learn local temporal structures for handling non-stationarity of neural population activity; 3) Robust self-supervised learning objective with stochastic causal masking strategies and contrastive representations. We pretrained two versions of RPNT on distinct datasets that present significant generalization challenges: a) Multi-session, multi-task, and multi-subject microelectrode benchmark; b) Multi-site recordings using high-density Neuropixel 1.0 probes from many cortical locations. After pretraining, we evaluated RPNT generalization on cross-session, cross-type, cross-subject, and cross-site downstream behavior decoding tasks. Our RPNT consistently outperforms the existing decoding models on these tasks.


[116] 2601.20666

Learning Contextual Runtime Monitors for Safe AI-Based Autonomy

We introduce a novel framework for learning context-aware runtime monitors for AI-based control ensembles. Machine-learning (ML) controllers are increasingly deployed in (autonomous) cyber-physical systems because of their ability to solve complex decision-making tasks. However, their accuracy can degrade sharply in unfamiliar environments, creating significant safety concerns. Traditional ensemble methods aim to improve robustness by averaging or voting across multiple controllers, yet this often dilutes the specialized strengths that individual controllers exhibit in different operating contexts. We argue that, rather than blending controller outputs, a monitoring framework should identify and exploit these contextual strengths. In this paper, we reformulate the design of safe AI-based control ensembles as a contextual monitoring problem. A monitor continuously observes the system's context and selects the controller best suited to the current conditions. To achieve this, we cast monitor learning as a contextual learning task and draw on techniques from contextual multi-armed bandits. Our approach comes with two key benefits: (1) theoretical safety guarantees during controller selection, and (2) improved utilization of controller diversity. We validate our framework in two simulated autonomous driving scenarios, demonstrating significant improvements in both safety and performance compared to non-contextual baselines.
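
The idea of a monitor that observes the context and selects the controller best suited to it can be sketched with a very small tabular stand-in. The code below is not the paper's algorithm and carries none of its safety guarantees: it assumes discrete context labels, a scalar safety score observed after each run, and a purely greedy rule that tries each controller once per context and then picks the highest running mean. The class name `ContextualMonitor`, its methods, and the example contexts are all hypothetical.

```python
from collections import defaultdict


class ContextualMonitor:
    """Greedy tabular selector over (context, controller) pairs.

    Illustrative only: a real contextual-bandit monitor would add
    principled exploration and confidence bounds.
    """

    def __init__(self, controllers):
        self.controllers = list(controllers)
        self.count = defaultdict(int)    # observations per (context, controller)
        self.mean = defaultdict(float)   # running mean safety score

    def select(self, context):
        # Try every controller at least once in each context.
        for c in self.controllers:
            if self.count[(context, c)] == 0:
                return c
        # Otherwise exploit the best observed mean for this context.
        return max(self.controllers, key=lambda c: self.mean[(context, c)])

    def update(self, context, controller, score):
        key = (context, controller)
        self.count[key] += 1
        # Incremental mean update avoids storing the full history.
        self.mean[key] += (score - self.mean[key]) / self.count[key]
```

After feeding it scores in which one controller does well in a "rain" context and another in a "sun" context, `select` returns a different controller per context instead of averaging their outputs, which is the contrast with non-contextual ensembles that the abstract draws.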


[117] 2602.11319

Coupler Position Optimization and Channel Estimation for Flexible Coupler Antenna Aided Multiuser Communication

In this paper, we propose a distributed flexible coupler antenna (FCA) array to enhance communication performance with low hardware cost. At each FCA, there is one fixed-position active antenna and multiple passive couplers that can move within a designated region around the active antenna. Moreover, each FCA is equipped with a local processing unit (LPU). All LPUs exchange signals with a central processing unit (CPU) for joint signal processing. We study an FCA-aided multiuser multiple-input multiple-output (MIMO) system, where an FCA array base station (BS) is deployed to enhance the downlink communication between the BS and multiple single-antenna users. We formulate optimization problems to maximize the achievable sum rate of users by jointly optimizing the coupler positions and digital beamforming, subject to movement constraints on the coupler positions and the transmit power constraint. To address the resulting nonconvex optimization problem, the digital beamforming is expressed as a function of the FCA position vectors, which are then optimized using the proposed distributed coupler position optimization algorithm. Considering a structured time-domain pattern of pilots and coupler positions, pilot-assisted centralized and distributed channel estimation algorithms are designed under the FCA array architecture. Simulation results demonstrate that the distributed FCA array achieves substantial rate gains over conventional benchmarks in multiuser systems without moving active antennas, and approaches the performance of fully active arrays while significantly reducing hardware cost and power consumption. Moreover, the proposed channel estimation algorithms outperform the benchmark schemes in terms of both pilot overhead and channel reconstruction accuracy.


[118] 2603.00474

Wireless Power Control Based on Large Language Models

This paper investigates the power control problem in wireless networks by repurposing pre-trained large language models (LLMs) as relational reasoning backbones. In hyper-connected interference environments, traditional optimization methods face high computational cost, while standard message passing neural networks suffer from aggregation bottlenecks that can obscure critical high-interference structures. In response, we propose PC-LLM, a physics-informed framework that augments a pre-trained LLM with an interference-aware attention bias. The proposed bias tuning mechanism injects the physical channel gain matrix directly into the self-attention scores, enabling explicit fusion of wireless topology with pre-trained relational priors without retraining the backbone from scratch. Extensive experiments demonstrate that PC-LLM consistently outperforms both traditional optimization methods and state-of-the-art graph neural network baselines, while exhibiting exceptional zero-shot generalization to unseen environments. We further observe that topology-relevant relational reasoning is concentrated in shallow layers, whereas deeper layers encode task-irrelevant semantic noise. Motivated by this finding, we develop a lightweight adaptation strategy that reduces model depth by 50%, significantly lowering inference cost while preserving state-of-the-art spectral efficiency.
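
One way to picture injecting a channel gain matrix into self-attention scores is as an additive bias applied before the softmax. The abstract does not give the exact form used by PC-LLM, so the sketch below is an assumption: it adds `beta * log(gain[i][j])` to the scaled dot-product score, with the function name `biased_attention_weights` and the parameter `beta` being hypothetical. Gains are assumed strictly positive.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def biased_attention_weights(q, k, gain, beta=1.0):
    """Row-wise softmax(q k^T / sqrt(d) + beta * log(gain)).

    q, k: lists of equal-length vectors (one per link or node).
    gain: matrix of strictly positive channel gains, gain[i][j] from j to i.
    The additive log-gain bias is an assumed form, not taken from the paper.
    """
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            dot = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            scores.append(dot + beta * math.log(gain[i][j]))
        weights.append(softmax(scores))
    return weights
```

With all-zero queries and keys the learned scores vanish and the weights reduce to the row-normalized gains, which makes the effect of the bias easy to inspect: for `gain = [[1.0, 3.0], [2.0, 2.0]]` the rows become `[0.25, 0.75]` and `[0.5, 0.5]`.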