New articles on Electrical Engineering and Systems Science


[1] 2605.12533

Investigation of Chaotic Behavior in Clapp Oscillator

In this paper we investigate the chaotic behavior of the class of oscillators known as Clapp oscillators. The Clapp oscillator is a simple circuit containing one transistor and a few reactive elements (inductors and capacitors). This oscillator is chosen for its design simplicity and good performance. An oscillator with chaotic behavior can be used to construct a chaotic radar. To that end, this paper investigates an approach to constructing a chaotic Clapp oscillator, which can be further verified experimentally using microstrip technology.
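
For orientation, the nominal (non-chaotic) oscillation frequency of an ideal Clapp tank is set by the inductor and the series combination of its three capacitors. The sketch below computes that textbook value; the function name and the component values are illustrative assumptions and are not taken from the paper.

```python
import math

def clapp_frequency(inductance, c1, c2, c3):
    """Ideal resonant frequency (Hz) of a Clapp tank.

    The tank is modelled as an inductor resonating with C1, C2 and C3
    in series; losses and transistor parasitics are ignored.
    """
    c_eq = 1.0 / (1.0 / c1 + 1.0 / c2 + 1.0 / c3)  # series capacitance (F)
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * c_eq))

# Hypothetical component values: 10 nH, 10 pF, 10 pF, 2 pF.
f0 = clapp_frequency(10e-9, 10e-12, 10e-12, 2e-12)
```

With these assumed values the small series capacitor dominates the equivalent capacitance, so the frequency lands in the low-GHz range.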


[2] 2605.12541

PG-LRF: Physiology-Guided Latent Rectified Flow for Electro-Hemodynamic PPG-to-ECG Generation

Electrocardiography (ECG) is the clinical standard for cardiac assessment but requires dedicated hardware that does not scale to daily-life monitoring. Photoplethysmography (PPG) is ubiquitous in wearables but lacks ECG-specific diagnostic morphology and is corrupted by motion and sensor noise. PPG-to-ECG generation aims to bridge this gap by recovering electrical morphology and timing from peripheral pulse signals. However, existing methods largely rely on statistical alignment and data-driven generation. They fail to explicitly structure the latent space around physiology-aware electro-hemodynamic factors and lack constraints from forward physiological dynamics. To address these challenges, we propose PG-LRF, a physiology-guided latent rectified flow framework. PG-LRF introduces an electro-hemodynamic simulator that co-models ECG and PPG through shared cardiac phase dynamics. Guided by this simulator, a Physiology-Aware AutoEncoder learns a structured electro-hemodynamic latent space. Then we integrate this simulator guidance into a PPG-conditioned latent rectified flow, enforcing ECG-side morphology consistency and ECG-to-PPG forward hemodynamic consistency during generative transport. Experiments on the large-scale MC-MED dataset demonstrate that PG-LRF significantly improves PPG-to-ECG generation and downstream cardiovascular disease classification, proving its ability to generate ECGs that are both signal-faithful and physiologically plausible under the ECG-to-PPG hemodynamic pathway.


[3] 2605.12553

ChannelKAN: Multi-Scale Dual-Domain Channel Prediction via Hybrid CNN-KAN Architecture

Accurate channel state information (CSI) prediction is essential for improving the reliability and spectral efficiency of massive MIMO-OFDM systems in high-mobility scenarios. Existing deep learning methods struggle to jointly capture short-term local variations and long-range nonlinear dependencies in CSI sequences. To address this challenge, we propose ChannelKAN, a hybrid CNN-KAN channel prediction model with multi-scale frequency domain information enhancement. The key insight is that CNNs and Kolmogorov-Arnold Networks (KANs) are naturally complementary: CNNs extract intra-time-step local spatial-frequency correlations, while KANs with learnable Chebyshev polynomial activations fit inter-time-step nonlinear temporal evolution in a holistic manner. Specifically, a dual-domain expansion module first generates complementary frequency-domain and delay-domain CSI representations. A multi-scale frequency information enhancement module then retains dominant spectral components at multiple scales to strengthen key features and suppress noise. Next, a CNN-KAN feature extraction module captures local correlations via cascaded convolutions and models long-range dependencies via Chebyshev KAN layers. Finally, a dual-domain fusion module adaptively integrates features from both branches to produce the prediction. Experiments on 3GPP-compliant QuaDRiGa datasets demonstrate that ChannelKAN outperforms RNN, LSTM, GRU, CNN, and Transformer baselines in normalized mean square error (NMSE), spectral efficiency (SE), and bit error rate (BER) across various velocities and signal-to-noise ratios. Ablation studies further confirm the effectiveness of each proposed module.
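
To make "learnable Chebyshev polynomial activations" concrete, the sketch below evaluates a single univariate activation as a weighted sum of Chebyshev polynomials. It is a minimal illustration, not the ChannelKAN layer: the function names, the `tanh` squashing of the input into [-1, 1], and the coefficient values are assumptions.

```python
import math

def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] using T_{k+1} = 2x T_k - T_{k-1}."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]

def chebyshev_activation(x, coeffs):
    """One learnable 1-D activation: sum_k coeffs[k] * T_k(tanh(x)).

    In a KAN-style layer, each input-output edge would carry its own
    coefficient vector, trained by gradient descent (assumed here).
    """
    z = math.tanh(x)  # assumed normalisation into the polynomials' domain
    basis = chebyshev_basis(z, len(coeffs) - 1)
    return sum(c * t for c, t in zip(coeffs, basis))

# Hypothetical coefficients for a degree-2 activation.
y = chebyshev_activation(0.0, [1.0, 2.0, 3.0])
```

At `x = 0` the basis is `[1, 0, -1]`, so the example evaluates to `1 - 3 = -2`.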


[4] 2605.12557

Localization in OFDM Passive Distributed Antenna Systems with Pilots and Unknown Data Payloads: A Marginal Maximum Likelihood Approach

Integrated Sensing and Communications (ISAC) is emerging as a key paradigm for future Sixth-Generation (6G) networks, with communication-centric designs favored for their compatibility with existing standards. Communication signals contain both known deterministic pilot symbols and unknown random data payloads. Most localization approaches rely solely on pilots, discarding the position information contained in the data symbols, which constitute the majority of each transmitted frame. Alternatively, Decision-Directed (DD) approaches exploit data decisions, thereby inherently limiting positioning performance to that of the communication system. In this paper, we derive a Marginal Maximum Likelihood (MML) estimator that jointly leverages pilot and data payloads without requiring data decoding, enabling operation with high-order constellations and under challenging noise conditions. We consider an opportunistic scenario in which an Orthogonal Frequency-Division Multiplexing (OFDM) signal transmitted by a User Equipment (UE) is captured by a distributed receiver array. Through numerical simulations, we demonstrate that the proposed method achieves superior localization performance compared to existing approaches and consistently converges to the genie bound (where data symbols are assumed perfectly known) at a lower Signal-to-Noise Ratio (SNR) than any DD method. Furthermore, the proposed method remains robust to constellation size, unlike DD approaches, whose performance degrades with increasing modulation order. Finally, we provide a computational complexity analysis of the proposed method and the considered baselines, highlighting the impact of system parameters on their respective computational costs.
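
One way to read "marginal" here is that the unknown data symbols are averaged out of the likelihood rather than decided. As a hedged illustration under a simplified scalar per-subcarrier model (an assumption, not the paper's distributed-array model), let $y_n = h_n(\mathbf{p})\, s_n + w_n$ with $w_n \sim \mathcal{CN}(0, \sigma^2)$, known pilots $x_n$ on the index set $\mathcal{P}$, and data symbols on $\mathcal{D}$ drawn uniformly from a constellation $\mathcal{S}$. Up to additive constants, the log-likelihood in the position $\mathbf{p}$ would then take the form:

```latex
\ell(\mathbf{p})
  = -\sum_{n \in \mathcal{P}} \frac{\lvert y_n - h_n(\mathbf{p})\, x_n \rvert^2}{\sigma^2}
  + \sum_{n \in \mathcal{D}} \log \left(
      \frac{1}{\lvert \mathcal{S} \rvert}
      \sum_{s \in \mathcal{S}}
      \exp\!\left( -\frac{\lvert y_n - h_n(\mathbf{p})\, s \rvert^2}{\sigma^2} \right)
    \right),
\qquad
\hat{\mathbf{p}} = \arg\max_{\mathbf{p}} \ell(\mathbf{p}).
```

No symbol decision appears in this expression; the constellation enters only through the inner sum, which is consistent with the abstract's point that the estimator does not require data decoding.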


[5] 2605.12560

Brain Tumor Classification in MRI Images: A Computationally Efficient Convolutional Neural Network

Improving patient outcomes depends on the prompt and accurate diagnosis of brain tumors, but manual MRI scan analysis is still time-consuming and unreliable. Although deep learning has shown promise, many of the models that are now in use are computationally intensive and have difficulty handling the intrinsic complexity and variety of different types of brain tumors. In this work, we propose a lightweight yet high-performing Convolutional Neural Network (CNN) for multi-class brain tumor classification, employing MRI images to target gliomas, meningiomas, pituitary tumors, and healthy (no tumor) instances. The model was rigorously evaluated on two publicly accessible datasets from Figshare and Kaggle. Leveraging efficient feature extraction and optimized training strategies, our CNN achieved classification accuracies of 99.03% and 99.28%, along with ROC scores of 99.88% and 99.94% on Dataset 1 and Dataset 2, respectively, all while utilizing significantly fewer parameters than popular pre-trained architectures. In contrast to cutting-edge models like DenseNet201, MobileNetV2, VGG19, Xception, InceptionV3, and ResNet50, our approach consistently demonstrated superior performance with reduced computational overhead. These findings highlight the potential of the proposed model as a practical and reliable diagnostic aid in clinical environments.


[6] 2605.12562

Uncovering Latent Pathological Signatures in Pulmonary CT via Cross-Window Knowledge Distillation

Multi-window CT imaging captures complementary pathological information across anatomical structures of differing densities, yet existing deep learning methods fuse representations only at later stages, missing cross-density interactions. We propose a cross-window knowledge distillation framework in which student encoders learn latent clinical priors from a teacher trained on the most informative window. Evaluated retrospectively on three cohorts - COPD-CT-DF (n=719), RSNA PE (n=1,433), and an in-house CTEPD dataset (n=161) - distillation improved per-window AUC by 10.1-16.5 percentage points on COPD-CT-DF (0.75-0.81 to 0.90-0.94; all P<0.001), with ensemble AUC reaching 0.9960. Similar gains were observed on RSNA PE (0.80-0.83 to 0.90-0.92) and CTEPD (AUC 0.7481 vs. 0.6264). Cross-window distillation internalises pathological signatures invisible to supervised approaches, offering a generalisable solution for multi-window pulmonary CT analysis.


[7] 2605.12564

Multiport Antenna Q-factor

This article proposes an estimate of multiport antenna bandwidth based on a generalization of a single-port Q-factor. The explicit derivation is based on converting the stored energy matrix to its port equivalent and on the port parameters themselves. The work discusses the bandwidth dependencies on feeding and matching. Derived formulas are shown to utilize the total active reflection coefficient and allow for a single-frequency bandwidth evaluation. Examples comprising two different dipole arrays and electrically large patch antenna arrays validate the theory.


[8] 2605.12566

On Privacy-Preserving Image Transmission in Low-Altitude Networks: A Swin Transformer-Based Framework with Federated Learning

The rapid development of low-altitude economy has driven the proliferation of Unmanned Aerial Vehicle (UAV) applications, including logistics, inspection, and emergency response. However, transmitting high-volume image data from UAVs to ground stations faces significant challenges due to limited bandwidth and stringent privacy requirements. To address these issues, a Semantic Communication (SC) framework based on Federated Learning (FL) is proposed for efficient and privacy-preserving image transmission. A Swin Transformer-based Semantic Communication (STSC) architecture is designed to extract multi-scale semantic features under constrained bandwidth conditions. Dedicated communication and computing nodes are deployed on UAVs to enhance real-time coverage and flexibility. Meanwhile, a FL mechanism enables global model training across distributed devices without sharing raw data, thus preserving user privacy. Simulation experiments conducted on the CIFAR-10 dataset demonstrate that the proposed STSC framework achieves at least 5.7 dB improvement in Peak Signal-to-Noise Ratio (PSNR) compared to DeepJSCC baselines, while also showing superior convergence and generalization performance. The framework effectively integrates UAV-assisted deployment with SC and privacy protection, offering a practical solution for bandwidth-constrained image transmission in low-altitude networks.
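
For readers unfamiliar with the metric, the sketch below shows the standard PSNR definition and what a 5.7 dB gain means in terms of mean squared error. The function name and the flat pixel-list representation are assumptions for illustration; the paper's evaluation code is not shown in the abstract.

```python
import math

def psnr_db(reference, reconstruction, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# A PSNR gain of g dB corresponds to an MSE ratio of 10 ** (g / 10).
mse_ratio_for_5p7_db = 10.0 ** (5.7 / 10.0)
```

Under this definition, a 5.7 dB improvement corresponds to a reconstruction MSE roughly 3.7 times lower than the baseline's.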


[9] 2605.12569

Active Sensing with Meta-Reinforcement Learning for Emitter Localization from RF Observations

Global navigation satellite system (GNSS) interference poses a serious threat to reliable positioning, especially in indoor and multipath-rich environments where source localization is highly challenging. In this paper, we formulate GNSS interference localization as an active sensing problem and propose a reinforcement learning (RL) framework in which an agent sequentially explores the environment to infer the position of an emitter source from radio frequency (RF) observations acquired with a 2x2 patch antenna. The localization task is modeled as a partially observable decision process, since single-snapshot measurements are often ambiguous under multipath propagation and changing channel conditions. To address this, the proposed framework combines high-dimensional RF sensing with deep RL and recurrent policy learning. We investigate both value-based and policy-based approaches, namely Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO), and study their behavior under domain shift. The approach is evaluated on a simulated dataset generated with the Sionna ray-tracing module, which provides realistic propagation effects and diverse environment configurations. Experimental results show that the proposed method achieves a localization success rate of 80.1%, demonstrating the potential of RL for adaptive GNSS interference localization. Overall, the results highlight simulation-assisted training as a promising direction for robust interference localization in challenging propagation environments.


[10] 2605.12575

Are Compact Rationales Free? Measuring Tile Selection Headroom in Frozen WSI-MIL

Whole-slide image (WSI) multiple instance learning (MIL) classifiers can achieve strong slide-level AUC while leaving the full-bag prediction opaque. Attention scores are widely reused as post-hoc explanations, but high attention can reflect aggregation preference rather than a compact, model-sufficient rationale. We study post-hoc rationale highlighting for frozen WSI-MIL: given a trained classifier, can its slide-level prediction be recovered from a compact, output-consistent tile subset without retraining the backbone? We instantiate this with Finding Optimal Contextual Instances (FOCI), a lightweight rationale-readout layer over a frozen MIL backbone. FOCI is trained with model-output sufficiency and exclusion objectives over keep/drop tile subsets, evaluated with an insertion-style Sequential Reveal Protocol (SRP) adapted to WSI-MIL, and summarized by the Selection Headroom Index (SHI). Across three WSI benchmarks and seven MIL backbones, FOCI reveals that compact rationales are selection-headroom dependent: transformer and multi-branch attention aggregators can admit compact rationales, near-minimal attention-pooling baselines enter a selection-saturation regime, and hard-selection backbones can conflict with an external readout. For TransMIL, relative to its documented CLS-proxy ranking, FOCI reduces the Minimum Sufficient K (MSK) tile count by 32-56% across benchmarks, while ACMIL+FOCI attains the highest mean SHI (+0.465). Deletion-based perturbation and selected-only downstream evaluation provide complementary checks. These results position FOCI as a model-level interpretability and audit layer: selected tiles are not claims of clinical or pathologist-level diagnostic sufficiency, but candidate rationales that offer a compact, reviewable view of when a frozen MIL prediction can be localized to a small output-consistent subset.


[11] 2605.12578

Recurrent Transformer-Based Near- and Far-Field THz Wideband Channel Estimation for UM-MIMO

The integration of terahertz communications and ultra-massive multiple-input multiple-output (UM-MIMO) systems in 6G networks is motivated by their ability to enable unprecedented data rates, mitigate spectrum congestion, and enhance overall network performance. However, the enlarged antenna apertures and higher carrier frequencies in these systems increase the Rayleigh distance, causing users to span both the near-field and conventional far-field regions. Accurate spatial precoding thus requires exact channel estimation at the base station - a task made more challenging by the hybrid coexistence of near- and far-field effects and the limited number of digital chains available in hybrid beamforming architectures. In this paper, we propose a block recurrent transformer model to address this challenge. We demonstrate that a single transformer block equipped with state memory can be trained once and then iteratively applied for hybrid-field channel estimation. Furthermore, we train the model such that it generalizes to wireless channels with varying scatterer distances, different numbers of propagation paths, and wideband operation. Simulation results show that the proposed method achieves performance gains of approximately 5 dB and 7.5 dB in normalized mean squared error (NMSE) over state-of-the-art solutions in narrowband and wideband scenarios, respectively.


[12] 2605.12728

Grid-Orch: An LLM-Powered Orchestrator for Distribution Grid Simulation and Analytics

The power distribution engineering workforce faces a projected shortage of up to 1.5 million engineers by 2030, creating urgent demand for more accessible analysis tools. This paper introduces Grid-Orch, a framework that bridges Large Language Models (LLMs) and power system simulation through the Model Context Protocol (MCP), enabling engineers to perform complex distribution analyses via natural language. Using OpenDSS as the reference implementation, Grid-Orch provides 36 domain-specific tools across eleven categories, covering power flow, voltage analysis, quasi-static time series (QSTS) simulation, and automated optimization. A provider-agnostic LLM layer supports both cloud-hosted (Gemini, Claude) and locally deployed (Ollama, llama-cpp) models, enabling air-gapped operation for security-sensitive utility environments. Three optimization skills, capacitor placement, voltage violation analysis, and overvoltage mitigation, extend the platform beyond single-tool queries to multi-step engineering workflows. Grid-Orch is delivered as an interactive web platform with chat-based interaction, a QSTS dashboard, and feeder topology visualization, and renders simulation results inline. Workflow demonstrations show that distribution analyses formerly requiring hours of scripting, such as distributed energy resource (DER) interconnection screening, complete in under two minutes through natural language, producing numerically identical results to direct OpenDSS scripting.


[13] 2605.12753

Optimization in Sparse 2D to Dense 3D Weakly Supervised Learning: Application to Multi-Label Segmentation of Large ex vivo MRI Data

INTRODUCTION | Fully supervised 3D segmentation of high-resolution ex vivo MRI is limited by the prohibitive cost of volumetric annotation, forcing reliance on sparse 2D slices. Weakly supervised Sparse-to-Dense frameworks bridge this gap, but guidelines remain ambiguous regarding human-centric visual enhancements and transferring optimization strategies across dimensions. We analyze divergent regularization needs for multi-class segmentation of high-resolution ex vivo spinal cord MRI. METHODS | We used 9.4T MRI of multiple sclerosis spinal cords (>104,000 slices) with sparse annotations (428 slices). A 2D Teacher trained on sparse slices generated dense pseudo-labels to train a 3D Student. We systematically evaluated the impact of human-centric preprocessing, spatial augmentation, and soft-label regularization on both architectures. RESULTS | We identified a critical divergence in training dynamics. The 2D Teacher required strong spatial augmentation and soft-labeling to overcome data scarcity, improving White Matter Lesion Dice scores by >11 points. However, propagating these techniques to the 3D Student degraded its performance. Furthermore, human-centric preprocessing (e.g., CLAHE) disrupted global statistical cues, dropping Gray Matter Lesion Dice scores by ~25 points. DISCUSSION | Our study highlights a perception divergence (human-centric contrast enhancement harms machine models) and a regularization conflict across dimensions. 3D architectures trained on dense pseudo-labels exhibit fundamentally different optimization landscapes than 2D counterparts and require distinct, conservative regularization. Code and models: this https URL.


[14] 2605.12779

Safe and Energy-Aware Decentralized PDE-Constrained Optimization-Based Control of Multi-UAVs for Persistent Wildfire Suppression

This paper presents a safe and energy-aware optimization-based control framework for multi-UAV wildfire suppression under localization and motion uncertainties. We first develop a centralized density-based controller that couples UAV motion and water deployment in a wildfire-specific control Lyapunov function. This framework is then extended to a decentralized setting suitable for large-scale operations using only local information. The controllers use control barrier function constraints to enforce both danger zone avoidance and the ability to reach a charging region. Simulations and real quadcopter experiments demonstrate the controller's effectiveness in fire suppression while preserving safety and energy sufficiency over multiple charge cycles.
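
The role of a control barrier function constraint can be sketched on a much simpler system than the one in the paper. The example below assumes a planar single-integrator agent (velocity is the control input) and a circular danger zone, and minimally modifies a nominal command so that the barrier condition holds. The dynamics, the barrier `h`, the gain `alpha`, and all names are assumptions for illustration; the paper's density-based, PDE-constrained controller is not reproduced here.

```python
def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Closest command to u_nom satisfying grad_h(x) . u >= -alpha * h(x).

    Barrier: h(x) = ||x - center||^2 - radius^2, positive outside the zone.
    For this single linear constraint the QP solution is a projection.
    """
    dx = (x[0] - center[0], x[1] - center[1])
    h = dx[0] ** 2 + dx[1] ** 2 - radius ** 2
    grad = (2.0 * dx[0], 2.0 * dx[1])
    # Constraint violation of the nominal command (negative means unsafe).
    slack = grad[0] * u_nom[0] + grad[1] * u_nom[1] + alpha * h
    if slack >= 0.0:
        return (u_nom[0], u_nom[1])  # nominal command already satisfies the constraint
    norm2 = grad[0] ** 2 + grad[1] ** 2
    if norm2 == 0.0:
        raise ValueError("constraint is infeasible at the zone center")
    scale = -slack / norm2
    return (u_nom[0] + scale * grad[0], u_nom[1] + scale * grad[1])

# Agent at (2, 0) commanded straight toward a unit danger zone at the origin.
u_safe = cbf_filter((2.0, 0.0), (-5.0, 0.0), (0.0, 0.0), 1.0)
```

In this example the filter slows the approach from a speed of 5 to 0.75, the largest inward speed the assumed barrier condition permits at that distance, while commands pointing away from the zone pass through unchanged.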


[15] 2605.12806

Cross-Harmonic Ambiguity-Aligned Multiport Parameter Estimation for Time-Floquet RIS

A time-Floquet reconfigurable intelligent surface (TF-RIS) periodically modulates its elements within a signaling interval, enabling frequency conversion and additional degrees of freedom compared with a conventional RIS. Time-Floquet multiport-network theory (TF-MNT) provides a physics-consistent model for TF-RISs that accounts for inter-element coupling, but its practical use requires estimating the underlying parameters when the TF-RIS design and radio environment are (partially) unknown. In this Letter, we propose a segmented estimation approach for constructing an accurate proxy TF-MNT model from end-to-end measurements. First, with the TF-RIS operated as a conventional RIS, we estimate conventional proxy MNT parameters independently at each considered time-Floquet harmonic. Second, under periodic time modulation, we align the inherent ambiguities among the per-harmonic conventional proxy MNT parameters, considering three measurement setups with different access to phase and harmonic information. Based on full-wave numerical simulations, we quantify the impact of the number of measurements and the noise level on the proxy-model accuracy. Finally, we demonstrate the performance loss incurred without the proposed ambiguity alignment in a canonical harmonic backscatter communications scenario.


[16] 2605.12971

Port-Hamiltonian Systems with Dissipation Potential: Modelling and Trajectory Tracking Control

Port-Hamiltonian systems (PHS) and interconnection and damping assignment passivity-based control (IDA-PBC) have achieved broad success in modelling and stabilisation of physical systems. However, the absence of a dedicated scalar potential for the momentum channel forces any modification of the momentum-dependent dynamics to proceed indirectly through the interconnection and damping matrices, rendering the matching partial differential equation (PDE) difficult to solve and complicating extensions to trajectory tracking. This paper proposes a port-Hamiltonian system with dissipation potential (PHS-DP), in which the damping matrix is replaced by scalar convex dissipation potentials, providing independent scalar objects for the momentum and auxiliary state channels and restoring the variational symmetry between stored and dissipated energy. Building on this framework, Dual Potential Shaping Control (DPSC) achieves trajectory tracking by sequentially shaping the potential energy and dissipation potentials without modifying the interconnection structure. Contraction of the closed-loop cascade is established via a hierarchical contraction argument, and the matching condition is satisfied automatically for any admissible choice of shaped potentials, requiring no PDE to be solved. In contrast to existing PDE-free energy shaping approaches, which achieve this by abandoning the port-Hamiltonian closed-loop structure and sacrificing physical interpretability, the proposed framework preserves the interconnection structure and retains a transparent energy-based interpretation at every stage of the design. Validation on a magnetic levitation system demonstrates tracking performance comparable to timed IDA-PBC with substantially reduced design complexity.


[17] 2605.13005

Geometry-Aware Multi-Armed Bandits for Antenna Beam Selection on Spheres, Tori, $\mathrm{SO}(3)$, and Reconfigurable Intelligent Surfaces

Beam alignment in mmWave phased arrays and RIS-assisted links is a stochastic bandit under both short TTI budgets and Doppler-induced non-stationarity. The arm space is a Riemannian manifold: $S^2$ for steering, $\mathbb{T}^n$ for phase combining, $\mathrm{SO}(3)$ for panel orientation, or the discrete torus $(\mathbb{Z}_B)^M$ with up to $K \sim 10^{90}$ configurations for $B$-level RIS ($B = 2^b$, $b$ bits/element); the intrinsic Matérn kernel of Borovitskiy et al. provides the base GP. We contribute two algorithmic pieces. (C1) A Kronecker-factorised intrinsic-product Matérn kernel on $(\mathbb{Z}_B)^M$ evaluating in $O(M)$ table lookups, making GP-UCB tractable at $K \sim 10^{90}$ where the extrinsic alternative is infeasible. (C2) AdaptiveGP-v2, an online sliding-window controller that selects $W$ by per-sample marginal likelihood, with predictive-variance and drift $z$-score reset triggers and a post-reset $\beta$-boost. On a four-speed ($v \in \{0.02, 0.08, 0.12, 0.20\}$ km/h), 20-seed paired campaign at $T = 3000$, AdaptiveGP-v2 is statistically indistinguishable from the hand-tuned fixed-window oracle at every speed (Holm-Bonferroni-corrected paired differences cross zero); the operational benefit is the absence of a deployment-time per-speed calibration step, not a mean-regret improvement. On four static 3GPP-style mmWave benchmarks, intrinsic-kernel GP-UCB reduces cumulative regret by 25-45% vs. codebook UCB1/Thompson and by 10-33% vs. Euclidean-ambient GP-UCB on the toroidal arm spaces; a wideband OFDM ablation on a 100 MHz channel confirms the advantage persists under frequency-selective fading ($\sim 32$ Mbps/UE at initial access vs. UCB1). A third-party-simulator sanity check on Sionna CDL is reported in Section V.
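
The "table lookup" idea can be sketched as follows, assuming that the product kernel is a product of identical one-dimensional graph Matérn kernels on the cycle $\mathbb{Z}_B$, one per RIS element. This reading, the function names, the hyperparameters `nu` and `kappa`, and the example size `M = 300` are assumptions; the paper's exact construction may differ.

```python
import math

def cycle_matern_table(B, nu=1.5, kappa=1.0):
    """Table k1[d], d = 0..B-1, of a graph Matern kernel on the cycle Z_B.

    Uses the cycle-graph Laplacian eigenvalues 2 - 2 cos(2 pi j / B) and
    the spectral weights (2 nu / kappa^2 + eigenvalue) ** (-nu).
    """
    weights = [
        (2.0 * nu / kappa ** 2 + 2.0 - 2.0 * math.cos(2.0 * math.pi * j / B)) ** (-nu)
        for j in range(B)
    ]
    table = [
        sum(w * math.cos(2.0 * math.pi * j * d / B) for j, w in enumerate(weights)) / B
        for d in range(B)
    ]
    return [t / table[0] for t in table]  # normalise so that k1[0] == 1

def product_kernel(x, y, table):
    """Kernel between two configurations in (Z_B)^M using M table lookups."""
    B = len(table)
    value = 1.0
    for xi, yi in zip(x, y):
        value *= table[(xi - yi) % B]
    return value

table = cycle_matern_table(4)
k_xy = product_kernel((0, 0), (1, 3), table)

# Size of the arm space for a hypothetical 1-bit RIS with 300 elements.
num_configs = 2 ** 300
```

The kernel is never materialised as a $K \times K$ matrix: each evaluation touches only the $B$-entry table, which is what keeps a space of about $2 \times 10^{90}$ configurations workable in this sketch.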


[18] 2605.13015

A General Bézier Tree Encoding Counterfactual Framework for Retinal-Vessel-Mediated Disease Analysis

The geometry of the retinal vessel is a key biomarker of vascular diseases, yet clinical evidence remains primarily observational. Existing generative counterfactuals intervene only at the image-level disease label, failing to isolate explicit anatomical structure. To address this limitation, we propose the Bézier Tree Encoding Counterfactual Framework (BTECF). By abstracting vascular networks into interconnected cubic-Bézier segments, BTECF establishes a disease-agnostic representation in which structural topology is explicitly preserved and atomically perturbable. Coupling this encoding with a diffusion-based generator enables parameter-level do-interventions on explicit geometric axes (e.g., tortuosity, caliber) while preserving background fundus textures. We validate BTECF on diabetic retinopathy, together with independent cohorts for ischemic stroke and Alzheimer's disease. Isolated counterfactual interventions produce dose-responsive shifts in classifier predictions; a matched pixel-drop control attenuates this response by an order of magnitude or more, ruling out out-of-distribution generation artifacts. By enforcing causal isolation between vessel topology and pixel-level confounders, BTECF provides a unified generative paradigm for hypothesis verification across systemic diseases. To support reproducibility, the code will be publicly released upon acceptance.
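
As background on the building block named in the abstract, the sketch below evaluates a point on a cubic Bézier segment from its four control points using the Bernstein form. The function name and the control-point values are illustrative assumptions; how BTECF connects segments into a tree and conditions the diffusion generator is not shown.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point at parameter t in [0, 1] on a cubic Bezier segment."""
    s = 1.0 - t
    # Bernstein weights of degree 3; they sum to 1 for every t.
    w = (s ** 3, 3.0 * s * s * t, 3.0 * s * t * t, t ** 3)
    return tuple(
        w[0] * a + w[1] * b + w[2] * c + w[3] * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

# Hypothetical 2-D vessel segment; moving p1 or p2 bends the curve
# while the endpoints p0 and p3 stay fixed.
midpoint = cubic_bezier((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), 0.5)
```

Because the interior control points change curvature without moving the endpoints, a parameter-level edit of this kind can alter a geometric property such as tortuosity while leaving the segment's connections intact.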


[19] 2605.13031

Relative Pose-Velocity Estimation Using Dual IMU Measurements and Relative Position Sensing

This paper addresses the problem of estimating the relative pose (position and orientation) and velocity of a vehicle with respect to a moving target, where both are equipped with Inertial Measurement Units (IMUs), assuming the availability of relative position or bearing measurements. The body-target relative dynamics are formulated on $\mathbf{SE}_2(3)$ and recast into a linear time-varying (LTV) model in the ambient space $\mathbb{R}^{15}$, on which a deterministic Riccati observer is designed. We analyze the uniform observability (UO) conditions required to guarantee global exponential convergence of the estimation error in the ambient space for both measurement cases. In the case of relative position measurements, UO requires only a persistence-of-excitation condition on the target acceleration, whereas for bearing measurements, additional conditions are required. Building on this, a nonlinear complementary filter on $\mathbf{SO}(3)$ is designed to provide a smooth estimate of the orientation component of the state with almost global asymptotic stability. Finally, simulation results are provided to validate the proposed solution.


[20] 2605.13061

Revisiting Voltage and Synchronization Stability Analysis in Converter-Integrated Weak Grids: Insights from Non-Minimum-Phase Zeros

The increasing penetration of converter-interfaced generators (CIGs) intensifies concerns over small-signal voltage and synchronization stability. While existing theories treat these two stability issues distinctly, practical wisdom in contrast employs a unified and static metric, short-circuit ratio (SCR), to assess both in weak grids. This paper aims to bridge this theory-practice gap by introducing the insight of non-minimum phase (NMP) zeros. First, we demonstrate that the two stability issues in weak grids originate from NMP zeros in the grid Jacobian transfer matrix: a zero at the origin corresponds to voltage instability, while low-frequency zeros impose fundamental constraints on synchronization dynamics. The traditional SCR is proven to be a special case of our proposed novel stability metric, NMP-zero (NMP-Z) factor, evaluated at the rated operating point. This establishes the theoretical foundation for the empirical success of SCR. Building on this insight, we then develop a unified stability assessment method for multi-converter systems. The method retains the simplicity of SCR, requiring only the NMP-Z factor together with individual CIG dynamic models and enabling stability margin assessment under various operating points. Our work provides a simple yet theoretically rigorous framework for stability analysis in CIG-integrated weak grids, with all theoretical findings and the proposed method validated through detailed time-domain simulations.


[21] 2605.13098

Impact of Terrestrial Blockage on the Coverage of Integrated Satellite-Terrestrial Networks

The integration of non-terrestrial networks (NTNs) with terrestrial networks (TNs) is an important step toward ubiquitous connectivity in sixth-generation (6G). Despite growing interest, the geometric impact of urban blockages on an integrated satellite-terrestrial network (ISTN) has not been rigorously quantified. In this paper, we develop a stochastic geometry-based analytical framework that incorporates a Boolean blockage model to characterize the downlink coverage probability of the ISTN and to provide insights for blockage-aware system design. Our analysis reveals that blockages affect satellite links in two competing ways: while they attenuate desired signals, they can also act as spatial shields that suppress aggregate interference. Leveraging this observation, we analytically show that satellite-terrestrial integration can enhance coverage probability across diverse environments ranging from open areas to dense urban deployments, offering a resilient and mathematically tractable approach to maintaining connectivity under heterogeneous blockage conditions.


[22] 2605.13120

D-Optimized Sampling Design for System Identification

Traditional system identification with multisine inputs relies on uniform sampling and periodic excitation to preserve Fourier orthogonality and avoid spectral leakage, limiting its use in scenarios with irregular sampling or nonperiodic inputs. This work investigates continuous-time system identification under nonperiodic multisine excitation and nonuniform sampling. We develop a nonparametric frequency response function estimator suited to such conditions and design irregular sampling schemes that enhance the informativeness of measurements and reduce spectral leakage. The proposed sampling schemes improve the statistical accuracy of system identification in settings where periodic excitation is impractical.
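
The "D" in the title refers to D-optimality, which scores a design by the determinant of its information matrix. The sketch below evaluates that criterion for the simplest possible case, a single sinusoid with regressors cos(ωt) and sin(ωt), and compares two candidate sets of sampling instants. The single-frequency model, the function name, and the sample times are assumptions for illustration and are not the paper's estimator or design procedure.

```python
import math

def d_criterion(times, omega):
    """det(Phi^T Phi) for the regressors [cos(omega t), sin(omega t)]."""
    cc = sum(math.cos(omega * t) ** 2 for t in times)
    ss = sum(math.sin(omega * t) ** 2 for t in times)
    cs = sum(math.cos(omega * t) * math.sin(omega * t) for t in times)
    return cc * ss - cs * cs  # determinant of the 2x2 information matrix

omega = 2.0 * math.pi  # one cycle per unit time
spread = d_criterion([0.0, 0.25, 0.5, 0.75], omega)
clustered = d_criterion([0.0, 0.01, 0.02, 0.03], omega)
```

Samples spread over the period give a determinant of 4, while the four clustered samples give a value well below 1, reflecting that they barely distinguish the sine and cosine components.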


[23] 2605.13134

Security-Aware Planning and Control of Multi-Agent Systems with LTL Tasks

This paper presents a secure-by-construction planning and control framework for multi-agent systems subject to linear temporal logic (LTL) specifications. The framework protects sensitive information from a passive intruder with partial observations of the agents' motion. Security in multi-agent coordination is captured by two notions that prevent the intruder from inferring whether a secret task has been executed and from identifying the agent responsible for its execution. The proposed framework incorporates the security constraints directly into the LTL synthesis procedure by constructing a secure finite transition system that removes all paths violating these constraints. Standard LTL synthesis is then applied to this secure abstraction to generate discrete plans, which are then refined into dynamically feasible continuous trajectories. This synthesis procedure provides formal guarantees that the resulting behavior of the multi-agent system satisfies both the global LTL specification and the security constraints. The effectiveness of the proposed framework is demonstrated through a two-drone case study.


[24] 2605.13135

Subspace Pruning via Principal Vectors for Accurate Koopman-Based Approximations

The accuracy of Koopman operator approximations over finite-dimensional spaces relies critically on their invariance properties. These can be rigorously quantified via the principal angles between a candidate subspace and its image under the Koopman operator. This paper proposes a unified algebraic framework for subspace pruning designed to systematically refine the invariance error. We establish the geometric equivalence between consistency-based methods and principal-vector pruning, and build on this insight to introduce a hybrid strategy that balances between multiple and single principal vector pruning for improved numerical stability and scalability. We derive error bounds for the retention of approximate and external eigenfunctions, demonstrating that the multi-vector approach mitigates the numerical drift inherent to sequential pruning. To ensure scalability, we develop an efficient numerical update scheme based on rank-one modifications that reduces the computational complexity of tracking principal angles by an order of magnitude. Finally, we exploit the subspace obtained from the pruning algorithms to build a lifted linear model for state prediction that accounts for the trade-offs between improving invariance and minimizing state reconstruction error. Simulations demonstrate the effectiveness of our approach.


[25] 2605.13220

Real-time Gaussian Process based Approximate Model Predictive Trajectory Tracking Control for Autonomous Vehicles

Applying model predictive control on embedded systems remains challenging due to the high computational cost of solving optimal control problems. To address this limitation, computationally efficient Gaussian process approximations of the implicit model predictive control law can be employed. However, for trajectory-tracking applications, the large amount of training data required for successful generalization across distinct reference trajectories poses a significant challenge. First, to improve data efficiency, we propose to transform the model into curvilinear coordinates around the reference trajectory. Second, we use a nominal feedforward component, allowing the Gaussian process to learn only the residual control input, making the approximation of a trajectory-tracking controller feasible. To underline the applicability of the approach, we deploy the controller on a Raspberry Pi in a small-scale vehicle and validate it experimentally. Compared to a model predictive control implementation using real-time iterations, the Gaussian process based approximation computes control inputs about five times faster while achieving similar closed-loop tracking performance.


[26] 2605.13243

Spatial Competition for Low-Complexity Learned Image Compression

Autoencoder-based image codecs achieve state-of-the-art compression performance but often incur high computational complexity, particularly at decoding time. This work introduces a low-complexity learned image compression framework based on spatial competition between multiple specialized neural codecs. For each image region, the encoder selects the codec that best matches the local content according to a rate-distortion cost. A mode map is transmitted as side information to indicate the per-region codec selection. At decoding time, this mode map-based selection guides reconstruction while preserving the complexity of a single codec. This design enables per-image adaptation with low decoding complexity and fast encoding. On the CLIC 2020 dataset, our method achieves a rate reduction of up to 14.5% compared to a single codec and reaches HEVC-level performance with a decoding complexity of 1433 MACs per pixel.
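
The per-region selection described above can be sketched as a Lagrangian rate-distortion decision. The cost form D + lambda * R, the function name, and the toy numbers are assumptions for illustration; the abstract does not give the exact cost, and the signalling cost of the mode map is ignored here.

```python
def select_modes(region_costs, lam):
    """Pick, for each region, the codec index with the lowest cost D + lam * R.

    region_costs: one list per region, holding one (rate_bits, distortion)
    pair per candidate codec. Returns (mode_map, total_cost).
    """
    mode_map = []
    total = 0.0
    for candidates in region_costs:
        costs = [dist + lam * rate for (rate, dist) in candidates]
        best = min(range(len(costs)), key=costs.__getitem__)
        mode_map.append(best)
        total += costs[best]
    return mode_map, total

# Two regions, two hypothetical codecs: codec 0 spends more bits for lower distortion.
regions = [[(10, 5.0), (4, 9.0)], [(8, 1.0), (2, 6.0)]]
print(select_modes(regions, lam=0.5))
print(select_modes(regions, lam=2.0))
```

A small lambda favours the high-quality codec in both regions, while a larger lambda shifts both regions to the cheaper codec; the decoder only needs the mode map to know which single codec to run per region.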


[27] 2605.13248

Compact Latent Manifold Translation: A Parameter-Efficient Foundation Model for Cross-Modal and Cross-Frequency Physiological Signal Synthesis

The analysis of physiological time series, such as electrocardiograms (ECG) and photoplethysmograms (PPG), is persistently hindered by modality and frequency gaps stemming from heterogeneous recording devices. Existing foundation models typically rely on continuous latent spaces, which frequently suffer from severe modality entanglement, lack high-fidelity cross-frequency generative capacity, and impose high computational costs that prohibit edge-device deployment. In this paper, we propose Compact Latent Manifold Translation (CLMT), a highly parameter-efficient (0.09B) unified framework that bridges these gaps through a novel two-stage discrete translation paradigm. First, we introduce a Universal Tokenizer utilizing Hierarchical Residual Vector Quantization (RVQ) to decouple heterogeneous signals into isolated, well-structured discrete latent manifolds, effectively preventing inter-modality interference. Second, a Context-Prompted Latent Translator maps these discrete tokens across modalities by integrating static physiological priors, reframing complex signal synthesis as a pure latent sequence translation task. Extensive evaluations demonstrate that our 0.09B model significantly outperforms massive baselines. In cross-modal PPG-to-ECG synthesis, it resolves temporal phase drift and dramatically improves the clinical R-peak detection F1-score from 0.37 (baseline) to 0.83. Furthermore, in extreme cross-frequency super-resolution (25Hz to 100Hz), it successfully recovers high-frequency diagnostic landmarks, achieving an unprecedented Pearson correlation of 0.9956. By learning a universal discrete language for biological signals with a fraction of the computational footprint, our approach sets a new trajectory for edge-deployable, multi-modal medical foundation models.


[28] 2605.13269

Submodular Multi-Agent Policy Learning for Online Distributed Task Allocation in Open Multi-Agent Systems

This paper studies multi-agent reinforcement learning with submodular team utilities for online distributed task allocation. In this setting, each agent selects one action from a local categorical policy, so feasible joint actions form a partition matroid over agent-action pairs. Classical multilinear extensions use independent Bernoulli sampling and therefore do not match the categorical policies executed by decentralized agents. To address this mismatch, we introduce the Partition Multilinear Extension (PME), a continuous relaxation whose value equals the expected team utility under factorized categorical policies. We prove that submodular difference rewards provide unbiased PME marginal-gradient information and yield a stagewise score-function policy-gradient estimator. Based on this connection, we propose SubMAPG, a centralized-training decentralized-execution policy-gradient framework with masked categorical policies and submodular difference-reward training signals. For the associated PME marginal-space projected stochastic-gradient dynamics, we prove a stagewise 1/2-approximation guarantee and sublinear dynamic regret in slowly varying environments, measured by the path length of the optimal PME marginals. To handle open systems with time-varying agents and targets, we instantiate SubMAPG with graph neural network policies. Experiments on multi-robot coverage and multi-target tracking show that SubMAPG outperforms local greedy and shared-reward baselines and is competitive with centralized myopic greedy strategies.
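
To make the PME definition concrete, the sketch below evaluates the expected team utility under factorized categorical policies by brute-force enumeration of joint actions. The coverage utility, the data layout, and the function names are assumptions for this example, and enumeration is only practical for tiny instances; it is not the training procedure of SubMAPG.

```python
from itertools import product

def coverage(joint_action, covers):
    """Submodular team utility (assumed): number of distinct targets covered."""
    covered = set()
    for agent, action in enumerate(joint_action):
        covered |= covers[agent][action]
    return len(covered)

def pme(policies, covers):
    """Expected utility when each agent samples one action from its own categorical policy."""
    value = 0.0
    for joint in product(*(range(len(p)) for p in policies)):
        prob = 1.0
        for agent, action in enumerate(joint):
            prob *= policies[agent][action]
        value += prob * coverage(joint, covers)
    return value

# covers[agent][action] is the set of targets that action would cover (hypothetical data).
covers = [[{0, 1}, {2}], [{1}, {2, 3}]]
print(pme([[0.5, 0.5], [0.5, 0.5]], covers))   # uniform policies
print(pme([[1.0, 0.0], [0.0, 1.0]], covers))   # deterministic policies
```

With deterministic policies the value reduces to the utility of the single joint action, which is the sense in which the relaxation agrees with the discrete problem at the vertices.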


[29] 2605.13286

Implementing Fluid Antennas in the Beamspace: Performance Evaluation and Codebook Design

Metasurface-based fluid antenna systems (FASs) have been recently proposed as an inexpensive, scalable and practical alternative implementation for the fluid-antenna concept. This work thoroughly evaluates the performance of metasurface-based FASs in the context of multi-user communications. We extend the state-of-the-art signal model of FASs to electronically-reconfigurable designs, explicitly including the antenna response in the equivalent channel and resulting correlation structure. A general codebook design procedure, accounting for practical aspects like reflections and radiation efficiency, is presented and used to design the different antenna configurations (regarded as FAS ports). Importantly, we show that, with proper design, metasurface-based FASs can significantly outperform conceptual ones. While state-of-the-art theoretical embodiments of FAS rely on spatial flexibility for constructive/destructive interference, metasurface-based FASs exploit interference cancellation through projection onto the interference null space. Numerical results show a remarkable improvement when the system is dominated by interference (i.e., the natural FASs operational regime), regardless of spatial propagation characteristics.


[30] 2605.13309

SimART: A Unified and Open Real-world Multimodal Simulation Platform for 6G Integrated Sensing and Communication

Research on sixth-generation (6G) integrated sensing and communication (ISAC) increasingly depends on multimodal datasets. These datasets need to jointly characterize wireless propagation, onboard sensing, and platform mobility. Existing tools cover only part of these aspects. Robotics simulators model physics and perception but not site-specific channels, while ray tracing and link level tools lack vehicle dynamics and onboard sensors. Combining them manually leads to workflows that are fragile and hard to reproduce. Rather than introducing another standalone simulator, this article presents SimART. It integrates mature robotics, ray tracing, and wireless evaluation engines into a single reproducible pipeline. The key idea is a robot operating system (ROS) backbone that both synchronizes and organizes all multimodal streams. A shared clock, a common coordinate frame, and timestamped messages keep the streams aligned in time and space, and a single rosbag recording captures the full session into one reproducible file. This design decouples the sensing front end from the wireless back end, so that any ROS-compatible simulator can be plugged in while reusing the same back end across aerial, ground, indoor, and maritime ISAC settings. On top of this backbone, SimART contributes a scene construction pipeline that converts both OpenStreetMap extracts and user-defined layouts into spatially aligned visual and electromagnetic assets, and a channel knowledge map (CKM) generator that aggregates ray tracing and system level outputs into spatial priors for ISAC algorithms. A case study on vision and position aided beam prediction demonstrates the utility of the platform. The code is publicly available at this https URL.


[31] 2605.13355

Impedance-Based VSC Unit Commitment with STATCOM Support under High IBG Penetration

The large-scale replacement of synchronous machines with inverter-based generation (IBG) introduces critical challenges to both voltage and frequency stability. This work builds on a mixed-integer second-order cone programming (MISOCP) framework that co-optimizes a unit commitment (UC) model, which embeds frequency-nadir constraints through synthetic inertia (SI) dispatch, and an SOC voltage stability boundary for IBG buses. The formulation is extended by modeling a STATCOM as a reactive-power decision variable in the same MISOCP model. A modified IEEE 30-bus system is used to assess three scheduling strategies: (i) baseline UC with SI only, (ii) voltage-stability-constrained (VSC) UC with SI, and (iii) the joint UC with SI and reactive power support from IBGs. The impact of incorporating a 30 MVAr STATCOM at a weak grid location near the IBG buses is investigated. Simulation results show that the proposed framework enhances voltage security, maintains frequency-nadir compliance, and reduces operating cost, while STATCOM integration further improves dispatch feasibility under high IBG penetration.


[32] 2605.13390

Sensitivity Quantification for Distribution System State Estimation

Pseudo-measurements are the dominant source of uncertainty in distribution system state estimation (DSSE), yet their distributional assumptions are treated as fixed inputs by existing uncertainty quantification methods. This paper investigates whether the uncertainty bounds assumed by weighted least squares (WLS)-based DSSE are sensitive to these distributional assumptions, and whether this sensitivity is quantifiable using the Fisher Information Matrix (FIM). We propose a diagnostic framework that compares the true Cramér-Rao Bound (CRB) against the WLS-assumed CRB via a per-bus, per-scenario ratio, computed directly from the converged WLS solution. Pseudo-measurement distributions are varied across five types in 22 variants matched at equal spread to isolate shape effects from variance. Experiments on the CIGRE MV network across 100 operating scenarios yield three findings. First, heavy-tailed and skewed distributions show consistently that WLS systematically overstates its uncertainty bounds. Second, the degree of miscalibration varies across buses and operating scenarios, confirming that distributional sensitivity is not uniform. Third, the CRB ratio is structurally blind to mean-shift bias, exposing a fundamental limitation of variance-based uncertainty diagnostics. Together, these results confirm the hypothesis and show that the choice of pseudo-measurement distribution directly distorts the confidence limits under WLS-based assumptions, which must be explicitly accounted for in any uncertainty-aware DSSE method.


[33] 2605.13394

Decoupled Azimuth Elevation AoA Estimation Exploiting Kronecker Separable Steering Matrices

Uniform rectangular arrays (URA), structured non-uniform rectangular arrays (NURA), and parallelogram-shaped (UPgA and NUPgA) arrays admit steering vectors that can be expressed as the Kronecker product of azimuth and elevation steering vectors. Accordingly, the full steering matrix can be represented as the Khatri-Rao product of the corresponding azimuth and elevation steering matrices. This paper exploits this structure to develop an economical subspace decoupling framework for two-dimensional angle of arrival (AoA) estimation. The proposed method first extracts the joint signal subspace from the spatial covariance matrix. Then it applies a low-complexity decoupling scheme to recover the column spaces of the azimuth and elevation steering matrices. With the estimated decoupled subspaces, conventional one-dimensional algorithms such as MUSIC, root MUSIC, and ESPRIT can be applied independently along each dimension, followed by pairing through a two-dimensional spectral function. Monte Carlo simulations show that the proposed approach achieves higher accuracy than state-of-the-art methods, i.e., two-dimensional MUSIC, reduced-dimension MUSIC, and two-dimensional ESPRIT, for medium- and large-scale arrays while requiring fewer snapshots, and consequently offers improved spectral efficiency.
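
The Kronecker and Khatri-Rao structure mentioned above can be checked numerically for a uniform rectangular array. In this sketch, mu and nu are hypothetical per-axis spatial frequencies (their mapping from azimuth and elevation depends on the array geometry, which the abstract does not give), and the function names are illustrative.

```python
import cmath

def steering(n, mu):
    """1-D uniform-array steering vector with phase step mu between elements."""
    return [cmath.exp(1j * mu * k) for k in range(n)]

def kron(a, b):
    """Kronecker product of two vectors (a-index major, b-index minor)."""
    return [x * y for x in a for y in b]

def khatri_rao(A, B):
    """Column-wise Kronecker product; A and B are lists of columns, one per source."""
    return [kron(a, b) for a, b in zip(A, B)]

M, N = 3, 4                      # array size along the two axes (assumed)
mu, nu = 0.7, -1.1               # hypothetical spatial frequencies of one source
full = kron(steering(M, mu), steering(N, nu))
# Element (m, n) of the rectangular array sits at index m*N + n of the full vector.
print(abs(full[2 * N + 3] - cmath.exp(1j * (mu * 2 + nu * 3))))
```

For several sources, `khatri_rao` applied to the two lists of per-axis steering vectors yields the columns of the full steering matrix, which is the separability a decoupled estimator can exploit.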


[34] 2605.13453

Learning a Contracting KKL-observer with Local Optimal Guarantees

The Kazantzis-Kravaris-Luenberger (KKL) observer provides a general framework for nonlinear state estimation by immersing the system dynamics into a stable linear or nonlinear latent dynamics. However, the performance of KKL observers relies heavily on the specific choice of these latent dynamics, which is often heuristic. This paper proposes a methodology to learn a KKL observer that combines global stability guarantees with local optimality. We derive a condition on the latent dynamics such that the observer locally mimics the behavior of a Minimum Energy Estimator (Mortensen observer). We then employ Deep Learning to approximate the KKL transformation and the latent dynamics, using neural network architectures that structurally enforce the contraction property. The proposed strategy is validated through numerical simulations on nonlinear benchmarks, demonstrating good performance in the presence of state and measurement noise.


[35] 2605.13502

A Multi-Modal Intelligent U2V Channel Model for 6G Sensing-Communication Integration

This paper proposes a novel UAV-to-Vehicle (U2V) channel model for sixth-generation (6G) intelligent sensing-communication integration, based on three-dimensional (3D) scatterer prediction. To explore the mapping relationship between physical environment and electromagnetic space, a new high-fidelity mixed sensing-communication integration U2V simulation dataset under wide-lane scenarios with different vehicular traffic densities (VTDs) and UAV heights is constructed. Based on the constructed dataset, a novel 3D Scatterer Prediction and Distribution Estimation (3D-SPADE) algorithm is proposed, which leverages LiDAR point clouds to accurately predict the spatial distribution of scatterers. Furthermore, the clustering of scatterers and the subsequent classification into dynamic and static types are meticulously designed for highly dynamic U2V scenarios, while reducing computational complexity and improving modeling accuracy. As LiDAR point clouds vary over time, dynamic and static clusters evolve via 3D-SPADE, enabling precise modeling of channel non-stationarity and consistency. Simulation results demonstrate that, in the wide-lane scenario with varying VTDs and UAV heights, the proposed 3D-SPADE consistently achieves high scatterer occupancy detection performance within the voxel grid. In particular, under favorable configurations, recall reaches 93.26%, and precision reaches 95.74%, highlighting the reliability of 3D-SPADE. Key channel statistical characteristics are simulated and analyzed. These characteristics from the simulation experiments are highly consistent with ray-tracing results and agree with them more closely than those of the standardized model and the inconsistent model, validating the necessity of exploring the mapping relationship and the effectiveness of the proposed model.


[36] 2605.13516

Sensing-Assisted LoS/NLoS Identification in Dynamic UAV Positioning Systems

In this paper, a sensing-assisted non-line-of-sight (NLoS) identification method for dynamic uncrewed aerial vehicle (UAV) positioning is proposed for the first time. For urban UAV-to-ground scenarios, a new multi-modal sensing-communication integrated dataset is constructed to support line-of-sight (LoS)/NLoS identification, covering two typical urban scenarios and a wide range of flight altitudes. Based on the constructed dataset, a novel dual-input feature fusion network is proposed, which addresses the challenge of heterogeneous representations between RGB images and channel impulse response (CIR) data to enable the joint extraction and fusion of sensing and communication features for LoS/NLoS identification. Simulation results show that the identification accuracy can reach up to 97.69%, while achieving an improvement of at least 3.59% compared to traditional CIR-only and RGB-only methods. Moreover, strong few-shot generalization is observed, as the proposed method stabilizes and approaches full-sample performance with fewer than 200 target samples and exceeds traditional CIR-only and RGB-only methods with fewer than 100 target samples in all cross-scenario and cross-altitude experiments. Even under Gaussian noise with a variance of 0.35 applied to RGB images, the accuracy degradation remains approximately 0.5%. By utilizing the proposed LoS/NLoS identification method, the error of trilateration positioning can be reduced by approximately 70% in a crossroad scenario, verifying the utility of the proposed method.


[37] 2605.13524

Manifold-Aware Information Gain and Lower Bounds for Gaussian-Process Bandits on Riemannian Quotient Spaces

We prove a regret lower bound for Gaussian-process bandits on a smooth compact Riemannian manifold $\M$ of dimension $d$ with intrinsic Matérn-$\nu$ kernel ($\nu>d/2$) that exposes how the geometry of the arm space enters the constant. For any algorithm and time horizon $T$ exceeding an explicit threshold, the worst-case expected regret over the RKHS-ball $\|f\|_{\Hil_{k_\nu}}\!\le\!B$ satisfies \begin{multline*} \E[R_T(f)]\;\ge\;c_*(d,\nu)\,B^{d/(2\nu+d)}\,\sigma_n^{2\nu/(2\nu+d)} \\ \cdot\,\vol_g(\M)^{\nu/(2\nu+d)}\,T^{(\nu+d)/(2\nu+d)}(\log T)^{\nu/(2\nu+d)}. \end{multline*} The exponent matches the Vakili--Khezeli--Picheny upper bound \cite{vakili2021information}; the $\vol_g(\M)^{\nu/(2\nu+d)}$ factor is, to our knowledge, the first explicit volume-dependent geometric constant in a manifold GP-bandit lower bound. We extend the analysis in five directions: (i)~a companion Assouad-style proof gives a different lower bound with a strictly smaller $T$-exponent $(2\nu+3d)/(4(\nu+d))$ but with a polylog factor of the form $1/(\log\log T)^{(2\nu+d)/(4(\nu+d))}$, sharpening the $(\log T)^{\nu/(2\nu+d)}$ Fano polylog of Theorem~\ref{thm:main}; (ii)~we prove a $|G|^{1/2}$ upper bound on the regret of an extrinsic-kernel GP-UCB algorithm on a quotient space $\M=\Mt/G$, plus a bracketing theorem (Theorem~\ref{thm:gauge-bracket}); the precise constant is conjectured to take the modulated form $(1+(|G|-1)h(\rinj/\kappa))^{1/2}$ (Conjecture~\ref{conj:gauge-modulated}), validated numerically on $\SO(3)$; (iii)~we write the leading constant $c_*(d,\nu)$ out fully; (iv)~we extract a curvature dependence $1+O(K\eps_T^2)$ via Bishop--Gromov; (v)~we transfer the bound to the Bayesian regret framework via the Yang--Barron / Castillo et al.\ Bayesian-Fano transfer.


[38] 2605.13529

Decentralized Frequency-Domain Conditions for D-Stability with Application to DC Microgrids

This paper proposes a decentralized method for regional pole placement, or $\mathcal{D}$-stability, in linearized networked systems. Existing LMI-based methods are hindered by confidentiality concerns regarding proprietary subsystem models and the absence of communication infrastructures. To overcome these barriers, we map the target region $\mathcal{D}$ of pole placement to an auxiliary left-half plane and introduce positive functions to handle the resulting complex-coefficient dynamics. We prove that $\mathcal{D}$-stability is guaranteed via local frequency-domain criteria without requiring shared subsystem models or inter-subsystem communication. This method is then tailored to DC microgrids, where a loop transformation is utilized to reallocate the burden of stability certification, deriving a broadcastable grid code for decentralized parameter synthesis. Numerical examples verify the efficacy of the proposed method.


[39] 2605.13580

Joint Segment Activation and Antenna Placement for Uplink SWAN Systems

This article analyzes the achievable sum-rate of multiuser uplink segmented waveguide-enabled pinching-antenna systems (SWANs). To unveil system-design insights, an upper bound on the achievable sum-rate is derived, based on which the existence of an optimal segment activation level is theoretically established. Motivated by this result, hybrid segment selection and aggregation (HSS/A) schemes are proposed to jointly optimize segment activation and pinching-antenna (PA) placement. Correspondingly, low-complexity greedy algorithms are developed for the considered optimization problem. Numerical results validate the theoretical analysis and demonstrate that the proposed HSS/A schemes outperform conventional full-segment aggregation.


[40] 2605.13661

Air-Sea Surface Modeling and Operating Link Range Evaluation for AUV-to-UAV Optical Wireless Communication Links

Air-sea surface interactions play a critical role in underwater-to-air optical wireless communication (OWC) links, particularly in vertical autonomous underwater vehicle (AUV) to unmanned aerial vehicle (UAV) scenarios, where the stochastic nature of the sea surface introduces optical distortions that impair link reliability. This work investigates the impact of air-sea surface roughness on AUV-to-UAV OWC systems using two experimentally validated models: the classical Cox-Munk and the Elfouhaily-Chapron-Katsaros-Vandemark (ECKV). A tractable analytical representation of the ECKV model is derived and validated against measured sea-state data. Using both analytical and Monte Carlo approaches, the link ergodic capacity is evaluated with particular emphasis on operating range, pointing errors, receiver field-of-view, and solar noise level, providing practical system design insights.


[41] 2605.13669

Bounded-Input True Proportional Navigation for Impact-Time Control

This paper proposes a nonlinear guidance strategy capable of intercepting a constant-velocity, non-maneuvering target while strictly satisfying the prescribed bounds on the control input (commanded acceleration). Unlike conventional strategies that estimate time-to-go using linearization or small-angle approximations, the proposed strategy employs true proportional-navigation guidance (TPNG) as a baseline, which utilizes an exact time-to-go formulation and is applicable over a wide range of target motions. In contrast to most existing strategies, which do not incorporate control input bounds into the guidance design, the proposed approach explicitly accounts for these limits by modeling the interceptor acceleration as a dynamic variable. Based on the sliding mode control technique, an effective guidance law that achieves time-constrained interception while accounting for bounded input is then derived. The performance of the proposed strategy is evaluated for various engagement scenarios.


[42] 2605.13720

An Underwater Dehazing Network with Implicit Transmission Estimation

Underwater images suffer from wavelength-dependent light absorption and scattering, which reduces visual quality. This phenomenon could limit the operational reliability of autonomous underwater vehicles, marine surveys, and offshore inspection systems. Purely classical methods often achieve suboptimal performance in real-world datasets, while purely data-driven methods lack physical interpretability. In this letter, we propose UDehaze-iT, a deep network for underwater image enhancement that estimates scene depth implicitly and derives per-channel transmission through the Beer-Lambert law with learnable attenuation coefficients. We estimate atmospheric light as a semi-classical per-channel scalar, and a zero-initialized residual refiner corrects remaining artefacts after dehazing. To effectively train our method, we apply a composite loss function consisting of five key terms: an L1 loss, a multi-scale patchwise DCT loss, a forward model reconstruction loss, and two regularization terms. With ~0.9M parameters, UDehaze-iT achieves competitive performance on UIEB and UFO-120 datasets.
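
The Beer-Lambert step can be illustrated for a single pixel as follows. The image formation model I = J t + A (1 - t), the clamp t_min, and all numeric values are assumptions commonly used in dehazing and chosen for this sketch; the abstract does not spell out the exact forward model, and in the paper the attenuation coefficients are learned rather than fixed.

```python
import math

def transmission(depth, beta):
    """Per-channel Beer-Lambert transmission t_c = exp(-beta_c * depth)."""
    return [math.exp(-b * depth) for b in beta]

def forward(scene, light, t):
    """Assumed formation model: observed = scene * t + light * (1 - t), per channel."""
    return [j * tc + a * (1.0 - tc) for j, a, tc in zip(scene, light, t)]

def dehaze(observed, light, t, t_min=0.05):
    """Invert the assumed model, clamping t to avoid dividing by near-zero values."""
    out = []
    for i, a, tc in zip(observed, light, t):
        tc = max(tc, t_min)
        out.append((i - a * (1.0 - tc)) / tc)
    return out

beta = (0.6, 0.2, 0.1)      # hypothetical R, G, B attenuation coefficients (red decays fastest)
light = (0.1, 0.5, 0.6)     # hypothetical per-channel background light
scene = (0.8, 0.4, 0.3)     # hypothetical clean pixel
t = transmission(2.0, beta)
print(dehaze(forward(scene, light, t), light, t))
```

Running the forward model and then the inversion recovers the clean pixel, which is the consistency a forward-model reconstruction loss would encourage.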


[43] 2605.13836

Reachable-Set Decomposition for Real-Time Aggregation of Multi-Zone HVAC Fleets

Aggregating building heating, ventilation, and air-conditioning (HVAC) fleets provides substantial real-time flexibility to power system operations. However, real-time aggregation of multi-zone HVAC fleets faces two key challenges: (i) strong coupling across zones and time makes flexibility characterization high-dimensional and computationally demanding, and (ii) the sequential revelation of temperature states and exogenous conditions requires that decisions made at each period preserve feasibility over the remaining horizon using only currently realized information. To address these challenges, this paper proposes a reachable-set decomposition framework comprising an offline decomposition stage and a real-time policy. In the offline stage, backward reachable sets are formulated to encode remaining-horizon feasibility into per-period state constraints, so that any state within the current reachable set is guaranteed to sustain feasible operation over the entire remaining horizon. A tailored inner approximation is then developed for tractable calculation in multi-zone-coupled HVAC settings. In the real-time stage, aggregate flexibility is computed efficiently via building-level parallel linear programs followed by closed-form Minkowski summation of power intervals, and any regulation signal within the reported flexibility interval admits a recursively feasible disaggregation. Case studies demonstrate the effectiveness of the proposed framework in aggregate flexibility characterization, disaggregation feasibility, and scalable computation.
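
The closed-form Minkowski summation of power intervals mentioned above amounts to adding lower and upper bounds. The sketch below shows that step together with a simple proportional disaggregation; the proportional rule and the function names are assumptions for illustration and do not reproduce the paper's recursively feasible policy, which also depends on the reachable-set constraints.

```python
def aggregate(intervals):
    """Minkowski sum of 1-D intervals [lo_i, hi_i]: add the lower and upper bounds."""
    lo = sum(l for l, _ in intervals)
    hi = sum(h for _, h in intervals)
    return lo, hi

def disaggregate(signal, intervals):
    """Split an aggregate power signal so every building stays inside its own interval."""
    lo, hi = aggregate(intervals)
    if not lo <= signal <= hi:
        raise ValueError("signal outside the aggregate flexibility interval")
    alpha = 0.0 if hi == lo else (signal - lo) / (hi - lo)
    return [l + alpha * (h - l) for l, h in intervals]

# Hypothetical per-building power intervals in kW for the current period.
buildings = [(1.0, 3.0), (2.0, 5.0), (0.0, 1.0)]
print(aggregate(buildings))
print(disaggregate(6.0, buildings))
```

Any signal inside the summed interval can be split this way because each share is a convex combination of that building's own bounds.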


[44] 2605.12506

Scale-Gest: Scalable Model-Space Synthesis and Runtime Selection for On-Device Gesture Detection

Realizing on-device ML-based gesture detection under tight real-time performance, energy and memory constraints is challenging, especially when considering mobile devices with varying battery-power levels. Existing EdgeAI deployments typically rely on a single fixed detector, limiting optimization opportunities. We present Scale-Gest, a novel run-time adaptive gesture detection framework that expands the detector space into a dense family of tiny-YOLO architectures. We introduce multiple novel device-calibrated ACE (Accuracy-Complexity-Energy) profiles by analyzing different model-resolution-stride operating points. A lightweight run-time controller selects an appropriate ACE mode under user-defined and battery constraints, while a motion-aware hand-gesture-tracking ROI gate crops the input for reduced complexity detection. To evaluate performance of our system in real-world car driving scenarios, we introduce a temporally-annotated Driver Simulated Gesture (DSG-18) dataset. Scale-Gest maintains event-level F1 while significantly reducing energy and latency compared to single-detector approaches. On a battery-powered laptop running gesture streams, our ACE controller reduces per-frame energy by 4x (from 6.9 mJ to 1.6 mJ) while maintaining high gesture-detection performance (event-level F1 = 0.8-0.9) and low mean latency (6 ms).


[45] 2605.12552

DQN-Driven Adaptive Neighbor Discovery for Directional Aerial Networks

Directional antenna systems are gaining substantial traction for aerial networks due to their higher gain, extended transmission range, and enhanced security. However, the requirement of beam alignment makes the task of finding and reaching neighbors challenging, particularly in a mobile setting. For wireless networks, privacy concerns play an equally critical role. However, the problem of ensuring network-wide connectivity while maintaining limited exposure when probing around is still unexplored. We address this trade-off by proposing an adaptive transceiver selection protocol based on the Deep Q-Network (DQN) framework. Each node acts as an independent DQN agent and interacts with the environment to learn how to balance the trade-off. Since the directional nodes operate only based on local observations, we adopt a weighted mechanism that guides them in prioritizing either high reachability or privacy by adaptively tuning the probing patterns. Results show that the DQN framework surpasses the Random and Q-Learning baselines. Weights favoring discovery provide higher probing efficiency and reachability, while weights prioritizing privacy ensure limited exposure at the cost of low reachability, eventually attaining higher objective value.


[46] 2605.12771

Adaptive Smooth Tchebycheff Attention for Multi-Objective Policy Optimization

Multi-objective reinforcement learning in robotic domains requires balancing complex, non-convex trade-offs between conflicting objectives. While linear scalarization methods provide stability, they are theoretically incapable of recovering solutions within non-convex regions of the Pareto front. Conversely, static non-linear scalarizations (e.g., Tchebycheff) can theoretically access these regions but often suffer from severe gradient variance and optimization instability in deep RL. In this work, we propose an Adaptive Smooth Tchebycheff framework that resolves this tension by dynamically modulating the curvature of the optimization landscape. We introduce a novel conflict-driven controller that regulates the optimization smoothness based on real-time gradient interference. This allows the agent to anneal toward precise, non-convex scalarization when objectives align, while elastically reverting to stable, smooth approximations when destructive gradient conflicts emerge. We validate our approach on a challenging robotic stealth visual search task -- a proxy for monitoring of protected/fragile ecosystems -- where an agent must balance search, exposure/interference minimization and exploration speed. Extensive ablations confirm that our conflict-aware adaptation enables the robust discovery of Pareto-optimal policies in non-convex regions inaccessible to linear baselines and unstable for static non-linear methods. Website: this https URL
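
For readers unfamiliar with the scalarizations being contrasted, the sketch below compares the hard Tchebycheff scalarization with a log-sum-exp smoothing whose parameter mu controls the curvature. This is the commonly used smooth form, assumed here for minimization objectives; the abstract does not state the exact formula, and the conflict-driven controller that adapts the smoothness is not reproduced.

```python
import math

def tchebycheff(values, weights, ideal):
    """Hard scalarization: the worst weighted gap to the ideal point."""
    return max(w * (f - z) for f, w, z in zip(values, weights, ideal))

def smooth_tchebycheff(values, weights, ideal, mu):
    """Log-sum-exp smoothing of the max; approaches the hard form as mu -> 0."""
    terms = [w * (f - z) / mu for f, w, z in zip(values, weights, ideal)]
    shift = max(terms)  # subtract the max before exponentiating for numerical stability
    return mu * (shift + math.log(sum(math.exp(t - shift) for t in terms)))

values, weights, ideal = [1.0, 3.0], [0.5, 0.5], [0.0, 0.0]   # hypothetical objective values
print(tchebycheff(values, weights, ideal))
print(smooth_tchebycheff(values, weights, ideal, mu=1.0))
print(smooth_tchebycheff(values, weights, ideal, mu=0.01))
```

A large mu gives a smooth surrogate whose gradient blends all objectives, while a small mu concentrates the gradient on the worst objective, which is the trade-off an adaptive schedule would move along.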


[47] 2605.12785

Identifying the nonlinear string dynamics with port-Hamiltonian neural networks

Hybrid machine learning combines physical knowledge with data-driven models to enhance interpretability and performance. In this context, Port-Hamiltonian Systems (PHS), which generalize Hamiltonian mechanics to describe open, non-autonomous dynamical systems, have been successfully integrated with neural networks under the name Port-Hamiltonian Neural Networks (PHNNs). While the ability of PHNNs to identify Hamiltonian ordinary differential equation (ODE) systems has already been demonstrated, their application to learning Hamiltonian partial differential equation (PDE) systems remains largely unexplored. This limitation restricts their use in musical acoustics, where instruments are typically modeled as distributed parameter systems governed by PDEs. In this work, we demonstrate how to learn the nonlinear string dynamics from data in a physically-consistent framework through a PHNN extension to PDEs. By constructing structured neural network architectures based on PHS, we can recover both the Hamiltonian governing the string and the dissipation affecting it. This approach outperforms baseline, non-physics-informed methods in terms of both accuracy and interpretability. Numerical experiments using synthetic data demonstrate the ability of the proposed PHNN model to identify and emulate the nonlinear dynamics of the system.


[48] 2605.12794

Dynamic Transaction Scheduling and Pricing in the Ethereum Mempool

The Ethereum blockchain utilizes the EIP-1559 algorithm to manage transaction inclusion and block assembly. However, EIP-1559 and much of the existing literature study this problem from a static perspective, focusing on price evolution without modelling transaction dynamics within the mempool. Motivated by this limitation, we study a dynamic transaction scheduling problem in which transactions with heterogeneous sizes and per-unit values arrive over time and remain in the mempool until scheduled. To capture the stochastic mempool evolution, we formulate the problem as a Markov Decision Process (MDP) whose state represents the mempool configuration and whose actions correspond to block prices. We first provide a primal-dual interpretation of the static EIP-1559 mechanism, showing that block prices arise naturally as dual variables of a social-welfare maximization problem. Building on this perspective, we extend the framework to the dynamic setting and formulate an objective that maximizes long-run discounted reward while incorporating holding costs and overshoot penalties. We then employ a Natural Policy Gradient (NPG) algorithm to compute the optimal policy. Our results show that dynamic pricing stabilizes the mempool while maximizing long-run discounted reward. In particular, as the overshoot penalty increases, the average scheduled transaction volume converges to the target block capacity, and the resulting NPG updates closely resemble the EIP-1559 price update rule. Finally, we study two special cases of the MDP formulation: homogeneous transactions and uniform arrivals. In the homogeneous setting, where the protocol directly controls scheduled volume, we show that the optimal policy has a threshold structure. We then propose a bang-bang pricing mechanism for uniform arrivals and derive a lower bound on the block capacity needed to ensure system stability.
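
For reference, the static EIP-1559 price update rule that the abstract compares against can be sketched as follows. This is a floating-point simplification of the integer arithmetic in the specification, and it is not the NPG policy studied in the paper; the function and parameter names are illustrative.

```python
def eip1559_next_base_fee(base_fee, gas_used, gas_target, max_change_denominator=8):
    """Simplified EIP-1559 base fee update (float version of the integer rule).

    The base fee rises when a block uses more gas than the target and falls
    when it uses less. With the default denominator of 8 and blocks capped at
    twice the target, the fee changes by at most 12.5% per block.
    """
    relative_excess = (gas_used - gas_target) / gas_target
    return base_fee * (1.0 + relative_excess / max_change_denominator)

# Illustrative values: a 15M gas target with full, on-target, and empty blocks.
full_block = eip1559_next_base_fee(100.0, 30_000_000, 15_000_000)
on_target = eip1559_next_base_fee(100.0, 15_000_000, 15_000_000)
empty_block = eip1559_next_base_fee(100.0, 0, 15_000_000)
```

The update depends only on the previous block's gas usage, which is the static perspective the abstract contrasts with a mempool-state-dependent policy.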


[49] 2605.12829

Optimal excitation and measurement patterns for networks with tree topology

In this work we evaluate the excitation and measurement patterns (EMP) for networks with tree topology. We investigate guidelines for the selection of the minimal EMPs, i.e. those with the least number of excited and measured nodes combined, for which the accuracy obtained, in terms of the trace of the asymptotic covariance matrix, is optimal. We introduce the concept of partial information matrix as a means to systematically obtain the information matrix for any dynamic network. For a specific tree class, called cross, we show that the accuracy of a particular module depends on the magnitude of the parameters to be estimated. Furthermore, when all factors are equal, it is best to excite. We extend a topological condition for branches under which the accuracy of a particular module of the network is independent of the other parameters from the tree. We provide a numerical analysis showing that our guidelines could be used as a selection tool for minimal EMPs for tree networks.


[50] 2605.12974

Distributionally Robust Safety Under Arbitrary Uncertainties: A Safety Filtering Approach

In this work, we study how to ensure probabilistic safety for nonlinear systems under distributional ambiguity. Our approach builds on a backup-based safety filtering framework that switches between a high-performance nominal policy and a certified backup policy to ensure safety. To handle arbitrary uncertainties from ambiguous distributions, i.e., where the distribution is not of specific structure and the true distribution is unknown, we adopt a distributionally robust (DR) formulation using Wasserstein ambiguity sets. Rather than solving a high-dimensional DR trajectory optimization problem online, we exploit the structure of backup-based safety filtering to reduce safety certification to a one-dimensional search over the switching time between nominal and backup policies. We then develop a sampling-based certification procedure with finite-sample guarantees, where empirical failure probabilities are compared against a Wasserstein-inflated threshold. We validate our method through simulations across three systems, from a Dubins vehicle to a high-speed racing car and a fighter jet, demonstrating the broad applicability and computational efficiency.


[51] 2605.13010

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models

We study image inpainting with generative diffusion models. Existing methods typically either train dedicated task-specific models, or adapt a pretrained diffusion model separately for each masked image at deployment. We introduce a middle-ground model, termed Amortized Inpainting with Diffusion (AID), which keeps a pretrained diffusion backbone fixed, trains a small reusable guidance module offline, and then reuses it across masked images without per-instance optimization. We formulate it as a deterministic guidance problem with a supervised terminal objective. To make this problem learnable in high dimensions, we derive an auxiliary Gaussian formulation and prove that solving this randomized problem recovers the optimal deterministic guidance field. This bridge yields a principled continuous-time actor--critic algorithm for learning the guidance module in a fully data-driven manner. Empirically, on AFHQv2 and FFHQ under the pixel EDM pipeline and on ImageNet under the latent EDM2 pipeline, AID consistently improves the quality--speed trade-off over strong fixed-backbone and amortized inpainting baselines across multiple mask types, while adding less than one percent trainable overhead.


[52] 2605.13028

Local Conformal Calibration of Dynamics Uncertainty from Semantic Images

We introduce Observation-aware Conformal Uncertainty Local-Calibration (OCULAR), a conformal prediction-based algorithm that uses perception information to provide uncertainty quantification guarantees for unseen test-time environments. While previous conformal approaches lack the ability to discriminate between state-action space regions leading to higher or lower model mismatch, and require environment-specific data, our method uses data collected from visually similar environments to provably calibrate a given linear Gaussian dynamics model of arbitrary fidelity. The prediction regions generated from OCULAR are guaranteed to contain the future system states with, at least, a user-set likelihood, despite both aleatoric and epistemic uncertainty -- i.e., uncertainty arising from both stochastic disturbances and lack of data. Our guarantees are non-asymptotic and distribution-free, not requiring strong assumptions about the unknown real system dynamics. Our calibration procedure enables distinguishing between observation-velocity-action inputs leading to higher and lower next-state-uncertainty, which is helpful for probabilistically-safe planning. We numerically validate our algorithm on a double-integrator system subject to random perturbations and significant model mismatch, using both a simplified sensor and a more realistic simulated camera. Our approach appropriately quantifies uncertainty both when in-distribution and out-of-distribution, being comparatively volume-efficient to baselines requiring environment-specific data.
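
The non-asymptotic, distribution-free guarantee mentioned above is characteristic of conformal prediction. A minimal sketch of the generic split conformal calibration step is shown below; it is not OCULAR's observation-aware local calibration, and the function name and score values are illustrative assumptions.

```python
import numpy as np

def conformal_radius(cal_scores, alpha):
    """Split conformal quantile of calibration nonconformity scores.

    cal_scores could be, for example, the norms of the model's one-step
    prediction errors on held-out data. With n exchangeable calibration
    scores, a new score falls at or below the returned radius with
    probability at least 1 - alpha.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return scores[k - 1]

# Illustrative scores only.
radius = conformal_radius([3.0, 1.0, 2.0], alpha=0.5)
too_few = conformal_radius([1.0], alpha=0.1)
```

A prediction region is then the set of states within `radius` of the model's prediction; a local method would compute a different radius for different regions of the input space.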


[53] 2605.13103

Guaranteed cost structured control in infinite-horizon linear-quadratic cooperative differential games

In this paper, we consider infinite-horizon linear-quadratic cooperative differential games with output feedback information structure. We first demonstrate that, under output feedback information structure, computing Pareto optimal controls can be difficult even for simple low-dimensional differential games. To address this issue, this paper introduces the concept of feedback guaranteed cost structured control (GCSC). The feedback GCSC concept is inspired by suboptimal control. At a feedback GCSC, the total weighted team cost remains below a prescribed threshold while satisfying the structural constraints. We derive fundamental properties of the feedback GCSC and the admissible weight set, including their monotonicity properties. In particular, we show that if Pareto optimal controls exist, they belong to the class of feedback GCSCs. We also quantify the suboptimality of Pareto optimal controls (if they exist) and the proposed GCSC with respect to output feedback optimal control. Furthermore, we provide the conditions for verification and the synthesis of a feedback GCSC. Finally, we illustrate the effectiveness of the proposed approach through numerical examples, including a case study on tracking synchronization in a microgrid.


[54] 2605.13133

KAST-BAR: Knowledge-Anchored Semantically-Dynamic Topology Brain Autoregressive Modeling for Universal Neural Interpretation

While EEG foundation models have shown significant potential in universal neural decoding across tasks, their advancement remains constrained by the inadequate modeling of complex spatiotemporal topology, as well as the inherent modality gap between low-level physiological signals and high-level textual semantics. To address these challenges, we propose a Knowledge-Anchored Semantically-Dynamic Topology Brain Autoregressive Model (KAST-BAR), which dynamically aligns physiological representations derived from multi-level brain topology with an expert-level semantic space. Specifically, we design a Dual-Stream Hierarchical Attention (DSHA) encoder that accurately captures the brain's intrinsic non-Euclidean topology by modeling local temporal dynamics with global spatial contexts. On this basis, a Knowledge-Anchored Semantic Profiler (KASP) is proposed to synthesize physically-grounded and instance-level textual profiles, which subsequently drive a Semantic Text-Aware Refiner (STAR) to dynamically reconstruct EEG representations using Latent Expert Queries. By conducting large-scale pre-training on 21 diverse datasets to build a foundation model, KAST-BAR effectively integrates expert-level medical knowledge into EEG signal representations, consistently achieving superior performance across six downstream tasks. Our code is available at this https URL


[55] 2605.13302

Safe Bayesian Optimization for Uncertain Correlations Matrices in Linear Models of Co-Regionalization

This paper extends safety guarantees for multi-task Bayesian optimization with uncertain correlation matrices from intrinsic co-regionalization models to linear models of co-regionalization. The latter allows for more flexible modeling of the inter-task correlations by composing multiple features. We derive uniform error bounds for vector-valued functions sampled from a Gaussian process with a linear model of co-regionalization kernel. Furthermore, we show the potential improvement of performance using linear models of co-regionalization in a numerical comparison on a safe multi-task Bayesian optimization benchmark.


[56] 2605.13315

Embodied Neurocomputation: A Framework for Interfacing Biological Neural Cultures with Scaled Task-Driven Validation

Biological neural networks (BNNs) have been established as a powerful and adaptive substrate that offers the potential for highly energy- and data-efficient information processing with distinct learning mechanisms. Yet a core challenge to utilizing BNNs for neurocomputation is determining the optimal encoding and decoding mechanisms between the traditional silicon computing interface and the living biology. Here, we propose an Embodied Neurocomputation framework as a systems-level approach to this multi-variable optimization encoding/decoding problem. We operationalize this approach through the first large-scale parameter optimization of encoding configurations for a BNN agent performing closed-loop navigation along an odor-style gradient in a simulated grid-world. Despite the relative simplicity of the task, the biological interactions gave rise to a massive multi-combinatorial search space for optimal parameters. By considering how the components of the system are interconnected and parameterized, we evaluated approximately 1,300 parameter combinations, over 4,000 hours of real-time agent-environment interactions, to identify 12 configurations that consistently demonstrated learning across multiple episodes. These configurations achieved significantly higher task performance than optimized silicon-based DQN agents under the same interaction budget. These findings represent an initial step toward robust and scalable goal-oriented learning using BNNs. Our framework establishes a foundation for applying task-driven neurocomputing and supports the development of field-wide benchmarks. In the long term, this work supports the development of hybrid bio-silicon architectures capable of efficient, adaptive and real-time computation, including the potential for robotic control applications.


[57] 2605.13587

Reframing preprocessing selection as model-internal calibration in near-infrared spectroscopy: A large-scale benchmark of operator-adaptive PLS and Ridge models

Near-infrared spectroscopy (NIRS) is rapid and non-destructive, but reliable calibration still depends heavily on spectral preprocessing. In routine practice, preprocessing is often selected by large external pipeline searches that are costly, unstable on small calibration sets, and difficult to audit. We introduce operator-adaptive calibration, a framework that moves linear preprocessing selection inside the calibration model. Candidate treatments are encoded as linear spectral operators, while nonlinear or sample-adaptive corrections such as SNV, MSC, and ASLS are handled as fold-local branches to prevent leakage. We instantiate the framework for PLS and Ridge regression. For PLS, covariance identities enable fast NIPALS and SIMPLS variants while preserving original-wavelength coefficients. For Ridge, operator-adaptive kernels yield a dual formulation with recoverable original-space coefficients. The approach was evaluated on more than 50 heterogeneous NIRS datasets against conventional PLS, Ridge, CatBoost, and CNN baselines under documented search budgets. Compact operator-adaptive PLS with ASLS branch preprocessing achieved a median RMSEP/PLS ratio of 0.960 with 42 wins on 57 datasets, while a deployable AOM-Ridge selector improved over tuned Ridge by a median 2.22% with 35 wins on 52 datasets. The proposed models reduce dependence on large preprocessing-HPO campaigns, produce traceable operator choices, retain interpretable coefficients, and fit in seconds for compact AOM-PLS. Operator-adaptive calibration therefore offers a practical route to faster, more robust, and more auditable NIRS method development.


[58] 2605.13713

Learning to Optimize Radiotherapy Plans via Fluence Maps Diffusion Model Generation and LSTM-based Optimization

Volumetric Modulated Arc Therapy (VMAT) is a cornerstone of modern radiation therapy, enabling highly conformal tumor irradiation and healthy-tissue sparing. Yet, its planning solves inverse and nested optimization for multi-leaf collimators, monitor units and dose parameters, while enforcing their consistency to ensure mechanical deliverability. Nevertheless, this process often requires repeated re-optimization when treatment configurations change, resulting in substantial planning time per patient. To address these problems, we present a diffusion-driven Learning-to-Optimize (L2O) method for end-to-end VMAT planning. A distribution-matching distilled diffusion model learns a clinically feasible manifold of fluence maps, enabling their one-shot generation. On top of this, an LSTM-based L2O module learns gradient update dynamics to swiftly refine fluence maps toward prescribed dose objectives during inference. Experimental results on clinical and public prostate cancer cohorts demonstrate improved planning efficiency, flexibility, and machine deliverability over currently available end-to-end VMAT planners.


[59] 2605.13748

TinySDP: Real Time Semidefinite Optimization for Certifiable and Agile Edge Robotics

Semidefinite programming (SDP) provides a principled framework for convex relaxations of nonconvex geometric constraints in motion planning, yet existing solvers are too computationally expensive for real-time control, particularly on resource-constrained embedded systems. To address this gap, we introduce TinySDP, the first semidefinite programming solver designed for embedded systems, enabling real-time model-predictive control (MPC) on microcontrollers for problems with nonconvex obstacle constraints. Our approach integrates positive-semidefinite cone projections into a cached-Riccati-based ADMM solver, leveraging computational structure for embedded tractability. We pair this solver with an a posteriori rank-1 certificate that converts relaxed solutions into explicit geometric guarantees at each timestep. On challenging benchmarks, e.g., cul-de-sac and dynamic obstacle avoidance scenarios that induce failures in local methods, TinySDP achieves collision-free navigation with up to 73% shorter paths than state-of-the-art baselines. We validate our approach on a Crazyflie quadrotor, demonstrating that semidefinite constraints can be enforced at real-time rates for agile embedded robotics.
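
The positive-semidefinite cone projection mentioned above is the standard step that distinguishes an SDP-capable ADMM solver from a QP-only one. A minimal NumPy sketch of that projection follows; it illustrates the textbook operation, not TinySDP's embedded implementation, and the function name is an assumption.

```python
import numpy as np

def project_psd(M):
    """Euclidean (Frobenius-norm) projection of a square matrix onto the PSD cone.

    The matrix is symmetrized first, then negative eigenvalues are clipped to zero.
    """
    S = 0.5 * (M + M.T)                        # symmetric part
    eigvals, eigvecs = np.linalg.eigh(S)       # eigendecomposition of a symmetric matrix
    clipped = np.clip(eigvals, 0.0, None)      # drop the negative spectrum
    return (eigvecs * clipped) @ eigvecs.T     # V diag(clipped) V^T

# Illustrative input: an indefinite diagonal matrix.
P = project_psd(np.array([[2.0, 0.0], [0.0, -1.0]]))
```

The eigendecomposition dominates the cost of each projection, which is why keeping the projected blocks small matters on a microcontroller.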


[60] 2605.13751

Learning Responsibility-Attributed Adversarial Scenarios for Testing Autonomous Vehicles

Establishing trustworthy safety assurance for autonomous driving systems (ADSs) requires evidence that failures arise from avoidable system deficiencies rather than unavoidable traffic conflicts. Current adversarial simulation methods can efficiently expose collisions, but generally lack mechanisms to distinguish these fundamentally different failure modes. Here we present CARS (Context-Aware, Responsibility-attributed Scenario generation), a framework that integrates responsibility attribution directly into adversarial scenario generation. CARS combines context-aware adversary selection with a generative adversarial policy optimized in closed-loop simulation to construct collision scenarios that are both physically feasible and diagnostically attributable. Across benchmark datasets spanning heterogeneous national traffic environments, CARS consistently discovers feasible collision scenarios with high attribution rates under multiple regulation-prescribed careful and competent driver models. By coupling adversarial generation with normative responsibility assessment, CARS moves simulation testing beyond collision discovery toward the construction of interpretable, regulation-aligned safety evidence for scalable ADS validation.


[61] 2605.13822

Loiter UAV Reinsertion Guidance for Fixed-wing UAV Corridors

This paper considers fixed-wing unmanned aerial vehicle (UAV) corridors comprising a main lane, a circular loiter lane for managing traffic congestion, and transit lanes connecting the two. In particular, we address the problem of conflict-free reinsertion of UAVs from the loiter lane back into the main lane. The loiter lane contains a fixed number of equidistant virtual slots that UAVs can occupy. Reinsertion of loiter UAVs into the main lane becomes essential either due to reduced traffic in the main lane or due to a loiter UAV needing to reach its destination urgently. Given the total number of loiter slots, UAV speed limits, and the minimum safety distance, a guidance algorithm is developed to compute the required speed of a loiter UAV in the transit lane to ensure safe reinsertion. The proposed guidance and automation strategies are validated through numerical simulations.


[62] 2502.09592

A Data-Driven Method for Microgrid System Identification: Physically Consistent Sparse Identification of Nonlinear Dynamics

Microgrids (MGs) play a crucial role in utilizing distributed energy resources (DERs) like solar and wind power, enhancing the sustainability and flexibility of modern power systems. However, the inherent variability in MG topology, power flow, and DER operating modes poses significant challenges to the accurate system identification of MGs, which is crucial for designing robust control strategies and ensuring MG stability. This paper proposes a Physically Consistent Sparse Identification of Nonlinear Dynamics (PC-SINDy) method for accurate MG system identification. By leveraging an analytically derived library of candidate functions, PC-SINDy extracts accurate dynamic models using only phasor measurement unit (PMU) data. Simulations on a 4-bus system demonstrate that PC-SINDy can reliably and accurately predict frequency trajectories under large disturbances, including scenarios not encountered during the identification/training phase, even when using noisy, low-sampled PMU data.


[63] 2506.22117

Safe Multi-Agent Navigation via Constrained HJB-Informed Learning

Multi-agent navigation in unknown and cluttered environments has broad applications, yet remains fundamentally challenging. In particular, dense agent-agent and agent-obstacle reactive interactions can exacerbate the inherent competition between collision-avoidance constraints and goal-reaching objectives. Most existing approaches mitigate this by applying per-step safety filtering on top of a predefined goal-reaching controller or by designing heuristic loss functions that penalize the safety-constraint violation gradient. While effective in sparse environments, these methods still suffer from overly conservative behaviors when interactions become dense. To overcome these limitations, we propose HJB-GNN, a Hamilton-Jacobi-Bellman (HJB)-based learning framework that jointly learns a graph neural network (GNN)-parameterized control barrier function for explicit safety enforcement, a distributed GNN-based navigation policy, and a value function that induces goal-reaching behavior. By exploiting the analytical solution of the constrained HJB equation, the proposed method derives graph-dependent Lagrange multipliers that adaptively balance collision-avoidance and goal-reaching across diverse multi-agent navigation scenarios. Moreover, HJB-GNN supports centralized training with distributed deployment. Extensive simulations and real-world experiments with Crazyflie drone swarms demonstrate its superior safety and goal-reaching performance, as well as strong scalability and generalizability to large-scale teams operating in previously unseen, dense environments.


[64] 2507.02385

Parameter estimation of range-migrating targets using OTFS signals from LEO satellites

This study investigates a communication-centric integrated sensing and communication system that utilizes orthogonal time-frequency space (OTFS) modulated signals emitted by low Earth orbit satellites to estimate the parameters of space targets experiencing range migration, hereinafter referred to as high-speed targets. Leveraging the signal samples produced by off-the-shelf OTFS demodulators, we derive a novel input-output model for the echo generated by a high-speed target when ideal and rectangular shaping filters are employed. Our findings reveal that the target response exhibits a sparse structure in the delay-Doppler domain, whose support is determined by the target initial-range and range-rate. Range migration induces a structured spread of this response, which is explicitly characterized in the paper and differs from that in previous models. We propose an approximate implementation of the maximum likelihood estimator for the target initial-range, range-rate, and amplitude. The estimation process first obtains coarse information on the target response using a block orthogonal matching pursuit algorithm, followed by a refinement step based on a bank of matched filters focused on a smaller initial-range/range-rate region. The proposed single-target procedure is extended to multiple targets via iterative estimation, reconstruction, and cancellation of dominant echoes. Finally, numerical examples are provided to evaluate the estimation performance.


[65] 2508.11351

Important Bit Prefix M-ary Quadrature Amplitude Modulation for Semantic Communications

M-ary Quadrature Amplitude Modulation (MQAM) is a commonly used channel modulation technology in wireless communication systems. To achieve dedicated channel modulation for semantic communication (SemCom), we propose an Important-Bit-Prefixed MQAM (IBP-MQAM) scheme and derive approximate expressions for its important symbol error rate (ISER) and unimportant symbol error rate (USER). By extracting and quantifying text semantics using Latent Dirichlet Allocation (LDA), we verify that IBP-MQAM achieves improved performance over MQAM in SemCom scenarios and further analyze the effects of key system parameters.


[66] 2508.14950

Potential and challenges of generative adversarial networks for super-resolution in 4D Flow MRI

4D Flow Magnetic Resonance Imaging (4D Flow MRI) enables non-invasive quantification of blood flow and hemodynamic parameters. However, its clinical application is limited by low spatial resolution and noise, particularly affecting near-wall velocity measurements. Machine learning-based super-resolution has shown promise in addressing these limitations, but challenges remain, not least in recovering near-wall velocities. Generative adversarial networks (GANs) offer a compelling solution, having demonstrated strong capabilities in restoring sharp boundaries in non-medical super-resolution tasks. Yet, their application in 4D Flow MRI remains unexplored, with implementation challenged by known issues such as training instability and non-convergence. In this study, we investigate GAN-based super-resolution in 4D Flow MRI. Training and validation were conducted using patient-specific cerebrovascular in-silico models, converted into synthetic images via an MR-true reconstruction pipeline. A dedicated GAN architecture was implemented and evaluated across three adversarial loss functions: Vanilla, Relativistic, and Wasserstein. Our results demonstrate that the proposed GAN improved near-wall velocity recovery compared to a non-adversarial reference (vNRMSE: 6.9% vs. 9.6%); however, implementation specifics are critical for stable network training. While Vanilla and Relativistic GANs proved unstable compared to generator-only training (vNRMSE: 8.1% and 7.8% vs. 7.2%), a Wasserstein GAN demonstrated optimal stability and incremental improvement (vNRMSE: 6.9% vs. 7.2%). The Wasserstein GAN further outperformed the generator-only baseline at low SNR (vNRMSE: 8.7% vs. 10.7%). These findings highlight the potential of GAN-based super-resolution in enhancing 4D Flow MRI, particularly in challenging cerebrovascular regions, while emphasizing the need for careful selection of adversarial strategies.


[67] 2508.20474

Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder

This paper presents a unified multi-speaker encoder (UME), a novel architecture that jointly learns representations for speaker diarization (SD), speech separation (SS), and multi-speaker automatic speech recognition (ASR) tasks using a shared speech foundational encoder. We leverage the hidden representations from multiple layers of UME as a residual weighted-sum encoding (RWSE) to effectively use information from different semantic levels, contributing to bottom-up alignment between tasks. This joint training approach captures the inherent interdependencies among the tasks, enhancing overall performance on overlapping speech data. Our evaluations demonstrate that UME substantially improves over the single-task baselines dedicated to SD, SS, and multi-speaker ASR on LibriMix evaluation sets. Notably, for SD, UME outperforms the previous studies, achieving diarization error rates of 1.37% and 2.29% on Libri2Mix and Libri3Mix evaluation sets, respectively.


[68] 2510.14244

Reinforcement Learning for Unsupervised Domain Adaptation in Spatio-Temporal Echocardiography Segmentation

Domain adaptation methods aim to bridge the gap between datasets by enabling knowledge transfer across domains, reducing the need for additional expert annotations. However, many approaches struggle with reliability in the target domain, an issue particularly critical in medical image segmentation, where accuracy and anatomical validity are essential. This challenge is further exacerbated in spatio-temporal data, where the lack of temporal consistency can significantly degrade segmentation quality, and particularly in echocardiography, where the presence of artifacts and noise can further hinder segmentation performance. To address these issues, we present RL4Seg3D, an unsupervised domain adaptation framework for 2D + time echocardiography segmentation. RL4Seg3D integrates novel reward functions and a fusion scheme to enhance key landmark precision in its segmentations while processing full-sized input videos. By leveraging reinforcement learning for image segmentation, our approach improves accuracy, anatomical validity, and temporal consistency while also providing, as a beneficial side effect, a robust uncertainty estimator, which can be used at test time to further enhance segmentation performance. We demonstrate the effectiveness of our framework on over 30,000 echocardiographic videos, showing that it outperforms standard domain adaptation techniques without the need for any labels on the target domain. Code is available at this https URL.


[69] 2510.20152

Soft Switching Expert Policies for Controlling Systems with Uncertain Parameters

This paper proposes a simulation-based reinforcement learning algorithm for controlling systems with uncertain and varying system parameters. While simulators are useful for safely learning control policies, the reality gap remains a major challenge. To alleviate this challenge, we propose a two-stage algorithm. First, multiple control policies are learned for systems with different system parameters in a simulator. Second, for a real system, the control policies are adaptively switched using an online convex optimization algorithm based on observations. This approach is expected to reduce learning complexity compared with existing approaches that rely on a single policy to address the reality gap.
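
The second stage can be illustrated with a minimal sketch, assuming the exponentiated-weights (Hedge) update as the online convex optimization algorithm. The abstract does not name the algorithm or the loss, so the update rule, the learning rate `eta`, and the per-expert losses below are assumptions for illustration only.

```python
import numpy as np

def hedge_update(weights, losses, eta=1.0):
    """One exponentiated-weights step over the pre-trained expert policies.

    losses[i] is a hypothetical per-step loss for expert i, for example the
    error between that expert's simulator model and the observed transition.
    """
    shifted = losses - np.min(losses)          # shift for numerical stability
    new_weights = weights * np.exp(-eta * shifted)
    return new_weights / np.sum(new_weights)   # stay on the probability simplex

def blended_action(weights, expert_actions):
    """Soft switching: convex combination of the experts' proposed actions."""
    return weights @ expert_actions

# Illustrative values: two experts, one-dimensional actions.
w0 = np.array([0.5, 0.5])
w1 = hedge_update(w0, losses=np.array([0.0, 1.0]))
action = blended_action(w0, np.array([[1.0], [3.0]]))
```

Because the weights stay on the simplex, the applied action moves smoothly toward the expert whose simulated system best matches recent observations instead of switching abruptly.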


[70] 2511.06257

Fast Reconstruction of Motion-Corrupted Data with Mobile-GRAPPA: Motion and dB0 Inhomogeneity Correction Leveraging Efficient GRAPPA

Advanced motion navigations now enable rapid tracking of subject motion and dB0-induced phase, but accurately incorporating this high-temporal-resolution information into SENSE (Aligned-SENSE) is often computationally prohibitive. We propose "Mobile-GRAPPA", a k-space "cleaning" approach that uses local GRAPPA operators to remove motion and dB0 related corruption so that the resulting data can be reconstructed with standard SENSE. We efficiently train a family of k-space-position-specific Mobile-GRAPPA kernels via a lightweight multilayer perceptron (MLP) and apply them across k-space to generate clean data. In experiments on highly motion-corrupted 1-mm whole-brain GRE (Tacq = 10 min; 1,620 motion/dB0 trackings) and EPTI (Tacq = 2 min; 544 trackings), Mobile-GRAPPA enabled accurate reconstruction with negligible time penalty, whereas full Aligned-SENSE was impractical (reconstruction times > 10 h for GRE and > 10 days for EPTI). These results show that Mobile-GRAPPA incorporates detailed motion and dB0 tracking into SENSE with minimal computational overhead, enabling fast, high-quality reconstructions of challenging data.


[71] 2511.06874

Radio-Coverage-Aware Path Planning for Cooperative Autonomous Vehicles

Fleets of autonomous vehicles (AV) often are at the core of intelligent transportation scenarios for smart cities, and may require a wireless Internet connection to offload computer vision tasks to data centers located either in the edge or the cloud section of the network. Cooperation among AVs is successful when the environment is unknown, or changes dynamically, so as to improve coverage and trip time, and minimize the traveled distance. The AVs, while mapping the environment with range-based sensors, move across the wireless coverage areas, with consequences on the experienced access bit rate, latency, and handover rate. In this paper, we propose to modify the cost of common path planning algorithms such as Dijkstra and A*, so that the best path solution takes into account not only the traveled distance, but also the radio coverage experience. To this aim, several radio-related cost-weighting functions are introduced and tested, to assess the performance of the proposed approaches with extensive simulations. The proposed mapping algorithm can achieve a mapping error probability below $2\%$, while the proposed path-planning algorithms extend the radio coverage of the AVs, with only a limited increase in traveled distance with respect to shortest-path existing methods, such as conventional Dijkstra and A* algorithms.
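The idea of folding radio coverage into the path cost can be sketched as follows. The abstract does not give the cost-weighting functions, so the multiplicative penalty `1 + beta * (1 - coverage)`, the parameter `beta`, and the function name `radio_aware_dijkstra` are illustrative assumptions, not the paper's formulation.

```python
import heapq

def radio_aware_dijkstra(graph, coverage, start, goal, beta=1.0):
    """Dijkstra where each edge costs distance * (1 + beta * (1 - coverage[v])).

    graph: {node: [(neighbor, distance), ...]}
    coverage: {node: value in [0, 1]}, with 1 meaning full radio coverage.
    beta = 0 recovers the conventional shortest-distance path.
    """
    best = {start: 0.0}
    parent = {start: None}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, distance in graph.get(node, []):
            # Hypothetical weighting: poorly covered nodes are more expensive to enter.
            penalty = 1.0 + beta * (1.0 - coverage[neighbor])
            new_cost = cost + distance * penalty
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in parent:
        return None, float("inf")
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], best[goal]

if __name__ == "__main__":
    graph = {"A": [("B", 1.0), ("C", 1.5)], "B": [("D", 1.0)], "C": [("D", 1.5)]}
    coverage = {"A": 1.0, "B": 0.0, "C": 1.0, "D": 1.0}
    print(radio_aware_dijkstra(graph, coverage, "A", "D", beta=0.0))
    print(radio_aware_dijkstra(graph, coverage, "A", "D", beta=2.0))
```

In this toy graph the shortest route passes through an uncovered node; raising `beta` trades a longer distance for a route that stays in coverage.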


[72] 2511.08499

Approaching Safety-Argumentation-by-Design: A Requirement-based Safety Argumentation Life Cycle for Automated Vehicles

Despite the growing number of automated vehicles on public roads, operating such systems in open contexts inevitably involves incidents. Developing a defensible case that the residual risk is reduced to a reasonable (societally acceptable) level is hence a prerequisite to be prepared for potential liability cases. A "safety argumentation" is a common means to represent this case. In this paper, we contribute to the state of the art in terms of process guidance on argumentation creation and maintenance - aiming to promote a safety-argumentation-by-design paradigm, which mandates co-developing both the system and argumentation from the earliest stages. Initially, we extend a systematic design model for automated driving functions with an argumentation layer to address prevailing misconceptions regarding the development of safety arguments in a process context. Identified limitations of this extension motivate our complementary design of a dedicated argumentation life cycle that serves as an additional process viewpoint. Correspondingly, we define literature- and expert-based process requirements. To illustrate the safety argumentation life cycle that we propose as a result of implementing these consolidated requirements, we demonstrate principles of the introduced process phases (baselining, evolution, continuous maintenance) by an argumentation example on an operational design domain exit response.


[73] 2511.19383

A Hybrid Learning-to-Optimize Framework for Mixed-Integer Quadratic Programming

In this paper, we propose a learning-to-optimize (L2O) framework to accelerate solving parametric mixed-integer quadratic programming (MIQP) problems, with a particular focus on mixed-integer model predictive control (MI-MPC) applications. The framework learns to predict integer solutions with enhanced optimality and feasibility by integrating supervised learning (for optimality), self-supervised learning (for feasibility), and a differentiable quadratic programming (QP) layer, resulting in a hybrid L2O framework. Specifically, a neural network (NN) is used to learn the mapping from problem parameters to optimal integer solutions, while a differentiable QP layer is integrated to compute the corresponding continuous variables given the predicted integers and problem parameters. Moreover, a hybrid loss function is proposed, which combines a supervised loss with respect to the global optimal solution, and a self-supervised loss derived from the problem's objective and constraints. The effectiveness of the proposed framework is demonstrated on two benchmark MI-MPC problems, with comparative results against purely supervised and self-supervised learning models.


[74] 2511.20383

Accelerating Time-Optimal Trajectory Planning for Connected and Automated Vehicles with Graph Neural Networks

In this paper, we present a learning-based framework that accelerates time- and energy-optimal trajectory planning for connected and automated vehicles (CAVs) using graph neural networks (GNNs). We formulate the multi-agent coordination problem encountered in traffic scenarios as a cooperative trajectory planning problem that minimizes travel time, subject to motion primitives derived from energy-optimal solutions. The performance of this framework can be further improved through replanning at each time step, enabling the system to incorporate newly observed information. To achieve real-time execution, we employ a graph isomorphism network with edge features (GINEConv) to learn the solutions of the time-optimal trajectory planning problem from offline-generated data. The trained model produces online predictions that serve as warm-starts for numerical optimization, thereby enabling rapid computation of minimal exit times and the associated feasible trajectories. This learning-to-warm-start approach substantially reduces computation time while preserving the control performance of the time- and energy-optimal trajectory planning framework.


[75] 2601.12695

From Noise to Knowledge: System Identification with Systematic Polytope Construction via Cyclic Reformulation

Model-based robust control requires not only accurate nominal models but also systematic uncertainty representations to guarantee stability and performance. However, constructing polytopic uncertainty models typically demands multiple experiments or a priori structural [...]. This paper proposes an identification framework based on intentional periodicity induction, in which cyclic reformulation with period $N$ is applied to a linear time-invariant system to interpret noise-induced parameter fluctuations as a structured manifestation of estimation uncertainty. The $N$ parameter sets obtained from a single identification experiment -- which would coincide in the noise-free case -- are used as polytope vertices, providing systematic control over the granularity of the uncertainty description through the choice of $N$. The practical utility of the constructed polytope is demonstrated through robust $H_\infty$ state-feedback synthesis via LMI optimization at the polytope vertices; the synthesis uses only noisy identification data and is shown across Monte Carlo trials to stabilize the true plant with only marginal conservatism. Complementarily, a diagnostic assessment based on the best in-polytope point confirms that the polytope captures meaningful uncertainty information. For a third-order system under Gaussian and uniform noise, a comparison with bootstrap-inspired resampling baselines indicates that cyclic reformulation provides a competitive or favorable trade-off by utilizing the full data record; the construction is further validated on a fourth-order MIMO system.


[76] 2601.22792

CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR

We present CALM, a joint Contextual Acoustic-Linguistic Modeling framework for multi-speaker automatic speech recognition (ASR). In personalized AI scenarios, the joint availability of acoustic and linguistic cues naturally motivates the integration of target-speaker conditioning with contextual biasing in overlapping conversations. CALM implements this integration in an end-to-end framework through speaker embedding-driven target-speaker extraction and dynamic vocabulary-based contextual biasing. We evaluate CALM on simulated English (LibriSpeechMix) and Japanese (Corpus of Spontaneous Japanese mixtures, CSJMix). On two-speaker mixtures, CALM reduces biased word error rate (B-WER) from 12.7 to 4.7 on LibriSpeech2Mix and biased character error rate (B-CER) from 16.6 to 8.4 on CSJMix2 (eval3), demonstrating the effectiveness of joint acoustic-linguistic modeling across languages. We additionally report results on the AMI corpus (IHM-mix condition) to validate performance on standardized speech mixtures.


[77] 2602.07029

Guidestar-Free Adaptive Optics with Asymmetric Apertures

This work introduces the first closed-loop adaptive optics (AO) system capable of optically correcting aberrations in real-time without a guidestar or a wavefront sensor. Nearly 40 years ago, Cederquist et al. demonstrated that asymmetric apertures enable phase retrieval (PR) algorithms to perform fully computational wavefront sensing, albeit at a high computational cost. More recently, Chimitt et al. extended this approach with machine learning and demonstrated real-time wavefront sensing using only a single (guidestar-based) point-spread-function (PSF) measurement. Inspired by these works, we introduce a guidestar-free AO framework built around asymmetric apertures and machine learning. Our approach combines three key elements: (1) an asymmetric aperture placed at the system's pupil plane that enables PR-based wavefront sensing, (2) a pair of machine learning algorithms that estimate the PSF from natural scene measurements and reconstruct phase aberrations, and (3) a spatial light modulator that performs optical correction. We experimentally validate this framework on dense natural scenes imaged through unknown obscurants. Our method outperforms state-of-the-art guidestar-free wavefront shaping methods, using an order of magnitude fewer measurements and three orders of magnitude less computation.


[78] 2602.16253

How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?

Anomalous sound detection (ASD) benchmarks typically assume that the identity of the monitored machine is known at test time and that recordings are evaluated in a machine-wise manner. However, in realistic monitoring scenarios with multiple known machines operating concurrently, test recordings may not be reliably attributable to a specific machine, and requiring machine identity imposes deployment constraints such as dedicated sensors per machine. To reveal performance degradations and method-specific differences in robustness that are hidden under standard machine-wise evaluation, we consider a minimal modification of the ASD evaluation protocol in which test recordings from multiple machines are merged and evaluated jointly without access to machine identity at inference time. Training data and evaluation metrics remain unchanged, and machine identity labels are used only for post hoc evaluation. Experiments with representative ASD methods show that relaxing this assumption reveals performance degradations and method-specific differences in robustness that are hidden under standard machine-wise evaluation, and that these degradations are strongly related to implicit machine identification accuracy.


[79] 2603.02245

LMU-Based Sequential Learning and Posterior Ensemble Fusion for Cross-Domain Infant Cry Classification

Decoding infant cry causes remains challenging for healthcare monitoring due to short nonstationary signals, limited annotations, and strong domain shifts across infants and datasets. We propose a compact acoustic framework that fuses mel-frequency cepstral coefficients (MFCCs), short-time Fourier transform (STFT) features, and fundamental-frequency (F0) contours within a multi-branch convolutional neural network (CNN) encoder, and models temporal dynamics using an enhanced Legendre Memory Unit (LMU). Compared to LSTMs, the LMU backbone provides stable sequence modeling with substantially fewer recurrent parameters, supporting efficient deployment. To improve cross-dataset generalization, we introduce calibrated posterior ensemble fusion with entropy-gated weighting to preserve domain-specific expertise while mitigating dataset bias. Experiments on Baby2020 and Baby Crying demonstrate improved macro-F1 under cross-domain evaluation, along with leakage-aware splits and real-time feasibility for on-device monitoring.


[80] 2603.09708

Adapting a Text-to-Audio Model for Room Impulse Response Generation

Room Impulse Responses (RIRs) enable realistic acoustic simulation, with applications ranging from multimedia production to speech data augmentation. However, acquiring high-quality real-world RIRs is labor-intensive, and data scarcity remains a challenge for data-driven RIR generation approaches. In this paper, we propose a novel approach to RIR generation by adapting a pre-trained text-to-audio model, demonstrating for the first time that large-scale generative audio priors can be effectively leveraged for the task. To address the lack of text-RIR paired data, we utilize a labeling pipeline leveraging vision-language models to extract acoustic descriptions from existing image-RIR datasets. We introduce an in-context learning strategy to accommodate free-form user prompts during inference. Evaluations including a subjective listening test demonstrate that our model generates plausible RIRs. Audio examples are available on our demo website.


[81] 2603.19910

Learning Adaptive Parameter Policies for Nonlinear Bayesian Filtering

For many nonlinear Bayesian state estimation problems, the posterior recursion is not analytically tractable, leading to algorithms that are influenced by numerical approximation errors. These algorithms depend on parameters that affect the approximation's accuracy and computational cost. The parameters include, for example, the number of particles, scaling parameters, and the number of iterations in iterative computations. Typically, these parameters are fixed or adjusted heuristically, although the approximation accuracy can change over time with the local degree of nonlinearity and uncertainty. The approximation errors introduced at a time step propagate through subsequent updates, affecting the accuracy, consistency, and robustness of future estimates. This paper presents adaptive parameter selection in nonlinear Bayesian filtering as a sequential decision-making problem, where parameters influence not only the immediate estimation outcome but also the future estimates. The decision-making problem is addressed using reinforcement learning to learn adaptive parameter policies for nonlinear Bayesian filters. Experiments with the unscented Kalman filter and stochastic integration filter demonstrate that the learned policies improve both estimate quality and consistency.


[82] 2604.13505

Cascaded TD3-PID Hybrid Controller for Quadrotor Trajectory Tracking in Wind Disturbance Environments

This work presents a cascaded hybrid control framework for quadrotor trajectory tracking under nonlinear dynamics and external disturbances. In quadrotor systems, the altitude and attitude channels exhibit fast, structured dynamics that are well suited to reliable regulation, whereas horizontal-position control is more strongly affected by coupling effects, uncertainty, and disturbances, so that neither pure feedback control nor purely learning-based control alone is equally well suited to all channels. Accordingly, the proposed framework augments conventional proportional-integral-derivative (PID) stabilization for altitude and attitude control with an enhanced Twin Delayed Deep Deterministic Policy Gradient (TD3) agent incorporating a multi-Q-network structure, thereby improving horizontal-position control under severe disturbances. To further strengthen disturbance rejection in altitude and attitude control, a hybrid disturbance observer (HDOB) using low-pass and exponential moving average filtering is embedded in the control loops. The proposed TD3 enhancements are verified through ablation studies, and both numerical simulations and real-world flight tests on the quadrotor platform demonstrate that the proposed method achieves more accurate and robust trajectory tracking under wind disturbances than baseline approaches.


[83] 2604.14842

Simplification Ad Absurdum? Revisiting Gas Flow Modeling for Integrated Energy System Planning

This paper analyzes the implications of simplified pipeline gas flow models for integrated energy system planning. A case study of an integrated power-hydrogen expansion planning problem shows that simplifying pressure-flow relationships and gas dynamics can lead to expansion plans that incur substantial regret when evaluated under a more realistic dynamic gas flow model -- due to suboptimal system expansion, operation, and non-supplied hydrogen. Numerical experiments show that planning under the highly simplified transport and transport-linepack models -- commonly used in expansion studies -- can result in regret exceeding several thousand percent and yield expansion plans that lack robustness across demand levels. Planning under steady-state conditions partially mitigates these effects, but still leaves significant cost-reduction potential untapped compared to dynamic planning due to neglected linepack flexibility. Developing efficient solution algorithms for the dynamic model is a promising direction for future research.


[84] 2604.18263

Passive RIS Is Not Silent: Revisiting Performance Limits Under Thermal Noise

Reconfigurable intelligent surfaces (RISs) have emerged as a promising solution for enabling energy-efficient and flexible spectrum usage in wireless communication, particularly in the context of sixth-generation (6G) networks. While passive RIS architectures are widely regarded as virtually noiseless due to the lack of active components, this idealized assumption can lead to misleading performance evaluations. In this paper, we revisit this assumption and demonstrate that the thermal noise generated by passive RIS elements, though often neglected, can significantly affect system performance. We propose a tractable approximated analytical framework that incorporates RIS-induced thermal noise into the system and derive closed-form expressions for key performance metrics, such as outage probability and throughput. Simulation results validate our approximated analysis and highlight the substantial performance discrepancies that arise when RIS thermal noise is ignored. Our results offer valuable insights into the trade-offs between receiver and RIS noise, guiding the development of robust and efficient 6G communication systems.


[85] 2604.24610

Matching-free Acquisition of Channels with Anisotropic Wavefronts

The escalating data rate demands of future wireless communications necessitate the deployment of extremely large aperture arrays (ELAAs) in communication systems. Acquiring accurate channel state information is crucial to execute effective precoding for such systems, in which the near-field curvature effects on the channel must be considered. Current channel estimation algorithms are generally restricted to the spherical wavefront channel (SWC), which is appropriate for isotropic scatterers, point sources, and planar reflecting surfaces. However, in practical scenarios involving curved reflecting surfaces, the reflected waves exhibit anisotropic rather than spherical wavefronts, significantly degrading the accuracy of conventional SWC-based algorithms. To tackle this challenge, we first derive a parameterized model for the anisotropic wavefront channel (AWC). Based on this model, we then propose the matching-free acquisition of channels with anisotropic wavefronts (MACAW) algorithm. Unlike conventional dictionary-based matching pursuit techniques, MACAW recovers channel parameters through fast-Fourier-transform-based frequency analysis. This approach enables precise channel estimation in AWC scenarios while maintaining a significantly lower computational complexity than existing methods. Simulation results illustrate how physical characteristics of the propagation environment influence the degree of wavefront anisotropy, and demonstrate the effectiveness of the proposed algorithm.


[86] 2605.11940

Lane-Aware Graph Attention Network for Multi-Vehicle Trajectory Prediction in Expressway Merge Zones

Accurate multi-vehicle trajectory prediction in expressway merge and diverge areas is fundamental to the decision-making frameworks of autonomous vehicle systems. However, the majority of existing graph-based prediction models are developed and validated on mainline freeway segments and do not address the geometrically distinct interaction structures that characterize merge zones. Furthermore, standard evaluation protocols rely exclusively on displacement error metrics, leaving the safety consequences of predicted trajectories unquantified. This paper proposes a Lane-Aware Graph Attention Network (LA-GAT) that encodes vehicle interaction within dynamic scene graphs, augmented with a trainable lane-relationship attention bias that prioritizes merge-conflict interactions from the outset of training. The model is pre-trained on the raw NGSIM US-101 and I-80 datasets and subsequently fine-tuned on UAV-captured UTE SQM-W-1 trajectory data from a Chinese expressway merge area, with final evaluation on the held-out SQM-W-2 dataset. Evaluation spans both displacement metrics (ADE, FDE at 1s, 3s, 5s horizons) and surrogate safety measures (TTC violation rate, DRAC exceedance rate, collision rate). Fine-tuned results on SQM-W-2 yield ADE of 0.865 m at 1s and 2.518 m at 3s, demonstrating that drone-informed fine-tuning substantially reduces the cross-dataset transfer gap. The deliberate use of unfiltered NGSIM data is shown to characterize raw-condition generalization limits, with the performance degradation attributed to the well-documented measurement errors in that dataset.


[87] 2605.12244

Estimation Problems and the Modulating Function Method: The Algebra of Modulating Functions

State and parameter estimation, along with fault detection, are three crucial estimation problems within the control systems community. Although different approaches have been proposed for each type of problem, the modulating function method proposes a more unified approach to all three problem classes, being used for state and parameter estimation of lumped systems, fault detection, and estimation of distributed and fractional systems. At the core of the method is the modulating function: a function that evaluates to 0 at the left or right boundaries up to a certain order of derivatives. By selecting the modulating functions, one directly determines the filter characteristics, and, for that reason, different function families have been proposed over the years. Nevertheless, many families of modulating functions are given in a rather similar mathematical structure. In light of these structures, this paper formally discusses the algebraic properties of modulating functions, and, after formalizing the closedness and group properties of modulating functions, a simple algorithm to construct new modulating functions is proposed, discussed, and illustrated with the construction of the newly introduced logarithmic modulating function families and 3 non-analytic modulating function families. Moreover, the fact that total modulating functions form a vector space and an algebra is exploited to construct orthonormal modulating functions, which are then used for the parameter estimation of a boat's roll dynamics, effectively avoiding matrix inversion issues.
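The boundary property described above is what lets the method avoid differentiating measured signals. As a brief worked illustration (the polynomial family and the symbols $\varphi$, $k$, $T$, and $y$ are chosen here for the example and are not taken from the paper), consider a modulating function on $[0, T]$ whose derivatives up to order $k-1$ vanish at both boundaries:

```latex
\varphi(t) = t^{k}\,(T - t)^{k},
\qquad
\varphi^{(i)}(0) = \varphi^{(i)}(T) = 0 \quad \text{for } i = 0, \dots, k-1 .

% Repeated integration by parts: all boundary terms vanish, so derivatives
% of the signal y are shifted onto the known function \varphi.
\int_{0}^{T} \varphi(t)\, y^{(i)}(t)\, \mathrm{d}t
  = (-1)^{i} \int_{0}^{T} \varphi^{(i)}(t)\, y(t)\, \mathrm{d}t ,
\qquad i = 0, \dots, k .
```

Multiplying a differential equation in $y$ by $\varphi$ and integrating over $[0, T]$ therefore yields an algebraic relation in the unknown parameters that involves only integrals of the measured signal.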


[88] 2307.14072

Negative Spin $\Delta_T$ noise Induced by Spin-Flip Scattering and Andreev Reflection

We study charge $\Delta_T$ noise, followed by an examination of spin $\Delta_T$ noise, in the normal metal-spin flipper-normal metal-insulator-superconductor (N-sf-N-I-S) junction. Our analysis reveals a key contrast: while charge $\Delta_T$ noise remains strictly positive, spin $\Delta_T$ noise undergoes a sign reversal from positive to negative, driven by the interplay between spin-flip scattering and Andreev reflection. In contrast, charge quantum shot noise remains positive and sign-definite, which is also valid for spin quantum shot noise. The emergence of negative spin $\Delta_T$ noise has two major implications. First, it establishes a clear distinction between spin-resolved $\Delta_T$ noise and quantum shot noise: the former is dominated by opposite-spin correlations, whereas the latter is led by same-spin correlations. Second, it provides access to scattering mechanisms that are not captured by quantum shot noise alone. Thus, negative spin $\Delta_T$ noise serves as a unique probe of the cooperative effects of Andreev reflection and spin flipping. We further place our results in context by comparing them with earlier reports of negative $\Delta_T$ noise in strongly correlated systems, such as fractional quantum Hall states, and in multiterminal hybrid superconducting junctions. Overall, this work offers new insights into the mechanisms governing sign reversals in $\Delta_T$ noise and highlights their role as distinctive fingerprints of spin-dependent scattering in superconducting hybrid devices.


[89] 2411.15913

Repurposing Image Diffusion Models for Training-Free Music Style Transfer on Mel-spectrograms

Music style transfer blends source structure with reference style to enable personalized music creation. However, existing zero-shot methods often struggle to capture fine-grained audio nuances, relying on coarse text descriptions or requiring expensive task-specific training. We propose Stylus, a training-free framework that repurposes pretrained image diffusion models for music style transfer in the Mel-spectrogram domain. By treating audio as structured time-frequency images, Stylus manipulates self-attention by injecting style keys and values while preserving source structural queries. To ensure high fidelity, we introduce a phase-preserving reconstruction strategy to mitigate spectrogram inversion artifacts, alongside a classifier-free-guidance-inspired control for adjustable stylization. Extensive evaluations including 2,925 human ratings demonstrate that Stylus outperforms state-of-the-art baselines, achieving 34.1% higher content preservation and 25.7% better perceptual quality. Our work validates that generic image priors can be effectively leveraged for the training-free transformation of structured Mel-spectrograms. Code and materials are available at this https URL.


[90] 2502.20427

DeePen: Penetration Testing for Audio Deepfake Detection

Deepfakes - manipulated or forged audio and video media - pose significant security risks to individuals, organizations, and society at large. To address these challenges, machine learning-based classifiers are commonly employed to detect deepfake content. In this paper, we assess the robustness of such classifiers through a systematic penetration testing methodology, which we introduce as DeePen. Our approach operates without prior knowledge of or access to the target deepfake detection models. Instead, it leverages a set of carefully selected signal processing modifications - referred to as attacks - to evaluate model vulnerabilities. Using DeePen, we analyze both real-world production systems and publicly available academic model checkpoints, demonstrating that all tested systems exhibit weaknesses and can be reliably deceived by simple manipulations such as time-stretching or echo addition. Furthermore, our findings reveal that while some attacks can be mitigated by retraining detection systems with knowledge of the specific attack, others remain persistently effective.


[91] 2510.19471

Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition

Recent work has shown that sample-based Minimum Bayes Risk (MBR) decoding outperforms beam search in text-to-text generation tasks, such as machine translation, text summarization, and image captioning. On the other hand, beam search is the current practice for speech-to-text tasks such as automatic speech recognition (ASR) and Speech Translation (ST). Given that MBR decoding is effective in text-to-text generation tasks, it is reasonable to expect it to also be effective for speech-to-text tasks. In this paper, we evaluate MBR decoding for ASR and ST tasks on English and Japanese using Whisper and its derivative models. We observe that the accuracy of MBR decoding outperforms that of beam search in most of the experimental settings we have evaluated. The results show that MBR decoding is a promising method for offline ASR and ST tasks that require high accuracy. The code is available at this https URL.
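Sample-based MBR decoding can be sketched in a few lines: draw several hypotheses from the model, then return the one with the lowest expected loss against the others. The word-level edit distance used as the loss and the function names below are assumptions made for illustration; the abstract does not state which utility function the authors use.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                      # deletion
                            curr[j - 1] + 1,                  # insertion
                            prev[j - 1] + (tok_a != tok_b)))  # substitution
        prev = curr
    return prev[-1]

def mbr_decode(samples):
    """Return the sample with the lowest total edit distance to the other samples.

    With equally weighted samples, minimizing the total is the same as
    minimizing the Monte Carlo estimate of the expected loss.
    """
    tokenized = [s.split() for s in samples]

    def risk(i):
        return sum(edit_distance(tokenized[i], tokenized[j])
                   for j in range(len(tokenized)) if j != i)

    return samples[min(range(len(samples)), key=risk)]

if __name__ == "__main__":
    # Hypothetical transcripts sampled from a recognizer for one utterance.
    hypotheses = ["the cat sat", "the cat sat down", "a cat sat", "the bat sat"]
    print(mbr_decode(hypotheses))
```

Unlike beam search, which returns the single most probable hypothesis, this selection favors the hypothesis that agrees most with the rest of the sample set.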


[92] 2512.06109

Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions

This paper develops a unified perspective on several optimal control formulations through the lens of Kullback-Leibler (KL) regularization. We propose a central problem that separates the KL penalties on policies and transitions with independent weights, thus generalizing the standard trajectory-level KL-regularization used in probabilistic optimal control. This umbrella formulation recovers various control problems: the classical Stochastic Optimal Control (SOC), Risk-Sensitive Stochastic Optimal Control (RSOC), and their policy-based KL-regularized counterparts, termed soft-policy SOC and RSOC, which yield tractable surrogates. Beyond being regularized variants, these soft-policy formulations majorize the original SOC and RSOC; thus, iterating their solutions recovers the original objectives. We further identify a synchronized case of soft-policy RSOC where the policy and transition KL weights coincide, yielding a linear Bellman operator, path-integral solution, and compositionality -- extending these computationally favourable properties to a broad class of control problems.


[93] 2512.20211

Aliasing-Free Neural Audio Synthesis

In neural audio synthesis, neural vocoders and codecs are models that reconstruct waveforms from acoustic and latent representations, which are essential to the resulting audio quality. While current models are capable of generating perceptually natural speech, they still struggle with high-fidelity music and singing voice synthesis, as severe aliasing artifacts are introduced by non-linear activation functions and upsampling layers in existing architectures. Although various anti-aliasing techniques have been proposed in digital signal processing, their integration into neural vocoders and codecs remains under-explored. This paper incorporates differentiable anti-aliasing techniques into the activation and upsampling modules to bridge this gap, and thus presents Pupu-Vocoder and Pupu-Codec. We build a test signal benchmark to evaluate the anti-aliased modules, and validate our proposed models on speech, singing voice, music, and audio. Experimental results show that Pupu-Vocoder and Pupu-Codec outperform existing systems on singing voice, music, and audio, while achieving comparable performance on speech. Demos, code, and checkpoints are available at this http URL.


[94] 2601.02053

Ageing Monitoring for Commercial Microcontrollers Based on Timing Windows

Microcontrollers are increasingly present in embedded deployments and dependable systems, for which malfunctions due to hardware ageing can have severe impact. The lack of deployable techniques for ageing monitoring on these devices has spread the application of guard bands to prevent timing errors due to degradation. Applying this static technique can limit performance and lead to sudden failures as devices age. In this paper, we follow a software-based self-testing approach to design monitoring of hardware degradation for microcontrollers. Deployable in the field, our technique leverages timing windows of variable lengths to determine the maximum operational frequency of the devices. We empirically validate the method on real hardware and find that it consistently detects temperature-induced degradations in maximum operating frequency of up to 13.79 % across devices for a 60 °C temperature increase.


[95] 2602.01629

AdaptNC: Adaptive Nonconformity Scores for Conformal Prediction under Distribution Shift

Rigorous uncertainty quantification is essential for the safe deployment of autonomous systems in unconstrained environments. Conformal Prediction (CP) provides a distribution-free framework for this task, yet its standard formulations rely on exchangeability assumptions that are violated by the distribution shifts inherent in real-world robotics. Existing online CP methods maintain target coverage by adaptively scaling the conformal threshold, but typically employ a static nonconformity score function. We show that this fixed geometry leads to highly conservative, volume-inefficient prediction regions when environments undergo structural shifts. To address this, we propose $\textbf{AdaptNC}$, a framework for the joint online adaptation of both the nonconformity score parameters and the conformal threshold. AdaptNC leverages an adaptive reweighting scheme to optimize score functions, and introduces a replay buffer mechanism to mitigate the coverage instability that occurs during score transitions. We evaluate AdaptNC on diverse robotic benchmarks involving multi-agent policy changes, environmental changes and sensor degradation. Our results demonstrate that AdaptNC significantly reduces prediction region volume compared to state-of-the-art threshold-only baselines while maintaining target coverage levels.
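
The threshold-only baseline that the abstract contrasts against can be illustrated with a minimal sketch. This is not AdaptNC itself: the score function stays fixed and only the threshold is updated online, using a simple quantile-tracking rule. The function names, the step size `eta`, the synthetic scores, and the mid-stream scale change are all assumptions made for illustration.

```python
import random


def track_threshold(scores, alpha=0.1, eta=0.05, q0=0.0):
    """Threshold-only online update (illustrative, not the paper's method).

    After each step the threshold q is raised if the score fell outside
    the region (a miss) and lowered slightly if it was covered, so that
    the long-run miss rate is pushed toward alpha.
    """
    q = q0
    hits = 0
    for s in scores:
        covered = s <= q
        hits += covered
        err = 0.0 if covered else 1.0
        q += eta * (err - alpha)
    return q, hits / len(scores)


def demo(seed=0, n=4000):
    """Synthetic scores whose scale doubles halfway (a crude shift)."""
    rng = random.Random(seed)
    scores = [
        abs(rng.gauss(0.0, 1.0 if t < n // 2 else 2.0)) for t in range(n)
    ]
    return track_threshold(scores)


if __name__ == "__main__":
    final_q, coverage = demo()
    print(f"final threshold: {final_q:.2f}, empirical coverage: {coverage:.3f}")
```

In this sketch the threshold grows after the shift to restore coverage, but the shape of the prediction region never changes. That fixed geometry is what the abstract identifies as the source of overly conservative regions.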


[96] 2603.22267

TiCo: Time-Controllable Spoken Dialogue Model

We introduce TiCo, a time-controllable spoken dialogue model (SDM) that follows time-constrained instructions (e.g., "Please generate a response lasting about 15 seconds") and generates spoken responses with controllable duration. This capability is valuable for real-world spoken language systems such as voice assistants and interactive agents, where controlling response duration can improve interaction quality. However, despite their strong ability to generate natural spoken responses, existing models lack time awareness and struggle to follow duration-related instructions. To systematically evaluate this, we introduce TiCo-Bench, the first benchmark for time-controllable instruction following in SDMs, on which existing open-source and commercial models frequently fail to satisfy explicit time constraints. TiCo addresses this limitation by enabling an SDM to estimate elapsed speaking time during generation through Spoken Time Markers (STM) (e.g., <10.6 seconds>). These markers help the model maintain awareness of time and adjust the remaining content to meet the target duration. TiCo is post-trained efficiently without question-answer paired data, relying on self-generation and reinforcement learning with verifiable reward. Experimental results show that TiCo reduces duration error by 2.7x over its backbone and 1.6x over the strongest baseline, while preserving response quality.