New articles on Electrical Engineering and Systems Science


[1] 2602.17701

Deep Neural Network Architectures for Electrocardiogram Classification: A Comprehensive Evaluation

With the rising prevalence of cardiovascular diseases, electrocardiograms (ECG) remain essential for the non-invasive detection of cardiac abnormalities. This study presents a comprehensive evaluation of deep neural network architectures for automated arrhythmia classification, integrating temporal modeling, attention mechanisms, and ensemble strategies. To address data scarcity in minority classes, the MIT-BIH Arrhythmia dataset was augmented using a Generative Adversarial Network (GAN). We developed and compared four distinct architectures, including Convolutional Neural Networks (CNN), CNN combined with Long Short-Term Memory (CNN-LSTM), CNN-LSTM with Attention, and 1D Residual Networks (ResNet-1D), to capture both local morphological features and long-term temporal dependencies. Performance was rigorously evaluated using accuracy, F1-score, and Area Under the Curve (AUC) with 95% confidence intervals to ensure statistical robustness, while Gradient-weighted Class Activation Mapping (Grad-CAM) was employed to validate model interpretability. Experimental results indicate that the CNN-LSTM model achieved the optimal stand-alone balance between sensitivity and specificity, yielding an F1-score of 0.951. In contrast, the CNN-LSTM-Attention and ResNet-1D models exhibited higher sensitivity to class imbalance. To mitigate this, a dynamic ensemble fusion strategy was introduced; specifically, the Top2-Weighted ensemble achieved the highest overall performance with an F1-score of 0.958. These findings demonstrate that leveraging complementary deep architectures significantly enhances classification reliability, providing a robust and interpretable foundation for intelligent arrhythmia detection systems.
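The abstract does not spell out the exact fusion rule, but a "Top2-Weighted" ensemble can be sketched as: per sample, keep the two most confident models and average their class-probability vectors, weighted by each model's peak probability. The weighting scheme below is an assumption for illustration, not the paper's verified rule.

```python
def top2_weighted_fusion(model_probs):
    """Fuse per-model class-probability vectors by keeping only the two
    most confident models and weighting them by their peak probability.
    NOTE: the paper does not publish its exact fusion rule; this is one
    plausible interpretation of a "Top2-Weighted" ensemble."""
    # Confidence of each model = its highest class probability.
    ranked = sorted(model_probs, key=max, reverse=True)[:2]
    weights = [max(p) for p in ranked]
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, ranked)) / total
        for c in range(n_classes)
    ]

probs = [
    [0.70, 0.20, 0.10],   # e.g. CNN
    [0.60, 0.30, 0.10],   # e.g. CNN-LSTM
    [0.40, 0.35, 0.25],   # least confident model, dropped by the rule
]
fused = top2_weighted_fusion(probs)
```

The fused vector remains a valid probability distribution, and the least confident model no longer influences the decision.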


[2] 2602.17705

Wavenumber-domain signal processing for holographic MIMO: Foundations, methods, and future directions

Holographic multiple-input multiple-output (H-MIMO) systems represent a paradigm shift in wireless communications by enabling quasi-continuous apertures. Unlike conventional MIMO systems, H-MIMO with subwavelength antenna spacing operates in both far-field and near-field regimes, where classical discrete Fourier transform (DFT) representations fail to sufficiently capture the channel characteristics. To address this challenge, this article provides an overview of the emerging wavenumber-domain signal processing framework. Specifically, by leveraging spatial Fourier plane-wave decomposition to model H-MIMO channels, the wavenumber domain offers a unified and physically consistent basis for characterizing subwavelength-level spatial correlation and spherical wave propagation. This article first introduces the concept of H-MIMO and the wavenumber representation of H-MIMO channels. Next, it elaborates on wavenumber-domain signal processing technologies reported in the literature, including multiplexing, channel estimation, and waveform designs. Finally, it highlights open challenges and outlines future research directions in wavenumber-domain signal processing for next-generation wireless systems.


[3] 2602.17732

SIRUP: A diffusion-based virtual upmixer of steering vectors for highly-directive spatialization with first-order ambisonics

This paper presents virtual upmixing of steering vectors captured by a spherical microphone array with a small number of channels. This challenge has conventionally been addressed by recovering the directions and signals of sound sources from first-order ambisonics (FOA) data, and then rendering the higher-order ambisonics (HOA) data using a physics-based acoustic simulator. This approach, however, struggles to handle the mutual dependency between the spatial directivity of source estimation and the spatial resolution of the FOA data. Our method, named SIRUP, employs a latent diffusion model architecture. Specifically, a variational autoencoder (VAE) is used to learn a compact encoding of the HOA data in a latent space, and a diffusion model is then trained to generate the HOA embeddings, conditioned on the FOA data. Experimental results showed that SIRUP achieved a significant improvement over FOA systems for steering vector upmixing, source localization, and speech denoising.


[4] 2602.17745

Driving-Over Detection in the Railway Environment

To enable fully automated driving of trains, numerous new technological components must be introduced into the railway system. Tasks that are currently carried out by the operating staff need to be taken over by automatic systems, so equipment for automatic train operation and for observing the environment is needed. An important task here is the detection of collisions, including both (1) collisions with the front of the train and (2) collisions with the wheel, corresponding to a driving-over event. Technologies for detecting driving-over events have barely been investigated so far. Therefore, detailed driving-over experiments were performed to gather knowledge for fully automated rail operations, using a variety of objects made from steel, wood, stone, and bone. Based on the captured test data, three methods were developed to detect driving-over events automatically. The first method is based on convolutional neural networks, and the other two are classical threshold-based approaches. The neural-network-based approach achieves a mean accuracy of 99.6%, while the classical approaches achieve 85% and 88.6%, respectively.
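As a sketch of what a classical threshold-based detector might look like, a short-time energy threshold on a wheel-mounted vibration signal flags impact events. The window length and threshold below are illustrative placeholders, not the paper's tuned parameters.

```python
def threshold_detector(signal, window, threshold):
    """Classical threshold detection sketch: flag a driving-over event
    whenever the short-time energy of a vibration signal exceeds a
    threshold. Generic illustration of the idea; the paper's two
    threshold methods are not specified in the abstract."""
    events = []
    for i in range(len(signal) - window + 1):
        energy = sum(x * x for x in signal[i:i + window]) / window
        if energy > threshold:
            events.append(i)
    return events

# Quiet rolling noise with a short impact burst (the "driving-over").
sig = [0.05] * 40 + [2.0, -1.8, 1.5, -1.2] + [0.05] * 40
hits = threshold_detector(sig, window=4, threshold=0.5)
```

All detections cluster on the windows overlapping the burst, illustrating why a well-chosen threshold separates impacts from rolling noise.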


[5] 2602.17749

Detection and Classification of Cetacean Echolocation Clicks using Image-based Object Detection Methods applied to Advanced Wavelet-based Transformations

A challenge in marine bioacoustic analysis is the detection of animal signals, such as calls, whistles, and clicks, for behavioral studies. Manual labeling is too time-consuming to process enough data for reliable results, so an automatic solution is necessary. Basic mathematical models can detect events in simple environments, but they struggle with complex scenarios, such as differentiating signals with a low signal-to-noise ratio or distinguishing clicks from echoes. Deep neural networks (DNNs), such as ANIMAL-SPOT, are better suited for such tasks. DNNs process audio signals as image representations, often using spectrograms created by the Short-Time Fourier Transform. However, spectrograms have limitations due to the uncertainty principle, which creates a tradeoff between time and frequency resolution. Alternatives such as the wavelet transform, which provides better time resolution for high frequencies and improved frequency resolution for low frequencies, may offer advantages for feature extraction in complex bioacoustic environments. This thesis shows the efficacy of CLICK-SPOT on Norwegian killer whale underwater recordings provided by the cetacean biologist Dr. Vester. Keywords: Bioacoustics, Deep Learning, Wavelet Transformation


[6] 2602.17797

Deep Learning for Dermatology: An Innovative Framework for Approaching Precise Skin Cancer Detection

Skin cancer, a prevalent yet preventable disease, can be life-threatening if not diagnosed early. Globally, it is among the most common cancers, and millions of people are diagnosed each year. This paper investigates the application of two prominent deep learning models, VGG16 and DenseNet201, to the classification of benign and malignant skin lesions, an area of critical importance in dermatological diagnostics. We evaluate these CNN architectures for their efficacy in differentiating benign from malignant lesions, leveraging recent advances in deep learning applied to skin cancer detection. Our objective is to assess model accuracy and computational efficiency, offering insights into how these models could assist in early detection, diagnosis, and streamlined workflows in dermatology. Both models were trained on a binary-class dataset containing 3297 images, with all images rescaled to 224x224. The best result, an accuracy of 93.79%, was achieved by DenseNet201. Although both models provide good accuracy, there is still room for improvement, and in future work we aim to improve these results using new datasets.


[7] 2602.17813

Promptable segmentation with region exploration enables minimal-effort expert-level prostate cancer delineation

Purpose: Accurate segmentation of prostate cancer on magnetic resonance (MR) images is crucial for planning image-guided interventions such as targeted biopsies, cryoablation, and radiotherapy. However, subtle and variable tumour appearances, differences in imaging protocols, and limited expert availability make consistent interpretation difficult. While automated methods aim to address this, they rely on large expert-annotated datasets that are often inconsistent, whereas manual delineation remains labour-intensive. This work aims to bridge the gap between automated and manual segmentation through a framework driven by user-provided point prompts, enabling accurate segmentation with minimal annotation effort. Methods: The framework combines reinforcement learning (RL) with a region-growing segmentation process guided by user prompts. Starting from an initial point prompt, region-growing generates a preliminary segmentation, which is iteratively refined through RL. At each step, the RL agent observes the image and current segmentation to predict a new point, from which region growing updates the mask. A reward, balancing segmentation accuracy and voxel-wise uncertainty, encourages exploration of ambiguous regions, allowing the agent to escape local optima and perform sample-specific optimisation. Despite requiring fully supervised training, the framework bridges manual and fully automated segmentation at inference by substantially reducing user effort while outperforming current fully automated methods. Results: The framework was evaluated on two public prostate MR datasets (PROMIS and PICAI, with 566 and 1090 cases). It outperformed the previous best automated methods by 9.9% and 8.9%, respectively, with performance comparable to manual radiologist segmentation, reducing annotation time tenfold.


[8] 2602.17855

TopoGate: Quality-Aware Topology-Stabilized Gated Fusion for Longitudinal Low-Dose CT New-Lesion Prediction

Longitudinal low-dose CT follow-ups vary in noise, reconstruction kernels, and registration quality. These differences destabilize subtraction images and can trigger false new-lesion alarms. We present TopoGate, a lightweight model that combines the follow-up appearance view with the subtraction view and controls their influence through a learned, quality-aware gate. The gate is driven by three case-specific signals: CT appearance quality, registration consistency, and stability of anatomical topology measured with topological metrics. On the NLST-New-Lesion-LongCT cohort comprising 152 pairs from 122 patients, TopoGate improves discrimination and calibration over single-view baselines, achieving an area under the ROC curve of 0.65 with a standard deviation of 0.05 and a Brier score of 0.14. Removing corrupted or low-quality pairs, identified by the quality scores, further increases the area under the ROC curve from 0.62 to 0.68 and reduces the Brier score from 0.14 to 0.12. The gate responds predictably to degradation, placing more weight on appearance when noise grows, which mirrors radiologist practice. The approach is simple, interpretable, and practical for reliable longitudinal LDCT triage.
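The two ingredients the abstract combines, a quality-driven gate and Brier-score calibration, can be sketched minimally as follows. The gate coefficients and signal scales here are invented for illustration and are not TopoGate's learned parameters.

```python
import math

def quality_gate(appearance_score, subtraction_score,
                 noise, registration_error, topo_instability):
    """Quality-aware gating sketch: blend the appearance-view and
    subtraction-view lesion scores with a weight driven by case-specific
    quality signals. Coefficients are illustrative, not learned."""
    # Higher noise / misregistration / topological instability pushes
    # weight toward the appearance view, mirroring the paper's finding.
    z = 1.5 * noise + 1.0 * registration_error + 0.8 * topo_instability - 1.0
    g = 1.0 / (1.0 + math.exp(-z))          # weight on appearance view
    return g * appearance_score + (1.0 - g) * subtraction_score, g

def brier(preds, labels):
    """Brier score: mean squared error between predicted probability
    and the 0/1 label (lower is better calibrated)."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)

s_clean, g_clean = quality_gate(0.8, 0.3, noise=0.0,
                                registration_error=0.0, topo_instability=0.0)
s_noisy, g_noisy = quality_gate(0.8, 0.3, noise=2.0,
                                registration_error=1.0, topo_instability=1.0)
```

On a clean pair the subtraction view dominates; on a degraded pair the gate shifts weight to the appearance view, which is the behavior the abstract attributes to radiologist practice.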


[9] 2602.17874

Modal Energy for Power System Analysis: Definitions and Requirements

Modal energy provides information complementary to and based on conventional eigenvalues and participation factors for power system modal analysis. However, the definition of modal energy is not unique. This letter clarifies the definitions and applicability of mainstream modal energy approaches, focusing on their mappings to eigenvalues and to the total system energy. It is shown that these mappings hold only under restrictive conditions, notably system normality, which limits their applicability in inverter-dominated power systems.
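The normality condition the letter highlights is easy to test numerically: a real matrix A is normal when it commutes with its transpose, A A^T = A^T A. A minimal pure-Python check (real matrices only):

```python
def is_normal(A, tol=1e-9):
    """Return True if the real square matrix A satisfies A A^T = A^T A.
    The letter's point is that the clean mappings from modal energy to
    eigenvalues and total energy hold only for such normal system
    matrices; inverter-dominated systems typically violate this."""
    n = len(A)
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    AT = [[A[j][i] for j in range(n)] for i in range(n)]
    P, Q = matmul(A, AT), matmul(AT, A)
    return all(abs(P[i][j] - Q[i][j]) < tol
               for i in range(n) for j in range(n))

symmetric = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, hence normal
jordan    = [[1.0, 1.0], [0.0, 1.0]]   # defective Jordan block, non-normal
```

The Jordan block fails the check even though it is stable for eigenvalues inside the unit circle, which is exactly the gap between eigenvalue analysis and energy-based mappings.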


[10] 2602.17877

A Scalable Reconfigurable Intelligent Surface with 3 Bit Phase Resolution and High Bandwidth for 3.6 GHz 5G/6G Applications

Reconfigurable Intelligent Surfaces enable active control of wireless propagation channels, which is crucial for future 5G and 6G networks. This work presents a scalable RIS design operating at 3.6 GHz with both 1 bit and 3 bit phase resolution, supporting wideband applications. The unit cells employ low-cost printed circuit board technology with an innovative spring-contact feeding structure, enabling efficient assembly and reduced manufacturing complexity for large-area arrays. The design achieves broadband phase control, low power consumption, and high scalability, with experimental results demonstrating phase tunability across the n78 frequency band and competitive reflection performance compared to existing solutions. This RIS architecture provides a practical platform for experimental studies of smart radio environments, beam steering, and sensing applications in next-generation wireless networks.


[11] 2602.17901

MeDUET: Disentangled Unified Pretraining for 3D Medical Image Synthesis and Analysis

Self-supervised learning (SSL) and diffusion models have advanced representation learning and image synthesis. However, in 3D medical imaging, they remain separate: diffusion for synthesis, SSL for analysis. Unifying 3D medical image synthesis and analysis is intuitive yet challenging: multi-center datasets exhibit dominant style shifts while downstream tasks rely on anatomy, and site-specific style co-varies with anatomy across slices, making the two factors hard to separate reliably without explicit constraints. In this paper, we propose MeDUET, a 3D Medical image Disentangled UnifiEd PreTraining framework that performs SSL in a Variational Autoencoder (VAE) latent space which explicitly disentangles domain-invariant content from domain-specific style. A token demixing mechanism turns disentanglement from a modeling assumption into an empirically identifiable property. Two novel proxy tasks, Mixed-Factor Token Distillation (MFTD) and Swap-invariance Quadruplet Contrast (SiQC), are devised to synergistically enhance disentanglement. Once pretrained, MeDUET is capable of (i) delivering higher fidelity, faster convergence, and improved controllability for synthesis, and (ii) demonstrating strong domain generalization and notable label efficiency for analysis across diverse medical benchmarks. In summary, MeDUET converts multi-source heterogeneity from an obstacle into a learning signal, enabling unified pretraining for 3D medical image synthesis and analysis. The code is available at this https URL.


[12] 2602.17977

A Survey on Reconfigurable and Movable Antennas for Wireless Communications and Sensing

Reconfigurable antennas (RAs) and movable antennas (MAs) have been recognized as promising technologies to enhance the performance of wireless communication and sensing systems by introducing additional degrees of freedom (DoFs) in tuning antenna radiation and/or placement. This paradigm shift from conventional non-reconfigurable/movable antennas offers tremendous new opportunities for realizing multi-functional, more adaptive, and efficient next-generation wireless networks. In this paper, we provide a comprehensive survey on the fundamentals, architectures, and applications of these two emerging antenna technologies. First, we provide a chronological overview of the parallel historical development of both RA and MA technologies. Next, we review and classify the state-of-the-art hardware architectures for implementing RAs and MAs, followed by a detailed comparison of their distinct mechanisms, performance metrics, and functionalities. Subsequently, we focus on various applications of RAs and MAs in wireless communication systems, analyzing their respective performance advantages and key design considerations such as mode selection, movement optimization, and channel acquisition. We also explore the significant roles of RAs and MAs in advancing wireless sensing and integrated sensing and communication (ISAC). Furthermore, we present numerical performance comparisons to illustrate the distinct characteristics and complementary advantages of RA and MA systems. Finally, we outline key challenges and identify promising future research directions to inspire further innovations in this burgeoning field.


[13] 2602.17979

Learning While Transmitting: Pilotless Polar Coded Modulation for Short Packet Transmission

Short packets make channel learning expensive. In pilot-aided transmission (PAT), a non-negligible fraction of the packet is consumed by pilots, creating a direct pre-log loss and tightening the reliability margin needed for ultra-reliable low-latency communication. We propose a pilot-free polar-coded framework that replaces explicit pilots with "coded pilots". The message is carried by two polar-coded segments: a quadrature phase shift keying (QPSK) segment that is decodable without channel state information (CSI), and a higher-order quadrature amplitude modulation (QAM) segment that provides high spectral efficiency. The receiver employs hybrid decoding: it first jointly infers CSI during successive-cancellation-based decoding of the QPSK segment by exploiting QPSK phase-rotation invariance together with polar frozen-bit constraints; the decoded QPSK symbols then act as implicit pilots for coherent detection and decoding of the QAM segment. The split also makes rate adaptation practical by confining the symmetry/frozen-bit requirements for phase resolution to the QPSK segment, enabling puncturing and shortening without breaking the pilot-free mechanism. For multi-block fading, we optimize the split and code parameters via density evolution with Gaussian approximation (DEGA); for higher-order modulation, we use bit-interleaved coded modulation capacity approximation to obtain equivalent channel parameters. Incorporating channel-estimation error variance into the DEGA-based analysis, simulations over practical multi-block block-fading channels show gains up to 1.5 dB over PAT in the short-blocklength regime.
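The phase-rotation ambiguity that the frozen bits resolve can be shown with a toy example: a QPSK constellation looks identical under any 90-degree rotation, so a receiver with no CSI cannot tell the four rotations apart from the constellation alone. The sketch below uses a single known symbol as a stand-in for the frozen-bit constraints; the paper's actual mechanism operates inside successive-cancellation decoding.

```python
def rotate(symbols, k):
    """Rotate received symbols by k * 90 degrees (multiply by j^k)."""
    return [s * (1j ** k) for s in symbols]

def resolve_rotation(received, frozen_pos, frozen_sym):
    """Pick the 90-degree rotation that makes the symbol at frozen_pos
    match its known value. Toy stand-in for the role of polar
    frozen-bit constraints in resolving QPSK phase ambiguity."""
    best_k, best_err = 0, float("inf")
    for k in range(4):
        err = abs(rotate(received, k)[frozen_pos] - frozen_sym)
        if err < best_err:
            best_k, best_err = k, err
    return best_k

tx = [(1 + 1j), (-1 - 1j), (1 - 1j), (1 + 1j)]   # QPSK; tx[0] is "frozen"
rx = rotate(tx, 3)                  # unknown 270-degree channel rotation
k_hat = resolve_rotation(rx, frozen_pos=0, frozen_sym=tx[0])
recovered = rotate(rx, k_hat)
```

Without the known symbol, all four values of k would produce a valid QPSK sequence, which is the ambiguity PAT normally resolves by spending pilots.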


[14] 2602.17986

From Global Radiomics to Parametric Maps: A Unified Workflow Fusing Radiomics and Deep Learning for PDAC Detection

Radiomics and deep learning both offer powerful tools for quantitative medical imaging, but most existing fusion approaches only leverage global radiomic features and overlook the complementary value of spatially resolved radiomic parametric maps. We propose a unified framework that first selects discriminative radiomic features and then injects them into a radiomics-enhanced nnUNet at both the global and voxel levels for pancreatic ductal adenocarcinoma (PDAC) detection. On the PANORAMA dataset, our method achieved AUC = 0.96 and AP = 0.84 in cross-validation. On an external in-house cohort, it achieved AUC = 0.95 and AP = 0.78, outperforming the baseline nnUNet; it also ranked second in the PANORAMA Grand Challenge. This demonstrates that handcrafted radiomics, when injected at both global and voxel levels, provide complementary signals to deep learning models for PDAC detection. Our code can be found at this https URL


[15] 2602.18005

Multi-Modal Sensing Residual-Corrected GNN for mmWave Path Loss Prediction via Synesthesia of Machines

To support sixth-generation (6G)-enabled intelligent transportation systems (ITSs), a multi-modal sensing residual-corrected graph neural network (MM-ResGNN) framework is proposed for millimeter-wave (mmWave) path loss prediction in vehicular communications for the first time. The propagation environment is formulated as an environment sensing path loss graph (ESPL-Graph), where nodes represent the transmitter (Tx) and receiver (Rx) entities and edges jointly describe Tx-Rx transmission links and Rx-Rx spatial correlation links. Meanwhile, a geometry-driven physical baseline is introduced to decouple deterministic attenuation trends from stochastic residual variations. A vehicular multi-modal path loss dataset (VMMPL) is constructed, which covers three representative scenarios, including the urban wide lane, urban crossroad, and suburban forking road environments, and achieves precise alignment between RGB images and global semantic information in the physical space, and link-level ray-tracing (RT)-based path loss data in the electromagnetic space. In MM-ResGNN, topology-aware graph representations and fine-grained visual semantics are synergistically integrated through a gated fusion mechanism to estimate the path loss residual relative to the physical baseline. Experimental results demonstrate that MM-ResGNN achieves significant improvements over empirical models and conventional data-driven baselines, with a normalized mean squared error (NMSE) of 0.0098, a mean absolute error (MAE) of 5.7991 dB, and a mean absolute percentage error (MAPE) of 5.0498%. Furthermore, MM-ResGNN exhibits robust cross-scenario generalization through a few-shot fine-tuning strategy, enabling accurate path loss prediction in unseen vehicular environments with limited labeled data.


[16] 2602.18018

Joint Multi-User Tracking and Signal Detection in Reconfigurable Intelligent Surface-Assisted Cell-Free ISAC Systems

This paper investigates the cell-free multi-user integrated sensing and communication (ISAC) system, where multiple base stations collaboratively track the users and detect their signals. Moreover, reconfigurable intelligent surfaces (RISs) are deployed to serve as additional reference nodes to overcome the line-of-sight blockage issue of mobile users and achieve seamless sensing. Due to high-speed user mobility, the multi-user tracking and signal detection performance can deteriorate significantly without carefully designed online updating of the user kinematic states. To tackle this challenge, we first establish a probabilistic signal model to comprehensively characterize the interdependencies among user states, transmit signals, and received signals during the tracking procedure. Based on the Bayesian problem formulation, we further propose a novel hybrid variational message passing (HVMP) algorithm to realize computationally efficient joint estimation of user states and transmit signals in an online manner, which integrates VMP and standard MP to derive the posterior probabilities of the estimated variables. Furthermore, the Bayesian Cramér-Rao bound is provided to characterize the performance limit of the multi-user tracking problem, and is also utilized to optimize the RIS phase profiles for tracking performance enhancement. Numerical results demonstrate that the proposed algorithm significantly improves both tracking and signal detection performance over representative Bayesian estimation counterparts.


[17] 2602.18031

Decision Support under Prediction-Induced Censoring

In many data-driven online decision systems, actions determine not only operational costs but also the data availability for future learning -- a phenomenon termed Prediction-Induced Censoring (PIC). This challenge is particularly acute in large-scale resource allocation for generative AI (GenAI) serving: insufficient capacity triggers shortages but hides the true demand, leaving the system with only a "greater-than" constraint. Standard decision-making approaches that rely on uncensored data suffer from selection bias, often locking the system into a self-reinforcing low-provisioning trap. To break this loop, this paper proposes an adaptive approach named PIC-Reinforcement Learning (PIC-RL), a closed-loop framework that transforms censoring from a data quality problem into a decision signal. PIC-RL integrates (1) Uncertainty-Aware Demand Prediction to manage the information-cost trade-off, (2) Pessimistic Surrogate Inference to construct decision-aligned conservative feedback from shortage events, and (3) Dual-Timescale Adaptation to stabilize online learning against distribution drift. The analysis provides theoretical guarantees that the feedback design corrects the selection bias inherent in naive learning. Experiments on production Alibaba GenAI traces demonstrate that PIC-RL consistently outperforms state-of-the-art baselines, reducing service degradation by up to 50% while maintaining cost efficiency.
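The censoring mechanism and a pessimistic surrogate can be shown in miniature: when capacity binds, the system only observes served demand plus a "greater-than" flag, and naive learning from served demand alone is biased low. The 20% inflation margin below is an invented placeholder, not PIC-RL's actual surrogate construction.

```python
def observe(demand, capacity):
    """Prediction-induced censoring: when capacity binds, the system
    serves only `capacity` and learns merely that demand exceeded it."""
    served = min(demand, capacity)
    censored = demand > capacity
    return served, censored

def pessimistic_surrogate(served, censored, margin=0.2):
    """Conservative feedback sketch: replace a censored observation with
    an inflated lower-bound estimate so the learner is not pulled into
    the self-reinforcing low-provisioning trap. The margin is
    illustrative only."""
    return served * (1.0 + margin) if censored else served

history = [(120, 100), (80, 100), (150, 100)]   # (true demand, capacity)
naive, surrogate = [], []
for demand, cap in history:
    served, censored = observe(demand, cap)
    naive.append(served)                        # biased: ignores censoring
    surrogate.append(pessimistic_surrogate(served, censored))
```

The naive series never sees demand above capacity, while the surrogate series stays above it on censored days, which is the directional correction the framework's theoretical analysis guarantees.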


[18] 2602.18048

Incremental Data Driven Transfer Identification

We introduce a geometric method for online transfer identification of a deterministic linear time-invariant system. At the beginning of the identification process, we assume access to abundant data from a system that is similar, though not identical, to the true system. In the early stages of data collection from the true system, the dataset generated is not yet sufficiently informative to enable precise identification, so multiple candidate models remain consistent with the available observations. Our method picks, at each instant, the model consistent with the current data that is closest to the similar system. As more data are collected, the proposed model gradually moves away from the initial similar system and eventually converges to the true system once the dataset becomes informative. Numerical examples demonstrate the effectiveness of the incremental transfer identification paradigm, where models identified from minimal data are used to solve the pole placement problem.
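The "closest consistent model" idea can be made concrete in a toy parameter space: with a single linear observation, the consistent set is a hyperplane, and the model closest to the prior is its orthogonal projection onto that hyperplane. The two-parameter example below is purely illustrative, not the paper's full subspace machinery.

```python
def closest_consistent_model(theta_prior, a, b):
    """Among all parameter vectors theta consistent with one linear
    observation a . theta = b, return the one closest (Euclidean) to
    the prior model: the orthogonal projection of the prior onto the
    data-consistency hyperplane. Toy sketch only."""
    dot_ap = sum(ai * ti for ai, ti in zip(a, theta_prior))
    norm2 = sum(ai * ai for ai in a)
    scale = (b - dot_ap) / norm2
    return [ti + scale * ai for ti, ai in zip(theta_prior, a)]

# Similar-system prior [1, 0]; the true system is [2, 1].
prior = [1.0, 0.0]
after_first = closest_consistent_model(prior, [1.0, 0.0], 2.0)
after_second = closest_consistent_model(after_first, [0.0, 1.0], 1.0)
```

Each observation pulls the estimate only in the direction the data constrains, so the model drifts from the prior toward the true system exactly as the data becomes informative.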


[19] 2602.18059

Iterative McCormick Relaxation for Joint Impedance Control and Network Topology Optimization

Power system operators are increasingly deploying Variable Impedance Devices (VIDs), e.g., Smart Wires, and Network Topology Optimization (NTO) schemes for mitigating operational challenges such as line and transformer congestion and voltage violations. This work aims to optimize and coordinate the operation of distributed VIDs considering both fixed and optimized topologies. The problem is inherently nonlinear due to the power flow equations as well as the bilinear terms introduced by the variable line impedances of the VIDs; the topology optimization scheme further makes it a mixed-integer nonlinear program. To tackle this, we apply a McCormick relaxation scheme, which converts the bilinear constraints into a set of linear constraints alongside the DC power flow equations. We propose an iterative correction of the McCormick relaxation to enhance its accuracy. The proposed framework is validated on standard IEEE benchmark test systems, and we present a performance comparison of the iterative McCormick method against the nonlinear formulation, an SOS2 piecewise-linear approximation, and the original McCormick relaxation.
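For reference, the McCormick envelope of a bilinear term w = x*y on a box replaces the product with four linear inequalities, and tightening the box (as an iterative correction does) shrinks the relaxation gap. A small numeric sketch, with the box bounds chosen purely for illustration:

```python
def mccormick_bounds(x, y, xl, xu, yl, yu):
    """McCormick envelope of w = x*y on the box [xl, xu] x [yl, yu]:
    the four standard linear inequalities, evaluated at (x, y), imply
    the returned (lower, upper) bounds on w. These stand in for the
    impedance-times-flow bilinear terms in the paper's model."""
    lower = max(xl * y + x * yl - xl * yl,
                xu * y + x * yu - xu * yu)
    upper = min(xu * y + x * yl - xu * yl,
                xl * y + x * yu - xl * yu)
    return lower, upper

x, y = 0.5, 0.5
exact = x * y
lo1, hi1 = mccormick_bounds(x, y, 0.0, 1.0, 0.0, 1.0)   # loose box
lo2, hi2 = mccormick_bounds(x, y, 0.4, 0.6, 0.4, 0.6)   # tightened box
```

The relaxation always contains the true product, and shrinking the variable bounds from [0, 1] to [0.4, 0.6] collapses the envelope gap around it, which is the intuition behind iteratively correcting the relaxation.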


[20] 2602.18076

Extremely Large Antenna Spacing Method for Enhanced Wideband Near-Field Sensing

This paper proposes a monostatic wideband system for integrated sensing and communication (ISAC) at millimeter-wave frequencies, based on multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM). The system operates in a hybrid near-/far-field regime. The transmitter (Tx) operates in the far field (FF) and uses low-complexity beam steering. The receiver (Rx), on the other hand, operates in a pervasive near field (NF), enabled by a very large effective array aperture. To enable a fully digital implementation, we introduce an extremely large antenna spacing (ELAS) design. This design attains the required aperture with only a few widely spaced antenna elements while avoiding grating lobes in the composite Tx-Rx response. We analytically characterize the NF range-angle response of this architecture and study the interplay between NF effects and waveform bandwidth. This leads to the definition of a super-resolution region, where NF propagation at the Rx dominates the achievable range resolution and surpasses the classical, bandwidth-limited resolution. As a case study, we consider an extended target modeled as a collection of scatterers and assess localization performance via maximum-likelihood estimation. Numerical results evaluated in terms of root mean square error (RMSE) and generalized optimal sub-pattern assignment (GOSPA) show that operating in NF conditions with the ELAS-based design yields significant gains compared to a conventional FF baseline at both the Tx and Rx.


[21] 2602.18086

Non-Contiguous Wi-Fi Spectrum for ISAC: Impact on Multipath Delay Estimation

Leveraging channel state information from multiple Wi-Fi bands can improve delay resolution for ranging and sensing when a wide contiguous spectrum is unavailable. However, frequency gaps shape the delay response, introducing sidelobes and secondary peaks that can obscure closely spaced multipath components. This paper examines multipath delay estimation for Wi-Fi-compliant multiband configurations using channel state information (CSI). For a two-path model with unknown complex gains and delays, the Cramér-Rao lower bound (CRLB) for delay separation is derived and analyzed, confirming the benefit of a larger frequency aperture while revealing pronounced, separation-dependent oscillations driven by gap geometry and inter-path coupling. Given the local nature of the CRLB, the delay response is analyzed next. In the single-path case, the combined subband responses determine how delay-domain sidelobe levels are distributed; the dominant peak spacing is set primarily by the separation between subband center frequencies. In the two-path case, increased aperture sharpens the mainlobe but also intensifies sidelobes and leakage, yielding competing peaks and, in some regimes, a dominant peak shifted from the true delay. Finally, a normalized leakage metric is introduced to predict problematic separations and to identify regimes where the local CRLB analysis does not capture practical peak-leakage behavior in delay estimation.
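The effect of a spectral gap on the delay response is easy to reproduce: the sketch below compares a contiguous band with a two-subband layout of the same aperture, showing a strong secondary peak at the inverse of the subband-center separation. The subcarrier counts and spacing are illustrative, not taken from the paper.

```python
import cmath, math

def delay_response(freqs_hz, tau_s):
    """Normalized magnitude of the delay-domain matched-filter response
    of a set of measured subcarrier frequencies at trial delay tau:
    a coherent sum of per-subcarrier phasors. A gap between subbands
    keeps the aperture but redistributes energy into sidelobes."""
    s = sum(cmath.exp(-2j * math.pi * f * tau_s) for f in freqs_hz)
    return abs(s) / len(freqs_hz)

step = 312.5e3                                # Wi-Fi-like subcarrier spacing
contig = [i * step for i in range(128)]       # one contiguous 40 MHz band
gapped = [i * step for i in range(32)] + \
         [i * step for i in range(96, 128)]   # two subbands, large gap

tau2 = 1.0 / (96 * step)   # inverse of the subband-center separation
```

Both layouts peak at the true delay, but the gapped layout produces a near-dominant secondary peak at tau2, exactly the "dominant peak spacing set by subband-center separation" behavior the abstract describes.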


[22] 2602.18119

RamanSeg: Interpretability-driven Deep Learning on Raman Spectra for Cancer Diagnosis

Histopathology, the current gold standard for cancer diagnosis, involves the manual examination of tissue samples after chemical staining, a time-consuming process requiring expert analysis. Raman spectroscopy is an alternative, stain-free method of extracting information from samples. Using nnU-Net, we trained a segmentation model on a novel dataset of spatial Raman spectra aligned with tumour annotations, achieving a mean foreground Dice score of 80.9%, surpassing previous work. Furthermore, we propose a novel, interpretable, prototype-based architecture called RamanSeg. RamanSeg classifies pixels based on discovered regions of the training set, generating a segmentation mask. Two variants of RamanSeg allow a trade-off between interpretability and performance: one with prototype projection and another projection-free version. The projection-free RamanSeg outperformed a U-Net baseline with a mean foreground Dice score of 67.3%, offering a meaningful improvement over a black-box training approach.
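The prototype-based decision rule can be caricatured as nearest-prototype matching: each pixel's spectrum is labeled by its most similar discovered prototype, so every decision is traceable to a training-set region. The cosine-similarity rule and the spectra below are illustrative, not RamanSeg's learned components.

```python
import math

def classify_by_prototype(spectrum, prototypes):
    """Nearest-prototype sketch: assign the label of the prototype with
    the highest cosine similarity to the pixel's Raman spectrum.
    Illustrates the interpretability idea only; RamanSeg's similarity
    and aggregation are learned."""
    def cosine(u, v):
        num = sum(a * b for a, b in zip(u, v))
        den = (math.sqrt(sum(a * a for a in u))
               * math.sqrt(sum(b * b for b in v)))
        return num / den
    label, _proto = max(prototypes,
                        key=lambda lp: cosine(spectrum, lp[1]))
    return label

# Hypothetical 3-bin prototype "spectra" for two tissue classes.
protos = [("tumour", [1.0, 0.2, 0.9]), ("healthy", [0.1, 1.0, 0.2])]
label = classify_by_prototype([0.9, 0.3, 0.8], protos)
```

Because the winning prototype is an actual region of the training set, the explanation for each pixel is "it looks like this known example", rather than an opaque network activation.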


[23] 2602.18247

Hybrid Control of ADT Switched Linear Systems subject to Actuator Saturation

This paper develops a hybrid output-feedback control framework for average dwell-time (ADT) switched linear systems subject to actuator saturation. The considered subsystems may be exponentially unstable, and the saturation nonlinearity is explicitly handled through a deadzone-based representation. The proposed hybrid controller combines mode-dependent full-order dynamic output-feedback controllers with a supervisory reset mechanism that updates controller states at switching instants. By incorporating the reset rule directly into the synthesis conditions, switching boundary constraints and performance requirements are addressed in a unified convex formulation. Sufficient conditions are derived in terms of linear matrix inequalities (LMIs) to guarantee exponential stability under ADT switching and a prescribed weighted L2-gain disturbance attenuation level for energy-bounded disturbances. An explicit controller construction algorithm is provided based on feasible LMI solutions. Simulation results demonstrate the effectiveness and computational tractability of the proposed approach and highlight its advantages over existing output-feedback saturation control methods.


[24] 2602.18254

m^3TrackFormer: Transformer-based mmWave Multi-Target Tracking with Lost Target Re-Acquisition Capability

This paper considers a millimeter wave (mmWave) integrated sensing and communication (ISAC) system, where a base station (BS) equipped with a large number of antennas but a small number of radio-frequency (RF) chains emits pencil-like narrow beams for persistent tracking of multiple moving targets. Under this model, tracking loss arising from misalignment between the pencil-like beams and the true target positions is inevitable, especially when the target trajectories are complex, and the conventional Kalman filter-based scheme does not work well. To deal with this issue, we propose a Transformer-based mmWave multi-target tracking framework, namely m3TrackFormer, with a novel re-acquisition mechanism, such that even if the echo signals from some targets are too weak to extract sensing information, their locations can be re-acquired quickly with small beam sweeping overhead. Specifically, the proposed framework can operate in two modes, normal tracking and target re-acquisition, depending on whether tracking loss occurs. When all targets are hit by the swept beams, the framework works in the Normal Tracking Mode (N-Mode) with a Transformer encoder-based Normal Tracking Network (N-Net) to accurately estimate the positions of these targets and predict the swept beams in the next time block. When tracking loss happens, the framework switches to the Re-Acquisition Mode (R-Mode) with a Transformer decoder-based Re-Acquisition Network (R-Net) to adjust the beam sweeping strategy, recovering the lost targets while maintaining the tracking of the remaining ones. Thanks to its ability to extract global trajectory features, m3TrackFormer achieves high beam prediction accuracy and quickly re-acquires lost targets compared with other tracking methods.


[25] 2602.18261

Accurate Data-Based State Estimation from Power Loads Inference in Electric Power Grids

Accurate state estimation is a crucial requirement for the reliable operation and control of electric power systems. Here, we construct a data-driven, numerical method to infer missing power load values in large-scale power grids. Given partial observations of power demands, the method estimates the operational state using a linear regression algorithm, exploiting statistical correlations within synthetic training datasets. We evaluate the performance of the method on three synthetic transmission grid test systems. Numerical experiments demonstrate the high accuracy achieved by the method in reconstructing missing demand values under various operating conditions. We further apply the method to real data for the transmission power grid of Switzerland. Despite the restricted number of observations in this dataset, the method infers missing power loads rather accurately. Furthermore, Newton-Raphson power flow solutions show that deviations between true and inferred values for power loads result in smaller deviations between true and inferred values for flows on power lines. This ensures that the estimated operational state correctly captures potential line contingencies. Overall, our results indicate that simple data-based regression techniques can provide an efficient and reliable alternative for state estimation in modern power grids.
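The abstract's core idea, inferring unobserved demands from observed ones via linear regression trained on synthetic data, can be sketched as follows. All sizes, the load model, and the variable names are illustrative assumptions, not the paper's setup; the key ingredient is the statistical correlation between bus loads that the regression exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic training set: each row is a full load vector for one
# operating condition. Loads are correlated through a system-wide demand swing.
n_samples, n_buses = 500, 20
base = rng.uniform(0.5, 2.0, n_buses)            # nominal load per bus
common = rng.normal(0.0, 0.3, (n_samples, 1))    # shared demand fluctuation
loads = base * (1.0 + common + 0.05 * rng.normal(size=(n_samples, n_buses)))

observed = np.arange(0, 15)    # buses with metered demand
missing = np.arange(15, 20)    # buses whose demand must be inferred

# Ordinary least squares from observed loads (plus intercept) to missing loads.
X = np.hstack([loads[:, observed], np.ones((n_samples, 1))])
W, *_ = np.linalg.lstsq(X, loads[:, missing], rcond=None)

# Infer missing demands for a new partial observation.
new_full = base * (1.0 + 0.2 + 0.05 * rng.normal(size=n_buses))
x_new = np.append(new_full[observed], 1.0)
inferred = x_new @ W
print(np.max(np.abs(inferred - new_full[missing])))
```

Because the observed buses share the common demand factor with the missing ones, the regression recovers the unobserved loads up to the per-bus noise level.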


[26] 2602.18263

Channel Estimation for Double-BD-RIS-Assisted Multi-User MIMO Communication

Deploying multiple beyond diagonal reconfigurable intelligent surfaces (BD-RISs) can potentially improve communication performance thanks to the inter-element connections of each BD-RIS and the inter-surface cooperative beamforming gain among BD-RISs. However, a major issue for multi-BD-RIS-assisted communication lies in the channel estimation overhead: the channel coefficients associated with the off-diagonal elements in each BD-RIS's scattering matrix, as well as those associated with the reflection links among BD-RISs, have to be estimated. In this paper, we propose an efficient channel estimation framework for double-BD-RIS-assisted multi-user multiple-input multiple-output (MIMO) systems. Specifically, we reveal that the high-dimensional cascaded channels are characterized by five low-dimensional matrices by exploiting channel correlation properties. Based on this novel observation, in the ideal noiseless case, we develop a channel estimation scheme to recover these matrices sequentially and characterize the closed-form overhead required for perfect estimation as a function of the numbers of users, each BD-RIS's elements, and the channel ranks, which is of the same order as that in double-diagonal-RIS-aided communication systems. This result implies the superiority of cooperative BD-RIS-aided communication over the diagonal-RIS counterpart even when channel estimation overhead is considered. We further extend the proposed scheme to practical noisy scenarios and provide extensive numerical simulations to validate its effectiveness.


[27] 2602.18331

Koopman-BoxQP: Solving Large-Scale NMPC at kHz Rates

Solving large-scale nonlinear model predictive control (NMPC) problems at kilohertz (kHz) rates on standard processors remains a formidable challenge. This paper proposes a Koopman-BoxQP framework that i) learns a high-dimensional linear Koopman model, ii) eliminates the high-dimensional observables to construct a multi-step prediction model of the states and control inputs, iii) incorporates the multi-step prediction model into the objective as a penalty, which results in a structured box-constrained quadratic program (BoxQP) whose decision variables include both the system states and control inputs, and iv) develops a structure-exploiting, warm-start-capable variant of the feasible Mehrotra interior-point algorithm for BoxQP. Numerical results demonstrate that Koopman-BoxQP can solve a large-scale NMPC problem with $1040$ variables and $2080$ inequalities at a kHz rate.
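The BoxQP problem class at the heart of this pipeline can be illustrated with a far simpler solver than the paper's structured Mehrotra interior-point method. The projected-gradient sketch below solves the same box-constrained QP form, min 0.5 z'Hz + g'z subject to lb <= z <= ub, on made-up data; it is a stand-in for illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
A = rng.normal(size=(n, n))
H = A @ A.T + n * np.eye(n)          # symmetric positive definite Hessian
g = rng.normal(size=n)
lb, ub = -np.ones(n), np.ones(n)     # box constraints

L = np.linalg.eigvalsh(H)[-1]        # Lipschitz constant of the gradient
z = np.zeros(n)
for _ in range(2000):
    # Gradient step followed by projection onto the box.
    z = np.clip(z - (1.0 / L) * (H @ z + g), lb, ub)

# First-order optimality check: the projected gradient should (nearly) vanish.
resid = z - np.clip(z - (H @ z + g), lb, ub)
print(np.linalg.norm(resid))
```

Interior-point methods such as the paper's reach far higher accuracy in far fewer iterations on structured problems; the value of this sketch is only to make the BoxQP form concrete.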


[28] 2602.18332

MD-AirComp+: Adaptive Quantization for Blind Massive Digital Over-the-Air Computation

Recent research has shown that unsourced massive access (UMA) is naturally well-suited for over-the-air computation (AirComp), as it does not require knowledge of each individual signal, as demonstrated by the massive digital AirComp (MD-AirComp) scheme proposed in prior work. The MD-AirComp scheme has proven effective in federated edge learning and is highly compatible with current digital wireless networks. However, it depends on channel pre-equalization, which may amplify computation errors in the presence of channel estimation inaccuracies, thus limiting its practical use. In this paper, we propose a blind MD-AirComp+ scheme, which takes advantage of the channel hardening effect in massive multiple-input multiple-output (MIMO) systems. We provide an upper bound on the computation mean square error, analyze the trade-off between computation accuracy and communication overhead, and determine the optimal quantization level. Additionally, we introduce a deep unfolding algorithm to reduce the computational complexity of solving the underdetermined detection problem formulated as a least absolute shrinkage and selection operator optimization problem. Simulation results confirm the effectiveness of the proposed MD-AirComp+ framework, the optimal quantization selection strategy, and the low-complexity detection algorithm.


[29] 2602.18339

GS-SBL: Bridging Greedy Pursuit and Sparse Bayesian Learning for Efficient 3D Wireless Channel Modeling

Robust cognitive radio development requires accurate 3D path loss models. Traditional empirical models often lack environment-awareness, while deep learning approaches are frequently constrained by the scarcity of large-scale training datasets. This work leverages the inherent sparsity of wireless propagation to model scenario-specific channels by identifying a discrete set of virtual signal sources. We propose a novel Greedy Sequential Sparse Bayesian Learning (GS-SBL) framework that bridges the gap between the computational efficiency of Orthogonal Matching Pursuit (OMP) and the robust uncertainty quantification of SBL. Unlike standard top-down SBL, which updates all source hyperparameters simultaneously, our approach employs a ``Micro-SBL'' architecture. We sequentially evaluate candidate source locations in isolation by executing localized, low-iteration SBL loops and selecting the source that minimizes the $L_2$ residual error. Once identified, the source and its corresponding power are added to the support set, and the process repeats on the signal residual to identify subsequent sources. Experimental results on real-world 3D propagation data demonstrate that the GS-SBL framework significantly outperforms OMP in terms of generalization. By utilizing SBL as a sequential source identifier rather than a global optimizer, the proposed method preserves Bayesian high-resolution accuracy while achieving the execution speeds necessary for real-time 3D path loss characterization.
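The outer greedy loop described above (pick the candidate source that best explains the residual, re-fit on the growing support, repeat on the new residual) is structurally the same as OMP; a minimal OMP sketch on synthetic data follows. The paper's contribution, replacing the selection rule with localized low-iteration "Micro-SBL" loops, is not reproduced here, and the dictionary below is a generic stand-in for the virtual-source response matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 60, 200, 4
D = rng.normal(size=(m, n))
D /= np.linalg.norm(D, axis=0)           # unit-norm candidate source responses
true_support = rng.choice(n, k, replace=False)
x_true = np.zeros(n)
x_true[true_support] = rng.uniform(1.0, 2.0, k)
y = D @ x_true                           # noiseless measurements

support, residual = [], y.copy()
for _ in range(k):
    j = int(np.argmax(np.abs(D.T @ residual)))   # candidate best matching residual
    support.append(j)
    coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
    residual = y - D[:, support] @ coef          # L2 residual drives the next pick

print(sorted(support), np.linalg.norm(residual))
```

GS-SBL's selection step would replace the single correlation `argmax` with a short SBL run per candidate, trading a little computation for Bayesian uncertainty handling.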


[30] 2602.18355

Rethinking Flow and Diffusion Bridge Models for Speech Enhancement

Flow matching and diffusion bridge models have emerged as leading paradigms in generative speech enhancement, modeling stochastic processes between paired noisy and clean speech signals based on principles such as flow matching, score matching, and Schrödinger bridge. In this paper, we present a framework that unifies existing flow and diffusion bridge models by interpreting them as constructions of Gaussian probability paths with varying means and variances between paired data. Furthermore, we investigate the underlying consistency between the training/inference procedures of these generative models and conventional predictive models. Our analysis reveals that each sampling step of a well-trained flow or diffusion bridge model optimized with a data prediction loss is theoretically analogous to executing predictive speech enhancement. Motivated by this insight, we introduce an enhanced bridge model that integrates an effective probability path design with key elements from predictive paradigms, including improved network architecture, tailored loss functions, and optimized training strategies. Experiments on denoising and dereverberation tasks demonstrate that the proposed method outperforms existing flow and diffusion baselines with fewer parameters and reduced computational complexity. The results also highlight that the inherently predictive nature of this generative framework imposes limitations on its achievable upper-bound performance.


[31] 2602.18365

A Marginal Reliability Impact Based Accreditation Framework for Capacity Markets

This paper presents a Marginal Reliability Impact (MRI) based resource accreditation framework for capacity market design. Under this framework, a resource is accredited based on its marginal impact on system reliability, thus aligning the resource accreditation value with its reliability contribution. A key feature of the MRI based accreditation is that the accredited capacities supplied by different resources to the capacity market are substitutable in reliability contribution, a desired feature of homogeneous products. Moreover, with MRI based capacity demand, substitutability between supply and demand for capacity is also achieved. As a result, a capacity market with the MRI based capacity product can better characterize the underlying resource adequacy problem and lead to more efficient market outcomes.


[32] 2602.18376

Parameter Update Laws for Adaptive Control with Affine Equality Parameter Constraints

In this paper, constrained parameter update laws for adaptive control with affine equality constraints on the parameters are developed, one based on a gradient-only update and the other incorporating a concurrent learning (CL) update. The update laws are derived by solving a constrained optimization problem with affine equality constraints. This constrained problem is reformulated as an equivalent unconstrained problem in a new variable, thereby eliminating the equality constraints. The resulting update law is integrated with an adaptive trajectory tracking controller, enabling online learning of the unknown system parameters. Lyapunov stability of the closed-loop system with the equality-constrained parameter update law is established. The effectiveness of the proposed equality-constrained adaptive control law is demonstrated through simulations, validating its ability to maintain constraints on the parameter estimates, achieve convergence to the true parameters for the CL-based update law, and achieve asymptotic and exponential tracking performance for the constrained gradient and constrained CL-based update laws, respectively.
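The elimination step described above can be sketched numerically: an affine constraint $A\theta = b$ is removed by the change of variables $\theta = \theta_p + Nz$, where $\theta_p$ is any particular solution and the columns of $N$ span the null space of $A$ (so $AN = 0$). Any update performed in $z$ then automatically keeps $A\theta = b$ satisfied. The gradient step below acts on an illustrative quadratic estimation error, not the paper's adaptive law.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 2
A = rng.normal(size=(m, n))              # affine constraint A @ theta = b
b = rng.normal(size=m)

theta_p = np.linalg.lstsq(A, b, rcond=None)[0]    # particular solution
_, _, Vt = np.linalg.svd(A)
N = Vt[m:].T                                       # orthonormal null-space basis, n x (n - m)

# A "true" parameter vector consistent with the constraint.
theta_true = theta_p + N @ rng.normal(size=n - m)

z = np.zeros(n - m)
for _ in range(500):
    theta = theta_p + N @ z
    grad = theta - theta_true            # gradient of 0.5 * ||theta - theta_true||^2
    z -= 0.1 * (N.T @ grad)              # unconstrained gradient step in z

theta = theta_p + N @ z
print(np.linalg.norm(A @ theta - b), np.linalg.norm(theta - theta_true))
```

The constraint residual stays at machine precision throughout, with no projection step needed, which is exactly the benefit of reparametrizing rather than penalizing.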


[33] 2602.18382

Incremental Input-to-State Stability and Equilibrium Tracking for Stochastic Contracting Dynamics

In this paper, we study the contractivity of nonlinear stochastic differential equations (SDEs) driven by deterministic inputs and Brownian motions. Given a weighted $\ell_2$-norm for the state space, we show that an SDE is incrementally noise- and input-to-state stable if its vector field is uniformly contracting in the state and uniformly Lipschitz in the input. This result is applied to error estimation for time-varying equilibrium tracking in the presence of noise affecting both the system dynamics and the input signals. We consider both Ornstein-Uhlenbeck processes modeling unbounded noise and Jacobi diffusion processes modeling bounded noise. Finally, we turn our attention to the associated Fokker-Planck equation of an SDE. For this context, we prove incremental input-to-state stability with respect to an arbitrary $p$-Wasserstein metric when the drift vector field is uniformly contracting in the state and uniformly Lipschitz in the input with respect to an arbitrary norm.
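A standard way to write the two hypotheses invoked above (a common contraction-theoretic formulation; the paper's weighted norm and constants may differ) is: there exist $c, \ell > 0$ and a weight matrix $P \succ 0$ such that, for all states $x, y$, inputs $u, u_1, u_2$, and times $t$, the drift $f$ satisfies
$$(x-y)^\top P\,\big(f(x,u,t)-f(y,u,t)\big) \le -c\,(x-y)^\top P\,(x-y), \qquad \|f(x,u_1,t)-f(x,u_2,t)\| \le \ell\,\|u_1-u_2\|.$$
Under such conditions, the expected distance between any two solutions contracts exponentially up to terms proportional to the input mismatch and the noise intensity, which is the incremental noise- and input-to-state stability property stated above.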


[34] 2602.18400

Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis

Missing data problems, such as missing modalities in multi-modal brain MRI and missing slices in cardiac MRI, pose significant challenges in clinical practice. Existing methods rely on external guidance to supply detailed missing state for instructing generative models to synthesize missing MRIs. However, manual indicators are not always available or reliable in real-world scenarios due to the unpredictable nature of clinical environments. Moreover, these explicit masks are not informative enough to provide guidance for improving semantic consistency. In this work, we argue that generative models should infer and recognize missing states in a self-perceptive manner, enabling them to better capture subtle anatomical and pathological variations. Towards this goal, we propose CoPeDiT, a general-purpose latent diffusion model equipped with completeness perception for unified synthesis of 3D MRIs. Specifically, we incorporate dedicated pretext tasks into our tokenizer, CoPeVAE, empowering it to learn completeness-aware discriminative prompts, and design MDiT3D, a specialized diffusion transformer architecture for 3D MRI synthesis, that effectively uses the learned prompts as guidance to enhance semantic consistency in 3D space. Comprehensive evaluations on three large-scale MRI datasets demonstrate that CoPeDiT significantly outperforms state-of-the-art methods, achieving superior robustness, generalizability, and flexibility. The code is available at this https URL .


[35] 2602.18408

Modeling UAV-aided Roadside Cell-Free Networks with Matérn Hard-Core Point Processes

This paper investigates an uncrewed aerial vehicle (UAV)-assisted cell-free architecture for vehicular networks in road-constrained environments. Roads are modeled using a Poisson Line Process (PLP), with multi-layer roadside access points (APs) deployed via a 1-D Poisson Point Process (PPP). Each user forms a localized cell-free cluster by associating with the nearest AP in each layer along its corresponding road, yielding a road-constrained cell-free architecture. To enhance coverage, UAVs act as an aerial tier, extending access from 1-D road-constrained layouts (embedded in 2-D) to 3-D. We employ a Matérn Hard-Core (MHC) point process to model the spatial distribution of UAV base stations, ensuring a minimum safety distance between them. To enable tractable analysis of the aggregate signal from multiple APs, a distance-based power control scheme is introduced. Leveraging tools from stochastic geometry, we study the coverage probability. Furthermore, we analyze the impact of key system parameters on coverage performance, providing useful insights into the deployment and optimization of UAV-assisted cell-free vehicular networks.


[36] 2602.18416

Convex Block-Cholesky Approach to Risk-Constrained Low-thrust Trajectory Design under Operational Uncertainty

Designing robust trajectories under uncertainties is an emerging technology that may represent a key paradigm shift in space mission design. As we pursue more ambitious scientific goals (e.g., multi-moon tours, missions with extensive components of autonomy), it becomes more crucial that missions are designed with navigation (Nav) processes in mind. The effect of Nav processes is statistical by nature, as they consist of orbit determination (OD) and flight-path control (FPC). Thus, this mission design paradigm calls for techniques that appropriately quantify statistical effects of Nav, evaluate associated risks, and design missions that ensure sufficiently low risk while minimizing a statistical performance metric; a common metric is Delta-V99: worst-case (99%-quantile) Delta-V expenditure including statistical FPC efforts. In response to the need, this paper develops an algorithm for risk-constrained trajectory optimization under operational uncertainties due to initial state dispersion, navigation error, maneuver execution error, and imperfect dynamics modeling. We formulate it as a nonlinear stochastic optimal control problem and develop a computationally tractable algorithm that combines optimal covariance steering and sequential convex programming (SCP). Specifically, the proposed algorithm takes a block-Cholesky approach for convex formulation of optimal covariance steering, and leverages a recent SCP algorithm, SCvx*, for reliable numerical convergence. We apply the developed algorithm to risk-constrained, statistical trajectory optimization for exploration of dwarf planet Ceres with a Mars gravity assist, and demonstrate the robustness of the statistically-optimal trajectory and FPC policies via nonlinear Monte Carlo simulation.


[37] 2602.17711

Interpreting Multi-Branch Anti-Spoofing Architectures: Correlating Internal Strategy with Empirical Performance

Multi-branch deep neural networks like AASIST3 achieve performance comparable to the state of the art in audio anti-spoofing, yet their internal decision dynamics remain opaque compared to traditional input-level saliency methods. While existing interpretability efforts largely focus on visualizing input artifacts, the way individual architectural branches cooperate or compete under different spoofing attacks is not well characterized. This paper develops a framework for interpreting AASIST3 at the component level. Intermediate activations from fourteen branches and global attention modules are modeled with covariance operators whose leading eigenvalues form low-dimensional spectral signatures. These signatures train a CatBoost meta-classifier to generate TreeSHAP-based branch attributions, which we convert into normalized contribution shares and confidence scores (C) to quantify the model's operational strategy. By analyzing 13 spoofing attacks from the ASVspoof 2019 benchmark, we identify four operational archetypes, ranging from Effective Specialization (e.g., A09, Equal Error Rate (EER) 0.04%, C=1.56) to Ineffective Consensus (e.g., A08, EER 3.14%, C=0.33). Crucially, our analysis exposes a Flawed Specialization mode where the model places high confidence in an incorrect branch, leading to severe performance degradation for attacks A17 and A18 (EER 14.26% and 28.63%, respectively). These quantitative findings link internal architectural strategy directly to empirical reliability, highlighting specific structural dependencies that standard performance metrics overlook.


[38] 2602.17769

MusicSem: A Semantically Rich Language--Audio Dataset of Natural Music Descriptions

Music representation learning is central to music information retrieval and generation. While recent advances in multimodal learning have improved alignment between text and audio for tasks such as cross-modal music retrieval, text-to-music generation, and music-to-text generation, existing models often struggle to capture users' expressed intent in natural language descriptions of music. This observation suggests that the datasets used to train and evaluate these models do not fully reflect the broader and more natural forms of human discourse through which music is described. In this paper, we introduce MusicSem, a dataset of 32,493 language-audio pairs derived from organic music-related discussions on the social media platform Reddit. Compared to existing datasets, MusicSem captures a broader spectrum of musical semantics, reflecting how listeners naturally describe music in nuanced and human-centered ways. To structure these expressions, we propose a taxonomy of five semantic categories: descriptive, atmospheric, situational, metadata-related, and contextual. In addition to the construction, analysis, and release of MusicSem, we use the dataset to evaluate a wide range of multimodal models for retrieval and generation, highlighting the importance of modeling fine-grained semantics. Overall, MusicSem serves as a novel semantics-aware resource to support future research on human-aligned multimodal music representation learning.


[39] 2602.17793

LGD-Net: Latent-Guided Dual-Stream Network for HER2 Scoring with Task-Specific Domain Knowledge

Accurately evaluating HER2 expression levels is critical for breast cancer assessment and targeted therapy selection. However, the standard multi-step Immunohistochemistry (IHC) staining is resource-intensive, expensive, and time-consuming, and is often unavailable in many areas. Consequently, predicting HER2 levels directly from H&E slides has emerged as a potential alternative. Generating virtual IHC images from H&E images has been shown to be effective for automatic HER2 scoring. However, pixel-level virtual staining methods are computationally expensive and prone to reconstruction artifacts that can propagate diagnostic errors. To address these limitations, we propose the Latent-Guided Dual-Stream Network (LGD-Net), a novel framework that employs cross-modal feature hallucination instead of explicit pixel-level image generation. LGD-Net learns to map morphological H&E features directly to the molecular latent space, guided by a teacher IHC encoder during training. To ensure the hallucinated features capture clinically relevant phenotypes, we explicitly regularize model training with task-specific domain knowledge, specifically nuclei distribution and membrane staining intensity, via lightweight auxiliary regularization tasks. Extensive experiments on the public BCI dataset demonstrate that LGD-Net achieves state-of-the-art performance, significantly outperforming baseline methods while enabling efficient inference using single-modality H&E inputs.


[40] 2602.17929

ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging

Vision Transformers rely on positional embeddings and class tokens that encode fixed spatial priors. While effective for natural images, these priors may hinder generalization when spatial layout is weakly informative or inconsistent, a frequent condition in medical imaging and edge-deployed clinical systems. We introduce ZACH-ViT (Zero-token Adaptive Compact Hierarchical Vision Transformer), a compact Vision Transformer that removes both positional embeddings and the [CLS] token, achieving permutation invariance through global average pooling over patch representations. The term "Zero-token" specifically refers to removing the dedicated [CLS] aggregation token and positional embeddings; patch tokens remain unchanged and are processed normally. Adaptive residual projections preserve training stability in compact configurations while maintaining a strict parameter budget. Evaluation is performed across seven MedMNIST datasets spanning binary and multi-class tasks under a strict few-shot protocol (50 samples per class, fixed hyperparameters, five random seeds). The empirical analysis demonstrates regime-dependent behavior: ZACH-ViT (0.25M parameters, trained from scratch) achieves its strongest advantage on BloodMNIST and remains competitive with TransMIL on PathMNIST, while its relative advantage decreases on datasets with strong anatomical priors (OCTMNIST, OrganAMNIST), consistent with the architectural hypothesis. These findings support the view that aligning architectural inductive bias with data structure can be more important than pursuing universal benchmark dominance. Despite its minimal size and lack of pretraining, ZACH-ViT achieves competitive performance while maintaining sub-second inference times, supporting deployment in resource-constrained clinical environments. Code and models are available at this https URL.


[41] 2602.17975

Generating adversarial inputs for a graph neural network model of AC power flow

This work formulates and solves optimization problems to generate input points that yield high errors between a neural network's predicted AC power flow solution and solutions to the AC power flow equations. We demonstrate this capability on an instance of the CANOS-PF graph neural network model, as implemented by the PF$\Delta$ benchmark library, operating on a 14-bus test grid. Generated adversarial points yield errors as large as 3.4 per-unit in reactive power and 0.08 per-unit in voltage magnitude. When minimizing the perturbation from a training point necessary to satisfy adversarial constraints, we find that the constraints can be met with as little as an 0.04 per-unit perturbation in voltage magnitude on a single bus. This work motivates the development of rigorous verification and robust training methods for neural network surrogate models of AC power flow.
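As a toy illustration of the adversarial-input idea (not the paper's CANOS-PF/PF$\Delta$ setup: the ground-truth map, the surrogate, and the search procedure are all stand-ins), one can fit a surrogate to samples of a known function and then search the input domain for the point of maximum surrogate error:

```python
import numpy as np

# Stand-in "ground truth" map and a cubic-polynomial surrogate fitted to it.
truth = np.sin
x_train = np.linspace(-np.pi, np.pi, 12)
coeffs = np.polyfit(x_train, truth(x_train), 3)
surrogate = np.poly1d(coeffs)

# Dense search for the worst-case (adversarial) input in the domain.
x_grid = np.linspace(-np.pi, np.pi, 10001)
err = np.abs(surrogate(x_grid) - truth(x_grid))
x_adv = x_grid[np.argmax(err)]

# The adversarial error exceeds the typical error on the training inputs.
train_err = np.mean(np.abs(surrogate(x_train) - truth(x_train)))
print(x_adv, err.max(), train_err)
```

The paper's setting replaces the grid search with gradient-based optimization over the network inputs, subject to adversarial constraints, but the objective has the same shape: maximize the discrepancy between surrogate prediction and the governing equations.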


[42] 2602.17998

PHAST: Port-Hamiltonian Architecture for Structured Temporal Dynamics Forecasting

Real physical systems are dissipative -- a pendulum slows, a circuit loses charge to heat -- and forecasting their dynamics from partial observations is a central challenge in scientific machine learning. We address the \emph{position-only} (q-only) problem: given only generalized positions~$q_t$ at discrete times (momenta~$p_t$ latent), learn a structured model that (a)~produces stable long-horizon forecasts and (b)~recovers physically meaningful parameters when sufficient structure is provided. The port-Hamiltonian framework makes the conservative-dissipative split explicit via $\dot{x}=(J-R)\nabla H(x)$, guaranteeing $dH/dt\le 0$ when $R\succeq 0$. We introduce \textbf{PHAST} (Port-Hamiltonian Architecture for Structured Temporal dynamics), which decomposes the Hamiltonian into potential~$V(q)$, mass~$M(q)$, and damping~$D(q)$ across three knowledge regimes (KNOWN, PARTIAL, UNKNOWN), uses efficient low-rank PSD/SPD parameterizations, and advances dynamics with Strang splitting. Across thirteen q-only benchmarks spanning mechanical, electrical, molecular, thermal, gravitational, and ecological systems, PHAST achieves the best long-horizon forecasting among competitive baselines and enables physically meaningful parameter recovery when the regime provides sufficient anchors. We show that identification is fundamentally ill-posed without such anchors (gauge freedom), motivating a two-axis evaluation that separates forecasting stability from identifiability.
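The energy-decay guarantee quoted above follows in one line from the skew-symmetry of $J$ ($J^\top = -J$) together with $R \succeq 0$:
$$\frac{dH}{dt} = \nabla H(x)^\top \dot{x} = \nabla H(x)^\top (J - R)\,\nabla H(x) = -\nabla H(x)^\top R\,\nabla H(x) \le 0,$$
since the quadratic form of a skew-symmetric matrix vanishes. This is why enforcing $R \succeq 0$ in the parameterization is enough to guarantee dissipativity of the learned model.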


[43] 2602.18014

Quasi-Periodic Gaussian Process Predictive Iterative Learning Control

Repetitive motion tasks are common in robotics, but performance can degrade over time due to environmental changes and robot wear and tear. Iterative learning control (ILC) improves performance by using information from previous iterations to compensate for expected errors in future iterations. This work incorporates Quasi-Periodic Gaussian Processes (QPGPs) into a predictive ILC framework to model and forecast disturbances and drift across iterations. Using a recent structural equation formulation of QPGPs, the proposed approach enables efficient inference with complexity $\mathcal{O}(p^3)$ instead of $\mathcal{O}(i^2p^3)$, where $p$ denotes the number of points within an iteration and $i$ the total number of iterations, a substantial saving for large $i$. This formulation also enables parameter estimation without loss of information, making continual GP learning computationally feasible within the control loop. By predicting next-iteration error profiles rather than relying only on past errors, the controller achieves faster convergence and maintains it under time-varying disturbances. We benchmark the method against both standard ILC and conventional Gaussian Process (GP)-based predictive ILC on three tasks: autonomous vehicle trajectory tracking, a three-link robotic manipulator, and a real-world Stretch robot experiment. Across all cases, the proposed approach converges faster and remains robust under injected and natural disturbances while reducing computational cost, highlighting its practicality across a range of repetitive dynamical systems.


[44] 2602.18058

Probabilistic Methods for Initial Orbit Determination and Orbit Determination in Cislunar Space

In orbital mechanics, Gauss's method for orbit determination (OD) is a popular, minimal-assumption solution for obtaining the initial state estimate of a passing resident space object (RSO). Since much of the cislunar domain is governed by three-body dynamics, a key assumption of Gauss's method is rendered incompatible, creating a need for a new, minimal-assumption method for initial orbit determination (IOD). In this work, we present a framework for short- and long-term probabilistic target tracking in cislunar space which produces an initial state estimate with as few assumptions as possible. Specifically, we propose an IOD method involving the kinematic fitting of several series of noisy, consecutive ground-based observations. Once a probabilistic initial state estimate in the form of a particle cloud is formed, we apply the powerful Particle Gaussian Mixture (PGM) Filter to reduce the uncertainty of our state estimate over time. This combined IOD/OD framework is demonstrated for several classes of trajectories in cislunar space and compared to better-known filtering frameworks.


[45] 2602.18104

MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows

In voice conversion (VC) applications, diffusion and flow-matching models have exhibited exceptional speech quality and speaker similarity performances. However, they are limited by slow conversion owing to their iterative inference. Consequently, we propose MeanVoiceFlow, a novel one-step nonparallel VC model based on mean flows, which can be trained from scratch without requiring pretraining or distillation. Unlike conventional flow matching that uses instantaneous velocity, mean flows employ average velocity to more accurately compute the time integral along the inference path in a single step. However, training the average velocity requires its derivative to compute the target velocity, which can cause instability. Therefore, we introduce a structural margin reconstruction loss as a zero-input constraint, which moderately regularizes the input-output behavior of the model without harmful statistical averaging. Furthermore, we propose conditional diffused-input training in which a mixture of noise and source data is used as input to the model during both training and inference. This enables the model to effectively leverage source information while maintaining consistency between training and inference. Experimental results validate the effectiveness of these techniques and demonstrate that MeanVoiceFlow achieves performance comparable to that of previous multi-step and distillation-based models, even when trained from scratch. Audio samples are available at this https URL.


[46] 2602.18109

TempoNet: Slack-Quantized Transformer-Guided Reinforcement Scheduler for Adaptive Deadline-Centric Real-Time Dispatch

Real-time schedulers must reason about tight deadlines under strict compute budgets. We present TempoNet, a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-approximation. An Urgency Tokenizer discretizes temporal slack into learnable embeddings, stabilizing value learning and capturing deadline proximity. A latency-aware sparse attention stack with blockwise top-k selection and locality-sensitive chunking enables global reasoning over unordered task sets with near-linear scaling and sub-millisecond inference. A multicore mapping layer converts contextualized Q-scores into processor assignments through masked-greedy selection or differentiable matching. Extensive evaluations on industrial mixed-criticality traces and large multiprocessor settings show consistent gains in deadline fulfillment over analytic schedulers and neural baselines, together with improved optimization stability. Diagnostics include sensitivity analyses for slack quantization, attention-driven policy interpretation, hardware-in-the-loop and kernel micro-benchmarks, and robustness under stress with simple runtime mitigations; we also report sample-efficiency benefits from behavioral-cloning pretraining and compatibility with an actor-critic variant without altering the inference pipeline. These results establish a practical framework for Transformer-based decision making in high-throughput real-time scheduling.


[47] 2602.18165

Uncertainty-Aware Jamming Mitigation with Active RIS: A Robust Stackelberg Game Approach

Malicious jamming poses a pervasive threat to secure communications, and the challenge becomes increasingly severe as jammers gain the capability to adapt to legitimate transmissions. This paper investigates jamming mitigation by leveraging an active reconfigurable intelligent surface (ARIS), where channel uncertainties are explicitly addressed for robust anti-jamming design. To this end, we adopt a Stackelberg game formulation to model the strategic interaction between the legitimate side and the adversary, acting as the leader and follower, respectively. We prove the existence of the game equilibrium and adopt the backward induction method for equilibrium analysis. We first derive the optimal jamming policy as the follower's best response, which is then incorporated into the legitimate-side optimization for robust anti-jamming design. We address the uncertainty issue and reformulate the legitimate-side problem by exploiting the error bounds to combat worst-case jamming attacks. The problem is decomposed within a block successive upper bound minimization (BSUM) framework to tackle the power allocation, transceiver beamforming, and active reflection subproblems, which are iterated to obtain the robust jamming mitigation scheme. Simulation results demonstrate the effectiveness of the proposed scheme in protecting legitimate transmissions under uncertainties, and its superior jamming-mitigation performance compared with the baselines.


[48] 2602.18386

Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO

Pure Pursuit (PP) is widely used in autonomous racing for real-time path tracking due to its efficiency and geometric clarity, yet performance is highly sensitive to how its key parameters, the lookahead distance and steering gain, are chosen. Standard velocity-based schedules adjust these only approximately and often fail to transfer across tracks and speed profiles. We propose a reinforcement-learning (RL) approach that jointly chooses the lookahead Ld and a steering gain g online using Proximal Policy Optimization (PPO). The policy observes compact state features (speed and curvature taps) and outputs (Ld, g) at each control step. Trained in F1TENTH Gym and deployed in a ROS 2 stack, the policy drives PP directly (with light smoothing) and requires no per-map retuning. Across simulation and real-car tests, the proposed RL-PP controller that jointly selects (Ld, g) consistently outperforms fixed-lookahead PP, velocity-scheduled adaptive PP, and an RL lookahead-only variant, and it also exceeds a kinematic MPC raceline tracker under our evaluated settings in lap time, path-tracking accuracy, and steering smoothness, demonstrating that policy-guided parameter tuning can reliably improve classical geometry-based control.
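For context, the geometric steering law that the PPO policy tunes is compact enough to sketch. The Python below is a generic, hypothetical rendering (function name and signature are ours, not the paper's); in the proposed controller, the `lookahead` (Ld) and `gain` (g) arguments would be produced by the learned policy at each control step rather than fixed.

```python
import math

def pure_pursuit_steering(pose, goal, wheelbase, lookahead, gain):
    """Geometric pure pursuit steering toward a lookahead point.

    pose: (x, y, heading) of the vehicle; goal: (x, y) lookahead point on
    the path. `lookahead` (Ld) and `gain` (g) are the two parameters the
    paper's RL policy selects online; here they are plain arguments.
    """
    x, y, theta = pose
    # Angle between the vehicle heading and the lookahead point.
    alpha = math.atan2(goal[1] - y, goal[0] - x) - theta
    # Classic pure pursuit steering law, scaled by the steering gain g.
    return gain * math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)

# Vehicle at the origin facing +x; a target straight ahead needs no steering,
# a target up and to the left needs a positive (left) steering command.
straight = pure_pursuit_steering((0.0, 0.0, 0.0), (1.0, 0.0), 0.33, 1.0, 1.0)
left = pure_pursuit_steering((0.0, 0.0, 0.0), (1.0, 1.0), 0.33, 1.0, 1.0)
```

A larger Ld flattens the commanded curvature (smoother but less accurate tracking), which is exactly the trade-off the policy learns to manage per situation.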


[49] 2602.18396

PRISM-FCP: Byzantine-Resilient Federated Conformal Prediction via Partial Sharing

We propose PRISM-FCP (Partial shaRing and robust calIbration with Statistical Margins for Federated Conformal Prediction), a Byzantine-resilient federated conformal prediction framework that utilizes partial model sharing to improve robustness against Byzantine attacks during both model training and conformal calibration. Existing approaches address adversarial behavior only in the calibration stage, leaving the learned model susceptible to poisoned updates. In contrast, PRISM-FCP mitigates attacks end-to-end. During training, clients partially share updates by transmitting only $M$ of $D$ parameters per round. This attenuates the expected energy of an adversary's perturbation in the aggregated update by a factor of $M/D$, yielding lower mean-square error (MSE) and tighter prediction intervals. During calibration, clients convert nonconformity scores into characterization vectors, compute distance-based maliciousness scores, and downweight or filter suspected Byzantine contributions before estimating the conformal quantile. Extensive experiments on both synthetic data and the UCI Superconductivity dataset demonstrate that PRISM-FCP maintains nominal coverage guarantees under Byzantine attacks while avoiding the interval inflation observed in standard FCP with reduced communication, providing a robust and communication-efficient approach to federated uncertainty quantification.
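The conformal-quantile step that PRISM-FCP hardens can be sketched in its standard (non-robust) split-conformal form. This is the textbook baseline only, with our own function name: the paper's contributions (partial sharing, maliciousness scoring, and downweighting of Byzantine clients) are deliberately omitted.

```python
import numpy as np

def conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split conformal prediction interval from calibration residuals.

    cal_residuals: nonconformity scores |y - yhat| on a held-out
    calibration set. Returns a (lo, hi) interval around a new point
    prediction with marginal coverage >= 1 - alpha.
    """
    n = len(cal_residuals)
    # Finite-sample-corrected quantile level, clipped to 1.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(np.asarray(cal_residuals), level, method="higher")
    return y_pred - q, y_pred + q

lo, hi = conformal_interval(list(range(1, 100)), 0.0, alpha=0.1)
```

Poisoned calibration scores inflate the quantile `q` and hence the interval width, which is the "interval inflation" failure mode the robust calibration stage is designed to avoid.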


[50] 2602.18428

The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning

Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold where gradients typically diverge? We resolve this paradox by formalizing Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)p(t)dt$ is the marginal density of the noisy data integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a ``Jensen Gap'' in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.


[51] 2310.09450

A Non-intrusive Decentralized Approach to Stabilizing IBR-dominated AC Microgrids

This paper presents a non-intrusive, decentralized approach that stabilizes AC microgrids dominated by inverter-based resources (IBRs). By "non-intrusive" we mean that the approach does not require reprogramming IBRs' controllers to stabilize the microgrids. "Decentralized" is in the sense that the approach stabilizes the microgrids without communication among IBRs. Implementing the approach requires only very minimal information about IBR dynamics, i.e., the L2 gain of an IBR, and sharing such information with non-IBR-manufacturer parties does not raise any intellectual-property concerns. The approach allows for plug-and-play operation of IBRs, while maintaining microgrid stability. The proposed approach is tested by simulating 2-IBR and 10-IBR microgrids where lines and IBRs are modeled in the electromagnetic transient time scale. Simulations show that oscillations with increasing amplitudes may occur when two stable microgrids are networked. Simulations also suggest that the proposed approach can mitigate such a system-level symptom.


[52] 2412.06327

Robust Output Tracking for Induced Seismicity Mitigation in Underground Reservoirs Governed by a Nonlinear 3D PDE-ODE System

This paper presents a robust output-feedback controller for induced seismicity mitigation in geological reservoirs described by a coupled 3D PDE-ODE model. The controller is a MIMO Super-Twisting design, producing a continuous control signal and requiring minimal model information, while accommodating parameter uncertainty and spatial heterogeneity. Two operational outputs are regulated simultaneously: regional pressures and seismicity rates computed over reservoir sub-regions. Closed-loop properties are established via explicit bounds on the solution and its time derivative for both the infinite-dimensional dynamics and the nonlinear ODE system, yielding finite-time or exponential convergence of the tracking errors. The method is evaluated on a Groningen gas-field case study in two scenarios: gas production while not exceeding the intrinsic seismicity of the region, and combined production with CO$_2$ injection toward net-zero operation. Simulations demonstrate accurate tracking of pressure and seismicity targets across regions under significant parameter uncertainty, supporting safer reservoir operation without sacrificing production objectives.


[53] 2501.06945

OpenGERT: Open Source Automated Geometry Extraction with Geometric and Electromagnetic Sensitivity Analyses for Ray-Tracing Propagation Models

Accurate RF propagation modeling in urban environments is critical for developing digital spectrum twins and optimizing wireless communication systems. We introduce OpenGERT, an open-source automated Geometry Extraction tool for Ray Tracing, which collects and processes terrain and building data from OpenStreetMap, Microsoft Global ML Building Footprints, and USGS elevation data. Using the Blender Python API, it creates detailed urban models for high-fidelity simulations with NVIDIA Sionna RT. We perform sensitivity analyses to examine how variations in building height, position, and electromagnetic material properties affect ray-tracing accuracy. Specifically, we present pairwise dispersion plots of channel statistics (path gain, mean excess delay, delay spread, link outage, and Rician K-factor) and investigate how their sensitivities change with distance from transmitters. We also visualize the variance of these statistics for selected transmitter locations to gain deeper insights. Our study covers Munich and Etoile scenes, each with 10 transmitter locations. For each location, we apply five types of perturbations: material, position, height, height-position, and all combined, with 50 perturbations each. Results show that small changes in permittivity and conductivity minimally affect channel statistics, whereas variations in building height and position significantly alter all statistics, even with noise standard deviations of 1 meter in height and 0.4 meters in position. These findings highlight the importance of precise environmental modeling for accurate propagation predictions, essential for digital spectrum twins and advanced communication networks. The code for geometry extraction and sensitivity analyses is available at this http URL.


[54] 2505.12664

Multi-View Wireless Sensing via Conditional Generative Learning: Framework and Model Design

In this paper, we incorporate physical knowledge into learning-based high-precision target sensing using the multi-view channel state information (CSI) between multiple base stations (BSs) and user equipment (UEs). Such a multi-view sensing problem can naturally be cast into a conditional generation framework. To this end, we design a bipartite neural network architecture: the first part uses an elaborately designed encoder to fuse the latent target features embedded in the multi-view CSI, and the second uses them as conditioning inputs of a powerful generative model to guide the target's reconstruction. Specifically, the encoder is designed to capture the physical correlation between the CSI and the target, and also to be adaptive to the numbers and positions of BS-UE pairs. Therein the view-specific nature of CSI is assimilated by introducing a spatial positional embedding scheme, which exploits the structure of electromagnetic (EM) wave propagation channels. Finally, a conditional diffusion model with a weighted loss is employed to generate the target's point cloud from the fused features. Extensive numerical results demonstrate that the proposed generative multi-view (Gen-MV) sensing framework exhibits excellent flexibility and significant performance improvement on the reconstruction quality of the target's shape and EM properties.


[55] 2507.11551

Landmark Detection for Medical Images using a General-purpose Segmentation Model

Radiographic images are a cornerstone of medical diagnostics in orthopaedics, with anatomical landmark detection serving as a crucial intermediate step for information extraction. General-purpose foundational segmentation models, such as SAM (Segment Anything Model), do not support landmark segmentation out of the box and require prompts to function. However, in medical imaging, the prompts for landmarks are highly specific. Since SAM has not been trained to recognize such landmarks, it cannot generate accurate landmark segmentations for diagnostic purposes. Even MedSAM, a medically adapted variant of SAM, has been trained to identify larger anatomical structures, such as organs and their parts, and lacks the fine-grained precision required for orthopaedic pelvic landmarks. To address this limitation, we propose leveraging another general-purpose, non-foundational model: YOLO. YOLO excels in object detection and can provide bounding boxes that serve as input prompts for SAM. While YOLO is efficient at detection, it is significantly outperformed by SAM in segmenting complex structures. In combination, these two models form a reliable pipeline capable of segmenting not only a small pilot set of eight anatomical landmarks but also an expanded set of 72 landmarks and 16 regions with complex outlines, such as the femoral cortical bone and the pelvic inlet. By using YOLO-generated bounding boxes to guide SAM, we trained the hybrid model to accurately segment orthopaedic pelvic radiographs. Our results show that the proposed combination of YOLO and SAM yields excellent performance in detecting anatomical landmarks and intricate outlines in orthopaedic pelvic radiographs.


[56] 2509.00479

A Novel Method to Determine Total Oxidant Concentration Produced by Non-Thermal Plasma Based on Image Processing and Machine Learning

Accurate determination of total oxidant concentration [Ox]tot in nonthermal plasma-treated aqueous systems remains a critical challenge due to the transient nature of reactive oxygen and nitrogen species and the subjectivity of conventional titration methods used for [Ox]tot determination. This study introduces a color-based computer analysis method that integrates advanced image processing with machine learning to quantify colorimetric changes in potassium iodide solutions during oxidation. A custom-built visual acquisition system recorded high-resolution video of the color transitions occurring during plasma treatment while the change in oxidant concentration was simultaneously monitored using a standard titrimetric method. Extracted image frames were processed through a structured pipeline to obtain RGB, HSV, and Lab color features. Statistical analysis revealed strong linear relationships between selected color features and measured oxidant concentrations, particularly for HSV saturation, Lab a and b channels, and the blue component of RGB. These features were subsequently used to train and validate multiple machine learning models including linear regression, ridge regression, random forest, gradient boosting, and neural networks. Linear regression and gradient boosting demonstrated the highest predictive accuracy with R2 values exceeding 0.99. Dimensionality reduction from nine features to smaller feature subsets preserved predictive performance while improving computational efficiency. Comparison with experimental titration measurements showed that the proposed system predicts total oxidant concentration in potassium iodide solution with very high accuracy, achieving R2 values above 0.998 even under reduced feature conditions.


[57] 2509.01929

Binaural Unmasking in Practical Use: Perceived Level of Phase-inverted Speech in Environmental Noise

We aim to develop a technology that makes the sound from earphones and headphones easier to hear without increasing the sound pressure or eliminating ambient noise. To this end, we focus on harnessing the phenomenon of binaural unmasking through phase reversal in one ear. Specifically, we conduct experiments to evaluate the improvement of audibility caused by the phenomenon, using conditions that approximate practical scenarios. We use speech sounds by various speakers and noises that can be encountered in daily life (urban environmental sounds, cheers) to verify the effects of binaural unmasking under conditions close to practical situations. The results of experiments using the Japanese language showed that (i) speech in a noisy environment is perceived to be up to about 6 dB louder with phase reversal in one ear, and (ii) a certain effect (improvement of audibility by 5 dB or more) is obtained for all speakers and noises targeted in this study. These findings demonstrate the effectiveness of binaural unmasking attributed to interaural phase differences in practical scenarios.
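The antiphasic presentation underlying the experiments (noise identical in both ears, speech phase-inverted in one ear, i.e., the classic N0Sπ configuration) is straightforward to construct from mono signals. The sketch below is a generic illustration under that assumption, not the authors' stimulus pipeline.

```python
import numpy as np

def antiphasic_mix(speech, noise):
    """Stereo mix with the speech phase-inverted in one ear (N0Spi).

    The noise is diotic (identical in both ears, N0) while the speech is
    antiphasic (S_pi). This interaural phase difference in the target,
    but not the masker, is what produces the binaural masking level
    difference the study measures as improved audibility.
    """
    left = noise + speech
    right = noise - speech  # phase reversal of the speech in one ear only
    return np.stack([left, right], axis=0)

stereo = antiphasic_mix(np.full(4, 0.5), np.zeros(4))
```

With the masker silent, the two channels are exact inverses of each other; with a real masker present, the noise components stay correlated across ears while the speech components do not.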


[58] 2509.04055

Constellation Shaping for OFDM-ISAC Systems: From Theoretical Bounds to Practical Implementation

Integrated sensing and communications (ISAC) promises new use cases for mobile communication systems by reusing the communication signal for radar-like sensing. However, sensing and communications (S&C) impose conflicting requirements on the modulation format, resulting in a tradeoff between their corresponding performance. This paper investigates constellation shaping as a means to simultaneously improve S&C performance in orthogonal frequency division multiplexing (OFDM)-based ISAC systems. We begin by deriving how the transmit symbols affect detection performance and derive theoretical lower and upper bounds on the maximum achievable information rate under a given sensing constraint. Using an autoencoder-based optimization, we investigate geometric, probabilistic, and joint constellation shaping, where joint shaping combines both approaches, employing both optimal maximum a-posteriori decoding and practical bit-metric decoding. Our results show that constellation shaping enables a flexible trade-off between S&C, can approach the derived upper bound, and significantly outperforms conventional modulation formats. Motivated by its practical implementation feasibility, we review probabilistic amplitude shaping (PAS) and propose a generalization tailored to ISAC. For this generalization, we propose a low-complexity log-likelihood ratio computation with negligible rate loss. We demonstrate that combining conventional and generalized PAS enables a flexible and low-complexity tradeoff between S&C, closely approaching the performance of joint constellation shaping.


[59] 2509.12253

Physics-Informed Neural Networks vs. Physics Models for Non-Invasive Glucose Monitoring: A Comparative Study Under Noise-Stressed Synthetic Conditions

Non-invasive glucose monitoring outside controlled settings is dominated by low signal-to-noise ratio (SNR): hardware drift, environmental variation, and physiology suppress the glucose signature in NIR signals. We present a noise-stressed NIR simulator that injects 12-bit ADC quantisation, LED drift, photodiode dark noise, temperature/humidity variation, contact-pressure noise, Fitzpatrick I-VI melanin, and glucose variability to create a low-correlation regime (rho_glucose-NIR = 0.21). Using this platform, we benchmark six methods: Enhanced Beer-Lambert (physics-engineered ridge regression), Original PINN, Optimised PINN, RTE-inspired PINN, Selective RTE PINN, and a shallow DNN. The physics-engineered Beer-Lambert model achieves the lowest error (13.6 mg/dL RMSE) with only 56 parameters and 0.01 ms inference, outperforming deeper PINNs and the shallow-DNN baseline under low-SNR conditions. The study reframes the task as noise suppression under weak signal and shows that carefully engineered physics features can outperform higher-capacity models in this regime.
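The winning model here is remarkably small, and its fitting step is just closed-form ridge regression. The sketch below shows that core under our own naming; the paper's actual contribution, the Beer-Lambert-style absorbance feature engineering that would populate `X`, is not reproduced.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y.

    Stands in for the paper's 'physics-engineered ridge regression';
    in their pipeline X would hold Beer-Lambert-derived absorbance
    features rather than raw NIR samples.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_predict(X, w):
    """Linear prediction with the fitted weights."""
    return X @ w

# Toy check: with negligible regularization, ridge recovers y = 2x.
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
w = ridge_fit(X, 2.0 * X[:, 0], lam=1e-8)
```

A 56-parameter model of this kind is trivially deployable on wearable-class hardware, which is part of the argument for physics features over deep PINNs in the low-SNR regime.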


[60] 2510.03070

Eigenvalue Tracking of Large-Scale Systems Impacted by Time Delays

The paper focuses on tracking eigenvalue trajectories in power system models with time delays. We formulate a continuation-based approach that employs numerical integration to follow eigenvalues as system parameters vary, in the presence of one or multiple delayed variables. The formulation is compatible with sparse delay differential-algebraic equation (DDAE) formulations of the system model and allows treating the delay magnitude itself as a varying parameter with implementation aspects discussed in detail. The proposed approach is illustrated on a modified IEEE 39-bus system, as well as on a real-world-scale dynamic model of the Irish transmission network.


[61] 2510.06170

Smartphone-based iris recognition through high-quality visible-spectrum iris image capture

Smartphone-based iris recognition in the visible spectrum (VIS) remains difficult due to illumination variability, pigmentation differences, and the absence of standardized capture controls. This work presents a compact end-to-end pipeline that enforces ISO/IEC 29794-6 quality compliance at acquisition and demonstrates that accurate VIS iris recognition is feasible on commodity devices. Using a custom Android application performing real-time framing, sharpness evaluation, and feedback, we introduce the CUVIRIS dataset of 752 compliant images from 47 subjects. A lightweight MobileNetV3-based multi-task segmentation network (LightIrisNet) is developed for efficient on-device processing, and a transformer matcher (IrisFormer) is adapted to the VIS domain. Under a standardized protocol and comparative benchmarking against prior CNN baselines, OSIRIS attains a TAR of 97.9% at FAR=0.01 (EER=0.76%), while IrisFormer, trained only on UBIRIS.v2, achieves an EER of 0.057% on CUVIRIS. The acquisition app, trained models, and a public subset of the dataset are released to support reproducibility. These results confirm that standardized capture and VIS-adapted lightweight models enable accurate and practical iris recognition on smartphones.


[62] 2510.13887

Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion

Incomplete multi-view data, where certain views are entirely missing for some samples, poses significant challenges for traditional multi-view clustering methods. Existing deep incomplete multi-view clustering approaches often rely on static fusion strategies or two-stage pipelines, leading to suboptimal fusion results and error propagation issues. To address these limitations, this paper proposes a novel incomplete multi-view clustering framework based on Hierarchical Semantic Alignment and Cooperative Completion (HSACC). HSACC achieves robust cross-view fusion through a dual-level semantic space design. In the low-level semantic space, consistency alignment is ensured by maximizing mutual information across views. In the high-level semantic space, adaptive view weights are dynamically assigned based on the distributional affinity between individual views and an initial fused representation, followed by weighted fusion to generate a unified global representation. Additionally, HSACC implicitly recovers missing views by projecting aligned latent representations into high-dimensional semantic spaces and jointly optimizes reconstruction and clustering objectives, enabling cooperative learning of completion and clustering. Experimental results demonstrate that HSACC significantly outperforms state-of-the-art methods on five benchmark datasets. Ablation studies validate the effectiveness of the hierarchical alignment and dynamic weighting mechanisms, while parameter analysis confirms the model's robustness to hyperparameter variations. The code is available at this https URL.


[63] 2512.17112

Kalman Filter-based Mobile User-RIS Channel Estimation and User Localization

In communication networks, channel estimation and user localization are challenging problems in harsh environments or signal-blocked areas. This paper introduces a novel approach to minimize the Mean Squared Error (MSE) in channel estimation between mobile users and rectangular Reconfigurable Intelligent Surfaces (RIS) within wireless communication systems. Meanwhile, user localization is realized based on the estimated Channel State Information (CSI). In this paper, we assume a nonlinear system model that depends on the user's position, for a user with high mobility, an RIS with multiple elements, and a base station (BS) with multiple antennas. We then apply Kalman Filter (KF)-like algorithms to reduce the MSE in estimating the parameters of this time-variant channel model. Additionally, we propose a Non-Circular Noise Kalman Filter (NCNKF) to address scenarios with non-circular complex state-space noise. Furthermore, we apply the Discrete Space Fourier Transform (DSFT) method, combined with interpolation techniques, to decrease the Root Mean Squared Error (RMSE) of the user localization based on the estimated CSI. Finally, we extend the single-user case to the multi-user setting. Results show that KF can achieve lower MSE in estimating the channel than other known approaches, while the NCNKF algorithm performs better in non-circular state-space noise scenarios. At the same time, the DSFT interpolation outperforms the other approaches with lower complexity. The study concludes with numerical comparisons and an in-depth discussion of the performance improvements enabled by our approaches.
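The predict/update cycle that all KF-like variants share can be sketched generically. This is the textbook linear Kalman filter only: the paper's model is nonlinear and position-dependent, so its algorithms (including the NCNKF) extend this core in ways not shown here, and the function name and matrix arguments are ours.

```python
import numpy as np

def kf_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x, P: prior state estimate and covariance; z: new measurement;
    F, H: state-transition and observation matrices;
    Q, R: process- and measurement-noise covariances.
    """
    # Predict the state forward through the dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the measurement: innovation covariance and Kalman gain.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Scalar sanity check: static state, unit priors and noise.
x1, P1 = kf_step(np.array([0.0]), np.array([[1.0]]), np.array([1.0]),
                 np.array([[1.0]]), np.array([[1.0]]),
                 np.array([[0.0]]), np.array([[1.0]]))
```

The standard KF also assumes circularly symmetric complex noise; relaxing that assumption is precisely what motivates the paper's NCNKF variant.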


[64] 2602.05208

Context-Aware Asymmetric Ensembling for Interpretable Retinopathy of Prematurity Screening via Active Query and Vascular Attention

Retinopathy of Prematurity (ROP) is among the major causes of preventable childhood blindness. Automated screening remains challenging, primarily due to limited data availability and the complex condition involving both structural staging and microvascular abnormalities. Current deep learning models depend heavily on large private datasets and passive multimodal fusion, which commonly fail to generalize on small, imbalanced public cohorts. We thus propose the Context-Aware Asymmetric Ensemble Model (CAA Ensemble) that simulates clinical reasoning through two specialized streams. First, the Multi-Scale Active Query Network (MS-AQNet) serves as a structure specialist, utilizing clinical contexts as dynamic query vectors to spatially control visual feature extraction for localization of the fibrovascular ridge. Secondly, VascuMIL encodes Vascular Topology Maps (VMAP) within a gated Multiple Instance Learning (MIL) network to precisely identify vascular tortuosity. A synergistic meta-learner ensembles these orthogonal signals to resolve diagnostic discordance across multiple objectives. Tested on a highly imbalanced cohort of 188 infants (6,004 images), the framework attained State-of-the-Art performance on two distinct clinical tasks: achieving a Macro F1-Score of 0.93 for Broad ROP staging and an AUC of 0.996 for Plus Disease detection. Crucially, the system features `Glass Box' transparency through counterfactual attention heatmaps and vascular threat maps, proving that clinical metadata dictates the model's visual search. Additionally, this study demonstrates that architectural inductive bias can serve as an effective bridge for the medical AI data gap.


[65] 2602.14612

LongAudio-RAG: Event-Grounded Question Answering over Multi-Hour Long Audio

Long-duration audio is increasingly common in industrial and consumer settings, yet reviewing multi-hour recordings is impractical, motivating systems that answer natural-language queries with precise temporal grounding and minimal hallucination. Existing audio-language models show promise, but long-audio question answering remains difficult due to context-length limits. We introduce LongAudio-RAG (LA-RAG), a hybrid framework that grounds Large Language Model (LLM) outputs in retrieved, timestamped acoustic event detections rather than raw audio. Multi-hour streams are converted into structured event records stored in an SQL database, and at inference time the system resolves natural-language time references, classifies intent, retrieves only the relevant events, and generates answers using this constrained evidence. To evaluate performance, we construct a synthetic long-audio benchmark by concatenating recordings with preserved timestamps and generating template-based question-answer pairs for detection, counting, and summarization tasks. Finally, we demonstrate the practicality of our approach by deploying it in a hybrid edge-cloud environment, where the audio grounding model runs on-device on IoT-class hardware while the LLM is hosted on a GPU-backed server. This architecture enables low-latency event extraction at the edge and high-quality language reasoning in the cloud. Experiments show that structured, event-level retrieval significantly improves accuracy compared to vanilla Retrieval-Augmented Generation (RAG) or text-to-SQL approaches.


[66] 2602.16257

SeaSpoofFinder -- Potential GNSS Spoofing Event Detection Using AIS

This paper investigates whether large-scale GNSS spoofing activity can be inferred from maritime Automatic Identification System (AIS) position reports. A data-processing framework, called SeaSpoofFinder, available here: this http URL, was developed to ingest and post-process global AIS streams and to detect candidate anomalies through a two-stage procedure. In Stage 1, implausible position jumps are identified using kinematic and data-quality filters; in Stage 2, events are retained only when multiple vessels exhibit spatially consistent source and target clustering, thereby reducing false positives from single-vessel artifacts. The resulting final potential spoofing events (FPSEs) reveal recurrent patterns in several regions, including the Baltic Sea, the Black Sea, Murmansk, Moscow, and the Haifa area, with affected footprints that can span large maritime areas. The analysis also highlights recurring non-spoofing artifacts (e.g., back-to-port jumps and data gaps) that can still pass heuristic filters in dense traffic regions. These results indicate that AIS-based monitoring can provide useful evidence for identifying and characterizing potential spoofing activity at scale, while emphasizing that AIS-only evidence does not provide definitive attribution.
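A Stage-1-style kinematic check reduces to asking whether the speed implied by two consecutive AIS fixes is physically plausible. The sketch below uses the haversine great-circle distance; the 60-knot threshold and all names are illustrative assumptions, not values taken from SeaSpoofFinder.

```python
import math

def implied_speed_knots(lat1, lon1, t1, lat2, lon2, t2):
    """Great-circle speed (knots) implied by two timestamped AIS fixes.

    Timestamps are in seconds; distance via the haversine formula with
    the Earth radius expressed in nautical miles, so distance/time is
    directly in knots.
    """
    R_NM = 3440.065  # mean Earth radius in nautical miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    dist_nm = 2 * R_NM * math.asin(math.sqrt(a))
    return dist_nm / ((t2 - t1) / 3600.0)

def is_implausible_jump(fix_a, fix_b, max_knots=60.0):
    """Flag a position jump whose implied speed no vessel can sustain.

    fix = (lat, lon, t_seconds). The threshold is an illustrative
    assumption; a real filter would also apply data-quality checks.
    """
    return implied_speed_knots(*fix_a, *fix_b) > max_knots
```

Single-vessel checks like this still pass artifacts such as back-to-port jumps, which is why Stage 2 additionally requires spatially consistent clustering across multiple vessels.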


[67] 2405.12535

PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation

In this paper, we study policy evaluation in continuous-time reinforcement learning (RL), where the state follows an unknown stochastic differential equation (SDE), but only discrete-time data are available. We first highlight that the discrete-time Bellman equation (BE) is not always a reliable approximation to the true value function because it ignores the underlying continuous-time structure. We then introduce a new Bellman equation, PhiBE, which integrates the discrete-time information into a continuous-time PDE formulation. By leveraging the smooth structure of the underlying dynamics, PhiBE provides a provably more accurate approximation to the true value function, especially in scenarios where the underlying dynamics change slowly or the reward oscillates. Moreover, we extend PhiBE to higher orders, providing increasingly accurate approximations. We further develop a model-free algorithm for PhiBE under linear function approximation and establish its convergence under model misspecification, together with finite-sample guarantees. In contrast to existing continuous-time RL analyses, where the model misspecification error diverges as the sampling interval $\Delta t\to 0$ and the sample complexity typically scales as $O(\Delta t^{-4})$, our misspecification error is independent of $\Delta t$ and the resulting sample complexity improves to $O(\Delta t^{-1})$ by exploiting the smoothness of the underlying dynamics. Finally, we identify a fundamental trade-off between discretization error and sample error that is intrinsic to continuous-time policy evaluation: finer time discretization reduces bias but amplifies variance, so excessively frequent sampling does not necessarily improve performance, an insight that does not arise in classical discrete-time RL analyses. Numerical experiments validate the theoretical guarantees we establish.
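For orientation, the contrast between the two formulations can be sketched with standard continuous-time policy-evaluation identities; this is a schematic of the idea, and the paper's exact PhiBE operator may differ in form:

```latex
% Discrete-time Bellman equation built from data sampled every \Delta t:
%   V(s) = r(s)\,\Delta t + e^{-\beta \Delta t}\,
%          \mathbb{E}\!\left[V(s_{\Delta t}) \mid s_0 = s\right].
%
% Continuous-time evaluation PDE for the SDE ds = b(s)\,dt + \sigma(s)\,dW_t:
%   \beta V(s) = r(s) + b(s)\cdot \nabla V(s)
%              + \tfrac{1}{2}\,\mathrm{Tr}\!\left(\sigma\sigma^{\top}(s)\,\nabla^2 V(s)\right).
%
% A first-order PhiBE-style idea: estimate the drift from discrete data,
%   \hat b(s) = \mathbb{E}\!\left[s_{\Delta t} - s \mid s_0 = s\right]/\Delta t,
% and solve the PDE with \hat b in place of b, so the continuous-time
% structure of the dynamics is retained rather than discarded.
```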


[68] 2505.14184

VaN3Twin: the Multi-Technology V2X Digital Twin with Ray-Tracing in the Loop

This paper presents VaN3Twin, the first open-source, full-stack Network Digital Twin (NDT) framework for simulating the coexistence of multiple Vehicle-to-Everything (V2X) communication technologies with accurate physical-layer modeling via ray tracing. VaN3Twin extends the ms-van3t simulator by integrating Sionna Ray Tracer (RT) in the loop, enabling high-fidelity representation of wireless propagation, including diverse Line-of-Sight (LoS) conditions with a focus on LoS blockage due to other vehicles' meshes, the Doppler effect, and site-dependent effects such as scattering and diffraction. Unlike conventional simulation tools, the proposed framework supports realistic coexistence analysis across DSRC and C-V2X technologies operating over shared spectrum. A dedicated interference tracking module captures cross-technology interference at the time-frequency resource block level and enhances signal-to-interference-plus-noise ratio (SINR) estimation by eliminating artifacts such as the bimodal behavior induced by separate LoS/NLoS propagation models. Compared to field measurements, VaN3Twin reduces application-layer disagreement by 50% in rural and over 70% in urban environments with respect to current state-of-the-art simulation tools, demonstrating its value for scalable and accurate digital twin-based V2X coexistence simulation.
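The per-resource-block SINR accounting with cross-technology interference amounts to the following small computation; the values and the aggregation into a single block are illustrative, not the framework's actual implementation:

```python
import math

# Minimal sketch of resource-block-level SINR with cross-technology
# interferers tracked separately from thermal noise, as described above.
def sinr_db(signal_mw, noise_mw, interferers_mw):
    """SINR over one time-frequency resource block, in dB (linear mW inputs)."""
    return 10 * math.log10(signal_mw / (noise_mw + sum(interferers_mw)))

# One DSRC victim block with two C-V2X interferers (toy powers).
print(round(sinr_db(1.0, 0.01, [0.05, 0.04]), 2))  # prints 10.0
```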


[69] 2510.00463

On the Adversarial Robustness of Learning-based Conformal Novelty Detection

This paper studies the adversarial robustness of conformal novelty detection. In particular, we focus on two powerful learning-based frameworks that come with finite-sample false discovery rate (FDR) control: AdaDetect (Marandon et al., 2024), which is based on a positive-unlabeled classifier, and a one-class classifier-based approach (Bates et al., 2023). While they provide rigorous statistical guarantees under benign conditions, their behavior under adversarial perturbations remains underexplored. We first formulate an oracle attack setup, under the AdaDetect formulation, that quantifies the worst-case degradation of the FDR, deriving an upper bound that characterizes the statistical cost of attacks. This idealized formulation directly motivates a practical and effective attack scheme that only requires query access to the output labels of both frameworks. Coupling these formulations with two popular and complementary black-box adversarial algorithms, we systematically evaluate the vulnerability of both frameworks on synthetic and real-world datasets. Our results show that adversarial perturbations can significantly increase the FDR while maintaining high detection power, exposing fundamental limitations of current error-controlled novelty detection methods and motivating the development of more robust alternatives.
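The conformal recipe that both attacked frameworks build on can be sketched in a few lines: score-based conformal p-values followed by Benjamini-Hochberg, which gives finite-sample FDR control when calibration and test nulls are exchangeable. The scores below are toy values, and this is the generic recipe rather than either paper's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
cal_scores = rng.normal(0.0, 1.0, size=500)               # held-out null (calibration) scores
test_scores = np.concatenate([rng.normal(0.0, 1.0, 40),   # test nulls
                              rng.normal(4.0, 1.0, 10)])  # novelties score higher

def conformal_pvals(cal, test):
    """p_j = (1 + #{cal >= test_j}) / (n_cal + 1); valid under exchangeability."""
    cal = np.sort(cal)
    n = len(cal)
    ge = n - np.searchsorted(cal, test, side="left")  # count of cal scores >= test_j
    return (1.0 + ge) / (n + 1.0)

def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean rejection mask at FDR level alpha."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

rejected = benjamini_hochberg(conformal_pvals(cal_scores, test_scores))
print(rejected.sum())  # number of discoveries
```

An adversary that can nudge test inputs to shift their scores toward the calibration distribution inflates the p-values of true novelties, which is the vulnerability the paper probes.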


[70] 2510.01675

Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances

This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
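The cascade structure that the controller exploits follows the textbook backstepping template below; this is a schematic only, and the paper's geometric version on the rotation group with servo-rotor states is substantially more involved:

```latex
% Cascade between rigid-body and actuator dynamics (schematic):
%   rigid-body subsystem:  \dot{x} = f(x) + g(x)\,\xi      (\xi: actuator state)
%   actuator subsystem:    \dot{\xi} = a(\xi) + b(\xi)\,u  (u: actuator command)
%
% Backstepping: choose a virtual control \alpha(x) stabilizing the first
% subsystem with Lyapunov function V_1(x), define the error z = \xi - \alpha(x),
% and take the composite Lyapunov function
%   V(x, z) = V_1(x) + \tfrac{1}{2}\,\|z\|^2 .
% Then pick u so that \dot{V} \le -c_1 V_1(x) - c_2 \|z\|^2 for some c_1, c_2 > 0,
% which gives exponential stability of the overall cascade.
```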


[71] 2511.18554

Online Smoothed Demand Management

We introduce and study a class of online problems called online smoothed demand management $(\texttt{OSDM})$, motivated by paradigm shifts in grid integration and energy storage for large energy consumers such as data centers. In $\texttt{OSDM}$, an operator makes two decisions at each time step: an amount of energy to be purchased, and an amount of energy to be delivered (i.e., used for computation). The difference between these decisions charges (or discharges) the operator's energy storage (e.g., a battery). Two types of demand arrive online: base demand, which must be covered at the current time, and flexible demand, which can be satisfied at any time before a demand-specific deadline $\Delta_t$. The operator's goal is to minimize a cost (subject to the above constraints) that combines a cost of purchasing energy, a cost for delivering energy (if applicable), and smoothness penalties on the purchasing and delivery rates to discourage fluctuations and encourage ``grid-healthy'' decisions. $\texttt{OSDM}$ generalizes several problems in the online algorithms literature while being the first to fully model the applications of interest. We propose a competitive algorithm for $\texttt{OSDM}$ called $\texttt{PAAD}$ (partitioned accounting & aggregated decisions) and show that it achieves the optimal competitive ratio. To overcome the pessimism typical of worst-case analysis, we also propose a novel learning framework that provides guarantees on the worst-case competitive ratio (i.e., robustness against nonstationarity) while allowing end-to-end differentiable learning of the best algorithm on historical instances of the problem. We evaluate our algorithms in a case study of a grid-integrated data center with battery storage, showing that $\texttt{PAAD}$ effectively solves the problem and that end-to-end learning achieves substantial performance improvements over $\texttt{PAAD}$.
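The decision structure above can be made concrete with a small replay harness; the quadratic smoothness penalties, unit battery capacity, and variable names are assumptions for illustration, not the paper's exact model:

```python
# Illustrative OSDM-style replay: at each step the operator purchases p
# and delivers d; the difference charges or discharges the battery.
def step_cost(price, purchase, deliver, prev_purchase, prev_deliver,
              lam_p=0.1, lam_d=0.1, delivery_cost=0.01):
    """Per-step cost: purchase cost + delivery cost + smoothness penalties."""
    return (price * purchase
            + delivery_cost * deliver
            + lam_p * (purchase - prev_purchase) ** 2
            + lam_d * (deliver - prev_deliver) ** 2)

def run(prices, base_demand, decisions, capacity=1.0):
    """Replay (purchase, deliver) decisions; enforce storage and base-demand constraints."""
    battery, total = 0.0, 0.0
    prev_p = prev_d = 0.0
    for price, base, (p, d) in zip(prices, base_demand, decisions):
        assert d >= base, "base demand must be covered at the current time"
        battery += p - d
        assert -1e-9 <= battery <= capacity + 1e-9, "storage constraint violated"
        total += step_cost(price, p, d, prev_p, prev_d)
        prev_p, prev_d = p, d
    return total

# Two steps: buy ahead at the cheap price, discharge at the expensive one.
cost = run(prices=[1.0, 5.0], base_demand=[0.2, 0.5],
           decisions=[(0.7, 0.2), (0.0, 0.5)])
print(round(cost, 3))  # prints 0.818
```

The tension the smoothness penalties create is visible even here: buying everything up front is cheap per unit but penalized for the abrupt rate change, which is exactly the trade-off an $\texttt{OSDM}$ algorithm must balance online.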


[72] 2602.11488

When Audio-LLMs Don't Listen: A Cross-Linguistic Study of Modality Arbitration

When audio and text conflict, speech-enabled language models follow the text 10 times more often than when arbitrating between two text sources, even when explicitly instructed to trust the audio. Using ALME, a benchmark of 57,602 controlled audio-text conflict stimuli across 8 languages, we find that Gemini 2.0 Flash exhibits 16.6% text dominance under audio-text conflict versus 1.6% under text-text conflict with identical reliability cues. This gap is not explained by audio quality: audio-only accuracy (97.2%) exceeds cascade accuracy (93.9%), indicating audio embeddings preserve more information than text transcripts. We propose that text dominance reflects an asymmetry not in information content but in arbitration accessibility: how easily the model can reason over competing representations. This framework explains otherwise puzzling findings. Forcing transcription before answering increases text dominance (19% to 33%), sacrificing audio's information advantage without improving accessibility. Framing text as "deliberately corrupted" reduces text dominance by 80%. A fine-tuning ablation provides interventional evidence: training only the audio projection layer increases text dominance (+26.5%), while LoRA on the language model halves it ($-$23.9%), localizing text dominance to the LLM's reasoning rather than the audio encoder. Experiments across four state-of-the-art audio-LLMs and 8 languages show consistent trends with substantial cross-linguistic and cross-model variation, establishing modality arbitration as a distinct reliability dimension not captured by standard speech benchmarks.


[73] 2602.15749

A Generative-First Neural Audio Autoencoder

Neural autoencoders underpin generative models. Practical, large-scale use of neural autoencoders for generative modeling necessitates fast encoding, low latent rates, and a single model across representations. Existing approaches are reconstruction-first: they incur high latent rates, slow encoding, and separate architectures for discrete vs. continuous latents and for different audio channel formats, hindering workflows from preprocessing to inference conditioning. We introduce a generative-first architecture for audio autoencoding that increases temporal downsampling from 2048x to 3360x and supports continuous and discrete representations and common audio channel formats in one model. By balancing compression, quality, and speed, it delivers 10x faster encoding, 1.6x lower rates, and eliminates channel-format-specific variants while maintaining competitive reconstruction quality. This enables applications previously constrained by processing costs: a 60-second mono signal compresses to 788 tokens, making generative modeling more tractable.
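The quoted token count is consistent with a 44.1 kHz sample rate, which the abstract does not state and is assumed here:

```python
import math

sample_rate = 44_100   # Hz (assumed; not stated in the abstract)
duration_s = 60        # one-minute mono signal
downsampling = 3360    # temporal downsampling factor from the abstract

# 2,646,000 samples / 3360 = 787.5, rounded up to whole tokens.
tokens = math.ceil(sample_rate * duration_s / downsampling)
print(tokens)  # prints 788
```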