New articles on Electrical Engineering and Systems Science


[1] 2602.09026

Operator-Based Information Theory for Imaging: Entropy, Capacity, and Irreversibility in Physical Measurement Systems

Imaging systems are commonly described using resolution, contrast, and signal-to-noise ratio, but these quantities do not provide a general account of how physical transformations affect the flow of information. This paper introduces an operator-based formulation of information theory for imaging. The approach models the imaging chain as a composition of bounded operators acting on functions, and characterises information redistribution using the spectral properties of these operators. Three measures are developed. Operator entropy quantifies how an operator distributes energy across its singular spectrum. Operator information capacity describes the number of modes that remain recoverable above a noise-dependent threshold. An irreversibility index measures the information lost through suppression or elimination of modes and captures the accumulation of information loss under operator composition. The framework applies to linear, nonlinear, and stochastic operators and does not depend on the specific imaging modality. Analytical examples show how attenuation, blur, and sampling affect entropy, capacity, and irreversibility in different ways. The results provide a general structure for analysing the physical limits of imaging and form the basis for subsequent work on information geometry, spatiotemporal budgets, nonlinear channels, and reconstruction algorithms.
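
A minimal numerical reading of the three measures, not taken from the paper: the operator is represented by a discretized matrix, entropy is computed over the normalized singular spectrum, capacity counts modes above a noise-dependent threshold, and the irreversibility index is taken as the fraction of spectral energy below that threshold. The blur example and threshold value are illustrative assumptions.

    import numpy as np

    def operator_measures(A, noise_level=1e-2):
        """Spectral measures for a discretized imaging operator A (2-D array)."""
        s = np.linalg.svd(A, compute_uv=False)                 # singular spectrum
        p = s**2 / np.sum(s**2)                                # energy over modes
        entropy = -np.sum(p * np.log(p + 1e-30))               # operator entropy (nats)
        capacity = int(np.sum(s > noise_level))                # recoverable modes
        irreversibility = np.sum(s[s <= noise_level]**2) / np.sum(s**2)  # lost energy fraction
        return entropy, capacity, irreversibility

    # Example: a Gaussian blur acting on a 1-D signal, built as a convolution matrix.
    n = 128
    x = np.arange(n)
    blur = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 2.0) ** 2)
    blur /= blur.sum(axis=1, keepdims=True)
    print(operator_measures(blur))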


[2] 2602.09035

E2CAR: An Efficient 2D-CNN Framework for Real-Time EEG Artifact Removal on Edge Devices

Electroencephalography (EEG) signals are frequently contaminated by artifacts, affecting the accuracy of subsequent analysis. Traditional artifact removal methods are often computationally expensive and inefficient for real-time applications on edge devices. This paper presents a method to reduce the computational cost of most existing convolutional neural networks (CNNs) by replacing one-dimensional (1-D) CNNs with two-dimensional (2-D) CNNs and deploys them on the Edge Tensor Processing Unit (TPU), an open-resource hardware accelerator widely used in edge devices for low-latency, low-power operation. A new Efficient 2D-CNN Artifact Removal (E2CAR) framework built on this method is also presented; it achieves a 90% reduction in inference time on the TPU and decreases power consumption by 18.98%, while maintaining artifact removal performance comparable to existing methods. This approach facilitates efficient EEG signal processing on edge devices.
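
A toy PyTorch sketch of the core idea stated in the abstract, treating a multichannel EEG segment as a 2-D (channels x time) input to a small 2-D CNN; the layer sizes, the denoising mapping, and the segment dimensions are illustrative placeholders, not the E2CAR architecture.

    import torch
    import torch.nn as nn

    class Tiny2DDenoiser(nn.Module):
        """Toy 2-D CNN mapping a contaminated EEG segment to a cleaned estimate."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, kernel_size=3, padding=1),
            )

        def forward(self, x):                 # x: (batch, 1, eeg_channels, time_samples)
            return self.net(x)

    segment = torch.randn(8, 1, 32, 512)      # 8 segments, 32 EEG channels, 512 samples
    print(Tiny2DDenoiser()(segment).shape)    # torch.Size([8, 1, 32, 512])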


[3] 2602.09040

Soft Clustering Anchors for Self-Supervised Speech Representation Learning in Joint Embedding Prediction Architectures

Joint Embedding Predictive Architectures (JEPA) offer a promising approach to self-supervised speech representation learning, but suffer from representation collapse without explicit grounding. We propose GMM-Anchored JEPA, which fits a Gaussian Mixture Model once on log-mel spectrograms and uses its frozen soft posteriors as auxiliary targets throughout training. A decaying supervision schedule allows GMM regularization to dominate early training before gradually yielding to the JEPA objective. Unlike HuBERT and WavLM, which require iterative re-clustering, our approach clusters input features once with soft rather than hard assignments. On ~50k hours of speech, GMM anchoring improves ASR (28.68% vs. 33.22% WER), emotion recognition (67.76% vs. 65.46%), and slot filling (64.7% vs. 59.1% F1) compared to a WavLM-style baseline with matched compute. Cluster analysis shows GMM-anchored representations achieve up to 98% entropy compared to 31% for WavLM-style, indicating substantially more uniform cluster utilization. Code is made available at this https URL.
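
A rough sketch of the anchoring mechanism described above, not the authors' code: a scikit-learn GaussianMixture is fitted once on log-mel frames, its frozen soft posteriors serve as auxiliary targets, and a linearly decaying weight stands in for the decaying supervision schedule. The random features, component count, and loss combination are assumptions.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Fit the GMM once on (frames x mel-bins) log-mel features; keep it frozen afterwards.
    logmel = np.random.randn(20000, 80)            # placeholder for real log-mel frames
    gmm = GaussianMixture(n_components=64, covariance_type="diag", max_iter=50).fit(logmel)

    soft_targets = gmm.predict_proba(logmel)       # frozen soft posteriors, (frames, 64)

    def anchor_weight(step, total_steps):
        """Decaying supervision schedule: anchoring dominates early, then fades out."""
        return max(0.0, 1.0 - step / total_steps)

    # Training-loop sketch: loss = jepa_loss + anchor_weight(step, T) * anchor_loss,
    # where anchor_loss compares predicted cluster posteriors against soft_targets.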


[4] 2602.09043

Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-resource Speech Recognition

Self-supervised learning (SSL) has advanced speech processing but suffers from quadratic complexity due to self-attention. To address this, SummaryMixing (SM) has been proposed as a linear-time alternative that summarizes entire utterances using mean pooling but lacks sufficient local context. In this work, we introduce Windowed SummaryMixing (WSM), which enhances SM by integrating local neighborhood summaries alongside the global summary, maintaining efficiency while improving temporal dependencies. Additionally, we introduce a selective fine-tuning approach, replacing self-attention layers in SSL models with WSM blocks and fine-tuning only these blocks in low-resource settings. Our approach improves ASR performance while reducing peak VRAM usage by 40% in the SSL models. WSM blocks have linear-time complexity with enhanced context awareness. Selectively replacing some attention layers reduces compute, memory, and latency, making it ideal for low-resource speech recognition.
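
A minimal sketch (not the paper's SM/WSM block) of combining a global mean summary with local windowed means in linear time, using average pooling; the window size and concatenation scheme are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def windowed_summary(x, window=16):
        """x: (batch, time, dim). Appends local-window and global mean summaries."""
        xt = x.transpose(1, 2)                                  # (batch, dim, time)
        local = F.avg_pool1d(xt, kernel_size=window, stride=1,
                             padding=window // 2, count_include_pad=False)
        local = local[..., : x.shape[1]].transpose(1, 2)        # (batch, time, dim)
        global_mean = x.mean(dim=1, keepdim=True).expand_as(x)  # (batch, time, dim)
        return torch.cat([x, local, global_mean], dim=-1)       # linear-time mixing input

    frames = torch.randn(4, 200, 256)
    print(windowed_summary(frames).shape)                       # torch.Size([4, 200, 768])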


[5] 2602.09044

Beyond the Utterance: An Empirical Study of Very Long Context Speech Recognition

Automatic speech recognition (ASR) models are normally trained to operate over single utterances with a short duration of less than 30 seconds. This choice has been made in part due to computational constraints, but also reflects a common, but often inaccurate, modelling assumption that treats utterances as independent and identically distributed samples. To work with such systems, long-format audio recordings must first be segmented into short utterances that are processed independently. In this work, we show that due to recent algorithmic and hardware advances, this is no longer necessary, and current attention-based approaches can be used to train ASR systems that operate on sequences of over an hour in length. To gain a better understanding of the relationship between the training/evaluation sequence length and performance, we train ASR models on large-scale data using 10 different sequence lengths from 10 seconds up to 1 hour. The results show a benefit from using up to 21.8 minutes of context, with up to a 14.2% relative improvement from a short-context baseline in our primary experiments. Through modifying various architectural components, we find that the method of encoding positional information and the model's width/depth are important factors when working with long sequences. Finally, a series of evaluations using synthetic data are constructed to help analyse the model's use of context. From these results, it is clear that both linguistic and acoustic aspects of the distant context are being used by the model.


[6] 2602.09050

SAS-Net: Scene-Appearance Separation Network for Robust Spatiotemporal Registration in Bidirectional Photoacoustic Microscopy

High-speed optical-resolution photoacoustic microscopy (OR-PAM) with bidirectional scanning enables rapid functional brain imaging but introduces severe spatiotemporal misalignment from coupled scan-direction-dependent domain shift and geometric distortion. Conventional registration methods rely on brightness constancy, an assumption violated under bidirectional scanning, leading to unreliable alignment. A unified scene-appearance separation framework is proposed to jointly address domain shift and spatial misalignment. The proposed architecture separates domain-invariant scene content from domain-specific appearance characteristics, enabling cross-domain reconstruction with geometric preservation. A scene consistency loss promotes geometric correspondence in the latent space, linking domain shift correction with spatial registration within a single framework. For in vivo mouse brain vasculature imaging, the proposed method achieves normalized cross-correlation (NCC) of 0.961 and structural similarity index (SSIM) of 0.894, substantially outperforming conventional methods. Ablation studies demonstrate that domain alignment loss is critical, with its removal causing 82% NCC reduction (0.961 to 0.175), while scene consistency and cycle consistency losses provide complementary regularization for optimal performance. The method achieves 11.2 ms inference time per frame (86 fps), substantially exceeding typical OR-PAM acquisition rates and enabling real-time processing. These results suggest that the proposed framework enables robust high-speed bidirectional OR-PAM for reliable quantitative and longitudinal functional imaging. The code will be publicly available at this https URL
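
For reference, the two reported alignment metrics can be computed as below; this is a generic sketch assuming scikit-image for SSIM and zero-mean NCC on same-sized images, not the authors' evaluation code, and the random test images are placeholders.

    import numpy as np
    from skimage.metrics import structural_similarity

    def normalized_cross_correlation(a, b):
        """Zero-mean normalized cross-correlation between two equally sized images."""
        a = (a - a.mean()) / (a.std() + 1e-12)
        b = (b - b.mean()) / (b.std() + 1e-12)
        return float(np.mean(a * b))

    ref = np.random.rand(256, 256).astype(np.float32)           # placeholder frames
    aligned = ref + 0.05 * np.random.randn(256, 256).astype(np.float32)

    ncc = normalized_cross_correlation(ref, aligned)
    ssim = structural_similarity(ref, aligned,
                                 data_range=float(aligned.max() - aligned.min()))
    print(f"NCC={ncc:.3f}, SSIM={ssim:.3f}")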


[7] 2602.09115

WiLoc: Massive Measured Dataset of Wi-Fi Channel State Information with Application to Machine-Learning Based Localization

Localization is a key component of the wireless ecosystem. Machine learning (ML)-based localization using channel state information (CSI) is one of the most popular methods for achieving high-accuracy localization at low cost. However, to be accurate and robust, ML-based algorithms need to be trained and tested with large amounts of data, covering not only many user equipment (UE)/target locations, but also many different access point (AP) locations to which the UEs connect, in a variety of different environment types. This paper presents a massive-sized CSI dataset, WiLoc (Wi-Fi Localization), and makes it publicly available. WiLoc was obtained through a series of precision measurement campaigns spanning three months, and it is massive in all three of the above-mentioned dimensions: > 12 million UE locations, > 3,000 APs, covering 16 buildings for indoor localization, and > 30 streets for outdoor use. The paper describes the dataset structure, measurement environments, measurement protocols, and the dataset validations. Comprehensive case studies validate the advantages of large datasets in ML-driven localization strategies for both "standard" and transfer learning. We envision this dataset, which is by far the largest of its kind, to become a standard resource for researchers in the field of ML-based localization.


[8] 2602.09140

An Actor-Critic-Identifier Control Design for Increasing Energy Efficiency of Automated Electric Vehicles

Electric vehicles (EVs) are increasingly deployed, yet range limitations remain a key barrier. Improving energy efficiency via advanced control is therefore essential, and emerging vehicle automation offers a promising avenue. However, many existing strategies rely on indirect surrogates because linking power consumption to control inputs is difficult. We propose a neural-network (NN) identifier that learns this mapping online and couples it with an actor-critic reinforcement learning (RL) framework to generate optimal control commands. The resulting actor-critic-identifier architecture removes dependence on explicit models relating total power, recovered energy, and inputs, while maintaining accurate speed tracking and maximizing efficiency. Update laws are derived using Lyapunov stability analysis, and performance is validated in simulation. Compared to a traditional controller, the method increases total energy recovery by 12.84%, indicating strong potential for improving EV energy efficiency.


[9] 2602.09144

Shaping Energy Exchange with Gyroscopic Interconnections: a geometric approach

Gyroscopic interconnections enable redistribution of energy among degrees of freedom while preserving passivity and total energy, and they play a central role in controlled Lagrangian methods and IDA-PBC. Yet their quantitative effect on transient energy exchange and subsystem performance is not well characterised. We study a conservative mechanical system with constant skew-symmetric velocity coupling. Its dynamics are integrable and evolve on invariant two-tori, whose projections onto subsystem phase planes provide a geometric description of energy exchange. When the ratio of normal-mode frequencies is rational, these projections become closed resonant Lissajous curves, enabling structured analysis of subsystem trajectories. To quantify subsystem behaviour, we introduce the inscribed-radius metric: the radius of the largest origin-centred circle contained in a projected trajectory. This gives a lower bound on attainable subsystem energy and acts as an internal performance measure. We derive resonance conditions and develop an efficient method to compute or certify the inscribed radius without time-domain simulation. Our results show that low-order resonances can strongly restrict energy depletion through phase-locking, whereas high-order resonances recover conservative bounds. These insights lead to an explicit interconnection-shaping design framework for both energy absorption and containment control strategies, while taking responsiveness into account.


[10] 2602.09150

Dynamic Passivity Multipliers for Plug-and-Play Stability Certificates of Converter-Dominated Grids

Ensuring small-signal stability in power systems with a high share of inverter-based resources (IBRs) is hampered by two factors: (i) device and network parameters are often uncertain or completely unknown, and (ii) brute-force enumeration of all topologies is computationally intractable. These challenges motivate plug-and-play (PnP) certificates that verify stability locally yet hold globally. Passivity is an attractive property because it guarantees stability under feedback and network interconnections; however, strict passivity rarely holds for practical controllers such as Grid Forming Inverters (GFMs) employing P-Q droop. This paper extends the passivity condition by constructing a dynamic, frequency-dependent multiplier that enables PnP stability certification of each component based solely on its admittance, without requiring any modification to the controller design. The multiplier is parameterised as a linear filter whose coefficients are tuned under a passivity goal. Numerical results for practical droop gains confirm the PnP rules, substantially enlarging the certified stability region while preserving the decentralised, model-agnostic nature of passivity-based PnP tests.


[11] 2602.09157

Foundation Model-Aided Hierarchical Deep Reinforcement Learning for Blockage-Aware Link in RIS-Assisted Networks

Reconfigurable intelligent surface (RIS) technology has the potential to significantly enhance the spectral efficiency (SE) of 6G wireless networks. However, practical deployment remains constrained by challenges in accurate channel estimation and control optimization under dynamic conditions. This paper presents a foundation model-aided hierarchical deep reinforcement learning (FM-HDRL) framework designed for joint beamforming and phase-shift optimization in RIS-assisted wireless networks. To implement this, we first fine-tune a pre-trained large wireless model (LWM) to translate raw channel data into low-dimensional, context-aware channel state information (CSI) embeddings. Next, these embeddings are combined with user location information and blockage status to select the optimal communication path. The resulting features are then fed into an HDRL model, assumed to be implemented at a centralized controller, which jointly optimizes the base station (BS) beamforming vectors and the RIS phase-shift configurations to maximize SE. Simulation results demonstrate that the proposed FM-HDRL framework consistently outperforms baseline methods in terms of convergence speed, spectral efficiency, and scalability. In particular, the proposed method improves SE by 7.82% compared to the FM-aided deep reinforcement learning (FM-DRL) approach and by about 48.66% relative to the beam sweeping approach.


[12] 2602.09191

Digital-Twin-Aided Dynamic Spectrum Sharing and Resource Management in Integrated Satellite-Terrestrial Networks

The explosive growth in wireless service demand has prompted the evolution of integrated satellite-terrestrial networks (ISTNs) to overcome the limitations of traditional terrestrial networks (TNs) in terms of coverage, spectrum efficiency, and deployment cost. In particular, leveraging LEO satellites and dynamic spectrum sharing (DSS), ISTNs offer promising solutions but face significant challenges due to diverse terrestrial environments, user and satellite mobility, and long LEO-to-ground propagation distances. To address these challenges, the digital twin (DT) has emerged as a promising technology that offers virtual replicas of real-world systems, facilitating prediction for resource management. In this work, we study a time-window-based DT-aided DSS framework for ISTNs, enabling joint long-term and short-term resource decisions to reduce system congestion. Based on that, two optimization problems are formulated, which aim to optimize resource management using DT information and to refine the obtained solutions with actual real-time information, respectively. To efficiently solve these problems, we propose algorithms using compressed-sensing-based and successive convex approximation techniques. Simulation results using actual traffic data and the London 3D map demonstrate the superiority of our proposed algorithms over benchmarks in terms of congestion minimization. They also show the adaptability and practical feasibility of our proposed solutions.


[13] 2602.09206

EExApp: GNN-Based Reinforcement Learning for Radio Unit Energy Optimization in 5G O-RAN

With over 3.5 million 5G base stations deployed globally, their collective energy consumption (projected to exceed 131 TWh annually) raises significant concerns over both operational costs and environmental impacts. In this paper, we present EExApp, a deep reinforcement learning (DRL)-based xApp for 5G Open Radio Access Network (O-RAN) that jointly optimizes radio unit (RU) sleep scheduling and distributed unit (DU) resource slicing. EExApp uses a dual-actor-dual-critic Proximal Policy Optimization (PPO) architecture, with dedicated actor-critic pairs targeting energy efficiency and quality-of-service (QoS) compliance. A transformer-based encoder enables scalable handling of variable user equipment (UE) populations by encoding all-UE observations into fixed-dimensional representations. To coordinate the two optimization objectives, a bipartite Graph Attention Network (GAT) is used to modulate actor updates based on both critic outputs, enabling adaptive tradeoffs between power savings and QoS. We have implemented EExApp and deployed it on a real-world 5G O-RAN testbed with live traffic, a commercial RU, and smartphones. Extensive over-the-air experiments and ablation studies confirm that EExApp significantly outperforms existing methods in reducing the energy consumption of the RU while maintaining QoS.


[14] 2602.09210

AI-Driven Cardiorespiratory Signal Processing: Separation, Clustering, and Anomaly Detection

This research applies artificial intelligence (AI) to separate, cluster, and analyze cardiorespiratory sounds. We recorded a new dataset (HLS-CMDS) and developed several AI models, including generative AI methods based on large language models (LLMs) for guided separation, explainable AI (XAI) techniques to interpret latent representations, variational autoencoders (VAEs) for waveform separation, a chemistry-inspired non-negative matrix factorization (NMF) algorithm for clustering, and a quantum convolutional neural network (QCNN) designed to detect abnormal physiological patterns. The performance of these AI models depends on the quality of the recorded signals. Therefore, this thesis also reviews the biosensing technologies used to capture biomedical data. It summarizes developments in microelectromechanical systems (MEMS) acoustic sensors and quantum biosensors, such as quantum dots and nitrogen-vacancy centers. It further outlines the transition from electronic integrated circuits (EICs) to photonic integrated circuits (PICs) and early progress toward integrated quantum photonics (IQP) for chip-based biosensing. Together, these studies show how AI and next-generation sensors can support more intelligent diagnostic systems for future healthcare.


[15] 2602.09213

Real-time Load Current Monitoring of Overhead Lines Using GMR Sensors

Non-contact current monitoring has emerged as a prominent research focus owing to its non-intrusive characteristics and low maintenance requirements. However, while they offer high sensitivity, contactless sensors necessitate sophisticated design methodologies and thorough experimental validation. In this study, a Giant Magneto-Resistance (GMR) sensor is employed to monitor the instantaneous currents of a three-phase 400-volt overhead line, and its performance is evaluated against that of a conventional contact-based Hall effect sensor. A mathematical framework is developed to calculate current from the measured magnetic field signals. Furthermore, a MATLAB-based dashboard is implemented to enable real-time visualization of current measurements from both sensors under linear and non-linear load conditions. The GMR current sensor achieved a relative accuracy of 64.64% to 91.49%, with most phases above 80%. Identified improvements suggest that higher accuracy is attainable, indicating that the sensing method has potential as a basis for calculating phase currents.
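
As a simplified illustration of mapping a measured field to a current (not the paper's three-phase framework): for a single long straight conductor, B = mu0 * I / (2 * pi * r), so the current follows from the field reading and the sensor-to-conductor distance. The example values are assumptions.

    import numpy as np

    MU0 = 4e-7 * np.pi                        # vacuum permeability (H/m)

    def current_from_field(b_tesla, distance_m):
        """Invert the long-straight-conductor relation B = mu0*I/(2*pi*r)."""
        return 2.0 * np.pi * distance_m * b_tesla / MU0

    # Example: a GMR field reading of 40 uT at 5 cm from the conductor.
    print(current_from_field(40e-6, 0.05))    # 10.0 A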


[16] 2602.09321

Performance Comparison of CNN and AST Models with Stacked Features for Environmental Sound Classification

Environmental sound classification (ESC) has gained significant attention due to its diverse applications in smart city monitoring, fault detection, acoustic surveillance, and manufacturing quality control. To enhance convolutional neural network (CNN) performance, feature stacking techniques have been explored to aggregate complementary acoustic descriptors into richer input representations. In this paper, we investigate CNN-based models employing various stacked feature combinations, including Log-Mel Spectrogram (LM), Spectral Contrast (SPC), Chroma (CH), Tonnetz (TZ), Mel-Frequency Cepstral Coefficients (MFCCs), and Gammatone Cepstral Coefficients (GTCC). Experiments are conducted on the widely used ESC-50 and UrbanSound8K datasets under different training regimes, including pretraining on ESC-50, fine-tuning on UrbanSound8K, and comparison with Audio Spectrogram Transformer (AST) models pretrained on large-scale corpora such as AudioSet. This experimental design enables an analysis of how feature-stacked CNNs compare with transformer-based models under varying levels of training data and pretraining diversity. The results indicate that feature-stacked CNNs offer a more computationally and data-efficient alternative when large-scale pretraining or extensive training data are unavailable, making them particularly well suited for resource-constrained and edge-level sound classification scenarios.
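
A plausible librosa-based sketch of the feature stacking step (an assumption; the paper does not specify its toolchain, and GTCC is omitted because librosa has no gammatone front end): each descriptor is computed with the same hop and stacked along the feature axis, with a trim guard against off-by-one frame counts. The random waveform is a placeholder.

    import numpy as np
    import librosa

    sr = 22050
    y = np.random.randn(2 * sr).astype(np.float32)     # placeholder 2-second clip

    logmel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)

    feats = [logmel, mfcc, contrast, chroma, tonnetz]
    n = min(f.shape[1] for f in feats)                 # guard against frame mismatches
    stacked = np.vstack([f[:, :n] for f in feats])     # (feature_rows, frames) CNN input
    print(stacked.shape)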


[17] 2602.09389

TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization

Real-time voice conversion and speaker anonymization require causal, low-latency synthesis without sacrificing intelligibility or naturalness. Current systems have a core representational mismatch: content is time-varying, while speaker identity is injected as a static global embedding. We introduce a streamable speech synthesizer that aligns the temporal granularity of identity and content via a content-synchronous, time-varying timbre (TVT) representation. A Global Timbre Memory expands a global timbre instance into multiple compact facets; frame-level content attends to this memory, a gate regulates variation, and spherical interpolation preserves identity geometry while enabling smooth local changes. In addition, a factorized vector-quantized bottleneck regularizes content to reduce residual speaker leakage. The resulting system is streamable end-to-end, with <80 ms GPU latency. Experiments show improvements in naturalness, speaker transfer, and anonymization compared to SOTA streaming baselines, establishing TVT as a scalable approach for privacy-preserving and expressive speech synthesis under strict latency budgets.


[18] 2602.09398

Escaping Local Minima: A Finite-Time Markov Chain Analysis of Constant-Temperature Simulated Annealing

Simulated Annealing (SA) is a widely used stochastic optimization algorithm, yet much of its theoretical understanding is limited to asymptotic convergence guarantees or general spectral bounds. In this paper, we develop a finite-time analytical framework for constant-temperature SA by studying a piecewise linear cost function that permits exact characterization. We model SA as a discrete-state Markov chain and first derive a closed-form expression for the expected time to escape a single linear basin in a one-dimensional landscape. We show that this expression also accurately predicts the behavior of continuous-state searches up to a constant scaling factor, which we analyze empirically and explain via variance matching, demonstrating convergence to a factor of sqrt(3) in certain regimes. We then extend the analysis to a two-basin landscape containing a local and a global optimum, obtaining exact expressions for the expected time to reach the global optimum starting from the local optimum, as a function of basin geometry, neighborhood radius, and temperature. Finally, we demonstrate how the predicted basin escape time can be used to guide the design of a simple two-temperature switching strategy.
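
A quick Monte-Carlo companion to the escape-time analysis, under stated assumptions: a one-dimensional piecewise linear cost slope*|x|, a uniform proposal of radius 1, constant temperature, and escape declared once |x| exceeds the basin half-width. None of the numbers correspond to the paper's closed-form expression; they only illustrate how the quantity can be estimated empirically.

    import numpy as np

    rng = np.random.default_rng(0)

    def escape_time(slope=1.0, half_width=3.0, radius=1.0, temperature=1.0,
                    max_steps=10**5):
        """Steps for constant-temperature SA to leave {|x| <= half_width} on cost slope*|x|."""
        x = 0.0
        for step in range(1, max_steps + 1):
            candidate = x + rng.uniform(-radius, radius)
            delta = slope * (abs(candidate) - abs(x))           # piecewise linear cost change
            if delta <= 0 or rng.random() < np.exp(-delta / temperature):
                x = candidate                                   # Metropolis acceptance
            if abs(x) > half_width:
                return step
        return max_steps

    print("mean escape time:", np.mean([escape_time() for _ in range(200)]))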


[19] 2602.09414

Finite-time Stable Pose Estimation on TSE(3) using Point Cloud and Velocity Sensors

This work presents a finite-time stable pose estimator (FTS-PE) for rigid bodies undergoing rotational and translational motion in three dimensions, using measurements from onboard sensors that provide position vectors to inertially-fixed points and body velocities. The FTS-PE is a full-state observer for the pose (position and orientation) and velocities and is obtained through a Lyapunov analysis that shows its stability in finite time and its robustness to bounded measurement noise. Further, this observer is designed directly on the state space, the tangent bundle of the Lie group of rigid body motions, SE(3), without using local coordinates or (dual) quaternion representations. Therefore, it can estimate arbitrary rigid body motions without encountering singularities or the unwinding phenomenon and can be readily applied to autonomous vehicles. A version of this observer that does not need translational velocity measurements and uses only point clouds and angular velocity measurements from rate gyros is also obtained. It is discretized using the framework of geometric mechanics for numerical and experimental implementations. The numerical simulations compare the FTS-PE with a dual-quaternion extended Kalman filter and our previously developed variational pose estimator (VPE). The experimental results are obtained using point cloud images and rate gyro measurements from a Zed 2i stereo depth camera sensor. These results validate the stability and robustness of the FTS-PE.


[20] 2602.09419

When Movable Antennas Meet RSMA and RIS: Robust Beamforming Design With Channel Uncertainty

In this work, we propose an intelligent optimization framework for a multi-user communication system integrating movable antennas (MAs) and a reconfigurable intelligent surface (RIS) under the rate-splitting multiple access (RSMA) protocol. The system sum-rate is maximized through joint optimization of transmit precoding vectors, RIS reflection matrix, common-rate allocation, and MA positions, subject to quality-of-service (QoS), power-budget, common-rate decoding, and mutual coupling constraints. Imperfect channel state information (CSI) is considered for all links, where robustness is ensured by modeling channel estimation errors within a bounded uncertainty region, guaranteeing worst-case performance reliability. The resulting non-convex problem is solved using an alternating optimization framework. The precoding subproblem is reformulated as a semidefinite programming (SDP) problem via linear matrix inequalities derived using the S-procedure. The RIS reflection matrix is optimized using successive convex approximation (SCA), yielding an equivalent SDP formulation. The MA position optimization is addressed through SCA combined with block coordinate descent (BCD) method. Numerical results validate the effectiveness of the proposed framework and demonstrate fast convergence.


[21] 2602.09427

Lateral tracking control of all-wheel steering vehicles with intelligent tires

The accurate characterization of tire dynamics is critical for advancing control strategies in autonomous road vehicles, as tire behavior significantly influences handling and stability through the generation of forces and moments at the tire-road interface. Smart tire technologies have emerged as a promising tool for sensing key variables such as road friction, tire pressure, and wear states, and for estimating kinematic and dynamic states like vehicle speed and tire forces. However, most existing estimation and control algorithms rely on empirical correlations or machine learning approaches, which require extensive calibration and can be sensitive to variations in operating conditions. In contrast, model-based techniques, which leverage infinite-dimensional representations of tire dynamics using partial differential equations (PDEs), offer a more robust approach. This paper proposes a novel model-based, output-feedback lateral tracking control strategy for all-wheel steering vehicles that integrates distributed tire dynamics with smart tire technologies. The primary contributions include the suppression of micro-shimmy phenomena at low speeds and path-following via force control, achieved through the estimation of tire slip angles, vehicle kinematics, and lateral tire forces. The proposed controller and observer are based on formulations using ODE-PDE systems, representing rigid body dynamics and distributed tire behavior. This work marks the first rigorous control strategy for vehicular systems equipped with distributed tire representations in conjunction with smart tire technologies.


[22] 2602.09429

First-order friction models with bristle dynamics: lumped and distributed formulations

Dynamic models, particularly rate-dependent models, have proven effective in capturing the key phenomenological features of frictional processes, whilst also possessing important mathematical properties that facilitate the design of control and estimation algorithms. However, many rate-dependent formulations are built on empirical considerations, whereas physical derivations may offer greater interpretability. In this context, starting from fundamental physical principles, this paper introduces a novel class of first-order dynamic friction models that approximate the dynamics of a bristle element by inverting the friction characteristic. Amongst the developed models, a specific formulation closely resembling the LuGre model is derived using a simple rheological equation for the bristle element. This model is rigorously analyzed in terms of stability and passivity -- important properties that support the synthesis of observers and controllers. Furthermore, a distributed version, formulated as a hyperbolic partial differential equation (PDE), is presented, which enables the modeling of frictional processes commonly encountered in rolling contact phenomena. The tribological behavior of the proposed description is evaluated through classical experiments and validated against the response predicted by the LuGre model, revealing both notable similarities and key differences.


[23] 2602.09450

Orthogonal Circular Polarized Transmitter and Receiver Antennas for Mitigation of Mutual Coupling in Monostatic Radars

Through-wall radar systems require compact, wideband, and high-gain antennas for detecting targets. Building walls introduce considerable attenuation on the radar signals. When the transmitted power is raised to compensate for the through-wall attenuation, the direct coupling between the transmitter and receiver can saturate the receiver, so that weaker reflections off the target may remain undetected. In this paper, we propose using transmitter and receiver antennas of orthogonal circular polarization to reduce the direct coupling between the transmitter and receiver while retaining the first bounce off the target. We demonstrate that the quadrafilar helical antenna (QHA) is a good candidate for this operation, since it is characterized by a small size, wide frequency band of operation, high gain, and low axial ratio over a wide field of view. We compare the reduced mutual coupling between the transmitter and receiver elements for the oppositely polarized QHA antennas with that of other commonly used through-wall radar antennas, such as the Vivaldi and horn antennas. The system is tested in through-wall conditions.


[24] 2602.09451

Performance Analysis of Millimeter Wave Radar Waveforms for Integrated Sensing and Communication

Next-generation intelligent transportation systems require both sensing and communication between road users. However, deploying separate radars and communication devices involves the allocation of individual frequency bands and hardware platforms. Integrated sensing and communication (ISAC) offers a robust solution to the challenges of spectral congestion by utilizing a shared waveform, hardware, and spectrum for both localization of mobile users and communication. Various waveforms, including phase-modulated continuous waves (PMCW) and frequency-modulated continuous waves (FMCW), have been explored for target localization using traditional radar. On the other hand, new protocols such as IEEE 802.11ad have been proposed to support wideband communication between vehicles. This paper compares both traditional radar and communication candidate waveforms for ISAC to detect single-point and extended targets. We show that the response of FMCW to mobile targets is poorer than that of PMCW. However, the IEEE 802.11ad radar outperforms both PMCW radar and FMCW radar. Additionally, the radar signal processing algorithms are implemented on a Zynq system-on-chip through hardware-software co-design and fixed-point analysis to evaluate their computational complexity in real-world implementations.


[25] 2602.09452

Motion Compensation for Multiple-Input-Multiple-Output Inverse Synthetic Aperture Imaging of Automotive Targets

Inverse synthetic aperture radar (ISAR) images generated from single-channel automotive radar data provide critical information about the shape and size of automotive targets. However, the quality of ISAR images degrades due to road clutter and when translational and higher-order rotational motions of the targets are not suitably compensated. One method to enhance the signal-to-clutter-and-noise ratio (SCNR) of such systems is to leverage the advantages of the multiple-input-multiple-output (MIMO) framework available in commercial automotive radars to generate MIMO-ISAR images. While substantial research has been devoted to motion compensation (MOCOMP) of single-channel ISAR images, the effectiveness of these methods for MIMO-ISAR has not been studied extensively. This paper analyzes the performance of three popular MOCOMP techniques - entropy minimization, cross-correlation, and phase gradient autofocus - on MIMO-ISAR. The algorithms are evaluated on measurement data collected using a Texas Instruments millimeter-wave MIMO radar. The results indicate that the cross-correlation MOCOMP performs better than the other two MOCOMP algorithms in the MIMO configuration, with an overall improvement of 36%.


[26] 2602.09484

Smaller is Better: Generative Models Can Power Short Video Preloading

Preloading is widely used in short video platforms to minimize playback stalls by downloading future content in advance. However, existing strategies face a tradeoff. Aggressive preloading reduces stalls but wastes bandwidth, while conservative strategies save data but increase the risk of playback stalls. This paper presents PromptPream, a computation-powered preloading paradigm that breaks this tradeoff by using local computation to reduce bandwidth demand. Instead of transmitting pixel-level video chunks, PromptPream sends compact semantic prompts that are decoded into high-quality frames using generative models such as Stable Diffusion. We propose three core techniques to enable this paradigm: (1) a gradient-based prompt inversion method that compresses frames into small sets of compact token embeddings; (2) a computation-aware scheduling strategy that jointly optimizes network and compute resource usage; and (3) a scalable searching algorithm that addresses the enlarged scheduling space introduced by the scheduler. Evaluations show that PromptPream reduces both stalls and bandwidth waste by over 31%, and improves Quality of Experience (QoE) by 45%, compared to traditional strategies.


[27] 2602.09500

Camel: Frame-Level Bandwidth Estimation for Low-Latency Live Streaming under Video Bitrate Undershooting

Low-latency live streaming (LLS) has emerged as a popular web application, with many platforms adopting real-time protocols such as WebRTC to minimize end-to-end latency. However, we observe a counter-intuitive phenomenon: even when the actual encoded bitrate does not fully utilize the available bandwidth, stalling events remain frequent. This insufficient bandwidth utilization arises from the intrinsic temporal variations of real-time video encoding, which cause conventional packet-level congestion control algorithms to misestimate available bandwidth. When a high-bitrate frame is suddenly produced, sending at the wrong rate can either trigger packet loss or increase queueing delay, resulting in playback stalls. To address these issues, we present Camel, a novel frame-level congestion control algorithm (CCA) tailored for LLS. Our insight is to use frame-level network feedback to capture the true network capacity, immune to the irregular sending pattern caused by encoding. Camel comprises three key modules: the Bandwidth and Delay Estimator and the Congestion Detector, which jointly determine the average sending rate, and the Bursting Length Controller, which governs the emission pattern to prevent packet loss. We evaluate Camel on both large-scale real-world deployments and controlled simulations. In the real-world platform with 250M users and 2B sessions across 150+ countries, Camel achieves up to a 70.8% increase in 1080P resolution ratio, a 14.4% increase in media bitrate, and up to a 14.1% reduction in stalling ratio. In simulations under undershooting, shallow buffers, and network jitter, Camel outperforms existing congestion control algorithms, with up to 19.8% higher bitrate, 93.0% lower stalling ratio, and 23.9% improvement in bandwidth estimation accuracy.


[28] 2602.09536

UAV-Assisted 6G Communication Networks for Railways: Technologies, Applications, and Challenges

Unmanned Aerial Vehicles (UAVs) are crucial for advancing railway communication by offering reliable connectivity, adaptive coverage, and mobile edge services. This survey examines UAV-assisted approaches for 6G railway needs including ultra-reliable low-latency communication (URLLC) and integrated sensing and communication (ISAC). We cover railway channel models, reconfigurable intelligent surfaces (RIS), and UAV-assisted mobile edge computing (MEC). Key challenges include coexistence with existing systems, handover management, Doppler effect, and security. The roadmap suggests work on integrated communication-control systems and AI-driven optimization for intelligent railway networks.


[29] 2602.09589

A Survey on STAR-RIS Enabled Joint Communications and Sensing: Fundamentals, Recent Advances and Research Challenges

The joint communications and sensing (JCAS) paradigm is envisioned as a core capability of sixth-generation (6G) wireless networks, enabling the integration of data communication and environmental sensing within a unified system. By reusing spectrum, waveforms, and hardware resources, JCAS improves spectral efficiency and reduces system complexity and hardware cost, while enabling new use cases. Nevertheless, the realization of JCAS is hindered by inherent trade-offs between communication and sensing objectives, limited controllability of wireless propagation, and stringent hardware and design constraints. Simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) have recently emerged as a promising technology to address these challenges by enabling full-space programmable manipulation of electromagnetic waves. This survey provides a systematic and in-depth review of STAR-RIS-enabled JCAS systems. Specifically, we first introduce the fundamental principles of JCAS and STAR-RIS. We then classify and review the state-of-the-art research on STAR-RIS-assisted JCAS from multiple perspectives, encompassing system architectures, waveform and beamforming design, resource allocation, optimization frameworks, and learning-based control. Finally, we identify key open challenges that remain unsolved and outline promising future research directions toward intelligent, flexible, and perceptive 6G wireless networks.


[30] 2602.09594

Evaluation of acoustic Green's function in rectangular rooms with general surface impedance walls

Acoustic room modes and the Green's function mode expansion are well-known for rectangular rooms with perfectly reflecting walls. First-order approximations also exist for nearly rigid boundaries; however, current analytical methods fail to accommodate more general boundary conditions, e.g., when wall absorption is significant. In this work, we present a comprehensive analysis that extends previous studies by including additional first-order asymptotics that account for soft-wall boundaries. In addition, we introduce a semi-analytical, efficient, and reliable method for computing the Green's function in rectangular rooms, which is described and validated through numerical tests. With a sufficiently large truncation order, the resulting error becomes negligible, making the method suitable as a benchmark for numerical simulations. Additional aspects regarding the spectral basis orthogonality and completeness are also addressed, providing a general framework for the validity of the proposed approach.


[31] 2602.09605

A General Formulation for the Teaching Assignment Problem: Computational Analysis Over a Real-World Dataset

The Teacher Assignment Problem is a combinatorial optimization problem that involves assigning teachers to courses while guaranteeing that all courses are covered, teachers do not teach too few or too many hours, teachers do not switch assigned courses too often and possibly teach the courses they favor. Typically the problem is solved manually, a task that requires several hours every year. In this work we present a mathematical formulation for the problem and an experimental evaluation of the model implemented using state-of-the-art SMT, CP, and MILP solvers. The implementations are tested over a real-world dataset provided by the Division of Systems and Control at Chalmers University of Technology, and produce teacher assignments with smaller workload deviation, a more even workload distribution among the teachers, and a lower number of switched courses.
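
A toy MILP fragment of the assignment model, covering only course coverage, workload limits, and a preference cost; the switching and workload-deviation terms from the full formulation are omitted, the data are invented, and the use of PuLP is an assumption (the paper evaluates SMT, CP, and MILP solvers generically).

    from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

    teachers = ["T1", "T2", "T3"]
    courses = {"C1": 40, "C2": 60, "C3": 30}                    # course -> required hours
    max_hours = {"T1": 80, "T2": 60, "T3": 60}
    pref = {("T1", "C1"): 0, ("T1", "C2"): 2, ("T1", "C3"): 1,
            ("T2", "C1"): 1, ("T2", "C2"): 0, ("T2", "C3"): 2,
            ("T3", "C1"): 2, ("T3", "C2"): 1, ("T3", "C3"): 0}  # lower = more preferred

    x = {(t, c): LpVariable(f"x_{t}_{c}", cat=LpBinary) for t in teachers for c in courses}

    prob = LpProblem("teacher_assignment", LpMinimize)
    prob += lpSum(pref[t, c] * x[t, c] for t in teachers for c in courses)

    for c in courses:                                           # every course gets one teacher
        prob += lpSum(x[t, c] for t in teachers) == 1
    for t in teachers:                                          # respect each workload limit
        prob += lpSum(courses[c] * x[t, c] for c in courses) <= max_hours[t]

    prob.solve()
    print([(t, c) for t in teachers for c in courses if value(x[t, c]) == 1])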


[32] 2602.09615

Collaborative Spectrum Sensing in Cognitive and Intelligent Wireless Networks: An Artificial Intelligence Perspective

Artificial intelligence (AI) has become a key enabler for next-generation wireless communication systems, offering powerful tools to cope with the increasing complexity, dynamics, and heterogeneity of modern wireless environments. To illustrate the role and impact of AI in wireless communications, this paper takes collaborative spectrum sensing (CSS) in cognitive and intelligent wireless networks as a representative application and surveys recent advances from an AI perspective. We first introduce the fundamentals of CSS, including the general framework, classical detector design, and fusion strategies. Then, we present an overview of the state-of-the-art research on AI-driven CSS, classified into three categories: discriminative deep learning (DL) models, generative DL models, and deep reinforcement learning (DRL). Furthermore, we explore semantic communication (SemCom) as a promising solution for CSS, in which task-oriented representations are exchanged to reduce reporting overhead while preserving decision-critical information. Finally, we discuss limitations, open challenges, and future research directions at the intersection of AI and wireless communication.


[33] 2602.09673

Community-Centered Resilience Enhancement of Urban Power and Gas Networks via Microgrid Partitioning, Mobile Energy Storage, and Data-Driven Risk Assessment

Urban energy systems face increasing challenges due to high penetration of renewable energy sources, extreme weather events, and other high-impact, low-probability disruptions. This project proposes a community-centered, open-access framework to enhance the resilience and reliability of urban power and gas networks by integrating microgrid partitioning, mobile energy storage deployment, and data-driven risk assessment. The approach involves converting passive distribution networks into active, self-healing microgrids using distributed energy resources and remotely controlled switches to enable flexible reconfiguration during normal and emergency operations. To address uncertainties from intermittent renewable generation and variable load, an adjustable interval optimization method combined with a column and constraint generation algorithm is developed, providing robust planning solutions without requiring probabilistic information. Additionally, a real-time online risk assessment tool is proposed, leveraging 25 multi-dimensional indices including load, grid status, resilient resources, emergency response, and meteorological factors to support operational decision-making during extreme events. The framework also optimizes the long-term sizing and allocation of mobile energy storage units while incorporating urban traffic data for effective routing during emergencies. Finally, a novel time-dependent resilience and reliability index is introduced to quantify system performance under diverse operating conditions. The proposed methodology aims to enable resilient, efficient, and adaptable urban energy networks capable of withstanding high-impact disruptions while maximizing operational and economic benefits.


[34] 2602.09685

Generalizable and Robust Beam Prediction for 6G Networks: A Deep-Learning Framework with Positioning Feature Fusion

Beamforming (BF) is essential for enhancing system capacity in fifth generation (5G) and beyond wireless networks, yet exhaustive beam training in ultra-massive multiple-input multiple-output (MIMO) systems incurs substantial overhead. To address this challenge, we propose a deep learning based framework that leverages position-aware features to improve beam prediction accuracy while reducing training costs. The proposed approach uses spatial coordinate labels to supervise a position extraction branch and integrates the resulting representations with beam-domain features through a feature fusion module. A dual-branch RegNet architecture is adopted to jointly learn location related and communication features for beam prediction. Two fusion strategies, namely adaptive fusion and adversarial fusion, are introduced to enable efficient feature integration. The proposed framework is evaluated on datasets generated by the DeepMIMO simulator across four urban scenarios at 3.5 GHz following 3GPP specifications, where both reference signal received power and user equipment location information are available. Simulation results under both in-distribution and out-of-distribution settings demonstrate that the proposed approach consistently outperforms traditional baselines and achieves more accurate and robust beam prediction by effectively incorporating positioning information.


[35] 2602.09695

Robust Macroscopic Density Control of Heterogeneous Multi-Agent Systems

Modern applications, such as orchestrating the collective behavior of robotic swarms or traffic flows, require the coordination of large groups of agents evolving in unstructured environments, where disturbances and unmodeled dynamics are unavoidable. In this work, we develop a scalable macroscopic density control framework in which a feedback law is designed directly at the level of an advection--diffusion partial differential equation. We formulate the control problem in the density space and prove global exponential convergence towards the desired behavior in $\mathcal{L}^2$ with guaranteed asymptotic rejection of bounded unknown drift terms, explicitly accounting for heterogeneous agent dynamics, unmodeled behaviors, and environmental perturbations. Our theoretical findings are corroborated by numerical experiments spanning heterogeneous oscillators, traffic systems, and swarm robotics in partially unknown environments.


[36] 2602.09699

Rolling Element Bearing Fault Detection and Diagnosis with One-Dimensional Convolutional Neural Network

Rolling element bearings are critical components in rotating machinery, and their condition significantly influences system performance, reliability, and operational lifespan. Timely and accurate fault detection is essential to prevent unexpected failures and reduce maintenance costs. Traditional diagnostic methods often rely on manual feature extraction and shallow classifiers, which may be inadequate for capturing the complex patterns embedded in raw vibration signals. In this study, a compact one-dimensional convolutional neural network (1D CNN) is developed for automated bearing fault diagnosis using raw time-domain vibration data, eliminating the need for manual feature engineering. The model is trained and evaluated on two established benchmark datasets: the Case Western Reserve University (CWRU) dataset and the Paderborn University (PU) dataset. The CWRU data were segmented based on four distinct motor load conditions (0 HP to 3 HP), with each load scenario trained and tested independently to ensure strict separation and prevent data leakage. The CNN achieved high average test accuracies of 99.14%, 98.85%, 97.42%, and 95.14% for 0 HP, 1 HP, 2 HP, and 3 HP, respectively. On the PU dataset, known for its naturally induced faults and greater operational variability, the model achieved a robust average testing accuracy of 95.63%. These results affirm the model's ability to generalize across datasets and varying operating conditions. Further improvements were observed through hyperparameter tuning, particularly of window length and training epochs, underscoring the importance of tailored configurations for specific datasets and load conditions. Overall, the proposed method demonstrates the effectiveness and scalability of 1D CNNs for real-time, data-driven bearing fault diagnosis, offering a reliable foundation for condition monitoring in industrial applications.
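
A compact PyTorch sketch in the spirit of the approach described above; the layer counts, kernel sizes, window length, and class count are illustrative guesses rather than the paper's exact network.

    import torch
    import torch.nn as nn

    class Bearing1DCNN(nn.Module):
        """Compact 1-D CNN classifying raw vibration windows into fault classes."""
        def __init__(self, n_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=64, stride=8, padding=28), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool1d(8),
            )
            self.classifier = nn.Linear(32 * 8, n_classes)

        def forward(self, x):                 # x: (batch, 1, window_samples)
            return self.classifier(self.features(x).flatten(1))

    signals = torch.randn(16, 1, 2048)        # 16 raw vibration windows
    print(Bearing1DCNN()(signals).shape)      # torch.Size([16, 10])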


[37] 2602.09754

A Dual Belief-Driven Bayesian-Stackelberg Framework for Low-Complexity and Secure Near-Field ISAC Systems

Ensuring robust security in near-field Integrated Sensing and Communication (ISAC) systems remains a critical challenge due to dynamic channel conditions, multi-eavesdropper threats, and the high computational burden of real-time optimization at mmWave and THz frequencies. To address these challenges, this paper introduces a novel Bayesian-Stackelberg framework that jointly optimizes sensing, beamforming, and communication. The dual-algorithm design integrates (i) Adaptive Hybrid Node Role Switching between secure transmission and cooperative jamming, and (ii) Belief-Driven Sensing and Beamforming for confidence-based resource allocation. The proposed unified framework significantly improves robustness against attacks while preserving linear computational complexity. Simulation results across carrier frequencies ranging from 28 to 410 GHz demonstrate that the method achieves up to a 35% increase in secrecy rates and a success rate exceeding 98%, outperforming conventional communication systems with minimal runtime overhead. These findings underscore the scalability of belief-driven ISAC security solutions for low-complexity deployment in next-generation communications.


[38] 2602.09763

An Unsupervised Normalizing Flow-Based Neyman-Pearson Detector for Covert Communications in the Presence of Disco Reconfigurable Intelligent Surfaces

Covert communications, also known as low probability of detection (LPD) communications, offer a higher level of privacy protection compared to cryptography and physical-layer security (PLS) by hiding the transmission within ambient environments. Here, we investigate covert communications in the presence of a disco reconfigurable intelligent surface (DRIS) deployed by the warden Willie, which simultaneously reduces his detection error probabilities and degrades the communication performance between Alice and Bob, without relying on either channel state information (CSI) or additional jamming power. However, the introduction of the DRIS renders it intractable for Willie to construct a Neyman-Pearson (NP) detector, since the probability density function (PDF) of the test statistic is analytically intractable under the Alice-Bob transmission hypothesis. Moreover, given the adversarial relationship between Willie and Alice/Bob, it is unrealistic to assume that Willie has access to a labeled training dataset. To address these challenges, we propose an unsupervised masked autoregressive flow (MAF)-based NP detection framework that exploits prior knowledge inherent in covert communications. We further define the false alarm rate (FAR) and the missed detection rate (MDR) as monitoring performance metrics for Willie, and the signal-to-jamming-plus-noise ratio (SJNR) as a communication performance metric for Alice-Bob transmissions. Furthermore, we derive theoretical expressions for SJNR and uncover unique properties of covert communications in the presence of a DRIS. Simulations validate the theory and show that the proposed unsupervised MAF-based NP detector achieves performance comparable to its supervised counterpart.


[39] 2602.09787

Intensity-based Segmentation of Tissue Images Using a U-Net with a Pretrained ResNet-34 Encoder: Application to Mueller Microscopy

Manual annotation of the images of thin tissue sections remains a time-consuming step in Mueller microscopy and limits its scalability. We present a novel automated approach using only the total intensity M11 element of the Mueller matrix as input to a U-Net architecture with a pretrained ResNet-34 encoder. The network was trained to distinguish four classes in the images of murine uterine cervix sections: background, internal os, cervical tissue, and vaginal wall. With only 70 cervical tissue sections, the model achieved 89.71% pixel accuracy and an 80.96% mean tissue Dice coefficient on the held-out test dataset. Transfer learning from ImageNet enables accurate segmentation despite the limited training-dataset size typical of specialized biomedical imaging. This intensity-based framework requires minimal preprocessing and is readily extensible to other imaging modalities and tissue types, with publicly available graphical annotation tools for practical deployment.
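
A short sketch of the stated architecture using the segmentation_models_pytorch package (an assumption; the paper may use a different implementation): a U-Net with an ImageNet-pretrained ResNet-34 encoder, one input channel for the M11 intensity, and four output classes. The image size and random tensors are placeholders.

    import torch
    import segmentation_models_pytorch as smp

    # U-Net with an ImageNet-pretrained ResNet-34 encoder; classes: background,
    # internal os, cervical tissue, vaginal wall.
    model = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                     in_channels=1, classes=4)

    m11 = torch.randn(2, 1, 256, 256)          # batch of normalized M11 intensity images
    logits = model(m11)                        # (2, 4, 256, 256) per-pixel class scores
    loss = torch.nn.functional.cross_entropy(
        logits, torch.randint(0, 4, (2, 256, 256)))
    print(logits.shape, float(loss))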


[40] 2602.09820

Analysis of Edge Mismatch and Output Power Degradation in Cascoded Class-D Power Amplifiers Using Dual-Range Voltage Level Shifters

This paper presents a low-jitter hybrid voltage level shifter (HVLS) suitable for high-speed applications. The proposed architecture offers the advantage of cross-coupled feedback to simultaneously generate two voltage domain signals with available swings equal to the nominal supply and its double, which operate up to 12.4 GHz. A prototype HVLS circuit, along with impedance matching and a driver to enable high-speed off-chip testing, was fabricated in a 22-nm FD-SOI process technology. The prototype consumes a total die area, including the interface circuitry, of 477 x 462 um^2, while the active area of the level-shifter is 2 x 3.26 um^2. The average power consumption of the circuit is measured to be 4.43 uW per cycle, and the jitter is less than 150 fs-rms.


[41] 2602.09848

Robust Processing and Learning: Principles, Methods, and Wireless Applications

This tutorial-style overview article examines the fundamental principles and methods of robustness, using wireless sensing and communication (WSC) as the narrative and exemplifying framework. First, we formalize the conceptual and mathematical foundations of robustness, highlighting the interpretations and relations across robust statistics, optimization, and machine learning. Key techniques, such as robust estimation and testing, distributionally robust optimization, and regularized and adversarial training, are investigated. Together, the costs of robustness in system design, for example, the compromised nominal performance and the extra computational burden, are discussed. Second, we review recent robust signal processing solutions for WSC that address model mismatch, data scarcity, adversarial perturbation, and distributional shift. Specific applications include robust ranging-based localization, modality sensing, channel estimation, receive combining, waveform design, and federated learning. Through this effort, we aim to introduce the classical developments and recent advances in robustness theory to the general signal processing community, exemplifying how robust statistical, optimization, and machine learning approaches can address the uncertainties inherent in WSC systems.


[42] 2602.09910

Geometric Analysis of Blind User Identification for Massive MIMO Networks

Applying Nearest Convex Hull Classification (NCHC) to blind user identification in a massive Multiple Input Multiple Output (MIMO) communications system is proposed. The method is blind in the sense that the Base Station (BS) requires only a training sequence containing unknown data symbols obtained from the user, without further knowledge of the channel, modulation, coding, or even the noise power. We evaluate the algorithm under the assumption of Gaussian transmit signals using the non-rigorous replica method. To facilitate the computations, the existence of an Operator Valued Free Fourier Transform is postulated, which is verified by Monte Carlo simulation. The replica computations are conducted in the large but finite system by applying saddle-point integration with inverse temperature $\beta$ as the large parameter. The classifier accuracy is estimated by Gaussian approximation through moment-matching.
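The geometric core of NCHC can be sketched as a small constrained least-squares problem: the distance from a received feature vector to the convex hull of each class's exemplars, with assignment to the nearest hull. The example below is a generic illustration with made-up data, not the paper's massive MIMO pipeline.

```python
import numpy as np
from scipy.optimize import minimize

def dist_to_convex_hull(x, A):
    """Distance from x to conv{columns of A}: min ||x - A w|| s.t. w >= 0, sum(w) = 1."""
    k = A.shape[1]
    w0 = np.full(k, 1.0 / k)
    res = minimize(
        lambda w: np.sum((x - A @ w) ** 2),
        w0,
        method="SLSQP",
        bounds=[(0.0, None)] * k,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return np.sqrt(res.fun)

def nchc_classify(x, class_exemplars):
    """Assign x to the class whose convex hull of exemplars is nearest."""
    dists = {c: dist_to_convex_hull(x, A) for c, A in class_exemplars.items()}
    return min(dists, key=dists.get), dists

# Toy usage: two 'users' with a handful of training snapshots each (columns).
rng = np.random.default_rng(0)
exemplars = {
    "user 1": rng.normal(0.0, 1.0, size=(8, 10)),
    "user 2": rng.normal(3.0, 1.0, size=(8, 10)),
}
label, dists = nchc_classify(rng.normal(3.0, 1.0, size=8), exemplars)
print(label, dists)
```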


[43] 2602.09955

Doppler Effect: Analyses and Applications in Wireless Sensing and Communications

This chapter is motivated by the need for a rigorous and comprehensive analysis of the Doppler effects encountered by electromagnetic and acoustic signals across a diverse spectrum of modern applications. These include land mobile communications, various Internet of Things (IoT) networks, machine-type communications (MTC), and various radar and satellite-based systems for navigation and sensing, as well as the emerging regime of integrated sensing and communications (ISAC). A wide array of kinematic profiles is investigated, ranging from uniform motion and constant acceleration to more complex general motion. Consequently, the multi-faceted factors influencing the Doppler shift are addressed in detail, encompassing classical kinematics, special and general relativity, atmospheric dynamics, and the properties of the propagation medium. This work is intended to establish a definitive theoretical foundation for both the general enthusiast and the specialized researcher seeking to master the complexities of signal frequency shifts in modern wireless sensing and communications systems.
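For the simplest kinematic profile (uniform motion), the shift can be computed directly; the snippet below evaluates the classical radial Doppler shift and its special-relativistic counterpart for an arbitrary 28 GHz example.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def classical_doppler_shift(f_c, v, theta_rad):
    """Received-frequency offset for a transmitter moving at speed v, with
    angle theta between the velocity vector and the line of sight."""
    return f_c * (v / C) * np.cos(theta_rad)

def relativistic_received_freq(f_c, v, theta_rad):
    """Special-relativistic Doppler: f_r = f_c * sqrt(1 - beta^2) / (1 - beta*cos(theta))."""
    beta = v / C
    return f_c * np.sqrt(1.0 - beta**2) / (1.0 - beta * np.cos(theta_rad))

f_c = 28e9   # 28 GHz carrier
v = 30.0     # 30 m/s vehicle (108 km/h)
print(classical_doppler_shift(f_c, v, 0.0))            # ~2.8 kHz
print(relativistic_received_freq(f_c, v, 0.0) - f_c)   # nearly identical at this speed
```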


[44] 2602.09960

HAPS-RIS and UAV Integrated Networks: A Unified Joint Multi-objective Framework

Future 6G non-terrestrial networks aim to deliver ubiquitous connectivity to remote and underserved regions, but unmanned aerial vehicle (UAV) base stations face fundamental challenges such as limited numbers and power budgets. To overcome these obstacles, a high-altitude platform station (HAPS) equipped with a reconfigurable intelligent surface (RIS), the so-called HAPS-RIS, is a promising candidate. We propose a novel unified joint multi-objective framework where UAVs and HAPS-RIS are fully integrated to extend coverage and enhance network performance. This joint multi-objective design maximizes the number of users served by the HAPS-RIS, minimizes the number of UAVs deployed, and minimizes the total average UAV path loss subject to quality-of-service (QoS) and resource constraints. We propose a novel low-complexity solution strategy by proving the equivalence between minimizing the total average UAV path loss upper bound and k-means clustering, deriving a practical closed-form RIS phase-shift design, and introducing a mapping technique that collapses the combinatorial assignments into a zone radius and a bandwidth-portioning factor. Then, we propose a dynamic Pareto optimization technique to solve the transformed optimization problem. Extensive simulation results demonstrate that the proposed framework adapts seamlessly across operating regimes. A HAPS-RIS-only setup achieves full coverage at low data rates, but UAV assistance becomes indispensable as rate demands increase. By tuning a single bandwidth-portioning factor, the model recovers the UAV-only, HAPS-RIS-only, and equal bandwidth-portioning baselines within one formulation and consistently surpasses them across diverse rate requirements. The simulations also quantify a tangible trade-off between RIS scale and UAV deployment, enabling designers to trade increased RIS elements for fewer UAVs as service demands evolve.
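To illustrate the clustering step implied by the proven k-means equivalence, the sketch below clusters hypothetical ground-user positions and places one UAV above each centroid; positions, altitude, and counts are arbitrary, and this omits the QoS, path-loss, and bandwidth-portioning details of the actual framework.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
users = rng.uniform(0.0, 5_000.0, size=(200, 2))   # ground-user positions in metres

n_uavs = 5
km = KMeans(n_clusters=n_uavs, n_init=10, random_state=0).fit(users)

uav_xy = km.cluster_centers_     # horizontal UAV positions (one per cluster)
uav_altitude = 120.0             # fixed illustrative altitude
assignment = km.labels_          # which UAV serves which user

# Average horizontal distance per cluster, a proxy for the path-loss term
# that the k-means objective minimizes.
for k in range(n_uavs):
    d = np.linalg.norm(users[assignment == k] - uav_xy[k], axis=1)
    print(f"UAV {k}: {int((assignment == k).sum())} users, mean distance {d.mean():.0f} m")
```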


[45] 2602.09970

BioME: A Resource-Efficient Bioacoustic Foundational Model for IoT Applications

Passive acoustic monitoring has become a key strategy in biodiversity assessment, conservation, and behavioral ecology, especially as Internet-of-Things (IoT) devices enable continuous in situ audio collection at scale. While recent self-supervised learning (SSL)-based audio encoders, such as BEATs and AVES, have shown strong performance in bioacoustic tasks, their computational cost and limited robustness to unseen environments hinder deployment on resource-constrained platforms. In this work, we introduce BioME, a resource-efficient audio encoder designed for bioacoustic applications. BioME is trained via layer-to-layer distillation from a high-capacity teacher model, enabling strong representational transfer while reducing the parameter count by 75%. To further improve ecological generalization, the model is pretrained on multi-domain data spanning speech, environmental sounds, and animal vocalizations. A key contribution is the integration of modulation-aware acoustic features via FiLM conditioning, injecting a DSP-inspired inductive bias that enhances feature disentanglement in low-capacity regimes. Across multiple bioacoustic tasks, BioME matches or surpasses the performance of larger models, including its teacher, while being suitable for resource-constrained IoT deployments. For reproducibility, code and pretrained checkpoints are publicly available.
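A generic FiLM conditioning layer, of the kind used to inject modulation-aware features, can be written in a few lines of PyTorch; the dimensions and names below are illustrative rather than BioME's actual implementation.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: y = gamma(z) * x + beta(z)."""

    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        # One linear layer predicts per-channel scale (gamma) and shift (beta).
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) feature map; z: (batch, cond_dim) condition.
        gamma, beta = self.to_gamma_beta(z).chunk(2, dim=-1)
        return gamma.unsqueeze(-1) * x + beta.unsqueeze(-1)

# Toy usage: modulate a (batch=4, channels=64, frames=100) feature map with a
# 16-dimensional vector of hand-crafted modulation features.
film = FiLM(cond_dim=16, num_channels=64)
x = torch.randn(4, 64, 100)
z = torch.randn(4, 16)
print(film(x, z).shape)   # torch.Size([4, 64, 100])
```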


[46] 2602.10025

RIS-Assisted Rank Enhancement With Commodity WiFi Transceivers: Real-World Experiments

Reconfigurable intelligent surfaces (RISs) are a promising enabling technology for the sixth-generation ($6$G) of wireless communications. RISs, thanks to their intelligent design, can reshape the wireless channel to provide favorable propagation conditions for information transfer. In this work, we experimentally investigate the potential of RISs to enhance the effective rank of multiple-input multiple-output (MIMO) channels, thereby improving spatial multiplexing capabilities. In our experiment, commodity WiFi transceivers are used, representing a practical MIMO system. In this context, we propose a passive beam-focusing technique to manipulate the propagation channel between each transmit-receive antenna pair and achieve a favorable propagation condition for rank improvement. The proposed algorithm is tested in two different channel scenarios: low and medium ranks. Experimental results show that, when the channel is rank-deficient, the RIS can significantly increase the rank by $112\%$ from its default value without the RIS, providing a rank increment of $1.5$. When the rank has a medium value, a maximum of $61\%$ enhancement can be achieved, corresponding to a rank increment of $1$. These results provide the first experimental evidence of RIS-driven rank manipulation with off-the-shelf WiFi hardware, offering practical insights into RIS deployment for spatial multiplexing gains.
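One common way to quantify such gains is the effective rank, the exponential of the entropy of the normalized singular values (the paper may use a different convention); a minimal computation is sketched below.

```python
import numpy as np

def effective_rank(H: np.ndarray) -> float:
    """Effective rank = exp(Shannon entropy of the normalized singular values)."""
    s = np.linalg.svd(H, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                 # drop exact zeros to avoid log(0)
    return float(np.exp(-np.sum(p * np.log(p))))

rng = np.random.default_rng(0)
# Rank-deficient 4x4 channel: a single dominant path repeated across antennas.
a = rng.standard_normal((4, 1)) + 1j * rng.standard_normal((4, 1))
H_keyhole = a @ a.conj().T
# Rich-scattering channel: i.i.d. complex Gaussian entries.
H_rich = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)

print(f"keyhole-like channel : {effective_rank(H_keyhole):.2f}")  # close to 1
print(f"rich scattering      : {effective_rank(H_rich):.2f}")     # noticeably larger
```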


[47] 2602.07859

Dynamic Load Model for Data Centers with Pattern-Consistent Calibration

The rapid growth of data centers has made large electronic load (LEL) modeling increasingly important for power system analysis. Such loads are characterized by fast workload-driven variability and protection-driven disconnection and reconnection behavior that are not captured by conventional load models. Existing data center load modeling includes physics-based approaches, which provide interpretable structure for grid simulation, and data-driven approaches, which capture empirical workload variability from data. However, physics-based models are typically uncalibrated to facility-level operation, while trajectory alignment in data-driven methods often leads to overfitting and unrealistic dynamic behavior. To resolve these limitations, we design a framework that leverages both physics-based structure and data-driven adaptability. The physics-based structure is parameterized to enable data-driven, pattern-consistent calibration from real operational data, supporting facility-level grid planning. We further show that trajectory-level alignment is limited for inherently stochastic data center loads. Therefore, we design the calibration to align temporal and statistical patterns using temporal contrastive learning (TCL). This calibration is performed locally at the facility, and only calibrated parameters are shared with utilities, preserving data privacy. The proposed load model is calibrated with real-world operational load data from the MIT Supercloud, ASU Sol, Blue Waters, and ASHRAE datasets. It is then integrated into the ANDES platform and evaluated on the IEEE 39-bus, NPCC 140-bus, and WECC 179-bus systems. We find that interactions among LELs can fundamentally alter post-disturbance recovery behavior, producing compound disconnection-reconnection dynamics and delayed stabilization that are not captured by uncalibrated load models.


[48] 2602.09041

DSFlow: Dual Supervision and Step-Aware Architecture for One-Step Flow Matching Speech Synthesis

Flow-matching models have enabled high-quality text-to-speech synthesis, but their iterative sampling process during inference incurs substantial computational cost. Although distillation is widely used to reduce the number of inference steps, existing methods often suffer from process variance due to endpoint error accumulation. Moreover, directly reusing continuous-time architectures for discrete, fixed-step generation introduces structural parameter inefficiencies. To address these challenges, we introduce DSFlow, a modular distillation framework for few-step and one-step synthesis. DSFlow reformulates generation as a discrete prediction task and explicitly adapts the student model to the target inference regime. It improves training stability through a dual supervision strategy that combines endpoint matching with deterministic mean-velocity alignment, enforcing consistent generation trajectories across inference steps. In addition, DSFlow improves parameter efficiency by replacing continuous-time timestep conditioning with lightweight step-aware tokens, aligning model capacity with the significantly reduced timestep space of the discrete task. Extensive experiments across diverse flow-based text-to-speech architectures demonstrate that DSFlow consistently outperforms standard distillation approaches, achieving strong few-step and one-step synthesis quality while reducing model parameters and inference cost.


[49] 2602.09042

The SJTU X-LANCE Lab System for MSR Challenge 2025

This report describes the system submitted to the Music Source Restoration (MSR) Challenge 2025. Our approach is composed of sequential BS-RoFormers, each handling a single task: music source separation (MSS), denoising, or dereverberation. To support the 8 instruments given in the task, we utilize pretrained checkpoints from the MSS community and finetune the MSS model with several training schemes, including (1) mixing and cleaning of datasets; (2) random mixing of music pieces for data augmentation; and (3) scaling up the audio length. Our system achieved the first rank in all three subjective and three objective evaluation metrics, including an MMSNR score of 4.4623 and an FAD score of 0.1988. We have open-sourced all the code and checkpoints at this https URL.


[50] 2602.09070

NarraScore: Bridging Visual Narrative and Musical Dynamics via Hierarchical Affective Control

Synthesizing coherent soundtracks for long-form videos remains a formidable challenge, currently stalled by three critical impediments: computational scalability, temporal coherence, and, most critically, a pervasive semantic blindness to evolving narrative logic. To bridge these gaps, we propose NarraScore, a hierarchical framework predicated on the core insight that emotion serves as a high-density compression of narrative logic. Uniquely, we repurpose frozen Vision-Language Models (VLMs) as continuous affective sensors, distilling high-dimensional visual streams into dense, narrative-aware Valence-Arousal trajectories. Mechanistically, NarraScore employs a Dual-Branch Injection strategy to reconcile global structure with local dynamism: a \textit{Global Semantic Anchor} ensures stylistic stability, while a surgical \textit{Token-Level Affective Adapter} modulates local tension via direct element-wise residual injection. This minimalist design bypasses the bottlenecks of dense attention and architectural cloning, effectively mitigating the overfitting risks associated with data scarcity. Experiments demonstrate that NarraScore achieves state-of-the-art consistency and narrative alignment with negligible computational overhead, establishing a fully autonomous paradigm for long-video soundtrack generation.


[51] 2602.09123

Agile asymmetric multi-legged locomotion: contact planning via geometric mechanics and spin model duality

Legged robot research is presently focused on bipedal or quadrupedal robots, despite capabilities to build robots with many more legs to potentially improve locomotion performance. This imbalance is not necessarily due to hardware limitations, but rather to the absence of principled control frameworks that explain when and how additional legs improve locomotion performance. In multi-legged systems, coordinating many simultaneous contacts introduces a severe curse of dimensionality that challenges existing modeling and control approaches. As an alternative, multi-legged robots are typically controlled using low-dimensional gaits originally developed for bipeds or quadrupeds. These strategies fail to exploit the new symmetries and control opportunities that emerge in higher-dimensional systems. In this work, we develop a principled framework for discovering new control structures in multi-legged locomotion. We use geometric mechanics to reduce contact-rich locomotion planning to a graph optimization problem, and propose a spin model duality framework from statistical mechanics to exploit symmetry breaking and guide optimal gait reorganization. Using this approach, we identify an asymmetric locomotion strategy for a hexapod robot that achieves a forward speed of 0.61 body lengths per cycle (a 50% improvement over conventional gaits). The resulting asymmetry appears at both the control and hardware levels. At the control level, the body orientation oscillates asymmetrically between fast clockwise and slow counterclockwise turning phases for forward locomotion. At the hardware level, two legs on the same side remain unactuated and can be replaced with rigid parts without degrading performance. Numerical simulations and robophysical experiments validate the framework and reveal novel locomotion behaviors that emerge from symmetry reforming in high-dimensional embodied systems.


[52] 2602.09202

Genocide by Algorithm in Gaza: Artificial Intelligence, Countervailing Responsibility, and the Corruption of Public Discourse

The accelerating militarization of artificial intelligence has transformed the ethics, politics, and governance of warfare. This article interrogates how AI-driven targeting systems function as epistemic infrastructures that classify, legitimize, and execute violence, using Israel's conduct in Gaza as a paradigmatic case. Through the lens of responsibility, the article examines three interrelated dimensions: (a) political responsibility, exploring how states exploit AI to accelerate warfare while evading accountability; (b) professional responsibility, addressing the complicity of technologists, engineers, and defense contractors in the weaponization of data; and (c) personal responsibility, probing the moral agency of individuals who participate in or resist algorithmic governance. This is complemented by an examination of the position and influence of those participating in public discourse, whose narratives often obscure or normalize AI-enabled violence. The Gaza case reveals AI not as a neutral instrument but as an active participant in the reproduction of colonial hierarchies and the normalization of atrocity. Ultimately, the paper calls for a reframing of technological agency and accountability in the age of automated warfare. It concludes that confronting algorithmic violence demands a democratization of AI ethics, one that resists technocratic fatalism and centers the lived realities of those most affected by high-tech militarism.


[53] 2602.09233

Gencho: Room Impulse Response Generation from Reverberant Speech and Text via Diffusion Transformers

Blind room impulse response (RIR) estimation is a core task for capturing and transferring acoustic properties; yet existing methods often suffer from limited modeling capability and degraded performance under unseen conditions. Moreover, emerging generative audio applications call for more flexible impulse response generation methods. We propose Gencho, a diffusion-transformer-based model that predicts complex spectrogram RIRs from reverberant speech. A structure-aware encoder leverages isolation between early and late reflections to encode the input audio into a robust representation for conditioning, while the diffusion decoder generates diverse and perceptually realistic impulse responses from it. Gencho integrates modularly with standard speech processing pipelines for acoustic matching. Results show richer generated RIRs than non-generative baselines while maintaining strong performance in standard RIR metrics. We further demonstrate its application to text-conditioned RIR generation, highlighting Gencho's versatility for controllable acoustic simulation and generative audio tasks.


[54] 2602.09328

In-Hospital Stroke Prediction from PPG-Derived Hemodynamic Features

The absence of pre-hospital physiological data in standard clinical datasets fundamentally constrains the early prediction of stroke, as patients typically present only after stroke has occurred, leaving the predictive value of continuous monitoring signals such as photoplethysmography (PPG) unvalidated. In this work, we overcome this limitation by focusing on a rare but clinically critical cohort - patients who suffered stroke during hospitalization while already under continuous monitoring - thereby enabling the first large-scale analysis of pre-stroke PPG waveforms aligned to verified onset times. Using MIMIC-III and MC-MED, we develop an LLM-assisted data mining pipeline to extract precise in-hospital stroke onset timestamps from unstructured clinical notes, followed by physician validation, identifying 176 patients (MIMIC) and 158 patients (MC-MED) with high-quality synchronized pre-onset PPG data. We then extract hemodynamic features from PPG and employ a ResNet-1D model to predict impending stroke across multiple early-warning horizons. The model achieves F1-scores of 0.7956, 0.8759, and 0.9406 at 4, 5, and 6 hours prior to onset on MIMIC-III, and, without re-tuning, reaches 0.9256, 0.9595, and 0.9888 on MC-MED for the same horizons. These results provide the first empirical evidence from real-world clinical data that PPG contains predictive signatures of stroke several hours before onset. They demonstrate that passively acquired physiological signals can support reliable early warning, supporting a shift from post-event stroke recognition to proactive, physiology-based surveillance that may materially improve patient outcomes in routine clinical care.


[55] 2602.09473

XLB: A High Performance Layer-7 Load Balancer for Microservices using eBPF-based In-kernel Interposition

L7 load balancers are a fundamental building block in microservices as they enable fine-grained traffic distribution. Compared to monolithic applications, microservices demand higher performance and stricter isolation from load balancers, due to the increased number of instances, longer service chains, and the necessity of co-location with services on the same host. Traditional sidecar-based load balancers are ill-equipped to meet these demands, often resulting in significant performance degradation. In this work, we present XLB, a novel architecture that reshapes L7 load balancing as an in-kernel interposition operating at the socket layer. We leverage eBPF to implement the core load balancing logic in the kernel, and address the connection management and state maintenance challenges through novel socket-layer redirection and nested eBPF map designs. XLB eliminates the extra overhead of scheduling, communication, and data movement, resulting in a more lightweight, scalable, and efficient L7 load balancer architecture. Compared to the widely used microservice load balancers Istio and Cilium over 50 microservice instances, XLB achieves up to 1.5x higher throughput and 60% lower end-to-end latency.


[56] 2602.09493

QoS Identifier and Slice Mapping in 5G and Non-Terrestrial Network Interconnected Systems

The interconnection of 5G and non-terrestrial networks (NTNs) has been actively studied to expand connectivity beyond conventional terrestrial infrastructure. In the 3GPP standardization of 5G systems, the 5G Quality of Service (QoS) Identifier (5QI) is defined to characterize the QoS requirements of different traffic types. However, it falls short in capturing the diverse latency, capacity, and reliability profiles of NTN environments, particularly when NTNs are used as backhaul. Furthermore, it is difficult to manage individual traffic flows and perform efficient resource allocation and routing when a large number of 5G traffic flows are present in NTN systems. To address these challenges, we propose an optimization framework that enhances QoS handling by introducing an NTN QoS Identifier (NQI) and grouping 5G traffic into NTN slices based on similar requirements. This enables unified resource control and routing for a large number of 5G flows in NTN systems. In this paper, we present the detailed procedure of the proposed framework, which consists of 5QI-to-NQI mapping, NTN-traffic-to-NTN-slice mapping, and slice-level flow and routing optimization. We evaluate the framework by comparing multiple mapping schemes through numerical simulations and analyze their impact on overall network performance.
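A toy version of the first two steps (5QI-to-NQI mapping followed by grouping flows into NTN slices) is sketched below; the per-5QI attributes, NQI class names, and thresholds are hypothetical, not values taken from 3GPP specifications or from the paper.

```python
from collections import defaultdict

# Illustrative per-5QI attributes (NOT the standardized TS 23.501 values):
# (packet delay budget in ms, guaranteed-bit-rate flag)
FIVEQI_ATTRS = {
    1: (100, True),    # e.g. a conversational voice-like flow
    7: (100, False),   # e.g. an interactive non-GBR flow
    9: (300, False),   # e.g. a default best-effort flow
}

def map_5qi_to_nqi(delay_ms: int, gbr: bool) -> str:
    """Hypothetical NQI classes reflecting what an NTN backhaul can offer."""
    if gbr and delay_ms <= 150:
        return "NQI-LOW-LATENCY-GBR"
    if delay_ms <= 150:
        return "NQI-LOW-LATENCY"
    return "NQI-BEST-EFFORT"

def group_into_slices(flows):
    """Group 5G flows sharing the same NQI into one NTN slice each."""
    slices = defaultdict(list)
    for flow_id, fiveqi in flows:
        nqi = map_5qi_to_nqi(*FIVEQI_ATTRS[fiveqi])
        slices[nqi].append(flow_id)
    return dict(slices)

flows = [("flow-a", 1), ("flow-b", 9), ("flow-c", 7), ("flow-d", 9)]
print(group_into_slices(flows))
# {'NQI-LOW-LATENCY-GBR': ['flow-a'], 'NQI-BEST-EFFORT': ['flow-b', 'flow-d'], 'NQI-LOW-LATENCY': ['flow-c']}
```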


[57] 2602.09597

Detecting radar targets swarms in range profiles with a partially complex-valued neural network

Correctly detecting radar targets is usually challenged by clutter and waveform distortion. An additional difficulty stems from the relative proximity of several targets, which in the worst case are perceived as a single target, or which influence each other's detection thresholds. The negative impact of target proximity notably depends on the range resolution defined by the radar parameters and the adaptive threshold adopted. This paper addresses the problem of target detection in radar range profiles containing multiple targets with varying proximity and distorted echoes. Inspired by recent contributions in the radar and signal processing literature, this work proposes partially complex-valued neural networks as an adaptive range profile processing. Simulated datasets are generated and experiments are conducted to compare a common pulse compression approach with a simple neural network partially defined by complex-valued parameters. Whereas the pulse compression processes one pulse length at a time, the proposed neural network is a generative architecture that processes the entire received signal in one pass to generate a complete detection profile.


[58] 2602.09633

ISO FastLane: Faster ISO 11783 with Dual Stack Approach as a Short Term Solution

The agricultural industry has been searching for a high-speed successor to the 250 kbit/s CAN bus backbone of ISO 11783 (ISOBUS) for over a decade, yet no protocol-level solution has reached standardization. Meanwhile, modern planters, sprayers, and Virtual Terminals are already constrained by the bus bandwidth. This paper presents ISO FastLane, a gateway-less dual-stack approach that routes point-to-point ISOBUS traffic over Ethernet while keeping broadcast messages on the existing CAN bus. The solution requires no new state machines, no middleware, and no changes to application-layer code: only a simple Layer 3 routing decision and a lightweight peer discovery mechanism called Augmented Address Claim (AACL). Legacy devices continue to operate unmodified and unaware of FastLane traffic. Preliminary tests reported in the paper demonstrate that ISO FastLane accelerates Virtual Terminal object pool uploads by a factor of 8 and sustains Task Controller message rates over 100 times beyond the current specification limit. Because ISO FastLane builds entirely on existing J1939 and ISO 11783 conventions, it can be implemented by ISOBUS engineers in a matter of weeks, delivering tangible performance gains today without waiting for the long-term High Speed ISOBUS solution.


[59] 2602.09645

Impact of Market Reforms on Deterministic Frequency Deviations in the European Power Grid

Deterministic frequency deviations (DFDs) are systematic and predictable excursions of grid frequency that arise from synchronized generation ramps induced by electricity market scheduling. In this paper, we analyze the impact of the European day-ahead market reform of 1 October 2025, which replaced hourly trading blocks with quarter-hourly blocks, on DFDs in the Central European synchronous area. Using publicly available frequency measurements, we compare periods before and after the reform based on daily frequency profiles, indicators characterizing frequency deviations, principal component analysis, Fourier-based functional data analysis, and power spectral density analysis. We show that the reform substantially reduces characteristic hourly frequency deviations and suppresses dominant spectral components at hourly and half-hourly time scales, while quarter-hourly structures gain relative importance. While the likelihood of large frequency deviations decreases overall, reductions for extreme events are less clear and depend on the metric used. Our results demonstrate that market design reforms can effectively mitigate systematic frequency deviations, but also highlight that complementary technical and regulatory measures are required to further reduce large frequency excursions in low-inertia power systems.
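The spectral part of such an analysis can be reproduced with Welch's method; the sketch below estimates the power spectral density of a synthetic grid-frequency deviation series and reads off the hourly, half-hourly, and quarter-hourly components (the signal and parameters are illustrative only).

```python
import numpy as np
from scipy.signal import welch

fs = 1.0                              # one frequency sample per second
t = np.arange(3 * 24 * 3600) / fs     # three days of synthetic data

# Synthetic grid-frequency deviation (Hz): hourly and quarter-hourly ramps + noise.
rng = np.random.default_rng(0)
dev = (0.02 * np.sin(2 * np.pi * t / 3600)
       + 0.01 * np.sin(2 * np.pi * t / 900)
       + 0.005 * rng.standard_normal(t.size))

f, psd = welch(dev, fs=fs, nperseg=8 * 3600)   # 8-hour segments

for period_s in (3600, 1800, 900):             # hourly, half-hourly, quarter-hourly
    idx = np.argmin(np.abs(f - 1.0 / period_s))
    print(f"period {period_s:4d} s -> PSD {psd[idx]:.3e} Hz^2/Hz")
```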


[60] 2602.09667

Differentiable Modeling for Low-Inertia Grids: Benchmarking PINNs, NODEs, and DP for Identification and Control of SMIB System

The transition toward low-inertia power systems demands modeling frameworks that provide not only accurate state predictions but also physically consistent sensitivities for control. While scientific machine learning offers powerful nonlinear modeling tools, the control-oriented implications of different differentiable paradigms remain insufficiently understood. This paper presents a comparative study of Physics-Informed Neural Networks (PINNs), Neural Ordinary Differential Equations (NODEs), and Differentiable Programming (DP) for modeling, identification, and control of power system dynamics. Using the Single Machine Infinite Bus (SMIB) system as a benchmark, we evaluate their performance in trajectory extrapolation, parameter estimation, and Linear Quadratic Regulator (LQR) synthesis. Our results highlight a fundamental trade-off between data-driven flexibility and physical structure. NODE exhibits superior extrapolation by capturing the underlying vector field, whereas PINN shows limited generalization due to its reliance on a time-dependent solution map. In the inverse problem of parameter identification, while both DP and PINN successfully recover the unknown parameters, DP achieves significantly faster convergence by enforcing governing equations as hard constraints. Most importantly, for control synthesis, the DP framework yields closed-loop stability comparable to the theoretical optimum. Furthermore, we demonstrate that NODE serves as a viable data-driven surrogate when governing equations are unavailable.
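The benchmark dynamics underlying all three paradigms are the classical SMIB swing equations; a minimal simulation with illustrative per-unit parameters (not necessarily those used in the paper) is sketched below.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Classical SMIB swing equation (per-unit):
#   d(delta)/dt = omega
#   M * d(omega)/dt = P_m - P_max * sin(delta) - D * omega
M, D = 0.2, 0.1          # inertia and damping (illustrative values)
P_m, P_max = 0.8, 1.2    # mechanical input and maximum electrical power

def smib(t, x):
    delta, omega = x
    return [omega, (P_m - P_max * np.sin(delta) - D * omega) / M]

# Start away from the equilibrium delta* = arcsin(P_m / P_max) and watch the
# rotor angle swing back toward it.
sol = solve_ivp(smib, (0.0, 10.0), [0.1, 0.0], max_step=0.01)
print("equilibrium angle :", np.arcsin(P_m / P_max))
print("final angle       :", sol.y[0, -1])
```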


[61] 2602.09668

Gas Line Absorption Mitigation in Hollow-Core Fibre using Spectral Pre-Equalisation

We study the impact of CO$_2$ absorption on hollow-core fibre transmission. Using spectral pre-equalisation, we digitally post-compensate gas-line absorption and show a 5.5 dB reduction in Q-factor penalty, outperforming a 383-tap equaliser by 1.3 dB.


[62] 2602.09735

An open-source implementation of a closed-loop electrocorticographic Brain-Computer Interface using Micromed, FieldTrip, and PsychoPy

We present an open-source implementation of a closed-loop Brain-Computer Interface (BCI) system based on electrocorticographic (ECoG) recordings. Our setup integrates FieldTrip for interfacing with a Micromed acquisition system and PsychoPy for implementing experiments. We open-source three custom Python libraries (psychopylib, pymarkerlib, and pyfieldtriplib) each covering different aspects of a closed-loop BCI interface: designing interactive experiments, sending event information, and real-time signal processing. Our modules facilitate the design and operation of a transparent BCI system, promoting customization and flexibility in BCI research, and lowering the barrier for researchers to translate advances in ECoG decoding into BCI applications.


[63] 2602.09823

Covo-Audio Technical Report

In this work, we present Covo-Audio, a 7B-parameter end-to-end large audio-language model (LALM) that directly processes continuous audio inputs and generates audio outputs within a single unified architecture. Through large-scale curated pretraining and targeted post-training, Covo-Audio achieves state-of-the-art or competitive performance among models of comparable scale across a broad spectrum of tasks, including speech-text modeling, spoken dialogue, speech understanding, audio understanding, and full-duplex voice interaction. Extensive evaluations demonstrate that the pretrained foundation model exhibits strong speech-text comprehension and semantic reasoning capabilities on multiple benchmarks, outperforming representative open-source models of comparable scale. Furthermore, Covo-Audio-Chat, the dialogue-oriented variant, demonstrates strong spoken conversational abilities, including understanding, contextual reasoning, instruction following, and generating contextually appropriate and empathetic responses, validating its applicability to real-world conversational assistant scenarios. Covo-Audio-Chat-FD, the evolved full-duplex model, achieves substantially superior performance on both spoken dialogue capabilities and full-duplex interaction behaviors, demonstrating its competence in practical robustness. To mitigate the high cost of deploying end-to-end LALMs for natural conversational systems, we propose an intelligence-speaker decoupling strategy that separates dialogue intelligence from voice rendering, enabling flexible voice customization with minimal text-to-speech (TTS) data while preserving dialogue performance. Overall, our results highlight the strong potential of 7B-scale models to integrate sophisticated audio intelligence with high-level semantic reasoning, and suggest a scalable path toward more capable and versatile LALMs.


[64] 2602.09928

Safe Feedback Optimization through Control Barrier Functions

Feedback optimization refers to a class of methods that steer a control system to a steady state that solves an optimization problem. Despite tremendous progress on the topic, an important problem remains open: enforcing state constraints at all times. The difficulty in addressing it lies in mediating between safety enforcement and closed-loop stability, and in ensuring the equivalence between closed-loop equilibria and the optimization problem's critical points. In this work, we present a feedback-optimization method that enforces state constraints at all times by employing high-order control barrier functions. We provide several results on the proposed controller dynamics, including well-posedness, safety guarantees, equivalence between equilibria and critical points, and local and global (in certain convex cases) asymptotic stability of optima. Various simulations illustrate our results.
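At its core, CBF-based safety enforcement solves a pointwise quadratic program that minimally modifies a nominal input subject to a barrier condition. The sketch below shows this for a relative-degree-one velocity constraint using cvxpy; the paper's high-order CBFs and the coupling with feedback optimization add structure well beyond this toy filter.

```python
import cvxpy as cp

# Double integrator: position p, velocity v, input u = acceleration.
# Safe set: velocity limit h(x) = v_max - v >= 0 (relative degree one in u).
v_max = 2.0
alpha = 1.0   # class-K gain in the condition h_dot >= -alpha * h

def cbf_filter(v, u_nom):
    """Minimally modify u_nom so that the CBF condition h_dot >= -alpha*h holds."""
    h = v_max - v
    u = cp.Variable()
    h_dot = -u                   # d/dt (v_max - v) = -u
    prob = cp.Problem(cp.Minimize(cp.square(u - u_nom)),
                      [h_dot >= -alpha * h])
    prob.solve()
    return float(u.value)

# A nominal controller asking for mild acceleration far from the limit passes
# through unchanged; hard acceleration near the limit gets capped.
print(cbf_filter(v=0.5, u_nom=1.0))   # ~1.0 (unchanged)
print(cbf_filter(v=1.9, u_nom=3.0))   # ~0.1 = alpha * (v_max - v)
```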


[65] 2602.10007

A Collaborative Safety Shield for Safe and Efficient CAV Lane Changes in Congested On-Ramp Merging

Lane changing in dense traffic is a significant challenge for Connected and Autonomous Vehicles (CAVs). Existing lane change controllers primarily either ensure safety or collaboratively improve traffic efficiency, but do not consider these conflicting objectives together. To address this, we propose the Multi-Agent Safety Shield (MASS), designed using Control Barrier Functions (CBFs) to enable safe and collaborative lane changes. The MASS enables collaboration by capturing multi-agent interactions among CAVs through interaction topologies constructed as a graph using a simple algorithm. Further, a state-of-the-art Multi-Agent Reinforcement Learning (MARL) lane change controller is extended by integrating MASS to ensure safety and defining a customised reward function to prioritise efficiency improvements. As a result, we propose a lane change controller, known as MARL-MASS, and evaluate it in a congested on-ramp merging simulation. The results demonstrate that MASS enables collaborative lane changes with safety guarantees by strictly respecting the safety constraints. Moreover, the proposed custom reward function improves the stability of MARL policies trained with a safety shield. Overall, by encouraging the exploration of a collaborative lane change policy while respecting safety constraints, MARL-MASS effectively balances the trade-off between ensuring safety and improving traffic efficiency in congested traffic. The code for MARL-MASS is available with an open-source licence at this https URL


[66] 2602.10044

Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning

Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that brings classical reward-biased maximum likelihood estimation (RBMLE) from adaptive control into deep RL. In contrast to upper confidence bound (UCB)-style exploration methods, OWMs incorporate optimism directly into model learning by augmentation with an optimistic dynamics loss that biases imagined transitions toward higher-reward outcomes. This fully gradient-based loss requires neither uncertainty estimates nor constrained optimization. Our approach is plug-and-play with existing world model frameworks, preserving scalability while requiring only minimal modifications to standard training procedures. We instantiate OWMs within two state-of-the-art world model architectures, leading to Optimistic DreamerV3 and Optimistic STORM, which demonstrate significant improvements in sample efficiency and cumulative return compared to their baseline counterparts.


[67] 2602.10058

Evaluating Disentangled Representations for Controllable Music Generation

Recent approaches in music generation rely on disentangled representations, often labeled as structure and timbre or local and global, to enable controllable synthesis. Yet the underlying properties of these embeddings remain underexplored. In this work, we evaluate such disentangled representations in a set of music audio models for controllable generation using a probing-based framework that goes beyond standard downstream tasks. The selected models reflect diverse unsupervised disentanglement strategies, including inductive biases, data augmentations, adversarial objectives, and staged training procedures. We further isolate specific strategies to analyze their effect. Our analysis spans four key axes: informativeness, equivariance, invariance, and disentanglement, which are assessed across datasets, tasks, and controlled transformations. Our findings reveal inconsistencies between intended and actual semantics of the embeddings, suggesting that current strategies fall short of producing truly disentangled representations, and prompting a re-examination of how controllability is approached in music generation.


[68] 2305.05857

Diffusion-based Signal Refiner for Speech Enhancement and Separation

Although recent speech processing technologies have achieved significant improvements in objective metrics, there still remains a gap in human perceptual quality. This paper proposes Diffiner, a novel solution that utilizes the powerful generative capability of diffusion models' prior distributions to address this fundamental issue. Diffiner leverages the probabilistic generative framework of diffusion models and learns natural prior distributions of clean speech to convert outputs from existing speech processing systems into perceptually natural high-quality audio. In contrast to conventional deterministic approaches, our method simultaneously analyzes both the original degraded speech and the pre-processed speech to accurately identify unnatural artifacts introduced during processing. Then, through the iterative sampling process of the diffusion model, these degraded portions are replaced with perceptually natural and high-quality speech segments. Experimental results indicate that Diffiner can recover a clearer harmonic structure of speech, which is shown to result in improved perceptual quality w.r.t. several metrics as well as in a human listening test. This highlights Diffiner's efficacy as a versatile post-processor for enhancing existing speech processing pipelines.


[69] 2402.00859

Deep Room Impulse Response Completion

Rendering immersive spatial audio in virtual reality (VR) and video games demands fast and accurate generation of room impulse responses (RIRs) to recreate auditory environments plausibly. However, conventional methods for simulating or measuring long RIRs are either computationally intensive or challenged by low signal-to-noise ratios. This study is propelled by the insight that the direct sound and early reflections encapsulate sufficient information about room geometry and absorption characteristics. Building upon this premise, we propose a novel task termed "RIR completion," aimed at synthesizing the late reverberation given only the early portion (50 ms) of the response. To this end, we introduce DECOR, Deep Exponential Completion Of Room impulse responses, a deep neural network structured as an autoencoder designed to predict multi-exponential decay envelopes of filtered noise sequences. The interpretability of DECOR's output facilitates its integration with diverse rendering techniques. The proposed method is compared against an adapted state-of-the-art network, and the comparable performance supports the feasibility of the RIR completion task. RIR completion can be widely adopted to enhance RIR generation tasks where a fast approximation of late reverberation is required.
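The general idea of shaping filtered noise with per-band exponential decays can be sketched as follows; band edges, decay times, and the 50 ms split are arbitrary, and this is not the DECOR network itself.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 48_000
t = np.arange(int(fs * 1.0)) / fs   # one second of late tail
rng = np.random.default_rng(0)

# Per-band reverberation times in seconds (low/mid/high) -- illustrative values.
bands = [(125, 500, 0.9), (500, 2000, 0.7), (2000, 8000, 0.5)]

late = np.zeros_like(t)
for lo, hi, t60 in bands:
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    noise = sosfilt(sos, rng.standard_normal(t.size))
    decay = np.exp(-6.91 * t / t60)   # exponential envelope: -60 dB at t = T60
    late += decay * noise

# A complete RIR would crossfade this tail in after the known early part
# (e.g. the first 50 ms), which is simply zeroed out here.
late[: int(0.05 * fs)] = 0.0
print(late.shape, float(np.max(np.abs(late))))
```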


[70] 2403.15804

Semi-on-Demand Hybrid Transit Route Design with Shared Autonomous Mobility Services

Shared Autonomous Vehicles (SAVs) enable transit agencies to design more agile and responsive services at lower operating costs. This study designs and evaluates a semi-on-demand hybrid route directional service in the public transit network, offering on-demand flexible route service in low-density areas and fixed route service in higher-density areas. We develop analytically tractable cost expressions that capture access, waiting, and riding costs for users, and distance-based operating and time-based vehicle costs for operators. Two formulations are presented for strategic and tactical decisions in flexible route portion, fleet size, headway, and vehicle size optimization, enabling the determination of route types between fixed, hybrid, and flexible routes based on demand, cost, and operational parameters. Analytical results demonstrate that the lower operating costs of SAVs favor more flexible route services. The practical applications and benefits of semi-on-demand feeders are presented with numerical examples and a large-scale case study in the Chicago metropolitan area, USA. Findings reveal scenarios in which flexible route portions serving passengers located further away reduce total costs, particularly user costs, whereas higher demand densities favor more traditional line-based operations. Current cost forecasts suggest smaller vehicles with fully flexible routes are optimal, but operating constraints or higher operating costs would favor larger vehicles with hybrid routes. The study provides an analytical tool to design SAVs as directional services and transit feeders, and tractable continuous approximation formulations for planning and research in transit network design.


[71] 2412.07818

A Real-Time DDS-Based Chest X-Ray Decision Support System for Resource-Constrained Clinics

Internet of Things (IoT)-based healthcare systems offer significant potential for improving healthcare delivery in humanitarian and resource-constrained environments, providing essential services to underserved populations in remote areas. However, limited network infrastructure in such regions makes reliable communication challenging for traditional IoT systems. This paper presents a real-time chest X-ray decision support system designed for hospitals in remote locations. The proposed system integrates a fine-tuned ResNet50 deep learning model for disease classification with Fast DDS real-time middleware to ensure reliable and low-latency communication between healthcare practitioners and the inference system. Experimental results show that the model achieves an accuracy of 88.61%, precision of 88.76%, and recall of 88.49%. The system attains an average throughput of 3.2 KB/s and an average latency of 65 ms, demonstrating its suitability for deployment in bandwidth-constrained environments. These results highlight the effectiveness of DDS-based middleware in enabling real-time medical decision support for remote healthcare applications.


[72] 2503.04681

Mixed Near-field and Far-field Target Localization for Low-altitude Economy

In this paper, we study efficient \emph{mixed near-field and far-field} target localization methods in extremely large-scale multiple-input multiple-output (XL-MIMO) systems. Compared with existing works, we address two new challenges in target localization of MIMO communication systems using decoupled subspace methods, arising from the half-wavelength antenna spacing constraint and the \emph{hybrid uniform planar array} (UPA) architecture. To this end, we propose a new three-step mixed-field localization method. First, we reconstruct the equivalent signals received at the UPA antennas by judiciously designing analog combining matrices over time with minimum recovery error. Then, based on the recovered signals, we extend the modified multiple signal classification (MUSIC) algorithm to the UPA architecture by constructing a new covariance matrix of a virtual sparse UPA (S-UPA) to decouple the 2D angle and range estimation. Due to the structure of the S-UPA, ambiguous angles arise when estimating the true angles of targets. In the third step, we design an effective classification method to distinguish mixed-field targets, determine the true angles of all targets, and estimate the ranges of near-field targets. In particular, the angular ambiguity is resolved by showing that the three types of estimated angles (i.e., far-field, near-field, and ambiguous angles) exhibit significantly different patterns in the range-domain MUSIC spectrum. Moreover, to characterize the estimation error lower bound, we obtain matrix closed-form Cramér-Rao bounds for mixed-field target localization. Finally, numerical results demonstrate the effectiveness of our proposed mixed-field localization method, which improves target-classification accuracy and achieves a lower root mean square error than various benchmark schemes.


[73] 2505.10053

Analysis of the Range Ambiguity Function of Narrowband Near-field MIMO Sensing

This paper compares the sensing performance of a narrowband near-field system across several practical antenna array geometries and SIMO/MISO and MIMO configurations. For identical transmit and receive apertures, MIMO processing is equivalent to squaring the near-field array factor, resulting in improved beamdepth and sidelobe level. Analytical derivations, supported by simulations, show that the MIMO processing improves the maximum near-field sensing range and resolution by approximately a factor of 1.4 compared to a single-aperture system. Using a quadratic approximation of the mainlobe of the array factor, an analytical improvement factor of $\sqrt{2}$ is derived, validating the numerical results. Finally, MIMO is shown to improve the poor sidelobe performance observed in the near-field by a factor of two, due to squaring of the array factor.


[74] 2505.21872

Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images

Machine unlearning aims to remove the influence of specific training samples from a trained model without full retraining. While prior work has largely focused on privacy-motivated settings, we recast unlearning as a general-purpose tool for post-deployment model revision. Specifically, we focus on utilizing unlearning in clinical contexts where data shifts, device deprecation, and policy changes are common. To this end, we propose a bilevel optimization formulation of boundary-based unlearning that can be solved using iterative algorithms. We provide convergence guarantees when first-order algorithms are used to unlearn. Our method introduces tunable loss design for controlling the forgetting-retention tradeoff and supports novel model composition strategies that merge the strengths of distinct unlearning runs. Across benchmark and real-world clinical imaging datasets, our approach outperforms baselines on both forgetting and retention metrics, including scenarios involving imaging devices and anatomical outliers. This work establishes machine unlearning as a modular, practical alternative to retraining for real-world model maintenance in clinical applications.


[75] 2508.11668

Neural Gaussian Radio Fields for Channel Estimation

Accurate channel state information (CSI) is a critical bottleneck in modern wireless networks, with pilot overhead consuming 11\% to 21\% of transmission bandwidth and feedback delays causing severe throughput degradation under mobility. Addressing this requires rethinking how neural fields represent coherent wave phenomena. This work introduces \textit{neural Gaussian radio fields (nGRF)}, a physics-informed framework that fundamentally reframes neural field design by replacing view-dependent rasterization with direct complex-valued aggregation in 3D space. This approach natively models wave superposition rather than visual occlusion. The architectural shift transforms the learning objective from function-fitting to source-recovery, a well-posed inverse problem grounded in electromagnetic theory. While demonstrated for wireless channel estimation, the core principle of explicit primitive-based fields with physics-constrained aggregation extends naturally to any coherent wave-based domain, including acoustic propagation, seismic imaging, and ultrasound reconstruction. Evaluations show that the inductive bias of nGRF achieves 10.9 dB higher prediction SNR than state-of-the-art methods with 220$\times$ faster inference (1.1 ms vs. 242 ms), 18$\times$ lower measurement density, and 180$\times$ faster training. For large-scale outdoor environments where implicit methods fail, nGRF achieves 28.32 dB SNR, demonstrating that structured representations supplemented by domain physics can fundamentally outperform generic deep learning architectures.
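As a purely illustrative caricature of direct complex-valued aggregation (not the nGRF architecture), the snippet below sums complex contributions of Gaussian-weighted point primitives at a receiver, so constructive and destructive interference is modeled explicitly rather than through occlusion.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelength = 0.1   # roughly a 3 GHz carrier

# A handful of spatial primitives: a 3D mean, an isotropic spread, and a
# complex source amplitude each (random placeholders standing in for
# learnable parameters).
means = rng.uniform(-10.0, 10.0, size=(64, 3))
scales = rng.uniform(0.5, 2.0, size=64)
amps = rng.standard_normal(64) + 1j * rng.standard_normal(64)

def predicted_channel(rx: np.ndarray) -> complex:
    """Sum complex contributions of all primitives at receiver position rx.

    Each primitive contributes its amplitude, weighted by a Gaussian falloff
    around its mean, scaled by 1/distance, and rotated by the propagation
    phase 2*pi*d/lambda, so the aggregate models wave superposition."""
    d = np.linalg.norm(rx - means, axis=1)
    gauss = np.exp(-0.5 * (d / scales) ** 2)
    phase = np.exp(-1j * 2.0 * np.pi * d / wavelength)
    return complex(np.sum(amps * gauss * phase / np.maximum(d, 1e-3)))

h = predicted_channel(np.array([1.0, 2.0, 0.5]))
print(abs(h), np.angle(h))
```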


[76] 2508.12059

Co-Investment with Payoff-Sharing Mechanism for Cooperative Decision-Making in Network Design Games

Network-based systems are inherently interconnected, with the design and performance of subnetworks being interdependent. However, the decisions of self-interested operators may lead to suboptimal outcomes for users and the overall system. This paper explores cooperative mechanisms that can simultaneously benefit both operators and users. We address this challenge using a game-theoretical framework that integrates both non-cooperative and cooperative game theory. In the non-cooperative stage, we propose a network design game in which subnetwork decision-makers strategically design local infrastructures. In the cooperative stage, co-investment with payoff-sharing mechanism is developed to enlarge collective benefits and fairly distribute them. To demonstrate the effectiveness of our framework, we conduct case studies on the Sioux Falls network and real-world public transport networks in Zurich and Winterthur, Switzerland. Our evaluation considers impacts on environmental sustainability, social welfare, and economic efficiency. The proposed framework provides a foundation for improving interdependent networked systems by enabling strategic cooperation among self-interested operators.


[77] 2509.19612

Federated Aggregation of Demand Flexibility

This paper proposes a federated framework for demand flexibility aggregation to support grid operations. Unlike existing geometric methods that rely on a static, pre-defined base set as the geometric template for aggregation, our framework establishes a truly federated process by enabling collaborative optimization of this base set without requiring the participants to share sensitive data with the aggregator. Specifically, we first formulate the base set optimization problem as a bilevel program. Using optimal solution functions, we then reformulate the bilevel program into a single-level, unconstrained learning task. By exploiting the decomposable structure of the overall gradient, we further design a decentralized gradient-based algorithm to solve this learning task. The entire framework, encompassing base set optimization, aggregation, and disaggregation, operates by design without exchanging raw user data. Numerical results demonstrate that the proposed framework unlocks substantially more flexibility than approaches with static base sets, offering a promising path toward efficient and privacy-enhancing coordination of demand flexibility at scale.


[78] 2510.12832

Coherent Load Profile Synthesis with Conditional Diffusion for LV Distribution Network Scenario Generation

Limited visibility of distribution network power flows at the low voltage level presents challenges to both distribution network operators from a planning perspective and distribution system operators from a congestion management perspective. More representative loads are required to support meaningful analysis of LV substations; otherwise, such analysis risks misinforming future decisions. Traditional load profiling relies on typical profiles, oversimplifying substation-level complexity. Generative models have attempted to address this by synthesising representative loads from historical exemplars; however, while these approaches can approximate load shapes to a convincing degree of fidelity, analysis of the co-behaviour between substations is limited, which ultimately impacts operation at higher voltage levels. This limitation will become even more pronounced with the increasing integration of low-carbon technologies, as estimates of base loads fail to capture load diversity. To address this gap, Conditional Diffusion models for synthesising daily active and reactive power profiles at the low voltage distribution substation level are proposed. The evaluation of fidelity is demonstrated through conventional metrics capturing temporal and statistical realism, as well as power flow modelling. Multiple models are proposed to handle varying levels of data availability, ranging from unconditional synthesis to informed generation driven by metadata and daily statistics. The results show that the synthesised load profiles are plausible both independently and as a cohort in a wider power systems context. The Conditional Diffusion model is benchmarked against both naive and state-of-the-art models to demonstrate its effectiveness in producing realistic scenarios on which to base sub-regional power distribution network planning and operations.


[79] 2511.06971

MARBLE-Net: Learning to Localize in Multipath Environment with Adaptive Rainbow Beams

Integrated sensing and communication (ISAC) systems demand precise and efficient target localization, a task challenged by rich multipath propagation in complex wireless environments. This paper introduces MARBLE-Net (Multipath-Aware Rainbow Beam Learning Network), a deep learning framework that jointly optimizes the analog beamforming parameters of a frequency-dependent rainbow beam and a neural localization network for high-accuracy position estimation. By treating the phase-shifter (PS) and true-time-delay (TTD) parameters as learnable weights, the system adaptively refines its sensing beam to exploit environment-specific multipath characteristics. A structured multi-stage training strategy is proposed to ensure stable convergence and effective end-to-end optimization. Simulation results show that MARBLE-Net outperforms both a fixed-beam deep learning baseline (RaiNet) and a traditional k-nearest neighbors (k-NN) method, reducing localization error by more than 50\% in a multipath-rich scene. Moreover, the results reveal a nuanced interaction with multipath propagation: while confined uni-directional multipath degrades accuracy, structured and directional multipath can be effectively exploited to achieve performance surpassing even line-of-sight (LoS) conditions.


[80] 2511.14410

TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation

Speech-LLM models have demonstrated strong performance in multi-modal and multi-task speech understanding. A typical speech-LLM paradigm integrates the speech modality with a large language model (LLM). While the Whisper encoder has frequently been adopted in previous studies for speech input, it shows limitations regarding input format, model scale, and semantic performance. To this end, we propose a lightweight TTA model specialized in speech semantics for more effective LLM integration. With large-scale training on 358k hours of speech data covering multilingual speech recognition (ASR), speech translation (ST), and speech-text alignment tasks, TTA is capable of producing robust cross-lingual speech representations. Extensive evaluations across diverse benchmarks, including ASR/ST, speech retrieval, and ASR-LLM performance assessments, demonstrate TTA's superiority over Whisper. Furthermore, we rigorously validate the interplay between cross-lingual capabilities and ASR/ST performance. The model weights and training recipes of TTA will be released as part of an audio understanding toolkit, Auden.


[81] 2512.07699

Linear Quadratic Control with Non-Markovian and Non-Semimartingale Noise Models

The standard linear quadratic Gaussian (LQG) framework assumes a Brownian noise process and relies on classical stochastic calculus tools, such as those based on Itô calculus. In this paper, we solve a generalized linear quadratic optimal control problem where the process and measurement noises can be non-Markovian and non-semimartingale stochastic processes with sample paths that have low Hölder regularity. Since these noise models do not, in general, permit the use of the standard Itô calculus, we employ rough path theory to formulate and solve the problem. By leveraging signature representations and controlled rough paths, we derive the optimal state estimation and control strategies.


[82] 2512.08544

Optimal Control of Behavioral-Feedback SIR Epidemic Model

We consider a behavioral-feedback SIR epidemic model, in which the infection rate depends in feedback on the fractions of susceptible and infected agents, respectively. The considered model allows one to account for endogenous adaptation mechanisms of the agents in response to the epidemic, such as voluntary social distancing or the adoption of face masks. For this model, we formulate an optimal control problem for a social planner that has the ability to reduce the infection rate in order to keep the infection curve below a certain threshold over an infinite time horizon, while minimizing the intervention cost. Based on the dynamic properties of the model, we prove that, under quite general conditions on the infection rate, the filling-the-box strategy is the optimal control. This strategy consists in letting the epidemic spread without intervention until the threshold is reached, then applying the minimum control that keeps the fraction of infected individuals constant at the threshold until the reproduction number becomes less than one and the infection naturally fades out. Our result generalizes one available in the literature for the equivalent problem formulated for the classical SIR model, which can be recovered as a special case of our model when the infection rate is constant. Our contribution enhances the understanding of epidemic management with adaptive human behavior, offering insights for robust containment strategies.
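In the constant-infection-rate special case mentioned above, the strategy is easy to simulate: run the epidemic uncontrolled until the infected fraction hits the cap, hold it there with the smallest admissible rate reduction, and release once the effective reproduction number drops below one. A forward-Euler sketch with arbitrary parameters:

```python
import numpy as np

beta0, gamma = 0.4, 0.1   # uncontrolled infection and recovery rates
i_max = 0.05              # cap on the infected fraction
dt, T = 0.05, 400.0

s, i = 0.999, 0.001
history = []
for t in np.arange(0.0, T, dt):
    if i >= i_max and beta0 * s > gamma:
        beta = gamma / s          # hold dI/dt = 0 exactly at the cap
    else:
        beta = beta0              # no intervention ("filling the box")
    ds = -beta * s * i
    di = beta * s * i - gamma * i
    s, i = s + dt * ds, i + dt * di
    history.append((t, s, i, beta))

peak_i = max(i_ for _, _, i_, _ in history)
print(f"peak infected fraction: {peak_i:.4f} (cap {i_max})")
```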


[83] 2512.08608

NLoS Localization with Single Base Station Based on Radio Map

Accurate outdoor localization in Non-Line-of-Sight (NLoS) environments remains a critical challenge for wireless communication and sensing systems. Existing methods, including Global Navigation Satellite System (GNSS) positioning and techniques requiring three Base Stations (BSs), cannot provide reliable performance under NLoS conditions, particularly in dense urban areas with strong multipath effects. To address this limitation, we propose a single-BS localization framework that integrates sequential signal measurements with the prior radio information embedded in a Radio Map (RM). By matching temporal measurement features against the radio map, the proposed method effectively mitigates the adverse impact of multipath propagation and reduces the dependence on LoS paths. Simulation experiments further evaluate how different radio-map construction strategies and measurement-sequence lengths affect localization accuracy. Results demonstrate that the proposed scheme achieves sub-meter positioning accuracy in typical NLoS environments, highlighting its potential as a practical and robust solution for single-base-station deployment.
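
As a rough illustration of the matching step, the sketch below scores every radio-map cell as a candidate starting position by comparing a short sequence of measured channel features against the map entries at correspondingly displaced cells. The grid representation, squared-error feature distance, and the assumption of known relative displacements (e.g., from odometry) are simplifications of ours, not the paper's algorithm.

    import numpy as np

    def locate_single_bs(radio_map, measurements, offsets):
        # radio_map: (H, W, F) stored features per grid cell; measurements: (T, F) measured sequence;
        # offsets: (T, 2) integer grid displacements of each measurement relative to the unknown start cell.
        H, W, _ = radio_map.shape
        best_cell, best_cost = None, np.inf
        for r in range(H):
            for c in range(W):
                rr = np.clip(r + offsets[:, 0], 0, H - 1)
                cc = np.clip(c + offsets[:, 1], 0, W - 1)
                cost = np.sum((radio_map[rr, cc] - measurements) ** 2)   # sequence-to-map mismatch
                if cost < best_cost:
                    best_cell, best_cost = (r, c), cost
        return best_cell

Matching a sequence rather than a single snapshot helps disambiguate NLoS cells with similar instantaneous fingerprints, which is the intuition behind using sequential measurements with a single base station.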


[84] 2601.20904

ECGFlowCMR: Pretraining with ECG-Generated Cine CMR Helps Cardiac Disease Classification and Phenotype Prediction

Cardiac Magnetic Resonance (CMR) imaging provides a comprehensive assessment of cardiac structure and function but remains constrained by high acquisition costs and reliance on expert annotations, limiting the availability of large-scale labeled datasets. In contrast, electrocardiograms (ECGs) are inexpensive, widely accessible, and offer a promising modality for conditioning the generative synthesis of cine CMR. To this end, we propose ECGFlowCMR, a novel ECG-to-CMR generative framework that integrates a Phase-Aware Masked Autoencoder (PA-MAE) and an Anatomy-Motion Disentangled Flow (AMDF) to address two fundamental challenges: (1) the cross-modal temporal mismatch between multi-beat ECG recordings and single-cycle CMR sequences, and (2) the anatomical observability gap due to the limited structural information inherent in ECGs. Extensive experiments on the UK Biobank and a proprietary clinical dataset demonstrate that ECGFlowCMR can generate realistic cine CMR sequences from ECG inputs, enabling scalable pretraining and improving performance on downstream cardiac disease classification and phenotype prediction tasks.


[85] 2602.02603

EchoJEPA: A Latent Predictive Foundation Model for Echocardiography

Foundation models for echocardiography often struggle to disentangle anatomical signal from the stochastic speckle and acquisition artifacts inherent to ultrasound. We present EchoJEPA, a foundation model trained on 18 million echocardiograms across 300K patients, representing the largest pretraining corpus for this modality to date. By leveraging a latent predictive objective, EchoJEPA learns robust anatomical representations that ignore speckle noise. We validate this using a novel multi-view probing framework with frozen backbones, where EchoJEPA outperforms leading baselines by approximately 20% in left ventricular ejection fraction (LVEF) estimation and 17% in right ventricular systolic pressure (RVSP) estimation. The model also exhibits remarkable sample efficiency, reaching 79% view classification accuracy with only 1% of labeled data versus 42% for the best baseline trained on 100%. Crucially, EchoJEPA demonstrates superior generalization, degrading by only 2% under physics-informed acoustic perturbations compared to 17% for competitors. Notably, its zero-shot performance on pediatric patients surpasses fully fine-tuned baselines, establishing latent prediction as a superior paradigm for robust, generalizable medical AI.


[86] 2602.08163

AFDM: Evolving OFDM Towards 6G+

As the standardization of sixth generation (6G) wireless systems accelerates, there is a growing consensus in favor of evolutionary waveforms that offer new features while maximizing compatibility with orthogonal frequency division multiplexing (OFDM), which underpins the 4G and 5G systems. This article presents affine frequency division multiplexing (AFDM) as a premier candidate for 6G, offering intrinsic robustness for both high-mobility communications and integrated sensing and communication (ISAC) in doubly dispersive channels, while maintaining a high degree of synergy with the legacy OFDM. To this end, we provide a comprehensive analysis of AFDM, starting with a generalized fractional-delay-fractional-Doppler (FDFD) channel model that accounts for practical pulse shaping filters and inter-sample coupling. We then detail the AFDM transceiver architecture, demonstrating that it reuses nearly the entire OFDM pipeline and requires only lightweight digital pre- and post-processing. We also analyze the impact of hardware impairments, such as phase noise and carrier frequency offset, and explore advanced functionalities enabled by the chirp-parameter domain, including index modulation and physical-layer security. By evaluating the reusability across the radio-frequency, physical, and higher layers, the article demonstrates that AFDM provides a low-risk, feature-rich, and efficient path toward achieving high-fidelity communications in the later versions of 6G and beyond (6G+).
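
The claim that AFDM reuses nearly the entire OFDM pipeline can be made concrete with a short sketch of the discrete affine Fourier transform (DAFT) modulator and demodulator, which wrap diagonal chirp pre- and post-multiplications around standard (I)FFTs. The chirp parameters c1 and c2 are left generic here; setting both to zero recovers plain OFDM, and the sign/normalization conventions below are one common choice rather than the article's exact notation.

    import numpy as np

    def afdm_modulate(x, c1, c2):
        # s = Lambda_{c1}^H F^H Lambda_{c2}^H x, with Lambda_c = diag(exp(-j 2 pi c n^2))
        n = np.arange(len(x))
        chirp1 = np.exp(-2j * np.pi * c1 * n**2)
        chirp2 = np.exp(-2j * np.pi * c2 * n**2)
        return chirp1.conj() * np.fft.ifft(chirp2.conj() * x, norm="ortho")

    def afdm_demodulate(s, c1, c2):
        # Inverse operation: y = Lambda_{c2} F Lambda_{c1} s, so demodulate(modulate(x)) == x
        n = np.arange(len(s))
        chirp1 = np.exp(-2j * np.pi * c1 * n**2)
        chirp2 = np.exp(-2j * np.pi * c2 * n**2)
        return chirp2 * np.fft.fft(chirp1 * s, norm="ortho")

Only the diagonal chirp multiplications are new relative to an OFDM transceiver; the FFT/IFFT blocks, and hence most of the existing hardware and signal-processing chain, remain unchanged, which is the reuse argument the article makes.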


[87] 2406.07871

Controllable Dance Generation with Style-Guided Motion Diffusion

Dance is an important artistic form of expression in human culture, yet automatically generating dance sequences remains a challenging endeavor. Existing approaches often neglect the critical aspect of controllability in dance generation. Additionally, they inadequately model the nuanced impact of music styles, resulting in dances that lack alignment with the expressive characteristics of the conditioning music. To address this gap, we propose Style-Guided Motion Diffusion (SGMD), which integrates a Transformer-based architecture with a Style Modulation module. By incorporating music features together with user-provided style prompts, SGMD ensures that the generated dances not only match the musical content but also reflect the desired stylistic characteristics. To enable flexible control over the generated dances, we introduce a spatial-temporal masking mechanism. As controllable dance generation has not been fully studied, we construct corresponding experimental setups and benchmarks for tasks such as trajectory-based dance generation, dance in-betweening, and dance inpainting. Extensive experiments demonstrate that our approach can generate realistic and stylistically consistent dances, while also empowering users to create dances tailored to diverse artistic and practical needs. Code is available on GitHub: this https URL


[88] 2510.07342

Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding

Neural encoding models aim to predict fMRI-measured brain responses to natural images. fMRI data is acquired as a 3D volume of voxels, where each voxel has a defined spatial location in the brain. However, conventional encoding models often flatten this volume into a 1D vector and treat voxel responses as independent outputs. This removes spatial context, discards anatomical information, and ties each model to a subject-specific voxel grid. We introduce the Neural Response Function (NRF), a framework that models fMRI activity as a continuous function over anatomical space rather than a flat vector of voxels. NRF represents brain activity as a continuous implicit function: given an image and a spatial coordinate (x, y, z) in standardized MNI space, the model predicts the response at that location. This formulation decouples predictions from the training grid, supports querying at arbitrary spatial resolutions, and enables resolution-agnostic analyses. By grounding the model in anatomical space, NRF exploits two key properties of brain responses: (1) local smoothness -- neighboring voxels exhibit similar response patterns; modeling responses continuously captures these correlations and improves data efficiency, and (2) cross-subject alignment -- MNI coordinates unify data across individuals, allowing a model pretrained on one subject to be fine-tuned on new subjects. In experiments, NRF outperformed baseline models in both intrasubject encoding and cross-subject adaptation, achieving high performance while reducing the data size needed by orders of magnitude. To our knowledge, NRF is the first anatomically aware encoding model to move beyond flattened voxels, learning a continuous mapping from images to brain responses in 3D space.
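
A stripped-down version of the continuous-encoding idea is sketched below: a network that takes precomputed image features together with an (x, y, z) MNI coordinate and returns the predicted response at that location, so it can be queried at arbitrary points rather than at a fixed voxel grid. The layer sizes, the absence of coordinate positional encodings, and the frozen-image-feature assumption are our simplifications, not the authors' architecture.

    import torch
    import torch.nn as nn

    class NeuralResponseFunction(nn.Module):
        # Sketch: predict the fMRI response at an arbitrary MNI coordinate for a given image.
        def __init__(self, img_dim=512, hidden=256):
            super().__init__()
            self.coord_net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
            self.head = nn.Sequential(nn.Linear(img_dim + hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, img_feat, xyz):
            # img_feat: (B, img_dim) image features; xyz: (B, 3) coordinates in standardized MNI space
            return self.head(torch.cat([img_feat, self.coord_net(xyz)], dim=-1)).squeeze(-1)

Because the coordinate is an input rather than an output index, the same trained model can be evaluated at any spatial resolution and, since MNI space is shared across individuals, fine-tuned on a new subject without changing its output dimensionality.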


[89] 2510.21797

Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning

Multimodal learning integrates diverse modalities but suffers from modality imbalance, where dominant modalities suppress weaker ones due to inconsistent convergence rates. Existing methods predominantly rely on static modulation or heuristics, overlooking sample-level distributional variations in prediction bias. Specifically, they fail to distinguish outlier samples where the modality gap is exacerbated by low data quality. We propose a framework to quantitatively diagnose and dynamically mitigate this imbalance at the sample level. We introduce the Modality Gap metric to quantify prediction discrepancies. Analysis reveals that this gap follows a bimodal distribution, indicating the coexistence of balanced and imbalanced sample subgroups. We employ a Gaussian Mixture Model (GMM) to explicitly model this distribution, leveraging Bayesian posterior probabilities for soft subgroup separation. Our two-stage framework comprises a Warm-up stage and an Adaptive Training stage. In the latter, a GMM-guided Adaptive Loss dynamically reallocates optimization priorities: it imposes stronger alignment penalties on imbalanced samples to rectify bias, while prioritizing fusion for balanced samples to maximize complementary information. Experiments on CREMA-D, AVE, and KineticSound demonstrate that our method significantly outperforms SOTA baselines. Furthermore, we show that fine-tuning on a GMM-filtered balanced subset serves as an effective data purification strategy, yielding substantial gains by eliminating extreme noisy samples even without the adaptive loss.
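
A minimal sketch of the GMM step, under our reading of the abstract: fit a two-component mixture to per-sample modality-gap values and use the posterior of the larger-mean component as a soft "imbalanced" weight that can modulate alignment versus fusion terms. The exact gap definition, loss terms, and scheduling in the paper may differ.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def imbalance_weights(modality_gap):
        # modality_gap: (N,) per-sample prediction discrepancy between the two modalities
        gap = np.asarray(modality_gap).reshape(-1, 1)
        gmm = GaussianMixture(n_components=2, random_state=0).fit(gap)
        posteriors = gmm.predict_proba(gap)               # soft subgroup membership (N, 2)
        imbalanced = int(np.argmax(gmm.means_.ravel()))   # component with the larger mean gap
        return posteriors[:, imbalanced]                  # w in [0, 1], one value per sample

    # One plausible use of the weights in the adaptive stage:
    #   loss_i = w_i * alignment_loss_i + (1 - w_i) * fusion_loss_i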


[90] 2510.24372

Bayesian Speech Synthesizers Can Learn from Multiple Teachers

Text-to-Speech (TTS) is inherently a "one-to-many" mapping characterized by intrinsic uncertainty, yet current paradigms often oversimplify it into a deterministic regression task. While continuous-valued autoregressive (AR) models have recently emerged as a promising alternative to discrete codec-based approaches, they typically rely on a fixed-variance prior, fundamentally constraining generation to a static point estimate that ignores the dynamic variability of natural speech. To bridge this gap, we propose BELLE (Bayesian evidential learning with language modelling), a framework that shifts from deterministic prediction to principled Bayesian inference without increasing model parameters or inference latency. By modeling the acoustic target as a Normal-Inverse-Gamma distribution, BELLE captures data-dependent aleatoric uncertainty. To enable accurate variance estimation on standard single-reference datasets, we introduce a "one-to-many" training strategy that leverages synthetic samples as a statistical support set, allowing the model to learn robust distributional properties rather than merely imitating teacher artifacts. Experiments demonstrate that BELLE, trained on only ~5k hours of data, outperforms leading open-source models trained on 50k hours (achieving a 25.8% relative WER reduction) and naturally supports high-quality streaming generation. Audio samples are available at this https URL.
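
For concreteness, the standard Normal-Inverse-Gamma negative log-likelihood used in deep evidential regression is sketched below; BELLE's exact parameterization, priors, and regularizers may differ, so treat this only as an illustration of how a NIG head captures data-dependent aleatoric uncertainty without extra parameters at inference time.

    import torch

    def nig_nll(y, gamma, nu, alpha, beta):
        # y: regression targets; (gamma, nu, alpha, beta) are the predicted NIG parameters,
        # with nu > 0, alpha > 1, beta > 0 (typically enforced via softplus output heads).
        omega = 2.0 * beta * (1.0 + nu)
        return (0.5 * torch.log(torch.pi / nu)
                - alpha * torch.log(omega)
                + (alpha + 0.5) * torch.log(nu * (y - gamma) ** 2 + omega)
                + torch.lgamma(alpha)
                - torch.lgamma(alpha + 0.5))

From the same four outputs, the aleatoric uncertainty is commonly read off as beta / (alpha - 1) and the epistemic uncertainty as beta / (nu * (alpha - 1)), which is what makes the variance data-dependent rather than fixed.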


[91] 2511.08019

Model Predictive Control via Probabilistic Inference: A Tutorial and Survey

This paper provides a tutorial and a survey of probabilistic inference-based model predictive control (PI-MPC) for robotics. PI-MPC defines an optimal control distribution shaped by trajectory cost, a control prior, and a temperature parameter. In the tutorial part, we derive this view and describe action generation via variational inference, highlighting Model Predictive Path Integral (MPPI) control as a representative algorithm. In the survey part, we organize the PI-MPC literature around the principal design choices identified in the tutorial: prior distribution design, multi-modal distribution handling, constraint satisfaction, scalability to high-degree-of-freedom robots, hardware acceleration, and theoretical foundations. Overall, this paper aims to serve as a coherent entry point for researchers and practitioners interested in understanding, implementing, and extending PI-MPC.
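
As a concrete entry point, here is a compact MPPI-style update in NumPy: sample perturbed control sequences, roll them through the dynamics, and average the perturbations with weights given by the exponentiated, temperature-scaled trajectory costs. The dynamics, cost, horizon, and hyperparameters are placeholders, and the control-prior correction term of the full algorithm is omitted for brevity.

    import numpy as np

    def mppi_update(x0, dynamics, cost, u_nom, num_samples=256, sigma=0.5, lam=1.0):
        # u_nom: (H, du) nominal control sequence; returns an improved sequence of the same shape.
        H, du = u_nom.shape
        eps = sigma * np.random.randn(num_samples, H, du)   # sampled control perturbations
        costs = np.zeros(num_samples)
        for k in range(num_samples):
            x = np.array(x0, dtype=float)
            for t in range(H):
                u = u_nom[t] + eps[k, t]
                costs[k] += cost(x, u)
                x = dynamics(x, u)
        w = np.exp(-(costs - costs.min()) / lam)            # temperature-weighted trajectory scores
        w /= w.sum()
        return u_nom + np.einsum("k,khd->hd", w, eps)       # cost-weighted average of perturbations

In receding-horizon use, the first control of the returned sequence is applied, the sequence is shifted by one step, and the update is repeated at the next sampling instant.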


[92] 2511.08613

Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework

Video editing-based talking face generation aims to preserve video details such as pose, lighting, and gestures while modifying only the lip motion, often using an identity reference image to maintain speaker consistency. However, this mechanism can introduce lip leakage, where the generated lips are influenced by the reference image rather than solely by the driving audio. Such leakage is difficult to detect with standard metrics and conventional test setups. To address this, we propose a systematic evaluation methodology to analyze and quantify lip leakage. Our framework employs three complementary test setups: silent-input generation, mismatched audio-video pairing, and matched audio-video synthesis. We also introduce derived metrics, including lip-sync discrepancy and silent-audio-based lip-sync scores. In addition, we study how different identity reference selections affect leakage, providing insights into reference design. The proposed methodology is model-agnostic and establishes a more reliable benchmark for future research in talking face generation.


[93] 2512.22699

Predictive Modeling of Power Outages during Extreme Events: Integrating Weather and Socio-Economic Factors

This paper presents a novel learning-based framework for predicting power outages caused by extreme events. The proposed approach targets low-probability, high-consequence outage scenarios and leverages a comprehensive set of features derived from publicly available data sources. We integrate EAGLE-I outage records from 2014 to 2024 with weather, socioeconomic, infrastructure, and seasonal event data. Incorporating social and demographic indicators reveals patterns of community vulnerability and improves understanding of outage risk during extreme conditions. Four machine learning models are evaluated: Random Forest (RF), Graph Neural Network (GNN), Adaptive Boosting (AdaBoost), and Long Short-Term Memory (LSTM). Experimental validation is performed on a large-scale dataset covering counties in the Lower Peninsula of Michigan. Among all models tested, the LSTM network achieves the highest accuracy.
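
To make the sequence-model choice concrete, a minimal LSTM regressor of the kind typically used for such forecasting tasks is sketched below; the feature set, window length, and output target (e.g., customers without power per county) are placeholders rather than the paper's exact configuration.

    import torch
    import torch.nn as nn

    class OutageLSTM(nn.Module):
        # Sketch: a window of daily county-level features -> next-step outage severity.
        def __init__(self, n_features, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):             # x: (batch, time, n_features) weather + socio-economic inputs
            out, _ = self.lstm(x)
            return self.head(out[:, -1])  # predict from the last hidden state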