New articles on Electrical Engineering and Systems Science


[1] 2512.11842

Car-following Models and Congestion Control with Followerstopper on a Ring-Road under Known Delay -- Examining Limit Cycle

This paper examines the IDM microscopic car-following model from a dynamical systems perspective, analyzing the effects of delay on congestion formation. Further, a case of mixed-autonomy is considered by controlling one car with Followerstopper in a ring road setting containing IDM vehicles as human drivers. Specifically, the stop-and-go waves phenomenon in idealized traffic from a dynamical systems perspective is examined. We show that Followerstopper-controlled vehicle is effective at eliminating emergent stop-and-go waves in the IDM traffic simulation. We show through simulation that the uniform flow manifold is unstable for the ring road simulation with IDM vehicles, and that replacing a single car with Followerstopper induces stability, allowing the cars to drive safely at a uniform speed. Additionally, the case of known delay is considered in a mixed-autonomy scenario. Our simulation result shows that while considering a known time delay, traffic waves emerge earlier than in the no-delay case. At the same time, a single-vehicle controlled using Followerstopper controller is able to prevent the emergence of traffic waves even in the presence of delay.


[2] 2512.11987

Pivot-Only Azimuthal Control and Attitude Estimation of Balloon-borne Payloads

This paper presents an attitude estimation and yaw-rate control framework for balloon-borne payloads using pivot-only actuation, motivated by the Taurus experiment. Taurus is a long-duration balloon instrument designed for rapid azimuthal scanning at approximately 30 deg/s using a motorized pivot at the flight-train connection, without a reaction wheel. We model the gondola as a rigid body subject to realistic disturbances and sensing limitations, and implement a Multiplicative Extended Kalman Filter (MEKF) that estimates attitude and gyroscope bias by fusing inertial and vector-camera measurements. A simple PI controller uses the estimated states to regulate yaw rate. Numerical simulations incorporating representative disturbance and measurement noise levels are used to evaluate closed-loop control performance and MEKF behavior under flight-like conditions. Experimental tests on the Taurus gondola validate the pivot-only approach, demonstrating stable high-rate tracking under realistic hardware constraints. The close agreement between simulation and experiment indicates that the simplified rigid-body model captures the dominant dynamics relevant for controller design and integrated estimation-and-control development.


[3] 2512.11999

Taylor-Lagrange Control for Safety-Critical Systems

This paper proposes a novel Taylor-Lagrange Control (TLC) method for nonlinear control systems to ensure the safety and stability through Taylor's theorem with Lagrange remainder. To achieve this, we expand a safety or stability function with respect to time along the system dynamics using the Lie derivative and Taylor's theorem. This expansion enables the control input to appear in the Taylor series at an order equivalent to the relative degree of the function. We show that the proposed TLC provides necessary and sufficient conditions for system safety and is applicable to systems and constraints of arbitrary relative degree. The TLC exhibits connections with existing Control Barrier Function (CBF) and Control Lyapunov Function (CLF) methods, and it further extends the CBF and CLF methods to the complex domain, especially for higher order cases. Compared to High-Order CBFs (HOCBFs), TLC is less restrictive as it does not require forward invariance of the intersection of a set of safe sets while HOCBFs do. We employ TLC to reformulate a constrained optimal control problem as a sequence of quadratic programs with a zero-order hold implementation method, and demonstrate the safety of zero-order hold TLC using an event-triggered control method to address inter-sampling effects. Finally, we illustrate the effectiveness of the proposed TLC method through an adaptive cruise control system and a robot control problem, and compare it with existing CBF methods.


[4] 2512.12016

Bandit-Based Rate Adaptation for a Single-Server Queue

This paper considers the problem of obtaining bounded time-average expected queue sizes in a single-queue system with a partial-feedback structure. Time is slotted; in slot $t$ the transmitter chooses a rate $V(t)$ from a continuous interval. Transmission succeeds if and only if $V(t)\le C(t)$, where channel capacities $\{C(t)\}$ and arrivals are i.i.d. draws from fixed but unknown distributions. The transmitter observes only binary acknowledgments (ACK/NACK) indicating success or failure. Let $\varepsilon>0$ denote a sufficiently small lower bound on the slack between the arrival rate and the capacity region. We propose a phased algorithm that progressively refines a discretization of the uncountable infinite rate space and, without knowledge of $\varepsilon$, achieves a $\mathcal{O}\!\big(\log^{3.5}(1/\varepsilon)/\varepsilon^{3}\big)$ time-average expected queue size uniformly over the horizon. We also prove a converse result showing that for any rate-selection algorithm, regardless of whether $\varepsilon$ is known, there exists an environment in which the worst-case time-average expected queue size is $\Omega(1/\varepsilon^{2})$. Thus, while a gap remains in the setting without knowledge of $\varepsilon$, we show that if $\varepsilon$ is known, a simple single-stage UCB type policy with a fixed discretization of the rate space achieves $\mathcal{O}\!\big(\log(1/\varepsilon)/\varepsilon^{2}\big)$, matching the converse up to logarithmic factors.


[5] 2512.12017

Online Full ZVS Optimization for Modular Multi-Active Bridge Converter in MV PET

Multi-active bridge (MAB) converters, the core of the state-of-the-art medium-voltage power electronic transformers, can flexibly connect multiple DC ports among distributed DC grids and loads, but suffer from hard switching under conventional single phase-shift control, especially under unbalanced voltage conversion ratios and light load conditions. Although some offline methods manage to improve the efficiency through complex optimization structures, there lacks online optimization methods that are simple but effective due to the strong coupling among ports of the converter. By leveraging the time-domain model of the MAB converter under the multiple phase-shift modulation scheme, this paper simplifies the optimization process and proposes an online optimization method that can achieve full zero-voltage switching (ZVS) operation regardless of the load conditions. The proposed method has simple solutions with only voltage conversion ratios involved and can be implemented within a wide operation range without additional sensors or advanced controllers. A four-port MAB converter is constructed as the prototype. The simulation and experimental results have verified the feasibility and superiority of the proposed online strategy in achieving ZVS operation, dynamic response, and efficiency improvement.


[6] 2512.12026

DT-MPC: Synthesizing Derivation-Free Model Predictive Control from Power Converter Netlists via Physics-Informed Neural Digital Twins

Model Predictive Control (MPC) is a powerful control strategy for power electronics, but it highly relies on manually-derived and topology-specific analytical models, which is labor-intensive and time-consuming in practical designs. To overcome this bottleneck, this paper introduces a Digital-Twin-based MPC (DT-MPC) framework for generic power converters that can systematically translate a high-level circuit into an objective-aware control policy by leveraging a DT as a high-fidelity system model. Furthermore, a physics-informed neural surrogate predictor is proposed to accelerate predictions by DT and enable real-time operation. A gradient-free simplex search optimizer is also introduced to efficiently handle complex multi-objective optimization. The efficacy of the framework has been validated through a cloud-to-edge deployment on a 1500 W dual active bridge converter. Experimental results show that the synthesized predictive model achieves an inference speed over 7 times faster than real time, the DT-MPC controller outperforms several human-designed counterparts, and the overall framework reduces engineering design time by over 95\%, verifying the superiority of DT-MPC on generalized power converters.


[7] 2512.12035

Modeling and Analysis of VOC-based Interplant Molecular Communication Channel

Molecular communication (MC) enables information transfer using particles inspired by biological systems. Volatile Organic Compounds (VOCs) are one of the most abundant and diverse classes of signaling molecules used by living or non-living objects. VOC-based MC holds great promise in developing long-range, bio-compatible communication systems capable of interfacing nano- and micro-scale devices. In this paper, we present a comprehensive end-to-end framework for VOC-based interplant MC from an ICT perspective. The communication process is divided into three stages: transmission (VOC biosynthesis and emission from leaves), channel propagation (advection-diffusion in turbulent wind via Gaussian puff for stress-induced VOC release and Gaussian plume for constitutive VOC release), and reception (VOC uptake and physiological response in the receiver plant). Each stage is analyzed by its attenuation and delay. Numerical results demonstrate that VOC-based channels exhibit low-pass behavior, with bandwidth and capacity heavily influenced by distance, wind velocity, and noise. Though the physical channel supports moderate frequencies, biological constraints at the transmitter restrict the end-to-end channel to slow-varying signals.


[8] 2512.12081

Congestion Reduction in EV Charger Placement Using Traffic Equilibrium Models

Growing EV adoption can worsen traffic conditions if chargers are sited without regard to their impact on congestion. We study how to strategically place EV chargers to reduce congestion using two equilibrium models: one based on congestion games and one based on an atomic queueing simulation. We apply both models within a scalable greedy station-placement algorithm. Experiments show that this greedy scheme yields optimal or near-optimal congestion outcomes in realistic networks, even though global optimality is not guaranteed as we show with a counterexample. We also show that the queueing-based approach yields more realistic results than the congestion-game model, and we present a unified methodology that calibrates congestion delays from queue simulation and solves equilibrium in link-space.


[9] 2512.12140

Realizing Space-oriented Control in Smart Buildings via Word Embeddings

This paper presents a novel framework for implementing space-oriented control systems in smart buildings. In contrast to conventional device-oriented approaches, which often suffer from issues related to development efficiency and portability, our framework adopts a space-oriented paradigm that leverages natural language processing and word embedding techniques. The proposed framework features a chat-based graphical user interface (GUI) that converts natural language inputs into actionable OpenAI API calls, thereby enabling intuitive space level (e.g., room) control within smart environments. To support efficient embedding-based search and metadata retrieval, the framework integrates a vector database powered by Elasticsearch. This ensures the accurate identification and invocation of appropriate smart building APIs. A prototype implementation has been tested in a smart building environment at the University of Tokyo, demonstrating the feasibility of the approach.


[10] 2512.12178

Hierarchical Deep Learning for Joint Turbulence and PE Estimation in Multi-Aperture FSO Systems

Accurate characterization of free-space optical (FSO) channels requires joint estimation of transmitter pointing errors, receiver angle-of-arrival (AoA) fluctuations, and turbulence-induced fading. However, existing literature addresses these impairments in isolation, since their multiplicative coupling in the received signal severely limits conventional estimators and prevents simultaneous recovery. In this paper, we introduce a novel multi-aperture FSO receiver architecture that leverages spatial diversity across a lens array to decouple these intertwined effects. Building on this hardware design, we propose a hierarchical deep learning framework that sequentially estimates AoA, transmitter pointing error, and turbulence coefficients. This decomposition significantly reduces learning complexity and enables robust inference even under strong atmospheric fading. Simulation results demonstrate that the proposed method achieves near-MAP accuracy with orders-of-magnitude lower computational cost, and substantially outperforms end-to-end learning baselines in terms of estimation accuracy and generalization. To the best of our knowledge, this is the first work to demonstrate practical joint estimation of these three key parameters, paving the way for reliable, turbulence-resilient multi-aperture FSO systems.


[11] 2512.12180

A Sensing Dataset Protocol for Benchmarking and Multi-Task Wireless Sensing

Wireless sensing has become a fundamental enabler for intelligent environments, supporting applications such as human detection, activity recognition, localization, and vital sign monitoring. Despite rapid advances, existing datasets and pipelines remain fragmented across sensing modalities, hindering fair comparison, transfer, and reproducibility. We propose the Sensing Dataset Protocol (SDP), a protocol-level specification and benchmark framework for large-scale wireless sensing. SDP defines how heterogeneous wireless signals are mapped into a unified perception data-block schema through lightweight synchronization, frequency-time alignment, and resampling, while a Canonical Polyadic-Alternating Least Squares (CP-ALS) pooling stage provides a task-agnostic representation that preserves multipath, spectral, and temporal structures. Built upon this protocol, a unified benchmark is established for detection, recognition, and vital-sign estimation with consistent preprocessing, training, and evaluation. Experiments under the cross-user split demonstrate that SDP significantly reduces variance (approximately 88%) across seeds while maintaining competitive accuracy and latency, confirming its value as a reproducible foundation for multi-modal and multitask sensing research.


[12] 2512.12181

Learning-Driven Dual-Line Laser Scanning for Fast and Accurate LEO Satellite Positioning

Accurate and low-latency positioning is a key enabler for optical links with Low Earth Orbit (LEO) satellites, where millisecond-level beam alignment is required to maintain reliable high-data-rate communication. This paper presents a learning-driven dual-line laser scanning framework for fast and precise satellite positioning. Unlike conventional Gaussian-beam acquisition systems that rely on multiple sequential beams or mechanical steering, the proposed approach employs two orthogonal line-shaped laser beams to perform structured optical scanning over the ambiguity region without any moving parts. A physics-based model incorporating atmospheric attenuation, turbulence, and MRR-based reflection is developed, and a data-driven neural estimator is trained to map received optical energy patterns to the satellite's two-dimensional position. Simulation results demonstrate that the learning-driven method achieves near-MAP accuracy with typical errors of 7-10 m and deterministic scanning time of 1-2 ms, while conventional two-stage Gaussian-beam schemes exhibit comparable errors but random sensing durations of up to 5 ms. The proposed framework therefore offers a favorable trade-off between positioning accuracy, computational complexity, and sensing latency, making it a practical candidate for next-generation optical LEO tracking systems.


[13] 2512.12186

MRR-Based Line-Laser Scanning for Reliable Vehicular Positioning and Optical Communication

High-speed vehicular environments require optical systems capable of joint sensing, positioning, and communication (JSPC) without mechanical tracking. Existing optical and integrated sensing-communication approaches often rely on point-source emitters or camera-based receivers, limiting spatial coverage and update rate under highway dynamics. This work introduces a new class of tracking-free optical JSPC systems that combine structured line-laser illumination with modulating retroreflector (MRR) arrays on vehicles. Two orthogonal line lasers perform synchronized longitudinal and transverse scanning to provide continuous, wide-area coverage across the roadway. A coverage-driven analytical framework models the coupling between beam divergence, scan geometry, and dwell-time allocation, enabling joint evaluation of sensing reliability and communication quality. An optimization scheme is developed to adapt scanning and divergence parameters for uniform coverage and power efficiency. Simulation results demonstrate significant improvements in spatial coverage uniformity, link stability, and reliability within a fixed scan period. These results establish a practical pathway toward scalable, turbulence-resilient optical architectures for next-generation vehicular JSPC networks.


[14] 2512.12197

Braess' Paradoxes in Coupled Power and Transportation Systems

Transportation electrification introduces strong coupling between the power and transportation systems. In this paper, we generalize the classical notion of Braess' paradox to coupled power and transportation systems, and examine how the cross-system coupling induces new types of Braess' paradoxes. To this end, we model the power and transportation networks as graphs, coupled with charging points connecting to nodes in both graphs. The power system operation is characterized by the economic dispatch optimization, while the transportation system user equilibrium models travelers' route and charging choices. By analyzing simple coupled systems, we demonstrate that capacity expansion in either transportation or power system can deteriorate the performance of both systems, and uncover the fundamental mechanisms for such new Braess' paradoxes to occur. We also provide necessary and sufficient conditions of the occurrences of Braess' paradoxes for general coupled systems, leading to managerial insights for infrastructure planners. For general networks, through characterizing the generalized user equilibrium of the coupled systems, we develop efficient algorithms to detect Braess' paradoxes and novel charging pricing policies to mitigate them.


[15] 2512.12204

Rotatable Antenna Array-Enhanced Null Steering: Performance Analysis and Optimization

Conventional fixed-orientation antenna (FOA) arrays offer limited degrees of freedom (DoF) for flexible beamforming such as null steering. To address this limitation, we propose a new rotatable antenna array (RAA) architecture in this paper, which enables three-dimensional (3D) rotational control of an antenna array to provide enhanced spatial flexibility for null steering. To characterize its performance, we aim to jointly optimize the 3D rotational angles of the RAA, to maximize the beam gain over a given desired direction, while nulling those over multiple interference directions under zero-forcing (ZF) beamforming. However, this problem is non-convex and challenging to tackle due to the highly nonlinear expression of the beam gain in terms of the rotational angles. To gain insights, we first examine several special cases including both isotropic and directional antenna radiation patterns, deriving the conditions under which full beam gain can be achieved over the desired direction while meeting the nulling constraints for interference directions. These conditions clearly indicate that compared with FOA arrays, RAAs can significantly relax the angular separation requirement for achieving effective null steering. For other general cases, we propose a sequential update algorithm, that iteratively refines the 3D rotational angles by discretizing the 3D angular search space. To avoid undesired local optimum, a Gibbs sampling (GS) procedure is also employed between two consecutive rounds of sequential update for solution exploration. Simulation results verify our analytical results and show superior null-steering performance of RAAs to FOA arrays.


[16] 2512.12214

Movable Access Points in Visible Light Communications: Opportunities, Challenges and Future Directions

Visible light communication (VLC) is expected to be a key component of future wireless networks due to its abundant license-free spectrum, inherent high-level security, and the already deployed lighting infrastructure. VLC performance, however, depends on device orientation and the availability of an unobstructed line-of-sight (LoS) link, with transmitter semi-angle and receiver field-of-view (FoV) further affecting alignment, coverage, and reliability. Reconfigurable intelligent surfaces (RISs) can mitigate blockages, orientation issues, and mobility challenges, but their data rates remain far below those of direct LoS links. This article introduces the novel concept of movable access points (MAPs)-aided VLC systems, where dynamically repositioned APs provide new degrees of freedom to ensure LoS connectivity, and transmitter-receiver alignment while providing ultra-high data rates for mobile users. Simulation results show MAPs outperform RIS-aided, fixed-AP, and RIS-only VLC systems in dynamic environments. The article also outlines key challenges and future research directions, including integration with emerging wireless technologies.


[17] 2512.12223

Robust Energy-Efficient Sleep-Mode Strategy for Multi-RIS-Aided Cell-Free Massive MIMO

With the explosive growth of data traffic and the ubiquitous connectivity of wireless devices, the energy demands of wireless networks have inevitably escalated. Reconfigurable intelligent surface (RIS) has emerged as a promising solution for 6G networks due to its energy efficiency (EE) and low cost, while cell-free massive multiple-input multiple-output (CF-mMIMO) was proposed as an innovative network architecture without fixed cell boundaries to enhance these measures even further. However, existing studies often assume consistently high traffic loads, neglecting the dynamic nature of user demand. This can result in underutilized access points (APs) and unnecessary energy expenditure during low-demand periods. To tackle the challenge of EE in CF-mMIMO systems during low load periods, this paper proposes a novel energy-efficient transmission scheme that jointly coordinates active APs and multiple passive RISs. Specifically, a dynamic AP sleep-mode strategy is designed, where certain APs are selectively deactivated while nearby RISs assist in maintaining coverage. We formulate the EE maximization objective as a fractional programming problem and adopt the Dinkelbach method in conjunction with alternating optimization (AO) to iteratively solve the three coupled subproblems: (i) AP selection via a hybrid branch-and-bound (BnB) and greedy algorithm, (ii) transmit power optimization using a sequential convex approximation (SCA) method, initialized by a heuristic zero-forcing strategy, and (iii) RIS phase shift optimization using gradient projection. Simulation results show that the proposed scheme achieves significantly higher EE than existing methods in both low and moderate user scenarios.


[18] 2512.12236

Resolution-Independent Neural Operators for Multi-Rate Sparse-View CT

Sparse-view Computed Tomography (CT) reconstructs images from a limited number of X-ray projections to reduce radiation and scanning time, which makes reconstruction an ill-posed inverse problem. Deep learning methods achieve high-fidelity reconstructions but often overfit to a fixed acquisition setup, failing to generalize across sampling rates and image resolutions. For example, convolutional neural networks (CNNs) use the same learned kernels across resolutions, leading to artifacts when data resolution changes. We propose Computed Tomography neural Operator (CTO), a unified CT reconstruction framework that extends to continuous function space, enabling generalization (without retraining) across sampling rates and image resolutions. CTO operates jointly in the sinogram and image domains through rotation-equivariant Discrete-Continuous convolutions parametrized in the function space, making it inherently resolution- and sampling-agnostic. Empirically, CTO enables consistent multi-sampling-rate and cross-resolution performance, with on average >4dB PSNR gain over CNNs. Compared to state-of-the-art diffusion methods, CTO is 500$\times$ faster in inference time with on average 3dB gain. Empirical results also validate our design choices behind CTO's sinogram-space operator learning and rotation-equivariant convolution. Overall, CTO outperforms state-of-the-art baselines across sampling rates and resolutions, offering a scalable and generalizable solution that makes automated CT reconstruction more practical for deployment.


[19] 2512.12279

WATOS: Efficient LLM Training Strategies and Architecture Co-exploration for Wafer-scale Chip

Training large language models (LLMs) imposes extreme demands on computation, memory capacity, and interconnect bandwidth, driven by their ever-increasing parameter scales and intensive data movement. Wafer-scale integration offers a promising solution by densely integrating multiple single-die chips with high-speed die-to-die (D2D) interconnects. However, the limited wafer area necessitates trade-offs among compute, memory, and communication resources. Fully harnessing the potential of wafer-scale integration while mitigating its architectural constraints is essential for maximizing LLM training performance. This imposes significant challenges for the co-optimization of architecture and training strategies. Unfortunately, existing approaches all fall short in addressing these challenges. To bridge the gap, we propose WATOS, a co-exploration framework for LLM training strategy and wafer-scale architecture. We first define a highly configurable hardware template designed to explore optimal architectural parameters for wafer-scale chips. Based on it, we capitalize on the high D2D bandwidth and fine-grained operation advantages inherent to wafer-scale chips to explore optimal parallelism and resource allocation strategies, effectively addressing the memory underutilization issues during LLM training. Compared to the state-of-the-art (SOTA) LLM training framework Megatron and Cerebras' weight streaming wafer training strategy, WATOS can achieve an average overall throughput improvement of 2.74x and 1.53x across various LLM models, respectively. In addition, we leverage WATOS to reveal intriguing insights about wafer-scale architecture design with the training of LLM workloads.


[20] 2512.12284

V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval

Streaming video large language models (LLMs) are increasingly used for real-time multimodal tasks such as video captioning, question answering, conversational agents, and augmented reality. However, these models face fundamental memory and computational challenges because their key-value (KV) caches grow substantially with continuous streaming video input. This process requires an iterative prefill stage, which is a unique feature of streaming video LLMs. Due to its iterative prefill stage, it suffers from significant limitations, including extensive computation, substantial data transfer, and degradation in accuracy. Crucially, this issue is exacerbated for edge deployment, which is the primary target for these models. In this work, we propose V-Rex, the first software-hardware co-designed accelerator that comprehensively addresses both algorithmic and hardware bottlenecks in streaming video LLM inference. At its core, V-Rex introduces ReSV, a training-free dynamic KV cache retrieval algorithm. ReSV exploits temporal and spatial similarity-based token clustering to reduce excessive KV cache memory across video frames. To fully realize these algorithmic benefits, V-Rex offers a compact, low-latency hardware accelerator with a dynamic KV cache retrieval engine (DRE), featuring bit-level and early-exit based computing units. V-Rex achieves unprecedented real-time of 3.9-8.3 FPS and energy-efficient streaming video LLM inference on edge deployment with negligible accuracy loss. While DRE only accounts for 2.2% power and 2.0% area, the system delivers 1.9-19.7x speedup and 3.1-18.5x energy efficiency improvements over AGX Orin GPU. This work is the first to comprehensively tackle KV cache retrieval across algorithms and hardware, enabling real-time streaming video LLM inference on resource-constrained edge devices.


[21] 2512.12368

XR Capacity Enhancement through Multi-Connected XR Tethering Groups

Extended Reality (XR) applications have limited capacity in 5th generation-advanced (5G-A) cellular networks due to high throughput requirements coupled with strict latency and high reliability constraints. To enhance XR capacity in the downlink (DL), this paper investigates multi-connected XR tethering groups (TGrs), comprising an XR device and a cooperating 5G-A device. This paper presents investigations for two types of cooperation within XR TGr, i.e., selection combining (SC) and soft combining and their impact on the XR capacity of the network. These investigations consider joint hybrid automatic repeat request (HARQ) feedback processing algorithm and also propose enhanced joint Outer Loop Link Adaptation (OLLA) algorithm to leverage the benefits of multi-connectivity. These enhancements aim to improve the spectral efficiency of the network by limiting HARQ retransmissions and enabling the use of higher modulation and coding scheme (MCS) indices for given signal-to-interference-plus-noise ratio (SINR), all while maintaining or operating below than the target block error rate (BLER). Dynamic system-level simulation demonstrate that XR TGrs with soft combining achieve performance improvements of 23 - 42% in XR capacity with only XR users and 38-173% in the coexistence scenarios consisting of XR users and enhanced mobile broadband (eMBB) user. Furthermore, the enhanced joint OLLA algorithm enables similar performance gains even when only one device per XR TGr provides channel state information (CSI) reports, compared to scenarios where both devices report CSI. Notably, XR TGrs with soft combining also enhance eMBB throughput in coexistence scenarios.


[22] 2512.12370

Data-Selective Online Battery Identification Using Extended Time Regular Expressions

In this paper, we propose a data-efficient online battery identification method which targets highly informative battery cell data segments based on the driving pattern of the vehicle. We consider the case of a vehicle driving on/off a motorway and construct an Extended Time Regular Expression (ETRE) to detect data segments fitting these driving patterns. Simulation results indicate that by only using up to 10.71% of the data on average, the proposed method provides a low-bias and low-variance estimator under non-negligible current and voltage noise compared to other conventional estimation algorithms.


[23] 2512.12449

Comparing Stochastic and Ray-tracing Datasets in Machine Learning for Wireless Applications

Machine learning for wireless systems is commonly studied using standardized stochastic channel models (e.g., TDL/CDL/UMa) because of their legacy in wireless communication standardization and their ability to generate data at scale. However, some of their structural assumptions may diverge from real-world propagation. This paper asks when these models are sufficient and when ray-traced (RT) data - a proxy for the real world - provides tangible benefits. To answer these questions, we conduct an empirical study on two representative tasks: CSI compression and temporal channel prediction. Models are trained and evaluated using in-domain, cross-domain, and small-data fine-tuning protocols. Across settings, we observe that stochastic-only evaluation may over- or under-estimate performance relative to RT. These findings support a task-aware recipe where stochastic models can be leveraged for scalable pre-training and for tasks that do not rely on strong spatiotemporal coupling. When that coupling matters, pre-training and evaluation should be grounded in spatially consistent or geometrically similar RT scenarios. This study provides initial guidance to inform future discussions on benchmarking and standardization.


[24] 2512.12474

AI-Driven Real-Time Kick Classification in Olympic Taekwondo Using Sensor Fusion

Olympic Taekwondo has faced challenges in spectator engagement due to static, defensive gameplay and contentious scoring. Current Protector and Scoring Systems (PSS) rely on impact sensors and simplistic logic, encouraging safe strategies that diminish the sport's dynamism. This paper proposes an AI-powered scoring system that integrates existing PSS sensors with additional accelerometers, gyroscopes, magnetic/RFID, and impact force sensors in a sensor fusion framework. The system classifies kicks in real-time to identify technique type, contact location, impact force, and even the part of the foot used. A machine learning pipeline employing sensor fusion and Support Vector Machines (SVMs) is detailed, enabling automatic kick technique recognition for scoring. We present a novel kick scoring rubric that awards points based on specific kick techniques (e.g., turning and spinning kicks) to incentivize dynamic attacks. Drawing on a 2024 study achieving 96-98% accuracy, we validate the feasibility of real-time kick classification and further propose enhancements to this methodology, such as ensemble SVM classifiers and expanded datasets, to achieve the high-stakes accuracy required by the sport. We analyze how the proposed system can improve scoring fairness, reduce rule exploitation and illegitimate tactics, encourage more dynamic techniques, and enhance spectator understanding and excitement. The paper includes system design illustrations, a kick scoring table from an AI-augmented rule set, and discusses anticipated impacts on Olympic Taekwondo.


[25] 2512.12475

Improved Directional State Transition Tensors for Accurate Aerocapture Performance Analysis

Aerocapture is a unique challenge for semi-analytical propagation because its nonconservative dynamics lead to force magnitudes that vary substantially across the trajectory. State transition tensors (STTs), higher-order Taylor series expansions of the solution flow, have been widely used as a computationally efficient semi-analytical propagation method for orbital scenarios, but have not previously been applied to aerocapture. However, obtaining the higher-order STTs requires integrating exponentially more equations. Directional state transition tensors (DSTTs) mitigate this cost by projecting the state into a reduced-dimension basis. This work develops novel dynamics analysis techniques to identify effective bases for this reduction, including augmented higher-order Cauchy Green tensors tailored to quantities of interest such as apoapsis radius. Results show that DSTTs constructed along these bases significantly reduce computational cost while maintaining accuracy in apoapsis and energy prediction. In particular, certain of these DSTTs outperform traditional DSTTs in nonlinear perturbation propagation for key state subsets and quantities of interest. These results establish STTs and DSTTs as practical tools for aerocapture performance analysis to enable robust guidance and navigation.


[26] 2512.12528

Wavelet-Packet-based Noise Signatures With Higher-Order Statistics for Anomaly Prediction

This note develops the first-ever noise-centric anomaly prediction method for a fused discrete-time signal. A Wavelet Packet Transform (WPT) provides a time--frequency expansion in which structure and residual can be separated via orthogonal projection. Higher-Order Statistics (HOS), particularly the third-order cumulant (and its bispectral interpretation), quantify non-Gaussianity and nonlinear coupling in the extracted residual. Compact noise signatures are constructed and an analytically calibrated Mahalanobis detector yields a closed-form decision rule with non-central chi-square performance under mean-shift alternatives. Propositions and proofs establish orthonormality, energy preservation, Gaussian-null behavior of cumulants, and the resulting test statistics.


[27] 2512.12600

Feasible-Set Reshaping for Constraint Qualification in Optimization-Based Control

This paper presents a novel feasible-set reshaping technique to optimization-based control with ensured constraint qualification. In our problem setting, the feasible set of admissible control inputs depends on the real-time state of the plant, and the linear independence constraint qualification (LICQ) may not be satisfied in some regions of interest. By feasible-set reshaping, we project the constraints of the original feasible set onto an appropriately chosen constant matrix with its rows forming a positive span of the space of the optimization variable. It is proved that the reshaped feasible set is nonempty and satisfies LICQ, as long as the original feasible set is nonempty. The effectiveness of the proposed method is verified by constructing Lipschitz continuous quadratic-program-based (QP-based) controllers based on the reshaped feasible sets.


[28] 2512.12601

Quadratic-Programming-based Control of Multi-Robot Systems for Cooperative Object Transport

This paper investigates the control problem of steering a group of spherical mobile robots to cooperatively transport a spherical object. By controlling the movements of the robots to exert appropriate contact (pushing) forces, it is desired that the object follows a velocity command. To solve the problem, we first treat the robots' positions as virtual control inputs of the object, and propose a velocity-tracking controller based on quadratic programming (QP), enabling the robots to cooperatively generate desired contact forces while minimizing the sum of the contact-force magnitudes. Then, we design position-tracking controllers for the robots. By appropriately designing the objective function and the constraints for the QP, it is guaranteed that the QP admits a unique solution and the QP-based velocity-tracking controller is Lipschitz continuous. Finally, we consider the closed-loop system as an interconnection of two subsystems, corresponding to the velocity-tracking error of the object and the position-tracking error of the robots, and employ nonlinear small-gain techniques for stability analysis. The effectiveness of the proposed design is demonstrated through numerical simulations.


[29] 2512.12638

Electric Road Systems for Smart Cities: A Scalable Infrastructure Framework for Dynamic Wireless Charging

The transition to electric transportation is a key enabler for intelligent and sustainable cities; however, inadequate charging infrastructure remains a major barrier to large-scale electric vehicle (EV) adoption. This paper presents a scalable Electric Road System (ERS) architecture that enables Dynamic Wireless Charging (DWC) of EVs during motion. The proposed framework integrates inductive charging coils embedded in road pavement, real-time vehicle-to-infrastructure (V2I) communication, and adaptive energy management coordinated with smart grid systems. Modular road segments with a standardized charging process are employed to ensure scalability across urban corridors and interoperability among different EV platforms. System performance is evaluated using a co-simulation framework combining MATLAB-based power analysis with traffic inputs generated in SUMO. Key performance metrics include charging efficiency, energy cost per kilometer, and battery lifecycle improvement. Simulation results indicate a potential reduction in range anxiety and an increase in battery lifespan due to frequent shallow charging cycles. The study further discusses deployment challenges, policy considerations, and energy distribution strategies aligned with climate-resilient urban development. A case study of a tier-1 Indian city is presented to analyze the cost-benefit trade-offs of retrofitting high-density urban corridors with ERS. The proposed framework provides a practical foundation for next-generation EV infrastructure planning in smart cities.


[30] 2512.12725

Power Consumption and Energy Efficiency of Mid-Band XL-MIMO: Modeling, Scaling Laws, and Performance Insights

Mid-band extra-large-scale multiple-input multiple-output (XL-MIMO), emerging as a critical enabler for future communication systems, is expected to deliver significantly higher throughput by leveraging the extended bandwidth and enlarged antenna aperture. However, power consumption remains a significant concern due to the enlarged system dimension, underscoring the need for thorough investigations into efficient system design and deployment. To this end, an in-depth study is conducted on mid-band XL-MIMO systems. Specifically, a comprehensive power consumption model is proposed, encompassing the power consumption of major hardware components and signal processing procedures, while capturing the influence of key system parameters. Considering typical near-field propagation characteristics, closed-form approximations of throughput are derived, providing an analytical framework for assessing energy efficiency (EE). Based on the proposed framework, the scaling law of EE with respect to key system configurations is derived, offering valuable insights for system design. Subsequently, extensions and comparisons are conducted among representative multi-antenna technologies, demonstrating the superiority of mid-band XL-MIMO in EE. Extensive numerical results not only verify the tightness of the throughput analysis but also validate the EE evaluations, unveiling the potential of energy-efficient mid-band XL-MIMO systems.


[31] 2512.12755

An End-to-End Approach for Microgrid Probabilistic Forecasting and Robust Operation via Decision-focused Learning

High penetration of renewable energy sources (RES) introduces significant uncertainty and intermittency into microgrid operations, posing challenges to economic and reliable scheduling. To address this, this paper proposes an end-to-end decision-focused framework that jointly optimizes probabilistic forecasting and robust operation for microgrids. A multilayer encoder-decoder (MED) probabilistic forecasting model is integrated with a two-stage robust optimization (TSRO) model involving direct load control (DLC) through a differentiable decision pathway, enabling gradient-based feedback from operational outcomes to improve forecasting performance. Unlike conventional sequential approaches, the proposed method aligns forecasting accuracy with operational objectives by directly minimizing decision regret via a surrogate smart predict-then-optimize (SPO) loss function. This integration ensures that probabilistic forecasts are optimized for downstream decisions, enhancing both economic efficiency and robustness. Case studies on modified IEEE 33-bus and 69-bus systems demonstrate that the proposed framework achieves superior forecasting accuracy and operational performance, reducing total and net operation costs by up to 18% compared with conventional forecasting and optimization combinations. The results verify the effectiveness and scalability of the end-to-end decision-focused approach for resilient and cost-efficient microgrid management under uncertainty.


[32] 2512.12761

Lexicographic Multi-Objective Stochastic Shortest Path with Mixed Max-Sum Costs

We study the Stochastic Shortest Path (SSP) problem for autonomous systems with mixed max-sum cost aggregations under Linear Temporal Logic constraints. Classical SSP formulations rely on sum-aggregated costs, which are suitable for cumulative quantities such as time or energy but fail to capture bottleneck-style objectives such as avoiding high-risk transitions, where performance is determined by the worst single event along a trajectory. Such objectives are particularly important in safety-critical systems, where even one hazardous transition can be unacceptable. To address this limitation, we introduce max-aggregated objectives that minimize the bottleneck cost, i.e., the maximum one-step cost along a trajectory. We show that standard Bellman equations on the original state space do not apply in this setting and propose an augmented MDP with a state variable tracking the running maximum cost, together with a value iteration algorithm. We further identify a cyclic policy phenomenon, where zero-marginal-cost cycles prevent goal reaching under max-aggregation, and resolve it via a finite-horizon formulation. To handle richer task requirements, linear temporal logic specifications are translated into deterministic finite automata and combined with the system to construct a product MDP. We propose a lexicographic value iteration algorithm that handles mixed max-sum objectives under lexicographic ordering on this product MDP. Gridworld case studies demonstrate the effectiveness of the proposed framework.


[33] 2512.12776

High Order Control Lyapunov Function - Control Barrier Function - Quadratic Programming Based Autonomous Driving Controller for Bicyclist Safety

Ensuring the safety of Vulnerable Road Users (VRUs) is a critical challenge in the development of advanced autonomous driving systems in smart cities. Among vulnerable road users, bicyclists present unique characteristics that make their safety both critical and also manageable. Vehicles often travel at significantly higher relative speeds when interacting with bicyclists as compared to their interactions with pedestrians which makes collision avoidance system design for bicyclist safety more challenging. Yet, bicyclist movements are generally more predictable and governed by clear traffic rules as compared to the sudden and sometimes erratic pedestrian motion, offering opportunities for model-based control strategies. To address bicyclist safety in complex traffic environments, this study proposes and develops a High Order Control Lyapunov Function High Order Control Barrier Function Quadratic Programming (HOCLF HOCBF QP) control framework. Through this framework, CLFs constraints guarantee system stability so that the vehicle can track its reference trajectory, whereas CBFs constraints ensure system safety by letting vehicle avoiding potential collisions region with surrounding obstacles. Then by solving a QP problem, an optimal control command that simultaneously satisfies stability and safety requirements can be calculated. Three key bicyclist crash scenarios recorded in the Fatality Analysis Reporting System (FARS) are recreated and used to comprehensively evaluate the proposed autonomous driving bicyclist safety control strategy in a simulation study. Simulation results demonstrate that the HOCLF HOCBF QP controller can help the vehicle perform robust, and collision-free maneuvers, highlighting its potential for improving bicyclist safety in complex traffic environments.


[34] 2512.12794

A Rule-Aware Prompt Framework for Structured Numeric Reasoning in Cyber-Physical Systems

Many cyber-physical systems (CPS) rely on high-dimensional numeric telemetry and explicit operating rules to maintain safe and efficient operation. Recent large language models (LLMs) are increasingly considered as decision-support components in such systems, yet most deployments focus on textual inputs and do not directly address rule-grounded reasoning over numeric measurements. This paper proposes a rule-aware prompt framework that systematically encodes CPS domain context, numeric normalization, and decision rules into a modular prompt architecture for LLMs. The framework decomposes prompts into five reusable modules, including role specification, CPS domain context, numeric normalization, rule-aware reasoning, and output schema, and exposes an interface for plugging in diverse rule sets. A key design element is separating rule specification from the representation of normalized numeric deviations, which enables concise prompts that remain aligned with domain rules. We analyze how different normalization strategies and prompt configurations influence rule adherence, interpretability, and token efficiency. The framework is model-agnostic and applicable across CPS domains. To illustrate its behavior, we instantiate it on numeric anomaly assessment in an IEEE 118-bus electric power transmission network and evaluate several prompting and adaptation regimes. The results show that rule-aware, z-score-based value blocks and a hybrid LLM-detector architecture can substantially improve consistency with CPS rules and anomaly detection performance while reducing token usage, providing a reusable bridge between numeric telemetry and general-purpose LLMs.


[35] 2512.12803

Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks

Centralised reinforcement learning (RL) for voltage magnitude regulation in distribution networks typically involves numerous agent-environment interactions and power flow (PF) calculations, inducing computational overhead and privacy concerns over shared data. Thus, we propose a distributed RL algorithm to regulate voltage magnitude. First, a dynamic Thevenin equivalent model is integrated within smart meters (SM), enabling local voltage magnitude estimation using local SM data for RL agent training, and mitigating the dependency of synchronised data collection and centralised PF calculations. To mitigate estimation errors induced by Thevenin model inaccuracies, a voltage magnitude correction strategy that combines piecewise functions and neural networks is introduced. The piecewise function corrects the large errors of estimated voltage magnitude, while a neural network mimics the grid's sensitivity to control actions, improving action adjustment precision. Second, a coordination strategy is proposed to refine local RL agent actions online, preventing voltage magnitude violations induced by excessive actions from multiple independently trained agents. Case studies on energy storage systems validate the feasibility and effectiveness of the proposed approach, demonstrating its potential to improve voltage regulation in distribution networks.


[36] 2512.12811

Channel Estimation for Full-duplex Multi-tag Ambient Backscatter Communication Systems with I/Q Imbalance

Ambient backscatter communication (AmBC) has emerged as a highly attractive paradigm for energy-efficient communication. Full-duplex multi-tag AmBC systems provide the scalability and efficient spectrum utilization essential for next generation Internet-of-Things (IoT) networks. However, the presence of multiple tags, self-interference and hardware impairments such as inphase/quadrature (I/Q) imbalance, makes accurate channel estimation indispensable for efficient interference management. The large number of channel parameters and the presence of mirror images of each signal component necessitate careful design of the channel estimation phase to prevent performance degradation. In this work, we propose a novel three-stage training protocol and pilot-based estimation scheme that ensure signal orthogonality and successfully avoid error floors. We also propose two semi-blind estimators, one based on decision-directed (DD) criterion and the other on the expectation conditional maximization (ECM) framework. By exploiting both pilots and data symbols, these two estimators achieve higher estimation accuracy than pilot-based estimation, at the cost of additional complexity. Cramer-Rao bounds (CRBs) for both types of estimation are also derived. The pilot-based estimator and the ECM estimator approach their respective CRBs, while the DD estimator performs mid-way between them. The three proposed solutions support different use cases by offering distinct tradeoffs between performance and computational complexity.


[37] 2512.12826

Sensitivity increase of 3D printed, self-sensing, carbon fibers structures with conductive filament matrix due to flexural loading

The excellent structural and piezoresistive properties of continuous carbon fiber make it suitable for both structural and sensing applications. This work studies the use of 3D printed, continuous carbon fiber reinforced beams as self-sensing structures. It is demonstrated how the sensitivity of these carbon fiber strain gauges can be increased irreversibly by means of a pretreatment by ``breaking-in'' the sensors with a large compressive bending load. The increase in the gauge factor is attributed to local progressive fiber failure, due to the combination of the thermal residual stress from the printing process and external loading. The coextrusion of conductive filament around the carbon fibers is demonstrated as a means of improving the reliability, noise and electrical connection of the sensors. A micrograph of the sensor cross section shows that the conductive filament contacts the various carbon fiber bundles. All-in-all, the use of ``breaking-in'' carbon fiber strain gauges in combination with coextrusion of conductive filament hold promises for 3D printed structural sensors with a high sensitivity.


[38] 2512.12833

Data-driven Supervisory Control under Attacks via Spectral Learning

The technological advancements facilitating the rapid development of cyber-physical systems (CPS) also render such systems vulnerable to cyber attacks with devastating effects. Supervisory control is a commonly used control method to neutralize attacks on CPS. The supervisor strives to confine the (symbolic) paths of the system to a desired language via sensors and actuators in a closed control loop, even when attackers can manipulate the symbols received by the sensors and actuators. Currently, supervisory control methods face limitations when effectively identifying and mitigating unknown, broad-spectrum attackers. In order to capture the behavior of broad-spectrum attacks on both sensing and actuation channels we model the plant, supervisors, and attackers with finite-state transducers (FSTs). Our general method for addressing unknown attackers involves constructing FST models of the attackers from spectral analysis of their input and output symbol sequences recorded from a history of attack behaviors observed in a supervisory control loop. To construct these FST models, we devise a novel learning method based on the recorded history of attack behaviors. A supervisor is synthesized using such models to neutralize the attacks.


[39] 2512.12851

BUT Systems for WildSpoof Challenge: SASV in the Wild

This paper presents the BUT submission to the WildSpoof Challenge, focusing on the Spoofing-robust Automatic Speaker Verification (SASV) track. We propose a SASV framework designed to bridge the gap between general audio understanding and specialized speech analysis. Our subsystem integrates diverse Self-Supervised Learning front-ends ranging from general audio models (e.g., Dasheng) to speech-specific encoders (e.g., WavLM). These representations are aggregated via a lightweight Multi-Head Factorized Attention back-end for corresponding subtasks. Furthermore, we introduce a feature domain augmentation strategy based on Distribution Uncertainty to explicitly model and mitigate the domain shift caused by unseen neural vocoders and recording environments. By fusing these robust CM scores with state-of-the-art ASV systems, our approach achieves superior minimization of the a-DCFs and EERs.


[40] 2512.12871

CapOptix: An Options-Framework for Capacity Market Pricing

Electricity markets are under increasing pressure to maintain reliability amidst rising renewable penetration, demand variability, and occasional price shocks. Traditional capacity market designs often fall short in addressing this by relying on expected-value metrics of energy unserved, which overlook risk exposure in such systems. In this work, we present CapOptix, a capacity pricing framework that interprets capacity commitments as reliability options, i.e., financial derivatives of wholesale electricity prices. CapOptix characterizes the capacity premia charged by accounting for structural price shifts modeled by the Markov Regime Switching Process. We apply the framework to historical price data from multiple electricity markets and compare the resulting premium ranges with existing capacity remuneration mechanisms.


[41] 2512.12872

Heavy-Duty Electric Vehicles Contribution for Frequency Response in Power Systems with V2G

The integration of heavy-duty electric vehicles (EVs) with Vehicle-to-Grid (V2G) capability offers a promising solution to enhance grid stability by providing primary frequency response in power systems. This paper investigates the potential of heavy-duty EVs to support the California power grid under different charging strategies: immediate, delayed, and constant minimum power charging. Simulation results demonstrate that both V2G-capable EVs and non-V2G modes have great potential to provide primary frequency response, with V2G-capable EVs exhibiting especially strong contributions. The study highlights the influence of charging strategies, control modes, and grid conditions on EV contributions to grid stability, emphasizing their critical role in mitigating the adverse effects of renewable energy penetration.


[42] 2512.12883

On the embedding transformation for optimal control of multi-mode switched systems

This paper develops an embedding-based approach to solve switched optimal control problems (SOCPs) with an arbitrary number of subsystems. Initially, the discrete switching signal is represented by a set of binary variables, encoding each mode in binary format. An embedded optimal control problem (EOCP) is then formulated by replacing these binary variables with continuous embedded variables that can take intermediate values between zero and one. Although embedding allows SOCPs to be addressed using conventional techniques, the optimal solutions of EOCPs often yield intermediate values for binary variables, which may not be feasible for the original SOCP. To address this challenge, a modified EOCP (MEOCP) is introduced by adding a concave auxiliary cost function of appropriate dimensionality to the main cost function. This addition ensures that the optimal solution of the EOCP is bang-bang, and as a result, feasible for the original SOCP.


[43] 2512.12923

Information-Optimal Formation Geometry Design for Multimodal UAV Cooperative Perception

The efficacy of UAV swarm cooperative perception fundamentally depends on three-dimensional (3D) formation geometry, which governs target observability and sensor complementarity. In the literature, the exploitation of formation geometry and its impact on UAV sensing have rarely been studied, which can significantly degrade multimodal cooperative perception at scenarios where heterogeneous payloads (vision cameras and LiDAR) should be geometrically arranged to exploit their complementary strengths while managing communication interference and hardware budgets. To bridge this critical gap, we propose an information-theoretic optimization framework that allocation of UAVs and multimodal sensors, configures formation geometries, and flight control. The UAV-sensor allocation is optimized by the Fisher Information Matrix (FIM) determinant maximization. Under this framework we introduce an equivalent formation transition strategy that enhances field-of-view (FOV) coverage without compromising perception accuracy and communication interference. Furthermore, we design a novel Lyapunov-stable flight control scheme with logarithmic potential fields to generate energy-efficient trajectories for formation transitions. Extensive simulations demonstrate our formation-aware design achieves 25.0\% improvement in FOV coverage, 104.2\% enhancement in communication signal strength, and 47.2\% reduction in energy consumption compared to conventional benchmarks. This work establishes that task-driven geometric configuration represents a foundational rather than incidental component in next-generation UAV swarm systems.


[44] 2512.12952

Leveraging Compression to Construct Transferable Bitrate Ladders

Over the past few years, per-title and per-shot video encoding techniques have demonstrated significant gains as compared to conventional techniques such as constant CRF encoding and the fixed bitrate ladder. These techniques have demonstrated that constructing content-gnostic per-shot bitrate ladders can provide significant bitrate gains and improved Quality of Experience (QoE) for viewers under various network conditions. However, constructing a convex hull for every video incurs a significant computational overhead. Recently, machine learning-based bitrate ladder construction techniques have emerged as a substitute for convex hull construction. These methods operate by extracting features from source videos to train machine learning (ML) models to construct content-adaptive bitrate ladders. Here, we present a new ML-based bitrate ladder construction technique that accurately predicts the VMAF scores of compressed videos, by analyzing the compression procedure and by making perceptually relevant measurements on the source videos prior to compression. We evaluate the performance of our proposed framework against leading prior methods on a large corpus of videos. Since training ML models on every encoder setting is time-consuming, we also investigate how per-shot bitrate ladders perform under different encoding settings. We evaluate the performance of all models against the fixed bitrate ladder and the best possible convex hull constructed using exhaustive encoding with Bjontegaard-delta metrics.


[45] 2512.12972

Headroom as A Grid Service in Software-Defined Power Grids: A Peak-to-Peak Control Design Approach

To address system frequency challenges driven by the integration of renewable generation, advanced control strategies are designed at the device level to provide effective frequency support following disturbances. However, typically relying on energy-based performance metrics, these methods cannot guarantee the system frequency constraints such as frequency nadir and maximum Rate-of-Change-of-Frequency (RoCoF). Moreover, locally-designed frequency support cannot minimize the overall system cost to maintain frequency stability. On the other hand, the concept of frequency-constrained system scheduling is introduced, which incorporates frequency dynamic constraints into the system economic optimization, so that frequency requirements can be maintained with minimum cost. However, these works rely on analytical approximations of the frequency dynamic metrics, which are mathematically complicated and tend to be over-conservative for the approximation of IBR headroom requirements.


[46] 2512.13004

Large Language Models for Power System Applications: A Comprehensive Literature Survey

This comprehensive literature review examines the emerging applications of Large Language Models (LLMs) in power system engineering. Through a systematic analysis of recent research published between 2020 and 2025, we explore how LLMs are being integrated into various aspects of power system operations, planning, and management. The review covers key application areas including fault diagnosis, load forecasting, cybersecurity, control and optimization, system planning, simulation, and knowledge management. Our findings indicate that while LLMs show promising potential in enhancing power system operations through their advanced natural language processing and reasoning capabilities, significant challenges remain in their practical implementation. These challenges include limited domain-specific training data, concerns about reliability and safety in critical infrastructure, and the need for enhanced explainability. The review also highlights emerging trends such as the development of power system-specific LLMs and hybrid approaches combining LLMs with traditional power engineering methods. We identify crucial research directions for advancing the field, including the development of specialized architectures, improved security frameworks, and enhanced integration with existing power system tools. This survey provides power system researchers and practitioners with a comprehensive overview of the current state of LLM applications in the field and outlines future pathways for research and development.


[47] 2512.13021

Safe Control of Multi-Agent Systems with Minimal Communication

In many multi-agent systems, communication is limited by bandwidth, latency, and energy constraints. Designing controllers that achieve coordination and safety with minimal communication is critical for scalable and reliable deployment. This paper presents a method for designing controllers that minimize inter-agent communication in multi-agent systems while satisfying safety and coordination requirements, while conforming to communication delay constraints. The control synthesis problem is cast as a rank minimization problem, where a convex relaxation is obtained via system level synthesis. Simulation results on various tasks, including trajectory tracking with relative and heterogeneous sensing, demonstrate that the proposed method significantly reduces inter-agent transmission compared to baseline approaches.


[48] 2512.13032

A Comprehensive Survey of Channel Estimation Techniques for OTFS in 6G and Beyond Wireless Networks

Orthogonal time-frequency space (OTFS) modulation has emerged as a powerful wireless communication technology that is specifically designed to address the challenges of high-mobility scenarios and significant Doppler effects. Unlike conventional modulation schemes that operate in the time-frequency (TF) domain, OTFS projects signals to the delay-Doppler (DD) domain, where wireless channels exhibit sparse and quasi-static characteristics. This fundamental transformation enables superior channel estimation (CE) performance in challenging propagation environments characterized by high-mobility, severe multipath effects, and rapidly time-varying channel conditions. This article provides a systematic examination of CE techniques for OTFS systems, covering the extensive research landscape from foundational methods to cutting-edge approaches. We present a detailed analysis of DD and TF domain CE techniques presented in the literature, including separate pilot, embedded pilot, and superimposed pilot approaches. The article encompasses various algorithmic frameworks including Bayesian learning, matching pursuit-based techniques, message passing algorithms, deep learning (DL)-based methods, and recent CE approaches. Additionally, we explore joint CE and signal detection (SD) strategies, the integration of OTFS with next-generation wireless systems including massive multiple-input multiple-output (MIMO), millimeter wave (mmWave) communications, reconfigurable intelligent surfaces (RISs), and integrated sensing and communication (ISAC) systems. Critical implementation challenges are presented, including leakage suppression, inter-Doppler interference mitigation, impulsive noise handling, signaling overhead reduction, guard space requirements, peak-to-average power ratio (PAPR) management, beam squint effects, and hardware impairments.


[49] 2512.13099

On the Complementarity of Shared Electric Mobility and Renewable Energy Communities

Driven by the ongoing energy transition, shared mobility service providers are emerging actors in electrical power systems which aim to shift combustion-based mobility to electric paradigm. In the meantime, Energy Communities are deployed to enhance the local usage of distributed renewable production. As both ators share the same goal of satisfying the demand at the lowest cost, they could take advantage of their complementarity and coordinate their decisions to enhance each other operation. This paper presents an original Mixed-Integer Second Order Cone Programming long-term Electric Vehicle fleet planning optimization problem that integrates the coordination with a Renewable Energy Community and Vehicle-to-Grid capability. This model is used to assess the economic, energy, and grid performances of their collaboration in a 21 buses low-voltage distribution network. Key results show that, both actors coordination can help reducing the yearly cost up to 11.3 % compared to their stand-alone situation and that it may reduce the stress on the substation transformer by 46 % through the activation of the inherent EVs flexibility when subject to peak penalties from the grid operator.


[50] 2512.13224

MR Fingerprinting for Imaging Brain Hemodynamics and Oxygenation

Over the past decade, several studies have explored the potential of magnetic resonance fingerprinting (MRF) for the quantification of brain hemodynamics, oxygenation, and perfusion. Recent advances in simulation models and reconstruction frameworks have also significantly enhanced the accuracy of vascular parameter estimation. This review provides an overview of key vascular MRF studies, emphasizing advancements in geometrical models for vascular simulations, novel sequences, and state-of-the-art reconstruction techniques incorporating machine learning and deep learning algorithms. Both pre-clinical and clinical applications are discussed. Based on these findings, we outline future directions and development areas that need to be addressed to facilitate their clinical translation. Evidence Level N/A. Technical Efficacy Stage 1.


[51] 2512.13229

Plant Equivalent Controller Realizations for Attack-Resilient Cyber-Physical Systems

As cyber-physical systems (CPSs) become more dependent on data and communication networks, their vulnerability to false data injection (FDI) attacks has raised significant concerns. Among these, stealthy attacks, those that evade conventional detection mechanisms, pose a critical threat to closed-loop performance. This paper introduces a controller-oriented method to enhance CPS resiliency against such attacks without compromising nominal closed-loop behavior. Specifically, we propose the concept of plant equivalent controller (PEC) realizations, representing a class of dynamic output-feedback controllers that preserve the input-output behavior of a given base controller while exhibiting distinct robustness properties in the presence of disturbances and sensor attacks. To quantify and improve robustness, we employ reachable set analysis to assess the impact of stealthy attacks on the closed-loop dynamics. Building on this analysis, we provide mathematical tools (in terms of linear matrix inequalities) to synthesize the optimal PEC realization that minimizes the reachable set under peak-bounded disturbances. The proposed framework thus provides systematic analysis and synthesis tools to enhance the attack resilience of CPSs while maintaining the desired nominal performance. The effectiveness of the approach is demonstrated on the quadruple-tank process subject to stealthy sensor attacks.


[52] 2512.13233

Measurement of Material Volume Fractions in a Microwave Resonant Cavity Sensor Using Convolutional Neural Network

A non-destructive, real-time method for estimating the volume fraction of a dielectric mixture inside a resonant cavity is presented. A convolutional neural network (CNN)-based approach is used to estimate the fractional composition of two-phase dielectric mixtures inside a resonant cavity using scattering parameter (S-parameter) measurements. A rectangular cavity sensor with a strip feed structure is characterized using a vector network analyzer (VNA) from 0.01--20~GHz. The CNN is trained using both simulated and experimentally measured S-parameters and achieves high predictive accuracy even without de-embedding or filtering, demonstrating robustness to measurement imperfections. The simulation results achieve a coefficient of determination ($R^2$)=0.99 using $k$-fold cross-validation, while the experimental model using raw data achieves an $R^2=0.94$ with a mean absolute error (MAE) below 6\%. Data augmentation further improves the accuracy of the experimental prediction to above $R^2=0.998$ (MAE$<$0.72\%). The proposed method enables rapid, non-destructive, accurate, low-cost, and real-time estimation of material fractions, illustrating strong potential for sensing applications in microwave material characterization.


[53] 2512.13243

Interference-Free RIS-Aided Cell-Free Massive MIMO with Physical Layer Security

In this paper, a reconfigurable intelligent surface (RIS) assisted cell free massive MIMO (CFmMIMO) framework is designed to enhance physical layer security (PLS) and mitigate multi user (MU) interference in next generation wireless networks. A channel state information (CSI) based precoder is designed at the access point (AP) to suppress MU interference, enabling interference free reception for the legitimate users. To further enhance secrecy performance, we formulate a joint optimization problem that maximizes the secrecy sum rate using an alternating optimization (AO) framework, which iteratively updates the active beamforming at the AP, user power allocation, and the RIS phase shift matrix. The highly nonconvex problem is addressed under the Riemannian manifold optimization (RMO) framework and solved using a Riemannian Conjugate Gradient (RCG) algorithm for RIS phase shift design. Simulation results verify that the proposed framework effectively enhances the secrecy sum rate and eliminates interference, demonstrating its potential for secure and scalable CFmMIMO networks in dense wireless environments.


[54] 2512.13263

An End-to-End Neural Network Transceiver Design for OFDM System with FPGA-Accelerated Implementation

The evolution toward sixth-generation (6G) wireless networks demands high-performance transceiver architectures capable of handling complex and dynamic environments. Conventional orthogonal frequency-division multiplexing (OFDM) receivers rely on cascaded discrete Fourier transform (DFT) and demodulation blocks, which are prone to inter-stage error propagation and suboptimal global performance. In this work, we propose two neural network (NN) models DFT-Net and Demodulation-Net (Demod-Net) to jointly replace the IDFT/DFT and demodulation modules in an OFDM transceiver. The models are trained end-to-end (E2E) to minimize bit error rate (BER) while preserving operator equivalence for hybrid deployment. A customized DFT-Demodulation Net Accelerator (DDNA) is further developed to efficiently map the proposed networks onto field-programmable gate array (FPGA) platforms. Leveraging fine-grained pipelining and block matrix operations, DDNA achieves high throughput and flexibility under stringent latency constraints. Experimental results show that the DL-based transceiver consistently outperforms the conventional OFDM system across multiple modulation schemes. With only a modest increase in hardware resource usage, it achieves approximately 1.5 dB BER gain and up to 66\% lower execution time.


[55] 2512.13331

A Multi-Worker Assembly Line Rebalancing with Spatial and Ergonomic Considerations

This work addresses the Assembly Line Rebalancing Problem in manual assembly systems where multiple workers operate in parallel within the same station - an industrially relevant scenario that remains insufficiently explored in the literature. A multi-objective optimization model is proposed that incorporates task reassignment, worker allocation, ergonomic evaluation, and explicit spatial feasibility through work-area constraints. The formulation minimizes deviations from the current configuration while promoting balanced workload and ergonomic conditions among workers. Computational experiments on synthetic problem instances demonstrate that the model consistently generates feasible and human-centered reconfigurations across varying cycle-time conditions, highlighting its potential as a decision-support tool for industrial rebalancing in flexible production environments.


[56] 2512.13344

Neural Control Barrier Functions for Signal Temporal Logic Specifications with Input Constraints

Signal Temporal Logic (STL) provides a powerful framework to describe complex tasks involving temporal and logical behavior in dynamical systems. In this work, we address the problem of synthesizing controllers for continuous-time systems under STL specifications with input constraints. We propose a neural network-based framework for synthesizing time-varying control barrier functions (TVCBF) and their corresponding controllers for systems to fulfill STL specifications while respecting input constraints. We formulate barrier conditions incorporating the spatial and temporal logic of the given STL specification. Additionally, we introduce a validity condition to provide formal safety guarantees across the entire state space. Finally, we demonstrate the effectiveness of the proposed approach through several simulation studies considering different STL tasks.


[57] 2512.13393

QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution

Unlicensed spectrum supports diverse traffic with stringent Quality-of-Service (QoS) requirements. In NR-U/Wi-Fi coexistence,the values of MAC parameters critically influence delay, collision behavior, and airtime fairness and efficiency. In this paper, we investigate the impact of (i) cost scaling and violation modeling, (ii) choice of MAC parameters, and (iii) an enhanced collision resolution scheme for the Listen-Before-Talk (LBT) mechanism on the performance of a state-augmented constrained reinforcement learning controller for NR-U/Wi-Fi coexistence. Coexistence control is formulated as a constrained Markov decision process with an explicit delay constraint for high-priority traffic and fairness as the optimization goal. Our simulation results show three key findings: (1) signed, threshold-invariant cost scaling with temporal smoothing stabilizes learning and strengthens long-term constraint adherence; (2) use of the contention window parameter for control provides smoother adaptation and better delay compliance than other MAC parameters; and (3) enhanced LBT significantly reduces collisions and improves airtime efficiency. These findings provide practical insights for achieving robust, QoS-aware coexistence control.


[58] 2512.13420

From Nodes to Edges: Edge-Based Laplacians for Brain Signal Processing

Traditional graph signal processing (GSP) methods applied to brain networks focus on signals defined on the nodes. Thus, they are unable to capture potentially important dynamics occurring on the edges. In this work, we adopt an edge-centric GSP approach to analyze edge signals constructed from 100 unrelated subjects of the Human Connectome Project. Specifically, we describe structural connectivity through the lens of the 1-dimensional Hodge Laplacian, processing signals defined on edges to capture co-fluctuation information between brain regions. We demonstrate that edge-based approaches achieve superior task decoding accuracy in static and dynamic scenarios compared to conventional node-based techniques, thereby unveiling unique aspects of brain functional organization. These findings underscore the promise of edge-focused GSP strategies for deepening our understanding of brain connectivity and functional dynamics.


[59] 2512.13434

Self-Supervised Ultrasound Representation Learning for Renal Anomaly Prediction in Prenatal Imaging

Prenatal ultrasound is the cornerstone for detecting congenital anomalies of the kidneys and urinary tract, but diagnosis is limited by operator dependence and suboptimal imaging conditions. We sought to assess the performance of a self-supervised ultrasound foundation model for automated fetal renal anomaly classification using a curated dataset of 969 two-dimensional ultrasound images. A pretrained Ultrasound Self-Supervised Foundation Model with Masked Autoencoding (USF-MAE) was fine-tuned for binary and multi-class classification of normal kidneys, urinary tract dilation, and multicystic dysplastic kidney. Models were compared with a DenseNet-169 convolutional baseline using cross-validation and an independent test set. USF-MAE consistently improved upon the baseline across all evaluation metrics in both binary and multi-class settings. USF-MAE achieved an improvement of about 1.87% (AUC) and 7.8% (F1-score) on the validation set, 2.32% (AUC) and 4.33% (F1-score) on the independent holdout test set. The largest gains were observed in the multi-class setting, where the improvement in AUC was 16.28% and 46.15% in F1-score. To facilitate model interpretability, Score-CAM visualizations were adapted for a transformer architecture and show that model predictions were informed by known, clinically relevant renal structures, including the renal pelvis in urinary tract dilation and cystic regions in multicystic dysplastic kidney. These results show that ultrasound-specific self-supervised learning can generate a useful representation as a foundation for downstream diagnostic tasks. The proposed framework offers a robust, interpretable approach to support the prenatal detection of renal anomalies and demonstrates the promise of foundation models in obstetric imaging.


[60] 2512.13439

Balancing Timeliness and Privacy in Discrete-Time Updating Systems

We study the trade-off between Age of Information (AoI) and maximal leakage (MaxL) in discrete-time status updating systems. A source generates time-stamped update packets that are processed by a server that delivers them to a monitor. An adversary, who eavesdrops on the server-monitor link, wishes to infer the timing of the underlying source update sequence. The server must balance the timeliness of the status information at the monitor against the timing information leaked to the adversary. We consider a model with Bernoulli source updates under two classes of Last-Come-First-Served (LCFS) service policies: (1) Coupled policies that tie the server's deliveries to the update arrival process in a preemptive queue; (2) Decoupled (dumping) policies in which the server transmits its freshest update according to a schedule that is independent of the update arrivals. For each class, we characterize the structure of the optimal policy that minimizes AoI for a given MaxL rate. Our analysis reveals that decoupled dumping policies offer a superior age-leakage trade-off to coupled policies. When subject to a MaxL constraint, we prove that the optimal dumping strategy is achieved by dithering between two adjacent deterministic dump periods.


[61] 2512.13482

Real-Time AI-Driven Milling Digital Twin Towards Extreme Low-Latency

Digital twin (DT) enables smart manufacturing by leveraging real-time data, AI models, and intelligent control systems. This paper presents a state-of-the-art analysis on the emerging field of DTs in the context of milling. The critical aspects of DT are explored through the lens of virtual models of physical milling, data flow from physical milling to virtual model, and feedback from virtual model to physical milling. Live data streaming protocols and virtual modeling methods are highlighted. A case study showcases the transformative capability of a real-time machine learning-driven live DT of tool-work contact in a milling process. Future research directions are outlined to achieve the goals of Industry 4.0 and beyond.


[62] 2512.13483

Impact analysis of hidden faults in nonlinear control systems using output-to-output gain

Networked control systems (NCSs) are vulnerable to faults and hidden malfunctions in communication channels that can degrade performance or even destabilize the closed loop. Classical metrics in robust control and fault detection typically treat impact and detectability separately, whereas the output-to-output gain (OOG) provides a unified measure of both. While existing results have been limited to linear systems, this paper extends the OOG framework to nonlinear NCSs with quadratically constrained nonlinearities, considering false-injection attacks that can also manipulate sensor measurements through nonlinear transformations. Specifically, we provide computationally efficient linear matrix inequality conditions and complementary frequency-domain tests that yield explicit upper bounds on the OOG of this class of nonlinear systems. Furthermore, we derive frequency-domain conditions for absolute stability of closed-loop systems, generalizing the Yakubovich quadratic criterion.


[63] 2512.13496

Identification of Technical Design Constraints and Considerations for Transmission Grid Expansion Planning Projects

The large-scale deployment of renewable energy sources, particularly offshore wind, requires large-scale transmission grid expansion projects to transmit the produced low-carbon power to the main demand centers. However, the planning and design of such complex projects currently lack a transparent and systematic process that system operators can follow when considering such investments in their grids. This paper identifies and classifies the main technical design constraints and considerations relevant to the planning of transmission grid expansion projects, and more specifically, electrical energy hubs. Seven key areas of interest are identified, namely network integration, HVDC technologies, costs (CAPEX, OPEX, and space requirements), electricity market design, future proofness and modular expandability, reliability-availability-maintainability, and sustainability. Each area of interest is analyzed in terms of its technical and operational relevance, with technical design constraints and considerations derived from such analysis. In addition, a hierarchical classification of the identified constraints and considerations (and therefore areas of interest) is introduced, distinguishing them between three criticality classes, namely hard constraints, main drivers, and key considerations. The dependencies between the different areas are discussed, too. Therefore, this work provides system operators and policymakers with a structured basis to support a transparent planning methodology with clear decision hierarchies for investments in transmission grid expansion projects.


[64] 2512.13504

An $H_2$-norm approach to performance analysis of networked control systems under multiplicative routing transformations

This paper investigates the performance of networked control systems subject to multiplicative routing transformations that alter measurement pathways without directly injecting signals. Such transformations, arising from faults or adversarial actions, modify the feedback structure and can degrade performance while remaining stealthy. An $H_2$-norm framework is proposed to quantify the impact of these transformations by evaluating the ratio between the steady-state energies of performance and residual outputs. Equivalent linear matrix inequality (LMI) formulations are derived for computational assessment, and analytical upper bounds are established to estimate the worst-case degradation. The results provide structural insight into how routing manipulations influence closed-loop behavior and reveal conditions for stealthy multiplicative attacks.


[65] 2512.13533

Interference Mitigation Recommender System using U-Net Autoencoders

Building on the previous work on interference mitigation, this paper introduces a modular recommender system that automatically selects the most effective interference mitigation strategy based on the interference characteristics present in the received signal. The system integrates three key stages: an SPS classifier module, a SIR predictor, and a bank of specialized U-Net autoencoders designed for different interference conditions. The classification block identifies the parameters required for cancellation. The recommender then directs the signal to the appropriate mitigation model, optionally incorporating SIR-based decisions for scenarios where successive interference cancellation may be advantageous. Experiments conducted across diverse SIR levels and modulation environments show that the recommender strategy improves robustness and reduces BER compared to using any single mitigation method alone. The results demonstrate the potential of adaptive, model-selective architectures to enhance interference resilience in dynamic communication environments.


[66] 2512.13542

On the Ability of Deep Learning to Detect Signals with Unknown Parameters

In many signal processing applications, including communications, sonar, radar, and localization, a fundamental problem is the detection of a signal of interest in background noise, known as signal detection [1] [2]. A simple version of this problem is the detection of a signal of interest with unknown parameters in Additive White Gaussian Noise (AWGN). When the parameters defining the signal are not known, an optimal detector (in the Neyman-Pearson sense) does not exist. An upper bound on the performance of any detector is the matched filter, which implies perfect sample by sample knowledge of the signal of interest. In recent years Deep Neural Networks (DNNs) have proven to be very effective at hypothesis testing problems such as object detection and image classification. This paper examines the application of DNN-based approaches to the signal detection problem at the raw I/Q level and compares them to statistically based approaches as well as the Matched Filter. These methods aim to maximize the Probability of Detection Pd while maintaining a constant Probability of False Alarm PF A. Two Machine Learning (ML) algorithms are trained and assessed on this signal detection problem, across three signal of interest models. A model was also trained on a unified dataset and assessed across all signals of interest.


[67] 2512.13545

Competent Discrete Time Modeling For analogue controlled PWM Converter Considering State-Feedback

Ever since this http URL proposed the state space averaging notion. The small signal model has been widely used as a design tool to tune control parameters. As Moore's law is continuing and the AI chip's high demand for power consumption and dynamic response, the control bandwidth needs to be boosted. However, the average model has two basic assumptions: the low-frequency assumption, the small ripple assumption. In high-bandwidth design, these two assumptions are violated. In order to solve this, various methods have been proposed. This paper gives a comprehensive overview of the existing small signal model for PWM converters from the following perspectives: 1. model fidelity, 2. analytical tractability. 3. complexity of the derivation process and result this http URL.


[68] 2512.13557

Bidding Aggregated Flexibility in European Electricity Auctions

Bidding flexibility in day-ahead and intraday auctions would enable decentralized flexible resources, such as electric vehicles and heat pumps, to efficiently align their consumption with the intermittent generation of renewable energy. However, because these resources are individually too small to participate in those auctions directly, an aggregator (e.g., a utility) must act on their behalf. This requires aggregating many decentralized resources, which is a computationally challenging task. In this paper, we propose a computationally efficient and highly accurate method that is readily applicable to European day-ahead and intraday auctions. Distinct from existing methods, we aggregate only economically relevant power profiles, identified through price forecasts. The resulting flexibility is then conveyed to the market operator via exclusive groups of block bids. We evaluate our method for a utility serving the Swiss town of Losone, where flexibility from multiple heat pumps distributed across the grid must be aggregated and bid in the Swiss day-ahead auction. Results show that our method aggregates accurately, achieving 98% of the theoretically possible cost savings. This aggregation accuracy remains stable even as the number of heat pumps increases, while computation time grows only linearly, demonstrating strong scalability.


[69] 2512.13605

A new data weighted averaging algorithm to reduce tones in the signal band

Digital/Analog converters based on sigma-delta modulation are simple and unexpensive circuits featuring a signal bandwidth limited by speed constraints. Multi-bit modulators allow balancing complexity and speed by reducing the clock frequency and increasing the number of levels in the quantizer. In this case, the multi-bit digital to analog block (DAC) can reduce the performance of the entire system. Data Weighted Averaging (DWA) methods have been proposed to reduce the vulnerability to DAC errors at the cost of spurious tones in the signal band. This work analyzes the tone producing mechanism and proposes a modification of the DWA to remove spurious tones.


[70] 2512.13647

REVERB-FL: Server-Side Adversarial and Reserve-Enhanced Federated Learning for Robust Audio Classification

Federated learning (FL) enables a privacy-preserving training paradigm for audio classification but is highly sensitive to client heterogeneity and poisoning attacks, where adversarially compromised clients can bias the global model and hinder the performance of audio classifiers. To mitigate the effects of model poisoning for audio signal classification, we present REVERB-FL, a lightweight, server-side defense that couples a small reserve set (approximately 5%) with pre- and post-aggregation retraining and adversarial training. After each local training round, the server refines the global model on the reserve set with either clean or additional adversarially perturbed data, thereby counteracting non-IID drift and mitigating potential model poisoning without adding substantial client-side cost or altering the aggregation process. We theoretically demonstrate the feasibility of our framework, showing faster convergence and a reduced steady-state error relative to baseline federated averaging. We validate our framework on two open-source audio classification datasets with varying IID and Dirichlet non-IID partitions and demonstrate that REVERB-FL mitigates global model poisoning under multiple designs of local data poisoning.


[71] 2512.13652

Performance Limits of Hardware-Constrained THz Inter-Satellite MIMO-ISAC Systems

Terahertz inter-satellite links (THz-ISL) offer unprecedented bandwidth for future space networks but face fundamental constraints from onboard power and thermal budgets. This paper establishes theoretical performance limits for MIMO Integrated Sensing and Communication (ISAC) systems under per-element constant-envelope (CE) transmission constraints. We demonstrate that hardware distortions -- specifically power amplifier nonlinearity, ADC quantization, and oscillator phase noise -- impose a capacity ceiling that cannot be overcome by increasing transmit power. A unified link budget framework integrates wideband beam squint, aperture pointing errors, and colored noise sources through a spectral consistency principle that ensures residual phase noise is counted exactly once across communication and sensing analyses. The sensing bounds are derived via the Whittle-Fisher Information Matrix under a Constant Acceleration kinematic model with jerk noise, yielding closed-form scaling laws: residual phase noise variance scales as $\alpha^{-1}$ while dynamic state-estimation error (DSE) variance scales as $\alpha^{-5}$ with pilot overhead $\alpha$. Numerical results show divergent MIMO scaling: sensing precision improves with array size ($\mathrm{RMSE} \propto 1/\sqrt{N_t N_r}$), while the critical SNR exhibits scale invariance regarding array size, implying that the distortion-limited transition point stabilizes regardless of the array scale. The steep $\alpha^{-5}$ DSE scaling creates an operationally infeasible region at $\alpha < \alpha^* \approx 0.16$, where $\alpha^* = (C_{\mathrm{DSE}}/C_{\mathrm{PN}})^{1/4}$ -- a constraint-driven threshold under the adopted baseline for LEO operation. These findings provide design guidelines for hardware-efficient THz-ISL constellations.


[72] 2506.11176

Model Discovery and Graph Simulation: A Lightweight Gateway to Chaos Engineering

Chaos engineering reveals resilience risks but is expensive and operationally risky to run broadly and often. Model-based analyses can estimate dependability, yet in practice they are tricky to build and keep current because models are typically handcrafted. We claim that a simple connectivity-only topological model - just the service-dependency graph plus replica counts - can provide fast, low-risk availability estimates under fail-stop faults. To make this claim practical without hand-built models, we introduce model discovery: an automated step that can run in CI/CD or as an observability-platform capability, synthesizing an explicit, analyzable model from artifacts teams already have (e.g., distributed traces, service-mesh telemetry, configs/manifests) - providing an accessible gateway for teams to begin resilience testing. As a proof by instance on the DeathStarBench Social Network, we extract the dependency graph from Jaeger and estimate availability across two deployment modes and five failure rates. The discovered model closely tracks live fault-injection results; with replication, median error at mid-range failure rates is near zero, while no-replication shows signed biases consistent with excluded mechanisms. These results create two opportunities: first, to triage and reduce the scope of expensive chaos experiments in advance, and second, to generate real-time signals on the system's resilience posture as its topology evolves, preserving live validation for the most critical or ambiguous scenarios.


[73] 2511.01411

Extremal Contours: Gradient-driven contours for compact visual attribution

Faithful yet compact explanations for vision models remain a challenge, as commonly used dense perturbation masks are often fragmented and overfitted, needing careful post-processing. Here, we present a training-free explanation method that replaces dense masks with smooth tunable contours. A star-convex region is parameterized by a truncated Fourier series and optimized under an extremal preserve/delete objective using the classifier gradients. The approach guarantees a single, simply connected mask, cuts the number of free parameters by orders of magnitude, and yields stable boundary updates without cleanup. Restricting solutions to low-dimensional, smooth contours makes the method robust to adversarial masking artifacts. On ImageNet classifiers, it matches the extremal fidelity of dense masks while producing compact, interpretable regions with improved run-to-run consistency. Explicit area control also enables importance contour maps, yielding a transparent fidelity-area profiles. Finally, we extend the approach to multi-contour and show how it can localize multiple objects within the same framework. Across benchmarks, the method achieves higher relevance mass and lower complexity than gradient and perturbation based baselines, with especially strong gains on self-supervised DINO models where it improves relevance mass by over 15% and maintains positive faithfulness correlations.


[74] 2512.11826

FSL-HDnn: A 40 nm Few-shot On-Device Learning Accelerator with Integrated Feature Extraction and Hyperdimensional Computing

This paper introduces FSL-HDnn, an energy-efficient accelerator that implements the end-to-end pipeline of feature extraction and on-device few-shot learning (FSL). The accelerator addresses fundamental challenges of on-device learning (ODL) for resource-constrained edge applications through two synergistic modules: a parameter-efficient feature extractor employing weight clustering and an FSL classifier based on hyperdimensional computing (HDC). The feature extractor exploits the weight clustering mechanism to reduce computational complexity, while the HDC-based FSL classifier eliminates gradient-based back propagation operations, enabling single-pass training with substantially reduced latency. Additionally, FSL-HDnn enables low-latency ODL and inference via two proposed optimization strategies, including an early-exit mechanism with branch feature extraction and batched single-pass training that improves hardware utilization. Measurement results demonstrate that our chip fabricated in a 40 nm CMOS process delivers superior training energy efficiency of 6 mJ/image and end-to-end training throughput of 28 images/s on a 10-way 5-shot FSL task. The end-to-end training latency is also reduced by 2x to 20.9x compared to state-of-the-art ODL chips.


[75] 2512.11858

Adaptive Path Integral Diffusion: AdaPID

Diffusion-based samplers -- Score Based Diffusions, Bridge Diffusions and Path Integral Diffusions -- match a target at terminal time, but the real leverage comes from choosing the schedule that governs the intermediate-time dynamics. We develop a path-wise schedule -- selection gramework for Harmonic PID with a time-varying stiffness, exploiting Piece-Wise-Constant(PWC) parametrizations and a simple hierarchical refinement. We introduce schedule-sensitive Quality-of-Sampling (QoS) diagnostics. Assuming a Gaussian-Mixture (GM) target, we retain closed-form Green functions' ration and numerically stable, Neural-Network free oracles for predicted-state maps and score. Experiments in 2D show that QoS driven PWC schedules consistently improve early-exit fidelity, tail accuracy, conditioning of the dynamics, and speciation (label-selection) timing at fixed integration budgets.


[76] 2512.11859

Generative Stochastic Optimal Transport: Guided Harmonic Path-Integral Diffusion

We introduce Guided Harmonic Path-Integral Diffusion (GH-PID), a linearly-solvable framework for guided Stochastic Optimal Transport (SOT) with a hard terminal distribution and soft, application-driven path costs. A low-dimensional guidance protocol shapes the trajectory ensemble while preserving analytic structure: the forward and backward Kolmogorov equations remain linear, the optimal score admits an explicit Green-function ratio, and Gaussian-Mixture Model (GMM) terminal laws yield closed-form expressions. This enables stable sampling and differentiable protocol learning under exact terminal matching. We develop guidance-centric diagnostics -- path cost, centerline adherence, variance flow, and drift effort -- that make GH-PID an interpretable variational ansatz for empirical SOT. Three navigation scenarios illustrated in 2D: (i) Case A: hand-crafted protocols revealing how geometry and stiffness shape lag, curvature effects, and mode evolution; (ii) Case B: single-task protocol learning, where a PWC centerline is optimized to minimize integrated cost; (iii) Case C: multi-expert fusion, in which a commander reconciles competing expert/teacher trajectories and terminal beliefs through an exact product-of-experts law and learns a consensus protocol. Across all settings, GH-PID generates geometry-aware, trust-aware trajectories that satisfy the prescribed terminal distribution while systematically reducing integrated cost.


[77] 2512.11867

On the Dangers of Bootstrapping Generation for Continual Learning and Beyond

The use of synthetically generated data for training models is becoming a common practice. While generated data can augment the training data, repeated training on synthetic data raises concerns about distribution drift and degradation of performance due to contamination of the dataset. We investigate the consequences of this bootstrapping process through the lens of continual learning, drawing a connection to Generative Experience Replay (GER) methods. We present a statistical analysis showing that synthetic data introduces significant bias and variance into training objectives, weakening the reliability of maximum likelihood estimation. We provide empirical evidence showing that popular generative models collapse under repeated training with synthetic data. We quantify this degradation and show that state-of-the-art GER methods fail to maintain alignment in the latent space. Our findings raise critical concerns about the use of synthetic data in continual learning.


[78] 2512.11876

Traversability Aware Autonomous Navigation for Multi-Modal Mobility Morphobot (M4)

Autonomous navigation in unstructured environments requires robots to assess terrain difficulty in real-time and plan paths that balance efficiency with safety. This thesis presents a traversability-aware navigation framework for the M4 robot platform that uses learned terrain analysis to generate energy-efficient paths avoiding difficult this http URL approach uses FAST-LIO for real-time localization, generating 2.5D elevation maps from LiDAR point clouds. A CNN-based model processes these elevation maps to estimate traversability scores, which are converted into navigation costs for path planning. A custom A* planner incorporates these costs alongside geometric distance and energy consumption to find paths that trade modest distance increases for substantial terrain quality improvements. Before system development, a platform-agnostic study compared LiDAR-based and camera-based SLAM using OptiTrack ground truth. Point cloud comparison through ICP alignment and cloud-to-mesh distance analysis demonstrated that LiDAR-based mapping achieves centimeter-level precision essential for elevation mapping, while camera-based approaches exhibited significantly higher geometric error. These findings directly resulted in the selection of LiDAR as the primary sensor to generate elevation maps. The complete pipeline integrates FAST-LIO localization, GPU-accelerated elevation mapping, CNN-based traversability estimation, and Nav2 navigation with a custom traversability-aware planner. Experimental results demonstrate that the system successfully avoids low traversability regions and accepts a few longer paths to achieve a reduction in terrain cost. This work establishes a foundation for intelligent terrain-aware navigation applicable to multi-modal robotic platforms.


[79] 2512.11891

VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer

Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in generalizing across diverse robotic manipulation tasks. However, deploying these models in unstructured environments remains challenging due to the critical need for simultaneous task compliance and safety assurance, particularly in preventing potential collisions during physical interactions. In this work, we introduce a Vision-Language-Safe Action (VLSA) architecture, named AEGIS, which contains a plug-and-play safety constraint (SC) layer formulated via control barrier functions. AEGIS integrates directly with existing VLA models to improve safety with theoretical guarantees, while maintaining their original instruction-following performance. To evaluate the efficacy of our architecture, we construct a comprehensive safety-critical benchmark SafeLIBERO, spanning distinct manipulation scenarios characterized by varying degrees of spatial complexity and obstacle intervention. Extensive experiments demonstrate the superiority of our method over state-of-the-art baselines. Notably, AEGIS achieves a 59.16% improvement in obstacle avoidance rate while substantially increasing the task execution success rate by 17.25%. To facilitate reproducibility and future research, we make our code, models, and the benchmark datasets publicly available at this https URL.


[80] 2512.12013

Exploring Spatial-Temporal Representation via Star Graph for mmWave Radar-based Human Activity Recognition

Human activity recognition (HAR) requires extracting accurate spatial-temporal features with human movements. A mmWave radar point cloud-based HAR system suffers from sparsity and variable-size problems due to the physical features of the mmWave signal. Existing works usually borrow the preprocessing algorithms for the vision-based systems with dense point clouds, which may not be optimal for mmWave radar systems. In this work, we proposed a graph representation with a discrete dynamic graph neural network (DDGNN) to explore the spatial-temporal representation of human movement-related features. Specifically, we designed a star graph to describe the high-dimensional relative relationship between a manually added static center point and the dynamic mmWave radar points in the same and consecutive frames. We then adopted DDGNN to learn the features residing in the star graph with variable sizes. Experimental results demonstrated that our approach outperformed other baseline methods using real-world HAR datasets. Our system achieved an overall classification accuracy of 94.27\%, which gets the near-optimal performance with a vision-based skeleton data accuracy of 97.25\%. We also conducted an inference test on Raspberry Pi~4 to demonstrate its effectiveness on resource-constraint platforms. \sh{ We provided a comprehensive ablation study for variable DDGNN structures to validate our model design. Our system also outperformed three recent radar-specific methods without requiring resampling or frame aggregators.


[81] 2512.12021

Modified Hybrid A* Collision-Free Path-Planning for Automated Reverse Parking

Parking a vehicle in tight spaces is a challenging task to perform due to the scarcity of feasible paths that are also collision-free. This paper presents a strategy to tackle this kind of maneuver with a modified Hybrid-A* path-planning algorithm that combines the feasibility guarantee inherent in the standard Hybrid A* algorithm with the addition of static obstacle collision avoidance. A kinematic single-track model is derived to describe the low-speed motion of the vehicle, which is subsequently used as the motion model in the Hybrid A* path-planning algorithm to generate feasible motion primitive branches. The model states are also used to reconstruct the vehicle centerline, which, in conjunction with an inflated binary occupancy map, facilitates static obstacle collision avoidance functions. Simulation study and animation are set up to test the efficacy of the approach, and the proposed algorithm proves to consistently provide kinematically feasible trajectories that are also collision-free.


[82] 2512.12031

Differentially Private Community Detection in $h$-uniform Hypergraphs

This paper studies the exact recovery threshold subject to preserving the privacy of connections in $h$-uniform hypergraphs. Privacy is characterized by the $(\epsilon, \delta)$-hyperedge differential privacy (DP), an extension of the notion of $(\epsilon, \delta)$-edge DP in the literature. The hypergraph observations are modeled through a $h$-uniform stochastic block model ($h$-HSBM) in the dense regime. We investigate three differentially private mechanisms: stability-based, sampling-based, and perturbation-based mechanisms. We calculate the exact recovery threshold for each mechanism and study the contraction of the exact recovery region due to the privacy budget, $(\epsilon, \delta)$. Sampling-based mechanisms and randomized response mechanisms guarantee pure $\epsilon$-hyperedge DP where $\delta=0$, while the stability-based mechanisms cannot achieve this level of privacy. The dependence of the limits of the privacy budget on the parameters of the $h$-uniform hypergraph is studied. More precisely, it is proven rigorously that the minimum privacy budget scales logarithmically with the ratio between the density of in-cluster hyperedges and the cross-cluster hyperedges for stability-based and Bayesian sampling-based mechanisms, while this budget depends only on the size of the hypergraph for the randomized response mechanism.


[83] 2512.12046

Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning

Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce Eik-Hierarchical QRL (Eik-HiQRL), which integrates Eik-QRL into a hierarchical decomposition. Empirically, Eik-HiQRL achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.


[84] 2512.12111

Fractional Calculus in Optimal Control and Game Theory: Theory, Numerics, and Applications -- A Survey

Many physical, biological, and engineered systems exhibit memory effects that challenge Markovian models. Fractional calculus provides nonlocal operators to capture hereditary dynamics. This survey connects modeling, analysis, and controller/game design for systems with memory. We unify notation for Caputo, Riemann-Liouville, and Grunwald-Letnikov derivatives and relate them to practical approximations, including diffusive (sum-of-exponentials) state augmentation and frequency-domain realizations (e.g., Oustaloup). We review fractional extensions of the calculus of variations and the Pontryagin maximum principle, and dynamic-programming formulations with memory, including path-dependent HJB for optimal control and HJI for zero-sum games. We cover design tools such as LQR, MPC, and fractional-order PID, as well as fractional differential games with Nash, Stackelberg, and minimax equilibria. Computational approaches are compared across time-domain schemes, frequency-domain approximations, and diffusive augmentations, highlighting accuracy-complexity trade-offs and remedies for the curse of history (windowing and sum-of-exponentials). We conclude with applications and open problems on equilibria with memory, Isaacs-type conditions, constraint handling, and scalable solvers.


[85] 2512.12196

AutoMV: An Automatic Multi-Agent System for Music Video Generation

Music-to-Video (M2V) generation for full-length songs faces significant challenges. Existing methods produce short, disjointed clips, failing to align visuals with musical structure, beats, or lyrics, and lack temporal consistency. We propose AutoMV, a multi-agent system that generates full music videos (MVs) directly from a song. AutoMV first applies music processing tools to extract musical attributes, such as structure, vocal tracks, and time-aligned lyrics, and constructs these features as contextual inputs for following agents. The screenwriter Agent and director Agent then use this information to design short script, define character profiles in a shared external bank, and specify camera instructions. Subsequently, these agents call the image generator for keyframes and different video generators for "story" or "singer" scenes. A Verifier Agent evaluates their output, enabling multi-agent collaboration to produce a coherent longform MV. To evaluate M2V generation, we further propose a benchmark with four high-level categories (Music Content, Technical, Post-production, Art) and twelve ine-grained criteria. This benchmark was applied to compare commercial products, AutoMV, and human-directed MVs with expert human raters: AutoMV outperforms current baselines significantly across all four categories, narrowing the gap to professional MVs. Finally, we investigate using large multimodal models as automatic MV judges; while promising, they still lag behind human expert, highlighting room for future work.


[86] 2512.12211

Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving

Being able to anticipate the motion of surrounding agents is essential for the safe operation of autonomous driving systems in dynamic situations. While various methods have been proposed for trajectory prediction, the current evaluation practices still rely on error-based metrics (e.g., ADE, FDE), which reveal the accuracy from a post-hoc view but ignore the actual effect the predictor brings to the self-driving vehicles (SDVs), especially in complex interactive scenarios: a high-quality predictor not only chases accuracy, but should also captures all possible directions a neighbor agent might move, to support the SDVs' cautious decision-making. Given that the existing metrics hardly account for this standard, in our work, we propose a comprehensive pipeline that adaptively evaluates the predictor's performance by two dimensions: accuracy and diversity. Based on the criticality of the driving scenario, these two dimensions are dynamically combined and result in a final score for the predictor's performance. Extensive experiments on a closed-loop benchmark using real-world datasets show that our pipeline yields a more reasonable evaluation than traditional metrics by better reflecting the correlation of the predictors' evaluation with the autonomous vehicles' driving performance. This evaluation pipeline shows a robust way to select a predictor that potentially contributes most to the SDV's driving performance.


[87] 2512.12314

Evaluating Asynchronous Semantics in Trace-Discovered Resilience Models: A Case Study on the OpenTelemetry Demo

While distributed tracing and chaos engineering are becoming standard for microservices, resilience models remain largely manual and bespoke. We revisit a trace-discovered connectivity model that derives a service dependency graph from traces and uses Monte Carlo simulation to estimate endpoint availability under fail-stop service failures. Compared to earlier work, we (i) derive the graph directly from raw OpenTelemetry traces, (ii) attach endpoint-specific success predicates, and (iii) add a simple asynchronous semantics that treats Kafka edges as non-blocking for immediate HTTP success. We apply this model to the OpenTelemetry Demo ("Astronomy Shop") using a GitHub Actions workflow that discovers the graph, runs simulations, and executes chaos experiments that randomly kill microservices in a Docker Compose deployment. Across the studied failure fractions, the model reproduces the overall availability degradation curve, while asynchronous semantics for Kafka edges change predicted availabilities by at most about 10^(-5) (0.001 percentage points). This null result suggests that for immediate HTTP availability in this case study, explicitly modeling asynchronous dependencies is not warranted, and a simpler connectivity-only model is sufficient.


[88] 2512.12366

ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) Systems

Diverse emerging VR applications integrate streaming of high fidelity 360 video content that requires ample amounts of computation and data rate. Scalable 360 video tiling enables having elastic VR computational tasks that can be scaled adaptively in computation and data rate based on the available user and system resources. We integrate scalable 360 video tiling in an edge-client wireless multi-connectivity architecture for joint elastic task computation offloading across multiple VR users called ElasticVR. To balance the trade-offs in communication, computation, energy consumption, and QoE that arise herein, we formulate a constrained QoE and energy optimization problem that integrates the multi-user/multi-connectivity action space with the elasticity of VR computational tasks. The ElasticVR framework introduces two multi-agent deep reinforcement learning solutions, namely CPPG and IPPG. CPPG adopts a centralized training and centralized execution approach to capture the coupling between users' communication and computational demands. This leads to globally coordinated decisions at the cost of increased computational overheads and limited scalability. To address the latter challenges, we also explore an alternative strategy denoted IPPG that adopts a centralized training with decentralized execution paradigm. IPPG leverages shared information and parameter sharing to learn robust policies; however, during execution, each user takes action independently based on its local state information only. The decentralized execution alleviates the communication and computation overhead of centralized decision-making and improves scalability. We show that the ElasticVR framework improves the PSNR by 43.21%, while reducing the response time and energy consumption by 42.35% and 56.83%, respectively, compared with a case where no elasticity is incorporated into VR computations.


[89] 2512.12427

Unifying Quadrotor Motion Planning and Control by Chaining Different Fidelity Models

Many aerial tasks involving quadrotors demand both instant reactivity and long-horizon planning. High-fidelity models enable accurate control but are too slow for long horizons; low-fidelity planners scale but degrade closed-loop performance. We present Unique, a unified MPC that cascades models of different fidelity within a single optimization: a short-horizon, high-fidelity model for accurate control, and a long-horizon, low-fidelity model for planning. We align costs across horizons, derive feasibility-preserving thrust and body-rate constraints for the point-mass model, and introduce transition constraints that match the different states, thrust-induced acceleration, and jerk-body-rate relations. To prevent local minima emerging from nonsmooth clutter, we propose a 3D progressive smoothing schedule that morphs norm-based obstacles along the horizon. In addition, we deploy parallel randomly initialized MPC solvers to discover lower-cost local minima on the long, low-fidelity horizon. In simulation and real flights, under equal computational budgets, Unique improves closed-loop position or velocity tracking by up to 75% compared with standard MPC and hierarchical planner-tracker baselines. Ablations and Pareto analyses confirm robust gains across horizon variations, constraint approximations, and smoothing schedules.


[90] 2512.12518

Robustness analysis in static and dynamic quantum state tomography

Quantum state tomography is a core task in quantum system identification. Real experimental conditions often deviate from nominal designs, introducing errors in both the measurement devices and the Hamiltonian governing the system's dynamics. In this paper, we investigate the robustness of quantum state tomography against such perturbations in both static and dynamic settings using linear regression estimation. We derive explicit bounds that quantify how bounded errors in the measurement devices and the Hamiltonian affect the mean squared error (MSE) upper bound in each scenario. Numerical simulations for qubit systems illustrate how these bounds scale with resources.


[91] 2512.12590

Automatic Wire-Harness Color Sequence Detector

Wire harness inspection process remains a labor-intensive process prone to errors in the modern Electronics Manufacturing Services (EMS) industry. This paper introduces a semiautomated machine vision system capable of verifying correct wire positioning, correctness of the connector polarity and correctness of color sequences for both linear and circular wire harness configurations. Five industrial standard CMOS cameras are integrated into a modularized mechanical framework in the physical structure of the solution and a HSV and RGB color domain value comparison based color sequence classifier is used in the operation. For each harness batch, a user can train the system using at least five reference samples; the trained file is stored and reused for similar harness types. The Solution is deployed at GPV Lanka Pvt. Ltd. (Fig. 2) and the system achieved 100% detection accuracy and reduced inspection time by 44% compared to manual methods. Additional features include user management, adjustable lighting, session data storage, and secure login. Results of this product usage in the real world situation demonstrate that this approach delivers reliable and efficient inspection capabilities.


[92] 2512.12649

Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller

Parameter tuning in real-world experiments is constrained by the limited evaluation budget available on hardware. The path-following controller studied in this paper reflects a typical situation in nonlinear geometric controller, where multiple gains influence the dynamics through coupled nonlinear terms. Such interdependence makes manual tuning inefficient and unlikely to yield satisfactory performance within a practical number of trials. To address this challenge, we propose a Bayesian optimization (BO) framework that treats the closed-loop system as a black box and selects controller gains using a Gaussian-process surrogate. BO offers model-free exploration, quantified uncertainty, and data-efficient search, making it well suited for tuning tasks where each evaluation is costly. The framework is implemented on Honda's AI-Formula three-wheeled robot and assessed through repeated full-lap experiments on a fixed test track. The results show that BO improves controller performance within 32 trials, including 15 warm-start initial evaluations, indicating that it can efficiently locate high-performing regions of the parameter space under real-world conditions. These findings demonstrate that BO provides a practical, reliable, and data-efficient tuning approach for nonlinear path-following controllers on real robotic platforms.


[93] 2512.12736

Personalized QoE Prediction: A Demographic-Augmented Machine Learning Framework for 5G Video Streaming Networks

Quality of Experience (QoE) prediction is a critical component of modern multimedia systems, particularly for adaptive video streaming in 5G networks. Accurate QoE estimation enables intelligent resource management and supports user centric service delivery. Existing QoE prediction approaches primarily rely on limited datasets and assume uniform user perception, which restricts their applicability in heterogeneous real world environments. This paper proposes a demographic aware machine learning framework for personalized QoE prediction. We introduce a behaviorally realistic demographic based data augmentation strategy that expands a small QoE dataset six fold by modeling varying user sensitivities to streaming impairments such as rebuffering, bitrate variation, and quality degradation. Using the augmented dataset, we evaluate a comprehensive set of classical machine learning models alongside advanced deep learning architectures, including an attention-based MLP and TabNet. Experimental results demonstrate significant improvements in prediction accuracy across RMSE, MAE, and R metrics compared to baseline models. Among all evaluated approaches, TabNet achieves the strongest performance, benefiting from its inherent feature selection and attention mechanisms. The results confirm that demographic-aware augmentation substantially enhances QoE prediction robustness and provides a scalable direction for personalized QoE-aware intelligence in 5G video streaming networks.


[94] 2512.12758

From Information Freshness to Semantics of Information and Goal-oriented Communications

Future wireless networks must support real-time, data-driven cyber-physical systems in which communication is tightly coupled with sensing, inference, control, and decision-making. Traditional communication paradigms centered on accuracy, throughput, and latency are increasingly inadequate for these systems, where the value of information depends on its semantic relevance to a specific task. This paper provides a unified exposition of the progression from classical distortion-based frameworks, through information freshness metrics such as the Age of Information (AoI) and its variants, to the emerging paradigm of goal-oriented semantics-aware communication. We organize and systematize existing semantics-aware metrics, including content- and version-aware measures, context-dependent distortion formulations, and history-dependent error persistence metrics that capture lasting impact and urgency. Within this framework, we highlight how these metrics address the limitations of purely accuracy- or freshness-centric designs, and how they collectively enable the selective generation and transmission of only task-relevant information. We further review analytical tools based on Markov decision process (MDP) and Lyapunov optimization methods that have been employed to characterize optimal or near-optimal timing and scheduling policies under semantic performance criteria and communication constraints. By synthesizing these developments into a coherent framework, the paper clarifies the design principles underlying goal-oriented, semantics-aware communication systems. It illustrates how they can significantly improve efficiency, reliability, and task performance. The presented perspective aims to serve as a bridge between information-theoretic, control-theoretic, and networking viewpoints, and to guide the design of semantic communication architectures for 6G and beyond.


[95] 2512.12850

KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation

Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.


[96] 2512.12855

MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems

Modern engineering systems, such as autonomous vehicles, flexible robotics, and intelligent aerospace platforms, require controllers that are robust to uncertainties, adaptive to environmental changes, and safety-aware under real-time constraints. RL offers powerful data-driven adaptability for systems with nonlinear dynamics that interact with uncertain environments. RL, however, lacks built-in mechanisms for dynamic constraint satisfaction during exploration. MPC offers structured constraint handling and robustness, but its reliance on accurate models and computationally demanding online optimization may pose significant challenges. This paper proposes an integrated MPC-RL framework that combines stability and safety guarantees of MPC with the adaptability of RL. During training, MPC defines safe control bounds that guide the RL component and that enable constraint-aware policy learning. At deployment, the learned policy operates in real time with a lightweight safety filter based on Lipschitz continuity to ensure constraint satisfaction without heavy online optimizations. The approach, which is validated on a nonlinear aeroelastic wing system, demonstrates improved disturbance rejection, reduced actuator effort, and robust performance under turbulence. The architecture generalizes to other domains with structured nonlinearities and bounded disturbances, offering a scalable solution for safe artificial-intelligence-driven control in engineering applications.


[97] 2512.13144

Weight Space Correlation Analysis: Quantifying Feature Utilization in Deep Learning Models

Deep learning models in medical imaging are susceptible to shortcut learning, relying on confounding metadata (e.g., scanner model) that is often encoded in image embeddings. The crucial question is whether the model actively utilizes this encoded information for its final prediction. We introduce Weight Space Correlation Analysis, an interpretable methodology that quantifies feature utilization by measuring the alignment between the classification heads of a primary clinical task and auxiliary metadata tasks. We first validate our method by successfully detecting artificially induced shortcut learning. We then apply it to probe the feature utilization of an SA-SonoNet model trained for Spontaneous Preterm Birth (sPTB) prediction. Our analysis confirmed that while the embeddings contain substantial metadata, the sPTB classifier's weight vectors were highly correlated with clinically relevant factors (e.g., birth weight) but decoupled from clinically irrelevant acquisition factors (e.g. scanner). Our methodology provides a tool to verify model trustworthiness, demonstrating that, in the absence of induced bias, the clinical model selectively utilizes features related to the genuine clinical signal.


[98] 2512.13170

Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks

Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper presents an iterative learning framework for automatic tuning of Nonlinear Model Predictive Control (NMPC) weighting matrices based on task-level performance feedback. Inspired by norm-optimal Iterative Learning Control (ILC), the proposed method adaptively adjusts NMPC weights Q and R across task repetitions to minimize key performance indicators (KPIs) related to tracking accuracy, control effort, and saturation. Unlike gradient-based approaches that require differentiating through the NMPC solver, we construct an empirical sensitivity matrix, enabling structured weight updates without analytic derivatives. The framework is validated through simulation on a UR10e robot performing carbon fiber winding on a tetrahedral core. Results demonstrate that the proposed approach converges to near-optimal tracking performance (RMSE within 0.3% of offline Bayesian Optimization (BO)) in just 4 online repetitions, compared to 100 offline evaluations required by BO algorithm. The method offers a practical solution for adaptive NMPC tuning in repetitive robotic tasks, combining the precision of carefully optimized controllers with the flexibility of online adaptation.


[99] 2512.13183

Efficient Generation of Smooth Paths with Curvature Guarantees by Mollification

Most path following and trajectory tracking algorithms in mobile robotics require the desired path or trajectory to be defined by at least twice continuously differentiable functions to guarantee key properties such as global convergence, especially for nonholonomic robots like unicycles with speed constraints. Consequently, these algorithms typically exclude continuous but non-differentiable paths, such as piecewise functions. Despite this exclusion, such paths provide convenient high-level inputs for describing robot missions or behavior. While techniques such as spline interpolation or optimization-based methods are commonly used to smooth non-differentiable paths or create feasible ones from sequences of waypoints, they either can produce unnecessarily complex trajectories or are computationally expensive. In this work, we present a method to regularize non-differentiable functions and generate feasible paths through mollification. Specifically, we approximate an arbitrary path with a differentiable function that can converge to it with arbitrary precision. Additionally, we provide a systematic method for bounding the curvature of generated paths, which we demonstrate by applying it to paths resulting from linking a sequence of waypoints with segments. The proposed approach is computationally efficient, enabling real-time implementation on microcontrollers and compatibility with standard trajectory tracking and path following algorithms.


[100] 2512.13214

Differentiable Material Point Method for the Control of Deformable Objects

Controlling the deformation of flexible objects is challenging due to their non-linear dynamics and high-dimensional configuration space. This work presents a differentiable Material Point Method (MPM) simulator targeted at control applications. We exploit the differentiability of the simulator to optimize a control trajectory in an active damping problem for a hyperelastic rope. The simulator effectively minimizes the kinetic energy of the rope around 2$\times$ faster than a baseline MPPI method and to a 20% lower energy level, while using about 3% of the computation time.


[101] 2512.13319

Temporal parallelisation of continuous-time maximum-a-posteriori trajectory estimation

This paper proposes a parallel-in-time method for computing continuous-time maximum-a-posteriori (MAP) trajectory estimates of the states of partially observed stochastic differential equations (SDEs), with the goal of improving computational speed on parallel architectures. The MAP estimation problem is reformulated as a continuous-time optimal control problem based on the Onsager-Machlup functional. This reformulation enables the use of a previously proposed parallel-in-time solution for optimal control problems, which we adapt to the current problem. The structure of the resulting optimal control problem admits a parallel solution based on parallel associative scan algorithms. In the linear Gaussian special case, it yields a parallel Kalman-Bucy filter and a parallel continuous-time Rauch-Tung-Striebel smoother. These linear computational methods are further extended to nonlinear continuous-time state-space models through Taylor expansions. We also present the corresponding parallel two-filter smoother. The graphics processing unit (GPU) experiments on linear and nonlinear models demonstrate that the proposed framework achieves a significant speedup in computations while maintaining the accuracy of sequential algorithms.


[102] 2512.13397

rNCA: Self-Repairing Segmentation Masks

Accurately predicting topologically correct masks remains a difficult task for general segmentation models, which often produce fragmented or disconnected outputs. Fixing these artifacts typically requires hand-crafted refinement rules or architectures specialized to a particular task. Here, we show that Neural Cellular Automata (NCA) can be directly re-purposed as an effective refinement mechanism, using local, iterative updates guided by image context to repair segmentation masks. By training on imperfect masks and ground truths, the automaton learns the structural properties of the target shape while relying solely on local information. When applied to coarse, globally predicted masks, the learned dynamics progressively reconnect broken regions, prune loose fragments and converge towards stable, topologically consistent results. We show how refinement NCA (rNCA) can be easily applied to repair common topological errors produced by different base segmentation models and tasks: for fragmented retinal vessels, it yields 2-3% gains in Dice/clDice and improves Betti errors, reducing $\beta_0$ errors by 60% and $\beta_1$ by 20%; for myocardium, it repairs 61.5% of broken cases in a zero-shot setting while lowering ASSD and HD by 19% and 16%, respectively. This showcases NCA as effective and broadly applicable refiners.


[103] 2512.13458

SSAS: Cross-subject EEG-based Emotion Recognition through Source Selection with Adversarial Strategy

Electroencephalographic (EEG) signals have long been applied in the field of affective brain-computer interfaces (aBCIs). Cross-subject EEG-based emotion recognition has demonstrated significant potential in practical applications due to its suitability across diverse people. However, most studies on cross-subject EEG-based emotion recognition neglect the presence of inter-individual variability and negative transfer phenomena during model training. To address this issue, a cross-subject EEG-based emotion recognition through source selection with adversarial strategy is introduced in this paper. The proposed method comprises two modules: the source selection network (SS) and the adversarial strategies network (AS). The SS uses domain labels to reverse-engineer the training process of domain adaptation. Its key idea is to disrupt class separability and magnify inter-domain differences, thereby raising the classification difficulty and forcing the model to learn domain-invariant yet emotion-relevant representations. The AS gets the source domain selection results and the pretrained domain discriminators from SS. The pretrained domain discriminators compute a novel loss aimed at enhancing the performance of domain classification during adversarial training, ensuring the balance of adversarial strategies. This paper provides theoretical insights into the proposed method and achieves outstanding performance on two EEG-based emotion datasets, SEED and SEED-IV. The code can be found at this https URL.


[104] 2512.13477

Evaluating the Navigation Capabilities of a Modified COAST Guidewire Robot in an Anatomical Phantom Model

To address the issues that arise due to the manual navigation of guidewires in endovascular interventions, research in medical robotics has taken a strong interest in developing robotically steerable guidewires, which offer the possibility of enhanced maneuverability and navigation, as the tip of the guidewire can be actively steered. The COaxially Aligned STeerable (COAST) guidewire robot has the ability to generate a wide variety of motions including bending motion with different bending lengths, follow-the-leader motion, and feedforward motion. In our past studies, we have explored different designs of the COAST guidewire robot and developed modeling, control, and sensing strategies for the COAST guidewire robot. In this study, the performance of a modified COAST guidewire robot is evaluated by conducting navigation experiments in an anatomical phantom model with pulsatile flow. The modified COAST guidewire robot is a simplified version of the COAST guidewire robot and consists of two tubes as opposed to three tubes. Through this study, we demonstrate the effectiveness of the modified COAST guidewire robot in navigating the tortuous phantom vasculature.


[105] 2512.13527

DarkSPARC: Dark-Blood Spectral Self-Calibrated Reconstruction of 3D Left Atrial LGE MRI for Post-Ablation Scar Imaging

Purpose: To develop DarkSPARC, a retrospective, training-free, self-calibrated spectral reconstruction method that converts routine bright-blood 3D left atrial (LA) late gadolinium enhancement (LGE) MRI into a dark-blood image, and to quantify its impact on LA scar-pool CNR, SNR, effective CNR (eCNR), and scar quantification accuracy. Methods: DarkSPARC embeds bright-blood LA LGE into a calibrator-conditioned (N+1)-dimensional spectral domain and reconstructs a dark-blood-like image using scan-specific spectral landmarks. A scan-specific 3D numerical phantom framework was built from LAScarQS post-ablation LGE by cloning remote myocardium into the LA wall and imposing controlled scar burden. Five baseline cases spanning the 5th-95th percentiles of native scar-pool CNR, each with multiple scar burdens and 10 CNR degradation levels, yielded 200 phantoms. For every phantom, LA scar-pool CNR, SNR, eCNR, and Scar% were measured on bright-blood and DarkSPARC images. In vivo performance was evaluated in 60 public post-ablation scans of atrial fibrillation patients. Results: In scan-specific phantoms, DarkSPARC increased LA scar-pool CNR, SNR, and eCNR over bright-blood in all 200 experiments, with DarkSPARC/bright-blood ratios up to about 30-fold for CNR and about 6-fold for SNR in the lowest-CNR conditions. At 70% CNR degradation, bright-blood underestimated ground-truth LA Scar% by -37% to -54%, whereas DarkSPARC reduced bias to about -3% to -5%. In vivo, DarkSPARC similarly improved metrics: median scar-pool CNR, SNR, and eCNR increased from 20.0 to 135.9 (6.8x), 70.6 to 200.6 (2.8x), and 0.22 to 0.75 (3.4x), respectively (all p<0.001), and LA Scar% increased from 3.9% to 9.75%. Conclusion: DarkSPARC is a self-calibrated, training-free reconstruction that yields dark-blood 3D LA LGE, boosting CNR/SNR/eCNR and stabilizing reliable scar quantification without extra scans.


[106] 2402.17247

Inverse Optimal Control for Linear Quadratic Tracking with Unknown Target States

This paper addresses the inverse optimal control for the linear quadratic tracking problem with a fixed but unknown target state, which aims to estimate the possible triplets comprising the target state, the state weight matrix, and the input weight matrix from observed optimal control input and the corresponding state trajectories. Sufficient conditions have been provided for the unique determination of both the linear quadratic cost function as well as the target state. A computationally efficient and numerically reliable parameter identification algorithm is proposed by equating optimal control strategies with a system of linear equations, and the associated relative error upper bound is derived in terms of data volume and signal-to-noise ratio. Moreover, the proposed inverse optimal control algorithm is applied for the joint cluster coordination and intent identification of a multi-agent system. By incorporating the structural constraint of the Laplace matrix, the relative error upper bound can be reduced accordingly. Finally, the algorithm's efficiency and accuracy are validated by a vehicle-on-a-lever example and a multi-agent formation control example.


[107] 2405.13199

Tau Anomaly Detection in PET Imaging via Bilateral-Guided Deterministic Diffusion Model

The emergence of tau PET imaging over the last decade has enabled Alzheimer's disease (AD) researchers to examine tau pathology in vivo and more effectively characterize the disease trajectories of AD. Current tau PET analysis methods, however, typically perform inferences on large cortical ROIs and are limited in the detection of localized tau pathology that varies across subjects. In this work, we propose a novel bilateral-guided deterministic diffusion sampling method to perform anomaly detection from tau PET imaging data. By including individualized brain structure and cognitively normal (CN) template conditions, our model computes a voxel-level anomaly map based on the deterministically sampled pseudo-healthy reconstruction. We train our model on ADNI CN subjects (n=380) and evaluate anomaly localization performance on the left MCI/AD subjects (n=154) and the preclinical subjects of the A4 clinical trial (n=447). We further train a CNN classifier on the derived 3D anomaly maps from ADNI, including CN and MCI/AD, to classify subjects into two groups and test classification performance on A4. We demonstrate that our method outperforms baselines in anomaly localization. Additionally, we show that our method can successfully group preclinical subjects with significantly different cognitive functions, highlighting the potential of our approach for application in preclinical screening tests. The code will be publicly available.


[108] 2406.07903

GAPses: Versatile smart glasses for comfortable and fully-dry acquisition and parallel ultra-low-power processing of EEG and EOG

Recent advancements in head-mounted wearable technology are revolutionizing the field of biopotential measurement, but the integration of these technologies into practical, user-friendly devices remains challenging due to issues with design intrusiveness, comfort, and data privacy. To address these challenges, this paper presents GAPses, a novel smart glasses platform designed for unobtrusive, comfortable, and secure acquisition and processing of electroencephalography (EEG) and electrooculography (EOG) signals. We introduce a direct electrode-electronics interface with custom dry soft electrodes to enhance comfort for long wear. An integrated parallel ultra-low-power RISC-V processor (GAP9, Greenwaves Technologies) processes data at the edge, thereby eliminating the need for continuous data streaming through a wireless link, enhancing privacy, and increasing system reliability in adverse channel conditions. We demonstrate the broad applicability of the designed prototype through validation in a number of EEG-based interaction tasks, including alpha waves, steady-state visual evoked potential analysis, and motor movement classification. Furthermore, we demonstrate an EEG-based biometric subject recognition task, where we reach a sensitivity and specificity of 98.87% and 99.86% respectively, with only 8 EEG channels and an energy consumption per inference on the edge as low as 121 $\mu J$. Moreover, in an EOG-based eye movement classification task, we reach an accuracy of 96.68% on 11 classes, resulting in an information transfer rate of 94.78 bit/min, which can be further increased to 161.43 bit/min by reducing the accuracy to 81.43%. The deployed implementation has an energy consumption of 40 $\mu J$ per inference and a total system power of only 12.4 mW, of which only 1.61% is used for classification, allowing for continuous operation of more than 22 h with a small 75 mAh battery.


[109] 2407.15226

Variational Bayesian Inference for Multiple Extended Targets or Unresolved Group Targets Tracking

In this work, we propose a method for tracking multiple extended targets or unresolvable group targets in a clutter environment. Firstly, based on the Random Matrix Model (RMM), the joint state of the target is modeled as the Gamma Gaussian Inverse Wishart (GGIW) distribution. Considering the uncertainty of measurement origin caused by the clutters, we adopt the idea of probabilistic data association and describe the joint association event as an unknown parameter in the joint prior distribution. Then the Variational Bayesian Inference (VBI) is employed to approximately solve the non-analytical posterior distribution. Furthermore, to ensure the practicability of the proposed method, we further provide two potential lightweight schemes to reduce its computational complexity. One of them is based on clustering, which effectively prunes the joint association events. The other is a simplification of the variational posterior through marginal association probabilities. Finally, the effectiveness of the proposed method is demonstrated by simulation and real data experiments, and we show that the proposed method outperforms current state-of-the-art methods in terms of accuracy and adaptability.


[110] 2407.15870

CIC: Circular Image Compression

Learned image compression (LIC) is currently the cutting-edge method. However, the inherent difference between testing and training images of LIC results in performance degradation to some extent. Especially for out-of-sample, out-of-distribution, or out-of-domain testing images, the performance of LIC dramatically degraded. Classical LIC is a serial image compression (SIC) approach that utilizes an open-loop architecture with serial encoding and decoding units. Nevertheless, according to the theory of automatic control, a closed-loop architecture holds the potential to improve the dynamic and static performance of LIC. Therefore, a circular image compression (CIC) approach with closed-loop encoding and decoding elements is proposed to minimize the gap between testing and training images and upgrade the capability of LIC. The proposed CIC establishes a nonlinear loop equation and proves that steady-state error between reconstructed and original images is close to zero by Taylor series expansion. The proposed CIC method possesses the property of Post-Training and plug-and-play which can be built on any existing advanced SIC methods. Experimental results on five public image compression datasets demonstrate that the proposed CIC outperforms five competing state-of-the-art open-source SIC algorithms in reconstruction capacity. Experimental results further show that the proposed method is suitable for out-of-sample testing images with dark backgrounds, sharp edges, high contrast, grid shapes, or complex patterns.


[111] 2407.21600

Robust Simultaneous Multislice MRI Reconstruction Using Slice-Wise Learned Generative Diffusion Priors

Simultaneous multislice (SMS) imaging is a powerful technique for accelerating magnetic resonance imaging (MRI) acquisitions. However, SMS reconstruction remains challenging due to complex signal interactions between and within the excited slices. In this study, we introduce ROGER, a robust SMS MRI reconstruction method based on deep generative priors. Utilizing denoising diffusion probabilistic models (DDPM), ROGER begins with Gaussian noise and gradually recovers individual slices through reverse diffusion iterations while enforcing data consistency from measured k-space data within the readout concatenation framework. The posterior sampling procedure is designed such that the DDPM training can be performed on single-slice images without requiring modifications for SMS tasks. Additionally, our method incorporates a low-frequency enhancement (LFE) module to address the practical issue that SMS-accelerated fast spin echo (FSE) and echo planar imaging (EPI) sequences cannot easily embed fully-sampled autocalibration signals. Extensive experiments on both retrospectively and prospectively accelerated datasets demonstrate that ROGER consistently outperforms existing methods, enhancing both anatomical and functional imaging with strong out-of-distribution generalization. The source code and sample data for ROGER are available at this https URL.


[112] 2408.04210

Adaptive Cohen's Class Time-Frequency Distribution

Inspired by the use of adaptive kernel-based Cohen's class time-frequency distributions (CCTFDs) for cross-term suppression, this paper aims to explore novel adaptive kernel functions for denoising. We integrate Wiener filter principle and the time-frequency filtering mechanism of CCTFD to design the least-squares adaptive filter method in the Wigner-Ville distribution (WVD) domain, giving birth to the least-squares adaptive filter-based CCTFD whose kernel function can be adjusted with the input signal automatically to achieve the minimum mean-square error denoising in the WVD domain. Some examples are also carried out to demonstrate that the proposed adaptive CCTFD outperforms some state-of-the-arts in noise suppression.


[113] 2409.07770

Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models

Recent advances in self-supervised learning (SSL) on Transformers have significantly improved speaker verification (SV) by providing domain-general speech representations. However, existing approaches have underutilized the multi-layered nature of SSL encoders. To address this limitation, we propose the layer-aware time-delay neural network (L-TDNN), which directly performs layer/frame-wise processing on the layer-wise hidden state outputs from pre-trained models, extracting fixed-size speaker vectors. L-TDNN comprises a layer-aware convolutional network, a frame-adaptive layer aggregation, and attentive statistic pooling, explicitly modeling of the recognition and processing of previously overlooked layer dimension. We evaluated L-TDNN across multiple speech SSL Transformers and diverse speech-speaker corpora against other approaches for leveraging pre-trained encoders. L-TDNN consistently demonstrated robust verification performance, achieving the lowest error rates throughout the experiments. Concurrently, it stood out in terms of model compactness and exhibited inference efficiency comparable to the existing systems. These results highlight the advantages derived from the proposed layer-aware processing approach. Future work includes exploring joint training with SSL frontends and the incorporation of score calibration to further enhance state-of-the-art verification performance.


[114] 2409.11169

MAISI: Medical AI for Synthetic Imaging

Medical imaging analysis faces challenges such as data scarcity, high annotation costs, and privacy concerns. This paper introduces the Medical AI for Synthetic Imaging (MAISI), an innovative approach using the diffusion model to generate synthetic 3D computed tomography (CT) images to address those challenges. MAISI leverages the foundation volume compression network and the latent diffusion model to produce high-resolution CT images (up to a landmark volume dimension of 512 x 512 x 768 ) with flexible volume dimensions and voxel spacing. By incorporating ControlNet, MAISI can process organ segmentation, including 127 anatomical structures, as additional conditions and enables the generation of accurately annotated synthetic images that can be used for various downstream tasks. Our experiment results show that MAISI's capabilities in generating realistic, anatomically accurate images for diverse regions and conditions reveal its promising potential to mitigate challenges using synthetic data.


[115] 2409.18303

Deep-ER: Deep Learning ECCENTRIC Reconstruction for fast high-resolution neurometabolic imaging

Introduction: Altered neurometabolism is an important pathological mechanism in many neurological diseases and brain cancer, which can be mapped non-invasively by Magnetic Resonance Spectroscopic Imaging (MRSI). Advanced MRSI using non-cartesian compressed-sense acquisition enables fast high-resolution metabolic imaging but has lengthy reconstruction times that limits throughput and needs expert user interaction. Here, we present a robust and efficient Deep Learning reconstruction to obtain high-quality metabolic maps. Methods: Fast high-resolution whole-brain metabolic imaging was performed at 3.4 mm$^3$ isotropic resolution with acquisition times between 4:11-9:21 min:s using ECCENTRIC pulse sequence on a 7T MRI scanner. Data were acquired in a high-resolution phantom and 27 human participants, including 22 healthy volunteers and 5 glioma patients. A deep neural network using recurring interlaced convolutional layers with joint dual-space feature representation was developed for deep learning ECCENTRIC reconstruction (Deep-ER). 21 subjects were used for training and 6 subjects for testing. Deep-ER performance was compared to conventional iterative Total Generalized Variation reconstruction using image and spectral quality metrics. Results: Deep-ER demonstrated 600-fold faster reconstruction than conventional methods, providing improved spatial-spectral quality and metabolite quantification with 12%-45% (P<0.05) higher signal-to-noise and 8%-50% (P<0.05) smaller Cramer-Rao lower bounds. Metabolic images clearly visualize glioma tumor heterogeneity and boundary. Conclusion: Deep-ER provides efficient and robust reconstruction for sparse-sampled MRSI. The accelerated acquisition-reconstruction MRSI is compatible with high-throughput imaging workflow. It is expected that such improved performance will facilitate basic and clinical MRSI applications.


[116] 2410.00746

WALINET: A water and lipid identification convolutional Neural Network for nuisance signal removal in 1H MR Spectroscopic Imaging

Purpose. Proton Magnetic Resonance Spectroscopic Imaging (1H-MRSI) provides non-invasive spectral-spatial mapping of metabolism. However, long-standing problems in whole-brain 1H-MRSI are spectral overlap of metabolite peaks with large lipid signal from scalp, and overwhelming water signal that distorts spectra. Fast and effective methods are needed for high-resolution 1H-MRSI to accurately remove lipid and water signals while preserving the metabolite signal. The potential of supervised neural networks for this task remains unexplored, despite their success for other MRSI processing. Methods. We introduce a deep-learning method based on a modified Y-NET network for water and lipid removal in whole-brain 1H-MRSI. The WALINET (WAter and LIpid neural NETwork) was compared to conventional methods such as the state-of-the-art lipid L2 regularization and Hankel-Lanczos singular value decomposition (HLSVD) water suppression. Methods were evaluated on simulated and in-vivo whole-brain MRSI using NMRSE, SNR, CRLB, and FWHM metrics. Results. WALINET is significantly faster and needs 8s for high-resolution whole-brain MRSI, compared to 42 minutes for conventional HLSVD+L2. Quantitative analysis shows WALINET has better performance than HLSVD+L2: 1) more lipid removal with 41% lower NRMSE, 2) better metabolite signal preservation with 71% lower NRMSE in simulated data, 155% higher SNR and 50% lower CRLB in in-vivo data. Metabolic maps obtained by WALINET in healthy subjects and patients show better gray/white-matter contrast with more visible structural details. Conclusions. WALINET has superior performance for nuisance signal removal and metabolite quantification on whole-brain 1H-MRSI compared to conventional state-of-the-art techniques. This represents a new application of deep-learning for MRSI processing, with potential for automated high-throughput workflow.


[117] 2410.00976

Learning Dissipative Chaotic Dynamics with Boundedness Guarantees

Chaotic dynamics, commonly seen in weather systems and fluid turbulence, are characterized by their sensitivity to initial conditions, which makes accurate prediction challenging. Recent approaches have focused on developing data-driven models that attempt to preserve invariant statistics over long horizons since many chaotic systems exhibit dissipative behaviors and ergodicity. Despite the recent progress in such models, they are still often prone to generating unbounded trajectories, leading to invalid statistics evaluation. To address this fundamental challenge, we introduce a modular framework that provides formal guarantees of trajectory boundedness for neural network chaotic dynamics models. Our core contribution is a dissipative projection layer that leverages control-theoretic principles to ensure the learned system is dissipative. Specifically, our framework simultaneously learns a dynamics emulator and an energy-like function, where the latter is used to construct an algebraic dissipative constraint within the projection layer. Furthermore, the learned invariant level set provides an outer estimate for the system's strange attractor, which is known to be difficult to characterize due to its complex geometry. We demonstrate our model's ability to produce bounded long-horizon forecasts that preserve invariant statistics for chaotic dynamical systems, including Lorenz 96 and a reduced-order model of the Kuramoto-Sivashinsky equation.


[118] 2410.13436

Multiframe Detection via Graph Neural Networks: A Link Prediction Approach

Multi-frame detection algorithms can effectively utilize the correlation between consecutive echoes to improve the detection performance of weak targets. Existing efficient multi-frame detection algorithms are typically based on three sequential steps: plot extraction via a relative low primary threshold, track search and track detection. However, these three-stage processing algorithms may result in a notable loss of detection performance and do not fully leverage the available echo information across frames. As to applying graph neural networks in multi-frame detection, the algorithms are primarily based on node classification tasks, which cannot directly output target tracks. In this paper, we reformulate the multi-frame detection problem as a link prediction task in graphs. First, we perform a rough association of multi-frame observations that exceed the low threshold to construct observation association graphs. Subsequently, a multi-feature link prediction network is designed based on graph neural networks, which integrates multi-dimensional information, including echo structure, Doppler information, and spatio-temporal coupling of plots. By leveraging the principle of link prediction, we unifies the processes of track search and track detection into one step to reduce performance loss and directly output target tracks. Experimental results indicate that, compared with traditional single-frame and multi-frame detection algorithms, the proposed algorithm improves the detection performance of weak targets while suppressing false alarms. Additionally, interpretable analysis shows that the designed network effectively integrates the utilized features, allowing for accurate associations between targets and false alarms.


[119] 2411.13475

Efficient and Physically-Consistent Modeling of Reconfigurable Electromagnetic Structures

Reconfigurable electromagnetic structures (REMSs), such as reconfigurable reflectarrays (RRAs) or reconfigurable intelligent surfaces (RISs), hold significant potential to improve the spectral efficiency of wireless communication systems and the accuracy of wireless sensing systems. Even though several REMS modeling approaches have been proposed in recent years, the literature lacks models that are both computationally efficient and physically consistent. As a result, algorithms that control the reconfigurable elements of REMSs (e.g., the phase shifts of a RIS) are often built on simplistic and thus inaccurate models. To enable physically accurate REMS-parameter tuning, we present a new framework for efficient and physically consistent modeling of general REMSs. Our modeling method combines a circuit-theoretic approach with a new formalism that describes a REMS's interaction with the electromagnetic (EM) waves in its far-field region. Our modeling method enables efficient computation of the entire far-field radiation pattern for arbitrary configurations of the REMS reconfigurable elements once a single full-wave EM simulation of the non-reconfigurable parts of the REMS has been performed. The predictions made by our framework align with the physical laws of classical electrodynamics and model effects caused by inter-antenna coupling, non-reciprocal materials, polarization, ohmic losses, matching losses, influence of metallic housings, noise from low-noise amplifiers, and noise arising in or received by antennas. In order to validate the efficiency and accuracy of our modeling approach, we (i) compare our modeling method to EM simulations and (ii) conduct a case study involving an RRA that enables simultaneous multiuser beam- and null-forming using a new, computationally efficient, and physically accurate parameter tuning algorithm.


[120] 2412.17827

A Physics-Embedded Dual-Learning Imaging Framework for Electrical Impedance Tomography

Electrical Impedance Tomography (EIT) is a promising noninvasive imaging technique that reconstructs the spatial conductivity distribution from boundary voltage measurements. However, it poses a highly nonlinear and ill-posed inverse problem. Traditional regularization-based methods are sensitive to noise and often produce significant artifacts. Physics-Embedded learning frameworks, particularly Physics-Informed Neural Networks (PINNs), have shown success in solving such inverse problems under ideal conditions with abundant internal data. Yet in practical EIT applications, only sparse and noisy boundary measurements are available. Moreover, changing boundary excitations require the simultaneous training of multiple forward networks and one inverse network, which significantly increases computational complexity and hampers convergence. To overcome these limitations, we propose a Physics-Embedded Dual-Learning Imaging Framework for EIT. The dual-learning strategy is composed of a supervised CNN-based forward network, which learns to predict a discrete internal potential distribution under fixed Neumann-to-Dirichlet boundary conditions, and an unsupervised PINN-based inverse network, which reconstructs the conductivity by enforcing the governing PDE through discrete numerical differentiation of the predicted potentials. This decoupled architecture removes the need for smooth conductivity assumptions, reduces the number of forward networks required from $K$ to 1, and improves reconstruction robustness and efficiency under realistic measurement constraints.(this https URL)


[121] 2412.17852

Compact Neural Network Algorithm for Electrocardiogram Classification

In this paper, we present a powerful, compact electrocardiogram (ECG) classification algorithm for cardiac arrhythmia diagnosis that addresses the current reliance on deep learning and convolutional neural networks (CNNs) in ECG analysis. This work aims to reduce the demand for deep learning, which often requires extensive computational resources and large labeled datasets. Our approach introduces an artificial neural network (ANN) with a simple architecture combined with advanced feature engineering techniques. A key contribution of this work is the incorporation of 17 engineered features that enable the extraction of critical patterns from raw ECG signals. By integrating mathematical transformations, signal processing methods, and data extraction algorithms, our model captures the morphological and physiological characteristics of ECG signals with high efficiency, without requiring deep learning. Our method demonstrates a similar performance to other state-of-the-art models in classifying 4 types of arrhythmias, including atrial fibrillation, sinus tachycardia, sinus bradycardia, and ventricular flutter. Our algorithm achieved an accuracy of 97.36% on the MIT-BIH and St. Petersburg INCART arrhythmia databases. Our approach offers a practical and feasible solution for real-time diagnosis of cardiac disorders in medical applications, particularly in resource-limited environments.


[122] 2503.03199

PathRWKV: Enabling Whole Slide Prediction with Recurrent-Transformer

Pathological diagnosis is essential for cancer diagnosis, with whole slide image (WSI) providing histopathological and cellular information. Recent deep learning advancements have improved WSI analysis through a two-stage paradigm: tile-level feature extraction followed by slide-level modeling. In this paradigm, Transformer-based models surpass traditional multiple instance learning approaches in accuracy, yet still face four core limitations: (1) inadequate handling of variable tissue sizes across slides, (2) inability to effectively infer from all tiles for slide-level conclusions, (3) challenges in balancing model complexity with limited training data, and (4) difficulty balancing training efficiency and inference performance. Consequently, these issues limit whole-slide perception for diagnosis with restricted WSI training scales. To address them, we introduce PathRWKV, a novel state space model for slide-level feature modeling. To handle variable tissue sizes, PathRWKV employs two modules: Time Mix and Channel Mix, enabling dynamic perception of tiles for improved slide-level modeling. To draw effective conclusions, we propose an asymmetric design that samples tiles during training and iterates over all tiles at inference, scaling up to cover the entire slide. To balance model complexity and data size, we adopt linear attention and state space architecture with a Recurrent module. To balance training efficiency and inference, we design a tailored multi-task learning module handling versatile tasks simultaneously, enhancing model ability via multiple clinical indicators in slide reading. Experimental results show PathRWKV outperforms nine recent state-of-the-art methods across 10 downstream tasks on 11 datasets with 17,292 WSIs, paving its way for efficient slide-level pathological inference. The project is open-sourced.


[123] 2503.09963

Reference-Free 3D Reconstruction of Brain Dissection Slabs via Learned Atlas Coordinates

Correlation of neuropathology with MRI has the potential to transfer microscopic signatures of pathology to in vivo scans. There is increasing interest in building these correlations from 3D reconstructed stacks of slab photographs, which are routinely taken during dissection at brain banks. These photographs bypass the need for ex vivo MRI, which is not widely accessible. However, existing methods either require a corresponding 3D reference (e.g., an ex vivo MRI scans, or a brain surface acquired with a structured light scanner) or a full stack of brain slabs, which severely limits applicability. Here we propose RefFree, a 3D reconstruction method for dissection photographs that does not require an external reference. RefFree coherently reconstructs a 3D volume for an arbitrary set of slabs (including a single slab) using predicted 3D coordinates in the standard atlas space (MNI) as guidance. To support RefFree's pipeline, we train an atlas coordinate prediction network that estimates the coordinate map from a 2D photograph, using synthetic photographs generated from digitally sliced 3D MRI data with randomized appearance for enhanced generalization. As a by-product, RefFree can propagate information (e.g., anatomical labels) from atlas space to one single photograph even without reconstruction. Experiments on simulated and real data show that, when all slabs are available, RefFree achieves performance comparable to existing classical methods but at substantially higher speed. Moreover, RefFree yields accurate reconstruction and registration for partial stacks or even a single slab. Our code is available at this https URL.


[124] 2503.22140

Score-Based Turbo Message Passing for Plug-and-Play Compressive Image Recovery

Message passing algorithms have been tailored for compressive imaging applications by plugging in different types of off-the-shelf image denoisers. These off-the-shelf denoisers mostly rely on some generic or hand-crafted priors for denoising. Due to their insufficient accuracy in capturing the true image prior, these methods often fail to produce satisfactory results, especially in highly underdetermined scenarios. On the other hand, score-based generative modeling offers a promising way to accurately characterize the sophisticated image distribution. In this paper, by exploiting the close relation between score-based modeling and empirical Bayes-optimal denoising, we devise a message passing framework that integrates a score-based minimum mean squared error (MMSE) denoiser for compressive image recovery. Experiments on the FFHQ dataset demonstrate that our method strikes a significantly better performance-complexity tradeoff than conventional message passing, regularized linear regression, and score-based posterior sampling baselines. Remarkably, our method typically converges in fewer than 20 neural function evaluations (NFEs).


[125] 2504.05397

Physics-informed Modularized Neural Network for Advanced Building Control by Deep Reinforcement Learning

Physics-informed machine learning (PIML) provides a promising solution for building energy modeling and can serve as a virtual environment to enable reinforcement learning (RL) agents to interact and learn. However, challenges remain in efficiently integrating physics priors, evaluating the effectiveness of physics constraints, balancing model accuracy and physics consistency, and enabling real-world implementation. To address these gaps, this study introduces a Physics-Informed Modularized Neural Network (PI-ModNN), which incorporates physics priors through a physics-informed model structure, loss functions, and hard constraints. A new evaluation metric called "temperature response violation" is developed to quantify the physical consistency of data-driven building dynamic models under varying control inputs and training data sizes. Additionally, a physics prior evaluation framework based on rule importance is proposed to assess the contribution of each individual physics prior, offering guidance on selecting appropriate PIML techniques. Results indicate that incorporating physical priors does not always improve model performance; inappropriate priors may decrease model accuracy and consistency. However, hard constraints are effective in enforcing model consistency. Furthermore, we present a general workflow for developing control-oriented PIML models and integrating them with deep reinforcement learning (DRL). Following this framework, a case study implementing DRL in an office space over three months demonstrates potential energy savings of 31.4%. Finally, we provide a general guideline for integrating data-driven models with advanced building control through a four-step evaluation framework, paving the way for reliable and scalable deployment of advanced building controls.


[126] 2504.05958

Hybrid Control as a Proxy for Detection and Mitigation of Sensor Attacks in Cooperative Driving

We propose a real-time hybrid controller scheme to detect and mitigate False-Data Injection (FDI) attacks on Cooperative Adaptive Cruise Control (CACC). Our method uses sensor redundancy to create equivalent controller realizations, each driven by distinct sensor subsets but producing identical control inputs when no attack occurs. By comparing control signals and measurements via majority voting, the scheme identifies compromised sensors in real-time and switches to a healthy controller. The hybrid controller uses attack-dependent flow and jump sets, and resets compromised controllers' states. Simulation results demonstrate the effectiveness of this approach.


[127] 2504.07248

Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis

An increasing number of electric loads, such as hydrogen producers or data centers, can be characterized as carbon-sensitive, meaning that they are willing to adapt the timing and/or location of their electricity usage in order to minimize carbon footprints. However, the emission reduction efforts of these carbon-sensitive loads rely on carbon intensity information such as average carbon emissions, and it is unclear whether load shifting based on these signals effectively reduces carbon emissions. To address this open question, we design a carbon-aware equilibrium model, which expands the commonly used equilibrium model for standard (carbon-agnostic) electricity market clearing to include carbon-sensitive consumers that adapt their consumption based on average carbon emission signals and carbon costs. This analysis represents an idealized situation for carbon-sensitive consumers, where their carbon preferences are reflected directly in the market clearing, and contrasts with current practice, where carbon emission signals only become known to consumers a posteriori (i.e., after the market has already been cleared). Furthermore, we extend our model to consider temporal load shifting and time-varying maximum renewable generations. We employ illustrative three-bus examples and numerical simulations on the IEEE RTS-GMLC system to reveal the limitations of the widely adopted average carbon emission signal for guiding carbon emission reduction. Our model offers a novel perspective for evaluating the effectiveness of different carbon signals and contributes to new carbon signal design.


[128] 2504.16222

Nash Equilibrium Learning In Large Populations With First-Order Payoff Modifications

We establish Nash equilibrium learning in large populations of noncooperative, strategic agents. Our analysis considers the broadest class to date of payoff mechanisms with first-order modifications, capable of modeling bounded rationality and anticipatory effects, averaging, or Padé delay approximations. We propose a framework that, for the first time, combines two nonstandard system-theoretic passivity notions. Our results hold for discontinuous best response dynamics alongside continuous learning rules, significantly extending prior work.


[129] 2505.06256

SpectrumFM: A Foundation Model for Intelligent Spectrum Management

Intelligent spectrum management is crucial for improving spectrum efficiency and achieving secure utilization of spectrum resources. However, existing intelligent spectrum management methods, typically based on small-scale models, suffer from notable limitations in recognition accuracy, convergence speed, and generalization, particularly in the complex and dynamic spectrum environments. To address these challenges, this paper proposes a novel spectrum foundation model, termed SpectrumFM, establishing a new paradigm for spectrum management. SpectrumFM features an innovative encoder architecture that synergistically exploits the convolutional neural networks and the multi-head self-attention mechanisms to enhance feature extraction and enable robust representation learning. The model is pre-trained via two novel self-supervised learning tasks, namely masked reconstruction and next-slot signal prediction, which leverage large-scale in-phase and quadrature (IQ) data to achieve comprehensive and transferable spectrum representations. Furthermore, a parameter-efficient fine-tuning strategy is proposed to enable SpectrumFM to adapt to various downstream spectrum management tasks, including automatic modulation classification (AMC), wireless technology classification (WTC), spectrum sensing (SS), and anomaly detection (AD). Extensive experiments demonstrate that SpectrumFM achieves superior performance in terms of accuracy, robustness, adaptability, few-shot learning efficiency, and convergence speed, consistently outperforming conventional methods across multiple benchmarks. Specifically, SpectrumFM improves AMC accuracy by up to 12.1% and WTC accuracy by 9.3%, achieves an area under the curve (AUC) of 0.97 in SS at -4 dB signal-to-noise ratio (SNR), and enhances AD performance by over 10%.


[130] 2507.21511

Two-Dimensional Nonseparable Fractional Fourier Transform: Theory and Application

The one-dimensional (1D) fractional Fourier transform (FRFT) generalizes the 1D Fourier transform, offering significant advantages in time-frequency analysis of non-stationary signals. To extend the benefits of the 1D FRFT to higher-dimensional signals, 2D FRFTs have been developed. However, existing 2D FRFTs, including the 2D separable FRFT (SFRFT), gyrator transform (GT), and coupled FRFT (CFRFT), face limitations such as lack of theoretical uniformity, inability to process nonseparable terms in 2D non-stationary signals, and failure to maintain a consistent 4D rotational relationship with the 2D Wigner distribution (WD). These shortcomings limit their performance in practical applications like radar, communications, and optical imaging. To overcome these challenges, we propose a more general 2D nonseparable FRFT (NSFRFT) with four degrees of freedom. The 2D NSFRFT incorporates the 2D SFRFT, GT, and CFRFT as special cases and maintains a generalized 4D rotational relationship with the 2D WD, ensuring geometric consistency in time-frequency analysis. We derive its essential properties and develop three discrete algorithms, including two fast algorithms with computational complexity $O(N^2 \log N)$. Numerical simulations and experiments validate the superior performance of the proposed NSFRFT in applications such as image encryption, decryption, filtering, and denoising.


[131] 2508.03584

Decoding and Engineering the Phytobiome Communication for Smart Agriculture

Smart agriculture applications, integrating technologies like the Internet of Things and machine learning/artificial intelligence (ML/AI) into agriculture, hold promise to address modern challenges of rising food demand, environmental pollution, and water scarcity. Alongside the concept of the phytobiome, which defines the area including the plant, its environment, and associated organisms, and the recent emergence of molecular communication (MC), there exists an important opportunity to advance agricultural science and practice using communication theory. In this article, we motivate to use the communication engineering perspective for developing a holistic understanding of the phytobiome communication and bridge the gap between the phytobiome communication and smart agriculture. Firstly, an overview of phytobiome communication via molecular and electrophysiological signals is presented and a multi-scale framework modeling the phytobiome as a communication network is conceptualized. Then, how this framework is used to model electrophysiological signals is demonstrated with plant experiments. Furthermore, possible smart agriculture applications, such as smart irrigation and targeted delivery of agrochemicals, through engineering the phytobiome communication are proposed. These applications merge ML/AI methods with the Internet of Bio-Nano-Things enabled by MC and pave the way towards more efficient, sustainable, and eco-friendly agricultural production. Finally, the implementation challenges, open research issues, and industrial outlook for these applications are discussed.


[132] 2508.11457

Importance-Aware Robust Semantic Transmission for LEO Satellite-Ground Communication

Satellite-ground semantic communication is anticipated to serve a critical role in the forthcoming 6G era. Nonetheless, task-oriented data transmission in such systems remains a formidable challenge, primarily due to the dynamic nature of signal-to-noise ratio (SNR) fluctuations and the stringent bandwidth limitations inherent to low Earth orbit (LEO) satellite channels. In response to these constraints, we propose an importance-aware robust semantic transmission (IRST) framework, specifically designed for scenarios characterized by bandwidth scarcity and channel variability. The IRST scheme begins by applying a segmentation model enhancement algorithm to improve the granularity and accuracy of semantic segmentation. Subsequently, a task-driven semantic selection method is employed to prioritize the transmission of semantically vital content based on real-time channel state information. Furthermore, the framework incorporates a stack-based, SNR-aware channel codec capable of executing adaptive channel coding in alignment with SNR variations. Comparative evaluations across diverse operating conditions demonstrate the superior performance and resilience of the IRST model relative to existing benchmarks.


[133] 2508.13728

BioGAP-Ultra: A Modular Edge-AI Platform for Wearable Multimodal Biosignal Acquisition and Processing

The growing demand for continuous physiological monitoring and human-machine interaction in real-world settings calls for wearable platforms that are flexible, low-power, and capable of on-device intelligence. This work presents BioGAP-Ultra, an advanced multimodal biosensing platform that supports synchronized acquisition of diverse electrophysiological and hemodynamic signals such as EEG, EMG, ECG, and PPG while enabling embedded AI processing at state-of-the-art energy efficiency. BioGAP-Ultra is a major extension of our previous BioGAP design aimed at meeting the rapidly growing requirements of wearable biosensing applications. It features (i) increased on-device storage (x2 SRAM, x4 FLASH), (ii) improved wireless connectivity (supporting up to 1.4 Mbit/s bandwidth, x4 higher than BioGAP), (iii) enhanced number of signal modalities (from 3 to 5) and analog input channels (x2). Further, it is accompanied by a real-time visualization and analysis software suite that supports the hardware design, providing access to raw data and real-time configurability on a mobile phone. Finally, we demonstrate the system's versatility through integration into various wearable form factors: an EEG-PPG headband consuming 32.8 mW, an EMG sleeve at 26.7 mW, and an ECG-PPG chestband requiring only 9.3 mW for continuous acquisition and streaming, tailored for diverse biosignal applications. To showcase its edge-AI capabilities, we further deploy two representative on-device applications: (1) ECG-PPG-based PAT estimation at 8.6 mW, and (2) EMG-ACC-based classification of reach-and-grasp motion phases, achieving 79.9 % $\pm$ 5.7 % accuracy at 23.6 mW. All hardware and software design files are also released open-source with a permissive license.


[134] 2508.14681

Virtual Multiplex Staining for Histological Images using a Marker-wise Conditioned Diffusion Model

Multiplex imaging is revolutionizing pathology by enabling the simultaneous visualization of multiple biomarkers within tissue samples, providing molecular-level insights that traditional hematoxylin and eosin (H&E) staining cannot provide. However, the complexity and cost of multiplex data acquisition have hindered its widespread adoption. Additionally, most existing large repositories of H&E images lack corresponding multiplex images, limiting opportunities for multimodal analysis. To address these challenges, we leverage recent advances in latent diffusion models (LDMs), which excel at modeling complex data distributions by utilizing their powerful priors for fine-tuning to a target domain. In this paper, we introduce a novel framework for virtual multiplex staining that utilizes pretrained LDM parameters to generate multiplex images from H&E images using a conditional diffusion model. Our approach enables marker-by-marker generation by conditioning the diffusion model on each marker, while sharing the same architecture across all markers. To tackle the challenge of varying pixel value distributions across different marker stains and to improve inference speed, we fine-tune the model for single-step sampling, enhancing both color contrast fidelity and inference efficiency through pixel-level loss functions. We validate our framework on two publicly available datasets, notably demonstrating its effectiveness in generating up to 18 different marker types with improved accuracy, a substantial increase over the 2-3 marker types achieved in previous approaches. This validation highlights the potential of our framework, pioneering virtual multiplex staining. Finally, this paper bridges the gap between H&E and multiplex imaging, potentially enabling retrospective studies and large-scale analyses of existing H&E image repositories.


[135] 2509.11584

Model Predictive Control with High-Probability Safety Guarantee for Nonlinear Stochastic Systems

We present a model predictive control (MPC) framework for nonlinear stochastic systems that ensures safety guarantee with high probability. Unlike most existing stochastic MPC schemes, our method adopts a set-erosion that converts the probabilistic safety constraint into a tractable deterministic safety constraint on a smaller safe set over deterministic dynamics. As a result, our method is compatible with any off-the-shelf deterministic MPC algorithm. The key to the effectiveness of our method is a tight bound on the stochastic fluctuation of a stochastic trajectory around its nominal version. Our method is scalable and can guarantee safety with high probability level (e.g., 99.99%), making it particularly suitable for safety-critical applications involving complex nonlinear dynamics. Rigorous analysis is conducted to establish a theoretical safety guarantee, and numerical experiments are provided to validate the effectiveness of the proposed MPC method.


[136] 2509.14062

Distributed Deep Learning with RIS Grouping for Accurate Cascaded Channel Estimation

Reconfigurable Intelligent Surface (RIS) panels are envisioned as a key technology for sixth-generation (6G) wireless networks, providing a cost-effective means to enhance coverage and spectral efficiency. A critical challenge is the estimation of the cascaded base station (BS)-RIS-user channel, since the passive nature of RIS elements prevents direct channel acquisition, incurring prohibitive pilot overhead, computational complexity, and energy consumption. To address this, we propose a deep learning (DL)-based channel estimation framework that reduces pilot overhead by grouping RIS elements and reconstructing the cascaded channel from partial pilot observations. Furthermore, conventional DL models trained under single-user settings suffer from poor generalization across new user locations and propagation scenarios. We develop a distributed machine learning (DML) strategy in which the BS and users collaboratively train a shared neural network using diverse channel datasets collected across the network, thereby achieving robust generalization. Building on this foundation, we design a hierarchical DML neural architecture that first classifies propagation conditions and then employs scenario-specific feature extraction to further improve estimation accuracy. Simulation results confirm that the proposed framework substantially reduces pilot overhead and complexity while outperforming conventional methods and single-user models in channel estimation accuracy. These results demonstrate the practicality and effectiveness of the proposed approach for 6G RIS-assisted systems.


[137] 2509.21531

Patch-Based Diffusion for Data-Efficient, Radiologist-Preferred MRI Reconstruction

Magnetic resonance imaging (MRI) requires long acquisition times, raising costs, reducing accessibility, and making scans more susceptible to motion artifacts. Diffusion probabilistic models that learn data-driven priors can potentially assist in reducing acquisition time. However, they typically require large training datasets that can be prohibitively expensive to collect. Patch-based diffusion models have shown promise in learning effective data-driven priors over small real-valued datasets, but have not yet demonstrated clinical value in MRI. We extend the Patch-based Diffusion Inverse Solver (PaDIS) to complex-valued, multi-coil MRI reconstruction, and compare it against a state-of-the-art whole-image diffusion baseline (FastMRI-EDM) for 7x undersampled MRI reconstruction on the FastMRI brain dataset. We show that PaDIS-MRI models trained on small datasets of as few as 25 k-space images outperform FastMRI-EDM on image quality metrics (PSNR, SSIM, NRMSE), pixel-level uncertainty, cross-contrast generalization, and robustness to severe k-space undersampling. In a blinded study with three radiologists, PaDIS-MRI reconstructions were chosen as diagnostically superior in 91.7% of cases, compared to baselines (i) FastMRI-EDM and (ii) classical convex reconstruction with wavelet sparsity. These findings highlight the potential of patch-based diffusion priors for high-fidelity MRI reconstruction in data-scarce clinical settings where diagnostic confidence matters.


[138] 2509.23687

Joint Hybrid Beamforming and Artificial Noise Design for Secure Multi-UAV ISAC Networks

Integrated sensing and communication (ISAC) emerges as a key enabler for next-generation applications such as smart cities and autonomous systems. Its integration with unmanned aerial vehicles (UAVs) unlocks new potentials for reliable communication and precise sensing in dynamic aerial environments. However, existing research predominantly treats UAVs as aerial base stations, overlooking their role as ISAC users, and fails to leverage large-scale antenna arrays at terrestrial base stations to enhance security and spectral efficiency. This paper propose a secure and spectral efficient ISAC framework for multi-UAV networks, and a two-stage optimization approach is developed to jointly design hybrid beamforming (HBF), artificial noise (AN) injection, and UAV trajectories. Aiming at maximizing the sum secrecy rate, the first stage employs Proximal Policy Optimization (PPO) to optimize digital beamformers and trajectories, and the second stage decomposes the digital solution into analog and digital components via low-complexity matrix factorization. Simulation results demonstrate the effectiveness of the proposed framework compared to benchmark schemes.


[139] 2510.16841

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

Speech codecs that convert continuous speech signals into discrete tokens have become essential for speech language models. However, existing codecs struggle to balance high-quality reconstruction with semantically rich representations, limiting their effectiveness in both generative and understanding tasks. In this work, we propose SAC, a neural speech codec with semantic-acoustic dual-stream quantization. By disentangling semantic and acoustic modeling into two dedicated streams, SAC enables each to be optimized for its respective role. Comprehensive evaluations show that SAC achieves strong reconstruction performance across diverse bitrates under both clean and noisy conditions, with particularly high scores on UTMOS and WER, indicating superior naturalness and intelligibility. Moreover, SAC substantially surpasses prior codecs in semantic representation, approaching the level of continuous self-supervised embeddings. When used as a tokenizer for LLM-based text-to-speech, SAC enables a single-stage autoregressive (AR) TTS model that clearly outperforms state-of-the-art AR systems. Our disentanglement analysis further validates the effectiveness of the dual-stream design, offering new potential for controllable speech generation.


[140] 2510.17775

Sample Complexity Analysis of Multi-Target Detection via Markovian and Hard-Core Multi-Reference Alignment

Motivated by single-particle cryo-electron microscopy, we study the sample complexity of the multi-target detection (MTD) problem, in which an unknown signal appears multiple times at unknown locations within a long, noisy observation. We propose a patching scheme that reduces MTD to a non-i.i.d. multi-reference alignment (MRA) model. In the one-dimensional setting, the latent group elements form a Markov chain, and we show that the convergence rate of any estimator matches that of the corresponding i.i.d. MRA model, up to a logarithmic factor in the number of patches. Moreover, for estimators based on empirical averaging, such as the method of moments, the convergence rates are identical in both settings. We further establish an analogous result in two dimensions, where the latent structure arises from an exponentially mixing random field generated by a hard-core placement model. As a consequence, if the signal in the corresponding i.i.d. MRA model is determined by moments up to order $n_{\min}$, then in the low-SNR regime the number of patches required to estimate the signal in the MTD model scales as $\sigma^{2n_{\min}}$, where $\sigma^2$ denotes the noise variance.


[141] 2511.00623

Adaptive Federated Learning to Optimize Integrated Flows in Cyber-Physical Data Centers

Data centers play an increasingly critical role in societal digitalization, yet their rapidly growing energy demand poses significant challenges for sustainable operation. To enhance the energy efficiency of geographically distributed data centers, this paper formulates a multi-period optimization model that captures the interdependence of electricity, heat, and data flows. The optimization of such integrated multi-domain flows inherently involves mixed-integer formulations and the access to proprietary or sensitive datasets, which correspondingly exacerbate computational complexity and raise data-privacy concerns. To address these challenges, an adaptive federated learning-to-optimization approach is proposed, accounting for the heterogeneity of datasets across distributed data centers. To safeguard privacy, cryptography techniques are leveraged in both the learning and optimization processes. A model acceptance criterion with convergence guarantee is developed to improve learning performance and filter out potentially contaminated data, while a verifiable double aggregation mechanism is further proposed to simultaneously ensure privacy and integrity of shared data during optimization. Theoretical analysis and numerical simulations demonstrate that the proposed approach preserves the privacy and integrity of shared data, achieves near-optimal performance, and exhibits high computational efficiency, making it suitable for large-scale data center optimization under privacy constraints.


[142] 2511.01022

On Structural Properties of Risk-Averse Optimal Stopping Problems

We establish structural properties of optimal stopping problems under time-consistent dynamic (coherent) risk measures, focusing on value function monotonicity and the existence of control limit (threshold) optimal policies. While such results are well developed for risk-neutral (expected-value) models, they remain underexplored in risk-averse settings. Coherent risk measures typically lack the tower property and are subadditive rather than additive, complicating structural analysis. We show that value function monotonicity mirrors the risk-neutral case. Moreover, if the risk envelope associated with each coherent risk measure admits a minimal element, the risk-averse optimal stopping problem reduces to an equivalent risk-neutral formulation. We also develop a general procedure for identifying control limit optimal policies and use it to derive practical, verifiable conditions on the risk measures and MDP structure that guarantee their existence. We illustrate the theory and verify these conditions through optimal stopping problems arising in operations, marketing, and finance.


[143] 2511.08041

Spacecraft Angular Rate Estimation via Event-Based Camera Sensing

This paper presents a method for determining spacecraft angular rates using event-based camera sensing. This is achieved by analyzing the temporal distribution of brightness events triggered by the apparent motion of stars. The location and polarity of the events are used to infer the apparent motion field of the stars, which is, in turn, employed to estimate the observer angular velocity in the camera frame. This can be converted to the spacecraft angular rates provided an attitude reference. The method is validated through numerical simulation for a synthetic dataset of event streams generated on random spacecraft pointing and rates conditions. The accuracy of the method is assessed, demonstrating its potential to complement or replace conventional rate sensors in spacecraft systems using event camera sensing.


[144] 2511.14896

A $μ$-Analysis and Synthesis Framework for Partial Integral Equations using IQCs

We develop a $\mu$-analysis and synthesis framework for infinite-dimensional systems that leverages the Integral Quadratic Constraints (IQCs) to compute the structured singular value's upper bound. The methodology formulates robust stability and performance conditions jointly as Linear Partial Integral Inequalities within the Partial Integral Equation framework, establishing connections between IQC multipliers and $\mu$-theory. Computational implementation via PIETOOLS enables computational tools that practically applicable to spatially distributed infinite dimensional systems. Illustrations with the help of Partial and Delay Differential Equations validate the effectiveness of the framework, showing a significant reduction in conservatism compared to unstructured methods and providing systematic tools for stability-performance trade-off analysis.


[145] 2511.15497

A Review of Machine Learning for Cavitation Intensity Recognition in Complex Industrial Systems

Cavitation intensity recognition (CIR) is a critical technology for detecting and evaluating cavitation phenomena in hydraulic machinery, with significant implications for operational safety, performance optimization, and maintenance cost reduction in complex industrial systems. Despite substantial research progress, a comprehensive review that systematically traces the development trajectory and provides explicit guidance for future research is still lacking. To bridge this gap, this paper presents a thorough review and analysis of hundreds of publications on intelligent CIR across various types of mechanical equipment from 2002 to 2025, summarizing its technological evolution and offering insights for future development. The early stages are dominated by traditional machine learning approaches that relied on manually engineered features under the guidance of domain expert knowledge. The advent of deep learning has driven the development of end-to-end models capable of automatically extracting features from multi-source signals, thereby significantly improving recognition performance and robustness. Recently, physical informed diagnostic models have been proposed to embed domain knowledge into deep learning models, which can enhance interpretability and cross-condition generalization. In the future, transfer learning, multi-modal fusion, lightweight network architectures, and the deployment of industrial agents are expected to propel CIR technology into a new stage, addressing challenges in multi-source data acquisition, standardized evaluation, and industrial implementation. The paper aims to systematically outline the evolution of CIR technology and highlight the emerging trend of integrating deep learning with physical knowledge. This provides a significant reference for researchers and practitioners in the field of intelligent cavitation diagnosis in complex industrial systems.


[146] 2511.15925

Cyber-Resilient Data-Driven Event-Triggered Secure Control for Autonomous Vehicles Under False Data Injection Attacks

This paper proposes a cyber-resilient secure control framework for autonomous vehicles (AVs) subject to false data injection (FDI) threats as actuator attacks. The framework integrates data-driven modeling, event-triggered communication, and fractional-order sliding mode control (FSMC) to enhance the resilience against adversarial interventions. A dynamic model decomposition (DMD)-based methodology is employed to extract the lateral dynamics from real-world data, eliminating the reliance on conventional mechanistic modeling. To optimize communication efficiency, an event-triggered transmission scheme is designed to reduce the redundant transmissions while ensuring system stability. Furthermore, an extended state observer (ESO) is developed for real-time estimation and mitigation of actuator attack effects. Theoretical stability analysis, conducted using Lyapunov methods and linear matrix inequality (LMI) formulations, guarantees exponential error convergence. Extensive simulations validate the proposed event-triggered secure control framework, demonstrating substantial improvements in attack mitigation, communication efficiency, and lateral tracking performance. The results show that the framework effectively counteracts actuator attacks while optimizing communication-resource utilization, making it highly suitable for safety-critical AV applications.


[147] 2511.23251

Deep Learning for Restoring MPI System Matrices Using Simulated Training Data

Magnetic particle imaging reconstructs tracer distributions using a system matrix obtained through time-consuming, noise-prone calibration measurements. Methods for addressing imperfections in measured system matrices increasingly rely on deep neural networks, yet curated training data remain scarce. This study evaluates whether physics-based simulated system matrices can be used to train deep learning models for different system matrix restoration tasks, i.e., denoising, accelerated calibration, upsampling, and inpainting, that generalize to measured data. A large system matrices dataset was generated using an equilibrium magnetization model extended with uniaxial anisotropy. The dataset spans particle, scanner, and calibration parameters for 2D and 3D trajectories, and includes background noise injected from empty-frame measurements. For each restoration task, deep learning models were compared with classical non-learning baseline methods. The models trained solely on simulated system matrices generalized to measured data across all tasks: for denoising, DnCNN/RDN/SwinIR outperformed DCT-F baseline by >10 dB PSNR and up to 0.1 SSIM on simulations and led to perceptually better reconstuctions of real data; for 2D upsampling, SMRnet exceeded bicubic by 20 dB PSNR and 0.08 SSIM at $\times 2$-$\times 4$ which did not transfer qualitatively to real measurements. For 3D accelerated calibration, SMRnet matched tricubic in noiseless cases and was more robust under noise, and for 3D inpainting, biharmonic inpainting was superior when noise-free but degraded with noise, while a PConvUNet maintained quality and yielded less blurry reconstructions. The demonstrated transferability of deep learning models trained on simulations to real measurements mitigates the data-scarcity problem and enables the development of new methods beyond current measurement capabilities.


[148] 2512.02679

Off-grid solar energy storage system with lithium iron phosphate (LFP) batteries in high mountains: a case report of Tianchi Lodge in Taiwan

Mountain huts are buildings located at high altitude, providing shelter and a place for hikers. Energy supply on mountain huts remains an open issue. Using renewable energies could be an appropriate solution. Tianchi Lodge, a famous mountain hut in Taiwan, has operated an off-grid solar energy storage system with lithium iron phosphate (LFP) batteries since 2020. In this case report, the energy architecture, detailed descriptions, and historical status of the system are provided.


[149] 2512.04676

A Unified Low-rank ADI Framework with Shared Linear Solves for Simultaneously Solving Multiple Lyapunov, Sylvester, and Riccati Equations

The alternating direction implicit (ADI) methods are computationally efficient and numerically effective tools for computing low-rank solutions of large-scale linear matrix equations. It is known in the literature that the low-rank ADI method for Lyapunov equations is a Petrov-Galerkin projection algorithm that implicitly performs model order reduction. It recursively enforces interpolation at the mirror images of the ADI shifts and places the poles of the reduced-order models at the ADI shifts. In this paper, we show that the low-rank ADI methods for Sylvester and Riccati equations are also Petrov-Galerkin projection algorithms that implicitly perform model order reduction. These methods likewise enforce interpolation at the mirror images of the ADI shifts; however, they do not place the poles at the mirror images of the interpolation points. Instead, their pole placement ensures that the projected Sylvester and Riccati equations they implicitly solve admit a unique solution. By observing that the ADI methods for Lyapunov, Sylvester, and Riccati equations differ only in pole placement and not in their interpolatory nature, we show that the shifted linear solves-which constitute the bulk of the computational cost-can be shared. The pole-placement step involves only small-scale operations and is therefore inexpensive. We propose a unified ADI framework that requires only two shifted linear solves per iteration to simultaneously solve six Lyapunov equations, one Sylvester equation, and ten Riccati equations, thus substantially increasing the return on investment for the computational cost spent on the linear solves. All operations needed to extract the individual solutions from these shared linear solves are small-scale and inexpensive. Three numerical examples are presented to demonstrate the effectiveness of the proposed ADI framework.


[150] 2512.08469

Aliasing in Near-Field Array Ambiguity Functions: a Spatial Frequency-Domain Framework

Next-generation communication and localization systems increasingly rely on extremely large-scale arrays (XL-arrays), which promise unprecedented spatial resolution and new functionalities. These gains arise from their inherent operation in the near field (NF) regime, where the spherical nature of the wavefront can no longer be ignored; consequently, characterizing the ambiguity function -- which amounts to the matched beam pattern -- is considerably more challenging. Implementing very wide apertures with half-wavelength element spacing is costly and complex. This motivates thinning the array (removing elements), which introduces intricate aliasing structures, i.e., grating lobes. Whereas prior work has addressed this challenge using approximations tailored to specific array geometries, this paper develops a general framework that reveals the fundamental origins and geometric behavior of grating lobes in near-field ambiguity functions. Using a local spatial-frequency analysis of steering signals, we derive a systematic methodology to model NF grating lobes as aliasing artifacts, quantifying their structure on the AF, and providing design guidelines for XL-arrays that operate within aliasing-safe regions. We further connect our framework to established far-field principles. Finally, we demonstrate the practical value of the approach by deriving closed-form expressions for aliasing-free regions in canonical uniform linear arrays and uniform circular arrays.


[151] 2512.10657

Estimating Hormone Concentrations in the Pituitary-Thyroid Feedback Loop from Irregularly Sampled Measurements

Model-based control techniques have recently been investigated for the recommendation of medication dosages to address thyroid diseases. These techniques often rely on knowledge of internal hormone concentrations that cannot be measured from blood samples. Moreover, the measurable concentrations are typically only obtainable at irregular sampling times. In this work, we empirically verify a notion of sample-based detectability that accounts for irregular sampling of the measurable concentrations on two pituitary-thyroid loop models representing patients with hypo- and hyperthyroidism, respectively, and include the internal concentrations as states. We then implement sample-based moving horizon estimation for the models, and test its performance on virtual patients across a range of sampling schemes. Our study shows robust stability of the estimator across all scenarios, and that more frequent sampling leads to less estimation error in the presence of model uncertainty and misreported dosages.


[152] 2512.11468

An Input-Output Data-Driven Dissipativity Approach for Compositional Stability Certification of Interconnected LTI MIMO Systems

We propose an input-output data-driven framework for certifying the stability of interconnected multiple-input-multiple-output linear time-invariant discrete-time systems via QSR-dissipativity. That is, by using measured input-output trajectories of each subsystem, we verify dissipative properties and extract local passivity indices without requiring an explicit model identification. These passivity indices are then used to derive conditions under which the equilibrium of the interconnected system is stable. In particular, the framework identifies how the lack of passivity in some subsystems can be compensated by surpluses in others. The proposed approach enables a compositional stability analysis by combining subsystem-level conditions into a criterion valid for the overall interconnected system. We illustrate via a numerical case study, how to compute channel-wise passivity indices and infer stability guarantees directly from data with the proposed method.


[153] 2401.16515

Neuromorphic Photonic Computing with an Electro-Optic Analog Memory

In neuromorphic photonic systems, device operations are typically governed by analog signals, necessitating digital-to-analog converters (DAC) and analog-to-digital converters (ADC). However, data movement between memory and these converters in conventional von Neumann architectures incur significant energy costs. We propose an analog electronic memory co-located with photonic computing units to eliminate repeated long-distance data movement. Here, we demonstrate a monolithically integrated neuromorphic photonic circuit with on-chip capacitive analog memory and evaluate its performance in machine learning for in situ training and inference using the MNIST dataset. Our analysis shows that integrating analog memory into a neuromorphic photonic architecture can achieve over 26x power savings compared to conventional SRAM-DAC architectures. Furthermore, maintaining a minimum analog memory retention-to-network-latency ratio of 100 maintains >90% inference accuracy, enabling leaky analog memories without substantial performance degradation. This approach reduces reliance on DACs, minimizes data movement, and offers a scalable pathway toward energy-efficient, high-speed neuromorphic photonic computing.


[154] 2405.20336

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works that typically address these two modalities in isolation. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rapping vocals, lyrics, and high-quality 3D holistic body meshes. With the RapVerse dataset, we investigate the extent to which scaling autoregressive multimodal transformers across language, audio, and motion can enhance the coherent and realistic generation of vocals and whole-body human motions. For modality unification, a vector-quantized variational autoencoder is employed to encode whole-body motion sequences into discrete motion tokens, while a vocal-to-unit model is leveraged to obtain quantized audio tokens preserving content, prosodic information and singer identity. By jointly performing transformer modeling on these three modalities in a unified way, our framework ensures a seamless and realistic blend of vocals and human motions. Extensive experiments demonstrate that our unified generation framework not only produces coherent and realistic singing vocals alongside human motions directly from textual inputs, but also rivals the performance of specialized single-modality generation systems, establishing new benchmarks for joint vocal-motion generation.


[155] 2406.04163

Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes

We study the error introduced by entropy regularization in infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength, both in a weighted KL-divergence and in value with a problem-specific exponent. This is in contrast to previously known estimates, of the order $O(\tau)$, where $\tau$ is the regularization strength. We provide a lower bound that matches our upper bound up to a polynomial term, thereby characterizing the exponential convergence rate for entropy regularization. Our proof relies on the observation that the solutions of entropy-regularized Markov decision processes solve a gradient flow of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of this gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this gradient flow, which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropy-regularized natural policy gradient methods, the overall error decays exponentially in the square root of the number of iterations, improving over existing sublinear guarantees. Finally, we extend our analysis to settings beyond the entropy. In particular, we characterize the implicit bias regarding general convex potentials and their resulting generalized natural policy gradients.


[156] 2412.01530

Generative AI-based data augmentation for improved bioacoustic classification in noisy environments

Obtaining data to train robust artificial intelligence (AI)-based models for species classification can be challenging, particularly for rare species. Data augmentation can boost classification accuracy by increasing the diversity of training data and is cheaper to obtain than expert-labelled data. However, many classic image-based augmentation techniques are not suitable for audio spectrograms. We investigate two generative AI models as data augmentation tools to synthesise spectrograms and supplement audio data: Auxiliary Classifier Generative Adversarial Networks (ACGAN) and Denoising Diffusion Probabilistic Models (DDPMs). The latter performed particularly well in terms of both realism of generated spectrograms and accuracy in a resulting classification task. Alongside these new approaches, we present a new audio data set of 640 hours of bird calls from wind farm sites in Ireland, approximately 800 samples of which have been labelled by experts. Wind farm data are particularly challenging for classification models given the background wind and turbine noise. Training an ensemble of classification models on real and synthetic data combined compared well with highly confident BirdNET predictions. Each classifier we used was improved by including synthetic data, and classification metrics generally improved in line with the amount of synthetic data added. Our approach can be used to augment acoustic signals for more species and other land-use types, and has the potential to bring about advances in our capacity to develop reliable AI-based detection of rare species. Our code is available at this https URL.


[157] 2412.04130

Deep priors for satellite image restoration with accurate uncertainties

Satellite optical images, upon their on-ground receipt, offer a distorted view of the observed scene. Their restoration, including denoising, deblurring, and sometimes super-resolution, is required before their exploitation. Moreover, quantifying the uncertainties related to this restoration helps to reduce the risks of misinterpreting the image content. Deep learning methods are now state-of-the-art for satellite image restoration. Among them, direct inversion methods train a specific network for each sensor, and generally provide a point estimation of the restored image without the associated uncertainties. Alternatively, deep regularization (DR) methods learn a deep prior on target images before plugging it, as the regularization term, into a model-based optimization scheme. This allows for restoring images from several sensors with a single network and possibly for estimating associated uncertainties. In this paper, we introduce VBLE-xz, a DR method that solves the inverse problem in the latent space of a variational compressive autoencoder (CAE). We adapt the regularization strength by modulating the bitrate of the trained CAE with a training-free approach. Then, VBLE-xz estimates relevant uncertainties jointly in the latent and in the image spaces by sampling an explicit posterior estimated within variational inference. This enables fast posterior sampling, unlike state-of-the-art DR methods that use Markov chains or diffusion-based approaches. We conduct a comprehensive set of experiments on very high-resolution simulated and real Pléiades images, asserting the performance, robustness and scalability of the proposed method. They demonstrate that VBLE-xz represents a compelling alternative to direct inversion methods when uncertainty quantification is required. The code associated to this paper is available in this https URL.


[158] 2412.04783

KNN-MMD: Cross Domain Wireless Sensing via Local Distribution Alignment

Wireless sensing has recently found widespread applications in diverse environments, including homes, offices, and public spaces. By analyzing patterns in channel state information (CSI), it is possible to infer human actions for tasks such as person identification, gesture recognition, and fall detection. However, CSI is highly sensitive to environmental changes, where even minor alterations can significantly distort the CSI patterns. This sensitivity often leads to performance degradation or outright failure when applying wireless sensing models trained in one environment to another. To address this challenge, Domain Alignment (DAL) has been widely adopted for cross-domain classification tasks, as it focuses on aligning the global distributions of the source and target domains in feature space. Despite its popularity, DAL often neglects inter-category relationships, which can lead to misalignment between categories across domains, even when global alignment is achieved. To overcome these limitations, we propose K-Nearest Neighbors Maximum Mean Discrepancy (KNN-MMD), a novel few-shot method for cross-domain wireless sensing. Our approach begins by constructing a help set using KNN from the target domain, enabling local alignment between the source and target domains within each category using MMD. Additionally, we address a key instability issue commonly observed in cross-domain methods, where model performance fluctuates sharply between epochs. Further, most existing methods struggle to determine an optimal stopping point during training due to the absence of labeled data from the target domain. Our method resolves this by excluding the support set from the target domain during training and employing it as a validation set to determine the stopping criterion. The dataset and code are publicly available at this https URL .


[159] 2412.18032

Major Space Weather Risks Identified via Coupled Physics-Engineering-Economic Modeling

Space weather poses an important but under-quantified threat to critical infrastructure, the economy, and society. While extreme geomagnetic storms are recognized as potential global catastrophes, their socio-economic impacts remain poorly quantified. Here we present a novel physics-engineering-economic framework that links geophysical drivers of geomagnetic storms to power grid geoelectric fields, transformer vulnerability, and macroeconomic consequences. Using the United States as an example, we estimate daily economic losses from transformer thermal heating of 1.37 billion USD (95 percent confidence interval: 1.16 to 1.58 billion USD) for a 100-year geomagnetic storm, with power outages affecting 4.1 million people and 101,000 businesses. A 250-year event could raise losses to 2.09 billion USD per day (95 percent confidence interval: 1.84 to 2.34 billion USD), disrupting power for more than 6 million people and 155,000 businesses. Crucially, the framework is scalable and transferable, offering a template for assessing space weather risk to critical infrastructure in other countries. This integrative approach provides the first end-to-end quantification of space weather socio-economic impacts, bridging space physics through to policy-relevant metrics. Our results demonstrate that coupled socio-economic modeling of space weather is both feasible and essential, enabling evidence-based decision making in infrastructure protection and global risk management.


[160] 2505.08752

Configurations, Tessellations and Tone Networks

The Eulerian tonnetz, which associates three minor chords to each major chord and three major chords to each minor chord, can be represented by a bipartite graph with twelve white vertices denoting major chords and twelve black vertices denoting minor chords. This so-called Levi graph determines a configuration of twelve points and twelve lines in $\mathbb R^2$ with the property that three points lie on each line and three lines pass through each point. Interesting features of the tonnetz, such as the existence of the four hexatonic hexacycles and the three octatonic octacycles, crucial for the understanding of nineteenth-century harmony and voice leading, can be read off rather directly as properties of this $\{12_3\}$ and its Levi graph. Analogous tone networks together with their associated Levi graphs and configurations can be constructed for pentatonic music and twelve-tone music, offering the promise of new methods of composition. When the constraints of the Eulerian tonnetz are relaxed so as to allow movements between major and minor triads with variations at exactly two tones, the resulting bipartite graph has two components, each of which generates a tessellation of the plane, of a type known to Kepler, based on hexagons, squares and dodecagons. When the same combinatorial idea is applied to tetrachords of the Tristan genus (dominant sevenths and minor sixths) the cycles of the resulting bipartite graph are sufficiently ample in girth to ensure the existence of a second geometrical configuration of type $\{12_3\}$, distinct from the Eulerian tonnetz as an incidence geometry, which can be used as the basis for a new approach to the analysis of the music of Chopin, Wagner, Tchaikovsky, Brahms and their contemporaries.


[161] 2506.05444

U-NetMN and SegNetMN: Modified U-Net and SegNet models for bimodal SAR image segmentation

Segmenting Synthetic Aperture Radar (SAR) images is crucial for many remote sensing applications, particularly water body detection. However, deep learning-based segmentation models often face challenges related to convergence speed and stability, mainly due to the complex statistical distribution of this type of data. In this study, we evaluate the impact of mode normalization on two widely used semantic segmentation models, U-Net and SegNet. Specifically, we integrate mode normalization, to reduce convergence time while maintaining the performance of the baseline models. Experimental results demonstrate that mode normalization significantly accelerates convergence. Furthermore, cross-validation results indicate that normalized models exhibit increased stability in different zones. These findings highlight the effectiveness of normalization in improving computational efficiency and generalization in SAR image segmentation.


[162] 2508.02657

RC-Gossip: Information Freshness in Clustered Networks with Rate-Changing Gossip

A clustered gossip network is considered in which a source updates its information over time, and end-nodes, organized in clusters through clusterheads, are keeping track of it. The goal for the nodes is to remain as fresh as possible, i.e., have the same information as the source, which we assess by the long-term average binary freshness metric. We introduce a smart mechanism of information dissemination which we coin rate-changing gossip (RC-Gossip). Its main idea is that gossiping is directed towards nodes that need it the most, and hence the rate of gossiping changes based on the number of fresh nodes in the network at a given time. While Stochastic Hybrid System (SHS) analysis has been the norm in studying freshness of gossip networks, we present an equivalent way to analyze freshness using a renewal-reward-based approach. Using that, we show that RC-gossip significantly increases freshness of nodes in different clustered networks, with optimal cluster sizes, compared to traditional gossiping techniques.


[163] 2508.08468

Audio-Visual Speech Enhancement: Architectural Design and Deployment Strategies

This paper introduces a new AI-based Audio-Visual Speech Enhancement (AVSE) system and presents a comparative performance analysis of different deployment architectures. The proposed AVSE system employs convolutional neural networks (CNNs) for spectral feature extraction and long short-term memory (LSTM) networks for temporal modeling, enabling robust speech enhancement through multimodal fusion of audio and visual cues. Multiple deployment scenarios are investigated, including cloud-based, edge-assisted, and standalone device implementations. Their performance is evaluated in terms of speech quality improvement, latency, and computational overhead. Real-world experiments are conducted across various network conditions, including Ethernet, Wi-Fi, 4G, and 5G, to analyze the trade-offs between processing delay, communication latency, and perceptual speech quality. The results show that while cloud deployment achieves the highest enhancement quality, edge-assisted architectures offer the best balance between latency and intelligibility, meeting real-time requirements under 5G and Wi-Fi 6 conditions. These findings provide practical guidelines for selecting and optimizing AVSE deployment architectures in diverse applications, including assistive hearing devices, telepresence, and industrial communications.


[164] 2508.11849

LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba

We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact state-centric reward that balances progress, smoothness, and safety. We evaluate our method in challenging simulated environments with static and moving obstacles as well as uneven terrain. Compared with state-of-the-art baselines, our method achieves higher returns and success rates with fewer collisions, exhibits stronger generalization to unseen terrains and obstacle densities, and improves training efficiency by converging in fewer updates under the same compute budget.


[165] 2508.17756

SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling

Diffusion models have recently achieved remarkable success in generative tasks (e.g., image and video generation), and the demand for high-quality content (e.g., 2K/4K videos) is rapidly increasing across various domains. However, generating ultra-high-resolution videos on existing standard-resolution (e.g., 720p) platforms remains challenging due to the excessive re-training requirements and prohibitively high computational and memory costs. To this end, we introduce SUPERGEN, an efficient tile-based framework for ultra-high-resolution video generation. SUPERGEN features a novel training-free algorithmic innovation with tiling to successfully support a wide range of resolutions without additional training efforts while significantly reducing both memory footprint and computational complexity. Moreover, SUPERGEN incorporates a tile-tailored, adaptive, region-aware caching strategy that accelerates video generation by exploiting redundancy across denoising steps and spatial regions. SUPERGEN also integrates cache-guided, communication-minimized tile parallelism for enhanced throughput and minimized latency. Evaluations show that SUPERGEN maximizes performance gains while achieving high output quality across various benchmarks.


[166] 2509.03913

SwinSRGAN: Swin Transformer-based Generative Adversarial Network for High-Fidelity Speech Super-Resolution

Speech super-resolution (SR) reconstructs high-frequency content from low-resolution speech signals. Existing systems often suffer from representation mismatch in two-stage mel-vocoder pipelines and from over-smoothing of hallucinated high-band content by CNN-only generators. Diffusion and flow models are computationally expensive, and their robustness across domains and sampling rates remains limited. We propose SwinSRGAN, an end-to-end framework operating on Modified Discrete Cosine Transform (MDCT) magnitudes. It is a Swin Transformer-based U-Net that captures long-range spectro-temporal dependencies with a hybrid adversarial scheme combines time-domain MPD/MSD discriminators with a multi-band MDCT discriminator specialized for the high-frequency band. We employs a sparse-aware regularizer on arcsinh-compressed MDCT to better preserve transient components. The system upsamples inputs at various sampling rates to 48 kHz in a single pass and operates in real time. On standard benchmarks, SwinSRGAN reduces objective error and improves ABX preference scores. In zero-shot tests on HiFi-TTS without fine-tuning, it outperforms NVSR and mdctGAN, demonstrating strong generalization across datasets


[167] 2510.21944

Fixed Horizon Linear Quadratic Covariance Steering in Continuous Time with Hilbert-Schmidt Terminal Cost

We formulate and solve the fixed horizon linear quadratic covariance steering problem in continuous time with a terminal cost measured in Hilbert-Schmidt (i.e., Frobenius) norm error between the desired and the controlled terminal covariances. For this problem, the necessary conditions of optimality become a coupled matrix ODE two-point boundary value problem. To solve this system of equations, we design a matricial recursive algorithm and prove its convergence. The proposed algorithm and its analysis make use of the linear fractional transforms parameterized by the state transition matrix of the associated Hamiltonian matrix. To illustrate the results, we provide two numerical examples: one with a two dimensional and another with a six dimensional state space.


[168] 2511.09791

PANDA -- Patch And Distribution-Aware Augmentation for Long-Tailed Exemplar-Free Continual Learning

Exemplar-Free Continual Learning (EFCL) restricts the storage of previous task data and is highly susceptible to catastrophic forgetting. While pre-trained models (PTMs) are increasingly leveraged for EFCL, existing methods often overlook the inherent imbalance of real-world data distributions. We discovered that real-world data streams commonly exhibit dual-level imbalances, dataset-level distributions combined with extreme or reversed skews within individual tasks, creating both intra-task and inter-task disparities that hinder effective learning and generalization. To address these challenges, we propose PANDA, a Patch-and-Distribution-Aware Augmentation framework that integrates seamlessly with existing PTM-based EFCL methods. PANDA amplifies low-frequency classes by using a CLIP encoder to identify representative regions and transplanting those into frequent-class samples within each task. Furthermore, PANDA incorporates an adaptive balancing strategy that leverages prior task distributions to smooth inter-task imbalances, reducing the overall gap between average samples across tasks and enabling fairer learning with frozen PTMs. Extensive experiments and ablation studies demonstrate PANDA's capability to work with existing PTM-based CL methods, improving accuracy and reducing catastrophic forgetting.


[169] 2511.20734

Automated Histopathologic Assessment of Hirschsprung Disease Using a Multi-Stage Vision Transformer Framework

Hirschsprung Disease is characterized by the absence of ganglion cells in the myenteric plexus. Therefore, the correct identification of ganglion cells is crucial for diagnosing Hirschsprung disease. We introduce a three-stage analysis framework that mimics the pathologist's diagnostic approach. The framework, based on a Vision Transformer model (ViT-B/16), sequentially segments the muscularis propria, segments the myenteric plexus, and detects ganglion cells within anatomically valid regions. 30 whole-slide images of colon tissue were used, each containing manual annotations of muscularis, plexus, and ganglion cells. A 5-fold cross-validation scheme was applied to each stage, along with resolution-specific tiling strategies and tailored postprocessing to ensure anatomical consistency. The proposed method achieved a Dice coefficient of 89.9% and a Plexus Inclusion Rate of 100% for muscularis segmentation. Plexus segmentation reached a recall of 94.8%, a precision of 84.2% and a Ganglia Inclusion Rate of 99.7%. For ganglion cells annotated with high certainty, the model achieved 62.1\% precision and 89.1% recall. When considering all annotated ganglion cells, regardless of certainty level, the overall precision was 67.0%. These results indicate that ViT-based models are effective at leveraging global tissue context and capturing cellular morphology at small scales, even within complex histological tissue structures. This multi-stage methodology has great potential to support digital pathology workflows by reducing inter-observer variability and assisting in the evaluation of Hirschsprung disease. The clinical impact will be evaluated in future work with larger multi-center datasets and additional expert annotations.


[170] 2511.20853

MODEST: Multi-Optics Depth-of-Field Stereo Dataset

Reliable depth estimation under real optical conditions remains a core challenge for camera vision in systems such as autonomous robotics and augmented reality. Despite recent progress in depth estimation and depth-of-field rendering, research remains constrained by the lack of large-scale, high-fidelity, real stereo DSLR datasets, limiting real-world generalization and evaluation of models trained on synthetic data as shown extensively in literature. We present the first high-resolution (5472$\times$3648px) stereo DSLR dataset with 18000 images, systematically varying focal length and aperture across complex real scenes and capturing the optical realism and complexity of professional camera systems. For 9 scenes with varying scene complexity, lighting and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70mm) and 5 apertures (f/2.8-f/22), spanning 50 optical configurations in 2000 images per scene. This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction and novel view synthesis. Each focal configuration has a dedicated calibration image set, supporting evaluation of classical and learning based methods for intrinsic and extrinsic calibration. The dataset features challenging visual elements such as multi-scale optical illusions, reflective surfaces, mirrors, transparent glass walls, fine-grained details, and natural / artificial ambient light variations. This work attempts to bridge the realism gap between synthetic training data and real camera optics, and demonstrates challenges with the current state-of-the-art monocular, stereo depth and depth-of-field methods. We release the dataset, calibration files, and evaluation code to support reproducible research on real-world optical generalization.


[171] 2512.03434

Quantum Encrypted Control of Networked Systems

Encrypted control has been extensively studied to ensure the confidentiality of system states and control inputs for networked control systems. This paper presents a computationally efficient encrypted control framework for networked systems enabled by quantum communication. A quantum channel between sensors and actuators is used to generate identical secret keys, whose security is further enhanced through quantum key distribution. These keys enable lightweight encryption and decryption while preserving confidentiality and control accuracy. We develop a novel encryption-decryption architecture for state-feedback control of linear systems based on quantum keys, and characterize the impact of quantum state errors on closed-loop stability. In particular, we establish the existence of a critical threshold on intrinsic quantum noise below which stability is guaranteed. In contrast to classical encrypted control schemes, which may collapse under a single key-bit error, the proposed quantum encrypted control exhibits strong robustness to key imperfections. We further adopt quantization techniques to address the scenarios with limited communication bits in practical situations, and implement privacy protection for quantum keys based on a stochastic quantizer. These results demonstrate that integrating quantum technologies into control systems in a nontrivial and principled manner, even at their current level of maturity, can yield substantial performance gains in reducing computational complexity and improving resilience to key errors while ensuring security against multiple eavesdropping sources.