New articles on Electrical Engineering and Systems Science


[1] 2601.03281

$\alpha^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks

Large Language Models (LLMs) are increasingly used as high-level controllers for autonomous Unmanned Aerial Vehicle (UAV) missions. However, existing evaluations rarely assess whether such agents remain safe, protocol-compliant, and effective under realistic next-generation networking constraints. This paper introduces $\alpha^3$-Bench, a benchmark for evaluating LLM-driven UAV autonomy as a multi-turn conversational reasoning and control problem operating under dynamic 6G conditions. Each mission is formulated as a language-mediated control loop between an LLM-based UAV agent and a human operator, where decisions must satisfy strict schema validity, mission policies, speaker alternation, and safety constraints while adapting to fluctuating network slices, latency, jitter, packet loss, throughput, and edge load variations. To reflect modern agentic workflows, $\alpha^3$-Bench integrates a dual-action layer supporting both tool calls and agent-to-agent coordination, enabling evaluation of tool-use consistency and multi-agent interactions. We construct a large-scale corpus of 113k conversational UAV episodes grounded in UAVBench scenarios and evaluate 17 state-of-the-art LLMs using a fixed subset of 50 episodes per scenario under deterministic decoding. We propose a composite $\alpha^3$ metric that unifies six pillars: Task Outcome, Safety Policy, Tool Consistency, Interaction Quality, Network Robustness, and Communication Cost, with efficiency-normalized scores per second and per thousand tokens. Results show that while several models achieve high mission success and safety compliance, robustness and efficiency vary significantly under degraded 6G conditions, highlighting the need for network-aware and resource-efficient LLM-based UAV agents. The dataset is publicly available on GitHub: this https URL


[2] 2601.03282

Battery-time-space fragment-based formulation for the Electric Autonomous Dial-a-Ride Problem

The Electric Autonomous Dial-A-Ride Problem (E-ADARP) optimizes routing and scheduling for electric autonomous vehicles to transport customers from origins to destinations. It features a combined objective that minimizes travel cost and excess user ride time, and allows partial recharging. Motivated by practical scenarios where time and battery data are available with limited precision, we introduce a discrete variant of the problem, termed D-E-ADARP, in which all time and battery parameters are discretized. This enables the development of our alternative solution approach: the discrete battery-time-space fragment-based formulation (BTSFF). In this framework, a fragment represents a subpath with an associated cost that accounts for both travel cost and excess user ride time. The BTSFF network integrates spatial, temporal, and battery dimensions, with the latter two discretized into finite indices. Computational results show that BTSFF solves D-E-ADARP significantly more efficiently than existing methods applied to the original E-ADARP. In addition, BTSFF efficiently provides high-quality lower bounds for E-ADARP and accelerates solving its battery swap variants. For E-ADARP, a relaxed network is constructed by rounding down travel times and battery consumptions, enabling a valid lower bound. For battery swap variants, BTSFF integrates lazy constraints via callbacks to correct time discretization errors, guaranteeing optimal solutions. Experiments show BTSFF outperforms benchmark methods in efficiency.


[3] 2601.03386

Modeling and Control for UAV with Off-center Slung Load

An unmanned aerial vehicle (UAV) with a slung load is a classic air transportation system. In practical applications, the suspension point of the slung load does not always align with the center of mass (CoM) of the UAV due to mission requirements or mechanical interference. This offset creates coupling in the system's nonlinear dynamics, which leads to a complicated motion control problem. In existing research, the system is modeled about the UAV's CoM. In this work, we use the point of suspension instead. Based on the new model, a cascade control strategy is developed. In the middle-loop controller, the acceleration of the suspension point is used to regulate the swing angle of the slung load without the need to consider the coupling between the slung load and the UAV. Using the off-center reference frame, an inner-loop controller is designed to track the UAV's attitude without simplifying the coupling effects. We prove local exponential stability of the closed-loop system using a Lyapunov approach. Finally, simulations and experiments are conducted to validate the proposed control system.


[4] 2601.03387

SEP Analysis of a Low-Resolution SIMO System with M-PSK over Fading Channels

In this paper, the average symbol error probability (SEP) of a phase-quantized single-input multiple-output (SIMO) system with M-ary phase-shift keying (PSK) modulation is analyzed under Rayleigh fading and additive white Gaussian noise. By leveraging a novel method, we derive exact SEP expressions for a quadrature PSK (QPSK)-modulated n-bit phase-quantized SIMO system with maximum ratio combining (SIMO-MRC), along with the corresponding high signal-to-noise ratio (SNR) characterizations in terms of diversity and coding gains. For a QPSK-modulated 2-bit phase-quantized SIMO system with selection combining, the diversity and coding gains are further obtained for an arbitrary number of receive antennas, complementing existing results. Interestingly, the proposed method also reveals a duality between a SIMO-MRC system and a phase-quantized multiple-input single-output (MISO) system with maximum ratio transmission, when the modulation order, phase-quantization resolution, antenna configuration, and the channel state information (CSI) conditions are reciprocal. This duality enables direct inference to obtain the diversity of a general M-PSK-modulated n-bit phase-quantized SIMO-MRC system, and extends the results to its MISO counterpart. All the above results have been obtained assuming perfect CSI at the receiver (CSIR). Finally, the SEP analysis of a QPSK-modulated 2-bit phase-quantized SIMO system is extended to the limited CSIR case, where the CSI at each receive antenna is represented by only 2 bits of channel phase information. In this scenario, the diversity gain is shown to be further halved in general.


[5] 2601.03391

Edit2Restore: Few-Shot Image Restoration via Parameter-Efficient Adaptation of Pre-trained Editing Models

Image restoration has traditionally required training specialized models on thousands of paired examples per degradation type. We challenge this paradigm by demonstrating that powerful pre-trained text-conditioned image editing models can be efficiently adapted for multiple restoration tasks through parameter-efficient fine-tuning with remarkably few examples. Our approach fine-tunes LoRA adapters on FLUX.1 Kontext, a state-of-the-art 12B parameter flow matching model for image-to-image translation, using only 16-128 paired images per task, guided by simple text prompts that specify the restoration operation. Unlike existing methods that train specialized restoration networks from scratch with thousands of samples, we leverage the rich visual priors already encoded in large-scale pre-trained editing models, dramatically reducing data requirements while maintaining high perceptual quality. A single unified LoRA adapter, conditioned on task-specific text prompts, effectively handles multiple degradations including denoising, deraining, and dehazing. Through comprehensive ablation studies, we analyze: (i) the impact of training set size on restoration quality, (ii) trade-offs between task-specific and unified multi-task adapters, (iii) the role of text encoder fine-tuning, and (iv) zero-shot baseline performance. While our method prioritizes perceptual quality over pixel-perfect reconstruction metrics like PSNR/SSIM, our results demonstrate that pre-trained image editing models, when properly adapted, offer a compelling and data-efficient alternative to traditional image restoration approaches, opening new avenues for few-shot, prompt-guided image enhancement. The code to reproduce our results is available at: this https URL


[6] 2601.03427

Foundation Model-Aided Hierarchical Control for Robust RIS-Assisted Near-Field Communications

The deployment of extremely large aperture arrays (ELAAs) in sixth-generation (6G) networks could shift communication into the near-field communication (NFC) regime. In this regime, signals exhibit spherical wave propagation, unlike the planar waves in conventional far-field systems. Reconfigurable intelligent surfaces (RISs) can dynamically adjust phase shifts to support NFC beamfocusing, concentrating signal energy at specific spatial coordinates. However, effective RIS utilization depends on both rapid channel state information (CSI) estimation and proactive blockage mitigation, which occur on inherently different timescales. CSI varies at millisecond intervals due to small-scale fading, while blockage events evolve over seconds, posing challenges for conventional single-level control algorithms. To address this issue, we propose a dual-transformer (DT) hierarchical framework that integrates two specialized transformer models within a hierarchical deep reinforcement learning (HDRL) architecture, referred to as the DT-HDRL framework. A fast-timescale transformer processes ray-tracing data for rapid CSI estimation, while a vision transformer (ViT) analyzes visual data to predict impending blockages. In HDRL, the high-level controller selects line-of-sight (LoS) or RIS-assisted non-line-of-sight (NLoS) transmission paths and sets goals, while the low-level controller optimizes base station (BS) beamfocusing and RIS phase shifts using instantaneous CSI. This dual-timescale coordination maximizes spectral efficiency (SE) while ensuring robust performance under dynamic conditions. Simulation results demonstrate that our approach improves SE by approximately 18% compared to single-timescale baselines, while the proposed blockage predictor achieves an F1-score of 0.92, providing a 769 ms advance warning window in dynamic scenarios.


[7] 2601.03442

Provable Acceleration of Distributed Optimization with Local Updates

In conventional distributed optimization, each agent performs a single local update between two communication rounds with its neighbors to synchronize solutions. Inspired by the success of using multiple local updates in federated learning, incorporating local updates into distributed optimization has recently attracted increasing attention. However, unlike federated learning, where multiple local updates can accelerate learning by improving gradient estimation under mini-batch settings, it remains unclear whether similar benefits hold in distributed optimization when gradients are exact. Moreover, existing theoretical results typically require reducing the step size when multiple local updates are employed, which can entirely offset any potential benefit of these additional local updates and obscure their true impact on convergence. In this paper, we focus on the classic DIGing algorithm and leverage the tight performance bounds provided by Performance Estimation Problems (PEP) to show that incorporating local updates can indeed accelerate distributed optimization. To the best of our knowledge, this is the first rigorous demonstration of such acceleration for a broad class of objective functions. Our analysis further reveals that, under an appropriate step size, performing only two local updates is sufficient to achieve the maximal possible improvement, and that additional local updates provide no further gains. Because more updates increase computational cost, these findings offer practical guidance for efficient implementation. Extensive experiments on both synthetic and real-world datasets corroborate the theoretical findings.
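For orientation, the baseline that the abstract builds on can be sketched in a few lines. The block below implements the standard single-update DIGing recursion as it is usually stated in the literature (mix with neighbors, step along a tracked gradient, then update the tracker); it does not reproduce the paper's local-update variant or its PEP analysis. The ring network, mixing weights, quadratic objectives, and step size are illustrative assumptions, not values from the paper.

```python
def diging(grads, W, x0, alpha, num_iters):
    """Standard DIGing for scalar decision variables.

    grads: list of per-agent gradient functions
    W:     doubly stochastic mixing matrix (list of lists)
    x0:    initial local estimates, one per agent
    """
    n = len(x0)
    x = list(x0)
    g = [grads[i](x[i]) for i in range(n)]
    y = list(g)  # gradient tracker starts at the local gradients
    for _ in range(num_iters):
        # Consensus step on x followed by a step along the tracked gradient.
        x_new = [sum(W[i][j] * x[j] for j in range(n)) - alpha * y[i]
                 for i in range(n)]
        g_new = [grads[i](x_new[i]) for i in range(n)]
        # Tracker update: mix trackers, then add the change in local gradient.
        y = [sum(W[i][j] * y[j] for j in range(n)) + g_new[i] - g[i]
             for i in range(n)]
        x, g = x_new, g_new
    return x


# Hypothetical example: four agents on a ring, f_i(x) = 0.5 * a_i * (x - b_i)^2.
a = [1.0, 2.0, 3.0, 2.0]
b = [4.0, -1.0, 0.5, 2.0]
grads = [lambda x, ai=ai, bi=bi: ai * (x - bi) for ai, bi in zip(a, b)]

# Ring mixing matrix: weight 1/2 on self, 1/4 on each of the two neighbors.
W = [[0.50, 0.25, 0.00, 0.25],
     [0.25, 0.50, 0.25, 0.00],
     [0.00, 0.25, 0.50, 0.25],
     [0.25, 0.00, 0.25, 0.50]]

# Minimizer of the sum of the quadratics, used only as a reference value.
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)
x_final = diging(grads, W, x0=[0.0] * 4, alpha=0.05, num_iters=2000)
```

In this sketch every agent's entry of `x_final` should end up close to `x_star`. Each pass through the loop corresponds to one communication round with exactly one local update, which is the conventional setting the abstract contrasts against.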


[8] 2601.03443

Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers

Generative adversarial networks (GANs) and diffusion models have recently achieved state-of-the-art performance in audio super-resolution (ADSR), producing perceptually convincing wideband audio from narrowband inputs. However, existing evaluations primarily rely on signal-level or perceptual metrics, leaving open the question of how closely the distributions of synthetic super-resolved and real wideband audio match. Here we address this problem by analyzing the separability of real and super-resolved audio in various embedding spaces. We consider both middle-band ($4\to 16$~kHz) and full-band ($16\to 48$~kHz) upsampling tasks for speech and music, training linear classifiers to distinguish real from synthetic samples based on multiple types of audio embeddings. Comparisons with objective metrics and subjective listening tests reveal that embedding-based classifiers achieve near-perfect separation, even when the generated audio attains high perceptual quality and state-of-the-art metric scores. This behavior is consistent across datasets and models, including recent diffusion-based approaches, highlighting a persistent gap between perceptual quality and true distributional fidelity in ADSR models.


[9] 2601.03445

Policy Synthesis for Interval MDPs via Polyhedral Lyapunov Functions

Decision-making under uncertainty is central to many safety-critical applications, where decisions must be guided by probabilistic modeling formalisms. This paper introduces a novel approach to policy synthesis in multi-objective interval Markov decision processes using polyhedral Lyapunov functions. Unlike previous Lyapunov-based methods that mainly rely on quadratic functions, our method utilizes polyhedral functions to enhance accuracy in managing uncertainties within value iteration of dynamic programming. We reformulate the value iteration algorithm as a switched affine system with interval uncertainties and apply control-theoretic stability principles to synthesize policies that guide the system toward a desired target set. By constructing an invariant set of attraction, we ensure that the synthesized policies provide convergence guarantees while minimizing the impact of transition uncertainty in the underlying model. Our methodology removes the need for computationally intensive Pareto curve computations by directly determining a policy that brings objectives within a specified range of their target values. We validate our approach through numerical case studies, including a recycling robot and an electric vehicle battery, demonstrating its effectiveness in achieving policy synthesis under uncertainty.


[10] 2601.03446

Energy Harvesting in High Altitude Platform Station Enabled Sensor Networks

High altitude platform station (HAPS) systems are becoming crucial facilitators for future wireless communication networks, enhancing connectivity across all vertical communication layers, including small Internet of Things (IoT) sensors and devices, terrestrial users, and aerial devices. In the context of the widely recognized vertical heterogeneous network (VHetNet) architecture, HAPS systems can provide service to both aerial and ground users. However, integrating HAPS systems as a core element in the VHetNet architecture presents a considerable energy challenge, marking a prominent constraint for their operation. Driven by this challenge, we introduce an energy harvesting (EH) strategy tailored for HAPS systems, enabling a HAPS system to gather energy from another HAPS system that is not constrained by energy limitations. To assess the performance capabilities of the proposed model, we derive the outage probability (OP) and ergodic capacity (EC) and verify them using Monte Carlo (MC) simulations. Moreover, we explore the system in terms of throughput. The findings reveal that harnessing the full potential of EH is a viable approach to meeting the energy demands of HAPS systems.


[11] 2601.03452

Developing a Quantitative Resiliency Approach

Resiliency has garnered attention in the management of critical infrastructure as a metric of system performance, but there are significant roadblocks to its implementation in a realistic decision-making framework. In contrast to risk and reliability, which have robust quantification approaches and undergird many regulatory approaches to system safety (e.g., "risk-informed decision-making"), resiliency is a diffuse, qualitatively understood characteristic, often treated differently or distinctly. However, in the emerging context of highly complex, highly interdependent critical systems, the idea of reliability (as the probability of non-failure) may not be an appropriate metric of system health. As a result, focus is shifting towards resiliency-centered approaches that value the response to failure as much as the avoidance of failure. Supporting this approach requires a robustly defined, quantitative understanding of resiliency. In this paper, we explore the foundations of reliability and resiliency engineering, and propose an approach to resiliency-informed decision-making bolstered by a quantitative understanding of resiliency.


[12] 2601.03476

Online Decision-Making Under Uncertainty for Vehicle-to-Building Systems

Vehicle-to-building (V2B) systems integrate physical infrastructures, such as smart buildings and electric vehicles (EVs) connected to chargers at the building, with digital control mechanisms to manage energy use. By utilizing EVs as flexible energy reservoirs, buildings can dynamically charge and discharge them to optimize energy use and cut costs under time-variable pricing and demand charge policies. This setup leads to the V2B optimization problem, where buildings coordinate EV charging and discharging to minimize total electricity costs while meeting users' charging requirements. However, the V2B optimization problem is challenging because of: (1) fluctuating electricity pricing, which includes both energy charges ($/kWh) and demand charges ($/kW); (2) long planning horizons (typically over 30 days); (3) heterogeneous chargers with varying charging rates, controllability, and directionality (i.e., unidirectional or bidirectional); and (4) user-specific battery levels at departure to ensure user requirements are met. In contrast to existing approaches that often model this setting as a single-shot combinatorial optimization problem, we highlight critical limitations in prior work and instead model the V2B optimization problem as a Markov decision process (MDP), i.e., a stochastic control process. Solving the resulting MDP is challenging due to the large state and action spaces. To address the challenges of the large state space, we leverage online search, and we counter the action space by using domain-specific heuristics to prune unpromising actions. We validate our approach in collaboration with Nissan Advanced Technology Center - Silicon Valley. Using data from their EV testbed, we show that the proposed framework significantly outperforms state-of-the-art methods.
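The two-part tariff mentioned in the abstract (energy charges in $/kWh plus demand charges in $/kW) can be illustrated with a small cost function. This is a minimal sketch under stated assumptions: the demand charge is taken as a flat rate applied to the single highest average power in the billing window, and all loads, prices, and rates below are made-up numbers rather than data from the paper or the Nissan testbed.

```python
def electricity_cost(power_kw, price_per_kwh, demand_rate_per_kw, step_hours=1.0):
    """Total bill for one billing window under a two-part tariff.

    power_kw:           average building power drawn in each interval (kW)
    price_per_kwh:      time-varying energy price for each interval ($/kWh)
    demand_rate_per_kw: assumed flat demand-charge rate ($/kW of peak power)
    step_hours:         length of each interval in hours
    """
    if len(power_kw) != len(price_per_kwh):
        raise ValueError("power and price series must have the same length")
    # Energy charge: pay for every kWh at the price of its interval.
    energy_charge = sum(p * c * step_hours
                        for p, c in zip(power_kw, price_per_kwh))
    # Demand charge: pay for the highest power drawn in the window.
    demand_charge = demand_rate_per_kw * max(power_kw)
    return energy_charge + demand_charge


# Hypothetical three-hour window with a price spike in the middle hour.
prices = [0.10, 0.30, 0.10]
baseline_load = [10.0, 50.0, 20.0]   # no EV coordination
shifted_load = [20.0, 30.0, 30.0]    # same 80 kWh, peak shaved by EV discharging

baseline_cost = electricity_cost(baseline_load, prices, demand_rate_per_kw=10.0)
shifted_cost = electricity_cost(shifted_load, prices, demand_rate_per_kw=10.0)
```

Both load profiles consume the same energy, yet the shifted profile is cheaper mainly because its peak is lower. Since the demand charge depends on the maximum over the whole window, a decision in one interval affects the cost of all the others, which is consistent with the abstract's point that long planning horizons make the problem hard.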


[13] 2601.03486

Adaptive Model-Based Reinforcement Learning for Orbit Feedback Control in NSLS-II Storage Ring

The National Synchrotron Light Source II (NSLS-II) uses a highly stable electron beam to produce high-quality X-ray beams with high brightness and low-emittance synchrotron radiation. The traditional algorithm to stabilize the beam applies singular value decomposition (SVD) to the orbit response matrix to remove noise and extract actions. Supervised learning has recently been studied for storage ring stabilization at NSLS-II and other accelerator facilities. Several problems, such as machine status drift, environmental noise, and non-linear accelerator dynamics, remain unresolved in the SVD-based and supervised learning algorithms. To address these problems, we propose an adaptive training framework based on model-based reinforcement learning. This framework consists of two types of optimizations: trajectory optimization attempts to minimize the expected total reward in a differentiable environment, and online model optimization learns non-linear machine dynamics through the agent-environment interaction. Through online training, this framework tracks the internal status drift in the electron beam ring. Simulation and real in-facility experiments on NSLS-II reveal that our method stabilizes the beam position and minimizes the alignment error, defined as the root mean square (RMS) error between adjusted beam positions and the reference position, down to ~1 $\mu$m.


[14] 2601.03495

Cyberattack Detection in Virtualized Microgrids Using LightGBM and Knowledge-Distilled Classifiers

Modern microgrids depend on distributed sensing and communication interfaces, making them increasingly vulnerable to cyber-physical disturbances that threaten operational continuity and equipment safety. In this work, a complete virtual microgrid was designed and implemented in MATLAB/Simulink, integrating heterogeneous renewable sources and secondary controller layers. A structured cyberattack framework was developed using MGLib to inject adversarial signals directly into the secondary control pathways. Multiple attack classes were emulated, including ramp, sinusoidal, additive, coordinated stealth, and denial-of-service behaviors. The virtual environment was used to generate labeled datasets under both normal and attack conditions. The datasets were used to train Light Gradient Boosting Machine (LightGBM) models to perform two functions: detecting the presence of an intrusion (binary) and distinguishing among attack types (multiclass). The multiclass model attained 99.72% accuracy and a 99.62% F1 score, while the binary model attained 94.8% accuracy and a 94.3% F1 score. A knowledge-distillation step reduced the size of the multiclass model, allowing faster predictions with only a small drop in performance. Real-time tests showed a processing delay of about 54 to 67 ms per 1000 samples, demonstrating suitability for CPU-based edge deployment in microgrid controllers. The results confirm that lightweight machine-learning-based intrusion detection methods can provide fast, accurate, and efficient cyberattack detection without relying on complex deep learning models. Key contributions include: (1) development of a complete MATLAB-based virtual microgrid, (2) structured attack injection at the control layer, (3) creation of multiclass labeled datasets, and (4) design of low-cost AI models suitable for practical microgrid cybersecurity.


[15] 2601.03499

GeoDiff-SAR: A Geometric Prior Guided Diffusion Model for SAR Image Generation

Synthetic Aperture Radar (SAR) imaging results are highly sensitive to observation geometries and the geometric parameters of targets. However, existing generative methods primarily operate within the image domain, neglecting explicit geometric information. This limitation often leads to unsatisfactory generation quality and the inability to precisely control critical parameters such as azimuth angles. To address these challenges, we propose GeoDiff-SAR, a geometric prior guided diffusion model for high-fidelity SAR image generation. Specifically, GeoDiff-SAR first efficiently simulates the geometric structures and scattering relationships inherent in real SAR imaging by calculating SAR point clouds at specific azimuths, which serves as a robust physical guidance. Secondly, to effectively fuse multi-modal information, we employ a feature fusion gating network based on Feature-wise Linear Modulation (FiLM) to dynamically regulate the weight distribution of 3D physical information, image control parameters, and textual description parameters. Thirdly, we utilize the Low-Rank Adaptation (LoRA) architecture to perform lightweight fine-tuning on the advanced Stable Diffusion 3.5 (SD3.5) model, enabling it to rapidly adapt to the distribution characteristics of the SAR domain. To validate the effectiveness of GeoDiff-SAR, extensive comparative experiments were conducted on real-world SAR datasets. The results demonstrate that data generated by GeoDiff-SAR exhibits high fidelity and effectively enhances the accuracy of downstream classification tasks. In particular, it significantly improves recognition performance across different azimuth angles, thereby underscoring the superiority of physics-guided generation.


[16] 2601.03527

Intensity Fluctuation Dynamics in XPM

Cross-Phase Modulation (XPM) constitutes a critical nonlinear impairment in high-capacity Wavelength Division Multiplexing (WDM) systems, significantly driven by intensity fluctuations (IFs) that evolve due to chromatic dispersion. This paper presents an enhanced XPM model that explicitly incorporates frequency-domain IF growth along the fiber, improving upon prior models that focused primarily on temporal pulse deformation. A direct correlation between this frequency-domain growth and XPM-induced phase distortions is established and analyzed. Results demonstrate that IF evolution, particularly at lower frequencies, profoundly affects XPM phase fluctuation spectra and phase variance. Validated through simulations, the model accurately predicts these spectral characteristics across various system parameters. Furthermore, the derived phase variance enables accurate prediction of system performance in terms of Bit Error Ratio (BER). These findings highlight the necessity of modeling frequency-domain IF evolution to accurately characterize XPM impairments, offering guidance for the design of advanced optical networks.


[17] 2601.03535

OpenISAC: An Open-Source Real-Time Experimentation Platform for OFDM-ISAC with Over-the-Air Synchronization

Integrated sensing and communication (ISAC) is envisioned to be one of the key usage scenarios for sixth-generation (6G) mobile communication networks. While significant progress has been made in theoretical studies, the further advancement of ISAC is hampered by the lack of accessible, open-source, and real-time experimental platforms. To address this gap, we introduce OpenISAC, a versatile and high-performance open-source platform for real-time ISAC experimentation. OpenISAC utilizes an orthogonal frequency division multiplexing (OFDM) waveform and implements crucial sensing functionalities, including both monostatic and bistatic delay-Doppler sensing. A key feature of our platform is a novel over-the-air (OTA) synchronization mechanism that enables robust bistatic operations without requiring a wired connection between nodes. The platform is built entirely on open-source software, leveraging the universal software radio peripheral (USRP) hardware driver (UHD) library, thus eliminating the need for any commercial licenses. It supports a wide range of software-defined radios, from the cost-effective USRP B200 series to the high-performance X400 series. The physical layer modulator and demodulator are implemented in C++ for high-speed processing, while the sensing data is streamed to a Python environment, providing a user-friendly interface for rapid prototyping and validation of sensing signal processing algorithms. With flexible parameter selection and real-time communication and sensing operation, OpenISAC serves as a powerful and accessible tool for the academic and research communities to explore and innovate within the field of OFDM-ISAC.


[18] 2601.03536

Spider web-inspired sensing and computation with fiber network physical reservoirs

Physical reservoir computing leverages the intrinsic dynamics of mechanical systems to perform computation through their natural responses to input signals. Here, we study a compliant fiber network inspired by orb-weaving spider webs and investigate how its mechanical design and operating conditions shape its computational capability. Using Cosserat rod-based simulations, we identify how network topology, geometry, actuation, and axial tension impact the nonlinear computation and memory capacity of the network. We further evaluate several readout reduction strategies to assess how computational performance varies with the number and placement of measured outputs. We then experimentally validate these results using a physical fiber-network prototype. Overall, the results provide insights and guidance on design, actuation, and sensing choices to enable fiber networks for mechano-intelligent computation. They demonstrate the ability of structured compliant fiber networks to serve as physical reservoirs capable of nonlinear transformation and input-history retention.


[19] 2601.03601

F$^4$-CKM: Learning Channel Knowledge Map with Radio Frequency Radiance Field Rendering

In 6G mobile communications, acquiring accurate and timely channel state information (CSI) becomes increasingly challenging due to the growing antenna array size and bandwidth. To alleviate the CSI feedback burden, the channel knowledge map (CKM) has emerged as a promising approach by leveraging environment-aware techniques to predict CSI based solely on user locations. However, how to effectively construct a CKM remains an open issue. In this paper, we propose F$^4$-CKM, a novel CKM construction framework characterized by four distinctive features: radiance Field rendering, spatial-Frequency-awareness, location-Free usage, and Fast learning. Central to our design is the adaptation of radiance field rendering techniques from computer vision to the radio frequency (RF) domain, enabled by a novel Wireless Radiator Representation (WiRARE) network that captures the spatial-frequency characteristics of wireless channels. Additionally, a novel shaping filter module and an angular sampling strategy are introduced to facilitate CKM construction. Extensive experiments demonstrate that F$^4$-CKM significantly outperforms existing baselines in terms of wireless channel prediction accuracy and efficiency.


[20] 2601.03626

Learning from Limited Labels: Transductive Graph Label Propagation for Indian Music Analysis

Supervised machine learning frameworks rely on extensive labeled datasets for robust performance on real-world tasks. However, there is a lack of large annotated datasets in audio and music domains, as annotating such recordings is resource-intensive, laborious, and often requires expert domain knowledge. In this work, we explore the use of label propagation (LP), a graph-based semi-supervised learning technique, for automatically labeling the unlabeled set in an unsupervised manner. By constructing a similarity graph over audio embeddings, we propagate limited label information from a small annotated subset to a larger unlabeled corpus in a transductive, semi-supervised setting. We apply this method to two tasks in Indian Art Music (IAM): Raga identification and Instrument classification. For both these tasks, we integrate multiple public datasets along with additional recordings we acquire from Prasar Bharati Archives to perform LP. Our experiments demonstrate that LP significantly reduces labeling overhead and produces higher-quality annotations compared to conventional baseline methods, including those based on pretrained inductive models. These results highlight the potential of graph-based semi-supervised learning to democratize data annotation and accelerate progress in music information retrieval.
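The general idea of propagating labels over a similarity graph can be sketched as follows. This is a minimal illustration of classic iterative label propagation with clamped seed labels, not the paper's pipeline: the Gaussian-kernel similarity, the bandwidth `sigma`, the fully connected graph, and the toy 2-D "embeddings" are all assumptions made for the example.

```python
import math


def label_propagation(embeddings, labels, num_classes, sigma=1.0, num_iters=200):
    """Transductive label propagation over a dense similarity graph.

    embeddings: list of feature vectors, one per recording
    labels:     class index for annotated items, None for unlabeled ones
    Returns a predicted class index for every item.
    """
    n = len(embeddings)

    def similarity(u, v):
        # Gaussian kernel on squared Euclidean distance (assumed choice).
        d2 = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-d2 / (2.0 * sigma ** 2))

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(num_classes)]

    # Edge weights with no self-loops, row-normalized into transition weights.
    weights = [[similarity(embeddings[i], embeddings[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    trans = [[w / sum(row) for w in row] for row in weights]

    # Labeled items start one-hot; unlabeled items start uniform.
    scores = [one_hot(labels[i]) if labels[i] is not None
              else [1.0 / num_classes] * num_classes for i in range(n)]

    for _ in range(num_iters):
        # Each item takes the weighted average of its neighbors' scores.
        new_scores = [[sum(trans[i][j] * scores[j][c] for j in range(n))
                       for c in range(num_classes)] for i in range(n)]
        # Clamp the annotated items back to their known labels.
        for i in range(n):
            if labels[i] is not None:
                new_scores[i] = one_hot(labels[i])
        scores = new_scores

    return [max(range(num_classes), key=lambda c: scores[i][c])
            for i in range(n)]


# Toy example: two well-separated clusters with one annotated item each.
toy_embeddings = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
                  (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
toy_labels = [0, None, None, 1, None, None]
predicted = label_propagation(toy_embeddings, toy_labels, num_classes=2)
```

Here the unlabeled points inherit the label of the annotated point in their own cluster. The method is transductive because it only assigns labels to the items already in the graph; labeling a new recording requires adding it to the graph and propagating again.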


[21] 2601.03632

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis

Zero-shot text-to-speech models can clone a speaker's timbre from a short reference audio, but they also strongly inherit the speaking style present in the reference. As a result, synthesizing speech with a desired style often requires carefully selecting reference audio, which is impractical when only limited or mismatched references are available. While recent controllable TTS methods attempt to address this issue, they typically rely on absolute style targets and discrete textual prompts, and therefore do not support continuous and reference-relative style control. We propose ReStyle-TTS, a framework that enables continuous and reference-relative style control in zero-shot TTS. Our key insight is that effective style control requires first reducing the model's implicit dependence on reference style before introducing explicit control mechanisms. To this end, we introduce Decoupled Classifier-Free Guidance (DCFG), which independently controls text and reference guidance, reducing reliance on reference style while preserving text fidelity. On top of this, we apply style-specific LoRAs together with Orthogonal LoRA Fusion to enable continuous and disentangled multi-attribute control, and introduce a Timbre Consistency Optimization module to mitigate timbre drift caused by weakened reference guidance. Experiments show that ReStyle-TTS enables user-friendly, continuous, and relative control over pitch, energy, and multiple emotions while maintaining intelligibility and speaker timbre, and performs robustly in challenging mismatched reference-target style scenarios.


[22] 2601.03638

DSP-Based Sub-Switching-Period Current-Limiting Control for Grid-Tied Inverter under Grid Faults

This paper presents a sub-switching period current-limiting control for a grid-tied inverter to prevent transient overcurrents during grid faults and enable seamless fault ride-through (FRT). Sudden grid-voltage disturbances, such as voltage sags or phase jumps, can induce large transient currents within a switching period, particularly at low switching frequencies. Upon disturbance detection, the proposed method immediately modifies the pulse-width modulation carrier, enabling continuous regulation of the inverter output current within a time much shorter than a switching period without interrupting current flow. The proposed method can be implemented on commonly used digital signal processors without requiring specialized analog or digital circuits or high-speed computing devices. Experimental results from a 2-level, 3-phase inverter switching at 3.6 kHz validate the effectiveness of the proposed method under symmetric and asymmetric voltage sags and phase jumps.


[23] 2601.03639

Zak-OTFS ISAC with Bistatic Sensing via Semi-Blind Atomic Norm Denoising Scheme

Integrated sensing and communication (ISAC) through Zak-transform-based orthogonal time frequency space (Zak-OTFS) modulation is a promising solution for high-mobility scenarios. Realizing accurate bistatic sensing and robust communication necessitates precise channel estimation; however, this remains a formidable challenge in doubly dispersive environments, where fractional delay-Doppler shifts induce severe channel spreading. This paper proposes a semi-blind atomic norm denoising scheme for Zak-OTFS ISAC with bistatic sensing. We first derive the discrete-time input-output (I/O) relationship of Zak-OTFS under fractional delay-Doppler shifts and rectangular windowing. Based on this I/O relation, we formulate the joint channel parameter estimation and data detection task as an atomic norm denoising problem, utilizing the negative square penalty method to handle the non-convex discrete constellation constraints. To solve this problem efficiently, we develop an accelerated iterative algorithm that integrates majorization-minimization, accelerated projected gradient, and inexact accelerated proximal gradient methods. We provide a rigorous convergence proof for the proposed algorithm. Simulation results demonstrate that the proposed scheme achieves super-resolution sensing accuracy and communication performance approaching the perfect channel state information lower bound.


[24] 2601.03679

Accounting for Optimal Control in the Sizing of Isolated Hybrid Renewable Energy Systems Using Imitation Learning

Decarbonization of isolated or off-grid energy systems through phase-in of large shares of intermittent solar or wind generation requires co-installation of energy storage or continued use of existing fossil dispatchable power sources to balance supply and demand. The effective CO2 emission reduction depends on the relative capacity of the energy storage and renewable sources, the stochasticity of the renewable generation, and the optimal control or dispatch of the isolated energy system. While the operations of the energy storage and dispatchable sources may impact the optimal sizing of the system, it is challenging to account for the effect of finite horizon, optimal control at the stage of system sizing. Here, we present a flexible and computationally efficient sizing framework for energy storage and renewable capacity in isolated energy systems, accounting for uncertainty in the renewable generation and the optimal feedback control. To this end, we implement an imitation learning approach to stochastic neural model predictive control (MPC) which allows us to relate the battery storage and wind peak capacities to the emissions reduction and investment costs while accounting for finite horizon, optimal control. Through this approach, decision makers can evaluate the effective emission reduction and costs of different storage and wind capacities at any price point while accounting for uncertainty in the renewable generation with limited foresight. We evaluate the proposed sizing framework on a case study of an offshore energy system with a gas turbine, a wind farm and a battery energy storage system (BESS). In this case, we find a nonlinear, nontrivial relationship between the investment costs and reduction in gas usage relative to the wind and BESS capacities, emphasizing the complexity and importance of accounting for optimal control in the design of isolated energy systems.


[25] 2601.03712

TellWhisper: Tell Whisper Who Speaks When

Multi-speaker automatic speech recognition (MASR) aims to predict "who spoke when and what" from multi-speaker speech, a key technology for multi-party dialogue understanding. However, most existing approaches decouple temporal modeling and speaker modeling when addressing "when" and "who": some inject speaker cues before encoding (e.g., speaker masking), which can cause irreversible information loss; others fuse identity by mixing speaker posteriors after encoding, which may entangle acoustic content with speaker identity. This separation is brittle under rapid turn-taking and overlapping speech, often leading to degraded performance. To address these limitations, we propose TellWhisper, a unified framework that jointly models speaker identity and temporal information within the speech encoder. Specifically, we design TS-RoPE, a time-speaker rotary positional encoding: time coordinates are derived from frame indices, while speaker coordinates are derived from speaker activity and pause cues. By applying region-specific rotation angles, the model explicitly captures per-speaker continuity, speaker-turn transitions, and state dynamics, enabling the attention mechanism to simultaneously attend to "when" and "who". Moreover, to estimate frame-level speaker activity, we develop Hyper-SD, which casts speaker classification in hyperbolic space to enhance inter-class separation and refine speaker-activity estimates. Extensive experiments demonstrate the effectiveness of the proposed approach.


[26] 2601.03716

Derivation of the Thermal Conductivity in a Latent Thermal Energy Storage Unit for Use in Simplified System Models

Latent Thermal Energy Storages (LTES) can store thermal energy in a narrow temperature range. Therefore, they are favorable for integration into Rankine-based Carnot Batteries. For the design of such systems, simulations based on accurate models are desirable. However, physical phenomena such as natural convection in LTES units cannot be modeled directly in transient system models. Simplified models are required. Therefore, the objective of this work is to derive simplified LTES unit models for use in system models. In transient simulations the state of charge of the LTES influences its temperature profile. The temperature profile depends on the geometry of the LTES unit. Therefore, the geometry must be considered to model the transient behavior of an LTES unit. The LTES unit under investigation has a shell and tube heat exchanger structure. The phase change material (PCM) is located between the hexagonal fins and in the space between the finned tubes. Aluminum fins are used. They have a high thermal conductivity and thus compensate for the low thermal conductivity of the sodium nitrate used as PCM. The interaction between fins and PCM is complex. Therefore, a numerical approach can be used to gain insight into the behavior of the LTES unit. To transfer the results of a complex model to a simplified model where fins and PCM are not considered individually, the effective thermal conductivity of a single finned tube can be used to approximate the performance of the LTES unit. In this study, a model of a section with a single finned tube is developed using the COMSOL software. The effective thermal conductivity of the system is determined by varying the effective thermal conductivity in a simplified model and comparing the results with reference cases based on a complex modeling approach. The results can serve as model input for simplified system models of Carnot Batteries, among others.


[27] 2601.03735

Cramer-Rao Bound for Angle of Arrival Estimates in True-Time-Delay Systems

In the context of joint communication and sensing (JC&S), obtaining accurate parameter estimates is a challenge of interest. Parameter estimates, such as the angle of arrival (AoA), can be utilized for solving the initial access problem, interference mitigation, localization of users, monitoring of the environment, and synchronization of MIMO systems. Recently, true-time-delay (TTD) systems have gained attention for fast beam training during initial access and mitigation of beam squinting. This work derives the Cramer-Rao bound (CRB) for angle estimates in typical TTD systems. Properties of the CRB and the Fisher information are investigated and numerically evaluated. Finally, methods for angle estimation such as maximum likelihood (ML) and established estimators are utilized to solve the angle estimation problem using a uniform linear array.


[28] 2601.03745

Two-stage Multi-beam Training for Multiuser Millimeter-Wave Communications

In this letter, we study an efficient multi-beam training method for multiuser millimeter-wave communication systems. Unlike the conventional single-beam training method that relies on exhaustive search, multi-beam training design faces a key challenge in balancing the trade-off between beam training overhead and the rate of successful beam identification, exacerbated by severe inter-beam interference. To tackle this challenge, we propose a new two-stage multi-beam training method with two distinct multi-beam patterns to enable fast and accurate user angle identification. Specifically, in the first stage, the antenna array is divided into sparse subarrays to generate multiple beams (with high array gains) for identifying candidate user angles. In the second stage, the array is redivided into dense subarrays to generate flexibly steered wide beams, for which a cross-validation method is employed to effectively resolve the angular ambiguity remaining from the first stage. Finally, numerical results demonstrate that the proposed method significantly improves the rate of successful beam identification compared to existing multi-beam training methods, while retaining or even reducing the required beam training overhead.


[29] 2601.03767

Output Consensus on Periodic References for Constrained Multi-agent Systems Under a Switching Network

This work addresses the output consensus problem of constrained heterogeneous multi-agent systems under a switching network with potential communication delay, where outputs are periodic and characterized by a linear exosystem. Since periodic references have more complex dynamics, it is more challenging to track periodic references and achieve consensus on them. In this paper, a model predictive control method incorporating an artificial reference and a modified cost is proposed to track periodic references, which maintains recursive feasibility even when the reference switches. Moreover, consensus protocols are proposed to achieve consensus on periodic references in different scenarios, in which global information such as the set of globally admissible references and the global time index is not involved. Theoretical analysis proves that constrained output consensus is asymptotically achieved with the proposed algorithm as the references of each agent converge and agents track their references while maintaining constraint satisfaction. Finally, numerical examples are provided to verify the effectiveness of the proposed algorithm.


[30] 2601.03789

CSI-MAE: A Masked Autoencoder-based Channel Foundation Model

Self-Supervised Learning (SSL) has emerged as a key technique in machine learning, tackling challenges such as limited labeled data, high annotation costs, and variable wireless channel conditions. It is essential for developing Channel Foundation Models (CFMs), which extract latent features from channel state information (CSI) and adapt to different wireless settings. Yet, existing CFMs have notable drawbacks: their heavy reliance on scenario-specific data hinders generalization, they focus on single or dual tasks, and they lack zero-shot learning ability. In this paper, we propose CSI-MAE, a generalized CFM leveraging a masked autoencoder for cross-scenario generalization. Trained on 3GPP channel model datasets, it integrates sensing and communication via CSI perception and generation, proven effective across diverse tasks. A lightweight decoder finetuning strategy cuts training costs while maintaining competitive performance. Under this approach, CSI-MAE matches or surpasses supervised models. With full-parameter finetuning, it achieves state-of-the-art performance. Its exceptional zero-shot transferability also rivals supervised techniques in cross-scenario applications, driving wireless communication innovation.


[31] 2601.03819

Unified and Efficient Analysis of Machining Chatter and Surface Location Error

Although machining chatter can be suppressed by choosing stable cutting parameters by means of a stability lobe diagram (SLD), surface roughness still remains due to the forced vibration, which limits surface quality, especially in the surface finish. Better cutting parameters can be obtained by considering surface location error (SLE) together with the SLD. This paper proposes an innovative modeling framework of the machining dynamic system that enables efficient computation of the chatter stability and SLE. The framework mainly embodies two techniques, namely the semi-discretization method (SDM) and the lifting method. The machining dynamics system is mathematically expressed as an angle-varying delay differential equation (DDE). The SDM approximates the angle-varying and delayed terms to ordinary terms using zero-phase interpolations and governs the discrete angle-varying dynamics system. Then, the system is merged over the tooth passing angle using the lifted approach to establish an explicit dynamic system in the compact state-space form. Based on the compact state-space model, the chatter stability and SLE prediction are easily and efficiently conducted. Simulation results show the improved efficiency of the proposed method over other well-known methods.


[32] 2601.03867

A Systems-Engineered ESP32 DAQ Architecture and FAIR Data Workflow for Small-Scale Wind Turbine Performance Measurement in Tropical Environments

Small-scale wind turbine research in resource-constrained academic settings frequently produces unreliable or unpublishable datasets due to ad-hoc instrumentation, inadequate time synchronization, storage failures, and weak data governance. This paper presents a systematic data acquisition (DAQ) methodology and ESP32-based reference implementation design for field characterization of small wind turbines (100 W to 5 kW), emphasizing tropical/coastal deployment constraints typical of Low- and Middle-Income Countries (LMIC). We integrate (i) a student-adapted V-model with requirements traceability, (ii) hardware selection strategies for high-humidity and salt-spray environments, (iii) an embedded firmware architecture featuring interrupt-driven rotor speed measurement, state-machine fault handling, and NTP-based time synchronization, (iv) a local-first hybrid storage design combining SD-card persistence with optional MQTT cloud telemetry, and (v) a data-management workflow adapting CRISP-DM and FAIR principles with explicit quality dimensions and publication templates. A detailed helical vertical-axis wind turbine (VAWT) design scenario for coastal Sri Lanka illustrates the complete methodology, targeting $>90\%$ data completeness over six-month campaigns. The methodology is accompanied by open-source firmware, hardware templates, and data-publication workflow artifacts released via GitHub and Zenodo.


[33] 2601.03893

Smooth Sampling-Based Model Predictive Control Using Deterministic Samples

Sampling-based model predictive control (MPC) is effective for nonlinear systems but often produces non-smooth control inputs due to random sampling. To address this issue, we extend the model predictive path integral (MPPI) framework with deterministic sampling and improvements from cross-entropy method (CEM)--MPC, such as iterative optimization, proposing deterministic sampling MPPI (dsMPPI). This combination leverages the exponential weighting of MPPI alongside the efficiency of deterministic samples. Experiments demonstrate that dsMPPI achieves smoother trajectories compared to state-of-the-art methods.
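
The exponential weighting that MPPI applies to sampled control sequences can be sketched as follows. This is a generic illustration, not the dsMPPI algorithm itself: the function names, the `temperature` parameter, and the toy sample set are assumptions, and how the deterministic samples are generated is left open because the abstract does not specify it.

```python
import numpy as np

def mppi_weights(costs, temperature=1.0):
    """Softmin weights over trajectory costs (lower cost -> larger weight)."""
    costs = np.asarray(costs, dtype=float)
    # Subtract the minimum cost for numerical stability; weights are unchanged.
    shifted = costs - costs.min()
    w = np.exp(-shifted / temperature)
    return w / w.sum()

def weighted_control(samples, costs, temperature=1.0):
    """Combine sampled control sequences into one update.

    samples: array of shape (n_samples, horizon, n_inputs)
    costs:   array of shape (n_samples,)
    """
    w = mppi_weights(costs, temperature)
    return np.tensordot(w, np.asarray(samples, dtype=float), axes=1)

# Toy usage: three fixed (hence deterministic) candidate sequences.
samples_demo = np.array([[[0.0], [0.0]],
                         [[1.0], [1.0]],
                         [[2.0], [2.0]]])
costs_demo = np.array([3.0, 1.0, 3.0])
weights_demo = mppi_weights(costs_demo)
u_demo = weighted_control(samples_demo, costs_demo)
```

Because the sample set here is fixed rather than drawn at random, repeated calls return the same update, which is the intuition behind the smoother inputs claimed for deterministic sampling.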


[34] 2601.03899

Ensemble Models for Predicting Treatment Response in Pediatric Low-Grade Glioma Managed with Chemotherapy

In this paper, we introduce a novel pipeline for predicting chemotherapy response in pediatric brain tumors that are not amenable to complete surgical resection, using pre-treatment magnetic resonance imaging combined with clinical information. Our method integrates a state-of-the-art pediatric brain tumor segmentation framework with radiomic feature extraction and clinical data through an ensemble of a Swin UNETR encoder and XGBoost classifier. The segmentation model delineates four tumor subregions (enhancing tumor, non-enhancing tumor, cystic component, and edema), which are used to extract imaging biomarkers and generate predictive features. The Swin UNETR network classifies the response to treatment directly from these segmented MRI scans, while XGBoost predicts response using radiomics and clinical variables including legal sex, ethnicity, race, age at event (in days), molecular subtype, tumor locations, initial surgery status, metastatic status, metastasis location, chemotherapy type, protocol name and chemotherapy agents. The ensemble output provides a non-invasive estimate of chemotherapy response in this historically challenging population characterized by lower progression-free survival. Among compared approaches, our Swin-Ensemble achieved the best performance (precision for non-effective cases=0.68, recall for non-effective cases=0.85, precision for chemotherapy-effective cases=0.64 and overall accuracy=0.69), outperforming Mamba-FeatureFuse, Swin UNETR encoder, and Swin-FeatureFuse models. Our findings suggest that this ensemble framework represents a promising step toward personalized therapy response prediction for pediatric low-grade glioma patients in need of chemotherapy treatment who are not suitable for complete surgical resection, a population with significantly lower progression-free survival and for whom chemotherapy remains the primary treatment option.


[35] 2601.03906

Exact Continuous Reformulations of Logic Constraints in Nonlinear Optimization and Optimal Control Problems

Many nonlinear optimal control and optimization problems involve constraints that combine continuous dynamics with discrete logic conditions. Standard approaches typically rely on mixed-integer programming, which introduces scalability challenges and requires specialized solvers. This paper presents an exact reformulation of broad classes of logical constraints as binary-variable-free expressions whose differentiability properties coincide with those of the underlying predicates, enabling their direct integration into nonlinear programming models. Our approach rewrites arbitrary logical propositions into conjunctive normal form, converts them into equivalent max--min constraints, and applies a smoothing procedure that preserves the exact feasible set. The method is evaluated on two benchmark problems, a quadrotor trajectory optimization with obstacle avoidance and a hybrid two-tank system with temporal logic constraints, and is shown to obtain optimal solutions more consistently and efficiently than existing binary variable elimination techniques.
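
As a brief illustration of the conjunctive-normal-form step, the standard equivalences between logical connectives over inequality predicates and max-min expressions can be written as below. The predicate functions $g_{ij}$ are placeholders introduced here for illustration, and the paper's subsequent smoothing procedure is not reproduced.

```latex
% Disjunction and conjunction of inequality predicates:
\bigl(g_1(x) \le 0\bigr) \lor \bigl(g_2(x) \le 0\bigr)
  \iff \min\{g_1(x),\, g_2(x)\} \le 0,
\qquad
\bigl(g_1(x) \le 0\bigr) \land \bigl(g_2(x) \le 0\bigr)
  \iff \max\{g_1(x),\, g_2(x)\} \le 0.

% A proposition in conjunctive normal form therefore becomes one constraint:
\bigwedge_{i=1}^{m} \bigvee_{j=1}^{n_i} \bigl(g_{ij}(x) \le 0\bigr)
  \iff \max_{1 \le i \le m} \; \min_{1 \le j \le n_i} g_{ij}(x) \le 0.
```

For example, "stay outside obstacle A or outside obstacle B, and remain inside the workspace" maps to a max over two clauses, the first of which is a min over the two obstacle predicates.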


[36] 2601.03924

A low-complexity method for efficient depth-guided image deblurring

Image deblurring is a challenging problem in imaging due to its highly ill-posed nature. Deep learning models have shown great success in tackling this problem but the quest for the best image quality has brought their computational complexity up, making them impractical on anything but powerful servers. Meanwhile, recent works have shown that mobile Lidars can provide complementary information in the form of depth maps that enhance deblurring quality. In this paper, we introduce a novel low-complexity neural network for depth-guided image deblurring. We show that the use of the wavelet transform to separate structural details and reduce spatial redundancy as well as efficient feature conditioning on the depth information are essential ingredients in developing a low-complexity model. Experimental results show competitive image quality against recent state-of-the-art models while reducing complexity by up to two orders of magnitude.


[37] 2601.03944

ASVspoof 5: Evaluation of Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech

ASVspoof 5 is the fifth edition in a series of challenges which promote the study of speech spoofing and deepfake detection solutions. A significant change from previous challenge editions is a new crowdsourced database collected from a substantially greater number of speakers under diverse recording conditions, and a mix of cutting-edge and legacy generative speech technology. With the new database described elsewhere, we provide in this paper an overview of the ASVspoof 5 challenge results for the submissions of 53 participating teams. While many solutions perform well, performance degrades under adversarial attacks and the application of neural encoding/compression schemes. Together with a review of post-challenge results, we also report a study of calibration in addition to other principal challenges and outline a road-map for the future of ASVspoof.


[38] 2601.04069

Hybrid Downlink Beamforming with Outage Constraints under Imperfect CSI using Model-Driven Deep Learning

We consider energy-efficient multi-user hybrid downlink beamforming (BF) and power allocation under imperfect channel state information (CSI) and probabilistic outage constraints. In this domain, classical optimization methods resort to computationally costly conic optimization problems. Meanwhile, generic deep network (DN) architectures lack interpretability and require large training data sets to generalize well. In this paper, we therefore propose a lightweight model-aided deep learning architecture based on a greedy selection algorithm for analog beam codewords. The architecture relies on an instance-adaptive augmentation of the signal model to estimate the impact of the CSI error. To learn the DN parameters, we derive a novel and efficient implicit representation of the nested constrained BF problem and prove sufficient conditions for the existence of the corresponding gradient. In the loss function, we utilize an annealing-based approximation of the outage, in contrast to conventional quantile-based loss terms. This approximation adaptively anneals towards the exact probabilistic constraint depending on the current level of quality of service (QoS) violation. Simulations validate that the proposed DN can achieve the nominal outage level under CSI error due to channel estimation and channel compression, while allocating less power than benchmarks. Moreover, a single trained model generalizes to different numbers of users, QoS requirements and levels of CSI quality. We further show that the adaptive annealing-based loss function can accelerate the training and yield a better power-outage trade-off.


[39] 2601.04136

A Load Impedance Emulation Active Interface for Piezoelectric Vibration Energy Harvesters

A single-stage active AC/DC interface able to emulate the optimal load impedance of a Resonant Piezoelectric Vibration Energy Harvester (RPVEH) is proposed. As theoretically shown, unlike an electronic interface that emulates an optimal load generator, an interface that emulates an optimal load impedance does not require adaptation to the acceleration of input vibrations. This allows the use of a very simple control, avoiding the implementation of Maximum Power Point Tracking (MPPT) algorithms that require lossy microcontrollers. Thus, the proposed interface is equipped with a simple analog controller allowing the RPVEH to work at its Maximum Power Point (MPP) under both steady-state and variable vibration conditions, without resorting to multivariable perturbative approaches, as is the case for most single-stage AC/DC interfaces proposed in the literature. The absence of perturbative techniques allows a significant improvement of both stationary and dynamic performance. Experimental tests of a prototype of the proposed interface confirm the theoretical findings and the predicted behavior.


[40] 2601.04163

Scanner-Induced Domain Shifts Undermine the Robustness of Pathology Foundation Models

Pathology foundation models (PFMs) have become central to computational pathology, aiming to offer general encoders for feature extraction from whole-slide images (WSIs). Despite strong benchmark performance, PFM robustness to real-world technical domain shifts, such as variability from whole-slide scanner devices, remains poorly understood. We systematically evaluated the robustness of 14 PFMs to scanner-induced variability, including state-of-the-art models, earlier self-supervised models, and a baseline trained on natural images. Using a multiscanner dataset of 384 breast cancer WSIs scanned on five devices, we isolated scanner effects independently from biological and laboratory confounders. Robustness is assessed via complementary unsupervised embedding analyses and a set of clinicopathological supervised prediction tasks. Our results demonstrate that current PFMs are not invariant to scanner-induced domain shifts. Most models encode pronounced scanner-specific variability in their embedding spaces. While AUC often remains stable, this masks a critical failure mode: scanner variability systematically alters the embedding space and impacts calibration of downstream model predictions, resulting in scanner-dependent bias that can impact reliability in clinical use cases. We further show that robustness is not a simple function of training data scale, model size, or model recency. None of the models provided reliable robustness against scanner-induced variability. While the models trained on the most diverse data, here represented by vision-language models, appear to have an advantage with respect to robustness, they underperformed on downstream supervised tasks. We conclude that development and evaluation of PFMs require moving beyond accuracy-centric benchmarks toward explicit evaluation and optimisation of embedding stability and calibration under realistic acquisition variability.


[41] 2601.04178

Sound Event Detection with Boundary-Aware Optimization and Inference

Temporal detection problems appear in many fields including time-series estimation, activity recognition and sound event detection (SED). In this work, we propose a new approach to temporal event modeling by explicitly modeling event onsets and offsets, and by introducing boundary-aware optimization and inference strategies that substantially enhance temporal event detection. The presented methodology incorporates new temporal modeling layers - Recurrent Event Detection (RED) and Event Proposal Network (EPN) - which, together with tailored loss functions, enable more effective and precise temporal event detection. We evaluate the proposed method in the SED domain using a subset of the temporally-strongly annotated portion of AudioSet. Experimental results show that our approach not only outperforms traditional frame-wise SED models with state-of-the-art post-processing, but also removes the need for post-processing hyperparameter tuning, and scales to achieve new state-of-the-art performance across all AudioSet Strong classes.


[42] 2601.04190

Solar Panel-based Visible Light Communication for Batteryless Systems

This paper presents a batteryless wireless communication node for the Internet of Things, powered entirely by ambient light and capable of receiving data through visible light communication. A solar panel serves dual functions as an energy harvester and an optical antenna, capturing modulated signals from LED light sources. A lightweight analog front-end filters and digitizes the signals for an 8-bit low-power processor, which manages the system's operational states based on stored energy levels. The main processor is selectively activated to minimize energy consumption. Data reception is synchronized with the harvester's open-circuit phase, reducing interference and improving signal quality. The prototype reliably decodes 32-bit VLC frames at 800 Hz, consuming less than 2.8 mJ, and maintains sleep-mode power below 30 µW.


[43] 2601.03360

Revisiting Continuous-Time Trajectory Estimation via Gaussian Processes and the Magnus Expansion

Continuous-time state estimation has been shown to be an effective means of (i) handling asynchronous and high-rate measurements, (ii) introducing smoothness to the estimate, (iii) post hoc querying the estimate at times other than those of the measurements, and (iv) addressing certain observability issues related to scanning-while-moving sensors. A popular means of representing the trajectory in continuous time is via a Gaussian process (GP) prior, with the prior's mean and covariance functions generated by a linear time-varying (LTV) stochastic differential equation (SDE) driven by white noise. When the state comprises elements of Lie groups, previous works have resorted to a patchwork of local GPs each with a linear time-invariant SDE kernel, which while effective in practice, lacks theoretical elegance. Here we revisit the full LTV GP approach to continuous-time trajectory estimation, deriving a global GP prior on Lie groups via the Magnus expansion, which offers a more elegant and general solution. We provide a numerical comparison between the two approaches and discuss their relative merits.


[44] 2601.03410

Inferring Clinically Relevant Molecular Subtypes of Pancreatic Cancer from Routine Histopathology Using Deep Learning

Molecular subtyping of pancreatic ductal adenocarcinoma (PDAC) into basal-like and classical has established prognostic and predictive value. However, its use in clinical practice is limited by cost, turnaround time, and tissue requirements, thereby restricting its application in the management of PDAC. We introduce PanSubNet, an interpretable deep learning framework that predicts therapy-relevant molecular subtypes directly from standard hematoxylin and eosin (H&E)-stained whole-slide images (WSIs). PanSubNet was developed using data from 1,055 patients across two multi-institutional cohorts (PANCAN, n=846; TCGA, n=209) with paired histology and RNA-seq data. Ground-truth labels were derived using the validated Moffitt 50-gene signature refined by GATA6 expression. The model employs a dual-scale architecture that fuses cellular-level morphology with tissue-level architecture, leveraging attention mechanisms for multi-scale representation learning and transparent feature attribution. On internal validation within PANCAN using five-fold cross-validation, PanSubNet achieved a mean AUC of 88.5% with balanced sensitivity and specificity. External validation on the independent TCGA cohort without fine-tuning demonstrated robust generalizability (AUC 84.0%). PanSubNet preserved and, in metastatic disease, strengthened prognostic stratification compared to RNA-seq-based labels. Prediction uncertainty was linked to intermediate transcriptional states, not classification noise. Model predictions are aligned with established transcriptomic programs, differentiation markers, and DNA damage repair signatures. By enabling rapid, cost-effective molecular stratification from routine H&E-stained slides, PanSubNet offers a clinically deployable and interpretable tool for genetic subtyping. We are gathering data from two institutions to validate and assess real-world performance, supporting integration into digital pathology workflows and advancing precision oncology for PDAC.


[45] 2601.03413

Sensor to Pixels: Decentralized Swarm Gathering via Image-Based Reinforcement Learning

This study highlights the potential of image-based reinforcement learning methods for addressing swarm-related tasks. In multi-agent reinforcement learning, effective policy learning depends on how agents sense, interpret, and process inputs. Traditional approaches often rely on handcrafted feature extraction or raw vector-based representations, which limit the scalability and efficiency of learned policies with respect to input order and size. In this work we propose an image-based reinforcement learning method for decentralized control of a multi-agent system, where observations are encoded as structured visual inputs that can be processed by neural networks, which extract their spatial features and produce novel decentralized motion control rules. We evaluate our approach on a multi-agent convergence task in which agents with limited-range, bearing-only sensing aim to keep the swarm cohesive during aggregation. The algorithm's performance is evaluated against two benchmarks: an analytical solution proposed by Bellaiche and Bruckstein, which ensures convergence but progresses slowly, and VariAntNet, a neural network-based framework that converges much faster but shows medium success rates in hard constellations. Our method achieves a high convergence rate, with a pace nearly matching that of VariAntNet. In some scenarios, it serves as the only practical alternative.


[46] 2601.03610

Investigation into respiratory sound classification for an imbalanced data set using hybrid LSTM-KAN architectures

Respiratory sounds captured via auscultation contain critical clues for diagnosing pulmonary conditions. Automated classification of these sounds faces challenges due to subtle acoustic differences and severe class imbalance in clinical datasets. This study investigates respiratory sound classification with a focus on mitigating pronounced class imbalance. We propose a hybrid deep learning model that combines a Long Short-Term Memory (LSTM) network for sequential feature encoding with a Kolmogorov-Arnold Network (KAN) for classification. The model is integrated with a comprehensive feature extraction pipeline and targeted imbalance mitigation strategies. Experiments were conducted on a public respiratory sound database comprising six classes with a highly skewed distribution. Techniques such as focal loss, class-specific data augmentation, and Synthetic Minority Over-sampling Technique (SMOTE) were employed to enhance minority class recognition. The proposed Hybrid LSTM-KAN model achieves an overall accuracy of 94.6 percent and a macro-averaged F1 score of 0.703, despite the dominant COPD class accounting for over 86 percent of the data. Improved detection performance is observed for minority classes compared to baseline approaches, demonstrating the effectiveness of the proposed architecture for imbalanced respiratory sound classification.
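As an illustration of one of the imbalance-mitigation techniques named above, the following is a minimal sketch of the standard multi-class focal loss for a single sample, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is not the authors' implementation: the function name `focal_loss`, the default `gamma=2.0`, and the example probabilities are assumptions chosen for illustration only.

```python
import math

def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one sample (illustrative sketch, not the paper's code).

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the true class
    gamma  : focusing parameter; gamma = 0 recovers plain cross-entropy
    alpha  : optional per-class weights (hypothetical values, e.g. larger
             for minority classes); None means weight 1 for every class
    """
    # Clamp to avoid log(0) for degenerate predictions.
    p_t = min(max(probs[target], 1e-12), 1.0)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct sample versus a poorly classified one.
easy = focal_loss([0.95, 0.03, 0.02], target=0)
hard = focal_loss([0.20, 0.50, 0.30], target=0)
easy_ce = focal_loss([0.95, 0.03, 0.02], target=0, gamma=0.0)
print(f"easy focal={easy:.6f}  easy CE={easy_ce:.6f}  hard focal={hard:.6f}")
```

The modulating factor reduces the contribution of easy samples, which in a skewed dataset come mostly from the dominant class, so gradient updates are driven more by hard and minority-class samples.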


[47] 2601.03612

Mathematical Foundations of Polyphonic Music Generation via Structural Inductive Bias

This monograph introduces a novel approach to polyphonic music generation by addressing the "Missing Middle" problem through structural inductive bias. Focusing on Beethoven's piano sonatas as a case study, we empirically verify the independence of pitch and hand attributes using normalized mutual information (NMI=0.167) and propose the Smart Embedding architecture, achieving a 48.30% reduction in parameters. We provide rigorous mathematical proofs using information theory (negligible loss bounded at 0.153 bits), Rademacher complexity (28.09% tighter generalization bound), and category theory to demonstrate improved stability and generalization. Empirical results show a 9.47% reduction in validation loss, confirmed by SVD analysis and an expert listening study (N=53). This dual theoretical and applied framework bridges gaps in AI music generation, offering verifiable insights for mathematically grounded deep learning.


[48] 2601.03615

Analyzing Reasoning Shifts in Audio Deepfake Detection under Adversarial Attacks: The Reasoning Tax versus Shield Bifurcation

Audio Language Models (ALMs) offer a promising shift towards explainable audio deepfake detection (ADD), moving beyond black-box classifiers by providing some level of transparency into their predictions via reasoning traces. This necessitates a new class of model robustness analysis: robustness of the predictive reasoning under adversarial attacks, which goes beyond the existing paradigm that mainly focuses on shifts in the final predictions (e.g., fake vs. real). To analyze such reasoning shifts, we introduce a forensic auditing framework to evaluate the robustness of ALMs' reasoning under adversarial attacks in three inter-connected dimensions: acoustic perception, cognitive coherence, and cognitive dissonance. Our systematic analysis reveals that explicit reasoning does not universally enhance robustness. Instead, we observe a bifurcation: for models exhibiting robust acoustic perception, reasoning acts as a defensive "shield", protecting them from adversarial attacks. However, for others, it imposes a performance "tax", particularly under linguistic attacks, which reduce cognitive coherence and increase the attack success rate. Crucially, even when classification fails, high cognitive dissonance can serve as a silent alarm, flagging potential manipulation. Overall, this work provides a critical evaluation of the role of reasoning in forensic audio deepfake analysis and its vulnerabilities.


[49] 2601.03718

Towards Real-world Lens Active Alignment with Unlabeled Data via Domain Adaptation

Active Alignment (AA) is a key technology for the large-scale automated assembly of high-precision optical systems. Compared with labor-intensive per-model on-device calibration, a digital-twin pipeline built on optical simulation offers a substantial advantage in generating large-scale labeled data. However, complex imaging conditions induce a domain gap between simulation and real-world images, limiting the generalization of simulation-trained models. To address this, we propose augmenting a simulation baseline with minimal unlabeled real-world images captured at random misalignment positions, mitigating the gap from a domain adaptation perspective. We introduce Domain Adaptive Active Alignment (DA3), which utilizes an autoregressive domain transformation generator and an adversarial-based feature alignment strategy to distill real-world domain information via self-supervised learning. This enables the extraction of domain-invariant image degradation features to facilitate robust misalignment prediction. Experiments on two lens types reveal that DA3 improves accuracy by 46% over a purely simulation pipeline. Notably, it approaches the performance achieved with precisely labeled real-world data collected on 3 lens samples, while reducing on-device data collection time by 98.7%. The results demonstrate that domain adaptation effectively endows simulation-trained models with robust real-world performance, validating the digital-twin pipeline as a practical solution to significantly enhance the efficiency of large-scale optical assembly.


[50] 2601.03777

Multi-agent Optimization of Non-cooperative Multimodal Mobility Systems

While multimodal mobility systems have the potential to bring many benefits to travelers, drivers, the environment, and traffic congestion, such systems typically involve multiple non-cooperative decision-makers who may selfishly optimize their own objectives without considering the overall system benefits. This paper aims to investigate market-based interactions of travelers and ride-sourcing drivers in the context of multimodal mobility systems. We propose a unified mathematical modeling framework to capture the decentralized decision-making processes of travelers and drivers and to balance the network's demand and supply through equilibrium pricing. Such a model allows analyses of the impact of decentralized decision-making on multimodal mobility efficiencies. The proposed formulation can be further convexified to efficiently compute the equilibrium ride-sourcing prices. We conduct numerical experiments on different settings of transportation networks to gain policy insights. We find that travelers prefer ride-sourcing and multimodal transportation over the driving option when they are more sensitive to prices. We also find that travelers may need to be subsidized to use multimodal transportation when there are fewer transit hubs in the network or when ride-sourcing drivers become too sensitive to prices. However, we find that more transit hubs in the network increase the total empty VMT of ride-sourcing drivers by increasing the total relocation time. The proposed model can be used by policymakers and platform operators to design pricing and subsidy schemes that align individual decision-making with system-level efficiency and evaluate the trade-offs between accessibility and environmental impacts in multimodal transportation networks.


[51] 2601.03827

Objective comparison of auditory profiles using manifold learning and intrinsic measures

Assigning individuals with hearing impairment to auditory profiles can support a better understanding of the causes and consequences of hearing loss and facilitate profile-based hearing-aid fitting. However, the factors influencing auditory profile generation remain insufficiently understood, and existing profiling frameworks have rarely been compared systematically. This study therefore investigated the impact of two key factors - the clustering method and the number of profiles - on auditory profile generation. In addition, eight established auditory profiling frameworks were systematically reviewed and compared using intrinsic statistical measures and manifold learning techniques. Frameworks were evaluated with respect to internal consistency (i.e., grouping similar individuals) and cluster separation (i.e., clear differentiation between groups). To ensure comparability, all analyses were conducted on a common open-access dataset, the extended Oldenburg Hearing Health Record (OHHR), comprising 1,127 participants (mean age = 67.2 years, SD = 12.0). Results showed that both the clustering method and the chosen number of profiles substantially influenced the resulting auditory profiles. Among purely audiogram-based approaches, the Bisgaard auditory profiles demonstrated the strongest clustering performance, whereas audiometric phenotypes performed worst. Among frameworks incorporating supra-threshold information in addition to the audiogram, the Hearing4All auditory profiles were advantageous, combining a near-optimal number of profile classes (N = 13) with high clustering quality, as indicated by a low Davies-Bouldin index. In conclusion, manifold learning and intrinsic measures enable systematic comparison of auditory profiling frameworks and identify the Hearing4All auditory profile as a promising approach for future research.
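For readers unfamiliar with the Davies-Bouldin index mentioned above, the following is a minimal sketch of its standard definition using Euclidean distances: for each cluster, take the worst-case ratio of summed within-cluster scatter to centroid separation, then average over clusters. Lower values indicate more compact, better separated clusters. The function names and the toy two-dimensional points are assumptions for illustration and are unrelated to the OHHR data.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of tuples).

    Illustrative sketch of the textbook definition; requires at least two
    clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = []
    for i in range(k):
        # Similarity of cluster i to every other cluster j.
        ratios = [
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        ]
        worst.append(max(ratios))
    return sum(worst) / k

# Toy data: the same two clusters placed far apart and close together.
far_apart = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close = [[(0.0, 0.0), (0.0, 2.0)], [(2.0, 0.0), (2.0, 2.0)]]
db_far = davies_bouldin(far_apart)
db_close = davies_bouldin(close)
print(db_far, db_close)
```

Because the index depends only on the data and the assigned labels, it is an intrinsic measure: it needs no ground-truth grouping, which is what allows frameworks with different numbers of profiles to be compared on a common dataset.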


[52] 2601.03831

Low-Complexity Planar Beyond-Diagonal RIS Architecture Design Using Graph Theory

Reconfigurable intelligent surfaces (RISs) enable programmable control of the wireless propagation environment and are key enablers for future networks. Beyond-diagonal RIS (BD-RIS) architectures enhance conventional RIS by interconnecting elements through tunable impedance components, offering greater flexibility with higher circuit complexity. However, excessive interconnections between BD-RIS elements require multi-layer printed circuit board (PCB) designs, increasing fabrication difficulty. In this letter, we use graph theory to characterize the BD-RIS architectures that can be realized on double-layer PCBs, denoted as planar-connected RISs. Among the possible planar-connected RISs, we identify the ones with the most degrees of freedom, expected to achieve the best performance under practical constraints.


[53] 2601.03971

Posterior error bounds for prior-driven balancing in linear Gaussian inverse problems

In large-scale Bayesian inverse problems, it is often necessary to apply approximate forward models to reduce the cost of forward model evaluations, while controlling approximation quality. In the context of Bayesian inverse problems with linear forward models, Gaussian priors, and Gaussian noise, we use perturbation theory for inverses to bound the error in the approximate posterior mean and posterior covariance resulting from a linear approximate forward model. We then focus on the smoothing problem of inferring the initial condition of linear time-invariant dynamical systems, using finitely many partial state observations. For such problems, and for a specific model order reduction method based on balanced truncation, we show that the impulse response of a certain prior-driven system is closely related to the prior-preconditioned Hessian of the inverse problem. This reveals a novel connection between systems theory and inverse problems. We exploit this connection to prove the first a priori error bounds for system-theoretic model order reduction methods applied to smoothing problems. The bounds control the approximation error of the posterior mean and covariance in terms of the truncated Hankel singular values of the underlying system.
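For orientation, the linear Gaussian setting described above has a closed-form posterior. The block below states the standard formulas; the symbols ($G$ for the forward model, $\Gamma_{\mathrm{pr}}$ and $\Gamma_{\mathrm{obs}}$ for the prior and noise covariances, $m_{\mathrm{pr}}$ for the prior mean, $y$ for the data) are notational assumptions and may differ from the paper's.

```latex
% Model: y = G x + \varepsilon, with x \sim \mathcal{N}(m_{\mathrm{pr}}, \Gamma_{\mathrm{pr}})
% and \varepsilon \sim \mathcal{N}(0, \Gamma_{\mathrm{obs}}) independent of x.
\begin{align}
  \Gamma_{\mathrm{pos}} &= \left( G^{\top} \Gamma_{\mathrm{obs}}^{-1} G + \Gamma_{\mathrm{pr}}^{-1} \right)^{-1}, \\
  m_{\mathrm{pos}} &= \Gamma_{\mathrm{pos}} \left( G^{\top} \Gamma_{\mathrm{obs}}^{-1} y + \Gamma_{\mathrm{pr}}^{-1} m_{\mathrm{pr}} \right), \\
  \widehat{H} &= \Gamma_{\mathrm{pr}}^{1/2} \, G^{\top} \Gamma_{\mathrm{obs}}^{-1} G \, \Gamma_{\mathrm{pr}}^{1/2}
  \quad \text{(prior-preconditioned Hessian)}.
\end{align}
```

Replacing $G$ by an approximate forward model in these expressions yields the approximate posterior mean and covariance; since both involve a matrix inverse, perturbation theory for inverses is the natural tool for bounding the resulting error.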


[54] 2601.03976

On-Device Deep Reinforcement Learning for Decentralized Task Offloading: Performance Trade-offs in the Training Process

Allowing less capable devices to offload computational tasks to more powerful devices or servers enables the development of new applications that may not run correctly on the device itself. Deciding where and why to run each of those applications is a complex task. Therefore, different approaches have been adopted to make offloading decisions. In this work, we propose a decentralized Deep Reinforcement Learning (DRL) agent to address the selection of computing locations. Unlike most existing work, we analyze it in a real testbed composed of various edge devices running the agent to determine where to execute each task. These devices are connected to a Multi-Access Edge Computing (MEC) server and a Cloud server through 5G communications. We evaluate not only the agent's performance in meeting task requirements but also the implications of running this type of agent locally, assessing the trade-offs of training locally versus remotely in terms of latency and energy consumption.


[55] 2601.04005

Padé Neurons for Efficient Neural Models

Neural networks commonly employ the McCulloch-Pitts neuron model, which is a linear model followed by a point-wise non-linear activation. Various researchers have already advanced inherently non-linear neuron models, such as quadratic neurons, generalized operational neurons, generative neurons, and super neurons, which offer stronger non-linearity compared to point-wise activation functions. In this paper, we introduce a novel and better non-linear neuron model called Padé neurons (Paons), inspired by Padé approximants. Paons offer several advantages, such as diversity of non-linearity, since each Paon learns a different non-linear function of its inputs, and layer efficiency, since Paons provide stronger non-linearity in far fewer layers compared to piecewise linear approximation. Furthermore, Paons include all previously proposed neuron models as special cases; thus, any neuron model in any network can be replaced by Paons. We note that there has been a proposal to employ the Padé approximation as a generalized point-wise activation function, which is fundamentally different from our model. To validate the efficacy of Paons, in our experiments, we replace classic neurons in some well-known neural image super-resolution, compression, and classification models based on the ResNet architecture with Paons. Our comprehensive experimental results and analyses demonstrate that neural models built with Paons perform as well as or better than their classic counterparts with a smaller number of layers. The PyTorch implementation code for Paon is open-sourced at this https URL.


[56] 2601.04011

Flexible-Duplex Cell-Free Architecture for Secure Uplink Communications in Low-Altitude Wireless Networks

Low-altitude wireless networks (LAWNs) are expected to play a central role in future 6G infrastructures, yet uplink transmissions of uncrewed aerial vehicles (UAVs) remain vulnerable to eavesdropping due to their limited transmit power, constrained antenna resources, and highly exposed air-ground propagation conditions. To address this fundamental bottleneck, we propose a flexible-duplex cell-free (CF) architecture in which each distributed access point (AP) can dynamically operate either as a receive AP for UAV uplink collection or as a transmit AP that generates cooperative artificial noise (AN) for secrecy enhancement. Such AP-level duplex flexibility introduces an additional spatial degree of freedom that enables distributed and adaptive protection against wiretapping in LAWNs. Building upon this architecture, we formulate a max-min secrecy-rate problem that jointly optimizes AP mode selection, receive combining, and AN covariance design. This tightly coupled and nonconvex optimization is tackled by first deriving the optimal receive combiners in closed form, followed by developing a penalty dual decomposition (PDD) algorithm with guaranteed convergence to a stationary solution. To further reduce computational burden, we propose a low-complexity sequential scheme that determines AP modes via a heuristic metric and then updates the AN covariance matrices through closed-form iterations embedded in the PDD framework. Simulation results show that the proposed flexible-duplex architecture yields substantial secrecy-rate gains over CF systems with fixed AP roles. The joint optimization method attains the highest secrecy performance, while the low-complexity approach achieves over 90% of the optimal performance with an order-of-magnitude lower computational complexity, offering a practical solution for secure uplink communications in LAWNs.


[57] 2601.04111

Stigmergic optimal transport

Efficient navigation in swarms often relies on the emergence of decentralized approaches that minimize traversal time or energy. Stigmergy, where agents modify a shared environment that then modifies their behavior, is a classic mechanism that can encode this strategy. We develop a theoretical framework for stigmergic transport by casting it as a stochastic optimal control problem: agents (collectively) lay and (individually) follow trails while minimizing expected traversal time. Simulations and analysis reveal two emergent behaviors: path straightening in homogeneous environments and path refraction at material interfaces, both consistent with experimental observations of insect trails. While reminiscent of Fermat's principle, our results show how local, noisy agent-field interactions can give rise to geodesic trajectories in heterogeneous environments, without centralized coordination or global knowledge, relying instead on an embodied slow-fast dynamical mechanism.


[58] 2601.04166

Expectation Propagation for Distributed Inference in Grant-Free Cell-Free Massive MIMO

Grant-free cell-free massive multiple-input multiple-output (GF-CF-MaMIMO) systems are anticipated to be a key enabling technology for next-generation Internet-of-Things (IoT) networks, as they support massive connectivity without explicit scheduling. However, the large number of connected devices prevents the use of orthogonal pilot sequences, resulting in severe pilot contamination (PC) that degrades channel estimation and data detection performance. Furthermore, scalable GF-CF-MaMIMO networks inherently rely on distributed signal processing. In this work, we consider the uplink of a GF-CF-MaMIMO system and propose two novel distributed algorithms for joint activity detection, channel estimation, and data detection (JACD) based on expectation propagation (EP). The first algorithm, denoted as JACD-EP, uses Gaussian approximations for the channel variables, whereas the second, referred to as JACD-EP-BG, models them as Bernoulli-Gaussian (BG) random variables. To integrate the BG distribution into the EP framework, we derive its exponential family representation and develop the two algorithms as efficient message passing over a factor graph constructed from the a posteriori probability (APP) distribution. The proposed framework is inherently scalable with respect to both the number of access points (APs) and user equipments (UEs). Simulation results show the efficient mitigation of PC by the proposed distributed algorithms and their superior detection accuracy compared to (genie-aided) centralized linear detectors.


[59] 2601.04177

Hierarchical GNN-Based Multi-Agent Learning for Dynamic Queue-Jump Lane and Emergency Vehicle Corridor Formation

Emergency vehicles require rapid passage through congested traffic, yet existing strategies fail to adapt to dynamic conditions. We propose a novel hierarchical graph neural network (GNN)-based multi-agent reinforcement learning framework to coordinate connected vehicles for emergency corridor formation. Our approach uses a high-level planner for global strategy and low-level controllers for trajectory execution, utilizing graph attention networks to scale with variable agent counts. Trained via Multi-Agent Proximal Policy Optimization (MAPPO), the system reduces emergency vehicle travel time by 28.3% compared to baselines and 44.6% compared to uncoordinated traffic in simulations. The design achieves near-zero collision rates (0.3%) while maintaining 81% of background traffic efficiency. Ablation and generalization studies confirm the framework's robustness across diverse scenarios. These results demonstrate the effectiveness of combining GNNs with hierarchical learning for intelligent transportation systems.


[60] 2408.01193

On Game based Distributed Approach for General Multi-agent Optimal Coverage with Application to UAV Networks

This paper focuses on the optimal coverage problem (OCP) for multi-agent systems with a decentralized optimization mechanism. A game-based distributed decision-making method for the multi-agent OCP is proposed to address the high computational costs arising from the large scale of the multi-agent system and to ensure that the game's equilibrium achieves the maximum value of the global performance objective. In particular, a distributed algorithm that needs only local information is developed and proved to converge to near-optimal global coverage. Finally, the proposed method is applied to maximize the coverage area of the UAV network for a target region. The simulation results show that our method requires much less computational time than other typical distributed algorithms in related work, while achieving a faster convergence rate. Comparison with centralized optimization also demonstrates that the proposed method achieves approximately the same optimization results with high computational efficiency.


[61] 2410.18757

Short-time Fourier Transform-based Signal Recovery for Modulo Analog-to-Digital Converters

This study introduces a short-time Fourier transform-based method for reconstructing signals encoded using modulo analog-to-digital converters with 1-bit folding information. In contrast to existing Fourier-based reconstruction approaches that require complete access to the entire observation, the proposed technique performs reconstruction over short, overlapping segments, enabling significantly lower latency while preserving the recovery accuracy. We also address the spectral leakage introduced by the windowing operation by selecting window parameters that balance the leakage suppression and the computational complexity of the algorithm. In addition, we establish conditions under which the correct unfolding of the modulo samples is guaranteed, leading to a reconstruction error determined solely by the quantization noise at the output. The numerical results demonstrate that the proposed method enables modulo analog-to-digital converters to surpass the mean squared error performance of conventional analog-to-digital converters. Furthermore, the proposed recovery method offers improved reconstruction performance compared with higher-order difference-based recovery, particularly in low-resolution and low-sampling rate regimes.


[62] 2412.05103

Integrating Semantic Communication and Human Decision-Making into an End-to-End Sensing-Decision Framework

As early as 1949, Weaver defined communication in a very broad sense to include all procedures by which one mind or technical system can influence another, thus establishing the idea of semantic communication. With the recent success of machine learning in expert assistance systems where sensed information is wirelessly provided to a human to assist task execution, the need to design effective and efficient communications has become increasingly apparent. In particular, semantic communication aims to convey the meaning behind the sensed information relevant for Human Decision-Making (HDM). Regarding the interplay between semantic communication and HDM, many questions remain, such as how to model the entire end-to-end sensing-decision-making process, how to design semantic communication for HDM, and which information should be provided for HDM. To address these questions, we propose to integrate semantic communication and HDM into one probabilistic end-to-end sensing-decision framework that bridges communications and psychology. In our interdisciplinary framework, we model the human through an HDM process, allowing us to explore how feature extraction from semantic communication can best support HDM both in theory and in simulations. In this sense, our study reveals the fundamental design trade-off between maximizing the relevant semantic information and matching the cognitive capabilities of the HDM model. Our initial analysis shows how semantic communication can balance the level of detail with human cognitive capabilities while demanding less bandwidth, power, and latency.


[63] 2501.13339

Joint Beamforming and Position Optimization for Fluid RIS-aided ISAC Systems

A fluid reconfigurable intelligent surface (fRIS)-aided integrated sensing and communication (ISAC) system is proposed to enhance multi-target sensing and multi-user communication. Unlike the conventional RIS, the fRIS employs movable elements with adjustable positions, offering additional spatial degrees of freedom. In this system, a joint optimization problem is formulated to minimize sensing beampattern mismatch and symbol estimation error. An algorithm based on alternating minimization is devised to handle the resultant non-convex problem, where the subproblems are solved via the augmented Lagrangian method, quadratic programming, semidefinite relaxation, and majorization-minimization. A key challenge is that the element positions affect both the incident and reflective channels, leading to high-order composite objective functions. As a remedy, the high-order terms are transformed into linear and linear-difference forms by exploiting the structural characteristics of the fRIS and the channels. Numerical results demonstrate the superiority of the proposed scheme over conventional RIS-aided ISAC and other benchmarks.


[64] 2504.09912

Parameter Convergence Radar Detector Based on VAMP Deep Unfolding

Compared with the sparse recovery process in the traditional compressed sensing (CS) radar detector CAMP, vector AMP deep unfolding (VAMP-DU) can achieve sparse recovery over a broader range of observation matrices, with faster convergence speed and higher recovery accuracy. However, the distribution of the error term in VAMP-DU remains unknown, which renders the distribution of the test statistic in CS radar detection undetermined and thus hinders threshold setting under a given false alarm rate when VAMP-DU is applied to CS radar detection. In this work, we theoretically prove that the error term in VAMP-DU follows a Gaussian distribution by leveraging a general state evolution (SE). Based on this Gaussianity, we propose a new parameter convergence radar detector (PCRD) as the CS detector to calculate the distribution parameter of the test statistic and realize target detection under a given false alarm rate. Specifically, PCRD exploits the Gaussian property of the error term in VAMP-DU to exhibit superior false alarm control capability, while leveraging the improved recovery accuracy of VAMP-DU to further enhance target detection performance. Numerical simulations validate the Gaussianity of the error term in VAMP-DU and show the superiority of the VAMP-DU-based PCRD over existing approaches in both false alarm control accuracy and target detection performance.


[65] 2505.23210

Latent Representations for Control Design with Provable Stability and Safety Guarantees

We initiate a formal study on the use of low-dimensional latent representations of dynamical systems for verifiable control synthesis. Our main goal is to enable the application of verification techniques -- such as Lyapunov or barrier functions -- that might otherwise be computationally prohibitive when applied directly to the full state representation. Towards this goal, we first provide dynamics-aware approximate conjugacy conditions which formalize the notion of reconstruction error necessary for systems analysis. We then utilize our conjugacy conditions to transfer the stability and invariance guarantees of a latent certificate function (e.g., a Lyapunov or barrier function) for a latent space controller back to the original system. Our analysis has several important implications for learning latent spaces and dynamics: it highlights the geometric properties that the latent space must preserve and provides concrete loss functions for dynamics reconstruction that are directly related to control design. We conclude by demonstrating the applicability of our theory to two case studies: (1) stabilization of a cartpole system, and (2) collision avoidance for a two-vehicle system.


[66] 2506.07770

Channel Estimation for RIS-Assisted mmWave Systems via Diffusion Models

Reconfigurable intelligent surface (RIS) has been recognized as a promising technology for next-generation wireless communications. However, the performance of RIS-assisted systems critically depends on accurate channel state information (CSI). To address this challenge, this letter proposes a novel channel estimation method for RIS-aided millimeter-wave (mmWave) systems based on diffusion models (DMs). Specifically, the forward diffusion process of the original signal is formulated to model the received signal as a noisy observation within the framework of DMs. Subsequently, the channel estimation task is formulated as the reverse diffusion process, and a sampling algorithm based on denoising diffusion implicit models (DDIMs) is developed to enable effective inference. Furthermore, a lightweight neural network, termed BRCNet, is introduced to replace the conventional U-Net, significantly reducing the number of parameters and computational complexity. Extensive experiments conducted under various scenarios demonstrate that the proposed method consistently outperforms existing baselines.


[67] 2508.01776

Statistical Multiport-Network Modeling and Efficient Discrete Optimization of RIS

This Letter addresses the physics-consistent optimization of reconfigurable intelligent surfaces (RISs) with mutual coupling (MC) and 1-bit-programmable RIS elements. This combination of constraints is typical of current prototypes but unexplored in theoretical work. First, we present a simple statistical generator for multiport-network-theory (MNT) parameters of rich-scattering, RIS-parametrized channels. We account for reciprocity, passivity, and coherent backscattering; then, we add a simple hyper-parameter to control the MC strength. Second, we benchmark model-agnostic (dictionary search, coordinate descent, genetic algorithm) and model-based (temperature-annealed back-propagation) strategies under varying MC, with and without intelligent initialization. Except when MC is negligible, coordinate descent with random initialization offers the best trade-off in performance, runtime, and memory. Our insights can guide wireless practitioners who optimize RIS prototypes and other reconfigurable wave systems.


[68] 2509.01217

Learn2Reg 2024: New Benchmark Datasets Driving Progress on New Challenges

Medical image registration is critical for clinical applications, and fair benchmarking of different methods is essential for monitoring ongoing progress in the field. To date, the Learn2Reg 2020-2023 challenges have released several complementary datasets and established metrics for evaluations. Building on this foundation, the 2024 edition expands the challenge's scope to cover a wider range of registration scenarios, particularly in terms of modality diversity and task complexity, by introducing three new tasks, including large-scale multi-modal registration and unsupervised inter-subject brain registration, as well as the first microscopy-focused benchmark within Learn2Reg. The new datasets also inspired new method developments, including invertibility constraints, pyramid features, keypoints alignment and instance optimisation. Visit Learn2Reg at this https URL.


[69] 2510.03043

Economic zone data-enabled predictive control for connected open water systems

The real-time operation of open water systems is essential for ensuring operational safety, satisfying operational requirements, and optimizing energy usage. However, existing rule-based control strategies rely heavily on human experience, while model-based approaches depend on accurate hydrodynamic models, which limit their applicability to water systems with complex dynamics and uncertain disturbances. In this work, we develop a fully data-driven, zone-based control framework with adaptive control target zone selection for safe and energy-efficient operation of connected open water systems. Specifically, we propose a mixed-integer economic zone data-enabled predictive control (DeePC) approach that aims to maintain the water levels of the branches within the desired water-level zone while reducing real-time operational energy consumption. The DeePC-based approach enables direct use of input-output data for predictive control, eliminating the need for explicit dynamic modeling. To handle multiple control objectives with different priorities, we employ lexicographic optimization and reformulate the traditional DeePC cost function to incorporate zone tracking and energy consumption minimization objectives. Additionally, Bayesian optimization is utilized to determine the control target zone, which enables an effective trade-off between zone tracking and energy consumption in the presence of external disturbances. Comprehensive simulations and comparative analyses demonstrate the effectiveness of the proposed method. The proposed method maintains water levels within the desired water-level zone for 97.04% of the operating time, with an average energy consumption of 33.5 kWh per 0.5 hour. Compared to a rule-based control method, the proposed method lowers zone-violation frequency by 74.96% and the average energy consumption by 22.44%.
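
The idea of predicting directly from input-output data, without an explicit model, can be sketched with the standard Hankel-matrix construction used in DeePC. The example below is only the basic data-driven predictor on a made-up first-order system; the mixed-integer zone objective, lexicographic optimization, and Bayesian zone selection described above are not reproduced, and the system parameters and variable names are assumptions.

```python
import numpy as np

def hankel(w, depth):
    """Stack length-`depth` windows of the signal w as columns."""
    w = np.asarray(w, dtype=float)
    cols = len(w) - depth + 1
    return np.array([w[i:i + cols] for i in range(depth)])

def simulate(u, x0=0.0, a=0.9, b=0.5):
    """Hypothetical plant: x+ = a x + b u, y = x (used only to create data)."""
    x, ys = x0, []
    for uk in u:
        ys.append(x)
        x = a * x + b * uk
    return np.array(ys)

rng = np.random.default_rng(0)
T_ini, N = 1, 5                  # T_ini >= system order (1 here)
u_d = rng.standard_normal(60)    # persistently exciting input data
y_d = simulate(u_d)

Hu, Hy = hankel(u_d, T_ini + N), hankel(y_d, T_ini + N)
Up, Uf = Hu[:T_ini], Hu[T_ini:]  # past / future input blocks
Yp, Yf = Hy[:T_ini], Hy[T_ini:]  # past / future output blocks

# A fresh trajectory from a different initial state, used as ground truth.
u_new = rng.standard_normal(T_ini + N)
y_new = simulate(u_new, x0=1.3)

# Find g matching the recent past and the planned inputs, then read off outputs.
A = np.vstack([Up, Yp, Uf])
rhs = np.concatenate([u_new[:T_ini], y_new[:T_ini], u_new[T_ini:]])
g = np.linalg.lstsq(A, rhs, rcond=None)[0]
y_pred = Yf @ g

print(np.max(np.abs(y_pred - y_new[T_ini:])))  # prediction error on noise-free data
```

In a full controller, `g` and the future inputs would be decision variables of an optimization problem rather than the solution of a least-squares fit.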


[70] 2510.04924

Steady-State Spread Bounds for Graph Diffusion via Laplacian Regularisation in Networked Systems

We study how far a diffusion process on a graph can deviate from a designed starting pattern when the pattern is generated via Laplacian regularisation. Under standard stability conditions for undirected, entrywise nonnegative graphs, we give a closed-form, instance-specific upper bound on the steady-state spread, measured as the relative change between the final and initial profiles. The bound separates two effects: (i) an irreducible term determined by the graph's maximum node degree, and (ii) a design-controlled term that shrinks as the regularisation strength increases (with an inverse square-root law). This leads to a design rule: given any target limit on spread, one can choose a sufficient regularisation strength in closed form. Although one motivating application is array beamforming -- where the initial pattern is the squared magnitude of the beamformer weights -- the result applies to any scenario that first enforces Laplacian smoothness and then evolves by linear diffusion on a graph. Overall, the guarantee is non-asymptotic, easy to compute, and certifies the maximum steady-state deviation.
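
The qualitative effect of the regularisation strength can be checked numerically. The sketch below assumes a common form of Laplacian regularisation, $x_0 = (I + \lambda L)^{-1} y$, a simple linear diffusion $x_{k+1} = x_k - \varepsilon L x_k$, and a Euclidean-norm relative change as the spread; these definitions, the ring graph, and the step size are assumptions for illustration and may differ from the paper's setting. The closed-form bound itself is not reproduced.

```python
import numpy as np

n = 8
A = np.zeros((n, n))
for i in range(n):                  # undirected ring graph (assumed example)
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A      # combinatorial graph Laplacian

rng = np.random.default_rng(1)
y = rng.random(n) + 0.5             # unregularised nonnegative pattern

def regularised_pattern(y, L, lam):
    """Minimiser of ||x - y||^2 + lam * x^T L x."""
    return np.linalg.solve(np.eye(len(y)) + lam * L, y)

def steady_state(x0, L, eps=0.2, steps=2000):
    """Iterate x <- x - eps * L x; eps is below 2 / lambda_max(L) = 0.5 here."""
    x = x0.copy()
    for _ in range(steps):
        x = x - eps * (L @ x)
    return x

def spread(x0, L):
    """Relative change between the final and initial profiles."""
    x_inf = steady_state(x0, L)
    return np.linalg.norm(x_inf - x0) / np.linalg.norm(x0)

lams = [0.0, 0.5, 2.0, 8.0]
spreads = [spread(regularised_pattern(y, L, lam), L) for lam in lams]
for lam, s in zip(lams, spreads):
    print(f"lambda = {lam:4.1f}  spread = {s:.4f}")
```

On this connected graph the diffusion settles at the average of the initial profile, so stronger smoothing leaves less room for the profile to move.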


[71] 2512.21364

Adaptive Real-Time Scheduling Algorithms for Embedded Systems

Embedded systems are increasingly expected to operate in dynamic and uncertain environments while meeting strict real-time requirements. Conventional static scheduling models usually cannot cope with runtime changes in workload, resource availability, or system updates. This brief survey covers feedback-based control models (e.g., Feedback Control Scheduling) and models of interdependence between tasks (e.g., Symbiotic Scheduling of Periodic Tasks). It also touches on predictive methods and power management, including methods based on Dynamic Voltage and Frequency Scaling (DVFS). The paper briefly summarizes key mechanisms, the trade-offs between adaptivity and predictability, typical evaluation metrics, and open problems, especially in safety-critical settings, giving researchers and practitioners a succinct and accessible introduction to adaptive real-time systems.


[72] 2512.22686

Multistatic Radar Performance in the Presence of Distributed Wireless Synchronization

This paper proposes a multistatic radar (MSR) system utilizing a distributed wireless synchronization protocol. The wireless synchronization protocol uses a two-tone waveform exchange for frequency synchronization and a bi-directional waveform exchange for time synchronization, independent of GPS. A Bayesian Cramér-Rao lower bound (BCRLB) framework is developed to quantify the impact of synchronization offsets on joint delay and Doppler estimation, and consequently, on target localization and velocity estimation accuracy. Simulation results derived from the analytical expressions establish the extent to which the residual synchronization offsets degrade the MSR's performance. The performance of the synchronization links primarily depends on the synchronization-link channel and transmit parameters; optimizing these parameters enables the MSR configuration to surpass the monostatic performance and approach the ideal case. Furthermore, the simulated synchronization-link parameters suggest that practical implementation is feasible.
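
The principle behind a bi-directional exchange for time synchronization can be illustrated with the textbook two-way time-transfer estimate. This is a generic sketch that assumes a symmetric propagation delay and noise-free timestamps; the timestamp names and numeric values are made up, and the paper's actual waveform-based estimator may differ.

```python
import math

def two_way_sync(t1, t2, t3, t4):
    """Estimate the clock offset of node B relative to node A.

    t1: A transmits (A's clock)    t2: B receives (B's clock)
    t3: B transmits (B's clock)    t4: A receives (A's clock)
    Assumes the propagation delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2.0
    delay = ((t2 - t1) + (t4 - t3)) / 2.0
    return offset, delay

# Hypothetical ground truth used to synthesise the four timestamps.
true_offset = 3.2e-6   # B's clock runs 3.2 microseconds ahead of A's
true_delay = 1.0e-6    # one-way propagation delay

t1 = 0.0
t2 = t1 + true_delay + true_offset    # arrival time read on B's clock
t3 = t2 + 5.0e-6                      # B replies after a processing gap
t4 = (t3 - true_offset) + true_delay  # arrival time read on A's clock

est_offset, est_delay = two_way_sync(t1, t2, t3, t4)
print(est_offset, est_delay)
```

The processing gap at node B cancels out because only differences measured on the same clock enter the estimate.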


[73] 2512.24583

Resource Allocation via Backscatter-Aware Transmit Antenna Selection for Low-PAPR and Ultra-Reliable WSNs

This paper addresses a fundamental physical layer conflict in hybrid Wireless Sensor Networks (WSNs) between high-throughput primary communication and the stringent power envelope requirements of passive backscatter sensors. We propose a Backscatter-Constrained Transmit Antenna Selection (BC-TAS) framework, a per-subcarrier selection strategy for multi-antenna illuminators operating within a Multi-Dimensional Orthogonal Frequency Division Multiplexing (MD-OFDM) architecture. Unlike conventional signal-to-noise ratio (SNR) centric selection schemes, BC-TAS employs a multi-objective cost function that jointly maximizes desired link reliability, stabilizes the incident RF energy envelope at passive Surface Acoustic Wave (SAW) sensors, and suppresses interference toward coexisting victim receivers. By exploiting the inherent sparsity of MD-OFDM, the proposed framework enables dual-envelope regulation, simultaneously reducing the transmitter Peak-to-Average Power Ratio (PAPR) and the Backscatter Crest Factor (BCF) observed at the tag. To enhance robustness under imperfect Channel State Information (CSI), a Kalman-based channel smoothing mechanism is incorporated to maintain selection stability in low-SNR regimes. Numerical results using IEEE 802.11be dispersive channel models and a nonlinear Rapp power amplifier demonstrate that BC-TAS achieves orders-of-magnitude improvement in outage probability and significant gains in energy efficiency compared to conventional MU-MIMO baselines, while ensuring spectral mask compliance under reduced power amplifier back-off. These results establish BC-TAS as an effective illuminator-side control mechanism for enabling reliable and energy-stable sensing and communication coexistence in dense, power-constrained wireless environments.


[74] 2601.01410

Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems

Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics mask this operational asymmetry. We introduce a grid-specific evaluation framework (Asymmetric MAPE, Under-Prediction Rate, and Reserve Margin) that directly measures operational risk rather than statistical accuracy alone. Using this framework, we conduct a systematic evaluation of Mamba-based State Space Models for California grid forecasting on a weather-aligned CA ISO-TAC dataset spanning Nov 2023 to Nov 2025 (84,498 hourly records across 5 transmission areas). Our analysis reveals that standard accuracy metrics are poor proxies for operational safety: models with identical MAPE can require vastly different reserve margins. We demonstrate that forecast errors are weakly but statistically significantly associated with temperature (r = 0.16), motivating weather-aware modeling rather than loss function modification alone. The S-Mamba model achieves the lowest 99.5th-percentile reserve margin (14.12 percent) compared to 16.66 percent for iTransformer, demonstrating superior forecast reliability under a 99.5th-percentile tail-risk reserve proxy.
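
The claim that identical MAPE can hide very different operational risk is easy to reproduce on toy numbers. The metric definitions below (the under-prediction weight, the shortfall normalisation, and the use of a 99.5th percentile as the reserve proxy) are assumptions chosen for illustration; the abstract does not give the paper's exact formulas, and the load values are invented.

```python
import numpy as np

def mape(actual, forecast):
    """Symmetric mean absolute percentage error (as a fraction)."""
    return np.mean(np.abs(actual - forecast) / actual)

def under_prediction_rate(actual, forecast):
    """Fraction of time steps where the forecast falls below actual load."""
    return np.mean(forecast < actual)

def asymmetric_mape(actual, forecast, under_weight=2.0):
    """MAPE with under-predictions weighted more heavily (weight is assumed)."""
    pct = np.abs(actual - forecast) / actual
    w = np.where(forecast < actual, under_weight, 1.0)
    return np.mean(w * pct)

def reserve_margin(actual, forecast, q=99.5):
    """q-th percentile of the shortfall, relative to the forecast (assumed proxy)."""
    shortfall = np.maximum(actual - forecast, 0.0) / forecast
    return np.percentile(shortfall, q)

actual = np.array([100.0, 120.0, 90.0, 110.0])  # made-up hourly loads
over = actual * 1.05    # forecast that is always 5% high
under = actual * 0.95   # forecast that is always 5% low

for name, f in [("over", over), ("under", under)]:
    print(name, mape(actual, f), under_prediction_rate(actual, f),
          asymmetric_mape(actual, f), reserve_margin(actual, f))
```

Both forecasts have the same symmetric MAPE, but only the low forecast requires a nonzero reserve under this proxy.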


[75] 2601.02827

AI-Native 6G Physical Layer with Cross-Module Optimization and Cooperative Control Agents

In this article, a framework for an AI-native, cross-module optimized physical layer with cooperative control agents is proposed, which involves optimization across global AI/ML modules of the physical layer with innovative design of multiple enhancement mechanisms and control strategies. Specifically, it achieves simultaneous optimization across global modules of uplink AI/ML-based joint source-channel coding with modulation, and downlink AI/ML-based modulation with precoding and corresponding data detection, reducing traditional inter-module information barriers to facilitate end-to-end optimization toward global objectives. Moreover, multiple enhancement mechanisms are also proposed, including i) an AI/ML-based cross-layer modulation approach with theoretical analysis for downlink transmission that breaks the isolation of inter-layer features to expand the solution space for determining improved constellations, ii) a utility-oriented precoder construction method that shifts the role of the AI/ML-based CSI feedback decoder from recovering the original CSI to directly generating precoding matrices aiming to improve end-to-end performance, and iii) incorporating modulation into AI/ML-based CSI feedback to bypass bit-level bottlenecks that introduce quantization errors, non-differentiable gradients, and limitations in constellation solution spaces. Furthermore, AI/ML-based control agents for optimized transmission schemes are proposed that leverage AI/ML to perform model switching according to channel state, thereby enabling integrated control for global throughput optimization. Finally, simulation results demonstrate the superiority of the proposed solutions in terms of BLER and throughput. These extensive simulations employ more practical assumptions that are aligned with the requirements of the 3GPP, which we hope will provide valuable insights for future standardization discussions.


[76] 2601.03007

From inconsistency to decision: explainable operation and maintenance of battery energy storage systems

Battery Energy Storage Systems (BESSs) are increasingly critical to power-system stability, yet their operation and maintenance remain dominated by reactive, expert-dependent diagnostics. While cell-level inconsistencies provide early warning signals of degradation and safety risks, the lack of scalable and interpretable decision-support frameworks prevents these signals from being effectively translated into operational actions. Here we introduce an inconsistency-driven operation and maintenance paradigm for large-scale BESSs that systematically transforms routine monitoring data into explainable, decision-oriented guidance. The proposed framework integrates multi-dimensional inconsistency evaluation with large language model-based semantic reasoning to bridge the gap between quantitative diagnostics and practical maintenance decisions. Using eight months of field data from an in-service battery system comprising 3,564 cells, we demonstrate how electrical, thermal, and aging-related inconsistencies can be distilled into structured operational records and converted into actionable maintenance insights through a multi-agent framework. The proposed approach enables accurate and explainable responses to real-world operation and maintenance queries, reducing response time and operational cost by over 80% compared with conventional expert-driven practices. These results establish a scalable pathway for intelligent operation and maintenance of battery energy storage systems, with direct implications for reliability, safety, and cost-effective integration of energy storage into modern power systems.


[77] 2412.18342

Mitigating Label Noise using Prompt-Based Hyperbolic Meta-Learning in Open-Set Domain Generalization

Open-Set Domain Generalization (OSDG) is a challenging task requiring models to accurately predict familiar categories while minimizing confidence for unknown categories to effectively reject them in unseen domains. While the OSDG field has seen considerable advancements, the impact of label noise--a common issue in real-world datasets--has been largely overlooked. Label noise can mislead model optimization, thereby exacerbating the challenges of open-set recognition in novel domains. In this study, we take the first step towards addressing Open-Set Domain Generalization under Noisy Labels (OSDG-NL) by constructing dedicated benchmarks derived from widely used OSDG datasets, including PACS and DigitsDG. We evaluate baseline approaches by integrating techniques from both label denoising and OSDG methodologies, highlighting the limitations of existing strategies in handling label noise effectively. To address these limitations, we propose HyProMeta, a novel framework that integrates hyperbolic category prototypes for label noise-aware meta-learning alongside a learnable new-category agnostic prompt designed to enhance generalization to unseen classes. Our extensive experiments demonstrate the superior performance of HyProMeta compared to state-of-the-art methods across the newly established benchmarks. The source code of this work is released at this https URL.


[78] 2501.12279

Spatial exponential decay of perturbations in optimal control of general evolution equations

We analyze the robustness of optimally controlled evolution equations with respect to spatially localized perturbations. We prove that if the involved operators are domain-uniformly stabilizable and detectable, then these localized perturbations only have a local effect on the optimal solution. We characterize this domain-uniform stabilizability and detectability for the transport equation with constant transport velocity, showing that even for unitary semigroups, optimality implies exponential damping. We extend this result to the case of a space-dependent transport velocity. Finally we leverage the results for the transport equation to characterize domain-uniform stabilizability of the wave equation. Numerical examples in one space dimension complement the theoretical results.


[79] 2502.20382

Physics-Driven Data Generation for Contact-Rich Manipulation via Trajectory Optimization

We present a low-cost data generation pipeline that integrates physics-based simulation, human demonstrations, and model-based planning to efficiently generate large-scale, high-quality datasets for contact-rich robotic manipulation tasks. Starting with a small number of embodiment-flexible human demonstrations collected in a virtual reality simulation environment, the pipeline refines these demonstrations using optimization-based kinematic retargeting and trajectory optimization to adapt them across various robot embodiments and physical parameters. This process yields a diverse, physically consistent dataset that enables cross-embodiment data transfer, and offers the potential to reuse legacy datasets collected under different hardware configurations or physical parameters. We validate the pipeline's effectiveness by training diffusion policies from the generated datasets for challenging contact-rich manipulation tasks across multiple robot embodiments, including a floating Allegro hand and bimanual robot arms. The trained policies are deployed zero-shot on hardware for bimanual iiwa arms, achieving high success rates with minimal human input. Project website: this https URL.


[80] 2503.22928

Optimal Control of an Epidemic with Intervention Design

This paper investigates the optimal control of an epidemic governed by an SEIR model with operational delays in vaccination and non-pharmaceutical interventions. We address the mathematical challenge of imposing hard healthcare capacity constraints (e.g., ICU limits) over an infinite time horizon. To rigorously bridge the gap between theoretical constraints and numerical tractability, we employ a variational framework based on Moreau--Yosida regularization and establish the connection between finite- and infinite-horizon solutions via $\Gamma$-convergence. The necessary conditions for optimality are derived using the Pontryagin Maximum Principle, allowing for the characterization of singular regimes where the optimal strategy maintains the infection level precisely at the capacity boundary. Numerical simulations illustrate these theoretical findings, quantifying the shadow prices of infection and costs associated with intervention delays.


[81] 2504.18882

SPD Matrix Learning for Neuroimaging Analysis: Perspectives, Methods, and Challenges

Neuroimaging provides essential tools for characterizing brain activity by quantifying connectivity strength between remote regions, using different modalities that capture different aspects of connectivity. Yet, decoding meaningful neural signatures must contend with modality-specific challenges, including measurement noise, spatial and temporal distortions, heterogeneous acquisition protocols, and limited sample sizes. A unifying perspective emerges when these data are expressed through symmetric positive definite (SPD)-valued representations: across neuroimaging modalities, SPD-valued representations naturally give rise to SPD matrices that capture dependencies between sensors or brain regions. Endowing the SPD space with Riemannian metrics equips it with a non-Euclidean geometric structure, enabling principled statistical modeling and machine learning on the resulting manifold. This review consolidates machine learning methodologies that operate on the SPD manifold under a unified framework termed SPD matrix learning. SPD matrix learning brings conceptual clarity across multiple modalities, establishes continuity with decades of geometric statistics in neuroimaging, and positions SPD modeling as a methodological bridge between classical analysis and emerging AI-driven paradigms. We show that (i) modeling on the SPD manifold is mathematically natural and numerically stable, preserving symmetry and positive definiteness while avoiding degeneracies inherent to Euclidean embeddings; (ii) SPD matrix learning extends a broad family of established geometric statistical tools used across neuroimaging; and (iii) SPD matrix learning integrates new-generation AI technologies, driving a new class of neuroimaging problems that were previously out of reach. Taken together, SPD matrix learning offers a principled and forward-looking framework for next-generation neuroimaging analytics.
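
To make the geometric viewpoint concrete, the sketch below computes one widely used Riemannian distance on the SPD manifold, the affine-invariant distance $d(A,B) = \lVert \log(A^{-1/2} B A^{-1/2}) \rVert_F$, between two covariance matrices. The synthetic signals, the ridge term, and the function names are assumptions for this example; the review covers several metrics and this is only one of them.

```python
import numpy as np

def inv_sqrt(A):
    """Inverse matrix square root of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V / np.sqrt(w)) @ V.T

def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices A and B."""
    S = inv_sqrt(A)
    lam = np.linalg.eigvalsh(S @ B @ S)   # eigenvalues of A^{-1/2} B A^{-1/2}
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(0)
channels, samples = 4, 200

def random_covariance():
    """Covariance of synthetic multichannel signals, with a small ridge for safety."""
    X = rng.standard_normal((channels, channels)) @ rng.standard_normal((channels, samples))
    return np.cov(X) + 1e-6 * np.eye(channels)

A, B = random_covariance(), random_covariance()
W = rng.standard_normal((channels, channels)) + 3.0 * np.eye(channels)  # invertible mixing

d_ab = airm_distance(A, B)
d_mixed = airm_distance(W @ A @ W.T, W @ B @ W.T)
print(d_ab, d_mixed)  # equal up to rounding: invariance under congruence
```

The invariance under the mixing matrix `W` is one reason this metric is popular for sensor-space covariance data, where linear mixing of sources is common.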


[82] 2507.09342

BENYO-S2ST-Corpus-1: A Bilingual English-to-Yoruba Direct Speech-to-Speech Translation Corpus

There is a major shortage of Speech-to-Speech Translation (S2ST) datasets for high-resource-to-low-resource language pairs such as English-to-Yoruba. Thus, in this study, we curated the Bilingual English-to-Yoruba Speech-to-Speech Translation Corpus Version 1 (BENYO-S2ST-Corpus-1). The corpus is based on a hybrid architecture we developed for large-scale direct S2ST corpus creation at reduced cost. To achieve this, we leveraged non-speech-to-speech Standard Yoruba (SY) real-time audios and transcripts in the YORULECT Corpus as well as the corresponding Standard English (SE) transcripts. The YORULECT Corpus is small-scale (1,504 samples) and does not have paired English audios. Therefore, we generated the SE audios using pre-trained AI models (i.e., Facebook MMS). We also developed an audio augmentation algorithm named AcoustAug, based on three latent acoustic features, to generate augmented audios from the raw audios of the two languages. BENYO-S2ST-Corpus-1 has 12,032 audio samples per language, for a total of 24,064 samples. The total audio duration for the two languages is 41.20 hours. This size is quite significant. Beyond building S2ST models, BENYO-S2ST-Corpus-1 can be used to build pretrained models or improve existing ones. The created corpus and the Coqui framework were used to build a pretrained Yoruba TTS model (named YoruTTS-1.5) as a proof of concept. YoruTTS-1.5 achieved an F0 RMSE of 63.54 after 1,000 epochs, which indicates moderate fundamental pitch similarity with the reference real-time audio. Ultimately, the corpus architecture in this study can be leveraged by researchers and developers to curate datasets for multilingual high-resource-to-low-resource African languages. This will help bridge the large digital divide in translation between high- and low-resource language pairs. BENYO-S2ST-Corpus-1 and YoruTTS-1.5 are publicly available at (this https URL).


[83] 2508.11100

Full-Wave Modeling of Transcranial Ultrasound using Volume-Surface Integral Equations and CT-Derived Heterogeneous Skull Data

Transcranial ultrasound therapy uses focused acoustic energy to induce therapeutic bioeffects in the brain. Ultrasound must be transmitted through the skull, which is highly attenuating and heterogeneous, causing beam distortion, reducing focal pressure, and shifting the target location. Computational models are frequently used to predict beam aberration, assess cranial heating, and correct the phase of ultrasound transducers. These models often rely on computed tomography (CT) images to build patient-specific geometries and estimate skull acoustic properties. However, the coarse voxel resolution of CT limits accuracy for differential equation solvers at ultrasound frequencies. This paper presents an efficient numerical method based on volume-surface integral equations to model full-wave acoustic propagation through heterogeneous skull bone. We show that our approach effectively simulates transcranial ultrasound, even when using the original CT voxels as the computational mesh, where the 0.5 mm voxel length is relatively coarse compared to the shortest wavelength of 3 mm. The method is validated against a high-resolution boundary element model using an averaged skull representation. Simulations using a CT-based skull model and a bowl transducer reveal significant beam distortion of 7.8 mm attributed to the skull's heterogeneous acoustical properties.


[84] 2509.24613

HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition

Despite advances in multilingual automatic speech recognition (ASR), code-switching (CS), the mixing of languages within an utterance common in daily speech, remains a severely underexplored challenge. In this paper, we introduce HiKE: the Hierarchical Korean-English code-switching benchmark, the first globally accessible non-synthetic evaluation framework for Korean-English CS, aiming to provide a means for the precise evaluation of multilingual ASR models and to foster research in the field. The proposed framework not only consists of high-quality, natural CS data across various topics, but also provides meticulous loanword labels and a hierarchical CS-level labeling scheme (word, phrase, and sentence) that together enable a systematic evaluation of a model's ability to handle each distinct level of code-switching. Through evaluations of diverse multilingual ASR models and fine-tuning experiments, this paper demonstrates that although most multilingual ASR models initially exhibit inadequate CS-ASR performance, this capability can be enabled through fine-tuning with synthetic CS data. HiKE is available at this https URL.


[85] 2509.26428

Real-time Velocity Profile Optimization for Time-Optimal Maneuvering with Generic Acceleration Constraints

The computation of time-optimal velocity profiles along prescribed paths, subject to generic acceleration constraints, is a crucial problem in robot trajectory planning, with particular relevance to autonomous racing. However, the existing methods either support arbitrary acceleration constraints at high computational cost or use conservative box constraints for computational efficiency. We propose FBGA, a new Forward-Backward algorithm with Generic Acceleration constraints, which achieves both high accuracy and low computation time. FBGA performs forward and backward passes to maximize the velocity profile in short, discretized path segments, while satisfying user-defined performance limits. Tested on five racetracks and two vehicle classes, FBGA handles complex, non-convex acceleration constraints with custom formulations. Its maneuvers and lap times closely match optimal control baselines (within $0.11\%$-$0.36\%$), while being up to three orders of magnitude faster. FBGA maintains high accuracy even with coarse discretization, making it well-suited for online multi-query trajectory planning. Our open-source C++ implementation is available at: this https URL.
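
The general forward-backward idea can be sketched on a discretized path with a simple friction-circle limit. This is the classic textbook scheme, not the FBGA algorithm itself: the constraint shape, the track curvature, and all parameter values below are assumptions, and the generic non-convex constraints handled by FBGA are not modelled.

```python
import numpy as np

def forward_backward(kappa, ds, a_max, v_top, v0):
    """Velocity profile under a_long^2 + (v^2 * kappa)^2 <= a_max^2."""
    kappa = np.asarray(kappa, dtype=float)
    # Pointwise speed limit from lateral acceleration alone, capped at v_top.
    v_lim = np.minimum(v_top, np.sqrt(a_max / np.maximum(np.abs(kappa), 1e-9)))

    def a_long(v, k):
        """Longitudinal acceleration left after the lateral demand v^2 * k."""
        return np.sqrt(max(a_max ** 2 - (v * v * k) ** 2, 0.0))

    v = v_lim.copy()
    v[0] = min(v0, v_lim[0])
    # Forward pass: limit how fast the vehicle can accelerate between points.
    for i in range(len(v) - 1):
        v[i + 1] = min(v[i + 1], np.sqrt(v[i] ** 2 + 2.0 * a_long(v[i], kappa[i]) * ds))
    # Backward pass: limit how late the vehicle can brake before slow points.
    for i in range(len(v) - 2, -1, -1):
        v[i] = min(v[i], np.sqrt(v[i + 1] ** 2 + 2.0 * a_long(v[i + 1], kappa[i + 1]) * ds))
    return v, v_lim

# Hypothetical track: straight, constant-radius corner (20 m), straight.
kappa = np.concatenate([np.zeros(50), np.full(20, 0.05), np.zeros(50)])
ds, a_max, v_top, v0 = 1.0, 8.0, 60.0, 10.0

v, v_lim = forward_backward(kappa, ds, a_max, v_top, v0)
travel_time = np.sum(2.0 * ds / (v[:-1] + v[1:]))  # trapezoidal time estimate
print(f"corner speed {v[60]:.2f} m/s, travel time {travel_time:.2f} s")
```

Two linear sweeps over the path are enough here, which is why schemes of this family are attractive for online use compared with solving a full optimal control problem.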