New articles on Electrical Engineering and Systems Science


[1] 2603.22371

Multimodal Fusion of Skeleton Dynamics and Clinical Gait Features for Video-Based Cerebral Palsy Severity Assessment

Video-based gait analysis has become a promising approach for assessing motor impairment in children with cerebral palsy (CP). However, existing methods usually rely on either pose sequences or handcrafted gait features alone, making it difficult to simultaneously capture spatiotemporal motion patterns and clinically meaningful biomechanical information. To address this gap, we propose a multimodal fusion framework that integrates skeleton dynamics with contribution-guided, clinically meaningful gait features. First, Grad-CAM analysis on a pre-trained ST-GCN backbone identified the most discriminative body keypoints, providing an interpretable basis for subsequent gait feature extraction. We then build a dual-stream architecture, with one stream modeling skeleton dynamics using ST-GCN and the other encoding gait features derived from the identified keypoints. Fusing the two streams through feature cross-attention raised four-level CP motor severity classification performance to 70.86%, outperforming the baseline by 5.6 percentage points. Overall, this work suggests that integrating skeleton dynamics with clinically meaningful gait descriptors can improve both prediction performance and biomechanical interpretability for video-based CP severity assessment.
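A minimal NumPy sketch of a cross-attention fusion step, assuming skeleton-stream tokens act as queries and gait-feature tokens as keys and values (the abstract does not say which stream attends to which). All dimensions, the random projection matrices standing in for learned weights, and the four-class head are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, Wq, Wk, Wv):
    """Scaled dot-product cross-attention: each query token attends over all context tokens."""
    Q = query_feats @ Wq                      # (Nq, d)
    K = context_feats @ Wk                    # (Nc, d)
    V = context_feats @ Wv                    # (Nc, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (Nq, Nc)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
d_skel, d_gait, d = 64, 16, 32
skeleton_tokens = rng.normal(size=(10, d_skel))  # hypothetical: 10 temporal tokens from the ST-GCN stream
gait_tokens = rng.normal(size=(6, d_gait))       # hypothetical: 6 gait-feature tokens
Wq = rng.normal(size=(d_skel, d)) / np.sqrt(d_skel)
Wk = rng.normal(size=(d_gait, d)) / np.sqrt(d_gait)
Wv = rng.normal(size=(d_gait, d)) / np.sqrt(d_gait)

attended, weights = cross_attention(skeleton_tokens, gait_tokens, Wq, Wk, Wv)
# Pool both streams and concatenate into one fused vector for a 4-level severity head.
fused = np.concatenate([skeleton_tokens.mean(axis=0), attended.mean(axis=0)])
W_cls = rng.normal(size=(fused.shape[0], 4)) * 0.01
class_probs = softmax(fused @ W_cls)
```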


[2] 2603.22378

Abnormalities and Disease Detection in Gastro-Intestinal Tract Images

Gastrointestinal (GI) tract image analysis plays a crucial role in medical diagnosis. This research addresses the challenge of accurately classifying and segmenting GI images for real-time applications, where traditional methods often struggle due to the diversity and complexity of abnormalities. The high computational demands of this domain require efficient and adaptable solutions. This PhD thesis presents a multifaceted approach to GI image analysis. Initially, texture-based feature extraction and classification methods were explored, achieving high processing speed (over 4000 FPS) and strong performance (F1-score: 0.76, Accuracy: 0.98) on the Kvasir V2 dataset. The study then transitions to deep learning, where an optimized model combined with data bagging techniques improved performance, reaching an accuracy of 0.92 and an F1-score of 0.60 on the HyperKvasir dataset, and an F1-score of 0.88 on Kvasir V2. To support real-time detection, a streamlined neural network integrating texture and local binary patterns was developed. By addressing inter-class similarity and intra-class variation through a learned threshold, the system achieved 41 FPS with high accuracy (0.99) and an F1-score of 0.91 on HyperKvasir. Additionally, two segmentation tools are proposed to enhance usability, leveraging Depth-Wise Separable Convolution and neural network ensembles for improved detection, particularly in low-FPS scenarios. Overall, this research introduces novel and adaptable methodologies, progressing from traditional texture-based techniques to deep learning and ensemble approaches, providing a comprehensive framework for advancing GI image analysis.
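For readers unfamiliar with local binary patterns (LBP), the sketch below computes the basic 3x3 LBP code per pixel and a normalized 256-bin histogram, a classic texture descriptor. It is a generic textbook variant, not the thesis's feature pipeline; the function names and the random stand-in frame are hypothetical.

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbor local binary pattern (LBP) code for every interior pixel."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbor offsets (dy, dx), clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbor is at least as bright as the center.
        codes += (neighbor >= center).astype(np.int64) << bit
    return codes

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes: a compact texture descriptor."""
    codes = lbp_codes(img)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

frame = np.random.default_rng(0).integers(0, 256, size=(64, 64))  # stand-in for a grayscale frame
feature = lbp_histogram(frame)  # 256-dimensional vector that could feed any classifier
```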


[3] 2603.22445

Finite-time Convergent Control Barrier Functions with Feasibility Guarantees

This paper studies the problem of finite-time convergence to a prescribed safe set for nonlinear systems whose initial states violate the safety constraints. Existing Control Lyapunov-Barrier Functions (CLBFs) can enforce recovery to the safe set but may suffer from chattering and do not explicitly consider control bounds. To address these limitations, we propose a new Control Barrier Function (CBF) formulation that guarantees finite-time convergence to the safe set while ensuring feasibility under control constraints. Specifically, we strengthen the initially violated safety constraint by introducing a parameter that enables the exploitation of the asymptotic property of a CBF to converge to the safe set in finite time. Furthermore, the conditions for the existence of such a CBF under control bounds to achieve finite-time convergence are derived via reachability analysis and constraint comparison, providing a systematic approach for parameter design. A case study on 2D obstacle avoidance is presented to demonstrate the effectiveness and advantages of the proposed method.
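The strengthening idea can be illustrated on a toy problem of our own choosing (not the paper's case study): a single integrator $\dot x = u$ with $|u| \le u_{\max}$, safe set $h(x) = x \ge 0$, and an initial state $x_0 < 0$. Requiring $\dot h \ge -\alpha\,(h - \epsilon)$ with a hypothetical margin $\epsilon > 0$ makes the CBF drive $h$ asymptotically toward $\epsilon$, so $h$ crosses $0$ in finite time, at $t^\ast = \ln\big((\epsilon - x_0)/\epsilon\big)/\alpha$. The margin `eps`, the min-norm filter, and the feasibility check $\alpha(\epsilon - x_0) \le u_{\max}$ are assumptions made only for this sketch.

```python
import math

def cbf_filter(x, u_nom, alpha, eps, u_max):
    """Min-norm filter for x' = u enforcing the strengthened CBF condition u >= -alpha * (x - eps)."""
    u_lb = max(-alpha * (x - eps), -u_max)  # tightest lower bound from the CBF and the input limit
    if u_lb > u_max:
        raise ValueError("CBF condition infeasible under the input bound")
    return min(max(u_nom, u_lb), u_max)     # admissible input closest to the nominal one

def time_to_safe_set(x0, alpha, eps, u_max, dt=1e-3, t_max=10.0):
    """Forward-Euler simulation; returns the first time h(x) = x becomes nonnegative."""
    x, t = x0, 0.0
    while t < t_max:
        if x >= 0.0:
            return t
        x += dt * cbf_filter(x, u_nom=0.0, alpha=alpha, eps=eps, u_max=u_max)
        t += dt
    return math.inf

x0, alpha, eps, u_max = -1.0, 1.0, 0.5, 2.0
# In this toy, x rises monotonically, so the largest required input occurs at x0.
feasible = alpha * (eps - x0) <= u_max
t_hit = time_to_safe_set(x0, alpha, eps, u_max)
t_analytic = math.log((eps - x0) / eps) / alpha  # x(t) = eps + (x0 - eps) * exp(-alpha t) reaches 0 here
```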


[4] 2603.22460

Data-Driven Synthesis of Robust Positively Invariant Sets from Noisy Data

This paper develops a method to construct robust positively invariant (RPI) tube sets from finite noisy input-state data of an unknown linear time-invariant (LTI) system, yielding tubes that can be directly embedded in tube-based robust data-driven predictive control. Data-consistency uncertainty sets are constructed under process/measurement noise with polytopic/ellipsoidal bounds. In the measurement-noise case, we provide a deterministic and data-consistent procedure to certify the induced residual bound from data. Based on these sets, a robustly stabilizing state-feedback gain is certified via a common quadratic contraction, which in turn enables constructive polyhedral/ellipsoidal RPI tube computation. Numerical examples quantify the conservatism induced by noisy data and the employed certification step.
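A scalar analogue of the data-consistency idea, under simplifying assumptions the paper does not make: dynamics $x^+ = a x + u + w$ with unknown $a$, known unit input gain, and process noise $|w| \le \bar w$. Each sample confines $a$ to an interval; intersecting them gives the data-consistent set, a gain centering that set yields a contraction factor $\rho$ valid for every consistent $a$, and $[-r, r]$ with $r = \bar w/(1-\rho)$ is then RPI. The matrix, polytopic/ellipsoidal, and measurement-noise cases treated in the paper are not covered.

```python
import numpy as np

def consistent_interval(x, u, x_next, w_bar):
    """Set of scalars a consistent with x_next = a*x + u + w, |w| <= w_bar, for every sample."""
    lo, hi = -np.inf, np.inf
    for xk, uk, xk1 in zip(x, u, x_next):
        if abs(xk) < 1e-9:
            continue  # this sample carries no information about a
        b1, b2 = (xk1 - uk - w_bar) / xk, (xk1 - uk + w_bar) / xk
        lo, hi = max(lo, min(b1, b2)), min(hi, max(b1, b2))
    return lo, hi

rng = np.random.default_rng(1)
a_true, w_bar, n = 1.2, 0.05, 50        # a_true is unknown to the designer; open loop is unstable
x = rng.uniform(-2.0, 2.0, n)
u = rng.uniform(-1.0, 1.0, n)
x_next = a_true * x + u + rng.uniform(-w_bar, w_bar, n)

lo, hi = consistent_interval(x, u, x_next, w_bar)
K = -(lo + hi) / 2.0     # centers the closed-loop pole a + K on zero for every consistent a
rho = (hi - lo) / 2.0    # worst-case |a + K| over the data-consistent interval
r = w_bar / (1.0 - rho)  # [-r, r] is RPI because rho * r + w_bar = r (requires rho < 1)
```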


[5] 2603.22469

Stability-Preserving Online Adaptation of Neural Closed-loop Maps

The growing complexity of modern control tasks calls for controllers that can react online as objectives and disturbances change, while preserving closed-loop stability. Recent approaches for improving the performance of nonlinear systems while preserving closed-loop stability rely on time-invariant recurrent neural-network controllers, but offer no principled way to update the controller during operation. Most importantly, switching from one stabilizing policy to another can itself destabilize the closed loop. We address this problem by introducing a stability-preserving update mechanism for nonlinear, neural-network-based controllers. Each controller is modeled as a causal operator with bounded $\ell_p$-gain, and we derive gain-based conditions under which the controller may be updated online. These conditions yield two practical update schemes, time-scheduled and state-triggered, that guarantee the closed loop remains $\ell_p$-stable after any number of updates. Our analysis further shows that stability is decoupled from controller optimality, allowing approximate or early-stopped controller synthesis. We demonstrate the approach on nonlinear systems with time-varying objectives and disturbances, and show consistent performance improvements over static and naive online baselines while guaranteeing stability.


[6] 2603.22496

Far-field compressive ultrasound beamforming

We present a compressive beamforming method for coherent plane-wave compounding (CPWC) ultrasound imaging based on a far-field decomposition of the received radiofrequency (RF) data into virtual plane waves. This decomposition recasts the imaging operation entirely in the spatial frequency domain ($k$-space), allowing direct and flexible control over $k$-space sampling distributions based on the principle of coarrays. We present vernier-type sampling strategies designed to optimize the tradeoff between image contrast and resolution with minimum redundancy, including strategies that favor dense low-frequency sampling for high contrast, shifted schemes that extend the frequency support for improved resolution, and confocal or hybrid compounding schemes that approximate the spatial-frequency transfer function of conventional DAS beamforming. Our method, called KK beamforming, is validated with a calibration phantom and in-vivo human tissue data, demonstrating compression by roughly an order of magnitude while maintaining image quality comparable to conventional DAS. We further demonstrate that KK beamforming yields improvements in computational speed owing to its reduced memory footprint and more efficient cache utilization of the compressed data and associated look-up tables.
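The coarray principle can be seen in a toy integer-index example: with a coarse aperture of pitch $N$ and a fine aperture of pitch 1, every pairwise sum lands on a distinct lag, so $M + N$ elements cover $MN$ contiguous lags with no redundancy. This is a generic vernier illustration; how KK beamforming maps plane-wave angles and receive elements to $k$-space samples is not reproduced here.

```python
from collections import Counter

def sum_coarray(tx, rx):
    """Sum coarray: how many transmit/receive index pairs land on each combined lag tx + rx."""
    return Counter(t + r for t in tx for r in rx)

M, N = 4, 3
tx = [m * N for m in range(M)]  # coarse aperture with pitch N (M elements)
rx = list(range(N))             # fine aperture with pitch 1 (N elements)
co = sum_coarray(tx, rx)
support = sorted(co)            # lags covered by the coarray
redundancy = max(co.values())   # 1 means no lag is sampled twice
```

Here 7 physical positions cover lags 0 through 11 exactly once each, which is the "minimum redundancy" property the vernier designs trade against contrast and resolution.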


[7] 2603.22506

Performance Evaluation of Movable Antenna Arrays in Wideband Multi-User MIMO Systems

Future wireless networks are expected to support increasingly high data rates and user densities, motivating advanced multi-antenna architectures capable of adapting to dynamic propagation environments. Movable antenna (MA) arrays have recently emerged as an extension of massive MIMO, enabling physical repositioning of antenna elements to better exploit spatial diversity and mitigate inter-user interference. While prior studies report promising gains under idealized assumptions, their performance under realistic wideband multi-user operation remains insufficiently understood. This paper presents a comprehensive evaluation of MA-enabled systems in practical uplink and downlink scenarios. A wideband OFDM system model is developed, and novel closed-form sum-rate expressions are derived for both uplink and downlink under linear and nonlinear processing. Hardware impairments are incorporated via an EVM-based model, from which a distortion-aware UL/DL duality is established and the resulting high-SNR sum rate ceiling is analytically characterized. In addition, the interactions between antenna position optimization, receiver processing, and user loading are examined, and performance is evaluated under both time-division duplexing (TDD) and frequency-division duplexing (FDD). The results show that movable antennas can provide noticeable gains in low-impairment regimes with strong multi-user interference, but these benefits are highly scenario-dependent and diminish under hardware-impairment-limited conditions or in rich-scattering environments. These findings highlight the importance of carefully assessing deployment conditions when considering antenna mobility as an alternative to conventional fixed array configurations.
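To see why hardware impairments impose a high-SNR ceiling, consider a single-stream simplification (an assumption, far simpler than the paper's multi-user wideband model) in which EVM-based distortion has power $\mathrm{EVM}^2$ times the signal power. Then $\mathrm{SINR} = \mathrm{SNR}/(1 + \mathrm{EVM}^2\,\mathrm{SNR})$, and the rate saturates at $\log_2(1 + 1/\mathrm{EVM}^2)$ however large the SNR becomes.

```python
import math

def rate_with_evm(snr_db, evm):
    """Per-stream rate when distortion power is evm^2 times the received signal power."""
    snr = 10.0 ** (snr_db / 10.0)
    sinr = snr / (1.0 + evm ** 2 * snr)  # thermal noise plus signal-proportional distortion
    return math.log2(1.0 + sinr)

evm = 0.1                                  # 10% EVM, an illustrative value
ceiling = math.log2(1.0 + 1.0 / evm ** 2)  # limit of the rate as SNR grows without bound
rates = [rate_with_evm(s, evm) for s in (0, 10, 20, 30, 40, 100)]
```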


[8] 2603.22536

MSP-Conversation: A Corpus for Naturalistic, Time-Continuous Emotion Recognition

Affective computing aims to understand and model human emotions for computational systems. Within this field, speech emotion recognition (SER) focuses on predicting emotions conveyed through speech. While early SER systems relied on limited datasets and traditional machine learning models, recent deep learning approaches demand large-scale, naturalistic emotional corpora. To address this need, we introduce the MSP-Conversation corpus: a dataset of more than 70 hours of conversational audio with time-continuous emotional annotations and detailed speaker diarizations. The time-continuous annotations capture the dynamic and context-dependent nature of emotional expression. The annotations in the corpus include fine-grained temporal traces of valence, arousal, and dominance. The audio data is sourced from publicly available podcasts and overlaps with a subset of the isolated speaking turns in the MSP-Podcast corpus to facilitate direct comparisons between annotation methods (i.e., in-context versus out-of-context annotations). The paper outlines the development of the corpus, annotation methodology, analyses of the annotations, and baseline SER experiments, establishing the MSP-Conversation corpus as a valuable resource for advancing research in dynamic SER in naturalistic settings.


[9] 2603.22548

L2O-CCG: Adversarial Learning with Set Generalization for Adaptive Robust Optimization

The adversarial subproblem in two-stage adaptive robust optimization (ARO), which identifies the worst-case uncertainty realization, is a major computational bottleneck. This difficulty is exacerbated when the recourse value function is non-concave and the uncertainty set shifts across applications. Existing approaches typically exploit specific structural assumptions on the value function or the uncertainty set geometry to reformulate this subproblem, but degrade when these assumptions are violated or the geometry changes at deployment. To address this challenge, we propose L2O-CCG, a bi-level framework that enables the integration of structure-aware adversarial solvers within the constraint-and-column generation (CCG) algorithm. As one instantiation, we develop a generalizable adversarial learning method, which replaces solver-based adversarial search with a learned proximal gradient optimizer that can generalize across uncertainty set geometries without retraining. Here, an inner-level neural network approximates the recourse value function from offline data, while an outer-level pre-trained mapping generates iteration-dependent step sizes for a proximal gradient scheme. We also establish out-of-distribution convergence bounds under uncertainty set parameter shifts, showing how the trajectory deviation of the learned optimizer is bounded by the uncertainty set shift. We illustrate the performance of the L2O-CCG method on a building HVAC management task.
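A sketch of the inner adversarial search as projected (proximal) gradient ascent over a box uncertainty set. The concave quadratic surrogate and the hand-written `step_size` schedule are hypothetical stand-ins for the paper's trained value-function network and pre-trained step-size mapping; because this surrogate is concave, the iterates reach the projection of its maximizer onto the box.

```python
import numpy as np

def prox_box(xi, lo, hi):
    """Proximal operator of a box indicator, i.e. Euclidean projection onto [lo, hi]."""
    return np.clip(xi, lo, hi)

def adversarial_search(grad_value, xi0, lo, hi, step_size, iters=50):
    """Projected (proximal) gradient ascent on a surrogate recourse value over the set."""
    xi = prox_box(xi0, lo, hi)
    for k in range(iters):
        xi = prox_box(xi + step_size(k) * grad_value(xi), lo, hi)  # ascent step, then prox
    return xi

# Hypothetical stand-ins: a concave quadratic surrogate instead of a trained network,
# and a hand-written schedule instead of a pre-trained step-size mapping.
center = np.array([2.0, -0.5, 0.3])
grad_value = lambda xi: -2.0 * (xi - center)   # gradient of -||xi - center||^2
step_size = lambda k: 0.25 + 0.2 / (1.0 + k)    # iteration-dependent step sizes
lo, hi = -np.ones(3), np.ones(3)
xi_star = adversarial_search(grad_value, np.zeros(3), lo, hi, step_size)
```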


[10] 2603.22554

A Model Predictive Control Approach to Dual-Axis Agrivoltaic Panel Tracking

Agrivoltaic systems, in which photovoltaic (PV) panels are installed above agricultural land, have emerged as a promising dual-use solution to competing land demands for food and energy production. In this paper, we propose a model predictive control (MPC) approach to dual-axis agrivoltaic panel tracking that dynamically adjusts panel positions in real time to maximize power production and crop yield given solar irradiance and ambient temperature measurements. We apply convex relaxations and shading factor approximations to reformulate the MPC optimization problem as a convex second-order cone program that determines the PV panel position adjustments away from the sun-tracking trajectory. Through case studies, we demonstrate our approach, exploring the Pareto front between i) an approach that maximizes power production without considering crop needs and ii) crop yield with no agrivoltaics. We also conduct a case study exploring the impact of forecast error on MPC performance. We find that dynamically adjusting agrivoltaic panel position helps actively manage the trade-offs between power production and crop yield, and that active panel control enables the agrivoltaic system to achieve land equivalent ratio values of up to 1.897.
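The land equivalent ratio (LER) is commonly defined as relative crop yield plus relative energy yield on the same area, so values above 1 mean the combined system outperforms splitting the land between the two uses. The abstract does not state the paper's exact normalization, and the numbers below are illustrative only.

```python
def land_equivalent_ratio(crop_agrivoltaic, crop_open_field, energy_agrivoltaic, energy_pv_only):
    """LER: crop yield relative to open field plus energy yield relative to a PV-only plant."""
    return crop_agrivoltaic / crop_open_field + energy_agrivoltaic / energy_pv_only

# Illustrative numbers only: 80% of open-field crop yield and 90% of PV-only energy.
ler = land_equivalent_ratio(8.0, 10.0, 900.0, 1000.0)
```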


[11] 2603.22627

Single-Subject Multi-View MRI Super-Resolution via Implicit Neural Representations

Clinical MRI frequently acquires anisotropic volumes with high in-plane resolution and low through-plane resolution to reduce acquisition time. Multiple orientations are therefore acquired to provide complementary anatomical information. Conventional integration of these views relies on registration followed by interpolation, which can degrade fine structural details. Recent deep learning-based super-resolution (SR) approaches have demonstrated strong performance in enhancing single-view images. However, their clinical reliability is often limited by the need for large-scale training datasets, resulting in increased dependence on cohort-level priors. Self-supervised strategies offer an alternative by learning directly from the target scans. Prior work either neglects the available multi-view information or uses in-plane information to supervise through-plane reconstruction, assuming the images are pre-aligned. However, this assumption is rarely satisfied in clinical settings. In this work, we introduce Single-Subject Implicit Multi-View Super-Resolution for MRI (SIMS-MRI), a framework that operates solely on anisotropic multi-view scans from a single patient without requiring pre- or post-processing. Our method combines a multi-resolution hash-encoded implicit representation with learned inter-view alignment to generate a spatially consistent isotropic reconstruction. We validate the SIMS-MRI pipeline on both simulated brain and clinical prostate MRI datasets. Code will be made publicly available for reproducibility: this https URL
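A compact sketch of a multi-resolution hash encoding, the input representation the abstract names. The per-axis primes follow the Instant-NGP spatial hash; the level count, table size, resolutions, and random tables are hypothetical, and the MLP decoder and learned inter-view alignment of SIMS-MRI are not shown. Interpolation weights at each level sum to one, so constant tables reproduce the constant exactly.

```python
import numpy as np

PRIMES = (1, 2654435761, 805459861)  # per-axis primes of the Instant-NGP spatial hash

def spatial_hash(corner, table_size):
    """XOR of coordinate-prime products, reduced modulo the table size."""
    h = 0
    for coord, prime in zip(corner, PRIMES):
        h ^= int(coord) * prime
    return h % table_size

def hash_encode(point, tables, n_min=4, growth=1.5):
    """Concatenate trilinearly interpolated hashed features from every resolution level."""
    point = np.asarray(point, dtype=float)
    features = []
    for level, table in enumerate(tables):
        res = int(np.floor(n_min * growth ** level))  # grid resolution of this level
        pos = point * res
        base = np.floor(pos).astype(int)
        frac = pos - base
        acc = np.zeros(table.shape[1])
        for offset in np.ndindex(2, 2, 2):              # the 8 corners of the enclosing voxel
            offset = np.array(offset)
            weight = np.prod(np.where(offset == 1, frac, 1.0 - frac))
            acc += weight * table[spatial_hash(base + offset, table.shape[0])]
        features.append(acc)
    return np.concatenate(features)

rng = np.random.default_rng(0)
tables = [rng.normal(scale=1e-2, size=(2 ** 12, 2)) for _ in range(6)]  # 6 levels, 2 features each
encoding = hash_encode([0.31, 0.72, 0.15], tables)  # would be fed to a small MLP predicting intensity
```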


[12] 2603.22653

Explicit Model Predictive Control with Quantum Encryption

This paper studies quantum-encrypted explicit MPC for constrained discrete-time linear systems in a cloud-based architecture. A finite-horizon quadratic MPC problem is solved offline to obtain a piecewise-affine controller. Shared quantum keys generated from Bell pairs and protected by quantum key distribution are used to encrypt the online control evaluation between the sensor and actuator. Based on this architecture, we develop a lightweight encrypted explicit MPC protocol, prove exact recovery of the plaintext control action, and characterize its computational efficiency. Numerical results demonstrate lower online complexity than classical encrypted MPC, while security is discussed in terms of confidentiality of plant data and control inputs.


[13] 2603.22654

Universal Formula Families for Safe Stabilization of Single-Input Nonlinear Systems

We develop an optimization-free framework for safe stabilization of single-input control-affine nonlinear systems with a given control Lyapunov function (CLF) and a given control barrier function (CBF), where the desired equilibrium lies in the interior of the safe set. An explicit compatibility condition is derived that is necessary and sufficient for the pointwise simultaneous satisfaction of the CLF and CBF inequalities. When this condition holds, two closed-form continuous state-feedback laws are constructed from the Lie-derivative data of the CLF and CBF via standard universal stabilizer formulas, yielding asymptotic stabilization of the origin and forward invariance of the interior of the safe set, without online quadratic programming. The two laws belong to broader families parametrized by a free nondecreasing function, providing additional design flexibility. When the compatibility condition fails, a safety-prioritizing modification preserves forward invariance and drives the state toward the safe-set boundary until a compatible region is reached, whereupon continuity at the origin and asymptotic stabilization are recovered. The framework produces families of explicit constructive alternatives to CLF-CBF quadratic programming for scalar-input nonlinear systems.
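The standard universal stabilizer formulas mentioned above include Sontag's formula. The sketch below is that classical CLF-only formula for a single input; it does not implement the paper's CLF-CBF compatibility condition or its safety-prioritizing modification. It guarantees $L_fV + L_gV\,u = -\sqrt{(L_fV)^2 + (L_gV)^4} < 0$ whenever $(L_fV, L_gV) \neq 0$.

```python
import math

def sontag_control(a, b):
    """Sontag's universal formula for a single input; a = L_f V(x), b = L_g V(x)."""
    if b == 0.0:
        return 0.0
    return -(a + math.sqrt(a * a + b ** 4)) / b

# Example: x' = x + u with V(x) = x^2 / 2, so L_f V = x^2 and L_g V = x.
x = 2.0
u = sontag_control(x * x, x)
x_dot = x + u  # closed loop is x' = -sqrt(2) * x for this system
```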


[14] 2603.22773

Distributed Hybrid Feedback for Global Pose Synchronization of Multiple Rigid Body Systems on $SE(3)$

This paper investigates the problem of pose synchronization for multiple rigid body systems evolving on the matrix Lie group $SE(3)$. We propose a distributed hybrid feedback control scheme with global asymptotic stability guarantees using relative pose and group velocity measurements. The key idea consists of constructing a new potential function on $SE(3) \times \mathbb{R}$ with a generalized non-diagonal weighting matrix, and a set of auxiliary scalar variables with continuous-discrete hybrid dynamics. Based on the new potential function and the auxiliary scalar variables, a geometric distributed hybrid feedback designed directly on $SE(3)$ is proposed to achieve global pose synchronization. Numerical simulation results are presented to illustrate the performance of the proposed distributed hybrid control scheme.


[15] 2603.22776

Viewport-based Neural 360° Image Compression

Given the popularity of 360° images on social media platforms, 360° image compression has become a critical technology for media storage and transmission. The conventional 360° image compression pipeline projects the spherical image onto a single 2D plane, leading to issues of oversampling and distortion. In this paper, we propose a novel viewport-based neural compression pipeline for 360° images. By replacing the image projection in conventional 360° image compression pipelines with viewport extraction and efficiently compressing multiple viewports, the proposed pipeline minimizes the inherent oversampling and distortion issues. However, viewport extraction impedes information sharing between multiple viewports during compression, causing the loss of global information about the spherical image. To tackle this global information loss, we design a neural viewport codec to capture global prior information across multiple viewports and maximally compress the viewport data. The viewport codec is empowered by a transformer-based ViewPort ConText (VPCT) module that can be integrated with canonical learning-based 2D image compression structures. We compare the proposed pipeline with existing 360° image compression models and conventional 360° image compression pipelines building on learning-based 2D image codecs and standard hand-crafted codecs. Results show that our pipeline saves an average of $14.01\%$ bit consumption compared to the best-performing 360° image compression methods without compromising quality. The proposed VPCT-based codec also outperforms existing 2D image codecs in the viewport-based neural compression pipeline. Our code can be found at: this https URL.


[16] 2603.22842

L-UNet: An LSTM Network for Remote Sensing Image Change Detection

Change detection of high-resolution remote sensing images is an important task in earth observation and has been extensively investigated. Recently, deep learning has proven very successful in many remote sensing tasks. Current deep learning-based change detection methods are mainly based on conventional long short-term memory (LSTM), which lacks spatial characteristics. Since change detection is a process with both spatiality and temporality, an end-to-end spatiotemporal network is needed. To achieve this, convolutional LSTM (Conv-LSTM), an extension of the LSTM structure, is introduced. Since Conv-LSTM shares similar spatial characteristics with convolutional layers, we propose L-UNet, which replaces some convolutional layers of UNet with Conv-LSTM, and Atrous L-UNet (AL-UNet), which further uses an atrous structure to capture multiscale spatial information. Experiments on two datasets show that the proposed methods have both quantitative and qualitative advantages over several other methods.
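A minimal NumPy ConvLSTM cell, in the common formulation without peephole connections, showing how the gates are computed by convolution so that the hidden and cell states keep their spatial layout across two acquisition dates. The sizes and random weights are hypothetical, and how L-UNet places such cells inside UNet is not reproduced.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' 2-D cross-correlation. x: (C, H, W); w: (O, C, k, k) with odd k."""
    k = w.shape[-1]
    p = k // 2
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((w.shape[0], H, W))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_lstm_step(x, h, c, w, b):
    """One ConvLSTM step: the four gates come from a convolution over [x; h], not a dense layer."""
    z = conv2d_same(np.concatenate([x, h], axis=0), w) + b[:, None, None]
    i, f, o, g = np.split(z, 4, axis=0)               # input, forget, output gates and candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state keeps its (H, W) layout
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
channels, hidden, size, k = 3, 8, 16, 3
w = rng.normal(scale=0.1, size=(4 * hidden, channels + hidden, k, k))
b = np.zeros(4 * hidden)
h = np.zeros((hidden, size, size))
c = np.zeros((hidden, size, size))
for image in (rng.random((channels, size, size)), rng.random((channels, size, size))):  # dates t1, t2
    h, c = conv_lstm_step(image, h, c, w, b)
```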


[17] 2603.22856

Retrieval-Guided Photovoltaic Inventory Estimation from Satellite Imagery for Distribution Grid Planning

The rapid expansion of distributed rooftop photovoltaic (PV) systems introduces increasing uncertainty in distribution grid planning, hosting capacity assessment, and voltage regulation. Reliable estimation of rooftop PV deployment from satellite imagery is therefore essential for accurate modeling of distributed generation at feeder and service-territory scales. However, conventional computer vision approaches rely on fixed learned representations and globally averaged visual correlations. This makes them sensitive to geographic distribution shifts caused by differences in roof materials, urban morphology, and imaging conditions across regions. To address these challenges, this paper proposes Solar Retrieval-Augmented Generation (Solar-RAG), a context-grounded framework for photovoltaic assessment that integrates similarity-based image retrieval with multimodal vision-language reasoning. Instead of producing predictions solely from internal model parameters, the proposed approach retrieves visually similar rooftop scenes with verified annotations and performs comparative reasoning against these examples during inference. This retrieval-guided mechanism provides geographically contextualized references that improve robustness under heterogeneous urban environments without requiring model retraining. The method outperforms both conventional deep vision models and standalone vision-language models. Furthermore, feeder-level case studies show that improved PV inventory estimation reduces errors in voltage deviation analysis and hosting capacity assessment. The results demonstrate that the proposed method provides a scalable and geographically robust approach for monitoring distributed PV deployment. This enables more reliable integration of remote sensing data into distribution grid planning and distributed energy resource management.
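The retrieval step can be sketched as cosine-similarity top-$k$ search over image embeddings. The embeddings, labels, and `retrieve_top_k` helper are hypothetical; the paper's encoder, index structure, and prompt format for the vision-language model are not described in the abstract.

```python
import numpy as np

def retrieve_top_k(query, database, k=3):
    """Indices and cosine similarities of the k database embeddings most similar to the query."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 32))               # hypothetical embeddings of annotated rooftop tiles
labels = rng.integers(0, 3, size=100)                 # hypothetical verified annotations
query = embeddings[42] + 0.05 * rng.normal(size=32)   # a new tile closely resembling tile 42
idx, sims = retrieve_top_k(query, embeddings, k=3)
examples = [(int(i), int(labels[i]), float(s)) for i, s in zip(idx, sims)]  # context for the VLM
```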


[18] 2603.22857

Secure Two-Party Matrix Multiplication from Lattices and Its Application to Encrypted Control

In this study, we propose a two-party computation protocol for approximate matrix multiplication of fixed-point numbers. The proposed protocol is provably secure under standard lattice-based cryptographic assumptions and enables matrix multiplication at a desired approximation level within a single round of communication. We demonstrate the feasibility of the protocol by applying it to the secure implementation of a linear control law. Our evaluation reveals that the client achieves lower online computational complexity compared to the original controller computation, while ensuring the privacy of controller inputs, outputs, and parameters. Furthermore, a numerical example confirms that the proposed method maintains sufficient precision of control inputs even in the presence of approximation and quantization errors.


[19] 2603.22881

Cooperative Bandit Learning in Directed Networks with Arm-Access Constraints

Sequential decision-making under uncertainty often involves multiple agents learning which actions (arms) yield the highest rewards through repeated interaction with a stochastic environment. This setting is commonly modeled by cooperative multi-agent multi-armed bandit problems, where agents explore and share information without centralized coordination. In many realistic systems, agents have heterogeneous capabilities that limit their access to subsets of arms and communicate over asymmetric networks represented by directed graphs. In this work, we study multi-agent multi-armed bandit problems with partial arm access, where agents explore and exploit only the arms available to them while exchanging information with neighbors. We propose a distributed consensus-based upper confidence bound (UCB) algorithm that accounts for both the arm accessibility structure and network asymmetry. Our approach employs a mass-preserving information mixing mechanism, ensuring that reward estimates remain unbiased across the network despite accessibility constraints and asymmetric information flow. Under standard stochastic assumptions, we establish logarithmic regret for every agent, with explicit dependence on network mixing properties and arm accessibility constraints. These results quantify how heterogeneous arm access and directed communication shape cooperative learning performance.
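One standard mass-preserving mixing scheme for directed graphs is push-sum with a column-stochastic matrix; whether the paper uses exactly this update is an assumption. The sketch pairs it with a UCB1-style index restricted to an agent's accessible arms; the graph, values, and helper names are hypothetical.

```python
import numpy as np

def column_stochastic(out_neighbors, n):
    """Each node keeps one share of its mass and pushes equal shares to its out-neighbors."""
    A = np.zeros((n, n))
    for j in range(n):
        targets = [j] + out_neighbors[j]
        for i in targets:
            A[i, j] = 1.0 / len(targets)
    return A

def push_sum(local_values, A, iters=200):
    """Push-sum averaging: columns of A sum to 1, so the totals of s and w never change."""
    s = np.asarray(local_values, dtype=float).copy()
    w = np.ones_like(s)
    for _ in range(iters):
        s, w = A @ s, A @ w
    return s / w  # every ratio converges to the network-wide average

def ucb_arm(means, counts, t, accessible):
    """UCB1-style choice restricted to the arms this agent can actually pull."""
    for arm in accessible:
        if counts[arm] == 0:
            return arm  # try every accessible arm once
    return max(accessible, key=lambda arm: means[arm] + np.sqrt(2.0 * np.log(t) / counts[arm]))

out_neighbors = {0: [1], 1: [2], 2: [3], 3: [0, 2]}  # directed and strongly connected
A = column_stochastic(out_neighbors, 4)
consensus = push_sum([1.0, 2.0, 3.0, 6.0], A)         # e.g. local reward estimates of one arm
arm = ucb_arm([0.5, 0.9, 0.1], [10, 10, 0], t=30, accessible=[0, 1])
```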


[20] 2603.22929

Experimental Characterisation of Distributed Reactive Power Sharing under Communication-Induced Stress in Parallel Grid-Forming Inverters

Synchronisation of parallel grid-forming inverters is crucial for stable operation of future power systems. This includes accurate and robust reactive power sharing under realistic operating conditions such as impedance mismatch and communication constraints. In this work, reactive power sharing by virtue of a distributed control law is investigated under line impedance mismatch. Furthermore, robustness and transient behaviour of the proposed approach are experimentally evaluated under communication-induced stressors including a fixed 3% packet loss and communication delays ranging from 50 ms to 100 ms, artificially introduced through a software-defined overlay. The study is conducted in a low-voltage laboratory-scale microgrid comprising two parallel grid-forming inverters, an AC load, and a grid-following battery system acting as a reactive power injector. The results show reactive power sharing convergence up to 90 ms communication delay, with a stability boundary between 90 ms and 100 ms, which decreases with increasing integral gain.


[21] 2603.22958

Toward Integrated Sensing, Communications, and Edge Intelligence Networks

Wireless systems are expanding their purpose, from merely connecting humans and things to connecting intelligence and opportunistically sensing the environment through radio-frequency signals. In this paper, we introduce the concept of triple-functional networks in which the same infrastructure and resources are shared for integrated sensing, communications, and (edge) Artificial Intelligence (AI) inference. This concept opens up several opportunities, such as devising non-orthogonal resource deployment and power consumption to concurrently update multiple services, but also challenges related to resource management and signaling cross-talk, among others. The core idea of this work is that computation-related aspects, including computing resources and AI model availability, should be explicitly considered when making resource allocation decisions, in order to address the conflicting goals of the coexisting services. After showing the natural coupling between the theoretical performance bounds of the three services, we formulate a service coexistence optimization problem that is solved optimally, and showcase its advantages over a disjoint allocation strategy.


[22] 2603.22983

Markov-Enforced Discrete Diffusion Model for Digital Semantic Symbol Error Correction

Diffusion models (DMs) have achieved remarkable success across various domains owing to their strong generative and denoising capabilities. Meanwhile, semantic communication based on neural joint source-channel coding (JSCC) has emerged as a promising paradigm for robust and efficient image transmission. However, severe channel noise can still distort the transmitted semantic symbols, resulting in significant performance degradation. Applying DMs to digital semantic symbols, particularly in vector quantization (VQ)-based systems, is fundamentally challenging because the Markov assumption does not hold for the symbol transition dynamics. To address this issue, we introduce SSCDM, a semantic symbol correcting diffusion model whose discrete-time transition dynamics are constructed using solutions from continuous-time Markov chain theory. Furthermore, to promote synergy between DMs and JSCC, our DM structure embeds discrete symbols into a latent feature space using a learned VQ codebook, and a self-organizing map-based loss is incorporated during codebook learning to enhance the geometric vicinity between neighboring digital symbols, thereby promoting topology-preserving semantic representations. Experimental results show that the proposed method significantly improves image reconstruction quality and outperforms previous symbol-level denoising techniques under low signal-to-noise ratio scenarios and different datasets.
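Deriving discrete transitions from a continuous-time Markov chain guarantees the Chapman-Kolmogorov property $P(s)P(t) = P(s+t)$. As an assumption (the abstract does not give the rate matrix), take the uniform-rate generator $Q = \lambda(\tfrac{1}{K}\mathbf{1}\mathbf{1}^\top - I)$ over $K$ codebook symbols, whose matrix exponential has the closed form used below.

```python
import numpy as np

def uniform_ctmc_transition(num_states, rate, t):
    """P(t) = expm(Q t) for Q = rate * (ones/K - I): keep the symbol w.p. exp(-rate t), else resample uniformly."""
    decay = np.exp(-rate * t)
    uniform = np.full((num_states, num_states), 1.0 / num_states)
    return decay * np.eye(num_states) + (1.0 - decay) * uniform

K, rate = 16, 1.0                 # hypothetical codebook size and corruption rate
P_a = uniform_ctmc_transition(K, rate, 0.3)
P_b = uniform_ctmc_transition(K, rate, 0.5)
P_ab = uniform_ctmc_transition(K, rate, 0.8)

rng = np.random.default_rng(0)
x0 = 5                            # clean symbol index
x_t = rng.choice(K, p=P_ab[x0])   # one forward-noising sample at time t = 0.8
```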


[23] 2603.22992

Design Guidelines for Nonlinear Kalman Filters via Covariance Compensation

Nonlinear extensions of the Kalman filter (KF), such as the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), are indispensable for state estimation in complex dynamical systems, yet the conditions for a nonlinear KF to provide robust and accurate estimates remain poorly understood. This work proposes a theoretical framework that identifies the causes of failure and success in certain nonlinear KFs and establishes guidelines for their improvement. Central to our framework is the concept of covariance compensation: the deviation between the covariance predicted by a nonlinear KF and that of the EKF. With this definition and detailed theoretical analysis, we derive three design guidelines for nonlinear KFs: (i) invariance under orthogonal transformations, (ii) sufficient covariance compensation beyond the EKF baseline, and (iii) selection of compensation magnitude that favors underconfidence. Both theoretical analysis and empirical validation confirm that adherence to these principles significantly improves estimation accuracy, whereas fixed parameter choices commonly adopted in the literature are often suboptimal. The codes and the proofs for all the theorems in this paper are available at this https URL.
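A one-step sketch of covariance compensation as the abstract defines it: the covariance predicted by a sigma-point (unscented) transform minus the EKF's linearized $J P J^\top$. The unscented parameters ($\alpha = 1$, $\beta = 0$, $\kappa = 3 - n$) and the test functions are hypothetical choices. For $f(x) = x^2$ at $m = 0$ the EKF sees zero slope, while the true variance is $2\sigma^4$; with $\sigma^2 = 0.25$ that gap is $0.125$.

```python
import numpy as np

def numerical_jacobian(f, m, h=1e-6):
    """Central-difference Jacobian of f at m (shape: output_dim x state_dim)."""
    cols = []
    for i in range(m.size):
        e = np.zeros(m.size)
        e[i] = h
        cols.append((f(m + e) - f(m - e)) / (2.0 * h))
    return np.stack(cols, axis=1)

def unscented_covariance(f, m, P, alpha=1.0, beta=0.0, kappa=None):
    """Covariance of f(x), x ~ N(m, P), estimated with the unscented transform."""
    n = m.size
    kappa = 3.0 - n if kappa is None else kappa
    lam = alpha ** 2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)
    sigma = [m] + [m + L[:, i] for i in range(n)] + [m - L[:, i] for i in range(n)]
    Y = np.array([f(s) for s in sigma])
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1.0 - alpha ** 2 + beta)
    D = Y - wm @ Y                       # deviations from the sigma-point mean
    return (wc[:, None] * D).T @ D

def covariance_compensation(f, m, P):
    """Sigma-point predicted covariance minus the EKF's linearized J P J^T."""
    J = numerical_jacobian(f, m)
    return unscented_covariance(f, m, P) - J @ P @ J.T

square = lambda x: x ** 2                # EKF linearization at m = 0 sees zero slope
comp = covariance_compensation(square, np.array([0.0]), np.array([[0.25]]))
linear = lambda x: np.array([[1.0, 2.0], [0.0, 3.0]]) @ x
comp_linear = covariance_compensation(linear, np.array([0.3, -1.0]), np.eye(2) * 0.1)
```

A positive compensation, as in the quadratic case, means the sigma-point filter is more conservative than the EKF; for a linear map the two agree and the compensation vanishes.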


[24] 2603.23015

JanusBM: A Dual-Fidelity Multi-Zone White-Box Building Modeling Framework

Accurate building energy models are crucial for analyzing sector-coupled energy systems, where buildings interact with electrified heating, energy storage, and advanced control across various scenarios. High-fidelity (HiFi) white-box models that resolve hydronic distribution and emitter dynamics can capture short-term transients, yet their numerical stiffness and computational burden limit long-term simulations and large-scale scenario exploration. Conversely, reduced-order low-fidelity (LoFi) representations enable rapid annual assessments but may fail to capture the hydronic- and control-induced dynamics that govern transient and peak behavior. This paper proposes JanusBM, a dual-fidelity, multi-zone white-box building modeling framework built on the novel topology-driven modeling tool RoomFlex6D, which couples a HiFi hydronic model and a LoFi ideal-load surrogate that removes explicit hydronic states in Modelica. To ensure applicability and physical consistency across time scales, we introduce a two-stage hybrid validation and calibration pipeline that uses complementary data: the IEA EBC Annex 60 benchmark for energy-scale validation and time-series measurements from real-world experimental buildings for hydronic dynamics-scale calibration. Results show that the generated LoFi models achieve a high degree of consistency with the Annex 60 benchmark on the energy scale, and the proposed calibration workflow robustly improves loop-level return water temperature transients and zone-level temperature dynamics. Moreover, the LoFi model achieves orders-of-magnitude faster simulations suited to annual energy analyses, whereas the HiFi model becomes necessary when the required heat differs from the actual delivered heat due to distribution and control limitations, especially in transient and peak-oriented assessments.


[25] 2603.23017

Modelling Emotions is an Elusive Pursuit in Affective Computing

Affective computing, which combines sensor technology, machine learning, and psychology, has been studied for over three decades and is employed in AI-powered technologies to enhance emotional awareness in AI systems and to detect symptoms of mental health disorders such as anxiety and depression. However, the uncertainty in such systems remains high, and the application areas are limited by categorical definitions of emotions and emotional concepts. This paper argues that categorical emotion labels obscure emotional nuance in affective computing, and therefore continuous dimensional definitions are needed to advance the field, increase application usefulness, and lower uncertainties.


[26] 2603.23039

Rao-Blackwellized Stein Gradient Descent for Joint State-Parameter Estimation

We present a filtering framework for online joint state estimation and parameter identification in nonlinear, time-varying systems. The algorithm uses Rao-Blackwellization to infer joint state-parameter posteriors efficiently. In particular, conditional state distributions are computed analytically via Kalman filtering, while model parameters, including process and measurement noise covariances, are approximated using particle-based Stein Variational Gradient Descent (SVGD), enabling stable real-time inference. We prove a theoretical consistency result by bounding the impact of the SVGD-approximated parameter posterior on state estimates, relating the divergence between the true and approximate parameter posteriors to the total variation distance between the resulting state marginals. Performance of the proposed filter is validated on two case studies: a bioreactor with Haldane kinetics and a neural-network-augmented dynamic system. The latter demonstrates the filter's capacity for online neural network training within a dynamical model, showcasing its potential for fully adaptive, data-driven system identification.
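The parameter-posterior step relies on Stein Variational Gradient Descent. The following minimal sketch shows only a generic SVGD update for 1-D particles with an RBF kernel and a median-heuristic bandwidth; the Kalman-filter coupling that Rao-Blackwellizes the states is omitted, and the Gaussian target, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1):
    """One SVGD update for 1-D particles x (shape (n,)) with an RBF kernel."""
    diff = x[:, None] - x[None, :]                    # diff[j, i] = x_j - x_i
    med = np.median(np.abs(diff))
    h = med**2 / np.log(len(x)) if med > 0 else 1.0   # median-heuristic bandwidth
    k = np.exp(-diff**2 / h)                          # k[j, i] = k(x_j, x_i)
    grad_k = -2.0 * diff / h * k                      # gradient of k(x_j, x_i) w.r.t. x_j
    # phi(x_i) = mean_j [k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i)]
    phi = (k * grad_log_p(x)[:, None] + grad_k).mean(axis=0)
    return x + step * phi

def gaussian_grad_log_p(mu, sigma):
    """Score function of N(mu, sigma^2)."""
    return lambda x: -(x - mu) / sigma**2

if __name__ == "__main__":
    particles = np.linspace(-3.0, 3.0, 50)
    score = gaussian_grad_log_p(2.0, 1.0)
    for _ in range(1000):
        particles = svgd_step(particles, score)
    print(particles.mean(), particles.std())
```

The first term drives particles toward high posterior density, while the kernel-gradient term pushes them apart so that the set approximates the whole distribution rather than collapsing onto its mode.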


[27] 2603.23057

Prompt Amplification and Zero-Shot Late Fusion in Audio-Language Models for Speech Emotion Recognition

Audio-Language Models (ALMs) are making strides in understanding speech and non-speech audio. However, domain-specialist Foundation Models (FMs) remain the best for closed-ended speech processing tasks such as Speech Emotion Recognition (SER). Using ALMs for zero-shot SER is a popular choice, but their potential to work with specialists to achieve state-of-the-art (SOTA) performance remains unexplored. We propose ZS-Fuse, a late-fusion method that combines zero-shot emotion estimates from a dual-encoder ALM with specialist FMs. To handle ambiguity in emotions and sensitivity to prompt choice, we 1) use a simple prompt ensemble and 2) propose a novel technique called prompt amplification, which repeats audio and text queries to discover stronger zero-shot capabilities. We demonstrate the efficacy of our technique by evaluating ZS-Fuse with three dual-encoder ALMs and two FMs, and report improvements over SOTA baselines, such as WavLM-Large, on three speech emotion recognition datasets.


[28] 2603.23093

Extended-Target Classification and Localization for Near-Field ISAC

Near-field integrated sensing and communication (ISAC) enables object-level sensing from distance-dependent array responses, yet most existing near-field methods still rely on point-target models, and realistic extended targets remain largely unexplored. In this paper, joint target classification and range-azimuth localization are studied from channel responses of realistic extended targets. A dual-branch inference framework is proposed. Semantic and geometric branches are used for classification and localization, respectively. Cross-task attention is introduced after task-specific encoding so that complementary cues can be exchanged without forcing full feature sharing from the input stage. To improve localization on the same backbone, uncertainty-aware regression and a physics-guided structured objective are adopted, including planar consistency, peak-response regularization, and geometry-coupling constraints. Training and evaluation data are generated from full-wave electromagnetic scattering simulations of voxelized vehicle targets with randomized heading angles, material contrasts, and placements. The compared variants show that cross-task attention mainly benefits classification, while uncertainty-aware and structured supervision are needed to recover strong localization performance on the same backbone. Under the adopted shared-OFDM benchmark, the proposed framework reaches the best joint operating point with fewer sensing tones for the same target performance region.


[29] 2603.23096

Rigid Motion Estimation using Accelerated Iterative Coordinate Descent (REACT) for MR Imaging

Purpose: To develop a computationally viable autofocus method for estimating 3D rigid motion in MR imaging. Theory and Methods: The proposed method, REACT, assumes a piecewise-constant motion trajectory and estimates the rigid motion parameters of individual temporal segments by optimizing an image-quality metric. Coordinate descent is adopted to decompose the high-dimensional optimization problem into a series of subproblems, each updating the motion parameters of a single temporal segment. The cost function of each subproblem is assumed to be approximately locally convex under suitable acquisition conditions. Each subproblem is then solved using a derivative-free solver, thereby avoiding an exhaustive grid search. Numerical simulations were conducted to investigate the local convexity assumption. REACT was evaluated for respiratory motion correction on in vivo free-breathing coronary MR angiography datasets acquired using a 3D cones trajectory with image-based navigators (iNAVs). An autofocus nonrigid motion correction method was also evaluated for comparison. Coronary artery sharpness was quantified using unbounded image edge profile acutance (u-IEPA). Results: In numerical simulations, the objective surfaces of the subproblems were approximately locally convex when the current motion estimate was close to the desired solution. In the in vivo study, REACT yielded higher u-IEPA than the conventional iNAV-based translational motion-estimation method for both the left anterior descending artery (LAD) and right coronary artery. REACT also yielded higher u-IEPA for the LAD than the autofocus nonrigid motion correction method. Conclusion: This study demonstrates the feasibility of coordinate descent for autofocus motion correction in MR imaging.
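The coordinate-descent structure can be sketched as follows, assuming each row of `theta` holds the rigid-motion parameters of one temporal segment. A toy coupled quadratic stands in for the image-quality metric, and a simple compass (pattern) search stands in for the derivative-free solver; neither is the paper's choice.

```python
import numpy as np

def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=1000):
    """Derivative-free compass search: try +/-step along each axis, halve step on failure."""
    x = x0.astype(float).copy()
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for d in range(x.size):
            for sign in (1.0, -1.0):
                cand = x.copy()
                cand[d] += sign * step
                fc = f(cand)
                if fc < fx:                 # accept the first improving move on this axis
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return x

def coordinate_descent(cost, theta0, sweeps=10):
    """Cyclic block coordinate descent: subproblem s updates only row s of theta."""
    theta = theta0.astype(float).copy()
    for _ in range(sweeps):
        for s in range(theta.shape[0]):
            def sub_cost(p):
                trial = theta.copy()        # all other segments held fixed
                trial[s] = p
                return cost(trial)
            theta[s] = compass_search(sub_cost, theta[s])
    return theta

if __name__ == "__main__":
    # Hypothetical 2-segment, 3-parameter problem with cross-segment coupling.
    target = np.array([[1.3, -0.7, 0.2], [0.4, 0.9, -1.1]])
    cost = lambda th: np.sum((th - target) ** 2) + 0.5 * np.sum(th - target) ** 2
    print(coordinate_descent(cost, np.zeros_like(target), sweeps=30))
```

Because each subproblem touches only one segment's parameters, its cost can be evaluated with everything else held fixed, which is what makes a derivative-free inner solver practical when gradients of the image metric are unavailable.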


[30] 2603.23103

Power System Studies Using Open-Access Software

The use of open-access software is an option worth considering for those interested in power system studies. In addition, combining two or more of these tools can expand the capabilities and fields of application of each tool. This paper proposes the implementation of a flexible and powerful simulation environment based on R/RStudio for carrying out power system studies. Several simple case studies are presented to show how combining either EMTP/ATP or OpenDSS with R/RStudio can expand the capabilities of each of these tools for performing either steady-state or transient power system studies. The proposed environment uses RStudio as a control center from which each simulation tool (e.g., R, ATP, OpenDSS) can be run. Procedures have been implemented for generating the information that must be exchanged between RStudio and ATP or between RStudio and OpenDSS. These exchanges are bidirectional: ATP and OpenDSS produce simulation results that can be read by RStudio (text files in the case of ATP; comma-separated value (CSV) and text files in the case of OpenDSS), while RStudio capabilities are used to generate files that are embedded into the input file to be read by either ATP or OpenDSS. This latter option can be used to change either the configuration or some parameters of the test system under study. Finally, this paper illustrates the possibility of using machine learning algorithms to predict the performance of the test system.


[31] 2603.23131

Optimal Control of Switched Systems Governed by Logical Switching Dynamics

This paper investigates the optimal co-design of logical and continuous controls for switched linear systems governed by controlled logical switching dynamics. Unlike traditional switched systems with arbitrary or state-dependent switching, the switching signals here are generated by an internal logical dynamical system and explicitly integrated into the control synthesis. By leveraging the semi-tensor product (STP) of matrices, we embed the coupled logical and continuous dynamics into a unified algebraic state-space representation, transforming the co-design problem into a tractable linear-quadratic framework. We derive Riccati-type backward recursions for both deterministic and stochastic logical dynamics, which yield optimal state-feedback laws for continuous control alongside value-function-based, state-dependent decision rules for logical switching. To mitigate the combinatorial explosion inherent in logical decision-making, a hierarchical algorithm is developed to decouple offline precomputation from efficient online execution. Numerical simulations demonstrate the efficacy of the proposed framework.
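The semi-tensor product is a standard matrix operation; the sketch below implements its usual left-STP definition and shows how it merges a pair of Boolean states into one canonical vector, the kind of algebraic embedding described above. The Boolean AND structure matrix is an illustrative example, not taken from the paper.

```python
import numpy as np
from math import lcm

def stp(A, B):
    """Left semi-tensor product A ⋉ B = (A ⊗ I_{t/n}) (B ⊗ I_{t/p}), t = lcm(n, p),
    for A of shape (m, n) and B of shape (p, q). Equals A @ B when n == p."""
    n, p = A.shape[1], B.shape[0]
    t = lcm(n, p)
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

def delta(k, i):
    """Logical vector δ_k^i: the i-th column (1-indexed) of the k×k identity."""
    v = np.zeros((k, 1))
    v[i - 1, 0] = 1.0
    return v

TRUE, FALSE = delta(2, 1), delta(2, 2)

# Structure matrix of Boolean AND: columns give the result for (T,T), (T,F), (F,T), (F,F).
M_AND = np.hstack([TRUE, FALSE, FALSE, FALSE])

if __name__ == "__main__":
    print(stp(TRUE, FALSE).ravel())       # canonical vector δ_4^2 for the pair (T, F)
    print(stp(M_AND, stp(TRUE, FALSE)).ravel())
```

With True encoded as δ_2^1 and False as δ_2^2, x ⋉ y is a single canonical vector in the 4-dimensional space, so any logical function of x and y becomes a 2×4 matrix applied to it; this is what lets logical and continuous states share one linear algebraic representation.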


[32] 2603.23147

Stable Inversion of Discrete-Time Linear Periodically Time-Varying Systems via Cyclic Reformulation

Stable inverse systems for periodically time-varying plants are essential for feedforward control and iterative learning control of multirate and periodic systems, yet existing approaches either require complex-valued Floquet factors and noncausal processing or operate on a block time scale via lifting. This paper proposes a systematic method for constructing stable inverse systems for discrete-time linear periodically time-varying (LPTV) systems that avoids these limitations. The proposed approach proceeds in three steps: (i) cyclic reformulation transforms the LPTV system into an equivalent LTI representation; (ii) the inverse of the resulting LTI system is constructed using standard LTI inversion theory; and (iii) the periodically time-varying inverse matrices are recovered from the block structure of the cycled inverse through parameter extraction. For the fundamental case of relative degree zero, where the output depends directly on the current input, the inverse system is obtained as an explicit closed-form time-varying matrix expression. For systems with periodic relative degree r >= 1, the r-step-delayed inverse is similarly obtained in explicit closed form via the periodic Markov parameters. The stability of the resulting inverse system is characterized by the transmission zeros of the cycled plant, generalizing the minimum phase condition from the LTI case. Numerical examples for both relative degree zero and higher relative degree systems confirm the validity of the stability conditions and demonstrate the effectiveness of the proposed framework, including exact input reconstruction via causal real-valued inverse systems.
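For relative degree zero (every $D_k$ invertible), a standard direct construction of the periodic inverse is $u_k = D_k^{-1}(y_k - C_k x_k)$, $x_{k+1} = (A_k - B_k D_k^{-1} C_k) x_k + B_k D_k^{-1} y_k$. The sketch below implements that per-phase formula together with a monodromy-based stability check; it is not the paper's cyclic-reformulation route, and the period-2 example plant is hypothetical.

```python
import numpy as np

def inverse_system(plant):
    """Per-phase inverse for relative degree zero: input y_k, output u_k."""
    inv = []
    for A, B, C, D in plant:
        Dinv = np.linalg.inv(D)
        inv.append((A - B @ Dinv @ C,   # inverse state matrix
                    B @ Dinv,           # inverse input matrix (input is y)
                    -Dinv @ C,          # inverse output matrix
                    Dinv))              # inverse feedthrough
    return inv

def simulate(system, inputs, x0):
    """x_{k+1} = A_k x_k + B_k w_k, z_k = C_k x_k + D_k w_k, using phase k mod N."""
    x, outputs = x0.copy(), []
    for k, w in enumerate(inputs):
        A, B, C, D = system[k % len(system)]
        outputs.append(C @ x + D @ w)
        x = A @ x + B @ w
    return np.array(outputs)

def monodromy_spectral_radius(system):
    """Spectral radius of A_{N-1} ... A_1 A_0; below 1 means the periodic system is stable."""
    phi = np.eye(system[0][0].shape[0])
    for A, *_ in system:
        phi = A @ phi
    return float(max(abs(np.linalg.eigvals(phi))))

# Hypothetical period-2 plant: (A_k, B_k, C_k, D_k) for k = 0, 1.
PLANT = [
    (np.array([[0.5, 1.0], [0.0, 0.3]]), np.array([[1.0], [0.5]]),
     np.array([[1.0, 0.0]]), np.array([[2.0]])),
    (np.array([[0.2, 0.0], [0.4, 0.6]]), np.array([[0.0], [1.0]]),
     np.array([[0.5, 1.0]]), np.array([[1.0]])),
]
```

Starting the inverse from the plant's initial state reproduces the input exactly and causally with real-valued matrices. For this example plant the inverse's monodromy spectral radius is below one, so the inverse is stable; a plant whose inverse monodromy has an eigenvalue outside the unit circle would be the periodic analogue of a non-minimum-phase system.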


[33] 2603.23150

Feedback Control of a Recirculating Bioreactor with Electrophoretic Removal of Inhibitory Extracellular DNA

Extracellular DNA accumulation in recirculating bioprocesses inhibits microbial growth and reduces productivity. We consider a continuous bioreactor with a recirculating loop and an electrophoretic filtration unit for selective DNA removal, and develop a feedback control framework combining online state and parameter estimation via an Unscented Kalman Filter with two control strategies: an adaptive Model Predictive Controller that jointly optimizes dilution rate and filtration activation, and a simpler bang-bang filtration policy with lookup-table dilution rate selection. Closed-loop simulations under nominal and perturbed conditions show that the MPC strategy achieves significantly higher cumulative profit while keeping DNA concentration below the inhibition threshold.


[34] 2603.23203

Scalable Impedance Identification of Diverse IBRs via Cluster-Specialized Neural Networks

Modern machine learning approaches typically identify the impedance of a single inverter-based resource (IBR) and assume similar impedance characteristics across devices. In modern power systems, however, IBRs will employ diverse control topologies and algorithms, leading to highly heterogeneous impedance behaviors. Training one model per IBR is inefficient and does not scale. This paper proposes a scalable impedance identification framework for diverse IBRs via cluster-specialized neural networks. First, the dataset is partitioned into multiple clusters with similar feature profiles using the K-means clustering method. Then, each cluster is assigned a specialized feed-forward neural network (FNN) tailored to its characteristics, improving both accuracy and computational efficiency. In deployment, only a small number of measurements are required to predict impedance over a wide range of operating points. The framework is validated on six IBRs with varying control bandwidths, control structures, and operating conditions, and further tested on a previously unseen IBR using only ten measurement points. The results demonstrate high accuracy in both the clustering and prediction stages, confirming the effectiveness and scalability of the proposed method.
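The two-stage structure (cluster, then specialize) can be sketched as follows. The sketch assumes plain K-means with a deterministic farthest-first initialization and, to stay self-contained, uses a linear least-squares model per cluster where the paper uses a feed-forward neural network; at prediction time each sample is routed to the model of its nearest cluster center.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d2, axis=1)

def kmeans(X, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers

class ClusterSpecializedRegressor:
    """One regressor per cluster; a linear least-squares model stands in for each FNN."""

    def __init__(self, k):
        self.k = k

    @staticmethod
    def _with_bias(X):
        return np.hstack([X, np.ones((len(X), 1))])

    def fit(self, X, y):
        self.centers = kmeans(X, self.k)
        labels = assign(X, self.centers)
        Xb = self._with_bias(X)
        self.coefs = [np.linalg.lstsq(Xb[labels == j], y[labels == j], rcond=None)[0]
                      for j in range(self.k)]
        return self

    def predict(self, X):
        labels = assign(X, self.centers)   # route each sample to its cluster's specialist
        Xb = self._with_bias(X)
        return np.array([Xb[i] @ self.coefs[j] for i, j in enumerate(labels)])
```

On hypothetical data made of two well-separated groups that follow different linear relations, a single global linear model cannot fit both groups exactly, whereas routing each sample to its cluster's specialist recovers both relations.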


[35] 2603.23222

Underdetermined Library-aided Impedance Estimation with Terminal Smart Meter Data

Smart meters provide relevant information for impedance identification, but they lack global phase alignment, and internal network nodes are often unobserved. A few methods have been developed for this setting, but they impose requirements on data correlation and/or network topology. In this paper, we offer a unifying view of data- and structure-driven identifiability issues and use this groundwork to propose a method for underdetermined impedance identification. The method can handle intrinsically ambiguous topologies and data; its output is not necessarily a single estimate but a collection of data-compatible impedance assignments. It uses a library of plausible commercial cable types as a prior to refine the solutions, and we show how it can support topology identification workflows built around known georeferenced joints without degree guarantees. The method depends on a small number of parameters to which it is not sensitive and achieves high identification performance on a sizeable benchmark case even with small injection/voltage datasets. We identify key steps that can be accelerated via GPU-based parallelization. Finally, we assess the tolerance of the identification to noisy input.


[36] 2603.23267

Geometric Direction Finding on Dynamic Manifolds: Unambiguous DOA Estimation for Spatially Undersampled UWB Arrays

Traditional Direction of Arrival (DOA) estimation methods struggle to simultaneously address three physical constraints in Ultra-Wideband (UWB) electromagnetic sensing: spatial undersampling, asynchronous array phase, and beam squint. Existing solutions treat these issues in isolation, leading to limited performance in complex scenarios. This paper proposes a novel dynamic manifold perspective, which models UWB signal observations as a continuous manifold curve in a high-dimensional space driven by temporal evolution and array topology. We theoretically demonstrate that the DOA can be uniquely determined solely by the geometric shape of the manifold rather than by the absolute arrival phase. Based on this perspective, we construct a geometric parameter system comprising extrinsic and intrinsic parameters, along with a corresponding DOA estimation framework. Extrinsic vector parameters serve as a dynamic extension of traditional array processing, effectively expanding the degrees of freedom to suppress grating lobes. Intrinsic scalar invariants provide a new geometric perspective independent of traditional phase models, offering intrinsic robustness against array channel phase errors. Simulation results show that the derived analytical expressions for the geometric parameters are highly consistent with the numerical ground truth. The proposed framework not only completely eliminates spatial ambiguity in sparse arrays but also achieves high-precision direction finding in the presence of uncalibrated phase errors.


[37] 2603.23312

Time-Delay Systems with Discrete and Distributed delays: Discontinuous Initial Conditions and Reachability Sets

Time-invariant finite-dimensional systems, under reasonable continuity assumptions, exhibit the property that if solutions exist for all future times, the set of vectors reachable from a bounded set of initial conditions over bounded time intervals is also bounded. This property can be summarized as follows: forward completeness implies bounded reachability sets. By contrast, this property does not necessarily hold for infinite-dimensional systems in general, and time-delay systems in particular. Sufficient conditions for this property to hold that can be directly tested on the function defining the system dynamics are only known in the case of systems with pointwise (or discrete) delays. This paper develops novel sufficient conditions for the boundedness of the reachability sets of time-delay systems involving mixed pointwise and distributed delays. Broad classes of systems satisfying these conditions are identified.


[38] 2603.23357

Robust and Interpretable Graph Neural Networks for Power Systems State Estimation

This study analyzes Graph Neural Networks (GNNs) for distribution system state estimation (DSSE) by employing an interpretable Graph Neural Additive Network (GNAN) and by utilizing an edge-conditioned message-passing mechanism. The architectures are benchmarked against the standard Graph Attention Network (GAT) architecture. Multiple SimBench grids with topology changes and various measurement penetration rates were used to evaluate performance. Empirically, GNAN trails GAT in accuracy but serves as a useful probe for graph learning when accompanied by the proposed edge attention mechanism. Together, they demonstrate that incorporating information from distant nodes could improve learning depending on the grid topology and available data. This study advances the state-of-the-art understanding of learning on graphs for the state estimation task and contributes toward reliable GNN-based DSSE prediction technologies.


[39] 2603.23372

WAKE-NET: 3D-Wake-Aware Turbine Layout and Cabling Optimization Framework of Multi-Hub-Height Wind Farms for Grid-Scale and Industrial Power Systems

The global transition towards renewable energy has accelerated the deployment of utility-scale wind farms, increasing the need for accurate performance and economic assessments. Although wind energy offers substantial potential for carbon emission reduction, investment decisions are highly sensitive to predicted annual energy production and economic profitability. Conventionally, wind farm analyses often estimate turbine power output based solely on incoming wind conditions, neglecting wake interactions between turbines. These wake effects can significantly reduce downstream turbine performance, leading to overestimation of energy yield and financial returns. This study proposes WAKE-NET, a wake-aware optimization framework that incorporates both turbine layout optimization and hub height diversification across turbines of varying capacities. Unlike traditional approaches that assume a uniform hub height or ignore wake dynamics, the proposed methodology accounts for wake-induced power losses in its framework. Results indicate that the benchmark model that neglects wake effects can overestimate annual profits, while the use of multiple hub heights reduces wake overlap and associated power losses. Overall, the findings demonstrate that wake-aware design and hub height diversity improve energy yield accuracy and economic viability, offering valuable guidance for wind farm developers and investors in renewable energy systems.
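The wake-induced loss that the benchmark ignores can be illustrated with the classical Jensen (Park) top-hat wake model, with the velocity deficit weighted by the fraction of the downstream rotor area that lies inside the wake, so that vertically offsetting hub heights reduces the overlap. This textbook cross-section model is chosen for illustration and is not WAKE-NET's 3-D wake formulation; the wake decay constant `k=0.05` and the turbine dimensions used below are assumptions.

```python
import math

def circle_overlap(R, r, d):
    """Intersection area of circles with radii R and r whose centers are d apart."""
    if d >= R + r:
        return 0.0
    if d <= abs(R - r):
        return math.pi * min(R, r) ** 2
    a = r**2 * math.acos((d**2 + r**2 - R**2) / (2 * d * r))
    b = R**2 * math.acos((d**2 + R**2 - r**2) / (2 * d * R))
    c = 0.5 * math.sqrt((-d + r + R) * (d + r - R) * (d - r + R) * (d + r + R))
    return a + b - c

def waked_speed(u0, ct, r_up, r_down, x, dz, k=0.05):
    """Wind speed at a downstream rotor: Jensen top-hat deficit x overlap fraction.

    u0: free-stream speed, ct: upstream thrust coefficient, r_up / r_down: rotor radii,
    x: downstream distance, dz: hub-height offset between the two turbines.
    """
    if x <= 0:
        return u0
    r_wake = r_up + k * x                                    # linearly expanding wake
    deficit = (1 - math.sqrt(1 - ct)) * (r_up / r_wake) ** 2
    frac = circle_overlap(r_wake, r_down, abs(dz)) / (math.pi * r_down**2)
    return u0 * (1 - deficit * frac)

if __name__ == "__main__":
    for dz in (0.0, 60.0, 200.0):   # same hub height, partial offset, fully clear
        print(dz, waked_speed(10.0, 0.8, 50.0, 50.0, 350.0, dz))
```

Since power scales roughly with the cube of wind speed, even a moderate speed deficit at a fully waked turbine translates into a large power loss, which is why neglecting wakes inflates predicted yield.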


[40] 2603.23394

Markov State-Space Modeling and Channel Characterization for DNA-Based Molecular Communication

In this paper, we study DNA-based molecular communication with microarray-style reception under reversible hybridization, where the bound-state observation exhibits both inter-symbol interference and colored counting noise. To capture these effects in a communication-oriented form, we develop a Markov state-space framework based on a voxelized reaction-diffusion model, in which a block-structured transition matrix describes molecular transport and binding/unbinding dynamics. For the microarray specialization, this representation yields the channel impulse response, the equilibrium gain, and a settling-time-based characterization of the effective channel memory. Building on the resulting symbol-rate observation model for on-off keying, we derive a grouped-binomial counting model and obtain a closed-form expression for the covariance of the counting noise. Based on these statistics, we further develop a differential-threshold detector and a finite-memory decision-feedback equalizer. Numerical results validate the theoretical correlation behavior and show that the relative performance of the proposed receivers depends strongly on the channel-memory regime.
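A toy version of the Markov state-space idea makes these quantities concrete. The sketch assumes a 1-D chain of three voxels plus one bound state at the receiver, with made-up hopping, binding, and unbinding probabilities; it computes the channel impulse response as the bound-state occupancy probability, the equilibrium gain from the stationary distribution, and a simple settling index as a proxy for channel memory. The paper's voxelized model and block-structured matrix are larger and may define these quantities differently.

```python
import numpy as np

def transition_matrix(n_vox=3, hop=0.3, k_bind=0.2, k_unbind=0.05):
    """Row-stochastic matrix over states [voxel_0, ..., voxel_{n-1}, bound].
    Molecules hop between neighbouring voxels; the last voxel holds the receptor."""
    S = n_vox + 1
    P = np.zeros((S, S))
    for i in range(n_vox - 1):
        P[i, i + 1] = hop
        P[i + 1, i] = hop
    rx = n_vox - 1
    P[rx, n_vox] = k_bind                                  # binding at the receiver voxel
    P[n_vox, rx] = k_unbind                                # unbinding back into that voxel
    P[np.arange(S), np.arange(S)] = 1.0 - P.sum(axis=1)    # probability of staying put
    return P

def impulse_response(P, steps, tx=0):
    """h[t] = P(bound at step t+1 | one molecule released in voxel tx at step 0)."""
    p = np.zeros(P.shape[0])
    p[tx] = 1.0
    h = np.empty(steps)
    for t in range(steps):
        p = p @ P
        h[t] = p[-1]
    return h

def equilibrium_gain(P):
    """Stationary probability of the bound state (left eigenvector for eigenvalue 1)."""
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return float(pi[-1] / pi.sum())

def settling_index(h, h_inf, tol=0.02):
    """First index from which h stays within +/- tol * h_inf of h_inf."""
    outside = np.nonzero(np.abs(h - h_inf) > tol * h_inf)[0]
    return 0 if outside.size == 0 else int(outside[-1]) + 1
```

With these hypothetical rates the chain satisfies detailed balance, so the bound-state equilibrium gain is 4/7; slower unbinding lengthens the settling index, which corresponds to longer channel memory and more inter-symbol interference.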


[41] 2603.23401

Self-Supervised Graph Neural Networks for Optimal Substation Reconfiguration

Changing the transmission system topology is an efficient and costless lever to reduce congestion or increase exchange capacities. The problem of finding the optimal switch states within substations is called Optimal Substation Reconfiguration (OSR), and may be framed as a Mixed Integer Linear Program (MILP). Current state-of-the-art optimization techniques come with prohibitive computing times, making them impractical for real-time decision-making. Meanwhile, deep learning offers a promising perspective with drastically smaller computing times, at the price of an expensive training phase and the absence of optimality guarantees. In this work, we frame OSR as an Amortized Optimization problem, where a Graph Neural Network (GNN) model (our data being graphs) is trained in a self-supervised way to improve the objective function. We apply our approach to the maximization of the exchange capacity between two areas of a small-scale 12-substation system. Once trained, our GNN model improves the exchange capacity by 10.2% on average compared to the all-connected configuration, while a classical MILP solver reaches an average improvement of 15.2% with computing times that are orders of magnitude larger.


[42] 2603.23450

Information-Driven Active Perception for k-step Predictive Safety Monitoring

This work studies the synthesis of active perception policies for predictive safety monitoring in partially observable stochastic systems. Operating under strict sensing and communication budgets, the proposed monitor dynamically schedules sensor queries to maximize information gain about the safety of future states. The underlying stochastic dynamics are captured by a labeled hidden Markov model (HMM), with safety requirements defined by a deterministic finite automaton (DFA). To enable active information acquisition, we introduce the minimization of the k-step Shannon conditional entropy of the safety of future states as a planning objective, subject to a limited sensor query budget. Using observable operators, we derive an efficient algorithm to compute the k-step conditional entropy and analyze key properties of the conditional entropy gradient with respect to policy parameters. We validate the effectiveness of the method for predictive safety monitoring through a dynamic congestion game example.
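The objective can be illustrated in a much simplified setting. The sketch assumes a fully specified Markov chain with a known safe-state mask in place of the labeled HMM and DFA, defines "safety of future states" as the event that the next k states are all safe, and evaluates a single sensor query: the expected conditional entropy of that event after observing the sensor output. The belief update multiplies the belief by the observation likelihoods, a diagonal-matrix step similar to those in observable-operator formulations; budgets and policy gradients are not modeled.

```python
import numpy as np

def prob_safe_k_steps(belief, T, safe_mask, k):
    """P(the next k states are all safe | belief), with T[s, s'] = P(s' | s)."""
    M = T * safe_mask[None, :]          # drop transitions that enter unsafe states
    v = belief.copy()
    for _ in range(k):
        v = v @ M
    return float(v.sum())

def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

def expected_entropy_after_query(belief, T, safe_mask, k, O):
    """E_o[H(k-step safety | o)] for a sensor with likelihoods O[s, o] = P(o | s)."""
    total = 0.0
    for o in range(O.shape[1]):
        joint = belief * O[:, o]        # diag(O[:, o]) applied to the belief
        p_o = joint.sum()
        if p_o > 0:
            posterior = joint / p_o
            total += p_o * binary_entropy(prob_safe_k_steps(posterior, T, safe_mask, k))
    return total
```

A perfectly informative sensor can only lower the expected entropy relative to not querying, while an uninformative one leaves it unchanged; the difference is the information gain that a query policy would weigh against its budget.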


[43] 2603.23465

Statistical Efficiency of Single- and Multi-step Models for Forecasting and Control

Compounding error, where small prediction mistakes accumulate over time, presents a major challenge in learning-based control. A common remedy is to train multi-step predictors directly instead of rolling out single-step models. However, it is unclear when the benefits of multi-step predictors outweigh the difficulty of learning a more complex model. We provide the first quantitative analysis of this trade-off for linear dynamical systems. We study three predictor classes: (i) single-step models, (ii) multi-step models, and (iii) single-step models trained with multi-step losses. We show that when the model class is well-specified and accurately captures the system dynamics, single-step models achieve the lowest asymptotic prediction error. On the other hand, when the model class is misspecified due to partial observability, direct multi-step predictors can significantly reduce bias and improve accuracy. We provide theoretical and empirical evidence that these trade-offs persist when predictors are used in closed-loop control.
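The trade-off can be probed with scalar predictors fitted by least squares. The sketch assumes a fully observed AR(1) process as the well-specified case and a two-state system observed through one coordinate as the partially observed (misspecified) case, and compares rolling out a fitted one-step coefficient against fitting the k-step coefficient directly. It is not the paper's experimental setup and says nothing about closed-loop control.

```python
import numpy as np

def fit_rollout(y, k):
    """Fit y_{t+1} ≈ a y_t by least squares and roll out: k-step coefficient a**k."""
    a = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])
    return a**k

def fit_direct(y, k):
    """Fit y_{t+k} ≈ b y_t directly by least squares."""
    return np.dot(y[:-k], y[k:]) / np.dot(y[:-k], y[:-k])

def kstep_mse(y, coef, k):
    """In-sample mean squared error of the k-step prediction y_{t+k} ≈ coef * y_t."""
    return float(np.mean((y[k:] - coef * y[:-k]) ** 2))

def simulate(A, C, steps, rng, noise=1.0):
    """x_{t+1} = A x_t + w_t with Gaussian w_t; observe y_t = C x_t."""
    x = np.zeros(A.shape[0])
    ys = np.empty(steps)
    for t in range(steps):
        ys[t] = C @ x
        x = A @ x + noise * rng.standard_normal(A.shape[0])
    return ys

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 5
    y_part = simulate(np.array([[0.9, 0.5], [0.0, 0.5]]), np.array([1.0, 0.0]), 50000, rng)
    print("rollout:", kstep_mse(y_part, fit_rollout(y_part, k), k))
    print("direct: ", kstep_mse(y_part, fit_direct(y_part, k), k))
```

In the well-specified case both estimates approach $a^k$. In the partially observed case the direct fit has, by construction, no larger in-sample k-step error than the rollout, a simple instance of the bias effect described above.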


[44] 2603.23475

Bridging the numerical-physical gap in acoustic holography via end-to-end differentiable structural optimization

Acoustic holography provides a practical means of flexibly controlling acoustic wavefronts. However, high-fidelity shaping of acoustic fields remains constrained by the numerical-physical gap inherent in conventional phase-only designs. These approaches realize a two-dimensional phase-delay profile as a three-dimensional thickness-varying lens, while neglecting wave-matter interactions arising from the lens structure. Here, we introduce an end-to-end, physics-aware differentiable structural optimization framework that directly incorporates three-dimensional lens geometries into the acoustic simulation and optimization loop. Using a novel differentiable relaxation, termed Differentiable Hologram Lens Approximation (DHLA), the lens geometry is treated as a differentiable design variable, ensuring intrinsic consistency between numerical design and physical realization. The resulting Thickness-Only Acoustic Holograms (TOAHs) significantly outperform state-of-the-art phase-only acoustic holograms (POAHs) in field reconstruction fidelity and precision under complex conditions. We further demonstrate the application of the framework to spatially selective neuromodulation in a neuropathic pain mouse model, highlighting its potential for non-invasive transcranial neuromodulation. In summary, by reconciling numerical design with physical realization, this work establishes a robust strategy for high-fidelity acoustic wavefront shaping in complex environments.


[45] 2603.22437

mmFHE: mmWave Sensing with End-to-End Fully Homomorphic Encryption

We present mmFHE, the first system that enables fully homomorphic encryption (FHE) for end-to-end mmWave radar sensing. mmFHE encrypts raw range profiles on a lightweight edge device and executes the entire mmWave signal-processing and ML inference pipeline homomorphically on an untrusted cloud that operates exclusively on ciphertexts. At the core of mmFHE is a library of seven composable, data-oblivious FHE kernels that replace standard DSP routines with fixed arithmetic circuits. These kernels can be flexibly composed into different application-specific pipelines. We demonstrate this approach on two representative tasks: vital-sign monitoring and gesture recognition. We formally prove two cryptographic guarantees for any pipeline assembled from this library: input privacy (the cloud learns nothing about the sensor data) and data obliviousness (the execution trace on the cloud is identical regardless of the data being processed). These guarantees effectively neutralize various supervised and unsupervised privacy attacks on raw data, including re-identification and data-dependent privacy leakage. Evaluation on three public radar datasets (270 vital-sign recordings, 600 gesture trials) shows that encryption introduces negligible error: HR/RR MAE < 10^-3 bpm versus plaintext, and 84.5% gesture accuracy (vs. 84.7% plaintext), with end-to-end cloud GPU latency of 103 s for a 10 s vital-sign window and 37 s for a 3 s gesture window. These results show that privacy-preserving end-to-end mmWave sensing is feasible on commodity hardware today.


[46] 2603.22508

Parallel OctoMapping: A Scalable Framework for Enhanced Path Planning in Autonomous Navigation

Mapping is essential in robotics and autonomous systems because it provides the spatial foundation for path planning. Efficient mapping enables planning algorithms to generate reliable paths while ensuring safety and adapting in real time to complex environments. Fixed-resolution mapping methods often produce overly conservative obstacle representations that lead to suboptimal paths or planning failures in cluttered scenes. To address this issue, we introduce Parallel OctoMapping (POMP), an efficient OctoMap-based mapping technique that maximizes available free space and supports multi-threaded computation. To the best of our knowledge, POMP is the first method that, at a fixed occupancy-grid resolution, refines the representation of free space while preserving map fidelity and compatibility with existing search-based planners. It can therefore be integrated into existing planning pipelines, yielding higher pathfinding success rates and shorter path lengths, especially in cluttered environments, while substantially improving computational efficiency.


[47] 2603.22589

Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling

First-order Ambisonics (FOA) is a standard spatial audio format based on spherical harmonic decomposition. Its zeroth- and first-order components capture the sound pressure and particle velocity, respectively. Recently, physics-informed neural networks have been applied to the spatial interpolation of FOA signals, regularizing the network outputs with soft penalty terms derived from physical principles, e.g., the linearized momentum equation. In this paper, we reformulate the task so that the predicted FOA signal automatically satisfies the linearized momentum equation. Our network approximates a scalar function called the velocity potential, rather than the FOA signal itself. The FOA signal can then be readily recovered through the partial derivatives of the velocity potential with respect to the network inputs (i.e., time and microphone position) according to the physics of sound propagation. By deriving the four channels of FOA from the single-channel velocity potential, the reconstructed signal follows the physical principle at any time and position by construction. Experimental results on room impulse response reconstruction confirm the effectiveness of the proposed framework.
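To see why this reformulation satisfies the linearized momentum equation by construction, take one common sign convention (an assumption here; the paper's convention may differ) in which the sound pressure $p$ and particle velocity $\mathbf{v}$ are obtained from the velocity potential $\phi(t, \mathbf{r})$ and the ambient density $\rho_0$ as

$$
p = \rho_0 \frac{\partial \phi}{\partial t}, \qquad \mathbf{v} = -\nabla \phi .
$$

Substituting into the left-hand side of the linearized momentum equation and swapping the order of differentiation gives

$$
\rho_0 \frac{\partial \mathbf{v}}{\partial t} = -\rho_0 \nabla \frac{\partial \phi}{\partial t} = -\nabla p ,
$$

which holds for any smooth $\phi$, so a network that outputs $\phi$ cannot violate it. Because the zeroth-order FOA channel tracks $p$ and the first-order channels track the components of $\mathbf{v}$ (up to normalization constants), all four channels inherit this property.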


[48] 2603.22590

Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks

With the increasing deployment of automated and agentic systems, ensuring the adversarial robustness of automatic speech recognition (ASR) models has become critical. We observe that changing the precision of an ASR model during inference reduces the likelihood of adversarial attacks succeeding. We take advantage of this fact to make the models more robust by simple random sampling of the precision during prediction. Moreover, the insight can be turned into an adversarial example detection strategy by comparing outputs resulting from different precisions and leveraging a simple Gaussian classifier. An experimental analysis demonstrates a significant increase in robustness and competitive detection performance for various ASR models and attack types.


[49] 2603.22709

Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics

Conversational automatic speech recognition remains challenging due to overlapping speech, far-field noise, and varying speaker counts. While recent LLM-based systems perform well on single-speaker benchmarks, their robustness in multi-speaker settings is unclear. We systematically compare LLM-based and modular pipeline approaches along four axes: overlap robustness, semantic fidelity, speaker count, and single- versus multi-channel input. To capture meaning-altering errors that conventional metrics miss, we introduce tcpSemER, which extends tcpWER by replacing Levenshtein distance with embedding-based semantic similarity. We further decompose tcpWER into overlapping and non-overlapping components for finer-grained analysis. Experiments across three datasets show that LLM-based systems are competitive in two-speaker settings but degrade as speaker count and overlap increase, whereas modular pipelines remain more robust.
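One way to picture replacing Levenshtein distance with embedding-based similarity is an edit-distance dynamic program whose substitution cost is 1 minus the cosine similarity of word embeddings. The sketch below assumes exactly that, uses a tiny hand-made embedding table, and ignores the time constraints and speaker permutation that tcpWER adds; the actual tcpSemER definition may differ.

```python
import numpy as np

def soft_edit_distance(ref, hyp, sub_cost):
    """Levenshtein dynamic program with a pluggable substitution cost; ins/del cost 1."""
    n, m = len(ref), len(hyp)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1)
    D[0, :] = np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j] + 1.0,                                 # deletion
                          D[i, j - 1] + 1.0,                                 # insertion
                          D[i - 1, j - 1] + sub_cost(ref[i - 1], hyp[j - 1]))
    return float(D[n, m])

def exact_cost(a, b):
    """Classic 0/1 substitution cost (gives plain WER)."""
    return 0.0 if a == b else 1.0

def embedding_cost(emb):
    """Substitution cost 1 - cosine similarity, clipped to [0, 1]; emb maps word -> vector."""
    def cost(a, b):
        if a == b:
            return 0.0
        u, v = emb[a], emb[b]
        cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
        return min(1.0, max(0.0, 1.0 - cos))
    return cost

def error_rate(ref, hyp, sub_cost):
    """Edit distance normalized by reference length."""
    return soft_edit_distance(ref, hyp, sub_cost) / max(len(ref), 1)
```

With the exact 0/1 cost the routine reduces to plain word error rate; with the embedding cost, near-synonym substitutions are penalized far less than meaning-changing ones, which is the kind of distinction conventional WER cannot make.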


[50] 2603.22710

Optimal filtering for a giant cavity in waveguide QED systems

In waveguide quantum electrodynamics (QED) systems, a giant cavity can be engineered to interact with quantum fields through multiple distant coupling points, so that its non-Markovian dynamics differ markedly from those of traditional quantum optical cavity systems. Towards feedback control of this system, this paper designs an optimal filter for giant cavity systems that estimates their state evolution under continuous quantum measurements. First, the Langevin equation in the Heisenberg picture is derived; it is a linear continuous-time system with both state and input delays resulting from the unconventional distant couplings. Compared to existing modeling approaches, this formulation effectively preserves the nonlocal coupling and the multiple-delay dynamics inherent in the original system. In particular, the coupling and propagation delays make the system operators at different times noncommuting, which prevents the direct application of existing quantum filtering methods. To address this issue, an optimal filter is designed in which the delayed-state covariance matrices are computed. By iteratively evaluating the delayed-state covariance over successive time intervals, the resulting optimal filter can be implemented as an interval-wise backward recursion algorithm. Finally, numerical simulations are conducted to evaluate the tracking performance of the proposed optimal filter for the giant cavity. Comparing the evolution of the Wigner functions of coherent and cat states with that produced by the filter validates the effectiveness of the optimal filter.


[51] 2603.22727

Spiking Personalized Federated Learning for Brain-Computer Interface-Enabled Immersive Communication

This work proposes a novel immersive communication framework that leverages a brain-computer interface (BCI) to acquire brain signals for inferring user-centric states (e.g., intention and perception-related discomfort), thereby enabling more personalized and robust immersive adaptation under strong individual variability. Specifically, we develop a personalized federated learning (PFL) model to analyze and process the collected brain signals, which not only accommodates neurodiverse brain-signal data but also prevents the leakage of sensitive brain-signal information. To address the energy bottleneck of continual on-device learning and inference on energy-limited immersive terminals (e.g., head-mounted displays), we further embed spiking neural networks (SNNs) into the PFL. By exploiting sparse, event-driven spike computation, the SNN-enabled PFL reduces the computation and energy cost of training and inference while maintaining competitive personalization performance. Experiments on a real brain-signal dataset demonstrate that our method achieves the best overall identification accuracy while reducing inference energy by 6.46$\times$ compared with conventional artificial neural network-based personalized baselines.


[52] 2603.22728

The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models

This paper presents the Interspeech 2026 Audio Encoder Capability Challenge, a benchmark specifically designed to evaluate and advance the performance of pre-trained audio encoders as front-end modules for Large Audio Language Models (LALMs). While LALMs have shown remarkable understanding of complex acoustic scenes, their performance depends on the semantic richness of the underlying audio encoder representations. This challenge addresses the integration gap by providing a unified generative evaluation framework, XARES-LLM, which assesses submitted encoders across a diverse suite of downstream classification and generation tasks. By decoupling encoder development from LLM fine-tuning, the challenge establishes a standardized protocol for general-purpose audio representations that can effectively be used for the next generation of multimodal language models.


[53] 2603.22731

Fleet-Level Battery-Health-Aware Scheduling for Autonomous Mobile Robots

Autonomous mobile robot fleets must coordinate task allocation and charging under limited shared resources, yet most battery-aware planning methods address only a single robot. This paper extends degradation-cost-aware task planning to a multi-robot setting by jointly optimizing task assignment, service sequencing, optional charging decisions, charging mode selection, and charger access while balancing degradation across the fleet. The formulation relies on reduced-form degradation proxies grounded in the empirical battery aging literature, capturing both charging-mode-dependent wear and idle state-of-charge-dependent aging; the bilinear idle aging term is linearized through a disaggregated piecewise McCormick formulation. Tight big-M values derived from instance data strengthen the LP relaxation. To manage scalability, we propose a hierarchical matheuristic in which a fleet-level master problem coordinates assignments, routes, and charger usage, while robot-level subproblems, whose integer part decomposes into trivially small independent partition-selection problems, compute route-conditioned degradation schedules. Systematic experiments compare the proposed method against three baselines: a rule-based nearest-available dispatcher, an energy-aware formulation that enforces battery feasibility without modeling degradation, and a charger-unaware formulation that accounts for degradation but ignores shared charger capacity limits.
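The effect of partitioning on a McCormick relaxation can be illustrated on a generic bilinear product w = x·y. This sketch evaluates the envelope at a fixed point rather than building the MILP, and it does not reproduce the paper's disaggregated formulation or its specific idle-aging variables; the box bounds and the point (x, y) are hypothetical.

```python
def mccormick_bounds(x, y, xl, xu, yl, yu):
    """McCormick envelope of w = x * y over the box [xl, xu] x [yl, yu].

    Returns the (lower, upper) values the four McCormick inequalities allow
    for w at the point (x, y); the true product always lies in between.
    """
    lower = max(xl * y + x * yl - xl * yl,
                xu * y + x * yu - xu * yu)
    upper = min(xu * y + x * yl - xu * yl,
                xl * y + x * yu - xl * yu)
    return lower, upper


def piecewise_mccormick_bounds(x, y, xl, xu, yl, yu, pieces):
    """Envelope on the sub-interval of [xl, xu] that contains x.

    In a MILP, one binary per piece (plus disaggregated copies of x and w)
    would select this sub-interval; here it is picked directly for a fixed point.
    """
    width = (xu - xl) / pieces
    k = min(int((x - xl) / width), pieces - 1)
    return mccormick_bounds(x, y, xl + k * width, xl + (k + 1) * width, yl, yu)


x, y = 0.37, 0.62
print(mccormick_bounds(x, y, 0.0, 1.0, 0.0, 1.0))                 # (0.0, 0.37)
print(piecewise_mccormick_bounds(x, y, 0.0, 1.0, 0.0, 1.0, 10))   # roughly (0.218, 0.248)
```

With ten pieces the gap around the true product 0.2294 shrinks by more than a factor of ten, which is why piecewise envelopes tighten the LP relaxation at the cost of extra binaries.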


[54] 2603.22802

Equivalence of Finite- and Fixed-time Stability to Asymptotic Stability

In this paper, we present new results on finite- and fixed-time convergence for dynamical systems using LaSalle-like invariance principles. In particular, we provide first- and second-order non-smooth Lyapunov-like results for finite- and fixed-time convergence, thereby relaxing the requirement that a differentiable, positive definite Lyapunov function exist. Based on these findings, we show that a dynamical system whose equilibrium point is globally asymptotically stable can be modified through scaling so that the resulting dynamical system has a fixed-time stable equilibrium point. The results in this paper expand our understanding of various convergence rates and strengthen the hypothesis that all convergence rates are interconnected through a suitable transformation.
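For context, a well-known smooth-Lyapunov sufficient condition for fixed-time stability states that if dV/dt ≤ −aV^p − bV^q with a, b > 0 and 0 < p < 1 < q, then V reaches zero within 1/(a(1−p)) + 1/(b(q−1)), regardless of the initial condition. The results above relax the differentiability this condition requires; the sketch below only checks the classical bound numerically for the scalar comparison system with equality, using the hypothetical parameters a = b = 1, p = 0.5, q = 2.

```python
import math


def settling_time(v0, a=1.0, b=1.0, p=0.5, q=2.0, s_min=-80.0, ds=1e-3):
    """Exact time for dV/dt = -a V^p - b V^q to drive V from v0 > 0 to 0.

    T(v0) = integral_0^v0 dV / (a V^p + b V^q). With V = exp(s) the integrand
    becomes 1 / (a e^{(p-1)s} + b e^{(q-1)s}), which is smooth and decays
    exponentially as s -> -inf, so truncating at s_min loses a negligible tail.
    The integral is evaluated with the trapezoidal rule.
    """
    def g(s):
        return 1.0 / (a * math.exp((p - 1.0) * s) + b * math.exp((q - 1.0) * s))

    s_max = math.log(v0)
    n = max(1, math.ceil((s_max - s_min) / ds))
    h = (s_max - s_min) / n
    interior = sum(g(s_min + k * h) for k in range(1, n))
    return h * (0.5 * g(s_min) + 0.5 * g(s_max) + interior)


def fixed_time_bound(a=1.0, b=1.0, p=0.5, q=2.0):
    """Uniform bound 1/(a(1-p)) + 1/(b(q-1)), valid for 0 < p < 1 < q."""
    return 1.0 / (a * (1.0 - p)) + 1.0 / (b * (q - 1.0))


for v0 in (1e-3, 1.0, 1e3, 1e6):
    print(v0, settling_time(v0), "<=", fixed_time_bound())
```

The settling time grows with the initial value but saturates (here at 4π/(3√3) ≈ 2.418) below the uniform bound of 3, which is exactly what distinguishes fixed-time from merely finite-time convergence.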


[55] 2603.22924

Positive Observers Revisited

The paper shows that positive linear systems can be stabilized using positive Luenberger-type observers, contradicting previous conclusions. This is achieved by structuring the observer as monotonically converging upper and lower bounds on the state. Analysis of the closed-loop properties under linear observer feedback gives conditions that cover a larger class than previous observer designs. The results are applied to nonpositive systems by enforcing positivity of the dynamics using feedback from the upper bound observer. The setting is expanded to include stochastic noise, giving conditions for convergence in expectation using feedback from positive observers.
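The upper/lower-bound structure can be illustrated with a standard discrete-time interval-observer construction for a positive system; this is a generic sketch, not necessarily the paper's design, and the matrices A, B, C and the gain L are hypothetical. If A − LC is elementwise nonnegative and the initial bounds bracket the true state, the bound errors evolve as e⁺ = (A − LC)e and stay nonnegative, so the bounds remain valid. They also converge when A − LC is Schur stable.

```python
import numpy as np

# Illustrative nonnegative system x+ = A x + B u, y = C x (values are assumptions).
A = np.array([[0.5, 0.4], [0.3, 0.6]])
B = np.array([1.0, 0.5])
C = np.array([[1.0, 0.0]])
# Hypothetical gain chosen so that A - L C is elementwise nonnegative and Schur stable.
L = np.array([[0.3], [0.2]])


def observer_step(x, x_up, x_lo, u):
    """Advance the true state and both Luenberger-type bound observers one step."""
    y = C @ x
    x_next = A @ x + B * u
    x_up_next = A @ x_up + B * u + L @ (y - C @ x_up)
    x_lo_next = A @ x_lo + B * u + L @ (y - C @ x_lo)
    return x_next, x_up_next, x_lo_next


def simulate(x0, x_up0, x_lo0, steps=50, u=1.0):
    x, x_up, x_lo = (np.asarray(v, dtype=float) for v in (x0, x_up0, x_lo0))
    history = [(x, x_up, x_lo)]
    for _ in range(steps):
        x, x_up, x_lo = observer_step(x, x_up, x_lo, u)
        history.append((x, x_up, x_lo))
    return history


E = A - L @ C
print("A - LC >= 0:", bool(np.all(E >= 0)),
      "spectral radius:", max(abs(np.linalg.eigvals(E))))
hist = simulate([2.0, 1.0], [5.0, 5.0], [0.0, 0.0])
```

Since L ≥ 0 and the measurement y is nonnegative, the lower-bound observer state also stays nonnegative, so both observers are themselves positive systems.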


[56] 2603.23006

On the Suboptimality of Rate--Distortion-Optimal Compression: Fundamental Accuracy Limits for Distributed Localization

We derive fundamental accuracy limits for distributed localization when a fusion center has access only to independently rate-distortion (RD)-optimally compressed versions of multi-sensor observations, under a line-of-sight propagation model with a Gaussian wideband waveform. Using the Gaussian RD test-channel model together with a Whittle spectral Fisher-information characterization, we obtain an explicit frequency-domain Cramér-Rao lower bound. A two-band, two-level specialization yields closed-form expressions and reveals a rate-induced regime change: RD-optimal compression under a squared-error distortion measure can eliminate localization-informative spectral content. A simple band-selective scheme can outperform RD compression by orders of magnitude at the same rate, motivating localization-aware compression for networked sensing and integrated sensing and communication systems.
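Under the Gaussian RD model with squared-error distortion, the RD-optimal allocation across independent spectral components is classical reverse water-filling: D_i = min(θ, σ_i²), with the water level θ chosen so that the total rate Σ max(0, ½ log₂(σ_i²/θ)) meets the budget. Components whose variance lies below θ receive zero rate and are discarded entirely, which is plausibly the mechanism by which squared-error-optimal compression can drop spectral content that matters for localization. The per-band variances and rate below are hypothetical.

```python
import math


def reverse_water_filling(variances, rate_bits, iters=200):
    """RD-optimal distortions for independent Gaussian components (squared error).

    Finds the water level theta such that
        sum_i max(0, 0.5 * log2(var_i / theta)) == rate_bits
    by geometric bisection, then assigns D_i = min(theta, var_i).
    Components with var_i <= theta get zero rate and are dropped entirely.
    """
    def total_rate(theta):
        return sum(max(0.0, 0.5 * math.log2(v / theta)) for v in variances)

    lo, hi = 1e-300, max(variances)              # total_rate(hi) == 0 <= rate_bits
    for _ in range(iters):
        mid = math.sqrt(lo) * math.sqrt(hi)      # bisect in the log domain
        if total_rate(mid) > rate_bits:
            lo = mid
        else:
            hi = mid
    theta = hi
    return theta, [min(theta, v) for v in variances]


# Hypothetical per-band variances: two strong bands and one weak band.
theta, dists = reverse_water_filling([10.0, 1.0, 0.1], rate_bits=2.0)
print(theta, dists)
```

Here the weak band ends up with distortion equal to its own variance, i.e., zero bits. If that band carried most of the delay information, a band-selective allocation that spends bits on it could beat the RD-optimal one for localization at the same rate.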


[57] 2603.23123

Towards a Unified Coding Scheme for 6G

The growing demand for higher data rates necessitates continuous innovation in wireless communication systems, particularly with the emergence of 6G. Channel coding plays a crucial role in this evolution. In 5G systems, rate-adaptive raptor-like quasi-cyclic irregular low-density parity-check codes are used for the data link, while polar codes with successive cancellation list decoding handle short messages on the synchronization channel. However, to meet the stringent requirements of future 6G systems, a versatile and unified coding scheme should be developed: one that offers competitive error-correcting performance alongside low-complexity encoding and decoding that enables energy-efficient hardware implementations. This white paper outlines the vision for such a unified coding scheme. We explore various 6G communication scenarios that pose new challenges to channel coding and provide a first analysis of potential solutions.


[58] 2603.23182

Path Planning and Reinforcement Learning-Driven Control of On-Orbit Free-Flying Multi-Arm Robots

This paper presents a hybrid approach that integrates trajectory optimization (TO) and reinforcement learning (RL) for motion planning and control of free-flying multi-arm robots in on-orbit servicing scenarios. The proposed system integrates TO for generating feasible, efficient paths while accounting for dynamic and kinematic constraints, and RL for adaptive trajectory tracking under uncertainties. The multi-arm robot design, equipped with thrusters for precise body control, enables redundancy and stability in complex space operations. TO optimizes arm motions and thruster forces, reducing reliance on the arms for stabilization and enhancing maneuverability. RL further refines this by leveraging model-free control to adapt to dynamic interactions and disturbances. Comprehensive simulations demonstrate the effectiveness and robustness of the proposed hybrid approach. Two case studies are explored: surface motion with initial contact and a free-floating scenario requiring surface approximation. In both cases, the hybrid method outperforms traditional strategies. In particular, the thrusters notably enhance motion smoothness, safety, and operational efficiency. The RL policy effectively tracks TO-generated trajectories, handling high-dimensional action spaces and dynamic mismatches. This integration of TO and RL combines the strengths of precise, task-specific planning with robust adaptability, ensuring high performance in the uncertain and dynamic conditions characteristic of space environments. By addressing challenges such as motion coupling, environmental disturbances, and dynamic control requirements, this framework establishes a strong foundation for advancing the autonomy and effectiveness of space robotic systems.


[59] 2603.23197

Privacy-Aware Smart Cameras: View Coverage via Socially Responsible Coordination

Coordination of view coverage via privacy-aware smart cameras is key to a more socially responsible urban intelligence. Rather than maximizing view coverage at any cost or over-relying on expensive cryptographic techniques, we address how cameras can coordinate to legitimately monitor public spaces while excluding privacy-sensitive regions by design. This article proposes a decentralized framework in which interactive smart cameras coordinate to autonomously select their orientation via collective learning, while eliminating privacy violations via soft and hard constraint satisfaction. The approach scales to hundreds and even thousands of cameras without any centralized control. Experimental evidence shows 18.42% higher coverage efficiency and 85.53% lower privacy violation than baselines and other state-of-the-art approaches. These results further yield practical guidelines for operators and policymakers on how the field of view, spatial placement, and budget of cameras operating with ethically aligned artificial intelligence jointly influence coverage efficiency and privacy protection in large-scale and sensitive urban environments.


[60] 2603.23262

Autoencoder-based Optimization of Multi-user Molecule Mixture Communication Systems

In this paper, we introduce an autoencoder (AE)-based scheme for end-to-end optimization of a multi-user molecule mixture communication system. In the proposed scheme, each transmitter leverages an encoder network that maps the user symbol to a molecule mixture. The mixtures then propagate through the channel to the receiver, which samples the channel using a non-linear, cross-reactive sensor array. A decoder network then estimates the symbol transmitted by each user based on the sensor observations. The proposed scheme achieves, for a given signal-to-noise ratio, lower symbol error rates than a baseline scheme from the literature in a single-user setting with full channel state information. We additionally demonstrate that the proposed AE-based scheme allows reliable communication when the channel is unknown or changing. Finally, we show that for multiple access the system can account for different user priorities. In summary, the proposed AE-based scheme enables end-to-end system optimization in complex scenarios unsuitable for analytical treatment and thereby brings molecular communication systems closer to real-world deployment.


[61] 2603.23297

Drop-In Perceptual Optimization for 3D Gaussian Splatting

Despite their output being ultimately consumed by human viewers, 3D Gaussian Splatting (3DGS) methods often rely on ad-hoc combinations of pixel-level losses, resulting in blurry renderings. To address this, we systematically explore perceptual optimization strategies for 3DGS by searching over a diverse set of distortion losses. We conduct the first-of-its-kind large-scale human subjective study on 3DGS, involving 39,320 pairwise ratings across several datasets and 3DGS frameworks. A regularized version of Wasserstein Distortion, which we call WD-R, emerges as the clear winner, excelling at recovering fine textures without incurring a higher splat count. WD-R is preferred by raters more than $2.3\times$ over the original 3DGS loss, and $1.5\times$ over the current best method, Perceptual-GS. WD-R also consistently achieves state-of-the-art LPIPS, DISTS, and FID scores across various datasets, and generalizes across recent frameworks, such as Mip-Splatting and Scaffold-GS, where replacing the original loss with WD-R consistently enhances perceptual quality within a similar resource budget (number of splats for Mip-Splatting, model size for Scaffold-GS), and leads to reconstructions being preferred by human raters $1.8\times$ and $3.6\times$, respectively. We also find that this carries over to the task of 3DGS scene compression, with $\approx 50\%$ bitrate savings for comparable perceptual metric performance.


[62] 2603.23390

Harnessing Lightweight Transformer with Contextual Synergic Enhancement for Efficient 3D Medical Image Segmentation

Transformers have shown remarkable performance in 3D medical image segmentation, but their high computational requirements and need for large amounts of labeled data limit their applicability. To address these challenges, we consider two crucial aspects: model efficiency and data efficiency. Specifically, we propose Light-UNETR, a lightweight transformer designed to achieve model efficiency. Light-UNETR features a Lightweight Dimension Reductive Attention (LIDR) module, which reduces spatial and channel dimensions while capturing both global and local features via multi-branch attention. Additionally, we introduce a Compact Gated Linear Unit (CGLU) to selectively control channel interaction with minimal parameters. Furthermore, we introduce a Contextual Synergic Enhancement (CSE) learning strategy, which aims to boost the data efficiency of Transformers. It first leverages the extrinsic contextual information to support the learning of unlabeled data with Attention-Guided Replacement, then applies Spatial Masking Consistency that utilizes intrinsic contextual information to enhance the spatial context reasoning for unlabeled data. Extensive experiments on various benchmarks demonstrate the superiority of our approach in both performance and efficiency. For example, with only 10% labeled data on the Left Atrial Segmentation dataset, our method surpasses BCP by 1.43% Jaccard while drastically reducing the FLOPs by 90.8% and parameters by 85.8%. Code is released at this https URL.


[63] 2603.23476

Index-Based Scheduling for a Resource-Constrained Quantum Switch

We consider a quantum switch with a finite number of quantum memory registers that aims to serve multipartite entanglement requests among $N$ users. We propose scheduling policies that aim to optimize the average number of requests served per unit time by efficiently utilizing the switch's available memory. To measure the performance of the scheduling policies, we employ the newly introduced metric of age of entanglement establishment (AoEE). We formulate the scheduling problem in a restless multi-armed bandit (RMAB) framework. We show that the scheduling of entanglement requests is indexable. Subsequently, we find a closed-form expression of the Whittle index for all possible request-age pairs. By modeling the Whittle index of each request as its reward and its cardinality as its cost, we formulate the memory-constrained scheduling problem as a $0$-$1$ knapsack problem and solve it via dynamic programming. Furthermore, we consider two low-complexity sequential greedy policies that leverage two different modified Whittle indices.
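The knapsack step can be sketched with the textbook 0-1 knapsack dynamic program. The Whittle index values, request cardinalities, and memory capacity below are hypothetical; the sketch also assumes that each user in a request occupies one memory register, so a request's cardinality is its integer cost.

```python
def knapsack_01(values, weights, capacity):
    """Classic 0-1 knapsack by dynamic programming over integer capacities.

    best[i][c] is the largest total value achievable with the first i items
    and capacity c. Returns (best total value, sorted indices of chosen items).
    """
    n = len(values)
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]                    # skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v        # take item i-1
    chosen, c = [], capacity
    for i in range(n, 0, -1):                              # trace back the choices
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


# Hypothetical requests: Whittle indices as rewards, cardinalities as costs.
whittle = [3.2, 1.5, 2.8, 0.9]
cardinality = [3, 2, 2, 2]
best, chosen = knapsack_01(whittle, cardinality, capacity=5)
print(best, chosen)
```

The DP runs in O(N·M) time for N requests and M memory registers, which is pseudo-polynomial but small for realistic switch memories; the greedy policies mentioned above trade this exactness for lower complexity.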


[64] 2305.07144

Survey on Integrated Sensing and Communication Performance Modeling and Use Cases Feasibility

As the research community starts to address the key features of 6G cellular standards, one of the agreed bridge topics to be studied already in 5G-Advanced releases is Integrated Sensing and Communication (ISAC). The first efforts of the research community are focusing on ISAC enablers, fundamental limits, and first demonstrators, which show that the time has come for the deployment of sensing functionalities in cellular standards. This survey paper takes a needed step towards ISAC deployment, providing an analytical toolkit to model cellular systems' sensing performance, accounting for both their fundamental and practical constraints. We then elaborate on the likely features of 6G systems to provide the feasible sensing key performance indicators (KPIs) in the frequency ranges spanned by cellular networks, including the potential new bands available in 6G, the Frequency Range 3 (FR3). We further validate our framework by visually investigating ISAC constraints with simulation examples. Finally, we assess the feasibility of a few selected scenarios that can be enabled by ISAC, highlighting in each of them the limiting factor and, thus, which gaps should be filled by the research and standardization communities in the next years.


[65] 2405.00141

RIS-aided Wireless Communication with Movable Elements Geometry Impact on Performance

Reconfigurable Intelligent Surfaces (RIS) are known as a promising technology to improve the performance of wireless communication networks, and have been extensively studied. Movable Antennas (MA) are a novel technology that fully exploits antenna placement to enhance system performance. This article aims to evaluate the impact of transmit power and number of antenna elements on the outage probability performance of an MA-enabled RIS structure (MA-RIS), compared to the existing Fixed-Position Antenna RIS (FPA-RIS). The change in geometry caused by the movement of antennas and its implications for the effective number of illuminated elements are studied for 1D and 2D array structures. Our numerical results confirm the performance advantage provided by MA-RIS, achieving a 24\% improvement in outage probability and a 2 dB gain in Signal-to-Noise Ratio (SNR) compared to FPA-RIS.


[66] 2406.19342

Unconditional Stability Analysis of N-Port Networks Based on Structured Singular Value Computation

In this paper, a novel approach based on robust stability concepts and tools is introduced to evaluate the unconditional stability of microwave active $n$-port devices. An efficient calculation of the Structured Singular Value of the $n \times n$ scattering matrix is proposed to obtain the stability characteristics of the device. The presented method is validated in two ways. First, it is applied to a reference $4 \times 4$ scattering parameter set for independent verification. Second, the method is applied to a 4-port GaAs FET amplifier fabricated in hybrid technology. The results confirm the validity and computational efficiency of the proposed approach.


[67] 2410.05907

Privacy-Enhanced Over-the-Air Federated Learning via Client-Driven Power Balancing

This paper introduces a novel privacy-enhanced over-the-air Federated Learning (OTA-FL) framework using client-driven power balancing (CDPB) to address privacy concerns in OTA-FL systems. In existing approaches, a server determines the power balancing based on the continuous transmission of channel state information (CSI) from each client. Furthermore, these approaches focus on fulfilling privacy requirements in every global iteration, which can heighten the risk of privacy exposure as the learning process extends. To mitigate these risks, we propose two CDPB strategies, CDPB-n (noisy) and CDPB-i (idle), which allow clients to adjust their transmission power independently, without sharing CSI. CDPB-n transmits noise during poor conditions, while CDPB-i pauses transmission until conditions improve. To further enhance privacy and learning efficiency, we propose a mixed strategy, CDPB-mixed, which combines CDPB-n and CDPB-i. Our experimental results show that CDPB outperforms traditional approaches in terms of model accuracy and privacy guarantees, providing a practical solution for enhancing OTA-FL in resource-constrained environments.


[68] 2410.19843

Artificial intelligence for partial differential equations in computational mechanics: A review

In recent years, artificial intelligence (AI) has become ubiquitous, empowering various fields; in particular, the integration of AI with traditional science (AI for Science) has attracted widespread attention. Within AI for Science, using AI algorithms to solve partial differential equations (AI for PDEs) has become a focal point in computational mechanics. The core of AI for PDEs is the fusion of data and PDEs, which can solve almost any PDE. This article therefore provides a comprehensive review of research on AI for PDEs, summarizing existing algorithms and theories. The article discusses the applications of AI for PDEs in computational mechanics, including solid mechanics, fluid mechanics, and biomechanics. Existing AI for PDEs algorithms include those based on Physics-Informed Neural Networks (PINNs), Deep Energy Methods (DEM), Operator Learning, and Physics-Informed Neural Operators (PINO). AI for PDEs represents a new method of scientific simulation that provides approximate solutions to specific problems by learning from large amounts of data and then fine-tuning according to specific physical laws, avoiding the need to compute from scratch as traditional algorithms do. Thus, AI for PDEs is the prototype for future foundation models in computational mechanics, capable of significantly accelerating traditional numerical algorithms.


[69] 2412.04802

Unsupervised Hyperspectral Image Super-Resolution via Self-Supervised Modality Decoupling

Fusion-based hyperspectral image super-resolution aims to fuse low-resolution hyperspectral images (LR-HSIs) and high-resolution multispectral images (HR-MSIs) to reconstruct images with both high spatial and high spectral resolution. Current methods typically fuse the two modalities directly without effective supervision, leading to an incomplete perception of deep modality-complementary information and a limited understanding of inter-modality correlations. To address these issues, we propose a simple yet effective solution for unsupervised hyperspectral and multispectral image fusion (HMIF), revealing that modality decoupling is key to improving fusion performance. Specifically, we propose an end-to-end self-supervised Modality-Decoupled Spatial-Spectral Fusion (MossFuse) framework that decouples shared and complementary information across modalities and aggregates a concise representation of both LR-HSIs and HR-MSIs to reduce modality redundancy. We also introduce a subspace clustering loss as a clear guide to decouple modality-shared features from modality-complementary ones. Systematic experiments over multiple datasets demonstrate that our simple and effective approach consistently outperforms existing HMIF methods while requiring considerably fewer parameters and less inference time. The source code is available at \href{this https URL}{MossFuse}.


[70] 2506.11293

Influence Functions for Data Attribution in Linear System Identification and LQR Control

When a controller is designed from an identified model, its performance ultimately depends on the trajectories used for identification, but pinpointing which ones help or hurt remains an open problem. We bring influence functions, a data attribution tool from machine learning, into this setting by chaining two closed-form sensitivity analyses across a regularized least-squares identification and infinite-horizon LQR pipeline. On the identification side, the quadratic loss admits an exact leave-one-trajectory-out (LOTO) parameter shift and a reusable first-order approximation with a Neumann series error bound. On the control side, we implicitly differentiate through the discrete algebraic Riccati equation (DARE) via its discrete Lyapunov structure and compress the cost gradient to a single adjoint Lyapunov solve. The resulting scores track true LOTO retraining with Pearson correlations above 0.99 and speedups of 7 to 60 times on linear systems of dimension 2 to 10.
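On the identification side, an exact leave-one-trajectory-out shift for ridge-regularized least squares follows from the standard block-deletion (Woodbury) identity; the sketch below assumes that identity is representative of the paper's exact shift, which is not confirmed here. The system matrices, noise level, regularization weight `LAM`, and trajectory counts are hypothetical, and the LQR/DARE side is not shown.

```python
import numpy as np


def ridge_fit(X, Y, lam):
    """Regularized least squares: theta = (X^T X + lam I)^{-1} X^T Y."""
    G = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(G, X.T @ Y), G


def loto_shift(X_j, Y_j, theta, G):
    """Exact change in theta when trajectory j's rows (X_j, Y_j) are removed.

    Block-deletion (Woodbury) identity:
        theta_{-j} - theta = -G^{-1} X_j^T (I - H_j)^{-1} (Y_j - X_j theta),
    with H_j = X_j G^{-1} X_j^T. No refitting is needed.
    """
    Ginv_XjT = np.linalg.solve(G, X_j.T)
    H_j = X_j @ Ginv_XjT
    resid = Y_j - X_j @ theta
    return -Ginv_XjT @ np.linalg.solve(np.eye(H_j.shape[0]) - H_j, resid)


def make_trajectories(n_traj=5, horizon=20, seed=0):
    """Hypothetical data from x+ = A x + B u + w, regressors z = [x, u]."""
    rng = np.random.default_rng(seed)
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    trajs = []
    for _ in range(n_traj):
        x = rng.normal(size=2)
        Z, T = [], []
        for _ in range(horizon):
            u = rng.normal(size=1)
            x_next = A @ x + B @ u + 0.01 * rng.normal(size=2)
            Z.append(np.concatenate([x, u]))
            T.append(x_next)
            x = x_next
        trajs.append((np.array(Z), np.array(T)))
    return trajs


LAM = 1e-2
trajs = make_trajectories()
X = np.vstack([Z for Z, _ in trajs])
Y = np.vstack([T for _, T in trajs])
theta, G = ridge_fit(X, Y, LAM)          # theta stacks [A B]^T, shape (3, 2)
shift_0 = loto_shift(*trajs[0], theta, G)
print(shift_0)
```

Each shift costs one solve with G and one small solve of size equal to the trajectory length, instead of a full refit per trajectory.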


[71] 2506.16210

From Coarse to Continuous: Progressive Refinement Implicit Neural Representation for Motion-Robust Anisotropic MRI Reconstruction

In motion-robust magnetic resonance imaging (MRI), slice-to-volume reconstruction is critical for recovering anatomically consistent 3D brain volumes from 2D slices, especially under accelerated acquisitions or patient motion. However, this task remains challenging due to hierarchical structural disruptions, including local detail loss from k-space undersampling, global structural aliasing caused by motion, and volumetric anisotropy. Therefore, we propose a progressive refinement implicit neural representation (PR-INR) framework. Our PR-INR unifies motion correction, structural refinement, and volumetric synthesis within a geometry-aware coordinate space. Specifically, a motion-aware diffusion module is first employed to generate coarse volumetric reconstructions that suppress motion artifacts and preserve global anatomical structures. Then, we introduce an implicit detail restoration module that performs residual refinement by aligning spatial coordinates with visual features, correcting local structures and enhancing boundary precision. Further, a voxel continuous-aware representation module represents the image as a continuous function over 3D coordinates, enabling accurate inter-slice completion and high-frequency detail recovery. We evaluate PR-INR on five public MRI datasets under various motion conditions (3% and 5% displacement), undersampling rates (4x and 8x), and slice resolutions (scale = 5). Experimental results demonstrate that PR-INR outperforms state-of-the-art methods in both quantitative reconstruction metrics and visual quality. It further shows generalization and robustness across diverse unseen domains.


[72] 2507.00571

Delay Bound Relaxation with Deep Learning-based Haptic Estimation for Tactile Internet

Haptic teleoperation in the Tactile Internet typically demands sub-millisecond latency and ultra-high reliability (99.999%). At a 1 kHz haptic signal sampling rate, this translates into an extremely high packet transmission rate, posing significant challenges for timely delivery and introducing substantial complexity and overhead in radio resource allocation. To address this critical challenge, we introduce a novel deep learning (DL) model that estimates force feedback using multi-modal input, i.e., both force measurements from the remote side and local operator motion signals. The DL model captures complex temporal features of haptic time series using CNN and LSTM layers followed by a transformer encoder, and autoregressively produces a highly accurate estimate of the next force values for different teleoperation activities. By ensuring that the estimation error stays within a predefined threshold, the teleoperation system can safely relax its strict delay requirements. This enables the batching and transmission of multiple haptic packets within a single resource block, improving resource efficiency and facilitating scheduling in resource allocation. Through extensive simulations, we evaluated network performance in terms of reliability and capacity. Results show that, for both dynamic and rigid object interactions, the proposed method increases the number of reliably served users by up to 66%.


[73] 2507.12703

Joint Price and Power MPC for Peak Power Reduction at Workplace EV Charging Stations

Demand charge, a utility fee based on an electricity customer's peak power consumption, often constitutes a significant portion of costs for commercial electric vehicle (EV) charging station operators. This paper explores control methods to reduce peak power consumption at workplace EV charging stations in a joint price and power optimization framework. We optimize a menu of price options to incentivize users to select controllable charging service. Using this framework, we propose a model predictive control approach to reduce both demand charge and overall operator costs. Through a Monte Carlo simulation, we find that our algorithm outperforms a state-of-the-art benchmark optimization strategy and can significantly reduce station operator costs.


[74] 2509.11467

A Goal-Oriented Approach for Active Object Detection with Exploration-Exploitation Balance

Active object detection, which aims to identify objects of interest through controlled camera movements, plays a pivotal role in real-world visual perception for autonomous robotic applications, such as manufacturing tasks (e.g., assembly operations) performed in unknown environments. A dual control for exploration and exploitation (DCEE) algorithm is presented within goal-oriented control systems to achieve efficient active object detection, leveraging active learning by incorporating variance-based uncertainty estimation in the cost function. This novel method employs an exploration-exploitation balanced cost function to actively guide the selection of the next viewpoint. Specifically, active object detection is achieved through the development of a reward function that encodes knowledge about the confidence variation of objects as a function of viewpoint position within a given domain. By identifying the unknown parameters of this function, the system generates an optimal viewpoint planning strategy. DCEE integrates parameter estimation of the reward function and view planning, ensuring a balanced trade-off between the exploitation of learned knowledge and active exploration during the planning process. Moreover, it demonstrates remarkable adaptability across diverse scenarios, effectively handling LEGO brick detection at varying locations. Importantly, the algorithm maintains consistent configuration settings and a fixed number of parameters across various scenarios, underscoring its efficiency and robustness. To validate the proposed approach, extensive numerical studies, high-fidelity virtual simulations, and real-world experiments under various scenarios were conducted. The results confirm the effectiveness of DCEE in active object detection, showcasing superior performance compared to existing methods, including model predictive control (MPC) and entropy approaches.


[75] 2509.19668

Selective Classifier-free Guidance for Zero-shot Text-to-speech

In zero-shot text-to-speech, achieving a balance between fidelity to the target speaker and adherence to text content remains a challenge. While classifier-free guidance (CFG) strategies have shown promising results in image generation, their application to speech synthesis remains underexplored. Separating the conditions used for CFG enables trade-offs between different desired characteristics in speech synthesis. In this paper, we evaluate the adaptability of CFG strategies originally developed for image generation to speech synthesis and extend separated-condition CFG approaches to this domain. Our results show that CFG strategies effective in image generation generally fail to improve speech synthesis. We also find that we can improve speaker similarity while limiting degradation of text adherence by applying standard CFG during early timesteps and switching to selective CFG only in later timesteps. Surprisingly, we observe that the effectiveness of a selective CFG strategy is highly dependent on the text representation, as differences between English and Mandarin can lead to different results even with the same model.
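The switching schedule described in this abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the assumption that the "selective" phase uses a text-only prediction as its reference (so that guidance isolates the speaker prompt) is ours, since the abstract does not say which condition is separated.

```python
import numpy as np


def cfg(eps_ref, eps_cond, w):
    """Classifier-free guidance: extrapolate from a reference prediction
    toward the conditional prediction with guidance scale w."""
    return eps_ref + w * (eps_cond - eps_ref)


def guided_prediction(eps_full, eps_uncond, eps_text_only, step, switch_step, w):
    """Hypothetical two-phase schedule.

    step counts sampling iterations from 0 (first, noisiest) upward.
    Early steps: standard CFG with the unconditional prediction as reference.
    Later steps: selective CFG, where the reference keeps the text condition,
    so the guidance direction isolates the (assumed) speaker-prompt contribution.
    """
    if step < switch_step:
        return cfg(eps_uncond, eps_full, w)
    return cfg(eps_text_only, eps_full, w)


# Toy stand-ins for three network predictions at a single sampling step.
eps_full = np.array([1.0, 1.0])
eps_uncond = np.array([0.0, 0.0])
eps_text_only = np.array([0.5, 0.5])

early = guided_prediction(eps_full, eps_uncond, eps_text_only, step=2, switch_step=10, w=2.0)
late = guided_prediction(eps_full, eps_uncond, eps_text_only, step=20, switch_step=10, w=2.0)
print(early, late)  # [2. 2.] [1.5 1.5]
```

In practice each branch is a separate forward pass of the same model with different conditions dropped, so the selective phase costs no more than standard CFG when only two branches are evaluated per step.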


[76] 2510.00298

Observer-Usable Information as a Task-specific Image Quality Metric

Objective, task-based measures of image quality (IQ) have been widely advocated for assessing and optimizing medical imaging technologies. Besides signal detection theory-based measures, information-theoretic quantities have been proposed to quantify task-based IQ. For example, task-specific information (TSI), defined as the mutual information between an image and a task variable, represents an optimal measure of how informative an image is for performing a specified task. However, like the ideal observer from signal detection theory, TSI does not quantify the amount of task-relevant information in an image that can be exploited by a sub-ideal observer. A recently proposed relaxation of TSI, termed predictive V-information (V-info), removes this limitation and can quantify the utility of an image with consideration of a specified family of sub-ideal observers. In this study, for the first time, we introduce and investigate V-info as an objective, task-specific IQ metric. To corroborate its usefulness, a stylized magnetic resonance image restoration problem is considered in which V-info is employed to quantify signal detection or discrimination performance. The presented results show that V-info correlates with the area under the receiver operating characteristic (ROC) curve for binary tasks, while being readily applicable to multi-class (>2) tasks where ROC analysis is challenging. Notably, V-info exhibits greater sensitivity in scenarios where conventional metrics saturate. These findings demonstrate that V-info represents a new objective IQ measure that can complement conventional signal detection theory-based ones.
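For reference, the usual definition of predictive V-information with respect to a predictive family $\mathcal{V}$ (here, the family of sub-ideal observers) is shown below, with $X$ the image and $Y$ the task variable. When $\mathcal{V}$ contains all predictors, $I_{\mathcal{V}}$ reduces to the Shannon mutual information, i.e., TSI; restricting $\mathcal{V}$ yields the sub-ideal-observer version discussed above.

```latex
\begin{aligned}
H_{\mathcal{V}}(Y \mid X) &= \inf_{f \in \mathcal{V}} \mathbb{E}_{x,y}\left[-\log f[x](y)\right], \\
H_{\mathcal{V}}(Y \mid \varnothing) &= \inf_{f \in \mathcal{V}} \mathbb{E}_{y}\left[-\log f[\varnothing](y)\right], \\
I_{\mathcal{V}}(X \to Y) &= H_{\mathcal{V}}(Y \mid \varnothing) - H_{\mathcal{V}}(Y \mid X).
\end{aligned}
```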


[77] 2511.08416

Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications

Semantic communications mark a paradigm shift from bit-accurate transmission toward meaning-centric communication, essential as wireless systems approach theoretical capacity limits. The emergence of generative AI has catalyzed generative semantic communications, where receivers reconstruct content from minimal semantic cues by leveraging learned priors. Among generative approaches, diffusion models stand out for their superior generation quality, stable training dynamics, and rigorous theoretical foundations. However, the field currently lacks systematic guidance connecting diffusion techniques to communication system design, forcing researchers to navigate disparate literatures. This article provides the first comprehensive tutorial on diffusion models for generative semantic communications. We present score-based diffusion foundations and systematically review three technical pillars: conditional diffusion for controllable generation, efficient diffusion for accelerated inference, and generalized diffusion for cross-domain adaptation. In addition, we introduce an inverse problem perspective that reformulates semantic decoding as posterior inference, bridging semantic communications with computational imaging. Through analysis of human-centric, machine-centric, and agent-centric scenarios, we illustrate how diffusion models enable extreme compression while maintaining semantic fidelity and robustness. By bridging generative AI innovations with communication system design, this article aims to establish diffusion models as foundational components of next-generation wireless networks and beyond.
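The inverse-problem view mentioned above typically enters diffusion sampling through Bayes' rule on the score, with $x_t$ the noisy sample at step $t$ and $y$ the received semantic cue. The second line is one common approximation (used, e.g., in diffusion posterior sampling) under an assumed linear Gaussian observation model $y = A x_0 + n$, $n \sim \mathcal{N}(0, \sigma^2 I)$, where $\hat{x}_0(x_t)$ is the denoiser's estimate of the clean signal; the tutorial may use other formulations.

```latex
\begin{aligned}
\nabla_{x_t} \log p_t(x_t \mid y) &= \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log p_t(y \mid x_t), \\
\nabla_{x_t} \log p_t(y \mid x_t) &\approx -\frac{1}{2\sigma^2} \nabla_{x_t} \left\lVert y - A\,\hat{x}_0(x_t) \right\rVert_2^2 .
\end{aligned}
```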


[78] 2511.13971

On the Impact of Voltage Unbalance on Distribution Locational Marginal Prices

Finding clear economic signals for distribution-network operation and expansion is increasingly important as single-phase loads and distributed energy resources proliferate. These devices create phase-to-phase imbalances that manifest as voltage unbalance, a power quality issue that accelerates insulation aging in machines and increases network losses, thereby raising costs for operators and consumers. Traditional grid codes address unbalance via hard limits on various unbalance indices, with thresholds that differ across standards; such limits offer no dynamic economic incentive and undermine optimality. This paper instead proposes treating voltage unbalance as a "soft limit" by adding penalty terms to the grid operation costs within a three-phase optimal power flow, reflecting the cost of the reduced lifetime of assets subjected to voltage unbalance. This unified approach yields dynamic economic signals, namely unbalance-aware Distribution Locational Marginal Prices (DLMPs), that reflect the cost of power quality deviations. A novel mathematical decomposition of the DLMP is developed, isolating the energy, loss, congestion, and unbalance components. Case studies conducted on two benchmark networks demonstrate the effectiveness and practical value of the proposed method. The results indicate that unbalance penalties reshape nodal prices, produce unexpected phase-level effects, and even allow scenarios where added load reduces unbalance and lowers costs, while providing planners and market designers with actionable insights to balance investment, operation, and power quality in modern distribution systems.


[79] 2511.22952

RDS-DeePC: Robust Data Selection for Data-Enabled Predictive Control via Sensitivity Score

Data-Enabled Predictive Control (DeePC) is an established model-free approach to predictive control, but it faces two open challenges: computational complexity that scales cubically with dataset size, and performance degradation when data are corrupted. This paper introduces Robust Data Selection DeePC (RDS-DeePC), a framework that addresses both obstacles through influence function analysis. We derive a sensitivity score quantifying the leverage each trajectory segment exerts on the optimization solution and prove that high-sensitivity segments correspond to outliers while low-sensitivity segments represent consistent data. Selecting low-sensitivity segments thus yields both computational efficiency and automatic outlier filtering without requiring data quality labels. For nonlinear systems, we extend the framework via a two-stage online selection approach accelerated by the LiSSA algorithm. Experiments on four systems of increasing complexity, namely a DC motor, an inverted pendulum, a planar quadrotor UAV tracking a figure-8 trajectory, and a kinematic bicycle vehicle following a figure-8 path, demonstrate that RDS-DeePC achieves 94 to 97 percent clean data selection and comparable or better tracking performance under 20 percent data corruption.
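The core idea, that samples with high influence on an optimization solution tend to be outliers, can be illustrated with the classic empirical influence function of ordinary least squares. This is a swapped-in technique, not the paper's DeePC sensitivity score: the function names are hypothetical, there are no Hankel matrices or predictive-control problem here, and the linear data are synthetic.

```python
import numpy as np


def influence_scores(X, y):
    """Norm of each sample's empirical influence on the least-squares
    solution, ||(X^T X)^{-1} x_i r_i||. Large values flag samples that
    combine leverage with a large residual."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ theta
    H_inv = np.linalg.inv(X.T @ X)
    # Row i of X * r is x_i r_i; right-multiplying by the symmetric H_inv
    # gives (H_inv x_i r_i)^T.
    infl = (X * residuals[:, None]) @ H_inv
    return np.linalg.norm(infl, axis=1)


def select_low_sensitivity(X, y, k):
    """Keep the indices of the k samples with the smallest influence scores."""
    scores = influence_scores(X, y)
    return np.sort(np.argsort(scores)[:k])


# Synthetic line y = 2x plus small noise, with five grossly corrupted samples.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 * x + 0.1 * rng.standard_normal(100)
outliers = [10, 30, 50, 70, 90]
y[outliers] += 50.0

keep = select_low_sensitivity(X, y, 80)
print(sorted(set(outliers) & set(keep.tolist())))  # []
```

Selecting a subset also shrinks the problem, which is where the computational saving comes from when the solve cost grows cubically with the number of retained samples.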


[80] 2512.19703

ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval

The dominant paradigm for Audio-Text Retrieval (ATR) relies on dual-encoder architectures optimized via mini-batch contrastive learning. However, restricting optimization to local in-batch samples creates a fundamental limitation we term the Gradient Locality Bottleneck (GLB), which prevents the resolution of acoustic ambiguities and hinders the learning of rare long-tail concepts. While external knowledge injection can break this bottleneck, it often triggers a problem called Representation-Drift Mismatch (RDM), where a static knowledge base becomes misaligned with evolving encoders, degrading guidance into noise. To address these intertwined challenges, we propose the Adaptive Self-improving Knowledge (ASK) framework. ASK breaks the GLB via multi-grained knowledge injection and mitigates RDM through a dynamic refinement strategy that synchronizes the knowledge base with the model. Additionally, an adaptive reliability weighting scheme is employed to filter retrieval noise based on cross-modal consistency. Extensive experiments across multiple benchmarks demonstrate that ASK consistently achieves new state-of-the-art performance across various backbones.


[81] 2601.20561

Tilt-based Aberration Estimation in Transmission Electron Microscopy

Transmission electron microscopes (TEMs) enable atomic-scale imaging but suffer from aberrations caused by lens imperfections and environmental conditions, reducing image quality. These aberrations can be compensated by adjusting electromagnetic lenses, but this requires accurate estimates of the aberration coefficients, which can drift over time. This paper introduces a method for the estimation of aberrations in TEM by leveraging the relationship between an induced tilt of the electron beam and the resulting image shift. The method uses a Kalman filter (KF) to estimate the aberration coefficients from a sequence of image shifts, while accounting for the drift of the aberrations over time. The applied tilt sequence is optimized by minimizing the trace of the predicted error covariance in the KF, which corresponds to the A-optimality criterion in experimental design. We show that this optimization can be performed offline, as the cost criterion is independent of the actual measurements. The resulting non-convex optimization problem is solved using a gradient-based, receding-horizon approach with multi-starts. Additionally, we develop an approach to estimate specimen-dependent noise properties using expectation maximization (EM), which are then used to tailor the tilt pattern optimization to the specific specimen being imaged. The proposed method is validated on a real TEM set-up with several optimized tilt patterns. The results show that optimized patterns significantly outperform naive approaches and that the aberration and drift model accurately captures the underlying physical phenomena. A direct comparison with the widely used Zemlin tableau shows that the proposed method achieves comparable or higher image quality on amorphous specimens, while additionally extending to non-amorphous specimens where the Zemlin tableau cannot operate.
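The observation that the tilt design can be done offline follows from the Kalman covariance recursion, which never uses the measured image shifts. The sketch below illustrates this with a greedy, grid-based A-optimal design. This is a simplification of the paper's gradient-based receding-horizon multi-start solver, and the scalar tilt and the regressor `[t, t^2, t^3]` are hypothetical stand-ins for the true tilt-to-image-shift model.

```python
import numpy as np


def regressor(tilt):
    """Hypothetical measurement row: image shift modeled as linear in three
    aberration coefficients, with tilt-dependent weights."""
    return np.array([[tilt, tilt**2, tilt**3]])


def kf_covariance_step(P, H, Q, R):
    """One Kalman predict/update of the error covariance only.
    No measurement value appears, so the design can be computed offline."""
    P_pred = P + Q  # random-walk drift of the aberration coefficients
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    return (np.eye(P.shape[0]) - K @ H) @ P_pred


def greedy_tilt_sequence(P0, candidates, steps, Q, R):
    """A-optimality, one step at a time: pick the candidate tilt whose
    update gives the smallest trace of the error covariance."""
    P, seq = P0, []
    for _ in range(steps):
        best = min(candidates,
                   key=lambda t: np.trace(kf_covariance_step(P, regressor(t), Q, R)))
        P = kf_covariance_step(P, regressor(best), Q, R)
        seq.append(float(best))
    return seq, P


P0 = np.eye(3)
Q = 1e-4 * np.eye(3)
R = np.array([[1e-2]])
candidates = np.linspace(-1.0, 1.0, 21)
seq, P = greedy_tilt_sequence(P0, candidates, steps=6, Q=Q, R=R)
print(np.round(seq, 2), np.trace(P))
```

A zero tilt carries no information in this toy model (its regressor is zero), so the design naturally avoids it; the greedy choice is myopic, which is one reason a receding-horizon optimization over several future tilts can do better.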


[82] 2602.12288

Energy-Aware Reinforcement Learning for Robotic Manipulation of Articulated Components in Infrastructure Operation and Maintenance

With the growth of intelligent civil infrastructure and smart cities, operation and maintenance (O&M) increasingly requires safe, efficient, and energy-conscious robotic manipulation of articulated components, including access doors, service drawers, and pipeline valves. However, existing robotic approaches either focus primarily on grasping or target object-specific articulated manipulation, and they rarely incorporate explicit actuation energy into multi-objective optimisation, which limits their scalability and suitability for long-term deployment in real O&M settings. Therefore, this paper proposes an articulation-agnostic and energy-aware reinforcement learning framework for robotic manipulation in intelligent infrastructure O&M. The method combines part-guided 3D perception, weighted point sampling, and PointNet-based encoding to obtain a compact geometric representation that generalises across heterogeneous articulated objects. Manipulation is formulated as a Constrained Markov Decision Process (CMDP), in which actuation energy is explicitly modelled and regulated via a Lagrangian-based constrained Soft Actor-Critic scheme. The policy is trained end-to-end under this CMDP formulation, enabling effective articulated-object operation while satisfying a long-horizon energy budget. Experiments on representative O&M tasks demonstrate 16%-30% reductions in energy consumption, 16%-32% fewer steps to success, and consistently high success rates, indicating a scalable and sustainable solution for infrastructure O&M manipulation.
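The Lagrangian treatment of the energy budget in a CMDP generally reduces to two pieces: the policy optimizes a penalized reward, and the multiplier is updated by projected dual ascent. A minimal sketch, assuming an average-episode-energy constraint; the function names and learning rate are hypothetical, and the Soft Actor-Critic machinery (critics, entropy term) is omitted.

```python
def lagrangian_reward(reward, energy, lam):
    """Penalized reward seen by the policy: r - lambda * c."""
    return reward - lam * energy


def update_multiplier(lam, avg_episode_energy, budget, lr):
    """Projected dual ascent: raise lambda when the energy budget is
    exceeded, lower it (never below zero) when there is slack."""
    return max(0.0, lam + lr * (avg_episode_energy - budget))


lam = 0.0
lam = update_multiplier(lam, avg_episode_energy=12.0, budget=10.0, lr=0.1)  # over budget
print(lam, lagrangian_reward(1.0, 2.0, lam))
```

Because the multiplier only grows while the constraint is violated, the energy penalty self-tunes instead of requiring a hand-picked weight in a multi-objective reward.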


[83] 2603.05169

Uncertainty and Autarky: Cooperative Game Theory for Stable Local Energy Market Partitioning

Local energy markets empower prosumers to form coalitions for energy trading. However, the optimal partitioning of the distribution grid into such coalitions remains unclear, especially in constrained grids with stochastic production and consumption. This analysis must take into account the interests of both the grid operator and the constituent prosumers. In this work, we present a cooperative game-theoretic framework to study distribution grid partitioning into local energy market coalitions under uncertain prosumption and grid constraints. We formulate the optimal stable partitioning problem to balance the interests of the grid operator with those of the prosumers. Under deterministic load and generation, we show that the largest market coalition is the optimal stable partition. For the case of stochastic loads and generation, we provide an algorithm to evaluate the optimal stable partition. Numerical experiments are performed on benchmark and real-world distribution grids. Our results help in understanding how uncertainty affects local energy market partitioning decisions in constrained distribution grids.


[84] 2603.10138

Data-Driven Successive Linearization for Optimal Voltage Control

Power distribution systems are increasingly exposed to large voltage fluctuations driven by intermittent renewable generation and time-varying loads (e.g., electric vehicles and storage). To address this challenge, a number of advanced controllers have been proposed for voltage regulation. However, these controllers typically rely on fixed linear approximations of voltage dynamics. As a result, their solutions may become infeasible when applied to the actual voltage behavior governed by nonlinear power flow equations, particularly under heavy power injection from distributed energy resources. This paper proposes a data-driven successive linearization approach for voltage control under nonlinear power flow constraints. By leveraging the fact that the deviation between the nonlinear power flow solution and its linearization is bounded by the distance from the operating point, we perform data-driven linearization around the most recent operating point. Convergence of the proposed method to a neighborhood of KKT points is established by exploiting the convexity of the objective function and structural properties of the nonlinear constraints. Case studies show that the proposed approach achieves fast convergence and adapts quickly to changes in net load.


[85] 2603.10845

Human Presence Detection via Wi-Fi Range-Filtered Doppler Spectrum on Commodity Laptops

Human Presence Detection (HPD) is key to enabling intelligent power management and security features in everyday devices. In this paper, we propose the first HPD solution that leverages monostatic Wi-Fi sensing and detects user position using only the built-in Wi-Fi hardware of a device, with no need for external devices, access points, or additional sensors. In contrast, existing HPD solutions for laptops require external dedicated sensors, which add cost and complexity, or rely on camera-based approaches that introduce significant privacy concerns. We introduce the Range-Filtered Doppler Spectrum (RF-DS), a novel Wi-Fi sensing technique for presence estimation that enables both range-selective and temporally windowed detection of user presence. By applying targeted range-area filtering in the Channel Impulse Response (CIR) domain before Doppler analysis, our method focuses processing on task-relevant spatial zones, significantly reducing computational complexity. In addition, the use of temporal windows in the spectrum domain provides greater estimator stability than conventional 2D Range-Doppler detectors. Furthermore, we propose an adaptive multi-rate processing framework that dynamically adjusts Channel State Information (CSI) sampling rates, operating at low frame rates (10 Hz) during idle periods and at high rates (100 Hz) only when motion is detected. To our knowledge, this is the first low-complexity solution for occupancy detection using monostatic Wi-Fi sensing on the built-in Wi-Fi network interface controller (NIC) of a commercial off-the-shelf laptop that requires no external network infrastructure or specialized sensors. Our solution can scale across different environments and devices without calibration or retraining.
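The order of operations in RF-DS (range gating in the CIR domain first, Doppler analysis second) can be sketched with NumPy FFTs. This is a minimal sketch under stated assumptions: the CIR is obtained by an IFFT of per-frame CSI across subcarriers, the gate is a hypothetical tap window, gated taps are simply summed into one slow-time signal, and the paper's temporal windowing in the spectrum domain is not modeled.

```python
import numpy as np


def range_filtered_doppler(csi, tap_window, fs):
    """csi: (frames, subcarriers) complex CSI sampled at fs frames/s.

    1) IFFT over subcarriers -> CIR taps (range bins)
    2) keep only taps inside tap_window (range gating)
    3) sum the gated taps per frame -> slow-time signal
    4) FFT over frames -> Doppler spectrum
    """
    cir = np.fft.ifft(csi, axis=1)
    lo, hi = tap_window
    slow_time = cir[:, lo:hi].sum(axis=1)
    slow_time = slow_time - slow_time.mean()  # suppress static (zero-Doppler) clutter
    spec = np.fft.fftshift(np.fft.fft(slow_time))
    freqs = np.fft.fftshift(np.fft.fftfreq(len(slow_time), d=1.0 / fs))
    return freqs, np.abs(spec)


def synthetic_path(tap, doppler_hz, n_frames, n_sc, fs):
    """Single propagation path at an integer CIR tap with a Doppler shift."""
    n = np.arange(n_frames)[:, None]
    k = np.arange(n_sc)[None, :]
    return np.exp(-2j * np.pi * k * tap / n_sc) * np.exp(2j * np.pi * doppler_hz * n / fs)


fs, n_frames, n_sc = 100.0, 100, 64
# A user close to the laptop (tap 5, +10 Hz) and a distant mover (tap 20, -30 Hz).
csi = (synthetic_path(5, 10.0, n_frames, n_sc, fs)
       + synthetic_path(20, -30.0, n_frames, n_sc, fs))
freqs, mag = range_filtered_doppler(csi, tap_window=(3, 8), fs=fs)
print(freqs[np.argmax(mag)])
```

Gating before the Doppler FFT means only one slow-time series is transformed instead of one per range bin, which is where the complexity reduction over a full 2D Range-Doppler map comes from; the distant mover never reaches the detector.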


[86] 2603.17499

A Tutorial on Learning-Based Radio Map Construction: Data, Paradigms, and Physics-Awareness

The integration of artificial intelligence into next-generation wireless networks necessitates the accurate construction of radio maps (RMs) as a foundational prerequisite for electromagnetic digital twins. An RM provides the digital representation of the wireless propagation environment, mapping complex geographical and topological boundary conditions to critical spatial-spectral metrics that range from received signal strength to full channel state information matrices. This tutorial presents a comprehensive survey of learning-based RM construction, systematically addressing three intertwined dimensions: data, paradigms, and physics-awareness. From the data perspective, we review physical measurement campaigns, ray tracing simulation engines, and publicly available benchmark datasets, identifying their respective strengths and fundamental limitations. From the paradigm perspective, we establish a core taxonomy that categorizes RM construction into source-aware forward prediction and source-agnostic inverse reconstruction, and examine five principal neural architecture families spanning convolutional neural networks, vision transformers, graph neural networks, generative adversarial networks, and diffusion models. We further survey optics-inspired methods adapted from neural radiance fields and 3D Gaussian splatting for continuous wireless radiation field modeling. From the physics-awareness perspective, we introduce a three-level integration framework encompassing data-level feature engineering, loss-level partial differential equation regularization, and architecture-level structural isomorphism. Open challenges, including foundation model development, physical hallucination detection, and amortized inference for real-time deployment, are discussed to outline future research directions.


[87] 2603.20784

Enhanced Direction-Sensing Methods and Performance Analysis in Low-Altitude Wireless Network via a Rotation Antenna Array

Due to the directive property of each antenna element, the received signal power can be severely attenuated when the emitter deviates from the array boresight, leading to severe degradation in sensing performance along that direction. Although existing rotatable-array sensing methods such as recursive-rotation Root-MUSIC (RR-Root-MUSIC) can mitigate this issue by iteratively rotating and sensing, they require several mechanical rotations and repeated eigendecompositions, resulting in high computational complexity and low time efficiency. To address this problem, a pre-rotation initialization guided by received power is proposed to significantly reduce the computational complexity and improve the time efficiency. Building on this idea, a low-complexity enhanced direction-sensing framework with pre-rotation initialization and iterative greedy spatial-spectrum search (PRI-IGSS) is developed in three stages: (1) the normal vector of the array is rotated through a set of candidate directions to find the optimal direction with the maximum sensing energy, and the corresponding DOA is computed by the Root-MUSIC algorithm; (2) the array is mechanically rotated to this initial estimated direction and kept fixed; (3) an iterative greedy spatial-spectrum search or receive beamforming method, motivated by reinforcement learning, is designed with a reduced search range, and the sum of all previous sample covariance matrices and the current one is used so that the performance gain increases as the iterations continue. To assess the performance of the proposed method, the corresponding CRLB is derived under a simplified rotation model. Simulation results demonstrate that PRI-IGSS performs much better than RR-Root-MUSIC and achieves the CRLB in terms of mean squared error, because RR-Root-MUSIC does not accumulate samples.
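Stage (3)'s accumulation of sample covariance matrices across iterations can be illustrated with a uniform linear array. This sketch swaps in spectral (grid-search) MUSIC for Root-MUSIC and omits element directivity, mechanical rotation, and the reduced search range; the array size, SNR, and snapshot counts are hypothetical.

```python
import numpy as np


def steering(theta_deg, m, spacing=0.5):
    """ULA steering vector; element spacing in wavelengths."""
    return np.exp(2j * np.pi * spacing * np.arange(m) * np.sin(np.deg2rad(theta_deg)))


def music_spectrum(R, num_sources, grid_deg, spacing=0.5):
    """Spectral MUSIC pseudo-spectrum 1 / ||E_n^H a(theta)||^2."""
    m = R.shape[0]
    _, vecs = np.linalg.eigh(R)              # eigenvalues in ascending order
    En = vecs[:, : m - num_sources]          # noise subspace
    A = np.exp(2j * np.pi * spacing * np.outer(np.arange(m),
                                               np.sin(np.deg2rad(grid_deg))))
    return 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)


rng = np.random.default_rng(1)
m, snapshots, iters, true_doa = 8, 50, 5, 20.0
grid = np.linspace(-90.0, 90.0, 1801)        # 0.1-degree grid
a = steering(true_doa, m)
noise_amp = 10 ** (-10 / 20)                 # 10 dB per-element SNR

R_acc = np.zeros((m, m), dtype=complex)
estimates = []
for _ in range(iters):
    s = (rng.standard_normal(snapshots) + 1j * rng.standard_normal(snapshots)) / np.sqrt(2)
    noise = (rng.standard_normal((m, snapshots))
             + 1j * rng.standard_normal((m, snapshots))) / np.sqrt(2) * noise_amp
    X = np.outer(a, s) + noise
    R_acc += X @ X.conj().T / snapshots      # keep summing sample covariances
    estimates.append(grid[np.argmax(music_spectrum(R_acc, 1, grid))])
print(estimates)
```

Summing covariances is equivalent (up to scale, which does not change the eigenvectors) to estimating one covariance from all snapshots collected so far, so the subspace estimate keeps improving while the array stays fixed.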


[88] 2603.21561

Digital Self-Interference Cancellation in Full-Duplex Radios: A Fundamental Limit Perspective

Digital self-interference cancellation (D-SIC) plays a crucial role in in-band full-duplex radios. Unfortunately, its fundamental limit remains unclear. In this paper, we address this problem by exploring the performance limit of the parallel Hammerstein (PH) canceller, the D-SIC structure most commonly used in practice. First, a comprehensive analysis is provided of the residual self-interference (RSI) power after the PH canceller with the least squares (LS) estimator, taking into account the truncation error, reconstruction error, and transmitter noise. Specifically, the analysis is greatly simplified by equivalently expanding the PH canceller via generalized Laguerre polynomials (GLP), whose basis functions enjoy the desirable property of mutual orthogonality. As a by-product of this orthogonal expansion, we establish that the LS estimator for the weights of the GLP canceller is asymptotically unbiased if the pilot sequence is Gaussian distributed. Second, to minimize the reconstruction error of the PH canceller, we propose a succinct criterion for optimizing the pilot sequence, which essentially seeks a small eigenvalue spread and a large minimum eigenvalue of the Gram matrix corresponding to the pilot sequence. Specifically, the criterion is to minimize the product of the Shannon rank (an effective rank of a positive semidefinite matrix) and the minimum eigenvalue of the Gram matrix. Simulation results demonstrate that, with an optimized pilot sequence of a single OFDM symbol, a gain of over 10 dB can be achieved for the PH canceller compared to the conventional pilot sequence (HE-LTF), and the corresponding RSI can be as low as -87.6 dBm.
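The two Gram-matrix quantities entering the pilot criterion can be computed as below. The Shannon rank is the exponential of the entropy of the normalized eigenvalues. The basis used here, a memoryless odd-order polynomial `x|x|^(p-1)`, is a hypothetical stand-in for the PH/GLP basis (no memory taps, no Laguerre orthogonalization). It is chosen to show a known effect: a constant-envelope pilot makes all odd-order terms collinear, collapsing the Gram matrix to rank one.

```python
import numpy as np


def shannon_rank(G):
    """Effective rank exp(H(p)), where p are the normalized eigenvalues
    of a positive semidefinite matrix."""
    eig = np.clip(np.linalg.eigvalsh(G), 0.0, None)  # clip tiny negative round-off
    p = eig / eig.sum()
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))


def gram_metrics(Phi):
    """Phi: (num_samples, num_basis) basis matrix built from the pilot.
    Returns (Shannon rank, minimum eigenvalue) of the Gram matrix Phi^H Phi."""
    G = Phi.conj().T @ Phi
    return shannon_rank(G), float(np.linalg.eigvalsh(G).min())


def poly_basis(x, orders=(1, 3, 5)):
    """Hypothetical memoryless basis x|x|^(p-1) for odd orders p."""
    return np.stack([x * np.abs(x) ** (p - 1) for p in orders], axis=1)


rng = np.random.default_rng(0)
N = 2000
bits = rng.integers(0, 2, size=(2, N))
qpsk = ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1)) / np.sqrt(2)  # constant envelope
gauss = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

sr_q, min_q = gram_metrics(poly_basis(qpsk))
sr_g, min_g = gram_metrics(poly_basis(gauss))
print(sr_q, min_q / N, sr_g, min_g / N)
```

The constant-envelope pilot yields a Shannon rank of about 1 and a near-zero minimum eigenvalue, so the canceller weights are not identifiable from it; the Gaussian pilot keeps the Gram matrix full rank, consistent with the role of Gaussian pilots in the abstract.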


[89] 2603.22131

WiRD-Gest: Gesture Recognition In The Real World Using Range-Doppler Wi-Fi Sensing on COTS Hardware

Wi-Fi sensing has emerged as a promising technique for gesture recognition, yet its practical deployment is hindered by environmental sensitivity and device placement challenges. To overcome these limitations, we propose Wi-Fi Range and Doppler (WiRD)-Gest, a novel system that performs gesture recognition using a single, unmodified Wi-Fi transceiver on a commercial off-the-shelf (COTS) laptop. The system leverages a monostatic full-duplex sensing pipeline capable of extracting Range-Doppler (RD) information. Using this pipeline, we present the first benchmark of deep learning models for gesture recognition based on monostatic sensing. The key innovation lies in how monostatic sensing and spatial (range) information fundamentally transform accuracy, robustness, and generalization compared to prior approaches. We demonstrate excellent performance in crowded, unseen public spaces with dynamic interference and additional moving targets, even when trained only on data from controlled environments. In these scenarios, where prior Wi-Fi sensing approaches often fail, our system suffers only minor degradation. The WiRD-Gest benchmark and dataset will also be released as open source.


[90] 2302.10426

An Accurate and Interpretable Framework for Trustworthy Process Monitoring

Trustworthy process monitoring seeks to build an accurate and interpretable monitoring framework, which is critical for ensuring the safety of energy conversion plants (ECPs) that operate under extreme working conditions such as high pressure and temperature. Contemporary self-attentive models, however, fall short in this domain for two main reasons. First, they rely on step-wise correlations that fail to capture physically meaningful semantics in ECP logs, resulting in suboptimal accuracy and interpretability. Second, attention matrices are frequently cluttered with spurious correlations that obscure physically meaningful ones, further impeding effective interpretation. To overcome these issues, we propose AttentionMixer, a framework that improves both the accuracy and the interpretability of existing methods and establishes a trustworthy ECP monitoring framework. Specifically, to tackle the first issue, we employ a spatial adaptive message passing block to capture variate-wise correlations. This block is coupled with a temporal adaptive message passing block through a mixing operator, yielding a multi-faceted representation of ECP logs that accounts for both step-wise and variate-wise correlations. To tackle the second issue, we employ a sparse message passing regularizer to filter out spurious correlations. We validate the efficacy of AttentionMixer using two real-world datasets from the radiation monitoring network for Chinese nuclear power plants.


[91] 2501.01921

Structural and Statistical Audio Texture Knowledge Distillation for Acoustic Classification

While knowledge distillation has shown success in various audio tasks, its application to environmental sound classification often overlooks essential low-level audio texture features needed to capture local patterns in complex acoustic environments. To address this gap, the Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) framework is proposed, which combines high-level contextual information with low-level structural and statistical audio textures extracted from intermediate layers. To evaluate its generalizability across diverse acoustic domains, SSATKD is tested on four datasets within the environmental sound classification domain, including two passive sonar datasets (DeepShip and Vessel Type Underwater Acoustic Data (VTUAD)) and two general environmental sound datasets (Environmental Sound Classification 50 (ESC-50) and Tampere University of Technology (TUT) Acoustic Scenes). Two teacher adaptation strategies are explored: classifier-head-only adaptation and full fine-tuning. The framework is further evaluated using various convolutional and transformer-based teacher models. Experimental results demonstrate consistent accuracy improvements across all datasets and settings, confirming the effectiveness and robustness of SSATKD in real-world sound classification tasks.


[92] 2501.02949

MSA-CNN: A Lightweight Multi-Scale CNN with Attention for Sleep Stage Classification

Recent advancements in machine learning-based signal analysis, coupled with open data initiatives, have fuelled efforts in automatic sleep stage classification. Despite the proliferation of classification models, few have prioritised reducing model complexity, which is a crucial factor for practical applications. In this work, we introduce Multi-Scale and Attention Convolutional Neural Network (MSA-CNN), a lightweight architecture featuring as few as ~10,000 parameters. MSA-CNN leverages a novel multi-scale module employing complementary pooling to eliminate redundant filter parameters and dense convolutions. Model complexity is further reduced by separating temporal and spatial feature extraction and using cost-effective global spatial convolutions. This separation of tasks not only reduces model complexity but also mirrors the approach used by human experts in sleep stage scoring. We evaluated both small and large configurations of MSA-CNN against nine state-of-the-art baseline models across three public datasets, treating univariate and multivariate models separately. Our evaluation, based on repeated cross-validation and re-evaluation of all baseline models, demonstrated that the large MSA-CNN outperformed all baseline models on all three datasets in terms of accuracy and Cohen's kappa, despite its significantly reduced parameter count. Lastly, we explored various model variants and conducted an in-depth analysis of the key modules and techniques, providing deeper insights into the underlying mechanisms. The code for our models, baselines, and evaluation procedures is available at this https URL.


[93] 2505.00333

Two Stage Wireless Federated LoRA Fine-Tuning with Sparsified Orthogonal Updates

Transformer-based large language models (LLMs) have achieved remarkable success across various tasks. Yet, fine-tuning such massive models in federated learning (FL) settings poses significant challenges due to resource constraints and communication overhead. Low-Rank Adaptation (LoRA) addresses these issues by training compact, low-rank matrices instead of fully fine-tuning large models. This paper introduces a wireless federated LoRA fine-tuning framework that optimizes both learning performance and communication efficiency. We provide a novel convergence analysis, revealing how LoRA rank and covariance effects influence FL training dynamics. Leveraging these insights, we propose Sparsified Orthogonal Fine-Tuning (SOFT), an adaptive sparsification method that streamlines parameter updates without expensive matrix multiplications and singular value decomposition (SVD) operations. Additionally, we present a Two Stage Federated Algorithm (TSFA) that pre-determines key parameters offline and dynamically adjusts bandwidth and sparsification online, ensuring efficient training under latency constraints. Experiments on benchmark datasets show that our approach achieves accuracy comparable to ideal scenario models while significantly reducing communication overhead. Our framework thus enables scalable, resource-efficient deployment of large models in real-world wireless FL scenarios.


[94] 2505.02395

A Real-Time Control Barrier Function-Based Safety Filter for Motion Planning with Arbitrary Road Boundary Constraints

We present a real-time safety filter for motion planners, including learning-based ones, that uses Control Barrier Functions (CBFs) to provide formal guarantees of collision avoidance with road boundaries. A key feature of our approach is its ability to directly incorporate road geometries of arbitrary shape, represented as polylines, without resorting to conservative overapproximations. We formulate the safety filter as a constrained optimization problem in the form of a Quadratic Program (QP), which achieves safety by making minimal, necessary adjustments to the control actions issued by the nominal motion planner. We validate our safety filter through extensive numerical experiments across a variety of traffic scenarios featuring complex road boundaries. The results confirm its reliable safety and high computational efficiency (execution frequency up to 40 Hz). Code reproducing our experimental results and a video demonstration are available at this http URL.
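The "minimal, necessary adjustment" idea is easiest to see with a single CBF constraint, where the QP has a closed-form solution (a projection onto a half-space). This sketch is a heavy simplification of the paper's method: it assumes single-integrator dynamics and one straight boundary segment rather than vehicle dynamics and arbitrary polylines, and the function name and `alpha` gain are hypothetical.

```python
import numpy as np


def cbf_filter(x, u_nom, normal, offset, alpha=1.0):
    """Single-integrator x_dot = u with safe set h(x) = normal.x - offset >= 0.

    CBF condition: normal.u + alpha * h(x) >= 0.
    Solves min ||u - u_nom||^2 subject to that single linear constraint,
    whose solution is the Euclidean projection onto the half-space.
    """
    x = np.asarray(x, dtype=float)
    u_nom = np.asarray(u_nom, dtype=float)
    n = np.asarray(normal, dtype=float)
    h = n @ x - offset
    violation = -(n @ u_nom + alpha * h)
    if violation <= 0:
        return u_nom  # nominal input already satisfies the CBF condition
    return u_nom + (violation / (n @ n)) * n


# Boundary y >= 0; the vehicle sits at height 0.5 and the planner steers it downward.
u_safe = cbf_filter(x=[0.0, 0.5], u_nom=[1.0, -2.0], normal=[0.0, 1.0], offset=0.0)
u_free = cbf_filter(x=[0.0, 0.5], u_nom=[1.0, 0.0], normal=[0.0, 1.0], offset=0.0)
print(u_safe, u_free)  # [ 1.  -0.5] [1. 0.]
```

Only the component of the nominal input that pushes toward the boundary is changed, and a safe nominal input passes through untouched. With several boundary segments, the constraints stack and a general QP solver is needed instead of this closed form.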


[95] 2505.08432

Low-complexity Detection for Noncoherent Massive MIMO Communications

This work studies a point-to-point MIMO uplink in which user equipment transmits data to a base station employing a massive array. Signal detection is noncoherent and fading is assumed to follow the Weichselberger model. By exploiting the spatial stationarity of fading at the base station, a cyclostationary structure emerges naturally in the space-time representation, which suggests formulating the statistical properties of the received signal in the Karhunen-Loève domain. This allows the derivation of a low-complexity receiver that approximates maximum likelihood detection even for a moderate array size. The spectral analysis of the problem provides valuable insights into the design of space-time codewords.


[96] 2507.06788

Dynamic Output-Feedback Controller Synthesis for Dissipativity and $H_2$ Performance from Noisy Input-State Data

In this paper, we propose dynamic output-feedback controller synthesis methods for discrete-time linear time-invariant systems. The synthesis goal is to achieve dissipativity with respect to a given quadratic supply rate or a given $H_2$ performance level. The model of the system dynamics is assumed to be unknown, except for the disturbance term. Instead, we have a recorded trajectory of the control input and the state, which may be corrupted by an unknown but bounded disturbance. The state data are used only for controller synthesis, while the designed controller is an output-feedback controller, i.e., the full state is not used for control in real time. The presented synthesis method is formulated in terms of linear matrix inequalities parametrized by a scalar variable. Within the considered setting, the synthesis procedure is non-conservative.
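For reference, the standard discrete-time dissipation inequality with a quadratic storage function and a quadratic supply rate is shown below. The symbols $P$, $Q$, $S$, $R$, $w_t$, $z_t$ are our own notation, and the paper's exact formulation (including how the noisy input-state data enter the LMIs) may differ.

```latex
% Closed-loop state x_t (plant plus controller), performance input w_t, performance output z_t.
% Dissipativity w.r.t. the quadratic supply rate s(w, z): there exists P \succ 0 such that
x_{t+1}^\top P\, x_{t+1} - x_t^\top P\, x_t \;\le\;
\begin{bmatrix} w_t \\ z_t \end{bmatrix}^\top
\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix}
\begin{bmatrix} w_t \\ z_t \end{bmatrix}
\qquad \text{for all } t \ge 0 .
```

Summing over $t = 0, \dots, T-1$ gives $x_T^\top P x_T - x_0^\top P x_0 \le \sum_{t=0}^{T-1} s(w_t, z_t)$, so the stored "energy" can grow only by what is supplied. The familiar choice $Q = \gamma^2 I$, $S = 0$, $R = -I$ recovers an $\ell_2$-gain bound of $\gamma$.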


[97] 2508.20476

Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio

Audio is the primary modality for human communication and has driven the success of Automatic Speech Recognition (ASR) technologies. However, such audio-centric systems inherently exclude individuals who are deaf or hard of hearing. Visual alternatives such as sign language and lip reading offer effective substitutes, and recent advances in Sign Language Translation (SLT) and Visual Speech Recognition (VSR) have improved audio-less communication. Yet, these modalities have largely been studied in isolation, and their integration within a unified framework remains underexplored. In this paper, we propose the first unified framework capable of handling diverse combinations of sign language, lip movements, and audio for spoken-language text generation. We focus on three main objectives: (i) designing a unified, modality-agnostic architecture capable of effectively processing heterogeneous inputs; (ii) exploring the underexamined synergy among modalities, particularly the role of lip movements as non-manual cues in sign language comprehension; and (iii) achieving performance on par with or superior to state-of-the-art models specialized for individual tasks. Building on this framework, we achieve performance on par with or better than task-specific state-of-the-art models across SLT, VSR, ASR, and Audio-Visual Speech Recognition. Furthermore, our analysis reveals a key linguistic insight: explicitly modeling lip movements as a distinct modality significantly improves SLT performance by capturing critical non-manual cues.


[98] 2509.06027

DreamAudio: Customized Text-to-Audio Generation with Diffusion Models

With the development of large-scale diffusion-based and language-modeling-based generative models, impressive progress has been achieved in text-to-audio generation. Despite producing high-quality outputs, existing text-to-audio models mainly aim to generate semantically aligned sound and fall short of controlling fine-grained acoustic characteristics of specific sounds. As a result, users who need specific sound content may find it difficult to generate the desired audio clips. In this paper, we present DreamAudio for customized text-to-audio generation (CTTA). Specifically, we introduce a new framework that enables the model to identify auditory information from user-provided reference concepts for audio generation. Given a few reference audio samples containing personalized audio events, our system can generate new audio samples that include these specific events. In addition, two types of datasets are developed for training and testing the proposed systems. The experiments show that DreamAudio generates audio samples that are highly consistent with the customized audio features and well aligned with the input text prompts. Furthermore, DreamAudio offers comparable performance in general text-to-audio tasks. We also provide a human-involved dataset containing audio events from real-world CTTA cases as the benchmark for customized generation tasks.


[99] 2509.25802

Graph Distribution-valued Signals: A Wasserstein Space Perspective

We introduce a novel framework for graph signal processing (GSP) that models signals as graph distribution-valued signals (GDSs), which are probability distributions in the Wasserstein space. This approach overcomes key limitations of classical vector-based GSP, including the assumption of synchronous observations over vertices, the inability to capture uncertainty, and the requirement for strict correspondence in graph filtering. By representing signals as distributions, GDSs naturally encode uncertainty and stochasticity, while strictly generalizing traditional graph signals. We establish a systematic dictionary mapping core GSP concepts to their GDS counterparts, demonstrating that classical definitions are recovered as special cases. The effectiveness of the framework is validated through graph filter learning for prediction tasks, supported by experimental results.
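A small sketch of two claims in this abstract: a distribution-valued vertex need not hold the same number of observations as its neighbors, and classical graph signals are recovered as a special case (Dirac masses). It uses the exact 1D 2-Wasserstein distance between equally weighted empirical samples (sorting gives an optimal plan in 1D) and combines vertices with a root-sum-of-squares product metric; that product metric, the scalar-valued vertices, and the function names are assumptions for illustration, not the paper's definitions.

```python
import math
from typing import List, Sequence


def w2_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """2-Wasserstein distance between two 1D empirical distributions with equally many,
    equally weighted samples. In 1D the monotone (sorted) matching is an optimal plan."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def gds_distance(f: List[Sequence[float]], g: List[Sequence[float]]) -> float:
    """Distance between two graph distribution-valued signals stored as per-vertex sample lists:
    square root of the sum of squared per-vertex W2 distances (assumed product metric)."""
    if len(f) != len(g):
        raise ValueError("signals must live on the same vertex set")
    return math.sqrt(sum(w2_1d(p, q) ** 2 for p, q in zip(f, g)))


if __name__ == "__main__":
    # Classical graph signals embedded as Dirac masses: the distance reduces to Euclidean.
    print(gds_distance([[1.0], [2.0], [3.0]], [[1.0], [4.0], [3.0]]))  # 2.0
    # Vertices may carry different numbers of samples (asynchronous observations).
    print(gds_distance([[0.0, 1.0], [5.0]], [[1.0, 2.0], [5.0]]))      # 1.0
```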


[100] 2510.27211

Nonasymptotic Convergence Rates for Plug-and-Play Methods With MMSE Denoisers

It is known that the minimum-mean-squared-error (MMSE) denoiser under Gaussian noise can be written as a proximal operator, which suffices for asymptotic convergence of plug-and-play (PnP) methods but does not reveal the structure of the induced regularizer or give convergence rates. We show that the MMSE denoiser corresponds to a regularizer that can be written explicitly as an upper Moreau envelope of the negative log-marginal density, which in turn implies that the regularizer is 1-weakly convex. Using this property, we derive (to the best of our knowledge) the first sublinear convergence guarantee for PnP proximal gradient descent with an MMSE denoiser. We validate the theory with a one-dimensional synthetic study that recovers the implicit regularizer, and with imaging experiments (deblurring and computed tomography) that exhibit the predicted sublinear behavior.
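A toy scalar sketch of PnP proximal gradient descent with an MMSE denoiser, assuming a Gaussian prior x ~ N(0, tau2) so that the MMSE denoiser has the closed form D(y) = tau2/(tau2 + sigma2)·y. In this special case D is exactly the proximal operator of the quadratic phi(x) = sigma2/(2·tau2)·x², so the induced regularizer is explicit (and convex, a trivial instance of weak convexity). The data term, parameter values, and function names are illustrative assumptions; this does not reproduce the paper's one-dimensional study or its upper-Moreau-envelope result.

```python
def mmse_denoiser(y: float, tau2: float, sigma2: float) -> float:
    """E[x | y] for prior x ~ N(0, tau2) and observation y = x + N(0, sigma2)."""
    return tau2 / (tau2 + sigma2) * y


def pnp_pgd(a: float, b: float, gamma: float, tau2: float, sigma2: float,
            x0: float = 0.0, iters: int = 200) -> float:
    """PnP proximal gradient descent on f(x) = 0.5 * (a*x - b)**2:
    x_{k+1} = D(x_k - gamma * f'(x_k)), with the proximal step replaced by the MMSE denoiser D."""
    x = x0
    for _ in range(iters):
        grad = a * (a * x - b)
        x = mmse_denoiser(x - gamma * grad, tau2, sigma2)
    return x


if __name__ == "__main__":
    a, b, gamma, tau2, sigma2 = 2.0, 4.0, 0.1, 1.0, 1.0
    x_pnp = pnp_pgd(a, b, gamma, tau2, sigma2)
    # Here D is the prox of phi(x) = sigma2 / (2 * tau2) * x**2, so the fixed point
    # minimizes f(x) + phi(x) / gamma, i.e. x* = gamma*a*b / (gamma*a**2 + sigma2/tau2).
    x_star = gamma * a * b / (gamma * a**2 + sigma2 / tau2)
    print(x_pnp, x_star)
```

With these values the iteration map is a contraction with factor 0.3, so the iterates converge quickly; for a general (non-Gaussian) prior the regularizer is no longer quadratic, which is where the paper's weak-convexity argument is needed.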


[101] 2512.22501

NOWA: Null-space Optical Watermark for Invisible Capture Fingerprinting and Tamper Localization

Ensuring the authenticity and ownership of digital images is increasingly challenging as modern editing tools enable highly realistic forgeries. Existing image protection systems mainly rely on digital watermarking, which is susceptible to sophisticated digital attacks. To address this limitation, we propose a hybrid optical-digital framework that incorporates physical authentication cues during image formation and preserves them through a learned reconstruction process. At the optical level, a phase mask in the camera aperture produces a Null-space Optical Watermark (NOWA) that lies in the null space of the imaging operator and therefore remains invisible in the captured image. Then, a Null-Space Network (NSN) performs measurement-consistent reconstruction that delivers high-quality protected images while preserving the NOWA signature. The proposed design enables tamper localization by projecting the image onto the camera's null space and detecting pixel-level inconsistencies. Our design preserves perceptual quality, resists common degradations such as compression, and establishes a structural security asymmetry: without access to the optical or NSN parameters, adversaries cannot forge the NOWA signature. Experiments with simulations and a prototype camera demonstrate competitive performance in terms of image quality preservation and tamper localization accuracy compared to state-of-the-art digital watermarking and learning-based authentication methods.
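The linear algebra behind "invisible in the measurement, recoverable from the reconstruction" can be sketched with NumPy. Here a random wide matrix stands in for the imaging operator and a pseudo-inverse stands in for the learned NSN; both substitutions, the sizes, and the edit are assumptions. The sketch only shows that a null-space signature leaves the measurement unchanged and that a local edit disturbs the null-space component (detection); it does not reproduce the paper's optical design or pixel-level localization.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 50                        # hypothetical sizes: fewer measurements than unknowns
A = rng.standard_normal((m, n))      # random stand-in for a linear imaging operator
A_pinv = np.linalg.pinv(A)
P_null = np.eye(n) - A_pinv @ A      # orthogonal projector onto null(A)

x = rng.standard_normal(n)           # scene
w = P_null @ rng.standard_normal(n)  # signature confined to null(A)
y = A @ x                            # captured measurement

# The signature is invisible in the measurement: A w = 0.
assert np.allclose(A @ (x + w), y)

# A measurement-consistent reconstruction that carries the signature in its null-space part.
x_hat = A_pinv @ y + w
assert np.allclose(A @ x_hat, y)
assert np.allclose(P_null @ x_hat, w)   # P_null @ A_pinv = 0, so only w survives

# A hypothetical local edit generally disturbs the null-space component.
x_tampered = x_hat.copy()
x_tampered[10:15] += 1.0
residual = P_null @ x_tampered - w
print(np.linalg.norm(residual))         # clearly non-zero, which flags the edit
```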


[102] 2602.11478

Defining causal mechanism in dual process theory and two types of feedback control

Mental events are considered to supervene on physical events. A supervenient event does not change without a corresponding change in the underlying subvenient physical events. Since wholes and their parts exhibit the same supervenience-subvenience relations, inter-level causation has been expected to serve as a model for mental causation. We previously proposed an inter-level causation mechanism to construct a model of consciousness and an agent's self-determination. However, a significant gap exists between this mechanism and cognitive functions. Here, we demonstrate how to integrate the inter-level causation mechanism with the widely known dual-process theories. We assume that the supervenience level is composed of multiple supervenient functions (i.e., neural networks), and we argue that inter-level causation can be achieved by controlling a feedback error defined by changeable algebraic expressions that combine these functions. Inter-level causation allows for a dual-laws model in which each level possesses its own distinct dynamics. In this framework, the feedback error is determined independently by two processes: (1) the selection of equations combining supervenient functions, and (2) negative-feedback error reduction that satisfies the equations through adjustments of neurons and synapses. We interpret these two independent feedback controls as Type 1 and Type 2 processes in the dual-process theories. As a result, theories of consciousness, agency, and dual-process theory are unified into a single framework, and the characteristic features of Type 1 and Type 2 processes are naturally derived.


[103] 2602.11488

When Audio-LLMs Don't Listen: A Cross-Linguistic Study of Modality Arbitration

When audio and text conflict, speech-enabled language models follow text far more often than they do when arbitrating between two conflicting text sources, even under explicit instructions to trust the audio. We introduce ALME (Audio-LLM Modality Evaluation), a dataset of 57,602 controlled audio-text conflict stimuli across eight languages, together with Text Dominance Ratio (TDR), which measures how often a model follows conflicting text when instructed to follow audio. Gemini 2.0 Flash and GPT-4o show TDR 10–26× higher than a baseline that replaces audio with its transcript under otherwise identical conditions (Gemini 2.0 Flash: 16.6% vs. 1.6%; GPT-4o: 23.2% vs. 0.9%). These results suggest that text dominance reflects not only information content, but also an asymmetry in arbitration accessibility, i.e., how easily the model can use competing representations at decision time. Framing the transcript as deliberately corrupted reduces TDR by 80%, whereas forcing explicit transcription increases it by 14%. A fine-tuning ablation further suggests that arbitration behavior depends more on LLM reasoning than on the audio input path alone. Across four audio-LLMs, we observe the same qualitative pattern with substantial cross-model and cross-linguistic variation.
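A minimal sketch of how a TDR-style metric could be computed over conflict trials, plus a check of the arithmetic behind the reported 10–26× gap. The `Trial` record layout, the field names, and the choice of denominator (all conflict trials) are assumptions; the paper's exact scoring rules may differ, and the demo trials are toy values, not ALME data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """One audio-text conflict item answered under a 'follow the audio' instruction (hypothetical schema)."""
    answer: str       # label the model chose
    audio_label: str  # label supported by the audio
    text_label: str   # conflicting label supported by the text


def text_dominance_ratio(trials: List[Trial]) -> float:
    """Fraction of conflict trials in which the model followed the text instead of the audio."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.answer == t.text_label for t in trials) / len(trials)


if __name__ == "__main__":
    demo = [  # toy trials for illustration only
        Trial("dog", "dog", "cat"),
        Trial("cat", "dog", "cat"),
        Trial("rain", "rain", "snow"),
        Trial("rain", "rain", "snow"),
    ]
    print(text_dominance_ratio(demo))  # 0.25

    # Ratios behind the reported gap (model TDR / transcript-baseline TDR, both in %).
    print(16.6 / 1.6, 23.2 / 0.9)      # roughly 10.4 and 25.8
```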