New articles on Electrical Engineering and Systems Science


[1] 2509.06966

Cross-device Zero-shot Label Transfer via Alignment of Time Series Foundation Model Embeddings

High-quality, medically validated labels exist for clinical actigraphy data but not for ubiquitous consumer wearables like the Apple Watch. Manually labeling wearables data is expensive and doesn't scale. This paper offers a novel framework that transfers valuable labels from a source domain (e.g., actigraphy) to a target domain (e.g., Apple Watch) without requiring paired data. Instead of working with raw time-series signals, we project both domains into a shared latent embedding space using time-series foundation models (TSFMs) and develop a new framework to align the cross-device representations. Our method, Adversarial Alignment of TSFM Embeddings, forces the distributions of source and target embeddings to align within this space, facilitating label transfer across device types.


[2] 2509.06967

Cross-field SNR Analysis and Tensor Channel Estimation for Multi-UAV Near-field Communications

The extremely large antenna array (ELAA) is key to enhancing spectral efficiency in 6G networks. Leveraging the distributed nature of multi-unmanned aerial vehicle (UAV) systems enables the formation of a distributed ELAA, which often operates in the near-field region with spatial sparsity, rendering the conventional far-field plane wave assumption invalid. This paper investigates channel estimation for distributed near-field multi-UAV communication systems. We first derive closed-form signal-to-noise ratio (SNR) expressions under the plane wave model (PWM), spherical wave model (SWM), and a hybrid spherical-plane wave model (HSPWM), also referred to as the cross-field model, within a distributed uniform planar array (UPA) scenario. The analysis shows that HSPWM achieves a good balance between modeling accuracy and analytical tractability. Based on this, we propose two channel estimation algorithms: the spherical-domain orthogonal matching pursuit (SD-OMP) and the tensor-OMP. The SD-OMP generalizes the polar domain to jointly consider elevation, azimuth, and range. Under the HSPWM, the channel is naturally formulated as a tensor, enabling the use of tensor-OMP. Simulation results demonstrate that tensor-OMP achieves normalized mean square error (NMSE) performance comparable to SD-OMP, while offering reduced computational complexity and improved scalability.


[3] 2509.06968

Deep Learning-based Techniques for Integrated Sensing and Communication Systems: State-of-the-Art, Challenges, and Opportunities

This article comprehensively reviews recent developments and research on deep learning-based (DL-based) techniques for integrated sensing and communication (ISAC) systems. ISAC, which combines sensing and communication functionalities, is regarded as a key enabler for 6G and beyond networks, as many emerging applications, such as vehicular networks and industrial robotics, necessitate both sensing and communication capabilities for effective operation. A unified platform that provides both functions can reduce hardware complexity, alleviate frequency spectrum congestion, and improve energy efficiency. However, integrating these functionalities on the same hardware requires highly optimized signal processing and system design, introducing significant computational complexity when relying on conventional iterative or optimization-based techniques. As an alternative to conventional techniques, DL-based techniques offer efficient and near-optimal solutions with reduced computational complexity. Hence, such techniques are well-suited for operating under limited computational resources and low latency requirements in real-time systems. DL-based techniques can swiftly and effectively yield near-optimal solutions for a wide range of sophisticated ISAC-related tasks, including waveform design, channel estimation, sensing signal processing, data demodulation, and interference mitigation. Therefore, motivated by these advantages, recent studies have proposed various DL-based approaches for ISAC system design. After briefly introducing DL architectures and ISAC fundamentals, this survey presents a comprehensive and categorized review of state-of-the-art DL-based techniques for ISAC, highlights their key advantages and major challenges, and outlines potential directions for future research.


[4] 2509.07020

Physics-Guided Diffusion Transformer with Spherical Harmonic Posterior Sampling for High-Fidelity Angular Super-Resolution in Diffusion MRI

Diffusion MRI (dMRI) angular super-resolution (ASR) aims to reconstruct high-angular-resolution (HAR) signals from limited low-angular-resolution (LAR) data without prolonging scan time. However, existing methods are limited in recovering fine-grained angular details or preserving high fidelity due to inadequate modeling of q-space geometry and insufficient incorporation of physical constraints. In this paper, we introduce a Physics-Guided Diffusion Transformer (PGDiT) designed to explore physical priors throughout both training and inference stages. During training, a Q-space Geometry-Aware Module (QGAM) with b-vector modulation and random angular masking facilitates direction-aware representation learning, enabling the network to generate directionally consistent reconstructions with fine angular details from sparse and noisy data. In inference, a two-stage Spherical Harmonics-Guided Posterior Sampling (SHPS) enforces alignment with the acquired data, followed by heat-diffusion-based SH regularization to ensure physically plausible reconstructions. This coarse-to-fine refinement strategy mitigates oversmoothing and artifacts commonly observed in purely data-driven or generative models. Extensive experiments on general ASR tasks and two downstream applications, Diffusion Tensor Imaging (DTI) and Neurite Orientation Dispersion and Density Imaging (NODDI), demonstrate that PGDiT outperforms existing deep learning models in detail recovery and data fidelity. Our approach presents a novel generative ASR framework that offers high-fidelity HAR dMRI reconstructions, with potential applications in neuroscience and clinical research.


[5] 2509.07042

PUUMA (Placental patch and whole-Uterus dual-branch U-Mamba-based Architecture): Functional MRI Prediction of Gestational Age at Birth and Preterm Risk

Preterm birth is a major cause of mortality and lifelong morbidity in childhood. Its complex and multifactorial origins limit the effectiveness of current clinical predictors and impede optimal care. In this study, a dual-branch deep learning architecture (PUUMA) was developed to predict gestational age (GA) at birth using T2* fetal MRI data from 295 pregnancies, encompassing a heterogeneous and imbalanced population. The model integrates both global whole-uterus and local placental features. Its performance was benchmarked against other deep learning architectures and against linear regression using cervical length measurements obtained by experienced clinicians from anatomical MRI. The GA at birth predictions were assessed using mean absolute error. Accuracy, sensitivity, and specificity were used to assess preterm classification. Both the fully automated MRI-based pipeline and the cervical length regression achieved comparable mean absolute errors (3 weeks) and good sensitivity (0.67) for detecting preterm birth, despite pronounced class imbalance in the dataset. These results provide a proof of concept for automated prediction of GA at birth from functional MRI, and underscore the value of whole-uterus functional imaging in identifying at-risk pregnancies. Additionally, we demonstrate that manual, high-definition cervical length measurements derived from MRI, not currently routine in clinical practice, offer valuable predictive information. Future work will focus on expanding the cohort size and incorporating additional organ-specific imaging to improve generalisability and predictive performance.
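
The evaluation metrics named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the 37-week preterm cutoff is an assumption (the abstract does not state the threshold used).

```python
def mean_absolute_error(true_weeks, predicted_weeks):
    """Average absolute difference between true and predicted GA at birth (weeks)."""
    pairs = list(zip(true_weeks, predicted_weeks))
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def preterm_classification_metrics(true_weeks, predicted_weeks, threshold_weeks=37.0):
    """Accuracy, sensitivity, specificity with 'preterm' as the positive class.

    threshold_weeks=37.0 is an assumed cutoff, not a value from the abstract.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(true_weeks, predicted_weeks):
        actual_preterm = t < threshold_weeks
        predicted_preterm = p < threshold_weeks
        if actual_preterm and predicted_preterm:
            tp += 1
        elif actual_preterm:
            fn += 1
        elif predicted_preterm:
            fp += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when a class is absent from the data.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of preterm births detected
        "specificity": ratio(tn, tn + fp),  # share of term births correctly cleared
    }
```

With a strongly imbalanced cohort, accuracy alone can look good even when few preterm births are detected, which is why sensitivity and specificity are reported separately.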


[6] 2509.07134

Modeling the Doppler Shift in Cislunar Environment with Gaussian Mixture Models

This study characterizes the RF Doppler shift distribution of Lunar South Pole (LSP) based inter-satellite links (ISLs) at varying inclinations. The Doppler shift is determined and analyzed in parts per million (ppm), which makes it independent of the carrier frequency. Due to unknown relative velocity states duration, the Gaussian Mixture Model (GMM) is found to be the best-fitting distribution for the Doppler shift of ISLs, evaluated at $1^\circ$ inclination intervals with respect to a predetermined satellite. Goodness-of-fit is investigated and quantified with Kullback-Leibler (KL) divergence and weighted mean relative difference (WMRD) error metrics. Simulation results show that ISL Doppler shifts reach up to $\pm1.89$ ppm as the inclination of the other orbit deviates further from the reference orbit, inclining $80^\circ$. Regarding the error measurements of GMM fitting, the WMRD and KL divergence metrics for ISL take values up to 0.6575 and 2.2963, respectively.
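
Why a Doppler shift in ppm does not depend on the carrier can be seen from the first-order (non-relativistic) Doppler relation, sketched below. The function names, the sign convention, and the 2 GHz carrier in the usage note are illustrative assumptions, not values from the abstract.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def doppler_shift_ppm(range_rate_m_s):
    """First-order Doppler shift in parts per million.

    range_rate_m_s is the rate of change of the inter-satellite distance,
    positive when the satellites move apart (assumed sign convention).
    The carrier frequency cancels out of the ratio delta_f / f.
    """
    return -range_rate_m_s / SPEED_OF_LIGHT_M_S * 1e6


def doppler_shift_hz(range_rate_m_s, carrier_hz):
    """Absolute shift in Hz, recovered by scaling the ppm value by the carrier."""
    return doppler_shift_ppm(range_rate_m_s) * 1e-6 * carrier_hz
```

Under this approximation, a shift of 1.89 ppm corresponds to a closing speed of roughly 567 m/s, and on a hypothetical 2 GHz carrier it would amount to about 3.78 kHz.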


[7] 2509.07172

Impact of Fading Correlation on the High-SNR Regime of Reconfigurable Intelligent Surfaces

This paper addresses three critical limitations in previous analyses of RIS-aided wireless systems: propagation environments with fixed diversity gain, restricted spatial correlation profiles, and approximation methods that fail to capture the system behavior in the high signal-to-noise ratio (SNR) regime. To overcome these challenges, we conduct an exact asymptotic analysis focused on the left tail of the SNR distribution, which plays a critical role in high-SNR system performance. Additionally, to account for general correlation profiles and fading environments with variable diversity and coding gains, we consider arbitrarily correlated Nakagami-m fading channels. The analytical results show that fading correlation induces a horizontal shift in the asymptotic behavior -- represented as a straight line in the log-dB scale -- of the PDF and CDF, displacing these curves to the left. The asymptotic linear coefficient quantifies this shift, while the angular coefficient remains unaffected. Moreover, the results reveal that the high sensitivity of the linear coefficient to correlation arises from the aggregated contribution of all marginal asymptotic terms, effectively capturing each channel's correlation characteristics.
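
The "straight line in the log-dB scale" can be made concrete with a generic left-tail approximation. The symbols $a$ and $b$ below are illustrative assumptions; the abstract does not give the paper's expressions for them.

```latex
% Generic left-tail (high-SNR) approximation of the SNR CDF, with a, b > 0 assumed:
F_\gamma(\gamma) \approx a\,\gamma^{b}, \qquad \gamma \to 0^{+}.
% With the SNR in dB, \gamma_{\mathrm{dB}} = 10\log_{10}\gamma, this becomes a straight line:
\log_{10} F_\gamma \approx \underbrace{\log_{10} a}_{\text{linear coefficient}}
  + \underbrace{\tfrac{b}{10}}_{\text{angular coefficient}}\,\gamma_{\mathrm{dB}}.
% Changing a to a' while b stays fixed gives a parallel line, i.e. a horizontal shift of
\Delta_{\mathrm{dB}} = \frac{10}{b}\,\log_{10}\frac{a'}{a}
% dB to the left when a' > a.
```

In this reading, correlation alters only the intercept $\log_{10} a$, so the asymptote moves sideways while its slope stays the same.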


[8] 2509.07193

Evaluation of Machine Learning Reconstruction Techniques for Accelerated Brain MRI Scans

This retrospective-prospective study evaluated whether a deep learning-based MRI reconstruction algorithm can preserve diagnostic quality in brain MRI scans accelerated up to fourfold, using both public and prospective clinical data. The study included 18 healthy volunteers (scans acquired at 3T, January 2024-March 2025), as well as selected fastMRI public datasets with diverse pathologies. Phase-encoding-undersampled 2D/3D T1, T2, and FLAIR sequences were reconstructed with DeepFoqus-Accelerate and compared with standard-of-care (SOC). Three board-certified neuroradiologists and two MRI technologists independently reviewed 36 paired SOC/AI reconstructions from both datasets using a 5-point Likert scale, while quantitative similarity was assessed for 408 scans and 1224 datasets using Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Haar wavelet-based Perceptual Similarity Index (HaarPSI). No AI-reconstructed scan scored below 3 (minimally acceptable), and 95% scored $\geq 4$. Mean SSIM was 0.95 $\pm$ 0.03 (90% of cases >0.90), PSNR >41.0 dB, and HaarPSI >0.94. Inter-rater agreement was slight to moderate. Rare artifacts did not affect diagnostic interpretation. These findings demonstrate that DeepFoqus-Accelerate enables robust fourfold brain MRI acceleration with 75% reduced scan time, while preserving diagnostic image quality and supporting improved workflow efficiency.


[9] 2509.07195

Identifying and Calibrating Overconfidence in Noisy Speech Recognition

Modern end-to-end automatic speech recognition (ASR) models like Whisper not only suffer from reduced recognition accuracy in noise, but also exhibit overconfidence: assigning high confidence to wrong predictions. We conduct a systematic analysis of Whisper's behavior in additive noise conditions and find that overconfident errors increase dramatically at low signal-to-noise ratios, with 10-20% of tokens incorrectly predicted with confidence above 0.7. To mitigate this, we propose a lightweight, post-hoc calibration framework that detects potential overconfidence and applies temperature scaling selectively to those tokens, without altering the underlying ASR model. Evaluations on the R-SPIN dataset demonstrate that, in the low signal-to-noise ratio range (-18 to -5 dB), our method reduces the expected calibration error (ECE) by 58% and triples the normalized cross entropy (NCE), yielding more reliable confidence estimates under severe noise conditions.
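
The two ingredients named in the abstract above, selective temperature scaling and expected calibration error, can be sketched generically as follows. This is not the authors' implementation: the overconfidence detector is reduced to a boolean `flagged` argument, and the temperature of 2.0 and the 10 equal-width bins are assumed values.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over token logits; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def selective_confidence(logits, flagged, temperature=2.0):
    """Token confidence, softened only when a (hypothetical) detector flags it."""
    return max(softmax(logits, temperature if flagged else 1.0))


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece
```

Because the logits themselves are left untouched, the predicted token never changes; only the reported confidence of flagged tokens is lowered, which is what makes the approach post hoc.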


[10] 2509.07201

Design of Input-Output Observers for a Population of Systems with Bounded Frequency-Domain Variation using $DK$-iteration

This paper proposes a linear input-output observer design methodology for a population of systems in which each observer uses knowledge of the linear time-invariant dynamics of the particular device. Observers are typically composed of a known model of the system and a correction mechanism to produce an estimate of the state. The proposed design procedure characterizes the variation within the population in the frequency domain and synthesizes a single robust correction filter. The correction filter is compatible with all system models that satisfy the variation characterization such that a given level of estimation performance is guaranteed. This is accomplished by posing a robust performance problem using the observer error dynamics and solving it using $DK$-iteration. The design procedure is experimentally demonstrated on a flexible joint robotic manipulator with varied joint stiffnesses. It is shown that the proposed method that uses a single correction filter achieves comparable estimation performance to a method that uses a correction gain tailored toward each joint stiffness configuration.


[11] 2509.07203

Extended Version: Market-Driven Equilibria for Distributed Solar Panel Investment

This study investigates market-driven long-term investment decisions in distributed solar panels by individual investors. We consider a setting where investment decisions are driven by expected revenue from participating in short-term electricity markets over the panel's lifespan. These revenues depend on short-term market equilibria, i.e., prices and allocations, which are influenced by aggregate invested panel capacity participating in the markets. We model the interactions among investors by a non-atomic game and develop a framework that links short-term market equilibria to the resulting long-term investment equilibrium. Then, within this framework, we analyze three market mechanisms: (a) a single-product real-time energy market, (b) a product-differentiated real-time energy market that treats solar energy and grid energy as different products, and (c) a contract-based panel market that trades claims or rights to the production of certain panel capacity ex-ante, rather than the realized solar production ex-post. For each, we derive expressions for short-term equilibria and the associated expected revenues, and analytically characterize the corresponding long-term Nash equilibrium aggregate capacity. We compare the solutions of these characterizing equations under different conditions and theoretically establish that the product-differentiated market always supports socially optimal investment, while the single-product market consistently results in under-investment. We also establish that the contract-based market leads to over-investment when the extra valuations of users for solar energy are small. Finally, we validate our theoretical findings through numerical experiments.


[12] 2509.07218

Electricity Demand and Grid Impacts of AI Data Centers: Challenges and Prospects

The rapid growth of artificial intelligence (AI) is driving an unprecedented increase in the electricity demand of AI data centers, raising emerging challenges for electric power grids. Understanding the characteristics of AI data center loads and their interactions with the grid is therefore critical for ensuring both reliable power system operation and sustainable AI development. This paper provides a comprehensive review and vision of this evolving landscape. Specifically, this paper (i) presents an overview of AI data center infrastructure and its key components, (ii) examines the key characteristics and patterns of electricity demand across the stages of model preparation, training, fine-tuning, and inference, (iii) analyzes the critical challenges that AI data center loads pose to power systems across three interrelated timescales, including long-term planning and interconnection, short-term operation and electricity markets, and real-time dynamics and stability, and (iv) discusses potential solutions from the perspectives of the grid, AI data centers, and AI end-users to address these challenges. By synthesizing current knowledge and outlining future directions, this review aims to guide research and development in support of the joint advancement of AI data centers and power systems toward reliable, efficient, and sustainable operation.


[13] 2509.07229

Joint Spatial and Spectral Hybrid Precoding for Multi-User MIMO-OFDM Systems

The deployment of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems cannot rely solely on digital precoding due to hardware constraints. Instead, hybrid precoding, which combines digital and radio frequency (RF) techniques, has emerged as a potential alternative. This approach strikes a balance between performance and cost, addressing the limitations of signal mixers and analog-to-digital converters in mmWave systems. mmWave systems are designed to function in wideband channels with frequency selectivity, necessitating the use of orthogonal frequency-division multiplexing (OFDM) to mitigate dispersive channels. However, OFDM faces several challenges. First, it suffers from a high peak-to-average power ratio (PAPR) due to the linear combination of subcarriers. Second, it suffers from out-of-band (OOB) emissions due to the sharp spectral transitions of OFDM subcarriers and windowing-induced spectral leakage. Furthermore, phase shifter (PS) impairments at the RF transmitter precoder and the user combiner represent a limitation in practical mmWave systems, leading to phase errors. This work addresses these challenges. We study the problem of robust digital-RF precoding optimization for the downlink sum-rate maximization in hybrid multi-user (MU) MIMO-OFDM systems under maximum transmit power, PAPR, and OOB emission constraints. The formulated maximization problem is non-convex and difficult to solve. We propose a weighted minimum mean squared error (WMMSE) based block coordinate descent (BCD) method to iteratively optimize digital-RF precoders at the transmitter and digital-RF combiners at the users. Low-cost and scalable optimization approaches are proposed to efficiently solve the BCD subproblems. Extensive simulations are conducted to demonstrate the efficiency of the proposed approaches and exhibit their superiority relative to well-known benchmarks.
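
The PAPR problem mentioned in the abstract above arises because the time-domain OFDM symbol is a sum of subcarriers that can add in phase. A minimal single-antenna sketch, with hypothetical function names and no oversampling, precoding, or windowing, is:

```python
import cmath
import math


def ofdm_time_samples(subcarrier_symbols):
    """Inverse DFT: combine the subcarrier symbols into one time-domain OFDM symbol."""
    n = len(subcarrier_symbols)
    return [
        sum(
            symbol * cmath.exp(2j * math.pi * k * t / n)
            for k, symbol in enumerate(subcarrier_symbols)
        ) / n
        for t in range(n)
    ]


def papr_db(time_samples):
    """Peak-to-average power ratio of a block of samples, in dB."""
    powers = [abs(x) ** 2 for x in time_samples]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))
```

When all N subcarriers carry the same symbol, they align at one time instant and the PAPR reaches its maximum of 10 log10(N) dB, whereas a single active subcarrier gives a constant envelope and 0 dB.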


[14] 2509.07293

Experimental Analysis of Biasing Voltage Generation in Wave-Controlled RIS

Reconfigurable intelligent surfaces (RISs), an emerging technology proposed for inclusion in next generation wireless communication systems, are programmable surfaces that can adaptively reflect incident electromagnetic radiation in different desired directions. To reduce the complexity and physical profile of conventional RIS designs, a novel concept known as Wave-Controlled RIS has been proposed, in which standing waves along a transmission line are used to generate the required dc bias for reflective control. This paper shows the design of such a Wave-Controlled RIS and its biasing transmission line. The effectiveness of this approach in generating the correct dc bias from a single standing wave frequency is analyzed through both theoretical modeling and experimental validation, which uncovered a dependence on impedance matching not accounted for by the theory. Additionally, the potential for reflective control using only a single standing wave frequency on the biasing transmission line is explored, demonstrating the ability of single-beam steering toward angles near broadside.


[15] 2509.07294

Learning Neural Koopman Operators with Dissipativity Guarantees

We address the problem of learning a neural Koopman operator model that provides dissipativity guarantees for an unknown nonlinear dynamical system that is known to be dissipative. We propose a two-stage approach. First, we learn an unconstrained neural Koopman model that closely approximates the system dynamics. Then, we minimally perturb the parameters to enforce strict dissipativity. Crucially, we establish theoretical guarantees that extend the dissipativity properties of the learned model back to the original nonlinear system. We realize this by deriving an exact relationship between the dissipativity of the learned model and the true system through careful characterization of the identification errors from the noisy data, Koopman operator truncation, and generalization to unseen data. We demonstrate our approach through simulation on a Duffing oscillator model.


[16] 2509.07304

Distributed Leader-Follower Consensus for Uncertain Multiagent Systems with Time-Triggered Switching of the Communication Network

A distributed adaptive control strategy is developed for heterogeneous multiagent systems in nonlinear Brunovsky form with \({\pd}\)-dimensional $n^{\text{th}}$-order dynamics, operating under time-triggered switching communication topologies. The approach uses repulsive potential functions to ensure agent-agent and obstacle safety, while neural network estimators compensate for system uncertainties and disturbances. A high-order control barrier function framework is then employed to certify the positive invariance of the safe sets and the boundedness of the proposed control inputs. The resulting distributed control and adaptive laws, together with dwell-time requirements for topology transitions, achieve leader-following consensus. This integrated design provides synchronized formation and robust disturbance rejection in evolving network configurations, and its effectiveness is demonstrated through numerical simulations.


[17] 2509.07320

Data-knowledge fusion driven frequency security assessment: A robust framework for renewable-dominated power grids

Frequency security is critical for power grids, as deviations can trigger widespread outages and result in substantial economic losses. However, modern renewable-dominated power grids face an increased risk of insecurity due to low inertia and nonlinear frequency responses. To mitigate these risks, robust pre-fault frequency security assessment (FSA) is critical, as it enables grid operators to implement preventive control strategies. We propose a data-knowledge fusion framework to achieve intelligent FSA in actual power grids. First, we classify FSA domain knowledge into two distinct categories: (1) physics-guided knowledge directs the neural network pre-training process, ensuring that the fusion model's predictions are consistent with frequency response mechanisms, and (2) physics-constrained knowledge establishes quantitative relationships on predictions, which force them within theoretical ranges defined by domain knowledge. Furthermore, we develop a dual-channel neural network architecture to simultaneously capture both local and global characteristics of the power system. Finally, we introduce a data-knowledge fusion training algorithm that integrates guided learning with constrained network architecture to enhance model reliability and generalization. Case studies on China's Yunnan Provincial Power Grid validate the superior performance of our framework: it reduces average prediction error to 1.26% (a 49.2% reduction over data-driven methods), and maintains 97.60% accuracy in untrained scenarios (3.85% higher than data-driven methods), thereby satisfying the accuracy, reliability, and generalization requirements for actual power grids. The proposed methodology establishes a new paradigm for enhancing robustness of FSA in power grids, with potential application to cross-domain security assessment.


[18] 2509.07341

Affine Modulation-based Audiogram Fusion Network for Joint Noise Reduction and Hearing Loss Compensation

Hearing aids (HAs) are widely used to provide personalized speech enhancement (PSE) services, improving the quality of life for individuals with hearing loss. However, HA performance significantly declines in noisy environments as it treats noise reduction (NR) and hearing loss compensation (HLC) as separate tasks. This separation leads to a lack of systematic optimization, overlooking the interactions between these two critical tasks, and increases the system complexity. To address these challenges, we propose a novel audiogram fusion network, named AFN-HearNet, which simultaneously tackles the NR and HLC tasks by fusing cross-domain audiogram and spectrum features. We propose an audiogram-specific encoder that transforms the sparse audiogram profile into a deep representation, addressing the alignment problem of cross-domain features prior to fusion. To incorporate the interactions between NR and HLC tasks, we propose the affine modulation-based audiogram fusion frequency-temporal Conformer that adaptively fuses these two features into a unified deep representation for speech reconstruction. Furthermore, we introduce a voice activity detection auxiliary training task to embed speech and non-speech patterns into the unified deep representation implicitly. We conduct comprehensive experiments across multiple datasets to validate the effectiveness of each proposed module. The results indicate that the AFN-HearNet significantly outperforms state-of-the-art in-context fusion joint models regarding key metrics such as HASQI and PESQ, achieving a favorable trade-off between performance and efficiency. The source code and data will be released at this https URL.


[19] 2509.07345

Distributed Frequency Control for Multi-Area Power Systems Considering Transient Frequency Safety

High penetration of renewable energy sources intensifies frequency fluctuations in multi-area power systems, challenging both stability and operational safety. This paper proposes a novel distributed frequency control method that ensures transient frequency safety and enforces generation capacity constraints, while achieving steady-state frequency restoration and optimal economic operation. The method integrates a feedback optimization (FO)-based controller and a safety corrector. The FO-based controller generates reference setpoints by solving an optimization problem, driving the system to the steady state corresponding to the optimal solution of this problem. The safety corrector then modifies these references using control barrier functions to maintain frequencies within prescribed safe bounds during transients while respecting capacity constraints. The proposed method combines low computational burden with improved regulation performance and enhanced practical applicability. Theoretical analysis establishes optimality, asymptotic stability, and transient frequency safety for the closed-loop system. Simulation studies show that, compared with conventional FO-based schemes, the method consistently enforces frequency safety and capacity limits, achieves smaller frequency deviations and faster recovery, thereby demonstrating its practical effectiveness and advantages.
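
The idea of a safety corrector that modifies a reference using control barrier functions can be illustrated on a toy scalar system. Everything below is an assumption made for illustration: the single-integrator dynamics x' = u, the symmetric bound `x_max`, the gain `alpha`, and the function names. It is not the paper's multi-area power system model or its controller.

```python
def safe_input(x, u_ref, x_max, alpha):
    """Closest input to u_ref that satisfies both barrier conditions.

    Barriers h1 = x_max - x and h2 = x_max + x keep |x| <= x_max.
    For x' = u, the conditions h' >= -alpha * h reduce to an interval on u,
    so the usual quadratic program collapses to clipping.
    """
    upper = alpha * (x_max - x)   # from h1: -u >= -alpha * (x_max - x)
    lower = -alpha * (x_max + x)  # from h2:  u >= -alpha * (x_max + x)
    return min(max(u_ref, lower), upper)


def simulate(x0, u_ref, x_max=0.5, alpha=2.0, dt=0.01, steps=1000):
    """Forward-Euler rollout of the toy deviation x under the corrected input."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * safe_input(x, u_ref, x_max, alpha)
        trajectory.append(x)
    return trajectory
```

Far from the bound the reference passes through unchanged; as x approaches `x_max` the admissible input shrinks toward zero, so even an aggressive reference cannot push the state out of the safe set (for `alpha * dt <= 1` in this discretization).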


[20] 2509.07356

Anti-Disturbance Hierarchical Sliding Mode Controller for Deep-Sea Cranes with Adaptive Control and Neural Network Compensation

To address non-linear disturbances and uncertainties in complex marine environments, this paper proposes a disturbance-resistant controller for deep-sea cranes. The controller integrates hierarchical sliding mode control, adaptive control, and neural network compensation techniques. By designing a global sliding mode surface, the dynamic coordination between the driving and non-driving subsystems is achieved, ensuring overall system stability. The subsystem surfaces reduce oscillations and enhance tracking accuracy. Adaptive control dynamically adjusts system parameters, enhancing robustness against external uncertainties, while the neural network compensates for time-varying disturbances through real-time learning. The stability of the control scheme is verified on the basis of Lyapunov theory. The simulation results demonstrate that, compared to traditional PID control, the proposed controller exhibits significant advantages in trajectory tracking accuracy, response speed, and disturbance rejection.


[21] 2509.07384

Adaptive Event-Triggered MPC for Linear Parameter-Varying Systems with State Delays, Actuator Saturation and Disturbances

This paper proposes a unified adaptive event-triggered model predictive control (ETMPC) scheme for linear parameter-varying (LPV) systems subject to state delays, actuator saturation, and external disturbances. In existing studies, only a limited number of ETMPC methods have attempted to address either state delays or actuator saturation, and even these few methods typically lack co-design optimization between adaptive event-triggering mechanisms and the control law. To overcome these limitations, this paper presents a Lyapunov-Krasovskii-based adaptive ETMPC strategy that enables the co-design optimization of both the triggering mechanism and the controller. Specifically, the event-triggering parameter matrix is adaptively optimized by embedding an internal adaptive variable within the Lyapunov-Krasovskii-like function. Furthermore, the actuator saturation nonlinearity is transformed into a convex hull representation. The infinite-horizon robust optimization problem is reformulated as a convex optimization problem with linear matrix inequality (LMI) constraints. Invariant set constraints are introduced to ensure recursive feasibility, and mean-square input-to-state stability (ISS) under multiple uncertainties is rigorously established. Simulations on an industrial electric heating system validate the proposed method's effectiveness in reducing communication load.


[22] 2509.07400

A smart fridge with AI-enabled food computing

The Internet of Things (IoT) plays a crucial role in enabling seamless connectivity and intelligent home automation, particularly in food management. By integrating IoT with computer vision, the smart fridge employs an ESP32-CAM to establish a monitoring subsystem that enhances food management efficiency through real-time food detection, inventory tracking, and temperature monitoring. This supports waste reduction, better grocery planning, and optimized household consumption. In high-density inventory conditions, capturing partial or layered images complicates object detection, as overlapping items and occluded views hinder accurate identification and counting. Besides, varied angles and obscured details in multi-layered setups reduce algorithm reliability, often resulting in miscounts or misclassifications. Our proposed system is structured into three core modules: data pre-processing, object detection and management, and a web-based visualization. To address the challenge of poor model calibration caused by overconfident predictions, we implement a variant of focal loss that mitigates over-confidence and under-confidence in multi-category classification. This approach incorporates adaptive, class-wise error calibration via temperature scaling and evaluates the distribution of predicted probabilities across methods. Our results demonstrate that robust functional calibration significantly improves detection reliability under varying lighting conditions and scalability challenges. Further analysis demonstrates a practical, user-focused approach to modern food management, advancing sustainable living goals through reduced waste and more informed consumption.


[23] 2509.07402

Electric Vehicle Routing Problem with Time Windows and Station-based or Route-based Charging Options

The Electric Vehicle Routing Problem with Time Windows and Station-based or Route-based Charging Options addresses fleet optimization incorporating both conventional charging stations and continuous wireless charging infrastructure. This paper extends Schneider et al.'s foundational EVRP-TW model with arc-based dynamic wireless charging representation, partial coverage modeling, and hierarchical multi-objective optimization prioritizing fleet minimization. Computational experiments on Schneider benchmark instances demonstrate substantial operational benefits, with distance and time improvements ranging from 0.7% to 35.9% in secondary objective components. Analysis reveals that 20% wireless coverage achieves immediate benefits, while 60% coverage delivers optimal performance across all test instances for infrastructure investment decisions.


[24] 2509.07416

Eye Movement Feature-Guided Signal De-Drifting in Electrooculography Systems

Electrooculography (EOG) is widely used for gaze tracking in Human-Robot Collaboration (HRC). However, baseline drift caused by low-frequency noise significantly impacts the accuracy of EOG signals, creating challenges for further sensor fusion. This paper presents an Eye Movement Feature-Guided De-drift (FGD) method for mitigating drift artifacts in EOG signals. The proposed approach leverages active eye-movement feature recognition to reconstruct the feature-extracted EOG baseline and adaptively correct signal drift while preserving the morphological integrity of the EOG waveform. The FGD is evaluated using both simulation data and real-world data, achieving a significant reduction in mean error. The average error is reduced to 0.896° in simulation, representing a 36.29% decrease, and to 1.033° in real-world data, corresponding to a 26.53% reduction. Despite additional and unpredictable noise in real-world data, the proposed method consistently outperforms conventional de-drifting techniques, demonstrating its effectiveness in practical applications such as enhancing human performance augmentation.


[25] 2509.07422

Multi-Modal Intelligent Channel Modeling Framework for 6G-Enabled Networked Intelligent Systems

The design and technology development of 6G-enabled networked intelligent systems need an accurate real-time channel model as the cornerstone. However, with the new requirements of 6G-enabled networked intelligent systems, the conventional channel modeling methods face many limitations. Fortunately, the multi-modal sensors equipped on the intelligent agents bring timely opportunities, i.e., the intelligent integration and mutually beneficial mechanism between communications and multi-modal sensing could be investigated based on artificial intelligence (AI) technologies. In this case, the mapping relationship between the physical environment and the electromagnetic channel could be explored via Synesthesia of Machines (SoM). This article presents a novel multi-modal intelligent channel modeling (MMICM) framework for 6G-enabled networked intelligent systems, which establishes a nonlinear model between multi-modal sensing and channel characteristics, including large-scale and small-scale channel characteristics. The architecture and features of the proposed intelligent modeling framework are expounded and the key technologies involved are also analyzed. Finally, the system-engaged applications and potential research directions of the MMICM framework are outlined.


[26] 2509.07432

Spectrotemporal Feature Extraction in EHG Signals and Tocograms for Enhanced Preterm Birth Prediction

Preterm birth (PTB), defined as delivery before 37 weeks of gestation, is a leading cause of neonatal mortality and long-term health complications. Early detection is essential for enabling timely medical interventions. Electrohysterography (EHG) and tocography (TOCO) are promising non-invasive tools for PTB prediction, but prior studies often suffer from class imbalance, improper oversampling, and reliance on features with limited physiological relevance. This work presents a machine learning pipeline incorporating robust preprocessing, physiologically grounded feature extraction, and rigorous evaluation. Features were extracted from EHG (and TOCO) signals using Mel-frequency cepstral coefficients, statistical descriptors of wavelet coefficients, and peaks of the normalized power spectrum. Signal quality was enhanced via Karhunen-Loève Transform (KLT) denoising through eigenvalue-based subspace decomposition. Multiple classifiers, including Logistic Regression, Support Vector Machines, Random Forest, Gradient Boosting, Multilayer Perceptron, and CatBoost, were evaluated on the TPEHGT dataset. The CatBoost classifier with KLT denoising achieved the highest performance on fixed-interval segments of the TPEHGT dataset, reaching 97.28% accuracy and an AUC of 0.9988. Ablation studies confirmed the critical role of both KLT denoising and physiologically informed features. Comparative analysis showed that including TOCO signals did not substantially improve prediction over EHG alone, highlighting the sufficiency of EHG for PTB detection. These results demonstrate that combining denoising with domain-relevant features can yield highly accurate, robust, and clinically interpretable models, supporting the development of cost-effective and accessible PTB prediction tools, particularly in low-resource healthcare settings.


[27] 2509.07436

SA-OOSC: A Multimodal LLM-Distilled Semantic Communication Framework for Enhanced Coding Efficiency with Scenario Understanding

This paper introduces SA-OOSC, a multimodal large language model (MLLM)-distilled semantic communication framework that achieves efficient semantic coding with scenario-aware importance allocations. This approach addresses a critical limitation of existing object-oriented semantic communication (OOSC) systems: assigning static importance values to specific classes of objects regardless of their contextual relevance. Our framework utilizes MLLMs to identify the scenario-augmented (SA) semantic importance for objects within the image. Through knowledge distillation with the MLLM-annotated data, our vectorization/de-vectorization networks and JSCC encoder/decoder learn to dynamically allocate coding resources based on contextual significance, i.e., distinguishing between high-importance and low-importance objects according to the SA scenario information of the task. The framework features three core innovations: an MLLM-guided knowledge distillation pipeline, an importance-weighted variable-length JSCC framework, and novel loss function designs that facilitate the knowledge distillation within the JSCC framework. Experimental validation demonstrates our framework's superior coding efficiency over conventional semantic communication systems, with open-sourced MLLM-annotated and human-verified datasets established as new benchmarks for future research in semantic communications.


[28] 2509.07441

Node Position Estimation in Diffusion-Based Molecular Communications Using Multi-Layer Perceptron

This paper proposes a method for accurately estimating the relative position between two nodes with unknown locations in a diffusion-based molecular communication environment. A specialized node structure is designed, combining a central absorbing receiver with multiple transmitters placed at predefined spherical coordinates. Pilot molecules are released, and their absorption time and concentration are measured. By partitioning the spherical coordinate space, these spatially distinct measurements serve as input to a multilayer perceptron (MLP)-based model. The proposed method significantly improves the precision of distance and direction estimation. Simulation results demonstrate localization accuracy, confirming the effectiveness of the neural network model in capturing the underlying physical characteristics.


[29] 2509.07442

A Systematic Framework to Test the Resilience of Three-Fold Redundant Sparse Arrays Against Two Sensor Failures and Some Never-Before Findings

As the field of sparse arrays progressed, numerous array designs have been introduced with a focus on larger apertures and higher degrees of freedom (DOFs), resulting in maximally economic sparse arrays (MESAs) that operate with the least number of sensors required to provide a given aperture while ensuring a hole-free difference coarray (DCA). Consequently, MESAs are least robust to sensor failures and cannot afford the failure of even a single sensor. Multifold redundant sparse arrays (MFRSAs) provide a practical solution to the problem of sensor failures in sparse arrays by making sure that the array contains enough sensor pairs necessary to produce each spatial lag multiple times. Owing to this property, a β-fold redundant array can withstand simultaneous failure of at least β-1 sensors without losing the hole-free DCA property. Nevertheless, MFRSAs are also prone to hidden dependencies that prevent them from being fully robust. In this work, we present a systematic framework to evaluate the robustness of triple redundant sparse linear arrays (TRSLAs) against all possible two-sensor failures. After detailing the proposed approach, we present the failure analysis of representative TRSLAs available in existing literature. It is found that existing TRSLAs have some hidden vulnerabilities against the failure of some peculiar sensor pairs. Corresponding MATLAB programs and numerical simulations are provided for evaluation and use by the array processing community. The proposed approach has great archival value as it can evaluate the robustness of any present or future TRSLAs through objective means.
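
The kind of exhaustive two-sensor failure check described in this abstract can be sketched in a few lines of Python. This is not the authors' MATLAB code. It assumes integer sensor positions on a half-wavelength grid, and it assumes "hole-free" is judged over the surviving array's own aperture; the function names and the example arrays are illustrative only and are not TRSLAs from the literature.

```python
from itertools import combinations

def difference_coarray(positions):
    """Set of all pairwise position differences (spatial lags)."""
    return {a - b for a in positions for b in positions}

def is_hole_free(positions):
    """True if every lag from 0 up to the array aperture is present."""
    if not positions:
        return False
    aperture = max(positions) - min(positions)
    lags = difference_coarray(positions)
    return all(lag in lags for lag in range(aperture + 1))

def vulnerable_pairs(positions):
    """Sensor pairs whose simultaneous failure leaves a hole in the DCA."""
    bad = []
    for pair in combinations(sorted(positions), 2):
        survivors = [p for p in positions if p not in pair]
        if not is_hole_free(survivors):
            bad.append(pair)
    return bad

# Illustrative input: a 6-element uniform linear array (not a TRSLA).
array = [0, 1, 2, 3, 4, 5]
print(is_hole_free(array))
print(vulnerable_pairs(array))
```

Enumerating all pairs costs on the order of N² coarray evaluations for N sensors, which is cheap for the array sizes typical of sparse linear arrays.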


[30] 2509.07482

Integrated Communication and Computing in Time-Varying mmWave Channels

We propose a novel framework for integrated communication and computing (ICC) transceiver design in time-varying millimeter-wave (mmWave) channels. In particular, in order to cope with the dynamics of time-varying mmWave channels, the detection of communication symbols and the execution of an over-the-air computing (AirComp) operation are performed in parallel with channel tracking, as opposed to existing state-of-the-art (SotA) on ICC where perfect knowledge of the channel at all time instances is typically assumed. For clarity of exposition, we consider a single-input multiple-output (SIMO) uplink scenario where multiple single-antenna user equipment (UE) transmit to a base station (BS) equipped with multiple antennas, such that each UE, or edge device (ED), precodes its own transmit signal, while the BS, or access points (APs), also performs receive beamforming. The proposed transceiver framework then estimates channel state information (CSI) and data symbols in parallel, using a bilinear Gaussian belief propagation (BiGaBP) algorithm for joint channel and data detection (JCDE), aided by a channel prediction (CP) algorithm executed before each estimation window at the BS. The AirComp operation is then executed by means of an optimal combination of the residual signal. Simulation results demonstrate the effectiveness of the proposed scheme in performing ICC in challenging time-varying mmWave channels, with minimal degradation to both communication and computing performance.


[31] 2509.07483

A Methodological Framework for Positioning of Wireless Sensors in New Generation Launchers

In wireless sensor networks for reusable launchers, the electromagnetic characterization and electromagnetic compatibility analyses are relevant due to the reference operational scenario, which implies a complex, and sometimes dynamic, electromagnetic environment. This work proposes a methodological framework for the design of the network and for the analysis of the related electromagnetic environment within the stages of a given launcher. Based on the preliminary positioning of the network nodes, the framework prescribes a workflow and the related toolset for determining the optimal network topology focusing on the weights, the operation of the transceivers, and the overall radiated power. The optimal network configuration is simulated by using computational electromagnetics strategies in order to assess the electromagnetic environment induced by the sensor network itself. The paper provides some results concerning a case study for a specific launcher.


[32] 2509.07511

Joint Antenna Positioning and Beamforming for Movable Antenna Array Aided Ground Station in Low-Earth Orbit Satellite Communication

This paper proposes a new architecture for the low-earth orbit (LEO) satellite ground station aided by a movable antenna (MA) array. Unlike conventional fixed-position antenna (FPA), the MA array can flexibly adjust antenna positions to reconfigure array geometry, mitigating interference more effectively and improving communication performance in ultra-dense LEO satellite networks. To reduce movement overhead, we configure antenna positions at the antenna initialization stage, which remain unchanged during the whole communication period of the ground station. To this end, an optimization problem is formulated to maximize the average achievable rate of the ground station by jointly optimizing its antenna position vector (APV) and time-varying beamforming weights, i.e., antenna weight vectors (AWVs). To solve the resulting non-convex optimization problem, we adopt the Lagrangian dual transformation and quadratic transformation to reformulate the objective function into a more tractable form. Then, we develop an efficient block coordinate descent-based iterative algorithm that alternately optimizes the APV and AWVs until convergence is reached. Simulation results demonstrate that our proposed MA scheme significantly outperforms traditional FPA by increasing the achievable rate at ground stations under various system setups, thus providing an efficient solution for interference mitigation in future ultra-dense LEO satellite communication networks.


[33] 2509.07586

Exploring System Adaptations For Minimum Latency Real-Time Piano Transcription

Advances in neural network design and the availability of large-scale labeled datasets have driven major improvements in piano transcription. Existing approaches target either offline applications, with no restrictions on computational demands, or online transcription, with delays of 128-320 ms. However, most real-time musical applications require latencies below 30 ms. In this work, we investigate whether and how the current state-of-the-art online transcription model can be adapted for real-time piano transcription. Specifically, we eliminate all non-causal processing, and reduce computational load through shared computations across core model components and variations in model size. Additionally, we explore different pre- and postprocessing strategies, and related label encoding schemes, and discuss their suitability for real-time transcription. Evaluating the adaptations on the MAESTRO dataset, we find a drop in transcription accuracy due to strictly causal processing as well as a tradeoff between the preprocessing latency and prediction accuracy. We release our system as a baseline to support researchers in designing models towards minimum latency real-time transcription.


[34] 2509.07610

Asymmetric Modulation Design for Fluid-Antenna SWIPT Systems

In this work, we propose the design of modulation schemes that improve the rate-energy region of fluid antenna-assisted simultaneous wireless information and power transfer (SWIPT) systems. By considering the nonlinear characteristics of practical energy harvesting circuits, we formulate a dual-objective rate-energy (RE) region optimization problem to jointly maximize the discrete-input mutual information (DIMI) and harvested current. The problem is solved using the epsilon-constraint method and optimized constellations are designed for various energy harvesting thresholds. We then evaluate the performance of the optimized constellations under three different fluid antenna (FA) port selection strategies: (i) Best Port, (ii) Fixed Port, and (iii) Random Port. Our simulation results demonstrate significant performance gains of optimized constellations over conventional constellations in both information rate and energy harvesting.


[35] 2509.07634

A kernel-based approach to physics-informed nonlinear system identification

This paper presents a kernel-based framework for physics-informed nonlinear system identification. The key contribution is a structured methodology that extends kernel-based techniques to seamlessly integrate partially known physics-based models, improving parameter estimation and overall model accuracy. The proposed method enhances traditional modeling approaches by integrating a parametric model, which provides physical interpretability, with a kernel-based function, which accounts for unmodelled dynamics. The two model components are identified from data simultaneously, minimizing a suitable cost that balances the relative importance of the physical and the black-box parts of the model. Additionally, nonlinear state smoothing is employed to address scenarios involving state-space models with not fully measurable states. Numerical simulations on an experimental benchmark system demonstrate the effectiveness of the proposed approach, with performance comparisons against state-of-the-art identification techniques.


[36] 2509.07703

Prescribed-Time Event-Triggered Control for Matrix-Scaled Networks

This article proposes a distributed control method for matrix-scaled multi-agent networks aimed at achieving convergence within a user-defined time frame. The control law of each individual agent relies only on information from neighboring agents and is updated at discrete intervals determined by state-dependent triggering functions, reducing the frequency of agent interactions. To this end, first, the controller is augmented with a time-varying gain. Then, the dynamics of the closed-loop system over the finite-time interval are transformed into an infinite-time frame using time scaling. Lyapunov-based analysis is employed to derive suitable triggering conditions that guarantee the asymptotic convergence of the time-transformed system, thereby ensuring the prescribed-time convergence of the original system.


[37] 2509.07748

Swarm-optimized Adaptive Augmentation of Missile Autopilot

This paper considers the problem of optimizing a missile autopilot. In particular, the paper investigates the application of an online learning technique to learn and optimize the gains of a three-loop topology autopilot for a planar missile modeled with nonlinear dynamics and nonlinear aerodynamic forces and moments. The classical autopilot for a missile is based on a three-loop topology, where each loop consists of tunable proportional gains. An adaptive three-loop autopilot is constructed by augmenting the classical autopilot's fixed-gain controllers with a learning-based controller, which is recursively optimized using retrospective cost optimization. Numerical simulations show that online learning improves the tracking performance of the classical autopilot in both nominal and off-nominal interception scenarios.


[38] 2509.07754

Interference Mitigation for OFDM-based Integrated Sensing and Communications with Arbitrary Modulation Formats

Integrated sensing and communication will be a key feature of future mobile networks, enabling highly efficient systems and numerous new applications by leveraging communication signals for sensing. In this paper, we analyze the impact of arbitrary modulation alphabets on the sensing performance of communication-centric OFDM systems as expected in the next-generation 6G networks. We evaluate existing interference mitigation techniques, such as coherent successive target cancellation, and propose an enhanced version of this algorithm. A systematic performance evaluation in multi-target scenarios, including the effects of scattering, demonstrates that our proposed interference mitigation methods achieve performance comparable to sensing-optimal constant modulus signals while utilizing higher order constellations for more efficient communications.


[39] 2509.07758

Experimental Evaluation of Joint Clock Recovery and Equalization for Sub-Terahertz Links

This paper proposes and experimentally evaluates a joint clock recovery (CR) and equalization architecture tailored for high-speed sub-terahertz (sub-THz) wireless communication links. Specifically, a Baud-spaced digital receiver architecture is investigated that combines a constant modulus algorithm (CMA) equalizer with a blind timing error detector (TED), enabling robust symbol timing synchronization without decision-directed (DD) feedback or pilot symbols. The proposed TED leverages the CMA filter coefficients to estimate timing errors, which are then used to drive a Farrow interpolator operating at twice the symbol rate. The system is validated experimentally using a 140 GHz wireless testbed with 16-QAM modulation over a 10 GHz bandwidth. Results show that the proposed TED schemes outperform conventional blind TEDs, such as Gardner and blind implementations of Mueller & Müller, in terms of bit error rate (BER), error vector magnitude (EVM), and intersymbol interference (ISI) suppression. These capabilities are especially relevant to next-generation spaceborne communication systems, where wideband sub-THz links are expected to play a key role in enabling ultra-high-data-rate inter-satellite and deep-space communications under challenging synchronization constraints.


[40] 2509.07775

Sensing with Mobile Devices through Radio SLAM: Models, Methods, Opportunities, and Challenges

The integration of sensing and communication (ISAC) is a cornerstone of 6G, enabling simultaneous environmental awareness and communication. This paper explores radio SLAM (simultaneous localization and mapping) as a key ISAC approach, using radio signals for mapping and localization. We analyze radio SLAM across different frequency bands, discussing trade-offs in coverage, resolution, and hardware requirements. We also highlight opportunities for integration with sensing, positioning, and cooperative networks. The findings pave the way for standardized solutions in 6G applications such as autonomous systems and industrial robotics.


[41] 2509.07795

Enhanced SegNet with Integrated Grad-CAM for Interpretable Retinal Layer Segmentation in OCT Images

Optical Coherence Tomography (OCT) is essential for diagnosing conditions such as glaucoma, diabetic retinopathy, and age-related macular degeneration. Accurate retinal layer segmentation enables quantitative biomarkers critical for clinical decision-making, but manual segmentation is time-consuming and variable, while conventional deep learning models often lack interpretability. This work proposes an improved SegNet-based deep learning framework for automated and interpretable retinal layer segmentation. Architectural innovations, including modified pooling strategies, enhance feature extraction from noisy OCT images, while a hybrid loss function combining categorical cross-entropy and Dice loss improves performance for thin and imbalanced retinal layers. Gradient-weighted Class Activation Mapping (Grad-CAM) is integrated to provide visual explanations, allowing clinical validation of model decisions. Trained and validated on the Duke OCT dataset, the framework achieved 95.77% validation accuracy, a Dice coefficient of 0.9446, and a Jaccard Index (IoU) of 0.8951. Class-wise results confirmed robust performance across most layers, with challenges remaining for thinner boundaries. Grad-CAM visualizations highlighted anatomically relevant regions, aligning segmentation with clinical biomarkers and improving transparency. By combining architectural improvements, a customized hybrid loss, and explainable AI, this study delivers a high-performing SegNet-based framework that bridges the gap between accuracy and interpretability. The approach offers strong potential for standardizing OCT analysis, enhancing diagnostic efficiency, and fostering clinical trust in AI-driven ophthalmic tools.
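
The two overlap metrics reported in this abstract can be made concrete with a short NumPy sketch. It assumes binary masks for a single class. How the authors aggregate over classes and images is not stated in the abstract, and the function name is illustrative.

```python
import numpy as np

def dice_and_iou(pred, target):
    """Dice coefficient and Jaccard index (IoU) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    union = total - intersection
    # Two empty masks are treated as a perfect match by convention here.
    dice = 2.0 * intersection / total if total else 1.0
    iou = intersection / union if union else 1.0
    return float(dice), float(iou)

# Hypothetical 1-D masks: one overlapping pixel out of three labelled pixels.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)
```

For a single pair of masks the two metrics are tied by IoU = Dice / (2 - Dice). Applying that identity to the reported Dice of 0.9446 gives roughly 0.895, close to the reported IoU of 0.8951, although averaged metrics need not satisfy the identity exactly.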


[42] 2509.07837

Filtering in Multivariate Systems with Quantized Measurements using a Gaussian Mixture-Based Indicator Approximation

This work addresses the problem of state estimation in multivariable dynamic systems with quantized outputs, a common scenario in applications involving low-resolution sensors or communication constraints. A novel method is proposed to explicitly construct the probability mass function associated with the quantized measurements by approximating the indicator function of each region defined by the quantizer using Gaussian mixture models. Unlike previous approaches, this technique generalizes to any number of quantized outputs without requiring case-specific numerical solutions, making it a scalable and efficient solution. Simulation results demonstrate that the proposed filter achieves high accuracy in state estimation, both in terms of fidelity of the filtering distributions and mean squared error, while maintaining significantly reduced computational cost.


[43] 2509.07839

Enhancements in Score-based Channel Estimation for Real-Time Wireless Systems

We propose enhancements to score-based generative modeling techniques for low-latency pilot-based channel estimation in a point-to-point single-carrier multiple-input multiple-output (MIMO) wireless system. Building on recent advances in score-based models, we investigate a specific noise schedule design and sampling acceleration by step-skipping to reduce the number of denoising steps during inference. We additionally propose a single-step signal-to-noise ratio informed denoiser as an extreme case of the step-skipping approach. Our methods achieve significant latency reductions without performance degradation, as demonstrated on a synthetic channel dataset representing an urban macrocell MIMO communications scenario.


[44] 2509.07840

Sensor Management in Multi-Stage Stochastic Control Problems with Imperfect State Information

Technological advancements in miniaturization and wireless communications are yielding more affordable and versatile sensors and, in turn, more applications in which a network of sensors can be actively managed to best support overall decision-making objectives. We propose modeling the opportunity for sensor management within multi-stage stochastic control problems with imperfect state information. Such formulations inherently assume the state of the modeled environment cannot be accessed directly but instead the controller can observe only noisy measurements of the state and, therefore, at each decision stage some form of state estimation is required before a control is actuated. The notion of sensor management arises when the modeled controls not only affect the subsequent evolution of the state but can also affect the nature of future measurements and, hence, the quality of state estimates that drive future control decisions. In principle, the optimal strategy for any appropriately modeled multi-stage stochastic control problem with imperfect state information (with or without opportunity for sensor management) is the solution to a dynamic program; in practice, the computational requirements are typically prohibitive yet dynamic programming methods are still useful to guide the development of effective suboptimal strategies. In this spirit, we model the opportunity for sensor management within small-scale examples of two well-studied dynamic programming formulations, namely (1) the finite-state/finite-action Partially-Observable Markov Decision Process (PO-MDP) and (2) the Linear-Quadratic-Gaussian Regulator (LQGR). These examples admit solvable dynamic programs and confirm how the interplay between sensing and acting is a natural by-product of a dynamic programming solution.


[45] 2509.07843

Feedback Linearization-based Guidance Law for Guaranteed Interception

This paper presents an input-output feedback linearization (IOL)-based guidance law to ensure interception in a pursuer-evader engagement scenario. A point-mass dynamic model for both the pursuer and the evader is considered. An IOL guidance law is derived using range and line-of-sight (LOS) rate measurements. It is found that the range-based IOL guidance law exhibits a singularity under certain conditions. To address this issue, a fuzzy logic system is employed to smoothly blend the IOL guidance with the classical proportional guidance law, thereby avoiding the singularity. In contrast, the LOS-based IOL guidance law is free of singularities but suffers from divergence issues due to angle-related complications. To resolve this, a simple correction function is introduced to ensure consistent interception behavior. Results from Monte Carlo simulations indicate that both modified IOL guidance laws achieve interception with control limits applied.


[46] 2509.07847

Multi-Topic Projected Opinion Dynamics for Resource Allocation

We propose a model of opinion formation on resource allocation among multiple topics by multiple agents, who are subject to hard budget constraints. We define a utility function for each agent and then derive a projected dynamical system model of opinion evolution assuming that each agent myopically seeks to maximize its utility subject to its constraints. Inter-agent coupling arises from an undirected social network, while inter-topic coupling arises from resource constraints. We show that opinions always converge to the equilibrium set. For special networks with very weak antagonistic relations, the opinions converge to a unique equilibrium point. We further show that the underlying opinion formation game is a potential game. We relate the equilibria of the dynamics and the Nash equilibria of the game and characterize the unique Nash equilibrium for networks with no antagonistic relations. Finally, simulations illustrate our findings.


[47] 2509.07918

Partitioning and Self-organization of Distributed Generation in Large Distribution Networks

Distribution networks will experience more installations of distributed generation (DG) that is unpredictable and stochastic in nature. Greater distributed control and intelligence will allow challenges such as voltage control to be handled effectively. The partitioning of power networks into smaller clusters provides a method to split the control problem into manageable sub-problems. This paper presents a community detection-based partitioning technique for distribution networks considering local DGs, allowing them to be grouped and controlled in a distributed manner by using local signals and measurements. This method also allows each community to control the voltage using only neighboring DGs, and for each community to self-organize to reflect varying DG conditions and to maintain stable control. Simulations demonstrate that the partitioning of the large distribution network is effective, and each community is able to self-organize and to regulate the voltage independently using only its local DGs.


[48] 2509.07919

A Markov Decision Process Model for Intrusion Tolerance Problems

We formulate and analyze a simple Markov decision process model for intrusion tolerance problems, assuming that (i) each attack proceeds through one or more steps before the system's security fails, (ii) defensive responses that target these intermediate steps may only sometimes thwart the attack and (iii) reset responses that are sensible upon discovering an attack's completion may not always recover from the security failure. The analysis shows that, even in the ideal case of perfect detectors, it can be sub-optimal in the long run to employ defensive responses while under attack; that is, depending on attack dynamics and response effectiveness, the total overhead of ongoing defensive countermeasures can exceed the total risk of intermittent security failures. The analysis similarly examines the availability loss versus the risk reduction of employing preemptive resets, isolating key factors that determine whether system recovery is best initiated reactively or proactively. We also discuss model extensions and related work looking towards intrusion tolerance applications with (i) imperfect or controllable detectors, (ii) multiple types of attacks, (iii) continuous-time dynamics or (iv) strategic attackers.


[49] 2509.06964

Prototype: A Keyword Spotting-Based Intelligent Audio SoC for IoT

In this demo, we present a compact intelligent audio system-on-chip (SoC) integrated with a keyword spotting accelerator, enabling ultra-low latency, low-power, and low-cost voice interaction in Internet of Things (IoT) devices. Through algorithm-hardware co-design, the system's energy efficiency is maximized. We demonstrate the system's capabilities through a live FPGA-based prototype, showcasing stable performance and real-time voice interaction for edge intelligence applications.


[50] 2509.06995

The Protocol Genome A Self Supervised Learning Framework from DICOM Headers

In this paper, we introduce the Protocol Genome, a self-supervised learning system that learns correlations from DICOM headers and achieves AUROC 0.901 (vs 0.847 baseline) and ECE 0.036 (vs 0.058) on fully held-out external validation. Our method also improves calibration and robustness across modalities (CT, MRI, CXR) and vendors. Clinical imaging is funneled through PACS/DICOM, where procedure choices (scanner make/model, sequence, kernel, kVp, TR/TE, and slice thickness) have consequences for contrast, noise, and artifact. These latent confounders impede the generalization of image-only networks across sites. We consider structured DICOM headers as a label and learn protocol-aware but clinically robust image representations. Protocol Genome obtains tokenized embeddings of de-identified header fields and models them along with image features using: (1) protocol-image contrastive learning, (2) masked protocol prediction, and (3) protocol-protocol translation. With 1.26M studies (7 health systems, 31 scanners, 3 vendors; CT, MR, CR/DR), we experiment on: (A) chest CT triage for PE, (B) brain MRI glioma grading, and (C) chest radiograph cardiomegaly detection. Relative to strong SSL baselines (SimCLR, MAE) as well as ImageNet transfer, Protocol Genome (+0.046: PE, +0.058: glioma, +0.041: cardiomegaly) is associated with higher external AUROC; 25-37% calibration improvements are obtained (p < 0.01, DeLong tests). While the gains may be task-dependent, they are preserved with 10-20% of labeled data. From a clinical point of view, the technique reduces false positives at protocol borders and is applicable in a PACS (DICOM C-FIND/C-MOVE, DICOMweb QIDO/WADO). We publish a model card and deployment guide, complete with both de-identification and bias audits.
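The ECE figures quoted above refer to expected calibration error. As a rough illustration of what such a number measures, the following sketch computes a standard equal-width binned ECE for binary predictions. The function name, the bin count, and the binning scheme are assumptions made here for illustration; the abstract does not state how the authors computed ECE.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary predictions (illustrative sketch, not the paper's code).

    probs  : predicted probability of the positive class, one per sample
    labels : ground-truth labels (0 or 1), one per sample
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = max(p, 1.0 - p)  # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)  # equal-width bin index
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(a for _, a in b) / len(b)
        # Each bin contributes its |accuracy - confidence| gap, weighted by its size.
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

A lower value means that stated confidences track observed accuracy more closely; a model that is always fully confident and always wrong scores 1.0.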


[51] 2509.07009

Computational Concept of the Psyche

The article provides an overview of approaches to modeling the human psyche from the perspective of building an artificial one. Based on the review, a concept of cognitive architecture is proposed, where the psyche is considered as an operating system of a living or artificial subject, including a space of needs that determines its life meanings in connection with stimuli from the external world, and intelligence as a decision-making system for actions in relation to this world in order to satisfy these needs. Based on the concept, a computational formalization is proposed for creating artificial intelligence systems through learning from experience in the space of needs, taking into account their biological or existential significance for an intelligent agent. Thus, the problem of building general artificial intelligence as a system for making optimal decisions in the space of agent-specific needs under conditions of uncertainty is formalized, with maximization of success in achieving goals, minimization of existential risks and maximization of energy efficiency. A minimal experimental implementation of the model is also provided.


[52] 2509.07038

Controllable Singing Voice Synthesis using Phoneme-Level Energy Sequence

Controllable Singing Voice Synthesis (SVS) aims to generate expressive singing voices reflecting user intent. While recent SVS systems achieve high audio quality, most rely on probabilistic modeling, limiting precise control over attributes such as dynamics. We address this by focusing on dynamic control--temporal loudness variation essential for musical expressiveness--and explicitly condition the SVS model on energy sequences extracted from ground-truth spectrograms, reducing annotation costs and improving controllability. We also propose a phoneme-level energy sequence for user-friendly control. To the best of our knowledge, this is the first attempt enabling user-driven dynamics control in SVS. Experiments show our method achieves over 50% reduction in mean absolute error of energy sequences for phoneme-level inputs compared to baseline and energy-predictor models, without compromising synthesis quality.
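To make the idea of a phoneme-level energy sequence concrete, the sketch below pools a frame-level energy contour over phoneme segments and scores a prediction with mean absolute error. Averaging within each segment, the `(start, end)` frame-index convention, and both function names are assumptions made here for illustration; the abstract does not specify how the phoneme-level sequence is derived.

```python
def phoneme_level_energy(frame_energy, segments):
    """Average frame-level energy inside each phoneme segment.

    frame_energy : list of per-frame energy values
    segments     : list of (start, end) frame indices, end exclusive (assumed convention)
    """
    return [sum(frame_energy[s:e]) / (e - s) for s, e in segments]


def mean_absolute_error(target, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(target) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(t - p) for t, p in zip(target, predicted)) / len(target)
```

With this pooling, a user only has to specify one energy value per phoneme rather than one per frame.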


[53] 2509.07128

Contrast-Free Ultrasound Microvascular Imaging via Radiality and Similarity Weighting

Microvascular imaging has advanced significantly with ultrafast data acquisition and improved clutter filtering, enhancing the sensitivity of power Doppler imaging to small vessels. However, the image quality remains limited by spatial resolution and elevated background noise, both of which impede visualization and accurate quantification. To address these limitations, this study proposes a high-resolution cross-correlation Power Doppler (HR-XPD) method that integrates spatial radiality weighting with Doppler signal coherence analysis, thereby enhancing spatial resolution while suppressing artifacts and background noise. Quantitative evaluations in simulation and in vivo experiments on healthy human liver, transplanted human kidney, and pig kidney demonstrated that HR-XPD significantly improves microvascular resolvability and contrast compared to conventional PD. In vivo results showed up to a 2 to 3-fold enhancement in spatial resolution and an increase in contrast by up to 20 dB. High-resolution vascular details were clearly depicted within a short acquisition time of only 0.3 s-1.2 s without the use of contrast agents. These findings indicate that HR-XPD provides an effective, contrast-free, and high-resolution microvascular imaging approach with broad applicability in both preclinical and clinical research.


[54] 2509.07139

The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties

Recent improvements in multilingual ASR have not been equally distributed across languages and language varieties. To advance state-of-the-art (SOTA) ASR models, we present the Interspeech 2025 ML-SUPERB 2.0 Challenge. We construct a new test suite that consists of data from 200+ languages, accents, and dialects to evaluate SOTA multilingual speech models. The challenge also introduces an online evaluation server based on DynaBench, allowing for flexibility in model design and architecture for participants. The challenge received 5 submissions from 3 teams, all of which outperformed our baselines. The best-performing submission achieved an absolute improvement in LID accuracy of 23% and a reduction in CER of 18% when compared to the best baseline on a general multilingual test set. On accented and dialectal data, the best submission obtained 30.2% lower CER and 15.7% higher LID accuracy, showing the importance of community challenges in making speech technologies more inclusive.
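For readers unfamiliar with the CER metric reported above, a minimal sketch of character error rate is given here: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. The function names are illustrative, and the challenge's actual scoring (including any text normalization) may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.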


[55] 2509.07237

Normative Modelling in Neuroimaging: A Practical Guide for Researchers

Normative modelling is an increasingly common statistical technique in neuroimaging that estimates population-level benchmarks in brain structure. It enables the quantification of individual deviations from expected distributions whilst accounting for biological and technical covariates without requiring large, matched control groups. This makes it a powerful alternative to traditional case-control studies for identifying brain structural alterations associated with pathology. Despite the availability of numerous modelling approaches and several toolboxes with pretrained models, the distinct strengths and limitations of normative modelling make it difficult to determine how and when to implement them appropriately. This review offers practical guidance and outlines statistical considerations for clinical researchers using normative modelling in neuroimaging. We compare several open-source normative modelling tools through a worked example using clinical epilepsy data, outlining decision points, common pitfalls, and considerations for responsible implementation, to support broader and more rigorous adoption of normative modelling in neuroimaging research.
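The core idea of quantifying an individual's deviation from a population benchmark can be sketched with the simplest possible normative model: a straight-line fit of a brain measure against age in a reference cohort, followed by a z-score for a new individual. The linear form, the single covariate, and the function names are assumptions made here for illustration; the toolboxes compared in the review use richer models.

```python
from statistics import mean


def fit_normative_line(ages, values):
    """Least-squares fit of value = intercept + slope * age on a reference cohort.

    Returns (intercept, slope, sigma), where sigma is the residual standard deviation.
    """
    mean_age, mean_val = mean(ages), mean(values)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_val - slope * mean_age
    residuals = [v - (intercept + slope * a) for a, v in zip(ages, values)]
    # Two fitted parameters, hence n - 2 degrees of freedom.
    sigma = (sum(r * r for r in residuals) / (len(residuals) - 2)) ** 0.5
    return intercept, slope, sigma


def deviation_z(age, value, model):
    """Z-score of one individual's measure relative to the normative expectation."""
    intercept, slope, sigma = model
    return (value - (intercept + slope * age)) / sigma
```

A strongly negative z-score then flags a measure that is well below what the reference cohort would predict at that age.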


[56] 2509.07313

From Diagnosis to Therapy: Progress in SPECT and PET Reconstruction for Theranostics

The theranostic paradigm enables personalization of treatment by selecting patients with a diagnostic radiopharmaceutical and monitoring therapy using a matched therapeutic isotope. This strategy relies on accurate image reconstruction of both pre-therapy and post-therapy images for patient selection and monitoring treatment. However, traditional reconstruction methods are hindered by challenges such as crosstalk in multi-isotope imaging and extremely low-count measurements when imaging alpha- (α-) emitting therapies. Additionally, to fully realize the benefits of new imaging systems being developed for theranostic applications, advanced reconstruction techniques are needed. These needs, alongside the growing clinical adoption of theranostics, have spurred the development of novel PET and SPECT reconstruction algorithms. This review highlights recent progress and addresses critical challenges and unmet needs in theranostic image reconstruction.


[57] 2509.07415

EMORF-II: Adaptive EM-based Outlier-Robust Filtering with Correlated Measurement Noise

We present a learning-based outlier-robust filter for a general setup where the measurement noise can be correlated. Since it is an enhanced version of the EM-based outlier-robust filter (EMORF), we call it EMORF-II. As it is equipped with an additional feature that learns the outlier characteristics during inference, alongside outlier detection, EMORF-II has improved outlier-mitigation capability. Numerical experiments confirm accuracy gains over state-of-the-art methods, at the cost of increased computational overhead. However, the order of computational complexity remains on par with other practical methods, making it a useful choice for diverse applications.


[58] 2509.07464

Safe and Non-Conservative Contingency Planning for Autonomous Vehicles via Online Learning-Based Reachable Set Barriers

Autonomous vehicles must navigate dynamically uncertain environments while balancing safety and driving efficiency. This challenge is exacerbated by the unpredictable nature of surrounding human-driven vehicles (HVs) and perception inaccuracies, which require planners to adapt to evolving uncertainties while maintaining safe trajectories. Overly conservative planners degrade driving efficiency, while deterministic approaches may encounter serious issues and risks of failure when faced with sudden and unexpected maneuvers. To address these issues, we propose a real-time contingency trajectory optimization framework in this paper. By employing event-triggered online learning of HV control-intent sets, our method dynamically quantifies multi-modal HV uncertainties and refines the forward reachable set (FRS) incrementally. Crucially, we enforce invariant safety through FRS-based barrier constraints that ensure safety without reliance on accurate trajectory prediction of HVs. These constraints are embedded in contingency trajectory optimization and solved efficiently through the consensus alternating direction method of multipliers (ADMM). The system continuously adapts to the uncertainties in HV behaviors, preserving feasibility and safety without resorting to excessive conservatism. High-fidelity simulations on highway and urban scenarios, as well as a series of real-world experiments, demonstrate significant improvements in driving efficiency and passenger comfort while maintaining safety under uncertainty. The project page is available at this https URL.


[59] 2509.07546

Differential Dynamic Programming for the Optimal Control Problem with an Ellipsoidal Target Set and Its Statistical Inference

This work addresses an extended class of optimal control problems where the target for a system state has the form of an ellipsoid rather than a fixed, single point. As a computationally affordable method for resolving the extended problem, we present a revised version of differential dynamic programming (DDP), termed differential dynamic programming with ellipsoidal target set (ETS-DDP). To this end, the problem with an ellipsoidal target set is reformulated into an equivalent form with the orthogonal projection operator, which makes the resulting cost functions discontinuous at some points. As DDP usually requires the differentiability of cost functions, in the ETS-DDP formulation we locally approximate the (nonsmooth) cost functions by smoothed ones near the path generated at the previous iteration, utilizing the explicit form of the orthogonal projection operator. Moreover, a statistical inference method is also presented for designing the ellipsoidal target set, based on data on admissible target points collected from expert demonstrations. In a simulation of autonomous vehicle parking, the proposed ETS-DDP efficiently derives an admissible state trajectory while running much faster than the point-targeted DDP, at the expense of optimality.


[60] 2509.07593

Can SSD-Mamba2 Unlock Reinforcement Learning for End-to-End Motion Control?

End-to-end reinforcement learning for motion control promises unified perception-action policies that scale across embodiments and tasks, yet most deployed controllers are either blind (proprioception-only) or rely on fusion backbones with unfavorable compute-memory trade-offs. Recurrent controllers struggle with long-horizon credit assignment, and Transformer-based fusion incurs quadratic cost in token length, limiting temporal and spatial context. We present a vision-driven cross-modal RL framework built on SSD-Mamba2, a selective state-space backbone that applies state-space duality (SSD) to enable both recurrent and convolutional scanning with hardware-aware streaming and near-linear scaling. Proprioceptive states and exteroceptive observations (e.g., depth tokens) are encoded into compact tokens and fused by stacked SSD-Mamba2 layers. The selective state-space updates retain long-range dependencies with markedly lower latency and memory use than quadratic self-attention, enabling longer look-ahead, higher token resolution, and stable training under limited compute. Policies are trained end-to-end under curricula that randomize terrain and appearance and progressively increase scene complexity. A compact, state-centric reward balances task progress, energy efficiency, and safety. Across diverse motion-control scenarios, our approach consistently surpasses strong state-of-the-art baselines in return, safety (collisions and falls), and sample efficiency, while converging faster at the same compute budget. These results suggest that SSD-Mamba2 provides a practical fusion backbone for scalable, foresightful, and efficient end-to-end motion control.


[61] 2509.07635

Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations

Deep learning appears as an appealing solution for Automatic Synthesizer Programming (ASP), which aims to assist musicians and sound designers in programming sound synthesizers. However, integrating software synthesizers into training pipelines is challenging due to their potential non-differentiability. This work tackles this challenge by introducing a method to approximate arbitrary synthesizers. Specifically, we train a neural network to map synthesizer presets onto an audio embedding space derived from a pretrained model. This facilitates the definition of a neural proxy that produces compact yet effective representations, thereby enabling the integration of audio embedding loss into neural-based ASP systems for black-box synthesizers. We evaluate the representations derived by various pretrained audio models in the context of neural-based ASP and assess the effectiveness of several neural network architectures, including feedforward, recurrent, and transformer-based models, in defining neural proxies. We evaluate the proposed method using both synthetic and hand-crafted presets from three popular software synthesizers and assess its performance in a synthesizer sound matching downstream task. While the benefits of the learned representation are nuanced by resource requirements, encouraging results were obtained for all synthesizers, paving the way for future research into the application of synthesizer proxies for neural-based ASP systems.


[62] 2509.07669

On-chip microwave sensing of quasiparticles in tantalum superconducting circuits on silicon for scalable quantum technologies

The performance and scalability of superconducting quantum circuits are fundamentally constrained by non-equilibrium quasiparticles, which induce microwave losses that limit resonator quality factors and qubit coherence times. Understanding and mitigating these excitations is therefore central to advancing scalable quantum technologies. Here, we demonstrate on-chip microwave sensing of quasiparticles in high-Q α-tantalum coplanar waveguide resonators on silicon, operated in the single-photon regime. Temperature-dependent measurements reveal persistent non-equilibrium quasiparticles at millikelvin temperatures, producing a measurable suppression of the internal quality factor (Qi) relative to theoretical expectations. By benchmarking across materials, we find that the quasiparticle density in α-Ta is approximately one-third that of NbN at equivalent normalised temperatures (T/Tc), directly correlating with reduced microwave loss. Our methodology establishes a scalable platform for probing quasiparticle dynamics and points towards new routes for engineering superconducting circuits with improved coherence, with impact on qubit readout resonators, kinetic-inductance detectors, and emerging quantum processors and sensors.


[63] 2509.07707

Fault Tolerant Control of a Quadcopter using Reinforcement Learning

This study presents a novel reinforcement learning (RL)-based control framework aimed at enhancing the safety and robustness of a quadcopter, with a specific focus on resilience to the in-flight failure of one propeller. It addresses the critical need for a robust control strategy that maintains a desired altitude so as to protect the hardware and the payload in physical applications. The proposed framework investigates two RL methodologies, Dynamic Programming (DP) and Deep Deterministic Policy Gradient (DDPG), to overcome the challenges posed by the rotor failure mechanism of the quadcopter. DP, a model-based approach, is leveraged for its convergence guarantees, despite high computational demands, whereas DDPG, a model-free technique, facilitates rapid computation but with constraints on solution duration. The research challenge arises from training RL algorithms on large state and action domains. With modifications to the existing DP and DDPG algorithms, the controllers were trained not only to handle large continuous state and action domains but also to achieve a desired state after an in-flight propeller failure. To verify the robustness of the proposed control framework, extensive simulations were conducted in a MATLAB environment across various initial conditions, underscoring its viability for mission-critical quadcopter applications. A comparative analysis of the two RL algorithms and their potential for application in faulty aerial systems was also performed.


[64] 2509.07756

Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks

Next to decision tree and k-nearest neighbours algorithms, deep convolutional neural networks (CNNs) are widely used to classify audio data in many domains such as music, speech or environmental sounds. To train a specific CNN, various spectral and rhythm features such as mel-scaled spectrograms, mel-frequency cepstral coefficients (MFCC), cyclic tempograms, short-time Fourier transform (STFT) chromagrams, constant-Q transform (CQT) chromagrams and chroma energy normalized statistics (CENS) chromagrams can be used as digital image input data for the neural network. The performance of these spectral and rhythm features for audio category level as well as audio class level classification is investigated in detail with a deep CNN and the ESC-50 dataset of 2,000 labeled environmental audio recordings, using an end-to-end deep learning pipeline. The evaluated metrics (accuracy, precision, recall and F1 score) for multiclass classification clearly show that the mel-scaled spectrograms and the MFCC perform significantly better than the other spectral and rhythm features investigated in this research for audio classification tasks using deep CNNs.


[65] 2509.07773

Quantum Computing for Large-scale Network Optimization: Opportunities and Challenges

The complexity of large-scale 6G-and-beyond networks demands innovative approaches for multi-objective optimization over vast search spaces, a task often intractable. Quantum computing (QC) emerges as a promising technology for efficient large-scale optimization. We present our vision of leveraging QC to tackle key classes of problems in future mobile networks. By analyzing and identifying common features, particularly their graph-centric representation, we propose a unified strategy involving QC algorithms. Specifically, we outline a methodology for optimization using quantum annealing as well as quantum reinforcement learning. Additionally, we discuss the main challenges that QC algorithms and hardware must overcome to effectively optimize future networks.


[66] 2509.07832

RAQ-MIMO: MIMO for Multi-Band Rydberg Atomic Quantum Receiver

Rydberg atomic quantum receivers (RAQRs) are capable of receiving multi-band radio-frequency (RF) signals simultaneously, which are expected to break Chu's limit for classical electronic antennas. However, signals from different users will interfere with each other in the optical intermediate frequency (IF) domain of the multi-band quantum receiver, which is termed the IF interference (IFI) problem. To address this problem, in this paper, we propose a multi-input multi-output (MIMO) architecture for Rydberg atomic quantum receiver (RAQ-MIMO) by exploiting the additional spatial diversity of MIMO receivers. Specifically, by applying the dynamic signal model of RAQRs, we clarify the physical relationship between the quantum local oscillator (LO) configurations and the multi-band gains with the concept of quantum transconductance. Then, with the quantum transconductance-based signal model, we formulate the spectral efficiency (SE) maximization problem and further propose the quantum weighted minimum mean square error (qWMMSE) algorithm, which jointly optimizes the quantum LO configurations and the classical precoder/combiner matrices. Furthermore, we test the qWMMSE algorithm within the standard space division multiple access (SDMA) scheme and the frequency division multiple access (FDMA) scheme. Simulation results demonstrate that the qWMMSE optimization framework can significantly improve the SE of RAQ-MIMO systems for both multiple access schemes, and that RAQ-MIMO systems can outperform classical electronic receiver-based multi-user MIMO systems by eliminating the mutual coupling effect between classical antennas.


[67] 2509.07936

Feature Space Analysis by Guided Diffusion Model

One of the key issues in Deep Neural Networks (DNNs) is the black-box nature of their internal feature extraction process. Targeting vision-related domains, this paper focuses on analysing the feature space of a DNN by proposing a decoder that can generate images whose features are guaranteed to closely match a user-specified feature. Owing to this guarantee, which is missing from past studies, our decoder allows us to show which of the various attributes in an image are encoded into a feature by the DNN, by generating images whose features are in proximity to that feature. Our decoder is implemented as a guided diffusion model that guides the reverse image generation of a pre-trained diffusion model to minimise the Euclidean distance between the feature of a clean image estimated at each step and the user-specified feature. One practical advantage of our decoder is that it can analyse feature spaces of different DNNs with no additional training and run on a single COTS GPU. The experimental results targeting CLIP's image encoder, ResNet-50 and vision transformer demonstrate that images generated by our decoder have features remarkably similar to the user-specified ones and reveal valuable insights into these DNNs' feature spaces.


[68] 2001.03346

Time-Varying Graph Learning with Constraints on Graph Temporal Variation

We propose a novel framework for learning time-varying graphs from spatiotemporal measurements. Given an appropriate prior on the temporal behavior of signals, our proposed method can estimate time-varying graphs from a small number of available measurements. To achieve this, we introduce two regularization terms in convex optimization problems that constrain sparseness of temporal variations of the time-varying networks. Moreover, a computationally-scalable algorithm is introduced to efficiently solve the optimization problem. The experimental results with synthetic and real datasets (point cloud and temperature data) demonstrate our proposed method outperforms the existing state-of-the-art methods.


[69] 2410.08756

State Estimation with Protecting Exogenous Inputs via Cramér-Rao Lower Bound Approach

This paper addresses the real-time state estimation problem for dynamic systems while protecting exogenous inputs against adversaries, who may be honest-but-curious third parties or external eavesdroppers. The Cramér-Rao lower bound (CRLB) is employed to constrain the mean square error (MSE) of the adversary's estimate for the exogenous inputs above a specified threshold. By minimizing the MSE of the state estimate while ensuring a certain privacy level measured by CRLB, the problem is formulated as a constrained optimization. To solve the optimization problem, an explicit expression for CRLB is first provided. As the computational complexity of the CRLB increases with the time step, a low-complexity approach is proposed to make the complexity independent of time. Then, a relaxation approach is proposed to efficiently solve the optimization problem. Finally, a privacy-preserving state estimation algorithm with low complexity is developed, which also ensures $(\epsilon, \delta)$-differential privacy. Two illustrative examples, including a practical scenario for protecting building occupancy, demonstrate the effectiveness of the proposed algorithm.
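For orientation, the textbook form of the bound behind this kind of constraint can be written as follows. The symbols $u$ (exogenous input), $\hat{u}$ (the adversary's estimate), $\mathcal{I}(u)$ (Fisher information matrix) and $\gamma$ (privacy threshold) are notation assumed here for illustration, and the bound is stated for an unbiased estimator; the paper's own expression may differ, for example by using a Bayesian variant.

```latex
% Textbook CRLB for an unbiased estimator \hat{u} of u (assumed notation)
\mathbb{E}\!\left[(\hat{u}-u)(\hat{u}-u)^{\top}\right] \succeq \mathcal{I}(u)^{-1}
\quad\Longrightarrow\quad
\mathbb{E}\,\|\hat{u}-u\|^{2} \;\ge\; \operatorname{tr}\!\left(\mathcal{I}(u)^{-1}\right).
% A privacy requirement of the kind described: keep the bound above a threshold \gamma
\operatorname{tr}\!\left(\mathcal{I}(u)^{-1}\right) \;\ge\; \gamma .
```

Read this way, the less information the released data carry about the input, the larger the floor on the adversary's mean square error.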


[70] 2411.07052

Ultra-Wideband Communications: Interference Challenges and Solutions

The idea of ultra-wideband (UWB) communications for short ranges (up to a few tens of meters) has been around for nearly three decades. However, despite significant efforts by the industry, UWB deployment has not yet reached its predicted potential. This article, thus, seeks to rectify this situation by providing a practical examination of UWB interference conditions. Through a spectrum survey of today's wireless environments, we explore the interference that UWB devices may face from the perspective of outage probability in both high- and low-rate configurations. We find that by suppressing interference, the outage probability can be reduced by one or more orders of magnitude. In non-line-of-sight channels, in particular, we find that both interference suppression and bandwidth expansion are required to support the minimum data rates suggested in the IEEE 802.15.4 series of standards. We connect these findings to a recently proposed UWB signaling method based on filter banks and show this method fulfills the above requirements for implementing effective UWB systems.


[71] 2412.14968

An Overview on Over-the-air Electromagnetic Signal Processing

This article provides a tutorial on over-the-air electromagnetic signal processing (ESP) for next-generation wireless networks, addressing the limitations of digital processing to enhance the efficiency and sustainability of future 6th Generation (6G) systems. It explores the integration of electromagnetism and signal processing (SP) under a unified framework by highlighting how their convergence can drive innovations for 6G technologies. Key topics include electromagnetic (EM) wave-based processing, the application of metamaterials and advanced antennas to optimize EM field manipulation with a reduced number of radiofrequency chains, and their applications in holographic multiple-input multiple-output systems. By showcasing enabling technologies and use cases, the article illustrates how wave-based processing can minimize energy consumption, complexity, and latency, offering an effective framework for more sustainable and efficient wireless systems. This article aims to assist researchers and professionals in integrating advanced EM technologies with conventional SP methods.


[72] 2502.07205

VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification

Reverberant speech, denoting the speech signal degraded by reverberation, contains crucial knowledge of both anechoic source speech and room impulse response (RIR). This work proposes a variational Bayesian inference (VBI) framework with neural speech prior (VINP) for joint speech dereverberation and blind RIR identification. In VINP, a probabilistic signal model is constructed in the time-frequency (T-F) domain based on convolution transfer function (CTF) approximation. For the first time, we propose using an arbitrary discriminative dereverberation deep neural network (DNN) to estimate the prior distribution of anechoic speech within a probabilistic model. By integrating both reverberant speech and the anechoic speech prior, VINP yields the maximum a posteriori (MAP) and maximum likelihood (ML) estimations of the anechoic speech spectrum and CTF filter, respectively. After simple transformations, the waveforms of anechoic speech and RIR are estimated. VINP is effective for automatic speech recognition (ASR) systems, which sets it apart from most deep learning (DL)-based single-channel dereverberation approaches. Experiments on single-channel speech dereverberation demonstrate that VINP attains state-of-the-art (SOTA) performance in mean opinion score (MOS) and word error rate (WER). For blind RIR identification, experiments demonstrate that VINP achieves SOTA performance in estimating reverberation time at 60 dB (RT60) and advanced performance in direct-to-reverberation ratio (DRR) estimation. Codes and audio samples are available online.


[73] 2502.17914

Upper Mid-Band Spectrum for 6G: Vision, Opportunity and Challenges

Driven by the pursuit of gigabit-per-second data speeds for future 6G mobile networks, in addition to the support of sensing and artificial intelligence applications, the industry is expanding beyond crowded sub-6 GHz bands with innovative new spectrum allocations. In this paper, we chart a compelling vision for 6G within the frequency range 3 (FR3) spectrum, i.e. 7.125-24.25 GHz, by delving into its key enablers and addressing the multifaceted challenges that lie ahead for these new frequency bands. Here we highlight the physical properties of this spectrum, never before used for cellular, by reviewing recent channel measurements for outdoor and indoor environments, including path loss, delay and angular spreads, and material penetration loss, all of which offer insights that underpin future 5G/6G wireless communication designs. Building on the fundamental knowledge of the channel properties, we explore FR3 spectrum agility strategies that balance coverage and capacity tradeoffs, while examining coexistence with incumbent systems, such as satellites, radio astronomy, and earth exploration. Moreover, we discuss the potential of massive multiple-input multiple-output technologies, challenges for commercial deployment, and potential solutions for FR3, including multiband sensing for FR3 integrated sensing and communications. Finally, we outline 6G standardization features that are likely to emerge from 3GPP radio frame innovations and open radio access network developments.


[74] 2503.14779

Involution and BSConv Multi-Depth Distillation Network for Lightweight Image Super-Resolution

Single-image super-resolution (SISR) is a fundamental problem in computer vision that aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs. Although convolutional neural networks (CNNs) have achieved substantial advancements, deeper architectures often introduce excessive parameters, higher memory usage, and computational cost, limiting their applicability on resource-constrained devices. Recent research has thus focused on lightweight architectures that preserve accuracy while reducing complexity. This paper presents the Involution and BSConv Multi-Depth Distillation Network (IBMDN), a lightweight and effective architecture for SISR. The proposed IBMDN comprises Involution and BSConv Multi-Depth Distillation Blocks (IBMDB) and a Contrast and High-Frequency Attention Block (CHFAB). IBMDB employs varying combinations of Involution and BSConv at multiple depths to perform efficient feature extraction while minimizing computational complexity. CHFAB, a lightweight self-attention mechanism, focuses on extracting high-frequency and contrast information to enhance perceptual quality in the reconstructed images. The flexible design of IBMDB enables it to be seamlessly integrated into diverse SISR frameworks, including information distillation, transformer-based, and GAN-based models. Extensive experiments demonstrate that incorporating IBMDB significantly reduces memory usage, parameters, and floating-point operations (FLOPs), while achieving improvements in both pixel-wise accuracy and visual quality. The source code is available at: this https URL.


[75] 2503.15054

Joint Design of Radar Receive Filter and Unimodular ISAC Waveform with Sidelobe Level Control

Integrated sensing and communication (ISAC) has been considered a key feature of next-generation wireless networks. This paper investigates the joint design of the radar receive filter and dual-functional transmit waveform for the multiple-input multiple-output (MIMO) ISAC system. While optimizing the mean square error (MSE) of the radar receive spatial response and maximizing the achievable rate at the communication receiver, in addition to the constraints of a full-power radar receive filter and a unimodular transmit sequence, we control the maximum range sidelobe level, which is often overlooked in existing ISAC waveform design literature, for better radar imaging performance. To solve the formulated optimization problem with convex and nonconvex constraints, we propose an inexact augmented Lagrangian method (ALM) algorithm. For each subproblem in the proposed inexact ALM algorithm, we custom-design a block successive upper-bound minimization (BSUM) scheme with closed-form solutions for all blocks of variables to enhance the computational efficiency. Convergence analysis shows that the proposed algorithm is guaranteed to provide a stationary and feasible solution. Extensive simulations are performed to investigate the impact of different system parameters on communication and radar imaging performance. Comparison with existing works shows the superiority of the proposed algorithm.


[76] 2504.01585

Nonlinear Bandwidth and Bode Diagrams based on Scaled Relative Graphs

Scaled Relative Graphs (SRGs) provide a novel graphical frequency-domain method for the analysis of Nonlinear (NL) systems. In this paper, we restrict the SRG to particular input spaces to compute frequency-dependent incremental gain bounds for nonlinear systems. This leads to an NL generalization of the Bode diagram, where the sinusoidal, harmonic, and subharmonic inputs are considered separately. When applied to the analysis of the NL loop transfer and sensitivity, we define a notion of bandwidth for both the open loop and the closed loop, compatible with the Linear Time-Invariant (LTI) definitions. We illustrate the power of our method on the analysis of a DC motor with a parasitic nonlinearity and verify our results in simulations.


[77] 2504.16048

PRIME: Fast Primal-Dual Feedback Optimization for Markets with Application to Optimal Power Flow

Online Feedback Optimization (OFO) controllers iteratively drive a plant to an optimal operating point that satisfies input and output constraints, relying solely on the input-output sensitivity as model information. This paper introduces PRIME (PRoximal Iterative MarkEts), a novel OFO approach based on proximal-point iterations. Unlike existing OFO solutions, PRIME admits a market-based implementation, where self-interested actors are incentivized to make choices that result in safe and efficient operation, without communicating private costs or constraints. Furthermore, PRIME can handle non-smooth objective functions, achieve fast convergence rates and rapid constraint satisfaction, and effectively reject measurement noise. We demonstrate PRIME on an AC optimal power flow problem, obtaining an efficient real-time nonlinear local marginal pricing scheme.


[78] 2505.01625

Embracing Diffraction: A Paradigm Shift in Wireless Sensing and Communication

Wireless signals are integral to modern society, enabling both communication and increasingly, environmental sensing. While various propagation models exist, ranging from empirical methods to full-wave simulations, the phenomenon of electromagnetic diffraction is often treated as a secondary effect or a correction factor. This paper positions diffraction as a fundamentally important and underutilized mechanism that is rich with information about the physical environment. Specifically, diffraction-inducing elements generate distinct signatures that are rich with information about their underlying properties such as their geometries. We then argue that by understanding and exploiting these relationships, diffraction can be harnessed strategically. We introduce a general optimization framework to formalize this concept, illustrating how diffraction can be leveraged for both inverse problems (sensing scene details such as object geometries from measured fields) and design problems (shaping radio frequency (RF) fields for communication objectives by configuring diffracting elements). Focusing primarily on edge diffraction and Keller's Geometrical Theory of Diffraction (GTD), we discuss specific applications in RF sensing for scene understanding and in communications for RF field programming, drawing upon recent work. Overall, this paper lays out a vision for systematically incorporating diffraction into the design and operation of future wireless systems, paving the way for enhanced sensing capabilities and more robust communication strategies.


[79] 2505.24001

Multi-output Classification using a Cross-talk Architecture for Compound Fault Diagnosis of Motors in Partially Labeled Condition

The increasing complexity of rotating machinery and the diversity of operating conditions, such as rotating speed and varying torques, have amplified the challenges in fault diagnosis in scenarios requiring domain adaptation, particularly involving compound faults. This study addresses these challenges by introducing a novel multi-output classification (MOC) framework tailored for domain adaptation in partially labeled target datasets. Unlike conventional multi-class classification (MCC) approaches, the MOC framework classifies the severity levels of compound faults simultaneously. Furthermore, we explore various single-task and multi-task architectures applicable to the MOC formulation (including shared trunk and cross-talk-based designs) for compound fault diagnosis under partially labeled conditions. Based on this investigation, we propose a novel cross-talk architecture, residual neural dimension reductor (RNDR), that enables selective information sharing across diagnostic tasks, effectively enhancing classification performance in compound fault scenarios. In addition, frequency-layer normalization was incorporated to improve domain adaptation performance on motor vibration data. Compound fault conditions were implemented using a motor-based test setup and evaluated across six domain adaptation scenarios. The experimental results demonstrate its superior macro F1 performance compared to baseline models. We further showed that the structural advantage of RNDR is more pronounced in compound fault settings through a single-fault comparison. We also found that frequency-layer normalization fits the fault diagnosis task better than conventional methods. Lastly, we analyzed the RNDR under various conditions and compared it with other models with an increased number of parameters and with the ablated RNDR structure.


[80] 2506.21375

Integrating Movable Antennas and Intelligent Reflecting Surfaces for Coverage Enhancement

This paper investigates an intelligent reflecting surface (IRS)-aided movable antenna (MA) system, where multiple IRSs cooperate with a multi-MA base station to extend wireless coverage to multiple target areas. The objective is to maximize the worst-case signal-to-noise ratio (SNR) across all locations within these areas through joint optimization of MA positions, IRS phase shifts, and transmit beamforming. To achieve this while balancing the performance-cost trade-off, we propose three coverage-enhancement schemes: the area-adaptive MA-IRS scheme, where both MA positions and IRS phase shifts are adaptively adjusted for each target area; the area-adaptive MA-staIRS scheme, where only MA positions are adjusted, while IRS phase shifts remain unchanged after initial configuration (with staIRS denoting static IRSs); and the shared MA-staIRS scheme, where a common MA placement and static IRS configuration are applied across all areas. These schemes lead to challenging non-convex optimization problems with implicit objectives, which are difficult to solve optimally. To address these problems, we propose a general algorithmic framework that can solve each problem efficiently albeit suboptimally. Simulation results demonstrate that: 1) the proposed MA-based schemes consistently outperform their fixed-position antenna (FPA)-based counterparts under both area-adaptive and static IRS configurations, with the area-adaptive MA-IRS scheme achieving the best worst-case SNR; 2) as transmit antennas are typically far fewer than IRS elements, the area-adaptive MA-staIRS scheme may underperform the baseline FPA scheme with area-adaptive IRSs in worst-case SNR, but a modest increase in antenna number can reverse this; 3) under a fixed total cost, the optimal MA-to-IRS-element ratio for worst-case SNR maximization is empirically found to be proportional to the reciprocal of their unit cost ratio.


[81] 2506.22411

19.3 GHz Acoustic Filter with High Close-in Rejection in Tri-layer Thin-Film Lithium Niobate

Acoustic filters are preferred front-end solutions at sub-6 GHz due to their superior frequency selectivity compared to electromagnetic (EM) counterparts. With the ongoing development of 5G and the evolution toward 6G, there is a growing need to extend acoustic filter technologies into frequency range 3 (FR3), which spans 7 to 24 GHz, to accommodate emerging high-frequency bands. However, scaling acoustic filters beyond 10 GHz presents significant challenges, as conventional platforms suffer from increased insertion loss (IL) and degraded out-of-band (OoB) rejection at higher frequencies. Recent innovations have led to the emergence of periodically poled piezoelectric lithium niobate (P3F LN) laterally excited bulk acoustic resonators (XBARs), offering low-loss and high electromechanical coupling performance above 10 GHz. This work presents the first tri-layer P3F LN filter operating at 19.3 GHz, achieving a low IL of 2.2 dB, a 3-dB fractional bandwidth (FBW) of 8.5%, and an impressive 49 dB close-in rejection. These results demonstrate strong potential for integration into FR3 diplexers.
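
As a quick worked check of the figures quoted above, the fractional bandwidth can be converted into an absolute 3-dB bandwidth. This is only an illustrative calculation: it assumes FBW is defined as bandwidth divided by center frequency and that the passband is symmetric about 19.3 GHz, neither of which is stated in the abstract. The variable names are hypothetical.

```python
# Figures quoted in the abstract.
center_ghz = 19.3   # filter center frequency in GHz
fbw = 0.085         # 3-dB fractional bandwidth (8.5%)

# Assumption: FBW = (3-dB bandwidth) / (center frequency).
bandwidth_ghz = fbw * center_ghz

# Assumption: the passband is symmetric about the center frequency.
lower_edge_ghz = center_ghz - bandwidth_ghz / 2
upper_edge_ghz = center_ghz + bandwidth_ghz / 2

print(f"3-dB bandwidth: {bandwidth_ghz:.2f} GHz")
print(f"approximate passband: {lower_edge_ghz:.2f} to {upper_edge_ghz:.2f} GHz")
```

Under these assumptions the 8.5% FBW corresponds to roughly 1.64 GHz of 3-dB bandwidth.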


[82] 2507.15031

Safety Controller Synthesis for Stochastic Networked Systems under Communication Constraints

This paper develops a framework for synthesizing safety controllers for discrete-time stochastic linear control systems (dt-SLS) operating under communication imperfections. The control unit is remote and communicates with the sensor and actuator through an imperfect wireless network. We consider a constant delay in the sensor-to-controller channel (uplink), and data loss in both sensor-to-controller and controller-to-actuator (downlink) channels. In our proposed scheme, data loss in each channel is modeled as an independent Bernoulli-distributed random process. To systematically handle the uplink delay, we first introduce an augmented discrete-time stochastic linear system (dt-ASLS) by concatenating all states and control inputs that sufficiently represent the state-input evolution of the original dt-SLS under the delay and packet loss constraints. We then leverage control barrier certificates for dt-ASLS to synthesize a controller that ensures the stochastic safety of dt-SLS, guaranteeing that all trajectories remain outside unsafe regions with a quantified probabilistic bound. Our approach translates safety constraints into matrix inequalities, leading to an optimization problem that eventually quantifies the probability of satisfying the safety specification in the presence of communication imperfections. We validate our results on an RLC circuit subject to both constant delay and probabilistic data loss.


[83] 2507.17325

Grid impedance estimation based Kalman Filter

Modern power systems face new operational hurdles due to the increasing adoption of inverter-coupled distributed energy resources, which impact system stability and control. Central to these challenges is the dynamic nature of grid impedance. To address this, a novel real-time estimation algorithm based on the Discrete Fourier Transform is proposed. This algorithm is embedded within an Advanced Angle Estimation Kalman Filter framework that employs a Linear Quadratic Regulator for current control (AAEKF-LQR). The impedance data directly informs and refines the controller's phase angle estimation. Simulation analyses demonstrate robust collaboration between the estimator and controller, sustaining system stability under weak grid conditions. The technique proves capable of delivering swift and accurate impedance updates during grid variations, which is crucial for maintaining stable inverter operation.


[84] 2508.02437

On the Equivalence of Koopman Eigenfunctions and Commuting Symmetries

The Koopman operator framework offers a way to represent a nonlinear system as a linear one. The key to this simplification lies in the identification of eigenfunctions. While various data-driven algorithms have been developed for this problem, a theoretical characterization of Koopman eigenfunctions from geometric properties of the flow is still missing. This paper provides such a characterization by establishing an equivalence between a set of Koopman eigenfunctions and a set of commuting symmetries -- both assumed to span the tangent spaces at every point on a simply connected open set. Based on this equivalence, we build an explicit and convergent formula for the principal Koopman eigenfunctions defined on the region of attraction of a locally asymptotically stable equilibrium point, thereby offering a constructive formula to compute Koopman eigenfunctions.


[85] 2508.07002

Joint Transmit and Pinching Beamforming Design for Pinching Antenna-assisted Symbiotic Radio

This paper investigates a novel downlink symbiotic radio framework enabled by the pinching antenna system (PASS), designed to enhance both primary and secondary transmissions through reconfigurable antenna positioning. This reconfigurability introduces additional degrees of freedom for adaptive pinching beamforming, thereby enabling constructive signal enhancement and interference suppression tailored to the locations of the backscatter device, the Internet of Things (IoT) receiver, and the primary receivers. To fully exploit these benefits, we formulate a joint transmit and pinching beamforming optimization problem that maximizes the achievable sum rate while satisfying the IoT receiver's detection error probability constraint and feasible deployment constraints for the pinching antennas. The resulting problem is inherently nonconvex and highly coupled. To address this challenge, we develop two complementary solution approaches. The first is a learning-aided gradient descent method, where the constrained optimization is reformulated into a differentiable form and solved through end-to-end learning. In this approach, the pinching antenna position matrix is reparameterized to automatically satisfy minimum spacing constraints, while transmit power and waveguide length limits are enforced via projection and normalization. The second approach is an optimization-based successive convex approximation-particle swarm optimization method, which first determines the transmit beamforming solution using successive convex approximation and subsequently optimizes pinching beamforming via a particle swarm optimization search over candidate pinching antenna placements.


[86] 2508.19408

1-Bit Unlimited Sampling Beyond Fourier Domain: Low-Resolution Sampling of Quantization Noise

Analog-to-digital converters (ADCs) play a critical role in digital signal acquisition across various applications, but their performance is inherently constrained by sampling rates and bit budgets. This bit budget imposes a trade-off between dynamic range (DR) and digital resolution, with ADC energy consumption scaling linearly with sampling rate and exponentially with bit depth. To bypass this, numerous approaches, including oversampling with low-resolution ADCs, have been explored. A prominent example is 1-Bit ADCs with Sigma-Delta Quantization (SDQ), a widely used consumer-grade solution. However, SDQs suffer from overloading or saturation issues, limiting their ability to handle inputs with arbitrary DR. The Unlimited Sensing Framework (USF) addresses this challenge by injecting a modulo non-linearity in hardware, resulting in a new digital sensing technology. In this paper, we introduce a novel 1-Bit sampling architecture that extends both conventional 1-Bit SDQ and USF. Our contributions are twofold: (1) We generalize the concept of noise shaping beyond the Fourier domain, allowing the inclusion of signals that are non-bandlimited in the Fourier domain but bandlimited in alternative transform domains. (2) Building on this generalization, we develop a new transform-domain recovery method for 1-Bit USF. When applied to the Fourier domain, our method demonstrates superior performance compared to existing time-domain techniques, offering reduced oversampling requirements and improved robustness. Extensive numerical experiments validate our findings, laying the groundwork for a broader generalization of 1-Bit sampling systems.
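
To make the modulo non-linearity mentioned above concrete, the following minimal sketch folds samples into a bounded range so that a high-dynamic-range input never saturates. It assumes a centered modulo with threshold `lam` that maps into [-lam, lam); the function name `fold` and the sample values are illustrative and not taken from the paper, and the 1-Bit architecture and recovery method themselves are not reproduced here.

```python
def fold(x: float, lam: float) -> float:
    """Centered modulo: map x into the interval [-lam, lam)."""
    # Python's % returns a non-negative result for a positive modulus,
    # so shifting by lam before and after centers the folded range.
    return (x + lam) % (2.0 * lam) - lam


# Hypothetical samples, some of which exceed the threshold lam = 1.0.
samples = [0.5, 1.3, -1.3, 4.2]
folded = [fold(x, 1.0) for x in samples]
print(folded)
```

Samples already inside the range pass through unchanged, while out-of-range samples wrap around instead of clipping.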


[87] 2508.21035

A multi-task neural network for atypical mitosis recognition under domain shift

Recognizing atypical mitotic figures in histopathology images allows physicians to correctly assess tumor aggressiveness. Although machine learning models could be exploited for automatically performing such a task, under domain shift these models suffer from significant performance drops. In this work, an approach based on multi-task learning is proposed for addressing this problem. By exploiting auxiliary tasks correlated with the main classification task, the proposed approach, submitted to Track 2 of the MItosis DOmain Generalization (MIDOG) challenge, aims to help the model focus only on the object to classify, ignoring the domain-varying background of the image. The proposed approach shows promising performance in a preliminary evaluation conducted on three distinct datasets, i.e., the MIDOG 2025 Atypical Training Set, the Ami-Br dataset, and the preliminary test set of the MIDOG25 challenge.


[88] 2509.01777

Maximally Resilient Controllers under Temporal Logic Specifications

In this paper, we consider the notion of resilience of a dynamical system, defined by the maximum disturbance a controlled dynamical system can withstand while satisfying given temporal logic specifications. Given a dynamical system and a specification, the objective is to synthesize the controller such that the closed-loop system satisfies this specification while maximizing its resilience. The problem is formulated as a robust optimization program where the objective is to compute the maximum resilience while simultaneously synthesizing the corresponding controller parameters. For linear systems and linear controllers, exact solutions are provided for the class of time-varying polytopic specifications. For the case of nonlinear systems, nonlinear controllers and more general specifications, we leverage tools from the scenario optimization approach, offering a probabilistic guarantee of the solution as well as computational feasibility. Different case studies are presented to illustrate the theoretical results.


[89] 2509.06361

Speaker Privacy and Security in the Big Data Era: Protection and Defense against Deepfake

In the era of big data, remarkable advancements have been achieved in personalized speech generation techniques that utilize speaker attributes, including voice and speaking style, to generate deepfake speech. This has also amplified global security risks from deepfake speech misuse, resulting in considerable societal costs worldwide. To address the security threats posed by deepfake speech, techniques have been developed focusing on both the protection of voice attributes and the defense against deepfake speech. Among them, the voice anonymization technique has been developed to protect voice attributes from extraction for deepfake generation, while deepfake detection and watermarking have been utilized to defend against the misuse of deepfake speech. This paper provides a short and concise overview of the three techniques, describing the methodologies, advancements, and challenges. A comprehensive version, offering additional discussions, will be published in the near future.


[90] 2401.02081

Joint Waveform Design for MIMO-OFDM DFRC Systems

Dual-functional radar-communication (DFRC) has attracted considerable attention. This paper considers the frequency-selective multipath fading environment and proposes DFRC waveform design schemes based on multiple-input and multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) techniques. In the proposed waveform design schemes, the Cramér-Rao bound (CRB) of the radar system and the inter-stream interference (ISI) and achievable rate of the communication system are considered as the performance metrics. We focus on the performance trade-off between the radar system and the communication system and formulate the corresponding optimization problems. In the ISI minimization based waveform design scheme, the optimization problem is convex and can be easily solved. In the achievable rate maximization based waveform design scheme, we propose a water-filling (WF) and sequential quadratic programming (SQP) based algorithm to derive the covariance matrix and the precoding matrix. Simulation results validate the proposed DFRC waveform designs and show that the achievable rate maximization based scheme has a better performance than the ISI minimization based scheme.
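
For readers unfamiliar with water-filling, the sketch below shows the classical scalar version over parallel channels, where each channel receives power max(0, mu - 1/g) and the water level mu is chosen so the powers sum to the budget. This is only the textbook building block, not the paper's combined WF and SQP algorithm for the covariance and precoding matrices; the function name, the gains, and the power budget are hypothetical.

```python
def water_filling(gains, total_power):
    """Classical water-filling over parallel channels.

    gains: channel gain-to-noise ratios (all positive).
    Returns (powers, water_level) with sum(powers) == total_power.
    """
    inv = sorted(1.0 / g for g in gains)  # channel "floor" heights, best first
    mu = 0.0
    # Try using the k best channels, dropping the worst one until the
    # water level sits above every floor that is still in use.
    for k in range(len(inv), 0, -1):
        mu = (total_power + sum(inv[:k])) / k
        if mu > inv[k - 1]:
            break
    powers = [max(0.0, mu - 1.0 / g) for g in gains]
    return powers, mu


# Hypothetical example: three channels and a power budget of 2.
powers, level = water_filling([2.0, 1.0, 0.25], 2.0)
print(powers, level)
```

In this example the weakest channel receives no power, which is the characteristic behavior of water-filling at a limited budget.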


[91] 2405.15454

Linearly Controlled Language Generation with Performative Guarantees

The increasing prevalence of Large Language Models (LMs) in critical applications highlights the need for controlled language generation strategies that are not only computationally efficient but that also enjoy performance guarantees. To achieve this, we use a common model of concept semantics as linearly represented in an LM's latent space. In particular, we take the view that natural language generation traces a trajectory in this continuous semantic space, realized by the language model's hidden activations. This view permits a control-theoretic treatment of text generation in latent space, in which we propose a lightweight, gradient-free intervention that dynamically steers trajectories away from regions corresponding to undesired meanings. In particular, we propose to directly intervene on the activations of the token that is being generated in embedding space in an online fashion. Crucially, we do not simply steer activations towards a desirable region. Instead, our method relies on classical techniques from control theory to precisely control activations in a context-dependent way, and guarantees that they are brought into a specific pre-defined region of embedding space that corresponds to allowed semantics. Our intervention is computed in closed form according to an optimal controller formulation, minimally impacting generation time. This control of the activations in embedding space allows for fine-grained steering of attributes of the generated sequence. We demonstrate the effectiveness of our approach on different objectives -- toxicity avoidance and sentiment control -- while maintaining text quality.
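
One simple way to picture a closed-form, gradient-free intervention of this kind is a minimum-norm projection of an activation vector onto a half-space. The sketch below assumes the allowed region is {h : w.h + b <= 0} for a hypothetical linear probe (w, b); it is an illustration of that geometric idea only and does not reproduce the paper's controller. The function name `steer` and the example vectors are invented for illustration.

```python
def steer(h, w, b):
    """Minimum-norm projection of h onto the half-space w.h + b <= 0.

    h: activation vector, w: probe direction (non-zero), b: probe offset.
    Activations that already lie in the allowed region are left untouched.
    """
    score = sum(wi * hi for wi, hi in zip(w, h)) + b
    if score <= 0.0:
        return list(h)  # already allowed, so no intervention is applied
    # Move along w just far enough to reach the boundary w.h + b = 0.
    scale = score / sum(wi * wi for wi in w)
    return [hi - scale * wi for hi, wi in zip(h, w)]


# Hypothetical two-dimensional example.
print(steer([2.0, 0.0], [1.0, 0.0], -1.0))  # outside the region, gets moved
print(steer([0.5, 3.0], [1.0, 0.0], -1.0))  # inside the region, unchanged
```

The intervention is context-dependent in the sense that its size depends on how far the current activation lies outside the allowed region, and it is zero when no correction is needed.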


[92] 2407.05643

Revisiting XL-MIMO Channel Estimation: When Dual-Wideband Effects Meet Near Field

The deployment of extremely large antenna arrays (ELAAs) in extremely large-scale multiple-input multiple-output (XL-MIMO) systems introduces significant near-field effects, such as spherical wavefront propagation and spatially non-stationary (SnS) properties. When combined with the dual-wideband effects inherent to wideband systems, these phenomena fundamentally alter the channel's sparsity patterns in the angular-delay domain, rendering existing estimation methods insufficient. To address these challenges, this paper reconsiders the channel estimation problem for wideband XL-MIMO systems. Leveraging the spatial-chirp property of array responses, we first quantitatively characterize the angular-delay domain sparsity of wideband XL-MIMO channels, revealing both global block sparsity and local common-delay sparsity. To effectively capture this structured sparsity, we then propose a novel column-wise hierarchical prior model that integrates a precision sharing mechanism and a Markov random field (MRF) structure. Building on this prior model, the channel estimation task is formulated as a multiple measurement vector (MMV)-based Bayesian inference problem. Tailored to the complex factor graph induced by this hierarchical prior, we develop an MMV-based hybrid message passing (MMV-HMP) algorithm. This algorithm performs message updates along the edges of the factor graph, and selectively applies either the variational message passing (VMP) or sum-product (SP) rules, depending on the factor-node structure and message tractability. Simulation results validate the effectiveness of the proposed column-wise hierarchical prior model through ablation studies and demonstrate that the MMV-HMP algorithm, while maintaining moderate computational complexity, consistently outperforms existing baselines that fail to capture the structured sparsity of wideband XL-MIMO channels.


[93] 2410.11633

Grover Adaptive Search with Spin Variables

This paper presents a novel approach to Grover adaptive search (GAS) for a combinatorial optimization problem whose objective function involves spin variables. While the GAS algorithm with a conventional design of a quantum dictionary subroutine handles a problem associated with an objective function with binary variables $\{0,1\}$, we reformulate the problem using spin variables $\{+1,-1\}$ to simplify the algorithm. Specifically, we introduce a novel quantum dictionary subroutine that is designed for this spin-based formulation. A key benefit of this approach is the substantial reduction in the number of CNOT gates required to construct the quantum circuit. We theoretically demonstrate that, for certain problems, our proposed approach can reduce the gate complexity from an exponential order to a polynomial order, compared to the conventional binary-based approach. This improvement has the potential to enhance the scalability and efficiency of GAS, particularly in larger quantum computations.
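
The relation between the binary and spin formulations mentioned above can be written out explicitly. The following is a standard change of variables, shown under the assumed sign convention that $x_i = 0$ corresponds to $s_i = +1$ (the opposite convention is equally common); it is not taken from the paper.

```latex
% Assumed convention: x_i = 0 <-> s_i = +1 and x_i = 1 <-> s_i = -1.
\begin{align}
  x_i &= \frac{1 - s_i}{2}, \qquad x_i \in \{0,1\},\; s_i \in \{+1,-1\}, \\
  x_i x_j &= \frac{1 - s_i - s_j + s_i s_j}{4}.
\end{align}
```

Substituting these expressions turns a quadratic objective in binary variables into a quadratic objective in spin variables, up to linear and constant terms.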


[94] 2411.13860

Decoupled Sparse Priors Guided Diffusion Compression Model for Point Clouds

Lossy compression methods rely on an autoencoder to transform a point cloud into latent points for storage, leaving the inherent redundancy of latent representations unexplored. To reduce redundancy in latent points, we propose a sparse priors guided method that achieves high reconstruction quality, especially at high compression ratios. This is accomplished by a dual-density scheme separately processing the latent points (intended for reconstruction) and the decoupled sparse priors (intended for storage). Our approach features an efficient dual-density data flow that relaxes size constraints on latent points, and hybridizes a progressive conditional diffusion model to encapsulate essential details for reconstruction within the conditions, which are decoupled hierarchically to intra-point and inter-point priors. Specifically, our method encodes the original point cloud into latent points and decoupled sparse priors through separate encoders. Latent points serve as intermediates, while sparse priors act as adaptive conditions. We then employ a progressive attention-based conditional denoiser to generate latent points conditioned on the decoupled priors, allowing the denoiser to dynamically attend to geometric and semantic cues from the priors at each encoding and decoding layer. Additionally, we integrate the local distribution into the arithmetic encoder and decoder to enhance local context modeling of the sparse points. The original point cloud is reconstructed through a point decoder. Compared with state-of-the-art methods, our method obtains a superior rate-distortion trade-off, as evidenced by extensive evaluations on the ShapeNet dataset and standard test datasets from the MPEG group, including 8iVFB and Owlii.


[95] 2502.19548

When Large Language Models Meet Speech: A Survey on Integration Approaches

Recent advancements in large language models (LLMs) have spurred interest in expanding their application beyond text-based tasks. A large number of studies have explored integrating other modalities with LLMs, notably speech modality, which is naturally related to text. This paper surveys the integration of speech with LLMs, categorizing the methodologies into three primary approaches: text-based, latent-representation-based, and audio-token-based integration. We also demonstrate how these methods are applied across various speech-related applications and highlight the challenges in this field to offer inspiration for


[96] 2503.13246

Highly Efficient Direct Analytics on Semantic-aware Time Series Data Compression

Semantic communication has emerged as a promising paradigm to tackle the challenges of massively growing data traffic and sustainable data communication. It shifts the focus from data fidelity to goal-oriented or task-oriented semantic transmission. While deep learning-based methods are commonly used for semantic encoding and decoding, they struggle with the sequential nature of time series data and high computation cost, particularly in resource-constrained IoT environments. Data compression plays a crucial role in reducing transmission and storage costs, yet traditional data compression methods fall short of the demands of goal-oriented communication systems. In this paper, we propose a novel method for direct analytics on time series data compressed by the SHRINK compression algorithm. Through experimentation using outlier detection as a case study, we show that our method outperforms baselines running on uncompressed data in multiple cases, with merely a 1% difference in the worst case. Additionally, it achieves four times lower runtime on average and accesses approximately 10% of the data volume, which enables edge analytics with limited storage and computation power. These results demonstrate that our approach offers reliable, high-speed outlier detection analytics for diverse IoT applications while extracting semantics from time-series data, achieving high compression, and reducing data transmission.


[97] 2503.16833

The Model Hears You: Audio Language Model Deployments Should Consider the Principle of Least Privilege

The latest Audio Language Models (Audio LMs) process speech directly instead of relying on a separate transcription step. This shift preserves detailed information, such as intonation or the presence of multiple speakers, that would otherwise be lost in transcription. However, it also introduces new safety risks, including the potential misuse of speaker identity cues and other sensitive vocal attributes, which could have legal implications. In this paper, we urge a closer examination of how these models are built and deployed. Our experiments show that end-to-end modeling, compared with cascaded pipelines, creates socio-technical safety risks such as identity inference, biased decision-making, and emotion detection. This raises concerns about whether Audio LMs store voiceprints and function in ways that create uncertainty under existing legal regimes. We then argue that the Principle of Least Privilege should be considered to guide the development and deployment of these models. Specifically, evaluations should assess (1) the privacy and safety risks associated with end-to-end modeling; and (2) the appropriate scope of information access. Finally, we highlight related gaps in current audio LM benchmarks and identify key open research questions, both technical and policy-related, that must be addressed to enable the responsible deployment of end-to-end Audio LMs.


[98] 2504.18031

Joint Resource Estimation and Trajectory Optimization for eVTOL-involved CR network: A Monte Carlo Tree Search-based Approach

Electric Vertical Take-Off and Landing (eVTOL) aircraft, pivotal to Advanced Air Mobility (AAM), are emerging as a transformative transportation paradigm with the potential to redefine urban and regional mobility. While these systems offer unprecedented efficiency in transporting people and goods, they rely heavily on computation-intensive, safety-critical operations such as real-time navigation, environmental sensing, and trajectory tracking, which necessitate robust offboard computational support. A widely adopted solution involves offloading these tasks to terrestrial base stations (BSs) along the flight path. However, air-to-ground connectivity is often constrained by spectrum conflicts with terrestrial users, which poses a significant challenge to maintaining reliable task execution. Cognitive radio (CR) techniques offer promising capabilities for dynamic spectrum access, making them a natural fit for addressing this issue. Existing studies often overlook the time-varying nature of BS resources, such as spectrum availability and CPU cycles, which leads to inaccurate trajectory planning, suboptimal offloading success rates, excessive energy consumption, and operational delays. To address these challenges, we propose a trajectory optimization framework for eVTOL swarms that maximizes task offloading success probability while minimizing both energy consumption and resource competition (e.g., spectrum and CPU cycles) with primary terrestrial users. The proposed algorithm integrates a Multi-Armed Bandit (MAB) model to dynamically estimate BS resource availability and a Monte Carlo Tree Search (MCTS) algorithm to determine optimal offloading decisions, selecting both the BSs and access time windows that align with energy and temporal constraints.
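
To illustrate how a multi-armed bandit can estimate base-station availability online, here is a minimal sketch. It assumes the UCB1 rule, which the abstract does not name, and treats each BS as one arm whose reward is the observed fraction of free resources. The class name `AvailabilityBandit` and its methods are hypothetical and are not taken from the paper.

```python
import math


class AvailabilityBandit:
    """UCB1 bandit: one arm per base station (an assumed, illustrative choice)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # times each BS was probed
        self.means = [0.0] * n_arms  # running mean of observed availability

    def select(self):
        # Probe every BS once before applying the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)

        def upper_bound(arm):
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[arm])
            return self.means[arm] + bonus

        return max(range(len(self.counts)), key=upper_bound)

    def update(self, arm, reward):
        # Incremental mean update with the newly observed availability.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

In a loop, `select()` picks the BS to probe and `update()` records what was observed; the running means could then serve as availability estimates for a planner such as MCTS.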


[99] 2506.00681

Learning to Upsample and Upmix Audio in the Latent Domain

Neural audio autoencoders create compact latent representations that preserve perceptually important information, serving as the foundation for both modern audio compression systems and generation approaches like next-token prediction and latent diffusion. Despite their prevalence, most audio processing operations, such as spatial and spectral up-sampling, still inefficiently operate on raw waveforms or spectral representations rather than directly on these compressed representations. We propose a framework that performs audio processing operations entirely within an autoencoder's latent space, eliminating the need to decode to raw audio formats. Our approach dramatically simplifies training by operating solely in the latent domain, with a latent L1 reconstruction term, augmented by a single latent adversarial discriminator. This contrasts sharply with raw-audio methods that typically require complex combinations of multi-scale losses and discriminators. Through experiments in bandwidth extension and mono-to-stereo up-mixing, we demonstrate computational efficiency gains of up to 100x while maintaining quality comparable to post-processing on raw audio. This work establishes a more efficient paradigm for audio processing pipelines that already incorporate autoencoders, enabling significantly faster and more resource-efficient workflows across various audio tasks.


[100] 2507.07867

Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders

Neural audio codecs and autoencoders have emerged as versatile models for audio compression, transmission, feature-extraction, and latent-space generation. However, a key limitation is that most are trained to maximize reconstruction fidelity, often neglecting the specific latent structure necessary for optimal performance in diverse downstream applications. We propose a simple, post-hoc framework to address this by modifying the bottleneck of a pre-trained autoencoder. Our method introduces a "Re-Bottleneck", an inner bottleneck trained exclusively through latent space losses to instill user-defined structure. We demonstrate the framework's effectiveness in three experiments. First, we enforce an ordering on latent channels without sacrificing reconstruction quality. Second, we align latents with semantic embeddings, analyzing the impact on downstream diffusion modeling. Third, we introduce equivariance, ensuring that a filtering operation on the input waveform directly corresponds to a specific transformation in the latent space. Ultimately, our Re-Bottleneck framework offers a flexible and efficient way to tailor representations of neural audio models, enabling them to seamlessly meet the varied demands of different applications with minimal additional training.


[101] 2507.21625

Real-Time Gradient Waveform Design for Arbitrary $k$-Space Trajectories

\textbf{Objective: }To develop a real-time method for designing gradient waveforms for arbitrary $k$-space trajectories that are time-optimal and hardware-compliant. \textbf{Methods: }The gradient waveform is solved recursively under both the slew-rate and the trajectory constraints. The gradient constraint is enforced by thresholding the $\ell_2$-norm of the next gradient vector. The constraints form a quadratic equation. To ensure the existence of the solution, a novel Discrete-Time Forward and Backward Sweep (DTFBS) strategy is proposed. To ensure the existence of the trajectory derivatives, the trajectory function is reparameterized as a piecewise cubic polynomial function with $C^2$ continuity. To ensure trajectory fidelity, the output gradient waveform is reparameterized by the finite difference of the trajectory samples. Simulation experiments across seven commonly adopted non-Cartesian trajectories were conducted to validate generality, time-optimality, real-time capability, slew-rate accuracy, and improvements over prior work. Imaging feasibility of the designed time-optimal gradient waveform was validated in phantom and in vivo experiments. \textbf{Results: }The proposed method achieves a $>89\%$ reduction in computation time and simultaneously reduces slew-rate overshoot by $>98\%$ compared to the prior method across all involved trajectories. The computation time of the proposed method is shorter than the gradient duration for all tested cases, validating the real-time capability of the proposed method. \textbf{Conclusions: }The proposed method enables real-time and hardware-compliant gradient waveform design, achieving significant reductions in computation time and slew-rate overshoot compared to the previous method. \textbf{Significance: }This is the first method achieving real-time gradient waveform design for arbitrary $k$-space trajectories.
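
The finite-difference relation between trajectory samples and gradient samples mentioned in the Methods can be sketched as follows. This is not the DTFBS algorithm; it only assumes the standard convention $k(t) = \gamma \int g(\tau)\,d\tau$ with $\gamma$ in Hz/T, and the function names, units, and limits are illustrative assumptions.

```python
import math

GAMMA = 42.576e6  # Hz/T, proton gyromagnetic ratio divided by 2*pi


def gradients_from_trajectory(k, dt, gamma=GAMMA):
    """Finite difference g[n] = (k[n+1] - k[n]) / (gamma * dt).

    k is a list of k-space points (tuples, in 1/m); dt is the sample time in s.
    """
    return [
        tuple((b - a) / (gamma * dt) for a, b in zip(p, q))
        for p, q in zip(k, k[1:])
    ]


def max_slew(g, dt):
    """Largest l2-norm slew rate between consecutive gradient samples."""
    return max((math.dist(p, q) / dt for p, q in zip(g, g[1:])), default=0.0)


def is_compliant(g, dt, g_max, s_max):
    """Check the gradient-amplitude and slew-rate limits on the l2-norm."""
    within_amplitude = all(math.hypot(*sample) <= g_max for sample in g)
    return within_amplitude and max_slew(g, dt) <= s_max
```

Because the gradient is derived from the sampled trajectory itself, integrating it back reproduces those samples exactly, which is the sense in which this parameterization preserves trajectory fidelity.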


[102] 2508.04795

Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM

In dialogue transcription pipelines, Large Language Models (LLMs) are frequently employed in post-processing to improve grammar, punctuation, and readability. We explore a complementary post-processing step: enriching transcribed dialogues by adding metadata tags for speaker characteristics such as age, gender, and emotion. Some of the tags are global to the entire dialogue, while some are time-variant. Our approach couples frozen audio foundation models, such as Whisper or WavLM, with a frozen LLAMA language model to infer these speaker attributes, without requiring task-specific fine-tuning of either model. Using lightweight, efficient connectors to bridge audio and language representations, we achieve competitive performance on speaker profiling tasks while preserving modularity and speed. Additionally, we demonstrate that a frozen LLAMA model can compare x-vectors directly, achieving an Equal Error Rate of 8.8% in some scenarios.
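
For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false-acceptance and false-rejection rates coincide. A minimal sketch of one common way to estimate it from verification scores is shown below; the function name and the threshold sweep are illustrative assumptions and do not reproduce the authors' evaluation code.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer
```

With few trials the two error curves may not cross exactly, so this sketch reports the midpoint at the threshold where they are closest.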


[103] 2508.05702

Grid-Agent: An LLM-Powered Multi-Agent System for Power Grid Control

Modern power grids face unprecedented complexity from Distributed Energy Resources (DERs), Electric Vehicles (EVs), and extreme weather, while also being increasingly exposed to cyberattacks that can trigger grid violations. This paper introduces Grid-Agent, an autonomous AI-driven framework that leverages Large Language Models (LLMs) within a multi-agent system to detect and remediate violations. Grid-Agent integrates semantic reasoning with numerical precision through modular agents: a planning agent generates coordinated action sequences using power flow solvers, while a validation agent ensures stability and safety through sandboxed execution with rollback mechanisms. To enhance scalability, the framework employs an adaptive multi-scale network representation that dynamically adjusts encoding schemes based on system size and complexity. Violation resolution is achieved through optimizing switch configurations, battery deployment, and load curtailment. Our experiments on IEEE and CIGRE benchmark networks, including the IEEE 69-bus, CIGRE MV, and IEEE 30-bus test systems, demonstrate superior mitigation performance, highlighting Grid-Agent's suitability for modern smart grids requiring rapid, adaptive response.


[104] 2508.09788

HingeNet: A Harmonic-Aware Fine-Tuning Approach for Beat Tracking

Fine-tuning pre-trained foundation models has made significant progress in music information retrieval. However, applying these models to beat tracking tasks remains unexplored, as the limited annotated data renders conventional fine-tuning methods ineffective. To address this challenge, we propose HingeNet, a novel and general parameter-efficient fine-tuning method specifically designed for beat tracking tasks. HingeNet is a lightweight and separable network, visually resembling a hinge, designed to tightly interface with pre-trained foundation models by using their intermediate feature representations as input. This unique architecture grants HingeNet broad generalizability, enabling effective integration with various pre-trained foundation models. Furthermore, considering the significance of harmonics in beat tracking, we introduce a harmonic-aware mechanism during the fine-tuning process to better capture and emphasize the harmonic structures in musical signals. Experiments on benchmark datasets demonstrate that HingeNet achieves state-of-the-art performance in beat and downbeat tracking.


[105] 2508.10587

Self-Supervised Temporal Super-Resolution of Energy Data using Generative Adversarial Transformer

To bridge the temporal granularity gap in energy network design and operation based on Energy System Models, resampling of time series is required. While conventional upsampling methods are computationally efficient, they often result in significant information loss or increased noise. Advanced models such as time series generation models, Super-Resolution models and imputation models show potential, but also face fundamental challenges. The goal of time series generative models is to learn the distribution of the original data to generate high-resolution series with similar statistical characteristics. This is not entirely consistent with the definition of upsampling. Time series Super-Resolution models or imputation models can degrade the accuracy of upsampling because the input low-resolution time series are sparse and may have insufficient context. Moreover, such models usually rely on supervised learning paradigms. This presents a fundamental application paradox: their training requires the high-resolution time series that is intrinsically absent in upsampling application scenarios. To address the mentioned upsampling issue, this paper introduces a new method utilizing Generative Adversarial Transformers (GATs), which can be trained without access to any ground-truth high-resolution data. Compared with conventional interpolation methods, the introduced method can reduce the root mean square error (RMSE) of upsampling tasks by 9%, and the accuracy of a model predictive control (MPC) application scenario is improved by 13%.
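
The comparison against conventional interpolation can be made concrete with a small sketch of a linear-interpolation baseline and the RMSE metric. The function names are hypothetical; this shows only the baseline and the error arithmetic, not the proposed Generative Adversarial Transformer.

```python
import math


def linear_upsample(samples, factor):
    """Insert factor - 1 linearly interpolated points between neighbours."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend(a + (b - a) * j / factor for j in range(factor))
    out.append(samples[-1])
    return out


def rmse(pred, truth):
    """Root mean square error between two equal-length series."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def relative_reduction(baseline_error, new_error):
    """Fractional error reduction relative to the baseline (0.09 means 9%)."""
    return (baseline_error - new_error) / baseline_error
```

Under this reading, a 9% RMSE reduction means the new method's RMSE is 0.91 times that of the interpolation baseline on the same high-resolution reference.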


[106] 2509.00405

SaD: A Scenario-Aware Discriminator for Speech Enhancement

Generative adversarial network-based models have shown remarkable performance in the field of speech enhancement. However, the current optimization strategies for these models predominantly focus on refining the architecture of the generator or enhancing the quality evaluation metrics of the discriminator. This approach often overlooks the rich contextual information inherent in diverse scenarios. In this paper, we propose a scenario-aware discriminator that captures scene-specific features and performs frequency-domain division, thereby enabling a more accurate quality assessment of the enhanced speech generated by the generator. We conducted comprehensive experiments on three representative models using two publicly available datasets. The results demonstrate that our method can effectively adapt to various generator architectures without altering their structure, thereby unlocking further performance gains in speech enhancement across different scenarios.


[107] 2509.00896

AI-Enhanced Intelligent NIDS Framework: Leveraging Metaheuristic Optimization for Robust Attack Detection and Prevention

In today's rapidly evolving digital landscape, safeguarding network infrastructures against cyberattacks has become a critical priority. This research presents an innovative AI-driven real-time intrusion detection framework designed to enhance network security, particularly in Wireless Sensor Networks (WSNs), Cloud Computing (CC), and Internet of Things (IoT) environments. The system employs classical machine learning models (Logistic Regression, Decision Tree, and K-Nearest Neighbors), optimized through the novel Energy Valley Optimization (EVO) method using the NSL-KDD dataset. Feature selection significantly reduced the number of input features from 42 to 18 while maintaining strong detection capabilities. The proposed system achieved 98.95 percent accuracy with Decision Tree, 98.47 percent with K-Nearest Neighbors, and 88.84 percent with Logistic Regression. Moreover, high precision, recall, and F1-scores were attained across all classifiers while substantially reducing training and testing times, making the framework highly suitable for real-time applications. To ensure fair detection across diverse attack types, dataset balancing via downsampling was applied to address class imbalance challenges. This investigation focuses on the significance of advancing IDSs in cloud computing and WSNs. Overall, this work advances secure communications by delivering a scalable, low-latency, and high-accuracy intrusion detection solution aligned with the latest trends in artificial intelligence, cybersecurity, and real-time digital networks.
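
As an illustration of the class-balancing step, the sketch below downsamples every class to the size of the smallest one. It assumes plain random downsampling; the abstract does not specify the exact procedure, and the function name and seed parameter are hypothetical.

```python
import random
from collections import defaultdict


def downsample_balance(rows, labels, seed=0):
    """Randomly keep the same number of rows per class (the minority count)."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is reduced to the size of the smallest class.
    target = min(len(members) for members in by_class.values())
    rng = random.Random(seed)  # fixed seed for reproducible sampling

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

Downsampling discards majority-class records, so it trades some training data for equal class representation.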