New articles on Electrical Engineering and Systems Science


[1] 2501.10396

AI-Powered Urban Transportation Digital Twin: Methods and Applications

We present a survey of methods and applications of digital twins (DT) for urban traffic management. While the majority of studies on DTs focus on their "eyes," namely emerging sensing and perception capabilities such as object detection and tracking, what really distinguishes a DT from a traditional simulator lies in its "brain": the prediction and decision-making capabilities of extracting patterns and making informed decisions from what has been seen and perceived. To add value to urban transportation management, DTs need to be powered by artificial intelligence and complemented with low-latency, high-bandwidth sensing and networking technologies. We first review the DT pipeline leveraging cyber-physical systems and propose our DT architecture deployed on a real-world testbed in New York City. This survey can serve as a pointer to help researchers and practitioners identify challenges and opportunities for the development of DTs; a bridge to initiate conversations across disciplines; and a road map to exploiting the potential of DTs for diverse urban transportation applications.


[2] 2501.10402

SSM2Mel: State Space Model to Reconstruct Mel Spectrogram from the EEG

Decoding speech from brain signals is a challenging research problem that holds significant importance for studying speech processing in the brain. Although breakthroughs have been made in reconstructing the mel spectrograms of audio stimuli perceived by subjects at the word or letter level using noninvasive electroencephalography (EEG), there is still a critical gap in precisely reconstructing continuous speech features, especially at the minute level. To address this issue, this paper proposes a State Space Model (SSM) to reconstruct the mel spectrogram of continuous speech from EEG, named SSM2Mel. This model introduces a novel Mamba module to effectively model the long sequence of EEG signals for imagined speech. In the SSM2Mel model, the S4-UNet structure is used to enhance the extraction of local features of EEG signals, and the Embedding Strength Modulator (ESM) module is used to incorporate subject-specific information. Experimental results show that our model achieves a Pearson correlation of 0.069 on the SparrKULee dataset, which is a 38% improvement over the previous baseline.


[3] 2501.10404

Automated Detection of Epileptic Spikes and Seizures Incorporating a Novel Spatial Clustering Prior

A Magnetoencephalography (MEG) time-series recording consists of multi-channel signals collected by superconducting sensors, with each signal's intensity reflecting magnetic field changes over time at the sensor location. Automating epileptic MEG spike detection significantly reduces manual assessment time and effort, yielding substantial clinical benefits. Existing research addresses MEG spike detection by encoding neural network inputs with signals from all channels within a time segment, followed by classification. However, these methods overlook simultaneous spiking occurring at nearby sensors. We introduce a simple yet effective paradigm that first clusters MEG channels based on their sensors' spatial positions. Next, a novel convolutional input module is designed to integrate the spatial clustering and temporal changes of the signals. This module is fed into a custom MEEG-ResNet3D developed by the authors, which learns to extract relevant features and classify the input as a spike clip or not. Our method achieves an F1 score of 94.73% on a large real-world MEG dataset, Sanbo-CMR, collected from two centers, outperforming state-of-the-art approaches by 1.85%. Moreover, it demonstrates efficacy and stability in the electroencephalographic (EEG) seizure detection task, improving the weighted F1 score by 1.4% compared to current state-of-the-art techniques evaluated on TUSZ, which is the largest EEG seizure dataset.


[4] 2501.10405

Stochastic resonance in Schmitt trigger and its application towards weak signal detection

This study explores stochastic resonance (SR) in a Schmitt trigger circuit and its application to weak signal detection. SR, a phenomenon in which noise synchronizes with weak signals to enhance detectability, was demonstrated using a custom-designed bi-stable Schmitt trigger system. The circuit's bi-stability was validated through hysteresis curve analysis, confirming its suitability for SR studies. Experimental results revealed SR behavior by analyzing signal-to-noise ratio (SNR) responses to noise amplitude variations. Detection experiments were conducted to determine the frequency and amplitude of damped sinusoidal pulses. Frequency detection proved effective, albeit with limitations at low frequencies, while amplitude detection faced challenges due to mathematical complexities. Nonetheless, the study highlights SR's potential for weak signal detection, with proposed enhancements to improve detection accuracy. This work underscores the adaptability of classical SR principles to practical detection systems and suggests future applications in advanced detection technologies, including quantum systems.
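
As a rough illustration of the SR mechanism described above, the sketch below drives a software Schmitt trigger (not the authors' circuit) with a sub-threshold sinusoid plus Gaussian noise and sweeps the noise level to expose the SNR peak; the thresholds, signal amplitude, and sampling settings are illustrative assumptions.

```python
# Minimal SR sketch: hysteretic comparator driven by a weak sinusoid plus noise.
import numpy as np

def schmitt_trigger(x, v_low=-0.5, v_high=0.5):
    """Hysteretic comparator: the output flips only when a threshold is crossed."""
    y = np.empty_like(x)
    state = -1.0
    for i, xi in enumerate(x):
        if state < 0 and xi > v_high:
            state = 1.0
        elif state > 0 and xi < v_low:
            state = -1.0
        y[i] = state
    return y

fs, f0, amp, n = 10_000.0, 10.0, 0.2, 100_000      # sub-threshold signal (amp < v_high)
t = np.arange(n) / fs
signal = amp * np.sin(2 * np.pi * f0 * t)
rng = np.random.default_rng(0)

for sigma in (0.1, 0.3, 0.6, 1.0, 2.0):            # noise RMS sweep
    y = schmitt_trigger(signal + sigma * rng.standard_normal(n))
    spec = np.abs(np.fft.rfft(y * np.hanning(n))) ** 2
    freqs = np.fft.rfftfreq(n, 1 / fs)
    k = np.argmin(np.abs(freqs - f0))
    noise_floor = np.median(spec[max(k - 50, 1):k + 50])   # crude local noise estimate
    print(f"sigma={sigma:.1f}  SNR ~ {10 * np.log10(spec[k] / noise_floor):.1f} dB")
```

With these settings the output SNR typically rises and then falls as the noise RMS increases, which is the qualitative SR signature the paper measures.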


[5] 2501.10407

RadDet: A Wideband Dataset for Real-Time Radar Spectrum Detection

Real-time detection of radar signals in a wideband radio frequency spectrum is a critical situational assessment function in electronic warfare. Compute-efficient detection models have shown great promise in recent years, providing an opportunity to tackle the spectrum detection problem. However, progress in radar spectrum detection is limited by the scarcity of publicly available wideband radar signal datasets accompanied by corresponding annotations. To address this challenge, we introduce a novel and challenging dataset for radar detection (RadDet), comprising a large corpus of radar signals occupying a wideband spectrum across diverse radar density environments and signal-to-noise ratios (SNR). RadDet contains 40,000 frames, each generated from 1 million in-phase and quadrature (I/Q) samples across a 500 MHz frequency band. RadDet includes 11 classes of radar samples across 6 different SNR settings, 2 radar density environments, and 3 different time-frequency resolutions, with corresponding time-frequency and class annotations. We evaluate the performance of various state-of-the-art real-time detection models on RadDet and a modified radar classification dataset from NIST (NIST-CBRS) to establish a novel benchmark for wideband radar spectrum detection.


[6] 2501.10408

Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition

Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction. Cross-Linguistic SER (CLSER) has been a challenging research problem due to significant variability in the linguistic and acoustic features of different languages. In this study, we propose a novel approach, HuMP-CAT, which combines HuBERT, MFCC, and prosodic characteristics. These features are fused using a cross-attention transformer (CAT) mechanism during feature extraction. Transfer learning is applied to transfer knowledge from a source emotional speech dataset to the target corpus for emotion recognition. We use IEMOCAP as the source dataset to train the source model and evaluate the proposed method on seven datasets in five languages (English, German, Spanish, Italian, and Chinese). We show that, by fine-tuning the source model with a small portion of speech from the target datasets, HuMP-CAT achieves an average accuracy of 78.75% across the seven datasets, with notable performance of 88.69% on EMODB (German) and 79.48% on EMOVO (Italian). Our extensive evaluation demonstrates that HuMP-CAT outperforms existing methods across multiple target languages.


[7] 2501.10428

Perception-Guided EEG Analysis: A Deep Learning Approach Inspired by Level of Detail (LOD) Theory

Objective: This study explores a novel deep learning approach for EEG analysis and perceptual state guidance, inspired by Level of Detail (LOD) theory. The goal is to improve perceptual state identification accuracy and advance personalized psychological therapy. Methods: Portable EEG devices and music rhythm signals were used for data collection. LOD theory was applied to dynamically adjust EEG signal processing, extracting core perceptual features. A Unity-based software system integrated EEG data with audio materials. The deep learning model combined a CNN for feature extraction and classification, and a DQN for reinforcement learning to optimize rhythm adjustments. Results: The CNN achieved 94.05% accuracy in perceptual state classification. The DQN guided subjects to target states with a 92.45% success rate, averaging 13.2 rhythm cycles. However, only 50% of users reported psychological alignment with the target state, indicating room for improvement. Discussion: The results validate the potential of LOD-based EEG biofeedback. Limitations include dataset source, label subjectivity, and reward function optimization. Future work will expand to diverse subjects, incorporate varied musical elements, and refine reward functions for better generalization and personalization.


[8] 2501.10436

A flatness-based predictive controller for six-degrees of freedom spacecraft rendezvous

This work presents a closed-loop guidance algorithm for six-degrees-of-freedom spacecraft rendezvous with a passive target flying in an eccentric orbit. The main assumption is that the chaser vehicle has an attitude control system, based on reaction wheels, providing the necessary torque to change its orientation, whereas the number of thrusters is arbitrary. The goal is to design fuel-optimal maneuvers while satisfying operational constraints and rejecting disturbances. The proposed method is as follows: first, the coupled translational and angular dynamics are transformed into equivalent algebraic relations using the relative translational state transition matrix and the attitude flatness property. Then, a direct transcription method, based on B-spline parameterization and discretization of the time-continuous constraints, is developed to obtain a tractable static program. Finally, a Model Predictive Controller, based on linearization around the previously computed solution, is considered to handle disturbances. Numerical results are shown and discussed.


[9] 2501.10437

Chance-constrained Model Predictive Control for Near Rectilinear Halo Orbit spacecraft rendezvous

This work presents a robust Model Predictive Controller (MPC) to solve the problem of spacecraft rendezvous in the context of the restricted three-body problem (R3BP), as will be required to dock with space stations in cislunar space. The employed methodology is valid for both chemical and electric thrusters. By exploiting the state transition matrix and using a chance-constrained approach, the robust MPC assures constraint satisfaction under the presence of disturbances in a probabilistic sense. The perturbation parameters are computed online using a disturbance estimator. The robust controller is tested for a rendezvous scenario with a target placed in an Earth-Moon Near-Rectilinear Halo Orbit. Numerical results are shown and discussed.


[10] 2501.10438

Event-Based Impulsive Control for Spacecraft Rendezvous Hovering Phases

This work presents an event-triggered controller for spacecraft rendezvous hovering phases. The goal is to maintain the chaser within a bounded region with respect to the target. The main assumption is that the chaser vehicle has impulsive thrusters. These are assumed to be orientable in any direction and are constrained by dead-zone and saturation bounds. The event-based controller relies on trigger rules deciding when a suitable control law is applied. The local control law consists of a single impulse; therefore, the trigger rules design is based on the instantaneous reachability of the admissible set. The final outcome is a very efficient algorithm from both computational burden and footprint perspectives. Because the proposed methodology is based on single-impulse control, the controller invariance is local and is assessed through impulsive systems theory. Finally, numerical results are shown and discussed.


[11] 2501.10446

Optimizing a multi-state cold-standby system with multiple vacations in the repair and loss of units

A complex multi-state redundant system with preventive maintenance subject to multiple events is considered. The online unit can undergo several types of failures: internal ones and those provoked by external shocks. Multiple degradation levels, both internal and external, are assumed. Degradation levels are observed by random inspections and, if they are major, the unit goes to the repair facility where preventive maintenance is carried out. This repair facility is composed of a single repairperson governed by a multiple-vacation policy. This policy is set up according to the number of operational units. Two types of tasks can be performed by the repairperson: corrective repair and preventive maintenance. The times embedded in the system are phase-type distributed and the model is built using Markovian Arrival Processes with marked arrivals. Multiple performance measures, besides the transient and stationary distributions, are worked out through matrix-analytic methods. This methodology enables us to express the main results and the global development in a matrix-algorithmic form. To optimize the model, costs and rewards are included. A numerical example shows the versatility of the model.
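
For readers unfamiliar with the phase-type (PH) building block used throughout this model, the sketch below evaluates the CDF and mean of a PH(alpha, T) distribution via a matrix exponential; the two-phase parameters are illustrative and not taken from the paper.

```python
# Phase-type distribution basics: F(t) = 1 - alpha exp(T t) 1,  E[X] = -alpha T^{-1} 1
import numpy as np
from scipy.linalg import expm

alpha = np.array([0.7, 0.3])              # initial phase probabilities
T = np.array([[-2.0, 1.0],                # sub-generator over transient phases
              [0.0, -1.5]])
ones = np.ones(2)

def ph_cdf(t: float) -> float:
    """P(X <= t) for a PH(alpha, T) random variable."""
    return 1.0 - alpha @ expm(T * t) @ ones

mean = -alpha @ np.linalg.solve(T, ones)  # E[X] = -alpha T^{-1} 1
print(f"mean duration ~ {mean:.3f},  P(X <= 1) ~ {ph_cdf(1.0):.3f}")
```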


[12] 2501.10447

A Predictive Cooperative Collision Avoidance for Multi-Robot Systems Using Control Barrier Function

Control barrier function (CBF)-based methods provide the minimum modification necessary to formally guarantee safety in the context of quadratic programming, offering strict safety guarantees for safety-critical systems. However, most CBF-related derivatives myopically focus on present safety at each time step, while reasoning over a look-ahead horizon is missing. In this paper, a predictive safety matrix is constructed. We then consolidate the safety condition based on the smallest eigenvalue of the proposed safety matrix. A predefined deconfliction strategy of motion paths is embedded into the trajectory tracking module to manage deadlock conflicts, which computes the deadlock escape velocity with the minimum attitude angle. Comparison results show that the introduction of the predictive term is robust to measurement uncertainty and immune to oscillations. The proposed deadlock avoidance method avoids large detours without obvious stagnation.


[13] 2501.10592

Analytical Models of Frequency and Voltage in Large-Scale All-Inverter Power Systems

Low-order frequency response models for power systems have a decades-long history in optimization and control problems such as unit commitment, economic dispatch, and wide-area control. With a few exceptions, these models are built upon the Newtonian mechanics of synchronous generators, assuming that the frequency dynamics across a system are approximately homogeneous and that the dynamics of nodal voltages are negligible for most operating conditions, so voltages are not directly computed at all buses. As a result, the use of system frequency models leads to systematic underestimation of the frequency nadir and maximum RoCoF, and provides no insight into the reactive power-voltage dynamics. This paper proposes a low-order model of both frequency and voltage response in grid-forming inverter-dominated power systems. The proposed model accounts for spatio-temporal variations in frequency and voltage behavior across a system and, as a result, demonstrates the heterogeneity of frequency response in future renewable power systems. Electromagnetic transient (EMT) simulations are used to validate the utility, accuracy, and computational efficiency of these models, setting the basis for them to serve as fast, scalable alternatives to EMT simulation, especially when dealing with very large-scale systems, for both planning and operational studies.


[14] 2501.10609

Universal Discrete Filtering with Lookahead or Delay

We consider the universal discrete filtering problem, where an input sequence generated by an unknown source passes through a discrete memoryless channel, and the goal is to estimate its components based on the output sequence with limited lookahead or delay. We propose and establish the universality of a family of schemes for this setting. These schemes are induced by universal Sequential Probability Assignments (SPAs), and inherit their computational properties. We show that the schemes induced by LZ78 are practically implementable and well-suited for scenarios with limited computational resources and latency constraints. In passing, we use some of the intermediate results to obtain upper and lower bounds that appear to be new, in the purely Bayesian setting, on the optimal filtering performance in terms, respectively, of the mutual information between the noise-free and noisy sequence, and the entropy of the noise-free sequence causally conditioned on the noisy one.


[15] 2501.10610

Automated Water Irrigation System

This paper presents the design and implementation of an automated water irrigation system aimed at optimizing plant care through precision moisture monitoring and controlled water delivery. The system uses a capacitive soil moisture sensor, an ADC (analog-to-digital converter), and a relay-driven water pump to ensure plants receive adequate hydration based on real-time data. In addition, this work aims to build on existing applications for Raspberry Pi (4B) and Arduino-based automatic irrigation systems by integrating advanced calibration methods, employing optimized algorithms, and introducing new technologies to further enhance overall system efficiency and reliability.
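
A minimal control-loop sketch in the spirit of the description above; the GPIO pin, moisture threshold, and read_adc() helper are placeholders rather than the paper's actual wiring or calibration.

```python
# Illustrative irrigation loop only: pin numbers, threshold, and ADC access are assumed.
import time
import RPi.GPIO as GPIO

PUMP_PIN = 17            # relay driving the pump (assumed BCM pin)
DRY_THRESHOLD = 0.35     # normalized moisture below which we irrigate (assumed)

def read_adc() -> float:
    """Placeholder for the capacitive sensor reading via the ADC, scaled to [0, 1].
    Replace with the actual ADC driver; the constant below is a dry-run stand-in."""
    return 0.5

GPIO.setmode(GPIO.BCM)
GPIO.setup(PUMP_PIN, GPIO.OUT, initial=GPIO.LOW)
try:
    while True:
        moisture = read_adc()
        GPIO.output(PUMP_PIN, GPIO.HIGH if moisture < DRY_THRESHOLD else GPIO.LOW)
        time.sleep(60)   # re-check once a minute
finally:
    GPIO.cleanup()
```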


[16] 2501.10654

Efficient Transmission of Radiomaps via Physics-Enhanced Semantic Communications

By enriching spectrum coverage information, radiomaps play an important role in many wireless communication applications, such as resource allocation and network optimization. To enable real-time, distributed spectrum management, particularly in scenarios with unstable and dynamic environments, the efficient transmission of spectrum coverage information for radiomaps from edge devices to the central server emerges as a critical problem. In this work, we propose an innovative physics-enhanced semantic communication framework tailored for efficient radiomap transmission based on generative learning models. Specifically, instead of bit-wise message passing, we only transmit the key "semantics" of radiomaps, characterized by the radio propagation behavior and surrounding environments, where semantic compression schemes are utilized to reduce the communication overhead. Incorporating the novel concept of Radio Depth Maps, the radiomaps are reconstructed from the delivered semantic information using conditional generative adversarial networks. Our framework is further extended to facilitate its implementation in multi-user edge computing scenarios by integrating federated learning for collaborative model training while preserving data privacy. Experimental results show that our approach achieves high accuracy in radio coverage information recovery at ultra-high bandwidth efficiency, which holds great potential for many wireless data transmission applications.


[17] 2501.10657

Channel Estimation and Beamforming Design for MF-RIS-Aided Communication Systems

In this letter, we study the beamforming design for channel estimation of multi-functional reconfigurable intelligent surface (MF-RIS)-aided multi-user communications that support simultaneous signal reflection, refraction, and amplification. A least squares (LS) based channel estimator is proposed for the MF-RIS by considering both the coupled MF-RIS beams and the introduced thermal noise. With the discrete Fourier transform (DFT) matrix, the MF-RIS beamforming design problem is simplified under the proposed LS channel estimator. The optimal MF-RIS beamforming design that achieves the Cram\'er-Rao lower bound (CRLB) of the channel estimator is obtained with the proposed alternating optimization algorithm. Simulation results demonstrate the effectiveness of the proposed beamforming design in reducing the impact of thermal noise.
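
The following is a generic sketch of the LS estimation idea referenced above, not the letter's exact MF-RIS signal model: pilots are collected under training configurations drawn from a DFT matrix and the channel is recovered by least squares; all dimensions and the noise level are assumptions.

```python
# Generic DFT-training + least-squares channel estimation sketch.
import numpy as np

N, T = 16, 16                                   # RIS elements, training slots (T >= N)
rng = np.random.default_rng(1)
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

Theta = np.fft.fft(np.eye(N))[:T, :]            # DFT training matrix (rows = configurations)
noise = 0.05 * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
y = Theta @ h + noise                           # stacked pilot observations

h_ls, *_ = np.linalg.lstsq(Theta, y, rcond=None)  # LS estimate (pseudo-inverse solution)
print("NMSE:", np.linalg.norm(h_ls - h) ** 2 / np.linalg.norm(h) ** 2)
```

Because the DFT rows are orthogonal, the LS solution here amounts to a simple matched correlation with the training matrix, which is why DFT-based training keeps the estimator well conditioned.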


[18] 2501.10676

Predictive Target-to-User Association in Complex Scenarios via Hybrid-Field ISAC Signaling

This paper presents a novel and robust target-to-user (T2U) association framework to support reliable vehicle-to-infrastructure (V2I) networks that potentially operate within the hybrid field (near-field and far-field). To address the challenges posed by complex vehicle maneuvers and user association ambiguity, an interacting multiple-model filtering scheme is developed, which combines coordinated turn and constant velocity models for predictive beamforming. Building upon this foundation, a lightweight association scheme leverages user-specific integrated sensing and communication (ISAC) signaling while employing probabilistic data association to manage clutter measurements in dense traffic. Numerical results validate that the proposed framework significantly outperforms conventional methods in terms of both tracking accuracy and association reliability.


[19] 2501.10689

Low-Complexity Iterative Precoding Design for Near-field Multiuser Systems With Spatial Non-Stationarity

Extremely large antenna arrays (ELAA) are regarded as a promising technology for supporting sixth-generation (6G) networks. However, the large number of antennas significantly increases the computational complexity of precoding design, even for linear regularized zero-forcing (RZF) precoding. To address this issue, a series of low-complexity iterative precoding schemes is investigated. The main idea of these methods is to avoid the matrix inversion of RZF precoding. Specifically, RZF precoding is equivalent to a system of linear equations that can be solved by fast iterative algorithms, such as the random Kaczmarz (RK) algorithm. Yet, the performance of the RK-based precoding algorithm is limited by the energy distributions of multiple users, which restricts its application in ELAA-assisted systems. To accelerate the RK-based precoding, we introduce greedy random Kaczmarz (GRK)-based precoding by using a greedy criterion-based selection strategy. To further reduce the complexity of the GRK-based precoding, we propose a visibility region (VR)-based orthogonal GRK (VR-OGRK) precoding that leverages near-field spatial non-stationarity, which is characterized by the concept of VR. Next, by utilizing the information from multiple hyperplanes in each iteration, we extend the GRK-based precoding to the aggregation hyperplane Kaczmarz (AHK)-based precoding algorithm, which further enhances the convergence rate. Building upon the AHK algorithm, we propose a VR-based orthogonal AHK (VR-OAHK) precoding to further reduce the computational complexity. Furthermore, the proposed iterative precoding algorithms are proven to converge to RZF globally at an exponential rate. Simulation results show that the proposed algorithms achieve faster convergence and lower computational complexity than benchmark algorithms, and yield performance very similar to RZF precoding.
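
A minimal sketch of the inversion-free idea behind these methods: the RZF precoder output x = H^H u, with (H H^H + lambda I) u = s, is obtained by randomized Kaczmarz row updates rather than a direct inverse. The sizes, regularization, and iteration count are illustrative, and the greedy and VR-based variants described above are not reproduced here.

```python
# Randomized Kaczmarz applied to the RZF linear system (no explicit matrix inverse).
import numpy as np

rng = np.random.default_rng(0)
M, K, lam = 64, 8, 0.1                      # antennas, users, regularization
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
s = rng.standard_normal(K) + 1j * rng.standard_normal(K)

A = H @ H.conj().T + lam * np.eye(K)        # K x K Hermitian system matrix
row_norm2 = np.sum(np.abs(A) ** 2, axis=1)
p = row_norm2 / row_norm2.sum()             # RK row-selection probabilities

u = np.zeros(K, dtype=complex)
for _ in range(500):                        # Kaczmarz sweeps: project onto one row at a time
    i = rng.choice(K, p=p)
    u += (s[i] - A[i] @ u) / row_norm2[i] * A[i].conj()
x_rk = H.conj().T @ u                       # precoded vector from the iterative solution

x_rzf = H.conj().T @ np.linalg.solve(A, s)  # reference RZF precoder output
print("relative error:", np.linalg.norm(x_rk - x_rzf) / np.linalg.norm(x_rzf))
```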


[20] 2501.10690

Insights from the application of nonlinear model predictive control to a cart-pendulum

Inspired greatly by Mills et al. (2009) and the solution within, this paper aims to more clearly explain the mathematics and implementation details of such a powerful control algorithm. While the aforementioned paper is well written and of sound mathematics, it is extremely dense and requires some time and patience to decipher, especially as it draws on many other sources to complete the algorithm. This density is a clear result of the paper being restricted to the brief format, with important details omitted as a result. We provide the much-needed elaboration here for the benefit of the reader.


[21] 2501.10734

GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems

Automatic Speech Recognition (ASR) systems have demonstrated remarkable performance across various applications. However, limited data and the unique language features of specific domains, such as low-resource languages, significantly degrade their performance and lead to higher Word Error Rates (WER). In this study, we propose Generative Error Correction via Retrieval-Augmented Generation (GEC-RAG), a novel approach designed to improve ASR accuracy for low-resource domains, like Persian. Our approach treats the ASR system as a black-box, a common practice in cloud-based services, and proposes a Retrieval-Augmented Generation (RAG) approach within the In-Context Learning (ICL) scheme to enhance the quality of ASR predictions. By constructing a knowledge base that pairs ASR predictions (1-best and 5-best hypotheses) with their corresponding ground truths, GEC-RAG retrieves lexically similar examples to the ASR transcription using the Term Frequency-Inverse Document Frequency (TF-IDF) measure. This process provides relevant error patterns of the system alongside the ASR transcription to the Generative Large Language Model (LLM), enabling targeted corrections. Our results demonstrate that this strategy significantly reduces WER in Persian and highlights a potential for domain adaptation and low-resource scenarios. This research underscores the effectiveness of using RAG in enhancing ASR systems without requiring direct model modification or fine-tuning, making it adaptable to any domain by simply updating the transcription knowledge base with domain-specific data.
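
A small sketch of the retrieval step only (not the full GEC-RAG pipeline): lexically similar (hypothesis, ground-truth) pairs are fetched with TF-IDF cosine similarity and packed into an in-context prompt; the toy knowledge base and prompt wording are invented.

```python
# TF-IDF retrieval of similar ASR errors to build an in-context correction prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge_base = [                      # (1-best ASR hypothesis, ground truth) pairs
    ("the whether is nice today", "the weather is nice today"),
    ("please weight for the train", "please wait for the train"),
    ("he red the book quickly", "he read the book quickly"),
]
query = "the whether will be cold tomorrow"   # new ASR transcription to correct

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
kb_matrix = vectorizer.fit_transform([hyp for hyp, _ in knowledge_base])
scores = cosine_similarity(vectorizer.transform([query]), kb_matrix)[0]

top_k = scores.argsort()[::-1][:2]            # retrieve the 2 most similar examples
prompt = "Correct the ASR transcription.\n"
for i in top_k:
    hyp, ref = knowledge_base[i]
    prompt += f"ASR: {hyp}\nCorrected: {ref}\n"
prompt += f"ASR: {query}\nCorrected:"
print(prompt)                                 # this prompt would be sent to the LLM
```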


[22] 2501.10757

Deformable Image Registration of Dark-Field Chest Radiographs for Local Lung Signal Change Assessment

Dark-field radiography of the human chest has been demonstrated to have promising potential for the analysis of the lung microstructure and the diagnosis of respiratory diseases. However, previous studies of dark-field chest radiographs evaluated the lung signal only in the inspiratory breathing state. Our work aims to add a new perspective to these previous assessments by locally comparing dark-field lung information between different respiratory states. To this end, we discuss suitable image registration methods for dark-field chest radiographs to enable consistent spatial alignment of the lung in distinct breathing states. Utilizing full inspiration and expiration scans from a clinical chronic obstructive pulmonary disease study, we assess the performance of the proposed registration framework and outline applicable evaluation approaches. Our regional characterization of lung dark-field signal changes between the breathing states provides a proof-of-principle that dynamic radiography-based lung function assessment approaches may benefit from considering registered dark-field images in addition to standard plain chest radiographs.


[23] 2501.10770

Enhancing Diagnostic in 3D COVID-19 Pneumonia CT-scans through Explainable Uncertainty Bayesian Quantification

Accurately classifying COVID-19 pneumonia in 3D CT scans remains a significant challenge in the field of medical image analysis. Although deterministic neural networks have shown promising results in this area, they provide only point-estimate outputs, yielding poor diagnostic support in clinical decision-making. In this paper, we explore the use of Bayesian neural networks for classifying COVID-19 pneumonia in 3D CT scans, providing uncertainties in their predictions. We compare deterministic networks and their Bayesian counterparts, enhancing decision-making accuracy with uncertainty information. Remarkably, our findings reveal that lightweight architectures achieve the highest accuracy of 96\% after extensive hyperparameter tuning. Furthermore, the Bayesian counterparts of these architectures, obtained via the Multiplied Normalizing Flow technique, maintained similar performance along with calibrated uncertainty estimates. Finally, we have developed a 3D-visualization approach to explain the neural network outcomes based on SHAP values. We conclude that explainability along with uncertainty quantification will offer better clinical decisions in medical image analysis, contributing to ongoing efforts for improving the diagnosis and treatment of COVID-19 pneumonia.


[24] 2501.10794

Learning to reconstruct signals with inexact sensing operator via knowledge distillation

In computational optical imaging and wireless communications, signals are acquired through linear coded and noisy projections, and recovered through computational algorithms. Deep model-based approaches, i.e., neural networks incorporating the sensing operators, are the state of the art for signal recovery. However, these methods require exact knowledge of the sensing operator, which is often unavailable in practice, leading to performance degradation. Consequently, we propose a new recovery paradigm based on knowledge distillation. A teacher model, trained with full or almost exact knowledge of a synthetic sensing operator, guides a student model with an inexact real sensing operator. The teacher is interpreted as a relaxation of the student since it solves a problem with fewer constraints, which can guide the student to achieve higher performance. We demonstrate the improvement of signal reconstruction in computational optical imaging for single-pixel imaging with miscalibrated coded aperture systems and for multiple-input multiple-output symbol detection with an inexact channel matrix.


[25] 2501.10807

FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation

Versatile audio super-resolution (SR) is the challenging task of restoring high-frequency components from low-resolution audio with sampling rates between 4kHz and 32kHz in various domains such as music, speech, and sound effects. Previous diffusion-based SR methods suffer from slow inference due to the need for a large number of sampling steps. In this paper, we introduce FlashSR, a single-step diffusion model for versatile audio super-resolution aimed at producing 48kHz audio. FlashSR achieves fast inference by utilizing diffusion distillation with three objectives: distillation loss, adversarial loss, and distribution-matching distillation loss. We further enhance performance by proposing the SR Vocoder, which is specifically designed for SR models operating on mel-spectrograms. FlashSR demonstrates competitive performance with the current state-of-the-art model in both objective and subjective evaluations while being approximately 22 times faster.


[26] 2501.10814

No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling

3D models are favored over 2D for 3D medical image segmentation tasks due to their ability to leverage inter-slice relationship, yielding higher segmentation accuracy. However, 3D models demand significantly more GPU memory with increased model size and intermediate tensors. A common solution is to use patch-based training and make whole-volume predictions with sliding window (SW) inference. SW inference reduces memory usage but is slower due to equal resource allocation across patches and less accurate as it overlooks global features beyond patches. We propose NMSW-Net (No-More-Sliding-Window-Net), a novel framework that enhances efficiency and accuracy of any given 3D segmentation model by eliminating SW inference and incorporating global predictions when necessary. NMSW-Net incorporates a differentiable Top-k module to sample only the relevant patches that enhance segmentation accuracy, thereby minimizing redundant computations. Additionally, it learns to leverage coarse global predictions when patch prediction alone is insufficient. NMSW-Net is model-agnostic, making it compatible with any 3D segmentation model that previously relied on SW inference. Evaluated across 3 tasks with 3 segmentation backbones, NMSW-Net achieves competitive or sometimes superior accuracy compared to SW, while reducing computational complexity by 90% (87.5 to 7.95 TFLOPS), delivering 4x faster inference on the H100 GPU (19.0 to 4.3 sec), and 7x faster inference on the Intel Xeon Gold CPU (1710 to 230 seconds).


[27] 2501.10827

Integrating Expert and Physics Knowledge for Modeling Heat Load in District Heating Systems

New residential neighborhoods are often supplied with heat via district heating systems (DHS). Improving the energy efficiency of a DHS is critical for increasing sustainability and satisfying user requirements. In this paper, we present HELIOS, a dedicated artificial intelligence (AI) model designed specifically for modeling the heat load in DHS. HELIOS leverages a combination of established physical principles and expert knowledge, resulting in superior performance compared to existing state-of-the-art models. HELIOS is explainable, enabling enhanced accountability and traceability in its predictions. We evaluate HELIOS against ten state-of-the-art data-driven models in modeling the heat load in a DHS case study in the Netherlands. HELIOS emerges as the top-performing model while maintaining complete accountability. The applications of HELIOS extend beyond the present case study, potentially supporting the adoption of AI by DHS and contributing to sustainable energy management on a larger scale.


[28] 2501.10829

Mathematical model of parameters relevance in adaptive level-crossing sampling for electrocardiogram signals

Digital acquisition of bio-signals has been mostly dominated by uniform time sampling following the Nyquist theorem. However, in recent years, new approaches have emerged that focus on sampling a signal only when certain events happen. Currently, the most prominent of these approaches is level-crossing (LC) sampling. Conventional level-crossing analog-to-digital converters (LC-ADC) are often designed to use statically defined and uniformly spaced levels. However, a different positioning of the levels, optimized for bio-signal monitoring, can potentially lead to better performing solutions. In this work, we compare multiple LC-level definitions, including statically defined (uniform and logarithmic) configurations and optimization-driven designs (randomized and Bayesian optimization), assessing their ability to maintain signal fidelity while minimizing the sampling rate. The performance of these methodologies is evaluated using the root mean square error (RMSE), the sampling reduction factor (SRF), a metric evaluating the sampling compression ratio, and error-per-event metrics to gauge the trade-offs between signal fidelity and data compression. Our findings reveal that optimization-driven LC sampling, particularly with Bayesian methods, achieves a lower RMSE without substantially impacting the error per event compared to static configurations, but at the cost of an increased sampling rate.
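
To make the comparison concrete, the sketch below level-crossing samples a synthetic signal against uniformly and logarithmically spaced levels and reports RMSE and SRF; the test signal, level counts, and zero-order-hold reconstruction are simplifying assumptions, not the paper's ECG data or LC-ADC model.

```python
# Level-crossing sampling with two static level placements, plus RMSE and SRF.
import numpy as np

def lc_sample(x, levels):
    """Return indices where the signal crosses any of the given levels."""
    idx = [0]
    last = x[0]
    for i in range(1, len(x)):
        crossed = (np.minimum(last, x[i]) < levels) & (levels < np.maximum(last, x[i]))
        if crossed.any():
            idx.append(i)
            last = x[i]
    return np.asarray(idx)

fs = 500.0
t = np.arange(0, 4, 1 / fs)
sig = np.sin(2 * np.pi * 1.2 * t) + 0.4 * np.sin(2 * np.pi * 8 * t)   # stand-in bio-signal

uniform_levels = np.linspace(sig.min(), sig.max(), 16)
mag = np.geomspace(0.05, np.abs(sig).max(), 8)
log_levels = np.concatenate([-mag[::-1], mag])                         # symmetric log spacing

for name, levels in (("uniform", uniform_levels), ("logarithmic", log_levels)):
    idx = lc_sample(sig, levels)
    hold = np.searchsorted(idx, np.arange(len(sig)), side="right") - 1  # zero-order hold
    recon = sig[idx][hold]
    rmse = np.sqrt(np.mean((recon - sig) ** 2))
    print(f"{name:12s} RMSE={rmse:.3f}  SRF={len(sig) / len(idx):.1f}")
```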


[29] 2501.10839

Systems Engineering for Autonomous Vehicles; Supervising AI using Large Language Models (SSuperLLM)

Generative Artificial Intelligence (GAI) and the idea of using hierarchical models have been around for some years now. GAI has proved to be an extremely useful tool for Autonomous Vehicles (AVs). AVs need to perform robustly in their environment. Thus, AV behavior and short-term trajectory planning need to be: a) designed and architected using safeguarding and supervisory systems and b) verified using proper Systems Engineering (SysEng) principles. Can AV Systems Engineering also use Large Language Models (LLMs) to help autonomous vehicle (AV) development? This reader-friendly paper advocates the use of LLMs in 1) requirements (Reqs) development and 2) Reqs verification, and 3) provides a proof-of-concept of AV supervisory control. The latter uses a simulation environment of a simple planar (bicycle) vehicle dynamics model and a Linear Quadratic Regulator (LQR) control with an LLM Application Programming Interface (API). The open-source simulation software is available from the author so that readers can engage with the AV stack, the LLM API and rules, SysEng and Reqs, and fundamental vehicle dynamics and control.
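
As a hedged sketch of the control piece only (the LLM API is not shown), the following computes an LQR gain for a standard linear lateral bicycle error model from the continuous-time Riccati equation; the vehicle parameters and Q/R weights are illustrative, not the author's values.

```python
# LQR gain for a linear lateral bicycle (path-tracking error) model via the CARE.
import numpy as np
from scipy.linalg import solve_continuous_are

m, Iz, lf, lr, Cf, Cr, vx = 1500.0, 2500.0, 1.2, 1.6, 8.0e4, 8.0e4, 15.0  # assumed params

A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, -(2*Cf + 2*Cr)/(m*vx), (2*Cf + 2*Cr)/m, (-2*Cf*lf + 2*Cr*lr)/(m*vx)],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, -(2*Cf*lf - 2*Cr*lr)/(Iz*vx), (2*Cf*lf - 2*Cr*lr)/Iz, -(2*Cf*lf**2 + 2*Cr*lr**2)/(Iz*vx)],
])                                         # states: lateral error, its rate, heading error, its rate
B = np.array([[0.0], [2*Cf/m], [0.0], [2*Cf*lf/Iz]])   # input: front steering angle

Q = np.diag([10.0, 1.0, 10.0, 1.0])        # penalize lateral and heading errors
R = np.array([[50.0]])                     # penalize steering effort

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)            # steering command u = -K x
print("LQR gain:", K.round(3))
```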


[30] 2501.10842

BOOST: Microgrid Sizing using Ordinal Optimization

The transition to sustainable energy systems has highlighted the critical need for efficient sizing of renewable energy resources in microgrids. In particular, designing photovoltaic (PV) and battery systems to meet residential loads is challenging due to trade-offs between cost, reliability, and environmental impact. While previous studies have employed dynamic programming and heuristic techniques for microgrid sizing, these approaches often fail to balance computational efficiency and accuracy. In this work, we propose BOOST, or Battery-solar Ordinal Optimization Sizing Technique, a novel framework for optimizing the sizing of PV and battery components in microgrids. Ordinal optimization enables computationally efficient evaluations of potential designs while preserving accuracy through robust ranking of solutions. To determine the optimal operation of the system at any given time, we introduce a mixed-integer linear programming (MILP) approach, which achieves lower costs than the commonly used dynamic programming methods. Our numerical experiments demonstrate that the proposed framework identifies optimal designs that achieve a levelized cost of energy (LCOE) as low as 8.84 cents/kWh, underscoring its potential for cost-effective microgrid design. The implications of our work are significant: BOOST provides a scalable and accurate methodology for integrating renewable energy into residential microgrids, addressing economic and environmental goals simultaneously.


[31] 2501.10851

Exploring Siamese Networks in Self-Supervised Fast MRI Reconstruction

Reconstructing MR images using deep neural networks from undersampled k-space data without fully sampled training references offers significant value in practice; it is a self-supervised regression problem calling for effective prior knowledge and supervision. Siamese architectures are motivated by the notion of invariance and show promising results in unsupervised visual representation learning. Building homologous transformed images and avoiding trivial solutions are two major challenges in Siamese-based self-supervised models. In this work, we explore a Siamese architecture for MRI reconstruction in a self-supervised training fashion, called SiamRecon. We show that the proposed approach mimics an expectation-maximization algorithm. The alternating optimization provides an effective supervision signal and avoids collapse. The proposed SiamRecon achieves state-of-the-art reconstruction accuracy in the field of self-supervised learning on both single-coil brain MRI and multi-coil knee MRI.


[32] 2501.10859

Which price to pay? Auto-tuning building MPC controller for optimal economic cost

A model predictive control (MPC) controller is considered for temperature management in buildings, but its performance heavily depends on hyperparameters. Consequently, MPC necessitates meticulous hyperparameter tuning to attain optimal performance under diverse contracts. However, conventional building controller design is an open-loop process without critical hyperparameter optimization, often leading to suboptimal performance due to unexpected environmental disturbances and modeling errors. Furthermore, these hyperparameters are not adapted to different pricing schemes and may lead to non-economic operation. To address these issues, we propose an efficient performance-oriented building MPC controller tuning method based on a cutting-edge efficient constrained Bayesian optimization algorithm, CONFIG, with global optimality guarantees. We demonstrate that this technique can be applied to efficiently deal with real-world DSM program selection problems under customized black-box constraints and objectives. In this study, a simple MPC controller, which offers the advantages of reduced commissioning costs and enhanced computational efficiency, was optimized to perform on a comparable level to a delicately designed and computationally expensive MPC controller. The results also indicate that with an optimized simple MPC, the monthly electricity cost of a household can be reduced by up to 26.90% compared with the cost when controlled by a basic rule-based controller under the same constraints. We then compared 12 real electricity contracts in Belgium for a household with customized black-box occupant comfort constraints. The results indicate a monthly electricity bill saving of up to 20.18% when the most economic contract is compared with the worst one, which again illustrates the significance of choosing a proper electricity contract.


[33] 2501.10865

Generalized Spatial Modulation Aided Affine Frequency Division Multiplexing

Generalized spatial modulation-aided affine frequency division multiplexing (GSM-AFDM) is conceived for reliable multiple-input multiple-output (MIMO) communications over doubly selective channels. We commence by proposing several low-complexity detectors for large-scale GSM-AFDM systems. Specifically, we introduce the linear minimum mean square error (LMMSE) equalizer-based maximum likelihood detector (LMMSE-MLD). By exploiting the GSM properties, we then derive the LMMSE-based transmit-antenna activation pattern (TAP) check-based log-likelihood ratio detector (LMMSE-TC-LLRD). In addition, we propose a pair of new detectors, namely the greedy residual check detector (GRCD) and the reduced space check detector (RSCD). We also derive a bit error rate (BER) upper-bound by considering the MLD. Our simulation results demonstrate that 1) the BER upper bound derived is tight for moderate to high signal-to-noise ratios (SNRs), 2) the proposed GSM-AFDM achieves lower BER than its conventional counterparts, and 3) the conceived detectors strike a compelling trade-off between the BER and complexity.
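
A minimal sketch of the LMMSE equalization step that underpins the proposed detectors, x_hat = (H^H H + sigma^2 I)^{-1} H^H y; the dimensions, constellation, and noise level are assumptions, and the GSM/AFDM structure itself is not modeled here.

```python
# Generic LMMSE equalization of a noisy MIMO observation.
import numpy as np

rng = np.random.default_rng(3)
n_rx, n_tx, sigma2 = 8, 4, 0.1
H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
x = np.sign(rng.standard_normal(n_tx)) + 1j * np.sign(rng.standard_normal(n_tx))   # QPSK-like symbols
y = H @ x + np.sqrt(sigma2 / 2) * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))

G = np.linalg.solve(H.conj().T @ H + sigma2 * np.eye(n_tx), H.conj().T)  # LMMSE filter
x_hat = G @ y
print("symbol estimates:", np.round(x_hat, 2))
```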


[34] 2501.10878

A Novel Hybrid Precoder With Low-Resolution Phase Shifters and Fronthaul Capacity Limitation

In massive MIMO systems, fully digital precoding offers high performance but has significant implementation complexity and energy consumption, particularly at millimeter frequencies and beyond. Hybrid analog-digital architectures provide a practical alternative by reducing the number of radio frequency (RF) chains while retaining performance in spatially sparse multipath scenarios. However, most hybrid precoder designs assume ideal, infinite-resolution analog phase shifters, which are impractical in real-world scenarios. Another practical constraint is the limited fronthaul capacity between the baseband processor and array, implying that each entry of the digital precoder must be picked from a finite set of quantization labels. To minimize the sum rate degradation caused by quantized analog and digital precoders, we propose novel designs inspired by the sphere decoding (SD) algorithm. We demonstrate numerically that our proposed designs outperform traditional methods, ensuring minimal sum rate loss in hybrid precoding systems with low-resolution phase shifters and limited fronthaul capacity.
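
The low-resolution phase-shifter constraint can be made concrete with the small sketch below, which snaps each analog precoder entry to the nearest point on a 2^b-phase grid; the random target precoder is illustrative and this is not the paper's sphere-decoding-inspired design.

```python
# Projection of unit-modulus analog precoder entries onto a b-bit phase codebook.
import numpy as np

def quantize_phases(F_rf, bits):
    """Snap each entry to the nearest of the 2**bits uniformly spaced phases."""
    step = 2 * np.pi / (2 ** bits)
    phases = np.round(np.angle(F_rf) / step) * step
    return np.exp(1j * phases)

rng = np.random.default_rng(7)
F_ideal = np.exp(1j * 2 * np.pi * rng.random((32, 4)))    # infinite-resolution phases
for b in (1, 2, 3, 4):
    F_q = quantize_phases(F_ideal, b)
    err = np.linalg.norm(F_q - F_ideal) / np.linalg.norm(F_ideal)
    print(f"{b}-bit shifters: relative projection error {err:.3f}")
```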


[35] 2501.10891

OpenEarthMap-SAR: A Benchmark Synthetic Aperture Radar Dataset for Global High-Resolution Land Cover Mapping

High-resolution land cover mapping plays a crucial role in addressing a wide range of global challenges, including urban planning, environmental monitoring, disaster response, and sustainable development. However, creating accurate, large-scale land cover datasets remains a significant challenge due to the inherent complexities of geospatial data, such as diverse terrain, varying sensor modalities, and atmospheric conditions. Synthetic Aperture Radar (SAR) imagery, with its ability to penetrate clouds and capture data in all-weather, day-and-night conditions, offers unique advantages for land cover mapping. Despite these strengths, the lack of benchmark datasets tailored for SAR imagery has limited the development of robust models specifically designed for this data modality. To bridge this gap and facilitate advancements in SAR-based geospatial analysis, we introduce OpenEarthMap-SAR, a benchmark SAR dataset for global high-resolution land cover mapping. OpenEarthMap-SAR consists of 1.5 million segments of 5033 aerial and satellite images with the size of 1024$\times$1024 pixels, covering 35 regions from Japan, France, and the USA, with partially manually annotated and fully pseudo 8-class land cover labels at a ground sampling distance of 0.15--0.5 m. We evaluated the performance of state-of-the-art methods for semantic segmentation and present challenging problem settings suitable for further technical development. The dataset also serves as the official dataset for IEEE GRSS Data Fusion Contest Track I. The dataset has been made publicly available at https://zenodo.org/records/14622048.


[36] 2501.10908

A Note on the Conversion of Nonnegative Integers to the Canonical Signed-digit Representation

This note addresses the signed-digit representation of non-negative integer binary numbers. We review and revisit popular literature methods for canonical signed-digit representation. A method based on string substitution is discussed.
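
One classical conversion route, shown here as a concrete sketch, is the non-adjacent form (a canonical signed-digit representation) computed least-significant digit first with digits in {-1, 0, +1}; this is a textbook method, not necessarily the string-substitution method discussed in the note.

```python
# Non-adjacent form (NAF), a canonical signed-digit representation of a nonnegative integer.
def to_csd(n: int) -> list[int]:
    digits = []
    while n > 0:
        if n & 1:                    # odd: emit +1 or -1 so the next bit becomes 0
            d = 2 - (n % 4)          # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits                    # e.g. 7 -> [-1, 0, 0, 1], i.e. 8 - 1

assert sum(d * 2**i for i, d in enumerate(to_csd(7))) == 7
print(to_csd(7), to_csd(12), to_csd(255))
```

The resulting representation has no two adjacent nonzero digits, which is what minimizes the number of add/subtract operations in multiplier-free hardware.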


[37] 2501.10941

Synesthesia of Machines (SoM)-Aided FDD Precoding with Sensing Heterogeneity: A Vertical Federated Learning Approach

High complexity in precoding design for frequency division duplex systems necessitates streamlined solutions. Guided by Synesthesia of Machines (SoM), this paper introduces a heterogeneous multi-vehicle, multi-modal sensing aided precoding scheme within a vertical federated learning (VFL) framework, which significantly minimizes pilot sequence length while optimizing the system's sum rate. We address the challenges posed by local data heterogeneity due to varying on-board sensor configurations through a meticulously designed VFL training procedure. To extract valuable channel features from multi-modal sensing, we employ three distinct data preprocessing methods that convert raw data into informative representations relevant for precoding. Additionally, we propose an online training strategy based on VFL framework, enabling the scheme to adapt dynamically to fluctuations in user numbers. Numerical results indicate that our approach, utilizing short pilot sequences, closely approximates the performance of traditional optimization methods with perfect channel state information.


[38] 2501.10952

Ambient Backscatter Communication in LTE Uplink Sounding Reference Signal

Ambient Internet of Things (AIoT), recently standardized by the 3rd Generation Partnership Project (3GPP), demands a low-power wide-area communication solution that operates several orders of magnitude below the power requirements of existing 3GPP specifications. Ambient backscatter communication (AmBC) is considered a competitive candidate technique because it harvests energy from ambient RF signals. This paper considers integrating symbiotic AmBC into the Long Term Evolution (LTE) cellular uplink. Leveraging the LTE uplink channel estimation capability, the AIoT device conveys its own message to the base station (BS) by modulating the backscatter path. We explore the detector design, analyze the error performance of the proposed scheme, and provide an exact expression and its Gaussian approximation for the error probability. We corroborate the receiver error performance by Monte Carlo simulation. Analysis of the communication range reveals that AmBC achieves a reasonable BER on the order of $10^{-2}$ within a reading distance of four wavelengths. In addition, an AmBC prototype in the LTE uplink confirms its feasibility. The over-the-air experiment results validate the theoretical analysis. Hence, the proposed AmBC approach enables AIoT deployment with minimal changes to the LTE system.


[39] 2501.11014

Transfer Learning Strategies for Pathological Foundation Models: A Systematic Evaluation in Brain Tumor Classification

Foundation models pretrained on large-scale pathology datasets have shown promising results across various diagnostic tasks. Here, we present a systematic evaluation of transfer learning strategies for brain tumor classification using these models. We analyzed 252 cases comprising five major tumor types: glioblastoma, astrocytoma, oligodendroglioma, primary central nervous system lymphoma, and metastatic tumors. Comparing state-of-the-art foundation models with conventional approaches, we found that foundation models demonstrated robust classification performance with as few as 10 patches per case, challenging the traditional assumption that extensive per-case image sampling is necessary. Furthermore, our evaluation revealed that simple transfer learning strategies like linear probing were sufficient, while fine-tuning often degraded model performance. These findings suggest a paradigm shift from extensive data collection to efficient utilization of pretrained features, providing practical implications for implementing AI-assisted diagnosis in clinical pathology.


[40] 2501.11028

Few-shot Human Motion Recognition through Multi-Aspect mmWave FMCW Radar Data

Radar human motion recognition methods based on deep learning models have been a hot spot in remote sensing in recent years, yet existing methods are mostly radial-oriented. In practical applications, the test data could be multi-aspect and the sample number of each motion could be very limited, causing model overfitting and reduced recognition accuracy. This paper proposes channel-DN4, a multi-aspect few-shot human motion recognition method. First, local descriptors are introduced for a precise classification metric. Moreover, an episodic training strategy was adopted to reduce model overfitting. To utilize the invariant semantic information in multi-aspect conditions, we apply channel attention after the embedding network to obtain a precise implicit high-dimensional representation of the semantic information. We tested the performance of channel-DN4 and comparison methods on measured mmWave FMCW radar data. The proposed channel-DN4 produced competitive and convincing results, reaching the highest recognition accuracy of 87.533% in the 3-way 10-shot condition while other methods suffer from overfitting. Codes are available at: https://github.com/MountainChenCad/channel-DN4


[41] 2501.11062

Design and Prototyping of Filtering Active STAR-RIS with Adjustable Power Splitting

Reconfigurable Intelligent Surfaces (RISs) have emerged as a transformative technology for next-generation wireless communication systems, offering unprecedented control over electromagnetic wave propagation. In particular, Simultaneously Transmitting and Reflecting RISs (STAR-RISs) have garnered significant attention due to their full-space coverage. This paper presents an active STAR-RIS, which enables independent control of both transmission and reflection phases and features out-of-band harmonic suppression. Unlike the traditional passive RIS, the proposed design integrates active amplification to overcome the inherent passive losses, significantly enhancing signal strength and system performance. Additionally, the system supports dynamic power allocation between transmission and reflection modes, providing greater flexibility to meet diverse communication demands in complex propagation environments. The versatility of the design is further validated by extending the Radar Cross Section (RCS)-based path loss model to the STAR-RIS. This design improves efficiency, flexibility, and adaptability, offering a promising solution for future wireless communication systems, particularly in scenarios requiring simultaneous control of transmission and reflection signals.


[42] 2501.11093

Channel Sounding Using Multiplicative Arrays Based on Successive Interference Cancellation Principle

Ultra-massive multiple-input multiple-output (MIMO) systems are seen as a key radio technology for the advancement of wireless communication systems, due to their capability to better utilize the spatial dimension of propagation channels. Channel sounding is essential for developing accurate and realistic channel models for massive MIMO systems. However, channel sounding with large-scale antenna systems has faced significant challenges in practice. Real antenna array based (RAA) sounders suffer from high complexity and cost, while virtual antenna array (VAA) solutions are known for their long measurement times. Notably, these issues will become more pronounced as antenna array configurations get larger for future radio systems. In this paper, we propose the concept of a multiplicative array (MA) for channel sounding applications to achieve a large antenna aperture with a reduced number of antenna elements. The unique characteristics of the MA are exploited for wideband spatial channel sounding purposes, supported by both one-path and multi-path numerical simulations. To address the fake-path and angle-delay-profile distortion issues inherent to the MA in multipath channel sounding, a novel channel parameter estimation algorithm for the MA based on the successive interference cancellation (SIC) principle is proposed. Both numerical simulations and experimental validation results are provided to demonstrate the effectiveness and robustness of the proposed SIC algorithm for the MA. This research contributes significantly to the channel sounding and characterization of massive MIMO systems for future applications.


[43] 2501.11147

Ultrasonic monitoring of carbonation in Portland cements

Chemical reactions resulting from the ingress of carbonates into the cement matrix modify the properties of its pore solution, as well as its pore distribution and size. These changes lead to corrosion of the steel in reinforced concrete. The nature of conventional testing for the estimation of carbonation in cement-based materials is time-consuming and destructive. This paper presents a set of non-destructive ultrasound-based indexes, obtained solely from non-linear and linear analyses of ultrasonic signals, for measuring the carbonation of Portland cement pastes. Class 30RS cement pastes with three water/cement ratios by weight (0.4, 0.5, and 0.6) were considered. Carbonation was carried out for 120 days with a constant CO2 level of 4% by volume under controlled temperature and humidity, considering a unidirectional carbonation, parallel to the longitudinal axis of the samples. The level of carbonation was validated by FTIR measurements. From these analyses, different indexes with high correlation were obtained, estimated only from the ultrasonic signals and as a function of the days of exposure to carbonation, as well as of the percentage of carbonation. Further study is required for the evaluation of the reliability of these promising indexes for the determination of carbonation in cement-based materials.


[44] 2501.11188

Global Attitude Synchronization for Multi-agent Systems on SO(3)

In this paper, we address the problem of attitude synchronization for a group of rigid body systems evolving on SO(3). The interaction among these systems is modeled through an undirected, connected, and acyclic graph topology. First, we present an almost global continuous distributed attitude synchronization scheme with rigorously proven stability guarantees. Thereafter, we propose two global distributed hybrid attitude synchronization schemes on SO(3). The first scheme is a hybrid control law that leverages angular velocities and relative orientations to achieve global alignment to a common orientation. The second scheme eliminates the dependence on angular velocities by introducing dynamic auxiliary variables, while ensuring global asymptotic attitude synchronization. This velocity-free control scheme relies exclusively on attitude information. Simulation results are provided to illustrate the effectiveness of the proposed distributed attitude synchronization schemes.


[45] 2501.11196

Enhancing Brain Tumor Segmentation Using Channel Attention and Transfer learning

Accurate and efficient segmentation of brain tumors is critical for diagnosis, treatment planning, and monitoring in clinical practice. In this study, we present an enhanced ResUNet architecture for automatic brain tumor segmentation, integrating an EfficientNetB0 encoder, a channel attention mechanism, and an Atrous Spatial Pyramid Pooling (ASPP) module. The EfficientNetB0 encoder leverages pre-trained features to improve feature extraction efficiency, while the channel attention mechanism enhances the model's focus on tumor-relevant features. ASPP enables multiscale contextual learning, crucial for handling tumors of varying sizes and shapes. The proposed model was evaluated on two benchmark datasets: TCGA LGG and BraTS 2020. Experimental results demonstrate that our method consistently outperforms the baseline ResUNet and its EfficientNet variant, achieving Dice coefficients of 0.903 and 0.851 and HD95 scores of 9.43 and 3.54 for whole tumor and tumor core regions on the BraTS 2020 dataset, respectively. Compared with state-of-the-art methods, our approach shows competitive performance, particularly in whole tumor and tumor core segmentation. These results indicate that combining a powerful encoder with attention mechanisms and ASPP can significantly enhance brain tumor segmentation performance. The proposed approach holds promise for further optimization and application in other medical image segmentation tasks.
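
A hedged sketch of the channel attention idea mentioned above, written in the style of a squeeze-and-excitation block; the exact attention module, reduction ratio, and placement used in the paper may differ.

```python
# Squeeze-and-excitation style channel attention: reweight feature channels globally.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze spatial dimensions
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # excite: scale each channel

feats = torch.randn(2, 64, 32, 32)                   # e.g. an encoder feature map
print(ChannelAttention(64)(feats).shape)             # torch.Size([2, 64, 32, 32])
```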


[46] 2501.11221

Finding Reproducible and Prognostic Radiomic Features in Variable Slice Thickness Contrast Enhanced CT of Colorectal Liver Metastases

Establishing the reproducibility of radiomic signatures is a critical step in the path to clinical adoption of quantitative imaging biomarkers; however, radiomic signatures must also be meaningfully related to an outcome of clinical importance to be of value for personalized medicine. In this study, we analyze both the reproducibility and prognostic value of radiomic features extracted from the liver parenchyma and largest liver metastases in contrast enhanced CT scans of patients with colorectal liver metastases (CRLM). A prospective cohort of 81 patients from two major US cancer centers was used to establish the reproducibility of radiomic features extracted from images reconstructed with different slice thicknesses. A publicly available, single-center cohort of 197 preoperative scans from patients who underwent hepatic resection for treatment of CRLM was used to evaluate the prognostic value of features and models to predict overall survival. A standard set of 93 features was extracted from all images, with a set of eight different extractor settings. The feature extraction settings producing the most reproducible, as well as the most prognostically discriminative, feature values were highly dependent on both the region of interest and the specific feature in question. While the best overall predictive model was produced using features extracted with a particular setting without accounting for reproducibility (C-index = 0.630 (0.603--0.649)), an equivalent-performing model (C-index = 0.629 (0.605--0.645)) was produced by pooling features from all extraction settings and thresholding out features with low reproducibility (retaining $\mathrm{CCC} \geq 0.85$) prior to feature selection. Our findings support a data-driven approach to feature extraction and selection, preferring the inclusion of many features, and narrowing feature selection based on reproducibility when relevant data is available.
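A reproducibility filter of this kind can be pictured as computing the concordance correlation coefficient (CCC) between feature values obtained from two reconstructions of the same scans and keeping features above the stated 0.85 cutoff. The sketch below is only an illustration under assumed inputs (array names, shapes, synthetic data); it is not the authors' pipeline.

import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient between two feature vectors."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population variances
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

def reproducible_features(feats_a, feats_b, threshold=0.85):
    """Indices of features whose CCC across two reconstructions
    (e.g. different slice thicknesses) meets the threshold."""
    n_features = feats_a.shape[1]
    return [j for j in range(n_features)
            if ccc(feats_a[:, j], feats_b[:, j]) >= threshold]

# toy example: 81 patients x 93 features from two reconstruction settings
rng = np.random.default_rng(0)
a = rng.normal(size=(81, 93))
b = a + 0.1 * rng.normal(size=(81, 93))       # mostly reproducible features
print(len(reproducible_features(a, b)))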


[47] 2501.11230

Optimum Power-Subcarrier Allocation and Time-Sharing in Multicarrier NOMA Uplink

Currently used resource allocation methods for uplink multicarrier non-orthogonal multiple access (MC-NOMA) systems have multiple shortcomings. Existing approaches either allocate the same power across all subcarriers to a user, or use heuristic near-far, strong-channel/weak-channel user grouping to assign the decoding order for successive interference cancellation (SIC). This paper proposes a novel optimal power-subcarrier allocation for uplink MC-NOMA. This new allocation achieves the optimal power-subcarrier allocation as well as the optimal SIC decoding order. Furthermore, the proposed method includes a time-sharing algorithm that dynamically alters the decoding orders of the participating users to achieve the required data rates, even in cases where any single decoding order fails to do so. Extensive experimental evaluations show that the new method achieves higher sum data rates and lower power consumption compared to current NOMA methods.


[48] 2501.11243

Energy Consumption Reduction for UAV Trajectory Training: A Transfer Learning Approach

The advent of 6G technology demands flexible, scalable wireless architectures to support ultra-low latency, high connectivity, and high device density. The Open Radio Access Network (O-RAN) framework, with its open interfaces and virtualized functions, provides a promising foundation for such architectures. However, traditional fixed base stations alone are not sufficient to fully capitalize on the benefits of O-RAN due to their limited flexibility in responding to dynamic network demands. The integration of Unmanned Aerial Vehicles (UAVs) as mobile RUs within the O-RAN architecture offers a solution by leveraging the flexibility of drones to dynamically extend coverage. However, UAVs operating in diverse environments require frequent retraining, leading to significant energy waste. We propose transfer learning based on a Dueling Double Deep Q-Network (DDQN) with multi-step learning, which significantly reduces the training time and energy consumption required for UAVs to adapt to new environments. We designed simulation environments and conducted ray tracing experiments using Wireless InSite with real-world map data. In the two simulated environments, training energy consumption was reduced by 30.52% and 58.51%, respectively. Furthermore, tests on real-world maps of Ottawa and Rosslyn showed energy reductions of 44.85% and 36.97%, respectively.


[49] 2501.11246

Unlocking the Potential: A Novel Tool for Assessing Untapped Micro-Pumped Hydro Energy Storage Systems in Michigan

This study presents an innovative tool designed to unlock the potential of Michigan's lakes and dams for applications such as water resource management and renewable energy generation. Given Michigan's relatively flat landscape, the focus is on systems that could serve as micro-hydro energy storage solutions. To ensure accuracy and reliability, the tool incorporates extensive data gathered from authorized sources, covering more than 420 water facilities and potential reservoirs in the state. These data are used as part of a case study to evaluate the tool's capabilities. Key parameters assessed include horizontal and vertical distances (head), volume, and the total storage capacity of each reservoir, measured in GWh. By analyzing these factors, the tool determines the suitability of various lakes and dams for hydroelectric power generation, and other uses based on the horizontal and vertical threshold distances. Its robust assessment framework integrates these metrics to comprehensively evaluate each site's potential. The tool's user-friendly interface and advanced data visualization features make the findings easy to interpret, facilitating optimal resource utilization and informed decision-making for state authorities. Hence, this tool represents a meaningful advancement in managing Michigan's water resources sustainably, promoting environmentally friendly practices, and supporting economic development.


[50] 2501.11253

How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks?

The pre-training and fine-tuning paradigm has become prominent in transfer learning. For example, if the model is pre-trained on ImageNet and then fine-tuned on PASCAL, it can significantly outperform a model trained on PASCAL from scratch. While ImageNet pre-training has shown enormous success, it is performed on 2D images and the learned features serve classification tasks; when transferring to more diverse tasks, like 3D image segmentation, its performance is inevitably compromised due to the deviation from the original ImageNet context. A significant challenge lies in the lack of large, annotated 3D datasets rivaling the scale of ImageNet for model pre-training. To overcome this challenge, we make two contributions. Firstly, we construct AbdomenAtlas 1.1 that comprises 9,262 three-dimensional computed tomography (CT) volumes with high-quality, per-voxel annotations of 25 anatomical structures and pseudo annotations of seven tumor types. Secondly, we develop a suite of models that are pre-trained on our AbdomenAtlas 1.1 for transfer learning. Our preliminary analyses indicate that the model trained only with 21 CT volumes, 672 masks, and 40 GPU hours has a transfer learning ability similar to the model trained with 5,050 (unlabeled) CT volumes and 1,152 GPU hours. More importantly, the transfer learning ability of supervised models can further scale up with larger annotated datasets, achieving significantly better performance than preexisting pre-trained models, irrespective of their pre-training methodologies or data sources. We hope this study can facilitate collective efforts in constructing larger 3D medical datasets and more releases of supervised pre-trained models.


[51] 2501.11261

The Expected Peak-to-Average Power Ratio of White Gaussian Noise in Sampled I/Q Data

One of the fundamental endeavors in radio frequency (RF) metrology is to measure the power of signals, where a common aim is to estimate the peak-to-average power ratio (PAPR), which quantifies the ratio of the maximum (peak) to the mean value. For a finite number of discrete-time samples of baseband in-phase and quadrature (I/Q) white Gaussian noise (WGN) that are independent and identically distributed with zero mean, we derive a closed-form, exact formula for mean PAPR that is well-approximated by the natural logarithm of the number of samples plus Euler's constant. Additionally, we give related theoretical results for the mean crest factor. After comparing our main result to previously published approximate formulas, we examine how violations of the WGN assumptions in sampled I/Q data result in deviations from the expected value of PAPR. Finally, utilizing a measured RF I/Q acquisition, we illustrate how our formula for mean PAPR can be applied to spectral analysis with spectrograms to verify when measured RF emissions are WGN in a given frequency band.
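The stated approximation, mean PAPR of roughly ln(N) plus Euler's constant for N i.i.d. complex WGN samples, is easy to check numerically. The short Monte Carlo sketch below only illustrates the formula's plausibility; it is not the paper's derivation, and the sample sizes and trial count are arbitrary.

import numpy as np

def mean_papr_wgn(n_samples, n_trials=20000, rng=np.random.default_rng(1)):
    """Monte Carlo estimate of E[PAPR] for n_samples of complex WGN (I/Q)."""
    x = (rng.standard_normal((n_trials, n_samples))
         + 1j * rng.standard_normal((n_trials, n_samples)))
    power = np.abs(x) ** 2
    papr = power.max(axis=1) / power.mean(axis=1)
    return papr.mean()

for n in (64, 256, 1024):
    approx = np.log(n) + np.euler_gamma      # ln(N) + Euler's constant
    print(n, round(mean_papr_wgn(n), 3), round(approx, 3))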


[52] 2501.11266

Optimum Power Allocation for Low Rank Wi-Fi Channels: A Comparison with Deep RL Framework

Upcoming Augmented Reality (AR) and Virtual Reality (VR) systems require high data rates ($\geq$ 500 Mbps) and low power consumption for a seamless experience. With an increasing number of subscribing users, the total number of antennas across all transmitting users far exceeds the number of antennas at the access point (AP). This results in a low rank wireless channel, presenting a bottleneck for uplink communication systems. The current uplink systems that use orthogonal multiple access (OMA) and the proposed non-orthogonal multiple access (NOMA) fail to achieve the required data rates and power consumption under predominantly low rank channel scenarios. This paper introduces an optimal power-subcarrier allocation algorithm for multi-carrier NOMA, named minPMAC, and an associated time-sharing algorithm that adaptively changes successive interference cancellation decoding orders to maximize sum data rates in these low rank channels. This Lagrangian-based optimization technique, although globally optimum, is prohibitive in terms of runtime, proving inefficient for real-time scenarios. Hence, we propose a novel near-optimal deep reinforcement learning-based energy sum optimization (DRL-minPMAC) which achieves real-time efficiency. Extensive experimental evaluations show that minPMAC achieves 28\% and 39\% higher data rates than NOMA and OMA baselines. Furthermore, the proposed DRL-minPMAC runs approximately 5 times faster than minPMAC and achieves 83\% of the global optimum data rates in real time.


[53] 2501.11274

SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation

Personalized speech enhancement (PSE) methods typically rely on pre-trained speaker verification models or self-designed speaker encoders to extract target speaker clues, guiding the PSE model in isolating the desired speech. However, these approaches suffer from significant model complexity and often underutilize enrollment speaker information, limiting the potential performance of the PSE model. To address these limitations, we propose a novel Speaker Encoder-Free PSE network, termed SEF-PNet, which fully exploits the information present in both the enrollment speech and noisy mixtures. SEF-PNet incorporates two key innovations: Interactive Speaker Adaptation (ISA) and Local-Global Context Aggregation (LCA). ISA dynamically modulates the interactions between enrollment and noisy signals to enhance the speaker adaptation, while LCA employs advanced channel attention within the PSE encoder to effectively integrate local and global contextual information, thus improving feature learning. Experiments on the Libri2Mix dataset demonstrate that SEF-PNet significantly outperforms baseline models, achieving state-of-the-art PSE performance.


[54] 2501.11276

ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction

Alzheimer's disease (AD) is a common neurodegenerative disease among the elderly. Early prediction and timely intervention of its prodromal stage, mild cognitive impairment (MCI), can decrease the risk of advancing to AD. Combining information from various modalities can significantly improve predictive accuracy. However, challenges such as missing data and heterogeneity across modalities complicate multimodal learning methods as adding more modalities can worsen these issues. Current multimodal fusion techniques often fail to adapt to the complexity of medical data, hindering the ability to identify relationships between modalities. To address these challenges, we propose an innovative multimodal approach for predicting MCI conversion, focusing specifically on the issues of missing positron emission tomography (PET) data and integrating diverse medical information. The proposed incomplete triple-modal MCI conversion prediction network is tailored for this purpose. Through the missing modal generation module, we synthesize the missing PET data from the magnetic resonance imaging and extract features using specifically designed encoders. We also develop a channel aggregation module and a triple-modal co-attention fusion module to reduce feature redundancy and achieve effective multimodal data fusion. Furthermore, we design a loss function to handle missing modality issues and align cross-modal features. These components collectively harness multimodal data to boost network performance. Experimental results on the ADNI1 and ADNI2 datasets show that our method significantly surpasses existing unimodal and other multimodal models. Our code is available at https://github.com/justinhxy/ITFC.


[55] 2501.11307

SIG-SDP: Sparse Interference Graph-Aided Semidefinite Programming for Large-Scale Wireless Time-Sensitive Networking

Wireless time-sensitive networking (WTSN) is essential for Industrial Internet of Things. We address the problem of minimizing time slots needed for WTSN transmissions while ensuring reliability subject to interference constraints -- an NP-hard task. Existing semidefinite programming (SDP) methods can relax and solve the problem but suffer from high polynomial complexity. We propose a sparse interference graph-aided SDP (SIG-SDP) framework that exploits the interference's sparsity arising from attenuated signals between distant user pairs. First, the framework utilizes the sparsity to establish the upper and lower bounds of the minimum number of slots and uses binary search to locate the minimum within the bounds. Here, for each searched slot number, the framework optimizes a positive semidefinite (PSD) matrix indicating how likely user pairs share the same slot, and the constraint feasibility with the optimized PSD matrix further refines the slot search range. Second, the framework designs a matrix multiplicative weights (MMW) algorithm that accelerates the optimization, achieved by only sparsely adjusting interfering user pairs' elements in the PSD matrix while skipping the non-interfering pairs. We also design an online architecture to deploy the framework to adjust slot assignments based on real-time interference measurements. Simulations show that the SIG-SDP framework converges in near-linear complexity and is highly scalable to large networks. The framework minimizes the number of slots with up to 10 times faster computation and up to 100 times lower packet loss rates than compared methods. The online architecture demonstrates how the algorithm complexity impacts dynamic networks' performance.
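The outer loop described here, a binary search over the slot count with each candidate checked by an optimization-based feasibility test, can be sketched independently of the SDP details. In the illustration below, `lower_bound`, `upper_bound`, and `feasible_with` are placeholders for the paper's sparsity-based bounds and PSD-matrix feasibility check; this is a generic sketch, not the SIG-SDP algorithm itself.

def min_slots(lower_bound, upper_bound, feasible_with):
    """Binary-search the smallest number of slots for which the relaxed
    scheduling problem is feasible. feasible_with(k) stands in for the
    SDP/MMW feasibility check and is assumed monotone in k."""
    lo, hi = lower_bound, upper_bound
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible_with(mid):
            hi = mid          # mid slots suffice; try fewer
        else:
            lo = mid + 1      # mid slots are not enough
    return lo

# toy usage: pretend at least 7 slots are needed
print(min_slots(1, 32, lambda k: k >= 7))   # -> 7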


[56] 2501.11330

Practical Modulo Sampling: Mitigating High-Frequency Components

Recovering signals within limited dynamic range (DR) constraints remains a central challenge for analog-to-digital converters (ADCs). To prevent data loss, an ADC's DR typically must exceed that of the input signal. Modulo sampling has recently gained attention as a promising approach for addressing DR limitations across various signal classes. However, existing methods often rely on ideal ADCs capable of capturing the high frequencies introduced by the modulo operator, which is impractical in real-world hardware applications. This paper introduces an innovative hardware-based sampling approach that addresses these high-frequency components using an analog mixer followed by a Low-Pass Filter (LPF). This allows the use of realistic ADCs, which do not need to handle frequencies beyond the intended sampling rate. Our method eliminates the requirement for high-specification ADCs and demonstrates that the resulting samples are equivalent to those from an ideal high-spec ADC. Consequently, any existing modulo recovery algorithm can be applied effectively. We present a practical hardware prototype of this approach, validated through both simulations and hardware recovery experiments. Using a recovery method designed to handle quantization noise, we show that our approach effectively manages high-frequency artifacts, enabling reliable modulo recovery with realistic ADCs. These findings confirm that our hardware solution not only outperforms conventional methods in high-precision settings but also demonstrates significant real-world applicability.
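For readers unfamiliar with modulo sampling, the folding operation itself is simple: the input is wrapped into the ADC's range [-lambda, lambda) before sampling. The snippet below illustrates only this standard folding model with an assumed threshold; it does not represent the proposed mixer/LPF hardware chain.

import numpy as np

def modulo_fold(x, lam):
    """Fold a signal into [-lam, lam), the standard modulo-ADC model."""
    return np.mod(x + lam, 2.0 * lam) - lam

t = np.linspace(0.0, 1.0, 1000)
x = 3.0 * np.sin(2.0 * np.pi * 5.0 * t)      # exceeds the assumed ADC range
y = modulo_fold(x, lam=1.0)                  # folded signal within [-1, 1)
print(y.min(), y.max())                      # stays inside the dynamic range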


[57] 2501.11333

A Dynamic Improvement Framework for Vehicular Task Offloading

In this paper, the task offloading from vehicles with random velocities is optimized via a novel dynamic improvement framework. Particularly, in a vehicular network with multiple vehicles and base stations (BSs), computing tasks of vehicles are offloaded via BSs to an edge server. Due to the random velocities, the exact trajectories of vehicles cannot be predicted in advance. Hence, instead of deterministic optimization, the cell association, uplink time and throughput allocation of multiple vehicles in a period of task offloading are formulated as a finite-horizon Markov decision process. In the proposed solution framework, we first obtain a reference scheduling scheme of cell association, uplink time and throughput allocation via deterministic optimization at the very beginning. The reference scheduling scheme is then used to approximate the value functions of the Bellman's equations, and the actual scheduling action is determined in each time slot according to the current system state and approximate value functions. Thus, the intensive computation for value iteration in the conventional solution is eliminated. Moreover, a non-trivial average cost upper bound is provided for the proposed solution framework. In the simulation, the random trajectories of vehicles are generated from a high-fidelity traffic simulator. It is shown that the performance gain of the proposed scheduling framework over the baselines is significant.


[58] 2501.11338

Driver Behavior Soft-Sensor Based on Neurofuzzy Systems and Weighted Projection on Principal Components

This work has as main objective the development of a soft-sensor to classify, in real time, the behaviors of drivers when they are at the controls of a vehicle. Efficient classification of drivers' behavior while driving, using only the measurements of the sensors already incorporated in the vehicles and without the need to add extra hardware (smart phones, cameras, etc.), is a challenge. The main advantage of using only the data center signals of modern vehicles is economical. The classification of the driving behavior and the warning to the driver of dangerous behaviors without the need to add extra hardware (and their software) to the vehicle, would allow the direct integration of these classifiers into the current vehicles without incurring a greater cost in the manufacture of the vehicles and therefore be an added value. In this work, the classification is obtained based only on speed, acceleration and inertial measurements which are already present in many modern vehicles. The proposed algorithm is based on a structure made by several Neurofuzzy systems with the combination of projected data in components of various Principal Component Analysis. A comparison with several types of classical classifying algorithms has been made.


[59] 2501.11374

Linear ADRC is equivalent to PID with set-point weighting and measurement filter

We show that linear Active Disturbance Rejection Control (ADRC) tuned using the "bandwidth method" is equivalent to PI(D) control with set-point weighting and a lowpass filter on the measurement signal. We also provide simple expressions that make it possible to implement linear ADRC for first- and second-order systems using commonplace two-degree-of-freedom PID implementations. The expressions are equivalent to ADRC in the response from measurements, and a slight approximation of it in the response from references.
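The controller family referenced here, PI(D) with set-point weighting and a lowpass measurement filter, has a standard two-degree-of-freedom form. The discrete-time sketch below shows only that generic structure; the gains, set-point weights, and filter constant are placeholder values, not the equivalence expressions derived in the paper.

class TwoDofPid:
    """Generic discrete PI(D) with set-point weighting (b, c) and a
    first-order lowpass filter on the measurement. Illustrative only."""
    def __init__(self, kp, ki, kd, b, c, tf, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.b, self.c = b, c            # set-point weights for P and D terms
        self.alpha = dt / (tf + dt)      # measurement filter coefficient
        self.dt = dt
        self.integral = 0.0
        self.y_filt = 0.0
        self.prev_err_d = 0.0

    def update(self, r, y):
        self.y_filt += self.alpha * (y - self.y_filt)   # lowpass the measurement
        self.integral += self.ki * (r - self.y_filt) * self.dt
        err_d = self.c * r - self.y_filt
        derivative = self.kd * (err_d - self.prev_err_d) / self.dt
        self.prev_err_d = err_d
        return self.kp * (self.b * r - self.y_filt) + self.integral + derivative

# usage: one control step with placeholder tuning
ctrl = TwoDofPid(kp=2.0, ki=1.0, kd=0.1, b=0.5, c=0.0, tf=0.05, dt=0.01)
print(ctrl.update(r=1.0, y=0.0))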


[60] 2501.11385

Sparse Incremental Aggregation in Satellite Federated Learning

This paper studies Federated Learning (FL) in low Earth orbit (LEO) satellite constellations, where satellites are connected via intra-orbit inter-satellite links (ISLs) to their neighboring satellites. During the FL training process, satellites in each orbit forward gradients from nearby satellites, which are eventually transferred to the parameter server (PS). To enhance the efficiency of the FL training process, satellites apply in-network aggregation, referred to as incremental aggregation. In this work, the gradient sparsification methods from [1] are applied to satellite scenarios to improve bandwidth efficiency during incremental aggregation. The numerical results highlight an increase of over 4x in bandwidth efficiency as the number of satellites in the orbital plane increases.
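Incremental aggregation with sparsified gradients can be pictured as each satellite adding its sparsified gradient to the partial sum received from its in-orbit neighbour before relaying it toward the PS. The sketch below is a generic top-k illustration of that idea under assumed gradients and k; the particular sparsification methods of [1] are not reproduced.

import numpy as np

def top_k_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a gradient vector."""
    sparse = np.zeros_like(grad)
    idx = np.argsort(np.abs(grad))[-k:]
    sparse[idx] = grad[idx]
    return sparse

def incremental_aggregate(gradients, k):
    """Each satellite adds its sparsified gradient to the running partial
    sum before forwarding it along the intra-orbit ISL toward the PS."""
    partial = np.zeros_like(gradients[0])
    for g in gradients:
        partial += top_k_sparsify(g, k)
    return partial

rng = np.random.default_rng(2)
sats = [rng.normal(size=1000) for _ in range(10)]        # 10 satellites in one orbit
print(np.count_nonzero(top_k_sparsify(sats[0], k=50)))   # 50 nonzeros forwarded
print(incremental_aggregate(sats, k=50).shape)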


[61] 2501.11389

Resilience of LTE-A/5G-NR links Against Transient Electromagnetic Interference

This paper presents a comparative analysis of long-term evolution advanced (LTE-A) and fifth-generation new radio (5G-NR), focusing on the effects of Transient Electromagnetic Interference (EMI) caused by catenary-pantograph contact in a railway environment. We developed a software-defined radio (SDR)-based prototype for the performance evaluation of LTE-A and 5G-NR links in the presence of transient interference. The measurement results show that both links experience considerable performance degradation due to transient EM interference at different center frequencies, and that the degradation is proportional to the gain of the interference.


[62] 2501.11392

Optimized Beamforming for Joint Bistatic Positioning and Monostatic Sensing

We investigate the performance tradeoff between \textit{bistatic positioning (BP)} and \textit{monostatic sensing (MS)} in a multi-input multi-output orthogonal frequency division multiplexing scenario. We derive the Cram\'er-Rao bounds (CRBs) for BP at the user equipment and MS at the base station. To balance these objectives, we propose a multi-objective optimization framework that optimizes beamformers using a weighted-sum CRB approach, ensuring the weak Pareto boundary. We also introduce two mismatch-minimizing approaches, targeting beamformer mismatch and variance matrix mismatch, and solve them distinctly. Numerical results demonstrate the performance tradeoff between BP and MS, revealing significant gains with the proposed methods and highlighting the advantages of minimizing the weighted-sum mismatch of variance matrices.


[63] 2501.11406

Efficient Reduction of Interconnected Subsystem Models using Abstracted Environments

We present two frameworks for structure-preserving model order reduction of interconnected subsystems, improving tractability of the reduction methods while ensuring stability and accuracy bounds of the reduced interconnected model. Instead of reducing each subsystem independently, we take a low-order abstraction of its environment into account to better capture the dynamics relevant to the external input-output behaviour of the interconnected system, thereby increasing accuracy of the reduced interconnected model. This approach significantly reduces the computational costs of reduction by abstracting instead of fully retaining the environment. The two frameworks differ in how they generate these abstracted environments: one abstracts the environment as a whole, whereas the other abstracts each individual subsystem. By relating low-level errors introduced by reduction and abstraction to the resulting high-level error on the interconnected system, we are able to translate high-level accuracy requirements (on the reduced interconnected system) to low-level specifications (on abstraction and reduction errors) using techniques from robust performance analysis. By adhering to these low-level specifications, restricting the introduced low-level errors, both frameworks automatically guarantee the accuracy and stability of the reduced interconnected system. We demonstrate the effectiveness of both frameworks by applying them to a structural dynamics model of a two-stroke wafer stage, achieving improved accuracy and/or greater reduction compared to an existing method from literature.


[64] 2501.11453

Integrate-and-Fire from a Mathematical and Signal Processing Perspective

Integrate-and-Fire (IF) is an idealized model of the spike-triggering mechanism of a biological neuron. It is used to realize the bio-inspired event-based principle of information processing in neuromorphic computing. We show that IF is closely related to the concept of Send-on-Delta (SOD) as used in threshold-based sampling. It turns out that the IF model can be adjusted in a way that SOD can be understood as a differential version of IF. As a result, we gain insight into the underlying metric structure based on the Alexiewicz norm, with consequences for clarifying the underlying signal space (bounded integrable signals with superpositions of finitely many Dirac impulses), the identification of a maximum sparsity property, error bounds for signal reconstruction, and a characterization in terms of sparse regularization.
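A minimal discrete-time illustration of the two sampling schemes being related: integrate-and-fire accumulates the input and emits a signed spike whenever the integral crosses a threshold, while send-on-delta emits an event whenever the signal itself has moved by the threshold. The threshold and test signal below are assumed for illustration; this is not the paper's formal construction.

import numpy as np

def integrate_and_fire(signal, dt, theta):
    """Emit (time_index, +/-1) events when the running integral crosses +/-theta."""
    events, acc = [], 0.0
    for n, u in enumerate(signal):
        acc += u * dt
        while abs(acc) >= theta:
            s = 1 if acc > 0 else -1
            events.append((n, s))
            acc -= s * theta            # subtract the fired quantum
    return events

def send_on_delta(signal, theta):
    """Emit (time_index, +/-1) events when the signal moves by theta from the last event."""
    events, ref = [], signal[0]
    for n, u in enumerate(signal):
        while abs(u - ref) >= theta:
            s = 1 if u > ref else -1
            events.append((n, s))
            ref += s * theta
    return events

t = np.linspace(0, 1, 500)
x = np.sin(2 * np.pi * 3 * t)
print(len(integrate_and_fire(x, dt=t[1] - t[0], theta=0.05)),
      len(send_on_delta(x, theta=0.05)))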


[65] 2501.11460

Efficient Multi-Source Localization in Near-Field Using only Angular Domain MUSIC

The localization of multiple signal sources using sensor arrays has been a long-standing research challenge. While numerous solutions have been developed, signal space methods like MUSIC and ESPRIT have gained widespread popularity. As sensor arrays grow in size, sources are frequently located in the near-field region. The standard MUSIC algorithm can be adapted to locate these sources by performing a 3D search over both the distance and the angles of arrival (AoA), including azimuth and elevation, though this comes with significant computational complexity. To address this, a modified version of MUSIC has been developed to decouple the AoA and distance, enabling sequential estimation of these parameters and reducing computational demands. However, this approach suffers from reduced accuracy. To maintain the accuracy of MUSIC while minimizing complexity, this paper proposes a novel method that exploits angular variation across the array aperture, eliminating the need for a grid search over distance. The proposed method divides the large aperture into smaller sections, with each focusing on estimating the angles of arrival. These angles are then triangulated to localize the sources in the near-field of the large aperture. Numerical simulations show that this approach not only surpasses the Modified MUSIC algorithm in terms of mean absolute error but also achieves accuracy comparable to standard MUSIC, all while greatly reducing computational complexity (by a factor of 370 in our simulation scenario).
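The final triangulation step, intersecting bearing lines from sub-apertures with known centers, is standard geometry and can be sketched as a small least-squares problem. The sub-array centers and angles below are assumed 2D placeholders; the MUSIC-based angle estimation itself is not shown.

import numpy as np

def triangulate(p1, theta1, p2, theta2):
    """Intersect two bearing lines p_i + t_i * [cos(theta_i), sin(theta_i)] in 2D."""
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    A = np.column_stack((d1, -d2))
    t = np.linalg.lstsq(A, np.asarray(p2, float) - np.asarray(p1, float), rcond=None)[0]
    return np.asarray(p1, float) + t[0] * d1

# two sub-array centers on the x-axis observing a source at (3, 4)
src = np.array([3.0, 4.0])
p1, p2 = np.array([0.0, 0.0]), np.array([6.0, 0.0])
th1 = np.arctan2(src[1] - p1[1], src[0] - p1[0])
th2 = np.arctan2(src[1] - p2[1], src[0] - p2[0])
print(triangulate(p1, th1, p2, th2))   # ~[3, 4]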


[66] 2501.11468

LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations

Emotion recognition in conversations (ERC) is challenging due to the multimodal nature of the emotion expression. In this paper, we propose to pretrain a text-based recognition model from unsupervised speech transcripts with LLM guidance. These transcriptions are obtained from a raw speech dataset with a pre-trained ASR system. A text LLM model is queried to provide pseudo-labels for these transcripts, and these pseudo-labeled transcripts are subsequently used for learning an utterance level text-based emotion recognition model. We use the utterance level text embeddings for emotion recognition in conversations along with speech embeddings obtained from a recently proposed pre-trained model. A hierarchical way of training the speech-text model is proposed, keeping in mind the conversational nature of the dataset. We perform experiments on three established datasets, namely, IEMOCAP, MELD, and CMU-MOSI, where we illustrate that the proposed model improves over other benchmarks and achieves state-of-the-art results on two out of these three datasets.


[67] 2501.11495

Discrete-Time Passivity-Based Control using Hermite-Obreschkoff Methods

The motivation for this paper is the implementation of nonlinear state feedback control, designed based on the continuous-time plant model, in a sampled control loop under relatively slow sampling. In previous work we have shown that using one-step predictions of the target dynamics with higher order integration schemes, together with possibly higher order input shaping, is a simple and effective way to increase the feasible sampling times until performance degradation and instability occur. In this contribution we present a unifying derivation for arbitrary orders of the previously used Lobatto IIIA collocation and Hermite interpolation schemes through the Hermite-Obreschkoff formula. We derive, moreover, an IDA-PBC controller for a magnetic levitation system, which requires a non-constant target interconnection matrix, and show experimental results.


[68] 2501.11511

Subjective and Objective Quality Assessment of Non-Uniformly Distorted Omnidirectional Images

Omnidirectional image quality assessment (OIQA) has been one of the hot topics in IQA with the continuous development of VR techniques, and achieved much success in the past few years. However, most studies devote themselves to the uniform distortion issue, i.e., all regions of an omnidirectional image are perturbed by the ``same amount'' of noise, while ignoring the non-uniform distortion issue, i.e., partial regions undergo ``different amount'' of perturbation with the other regions in the same omnidirectional image. Additionally, nearly all OIQA models are verified on the platforms containing a limited number of samples, which largely increases the over-fitting risk and therefore impedes the development of OIQA. To alleviate these issues, we elaborately explore this topic from both subjective and objective perspectives. Specifically, we construct a large OIQA database containing 10,320 non-uniformly distorted omnidirectional images, each of which is generated by considering quality impairments on one or two camera lens(es). Then we meticulously conduct psychophysical experiments and delve into the influence of both holistic and individual factors (i.e., distortion range and viewing condition) on omnidirectional image quality. Furthermore, we propose a perception-guided OIQA model for non-uniform distortion by adaptively simulating users' viewing behavior. Experimental results demonstrate that the proposed model outperforms state-of-the-art methods. The source code is available at https://github.com/RJL2000/OIQAND.


[69] 2501.11512

Multitask Auxiliary Network for Perceptual Quality Assessment of Non-Uniformly Distorted Omnidirectional Images

Omnidirectional image quality assessment (OIQA) has been widely investigated in the past few years and achieved much success. However, most existing studies are dedicated to solving the uniform distortion problem in OIQA, which has a natural gap with the non-uniform distortion problem, and their ability to capture non-uniform distortion is far from satisfactory. To narrow this gap, in this paper, we propose a multitask auxiliary network for non-uniformly distorted omnidirectional images, where the parameters are optimized by jointly training the main task and other auxiliary tasks. The proposed network mainly consists of three parts: a backbone for extracting multiscale features from the viewport sequence, a multitask feature selection module for dynamically allocating specific features to different tasks, and auxiliary sub-networks for guiding the proposed model to capture local distortion and global quality change. Extensive experiments conducted on two large-scale OIQA databases demonstrate that the proposed model outperforms other state-of-the-art OIQA metrics, and these auxiliary sub-networks contribute to improving the performance of the proposed model. The source code is available at https://github.com/RJL2000/MTAOIQA.


[70] 2501.11520

Fundus Image Quality Assessment and Enhancement: a Systematic Review

As an affordable and convenient eye scan, fundus photography holds the potential for preventing vision impairment, especially in resource-limited regions. However, fundus image degradation is common under intricate imaging environments, impacting following diagnosis and treatment. Consequently, image quality assessment (IQA) and enhancement (IQE) are essential for ensuring the clinical value and reliability of fundus images. While existing reviews offer some overview of this field, a comprehensive analysis of the interplay between IQA and IQE, along with their clinical deployment challenges, is lacking. This paper addresses this gap by providing a thorough review of fundus IQA and IQE algorithms, research advancements, and practical applications. We outline the fundamentals of the fundus photography imaging system and the associated interferences, and then systematically summarize the paradigms in fundus IQA and IQE. Furthermore, we discuss the practical challenges and solutions in deploying IQA and IQE, as well as offer insights into potential future research directions.


[71] 2501.11522

Optimal Trajectory Control of Geometrically Exact Strings with Space-Time Finite Elements

In this contribution, we present a variational space-time formulation which generates an optimal feed-forward controller for geometrically exact strings. More concretely, the optimization problem is solved with an indirect approach, and the space-time finite element method translates the problem to a set of algebraic equations. Thereby, only the positional field and the corresponding adjoint variable field are approximated by continuous shape functions, which makes the discretization of a velocity field unnecessary. In addition, the variational formulation can be solved using commercial or open source finite element packages. The entire approach can also be interpreted as a multiple-shooting method for solving the optimality conditions based on the semi-discrete problem. The performance of our approach is demonstrated by a numerical test.


[72] 2501.11532

Early Stopping Bayesian Optimization for Controller Tuning

Manual tuning of performance-critical controller parameters can be tedious and sub-optimal. Bayesian Optimization (BO) is an increasingly popular practical alternative to automatically optimize controller parameters from few experiments. Standard BO practice is to evaluate the closed-loop performance of parameters proposed during optimization on an episode with a fixed length. However, fixed-length episodes can be wasteful. For example, continuing an episode whose start already shows undesirable behavior, such as strong oscillations, seems pointless. Therefore, we propose a BO method that stops an episode early if suboptimality becomes apparent before an episode is completed. Such early stopping results in partial observations of the controller's performance, which cannot directly be included in standard BO. We propose three heuristics to facilitate partially observed episodes in BO. Through five numerical and one hardware experiment, we demonstrate that early stopping BO can substantially reduce the time needed for optimization.
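One simple way to picture episode-level early stopping is a running-cost watchdog: the episode is aborted once its accumulated cost can no longer beat the best completed episode by some margin, and the partial cost is returned flagged as censored. The sketch below only illustrates that generic mechanism under an assumed cost model; the paper's three heuristics for feeding partial observations back into BO are not reproduced.

def run_episode_with_early_stop(step_cost, horizon, best_total, margin=1.0):
    """Roll out one episode; abort once the running cost exceeds the best
    completed total times margin. Returns (cost, completed_flag)."""
    total = 0.0
    for k in range(horizon):
        total += step_cost(k)
        if best_total is not None and total > margin * best_total:
            return total, False        # partial (censored) observation
    return total, True                 # full observation

# toy usage: parameterizations that accrue cost quickly are cut short
best = None
for gain in (0.5, 2.0, 5.0):
    cost, done = run_episode_with_early_stop(
        step_cost=lambda k, g=gain: abs(g - 1.0) * 0.1 + 0.01,
        horizon=200, best_total=best)
    if done:
        best = cost if best is None else min(best, cost)
    print(gain, round(cost, 2), "completed" if done else "stopped early")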


[73] 2501.11542

DLinear-based Prediction of Remaining Useful Life of Lithium-Ion Batteries: Feature Engineering through Explainable Artificial Intelligence

Accurate prediction of the Remaining Useful Life (RUL) of lithium-ion batteries is essential for ensuring safety, reducing maintenance costs, and optimizing usage. However, predicting RUL is challenging due to the nonlinear characteristics of the degradation caused by complex chemical reactions. Machine learning allows precise predictions by learning the latent functions of degradation relationships based on cycling behavior. This study introduces an accurate RUL prediction approach based on feature engineering and DLinear, applied to the dataset from NASA's Prognostics Center of Excellence. Among the 20 features generated from current, voltage, temperature, and time provided in this dataset, key features contributing to degradation are selected using Pearson correlation coefficient and Shapley values. Shapley value-based feature selection effectively reflects cell-to-cell variability, showing similar importance rankings across all cells. The DLinear-based RUL prediction using key features efficiently captures the time-series trend, demonstrating significantly better performance compared to Long Short-Term Memory and Transformer models.
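The Pearson-correlation part of the feature screening is straightforward to sketch: rank the engineered features by the absolute correlation with the degradation target and keep the top ones. The data below are synthetic placeholders; the Shapley-value ranking and the DLinear predictor are not shown.

import numpy as np

def pearson_feature_ranking(X, y, top_k=5):
    """Rank features (columns of X) by |Pearson correlation| with target y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    order = np.argsort(-np.abs(r))
    return order[:top_k], r[order[:top_k]]

# toy data: 200 cycles x 20 engineered features, target = remaining useful life
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 20))
rul = 200.0 - np.arange(200) + 5 * X[:, 3]   # feature 3 is made informative
idx, corrs = pearson_feature_ranking(X, rul)
print(idx, np.round(corrs, 2))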


[74] 2501.11583

Joint Optimization of Geometric and Probabilistic Constellation Shaping for OFDM-ISAC Systems

6G communications systems are expected to integrate radar-like sensing capabilities enabling novel use cases. However, integrated sensing and communications (ISAC) introduces a trade-off between communications and sensing performance because the optimal constellations for each task differ. In this paper, we compare geometric, probabilistic and joint constellation shaping for orthogonal frequency division multiplexing (OFDM)-ISAC systems using an autoencoder (AE) framework. We first derive the constellation-dependent detection probability and propose a novel loss function to include the sensing performance in the AE framework. Our simulation results demonstrate that constellation shaping enables a dynamic trade-off between communications and sensing. Depending on whether sensing or communications performance is prioritized, geometric or probabilistic constellation shaping is preferred. Joint constellation shaping combines the advantages of geometric and probabilistic shaping, significantly outperforming legacy modulation formats.


[75] 2501.11591

Integrated Long-range Sensing and Communications in Multi Target Scenarios using CP-OFDM

6G communication systems promise to deliver sensing capabilities by utilizing the orthogonal frequency division multiplexing (OFDM) communication signal for sensing. However, the cyclic prefix inherent in OFDM systems limits the sensing range, necessitating compensation techniques to detect small, distant targets like drones. In this paper, we show that state-of-the-art coherent compensation methods fail in scenarios involving multiple targets, resulting in an increased noise floor in the radar image. Our contributions include a novel multi target coherent compensation algorithm and a generalized signal-to-interference-and-noise ratio for multiple targets to evaluate the performance. Our algorithm achieves the same detection performance at long distances requiring only 3.6% of the radio resources compared to classical OFDM radar processing. This enables resource efficient sensing at long distances in multi target scenarios with legacy communications-only networks.


[76] 2501.11593

Optimal User and Target Scheduling, User-Target Pairing, and Low-Resolution Phase-Only Beamforming for ISAC Systems

We investigate the joint user and target scheduling, user-target pairing, and low-resolution phase-only beamforming design for integrated sensing and communications (ISAC). Scheduling determines which users and targets are served, while pairing specifies which users and targets are grouped into pairs. Additionally, the beamformers are designed using few-bit constant-modulus phase shifts. This resource allocation problem is a nonconvex mixed-integer nonlinear program (MINLP) and challenging to solve. To address it, we propose an exact mixed-integer linear program (MILP) reformulation, which leads to a globally optimal solution. Our results demonstrate the superiority of an optimal joint design compared to heuristic stage-wise approaches, which are highly sensitive to scenario characteristics.


[77] 2501.11594

Faster-Than-Nyquist Equalization with Convolutional Neural Networks

Faster-than-Nyquist (FTN) signaling aims at improving the spectral efficiency of wireless communication systems by exceeding the boundaries set by the Nyquist-Shannon sampling theorem. Fifty years after its first introduction in the scientific literature, wireless communications have significantly changed, but spectral efficiency remains one of the key challenges. To adopt FTN signaling, inter-symbol interference (ISI) patterns need to be equalized at the receiver. Motivated by the pattern recognition capabilities of convolutional neural networks with skip connections, we propose such a deep learning architecture for ISI equalization and symbol demodulation in FTN receivers. We investigate the performance of the proposed model considering quadrature phase shift keying modulation and low density parity check coding, and compare it to a set of benchmarks, including frequency-domain equalization, a quadratic-programming-based receiver, and an equalization scheme based on a deep neural network. We show that our receiver outperforms all benchmarks, achieving error rates comparable to those in an additive white Gaussian noise channel, and higher effective throughput, thanks to the increased spectral efficiency of FTN signaling. With a compression factor of 60% and code rate 3/4, the proposed model achieves a peak effective throughput of 2.5 Mbps at just 10 dB of energy per bit over noise power spectral density ratio, with other receivers being limited by error floors due to the strong inter-symbol interference. To promote reproducibility in deep learning for wireless communications, our code is open source at the repository provided in the references.


[78] 2501.11626

DRL-Based Maximization of the Sum Cross-Layer Achievable Rate for Networks Under Jamming

In quasi-static wireless networks characterized by infrequent changes in the transmission schedules of user equipment (UE), malicious jammers can easily deteriorate network performance. Accordingly, a key challenge in these networks is managing channel access amidst jammers and under dynamic channel conditions. In this context, we propose a robust learning-based mechanism for channel access in multi-cell quasi-static networks under jamming. The network comprises multiple legitimate UEs, including predefined UEs (pUEs) with stochastic predefined schedules and an intelligent UE (iUE) with an undefined transmission schedule, all transmitting over a shared, time-varying uplink channel. Jammers transmit unwanted packets to disturb the pUEs' and the iUE's communication. The iUE's learning process is based on the deep reinforcement learning (DRL) framework, utilizing a residual network (ResNet)-based deep Q-Network (DQN). To coexist in the network and maximize the network's sum cross-layer achievable rate (SCLAR), the iUE must learn the unknown network dynamics while concurrently adapting to dynamic channel conditions. Our simulation results reveal that, with properly defined state space, action space, and rewards in DRL, the iUE can effectively coexist in the network, maximizing channel utilization and the network's SCLAR by judiciously selecting transmission time slots and thus avoiding collisions and jamming.


[79] 2501.11633

PSO-based Sliding Mode Current Control of Grid-Forming Inverter in Rotating Frame

The Grid-Forming Inverter (GFMI) is an emerging topic that is attracting significant attention from both academic and industrial communities, particularly in the area of control design. The Decoupled Average Model-based Sliding Mode Current Controller (DAM-SMC) has been used to address needs such as fast response, fixed switching frequency, and no overshoot, to avoid exceeding current limits. Typically, the control parameters for DAM-SMC are chosen based on expert knowledge and certain assumptions. However, these parameters may not achieve optimized performance due to system dynamics and uncertainties. To address this, this paper proposes a Particle Swarm Optimization (PSO)-based DAM-SMC controller, which inherits the control laws from DAM-SMC but optimizes the control parameters offline using PSO. The main goal is to reduce chattering and achieve smaller tracking errors. The proposed method is compared with other metaheuristic optimization algorithms, such as Genetic Algorithm (GA) and Simulated Annealing (SA). Simulations are performed in MATLAB/Simulink across various scenarios to evaluate the effectiveness of the proposed controller. The proposed approach achieves a substantial reduction in convergence time, decreasing it by 86.36% compared to the GA and by 88.89% compared to SA. Furthermore, the tracking error is reduced by 11.61% compared to the conventional DAM-SMC algorithm. The robustness of the proposed method is validated under critical conditions, where plant and control model parameters varied by up to 40%.


[80] 2501.11646

CDMA/OTFS Sensing Outperforms Pure OTFS at the Same Communication Throughput

There is a dearth of publications on the subject of spreading-aided Orthogonal Time Frequency Space (OTFS) solutions, especially for Integrated Sensing and Communication (ISAC), even though Code Division Multiple Access (CDMA) assisted multi-user OTFS (CDMA/OTFS) exhibits tangible benefits. Hence, this work characterises both the communication Bit Error Rate (BER) and sensing Root Mean Square Error (RMSE) performance of CDMA/OTFS, and contrasts them to pure OTFS. Three CDMA/OTFS configurations are considered: Delay Code Division Multiple Access OTFS (Dl-CDMA/OTFS), Doppler Code Division Multiple Access OTFS (Dp-CDMA/OTFS), and Delay Doppler Code Division Multiple Access OTFS (DD-CDMA/OTFS), which harness direct sequence spreading along the delay axis, Doppler axis, and DD domains respectively. For each configuration, the performance of Gold, Hadamard, and Zadoff-Chu sequences is investigated. The results demonstrate that Zadoff-Chu Dl-CDMA/OTFS and DD-CDMA/OTFS consistently outperform pure OTFS sensing, whilst maintaining a similar communication performance at the same throughput. The extra modulation complexity of CDMA/OTFS is similar to that of other OTFS multi-user methodologies, but the demodulation complexity of CDMA/OTFS is lower than that of some other OTFS multi-user methodologies. CDMA/OTFS sensing can also consistently outperform OTFS sensing whilst not requiring any additional complexity for target parameter estimation. Therefore, CDMA/OTFS is an appealing candidate for implementing multi-user OTFS ISAC.


[81] 2501.11655

KKL Observer Synthesis for Nonlinear Systems via Physics-Informed Learning

This paper proposes a novel learning approach for designing Kazantzis-Kravaris/Luenberger (KKL) observers for autonomous nonlinear systems. The design of a KKL observer involves finding an injective map that transforms the system state into a higher-dimensional observer state, whose dynamics is linear and stable. The observer's state is then mapped back to the original system coordinates via the inverse map to obtain the state estimate. However, finding this transformation and its inverse is quite challenging. We propose to sequentially approximate these maps by neural networks that are trained using physics-informed learning. We generate synthetic data for training by numerically solving the system and observer dynamics. Theoretical guarantees for the robustness of state estimation against approximation error and system uncertainties are provided. Additionally, a systematic method for optimizing observer performance through parameter selection is presented. The effectiveness of the proposed approach is demonstrated through numerical simulations on benchmark examples and its application to sensor fault detection and isolation in a network of Kuramoto oscillators using learned KKL observers.


[82] 2501.11687

$SE(3)$-Based Trajectory Optimization and Target Tracking in UAV-Enabled ISAC Systems

This paper introduces a novel approach to enhance the performance of UAV-enabled integrated sensing and communication (ISAC) systems. By integrating uniform planar arrays (UPAs) and modeling the UAV as a rigid body using $SE(3)$, the study addresses key challenges in existing ISAC frameworks, such as rigid-body dynamics and trajectory design. We propose a target tracking scheme based on extended Kalman filtering (EKF) in $SE(3)$ and trajectory optimization from a control signal design perspective, leveraging the conditional Posterior Cramer-Rao bound (CPCRB) to optimize performance. Numerical results demonstrate the effectiveness of the proposed method in improving target tracking and trajectory optimization for a UAV-enabled MIMO-OFDM ISAC system.


[83] 2501.11699

Power Ramp-Rate Control via Power Regulation for Storageless Grid-Connected Photovoltaic Systems

Photovoltaic Power Ramp-Rate Control (PRRC) constitutes a key ancillary service for future power systems. Although its implementation through the installation of storage systems or irradiance sensors has been widely investigated, fewer studies have explored the power curtailment approach. The latter is less efficient, as it deliberately curtails part of the available power, yet it is a cost-effective solution in terms of capital expenditures. This paper proposes a novel storageless and sensorless photovoltaic PRRC for grid-connected applications in which the photovoltaic power, rather than the voltage, is the controlled magnitude. The aforementioned contribution makes the effective tracking of the power ramp-rate limit possible compared to the existing methods in the literature. The method is assisted by a real-time curve-fitting algorithm that estimates the Maximum Power Point while operating suboptimally. Thus, no direct temperature or irradiance measurement systems are needed. The validation of the proposed PRRC strategy has been tested by simulation and compared to another approach available in the literature, considering real-field highly variable irradiance data. Experimental validation of the proposed strategy has been performed in real time via Controller Hardware-in-the-Loop.


[84] 2501.11704

Ultra-High Reliability by Predictive Interference Management Using Extreme Value Theory

Ultra-reliable low-latency communications (URLLC) require innovative approaches to modeling channel and interference dynamics, extending beyond traditional average estimates to encompass entire statistical distributions, including rare and extreme events that challenge achieving ultra-reliability performance regions. In this paper, we propose a risk-sensitive approach based on extreme value theory (EVT) to predict the signal-to-interference-plus-noise ratio (SINR) for efficient resource allocation in URLLC systems. We employ EVT to estimate the statistics of rare and extreme interference values, and kernel density estimation (KDE) to model the distribution of non-extreme events. Using a mixture model, we develop an interference prediction algorithm based on quantile prediction, introducing a confidence level parameter to balance reliability and resource usage. While accounting for the risk sensitivity of interference estimates, the prediction outcome is then used for appropriate resource allocation of a URLLC transmission under link outage constraints. Simulation results demonstrate that the proposed method outperforms the state-of-the-art first-order discrete-time Markov chain (DTMC) approach by reducing outage rates up to 100-fold, achieving target outage probabilities as low as \(10^{-7}\). Simultaneously, it reduces radio resource usage by \(\sim 15\%\) compared to DTMC, while remaining only \(\sim 20\%\) above the optimal case with perfect interference knowledge, resulting in significantly higher prediction accuracy. Additionally, the method is sample-efficient, able to predict interference effectively with minimal training data.
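The tail-modeling step can be illustrated with a peaks-over-threshold fit: interference samples above a high threshold are modeled with a generalized Pareto distribution, and a high quantile is read off for risk-sensitive allocation. The sketch below uses synthetic data and scipy's genpareto; the KDE body model, the mixture weighting, and the DTMC baseline are not reproduced, and the threshold choice is an assumption.

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(4)
interference = rng.gamma(shape=2.0, scale=1.0, size=20000)   # synthetic samples

# peaks-over-threshold: fit a GPD to exceedances above a high empirical quantile
u = np.quantile(interference, 0.95)
exceedances = interference[interference > u] - u
c, loc, scale = genpareto.fit(exceedances, floc=0.0)

# predict an extreme interference level at tail probability 1e-5
p_exceed_u = np.mean(interference > u)
target_tail_prob = 1e-5
q = u + genpareto.ppf(1.0 - target_tail_prob / p_exceed_u, c, loc=0.0, scale=scale)
print(round(u, 2), round(q, 2))   # threshold vs. predicted extreme level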


[85] 2501.11708

Estimating Rural Path Loss with ITU-R P.1812-7 : Impact of Geospatial Inputs

Accurate radio wave propagation modeling is essential for effective spectrum management by regulators and network deployment by operators. This paper investigates the ITU-R P.1812-7 (P.1812) propagation model's reliance on geospatial inputs, particularly clutter information, to improve path loss estimation, with an emphasis on rural geographic regions. The research evaluates the impact of geospatial elevation and land cover datasets, including Global Forest Canopy Height (GFCH), European Space Agency WorldCover, and Natural Resources Canada LandCover, on P.1812 propagation model prediction accuracy. Results highlight the trade-offs between dataset resolution, geospatial data availability, and representative clutter height assignments. Simulations reveal that high-resolution data do not always yield better results and that global datasets such as the GFCH provide a robust alternative when high-resolution data are unavailable or out-of-date. This study provides a set of guidelines for geospatial dataset integration to enhance P.1812's rural path loss predictions.


[86] 2501.11734

MedicoSAM: Towards foundation models for medical image segmentation

Medical image segmentation is an important analysis task in clinical practice and research. Deep learning has massively advanced the field, but current approaches are mostly based on models trained for a specific task. Training such models or adapting them to a new condition is costly due to the need for (manually) labeled data. The emergence of vision foundation models, especially Segment Anything, offers a path to universal segmentation for medical images, overcoming these issues. Here, we study how to improve Segment Anything for medical images by comparing different finetuning strategies on a large and diverse dataset. We evaluate the finetuned models on a wide range of interactive and (automatic) semantic segmentation tasks. We find that the performance can be clearly improved for interactive segmentation. However, semantic segmentation does not benefit from pretraining on medical images. Our best model, MedicoSAM, is publicly available at https://github.com/computational-cell-analytics/medico-sam. We show that it is compatible with existing tools for data annotation and believe that it will be of great practical value.


[87] 2501.11737

Efficient Bearing Sensor Data Compression via an Asymmetrical Autoencoder with a Lifting Wavelet Transform Layer

Bearing data compression is vital to manage the large volumes of data generated during condition monitoring. In this paper, a novel asymmetrical autoencoder with a lifting wavelet transform (LWT) layer is developed to compress bearing sensor data. The encoder part of the network consists of a convolutional layer followed by a wavelet filterbank layer. Specifically, a dual-channel convolutional block with diverse convolutional kernel sizes and varying processing depths is integrated into the wavelet filterbank layer to enable comprehensive feature extraction from the wavelet domain. Additionally, the adaptive hard-thresholding nonlinearity is applied to remove redundant components while denoising the primary wavelet coefficients. On the decoder side, inverse LWT, along with multiple linear layers and activation functions, is employed to reconstruct the original signals. Furthermore, to enhance compression efficiency, a sparsity constraint is introduced during training to impose sparsity on the latent representations. The experimental results demonstrate that the proposed approach achieves superior data compression performance compared to state-of-the-art methods.
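A lifting wavelet step itself is a small split/predict/update computation; the Haar-style example below shows the forward and inverse lifting steps that such a layer builds on. It is a generic illustration of the transform, not the network's learned filterbank.

import numpy as np

def lwt_haar_forward(x):
    """One level of Haar-style lifting: split, predict, update."""
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    detail = odd - even              # predict: odd samples from even neighbours
    approx = even + 0.5 * detail     # update: preserve the running mean
    return approx, detail

def lwt_haar_inverse(approx, detail):
    """Invert the lifting steps exactly (lifting is perfectly invertible)."""
    even = approx - 0.5 * detail
    odd = detail + even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.arange(8, dtype=float)
a, d = lwt_haar_forward(x)
print(np.allclose(lwt_haar_inverse(a, d), x))   # True: lossless reconstruction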


[88] 2501.11755

A generalizable 3D framework and model for self-supervised learning in medical imaging

Current self-supervised learning methods for 3D medical imaging rely on simple pretext formulations and organ- or modality-specific datasets, limiting their generalizability and scalability. We present 3DINO, a cutting-edge SSL method adapted to 3D datasets, and use it to pretrain 3DINO-ViT: a general-purpose medical imaging model, on an exceptionally large, multimodal, and multi-organ dataset of ~100,000 3D medical imaging scans from over 10 organs. We validate 3DINO-ViT using extensive experiments on numerous medical imaging segmentation and classification tasks. Our results demonstrate that 3DINO-ViT generalizes across modalities and organs, including out-of-distribution tasks and datasets, outperforming state-of-the-art methods on the majority of evaluation metrics and labeled dataset sizes. Our 3DINO framework and 3DINO-ViT will be made available to enable research on 3D foundation models or further finetuning for a wide range of medical imaging applications.


[89] 2501.11763

Generative AI-enabled Blockage Prediction for Robust Dual-Band mmWave Communication

In mmWave wireless networks, signal blockages present a significant challenge due to their susceptibility to moving environmental obstructions. Recently, the availability of visual data has been leveraged to enhance blockage prediction accuracy in mmWave networks. In this work, we propose a Vision Transformer (ViT)-based approach for visual-aided blockage prediction that intelligently switches between mmWave and Sub-6 GHz frequencies to maximize network throughput and maintain reliable connectivity. Given the computational demands of processing visual data, we implement our solution within a hierarchical fog-cloud computing architecture, where fog nodes collaborate with cloud servers to efficiently manage computational tasks. This structure incorporates a generative AI-based compression technique that significantly reduces the volume of visual data transmitted between fog nodes and cloud centers. Our proposed method is tested with the real-world DeepSense 6G dataset, and according to the simulation results, it achieves a blockage prediction accuracy of 92.78% while reducing bandwidth usage by 70.31%.
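
The dual-band switching logic implied by the abstract can be sketched as a simple rule: stay on mmWave unless the vision-based predictor forecasts a blockage, in which case fall back to Sub-6 GHz. The threshold, rates, and function names below are hypothetical illustrations, not the paper's actual decision policy.

```python
from dataclasses import dataclass

@dataclass
class LinkRates:
    mmwave_gbps: float = 2.0   # hypothetical achievable rate when LoS is clear
    sub6_gbps: float = 0.3     # hypothetical fallback rate on Sub-6 GHz

def select_band(blockage_prob: float, threshold: float = 0.5,
                rates: LinkRates = LinkRates()) -> tuple[str, float]:
    """Dual-band switching rule: keep mmWave unless a blockage is predicted.

    `blockage_prob` would come from the ViT-based predictor run on the
    (generatively compressed) camera frames; threshold and rates are assumptions.
    """
    if blockage_prob >= threshold:
        return "sub-6GHz", rates.sub6_gbps
    return "mmWave", rates.mmwave_gbps

print(select_band(0.87))  # -> ('sub-6GHz', 0.3)
```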


[90] 2501.11780

Block Phase Tracking Reference Signal (PTRS) Allocation for DFT-s-OFDM

This study introduces a Block Phase Tracking Reference Signal (PTRS) allocation approach for Discrete Fourier Transform-spread-Orthogonal Frequency Division Multiplexing (DFT-s-OFDM) systems to enhance phase noise tracking and compensation. Our proposed block allocation methodology leverages the concepts of multiresolution time-frequency tiling for more effective sampling, thereby mitigating aliasing effects and improving phase noise resilience. A key contribution of our approach is a novel modulation and demodulation scheme, incorporating a dedicated DFT-s-OFDM symbol, a modulator branch for block PTRS generation, and a dedicated demodulator for accurate phase noise estimation and correction.


[91] 2501.11820

Comparative Analysis of Control Strategies for Position Regulation in DC Servo Motors

A servomotor is a closed-loop system designed for precise movement control, utilizing position feedback to achieve accurate final positions. Due to the ability to deliver higher power output and operate at enhanced speeds, DC servo motors are considered ideal for applications requiring precision and performance. This research aims to design, simulate, and compare various control strategies for precise position control in DC servo motors (DSM). The controllers evaluated in this study include proportional (P), proportional-integral (PI), proportional-integral-derivative (PID), state-feedback controllers (SFC), and state-feedback controllers augmented with integral action (SFCIA). The performance of these controllers was evaluated using MATLAB simulations, characterized by overshoot, settling time, steady-state error, rise time, and peak time. The results indicate that the state-feedback controller with integral action (SFCIA) surpasses other control strategies by achieving zero steady-state error, minimal overshoot, the shortest settling time, and optimized rise and peak times. These findings highlight the effectiveness of SFCIA for tasks requiring high levels of stability, precision, and dynamic performance.
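
As a concrete reference point for the comparison, the sketch below simulates closed-loop position control of a simple second-order DC motor model with a PID controller using Euler integration. The motor parameters and gains are hypothetical, and only the PID case is shown; the paper's study covers P, PI, PID, SFC, and SFCIA in MATLAB.

```python
import numpy as np

# Hypothetical DC servo parameters (second-order model: J*theta'' = K*u - b*theta')
J, b, K = 0.01, 0.1, 0.05          # inertia, viscous friction, torque constant
Kp, Ki, Kd = 12.0, 8.0, 0.6        # illustrative PID gains (not the paper's tuning)

dt, T = 1e-3, 3.0
theta, omega, integ, prev_err = 0.0, 0.0, 0.0, 0.0
ref = 1.0                          # desired position (rad)

for step in range(int(T / dt)):
    err = ref - theta
    integ += err * dt
    deriv = (err - prev_err) / dt
    u = Kp * err + Ki * integ + Kd * deriv   # PID control law
    prev_err = err
    # Euler integration of the motor dynamics.
    alpha = (K * u - b * omega) / J
    omega += alpha * dt
    theta += omega * dt

print(f"final position: {theta:.4f} rad, steady-state error: {ref - theta:.4e}")
```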


[92] 2501.11837

30+ Years of Source Separation Research: Achievements and Future Challenges

Source separation (SS) of acoustic signals is a research field that emerged in the mid-1990s and has flourished ever since. On the occasion of ICASSP's 50th anniversary, we review the major contributions and advancements in the past three decades in the speech, audio, and music SS research field. We will cover both single- and multi-channel SS approaches. We will also look back on key efforts to foster a culture of scientific evaluation in the research field, including challenges, performance metrics, and datasets. We will conclude by discussing current trends and future research directions.


[93] 2501.11844

Keypoint Detection Empowered Near-Field User Localization and Channel Reconstruction

In the near-field region of an extremely large-scale multiple-input multiple-output (XL MIMO) system, channel reconstruction is typically addressed through sparse parameter estimation based on compressed sensing (CS) algorithms after converting the received pilot signals into the transformed domain. However, the exhaustive search over the codebook in CS algorithms consumes significant computational resources and running time, particularly when the base station (BS) is equipped with a large number of antennas. To overcome this challenge, we propose a novel scheme to replace the high-cost exhaustive search procedure. We visualize the sparse channel matrix in the transformed domain as a channel image and design the channel keypoint detection network (CKNet) to locate the user and scatterers at high speed. Subsequently, we use a small-scale newtonized orthogonal matching pursuit (NOMP) based refiner to further enhance the precision. Our method is applicable to both the Cartesian domain and the Polar domain. Additionally, to deal with scenarios with a flexible number of propagation paths, we further design FlexibleCKNet to predict both locations and confidence scores. Our experimental results validate that the CKNet and FlexibleCKNet-empowered channel reconstruction scheme can significantly reduce the computational complexity while maintaining high accuracy in both user and scatterer localization and channel reconstruction tasks.


[94] 2501.11854

WaveNet-SF: A Hybrid Network for Retinal Disease Detection Based on Wavelet Transform in the Spatial-Frequency Domain

Retinal diseases are a leading cause of vision impairment and blindness, with timely diagnosis being critical for effective treatment. Optical Coherence Tomography (OCT) has become a standard imaging modality for retinal disease diagnosis, but OCT images often suffer from issues such as speckle noise, complex lesion shapes, and varying lesion sizes, making interpretation challenging. In this paper, we propose a novel framework, WaveNet-SF, to enhance retinal disease detection by integrating spatial-domain and frequency-domain learning. The framework utilizes wavelet transforms to decompose OCT images into low- and high-frequency components, enabling the model to extract both global structural features and fine-grained details. To improve lesion detection, we introduce a multi-scale wavelet spatial attention (MSW-SA) module, which enhances the model's focus on regions of interest at multiple scales. Additionally, a high-frequency feature compensation block (HFFC) is incorporated to recover edge information lost during wavelet decomposition, suppress noise, and preserve fine details crucial for lesion detection. Our approach achieves state-of-the-art (SOTA) classification accuracies of 97.82% and 99.58% on the OCT-C8 and OCT2017 datasets, respectively, surpassing existing methods. These results demonstrate the efficacy of WaveNet-SF in addressing the challenges of OCT image analysis and its potential as a powerful tool for retinal disease diagnosis.
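
The input-side wavelet split described above can be sketched in a few lines with PyWavelets: a single-level 2D DWT separates a scan into a low-frequency sub-band (global structure) and three high-frequency sub-bands (edges and fine detail). The attention (MSW-SA) and compensation (HFFC) modules are not reproduced here, and the synthetic image and wavelet choice are illustrative assumptions.

```python
import numpy as np
import pywt

# Synthetic stand-in for a grayscale OCT B-scan.
img = np.random.rand(256, 256).astype(np.float32)

# Single-level 2D DWT: LL carries global structure, (LH, HL, HH) carry fine detail/edges.
LL, (LH, HL, HH) = pywt.dwt2(img, "haar")

low_freq_branch_input = LL                       # would feed the low-frequency path
high_freq_branch_input = np.stack([LH, HL, HH])  # would feed the high-frequency path

print(low_freq_branch_input.shape, high_freq_branch_input.shape)  # (128, 128) (3, 128, 128)
```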


[95] 2501.11869

Saturation in Snapshot Compressive Imaging

Snapshot Compressive Imaging (SCI) maps three-dimensional (3D) data cubes, such as videos or hyperspectral images, into two-dimensional (2D) measurements via optical modulation, enabling efficient data acquisition and reconstruction. Recent advances have shown the potential of mask optimization to enhance SCI performance, but most studies overlook nonlinear distortions caused by saturation in practical systems. Saturation occurs when high-intensity measurements exceed the sensor's dynamic range, leading to information loss that standard reconstruction algorithms cannot fully recover. This paper addresses the challenge of optimizing binary masks in SCI under saturation. We theoretically characterize the performance of compression-based SCI recovery in the presence of saturation and leverage these insights to optimize masks for such conditions. Our analysis reveals trade-offs between mask statistics and reconstruction quality in saturated systems. Experimental results using a Plug-and-Play (PnP) style network validate the theory, demonstrating improved recovery performance and robustness to saturation with our optimized binary masks.
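
A minimal sketch of the saturated forward model helps fix ideas: binary-masked frames are summed into a single 2D snapshot, and values above the sensor's dynamic range clip. The mask "on" probability and saturation level below are hypothetical; the paper's contribution is optimizing the masks under this nonlinearity, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
B, H, W = 8, 64, 64                      # frames in the data cube and frame size
frames = rng.random((B, H, W))           # synthetic video cube
masks = rng.random((B, H, W)) < 0.4      # binary masks; 0.4 is an assumed "on" probability
sat_level = 2.5                          # hypothetical sensor dynamic-range limit

# Ideal SCI measurement: modulate each frame by its mask and sum over frames.
y_ideal = np.sum(masks * frames, axis=0)

# Practical measurement: values beyond the dynamic range saturate (clip), which is
# the nonlinearity the mask optimization must account for.
y_saturated = np.minimum(y_ideal, sat_level)

print(f"saturated pixels: {(y_ideal > sat_level).mean():.1%}")
```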


[96] 2501.11974

Wideband Pulse Generation for Underwater Applications Using Parametric Array

We investigated wideband pulse generation for underwater acoustic applications using a parametric array. We fabricated a transducer consisting of a 3 mm thick 75 mm-by-75 mm square-shaped PZT ceramic plate, which is matched to water media at the radiating face and terminated by a very low impedance at the back. All measurements were made in a large test tank. We transmitted square-root amplitude modulated pulses centered around 855 kHz primary frequency. We showed that phase-sensitive generation of in-phase and out-of-phase bursts suitable for coded transmission using a parametric array is possible. We generated very short duration bursts, as short as half-cycle, at a 10-80 kHz difference frequency range. The definition of the bursts is excellent, e.g., with a normalized cross-correlation of 0.92 with an ideal 2-cycle square burst, for both in-phase and out-of-phase pulses.


[97] 2501.11980

A note on the sample complexity of multi-target detection

This work studies the sample complexity of the multi-target detection (MTD) problem, which involves recovering a signal from a noisy measurement containing multiple instances of a target signal in unknown locations, each transformed by a random group element. This problem is primarily motivated by single-particle cryo-electron microscopy (cryo-EM), a groundbreaking technology for determining the structures of biological molecules. We establish upper and lower bounds for various MTD models in the high-noise regime as a function of the group, the distribution over the group, and the arrangement of signal occurrences within the measurement. The lower bounds are established through a reduction to the related multi-reference alignment problem, while the upper bounds are derived from explicit recovery algorithms utilizing autocorrelation analysis. These findings provide fundamental insights into estimation limits in noisy environments and lay the groundwork for extending this analysis to more complex applications, such as cryo-EM.


[98] 2501.11986

Diffeomorphic ICP Registration for Single and Multiple Point Sets

We propose a generalization of the iterative closest point (ICP) algorithm for point set registration, in which the registration functions are non-rigid and follow the large deformation diffeomorphic metric mapping (LDDMM) framework. The algorithm is formulated as a well-posed probabilistic inference, and requires solving a novel variation of LDDMM landmark registration with an additional term involving the Jacobian of the mapping. The algorithm can easily be generalized to construct a diffeomorphic, statistical atlas of multiple point sets. The method is successfully validated on a first set of synthetic data.


[99] 2501.11999

Rate-Aware Learned Speech Compression

The rapid rise of real-time communication and large language models has significantly increased the importance of speech compression. Deep learning-based neural speech codecs have outperformed traditional signal-level speech codecs in terms of rate-distortion (RD) performance. Typically, these neural codecs employ an encoder-quantizer-decoder architecture, where audio is first converted into latent code feature representations and then into discrete tokens. However, this architecture exhibits insufficient RD performance due to two main drawbacks: (1) the inadequate performance of the quantizer, challenging training processes, and issues such as codebook collapse; (2) the limited representational capacity of the encoder and decoder, making it difficult to meet feature representation requirements across various bitrates. In this paper, we propose a rate-aware learned speech compression scheme that replaces the quantizer with an advanced channel-wise entropy model to improve RD performance, simplify training, and avoid codebook collapse. We employ multi-scale convolution and linear attention mixture blocks to enhance the representational capacity and flexibility of the encoder and decoder. Experimental results demonstrate that the proposed method achieves state-of-the-art RD performance, with an average BD-Rate saving of 53.51% and gains of 0.26 BD-VisQol and 0.44 BD-PESQ.


[100] 2501.12004

Speech Enhancement with Overlapped-Frame Information Fusion and Causal Self-Attention

For time-frequency (TF) domain speech enhancement (SE) methods, the overlap-and-add operation in the inverse TF transformation inevitably leads to an algorithmic delay equal to the window size. However, typical causal SE systems fail to utilize the future speech information within this inherent delay, thereby limiting SE performance. In this paper, we propose an overlapped-frame information fusion scheme. At each frame index, we construct several pseudo overlapped-frames, fuse them with the original speech frame, and then send the fused results to the SE model. Additionally, we introduce a causal time-frequency-channel attention (TFCA) block to boost the representation capability of the neural network. This block parallelly processes the intermediate feature maps through self-attention-based operations in the time, frequency, and channel dimensions. Experiments demonstrate the superiority of these improvements, and the proposed SE system outperforms the current advanced methods.


[101] 2501.12092

Data-Aided Regularization of Direct-Estimate Combiner in Distributed MIMO Systems

This paper explores the data-aided regularization of the direct-estimate combiner in the uplink of a distributed multiple-input multiple-output system. The network-wide combiner can be computed directly from the pilot signal received at each access point, eliminating the need for explicit channel estimation. However, the sample covariance matrix of the received pilot signal that is used in its computation may significantly deviate from the actual covariance matrix when the number of pilot symbols is limited. To address this, we apply a regularization to the sample covariance matrix using a shrinkage coefficient based on the received data signal. Initially, the shrinkage coefficient is determined by minimizing the difference between the sample covariance matrices obtained from the received pilot and data signals. Given the limitations of this approach in interference-limited scenarios, the shrinkage coefficient is iteratively optimized using the sample mean squared error of the hard-decision symbols, which is more closely related to the actual system's performance, e.g., the symbol error rate (SER). Numerical results demonstrate that the proposed regularization of the direct-estimate combiner significantly enhances the SER, particularly when the number of pilot symbols is limited.
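
The shrinkage idea can be sketched as blending a noisy pilot-based sample covariance toward a simple target, with the coefficient chosen using the data-based sample covariance. The sketch below is only in the spirit of the paper's data-aided regularization: the paper's combiner, its exact shrinkage criterion, and the iterative refinement from hard-decision symbol errors are not reproduced, and all dimensions and targets here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Np, Nd = 8, 16, 200            # antennas, pilot symbols, data symbols (assumed sizes)

# Synthetic stand-ins for the received pilot and data blocks.
Yp = (rng.standard_normal((N, Np)) + 1j * rng.standard_normal((N, Np))) / np.sqrt(2)
Yd = (rng.standard_normal((N, Nd)) + 1j * rng.standard_normal((N, Nd))) / np.sqrt(2)

R_pilot = Yp @ Yp.conj().T / Np   # noisy sample covariance (few pilots)
R_data = Yd @ Yd.conj().T / Nd    # better-conditioned sample covariance (many data symbols)

# Shrink the pilot covariance toward a scaled identity; pick the coefficient that
# brings it closest (Frobenius norm) to the data-based estimate.
target = np.trace(R_data).real / N * np.eye(N)
alphas = np.linspace(0.0, 1.0, 101)
errors = [np.linalg.norm((1 - a) * R_pilot + a * target - R_data, "fro") for a in alphas]
alpha = alphas[int(np.argmin(errors))]
R_reg = (1 - alpha) * R_pilot + alpha * target

print(f"selected shrinkage coefficient: {alpha:.2f}")
```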


[102] 2501.12094

A Comprehensive Metric for Resilience Evaluation of Power Distribution Systems under Cyber Attacks

Power distribution systems (PDS) serve as the backbone of our modern society, ensuring electricity reaches homes, businesses, and critical infrastructure. However, the increasing digitization and interconnectivity of these systems have exposed them to cyber threats. This study presents a comprehensive approach to evaluate and enhance the resilience of PDS under cyber attacks using the Common Vulnerability Scoring System (CVSS) and complex network parameters. By systematically assessing vulnerabilities and computing resilience once critical CVSS thresholds are reached, this work identifies key resilience metrics including the critical loads service requirements. The proposed methodology improves system resilience through strategic tie-line switching, which is validated on the modified IEEE 33-bus system. Four case studies are conducted, illustrating the performance of the proposed methodology under various cyber attack scenarios. The results demonstrate the effectiveness of the approach in quantifying and enhancing resilience, offering a valuable tool for PDS operators to mitigate risks and ensure continuous service delivery to critical loads during the exploitation of cyber threats.


[103] 2501.12156

Characterization of Invariance, Periodic Solutions and Optimization of Dynamic Financial Networks

Cascading failures, such as bankruptcies and defaults, pose a serious threat for the resilience of the global financial system. Indeed, because of the complex investment and cross-holding relations within the system, failures can occur as a result of the propagation of a financial collapse from one organization to another. While this problem has been studied in depth from a static angle, namely, when the system is at an equilibrium, we take a different perspective and study the corresponding dynamical system. The contribution of this paper is threefold. First, we carry out a systematic analysis of the regions of attraction and invariance of the system orthants, defined by the positive and negative values of the organizations' equity. Second, we investigate periodic solutions and show through a counterexample that there could exist periodic solutions of period greater than 2. Finally, we study the problem of finding the smallest cash injection that would bring the system to the maximal invariant region of the positive orthant.


[104] 2501.12209

Machine Learning Based Probe Skew Correction for High-frequency BH Loop Measurements

Experimental characterization of magnetic components has grown increasingly important for understanding and modelling their behaviours in high-frequency PWM converters. The BH loop measurement is the only available electrical method for separating the core loss, but it is susceptible to probe phase skew. As an alternative to the regular hardware-based de-skew approaches, this work proposes a novel machine-learning-based method to identify and correct the probe skew, which builds on the newly discovered correlation between the skew and the shape/trajectory of the measured BH loop. A special technique is proposed to artificially generate skewed images from measured waveforms as augmented training sets. A machine learning pipeline is developed with a Convolutional Neural Network (CNN) to treat the problem as an image-based prediction task. The trained model demonstrates high accuracy in identifying the skew value from a BH loop unseen by the model, which enables compensation of the skew to yield the corrected core loss value and BH loop.
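
A minimal sketch of the data-augmentation step, under stated assumptions: a probe skew can be emulated in software by delaying one measured waveform relative to the other before forming the BH trajectory. The waveforms, core constants, and skew grid below are hypothetical; the paper renders the resulting loops as images for the CNN, which is omitted here.

```python
import numpy as np

fs = 100e6                        # sampling rate (assumed)
f0 = 1e6                          # excitation frequency (assumed)
t = np.arange(0, 5 / f0, 1 / fs)

# Synthetic stand-ins for the measured excitation current and induced voltage.
i_meas = 0.5 * np.sin(2 * np.pi * f0 * t)
v_meas = 2.0 * np.cos(2 * np.pi * f0 * t)

# Hypothetical core/winding constants used to map (i, v) -> (H, B).
N_turns, l_e, A_e = 10, 0.05, 2e-5

def bh_loop(i, v, skew_samples=0):
    """Form an H-B trajectory; a nonzero skew delays the voltage channel,
    emulating the probe phase skew the CNN learns to identify."""
    v_skewed = np.roll(v, skew_samples)
    H = N_turns * i / l_e                          # field from current
    B = np.cumsum(v_skewed) / fs / (N_turns * A_e) # flux density from integrated voltage
    return H, B - B.mean()

# Augmented training set: the same waveforms rendered with different skews.
augmented = {ns: bh_loop(i_meas, v_meas, ns) for ns in range(-5, 6)}
print(f"generated {len(augmented)} skewed BH loops")
```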


[105] 2501.12244

Zero-shot Bias Correction: Efficient MR Image Inhomogeneity Reduction Without Any Data

In recent years, deep neural networks for image inhomogeneity reduction have shown promising results. However, current (un)supervised methods require preparing a training dataset, and data collection is expensive and laborious. In this work, we demonstrate a novel zero-shot deep neural network that requires no data for pre-training and no dedicated assumptions about the bias field. The designed lightweight CNN enables efficient zero-shot adaptation for correcting bias-corrupted images. Our method mitigates bias corruption through iterative homogeneity refinement, which ensures the problem can be solved more easily, with stable convergence of the zero-shot optimization. Extensive comparisons on different datasets show that the proposed method outperforms current data-free N4 methods in both efficiency and accuracy.


[106] 2501.12245

Quality Enhancement of Radiographic X-ray Images by Interpretable Mapping

X-ray imaging is the most widely used medical imaging modality. However, in common practice, inconsistency in the initial presentation of X-ray images is a frequent complaint by radiologists. Differences in patient position, patient habitus, and scanning protocol can lead to differences in image presentation, e.g., in brightness and contrast, globally or regionally. To compensate for this, clinical experts perform additional work to adjust the images to the desired presentation, which can be time-consuming. Existing deep-learning-based end-to-end solutions can automatically correct images with promising performance. Nevertheless, these methods are hard to interpret and difficult for clinical experts to understand. In this manuscript, a novel interpretable deep-learning mapping method is proposed, which automatically enhances image brightness and contrast both globally and locally. Because the model is inspired by the workflow of brightness and contrast manipulation, it can provide interpretable pixel maps that explain the motivation behind the enhancement. Experiments on clinical datasets show that the proposed method provides consistent brightness and contrast correction on X-ray images with an accuracy of 24.75 dB PSNR and 0.8431 SSIM.


[107] 2501.12288

Microgrid Operation Control with State-of-Charge-Dependent Storage Power Constraints

The microgrid concept offers high flexibility and resilience due to the possibility of switching between grid-connected and stand-alone operation. This renders microgrids an auspicious solution for rural areas and critical infrastructure. In stand-alone or islanded mode, the main objective is cost minimization while ensuring safe and reliable operation. Optimal operation schemes for microgrids usually assume fixed power limits for energy storage units. This, however, is not sufficient for lithium-ion energy storage systems, which often come with dynamic power limits that depend on the state of charge. These limits are especially prominent when the state of charge is close to its boundaries. In this paper, dynamic constraints for energy storage units are modelled using convex polytopes and fitted to experimental data acquired from an 11.6 kWh lithium-ion energy storage system. The polytopic constraints are integrated into a model predictive control scheme designed for a stand-alone microgrid composed of a fuel cell, a photovoltaic generator, and a lithium-ion energy storage system. To evaluate the advantages, a case study with two configurations is performed. The model predictive controller without polytopic constraints led to constraint violations in 11.77% of the simulation time steps, with a maximum deviation of 118% above the power limits. The configuration with polytopic constraints, by contrast, led to no violations over the entire simulation horizon.
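
State-of-charge-dependent power limits expressed as a convex polytope can be written as linear inequalities A[soc, p]^T <= b and checked pointwise, as in the sketch below. The coefficients are invented for illustration (the paper fits its polytopes to measurements from the 11.6 kWh system), and the surrounding MPC is not reproduced.

```python
import numpy as np

# Hypothetical polytope in (soc, p) space, soc in [0, 1], p in kW (p > 0 = discharge):
#   p <=  5              (nominal discharge limit)
#   p <=  25 * soc       (discharge limit tightens near empty)
#  -p <=  5              (nominal charge limit)
#  -p <=  25 * (1 - soc) (charge limit tightens near full)
A = np.array([[  0.0,  1.0],
              [-25.0,  1.0],
              [  0.0, -1.0],
              [ 25.0, -1.0]])
b = np.array([5.0, 0.0, 5.0, 25.0])

def power_feasible(soc: float, p_kw: float, tol: float = 1e-9) -> bool:
    """Check whether a requested storage power respects the SoC-dependent polytope."""
    x = np.array([soc, p_kw])
    return bool(np.all(A @ x <= b + tol))

print(power_feasible(0.9, 4.0))   # True: enough energy left to discharge at 4 kW
print(power_feasible(0.1, 4.0))   # False: a near-empty pack cannot sustain 4 kW discharge
```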


[108] 2501.12311

Heuristic Deep Reinforcement Learning for Phase Shift Optimization in RIS-assisted Secure Satellite Communication Systems with RSMA

This paper presents a novel heuristic deep reinforcement learning (HDRL) framework designed to optimize reconfigurable intelligent surface (RIS) phase shifts in secure satellite communication systems utilizing rate splitting multiple access (RSMA). The proposed HDRL approach addresses the challenges of large action spaces inherent in deep reinforcement learning by integrating heuristic algorithms, thus improving exploration efficiency and leading to faster convergence toward optimal solutions. We validate the effectiveness of HDRL through comprehensive simulations, demonstrating its superiority over traditional algorithms, including random phase shift, greedy algorithm, exhaustive search, and Deep Q-Network (DQN), in terms of secure sum rate and computational efficiency. Additionally, we compare the performance of RSMA with non-orthogonal multiple access (NOMA), highlighting that RSMA, particularly when implemented with an increased number of RIS elements, significantly enhances secure communication performance. The results indicate that HDRL is a powerful tool for improving the security and reliability of RSMA satellite communication systems, offering a practical balance between performance and computational demands.


[109] 2501.12323

Deep Learning Based Segmentation of Blood Vessels from H&E Stained Oesophageal Adenocarcinoma Whole-Slide Images

Blood vessels (BVs) play a critical role in the Tumor Micro-Environment (TME), potentially influencing cancer progression and treatment response. However, manually quantifying BVs in Hematoxylin and Eosin (H&E) stained images is challenging and labor-intensive due to their heterogeneous appearance. We propose a novel approach of constructing guiding maps to improve the performance of state-of-the-art segmentation models for BV segmentation; the guiding maps encourage the models to learn representative features of BVs. This is particularly beneficial for computational pathology, where labeled training data are often limited and large models are prone to overfitting. We present quantitative and qualitative results that demonstrate the efficacy of our approach in improving segmentation accuracy. In future work, we plan to validate this method for segmenting BVs across various tissue types and to investigate the role of cellular structures in relation to BVs in the TME.


[110] 2501.12331

Cinepro: Robust Training of Foundation Models for Cancer Detection in Prostate Ultrasound Cineloops

Prostate cancer (PCa) detection using deep learning (DL) models has shown potential for enhancing real-time guidance during biopsies. However, prostate ultrasound images lack pixel-level cancer annotations, introducing label noise. Current approaches often focus on limited regions of interest (ROIs), disregarding anatomical context necessary for accurate diagnosis. Foundation models can overcome this limitation by analyzing entire images to capture global spatial relationships; however, they still encounter challenges stemming from the weak labels associated with coarse pathology annotations in ultrasound data. We introduce Cinepro, a novel framework that strengthens foundation models' ability to localize PCa in ultrasound cineloops. Cinepro adapts robust training by integrating the proportion of cancer tissue reported by pathology in a biopsy core into its loss function to address label noise, providing a more nuanced supervision. Additionally, it leverages temporal data across multiple frames to apply robust augmentations, enhancing the model's ability to learn stable cancer-related features. Cinepro demonstrates superior performance on a multi-center prostate ultrasound dataset, achieving an AUROC of 77.1% and a balanced accuracy of 83.8%, surpassing current benchmarks. These findings underscore Cinepro's promise in advancing foundation models for weakly labeled ultrasound data.


[111] 2501.12353

Sum Rate Enhancement using Machine Learning for Semi-Self Sensing Hybrid RIS-Enabled ISAC in THz Bands

This paper proposes a novel semi-self sensing hybrid reconfigurable intelligent surface (SS-HRIS) in terahertz (THz) bands, where the RIS is equipped with reflecting elements divided between passive and active elements in addition to sensing elements. SS-HRIS along with integrated sensing and communications (ISAC) can help to mitigate the multipath attenuation that is abundant in THz bands. In our proposed scheme, sensors are configured at the SS-HRIS to receive the radar echo signal from a target. A joint base station (BS) beamforming and HRIS precoding matrix optimization problem is proposed to maximize the sum rate of communication users while maintaining satisfactory sensing performance measured by the Cramer-Rao bound (CRB) for estimating the angles of arrival (AoA) of the echo signal in the presence of thermal noise at the target. The CRB expression is first derived and the sum rate maximization problem is formulated subject to communication and sensing performance constraints. To solve the complex non-convex optimization problem, a deep deterministic policy gradient (DDPG)-based deep reinforcement learning (DRL) algorithm is proposed, where the reward function, the action space and the state space are modeled. Simulation results show that the proposed DDPG-based DRL algorithm converges well and achieves better performance than several baselines, such as the soft actor-critic (SAC), proximal policy optimization (PPO), greedy algorithm and random BS beamforming and HRIS precoding matrix schemes. Moreover, it demonstrates that adopting HRIS significantly enhances the achievable sum rate compared to passive RIS and random BS beamforming and HRIS precoding matrix schemes.


[112] 2501.12362

ARM-IRL: Adaptive Resilience Metric Quantification Using Inverse Reinforcement Learning

Resilience of safety-critical systems is gaining importance, particularly with the increasing number of cyber and physical threats. Cyber-physical threats are becoming increasingly prevalent, as digital systems are ubiquitous in critical infrastructure. The challenge with determining the resilience of cyber-physical systems is identifying a set of resilience metrics that can adapt to the changing states of the system. A static resilience metric can lead to an inaccurate estimation of system state, and can result in unintended consequences against cyber threats. In this work, we propose a data-driven method for adaptive resilience metric learning. The primary goal is to learn a single resilience metric by formulating an inverse reinforcement learning problem that learns a reward or objective from a set of control actions from an expert. It learns the structure or parameters of the reward function based on information provided by expert demonstrations. Most prior work has considered static weights or theories from fuzzy logic to formulate a single resilience metric. Instead, this work learns the resilience metric, represented as a reward function, using adversarial inverse reinforcement learning, determining the optimal policy by training the generator and discriminator in parallel. We evaluate our proposed technique in scenarios such as optimal communication network rerouting, power distribution network reconfiguration, and a combined cyber-physical restoration of critical load using the IEEE 123-bus system.


[113] 2412.11236

Logarithmic Positional Partition Interval Encoding

One requirement of maintaining digital information is storage. With the latest advances in the digital world, emerging media types require ever more storage space, and in many cases protocols must support more types of information at the same time. Compression algorithms have therefore been integrated to facilitate the transfer of larger data volumes. Numerical representations can be construed as embodiments of information, and this association can be inverted so that a compact numeral signifies an elongated series of digits. In this work, a novel mathematical paradigm is introduced: a methodology reliant on iterative logarithmic transformations, finely tuned to numeric sequences. Through this approach, an intricate interplay of polymorphic numeric manipulations is conducted. By applying repeated logarithmic operations, the data are condensed into a minuscule representation, surpassing the ZIP compression method by approximately a factor of thirteen. This extreme compaction, achieved through iterative reduction of expansive integers until they manifest as single-digit entities, confers a novel sense of informational embodiment: instead of relegating data to classical discrete encodings, the method transforms them into a quasi-continuous, logarithmic form. The approach shows that morphing data into deeply compressed numerical substrata beyond conventional boundaries is feasible, and that numeric data can be recalibrated into ephemeral sequences of logarithmic impressions. It is not merely a matter of reducing digits, but of reinterpreting data from a resolute numeric vantage.


[114] 2501.10376

Energy-Constrained Information Storage on Memristive Devices in the Presence of Resistive Drift

In this paper, we examine the problem of information storage on memristors affected by resistive drift noise under energy constraints. We introduce a novel, fundamental trade-off between the information lifetime of memristive states and the energy that must be expended to bring the device into a particular state. We then treat the storage problem as one of communication over a noisy, energy-constrained channel, and propose a joint source-channel coding (JSCC) approach to storing images in an analogue fashion. To design an encoding scheme for natural images and to model the memristive channel, we make use of data-driven techniques from the field of deep learning for communications, namely deep joint source-channel coding (DeepJSCC), employing a generative model of resistive drift as a computationally tractable differentiable channel model for end-to-end optimisation. We introduce a modified version of generalised divisive normalisation (GDN), a biologically inspired form of normalisation, that we call conditional GDN (cGDN), allowing for conditioning on continuous channel characteristics, including the initial resistive state and the delay between storage and reading. Our results show that the delay-conditioned network is able to learn an energy-aware coding scheme that achieves a higher and more balanced reconstruction quality across a range of storage delays.


[115] 2501.10392

Ion Transmitter for Molecular Communication

Molecular communication (MC) is an emerging paradigm that takes inspiration from biological processes, enabling communication at the nanoscale and facilitating the development of the Internet of Bio-Nano Things (IoBNT). Traditional models of MC often rely on idealized assumptions that overlook practical challenges related to noise and signal behavior. This paper proposes and evaluates the first physical MC ion transmitter (ITX) using an ion exchange membrane. The circuit network model is used to simulate ion transport and analyze both transient and steady-state behavior. This analysis includes the effects of noise sources such as thermal and shot noise on signal integrity and SNR. The main contributions of this paper are to demonstrate how a practical MC ITX can produce a realistic waveform and to highlight future research challenges associated with a physical membrane-based ITX.


[116] 2501.10393

One-Time Signature Based on Pseudorandom Number Generator

With the advancement of quantum computing technologies, recent years have seen increasing efforts to identify cryptographic methods resistant to quantum attacks and to establish post-quantum cryptography (PQC) approaches. Among these, hash-based digital signature algorithms (DSAs) are a notable category of PQC. Hash functions are not only utilized in digital signatures but are also widely applied in pseudorandom number generators (PRNGs). Building on the foundation of hash-based DSAs, this study proposes a modified approach that introduces a DSA based on PRNGs, suitable for one-time signature (OTS) applications. The study explores the security of the proposed PRNG-based OTS algorithm and validates its feasibility through experiments comparing various parameter configurations. These experiments examine key length, signature length, key generation time, signature generation time, and signature verification time under different parameter settings.
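
To make the hash/PRNG-based OTS idea concrete, here is a minimal sketch of a classic Lamport-style one-time signature in which the secret key is expanded deterministically from a PRNG seed. This illustrates the general construction only; it is not the paper's specific scheme, and the parameter sizes are illustrative.

```python
import hashlib
import secrets

H = lambda data: hashlib.sha256(data).digest()
N_BITS = 256  # sign one 256-bit message digest

def keygen(seed: bytes):
    """Expand a PRNG seed into 2*N_BITS secret values; the public key is their hashes."""
    sk = [[H(seed + bytes([b]) + i.to_bytes(2, "big")) for i in range(N_BITS)]
          for b in (0, 1)]
    pk = [[H(x) for x in row] for row in sk]
    return sk, pk

def sign(sk, message: bytes):
    digest = H(message)
    bits = [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N_BITS)]
    return [sk[bit][i] for i, bit in enumerate(bits)]  # reveal one secret per digest bit

def verify(pk, message: bytes, signature) -> bool:
    digest = H(message)
    bits = [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N_BITS)]
    return all(H(sig) == pk[bit][i] for i, (bit, sig) in enumerate(zip(bits, signature)))

sk, pk = keygen(secrets.token_bytes(32))
sig = sign(sk, b"post-quantum hello")
print(verify(pk, b"post-quantum hello", sig))   # True
print(verify(pk, b"tampered message", sig))     # False (with overwhelming probability)
```

Because revealing a signature discloses half of the secret values, each key pair must be used only once, which is the defining constraint of any OTS.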


[117] 2501.10413

Cooperative Search and Track of Rogue Drones using Multiagent Reinforcement Learning

This work considers the problem of intercepting rogue drones targeting sensitive critical infrastructure facilities. While current interception technologies focus mainly on the jamming/spoofing tasks, the challenges of effectively locating and tracking rogue drones have not received adequate attention. Solving this problem and integrating with recently proposed interception techniques will enable a holistic system that can reliably detect, track, and neutralize rogue drones. Specifically, this work considers a team of pursuer UAVs that can search, detect, and track multiple rogue drones over a sensitive facility. The joint search and track problem is addressed through a novel multiagent reinforcement learning scheme to optimize the agent mobility control actions that maximize the number of rogue drones detected and tracked. The performance of the proposed system is investigated under realistic settings through extensive simulation experiments with a varying number of agents, demonstrating both its performance and scalability.


[118] 2501.10429

Recent Advances of 6G Ultra-Massive MIMO Technologies in Spatial and Beam Domains

To explore the full potential of ultra-massive multiple-input multiple-output (MIMO) communication systems, it is fundamental to understand new ultra-massive MIMO channel characteristics and establish pervasive channel models. On this basis, large dimensional spatial-temporal transmission and random access technologies need to be investigated and evaluated for better practical implementation. Firstly, this paper reviews recent advances of ultra-massive MIMO technologies in the traditional spatial domain, including wireless channel characterization and modeling, channel estimation, spatial multiplexing, and precoding. Secondly, considering the dramatic increase of base station (BS) antennas and access users in ultra-massive MIMO systems, the high-dimensional complexity and computational burden confronting these ultra-massive MIMO technologies are indicated. To provide efficient and systematic solutions, the emerging tendency to transform related technologies from the traditional spatial domain to the beam domain is introduced. The benefits of large sparsity, reduced energy consumption, and improved usage of radio frequency (RF) chains in the beam domain channel are elaborated. Finally, future challenges of ultra-massive MIMO communication systems are discussed.


[119] 2501.10430

Prediction Model of Aqua Fisheries Using IoT Devices

Aquaculture involves cultivating marine and freshwater organisms, with real-time monitoring of aquatic parameters being crucial in fish farming. This thesis proposes an IoT-based framework using sensors and Arduino for efficient monitoring and control of water quality. Different sensors, including pH, temperature, and turbidity, are placed in cultivating pond water, and each of them is connected to a common microcontroller board built on an Arduino Uno. The sensors read the data from the water and store it as a CSV file in an IoT cloud named ThingSpeak through the Arduino microcontroller. In the experimental part, we collected data from 5 ponds with various sizes and environments. After getting the real-time data, we compared these with the standard reference values. As a result, we can decide which ponds are satisfactory for cultivating fish and which are not. After that, we labeled the data with 11 fish categories including Katla, sing, prawn, rui, koi, pangas, tilapia, silvercarp, karpio, magur, and shrimp. In addition, the data were analyzed using 10 machine learning (ML) algorithms: J48, Random Forest, K-NN, K*, LMT, REPTree, JRIP, PART, Decision Table, and LogitBoost. The evaluation showed that, among the 5 ponds, only three were suitable for fish farming; these 3 ponds satisfied the standard reference values of pH (6.5-8.5), temperature (16-24) °C, turbidity (below 10) NTU, conductivity (970-1825) µS/cm, and depth (1-4) m. Among the state-of-the-art machine learning algorithms, Random Forest achieved the best performance, with an accuracy of 94.42%, a kappa statistic of 93.5%, and an average TP rate of 94.4%. In addition, we calculated the BOD, COD, and DO for one scenario. This study includes details of the proposed IoT system's prototype hardware.
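
The rule-based suitability check against the reference ranges quoted above can be sketched in a few lines. The sensor field names and sample readings are illustrative, and the ML classification step on the labeled data is not reproduced here.

```python
# Reference ranges quoted in the abstract: pH 6.5-8.5, temperature 16-24 C,
# turbidity below 10 NTU, conductivity 970-1825 uS/cm, depth 1-4 m.
REFERENCE_RANGES = {
    "ph": (6.5, 8.5),
    "temperature_c": (16.0, 24.0),
    "turbidity_ntu": (0.0, 10.0),
    "conductivity_us_cm": (970.0, 1825.0),
    "depth_m": (1.0, 4.0),
}

def pond_suitable(readings: dict) -> bool:
    """A pond is suitable only if every monitored parameter lies within its reference range."""
    return all(lo <= readings[name] <= hi for name, (lo, hi) in REFERENCE_RANGES.items())

# Illustrative readings, as they might arrive from a ThingSpeak CSV export.
pond_a = {"ph": 7.2, "temperature_c": 21.0, "turbidity_ntu": 6.0,
          "conductivity_us_cm": 1200.0, "depth_m": 2.5}
pond_b = {"ph": 9.1, "temperature_c": 27.0, "turbidity_ntu": 14.0,
          "conductivity_us_cm": 800.0, "depth_m": 0.8}

print(pond_suitable(pond_a), pond_suitable(pond_b))  # True False
```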


[120] 2501.10441

A Review of Detection, Evolution, and Data Reconstruction Strategies for False Data Injection Attacks in Power Cyber-Physical Systems

The integration of information and physical systems in modern power grids has heightened vulnerabilities to False Data Injection Attacks (FDIAs), threatening the secure operation of power cyber-physical systems (CPS). This paper reviews FDIA detection, evolution, and data reconstruction strategies, highlighting cross-domain coordination, multi-temporal evolution, and stealth characteristics. Challenges in existing detection methods, including poor interpretability and data imbalance, are discussed, alongside advanced state-aware and action-control data reconstruction techniques. Key issues, such as modeling FDIA evolution and distinguishing malicious data from regular faults, are identified. Future directions to enhance system resilience and detection accuracy are proposed, contributing to the secure operation of power CPS.


[121] 2501.10492

ACCEPT: Diagnostic Forecasting of Battery Degradation Through Contrastive Learning

Modeling lithium-ion battery (LIB) degradation offers significant cost savings and enhances the safety and reliability of electric vehicles (EVs) and battery energy storage systems (BESS). Whilst data-driven methods have received great attention for forecasting degradation, they often demonstrate limited generalization ability and tend to underperform particularly in critical scenarios involving accelerated degradation, which are crucial to predict accurately. These methods also fail to elucidate the underlying causes of degradation. Alternatively, physical models provide a deeper understanding, but their complex parameters and inherent uncertainties limit their applicability in real-world settings. To this end, we propose a new model - ACCEPT. Our novel framework uses contrastive learning to map the relationship between the underlying physical degradation parameters and observable operational quantities, combining the benefits of both approaches. Furthermore, due to the similarity of degradation paths between LIBs with the same chemistry, this model transfers non-trivially to most downstream tasks, allowing for zero-shot inference. Additionally, since categorical features can be included in the model, it can generalize to other LIB chemistries. This work establishes a foundational battery degradation model, providing reliable forecasts across a range of battery types and operating conditions.


[122] 2501.10523

Multiclass Queue Scheduling Under Slowdown: An Approximate Dynamic Programming Approach

In many service systems, especially those in healthcare, customer waiting times can result in increased service requirements. Such service slowdowns can significantly impact system performance. Therefore, it is important to properly account for their impact when designing scheduling policies. Scheduling under wait-dependent service times is challenging, especially when multiple customer classes are heterogeneously affected by waiting. In this work, we study scheduling policies in multiclass, multiserver queues with wait-dependent service slowdowns. We propose a simulation-based Approximate Dynamic Programming (ADP) algorithm to find close-to-optimal scheduling policies. The ADP algorithm (i) represents the policy using classifiers based on the index policy structure, (ii) leverages a coupling method to estimate the differences of the relative value functions directly, and (iii) uses adaptive sampling for efficient state-space exploration. Through extensive numerical experiments, we illustrate that the ADP algorithm generates close-to-optimal policies that outperform well-known benchmarks. We also provide insights into the structure of the optimal policy, which reveals an important trade-off between instantaneous cost reduction and preventing the system from reaching high-cost equilibria. Lastly, we conduct a case study on scheduling admissions into rehabilitation care to illustrate the effectiveness of the ADP algorithm in practice.


[123] 2501.10525

DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids

The \textbf{DeepFilterNet} (\textbf{DFN}) architecture was recently proposed as a deep learning model suited for hearing aid devices. Despite its competitive performance on numerous benchmarks, it still follows a `one-size-fits-all' approach, which aims to train a single, monolithic architecture that generalises across different noises and environments. However, its limited size and computation budget can hamper its generalisability. Recent work has shown that in-context adaptation can mitigate this: conditioning the denoising process on additional information extracted from background recordings improves performance. These recordings can be offloaded outside the hearing aid, thus improving performance while adding minimal computational overhead. We introduce these principles to the \textbf{DFN} model, thus proposing the \textbf{DFingerNet} (\textbf{DFiN}) model, which shows superior performance on various benchmarks inspired by the DNS Challenge.


[124] 2501.10547

HyperCam: Low-Power Onboard Computer Vision for IoT Cameras

We present HyperCam, an energy-efficient image classification pipeline that enables computer vision tasks onboard low-power IoT camera systems. HyperCam leverages hyperdimensional computing to perform training and inference efficiently on low-power microcontrollers. We implement a low-power wireless camera platform using off-the-shelf hardware and demonstrate that HyperCam can achieve an accuracy of 93.60%, 84.06%, 92.98%, and 72.79% for MNIST, Fashion-MNIST, Face Detection, and Face Identification tasks, respectively, while significantly outperforming other classifiers in resource efficiency. Specifically, it delivers inference latency of 0.08-0.27s while using 42.91-63.00KB flash memory and 22.25KB RAM at peak. Among other machine learning classifiers such as SVM, xgBoost, MicroNets, MobileNetV3, and MCUNetV3, HyperCam is the only classifier that achieves competitive accuracy while maintaining competitive memory footprint and inference latency that meets the resource requirements of low-power camera systems.
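
The hyperdimensional-computing workflow behind such classifiers can be sketched as: encode each image into a high-dimensional bipolar vector, bundle (sum) the encodings of each class into a prototype, and classify by similarity. The random-projection encoder, hypervector dimension, and toy data below are simplifying assumptions; HyperCam's actual encoder and on-device quantization are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)
D, PIX = 10_000, 28 * 28                  # hypervector dimension, flattened image size

# A fixed random bipolar projection plays the role of the HDC encoder (a simplification).
PROJ = rng.choice([-1, 1], size=(PIX, D)).astype(np.int8)

def encode(image: np.ndarray) -> np.ndarray:
    """Map a flattened image to a bipolar hypervector."""
    return np.sign(image.astype(np.float32) @ PROJ)

def train(images: np.ndarray, labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Bundle (sum) the hypervectors of each class into a prototype."""
    prototypes = np.zeros((n_classes, D), dtype=np.float32)
    for img, y in zip(images, labels):
        prototypes[y] += encode(img)
    return prototypes

def classify(image: np.ndarray, prototypes: np.ndarray) -> int:
    hv = encode(image)
    sims = prototypes @ hv / (np.linalg.norm(prototypes, axis=1) * np.linalg.norm(hv) + 1e-9)
    return int(np.argmax(sims))

# Tiny synthetic demo: two "classes" that light up different halves of the image.
half = PIX // 2
X0 = np.hstack([0.9 + 0.05 * rng.random((20, half)), 0.05 * rng.random((20, PIX - half))])
X1 = np.hstack([0.05 * rng.random((20, half)), 0.9 + 0.05 * rng.random((20, PIX - half))])
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)
protos = train(X, y, n_classes=2)
print(classify(X[0], protos), classify(X[-1], protos))  # expected: 0 1
```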


[125] 2501.10605

Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning

We present Wasserstein Adaptive Value Estimation for Actor-Critic (WAVE), an approach to enhance stability in deep reinforcement learning through adaptive Wasserstein regularization. Our method addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic's loss function. We prove that WAVE achieves $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for the critic's mean squared error and provide theoretical guarantees for stability through Wasserstein-based regularization. Using the Sinkhorn approximation for computational efficiency, our approach automatically adjusts the regularization based on the agent's performance. Theoretical analysis and experimental results demonstrate that WAVE achieves superior performance compared to standard actor-critic methods.


[126] 2501.10625

Assessing Markov Property in Driving Behaviors: Insights from Statistical Tests

The Markov property serves as a foundational assumption in most existing work on vehicle driving behavior, positing that future states depend solely on the current state, not the series of preceding states. This study validates the Markov properties of vehicle trajectories for both Autonomous Vehicles (AVs) and Human-driven Vehicles (HVs). A statistical method for testing whether time-series data exhibit the Markov property is applied to the trajectory data. The t-test and F-test are additionally introduced to characterize the differences in Markov properties between AVs and HVs. Based on two public trajectory datasets, we investigate the presence and order of the Markov property of different types of vehicles through rigorous statistical tests. Our findings reveal that AV trajectories generally exhibit stronger Markov properties compared to HV trajectories, with a higher percentage conforming to the Markov property and lower Markov orders. In contrast, HV trajectories display greater variability and heterogeneity in decision-making processes, reflecting the complex perception and information processing involved in human driving. These results have significant implications for the development of driving behavior models, AV controllers, and traffic simulation systems. Our study also demonstrates the feasibility of using statistical methods to test the presence of Markov properties in driving trajectory data.
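
One standard way to test a first-order Markov assumption on a discretized trajectory is an Anderson-Goodman-style likelihood-ratio test of an order-1 against an order-2 chain, with the statistic referred to a chi-square distribution. The sketch below illustrates that generic test on a synthetic chain; the paper's exact test and its discretization of trajectory data may differ.

```python
import numpy as np
from scipy.stats import chi2

def markov_order1_lr_test(states: np.ndarray, k: int):
    """Likelihood-ratio test of order-1 vs order-2 Markov dependence on a
    discrete state sequence with k states. Returns (LR statistic, p-value)."""
    # Count order-2 transitions n2[i, j, l] = #{t : s_t=i, s_{t+1}=j, s_{t+2}=l}.
    n2 = np.zeros((k, k, k))
    for a, b, c in zip(states[:-2], states[1:-1], states[2:]):
        n2[a, b, c] += 1
    n1 = n2.sum(axis=0)            # order-1 counts over the same window
    lr = 0.0
    for i in range(k):
        for j in range(k):
            row2 = n2[i, j]
            if row2.sum() == 0:
                continue
            p2 = row2 / row2.sum()        # P(s_{t+2} | s_t=i, s_{t+1}=j)
            p1 = n1[j] / n1[j].sum()      # P(s_{t+2} | s_{t+1}=j)
            mask = row2 > 0
            lr += 2.0 * np.sum(row2[mask] * np.log(p2[mask] / p1[mask]))
    df = k * (k - 1) ** 2                 # parameter difference between the two models
    return lr, chi2.sf(lr, df)

# Demo: simulate a genuine first-order Markov chain and run the test.
rng = np.random.default_rng(0)
k = 4
P = rng.dirichlet(np.ones(k), size=k)     # random k x k transition matrix
states = np.zeros(5000, dtype=int)
for t in range(1, states.size):
    states[t] = rng.choice(k, p=P[states[t - 1]])
lr, p = markov_order1_lr_test(states, k)
print(f"LR = {lr:.1f}, p-value = {p:.3f}")  # a large p-value is consistent with order 1
```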


[127] 2501.10629

Prompt-Enabled Large AI Models for CSI Feedback

Artificial intelligence (AI) has emerged as a promising tool for channel state information (CSI) feedback. While recent research primarily focuses on improving feedback accuracy through novel architectures, the underlying mechanisms of AI-based CSI feedback remain unclear. This study investigates these mechanisms by analyzing performance across diverse datasets and reveals that superior feedback performance stems from the strong fitting capabilities of AI models and their ability to leverage environmental knowledge. Building on these findings, we propose a prompt-enabled large AI model (LAM) for CSI feedback. The LAM employs powerful transformer blocks and is trained on extensive datasets from various scenarios. To further enhance reconstruction quality, the channel distribution -- represented as the mean of channel magnitude in the angular domain -- is incorporated as a prompt within the decoder. Simulation results confirm that the proposed prompt-enabled LAM significantly improves feedback accuracy and generalization performance while reducing data collection requirements in new scenarios.


[128] 2501.10630

Exploring the Potential of Large Language Models for Massive MIMO CSI Feedback

Large language models (LLMs) have achieved remarkable success across a wide range of tasks, particularly in natural language processing and computer vision. This success naturally raises an intriguing yet unexplored question: Can LLMs be harnessed to tackle channel state information (CSI) compression and feedback in massive multiple-input multiple-output (MIMO) systems? Efficient CSI feedback is a critical challenge in next-generation wireless communication. In this paper, we pioneer the use of LLMs for CSI compression, introducing a novel framework that leverages the powerful denoising capabilities of LLMs -- capable of error correction in language tasks -- to enhance CSI reconstruction performance. To effectively adapt LLMs to CSI data, we design customized pre-processing, embedding, and post-processing modules tailored to the unique characteristics of wireless signals. Extensive numerical results demonstrate the promising potential of LLMs in CSI feedback, opening up possibilities for this research direction.


[129] 2501.10637

HOPS: High-order Polynomials with Self-supervised Dimension Reduction for Load Forecasting

Load forecasting is a fundamental task in smart grid. Many techniques have been applied to developing load forecasting models. Due to the challenges such as the Curse of Dimensionality, overfitting, and limited computing resources, multivariate higher-order polynomial models have received limited attention in load forecasting, despite their desirable mathematical foundations and optimization properties. In this paper, we propose low rank approximation and self-supervised dimension reduction to address the aforementioned issues. To further improve computational efficiency, we also introduce a fast Conjugate Gradient based algorithm for the proposed polynomial models. Based on the ISO New England dataset used in the Global Energy Forecasting Competition 2017, the proposed method, high-order polynomials with self-supervised dimension reduction (HOPS), demonstrates higher forecasting accuracy than several competitive models. Additionally, experimental results indicate that our approach alleviates redundant variable construction, achieving better forecasts with fewer input variables.
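
A minimal sketch of the underlying computation, under stated assumptions: fit a ridge-regularized second-order polynomial load model by solving its normal equations with a conjugate-gradient routine. The features, synthetic data, and regularization are illustrative; the paper's low-rank approximation and self-supervised dimension reduction are not reproduced.

```python
import numpy as np
from scipy.sparse.linalg import cg
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic stand-ins for (already dimension-reduced) load-forecasting inputs:
# temperature, hour of day, and lagged load.
n = 2000
temp = 20 + 10 * rng.standard_normal(n)
hour = rng.integers(0, 24, n)
lag_load = 100 + 20 * rng.standard_normal(n)
X = np.column_stack([temp, hour, lag_load])
y = 50 + 0.8 * lag_load + 0.05 * temp ** 2 + 5 * np.sin(2 * np.pi * hour / 24) \
    + rng.standard_normal(n)

# Standardize, then build a second-order polynomial feature map.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
phi = PolynomialFeatures(degree=2, include_bias=True).fit_transform(Xs)

# Ridge-regularized normal equations, solved with conjugate gradient.
lam = 1e-3
A = phi.T @ phi + lam * np.eye(phi.shape[1])
b = phi.T @ y
w, info = cg(A, b, maxiter=1000)

rmse = np.sqrt(np.mean((phi @ w - y) ** 2))
print(f"CG exit flag: {info}, in-sample RMSE: {rmse:.3f}")
```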


[130] 2501.10666

Speech Emotion Detection Based on MFCC and CNN-LSTM Architecture

Emotion detection techniques have been applied to multiple cases, mainly using facial image features and vocal audio features; the latter remains contested, not only because of the complexity of speech audio processing but also because of the difficulty of extracting appropriate features. Parts of the SAVEE and RAVDESS datasets are selected and combined into the dataset used here, containing seven common emotions (i.e., happy, neutral, sad, anger, disgust, fear, and surprise) and thousands of samples. Based on the Librosa package, this paper processes the initial audio input into waveplots and spectra for analysis and concentrates on multiple features, including MFCC, as targets for feature extraction. The hybrid CNN-LSTM architecture is adopted by virtue of its strong capability to deal with sequential data and time series; it mainly consists of four convolutional layers and three long short-term memory layers. As a result, the architecture achieved an overall accuracy of 61.07% on the test set, with the detection of anger and neutral reaching 75.31% and 71.70%, respectively. It can also be concluded that the classification accuracy depends to some extent on the properties of the emotion, with frequently used and distinctly featured emotions being less likely to be misclassified into other categories. Emotions like surprise, whose meaning depends on the specific context, are more likely to be confused with positive or negative emotions, and negative emotions can also be mixed up with each other.
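
The feature-extraction and model-definition steps can be sketched as follows: MFCCs via librosa and a compact Conv1D+LSTM classifier in Keras. The layer counts and sizes here are smaller than the paper's four convolutional and three LSTM layers, and the dataset handling (paths, padding length, labels) is an assumption for illustration.

```python
import numpy as np
import librosa
import tensorflow as tf

def extract_mfcc(path: str, n_mfcc: int = 40, max_frames: int = 128) -> np.ndarray:
    """Load an utterance and return a fixed-size (max_frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T     # (frames, n_mfcc)
    mfcc = mfcc[:max_frames]
    pad = max_frames - mfcc.shape[0]
    return np.pad(mfcc, ((0, pad), (0, 0))) if pad > 0 else mfcc

NUM_EMOTIONS = 7  # happy, neutral, sad, anger, disgust, fear, surprise

# A compact CNN-LSTM; the paper stacks four convolutional and three LSTM layers.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 40)),
    tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(128, 5, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(NUM_EMOTIONS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# Training would follow, e.g.:
#   model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=32)
# where X_train stacks extract_mfcc() outputs and y_train holds integer emotion labels.
```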


[131] 2501.10670

Computing Capacity-Cost Functions for Continuous Channels in Wasserstein Space

This paper investigates the problem of computing capacity-cost (C-C) functions for continuous channels. Motivated by the Kullback-Leibler divergence (KLD) proximal reformulation of the classical Blahut-Arimoto (BA) algorithm, the Wasserstein distance is introduced to the proximal term for the continuous case, resulting in an iterative algorithm related to the Wasserstein gradient descent. Practical implementation involves moving particles along the negative gradient direction of the objective function's first variation in the Wasserstein space and approximating integrals by the importance sampling (IS) technique. Such formulation is also applied to the rate-distortion (R-D) function for continuous source spaces and thus provides a unified computation framework for both problems.


[132] 2501.10694

Energy Efficiency Maximization for Movable Antenna-Enhanced System Based on Statistical CSI

This paper investigates an innovative movable antenna (MA)-enhanced multiple-input multiple-output (MIMO) system designed to enhance communication performance. We aim to maximize the energy efficiency (EE) under statistical channel state information (S-CSI) through a joint optimization of the transmit covariance matrix and the antenna position vectors (APVs). To solve the stochastic problem, we consider the scenario with a large number of antennas and resort to deterministic equivalent (DE) technology to reformulate the system EE w.r.t. the transmit variables, i.e., the transmit covariance matrix and APV, and the receive variables, i.e., the receive APV, respectively. Then, we propose an alternating optimization (AO) algorithm that alternately updates the transmit variables and the receive variables to maximize the system EE. Our numerical results reveal that the proposed MA-enhanced system can significantly improve EE compared to several benchmark schemes, and that the optimal performance can be achieved with a finite size of movement regions for the MAs.


[133] 2501.10705

Secure Communication in Dynamic RDARS-Driven Systems

In this letter, we investigate a dynamic reconfigurable distributed antenna and reflection surface (RDARS)-driven secure communication system, where the working mode of the RDARS can be flexibly configured. We aim to maximize the secrecy rate by jointly designing the active beamforming vectors, reflection coefficients, and the channel-aware mode selection matrix. To address the non-convex binary and cardinality constraints introduced by dynamic mode selection, we propose an efficient alternating optimization (AO) framework that employs penalty-based fractional programming (FP) and successive convex approximation (SCA) transformations. Simulation results demonstrate the potential of RDARS in enhancing the secrecy rate and show its superiority compared to existing reflection surface-based schemes.


[134] 2501.10727

In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review

Datasets play a critical role in medical imaging research, yet issues such as label quality, shortcuts, and metadata are often overlooked. This lack of attention may harm the generalizability of algorithms and, consequently, negatively impact patient outcomes. While existing medical imaging literature reviews mostly focus on machine learning (ML) methods, with only a few focusing on datasets for specific applications, these reviews remain static -- they are published once and not updated thereafter. This fails to account for emerging evidence, such as biases, shortcuts, and additional annotations that other researchers may contribute after the dataset is published. We refer to these newly discovered findings about datasets as research artifacts. To address this gap, we propose a living review that continuously tracks public datasets and their associated research artifacts across multiple medical imaging applications. Our approach includes a framework for the living review to monitor data documentation artifacts, and an SQL database to visualize the citation relationships between research artifacts and datasets. Lastly, we discuss key considerations for creating medical imaging datasets, review best practices for data annotation, discuss the significance of shortcuts and demographic diversity, and emphasize the importance of managing datasets throughout their entire lifecycle. Our demo is publicly available at this http URL


[135] 2501.10753

Pinching Antennas: Principles, Applications and Challenges

Flexible-antenna systems, such as fluid antennas and movable antennas, have been recognized as key enabling technologies for sixth-generation (6G) wireless networks, as they can intelligently reconfigure the effective channel gains of the users and hence significantly improve their data transmission capabilities. However, existing flexible-antenna systems have been designed to combat small-scale fading in non-line-of-sight (NLoS) conditions. As a result, they lack the ability to establish line-of-sight links, which are typically 100 times stronger than NLoS links. In addition, existing flexible-antenna systems have limited flexibility, where adding/removing an antenna is not straightforward. This article introduces an innovative flexible-antenna system called pinching antennas, which are realized by applying small dielectric particles to waveguides. We first describe the basics of pinching-antenna systems and their ability to provide strong LoS links by deploying pinching antennas close to the users as well as their capability to scale up/down the antenna system. We then focus on communication scenarios with different numbers of waveguides and pinching antennas, where innovative approaches to implement multiple-input multiple-output and non-orthogonal multiple access are discussed. In addition, promising 6G-related applications of pinching antennas, including integrated sensing and communication and next-generation multiple access, are presented. Finally, important directions for future research, such as waveguide deployment and channel estimation, are highlighted.


[136] 2501.10755

An Experimental Study on Joint Modeling for Sound Event Localization and Detection with Source Distance Estimation

In traditional sound event localization and detection (SELD) tasks, the focus is typically on sound event detection (SED) and direction-of-arrival (DOA) estimation, but they fall short of providing full spatial information about the sound source. The 3D SELD task addresses this limitation by integrating source distance estimation (SDE), allowing for complete spatial localization. We propose three approaches to tackle this challenge: a novel method with independent training and joint prediction, which first treats DOA and distance estimation as separate tasks and then combines them to solve 3D SELD; a dual-branch representation with source Cartesian coordinates used for simultaneous DOA and distance estimation; and a three-branch structure that jointly models SED, DOA, and SDE within a unified framework. Our proposed method ranked first in the DCASE 2024 Challenge Task 3, demonstrating the effectiveness of joint modeling for addressing the 3D SELD task. The relevant code for this paper will be open-sourced in the future.


[137] 2501.10791

A Novel Precoder for Peak-to-Average Power Ratio Reduction in OTFS Systems

We consider the issue of the high peak-to-average-power ratio (PAPR) of orthogonal time frequency space (OTFS) modulated signals. This paper proposes a novel low-complexity iterative PAPR reduction method which achieves a PAPR reduction of roughly 5 dB when compared to an OTFS modulated signal without any PAPR compensation. Simulations reveal that the PAPR achieved by the proposed method is significantly better than that achieved by other state-of-the-art methods. Simulations also reveal that the error rate performance of OTFS-based systems with the proposed PAPR reduction is similar to that achieved with the other state-of-the-art methods.
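
For reference, the PAPR of a discrete-time baseband signal is the ratio of its peak instantaneous power to its average power. The NumPy sketch below computes this metric on a plain multicarrier (OFDM-style) stand-in waveform with random 16-QAM symbols; neither the OTFS modulation chain nor the paper's iterative precoder is reproduced here.

    # PAPR of a complex baseband signal; the waveform is an illustrative multicarrier stand-in.
    import numpy as np

    def papr_db(x):
        """Peak-to-average power ratio in dB."""
        power = np.abs(x) ** 2
        return 10 * np.log10(power.max() / power.mean())

    rng = np.random.default_rng(0)
    n_subcarriers, n_symbols = 64, 100
    qam = (rng.choice([-3, -1, 1, 3], (n_subcarriers, n_symbols)) +
           1j * rng.choice([-3, -1, 1, 3], (n_subcarriers, n_symbols)))   # random 16-QAM grid
    s = np.fft.ifft(qam, axis=0).flatten(order='F')   # per-symbol IFFT -> time-domain signal
    print(f"PAPR without any reduction: {papr_db(s):.2f} dB")   # typically near 10 dB here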


[138] 2501.10806

Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis

Two-time-scale stochastic approximation is an iterative algorithm used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this paper, we study two-time-scale iterations, where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be considered a stochastic inexact Krasnoselskii-Mann iteration. We show that the mean square error decays at a rate $O(1/k^{1/4-\epsilon})$, where $\epsilon>0$ is arbitrarily small. We also show almost sure convergence of iterates to the set of fixed points. We show the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.
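
To make the two-time-scale structure concrete, the following NumPy sketch runs a generic two-time-scale stochastic approximation on a toy min-max problem (minimax optimization being one of the applications listed above); the step-size exponents and noise model are illustrative assumptions and do not reproduce the paper's analysis of non-expansive mappings.

    # Two-time-scale stochastic approximation on min_x max_y F(x, y) = 0.5*x^2 + x*y - 0.5*y^2.
    import numpy as np

    rng = np.random.default_rng(0)
    x, y = 5.0, -3.0
    for k in range(1, 50_001):
        a_k = 1.0 / k ** 0.6          # faster time-scale (larger steps): inner max over y
        b_k = 1.0 / k ** 0.9          # slower time-scale (smaller steps): outer min over x
        grad_y = x - y + 0.1 * rng.standard_normal()   # noisy dF/dy
        grad_x = x + y + 0.1 * rng.standard_normal()   # noisy dF/dx
        y = y + a_k * grad_y          # fast iterate tracks y*(x) = x
        x = x - b_k * grad_x          # slow iterate is driven toward the saddle at 0
    print(f"approximate saddle point: x = {x:.4f}, y = {y:.4f}")   # both should be near 0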


[139] 2501.10811

MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation

The technology for generating music from textual descriptions has seen rapid advancements. However, evaluating text-to-music (TTM) systems remains a significant challenge, primarily due to the difficulty of balancing performance and cost with existing objective and subjective evaluation methods. In this paper, we propose an automatic assessment task for TTM models to align with human perception. To address the TTM evaluation challenges posed by the professional requirements of music evaluation and the complexity of the relationship between text and music, we collect MusicEval, the first generative music assessment dataset. This dataset contains 2,748 music clips generated by 31 advanced and widely used models in response to 384 text prompts, along with 13,740 ratings from 14 music experts. Furthermore, we design a CLAP-based assessment model built on this dataset, and our experimental results validate the feasibility of the proposed task, providing a valuable reference for future development in TTM evaluation. The dataset is available at https://www.aishelltech.com/AISHELL_7A.


[140] 2501.10854

Achievable DoF Bounds for Cache-Aided Asymmetric MIMO Communications

Integrating coded caching (CC) into multiple-input multiple-output (MIMO) communications can significantly enhance the achievable degrees of freedom (DoF) in wireless networks. This paper investigates a practical cache-aided asymmetric MIMO configuration with cache ratio $\gamma$, where a server equipped with $L$ transmit antennas communicates with $K$ users, each having $G_k$ receive antennas. We propose three content-aware MIMO-CC strategies: the \emph{min-G} scheme, which treats the system as symmetric by assuming all users have the same number of antennas, equal to the smallest among them; the \emph{Grouping} scheme, which maximizes spatial multiplexing gain separately within each user subset at the cost of some global caching gain; and the \emph{Phantom} scheme, which dynamically redistributes spatial resources using virtual or "phantom" antenna users, bridging the performance gains of the min-G and Grouping schemes. These strategies jointly optimize the number of users, $\Omega$, and the parallel streams decoded by each user, $\beta_k$, ensuring linear decodability for all target users. Analytical and numerical results confirm that the proposed schemes achieve significant DoF improvements across various system configurations, demonstrating the potential of content-aware MIMO-CC strategies for enhancing wireless network performance.


[141] 2501.10875

RIS Deployment Optimization with Iterative Detection and Decoding in Multiuser Multiple-Antenna Systems

This work investigates a Reconfigurable Intelligent Surface (RIS)-assisted uplink system employing iterative detection and decoding (IDD) techniques. We analyze the impact of system parameter tuning on the IDD performance for several deployment configurations, including the number of users, access point (AP) antennas, and RIS elements. Analytical results for both active and passive RIS in a single-input single-output (SISO) scenario demonstrate how deployment choices affect system performance. Numerical simulations confirm the robustness of the RIS-assisted IDD system to variations in these parameters, showing performance gains in certain configurations. Moreover, the findings indicate that the insights derived from SISO analysis extend to multiuser MIMO IDD systems.


[142] 2501.10920

Data Enrichment Opportunities for Distribution Grid Cable Networks using Variational Autoencoders

Electricity distribution cable networks suffer from incomplete and unbalanced data, hindering the effectiveness of machine learning models for predictive maintenance and reliability evaluation. Features such as the installation date of the cables are frequently missing. To address data scarcity, this study investigates the application of Variational Autoencoders (VAEs) for data enrichment, synthetic data generation, imbalanced data handling, and outlier detection. Based on a proof-of-concept case study for Denmark, targeting the imputation of missing age information in cable network asset registers, the analysis underlines the potential of generative models to support data-driven maintenance. However, the study also highlights several areas for improvement, including enhanced feature importance analysis, incorporating network characteristics and external features, and handling biases in missing data. Future initiatives should expand the application of VAEs by incorporating semi-supervised learning, advanced sampling techniques, and additional distribution grid elements, including low-voltage networks, into the analysis.
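
The sketch below shows a minimal PyTorch variational autoencoder for tabular asset-register features, with a comment indicating how a masked record could be imputed or synthetic records sampled; the dimensions, architecture, and imputation strategy are illustrative assumptions rather than details from the study.

    # Minimal VAE for tabular features (reparameterisation trick + ELBO-style loss).
    import torch
    import torch.nn as nn

    class TabularVAE(nn.Module):
        def __init__(self, n_features=12, latent=4):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
            self.mu, self.logvar = nn.Linear(32, latent), nn.Linear(32, latent)
            self.dec = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, n_features))

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterisation
            return self.dec(z), mu, logvar

    def vae_loss(x, x_hat, mu, logvar):
        recon = nn.functional.mse_loss(x_hat, x, reduction="sum")     # reconstruction term
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
        return recon + kl

    # Imputation idea: mask the missing 'installation age' column, encode/decode the record,
    # and read back the reconstructed value; synthetic records come from decoding z ~ N(0, I).
    model = TabularVAE()
    x = torch.randn(64, 12)                    # a batch of standardised cable features (dummy)
    x_hat, mu, logvar = model(x)
    loss = vae_loss(x, x_hat, mu, logvar)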


[143] 2501.10937

Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

Empathetic dialogue is crucial for natural human-computer interaction, allowing the dialogue system to respond in a more personalized and emotionally aware manner, improving user satisfaction and engagement. The emergence of large language models (LLMs) has revolutionized dialogue generation by harnessing their powerful capabilities, and these models have also shown potential in multimodal domains. Many studies have integrated speech with text-based LLMs to take spoken questions as input and output text responses. However, the lack of spoken question-answering datasets that include speech style information for supervised fine-tuning (SFT) limits the performance of these systems. As a result, while these systems excel at understanding speech content, they often struggle to generate empathetic responses. In response, we propose a novel approach that circumvents the need for question-answering data, called Listen, Perceive, and Express (LPE). Our method employs a two-stage training process, initially guiding the LLM to listen to the content and perceive the emotional aspects of speech. Subsequently, we utilize Chain-of-Thought (CoT) prompting to unlock the model's potential for expressing empathetic responses based on the listened spoken content and the perceived emotional cues. Experiments demonstrate the effectiveness of the proposed method. To our knowledge, this is the first attempt to leverage CoT for speech-based dialogue.


[144] 2501.10974

Sequential Change Detection for Learning in Piecewise Stationary Bandit Environments

A finite-horizon variant of the quickest change detection problem is investigated, which is motivated by a change detection problem that arises in piecewise stationary bandits. The goal is to minimize the \emph{latency}, which is the smallest threshold such that the probability that the detection delay exceeds the threshold is below a desired low level, while controlling the false alarm probability to a desired low level. When the pre- and post-change distributions are unknown, two tests are proposed as candidate solutions. These tests are shown to attain order optimality in terms of the horizon. Furthermore, the growth in their latencies with respect to the false alarm probability and late detection probability satisfies a property that is desirable in regret analysis for piecewise stationary bandits. Numerical results are provided to validate the theoretical performance results.
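
For orientation, the classical CUSUM procedure for quickest change detection with known pre- and post-change densities is sketched below in NumPy; the paper's tests address the harder setting where these distributions are unknown and the horizon is finite, and are not reproduced here.

    # Classical CUSUM: alarm when the cumulative log-likelihood-ratio statistic crosses a threshold.
    import numpy as np

    def cusum_alarm(x, loglik_ratio, threshold):
        """Return the first index at which the CUSUM statistic crosses the threshold (or -1)."""
        s = 0.0
        for k, xk in enumerate(x):
            s = max(0.0, s + loglik_ratio(xk))   # recursive CUSUM update
            if s >= threshold:
                return k
        return -1

    # Example: Gaussian mean shift from 0 to 1 (unit variance) occurring at sample 500.
    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(0, 1, 500), rng.normal(1, 1, 500)])
    llr = lambda v: v - 0.5                      # log [ N(v; 1, 1) / N(v; 0, 1) ]
    print("alarm raised at sample", cusum_alarm(x, llr, threshold=8.0))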


[145] 2501.11065

Enhancing Neural Spoken Language Recognition: An Exploration with Multilingual Datasets

In this research, we advanced a spoken language recognition system, moving beyond traditional feature vector-based models. Our improvements focused on effectively capturing language characteristics over extended periods using a specialized pooling layer. We utilized a broad range of data from Common-Voice, targeting ten languages across Indo-European, Semitic, and East Asian families. The major innovation involved optimizing the architecture of Time Delay Neural Networks. We introduced additional layers and restructured these networks into a funnel shape, enhancing their ability to process complex linguistic patterns. A rigorous grid search determined the optimal settings for these networks, significantly boosting their efficiency in language pattern recognition from audio samples. The model underwent extensive training, including a phase with augmented data, to refine its capabilities. The culmination of these efforts is a highly accurate system, achieving a 97\% accuracy rate in language recognition. This advancement represents a notable contribution to artificial intelligence, specifically in improving the accuracy and efficiency of language processing systems, a critical aspect in the engineering of advanced speech recognition technologies.


[146] 2501.11079

Federated Deep Reinforcement Learning for Energy Efficient Multi-Functional RIS-Assisted Low-Earth Orbit Networks

In this paper, a novel network architecture that deploys the multi-functional reconfigurable intelligent surface (MF-RIS) in low-Earth orbit (LEO) is proposed. Unlike traditional RIS with only signal reflection capability, the MF-RIS can reflect, refract, and amplify signals, as well as harvest energy from wireless signals. Given the high energy demands in shadow regions where solar energy is unavailable, the MF-RIS is deployed in LEO to enhance signal coverage and improve energy efficiency (EE). To this end, we formulate a long-term EE optimization problem by determining the optimal parameters for MF-RIS configurations, including amplification and phase shifts, energy harvesting ratios, and LEO transmit beamforming. To address this complex non-convex and non-linear problem, a federated-learning-enhanced multi-agent deep deterministic policy gradient (FEMAD) scheme is designed. The deep deterministic policy gradient (DDPG) of each agent provides an optimal action policy from its interactions with the environment, whereas federated learning enables hidden-information exchange among the agents. Numerical results show significant EE improvements compared to the other benchmarks, including centralized deep reinforcement learning as well as distributed multi-agent DDPG. Additionally, the proposed LEO-MF-RIS architecture has demonstrated its effectiveness, achieving the highest EE performance compared to the scenarios of fixed/no energy harvesting in the MF-RIS, traditional reflection-only RIS, and deployment without RISs/MF-RISs.


[147] 2501.11132

Advanced technology in railway track monitoring using the GPR Technique: A Review

Subsurface evaluation of railway tracks is crucial for safe operation, as it allows for the early detection and remediation of potential structural weaknesses or defects that could lead to accidents or derailments. Ground Penetrating Radar (GPR) is an advanced non-destructive technology (NDT) based on electromagnetic surveying that can be used to monitor railway tracks. This technology is well-suited for railway applications due to the sub-layered composition of the track, which includes ties, ballast, sub-ballast, and subgrade regions. It can detect defects such as ballast pockets, fouled ballast, poor drainage, and subgrade settlement. The paper reviews recent works on advanced technology and interpretations of GPR data collected for different layers. Further, this paper describes current techniques for using synthetic modeling to calibrate real-world GPR data, enhancing accuracy in identifying subsurface features such as ballast conditions and structural anomalies, and for applying various algorithms to refine GPR data analysis. These include Support Vector Machine (SVM) for classifying railway ballast types, Fuzzy C-means, and Generalized Regression Neural Networks for high-accuracy defect classification. Deep learning techniques, particularly Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are also highlighted for their effectiveness in recognizing patterns associated with defects in GPR images. The article specifically focuses on the development of a Convolutional Recurrent Neural Network (CRNN) model, which combines CNN and RNN architectures for efficient processing of GPR data. This model demonstrates enhanced detection capabilities and faster processing compared to traditional object detection models like Faster R-CNN.


[148] 2501.11136

A Novel Switch-Type Policy Network for Resource Allocation Problems: Technical Report

Deep Reinforcement Learning (DRL) has become a powerful tool for developing control policies in queueing networks, but the common use of Multi-layer Perceptron (MLP) neural networks in these applications has significant drawbacks. MLP architectures, while versatile, often suffer from poor sample efficiency and a tendency to overfit training environments, leading to suboptimal performance on new, unseen networks. In response to these issues, we introduce a switch-type neural network (STN) architecture designed to improve the efficiency and generalization of DRL policies in queueing networks. The STN leverages structural patterns from traditional non-learning policies, ensuring consistent action choices across similar states. This design not only streamlines the learning process but also fosters better generalization by reducing the tendency to overfit. Our work presents three key contributions: first, the development of the STN as a more effective alternative to MLPs; second, empirical evidence showing that STNs achieve superior sample efficiency in various training scenarios; and third, experimental results demonstrating that STNs match MLP performance in familiar environments and significantly outperform them in new settings. By embedding domain-specific knowledge, the STN enhances the Proximal Policy Optimization (PPO) algorithm's effectiveness without compromising performance, suggesting its suitability for a wide range of queueing network control problems.


[149] 2501.11151

Water Flow Detection Device Based on Sound Data Analysis and Machine Learning to Detect Water Leakage

In this paper, we introduce a novel mechanism that uses machine learning techniques to detect water leaks in pipes. The proposed mechanism is simple and low-cost, and is designed to be easily installed on building pipes of various sizes. The system works by gathering and amplifying water flow signals using a mechanical sound amplifier. The sounds are then recorded and converted to digital signals for analysis. After feature extraction and selection, deep neural networks are used to discriminate between pipes with and without leaks. The experimental results show that this device can detect water flows of 100 milliliters per minute (mL/min) or more in a pipe, so it can be used as the core of a water leakage detection system.


[150] 2501.11159

LiFT: Lightweight, FPGA-tailored 3D object detection based on LiDAR data

This paper presents LiFT, a lightweight, fully quantized 3D object detection algorithm for LiDAR data, optimized for real-time inference on FPGA platforms. Through an in-depth analysis of FPGA-specific limitations, we identify a set of FPGA-induced constraints that shape the algorithm's design. These include a computational complexity limit of 30 GMACs (billion multiply-accumulate operations), INT8 quantization for weights and activations, 2D cell-based processing instead of 3D voxels, and minimal use of skip connections. To meet these constraints while maximizing performance, LiFT combines novel mechanisms with state-of-the-art techniques such as reparameterizable convolutions and fully sparse architecture. Key innovations include the Dual-bound Pillar Feature Net, which boosts performance without increasing complexity, and an efficient scheme for INT8 quantization of input features. With a computational cost of just 20.73 GMACs, LiFT stands out as one of the few algorithms targeting minimal-complexity 3D object detection. Among comparable methods, LiFT ranks first, achieving an mAP of 51.84% and an NDS of 61.01% on the challenging NuScenes validation dataset. The code will be available at https://github.com/vision-agh/lift.


[151] 2501.11168

DeepEyeNet: Adaptive Genetic Bayesian Algorithm Based Hybrid ConvNeXtTiny Framework For Multi-Feature Glaucoma Eye Diagnosis

Glaucoma is a leading cause of irreversible blindness worldwide, emphasizing the critical need for early detection and intervention. In this paper, we present DeepEyeNet, a novel and comprehensive framework for automated glaucoma detection using retinal fundus images. Our approach integrates advanced image standardization through dynamic thresholding, precise optic disc and cup segmentation via a U-Net model, and comprehensive feature extraction encompassing anatomical and texture-based features. We employ a customized ConvNeXtTiny based Convolutional Neural Network (CNN) classifier, optimized using our Adaptive Genetic Bayesian Optimization (AGBO) algorithm. This proposed AGBO algorithm balances exploration and exploitation in hyperparameter tuning, leading to significant performance improvements. Experimental results on the EyePACS-AIROGS-light-V2 dataset demonstrate that DeepEyeNet achieves a high classification accuracy of 95.84%, which was possible due to the effective optimization provided by the novel AGBO algorithm, outperforming existing methods. The integration of sophisticated image processing techniques, deep learning, and optimized hyperparameter tuning through our proposed AGBO algorithm positions DeepEyeNet as a promising tool for early glaucoma detection in clinical settings.


[152] 2501.11190

Reinforcement Learning Based Goodput Maximization with Quantized Feedback in URLLC

This paper presents a comprehensive system model for goodput maximization with quantized feedback in Ultra-Reliable Low-Latency Communication (URLLC), focusing on dynamic channel conditions and feedback schemes. The study investigates a communication system, where the receiver provides quantized channel state information to the transmitter. The system adapts its feedback scheme based on reinforcement learning, aiming to maximize goodput while accommodating varying channel statistics. We introduce a novel Rician-$K$ factor estimation technique to enable the communication system to optimize the feedback scheme. This dynamic approach increases the overall performance, making it well-suited for practical URLLC applications where channel statistics vary over time.
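
For context, the Rician $K$-factor can be estimated from channel samples with the standard moment-based estimator sketched below, which uses the mean and variance of the instantaneous power $|h|^2$; this textbook estimator is shown only for orientation and is not the estimation technique introduced in the paper.

    # Moment-based Rician K-factor estimation: sqrt(1 - gamma) / (1 - sqrt(1 - gamma)),
    # where gamma = Var(|h|^2) / E[|h|^2]^2.
    import numpy as np

    def estimate_rician_k(h):
        """h: complex channel samples. Returns the moment-based K-factor estimate."""
        p = np.abs(h) ** 2
        gamma = p.var() / p.mean() ** 2          # normalised variance of the power
        root = np.sqrt(max(1.0 - gamma, 0.0))    # equals K / (K + 1) in expectation
        return root / (1.0 - root) if root < 1.0 else np.inf

    # Sanity check on a synthetic Rician channel with K = 5 and unit total power.
    rng = np.random.default_rng(0)
    K, n = 5.0, 200_000
    los = np.sqrt(K / (K + 1))
    nlos = rng.normal(size=n) + 1j * rng.normal(size=n)
    h = los + np.sqrt(1.0 / (2 * (K + 1))) * nlos
    print(f"estimated K = {estimate_rician_k(h):.2f}")   # should be close to 5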


[153] 2501.11225

CNN-based TEM image denoising from first principles

Transmission electron microscope (TEM) images are often corrupted by noise, hindering their interpretation. To address this issue, we propose a deep learning-based approach using simulated images. Using density functional theory calculations with a set of pseudo-atomic orbital basis sets, we generate highly accurate ground truth images. We introduce four types of noise into these simulations to create realistic training datasets. Each type of noise is then used to train a separate convolutional neural network (CNN) model. Our results show that these CNNs are effective in reducing noise, even when applied to images with different noise levels than those used during training. However, we observe limitations in some cases, particularly in preserving the integrity of circular shapes and avoiding visible artifacts between image patches. To overcome these challenges, we propose alternative training strategies and future research directions. This study provides a valuable framework for training deep learning models for TEM image denoising.


[154] 2501.11229

Successive Interference Cancellation-aided Diffusion Models for Joint Channel Estimation and Data Detection in Low Rank Channel Scenarios

This paper proposes a novel joint channel-estimation and source-detection algorithm using successive interference cancellation (SIC)-aided generative score-based diffusion models. Prior work in this area focuses on massive MIMO scenarios, which are typically characterized by full-rank channels, and fails in low-rank channel scenarios. The proposed algorithm outperforms existing methods in joint source-channel estimation, especially in low-rank scenarios where the number of users exceeds the number of antennas at the access point (AP). The proposed score-based iterative diffusion process estimates the gradient of the prior distribution on partial channels, and recursively updates the estimated channel parts as well as the source. Extensive simulation results show that the proposed method outperforms the baseline methods in terms of normalized mean squared error (NMSE) and symbol error rate (SER) in both full-rank and low-rank channel scenarios, with the gains being more pronounced in the latter, at various signal-to-noise ratios (SNRs).


[155] 2501.11255

Bounding the Settling Time of Finite-Time Stable Systems using Sum of Squares

Finite-time stability (FTS) of a differential equation guarantees that solutions reach a given equilibrium point in finite time, where the time of convergence depends on the initial state of the system. For traditional stability notions such as exponential stability, the convex optimization framework of Sum-of-Squares (SoS) enables the computation of polynomial Lyapunov functions to certify stability. However, finite-time stable systems are characterized by non-Lipschitz, non-polynomial vector fields, rendering standard SoS methods inapplicable. To this end, in this paper, we show that the computation of a non-polynomial Lyapunov function certifying finite-time stability can be reformulated as computation of a polynomial one under a particular transformation that we develop in this work. As a result, SoS can be utilized to compute a Lyapunov function for FTS. This Lyapunov function can then be used to obtain a bound on the settling time. We first present this approach for the scalar case and then extend it to the multivariate case. Numerical examples demonstrate the effectiveness of our approach in both certifying finite-time stability and computing accurate settling time bounds. This work represents the first combination of SoS programming with settling time bounds for finite-time stable systems.
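
As a standard scalar illustration of the kind of settling-time bound being certified (a textbook example, not taken from the paper): for $\dot{x} = -k\,\mathrm{sign}(x)|x|^{\alpha}$ with $k>0$ and $0<\alpha<1$, separation of variables gives the exact settling time $T(x_0) = |x_0|^{1-\alpha}/\big(k(1-\alpha)\big)$. The same bound follows from the non-polynomial Lyapunov function $V(x)=|x|^{2}$, which satisfies $\dot{V} = -2k|x|^{1+\alpha} = -2k\,V^{(1+\alpha)/2}$, i.e., the classical finite-time Lyapunov differential inequality $\dot{V} \le -c\,V^{\beta}$ with $\beta<1$; the approach above replaces such non-polynomial certificates with polynomial ones computable via SoS.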


[156] 2501.11258

Enhancing Uncertainty Estimation in Semantic Segmentation via Monte-Carlo Frequency Dropout

Monte-Carlo (MC) Dropout provides a practical solution for estimating predictive distributions in deterministic neural networks. Traditional dropout, applied within the signal space, may fail to account for frequency-related noise common in medical imaging, leading to biased predictive estimates. A novel approach extends Dropout to the frequency domain, allowing stochastic attenuation of signal frequencies during inference. This creates diverse global textural variations in feature maps while preserving structural integrity -- a factor we hypothesize and empirically show is contributing to accurately estimating uncertainties in semantic segmentation. We evaluated traditional MC-Dropout and the MC-frequency Dropout in three segmentation tasks involving different imaging modalities: (i) prostate zones in biparametric MRI, (ii) liver tumors in contrast-enhanced CT, and (iii) lungs in chest X-ray scans. Our results show that MC-Frequency Dropout improves calibration, convergence, and semantic uncertainty, thereby improving prediction scrutiny, boundary delineation, and has the potential to enhance medical decision-making.
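
A minimal PyTorch sketch of how frequency-domain dropout on a feature map might look is given below: transform to the frequency domain, randomly attenuate frequency bins, transform back, and keep the operation stochastic at inference for Monte-Carlo sampling. The mask distribution, the axes transformed, and the use of the sample spread as an uncertainty proxy are assumptions for illustration rather than the paper's exact formulation.

    # Frequency-domain dropout on a (batch, channels, H, W) feature map, stochastic at inference.
    import torch

    def mc_frequency_dropout(feat, drop_prob=0.1):
        spec = torch.fft.rfft2(feat)                              # spatial frequency domain
        mask = (torch.rand_like(spec.real) > drop_prob).float()   # randomly keep/zero frequency bins
        return torch.fft.irfft2(spec * mask, s=feat.shape[-2:])   # back to the signal space

    # Monte-Carlo inference: several stochastic passes, with the spread as an uncertainty proxy.
    feat = torch.randn(2, 8, 32, 32)
    samples = torch.stack([mc_frequency_dropout(feat) for _ in range(10)])
    uncertainty = samples.std(dim=0)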


[157] 2501.11263

Towards Loss-Resilient Image Coding for Unstable Satellite Networks

Geostationary Earth Orbit (GEO) satellite communication demonstrates significant advantages in emergency short burst data services. However, unstable satellite networks, particularly those with frequent packet loss, present a severe challenge to accurate image transmission. To address it, we propose a loss-resilient image coding approach that leverages end-to-end optimization in learned image compression (LIC). Our method builds on the channel-wise progressive coding framework, incorporating Spatial-Channel Rearrangement (SCR) on the encoder side and Mask Conditional Aggregation (MCA) on the decoder side to improve reconstruction quality with unpredictable errors. By integrating the Gilbert-Elliot model into the training process, we enhance the model's ability to generalize in real-world network conditions. Extensive evaluations show that our approach outperforms traditional and deep learning-based methods in terms of compression performance and stability under diverse packet loss, offering robust and efficient progressive transmission even in challenging environments. Code is available at https://github.com/NJUVISION/LossResilientLIC.
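
The Gilbert-Elliott channel model referenced above is a standard two-state Markov packet-loss model with a "good" and a "bad" state that have different loss probabilities, producing bursty rather than i.i.d. losses. The NumPy sketch below simulates such a loss pattern; the transition and loss probabilities are arbitrary example values, not those used in the paper.

    # Gilbert-Elliott two-state Markov packet-loss simulator (example parameters).
    import numpy as np

    def gilbert_elliott_losses(n_packets, p_gb=0.05, p_bg=0.3, loss_good=0.01, loss_bad=0.5, seed=0):
        """Return a boolean array in which True marks a lost packet."""
        rng = np.random.default_rng(seed)
        lost = np.empty(n_packets, dtype=bool)
        state_bad = False
        for i in range(n_packets):
            # Markov transition between the 'good' and 'bad' channel states.
            state_bad = (rng.random() >= p_bg) if state_bad else (rng.random() < p_gb)
            lost[i] = rng.random() < (loss_bad if state_bad else loss_good)
        return lost

    mask = gilbert_elliott_losses(1000)
    print(f"simulated packet loss rate: {mask.mean():.3f}")   # bursty losses, unlike i.i.d. drops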


[158] 2501.11311

A2SB: Audio-to-Audio Schrodinger Bridges

Audio in the real world may be perturbed due to numerous factors, causing the audio quality to be degraded. The following work presents an audio restoration model tailored for high-res music at 44.1kHz. Our model, Audio-to-Audio Schrodinger Bridges (A2SB), is capable of both bandwidth extension (predicting high-frequency components) and inpainting (re-generating missing segments). Critically, A2SB is end-to-end without need of a vocoder to predict waveform outputs, able to restore hour-long audio inputs, and trained on permissively licensed music data. A2SB is capable of achieving state-of-the-art bandwidth extension and inpainting quality on several out-of-distribution music test sets. Our demo website is https://research.nvidia.com/labs/adlr/A2SB/.


[159] 2501.11323

Physics-Informed Machine Learning for Efficient Reconfigurable Intelligent Surface Design

Reconfigurable intelligent surface (RIS) is a two-dimensional periodic structure integrated with a large number of reflective elements, which can manipulate electromagnetic waves in a digital way, offering great potential for wireless communication and radar detection applications. However, conventional RIS designs rely heavily on extensive full-wave EM simulations that are extremely time-consuming. To address this challenge, we propose a machine-learning-assisted approach for efficient RIS design. An accurate and fast model to predict the reflection coefficient of the RIS element is developed by combining a multi-layer perceptron neural network (MLP) and a dual-port network, which can significantly reduce tedious EM simulations in the network training. A RIS has been practically designed based on the proposed method. To verify the proposed method, the RIS has also been fabricated and measured. The experimental results are in good agreement with the simulation results, which validates the efficacy of the proposed method in RIS design.


[160] 2501.11351

Automatic Labelling & Semantic Segmentation with 4D Radar Tensors

In this paper, an automatic labelling process is presented for automotive datasets, leveraging complementary information from LiDAR and camera. The generated labels are then used as ground truth with the corresponding 4D radar data as inputs to a proposed semantic segmentation network, to associate a class label to each spatial voxel. Promising results are shown by applying both approaches to the publicly shared RaDelft dataset, with the proposed network achieving over 65% of the LiDAR detection performance, improving the vehicle detection probability by 13.2%, and reducing the Chamfer distance by 0.54 m, compared to variants inspired by the literature.
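
The Chamfer distance reported above measures how closely two point clouds match by averaging nearest-neighbour distances. A NumPy sketch follows; the symmetric sum-of-means convention used here is one common variant and may differ from the exact definition adopted in the paper.

    # Symmetric Chamfer distance between two point clouds (brute-force nearest neighbours).
    import numpy as np

    def chamfer_distance(a, b):
        """a: (N, 3), b: (M, 3) point clouds. Returns the symmetric Chamfer distance."""
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # (N, M) pairwise distances
        return d.min(axis=1).mean() + d.min(axis=0).mean()

    pred = np.random.rand(200, 3) * 50    # e.g. radar-derived occupied voxel centres (dummy)
    ref = np.random.rand(300, 3) * 50     # e.g. LiDAR-derived reference points (dummy)
    print(f"Chamfer distance: {chamfer_distance(pred, ref):.2f} m")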


[161] 2501.11378

Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio

Hallucinations of deep neural models are among the key challenges in automatic speech recognition (ASR). In this paper, we investigate hallucinations of the Whisper ASR model induced by non-speech audio segments present during inference. By inducing hallucinations with various types of sounds, we show that there exists a set of hallucinations that appear frequently. We then study hallucinations caused by the augmentation of speech with such sounds. Finally, we describe the creation of a bag of hallucinations (BoH) that allows the effect of hallucinations to be removed through post-processing of the text transcriptions. The results of our experiments show that such post-processing is capable of reducing the word error rate (WER) and acts as a good safeguard against problematic hallucinations.


[162] 2501.11409

Unsupervised Learning in Echo State Networks for Input Reconstruction

Conventional echo state networks (ESNs) require supervised learning to train the readout layer, using the desired outputs as training data. In this study, we focus on input reconstruction (IR), which refers to training the readout layer to reproduce the input time series in its output. We reformulate the learning algorithm of the ESN readout layer to perform IR using unsupervised learning (UL). By conducting theoretical analysis and numerical experiments, we demonstrate that IR in ESNs can be effectively implemented under realistic conditions without explicitly using the desired outputs as training data; in this way, UL is enabled. Furthermore, we demonstrate that applications relying on IR, such as dynamical system replication and noise filtering, can be reformulated within the UL framework. Our findings establish a theoretically sound and universally applicable IR formulation, along with its related tasks in ESNs. This work paves the way for novel predictions and highlights unresolved theoretical challenges in ESNs, particularly in the context of time-series processing methods and computational models of the brain.
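
For comparison with the unsupervised formulation above, the NumPy sketch below implements input reconstruction (IR) in an echo state network with the conventional supervised ridge readout, i.e., the baseline in which the input itself serves as the training target; the reservoir size, spectral radius, and scalings are illustrative.

    # Echo state network trained (supervised, ridge regression) to reproduce its own input.
    import numpy as np

    rng = np.random.default_rng(0)
    n_res, T = 200, 2000
    W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
    W = rng.normal(0, 1, (n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))       # spectral radius < 1 (echo state property)

    u = np.sin(0.1 * np.arange(T))[:, None]               # input time series to be reconstructed
    x = np.zeros((T, n_res))
    for t in range(1, T):
        x[t] = np.tanh(W @ x[t - 1] + W_in @ u[t])        # reservoir state update

    X, U = x[100:], u[100:]                               # discard the initial transient
    W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ U).T   # ridge readout
    print("input-reconstruction MSE:", np.mean((X @ W_out.T - U) ** 2))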


[163] 2501.11410

Orbit-Aware Split Learning: Optimizing LEO Satellite Networks for Distributed Online Learning

This paper proposes a novel split learning architecture designed to exploit the cyclical movement of Low Earth Orbit (LEO) satellites in non-terrestrial networks (NTNs). Although existing research focuses on offloading tasks to the NTN infrastructure, these approaches overlook the dynamic movement patterns of LEO satellites that can be used to efficiently distribute the learning task. In this work, we analyze how LEO satellites, from the perspective of ground terminals, can participate in a time-window-based model training. By splitting the model between a LEO and a ground terminal, the computational burden on the satellite segment is reduced, while each LEO satellite offloads the partially trained model to the next satellite in the constellation. This cyclical training process allows larger and more energy-intensive models to be deployed and trained across multiple LEO satellites, despite their limited energy resources. We formulate an optimization problem that manages radio and processing resources, ensuring the entire data is processed during each satellite pass while minimizing the energy consumption. Our results demonstrate that this approach offers a more scalable and energy-efficient way to train complex models, enhancing the capabilities of LEO satellite constellations in the context of Artificial Intelligence-driven applications.


[164] 2501.11462

On the Adversarial Vulnerabilities of Transfer Learning in Remote Sensing

The use of pretrained models from general computer vision tasks is widespread in remote sensing, significantly reducing training costs and improving performance. However, this practice also introduces vulnerabilities to downstream tasks, where publicly available pretrained models can be used as a proxy to compromise downstream models. This paper presents a novel Adversarial Neuron Manipulation method, which generates transferable perturbations by selectively manipulating single or multiple neurons in pretrained models. Unlike existing attacks, this method eliminates the need for domain-specific information, making it more broadly applicable and efficient. By targeting multiple fragile neurons, the perturbations achieve superior attack performance, revealing critical vulnerabilities in deep learning models. Experiments on diverse models and remote sensing datasets validate the effectiveness of the proposed method. This low-access adversarial neuron manipulation technique highlights a significant security risk in transfer learning models, emphasizing the urgent need for more robust defenses in their design when addressing the safety-critical remote sensing tasks.


[165] 2501.11467

Fixed Point Certificates for Reachability and Expected Rewards in MDPs

The possibility of errors in human-engineered formal verification software, such as model checkers, poses a serious threat to the purpose of these tools. An established approach to mitigate this problem are certificates -- lightweight, easy-to-check proofs of the verification results. In this paper, we develop novel certificates for model checking of Markov decision processes (MDPs) with quantitative reachability and expected reward properties. Our approach is conceptually simple and relies almost exclusively on elementary fixed point theory. Our certificates work for arbitrary finite MDPs and can be readily computed with little overhead using standard algorithms. We formalize the soundness of our certificates in Isabelle/HOL and provide a formally verified certificate checker. Moreover, we augment existing algorithms in the probabilistic model checker Storm with the ability to produce certificates and demonstrate practical applicability by conducting the first formal certification of the reference results in the Quantitative Verification Benchmark Set.


[166] 2501.11553

Clinically Ready Magnetic Microrobots for Targeted Therapies

Systemic drug administration often causes off-target effects limiting the efficacy of advanced therapies. Targeted drug delivery approaches increase local drug concentrations at the diseased site while minimizing systemic drug exposure. We present a magnetically guided microrobotic drug delivery system capable of precise navigation under physiological conditions. This platform integrates a clinical electromagnetic navigation system, a custom-designed release catheter, and a dissolvable capsule for accurate therapeutic delivery. In vitro tests showed precise navigation in human vasculature models, and in vivo experiments confirmed tracking under fluoroscopy and successful navigation in large animal models. The microrobot balances magnetic material concentration, contrast agent loading, and therapeutic drug capacity, enabling effective hosting of therapeutics despite the integration complexity of its components, offering a promising solution for precise targeted drug delivery.


[167] 2501.11570

Uncertainty Estimation in the Real World: A Study on Music Emotion Recognition

Any data annotation for subjective tasks shows potential variations between individuals. This is particularly true for annotations of emotional responses to musical stimuli. While older approaches to music emotion recognition systems frequently addressed this uncertainty problem through probabilistic modeling, modern systems based on neural networks tend to ignore the variability and focus only on predicting central tendencies of human subjective responses. In this work, we explore several methods for estimating not only the central tendencies of the subjective responses to a musical stimulus, but also for estimating the uncertainty associated with these responses. In particular, we investigate probabilistic loss functions and inference-time random sampling. Experimental results indicate that while the modeling of the central tendencies is achievable, modeling of the uncertainty in subjective responses proves significantly more challenging with currently available approaches even when empirical estimates of variations in the responses are available.


[168] 2501.11586

Compressibility Analysis for the differentiable shift-variant Filtered Backprojection Model

The differentiable shift-variant filtered backprojection (FBP) model enables the reconstruction of cone-beam computed tomography (CBCT) data for any non-circular trajectories. This method employs a deep learning technique to estimate the redundancy weights required for reconstruction, given knowledge of the specific trajectory at optimization time. However, computing the redundancy weight for each projection remains computationally intensive. This paper presents a novel approach to compress and optimize the differentiable shift-variant FBP model based on Principal Component Analysis (PCA). We apply PCA to the redundancy weights learned from sinusoidal trajectory projection data, revealing significant parameter redundancy in the original model. By integrating PCA directly into the differentiable shift-variant FBP reconstruction pipeline, we develop a method that decomposes the redundancy weight layer parameters into a trainable eigenvector matrix, compressed weights, and a mean vector. This innovative technique achieves a remarkable 97.25% reduction in trainable parameters without compromising reconstruction accuracy. As a result, our algorithm significantly decreases the complexity of the differentiable shift-variant FBP model and greatly improves training speed. These improvements make the model substantially more practical for real-world applications.
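
The compression step described above amounts to expressing the per-projection redundancy-weight vectors in a low-dimensional PCA basis (an eigenvector matrix, compressed coefficients, and a mean vector). Below is a NumPy/SVD sketch of that general idea on dummy data; the matrix sizes and the number of retained components are arbitrary assumptions, and the actual reconstruction pipeline of the paper is not reproduced.

    # PCA compression of a weight matrix via SVD: mean + low-rank basis + per-row coefficients.
    import numpy as np

    weights = np.random.rand(360, 4096)              # (num_projections, weight size), dummy data
    mean = weights.mean(axis=0)
    U, S, Vt = np.linalg.svd(weights - mean, full_matrices=False)

    k = 8                                            # retained principal components
    basis = Vt[:k]                                   # (k, 4096) trainable eigenvector matrix
    codes = (weights - mean) @ basis.T               # (360, k) compressed weights per projection
    reconstructed = codes @ basis + mean             # approximation used at reconstruction time

    params_full = weights.size
    params_compressed = basis.size + codes.size + mean.size
    print(f"parameter reduction: {100 * (1 - params_compressed / params_full):.2f}%")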


[169] 2501.11631

Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection

Keyword spotting is often implemented by attaching a keyword classifier to the encoder of an acoustic model, enabling the classification of predefined or open-vocabulary keywords. Although keyword spotting is a crucial task in various applications and can be extended to call-for-help detection in emergencies, previous methods often suffer from scalability limitations due to the retraining required to introduce new keywords or adapt to changing contexts. We explore a simple yet effective approach that leverages off-the-shelf pretrained ASR models to address these challenges, especially in call-for-help detection scenarios. Furthermore, we observed a substantial increase in false alarms when deploying a call-for-help detection system in real-world scenarios due to noise introduced by microphones or different environments. To address this, we propose a novel noise-agnostic multitask learning approach that integrates a noise classification head into the ASR encoder. Our method enhances the model's robustness to noisy environments, leading to a significant reduction in false alarms and improved overall call-for-help performance. Despite the added complexity of multitask learning, our approach is computationally efficient and provides a promising solution for call-for-help detection in real-world scenarios.


[170] 2501.11842

Harnessing Rydberg Atomic Receivers: From Quantum Physics to Wireless Communications

The intrinsic integration of Rydberg atomic receivers into wireless communication systems is proposed, by harnessing the principles of quantum physics in wireless communications. More particularly, we conceive a pair of Rydberg atomic receivers, one of which incorporates a local oscillator (LO) and is referred to as an LO-dressed receiver, while the other operates without an LO and is termed an LO-free receiver. The appropriate wireless model is developed for each configuration, elaborating on the receiver's responses to the radio frequency (RF) signal, on the potential noise sources, and on the system performance. Next, we investigate the associated distortion effects that might occur, specifically demonstrating the boundaries of the linear dynamic regions, which provides critical insights into practical implementations in wireless systems. Extensive simulation results are provided for characterizing the performance of wireless systems harnessing this pair of Rydberg atomic receivers. Our results demonstrate that they deliver complementary benefits: LO-free systems excel in proximity operations, while LO-dressed systems are eminently suitable for long-distance sensing at extremely low power levels. More specifically, LO-dressed systems achieve a significant signal-to-noise ratio (SNR) gain of approximately 44 dB over conventional RF receivers, exhibiting an effective coverage range extension over conventional RF receivers by a factor of 150. Furthermore, LO-dressed systems support higher-order quadrature amplitude modulation (QAM) at reduced symbol error rates (SER) compared to conventional RF receivers, hence significantly enhancing wireless communication performance.


[171] 2501.11902

Transferable Adversarial Attacks on Audio Deepfake Detection

Audio deepfakes pose significant threats, including impersonation, fraud, and reputation damage. To address these risks, audio deepfake detection (ADD) techniques have been developed, demonstrating success on benchmarks like ASVspoof2019. However, their resilience against transferable adversarial attacks remains largely unexplored. In this paper, we introduce a transferable GAN-based adversarial attack framework to evaluate the effectiveness of state-of-the-art (SOTA) ADD systems. By leveraging an ensemble of surrogate ADD models and a discriminator, the proposed approach generates transferable adversarial attacks that better reflect real-world scenarios. Unlike previous methods, the proposed framework incorporates a self-supervised audio model to ensure transcription and perceptual integrity, resulting in high-quality adversarial attacks. Experimental results on the benchmark dataset reveal that SOTA ADD systems exhibit significant vulnerabilities, with accuracies dropping from 98% to 26%, 92% to 54%, and 94% to 84% in white-box, gray-box, and black-box scenarios, respectively. When tested on other datasets, performance drops from 91% to 46% and from 94% to 67% were observed on the In-the-Wild and WaveFake datasets, respectively. These results highlight the significant vulnerabilities of existing ADD systems and emphasize the need to enhance their robustness against advanced adversarial threats to ensure security and reliability.


[172] 2501.11903

Finding the nearest bounded-real port-Hamiltonian system

In this paper, we consider linear time-invariant continuous control systems which are bounded real, also known as scattering passive. Our main theoretical contribution is to show the equivalence between such systems and port-Hamiltonian (PH) systems whose factors satisfy certain linear matrix inequalities. Based on this result, we propose a formulation for the problem of finding the nearest bounded-real system to a given system, and design an algorithm combining alternating optimization and Nesterov's fast gradient method. This formulation also allows us to check whether a given system is bounded real by solving a semidefinite program, and provide a PH parametrization for it. We illustrate our proposed algorithms on real and synthetic data sets.


[173] 2501.11905

Phase Transitions in Phase-Only Compressed Sensing

The goal of phase-only compressed sensing is to recover a structured signal $\mathbf{x}$ from the phases $\mathbf{z} = {\rm sign}(\mathbf{\Phi}\mathbf{x})$ under some complex-valued sensing matrix $\mathbf{\Phi}$. Exact reconstruction of the signal's direction is possible: we can reformulate it as a linear compressed sensing problem and use basis pursuit (i.e., constrained norm minimization). For $\mathbf{\Phi}$ with i.i.d. complex-valued Gaussian entries, this paper shows that the phase transition is approximately located at the statistical dimension of the descent cone of a signal-dependent norm. Leveraging this insight, we derive asymptotically precise formulas for the phase transition locations in phase-only sensing of both sparse signals and low-rank matrices. Our results prove that the minimum number of measurements required for exact recovery is smaller for phase-only measurements than for traditional linear compressed sensing. For instance, in recovering a 1-sparse signal with sufficiently large dimension, phase-only compressed sensing requires approximately 68% of the measurements needed for linear compressed sensing. This result disproves an earlier conjecture suggesting that the two phase transitions coincide. Our proof hinges on the Gaussian min-max theorem and the key observation that, up to a signal-dependent orthogonal transformation, the sensing matrix in the reformulated problem behaves as a nearly Gaussian matrix.


[174] 2501.11915

Stabilizing Optimal Control for Nonlinear Stochastic Systems: A Parametric Gradient-Based Approach

This study proposes a method for designing stabilizing suboptimal controllers for nonlinear stochastic systems. These systems include time-invariant stochastic parameters that represent uncertainty of dynamics, posing two key difficulties in optimal control. Firstly, the time-invariant stochastic nature violates the principle of optimality and Hamilton-Jacobi equations, which are fundamental tools for solving optimal control problems. Secondly, nonlinear systems must be robustly stabilized against these stochastic parameters. To overcome these difficulties simultaneously, this study presents a parametric-gradient-based method with a penalty function. A controller and cost function are parameterized using basis functions, and a gradient method is employed to optimize the controller by minimizing the parameterized cost function. Crucial challenges in this approach are parameterizing the cost function appropriately and deriving the gradient of the cost. This study provides explicit formulations of an optimally parameterized cost and its gradient. Furthermore, a suitable penalty function is proposed to ensure robust stability, even when using the gradient method. Consequently, the gradient method produces a suboptimal feedback controller that guarantees the robust stability. The effectiveness of the proposed method is demonstrated through numerical simulations, highlighting its performance in comparison with other baseline methods.


[175] 2501.11921

Goal-oriented Transmission Scheduling: Structure-guided DRL with a Unified Dual On-policy and Off-policy Approach

Goal-oriented communications prioritize application-driven objectives over data accuracy, enabling intelligent next-generation wireless systems. Efficient scheduling in multi-device, multi-channel systems poses significant challenges due to high-dimensional state and action spaces. We address these challenges by deriving key structural properties of the optimal solution to the goal-oriented scheduling problem, incorporating Age of Information (AoI) and channel states. Specifically, we establish the monotonicity of the optimal state value function (a measure of long-term system performance) w.r.t. channel states and prove its asymptotic convexity w.r.t. AoI states. Additionally, we derive the monotonicity of the optimal policy w.r.t. channel states, advancing the theoretical framework for optimal scheduling. Leveraging these insights, we propose the structure-guided unified dual on-off policy DRL (SUDO-DRL), a hybrid algorithm that combines the stability of on-policy training with the sample efficiency of off-policy methods. Through a novel structural property evaluation framework, SUDO-DRL enables effective and scalable training, addressing the complexities of large-scale systems. Numerical results show SUDO-DRL improves system performance by up to 45% and reduces convergence time by 40% compared to state-of-the-art methods. It also effectively handles scheduling in much larger systems, where off-policy DRL fails and on-policy benchmarks exhibit significant performance loss, demonstrating its scalability and efficacy in goal-oriented communications.


[176] 2501.11926

Multi-Modal Variable-Rate CSI Reconstruction for FDD Massive MIMO Systems

In frequency division duplex (FDD) systems, acquiring channel state information (CSI) at the base station (BS) traditionally relies on limited feedback from mobile terminals (MTs). However, the accuracy of channel reconstruction from feedback CSI is inherently constrained by the rate-distortion trade-off. To overcome this limitation, we propose a multi-modal channel reconstruction framework that leverages auxiliary data, such as RGB images or uplink CSI, collected at the BS. By integrating contextual information from these modalities, the framework mitigates CSI distortions caused by noise, compression, and quantization. At its core, the framework utilizes an autoencoder network capable of generating variable-length CSI, tailored for rate-adaptive multi-modal channel reconstruction. By augmenting the foundational autoencoder network using a transfer learning-based multi-modal fusion strategy, we enable accurate channel reconstruction in both single-modal and multi-modal scenarios. To train and evaluate the network under diverse and realistic wireless conditions, we construct a synthetic dataset that pairs wireless channel data with sensor data through 3D modeling and ray tracing. Simulation results demonstrate that the proposed framework achieves near-optimal beamforming gains in 5G New Radio (5G NR)-compliant scenarios, highlighting the potential of sensor data integration to improve CSI reconstruction accuracy.


[177] 2501.11938

Navigating Robot Swarm Through a Virtual Tube with Flow-Adaptive Distribution Control

With the rapid development of robot swarm technology and its diverse applications, navigating robot swarms through complex environments has emerged as a critical research direction. To ensure safe navigation and avoid potential collisions with obstacles, the concept of virtual tubes has been introduced to define safe and navigable regions. However, current control methods in virtual tubes face congestion issues, particularly in narrow virtual tubes with low throughput. To address these challenges, we first introduce the novel concepts of virtual tube area and flow capacity, and develop a new evolution model for the spatial density function. Next, we propose a novel control method that combines a modified artificial potential field (APF) for swarm navigation and density feedback control for distribution regulation, under which a saturated velocity command is designed. Then, we generate a global velocity field that not only ensures collision-free navigation through the virtual tube, but also achieves local input-to-state stability (LISS) for density tracking errors, both of which are rigorously proven. Finally, numerical simulations and realistic applications validate the effectiveness and advantages of the proposed method in managing robot swarms within narrow virtual tubes.


[178] 2501.12023

Comparative Analysis of Pre-trained Deep Learning Models and DINOv2 for Cushing's Syndrome Diagnosis in Facial Analysis

Cushing's syndrome is a condition caused by excessive glucocorticoid secretion from the adrenal cortex, often manifesting with moon facies and plethora, making facial data crucial for diagnosis. Previous studies have used pre-trained convolutional neural networks (CNNs) for diagnosing Cushing's syndrome using frontal facial images. However, CNNs are better at capturing local features, while Cushing's syndrome often presents with global facial features. Transformer-based models like ViT and SWIN, which utilize self-attention mechanisms, can better capture long-range dependencies and global features. Recently, DINOv2, a foundation model based on visual Transformers, has gained interest. This study compares the performance of various pre-trained models, including CNNs, Transformer-based models, and DINOv2, in diagnosing Cushing's syndrome. We also analyze gender bias and the impact of freezing mechanisms on DINOv2. Our results show that Transformer-based models and DINOv2 outperformed CNNs, with ViT achieving the highest F1 score of 85.74%. Both the pre-trained model and DINOv2 had higher accuracy for female samples. DINOv2 also showed improved performance when freezing parameters. In conclusion, Transformer-based models and DINOv2 are effective for Cushing's syndrome classification.


[179] 2501.12043

High-Fidelity Coherent-One-Way QKD Simulation Framework for 6G Networks: Bridging Theory and Reality

Quantum key distribution (QKD) has emerged as a promising solution for guaranteeing information-theoretic security. Inspired by this, a great amount of research effort has recently been put into designing and testing QKD systems as well as articulating preliminary application scenarios. However, due to the considerably high cost of QKD equipment and the lack of QKD communication system design tools, wide deployment of such systems and networks remains challenging. Motivated by this, this paper introduces a QKD communication system design tool. First, we articulate the key operational elements of QKD and explain the feasibility and applicability of coherent-one-way (COW) QKD solutions. Next, we focus on documenting the corresponding simulation framework as well as defining the key performance metrics, i.e., the quantum bit error rate (QBER) and the secret key rate. To verify the accuracy of the simulation framework, we design and deploy a real-world QKD setup. We perform extensive experiments for three deployments with diverse transmission distances, in the presence or absence of a QKD eavesdropper. The results reveal an acceptable match between simulations and experiments, rendering the simulation framework a suitable tool for QKD communication system design.
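
To show how the reported metrics are computed, the toy snippet below estimates QBER from a crude photon-detection model and plugs it into a generic asymptotic key-rate bound; the channel model and the BB84-style rate formula are simplifying assumptions, not the paper's COW-specific analysis.

```python
# Toy QBER and key-rate estimate: fiber loss, dark counts, and bit flips are
# lumped into simple probabilities purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

def h2(x):
    # binary entropy, guarded at the endpoints
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x) if 0 < x < 1 else 0.0

def simulate_qber(n_pulses, mu, loss_db, p_dark=1e-6, p_flip=0.01):
    eta = 10 ** (-loss_db / 10)                        # channel transmittance
    p_click = 1 - np.exp(-mu * eta) + p_dark           # per-pulse detection probability
    detected = rng.random(n_pulses) < p_click
    errors = rng.random(int(detected.sum())) < p_flip  # lumped error sources
    return errors.mean(), int(detected.sum())

qber, sifted = simulate_qber(n_pulses=10**6, mu=0.5, loss_db=10)
key_bits = max(0.0, sifted * (1 - 2 * h2(qber)))       # crude BB84-style bound per block
print(f"QBER = {qber:.3%}, detections = {sifted}, key bits ~ {key_bits:.0f}")
```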


[180] 2501.12050

Parameterised Quantum Circuits for Novel Representation Learning in Speech Emotion Recognition

Speech Emotion Recognition (SER) is a complex and challenging task in human-computer interaction due to the intricate dependencies of features and the overlapping nature of emotional expressions conveyed through speech. Although traditional deep learning methods have shown effectiveness, they often struggle to capture subtle emotional variations and overlapping states. This paper introduces a hybrid classical-quantum framework that integrates Parameterised Quantum Circuits (PQCs) with conventional Convolutional Neural Network (CNN) architectures. By leveraging quantum properties such as superposition and entanglement, the proposed model enhances feature representation and captures complex dependencies more effectively than classical methods. Experimental evaluations conducted on benchmark datasets, including IEMOCAP, RECOLA, and MSP-Improv, demonstrate that the hybrid model achieves higher accuracy in both binary and multi-class emotion classification while significantly reducing the number of trainable parameters. While a few existing studies have explored the feasibility of using Quantum Circuits to reduce model complexity, none have successfully shown how they can enhance accuracy. This study is the first to demonstrate that Quantum Circuits have the potential to improve the accuracy of SER. The findings highlight the promise of quantum machine learning (QML) to transform SER, suggesting a promising direction for future research and practical applications in emotion-aware systems.
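
A minimal, self-contained illustration of the PQC idea (parameterised rotations followed by an entangling gate, read out as Pauli-Z expectations) is given below; it is a generic two-qubit toy in NumPy, not the hybrid CNN-PQC architecture proposed in the paper.

```python
# Toy two-qubit parameterised quantum circuit: RY rotations (the trainable
# parameters) followed by a CNOT for entanglement, read out as <Z> per qubit.
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def pqc_features(theta0, theta1):
    state = np.zeros(4); state[0] = 1.0                  # |00>
    state = np.kron(ry(theta0), ry(theta1)) @ state      # parameterised rotations
    state = CNOT @ state                                 # entangling gate
    probs = state ** 2
    z0 = probs[0] + probs[1] - probs[2] - probs[3]       # <Z> on qubit 0
    z1 = probs[0] - probs[1] + probs[2] - probs[3]       # <Z> on qubit 1
    return np.array([z0, z1])

print(pqc_features(0.3, 1.2))                            # features fed to a classical head
```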


[181] 2501.12082

A Multi-annotated and Multi-modal Dataset for Wide-angle Video Quality Assessment

Wide-angle video is favored for its wide viewing angle and ability to capture a large area of scenery, making it an ideal choice for sports and adventure recording. However, wide-angle video is prone to deformation, exposure issues, and other distortions, resulting in poor video quality and a degraded viewing experience, which may seriously hinder its application in fields such as competitive sports. Up to now, few explorations have focused on the quality assessment of wide-angle video. This deficiency primarily stems from the absence of a specialized dataset for wide-angle videos. To bridge this gap, we construct the first Multi-annotated and multi-modal Wide-angle Video quality assessment (MWV) dataset. Then, the performance of state-of-the-art video quality methods on the MWV dataset is investigated through inter-dataset and intra-dataset testing. Experimental results show that these methods exhibit significant limitations in their applicability to wide-angle videos.


[182] 2501.12102

Proxies for Distortion and Consistency with Applications for Real-World Image Restoration

Real-world image restoration deals with the recovery of images suffering from an unknown degradation. This task is typically addressed while being given only degraded images, without their corresponding ground-truth versions. In this hard setting, designing and evaluating restoration algorithms becomes highly challenging. This paper offers a suite of tools that can serve both the design and assessment of real-world image restoration algorithms. Our work starts by proposing a trained model that predicts the chain of degradations a given real-world measured input has gone through. We show how this estimator can be used to approximate the consistency -- the match between the measurements and any proposed recovered image. We also use this estimator as a guiding force for the design of a simple and highly-effective plug-and-play real-world image restoration algorithm, leveraging a pre-trained diffusion-based image prior. Furthermore, this work proposes no-reference proxy measures of MSE and LPIPS, which, without access to the ground-truth images, allow ranking of real-world image restoration algorithms according to their (approximate) MSE and LPIPS. The proposed suite provides a versatile, first-of-its-kind framework for evaluating and comparing blind image restoration algorithms in real-world scenarios.
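
The consistency proxy can be sketched schematically: re-apply the estimated degradation chain to a candidate restoration and measure how well the result matches the observed measurement. The degradation-estimator interface and the toy blur/downsample operators below are assumptions; the paper learns the estimator from data.

```python
# Schematic consistency score: lower is better, i.e., the re-degraded candidate
# should match the actual measurement.
import numpy as np
from scipy.ndimage import gaussian_filter

def apply_chain(img, chain):
    # chain: list of (op_name, params) as a degradation estimator might output
    out = img.copy()
    for op, params in chain:
        if op == "blur":
            out = gaussian_filter(out, sigma=params["sigma"])
        elif op == "downsample":
            out = out[::params["factor"], ::params["factor"]]
    return out

def consistency(candidate, measurement, estimated_chain):
    return float(np.mean((apply_chain(candidate, estimated_chain) - measurement) ** 2))

x_gt = np.random.rand(128, 128)                         # unknown clean image (toy)
chain = [("blur", {"sigma": 1.5}), ("downsample", {"factor": 2})]
y = apply_chain(x_gt, chain)                            # observed degraded measurement
print(consistency(x_gt, y, chain))                      # ~0: perfectly consistent candidate
print(consistency(np.random.rand(128, 128), y, chain))  # larger: inconsistent candidate
```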


[183] 2501.12113

Dual NUP Representations and Min-Maximization in Factor Graphs

Normals with unknown parameters (NUP) can be used to convert nontrivial model-based estimation problems into iterations of linear least-squares or Gaussian estimation problems. In this paper, we extend this approach by augmenting factor graphs with convex-dual variables and pertinent NUP representations. In particular, in a state space setting, we propose a new iterative forward-backward algorithm that is dual to a recently proposed backward-forward algorithm.


[184] 2501.12122

DOTA-ME-CS: Daily Oriented Text Audio-Mandarin English-Code Switching Dataset

Code-switching, the alternation between two or more languages within communication, poses great challenges for Automatic Speech Recognition (ASR) systems. Existing models and datasets are limited in their ability to effectively handle these challenges. To address this gap and foster progress in code-switching ASR research, we introduce DOTA-ME-CS, a daily-oriented text-audio Mandarin-English code-switching dataset, which consists of 18.54 hours of audio data, including 9,300 recordings from 34 participants. To enhance the dataset's diversity, we apply artificial intelligence (AI) techniques such as AI timbre synthesis, speed variation, and noise addition, thereby increasing the complexity and scalability of the task. The dataset is carefully curated to ensure both diversity and quality and is accompanied by detailed data analysis, providing a robust resource for researchers addressing the intricacies of bilingual speech recognition. We further demonstrate the dataset's potential for future research. The DOTA-ME-CS dataset, along with accompanying code, will be made publicly available.


[185] 2501.12194

An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication

Wakeword detection plays a critical role in enabling AI assistants to listen to user voices and interact effectively. However, for languages other than English, there is a significant lack of pre-trained wakeword models. Additionally, systems that merely determine the presence of a wakeword can pose serious privacy concerns. In this paper, we propose an end-to-end approach that trains wakewords for non-English languages, particularly Korean, and uses them to develop a Voice Authentication model to protect user privacy. Our implementation employs the open-source platform OpenWakeWord, which performs wakeword detection using an FCN (Fully-Connected Network) architecture. Once a wakeword is detected, our custom-developed code calculates cosine similarity for robust user authentication. Experimental results demonstrate the effectiveness of our approach, achieving Equal Error Rates (EER) of 16.79% for wakeword detection and 6.6% for voice authentication. These findings highlight the model's potential in providing secure and accurate wakeword detection and authentication for Korean users.
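
The verification step described above reduces to a cosine-similarity comparison between embeddings; a minimal sketch follows, in which the embedding extractor and the acceptance threshold are placeholders to be tuned (e.g., to the EER operating point).

```python
# Minimal cosine-similarity speaker verification after wakeword detection.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def authenticate(utterance_embedding, enrolled_embedding, threshold=0.7):
    # In practice the threshold is tuned on held-out data, e.g., at the EER point.
    return cosine_similarity(utterance_embedding, enrolled_embedding) >= threshold

enrolled = np.random.randn(192)              # toy enrolled speaker embedding
probe = enrolled + 0.1 * np.random.randn(192)
print(authenticate(probe, enrolled))         # True for a matching speaker (toy)
```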


[186] 2501.12216

RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression

Video encoders optimize compression for human perception by minimizing reconstruction error under bit-rate constraints. In many modern applications such as autonomous driving, an overwhelming majority of videos serve as input for AI systems performing tasks like object recognition or segmentation, rather than being watched by humans. It is therefore useful to optimize the encoder for a downstream task instead of for perceptual image quality. However, a major challenge is how to combine such downstream optimization with existing standard video encoders, which are highly efficient and popular. Here, we address this challenge by controlling the Quantization Parameters (QPs) at the macro-block level to optimize the downstream task. This granular control allows us to prioritize encoding for task-relevant regions within each frame. We formulate this optimization problem as a Reinforcement Learning (RL) task, where the agent learns to balance the long-term implications of choosing QPs on both task performance and bit-rate constraints. Notably, our policy does not require the downstream task as an input during inference, making it suitable for streaming applications and edge devices such as vehicles. We demonstrate significant improvements in two tasks: car detection and ROI (saliency) encoding. Our approach improves task performance for a given bit rate compared to traditional task-agnostic encoding methods, paving the way for more efficient task-aware video compression.
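
One way to picture the agent's objective is a reward that trades off task performance against a bit-budget penalty given a per-macro-block QP map; the weighting and the surrogate task score in the sketch below are illustrative assumptions, not the paper's formulation.

```python
# Illustrative reward for block-level QP control: task score minus a penalty
# for exceeding the frame's bit budget.
import numpy as np

def reward(task_score, frame_bits, bit_budget, lam=1.0):
    # task_score: e.g., a detection-quality proxy on the decoded frame, in [0, 1]
    # frame_bits: bits spent by the encoder given the chosen QP map
    overshoot = max(0.0, frame_bits - bit_budget) / bit_budget
    return task_score - lam * overshoot

qp_map = np.full((45, 80), 32)            # one QP per 16x16 macro-block (toy 720p grid)
qp_map[20:30, 30:50] -= 8                 # spend more bits on a task-relevant region
print(reward(task_score=0.71, frame_bits=1.2e6, bit_budget=1.0e6))
```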


[187] 2501.12235

DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains

Low-light image enhancement (LLE) aims to improve the visual quality of images captured in poorly lit conditions, which often suffer from low brightness, low contrast, noise, and color distortions. These issues hinder the performance of computer vision tasks such as object detection, facial recognition, and autonomous driving. Traditional enhancement techniques, such as multi-scale fusion and histogram equalization, fail to preserve fine details and often struggle with maintaining the natural appearance of enhanced images under complex lighting conditions. Although the Retinex theory provides a foundation for image decomposition, it often amplifies noise, leading to suboptimal image quality. In this paper, we propose the Dual Light Enhance Network (DLEN), a novel architecture that incorporates two distinct attention mechanisms, considering both spatial and frequency domains. Our model introduces a learnable wavelet transform module in the illumination estimation phase, preserving high- and low-frequency components to enhance edge and texture details. Additionally, we design a dual-branch structure that leverages the power of the Transformer architecture to enhance both the illumination and structural components of the image. Through extensive experiments, our model outperforms state-of-the-art methods on standard benchmarks. Code is available here: https://github.com/LaLaLoXX/DLEN
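
The low-/high-frequency split that a learnable wavelet module operates on can be illustrated with a fixed single-level 2D Haar transform; in the sketch below the filters are the standard Haar pair, whereas the paper learns them.

```python
# Single-level 2D Haar transform: splits an image into one low-frequency band
# and three detail bands (pure NumPy, fixed filters).
import numpy as np

def haar_dwt2(img):
    # img: 2D array with even height and width
    a = (img[0::2, :] + img[1::2, :]) / 2.0       # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2.0       # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0          # low-low: illumination-like content
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0          # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0          # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0          # diagonal detail (edges/texture)
    return ll, lh, hl, hh

img = np.random.rand(256, 256)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape, hh.shape)                         # (128, 128) each
```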


[188] 2501.12236

Fast sparse optimization via adaptive shrinkage

The need for fast sparse optimization is emerging, e.g., to deal with large-dimensional data-driven problems and to track time-varying systems. In the framework of linear sparse optimization, the iterative shrinkage-thresholding algorithm is a valuable method to solve Lasso, which is particularly appreciated for its ease of implementation. Nevertheless, it converges slowly. In this paper, we develop a proximal method, based on logarithmic regularization, which turns out to be an iterative shrinkage-thresholding algorithm with an adaptive shrinkage hyperparameter. This adaptivity substantially enhances the trajectory of the algorithm in a way that yields faster convergence, while keeping the simplicity of the original method. Our contribution is twofold: on the one hand, we derive and analyze the proposed algorithm; on the other hand, we validate its fast convergence via numerical experiments and we discuss the performance with respect to state-of-the-art algorithms.
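
A compact reference point is plain ISTA for Lasso, shown below together with a crude "adaptive shrinkage" variant in which the per-iteration threshold is reweighted by the current iterate, as a log-penalty reweighting would suggest; the adaptive rule shown is a guess for illustration only, since the paper derives its own update.

```python
# ISTA for Lasso with an optional, purely illustrative adaptive threshold.
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, y, lam, n_iter=500, adaptive=False, eps=1e-2):
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        if adaptive:
            # illustrative log-penalty reweighting: shrink large entries less
            tau = lam / (L * (np.abs(x) + eps))
        else:
            tau = lam / L                          # classical ISTA threshold
        x = soft_threshold(z, tau)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((80, 200))
x_true = np.zeros(200)
x_true[rng.choice(200, 8, replace=False)] = rng.standard_normal(8)
y = A @ x_true + 0.01 * rng.standard_normal(80)
x_hat = ista(A, y, lam=0.1, adaptive=True)
print(np.count_nonzero(np.abs(x_hat) > 1e-3))      # recovered support size (toy check)
```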


[189] 2501.12256

Lie-Bracket Nash Equilibrium Seeking with Bounded Update Rates for Noncooperative Games

This paper proposes a novel approach for local convergence to Nash equilibrium in quadratic noncooperative games based on a distributed Lie-bracket extremum seeking control scheme. This is the first instance of noncooperative games being tackled in a model-free fashion using extremum seeking with bounded update rates. In particular, the stability analysis is carried out using Lie-bracket approximation and Lyapunov's direct method. We quantify the size of the ultimate small residual sets around the Nash equilibrium and illustrate the theoretical results numerically on an example in an oligopoly setting.
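
For intuition, the toy below simulates a generic bounded-update-rate extremum seeking law from the literature on a single quadratic cost, xdot = sqrt(alpha*omega)*cos(omega*t + k*J(x)), whose speed never exceeds sqrt(alpha*omega); this single-player example and its gains are illustrative only and do not reproduce the paper's distributed game-theoretic scheme.

```python
# Toy bounded-update-rate extremum seeking on J(x) = (x - 2)^2.
import numpy as np

def simulate(x0=0.0, alpha=0.5, omega=50.0, k=2.0, dt=1e-3, T=20.0):
    x, t = x0, 0.0
    xs = []
    while t < T:
        J = (x - 2.0) ** 2
        # velocity magnitude is bounded by sqrt(alpha * omega) at all times
        x += dt * np.sqrt(alpha * omega) * np.cos(omega * t + k * J)
        t += dt
        xs.append(x)
    return np.array(xs)

traj = simulate()
print(traj[-1])   # oscillates in a small residual set around the minimizer x* = 2
```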


[190] 2501.12279

Spatial exponential decay of perturbations in optimal control of general evolution equations

We analyze the robustness of optimally controlled evolution equations with respect to spatially localized perturbations. We prove that if the involved operators are domain-uniformly stabilizable and detectable, then these localized perturbations only have a local effect on the optimal solution. We characterize this domain-uniform stabilizability and detectability for the transport equation with constant transport velocity, showing that even for unitary semigroups, optimality implies exponential damping. Finally, we extend our result to the case of a space-dependent transport velocity. Numerical examples in one space dimension complement the theoretical results.


[191] 2501.12384

CCESAR: Coastline Classification-Extraction From SAR Images Using CNN-U-Net Combination

In this article, we improve the deep learning solution for coastline extraction from Synthetic Aperture Radar (SAR) images by proposing a two-stage model involving image classification followed by segmentation. We hypothesize that a single segmentation model, as usually used for coastline detection, is insufficient to characterize different coastline types. We demonstrate that the need for a two-stage workflow persists across different compression levels of these images. Our results from experiments using a combination of CNN and U-Net models on Sentinel-1 images show that the two-stage workflow, coastline classification-extraction from SAR images (CCESAR), outperforms a single U-Net segmentation model.
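
Structurally, the two-stage workflow routes each SAR tile through a coastline-type classifier and then a type-specific segmentation model; the sketch below uses stand-in networks (a tiny CNN classifier and single convolutions in place of U-Nets) purely to show the control flow, not the CCESAR models.

```python
# Classify-then-segment control flow with toy stand-in models.
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                           nn.Flatten(), nn.Linear(8, 3))           # 3 coastline types (toy)
segmenters = {t: nn.Conv2d(1, 1, 3, padding=1) for t in range(3)}    # stand-ins for U-Nets

def two_stage_coastline(tile):
    with torch.no_grad():
        coast_type = int(classifier(tile).argmax(dim=1))   # stage 1: classify coastline type
        mask = segmenters[coast_type](tile)                 # stage 2: type-specific segmentation
    return coast_type, (mask.sigmoid() > 0.5)

tile = torch.randn(1, 1, 256, 256)                           # toy Sentinel-1 tile
coast_type, mask = two_stage_coastline(tile)
print(coast_type, mask.shape)
```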


[192] 2501.12385

Audio Texture Manipulation by Exemplar-Based Analogy

Audio texture manipulation involves modifying the perceptual characteristics of a sound to achieve specific transformations, such as adding, removing, or replacing auditory elements. In this paper, we propose an exemplar-based analogy model for audio texture manipulation. Instead of conditioning on text-based instructions, our method uses paired speech examples, where one clip represents the original sound and another illustrates the desired transformation. The model learns to apply the same transformation to new input, allowing for the manipulation of sound textures. We construct a quadruplet dataset representing various editing tasks, and train a latent diffusion model in a self-supervised manner. We show through quantitative evaluations and perceptual studies that our model outperforms text-conditioned baselines and generalizes to real-world, out-of-distribution, and non-speech scenarios. Project page: https://berkeley-speech-group.github.io/audio-texture-analogy/