New articles on Electrical Engineering and Systems Science


[1] 2405.00678

Low-cost modular devices for on-road vehicle detection and characterisation

Detecting and characterising vehicles is one of the purposes of embedded systems used in intelligent environments. An analysis of a vehicle's characteristics can reveal inappropriate or dangerous behaviour. This detection makes it possible to issue sanctions or notify emergency services so that early and practical action can be taken. Vehicle detection and characterisation systems employ complex sensors such as video cameras, especially in urban environments. These sensors offer high accuracy, but the price and computational requirements are directly proportional to their performance. This article introduces a system based on modular devices that is economical and has a low computational cost. These devices use ultrasonic sensors to detect the speed and length of vehicles. The measurement accuracy is improved through the collaboration of the device modules. The experiments were performed using multiple modules oriented to different angles. This module is coupled with another specifically designed to detect distance using the previous modules' speed and length data. The collaboration between different modules reduces the relative speed error, ranging from 1 to 5 depending on the angle configuration used in the modules.


[2] 2405.00681

Delay and Overhead Efficient Transmission Scheduling for Federated Learning in UAV Swarms

This paper studies the wireless scheduling design to coordinate the transmissions of (local) model parameters of federated learning (FL) for a swarm of unmanned aerial vehicles (UAVs). The overall goal of the proposed design is to realize the FL training and aggregation processes with a central aggregator exploiting the sensory data collected by the UAVs, while accounting for the multi-hop wireless network formed by the UAVs. Such transmissions of model parameters over the UAV-based wireless network potentially cause large transmission delays and overhead. Our proposed framework smartly aggregates local model parameters trained by the UAVs while efficiently transmitting the underlying parameters to the central aggregator in each FL global round. We theoretically show that the proposed scheme achieves minimal delay and communication overhead. Extensive numerical experiments demonstrate the superiority of the proposed scheme compared to other baselines.


[3] 2405.00682

SynthBrainGrow: Synthetic Diffusion Brain Aging for Longitudinal MRI Data Generation in Young People

Synthetic longitudinal brain MRI simulates brain aging and would enable more efficient research on neurodevelopmental and neurodegenerative conditions. Synthetically generated, age-adjusted brain images could serve as valuable alternatives to costly longitudinal imaging acquisitions, serve as internal controls for studies looking at the effects of environmental or therapeutic modifiers on brain development, and allow data augmentation for diverse populations. In this paper, we present a diffusion-based approach called SynthBrainGrow for synthetic brain aging with a two-year step. To validate the feasibility of using synthetically-generated data on downstream tasks, we compared structural volumetrics of two-year-aged brains against synthetically-aged brain MRI. Results show that SynthBrainGrow can accurately capture substructure volumetrics and simulate structural changes such as ventricle enlargement and cortical thinning. Our approach provides a novel way to generate longitudinal brain datasets from cross-sectional data to enable augmented training and benchmarking of computational tools for analyzing lifespan trajectories. This work signifies an important advance in generative modeling to synthesize realistic longitudinal data with limited lifelong MRI scans. The code is available at XXX.


[4] 2405.00683

Frequency-Guided U-Net: Leveraging Attention Filter Gates and Fast Fourier Transformation for Enhanced Medical Image Segmentation

Purpose: Medical imaging diagnosis faces challenges, including low-resolution images due to machine artifacts and patient movement. This paper presents the Frequency-Guided U-Net (GFNet), a novel approach for medical image segmentation that addresses challenges associated with low-resolution images and inefficient feature extraction. Approach: In response to challenges related to computational cost and complexity in feature extraction, our approach introduces the Attention Filter Gate. Departing from traditional spatial domain learning, our model operates in the frequency domain using the fast Fourier transform (FFT). A strategically placed weighted learnable matrix filters features, reducing computational costs. FFT is integrated between up-sampling and down-sampling, mitigating issues of throughput, latency, and FLOPs, and enhancing feature extraction. Results: Experimental outcomes shed light on model performance. The Attention Filter Gate, a pivotal component of GFNet, achieves competitive segmentation accuracy (Mean Dice: 0.8366, Mean IoU: 0.7962). Comparatively, the Attention Gate model surpasses others, with a Mean Dice of 0.9107 and a Mean IoU of 0.8685. The widely-used U-Net baseline demonstrates satisfactory performance (Mean Dice: 0.8680, Mean IoU: 0.8268). Conclusion: This work introduces GFNet as an efficient and accurate method for medical image segmentation. By leveraging the frequency domain and attention filter gates, GFNet addresses key challenges of information loss, computational cost, and feature extraction limitations. This novel approach offers potential advancements for computer-aided diagnosis and other healthcare applications. Keywords: Medical Segmentation, Neural Networks,
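
The Dice and IoU figures quoted in this abstract are overlap scores between a predicted mask and a reference mask. The following is a minimal sketch of how the two scores are commonly computed for a single pair of binary masks. The function name `dice_and_iou`, the flat-list mask representation, and the empty-mask convention are assumptions made for illustration and are not taken from the paper.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two binary masks given as flat lists of 0/1 values.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    intersection = sum(p & t for p, t in zip(pred, target))
    pred_sum, target_sum = sum(pred), sum(target)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks are empty: treat as perfect agreement (a convention,
        # assumed here; other conventions exist).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


# Two 6-pixel masks that overlap on 2 pixels.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
```

For a single mask pair the two scores are tied by Dice = 2·IoU / (1 + IoU). Means taken over many images, such as those reported above, need not satisfy that identity exactly.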


[5] 2405.00712

SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning

Human Activity Recognition (HAR) is a well-studied field with research dating back to the 1980s. Over time, HAR technologies have evolved significantly from manual feature extraction, rule-based algorithms, and simple machine learning models to powerful deep learning models, and from one sensor type to a diverse array of sensing modalities. The scope has also expanded from recognising a limited set of activities to encompassing a larger variety of both simple and complex activities. However, there still exist many challenges that hinder advancement in complex activity recognition using modern deep learning methods. In this paper, we comprehensively systematise factors leading to inaccuracy in complex HAR, such as data variety and model capacity. Among many sensor types, we give more attention to wearable and camera sensors due to their prevalence. Through this Systematisation of Knowledge (SoK) paper, readers can gain a solid understanding of the development history and existing challenges of HAR, different categorisations of activities, obstacles in deep learning-based complex HAR that impact accuracy, and potential research directions.


[6] 2405.00714

Multi-Band mm-Wave Measurement Platform Towards Environment-Aware Beam Management

Agile beam management is key for providing seamless millimeter wave (mm-wave) connectivity given the site-specific spatio-temporal variations of the mm-wave channel. Leveraging non radio frequency (RF) sensor inputs for environment awareness, e.g. via machine learning (ML) techniques, can greatly enhance RF-based beam steering. To overcome the lack of diverse publicly available multi-modal mm-wave datasets for the design and evaluation of such novel beam steering approaches, we demonstrate our software-defined radio multi-band mm-wave measurement platform which integrates multi-modal sensors towards environment-aware beam management.


[7] 2405.00719

EEG-Deformer: A Dense Convolutional Transformer for Brain-computer Interfaces

Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine temporal dynamics of EEG signals. To overcome this limitation, we introduce EEG-Deformer, which incorporates two main novel components into a CNN-Transformer: (1) a Hierarchical Coarse-to-Fine Transformer (HCT) block that integrates a Fine-grained Temporal Learning (FTL) branch into Transformers, effectively discerning coarse-to-fine temporal patterns; and (2) a Dense Information Purification (DIP) module, which utilizes multi-level, purified temporal information to enhance decoding accuracy. Comprehensive experiments on three representative cognitive tasks consistently verify the generalizability of our proposed EEG-Deformer, demonstrating that it either outperforms existing state-of-the-art methods or is comparable to them. Visualization results show that EEG-Deformer learns from neurophysiologically meaningful brain regions for the corresponding cognitive tasks. The source code can be found at https://github.com/yi-ding-cs/EEG-Deformer.


[8] 2405.00720

A Novel Machine Learning-based Equalizer for a Downstream 100G PAM-4 PON

A frequency-calibrated SCINet (FC-SCINet) equalizer is proposed for downstream 100G PON with 28.7 dB path loss. At 5 km, FC-SCINet improves the BER by 88.87% compared to FFE and a 3-layer DNN with 10.57% lower complexity.


[9] 2405.00721

Optimizing Brain-Computer Interface Performance: Advancing EEG Signals Channel Selection through Regularized CSP and SPEA II Multi-Objective Optimization

Brain-computer interface systems and the recording of brain activity have garnered significant attention across a diverse spectrum of applications. EEG signals have emerged as a modality for recording neural electrical activity. Among the methodologies designed for feature extraction from EEG data, the regularized common spatial pattern (RCSP) method has proven to be an effective approach, particularly in the context of motor imagery (MI) tasks. RCSP exhibits efficacy in the discrimination and classification of EEG signals. In optimizing the performance of this method, our research extends to a comparative analysis with conventional CSP techniques, as well as optimized methodologies designed for similar applications. Notably, we employ the meta-heuristic multi-objective Strength Pareto Evolutionary Algorithm II (SPEA-II) as a pivotal component of our research paradigm. This is a state-of-the-art approach to selecting a subset of channels from a multichannel EEG signal in MI tasks. Our main objective is to formulate an optimum channel selection strategy aimed at identifying the most pertinent subset of channels from the multi-dimensional electroencephalogram (EEG) signals. One of the primary objectives of channel selection in EEG signal analysis is to reduce the channel count, which enhances user comfort when utilizing gel-based EEG electrodes. Additionally, within this research, we took advantage of ensemble learning models as a component of our decision-making. This technique serves to mitigate the challenges associated with overfitting, especially when confronted with an extensive array of potentially redundant EEG channels and data noise. Our findings not only affirm the performance of RCSP in MI-based BCI systems, but also underscore the significance of channel selection strategies and ensemble learning techniques in optimizing the performance of EEG signal classification.


[10] 2405.00723

EEG_RL-Net: Enhancing EEG MI Classification through Reinforcement Learning-Optimised Graph Neural Networks

Brain-Computer Interfaces (BCIs) rely on accurately decoding electroencephalography (EEG) motor imagery (MI) signals for effective device control. Graph Neural Networks (GNNs) outperform Convolutional Neural Networks (CNNs) in this regard, by leveraging the spatial relationships between EEG electrodes through adjacency matrices. The EEG_GLT-Net framework, featuring the state-of-the-art EEG_GLT adjacency matrix method, has notably enhanced EEG MI signal classification, evidenced by an average accuracy of 83.95% across 20 subjects on the PhysioNet dataset. This significantly exceeds the 76.10% accuracy rate achieved using the Pearson Correlation Coefficient (PCC) method within the same framework. In this research, we advance the field by applying a Reinforcement Learning (RL) approach to the classification of EEG MI signals. Our innovative method empowers the RL agent, enabling not only the classification of EEG MI data points with higher accuracy, but also the effective identification of EEG MI data points that are less distinct. We present the EEG_RL-Net, an enhancement of the EEG_GLT-Net framework, which incorporates the trained EEG GCN Block from EEG_GLT-Net at an adjacency matrix density of 13.39% alongside the RL-centric Dueling Deep Q Network (Dueling DQN) block. The EEG_RL-Net model showcases exceptional classification performance, achieving an unprecedented average accuracy of 96.40% across 20 subjects within 25 milliseconds. This model illustrates the transformative effect of RL in EEG MI time point classification.


[11] 2405.00724

Baseline Drift Tolerant Signal Encoding for ECG Classification with Deep Learning

Common artefacts such as baseline drift, rescaling, and noise critically limit the performance of machine learning-based automated ECG analysis and interpretation. This study proposes Derived Peak (DP) encoding, a non-parametric method that generates signed spikes corresponding to zero crossings of the signal's first and second-order time derivatives. Notably, DP encoding is invariant to shift and scaling artefacts, and its implementation is further simplified by the absence of user-defined parameters. DP encoding was used to encode the 12-lead ECG data from the PTB-XL dataset (n=18,869 participants) and was fed to 1D-ResNet-18 models trained to identify myocardial infarction, conductive deficits and ST-segment abnormalities. Robustness to artefacts was assessed by corrupting ECG data with sinusoidal baseline drift, shift, rescaling and noise, before encoding. The addition of these artefacts resulted in a significant drop in accuracy for seven other methods from prior art, while DP encoding maintained a baseline AUC of 0.88 under drift, shift and rescaling. DP achieved superior performance to unencoded inputs in the presence of shift (AUC under 1mV shift: 0.91 vs 0.62), and rescaling artefacts (AUC 0.91 vs 0.79). Thus, DP encoding is a simple method by which robustness to common ECG artefacts may be improved for automated ECG analysis and interpretation.
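
The shift and scaling invariance claimed in this abstract follows from the fact that the sign pattern of a derivative is unchanged when the signal is offset by a constant or multiplied by a positive factor. The sketch below illustrates the idea on a short integer sequence using finite differences. It is a rough reading of the abstract and not the authors' implementation: the function names, the use of discrete differences, and the spike sign convention (the sign of the derivative after the crossing) are all assumptions.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def diff(xs):
    """First-order finite difference of a sequence."""
    return [b - a for a, b in zip(xs, xs[1:])]


def zero_crossing_spikes(d):
    """Emit a signed spike wherever consecutive samples of d change sign.

    Assumed convention: the spike carries the sign of the sample after the
    crossing; samples that are exactly zero produce no spike.
    """
    return [sign(cur) if sign(prev) * sign(cur) < 0 else 0
            for prev, cur in zip(d, d[1:])]


def dp_encode(signal):
    """Spike trains from the first and second finite differences."""
    d1 = diff(signal)
    d2 = diff(d1)
    return zero_crossing_spikes(d1), zero_crossing_spikes(d2)


signal = [0, 1, 3, 2, 0, 1, 4, 2]
shifted_scaled = [3 * x + 7 for x in signal]  # positive rescaling plus offset

# The encoding depends only on derivative signs, so both inputs agree.
same = dp_encode(signal) == dp_encode(shifted_scaled)
```

Integer samples are used so that the comparison is exact. With floating-point data, differences that are close to zero can change sign under rescaling because of rounding, so a practical implementation would need to decide how to treat near-zero values.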


[12] 2405.00725

Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data

This research paper explores ways to apply Federated Learning (FL) and Differential Privacy (DP) techniques to population-scale Electrocardiogram (ECG) data. The study learns a multi-label ECG classification model using FL and DP based on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. The FL approach allowed collaborative model training without sharing raw data between hospitals while building robust ECG classification models for diagnosing various cardiac conditions. These accurate ECG classification models can facilitate diagnosis while preserving patient confidentiality using FL and DP techniques. Our results show that the performance achieved using our implementation of the FL approach is comparable to that of the pooled approach, where the model is trained on the aggregated data from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model compared to single-site training. In addition, this study showcases the trade-off between model performance and data privacy by employing DP during model training. Our code is available at https://github.com/vikhyatt/Hospital-FL-DP.
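
The abstract does not state which aggregation rule is used to combine the hospitals' models. One common choice in federated learning is federated averaging, in which the server averages client parameters weighted by each client's number of training samples. The sketch below assumes that rule purely for illustration; the function name and the flat-list parameter representation are hypothetical and are not taken from the linked repository.

```python
def federated_average(client_params, client_sizes):
    """Sample-size-weighted average of per-client parameter vectors.

    client_params: list of equal-length lists of floats, one per client.
    client_sizes:  number of training samples held by each client.
    Illustrative only; assumes a FedAvg-style rule, which the abstract
    does not confirm.
    """
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical sites: the larger site pulls the average towards its values.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only parameter vectors and sample counts cross site boundaries in this scheme, which is what lets the raw ECG tracings stay within each hospital.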


[13] 2405.00726

Unveiling Thoughts: A Review of Advancements in EEG Brain Signal Decoding into Text

The conversion of brain activity into text using electroencephalography (EEG) has gained significant traction in recent years. Many researchers are working to develop new models to decode EEG signals into text form. Although this area has shown promising developments, it still faces numerous challenges that necessitate further improvement. It's important to outline this area's recent developments and future research directions. In this review article, we thoroughly summarize the progress in EEG-to-text conversion. Firstly, we talk about how EEG-to-text technology has grown and what problems we still face. Secondly, we discuss existing techniques used in this field. This includes methods for collecting EEG data, the steps to process these signals, and the development of systems capable of translating these signals into coherent text. We conclude with potential future research directions, emphasizing the need for enhanced accuracy, reduced system constraints, and the exploration of novel applications across varied sectors. By addressing these aspects, this review aims to contribute to developing more accessible and effective Brain-Computer Interface (BCI) technology for a broader user base.


[14] 2405.00727

Generalised envelope spectrum-based signal-to-noise objectives: Formulation, optimisation and application for gear fault detection under time-varying speed conditions

In vibration-based condition monitoring, optimal filter design improves fault detection by enhancing weak fault signatures within vibration signals. This process involves optimising a derived objective function from a defined objective. The objectives are often based on proxy health indicators to determine the filter's parameters. However, these indicators can be compromised by irrelevant extraneous signal components and fluctuating operational conditions, affecting the filter's efficacy. Fault detection primarily uses the fault component's prominence in the squared envelope spectrum, quantified by a squared envelope spectrum-based signal-to-noise ratio. New optimal filter objective functions are derived from the proposed generalised envelope spectrum-based signal-to-noise objective for machines operating under variable speed conditions. Instead of optimising proxy health indicators, the optimal filter coefficients of the formulation directly maximise the squared envelope spectrum-based signal-to-noise ratio over targeted frequency bands using standard gradient-based optimisers. Four derived objective functions from the proposed objective effectively outperform five prominent methods in tests on three experimental datasets.


[15] 2405.00733

Joint ADS-B in 5G for Hierarchical Aerial Networks: Performance Analysis and Optimization

Unmanned aerial vehicles (UAVs) are widely applied in multiple fields, which emphasizes the challenge of obtaining UAV flight information to ensure airspace safety. UAVs equipped with automatic dependent surveillance-broadcast (ADS-B) devices are capable of sending flight information to nearby aircraft and ground stations (GSs). However, the saturation of the limited frequency bands of ADS-B leads to interference among UAVs and impairs the GSs' monitoring of civil planes. To address this issue, the integration of the 5th generation mobile communication technology (5G) with ADS-B is proposed for UAV operations in this paper. Specifically, a hierarchical structure is proposed, in which the high-altitude central UAV is equipped with ADS-B and the low-altitude central UAV utilizes 5G modules to transmit flight information. Meanwhile, based on the mobile edge computing technique, the flight information of sub-UAVs is offloaded to the central UAV for further processing, and then transmitted to the GS. We present a deterministic model and a stochastic geometry based model to build the air-to-ground channel and air-to-air channel, respectively. The effectiveness of the proposed monitoring system is verified via simulations and experiments. This research contributes to improving airspace safety and advancing air traffic flow management.


[16] 2405.00734

EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations

Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Manifold Attention and Confidence Stratification (MACS) to diagnose neurodegenerative disorders based on EEG signals sourced from four centers with unreliable annotations. The MACS framework's effectiveness stems from these features: 1) The Augmentor generates various EEG-represented brain variants to enrich the data space; 2) The Switcher enhances the feature space for trusted samples and reduces overfitting on incorrectly labeled samples; 3) The Encoder uses the Riemannian manifold and Euclidean metrics to capture spatiotemporal variations and dynamic synchronization in EEG; 4) The Projector, equipped with dual heads, monitors consistency across multiple brain variants and ensures diagnostic accuracy; 5) The Stratifier adaptively stratifies learned samples by confidence levels throughout the training process; 6) Forward and backpropagation in MACS are constrained by confidence stratification to stabilize the learning system amid unreliable annotations. Our subject-independent experiments, conducted on both neurocognitive and movement disorders using cross-center corpora, have demonstrated superior performance compared to existing related algorithms. This work not only improves EEG-based diagnostics for cross-center and small-setting brain diseases but also offers insights into extending MACS techniques to other data analyses, tackling data heterogeneity and annotation unreliability in multimedia and multimodal content understanding.


[17] 2405.00736

Joint Signal Detection and Automatic Modulation Classification via Deep Learning

Signal detection and modulation classification are two crucial tasks in various wireless communication systems. Different from prior works that investigate them independently, this paper studies the joint signal detection and automatic modulation classification (AMC) by considering a realistic and complex scenario, in which multiple signals with different modulation schemes coexist at different carrier frequencies. We first generate a coexisting RADIOML dataset (CRML23) to facilitate the joint design. Different from the publicly available AMC dataset ignoring the signal detection step and containing only one signal, our synthetic dataset covers the more realistic multiple-signal coexisting scenario. Then, we present a joint framework for detection and classification (JDM) for such a multiple-signal coexisting environment, which consists of two modules for signal detection and AMC, respectively. In particular, these two modules are interconnected using a designated data structure called "proposal". Finally, we conduct extensive simulations over the newly developed dataset, which demonstrate the effectiveness of our designs. Our code and dataset are now available as open-source (https://github.com/Singingkettle/ChangShuoRadioData).


[18] 2405.00741

Diagnosis of Parkinson's Disease Using EEG Signals and Machine Learning Techniques: A Comprehensive Study

Parkinson's disease is a widespread neurodegenerative condition necessitating early diagnosis for effective intervention. This paper introduces an innovative method for diagnosing Parkinson's disease through the analysis of human EEG signals, employing a Support Vector Machine (SVM) classification model. This research presents novel contributions to enhance diagnostic accuracy and reliability. Our approach incorporates a comprehensive review of EEG signal analysis techniques and machine learning methods. Drawing from recent studies, we have engineered an advanced SVM-based model optimized for Parkinson's disease diagnosis. Utilizing cutting-edge feature engineering, extensive hyperparameter tuning, and kernel selection, our method achieves not only heightened diagnostic accuracy but also emphasizes model interpretability, catering to both clinicians and researchers. Moreover, ethical concerns in healthcare machine learning, such as data privacy and biases, are conscientiously addressed. We assess our method's performance through experiments on a diverse dataset comprising EEG recordings from Parkinson's disease patients and healthy controls, demonstrating significantly improved diagnostic accuracy compared to conventional techniques. In conclusion, this paper introduces an innovative SVM-based approach for diagnosing Parkinson's disease from human EEG signals. Building upon the IEEE framework and previous research, its novelty lies in the capacity to enhance diagnostic accuracy while upholding interpretability and ethical considerations for practical healthcare applications. These advances promise to revolutionize early Parkinson's disease detection and management, ultimately contributing to enhanced patient outcomes and quality of life.


[19] 2405.00822

Kernel-based Learning for Safe Control of Discrete-Time Unknown Systems under Incomplete Observations

Safe control for dynamical systems is critical, yet the presence of unknown dynamics poses significant challenges. In this paper, we present a learning-based control approach for tracking control of a class of high-order systems, operating under the constraint of partially observable states. The uncertainties inherent within the systems are modeled by kernel ridge regression, leveraging the proposed strategic data acquisition approach with limited state measurements. To achieve accurate trajectory tracking, a state observer that seamlessly integrates with the control law is devised. The analysis of the guaranteed control performance is conducted using Lyapunov theory due to the deterministic prediction error bound of kernel ridge regression, ensuring the adaptability of the approach in safety-critical scenarios. To demonstrate the effectiveness of our proposed approach, numerical simulations are performed, underscoring its contributions to the advancement of control strategies.
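
Kernel ridge regression, which this abstract uses to model the unknown dynamics, fits dual coefficients by solving a regularised linear system and predicts with a weighted sum of kernel evaluations. The following is a minimal one-dimensional sketch under stated assumptions: a Gaussian kernel, a small regularisation constant, and toy data. None of these choices, nor the function names, come from the paper, and the sketch covers only the regression step, not the observer, the data acquisition strategy, or the error bound.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Gaussian (RBF) kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def krr_fit(xs, ys, reg=1e-6, length_scale=1.0):
    """Return dual coefficients alpha solving (K + reg * I) alpha = y."""
    n = len(xs)
    K = [[rbf_kernel(xs[i], xs[j], length_scale) + (reg if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(K, ys)


def krr_predict(x, xs, alpha, length_scale=1.0):
    """Predict f(x) = sum_i alpha_i * k(x, x_i)."""
    return sum(a * rbf_kernel(x, xi, length_scale) for a, xi in zip(alpha, xs))


# Toy data standing in for measurements of an unknown function.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
alpha = krr_fit(xs, ys)
estimate_at_1_5 = krr_predict(1.5, xs, alpha)
```

With a small regularisation constant the fitted function nearly interpolates the training points. Larger values trade that fit for smoother predictions.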


[20] 2405.00833

Modelling the nanopore sequencing process with Helicase HMMs

Recent advancements in nanopore sequencing technology, particularly the R10 nanopore from Oxford Nanopore Technologies, have necessitated the development of improved data processing methods to fully utilize their potential for more than 9-mer resolution. The processing of the ion currents predominantly utilizes neural network-based methods, which are known for their high basecalling accuracy but face developmental bottlenecks at higher resolutions. In light of this, we introduce the Helicase Hidden Markov Model (HHMM), a novel framework designed to incorporate the dynamics of the helicase motor protein alongside the nucleotide sequence during nanopore sequencing. This model supports the analysis of millions of distinct states, enhancing our understanding of raw ion currents and their alignment with nucleotide sequences. Our findings demonstrate the utility of HHMM not only as a potent visualization tool but also as an effective base for developing advanced basecalling algorithms. This approach offers a promising avenue for leveraging the full capabilities of emerging high-resolution nanopore sequencing technologies.


[21] 2405.00871

Learning to Boost the Performance of Stable Nonlinear Systems

The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over arbitrarily deep neural network classes of performance-boosting controllers for stable nonlinear systems; crucially, we guarantee Lp closed-loop stability even if optimization is halted prematurely, and even when the ground-truth dynamics are unknown, with vanishing conservatism in the class of stabilizing policies as the model uncertainty is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.


[22] 2405.00887

On the Role of Reflectarrays for Interplanetary Links

Interplanetary links (IPL) serve as crucial enablers for space exploration, facilitating secure and adaptable space missions. An integrated IPL with inter-satellite communication (IP-ISL) establishes a unified deep space network, expanding coverage and reducing atmospheric losses. The challenges, including irregularities in charged density, hardware impairments, and hidden celestial body brightness, are analyzed with a reflectarray-based IP-ISL between Earth and Moon orbiters. It is observed that severe hardware impairments of order $10^{-8}$, combined with intense solar plasma density, drop an ideal system's spectral efficiency (SE) from $\sim\!38~\textrm{(bit/s)/Hz}$ down to $0~\textrm{(bit/s)/Hz}$. An ideal full angle of arrival fluctuation recovery with full steering range achieves $\sim\!20~\textrm{(bit/s)/Hz}$ gain, and a limited beamsteering with a numerical reflectarray design achieves at least $\sim\!1~\textrm{(bit/s)/Hz}$ gain in severe hardware impairment cases.


[23] 2405.00924

Zonotope-based Symbolic Controller Synthesis for Linear Temporal Logic Specifications

This paper studies the controller synthesis problem for nonlinear control systems under linear temporal logic (LTL) specifications using zonotope techniques. A local-to-global control strategy is proposed for the desired specification expressed as an LTL formula. First, a novel approach is developed to divide the state space into finite zonotopes and constrained zonotopes, which are called cells and allowed to intersect with the neighbor cells. Second, from the intersection relation, a graph among all cells is generated to verify the realization of the accepting path for the LTL formula. The realization verification determines if there is a need for the control design, and also results in finite local LTL formulas. Third, once the accepting path is realized, a novel abstraction-based method is derived for the controller design. In particular, we only focus on the cells from the realization verification and approximate each cell thanks to properties of zonotopes. Based on local symbolic models and local LTL formulas, an iterative synthesis algorithm is proposed to design all local abstract controllers, whose existence and combination establish the global controller for the LTL formula. Finally, the proposed framework is illustrated via a path planning problem of mobile robots.


[24] 2405.00934

Benchmarking Representations for Speech, Music, and Acoustic Events

Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains, covering acoustic events, music, and speech. ARCH comprises 12 datasets that allow us to thoroughly assess pre-trained SSL models of different sizes. ARCH streamlines benchmarking of ARL techniques through its unified access to a wide range of domains and its ability to readily incorporate new datasets and models. To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets. We argue that the presented wide-ranging evaluation provides valuable insights into state-of-the-art ARL methods, and is useful to pinpoint promising research directions.


[25] 2405.00973

Active Cell Balancing for Extended Operational Time of Lithium-Ion Battery Systems in Energy Storage Applications

Cell inconsistency within a lithium-ion battery system poses a significant challenge in maximizing the system operational time. This study presents an optimization-driven active balancing method to minimize the effects of cell inconsistency on the system operational time while simultaneously satisfying the system output power demand and prolonging the system operational time in energy storage applications. The proposed method utilizes a fractional order model to forecast the terminal voltage dynamics of each cell within a battery system, enhanced with a particle-swarm-optimisation-genetic algorithm for precise parameter identification. It is implemented under two distinct cell-level balancing topologies: independent cell balancing and differential cell balancing. Subsequently, the current distribution for each topology is determined by resolving two optimization control problems constrained by the battery's operational specifications and power demands. The effectiveness of the proposed method is validated by extensive experiments based on the two balancing topologies. The results demonstrate that the proposed method increases the operational time by 3.2%.


[26] 2405.01011

Rare Collision Risk Estimation of Autonomous Vehicles with Multi-Agent Situation Awareness

This paper offers a formal framework for the rare collision risk estimation of autonomous vehicles (AVs) with multi-agent situation awareness, affected by different sources of noise in a complex dynamic environment. In our proposed setting, the situation awareness is considered for one of the ego vehicles by aggregating a range of diverse information gathered from other vehicles into a vector. We model AVs equipped with the situation awareness as general stochastic hybrid systems (GSHS) and assess the probability of collision in a lane-change scenario where two self-driving vehicles simultaneously intend to switch lanes into a shared one, while utilizing the time-to-collision measure for decision-making as required. Due to the substantial data requirements of simulation-based methods for the rare collision risk estimation, we leverage a multi-level importance splitting technique, known as interacting particle system-based estimation with fixed assignment splitting (IPS-FAS). This approach allows us to estimate the probability of a rare event by employing a group of interacting particles. Specifically, each particle embodies a system trajectory and engages with others through resampling and branching, focusing computational resources on trajectories with the highest probability of encountering the rare event. The effectiveness of our proposed approach is demonstrated through an extensive simulation of a lane-change scenario.


[27] 2405.01023

Experimental Evaluation of Moving Target Compensation in High Time-Bandwidth Noise Radar

In this article, the effect a moving target has on the signal-to-interference-plus-noise ratio (SINR) for high time-bandwidth noise radars is investigated. To compensate for cell migration, we apply a computationally efficient stretch processing algorithm that is tailored for batched processing and suitable for implementation on a real-time radar processor. The performance of the algorithm is studied using experimental data. In the experiment, pseudorandom noise with a bandwidth of 100 MHz is generated and transmitted in real time. An unmanned aerial vehicle (UAV), flown at a speed of 11 m/s, acts as the target. For an integration time of 1 s, the algorithm is shown to yield an increase in SINR of roughly 13 dB compared to no compensation. It is also shown that coherent integration times of 2.5 s can be achieved.
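As a rough illustration of why cell migration matters at these parameters, the sketch below estimates how many range cells a target crosses during one integration interval. It uses the textbook range resolution c / (2B) and assumes, purely for illustration, that the UAV's full 11 m/s is radial velocity; the function names are hypothetical and are not taken from the paper.

```python
C = 299_792_458.0  # speed of light in m/s


def range_resolution(bandwidth_hz: float) -> float:
    """Textbook range resolution c / (2B) of a radar with bandwidth B."""
    return C / (2.0 * bandwidth_hz)


def cells_migrated(radial_speed_mps: float,
                   integration_time_s: float,
                   bandwidth_hz: float) -> float:
    """Number of range cells crossed during one integration interval.

    Assumes constant radial speed (an illustrative simplification).
    """
    distance_m = radial_speed_mps * integration_time_s
    return distance_m / range_resolution(bandwidth_hz)


# Values quoted in the abstract: 100 MHz bandwidth, 11 m/s, 1 s integration.
res = range_resolution(100e6)
cells = cells_migrated(11.0, 1.0, 100e6)
print(f"range resolution: {res:.2f} m, cells migrated: {cells:.1f}")
```

Under these assumptions the target moves through several range cells within a single 1 s integration, which is the situation a migration-compensation step is meant to address.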


[28] 2405.01087

Non-overshooting sliding mode for UAV control

For a class of uncertain systems, a non-overshooting sliding mode control is presented that makes them globally exponentially stable without overshoot. Even when an unknown stochastic disturbance exists and a time-varying reference trajectory is required, strict non-overshooting stabilization is still achieved. The control law design is based on a desired second-order sliding mode (2-sliding mode), which successively includes two bounded-gain subsystems. Non-overshooting stability requires that the system gains depend on the initial values of the system variables. To obtain global non-overshooting stability, the first subsystem, which has non-overshooting reachability, compresses the initial values of the second subsystem to a given bounded range. By partitioning these initial values, the bounded system gains are determined so as to satisfy robust non-overshooting stability. To reject chattering in the controller output, a tanh-function-based sliding mode is developed for the design of a smoothed non-overshooting controller. The proposed method is applied to UAV trajectory tracking in the presence of disturbances and uncertainties. The control laws are designed to implement non-overshooting stabilization in position and attitude. Finally, the effectiveness of the proposed method is demonstrated by flight tests.


[29] 2405.01094

Closed-Loop Sensitivity Identification for Cross-Directional Systems

At Diamond Light Source, the UK's national synchrotron facility, electron beam disturbances are attenuated by the fast orbit feedback (FOFB), which controls a cross-directional (CD) system with hundreds of inputs and outputs. Due to the inability to measure the disturbance spectrum in real-time, the closed-loop sensitivity of the FOFB cannot be evaluated, making it difficult to compare FOFB algorithms and detect faults. Existing methods rely on comparing open-loop with closed-loop measurements, but they are prone to instabilities and actuator saturation because of the system's strong directionality. Here, we introduce a reference signal to estimate the complementary sensitivity in closed loop. By decoupling the system into sets of single-input, single-output (SISO) systems, we design the reference mode-by-mode to accommodate the system's strong directionality. This allows SISO system identification to be used, making our approach suitable for large-scale systems. Additionally, we derive lower bounds on reference amplitudes to achieve a predefined estimation error bound in the presence of disturbances and measurement noise. Our approach not only enables performance estimation of ill-conditioned CD systems in closed-loop but also provides a signal for fault detection. Its potential applications extend to other CD systems, such as papermaking, steel rolling, or battery manufacturing processes.


[30] 2405.01149

Optimizing Satellite Network Infrastructure: A Joint Approach to Gateway Placement and Routing

Satellite constellation systems are becoming more attractive to provide communication services worldwide, especially in areas without network connectivity. While optimizing satellite gateway placement is crucial for operators to minimize deployment and operating costs, reducing the number of gateways may require more inter-satellite link hops to reach the ground network, thereby increasing latency. Therefore, it is of significant importance to develop a framework that optimizes gateway placement, dynamic routing, and flow management in inter-satellite links to enhance network performance. To this end, we model an optimization problem as a mixed-integer problem with a cost function combining the number of gateways, flow allocation, and traffic latency, allowing satellite operators to set priorities based on their policies. Our simulation results indicate that the proposed approach effectively reduces the number of active gateways by selecting their most appropriate locations while balancing the trade-off between the number of gateways and traffic latency. Furthermore, we demonstrate the impact of different weights in the cost function on performance through comparative analysis.


[31] 2405.01161

Exponentially Consistent Outlier Hypothesis Testing for Continuous Sequences

In outlier hypothesis testing, one aims to detect outlying sequences among a given set of sequences, where most sequences are generated i.i.d. from a nominal distribution while outlying sequences (outliers) are generated i.i.d. from a different anomalous distribution. Most existing studies focus on discrete-valued sequences, where each data sample takes values in a finite set. To account for practical scenarios where data sequences usually take real values, we study outlier hypothesis testing for continuous sequences when both the nominal and anomalous distributions are unknown. Specifically, we propose distribution-free tests and prove that the probabilities of misclassification error, false reject and false alarm decay exponentially fast for three different test designs: fixed-length test, sequential test, and two-phase test. In a fixed-length test, one fixes the sample size of each observed sequence; in a sequential test, one takes a sample sequentially from each sequence per unit time until a reliable decision can be made; in a two-phase test, one adapts the sample size from two different fixed values. Remarkably, the two-phase test achieves a good balance between test design complexity and theoretical performance. We first consider the case of at most one outlier, and then generalize our results to the case with multiple outliers where the number of outliers is unknown.


[32] 2405.01197

Optimal Beamforming for Bistatic MIMO Sensing

This paper considers the beamforming optimization for sensing a point-like scatterer using a bistatic multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) radar, which could be part of a joint communication and sensing system. The goal is to minimize the Cramér-Rao bound on the target position's estimation error, where the radar already knows an approximate position that is taken into account in the optimization. The optimization allows for beamforming with more than one beam per subcarrier. Optimal solutions for the beamforming are discussed for known and unknown channel gain. Numerical results show that beamforming with at most one beam per subcarrier is optimal for certain parameters, but for other parameters, optimal solutions need two beams on some subcarriers. In addition, the degree of freedom in selecting which end of the bistatic radar should transmit and receive is considered.


[33] 2405.01200

Learning-to-solve unit commitment based on few-shot physics-guided spatial-temporal graph convolution network

This letter proposes a few-shot physics-guided spatial-temporal graph convolutional network (FPG-STGCN) to solve unit commitment (UC) quickly. First, the STGCN is tailored to parameterize UC. Then, a few-shot physics-guided learning scheme is proposed. It exploits a few typical UC solutions produced by a commercial optimizer to escape from local minima, and leverages the augmented Lagrangian method for constraint satisfaction. To further enable both feasibility and continuous relaxation of the integer variables during learning, a straight-through estimator for the Tanh-Sign composition is proposed to fully differentiate the mixed-integer solution space. A case study on the IEEE benchmark shows that our method outperforms mainstream learning approaches on UC feasibility and surpasses a traditional solver on efficiency.


[34] 2405.01220

Misspecification of Multiple Scattering in Scalar Wave Fields and its Impact in Ultrasound Tomography

In this work, we investigate the localization of targets in the presence of multiple scattering. We focus on the often omitted scenario in which measurement data is affected by multiple scattering, and a simpler model is employed in the estimation. We study the impact of such model mismatch by means of the Misspecified Cramér-Rao Bound (MCRB). In numerical simulations inspired by tomographic inspection in ultrasound nondestructive testing, the MCRB is shown to correctly describe the estimation variance of localization parameters under misspecification of the wave propagation model. We provide extensive discussion on the utility of the MCRB in the practical task of verifying whether a chosen misspecified model is suitable for localization based on the properties of the maximum likelihood estimator and the nuanced distinction between bias and parameter space differences. Finally, we highlight that careful interpretation is needed whenever employing the classical CRB in the presence of mismatch through numerical examples based on the Born approximation and other simplified propagation models stemming from it.


[35] 2405.01264

Model Predictive Guidance for Fuel-Optimal Landing of Reusable Launch Vehicles

This paper introduces a landing guidance strategy for reusable launch vehicles (RLVs) using a model predictive approach based on sequential convex programming (SCP). The proposed approach devises two distinct optimal control problems (OCPs): planning a fuel-optimal landing trajectory that accommodates practical path constraints specific to RLVs, and determining real-time optimal tracking commands. This dual optimization strategy allows for reduced computational load through adjustable prediction horizon lengths in the tracking task, achieving near closed-loop performance. Enhancements in model fidelity for the tracking task are achieved through an alternative rotational dynamics representation, enabling a more stable numerical solution of the OCP and accounting for vehicle transient dynamics. Furthermore, modifications of aerodynamic force in both planning and tracking phases are proposed, tailored for thrust-vector-controlled RLVs, to reduce the fidelity gap without adding computational complexity. Extensive 6-DOF simulation experiments validate the effectiveness and improved guidance performance of the proposed algorithm.


[36] 2405.01303

Joint Sequential Fronthaul Quantization and Hardware Complexity Reduction in Uplink Cell-Free Massive MIMO Networks

Fronthaul quantization causes significant distortion in cell-free massive MIMO networks. Due to the limited capacity of fronthaul links, information exchanged among access points (APs) must be quantized significantly. Furthermore, the complexity of the multiplication operation in the base-band processing unit increases with the number of bits of the operands. Thus, quantizing the APs' signal vectors reduces the complexity of signal estimation in the base-band processing unit. Most recent works consider the direct quantization of the received signal vectors at each AP without any pre-processing. However, the signal vectors received at different APs are mutually correlated (inter-AP correlation) and also have correlated dimensions (intra-AP correlation). Hence, cooperative quantization of the APs' fronthaul can help to use the quantization bits at each AP efficiently and further reduce the distortion imposed on the quantized vector at the APs. This paper considers a daisy chain fronthaul and three different processing sequences at each AP. We show that 1) de-correlating the received signal vector at each AP from the corresponding vectors of the previous APs (inter-AP de-correlation) and 2) de-correlating the dimensions of the received signal vector at each AP (intra-AP de-correlation) before quantization help to use the quantization bits at each AP more efficiently than directly quantizing the received signal vector without any pre-processing and, consequently, improve the bit error rate (BER) and normalized mean square error (NMSE) of the users' signal estimation.


[37] 2405.01314

Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network

We address a joint trajectory planning, user association, resource allocation, and power control problem to maximize proportional fairness in the aerial IoT network, considering practical end-to-end quality-of-service (QoS) and communication schedules. Though the problem is long-standing, apart from the fact that previous approaches have never considered user- and time-specific QoS, we point out a prevalent mistake in the coordinate optimization approaches adopted by the majority of the literature. Coordinate optimization approaches, which repetitively optimize radio resources for a fixed trajectory and vice versa, generally converge to local optima when all variables are differentiable. However, these methods often stagnate at a non-stationary point, significantly degrading the network utility in mixed-integer problems such as joint trajectory and radio resource optimization. We circumvent this problem by converting the formulated problem into a Markov decision process (MDP). Exploiting the beneficial characteristics of the MDP, we design a non-iterative framework that cooperatively optimizes trajectory and radio resources without an initial trajectory choice. The proposed framework can incorporate various trajectory planning algorithms such as the genetic algorithm, tree search, and reinforcement learning. Extensive comparisons with diverse baselines verify that the proposed framework significantly outperforms the state-of-the-art method, nearly achieving the global optimum. Our implementation code is available at https://github.com/hslyu/dbspf.


[38] 2405.01358

Propagation measurements and channel models in Indoor Environment at 6.75 GHz FR1(C) and 16.95 GHz FR3 Upper-mid band Spectrum for 5G and 6G

New spectrum allocations in the 4–8 GHz FR1(C) and 7–24 GHz FR3 mid-band frequency spectrum are being considered for 5G/6G cellular deployments. This paper presents results from the world's first comprehensive indoor hotspot (InH) propagation measurement campaign at 6.75 GHz and 16.95 GHz in the NYU WIRELESS Research Center using a 1 GHz wideband channel sounder system over distances from 11 to 97 m in line-of-sight (LOS) and non-LOS (NLOS). Analysis of directional and omnidirectional path loss (PL) using the close-in free space 1 m reference distance model shows a familiar waveguiding effect in LOS with an omnidirectional path loss exponent (PLE) of 1.40 at 6.75 GHz and 1.32 at 16.95 GHz. Compared to mmWave frequencies, the directional NLOS PLEs are lower at FR3 and FR1(C), while omnidirectional NLOS PLEs are similar, suggesting better propagation distances at lower frequencies for links with omnidirectional antennas at both ends of the links, but also, importantly, showing that higher gain antennas will offer better coverage at higher frequencies when antenna apertures are kept the same over all frequencies. Comparison of the omnidirectional and directional RMS delay spread (DS) at FR1(C) and FR3 with mmWave frequencies indicates a clear decrease with increasing frequency. The mean spatial lobe and omnidirectional RMS angular spread (AS) are found to be wider at 6.75 GHz than at 16.95 GHz, indicating that more multipath components are found in the azimuthal spatial domain at lower frequencies.
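The close-in (CI) free space reference distance model mentioned above can be sketched as follows. This is a minimal illustration of the standard CI form, PL(d) = FSPL(f, 1 m) + 10 n log10(d), with the shadow-fading term omitted; the function names are hypothetical, and only the two LOS omnidirectional PLE values are taken from the abstract.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def fspl_1m_db(freq_hz: float) -> float:
    """Free-space path loss at the 1 m reference distance, in dB."""
    return 20.0 * math.log10(4.0 * math.pi * freq_hz / C)


def ci_path_loss_db(freq_hz: float, distance_m: float, ple: float) -> float:
    """Mean CI-model path loss in dB (shadow fading not included)."""
    return fspl_1m_db(freq_hz) + 10.0 * ple * math.log10(distance_m)


# LOS omnidirectional PLEs quoted in the abstract.
for freq_hz, ple in ((6.75e9, 1.40), (16.95e9, 1.32)):
    for distance_m in (11.0, 97.0):
        loss = ci_path_loss_db(freq_hz, distance_m, ple)
        print(f"{freq_hz / 1e9:.2f} GHz, {distance_m:.0f} m: {loss:.1f} dB")
```

A PLE below 2 means the mean loss grows more slowly with distance than in free space, which is how the waveguiding effect shows up in this model.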


[39] 2405.01362

Wideband Penetration Loss through Building Materials and Partitions at 6.75 GHz in FR1(C) and 16.95 GHz in the FR3 Upper Mid-band spectrum

The 4–8 GHz FR1(C) and 7–24 GHz upper mid-band FR3 spectrum are promising new 6G spectrum allocations being considered by the International Telecommunications Union (ITU) and major governments around the world. There is an urgent need to understand the propagation behavior and radio coverage, outage, and material penetration for the global mobile wireless industry in both indoor and outdoor environments in these emerging frequency bands. This work presents measurements and models that describe the penetration loss in co-polarized and cross-polarized antenna configurations, exhibited by common materials found inside buildings and on building perimeters, including concrete, low-emissivity glass, wood, doors, drywall, and whiteboard at 6.75 GHz and 16.95 GHz. Measurement results show consistently lower penetration loss at 6.75 GHz compared to 16.95 GHz for all ten materials measured for co- and cross-polarized antennas at incidence. For instance, the low-emissivity glass wall presents 33.7 dB loss at 6.75 GHz, while presenting 42.3 dB loss at 16.95 GHz. Penetration loss at these frequencies is contrasted with measurements at sub-6 GHz, mmWave and sub-THz frequencies along with 3GPP material penetration loss models. The results provide critical knowledge for future 5G and 6G cellular system deployments as well as refinements for the 3GPP material penetration models.


[40] 2405.01442

Market Power and Withholding Behavior of Energy Storage Units

Electricity markets are experiencing a rapid increase in energy storage unit participation. Unlike conventional generation resources, quantifying the competitive operation and identifying if a storage unit is exercising market power is challenging, particularly in the context of multi-interval bidding strategies. We present a framework to differentiate strategic capacity withholding behaviors attributed to market power from inherent competitive bidding in storage unit strategies. Our framework evaluates the profitability of strategic storage unit participation, analyzing bidding behaviors as both price takers and price makers using a self-scheduling model, and investigates how they leverage market inefficiencies. Specifically, we propose a price sensitivity model derived from the linear supply function equilibrium model to examine the price-anticipating bidding strategy, effectively capturing the influence of market power. We introduce a sufficient ex-post analysis for market operators to identify potential exploitative behaviors by monitoring instances of withholding within the bidding profiles, ensuring market resilience and competitiveness. We discuss and verify applicability of the proposed framework to realistic settings. Our analysis substantiates commonly observed economic bidding behaviors of storage units. Furthermore, it demonstrates that significant price volatility offers considerable profit opportunities not only for participants possessing market power but also for typical strategic profit seekers.


[41] 2405.01503

PAM-UNet: Shifting Attention on Region of Interest in Medical Images

Computer-aided segmentation methods can assist medical personnel in improving diagnostic outcomes. While recent advancements like UNet and its variants have shown promise, they face a critical challenge: balancing accuracy with computational efficiency. Shallow encoder architectures in UNets often struggle to capture crucial spatial features, leading to inaccurate and sparse segmentation. To address this limitation, we propose a novel Progressive Attention based Mobile UNet (PAM-UNet) architecture. The inverted residual (IR) blocks in PAM-UNet help maintain a lightweight framework, while layerwise Progressive Luong Attention ($\mathcal{PLA}$) promotes precise segmentation by directing attention toward regions of interest during synthesis. Our approach prioritizes both accuracy and speed, achieving a commendable balance with a mean IoU of 74.65 and a dice score of 82.87, while requiring only 1.32 floating-point operations per second (FLOPS) on the Liver Tumor Segmentation Benchmark (LiTS) 2017 dataset. These results highlight the importance of developing efficient segmentation models to accelerate the adoption of AI in clinical practice.
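For readers unfamiliar with the two overlap metrics quoted above, the sketch below computes IoU and the Dice score for a pair of binary masks using their standard definitions. The flat-list mask representation, the function name, and the convention for two empty masks are assumptions made for illustration; the abstract does not describe the paper's averaging procedure, and its reported values appear to be percentages.

```python
from typing import Sequence, Tuple


def overlap_scores(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (IoU, Dice) for two equally sized binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty; treating this as a perfect match is a convention.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2.0 * intersection / (pred_area + target_area)
    return iou, dice


# Toy 6-pixel masks: 2 overlapping pixels, 3 foreground pixels in each mask.
iou, dice = overlap_scores([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

For a single pair of masks the two metrics are related by Dice = 2 IoU / (1 + IoU), so Dice is never lower than IoU.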


[42] 2405.00694

Analysis of the Efficacy of the Use of Inertial Measurement and Global Positioning System Data to Reverse Engineer Automotive CAN Bus Steering Signals

Autonomous vehicle control is growing in availability for new vehicles and there is a potential need to retrofit older vehicles with this capability. Additionally, automotive cybersecurity has become a significant concern in recent years due to documented attacks on vehicles. As a result, researchers have been exploring reverse engineering techniques to automate vehicle control and improve vehicle security and threat analysis. In prior work, a vehicle's accelerator and brake pedal controller area network (CAN) channels were identified using reverse engineering techniques without prior knowledge of the vehicle. However, the correlation results for deceleration were lower than those for acceleration, which may be able to be improved by incorporating data from an additional telemetry device. In this paper, a method that uses IMU and GPS data to reverse-engineer a vehicle's steering wheel position CAN channels, without prior knowledge of the vehicle, is presented. Using GPS data is shown to greatly improve correlation values for deceleration, particularly for the brake pedal CAN channels. This work demonstrates the efficacy of using these data sources for automotive CAN reverse engineering. This has potential uses in automotive vehicle control and for improving vehicle security and threat analysis.


[43] 2405.00739

Why does Knowledge Distillation Work? Rethink its Attention and Fidelity Mechanism

Does Knowledge Distillation (KD) really work? Conventional wisdom views it as a knowledge transfer procedure in which perfect mimicry of the teacher by the student is desired. However, paradoxical studies indicate that closely replicating the teacher's behavior does not consistently improve student generalization, raising questions about the possible causes. Confronted with this gap, we hypothesize that diverse attentions in teachers contribute to better student generalization at the expense of reduced fidelity in ensemble KD setups. By increasing data augmentation strengths, our key findings reveal a decrease in the Intersection over Union (IoU) of attentions between teacher models, leading to reduced student overfitting and decreased fidelity. We propose this low-fidelity phenomenon as an underlying characteristic rather than a pathology when training KD. This suggests that stronger data augmentation fosters a broader perspective provided by the divergent teacher ensemble and lower student-teacher mutual information, benefiting generalization performance. These insights clarify the mechanism behind the low-fidelity phenomenon in KD. Thus, we offer new perspectives on optimizing student model performance by emphasizing increased diversity in teacher attentions and reduced mimicry behavior between teachers and student.


[44] 2405.00830

Analysis of Quantization Noise Suppression Gains in Digital Phased Arrays

Digital phased arrays have often been disregarded for millimeter-wave communications since the analog-to-digital converters (ADCs) are power-hungry. In this paper, we provide a different perspective on this matter by demonstrating analytically and numerically how the ADC resolution can be reduced when using digital phased arrays. We perform a theoretical analysis of the quantization noise characteristics for an OFDM signal received and processed by a digital phased array, using a Gaussian approximation of the OFDM signal. In particular, we quantify the quantization noise suppression factor analytically and numerically. This factor describes how much the coherent combining reduces the quantization noise as a function of the number of antennas, which allows for reducing the ADC bit resolution. For instance, in an 8-16 antenna digital phased array, the ADC resolution can be reduced by 1-2 bits compared to the ADC required for an analog phased array.
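A back-of-the-envelope check of the 1-2 bit figure is sketched below. It assumes an idealized case that is not necessarily the paper's model: quantization noise is uncorrelated across the N antenna branches, so coherent combining improves the signal-to-quantization-noise ratio by 10 log10(N) dB, and each ADC bit is worth roughly 6.02 dB. The function names are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # rule-of-thumb SQNR gain per ADC bit for a uniform quantizer


def ideal_suppression_db(num_antennas: int) -> float:
    """Combining gain against uncorrelated quantization noise, in dB."""
    return 10.0 * math.log10(num_antennas)


def bits_saved(num_antennas: int) -> float:
    """ADC bits that the idealized combining gain could offset."""
    return ideal_suppression_db(num_antennas) / DB_PER_BIT


for n in (8, 16):
    print(f"N = {n:2d}: {ideal_suppression_db(n):.1f} dB, "
          f"about {bits_saved(n):.1f} bits")
```

Under this simplification the saving is about half a bit per doubling of the array size, which lands in the same 1-2 bit range the abstract quotes for 8-16 antennas.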


[45] 2405.00837

Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations

Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. Given a set of points $[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] = \mathbf{X} \in \mathbb{R}^{d \times n}$ and a vector $\mathbf{y} \in \mathbb{R}^d$, the goal is to find coefficients $\mathbf{w} \in \mathbb{R}^n$ so that $\mathbf{X} \mathbf{w} \approx \mathbf{y}$, subject to some desired structure on $\mathbf{w}$. In this work we seek $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least squares regression problem. We obtain local solutions through a locality function that promotes the use of columns of $\mathbf{X}$ that are close to $\mathbf{y}$ when used as a regularization term. We prove that, for all levels of regularization and under a mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the optimal coefficients' number of non-zero entries is upper bounded by $d+1$, thereby providing local sparse solutions when $d \ll n$. Under the same condition we also show that for any $\mathbf{y}$ contained in the convex hull of $\mathbf{X}$ there exists a regime of regularization parameter such that the optimal coefficients are supported on the vertices of the Delaunay simplex containing $\mathbf{y}$. This provides an interpretation of the sparsity as having structure obtained implicitly from the Delaunay triangulation of $\mathbf{X}$. We demonstrate that our locality regularized problem can be solved in comparable time to other methods that identify the containing Delaunay simplex.


[46] 2405.00842

Quickest Change Detection with Confusing Change

In the problem of quickest change detection (QCD), a change occurs at some unknown time in the distribution of a sequence of independent observations. This work studies a QCD problem where the change is either a bad change, which we aim to detect, or a confusing change, which is not of our interest. Our objective is to detect a bad change as quickly as possible while avoiding raising a false alarm for pre-change or a confusing change. We identify a specific set of pre-change, bad change, and confusing change distributions that pose challenges beyond the capabilities of standard Cumulative Sum (CuSum) procedures. Proposing novel CuSum-based detection procedures, S-CuSum and J-CuSum, leveraging two CuSum statistics, we offer solutions applicable across all kinds of pre-change, bad change, and confusing change distributions. For both S-CuSum and J-CuSum, we provide analytical performance guarantees and validate them by numerical results. Furthermore, both procedures are computationally efficient as they only require simple recursive updates.
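The "simple recursive updates" refer to the CuSum recursion. The sketch below shows only the standard single-statistic CuSum test, W_n = max(0, W_{n-1} + log(f1(x_n) / f0(x_n))), for a mean shift between two unit-variance Gaussians; it is not the S-CuSum or J-CuSum procedure, whose details are not given in the abstract. The distributions, threshold, and function names are illustrative assumptions.

```python
from typing import Callable, Iterable, Optional


def gaussian_llr(x: float, mu0: float, mu1: float, sigma: float) -> float:
    """Log-likelihood ratio log(f1(x) / f0(x)) for equal-variance Gaussians."""
    return (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma ** 2


def cusum_alarm_time(samples: Iterable[float],
                     llr: Callable[[float], float],
                     threshold: float) -> Optional[int]:
    """Return the 1-based index of the first alarm, or None if none is raised."""
    stat = 0.0
    for n, x in enumerate(samples, start=1):
        # Recursive update: accumulate evidence for a change, clipped at zero.
        stat = max(0.0, stat + llr(x))
        if stat >= threshold:
            return n
    return None


# Noise-free toy data: the mean shifts from 0 to 1 after the fifth sample.
data = [0.0] * 5 + [1.0] * 10
alarm = cusum_alarm_time(data, lambda x: gaussian_llr(x, 0.0, 1.0, 1.0), 2.0)
print(alarm)
```

Each step needs only the previous statistic and the new sample, so memory and per-sample cost are constant; a procedure that tracks two such statistics keeps the same property.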


[47] 2405.00867

A Convex Formulation of the Soft-Capture Problem

We present a fast trajectory optimization algorithm for the soft capture of uncooperative tumbling space objects. Our algorithm generates safe, dynamically feasible, and minimum-fuel trajectories for a six-degree-of-freedom servicing spacecraft to achieve soft capture (near-zero relative velocity at contact) between predefined locations on the servicer spacecraft and target body. We solve a convex problem by enforcing a convex relaxation of the field-of-view constraint, followed by a sequential convex program correcting the trajectory for collision avoidance. The optimization problems can be solved with a standard second-order cone programming solver, making the algorithm both fast and practical for implementation in flight software. We demonstrate the performance and robustness of our algorithm in simulation over a range of object tumble rates up to 10°/s.


[48] 2405.00882

A Differentiable Dynamic Modeling Approach to Integrated Motion Planning and Actuator Physical Design for Mobile Manipulators

This paper investigates the differentiable dynamic modeling of mobile manipulators to facilitate efficient motion planning and physical design of actuators, where the actuator design is parameterized by physically meaningful motor geometry parameters. These parameters impact the manipulator's link mass, inertia, center-of-mass, torque constraints, and angular velocity constraints, influencing control authority in motion planning and trajectory tracking control. A motor's maximum torque/speed and how the design parameters affect the dynamics are modeled analytically, facilitating differentiable and analytical dynamic modeling. Additionally, an integrated locomotion and manipulation planning problem is formulated with direct collocation discretization, using the proposed differentiable dynamics and motor parameterization. Such dynamics are required to capture the dynamic coupling between the base and the manipulator. Numerical experiments demonstrate the effectiveness of differentiable dynamics in speeding up optimization, as well as advantages in task completion time and energy consumption over an established sequential motion planning approach. Finally, this paper introduces a simultaneous actuator design and motion planning framework, providing numerical results to validate the proposed differentiable modeling approach for co-design problems.


[49] 2405.00885

WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training based on its full computing and communications capacity. Although such fixed size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) the dynamic changes of devices' communication and computing conditions and (ii) FL training progress and its dynamic requirements of local training contributions, both of which may cause very long FL training delay. Motivated by those dynamics, in this paper, we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach to accelerate FL training through adaptive subnetwork scheduling. Instead of sticking to the fixed size subnetwork, WHALE-FL introduces a novel subnetwork selection utility function to capture device and FL training dynamics, and guides the mobile device to adaptively select the subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) FL training status and its corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.


[50] 2405.00911

Stabilization of infinite-dimensional systems under quantization and packet loss

We study the problem of stabilizing infinite-dimensional systems with input and output quantization. The closed-loop system we consider is subject to packet loss in the sensor-to-controller channels, whose duration is assumed to be averagely bounded. Given a bound on the initial state, we propose design methods for dynamic quantizers with zoom parameters. We show that the closed-loop state starting in a given region exponentially converges to zero if the bounds of quantization errors and packet-loss duration satisfy suitable conditions. Since the norms of the operators representing the system dynamics are used in the proposed quantization schemes, we also present methods for approximately computing the operator norms.


[51] 2405.00930

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion

One-shot voice conversion aims to change the timbre of any source speech to match that of the unseen target speaker with only one speech sample. Existing methods face difficulties in satisfactory speech representation disentanglement and suffer from sizable networks as some of them leverage numerous complex modules for disentanglement. In this paper, we propose a model named MAIN-VC to effectively disentangle via a concise neural network. The proposed model utilizes Siamese encoders to learn clean representations, further enhanced by the designed mutual information estimator. The Siamese structure and the newly designed convolution module contribute to the lightweight of our model while ensuring performance in diverse voice conversion tasks. The experimental results show that the proposed model achieves comparable subjective scores and exhibits improvements in objective metrics compared to existing methods in a one-shot voice conversion scenario.


[52] 2405.00945

Can FSK Be Optimised for Integrated Sensing and Communications?

Motivated by the ideal peak-to-average-power ratio and radar sensing capability of traditional frequency-coded radar waveforms, this paper considers the frequency shift keying (FSK) based waveform for joint communications and radar (JCR). An analysis of the probability distributions of its ambiguity function (AF) sidelobe levels (SLs) and peak sidelobe level (PSL) is conducted to study the radar sensing capability of random FSK. Numerical results show that the independent frequency modulation introduces uncontrollable AF PSLs. In order to address this problem, the initial phases of waveform sub-pulses are designed by solving a min-max optimisation problem. Numerical results indicate that the optimisation-based phase design can effectively reduce the AF PSL to a level close to well-designed radar waveforms while having no impact on the data rate and the receiver complexity. For large numbers of waveform sub-pulses and modulation orders, the impact on the error probability is also insignificant.
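
The constant-envelope property behind the "ideal" peak-to-average-power ratio can be illustrated with a minimal sketch. It assumes a complex-baseband FSK signal with one unit-amplitude tone per sub-pulse; the function names, the tone spacing, and the random initial phases are illustrative assumptions, not the paper's waveform or its optimised phase design.

```python
import cmath
import math
import random


def fsk_baseband(symbols, mod_order, samples_per_symbol, phases=None):
    """Complex-baseband FSK: one constant-amplitude tone per sub-pulse.

    symbols: list of ints in [0, mod_order)
    phases: optional initial phase (radians) per sub-pulse; zeros if omitted.
    """
    if phases is None:
        phases = [0.0] * len(symbols)
    signal = []
    for sym, phi0 in zip(symbols, phases):
        # Tone frequency in cycles per sub-pulse, centred around zero
        # (an assumed spacing chosen only for illustration).
        freq = sym - (mod_order - 1) / 2
        for n in range(samples_per_symbol):
            t = n / samples_per_symbol
            signal.append(cmath.exp(1j * (2 * math.pi * freq * t + phi0)))
    return signal


def papr(signal):
    """Peak-to-average power ratio on a linear scale."""
    power = [abs(s) ** 2 for s in signal]
    return max(power) / (sum(power) / len(power))


rng = random.Random(0)
symbols = [rng.randrange(4) for _ in range(16)]
phases = [rng.uniform(0, 2 * math.pi) for _ in symbols]
print(papr(fsk_baseband(symbols, 4, 32)))          # approximately 1 (0 dB)
print(papr(fsk_baseband(symbols, 4, 32, phases)))  # approximately 1 as well
```

In this toy model the per-sub-pulse initial phases leave the envelope untouched, which is why they are available as free design variables for shaping the ambiguity function.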


[53] 2405.00947

Co-Optimization of EV Charging Control and Incentivization for Enhanced Power System Stability

We study how high charging rate demands from electric vehicles (EVs) in a power distribution grid may collectively cause its dynamic instability, and, accordingly, how a price incentivization strategy can be used to steer customers to settle for lesser charging rate demands so that these instabilities can be avoided. We pose the problem as a joint optimization and optimal control formulation. The optimization determines the optimal charging setpoints for EVs to minimize the $\mathcal{H}_2$-norm of the transfer function of the grid model, while the optimal control simultaneously develops a linear quadratic regulator (LQR) based state-feedback control signal for the battery-currents of those EVs to jointly minimize the risk of grid instability. A subsequent algorithm is developed to determine how much customers may be willing to sacrifice their intended charging rate demands in return for financial incentives. Results are derived for both unidirectional and bidirectional charging, and validated using numerical simulations of multiple EV charging stations in the IEEE 33-bus power distribution model.
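
For readers unfamiliar with the two ingredients named above, the textbook definitions are sketched below. They assume a generic stable linear time-invariant model $\dot{x} = Ax + Bu$, $y = Cx$; the matrices $A$, $B$, $C$, $Q$, $R$ are placeholders and not the grid model or weights used in the paper.

$$
J = \int_0^\infty \left( x^\top Q x + u^\top R u \right) dt, \qquad u = -Kx, \qquad K = R^{-1} B^\top P,
$$

where $P \succeq 0$ solves the algebraic Riccati equation

$$
A^\top P + P A - P B R^{-1} B^\top P + Q = 0 .
$$

For the transfer function $G(s) = C (sI - A)^{-1} B$ with $A$ Hurwitz, the $\mathcal{H}_2$-norm follows from the controllability Gramian $W_c$:

$$
\| G \|_{\mathcal{H}_2}^2 = \operatorname{tr}\left( C W_c C^\top \right), \qquad A W_c + W_c A^\top + B B^\top = 0 .
$$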


[54] 2405.00958

Generative manufacturing systems using diffusion models and ChatGPT

In this study, we introduce Generative Manufacturing Systems (GMS) as a novel approach to effectively manage and coordinate autonomous manufacturing assets, thereby enhancing their responsiveness and flexibility to address a wide array of production objectives and human preferences. Deviating from traditional explicit modeling, GMS employs generative AI, including diffusion models and ChatGPT, for implicit learning from envisioned futures, marking a shift from a model-optimum to a training-sampling decision-making. Through the integration of generative AI, GMS enables complex decision-making through interactive dialogue with humans, allowing manufacturing assets to generate multiple high-quality global decisions that can be iteratively refined based on human feedback. Empirical findings showcase GMS's substantial improvement in system resilience and responsiveness to uncertainties, with decision times reduced from seconds to milliseconds. The study underscores the inherent creativity and diversity in the generated solutions, facilitating human-centric decision-making through seamless and continuous human-machine interactions.


[55] 2405.00966

Efficient Compression of Multitask Multilingual Speech Models

Whisper is a multitask and multilingual speech model covering 99 languages. It yields commendable automatic speech recognition (ASR) results in a subset of its covered languages, but the model still underperforms on a non-negligible number of under-represented languages, a problem exacerbated in smaller model versions. In this work, we examine its limitations, demonstrating the presence of speaker-related (gender, age) and model-related (resourcefulness and model size) bias. Despite that, we show that only model-related biases are amplified by quantization, with a larger impact on low-resource languages and smaller models. Searching for a better compression approach, we propose DistilWhisper, an approach that is able to bridge the performance gap in ASR for these languages while retaining the advantages of multitask and multilingual capabilities. Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2. This dual approach allows us to effectively boost ASR performance while keeping the robustness inherited from the multitask and multilingual pre-training. Results demonstrate that our approach is more effective than standard fine-tuning or LoRA adapters, boosting performance in the targeted languages for both in- and out-of-domain test sets, while introducing only a negligible parameter overhead at inference.


[56] 2405.00979

Splitting Messages in the Dark - Rate-Splitting Multiple Access for FDD Massive MIMO Without CSI Feedback

A critical hindrance to realizing frequency division duplex (FDD) massive multi-input multi-output (MIMO) systems is the overhead associated with downlink channel state information at the transmitter (CSIT) acquisition. To address this challenge, we propose a novel framework that achieves robust performance while completely eliminating downlink CSIT training and feedback. Specifically, by exploiting partial frequency invariance of channel parameters between the uplink (UL) and downlink (DL), we adopt the 2D-Newtonized orthogonal matching pursuit (2D-NOMP) algorithm to reconstruct DL CSIT from UL training. Due to inherent discrepancies arising from the carrier frequency difference between the two disjoint bands, however, multi-user interference is inevitable. To overcome this, we propose a precoding method that employs rate-splitting multiple access (RSMA) and also develop an error covariance matrix (ECM) estimation method by using the observed Fisher information matrix (O-FIM). We find that this ECM estimation is crucial for our precoding design in maximizing the sum spectral efficiency (SE). Simulation results show that our method significantly improves the sum SE compared to other state-of-the-art approaches, underscoring the importance of our ECM estimation.


[57] 2405.01000

Low-Complexity Near-Field Localization with XL-MIMO Sectored Uniform Circular Arrays

Rapid advancement of antenna technology catalyses the popularization of extremely large-scale multiple-input multiple-output (XL-MIMO) antenna arrays, which pose unique challenges for localization with the inescapable near-field effect. In this paper, we propose an efficient near-field localization algorithm by leveraging a sectored uniform circular array (sUCA). In particular, we first customize a backprojection algorithm in the polar coordinate for sUCA-enabled near-field localization, which facilitates the target detection procedure. We then analyze the resolutions in both angular and distance domains via deriving the interval of zero-crossing points, and further unravel the minimum required number of antennas to eliminate grating lobes. The proposed localization method is finally implemented using fast Fourier transform (FFT) to reduce computational complexity. Simulation results verify the resolution analysis and demonstrate that the proposed method remarkably outperforms conventional localization algorithms in terms of localization accuracy. Moreover, the low-complexity FFT implementation achieves an average runtime that is hundreds of times faster when large numbers of antenna elements are employed.


[58] 2405.01004

Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment

Recent transformer-based ASR models have achieved word-error rates (WER) below 4%, surpassing human annotator accuracy, yet they demand extensive server resources, contributing to significant carbon footprints. The traditional server-based architecture of ASR also presents privacy concerns, alongside reliability and latency issues due to network dependencies. In contrast, on-device (edge) ASR enhances privacy, boosts performance, and promotes sustainability by effectively balancing energy use and accuracy for specific applications. This study examines the effects of quantization, memory demands, and energy consumption on the inference performance of various ASR models on the NVIDIA Jetson Orin Nano. By analyzing WER and transcription speed across models using FP32, FP16, and INT8 quantization on clean and noisy datasets, we highlight the crucial trade-offs between accuracy, speed, quantization, energy efficiency, and memory needs. We found that changing precision from FP32 to FP16 halves the energy consumption for audio transcription across different models, with minimal performance degradation. A larger model size and number of parameters neither guarantees better resilience to noise, nor predicts the energy consumption for a given transcription load. These, along with several other findings, offer novel insights for optimizing ASR systems within energy- and memory-limited environments, crucial for the development of efficient on-device ASR solutions. The code and input data needed to reproduce the results in this article are open-sourced and available at https://github.com/zzadiues3338/ASR-energy-jetson.
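
Since the comparison above is stated in terms of WER, a minimal sketch of the standard metric may help. It assumes whitespace tokenisation and no text normalisation; the function name and example strings are illustrative and are not taken from the linked repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("the") and one substitution ("off" -> "of") over 4 words.
print(word_error_rate("turn the lights off", "turn lights of"))  # 0.5
```

Published WER figures usually apply case folding and punctuation stripping before scoring, so values from this sketch are only comparable when both strings are normalised the same way.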


[59] 2405.01007

Multi-User Multi-Application Packet Scheduling for Application-Specific QoE Enhancement Based on Knowledge-Embedded DDPG in 6G RAN

The rapidly growing diversity of concurrent applications from both different users and same devices calls for application-specific Quality of Experience (QoE) enhancement of future wireless communications. Achieving this goal relies on application-specific packet scheduling, as it is vital for achieving tailored QoE enhancement by realizing the application-specific Quality of Service (QoS) requirements and for optimal perceived QoE values. However, the intertwining diversified QoE perception mechanisms, fairness among concurrent applications, and the impact of network dynamics inevitably complicate tailored packet scheduling. To achieve concurrent application-specific QoE enhancement, the problem of multi-user multi-application packet scheduling in downlink 6G radio access network (RAN) is first formulated as a Markov decision process (MDP) problem in this paper. For solving this problem, a deep deterministic policy gradient (DDPG)-based solution is proposed. However, due to the high dimensionalities of both the state and action spaces, the trained DDPG agents might generate decisions causing unnecessary resource waste. Hence, a knowledge embedding method is proposed to adjust the decisions of the DDPG agents according to human insights. Extensive experiments are conducted, which demonstrate the superiority of DDPG-based packet schedulers over baseline algorithms and the effectiveness of the proposed knowledge embedding technique.


[60] 2405.01040

Few Shot Class Incremental Learning using Vision-Language models

Recent advancements in deep learning have demonstrated remarkable performance comparable to human capabilities across various supervised computer vision tasks. However, the prevalent assumption of having an extensive pool of training data encompassing all classes prior to model training often diverges from real-world scenarios, where limited data availability for novel classes is the norm. The challenge emerges in seamlessly integrating new classes with few samples into the training data, demanding the model to adeptly accommodate these additions without compromising its performance on base classes. To address this exigency, the research community has introduced several solutions under the realm of few-shot class incremental learning (FSCIL). In this study, we introduce an innovative FSCIL framework that utilizes language regularizer and subspace regularizer. During base training, the language regularizer helps incorporate semantic information extracted from a Vision-Language model. The subspace regularizer helps in facilitating the model's acquisition of nuanced connections between image and text semantics inherent to base classes during incremental training. Our proposed framework not only empowers the model to embrace novel classes with limited data, but also ensures the preservation of performance on base classes. To substantiate the efficacy of our approach, we conduct comprehensive experiments on three distinct FSCIL benchmarks, where our framework attains state-of-the-art performance.


[61] 2405.01052

Polynomial Chaos Expanded Gaussian Process

In complex and unknown processes, global models are initially generated over the entire experimental space, but they often fail to provide accurate predictions in local areas. Recognizing this limitation, this study addresses the need for models that effectively represent both global and local experimental spaces. It introduces a novel machine learning (ML) approach: Polynomial Chaos Expanded Gaussian Process (PCEGP), leveraging polynomial chaos expansion (PCE) to calculate input-dependent hyperparameters of the Gaussian process (GP). This approach provides a mathematically interpretable method that incorporates non-stationary covariance functions and heteroscedastic noise estimation to generate locally adapted models. The model performance is compared to different algorithms in benchmark tests for regression tasks. The results demonstrate low prediction errors of the PCEGP in these benchmark applications, highlighting model performance that is often competitive with or superior to previous methods. A key advantage of the presented model is the transparency and traceability in the calculation of hyperparameters and model predictions.


[62] 2405.01060

A text-based, generative deep learning model for soil reflectance spectrum simulation in the VIS-NIR (400-2499 nm) bands

Simulating soil reflectance spectra is invaluable for soil-plant radiative modeling and training machine learning models, yet it is difficult because of the intricate relationships between soil structure and its constituents. To address this, a fully data-driven soil optics generative model (SOGM) for simulation of soil reflectance spectra based on soil property inputs was developed. The model is trained on an extensive dataset comprising nearly 180,000 soil spectra-property pairs from 17 datasets. It generates soil reflectance spectra from text-based inputs describing soil properties and their values rather than only numerical values and labels in binary vector format. The generative model can simulate output spectra based on an incomplete set of input properties. SOGM is based on the denoising diffusion probabilistic model (DDPM). Two additional sub-models were also built to complement the SOGM: a spectral padding model that can fill in the gaps for spectra shorter than the full visible-near-infrared range (VIS-NIR; 400 to 2499 nm), and a wet soil spectra model that can estimate the effects of water content on soil reflectance spectra given the dry spectrum predicted by the SOGM. The SOGM was up-scaled by coupling with the Helios 3D plant modeling software, which allowed for generation of synthetic aerial images of simulated soil and plant scenes. It can also be easily integrated with soil-plant radiation models used for remote sensing research, such as PROSAIL. The testing results of the SOGM on new datasets that were not included in model training showed that the model can generate reasonable soil reflectance spectra based on available property inputs. The presented models are openly accessible at: https://github.com/GEMINI-Breeding/SOGM_soil_spectra_simulation.


[63] 2405.01074

Stability Analysis of Interacting Wireless Repeaters

We consider a wireless network with multiple single-antenna repeaters that amplify and instantaneously re-transmit the signals they receive to improve the channel rank and system coverage. Due to the positive feedback formed by inter-repeater interference, stability could become a critical issue. We investigate the problem of determining the maximum amplification gain that the repeaters can use without breaking the system stability. Specifically, we obtain a bound by using the Gershgorin disc theorem, which reveals that the maximum amplification gain is restricted by the sum of channel amplitude gains. We show by case studies the usefulness of the so-obtained bound and provide insights on how the repeaters should be deployed.
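
The flavour of a Gershgorin-type gain bound can be sketched as follows. The sketch assumes a simplified loop model in which all repeaters share one amplitude gain $g$ and the feedback is $x \leftarrow g H x$, with $H$ the inter-repeater channel matrix; this model, the function names, and the example matrix are assumptions for illustration and need not match the paper's exact system model or bound. Under that model, every eigenvalue of $H$ lies in a Gershgorin disc, so its magnitude is at most the largest absolute row sum, and $g$ below the reciprocal of that sum is sufficient for the loop to decay.

```python
def gershgorin_gain_bound(H):
    """Sufficient common-gain bound: g < 1 / max_i sum_j |H[i][j]|.

    H[i][j] is the (complex) channel gain from repeater j to repeater i.
    """
    worst_row_sum = max(sum(abs(h) for h in row) for row in H)
    return 1.0 / worst_row_sum


def loop_response(H, g, x0, steps):
    """Iterate x <- g * H x and return the largest amplitude at the end."""
    x = list(x0)
    for _ in range(steps):
        x = [g * sum(h * xj for h, xj in zip(row, x)) for row in H]
    return max(abs(v) for v in x)


# Hypothetical inter-repeater channel matrix with no self-coupling.
H = [
    [0.0, 0.10 + 0.05j, 0.20],
    [0.05, 0.0, 0.10j],
    [0.30, 0.10, 0.0],
]
g_max = gershgorin_gain_bound(H)
print(g_max)                                          # approximately 2.5
print(loop_response(H, 0.9 * g_max, [1, 1, 1], 200))  # decays towards 0
```

The bound is sufficient rather than necessary: gains above it may still be stable, because the row-sum estimate can overstate the true spectral radius of $H$.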


[64] 2405.01095

Transformers Fusion across Disjoint Samples for Hyperspectral Image Classification

The 3D Swin Transformer (3D-ST), known for its hierarchical attention and window-based processing, excels in capturing intricate spatial relationships within images. The Spatial-spectral Transformer (SST), meanwhile, specializes in modeling long-range dependencies through self-attention mechanisms. Therefore, this paper introduces a novel method: an attentional fusion of these two transformers to significantly enhance the classification performance of Hyperspectral Images (HSIs). What sets this approach apart is its emphasis on the integration of attentional mechanisms from both architectures. This integration not only refines the modeling of spatial and spectral information but also contributes to achieving more precise and accurate classification results. The experimentation and evaluation on benchmark HSI datasets underscore the importance of employing disjoint training, validation, and test samples. The results demonstrate the effectiveness of the fusion approach, showcasing its superiority over traditional methods and individual transformers. Incorporating disjoint samples enhances the robustness and reliability of the proposed methodology, emphasizing its potential for advancing hyperspectral image classification.


[65] 2405.01104

Multi-user ISAC through Stacked Intelligent Metasurfaces: New Algorithms and Experiments

This paper investigates a Stacked Intelligent Metasurfaces (SIM)-assisted Integrated Sensing and Communications (ISAC) system. An extended target model is considered, where the base station (BS) aims to estimate the complete target response matrix relative to the SIM. Under the constraints of minimum Signal-to-Interference-plus-Noise Ratio (SINR) for the communication users (CUs) and maximum transmit power, we jointly optimize the transmit beamforming at the BS and the end-to-end transmission matrix of the SIM, to minimize the Cramér-Rao Bound (CRB) for target estimation. Effective algorithms such as the alternating optimization (AO) and semidefinite relaxation (SDR) are employed to solve the non-convex SINR-constrained CRB minimization problem. Finally, we design and build an experimental platform for SIM, and evaluate the performance of the proposed algorithms for communication and sensing tasks.


[66] 2405.01107

CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications

Spatial understanding from vision is crucial for robots operating in unstructured environments. In the real world, spatial understanding is often an ill-posed problem. There are a number of powerful classical methods that accurately regress relative pose; however, these approaches often lack the ability to leverage data-derived priors to resolve ambiguities. In multi-robot systems, these challenges are exacerbated by the need for accurate and frequent position estimates of cooperating agents. To this end, we propose CoViS-Net, a cooperative, multi-robot, visual spatial foundation model that learns spatial priors from data. Unlike prior work evaluated primarily on offline datasets, we design our model specifically for online evaluation and real-world deployment on cooperative robots. Our model is completely decentralized, platform agnostic, executable in real-time using onboard compute, and does not require existing network infrastructure. In this work, we focus on relative pose estimation and local Bird's Eye View (BEV) prediction tasks. Unlike classical approaches, we show that our model can accurately predict relative poses without requiring camera overlap, and predict BEVs of regions not visible to the ego-agent. We demonstrate our model on a multi-robot formation control task outside the confines of the laboratory.


[67] 2405.01113

Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation

A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and even data collected by modern sensors has limited range or resolution, and is subject to inconsistencies and noise. To combat this, we propose a method of data generation in simulation using 3D synthetic environments and CycleGAN domain transfer. We compare this method of data generation to the popular NYUDepth V2 dataset by training a depth estimation model based on the DenseDepth structure using different training sets of real and simulated data. We evaluate the performance of the models on newly collected images and LiDAR depth data from a Husky robot to verify the generalizability of the approach and show that GAN-transformed data can serve as an effective alternative to real-world data, particularly in depth estimation.


[68] 2405.01115

A New Self-Alignment Method without Solving Wahba Problem for SINS in Autonomous Vehicles

Initial alignment is one of the key technologies in strapdown inertial navigation systems (SINS), providing initial state information for vehicle attitude and navigation. In some situations, such as in an attitude heading reference system, the position is not necessarily required or even available, so self-alignment that does not rely on any external aid becomes necessary. This study presents a new self-alignment method under swaying conditions which, unlike existing methods, determines the latitude and attitude simultaneously by utilizing all observation vectors without solving the Wahba problem. By constructing the dyadic tensor of each observation and reference vector with itself, all equations related to observation and reference vectors are accumulated into one equation, where the latitude variable is extracted and solved according to the same eigenvalues of similar matrices on both sides of the equation, while the attitude is obtained by eigenvalue decomposition. Simulation and experimental tests verify the effectiveness of the proposed method: the alignment result is better than TRIAD in convergence speed and stability, and comparable with the OBA method in alignment accuracy with or without latitude. It is useful for guiding the design of initial alignment in autonomous vehicle applications.


[69] 2405.01119

Towards Understanding Worldwide Cross-cultural Differences in Implicit Driving Cues: Review, Comparative Analysis, and Research Roadmap

Recognizing and understanding implicit driving cues across diverse cultures is imperative for fostering safe and efficient global transportation systems, particularly when training new immigrants holding driving licenses from culturally disparate countries. Additionally, it is essential to consider cross-cultural differences in the development of Automated Driving features tailored to different countries. Previous piloting studies have compared and analyzed cross-cultural differences in selected implicit driving cues, but they typically examine only limited countries. However, a comprehensive worldwide comparison and analysis are lacking. This study conducts a thorough review of existing literature, online blogs, and expert insights from diverse countries to investigate cross-cultural disparities in driving behaviors, specifically focusing on implicit cues such as non-verbal communication (e.g., hand gestures, signal lighting, honking), norms, and social expectations. Through comparative analysis, variations in driving cues are illuminated across different cultural contexts. Based on the findings and identified gaps, a research roadmap is proposed for future research to further explore and address these differences, aiming to enhance intercultural communication, improve road safety, and increase transportation efficiency on a global scale. This paper presents the pioneering work towards a comprehensive understanding of the implicit driving cues across cultures. Moreover, this understanding will inform the development of automated driving systems tailored to different countries considering cross-cultural differences.


[70] 2405.01124

Investigating Self-Supervised Image Denoising with Denaturation

Self-supervised learning for image denoising problems in the presence of denaturation for noisy data is a crucial approach in machine learning. However, theoretical understanding of the performance of the approach that uses denatured data is lacking. To provide better understanding of the approach, in this paper, we analyze a self-supervised denoising algorithm that uses denatured data in depth through theoretical analysis and numerical experiments. Through the theoretical analysis, we discuss that the algorithm finds desired solutions to the optimization problem with the population risk, while the guarantee for the empirical risk depends on the hardness of the denoising task in terms of denaturation levels. We also conduct several experiments to investigate the performance of an extended algorithm in practice. The results indicate that the algorithm training with denatured images works, and the empirical performance aligns with the theoretical results. These results suggest several insights for further improvement of self-supervised image denoising that uses denatured data in future directions.


[71] 2405.01125

Lipschitz constant estimation for general neural network architectures using control tools

This paper is devoted to the estimation of the Lipschitz constant of neural networks using semidefinite programming. For this purpose, we interpret neural networks as time-varying dynamical systems, where the $k$-th layer corresponds to the dynamics at time $k$. A key novelty with respect to prior work is that we use this interpretation to exploit the series interconnection structure of neural networks with a dynamic programming recursion. Nonlinearities, such as activation functions and nonlinear pooling layers, are handled with integral quadratic constraints. If the neural network contains signal processing layers (convolutional or state space model layers), we realize them as 1-D/2-D/N-D systems and exploit this structure as well. We distinguish ourselves from related work on Lipschitz constant estimation by more extensive structure exploitation (scalability) and a generalization to a large class of common neural network architectures. To show the versatility and computational advantages of our method, we apply it to different neural network architectures trained on MNIST and CIFAR-10.
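
As a point of reference for what such estimates improve on, the simplest Lipschitz upper bound multiplies per-layer operator norms. The sketch below is that naive baseline, not the semidefinite-programming method of the paper; it assumes a bias-free ReLU network, uses the induced $\ell_\infty$ norm (largest absolute row sum) so that it runs without a linear-algebra library, and the weights are made up for illustration.

```python
def inf_norm(W):
    """Induced infinity norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)


def naive_lipschitz_bound(weights):
    """Product of layer norms; valid for 1-Lipschitz activations such as ReLU."""
    bound = 1.0
    for W in weights:
        bound *= inf_norm(W)
    return bound


def relu_mlp(weights, x):
    """Bias-free MLP with ReLU between layers (none after the last layer)."""
    for k, W in enumerate(weights):
        x = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        if k < len(weights) - 1:
            x = [max(0.0, v) for v in x]
    return x


# Hypothetical weights: layer 1 maps 2 -> 3, layer 2 maps 3 -> 1.
weights = [
    [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]],
    [[1.0, -1.0, 0.5]],
]
L = naive_lipschitz_bound(weights)
print(L)  # 7.5

# Any pair of inputs must satisfy |f(x) - f(y)| <= L * max_i |x_i - y_i|.
x, y = [1.0, 0.0], [0.0, 1.0]
gap = abs(relu_mlp(weights, x)[0] - relu_mlp(weights, y)[0])
print(gap <= L * max(abs(a - b) for a, b in zip(x, y)))  # True
```

The product bound ignores how consecutive layers interact, so it tends to grow loose with depth; tighter estimates need to account for the coupling between layers.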


[72] 2405.01136

Achievable Rate Analysis of Intelligent Omni-Surface Assisted NOMA Holographic MIMO Systems

An intelligent omni-surface (IOS) assisted holographic multiple-input and multiple-output architecture is conceived for $360^\circ$ full-space coverage at a low energy consumption. The theoretical ergodic rate lower bound of our non-orthogonal multiple access (NOMA) scheme is derived based on the moment matching approximation method, while considering the signal distortion at transceivers imposed by hardware impairments (HWIs). Furthermore, the asymptotic ergodic rate lower bound is derived both for an infinite number of IOS elements and for continuous aperture surfaces. Both the theoretical analysis and the simulation results show that the achievable rate of the NOMA scheme is higher than that of its orthogonal multiple access counterpart. Furthermore, owing to the HWIs at the transceivers, the achievable rate saturates in the high signal-to-noise ratio region, instead of reaching its theoretical maximum.


[73] 2405.01146

Energy-Efficient Reconfigurable Holographic Surfaces Operating in the Presence of Realistic Hardware Impairments

Reconfigurable holographic surfaces (RHSs) constitute a promising technique for supporting energy-efficient communications. In this paper, we formulate the energy efficiency maximization problem of the switch-controlled RHS-aided beamforming architecture by alternately optimizing the holographic beamformer at the RHS, the digital beamformer, the total transmit power and the power sharing ratio of each user. Specifically, to deal with this challenging non-convex optimization problem, we decouple it into three sub-problems. Firstly, the coefficients of RHS elements responsible for the holographic beamformer are optimized to maximize the sum of the eigen-channel gains of all users by our proposed low-complexity eigen-decomposition (ED) method. Then, the digital beamformer is designed by the singular value decomposition (SVD) method to support multi-user information transfer. Finally, the total transmit power and the power sharing ratio are alternately optimized, while considering the effect of transceiver hardware impairments (HWIs). We theoretically derive the spectral efficiency and energy efficiency performance upper bound for the RHS-based beamforming architectures in the presence of HWIs. Our simulation results show that the switch-controlled RHS-aided beamforming architecture achieves higher energy efficiency than the conventional fully digital beamformer and the hybrid beamformer based on phase shift arrays (PSA). Moreover, considering the effect of HWIs in the beamforming design can bring about further energy efficiency enhancements.


[74] 2405.01167

Ergodic Spectral Efficiency Analysis of Intelligent Omni-Surface Aided Systems Suffering From Imperfect CSI and Hardware Impairments

In contrast to the conventional reconfigurable intelligent surfaces (RIS), intelligent omni-surfaces (IOS) are capable of full-space coverage of smart radio environments by simultaneously transmitting and reflecting the incident signals. In this paper, we investigate the ergodic spectral efficiency of IOS-aided systems for transmission over random channel links, while considering both realistic imperfect channel state information (CSI) and transceiver hardware impairments (HWIs). Firstly, we formulate the linear minimum mean square error estimator of the equivalent channel spanning from the user equipments (UEs) to the access point (AP), where the transceiver HWIs are also considered. Then, we apply a two-timescale protocol for designing the beamformer of the IOS-aided system. Specifically, for the active AP beamformer, the minimum mean square error combining method is employed, which relies on the estimated equivalent channels, on the statistical information of the channel estimation error, on the inter-user interference as well as on the HWIs at the AP and UEs. By contrast, the passive IOS beamformer is designed based on the statistical CSI for maximizing the upper bound of the ergodic spectral efficiency. The theoretical analysis and simulation results show that the transceiver HWIs have a significant effect on the ergodic spectral efficiency, especially in the high transmit power region. Furthermore, we show that the HWIs at the AP can be effectively compensated by deploying more AP antennas.


[75] 2405.01170

GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression

Transformer-based entropy models have gained prominence in recent years due to their superior ability to capture long-range dependencies in probability distribution estimation compared to convolution-based methods. However, previous transformer-based entropy models suffer from a sluggish coding process due to pixel-wise autoregression or duplicated computation during inference. In this paper, we propose a novel transformer-based entropy model called GroupedMixer, which enjoys both faster coding speed and better compression performance than previous transformer-based methods. Specifically, our approach builds upon group-wise autoregression by first partitioning the latent variables into groups along spatial-channel dimensions, and then entropy coding the groups with the proposed transformer-based entropy model. The global causal self-attention is decomposed into more efficient group-wise interactions, implemented using inner-group and cross-group token-mixers. The inner-group token-mixer incorporates contextual elements within a group while the cross-group token-mixer interacts with previously decoded groups. Alternate arrangement of two token-mixers enables global contextual reference. To further expedite the network inference, we introduce context cache optimization to GroupedMixer, which caches attention activation values in cross-group token-mixers and avoids complex and duplicated computation. Experimental results demonstrate that the proposed GroupedMixer yields the state-of-the-art rate-distortion performance with fast compression speed.


[76] 2405.01171

Modeling pedestrian fundamental diagram based on Directional Statistics

Understanding pedestrian dynamics is crucial for appropriately designing pedestrian spaces. The pedestrian fundamental diagram (FD), which describes the relationship between pedestrian flow and density within a given space, characterizes these dynamics. Pedestrian FDs are significantly influenced by the flow type, such as uni-directional, bi-directional, and crossing flows. However, to the authors' knowledge, generalized pedestrian FDs that are applicable to various flow types have not been proposed. This may be due to the difficulty of using statistical methods to characterize the flow types. The flow types significantly depend on the angles of pedestrian movement; however, these angles cannot be processed by standard statistics due to their periodicity. In this study, we propose a comprehensive model for pedestrian FDs that can describe the pedestrian dynamics for various flow types by applying Directional Statistics. First, we develop a novel statistic describing the pedestrian flow type solely from pedestrian trajectory data using Directional Statistics. Then, we formulate a comprehensive pedestrian FD model that can be applied to various flow types by incorporating the proposed statistics into a traditional pedestrian FD model. The proposed model was validated using actual pedestrian trajectory data. The results confirmed that the model effectively represents the essential nature of pedestrian dynamics, such as the capacity reduction due to conflict of crossing flows and the capacity improvement due to the lane formation in bi-directional flows.
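The periodicity problem is easy to reproduce: the arithmetic mean of headings 350° and 10° is 180°, the opposite of the direction both pedestrians are walking in. A minimal sketch of the standard directional-statistics remedy follows. It computes the mean direction and the mean resultant length R. This is a textbook quantity, not the novel statistic proposed in the paper, and the function name is hypothetical.

```python
import math

def mean_resultant(angles_deg):
    """Return (mean direction in degrees within [0, 360), mean resultant length R).

    Each heading is mapped to a unit vector and the vectors are averaged.
    R close to 1 means aligned headings; R close to 0 means they cancel out.
    """
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    direction = math.degrees(math.atan2(s, c)) % 360.0
    return direction, math.hypot(c, s)

print(mean_resultant([350, 10]))          # nearly aligned headings
print(mean_resultant([0, 180, 0, 180]))   # balanced bi-directional flow
```

For the balanced bi-directional sample R is essentially zero, so the reported mean direction carries no information there. R alone also cannot separate bi-directional from crossing flows, which is presumably part of why a dedicated flow-type statistic is needed.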


[77] 2405.01172

Frame Codes for the Block-Erasure Channel

Analog codes add redundancy by expanding the dimension using real/complex-valued operations. Frame theory provides a mathematical basis for constructing such codes, with diverse applications in non-orthogonal code-division multiple access (NOMA-CDMA), distributed computation, multiple description source coding, space-time coding (STC), and more. The channel model corresponding to these applications is a combination of noise and erasures. Recent analyses showed a useful connection between spectral random-matrix theory and large equiangular tight frames (ETFs) under random uniform erasures. In this work we generalize this model to a channel where the erasures come in blocks. This particularly fits NOMA-CDMA with multiple transmit antennas for each user and STC with known spatial grouping. We present a method to adjust ETF codes to suit block erasures, and find minimum intra-block-correlation frames which outperform ETFs in this setting.
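As a small illustration of an analog frame code, the sketch below uses the three-vector "Mercedes-Benz" frame in the plane, a standard example of an equiangular tight frame. It is not one of the constructions from the paper, and it shows a single erasure rather than block erasures. A 2-D signal is expanded into three real coefficients, one coefficient is erased, and the signal is recovered by least squares from the two survivors. The function names are hypothetical.

```python
import math

# Three unit vectors spaced 120 degrees apart.
angles = [math.pi / 2 + 2 * math.pi * k / 3 for k in range(3)]
frame = [(math.cos(a), math.sin(a)) for a in angles]

def frame_operator(vectors):
    """Sum of outer products f f^T, returned as a 2x2 nested list."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in vectors:
        s[0][0] += x * x
        s[0][1] += x * y
        s[1][0] += y * x
        s[1][1] += y * y
    return s

def encode(signal, vectors):
    """Analysis step: one inner product per frame vector."""
    return [f[0] * signal[0] + f[1] * signal[1] for f in vectors]

def decode(coeffs, vectors):
    """Least-squares reconstruction from the surviving coefficients."""
    s = frame_operator(vectors)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    b0 = sum(c * f[0] for c, f in zip(coeffs, vectors))
    b1 = sum(c * f[1] for c, f in zip(coeffs, vectors))
    return ((s[1][1] * b0 - s[0][1] * b1) / det,
            (-s[1][0] * b0 + s[0][0] * b1) / det)

signal = (0.3, -1.2)
coeffs = encode(signal, frame)
# Erase the middle coefficient and decode from the other two.
recovered = decode([coeffs[0], coeffs[2]], [frame[0], frame[2]])
print(recovered)
```

With noise added to the coefficients, the conditioning of the surviving sub-frame would determine how much the noise is amplified, which is where the choice of frame matters.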


[78] 2405.01207

Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features

Membership Inference (MI) poses a substantial privacy threat to the training data of Automatic Speech Recognition (ASR) systems, while also offering an opportunity to audit these models with regard to user data. This paper explores the effectiveness of loss-based features in combination with Gaussian and adversarial perturbations to perform MI in ASR models. To the best of our knowledge, this approach has not yet been investigated. We compare our proposed features with commonly used error-based features and find that the proposed features greatly enhance performance for sample-level MI. For speaker-level MI, these features improve results, though by a smaller margin, as error-based features already obtained a high performance for this task. Our findings emphasise the importance of considering different feature sets and levels of access to target models for effective MI in ASR systems, providing valuable insights for auditing such models.


[79] 2405.01215

Movable Antenna Enhanced Wireless Sensing Via Antenna Position Optimization

In this paper, we propose a new wireless sensing system equipped with a movable-antenna (MA) array, which can flexibly adjust the positions of antenna elements for improving the sensing performance over conventional antenna arrays with fixed-position antennas (FPAs). First, we show that the angle estimation performance in wireless sensing is fundamentally determined by the array geometry, where the Cramer-Rao bound (CRB) of the mean square error (MSE) for angle of arrival (AoA) estimation is derived as a function of the antennas' positions for both one-dimensional (1D) and two-dimensional (2D) MA arrays. Then, for the case of a 1D MA array, we obtain a globally optimal solution for the MAs' positions in closed form to minimize the CRB of AoA estimation MSE. In the case of a 2D MA array, we aim to minimize the maximum (min-max) of the CRBs of estimation MSE for the two AoAs with respect to the horizontal and vertical axes, respectively. In particular, for the special case of a circular antenna movement region, an optimal solution for the MAs' positions is derived under certain numbers of MAs and circle radii. Thereby, both the lower and upper bounds of the min-max CRB are obtained for antenna movement regions with arbitrary shapes. Moreover, we develop an efficient alternating optimization algorithm to obtain a locally optimal solution for the MAs' positions by iteratively optimizing either their horizontal or their vertical coordinates with the other being fixed. Numerical results demonstrate that our proposed 1D/2D MA arrays can significantly decrease the CRB of AoA estimation MSE as well as the actual MSE compared to conventional uniform linear arrays (ULAs)/uniform planar arrays (UPAs) with different values of uniform inter-antenna spacing.


[80] 2405.01230

Evaluation of Video-Based rPPG in Challenging Environments: Artifact Mitigation and Network Resilience

Video-based remote photoplethysmography (rPPG) has emerged as a promising technology for non-contact vital sign monitoring, especially under controlled conditions. However, the accurate measurement of vital signs in real-world scenarios faces several challenges, including artifacts induced by video codecs, low-light noise, degradation, low dynamic range, occlusions, and hardware and network constraints. In this article, we systematically and comprehensively investigate these issues, measuring their detrimental effects on the quality of rPPG measurements. Additionally, we propose practical strategies for mitigating these challenges to improve the dependability and resilience of video-based rPPG systems. We detail methods for effective biosignal recovery in the presence of network limitations and present denoising and inpainting techniques aimed at preserving video frame integrity. Through extensive evaluations and direct comparisons, we demonstrate the effectiveness of the approaches in enhancing rPPG measurements under challenging environments, contributing to the development of more reliable and effective remote vital sign monitoring technologies.


[81] 2405.01242

TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speed-up of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.


[82] 2405.01258

Towards Consistent Object Detection via LiDAR-Camera Synergy

As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images and point clouds, can enhance detection accuracy. However, no existing model can simultaneously detect an object's position in both point clouds and images and ascertain their corresponding relationship. This information is invaluable for human-machine interactions, offering new possibilities for their enhancement. In light of this, this paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework that requires only a single forward inference to simultaneously obtain an object's position in both point clouds and images and establish their correlation. Furthermore, to assess the accuracy of the object correlation between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, an extensive set of experiments has been conducted on the KITTI and DAIR-V2X datasets. The study also explored how the proposed consistency detection method performs on images when the calibration parameters between images and point clouds are disturbed, compared to existing post-processing methods. The experimental results demonstrate that the proposed method exhibits excellent detection performance and robustness, achieving end-to-end consistency detection. The source code will be made publicly available at https://github.com/xifen523/COD.


[83] 2405.01260

Causal Influence in Federated Edge Inference

In this paper, we consider a setting where heterogeneous agents with connectivity are performing inference using unlabeled streaming data. Observed data are only partially informative about the target variable of interest. In order to overcome the uncertainty, agents cooperate with each other by exchanging their local inferences with and through a fusion center. To evaluate how each agent influences the overall decision, we adopt a causal framework in order to distinguish the actual influence of agents from mere correlations within the decision-making process. Various scenarios reflecting different agent participation patterns and fusion center policies are investigated. We derive expressions to quantify the causal impact of each agent on the joint decision, which could be beneficial for anticipating and addressing atypical scenarios, such as adversarial attacks or system malfunctions. We validate our theoretical results with numerical simulations and a real-world application of multi-camera crowd counting.


[84] 2405.01292

Koopman Data-Driven Predictive Control with Robust Stability and Recursive Feasibility Guarantees

In this paper, we consider the design of data-driven predictive controllers for nonlinear systems from input-output data via linear-in-control input Koopman lifted models. Instead of identifying and simulating a Koopman model to predict future outputs, we design a subspace predictive controller in the Koopman space. This allows us to learn the observables minimizing the multi-step output prediction error of the Koopman subspace predictor, preventing the propagation of prediction errors. To avoid losing feasibility of our predictive control scheme due to prediction errors, we compute a terminal cost and terminal set in the Koopman space and we obtain recursive feasibility guarantees through an interpolated initial state. As a third contribution, we introduce a novel regularization cost yielding input-to-state stability guarantees with respect to the prediction error for the resulting closed-loop system. The performance of the developed Koopman data-driven predictive control methodology is illustrated on a nonlinear benchmark example from the literature.


[85] 2405.01293

Low-resource speech recognition and dialect identification of Irish in a multi-task framework

This paper explores the use of Hybrid CTC/Attention encoder-decoder models trained with Intermediate CTC (InterCTC) for Irish (Gaelic) low-resource speech recognition (ASR) and dialect identification (DID). Results are compared to the current best performing models trained for ASR (TDNN-HMM) and DID (ECAPA-TDNN). An optimal InterCTC setting is initially established using a Conformer encoder. This setting is then used to train a model with an E-branchformer encoder, and the performance of both architectures is compared. A multi-task fine-tuning approach is adopted for language model (LM) shallow fusion. The experiments yielded an improvement in DID accuracy of 10.8% relative to a baseline ECAPA-TDNN, and WER performance approaching the TDNN-HMM model. This multi-task approach emerges as a promising strategy for Irish low-resource ASR and DID.


[86] 2405.01352

Using Waste Factor to Optimize Energy Efficiency in Multiple-Input Single-Output (MISO) and Multiple-Input Multiple-Output (MIMO) Systems

This paper introduces Waste Factor (W) and Waste Figure (WF) to assess power efficiency in any multiple-input multiple-output (MIMO) or single-input multiple-output (SIMO) or multiple-input single-output (MISO) cascaded communication system. This paper builds upon the new theory of Waste Factor, which systematically models added wasted power in any cascade for parallel systems such as MISO, SIMO, and MIMO systems, which are prevalent in current wireless networks. Here, we also show the advantage of W compared to conventional metrics for quantifying and analyzing energy efficiency. This work explores the utility of W in assessing energy efficiency in communication channels, within Radio Access Networks (RANs).


[87] 2405.01365

Dynamic Online Ensembles of Basis Expansions

Practical Bayesian learning often requires (1) online inference, (2) dynamic models, and (3) ensembling over multiple different models. Recent advances have shown how to use random feature approximations to achieve scalable, online ensembling of Gaussian processes with desirable theoretical properties and fruitful applications. One key to these methods' success is the inclusion of a random walk on the model parameters, which makes models dynamic. We show that these methods can be generalized easily to any basis expansion model and that using alternative basis expansions, such as Hilbert space Gaussian processes, often results in better performance. To simplify the process of choosing a specific basis expansion, our method's generality also allows the ensembling of several entirely different models, for example, a Gaussian process and polynomial regression. Finally, we propose a novel method to ensemble static and dynamic models together.


[88] 2405.01402

Learning Force Control for Legged Manipulation

Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing. We showcase our method on a whole-body control platform of a quadruped robot with an arm. Such force control enables us to perform gravity compensation and impedance control, unlocking compliant whole-body manipulation. The learned whole-body controller with variable compliance makes it intuitive for humans to teleoperate the robot by only commanding the manipulator, and the robot's body adjusts automatically to achieve the desired position and force. Consequently, a human teleoperator can easily demonstrate a wide variety of loco-manipulation tasks. To the best of our knowledge, we provide the first deployment of learned whole-body force control in legged manipulators, paving the way for more versatile and adaptable legged robots.


[89] 2405.01437

Two competing populations with a common environmental resource

Feedback-evolving games is a framework that models the co-evolution between payoff functions and an environmental state. It serves as a useful tool to analyze many social dilemmas such as natural resource consumption, behaviors in epidemics, and the evolution of biological populations. However, it has primarily focused on the dynamics of a single population of agents. In this paper, we consider the impact of two populations of agents that share a common environmental resource. We focus on a scenario where individuals in one population are governed by an environmentally "responsible" incentive policy, and individuals in the other population are environmentally "irresponsible". An analysis of the asymptotic stability of the coupled system is provided, and conditions under which the resource collapses are identified. We then derive consumption rates for the irresponsible population that optimally exploit the environmental resource, and use a sensitivity analysis to determine how incentives should be allocated to the responsible population to most effectively promote the environment.


[90] 2405.01471

Saturation of the Multiparameter Quantum Cramér-Rao Bound at the Single-Copy Level with Projective Measurements

Quantum parameter estimation theory is an important component of quantum information theory and provides the statistical foundation that underpins important topics such as quantum system identification and quantum waveform estimation. When there is more than one parameter the ultimate precision in the mean square error given by the quantum Cramér-Rao bound is not necessarily achievable. For non-full rank quantum states, it was not known when this bound can be saturated (achieved) when only a single copy of the quantum state encoding the unknown parameters is available. This single-copy scenario is important because of its experimental/practical tractability. Recently, necessary and sufficient conditions for saturability of the quantum Cramér-Rao bound in the multiparameter single-copy scenario have been established in terms of i) the commutativity of a set of projected symmetric logarithmic derivatives and ii) the existence of a unitary solution to a system of coupled nonlinear partial differential equations. New sufficient conditions were also obtained that only depend on properties of the symmetric logarithmic derivatives. In this paper, key structural properties of optimal measurements that saturate the quantum Cramér-Rao bound are illuminated. These properties are exploited to i) show that the sufficient conditions are in fact necessary and sufficient for an optimal measurement to be projective, ii) give an alternative proof of previously established necessary conditions, and iii) describe general POVMs, not necessarily projective, that saturate the multiparameter QCRB. Examples are given where a unitary solution to the system of nonlinear partial differential equations can be explicitly calculated when the required conditions are fulfilled.


[91] 2405.01504

Evaluation and Optimization of Adaptive Cruise Control in Autonomous Vehicles using the CARLA Simulator: A Study on Performance under Wet and Dry Weather Conditions

Adaptive Cruise Control (ACC) can automatically change the speed of the ego vehicle to maintain a safe distance from the leading vehicle. The primary purpose of this research is to use cutting-edge computing approaches to locate and track vehicles in real time under various conditions to achieve a safe ACC. The paper examines the extension of ACC employing depth cameras and radar sensors within Autonomous Vehicles (AVs) to respond in real time to changing weather conditions, using the Car Learning to Act (CARLA) simulation platform at noon. The ego vehicle controller's decision to accelerate or decelerate depends on the speed of the leading vehicle and the safe distance from that vehicle. Simulation results show that Proportional Integral Derivative (PID) control of autonomous vehicles using a depth camera and radar sensors reduces the speed of the leading vehicle and the ego vehicle when it rains. In addition, longer travel time was observed for both vehicles in rainy conditions than in dry conditions. Also, PID control prevents rear-end collisions with the leading vehicle.
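A minimal sketch of a PID-based gap controller of the kind described above follows. It is not the controller from the paper. The constant-time-headway spacing policy, the gains, and all names (`PID`, `acc_command`, `time_headway_s`, `standstill_gap_m`) are assumptions made for illustration, and no CARLA API is used.

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        """Return the control output for the current error sample."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no derivative estimate on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def acc_command(pid, gap_m, ego_speed_mps, time_headway_s, standstill_gap_m, dt):
    """Positive output requests acceleration, negative output requests braking.

    The desired gap grows with ego speed (assumed constant-time-headway policy).
    """
    desired_gap = standstill_gap_m + time_headway_s * ego_speed_mps
    return pid.step(gap_m - desired_gap, dt)


controller = PID(kp=0.5, ki=0.0, kd=0.0)  # hypothetical gains
# Ego at 10 m/s, measured gap 10 m, desired gap 5 + 1.5 * 10 = 20 m.
print(acc_command(controller, 10.0, 10.0, 1.5, 5.0, 0.1))
```

One plausible way to account for rain in such a sketch would be a larger `time_headway_s` on wet roads; that choice is hypothetical and is not taken from the paper.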


[92] 2405.01515

Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications

Rate split multiple access (RSMA) has been proven to be an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource-constrained vehicles. Although data-driven approaches can alleviate this issue, they suffer from poor generalizability and scarce training data. In this paper, we propose a fractional programming (FP) based deep unfolding (DU) approach to address the resource allocation problem for weighted sum rate optimization in RSMA. By carefully designing the penalty function, we couple the variable update with the projected gradient descent (PGD) algorithm. Following the structure of PGD, we embed a few learnable parameters in each layer of the DU network. Through extensive simulation, we show that the proposed model-based neural network achieves performance similar to the optimal results given by traditional algorithms, but with much lower computational complexity, less training data, and higher resilience to test set data and out-of-distribution (OOD) data.
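The idea of unfolding PGD into layers can be sketched on a toy problem. The example below is not the FP-based RSMA formulation from the paper. It minimises a simple quadratic under box constraints, and each entry of `step_sizes` plays the role of one network layer. In deep unfolding these step sizes would be learned from data; here they are fixed by hand, and all names are hypothetical.

```python
def project_box(p, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (elementwise clipping)."""
    return [min(max(v, lo), hi) for v in p]

def unfolded_pgd(grad, p0, step_sizes, lo, hi):
    """Run one projected-gradient iteration per entry of step_sizes.

    Each iteration corresponds to one 'layer' with its own step size.
    """
    p = list(p0)
    for eta in step_sizes:
        g = grad(p)
        p = project_box([v - eta * gv for v, gv in zip(p, g)], lo, hi)
    return p

# Toy objective f(p) = sum_i (p_i - t_i)^2 with a hypothetical target t.
target = [0.5, 2.0, -1.0]

def grad(p):
    return [2.0 * (v - t) for v, t in zip(p, target)]

result = unfolded_pgd(grad, [0.0, 0.0, 0.0], [0.4, 0.4, 0.4, 0.4], 0.0, 1.0)
print(result)
```

The first coordinate converges toward its unconstrained optimum, while the other two end up on the box boundary because their targets lie outside the feasible set.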


[93] 2405.01521

Transformer-Aided Semantic Communications

The transformer structure employed in large language models (LLMs), as a specialized category of deep neural networks (DNNs) featuring attention mechanisms, stands out for its ability to identify and highlight the most relevant aspects of input data. Such a capability is particularly beneficial in addressing a variety of communication challenges, notably in the realm of semantic communication, where proper encoding of the relevant data is critical, especially in systems with limited bandwidth. In this work, we employ vision transformers specifically for the purpose of compression and compact representation of the input image, with the goal of preserving semantic information throughout the transmission process. Through the use of the attention mechanism inherent in transformers, we create an attention mask. This mask effectively prioritizes critical segments of images for transmission, ensuring that the reconstruction phase focuses on key objects highlighted by the mask. Our methodology significantly improves the quality of semantic communication and optimizes bandwidth usage by encoding different parts of the data in accordance with their semantic information content, thus enhancing overall efficiency. We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset, focusing on both reconstruction quality and accuracy. Our evaluation results demonstrate that our framework successfully preserves semantic information, even when only a fraction of the encoded data is transmitted, according to the intended compression rates.