New articles on Electrical Engineering and Systems Science


[1] 2404.13051

The Development of a Microcontroller based Smoked Fish Machine

The development of a microcontroller-based smoked fish machine aims to combine and automate the boiling, smoking, and drying processes. The machine consists of an Arduino microcontroller, heater, spark gap igniter, stepper motor with motor driver, temperature sensor, exhaust fan, DC converter, relay module, power supply, and controller box. The main mechanism of the smoked fish machine is a stepper motor that automates the process. A PID controller is used to optimize the regulation of the machine's temperature. The machine uses galvanized steel and food-grade materials to support the heating of the fish, which helps minimize contamination and burns. The typical time of operation of the entire process using the proposed machine is up to 60 minutes.
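
As a concrete illustration of the temperature loop described above, here is a minimal sketch of a discrete PID update in Python; the gains, time step, and 0-100% heater duty cycle are illustrative assumptions, not values from the paper.

```python
# Minimal discrete PID step; gains, dt, and duty-cycle limits are hypothetical.
def pid_step(setpoint, measured, state, kp=2.0, ki=0.1, kd=0.5, dt=1.0):
    """One PID update; `state` carries (integral, previous_error)."""
    integral, prev_error = state
    error = setpoint - measured
    integral += error * dt
    derivative = (error - prev_error) / dt
    output = kp * error + ki * integral + kd * derivative
    output = max(0.0, min(100.0, output))  # clamp to heater duty cycle 0-100%
    return output, (integral, error)
```

Each control cycle, the returned duty cycle would drive the relay/heater, with the updated state passed into the next call.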


[2] 2404.13080

Modeling Rainwater Harvesting Systems with Covered Storage Tank on A Smartphone

A mathematical model, suitable for smartphone implementation, is presented to simulate the performance of a rainwater harvesting system equipped with a covered storage tank. This model serves to determine the optimal tank size and develop strategies to mitigate drought conditions.
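
Such models typically iterate a daily water balance over a rainfall record; the sketch below shows one plausible form, where the runoff coefficient, liter-based units, and spill-before-supply ordering are assumptions rather than the paper's exact formulation.

```python
# Hypothetical daily water balance for a covered storage tank (losses ignored).
def simulate_tank(rain_mm, roof_area_m2, capacity_L, demand_L, runoff_coeff=0.85):
    storage, overflow, deficit = 0.0, 0.0, 0.0
    for r in rain_mm:                                # one rainfall value per day
        storage += r * roof_area_m2 * runoff_coeff   # 1 mm over 1 m^2 = 1 L
        if storage > capacity_L:                     # spill any excess
            overflow += storage - capacity_L
            storage = capacity_L
        supplied = min(demand_L, storage)            # meet demand if possible
        deficit += demand_L - supplied
        storage -= supplied
    return storage, overflow, deficit
```

Sweeping `capacity_L` over candidate sizes and comparing the accumulated deficit is the usual way such a model informs tank sizing.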


[3] 2404.13097

DISC: Latent Diffusion Models with Self-Distillation from Separated Conditions for Prostate Cancer Grading

Latent Diffusion Models (LDMs) can generate high-fidelity images from noise, offering a promising approach for augmenting histopathology images for training cancer grading models. While previous works successfully generated high-fidelity histopathology images using LDMs, the generation of image tiles to improve prostate cancer grading has not yet been explored. Additionally, LDMs face challenges in accurately generating admixtures of multiple cancer grades in a tile when conditioned by a tile mask. In this study, we train specific LDMs to generate synthetic tiles that contain multiple Gleason Grades (GGs) by leveraging pixel-wise annotations in input tiles. We introduce a novel framework named Self-Distillation from Separated Conditions (DISC) that generates GG patterns guided by GG masks. Finally, we deploy a training framework for pixel-level and slide-level prostate cancer grading, where synthetic tiles are effectively utilized to improve the cancer grading performance of existing models. As a result, this work surpasses previous works in two domains: 1) our LDMs enhanced with DISC produce more accurate tiles in terms of GG patterns, and 2) our training scheme, incorporating synthetic data, significantly improves the generalization of the baseline model for prostate cancer grading, particularly in challenging cases of rare GG5, demonstrating the potential of generative models to enhance cancer grading when data is limited.


[4] 2404.13098

Implementing Hottopixx Methods for Endmember Extraction in Hyperspectral Images

Hyperspectral imaging technology has a wide range of applications, including forest management, mineral resource exploration, and Earth surface monitoring. Endmember extraction of hyperspectral images is a key step in leveraging this technology for applications. It aims to identify the spectral signatures of materials, i.e., the major components in the observed scenes. Theoretically speaking, Hottopixx methods should be effective on problems involving extracting endmembers from hyperspectral images. Yet, these methods are challenging to perform in practice due to high computational costs. They require us to solve LP problems, called Hottopixx models, whose size grows quadratically with the number of pixels in the image. It has thus remained unclear whether they are actually effective. This study clarifies this situation. We propose an efficient and effective implementation of Hottopixx. Our implementation follows the framework of column generation, which is known as a classical but powerful means of solving large-scale LPs. We show in experiments that our implementation is applicable to endmember extraction from real hyperspectral images and can estimate endmember signatures with higher accuracy than existing methods.


[5] 2404.13101

DensePANet: An improved generative adversarial network for photoacoustic tomography image reconstruction from sparse data

Image reconstruction is an essential step of every medical imaging method, including Photoacoustic Tomography (PAT), a promising imaging modality that unites the benefits of both ultrasound and optical imaging methods. Reconstruction of PAT images using conventional methods results in rough artifacts, especially when applied directly to sparse PAT data. In recent years, generative adversarial networks (GANs) have shown powerful performance in image generation as well as translation, rendering them a smart choice for reconstruction tasks. In this study, we propose an end-to-end method called DensePANet to solve the problem of PAT image reconstruction from sparse data. The proposed model employs a novel modification of UNet in its generator, called FD-UNet++, which considerably improves the reconstruction performance. We evaluate the method on various in-vivo and simulated datasets. Quantitative and qualitative results show the better performance of our model over other prevalent deep learning techniques.


[6] 2404.13102

Single-sample image-fusion upsampling of fluorescence lifetime images

Fluorescence lifetime imaging microscopy (FLIM) provides detailed information about molecular interactions and biological processes. A major bottleneck for FLIM is image resolution at high acquisition speeds, due to the engineering and signal-processing limitations of time-resolved imaging technology. Here we present single-sample image-fusion upsampling (SiSIFUS), a data-fusion approach to computational FLIM super-resolution that combines measurements from a low-resolution time-resolved detector (that measures photon arrival time) and a high-resolution camera (that measures intensity only). To solve this otherwise ill-posed inverse retrieval problem, we introduce statistically informed priors that encode local and global dependencies between the two single-sample measurements. This bypasses the risk of out-of-distribution hallucination as in traditional data-driven approaches and delivers enhanced images compared, for example, to standard bilinear interpolation. The general approach laid out by SiSIFUS can be applied to other image super-resolution problems where two different datasets are available.


[7] 2404.13103

ToNNO: Tomographic Reconstruction of a Neural Network's Output for Weakly Supervised Segmentation of 3D Medical Images

Annotating large numbers of 3D medical images for training segmentation models is time-consuming. The goal of weakly supervised semantic segmentation is to train segmentation models without using any ground truth segmentation masks. Our work addresses the case where only image-level categorical labels, indicating the presence or absence of a particular region of interest (such as tumours or lesions), are available. Most existing methods rely on class activation mapping (CAM). We propose a novel approach, ToNNO, which is based on the Tomographic reconstruction of a Neural Network's Output. Our technique extracts stacks of slices at different angles from the input 3D volume, feeds these slices to a 2D encoder, and applies the inverse Radon transform in order to reconstruct a 3D heatmap of the encoder's predictions. This generic method allows dense prediction tasks to be performed on 3D volumes using any 2D image encoder. We apply it to weakly supervised medical image segmentation by training the 2D encoder to output high values for slices containing the regions of interest. We test it on four large-scale medical image datasets and outperform 2D CAM methods. We then extend ToNNO by combining tomographic reconstruction with CAM methods, proposing Averaged CAM and Tomographic CAM, which obtain even better results.
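
To make the tomographic step concrete, the toy 2D analogue below stacks per-angle projections into a sinogram and inverts them with scikit-image's inverse Radon transform; in ToNNO itself the projections are a 2D encoder's slice-wise predictions over a 3D volume, which this sketch does not reproduce.

```python
import numpy as np
from skimage.transform import radon, iradon

# Toy 2D analogue: forward-project a fake region of interest, then reconstruct.
image = np.zeros((128, 128))
image[40:60, 70:90] = 1.0                   # stand-in region of interest
angles = np.linspace(0.0, 180.0, 60, endpoint=False)
sinogram = radon(image, theta=angles)       # stand-in for per-angle predictions
heatmap = iradon(sinogram, theta=angles)    # reconstructed "heatmap"
```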


[8] 2404.13106

Automatic Cranial Defect Reconstruction with Self-Supervised Deep Deformable Masked Autoencoders

Thousands of people suffer from cranial injuries every year. They require personalized implants that need to be designed and manufactured before the reconstruction surgery. Manual design is expensive and time-consuming, motivating the search for algorithms that automate the process. The problem can be formulated as volumetric shape completion and solved by deep neural networks dedicated to supervised image segmentation. However, such an approach requires annotating ground-truth defects, which is costly and time-consuming. Usually, the process is replaced with synthetic defect generation. However, even synthetic ground-truth generation is time-consuming and limits the data heterogeneity, and thus the deep models' generalizability. In our work, we propose a simple alternative approach that uses a self-supervised masked autoencoder to solve the problem. This approach by design increases the heterogeneity of the training set and can be seen as a form of data augmentation. We compare the proposed method with several state-of-the-art deep neural networks and show both quantitative and qualitative improvement on the SkullBreak and SkullFix datasets. The proposed method can be used to efficiently reconstruct cranial defects in real time.


[9] 2404.13108

RegWSI: Whole Slide Image Registration using Combined Deep Feature- and Intensity-Based Methods: Winner of the ACROBAT 2023 Challenge

The automatic registration of differently stained whole slide images (WSIs) is crucial for improving diagnosis and prognosis by fusing complementary information emerging from different visible structures. It is also useful for quickly transferring annotations between consecutive or restained slides, thus significantly reducing the annotation time and associated costs. Nevertheless, the slide preparation is different for each stain and the tissue undergoes complex and large deformations. Therefore, a robust, efficient, and accurate registration method is highly desired by the scientific community and hospitals specializing in digital pathology. We propose a two-step hybrid method consisting of (i) a deep learning- and feature-based initial alignment algorithm, and (ii) intensity-based nonrigid registration using instance optimization. The proposed method does not require any fine-tuning to a particular dataset and can be used directly for any desired tissue type and stain. The method scored 1st place in the ACROBAT 2023 challenge. We evaluated it using three open datasets: (i) ANHIR, (ii) ACROBAT, and (iii) HyReCo, and performed several ablation studies concerning the resolution used for registration and the robustness and stability of the initial alignment. The method achieves the most accurate results on the ACROBAT dataset and cell-level registration accuracy on the restained slides from the HyReCo dataset, and is among the best methods evaluated on the ANHIR dataset. The method does not require any fine-tuning to new datasets and can be used out-of-the-box for other types of microscopic images. The method is incorporated into the DeeperHistReg framework, allowing others to directly use it to register, transform, and save WSIs at any desired pyramid level. The proposed method is a significant contribution to WSI registration, thus advancing the field of digital pathology.


[10] 2404.13116

On fusing active and passive acoustic sensing for simultaneous localization and mapping

Studies on the social behaviors of bats show that they have the ability to eavesdrop on the signals emitted by conspecifics in their vicinity. They can fuse this "passive" data with actively collected data from their own signals to get more information about their environment, allowing them to fly and hunt more efficiently and to avoid or cause jamming when competing for prey. Acoustic sensors are capable of similar feats but are generally used in only an active or passive capacity at one time. Is there a benefit to using both active and passive sensing simultaneously in the same array? In this work we define a family of models for active, passive, and fused sensing systems to measure range and bearing data from an environment defined by point-based landmarks. These measurements are used to solve the problem of simultaneous localization and mapping (SLAM) with extended Kalman filter (EKF) and FastSLAM 2.0 approaches. Our results show agreement with previous findings. Specifically, when active sensing is limited to a narrow angular range, fused sensing can perform just as accurately if not better, while also allowing the sensor to perceive more of the surrounding environment.


[11] 2404.13142

Decentralized Coordination of Distributed Energy Resources through Local Energy Markets and Deep Reinforcement Learning

As the energy landscape evolves toward sustainability, the accelerating integration of distributed energy resources poses challenges to the operability and reliability of the electricity grid. One significant aspect of this issue is the notable increase in net load variability at the grid edge. Transactive energy, implemented through local energy markets, has recently garnered attention as a promising solution to address the grid challenges in the form of decentralized, indirect demand response on a community level. Given the nature of these challenges, model-free control approaches, such as deep reinforcement learning, show promise for the decentralized automation of participation within this context. Existing studies at the intersection of transactive energy and model-free control primarily focus on socioeconomic and self-consumption metrics, overlooking the crucial goal of reducing community-level net load variability. This study addresses this gap by training a set of deep reinforcement learning agents to automate end-user participation in ALEX, an economy-driven local energy market. In this setting, agents do not share information and only prioritize individual bill optimization. The study unveils a clear correlation between bill reduction and reduced net load variability in this setup. The impact on net load variability is assessed over various time horizons using metrics such as ramping rate, daily and monthly load factor, as well as daily average and total peak export and import on an open-source dataset. Agents are then benchmarked against several baselines, with their performance levels showing promising results, approaching those of a near-optimal dynamic programming benchmark.
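
For reference, the net-load variability metrics named above are commonly computed as sketched below; these are the standard textbook forms, and the paper's exact definitions (averaging windows, horizons) may differ.

```python
import numpy as np

def ramping_rate(net_load):
    return np.mean(np.abs(np.diff(net_load)))    # mean absolute step-to-step ramp

def load_factor(net_load):
    return np.mean(net_load) / np.max(net_load)  # average-to-peak ratio

def peak_import_export(net_load):
    # With the sign convention import > 0 and export < 0.
    return np.max(net_load), np.min(net_load)
```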


[12] 2404.13153

Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

Eliminating image blur produced by various kinds of motion has been a challenging problem. Dominant approaches rely heavily on model capacity to remove blurring by reconstructing the residual from the blurry observation in feature space. These practices not only prevent the capture of spatially variable motion in the real world but also ignore the tailored handling of various motions in image space. In this paper, we propose a novel real-world deblurring filtering model called the Motion-adaptive Separable Collaborative (MISC) Filter. In particular, we use a motion estimation network to capture motion information from neighborhoods, thereby adaptively estimating spatially-variant motion flow, mask, kernels, weights, and offsets to obtain the MISC Filter. The MISC Filter first aligns the motion-induced blurring patterns to the middle of the motion along the predicted flow direction, and then collaboratively filters the aligned image through the predicted kernels, weights, and offsets to generate the output. This design can handle more generalized and complex motion in a spatially differentiated manner. Furthermore, we analyze the relationships between the motion estimation network and the residual reconstruction network. Extensive experiments on four widely used benchmarks demonstrate that our method provides an effective solution for real-world motion blur removal and achieves state-of-the-art performance. Code is available at https://github.com/ChengxuLiu/MISCFilter


[13] 2404.13185

Unlocking Robust Segmentation Across All Age Groups via Continual Learning

Most deep learning models in medical imaging are trained on adult data with unclear performance on pediatric images. In this work, we aim to address this challenge in the context of automated anatomy segmentation in whole-body Computed Tomography (CT). We evaluate the performance of CT organ segmentation algorithms trained on adult data when applied to pediatric CT volumes and identify substantial age-dependent underperformance. We subsequently propose and evaluate strategies, including data augmentation and continual learning approaches, to achieve good segmentation accuracy across all age groups. Our best-performing model, trained using continual learning, achieves high segmentation accuracy on both adult and pediatric data (Dice scores of 0.90 and 0.84 respectively).
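
The Dice scores quoted above follow the standard overlap definition for binary masks, sketched here:

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice coefficient between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```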


[14] 2404.13222

Vim4Path: Self-Supervised Vision Mamba for Histopathology Images

Representation learning from Gigapixel Whole Slide Images (WSI) poses a significant challenge in computational pathology due to the complicated nature of tissue structures and the scarcity of labeled data. Multiple-instance learning (MIL) methods have addressed this challenge by leveraging image patches to classify slides, utilizing feature encoders pretrained with Self-Supervised Learning (SSL) approaches. The performance of both SSL and MIL methods relies on the architecture of the feature encoder. This paper proposes leveraging the Vision Mamba (Vim) architecture, inspired by state space models, within the DINO framework for representation learning in computational pathology. We evaluate the performance of Vim against Vision Transformers (ViT) on the Camelyon16 dataset for both patch-level and slide-level classification. Our findings highlight Vim's enhanced performance compared to ViT, particularly at smaller scales, where Vim achieves an 8.21 increase in ROC AUC for models of similar size. An explainability analysis further highlights Vim's capabilities, revealing that Vim, unlike ViT, uniquely emulates the pathologist workflow. This alignment with human expert analysis highlights Vim's potential in practical diagnostic settings and contributes significantly to developing effective representation-learning algorithms in computational pathology. We release the code and pretrained weights at https://github.com/AtlasAnalyticsLab/Vim4Path.


[15] 2404.13258

Human Motor Learning Dynamics in High-dimensional Tasks

Conventional approaches to enhancing movement coordination, such as providing instructions and visual feedback, are often inadequate in complex motor tasks with multiple degrees of freedom (DoFs). To effectively address coordination deficits in such complex motor systems, it becomes imperative to develop interventions grounded in a model of human motor learning; however, modeling such learning processes is challenging due to the large DoFs. In this paper, we present a computational motor learning model that leverages the concept of motor synergies to extract low-dimensional learning representations in the high-dimensional motor space and the internal model theory of motor control to capture both fast and slow motor learning processes. We establish the model's convergence properties and validate it using data from a target capture game played by human participants. We study the influence of model parameters on several motor learning trade-offs such as speed-accuracy, exploration-exploitation, satisficing, and flexibility-performance, and show that the human motor learning system tunes these parameters to optimize learning and various output performance metrics.


[16] 2404.13277

Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives

Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent research highlights the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing the predicted scores of individual images, neglecting the crucial aspect of inter-score correlation within an entire image set. Meanwhile, correlation, such as ranking correlation, plays a significant role in NR-IQA tasks. To comprehensively explore the robustness of NR-IQA models, we introduce a new framework of correlation-error-based attacks that perturb both the correlation within an image set and the scores of individual images. Our research primarily focuses on ranking-related correlation metrics like Spearman's Rank-Order Correlation Coefficient (SROCC) and prediction error-related metrics like Mean Squared Error (MSE). As an instantiation, we propose a practical two-stage SROCC-MSE-Attack (SMA) that initially optimizes target attack scores for the entire image set and then generates adversarial examples guided by these scores. Experimental results demonstrate that our SMA method not only significantly disrupts the SROCC to negative values but also maintains a considerable change in the scores of individual images. Meanwhile, it exhibits state-of-the-art performance across metrics of different categories. Our method provides a new perspective on the robustness of NR-IQA models.
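
The two quantities the SMA attack jointly manipulates can be computed as below; the score vectors are made-up placeholders.

```python
import numpy as np
from scipy.stats import spearmanr

subjective = np.array([1.2, 3.4, 2.8, 4.9, 4.1])   # placeholder MOS values
predicted  = np.array([1.5, 3.1, 3.0, 4.5, 4.4])   # placeholder NR-IQA outputs

srocc, _ = spearmanr(subjective, predicted)  # the attack drives this negative...
mse = np.mean((subjective - predicted)**2)   # ...while keeping score changes large
```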


[17] 2404.13281

A Massive MIMO Sampling Detection Strategy Based on Denoising Diffusion Model

The Langevin sampling method relies on accurate score matching, while the existing massive multiple-input multiple-output (MIMO) Langevin detection involves an inevitable singular value decomposition (SVD) to calculate the posterior score. In this work, a massive MIMO sampling detection strategy that leverages the denoising diffusion model is proposed to narrow the gap between a given iterative detector and maximum likelihood (ML) detection in an SVD-free manner. Specifically, the proposed score-based sampling detection strategy, denoted approximate diffusion detection (ADD), is applicable to a wide range of iterative detection methods, and therefore offers considerable potential for improving their performance through multiple sampling attempts. On the other hand, the ADD scheme manages to bypass the channel SVD by introducing a reliable iterative detector to produce a sample from the approximate posterior, so that further Langevin sampling is tractable. Instantiated with the conjugate gradient descent algorithm, the proposed sampling scheme outperforms the existing score-based detector in terms of a better complexity-performance trade-off.


[18] 2404.13315

BERT: Accelerating Vital Signs Measurement for Bioradar with An Efficient Recursive Technique

Recent years have witnessed great advances in bioradar systems for smart sensing of vital signs (VS) for human healthcare monitoring. As an important part of the VS sensing process, VS measurement aims to capture the chest wall micromotion induced by human respiratory and cardiac activities. Unfortunately, existing VS measurement methods using bioradar have encountered bottlenecks in making a trade-off between time cost and measurement accuracy. To break this bottleneck, this letter proposes an efficient recursive technique (BERT), heuristically based on the observation that the features of bioradar VS satisfy the conditions of a Markov model. Extensive experimental results validate that BERT measurement yields lower time costs and competitive estimates of heart rate, breathing rate, and heart rate variability. Our BERT method promises a new and superior option for measuring VS with bioradar. This work seeks not only to solve the current issue of how to accelerate VS measurement with acceptable accuracy, but also to inspire creative new ideas that spur further advances in this promising field.


[19] 2404.13325

Integrating Physics-Informed Neural Networks into Power System Dynamic Simulations

Time-domain simulations in power systems are crucial for ensuring power system stability and avoiding critical scenarios that could lead to blackouts. The proliferation of converter-connected resources, however, adds significant additional degrees of non-linearity and complexity to these simulations. This drastically increases the computational time and the number of critical scenarios to be considered. Physics-Informed Neural Networks (PINNs) have been shown to accelerate these simulations by several orders of magnitude. This paper introduces the first natural step to remove the barriers to using PINNs in time-domain simulations: it proposes the first method to integrate PINNs into conventional numerical solvers. Integrating PINNs into conventional solvers unlocks a wide range of opportunities. First, PINNs can substantially accelerate simulation time; second, modeling components with PINNs allows new ways to reduce privacy concerns when sharing models; and last, it enhances the applicability of PINN-based surrogate modeling. We demonstrate the training, integration, and simulation framework for several combinations of PINNs and numerical solution methods, using the IEEE 9-bus system.


[20] 2404.13330

SEGSRNet for Stereo-Endoscopic Image Super-Resolution and Surgical Instrument Segmentation

SEGSRNet addresses the challenge of precisely identifying surgical instruments in low-resolution stereo endoscopic images, a common issue in medical imaging and robotic surgery. Our innovative framework enhances image clarity and segmentation accuracy by applying state-of-the-art super-resolution techniques before segmentation. This ensures higher-quality inputs for more precise segmentation. SEGSRNet combines advanced feature extraction and attention mechanisms with spatial processing to sharpen image details, which is significant for accurate tool identification in medical images. Our proposed model outperforms current models on metrics including Dice, IoU, PSNR, and SSIM, producing clearer and more accurate images for stereo endoscopic surgical imaging. SEGSRNet can provide high image resolution and precise segmentation, which can significantly enhance surgical accuracy and patient care outcomes.
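
The super-resolution half of that metric list (PSNR, SSIM) can be computed with scikit-image as sketched below on placeholder arrays; Dice and IoU for the segmentation half follow the usual mask-overlap definitions.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

ref = np.random.rand(64, 64)                                # stand-in ground truth
out = np.clip(ref + 0.05 * np.random.randn(64, 64), 0, 1)   # stand-in output

psnr = peak_signal_noise_ratio(ref, out, data_range=1.0)
ssim = structural_similarity(ref, out, data_range=1.0)
```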


[21] 2404.13372

HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression

This paper investigates the challenging problem of learned image compression (LIC) at extremely low bitrates. Previous LIC methods based on transmitting quantized continuous features often yield blurry and noisy reconstructions due to the severe quantization loss, while previous LIC methods based on learned codebooks that discretize the visual space usually give poor-fidelity reconstructions due to the insufficient representation power of limited codewords in capturing faithful details. We propose a novel dual-stream framework, HybridFlow, which combines the continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity under extremely low bitrates. The codebook-based stream benefits from high-quality learned codebook priors to provide quality and clarity in reconstructed images. The continuous feature stream targets maintaining fidelity details. To achieve the ultra-low bitrate, a masked token-based transformer is further proposed, where we only transmit a masked portion of codeword indices and recover the missing indices through token generation guided by information from the continuous feature stream. We also develop a bridging correction network to merge the two streams in pixel decoding for final image reconstruction, where the continuous stream features rectify biases of the codebook-based pixel decoder to impose reconstructed fidelity details. Experimental results demonstrate superior performance across several datasets under extremely low bitrates, compared with existing single-stream codebook-based or continuous-feature-based LIC methods.


[22] 2404.13376

Cross-Forming Control and Fault Current Limiting for Grid-Forming Inverters

We propose a novel "cross-forming" control concept for grid-forming inverters operating against grid faults. Cross-forming refers to voltage angle forming and current magnitude forming, differing from the classical grid-forming and grid-following concepts, i.e., voltage magnitude-and-angle forming and current magnitude-and-angle forming, respectively. Unlike purely grid-forming or grid-following paradigms, the cross-forming concept is motivated by device security requirements for fault current limitation and, at the same time, grid code requirements for preserving voltage angle forming. We propose two feasible control strategies to realize the cross-forming concept, enabling inverters to rapidly limit the fault current at a prescribed level while preserving voltage angle forming for grid-forming-type synchronization and dynamic ancillary services provision during fault ride-through. Moreover, the cross-forming control yields an equivalent system featuring a constant virtual impedance and an "equivalent normal form" of representation, making transient stability analysis tractable. The equivalent system allows us to extend previous transient stability results to encompass scenarios of current saturation. Simulations and experiments validate the efficacy of the proposed control.


[23] 2404.13386

SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images

Machine learning-based fundus image diagnosis technologies have triggered worldwide interest owing to benefits such as reducing the burden on medical resources and providing objective evaluation results. However, current methods are commonly based on supervised learning, imposing a heavy workload on biomedical staff and hence hindering the expansion of effective databases. To address this issue, in this article, we establish a label-free method, named 'SSVT', which can automatically analyze unlabeled fundus images and achieve a high evaluation accuracy of 97.0% across four main eye diseases, based on six public datasets and two datasets collected by Beijing Tongren Hospital. The promising results showcase the effectiveness of the proposed unsupervised learning method and its strong application potential in regions with biomedical resource shortages to improve global eye health.


[24] 2404.13388

Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning

Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To address this dilemma, we propose a general self-supervised machine learning framework that can handle diverse fundus diseases from unlabeled fundus images. Our method's AUC surpasses that of existing supervised approaches by 15.7%, and even exceeds the performance of a single human expert. Furthermore, our model adapts well to various datasets from different regions, races, and heterogeneous image sources or qualities from multiple cameras or devices. Our method offers a label-free general framework to diagnose fundus diseases, which could potentially benefit telehealth programs for early screening of people at risk of vision loss.


[25] 2404.13391

Online Planning of Power Flows for Power Systems Against Bushfires Using Spatial Context

The 2019-20 Australian bushfires incurred numerous economic losses and significantly affected the operations of power systems. A power station or transmission line can be significantly affected by bushfires, leading to an increase in operational costs. We study a fundamental but challenging problem of planning the optimal power flow (OPF) for power systems subject to bushfires. Considering the stochastic nature of bushfire spread, we develop a model to capture such dynamics based on Moore's neighborhood model. Under a periodic inspection scheme that reveals the in-situ bushfire status, we propose an online optimization modeling framework that sequentially plans the power flows in the electricity network. Our framework assumes that the spread of bushfires is non-stationary over time, and that the spread and containment probabilities are unknown. To meet these challenges, we develop a contextual online learning algorithm that treats the in-situ geographical information of the bushfire as a 'spatial context'. The online learning algorithm learns the unknown probabilities sequentially based on the observed data and then makes the OPF decision accordingly. The sequential OPF decisions aim to minimize the regret function, which is defined as the cumulative loss against the clairvoyant strategy that knows the true model parameters. We provide a theoretical guarantee for our algorithm by deriving a bound on the regret function, which outperforms the regret bound achieved by other benchmark algorithms. Our model assumptions are verified using real bushfire data from NSW, Australia, and we apply our model to two power systems to illustrate its applicability.
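
A minimal sketch of one stochastic spread step on a grid under the Moore (8-cell) neighborhood is shown below; the spread and containment probabilities are exactly the unknowns the paper's online learner estimates, so the fixed values here are placeholders.

```python
import numpy as np

def spread_step(burning, p_spread=0.3, p_contain=0.1, rng=np.random.default_rng()):
    """One step of a Moore-neighborhood fire model on a boolean grid."""
    new = burning.copy()
    rows, cols = burning.shape
    for i in range(rows):
        for j in range(cols):
            if burning[i, j]:
                if rng.random() < p_contain:   # burning cell gets contained
                    new[i, j] = False
                continue
            # Ignite if any Moore neighbor burns (slice covers the 3x3 block).
            if burning[max(0, i-1):i+2, max(0, j-1):j+2].any() \
                    and rng.random() < p_spread:
                new[i, j] = True
    return new
```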


[26] 2404.13399

A Data-Driven Condition Monitoring Method for Capacitor in Modular Multilevel Converter (MMC)

The modular multilevel converter (MMC) is a topology that contains a large number of capacitors, and degradation of capacitors can lead to converter malfunction, limiting the overall system lifetime. Condition monitoring methods can be applied to assess the health status of capacitors and realize predictive maintenance to improve reliability. Current research on condition monitoring of capacitors in an MMC mainly monitors either capacitance or equivalent series resistance (ESR), while these two health indicators can shift at different speeds and lead to different end-of-life times. Hence, monitoring only one of these parameters may lead to unreliable health status evaluation. This paper proposes a data-driven method to estimate capacitance and ESR at the same time, in which particle swarm optimization (PSO) is leveraged to update the obtained estimations. The results of the estimations are then used to predict the sub-module voltage based on a capacitor voltage equation. Furthermore, minimizing the mean square error between the predicted and actually measured voltage brings the estimations closer to the actual values. The effectiveness and feasibility of the proposed method are validated through simulations and experiments.
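
In the spirit of the method described above, the sketch below estimates (capacitance, ESR) with a bare-bones PSO by minimizing the MSE between predicted and measured capacitor voltage; the voltage model, parameter bounds, and PSO hyperparameters are all illustrative assumptions.

```python
import numpy as np

def predict_voltage(C, esr, i, dt, v0):
    # Simple capacitor voltage equation: v = v0 + (1/C) * integral(i) + ESR * i.
    return v0 + np.cumsum(i) * dt / C + esr * i

def pso_estimate(i, v_meas, dt, v0, n=30, iters=100, rng=np.random.default_rng(0)):
    lo, hi = np.array([1e-3, 1e-3]), np.array([20e-3, 1.0])  # (C [F], ESR [ohm])
    cost = lambda p: np.mean((predict_voltage(p[0], p[1], i, dt, v0) - v_meas)**2)
    x = rng.uniform(lo, hi, (n, 2))                  # particle positions
    v = np.zeros((n, 2))                             # particle velocities
    pbest, pbest_c = x.copy(), np.array([cost(p) for p in x])
    g = pbest[np.argmin(pbest_c)].copy()             # global best
    for _ in range(iters):
        r1, r2 = rng.random((n, 2)), rng.random((n, 2))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        c = np.array([cost(p) for p in x])
        better = c < pbest_c
        pbest[better], pbest_c[better] = x[better], c[better]
        g = pbest[np.argmin(pbest_c)].copy()
    return g  # estimated (capacitance, ESR)
```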


[27] 2404.13452

Cut-FUNQUE: An Objective Quality Model for Compressed Tone-Mapped High Dynamic Range Videos

High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize legacy SDR displays that are unable to display HDR videos. As a result, HDR videos must be processed, i.e., tone-mapped, before streaming to a large section of SDR-capable video consumers. However, server-side tone-mapping involves automating decisions regarding the choices of tone-mapping operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. In this work, we develop a novel, efficient model of objective video quality named Cut-FUNQUE that is able to accurately predict the visual quality of tone-mapped and compressed HDR videos. Finally, we evaluate Cut-FUNQUE on a large-scale crowdsourced database of such videos and show that it achieves state-of-the-art accuracy.


[28] 2404.13484

Joint Quality Assessment and Example-Guided Image Processing by Disentangling Picture Appearance from Content

The deep learning revolution has strongly impacted low-level image processing tasks such as style/domain transfer, enhancement/restoration, and visual quality assessments. Despite often being treated separately, the aforementioned tasks share a common theme of understanding, editing, or enhancing the appearance of input images without modifying the underlying content. We leverage this observation to develop a novel disentangled representation learning method that decomposes inputs into content and appearance features. The model is trained in a self-supervised manner and we use the learned features to develop a new quality prediction model named DisQUE. We demonstrate through extensive evaluations that DisQUE achieves state-of-the-art accuracy across quality prediction tasks and distortion types. Moreover, we demonstrate that the same features may also be used for image processing tasks such as HDR tone mapping, where the desired output characteristics may be tuned using example input-output pairs.


[29] 2404.13537

Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resulting in less-than-ideal restoration outcomes. Inspired by the notion that high/low frequency information is applicable to different degradations, we introduce HLNet, a Bracketing Image Restoration and Enhancement method based on high-low frequency decomposition. Specifically, we employ two modules for feature extraction: shared weight modules and non-shared weight modules. In the shared weight modules, we use SCConv to extract common features from different degradations. In the non-shared weight modules, we introduce the High-Low Frequency Decomposition Block (HLFDB), which employs different methods to handle high-low frequency information, enabling the model to address different degradations more effectively. Compared to other networks, our method takes into account the characteristics of different degradations, thus achieving higher-quality image restoration.
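
The high/low frequency split that HLNet builds on can be illustrated with a simple Gaussian low-pass decomposition, sketched below; HLFDB's learned, band-specific processing is not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decompose(image, sigma=2.0):
    """Split an image into low- and high-frequency components (sigma assumed)."""
    low = gaussian_filter(image.astype(np.float64), sigma=sigma)
    high = image - low   # residual carries edges and fine texture
    return low, high
```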


[30] 2404.13673

In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review

Laser Additive Manufacturing (LAM) presents unparalleled opportunities for fabricating complex, high-performance structures and components with unique material properties. Despite these advancements, achieving consistent part quality and process repeatability remains challenging. This paper provides a comprehensive review of various state-of-the-art in-situ process monitoring techniques, including optical-based monitoring, acoustic-based sensing, laser line scanning, and operando X-ray monitoring. These techniques are evaluated for their capabilities and limitations in detecting defects within Laser Powder Bed Fusion (LPBF) and Laser Directed Energy Deposition (LDED) processes. Furthermore, the review discusses emerging multisensor monitoring and machine learning (ML)-assisted defect detection methods, benchmarking ML models tailored for in-situ defect detection. The paper also discusses in-situ adaptive defect remediation strategies that advance LAM towards zero-defect autonomous operations, focusing on real-time closed-loop feedback control and defect correction methods. Research gaps such as the need for standardization, improved reliability and sensitivity, and decision-making strategies beyond early stopping are highlighted. Future directions are proposed, with an emphasis on multimodal sensor fusion for multiscale defect prediction and fault diagnosis, ultimately enabling self-adaptation in LAM processes. This paper aims to equip researchers and industry professionals with a holistic understanding of the current capabilities, limitations, and future directions in in-situ process monitoring and adaptive quality enhancement in LAM.


[31] 2404.13693

PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images

Photovoltaic (PV) systems allow us to tap into abundant solar energy; however, they require regular maintenance for high efficiency and to prevent degradation. Traditional manual health checks, using Electroluminescence (EL) imaging, are expensive and logistically challenging, making automated defect detection essential. Current automation approaches require extensive manual expert labeling, which is time-consuming, expensive, and prone to errors. We propose PV-S3 (Photovoltaic-Semi Supervised Segmentation), a Semi-Supervised Learning approach for semantic segmentation of defects in EL images that reduces reliance on extensive labeling. PV-S3 is a deep learning model trained using a few labeled images along with numerous unlabeled images. We introduce a novel Semi Cross-Entropy loss function to train PV-S3, which addresses the challenges specific to automated PV defect detection, such as diverse defect types and class imbalance. We evaluate PV-S3 on multiple datasets and demonstrate its effectiveness and adaptability. With merely 20% labeled samples, we achieve an absolute improvement of 9.7% in IoU, 29.9% in Precision, 12.75% in Recall, and 20.42% in F1-Score over the prior state-of-the-art supervised method (which uses 100% labeled samples) on the UCF-EL dataset (the largest dataset available for semantic segmentation of EL images), showing improved performance while reducing annotation costs by 80%.


[32] 2404.13704

PEMMA: Parameter-Efficient Multi-Modal Adaptation for Medical Image Segmentation

Imaging modalities such as Computed Tomography (CT) and Positron Emission Tomography (PET) are key in cancer detection, inspiring Deep Neural Network (DNN) models that merge these scans for tumor segmentation. When both CT and PET scans are available, it is common to combine them as two channels of the input to the segmentation model. However, this method requires both scan types during training and inference, posing a challenge due to the limited availability of PET scans, thereby sometimes limiting the process to CT scans only. Hence, there is a need to develop a flexible DNN architecture that can be trained/updated using only CT scans but can effectively utilize PET scans when they become available. In this work, we propose a parameter-efficient multi-modal adaptation (PEMMA) framework for lightweight upgrading of a transformer-based segmentation model trained only on CT scans to also incorporate PET scans. The benefits of the proposed approach are two-fold. Firstly, we leverage the inherent modularity of the transformer architecture and perform low-rank adaptation (LoRA) of the attention weights to achieve parameter-efficient adaptation. Secondly, since the PEMMA framework attempts to minimize cross-modal entanglement, it is possible to subsequently update the combined model using only one modality, without causing catastrophic forgetting of the other modality. Our proposed method achieves results comparable with the performance of early fusion techniques with just 8% of the trainable parameters, including a remarkable +28% improvement in the average Dice score on PET scans when trained on a single modality.
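
A minimal sketch of the LoRA mechanism applied to a frozen attention projection is given below; the rank, scaling, and initialization follow common LoRA practice and are not necessarily the paper's configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update (illustrative)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False         # e.g. weights trained on CT only
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because B is zero-initialized, the adapted layer initially reproduces the frozen model exactly, and only the small A/B factors are updated when the new modality arrives.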


[33] 2404.13714

Self-Adjusting Prescribed Performance Control for Nonlinear Systems with Input Saturation

Among the existing works on enhancing system performance via prescribed performance functions (PPFs), the decay rates of the PPFs need to be predetermined by the designer, directly affecting the convergence time of the closed-loop system. However, if one only accelerates system convergence by selecting a large decay rate for the performance function, it may lead to the severe consequence of closed-loop system failure when the actuator saturation prevalent in practical scenarios is considered. To address this issue, this work proposes a control scheme that can flexibly self-adjust the convergence rates of the performance functions (PFs), aiming to achieve faster steady-state convergence while avoiding the risk of error violation beyond the PFs' envelopes, which may arise from input saturation and improper decay rate selection in traditional prescribed performance control (PPC) methods. Specifically, a performance index function (PIF) is introduced as a reference criterion, based on which the self-adjusting rates of the PFs are designed for different cases, exhibiting the following appealing features: 1) It eliminates the need to prespecify the initial values of the PFs. In addition, it can accommodate arbitrary magnitudes of initial errors while avoiding excessive initial control efforts. 2) Considering actuator saturation, this method can not only reduce the decay rates of the PFs when necessary to avoid violation of the PFs, but also increase the decay rates to accelerate system convergence when there is remaining control capacity. Finally, several numerical simulations are conducted to confirm the effectiveness and superiority of the proposed method.
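
For context, the classical prescribed performance function that this work makes self-adjusting typically takes the exponential form below (notation follows the standard PPC literature, not necessarily this paper); the fixed decay rate $\ell$ shaping the envelope on the tracking error $e(t)$ is precisely the quantity the proposed scheme adjusts online.

```latex
\rho(t) = (\rho_0 - \rho_\infty)\, e^{-\ell t} + \rho_\infty,
\qquad -\rho(t) < e(t) < \rho(t), \quad \forall t \ge 0
```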


[34] 2404.13748

Application of Kalman Filter in Stochastic Differential Equations

In areas such as finance, engineering, and science, we often face situations that change quickly and unpredictably. These situations are difficult to handle and require special tools and methods capable of understanding and predicting what might happen next. Stochastic Differential Equations (SDEs) are renowned for modeling and analyzing real-world dynamical systems. However, obtaining the parameters, boundary conditions, and closed-form solutions of SDEs can often be challenging. In this paper, we discuss the application of Kalman filtering theory to SDEs, including Extended Kalman filtering and Particle Extended Kalman filtering. We explore how to fit existing SDE systems through filtering and track the original SDEs by fitting the obtained closed-form solutions. This approach aims to gather more information about these SDEs, which could be used in various ways, such as incorporating them into the parameters of data-based SDE models.
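
As a minimal illustration of the filtering setup discussed above, the sketch below runs a scalar EKF on an Euler-Maruyama discretization of an SDE with a nonlinear (logistic) drift; the drift, noise levels, and identity observation model are illustrative assumptions, not the paper's.

```python
import numpy as np

def ekf(observations, x0=0.5, P0=1.0, theta=1.0, q=1e-4, r=1e-2, dt=0.01):
    """Scalar EKF for x_{k+1} = f(x_k) + w_k, y_k = x_k + v_k (H = 1)."""
    f  = lambda x: x + theta * x * (1.0 - x) * dt      # Euler step of the drift
    Fj = lambda x: 1.0 + theta * (1.0 - 2.0 * x) * dt  # Jacobian of f
    x, P, estimates = x0, P0, []
    for y in observations:
        F = Fj(x)                                  # linearize at current estimate
        x, P = f(x), F * P * F + q                 # predict
        K = P / (P + r)                            # Kalman gain
        x, P = x + K * (y - x), (1.0 - K) * P      # update
        estimates.append(x)
    return np.array(estimates)
```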


[35] 2404.13756

BC-MRI-SEG: A Breast Cancer MRI Tumor Segmentation Benchmark

Binary breast cancer tumor segmentation with Magnetic Resonance Imaging (MRI) data is typically trained and evaluated on private medical data, which makes comparing deep learning approaches difficult. We propose a benchmark (BC-MRI-SEG) for binary breast cancer tumor segmentation based on publicly available MRI datasets. The benchmark consists of four datasets in total, where two datasets are used for supervised training and evaluation, and two are used for zero-shot evaluation. Additionally, we compare state-of-the-art (SOTA) approaches on our benchmark and provide an exhaustive list of available public breast cancer MRI datasets. The source code has been made available at https://irulenot.github.io/BC_MRI_SEG_Benchmark.


[36] 2404.13786

Soar: Design and Deployment of A Smart Roadside Infrastructure System for Autonomous Driving

Recently, smart roadside infrastructure (SRI) has demonstrated the potential of achieving fully autonomous driving systems. To explore the potential of infrastructure-assisted autonomous driving, this paper presents the design and deployment of Soar, the first end-to-end SRI system specifically designed to support autonomous driving systems. Soar consists of both software and hardware components carefully designed to overcome various system and physical challenges. Soar can leverage the existing operational infrastructure like street lampposts for a lower barrier of adoption. Soar adopts a new communication architecture that comprises a bi-directional multi-hop I2I network and a downlink I2V broadcast service, which are designed based on off-the-shelf 802.11ac interfaces in an integrated manner. Soar also features a hierarchical DL task management framework to achieve desirable load balancing among nodes and enable them to collaborate efficiently to run multiple data-intensive autonomous driving applications. We deployed a total of 18 Soar nodes on existing lampposts on campus, which have been operational for over two years. Our real-world evaluation shows that Soar can support a diverse set of autonomous driving applications and achieve desirable real-time performance and high communication reliability. Our findings and experiences in this work offer key insights into the development and deployment of next-generation smart roadside infrastructure and autonomous driving systems.


[37] 2404.13884

MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 FLOPs

Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering. In recent years, both Convolutional Neural Network (CNN)-based and Transformer-based methods have been widely explored. In addition, combining CNN and Transformer can effectively combine global and local information for enhancement. However, this approach is still affected by the quadratic complexity of the Transformer and cannot maximize performance. Recently, the state-space model (SSM)-based architecture Mamba has been proposed, which excels in modeling long-range dependencies while maintaining linear complexity. This paper explores the potential of this SSM-based model for UIE from both efficiency and effectiveness perspectives. However, the performance of directly applying Mamba is poor because local fine-grained features, which are crucial for image enhancement, cannot be fully utilized. To this end, we customize the MambaUIE architecture for efficient UIE. Specifically, we introduce visual state space (VSS) blocks to capture global contextual information at the macro level while mining local information at the micro level. For these two kinds of information, we propose a Dynamic Interaction Block (DIB) and a Spatial-Gated Feed-forward Network (SGFN) for intra-block feature aggregation. MambaUIE is able to efficiently synthesize global and local information and maintains a very small number of parameters with high accuracy. Experiments on the UIEB dataset show that our method reduces GFLOPs by 67.4% (2.715G) relative to the SOTA method. To the best of our knowledge, this is the first UIE model constructed based on SSM that breaks the limitation of FLOPs on accuracy in UIE. The official repository of MambaUIE is at https://github.com/1024AILab/MambaUIE.


[38] 2404.13890

Cell Balancing for the Transportation Sector: Techniques, Challenges, and Future Research Directions

Efficient and reliable energy systems are key to the progress of society. High-performance batteries are essential for widely used technologies like Electric Vehicles (EVs) and portable electronics. Additionally, an effective Battery Management System (BMS) is crucial to oversee vital battery parameters. However, a BMS can experience cell imbalance due to charging/discharging dynamics, which reduces battery capacity, lifespan, and efficiency, and raises critical safety concerns. This calls for effective cell-balancing techniques. Notably, the existing literature on cell balancing is limited, urgently necessitating a thorough survey to pinpoint key research gaps and suggest prompt solutions. In this article, cell balancing and the corresponding techniques are reviewed. Initially, we present a detailed comparison of passive cell balancing techniques and assess their respective advantages, drawbacks, and practical applications. Then, we discuss the strengths and weaknesses of active cell balancing methods and the applicability of cell balancing to both series- and parallel-connected cells. Additionally, we examine the need for cell balancing in commonly used batteries and applications in EVs. Lastly, we present detailed prospects, including challenges and directions for future research.


[39] 2404.13905

SI-FID: Only One Objective Indicator for Evaluating Stitched Images

Evaluating image quality accurately is vital in developing image stitching algorithms, as it directly reflects the algorithms' progress. However, commonly used objective indicators often produce inconsistent and even conflicting results with subjective indicators. To enhance the consistency between objective and subjective evaluations, this paper introduces a novel indicator, the Frechet Distance for Stitched Images (SI-FID). To be specific, our training network employs a contrastive learning architecture overall. We employ data augmentation approaches that serve as noise to distort images in the training set. Both the initial and distorted training sets are then input into the pre-training model for fine-tuning. We then evaluate the altered FID after introducing interference to the test set and examine whether the noise can improve the consistency between objective and subjective evaluation results. The rank correlation coefficient is utilized to measure the consistency. SI-FID is an altered FID that generates the highest rank correlation coefficient under the effect of a certain noise. The experimental results demonstrate that the rank correlation coefficient obtained by SI-FID is at least 25% higher than that of other objective indicators, which means it achieves evaluation results closer to human subjective evaluation.


[40] 2404.13918

Emerging Advancements in 6G NTN Radio Access Technologies: An Overview

Efforts on the development, standardization, and improvement of communication systems towards 5G Advanced and 6G are on track to provide benefits such as an unprecedented level of connectivity and performance, enabling a diverse range of vertical services. The full integration of non-terrestrial components into 6G plays a pivotal role in realizing this paradigm shift towards ubiquitous communication and global coverage. However, this integration into 6G brings forth a set of its own challenges, particularly in Radio Access Technologies (RATs). To this end, this paper comprehensively discusses those challenges at different levels of RATs and proposes the corresponding potential emerging advancements in the realm of 6G NTN. In particular, the focus is on advancing the prospective aspects of Radio Resource Management (RRM), spectral coexistence of terrestrial and non-terrestrial components, and flexible waveform design solutions to combat the impediments. This discussion, with a specific focus on emerging advancements in 6G NTN RATs, is critical for shaping the next generation of networks and potentially relevant to contributions to standardization in forthcoming releases.


[41] 2404.13920

Open Datasets for Satellite Radio Resource Control

In Non-Terrestrial Networks (NTN), achieving effective radio resource allocation across multi-satellite systems, encompassing efficient channel and bandwidth allocation, effective beam management, power control, and interference mitigation, poses significant challenges due to the varying satellite links and the highly dynamic nature of user traffic. This calls for the development of an intelligent decision-making controller using Artificial Intelligence (AI) to efficiently manage resources in this complex environment. In this context, open datasets can play a crucial role in driving new advancements and facilitating research. Recognizing this significance, this paper aims to contribute to the satellite communication research community by providing various open datasets that incorporate realistic traffic flows enabling a variety of use cases. The primary objective of sharing these datasets is to facilitate the development and benchmarking of advanced resource management solutions, thereby improving overall satellite communication systems. Furthermore, an application example focused on beam placement optimization via terminal clustering is provided. This assists in optimizing the beam allocation task, enabling adaptive beamforming to effectively meet spatiotemporally varying user traffic demands and optimize resource utilization.
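
The beam-placement-by-clustering example mentioned above can be sketched with a plain k-means over terminal positions, pointing one beam at each centroid; the coordinates, beam count, and use of scikit-learn are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
terminals = rng.uniform(0, 100, size=(500, 2))   # hypothetical (x, y) positions

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(terminals)
beam_centers = kmeans.cluster_centers_           # candidate beam pointings
load_per_beam = np.bincount(kmeans.labels_)      # terminals served per beam
```

Re-running the clustering as traffic shifts gives the adaptive, demand-following beam placement the abstract describes.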


[42] 2404.13929

Exploring Kinetic Curves Features for the Classification of Benign and Malignant Breast Lesions in DCE-MRI

Breast cancer is the most common malignant tumor among women and the second leading cause of cancer-related death. Early diagnosis in clinical practice is crucial for timely treatment and prognosis. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has demonstrated great utility in preoperative diagnosis and in assessing therapy effects thanks to its capability to reflect the morphology and dynamic characteristics of breast lesions. However, most existing computer-assisted diagnosis algorithms only consider conventional radiomic features when classifying benign and malignant lesions in DCE-MRI. In this study, we propose to fully leverage the dynamic characteristics from the kinetic curves as well as the radiomic features to boost the classification accuracy of benign and malignant breast lesions. The proposed method is a fully automated solution that directly analyzes the 3D features from the DCE-MRI. It is evaluated on an in-house dataset including 200 DCE-MRI scans with 298 breast tumors (172 benign and 126 malignant), achieving favorable classification accuracy with an area under the curve (AUC) of 0.94. By simultaneously considering the dynamic and radiomic features, the method effectively distinguishes between benign and malignant breast lesions.
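
Typical kinetic-curve descriptors of the sort the abstract leverages can be extracted as sketched below from a per-lesion enhancement curve; the exact feature set used in the paper may differ.

```python
import numpy as np

def kinetic_features(signal, t):
    """Simple wash-in / wash-out descriptors from a DCE-MRI enhancement curve."""
    s0 = signal[0]
    peak = int(np.argmax(signal))
    wash_in  = (signal[peak] - s0) / (t[peak] - t[0] + 1e-8)
    wash_out = (signal[-1] - signal[peak]) / (t[-1] - t[peak] + 1e-8)
    return {"time_to_peak": t[peak],
            "wash_in_slope": wash_in,
            "wash_out_slope": wash_out,
            "peak_enhancement": (signal[peak] - s0) / (s0 + 1e-8)}
```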


[43] 2404.13937

Data-Based System Representation and Synchronization for Multiagent Systems

This paper presents novel solutions to the data-based synchronization problem for continuous-time multiagent systems. We consider the cases of homogeneous and heterogeneous systems. First, a data-based representation of the synchronization error dynamics is obtained for homogeneous systems, using input-state data collected from the agents. Then, we show how to extend existing data-based stabilization results to the multiagent case to stabilize the obtained synchronization errors. The proposed method relies on the solution of a set of linear matrix inequalities that are shown to be feasible. Then, we solve the synchronization problem for heterogeneous systems by means of dynamic controllers. Different from existing results, we do not require model knowledge of the followers or the leader. The theoretical results are finally validated using numerical simulations.
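
For flavor, the sketch below solves a standard data-driven stabilization LMI of the kind such extensions build on (the continuous-time formulas of De Persis and Tesi for a single system, not the paper's multiagent synchronization LMIs); the true system appears only to generate the data matrices and is unknown to the design step.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(1)
    A_true = np.array([[0.5, 1.0], [0.0, 0.3]])   # unstable, unknown to the designer
    B_true = np.array([[0.0], [1.0]])

    T = 20
    X0 = rng.standard_normal((2, T))              # sampled states
    U0 = rng.standard_normal((1, T))              # sampled inputs
    X1 = A_true @ X0 + B_true @ U0                # measured state derivatives

    Q = cp.Variable((T, 2))
    P = X0 @ Q                                    # required symmetric positive definite
    lmi = X1 @ Q + (X1 @ Q).T                     # closed-loop Lyapunov inequality
    prob = cp.Problem(cp.Minimize(0),
                      [P == P.T,
                       P >> 1e-6 * np.eye(2),
                       lmi << -1e-6 * np.eye(2)])
    prob.solve(solver=cp.SCS)

    K = U0 @ Q.value @ np.linalg.inv(X0 @ Q.value)
    print("eigenvalues of A + BK:", np.linalg.eigvals(A_true + B_true @ K))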


[44] 2404.13941

Autoencoder-assisted Feature Ensemble Net for Incipient Faults

Deep learning has shown great power in the field of fault detection. However, for incipient faults with tiny amplitudes, the detection performance of current deep learning networks (DLNs) is not satisfactory. Even if prior information about the faults is utilized, DLNs cannot successfully detect faults 3, 9, and 15 in the Tennessee Eastman process (TEP). These faults are notoriously difficult to detect, lacking effective detection technologies in the field of fault detection. In this work, we propose the Autoencoder-assisted Feature Ensemble Net (AE-FENet): a deep feature ensemble framework that uses an unsupervised autoencoder to conduct the feature transformation. Compared with the principal component analysis (PCA) technique adopted in the original Feature Ensemble Net (FENet), the autoencoder can mine more precise features of incipient faults, which results in better detection performance for AE-FENet. With the same kinds of basic detectors, AE-FENet achieves a state-of-the-art average accuracy of over 96% on faults 3, 9, and 15 in the TEP, which represents a significant enhancement in performance compared to other methods. Extensive experiments have been conducted to extend our framework, proving that DLNs can be utilized efficiently within this architecture.
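
The following PyTorch sketch conveys the core idea of using an unsupervised autoencoder as the feature transformer; the layer sizes, training loop, and 52-variable input (typical of TEP) are illustrative assumptions, not the AE-FENet configuration.

    import torch
    import torch.nn as nn

    class AE(nn.Module):
        """Plain autoencoder used as an unsupervised feature transformer."""
        def __init__(self, n_vars=52, n_latent=10):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_vars, 32), nn.ReLU(),
                                         nn.Linear(32, n_latent))
            self.decoder = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(),
                                         nn.Linear(32, n_vars))
        def forward(self, x):
            z = self.encoder(x)
            return self.decoder(z), z

    x = torch.randn(256, 52)                 # stand-in for process measurements
    model = AE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(100):                     # reconstruction training, no labels
        x_hat, _ = model(x)
        loss = nn.functional.mse_loss(x_hat, x)
        opt.zero_grad()
        loss.backward()
        opt.step()

    _, features = model(x)                   # latent features fed to the basic detectors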


[45] 2404.13955

GNSS Measurement-Based Context Recognition for Vehicle Navigation using Gated Recurrent Unit

In recent years, increasingly high requirements have been put forward for context-adaptive navigation (CAN). A CAN system realizes seamless navigation in complex environments by recognizing the ambient surroundings of vehicles, and it is crucial to develop a fast, reliable, and robust navigational context recognition (NCR) method to enable CAN systems to operate effectively. Environmental context recognition based on Global Navigation Satellite System (GNSS) measurements has attracted widespread attention due to its low cost, as it does not require additional infrastructure. The performance and application value of NCR methods depend on three main factors: context categorization, feature extraction, and classification models. In this paper, a fine-grained context categorization framework comprising seven environment categories (open sky, tree-lined avenue, semi-outdoor, urban canyon, viaduct-down, shallow indoor, and deep indoor) is proposed, which currently represents the most elaborate context categorization framework known in this research domain. To improve discrimination between categories, a new feature, called the C/N0-weighted azimuth distribution factor, is designed. Then, to ensure real-time performance, a lightweight gated recurrent unit (GRU) network is adopted for its excellent sequence data processing capabilities. A dataset containing 59,996 samples is created and made publicly available to researchers in the NCR community on GitHub. Extensive experiments have been conducted on the dataset, and the results show that the proposed method achieves an overall recognition accuracy of 99.41% for isolated scenarios and 94.95% for transition scenarios, with an average transition delay of 2.14 seconds.
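
As a minimal sketch of the model side (dimensions are assumptions, and the paper's input features, including the C/N0-weighted azimuth distribution factor, are not reproduced here), a lightweight GRU classifier over per-epoch GNSS feature vectors could look like this:

    import torch
    import torch.nn as nn

    class NCRClassifier(nn.Module):
        """GRU over a window of per-epoch GNSS feature vectors."""
        def __init__(self, n_features=8, hidden=32, n_classes=7):
            super().__init__()
            self.gru = nn.GRU(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)
        def forward(self, x):                # x: (batch, time, n_features)
            _, h = self.gru(x)               # h: (1, batch, hidden)
            return self.head(h[-1])          # logits over 7 environment categories

    logits = NCRClassifier()(torch.randn(4, 30, 8))
    print(logits.shape)                      # torch.Size([4, 7])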


[46] 2404.14012

Coordinated Planning for Stability Enhancement in High IBR-Penetrated Systems

Security and stability challenges in future power systems with a high penetration of Inverter-Based Resources (IBRs) have been anticipated as the main barrier to decarbonization. Grid-following IBRs may become unstable under small disturbances in weak grids, while, during transient processes, system stability and protection may be jeopardized due to the lack of sufficient Short-Circuit Current (SCC). To solve these challenges and achieve decarbonization, the future system has to be carefully planned. However, it remains unclear how both small-signal and transient processes can be considered during the system planning stage. In this context, this paper proposes a coordinated planning model of different resources to enhance system-level stability. The system strength and SCC constraints are analytically derived by considering the different characteristics of synchronous units and IBRs, and are further effectively linearized through a novel data-driven approach, where an active sampling method is proposed to generate a representative data set. The significant economic value of the proposed coordinated planning framework in both system asset investment and system operation is demonstrated through detailed case studies.


[47] 2404.14048

Optimization-based Heuristic for Vehicle Dynamic Coordination in Mixed Traffic Intersections

In this paper, we address a coordination problem for connected and autonomous vehicles (CAVs) in mixed traffic settings with human-driven vehicles (HDVs). The main objective is to have a safe and optimal crossing order for vehicles approaching unsignalized intersections. This problem results in a mixed-integer quadratic programming (MIQP) formulation which is unsuitable for real-time applications. Therefore, we propose a computationally tractable optimization-based heuristic that monitors platoons of CAVs and HDVs to evaluate whether alternative crossing orders can perform better. It first checks the future constraint violation that consistently occurs between pairs of platoons to determine a potential swap. Next, the costs of quadratic programming (QP) formulations associated with the current and alternative orders are compared in a depth-first branching fashion. In simulations, we show that the heuristic can be a hundred times faster than the original and simplified MIQPs and yields solutions that are close to optimal and have better order consistency.


[48] 2404.14110

HomeLabGym: A real-world testbed for home energy management systems

Amid growing environmental concerns and resulting energy costs, there is a rising need for efficient Home Energy Management Systems (HEMS). Evaluating such innovative HEMS solutions typically relies on simulations that may not model the full complexity of a real-world scenario. On the other hand, real-world testing, while more accurate, is labor-intensive, particularly when dealing with diverse assets, each using a distinct communication protocol or API. Centralizing and synchronizing the control of such a heterogeneous pool of assets thus poses a significant challenge. In this paper, we introduce HomeLabGym, a real-world testbed to ease such real-world evaluations of HEMS and flexible asset control in general, by adhering to the well-known OpenAI Gym paradigm. HomeLabGym allows researchers to prototype, deploy, and analyze HEMS controllers within the controlled test environment of a real-world house (the IDLab HomeLab), providing access to all its available sensors and smart appliances. The easy-to-use Python interface eliminates concerns about the intricate communication protocols associated with sensors and appliances, streamlining the evaluation of various control strategies. We present an overview of HomeLabGym and demonstrate its usefulness to researchers in a comparison between real-world and simulated environments in controlling a residential battery in response to real-time prices.
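
Since the testbed follows the OpenAI Gym paradigm, interaction presumably reduces to the familiar reset/step loop. The sketch below is hypothetical throughout: the environment class, observation layout, price signal, and reward are invented stand-ins, not the actual HomeLabGym API.

    import numpy as np

    class BatteryEnvStub:
        """Stand-in for a Gym-style residential battery environment."""
        def reset(self):
            self.soc = 0.5
            return np.array([self.soc, 0.10])        # [state of charge, price EUR/kWh]
        def step(self, action):                      # action in [-1, 1]: charge/discharge
            self.soc = float(np.clip(self.soc + 0.1 * action, 0.0, 1.0))
            price = 0.10 + 0.05 * np.random.rand()
            reward = -price * max(action, 0.0)       # pay for energy drawn while charging
            return np.array([self.soc, price]), reward, False, {}

    env = BatteryEnvStub()
    obs = env.reset()
    for _ in range(96):                              # one day of 15-minute steps
        action = -1.0 if obs[1] > 0.12 else 1.0      # naive price-responsive policy
        obs, reward, done, info = env.step(action)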


[49] 2404.14114

Dionysos.jl: a Modular Platform for Smart Symbolic Control

We introduce Dionysos.jl, a modular package for solving optimal control problems for complex dynamical systems using state-of-the-art and experimental techniques from symbolic control, optimization, and learning. More often than not with cyber-physical systems, the only sensible way of developing a controller is by discretizing the different variables, thus transforming the control task into a purely combinatorial problem on a finite-state mathematical object, called an abstraction of this system. Although this approach offers a safety-critical framework, the available techniques suffer from important scalability issues. In order to render these techniques practical, it is necessary to construct smarter abstractions that differ from classical techniques by partitioning the state space in a non-trivial way.


[50] 2404.14140

Generative Artificial Intelligence Assisted Wireless Sensing: Human Flow Detection in Practical Communication Environments

Groundbreaking applications such as ChatGPT have heightened research interest in generative artificial intelligence (GAI). Essentially, GAI excels not only in content generation but also in signal processing, offering support for wireless sensing. Hence, we introduce a novel GAI-assisted human flow detection system (G-HFD). G-HFD first uses channel state information (CSI) to estimate the velocity and acceleration of the propagation path length change of the human-induced reflection (HIR). Then, given the strong inference ability of the diffusion model, we propose a unified weighted conditional diffusion model (UW-CDM) to denoise the estimation results, enabling detection of the number of targets. Next, we use the CSI obtained by a uniform linear array with wavelength spacing to estimate the HIR's time of flight and direction of arrival (DoA). In this process, the UW-CDM solves the problem of an ambiguous DoA spectrum, ensuring accurate DoA estimation. Finally, through clustering, G-HFD determines the number of subflows and the number of targets in each subflow, i.e., the subflow size. Evaluation based on practical downlink communication signals shows that G-HFD's subflow size detection accuracy can reach 91%. This validates its effectiveness and underscores the significant potential of GAI in the context of wireless sensing.


[51] 2404.14188

Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging

Ultrafast ultrasound imaging insonifies a medium with one or a combination of a few plane waves at different beam-steered angles instead of many focused waves. It can achieve much higher frame rates, but often at the cost of reduced image quality. Deep learning approaches have been proposed to mitigate this disadvantage, in particular for single plane wave imaging. Predominantly, image-to-image post-processing networks or fully learned data-to-image neural networks are used. Both construct their mapping purely data-driven and require expressive networks and large amounts of training data to perform well. In contrast, we consider data-to-image networks which incorporate a conventional image formation technique as a differentiable layer in the network architecture. This allows for end-to-end training with small amounts of training data. In this work, using f-k migration as an image formation layer is evaluated in depth with experimental data. We acquired a data collection designed for benchmarking data-driven plane wave imaging approaches, using a realistic breast-mimicking phantom and an ultrasound calibration phantom. The evaluation considers global and local image similarity measures and contrast, resolution, and lesion detectability analysis. The results show that the proposed network architecture is capable of improving the image quality of single plane wave images on all evaluation metrics. Furthermore, these image quality improvements can be achieved with surprisingly small amounts of training data.
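
To illustrate what "image formation as a differentiable layer" means architecturally, here is a hedged PyTorch sketch in which a fixed linear operator stands in for f-k migration (which is itself linear in the channel data); the shapes, layer choices, and random operator are assumptions for illustration, not the authors' network.

    import torch
    import torch.nn as nn

    class ImagingNet(nn.Module):
        """Learned pre-processing, fixed linear image formation, learned
        post-processing; gradients flow end to end through all three."""
        def __init__(self, n_chan=1024, n_pix=4096):
            super().__init__()
            self.pre = nn.Linear(n_chan, n_chan)           # on raw RF channel data
            F = torch.randn(n_pix, n_chan) / n_chan**0.5   # stand-in migration operator
            self.register_buffer("F", F)                   # fixed, not trained
            self.post = nn.Linear(n_pix, n_pix)            # refines the formed image
        def forward(self, y):                              # y: (batch, n_chan)
            return self.post(self.pre(y) @ self.F.T)

    img = ImagingNet()(torch.randn(2, 1024))
    print(img.shape)                                       # torch.Size([2, 4096])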


[52] 2404.14206

Localization Based on MIMO Backscattering from Retro-Directive Antenna Arrays

In next-generation vehicular environments, precise localization is crucial for facilitating advanced applications such as autonomous driving. As automation levels escalate, the demand rises for enhanced accuracy, reliability, energy efficiency, update rate, and reduced latency in position information delivery. In this paper, we propose the exploitation of backscattering from retro-directive antenna arrays (RAAs) to address these imperatives. We introduce and discuss two RAA-based architectures designed for various applications, including network localization and navigation. These architectures enable swift and simple angle-of-arrival estimation by using signals backscattered from RAAs. They also leverage multiple antennas to capitalize on multiple-input-multiple-output (MIMO) gains, thereby addressing the challenges posed by the inherent path loss in backscatter communication, especially when operating at high frequencies. Consequently, angle-based localization becomes achievable with remarkably low latency, ideal for mobile and vehicular applications. This paper introduces ad-hoc signalling and processing schemes for this purpose, and their performance is analytically investigated. Numerical results underscore the potential of these schemes, offering precise and ultra-low-latency localization with low complexity and ultra-low energy consumption devices.


[53] 2404.14211

Fidelitous Augmentation of Human Accelerometric Data for Deep Learning

Time series (TS) data have consistently been in short supply, yet demand for them remains high for training systems in prediction, modeling, classification, and various other applications. Synthesis can serve to expand the sample population, yet it is crucial to maintain the statistical characteristics between the synthesized and the original TS: this ensures consistent sampling of data for both training and testing purposes. However, the time-domain features of the data may not be maintained. This motivates our work, the objective of which is to preserve the following features in a synthesized TS: its fundamental statistical characteristics and important time-domain features like its general shape and prominent transients. In a novel way, we first isolate important TS features into various components using a spectrogram and singular spectrum analysis. The residual signal is then randomized in a way that preserves its statistical properties. These components are then recombined into the synthetic time series. Using accelerometer data in a clinical setting, we use statistical and shape measures to compare our method to others. We show that it has higher fidelity to the original signal features, has good diversity, and performs better in data classification in a deep learning application.
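
A toy version of the decompose-randomize-recombine recipe is sketched below: a moving-average trend stands in for the paper's spectrogram/SSA components, and the residual is phase-randomized, a standard surrogate-data trick that preserves its amplitude spectrum and hence its second-order statistics.

    import numpy as np

    def synthesize(ts, trend_win=25, seed=0):
        """Keep a slow 'feature' component; phase-randomize the residual."""
        rng = np.random.default_rng(seed)
        kernel = np.ones(trend_win) / trend_win
        trend = np.convolve(ts, kernel, mode="same")   # stand-in for SSA components
        resid = ts - trend
        spec = np.fft.rfft(resid)
        phases = rng.uniform(0, 2 * np.pi, size=spec.shape)
        phases[0] = phases[-1] = 0.0                   # keep DC/Nyquist bins real
        surrogate = np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(resid))
        return trend + surrogate

    t = np.linspace(0, 10, 1000)
    ts = np.sin(2 * np.pi * 0.5 * t) + 0.3 * np.random.randn(1000)
    print(synthesize(ts)[:5])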


[54] 2404.14269

Hybrid Fusion for 802.11ax Wi-Fi-based Passive Radars Exploiting Beamforming Feedbacks

Passive Wi-Fi-based radars (PWRs) are devices that enable the localization of targets using Wi-Fi signals of opportunity transmitted by an access point. Unlike active radars that optimize their transmitted waveform for localization, PWRs align with the 802.11 amendments. Specifically, during the channel sounding session preceding a multi-user multiple-input multiple-output downlink transmission, an access point isotropically transmits a null data packet (NDP) with a known preamble. From these known symbols, client user equipments derive their channel state information and transmit an unencrypted beamforming feedback (BFF) back to the access point. The BFF comprises the right singular matrix of the channel and the corresponding stream gain for each subcarrier, which allows the computation of a beamforming matrix at the access point. In classical PWR processing, only the preamble symbols from the NDP are exploited during the channel sounding session. In this study, we investigate multiple-target localization by a PWR exploiting hybrid information sources: on the one hand, the joint angle-of-departure and angle-of-arrival evaluated from the NDP; on the other hand, the line-of-sight angle-of-departures inferred from the BFFs. The processing steps at the PWR are defined and an optimal hybrid fusion rule is derived in the maximum likelihood framework. Monte Carlo simulations assess the enhanced accuracy of the proposed combination method compared to classical PWR processing based solely on the NDP, and compare the localisation performance between client and non-client targets.


[55] 2404.14319

Multi-Agent Hybrid SAC for Joint SS-DSA in CRNs

Opportunistic spectrum access has the potential to increase the efficiency of spectrum utilization in cognitive radio networks (CRNs). In CRNs, both spectrum sensing and resource allocation (SSRA) are critical to maximizing system throughput while minimizing collisions of secondary users with the primary network. However, many works in dynamic spectrum access do not consider the impact of imperfect sensing information such as mis-detected channels, which the additional information available in joint SSRA can help remediate. In this work, we examine joint SSRA as an optimization which seeks to maximize a CRN's net communication rate subject to constraints on channel sensing, channel access, and transmit power. Given the non-trivial nature of the problem, we leverage multi-agent reinforcement learning to enable a network of secondary users to dynamically access unoccupied spectrum via only local test statistics, formulated under the energy detection paradigm of spectrum sensing. In doing so, we develop a novel multi-agent implementation of hybrid soft actor critic, MHSAC, based on the QMIX mixing scheme. Through experiments, we find that our SSRA algorithm, HySSRA, is successful in maximizing the CRN's utilization of spectrum resources while also limiting its interference with the primary network, and outperforms the current state-of-the-art by a wide margin. We also explore the impact of wireless variations such as coherence time on the efficacy of the system.


[56] 2404.14322

A Novel Approach to Chest X-ray Lung Segmentation Using U-net and Modified Convolutional Block Attention Module

Lung segmentation in chest X-ray images is of paramount importance as it plays a crucial role in the diagnosis and treatment of various lung diseases. This paper presents a novel approach for lung segmentation in chest X-ray images by integrating U-net with attention mechanisms. The proposed method enhances the U-net architecture by incorporating a Convolutional Block Attention Module (CBAM), which unifies three distinct attention mechanisms: channel attention, spatial attention, and pixel attention. The channel attention mechanism enables the model to concentrate on the most informative features across various channels. The spatial attention mechanism enhances the model's precision in localization by focusing on significant spatial locations. Lastly, the pixel attention mechanism empowers the model to focus on individual pixels, further refining the model's focus and thereby improving the accuracy of segmentation. The adoption of the proposed CBAM in conjunction with the U-net architecture marks a significant advancement in the field of medical imaging, with potential implications for improving diagnostic precision and patient outcomes. The efficacy of this method is validated against contemporary state-of-the-art techniques, showcasing its superiority in segmentation performance.
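
To ground the three attention branches, here is a compact PyTorch sketch of CBAM-style channel and spatial attention plus a per-pixel gate; the pixel-attention form is one plausible reading of the modified module, not the paper's exact implementation.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        def __init__(self, c, r=8):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c))
        def forward(self, x):                          # x: (B, C, H, W)
            avg = self.mlp(x.mean(dim=(2, 3)))         # pooled channel descriptors
            mx = self.mlp(x.amax(dim=(2, 3)))
            return x * torch.sigmoid(avg + mx)[:, :, None, None]

    class SpatialAttention(nn.Module):
        def __init__(self, k=7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, k, padding=k // 2)
        def forward(self, x):
            s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.conv(s))

    class PixelAttention(nn.Module):
        def __init__(self, c):
            super().__init__()
            self.conv = nn.Conv2d(c, c, kernel_size=1)  # per-pixel, per-channel gate
        def forward(self, x):
            return x * torch.sigmoid(self.conv(x))

    x = torch.randn(2, 64, 128, 128)
    out = PixelAttention(64)(SpatialAttention()(ChannelAttention(64)(x)))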


[57] 2404.14371

Analysing the interaction of expansion decisions by end customers and grid development in the context of a municipal energy system

In order to achieve greenhouse gas neutrality by 2045, the Climate Protection Act sets emission reduction targets for the years 2030 and 2040, as well as decreasing annual emission volumes for some sectors, including the building sector. Measures to decarbonize the building sector include energy retrofits and the expansion of renewable, decentralized power generators and low-CO2 heat generators. These measures thus change both the load and the generation side of the future energy supply concept. It is therefore necessary to consider the interactions of the changed installed technologies at the building level and their influence on the electrical grid infrastructure. The grid operator will remedy future congested grid states through grid expansion measures and pass the costs on to the connected grid users, which in turn could influence their behaviour and decisions. The aim of this work is a holistic analysis of the staggered interactions of generation expansion and grid expansion for a future decentralized energy supply concept, conditioned by the expansion of self-generation. To enable the analysis of these interactions, a multi-criteria optimization procedure for expansion and operation decisions at the building level is combined with an approach to determine grid expansion. As part of this work, the effect of an expansion of hosting capacity on grid charges, and thus on decision-making behaviour, was investigated.


[58] 2404.14386

Flexibility Pricing in Distribution Systems: A Direct Method Aligned with Flexibility Models

Effective quantification of costs and values of distributed energy resources (DERs) can enhance the power system's utilization of their flexibility. This paper studies the flexibility pricing of DERs in distribution systems. We first propose a flexibility cost formulation based on adjustment ranges in both power and accumulated energy consumption. Compared with traditional power range-based approaches, the proposed method can quantify the potential inconvenience caused to DER users by flexibility activation and directly capture the time-dependent characteristics of DER's flexibility regions. We then propose a flexibility cost formulation aligned with the aggregated flexibility model and design a DSO flexibility market model to activate the aggregated flexibility in the distribution system to participate in the transmission system operation for energy arbitrage and ancillary services provision. Furthermore, we propose the concept of marginal flexibility price for the settlement of the DSO flexibility market, ensuring incentive compatibility, revenue adequacy, and a non-profit distribution system operator. Numerical tests on the IEEE 33-node distribution system verify the proposed methods.


[59] 2404.13138

Transition-state-theory-based interpretation of Landau double well potential for ferroelectrics

The existence of quasi-static negative capacitance (QSNC) was proposed based on an interpretation of the widely accepted Landau model of ferroelectrics. However, many works have not supported the QSNC theory, making it controversial. In this letter we show that the Landau model, when used together with transition-state theory, can connect various models including first-principles, Landau, Preisach, and nucleation-limited switching, while it does not predict the existence of QSNC.


[60] 2404.13140

Intro to Quantum Harmony: Chords in Superposition

Correlations between quantum theory and music theory - specifically between principles of quantum computing and musical harmony - can lead to new understandings and new methodologies for music theorists and composers. The quantum principle of superposition is shown to be closely related to different interpretations of musical meaning. Superposition is implemented directly in the authors' simulations of quantum computing, as applied in the decision-making processes of computer-generated music composition.


[61] 2404.13159

Equivariant Imaging for Self-supervised Hyperspectral Image Inpainting

Hyperspectral imaging (HSI) is a key technology for earth observation, surveillance, medical imaging and diagnostics, astronomy, and space exploration. The conventional technology for HSI in remote sensing applications is based on the push-broom scanning approach, in which the camera records the spectral image of a stripe of the scene at a time, while the image is generated by the aggregation of measurements through time. In real-world airborne and spaceborne HSI instruments, empty stripes may appear at certain locations, because platforms do not always maintain a constant programmed attitude or have access to accurate digital elevation maps (DEMs), and the travelling track is not necessarily aligned with the hyperspectral cameras at all times. This makes the enhancement of acquired HS images from incomplete or corrupted observations an essential task. Here, we introduce a novel HSI inpainting algorithm called Hyperspectral Equivariant Imaging (Hyper-EI). Hyper-EI is a self-supervised learning-based method which requires neither training on extensive datasets nor access to a pre-trained model. Experimental results show that the proposed method achieves state-of-the-art inpainting performance compared to existing methods.


[62] 2404.13252

3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification

In recent years, Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs) due to their self-attention mechanism. Many researchers have incorporated ViTs for Hyperspectral Image (HSI) classification. HSIs are characterised by narrow contiguous spectral bands, providing rich spectral data. Although ViTs excel with sequential data, they cannot extract spectral-spatial information like CNNs. Furthermore, to achieve high classification performance, there should be a strong interaction between the HSI tokens and the class (CLS) token. To solve these issues, we propose a 3D-Convolution guided Spectral-Spatial Transformer (3D-ConvSST) for HSI classification that utilizes a 3D-Convolution Guided Residual Module (CGRM) between encoders to "fuse" the local spatial and spectral information and to enhance feature propagation. Furthermore, we forego the class token and instead apply Global Average Pooling, which effectively encodes more discriminative and pertinent high-level features for classification. Extensive experiments have been conducted on three public HSI datasets to show the superiority of the proposed model over state-of-the-art traditional, convolutional, and Transformer models. The code is available at https://github.com/ShyamVarahagiri/3D-ConvSST.
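
The class-token-free readout described above reduces, in code, to pooling the encoder's output tokens before classification; a minimal sketch (dimensions illustrative) follows.

    import torch
    import torch.nn as nn

    tokens = torch.randn(8, 196, 64)     # encoder output without a CLS token: (B, N, D)
    pooled = tokens.mean(dim=1)          # global average pooling over all HSI tokens
    logits = nn.Linear(64, 16)(pooled)   # 16 classes, an illustrative number
    print(logits.shape)                  # torch.Size([8, 16])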


[63] 2404.13278

Federated Transfer Learning with Task Personalization for Condition Monitoring in Ultrasonic Metal Welding

Ultrasonic metal welding (UMW) is a key joining technology with widespread industrial applications. Condition monitoring (CM) capabilities are critically needed in UMW applications because process anomalies significantly deteriorate the joining quality. Recently, machine learning models emerged as a promising tool for CM in many manufacturing applications due to their ability to learn complex patterns. Yet, the successful deployment of these models requires substantial training data that may be expensive and time-consuming to collect. Additionally, many existing machine learning models lack generalizability and cannot be directly applied to new process configurations (i.e., domains). Such issues may be potentially alleviated by pooling data across manufacturers, but data sharing raises critical data privacy concerns. To address these challenges, this paper presents a Federated Transfer Learning with Task Personalization (FTL-TP) framework that provides domain generalization capabilities in distributed learning while ensuring data privacy. By effectively learning a unified representation from feature space, FTL-TP can adapt CM models for clients working on similar tasks, thereby enhancing their overall adaptability and performance jointly. To demonstrate the effectiveness of FTL-TP, we investigate two distinct UMW CM tasks, tool condition monitoring and workpiece surface condition classification. Compared with state-of-the-art FL algorithms, FTL-TP achieves a 5.35%-8.08% improvement of accuracy in CM in new target domains. FTL-TP is also shown to perform excellently in challenging scenarios involving unbalanced data distributions and limited client fractions. Furthermore, by implementing the FTL-TP method on an edge-cloud architecture, we show that this method is both viable and efficient in practice. The FTL-TP framework is readily extensible to various other manufacturing applications.


[64] 2404.13279

Backdoor Attacks and Defenses on Semantic-Symbol Reconstruction in Semantic Communications

Semantic communication is of crucial importance for the next-generation wireless communication networks. The existing works have developed semantic communication frameworks based on deep learning. However, systems powered by deep learning are vulnerable to threats such as backdoor attacks and adversarial attacks. This paper delves into backdoor attacks targeting deep learning-enabled semantic communication systems. Since current works on backdoor attacks are not tailored for semantic communication scenarios, a new backdoor attack paradigm on semantic symbols (BASS) is introduced, based on which the corresponding defense measures are designed. Specifically, a training framework is proposed to prevent BASS. Additionally, reverse engineering-based and pruning-based defense strategies are designed to protect against backdoor attacks in semantic communication. Simulation results demonstrate the effectiveness of both the proposed attack paradigm and the defense strategies.


[65] 2404.13286

Track Role Prediction of Single-Instrumental Sequences

In the composition process, selecting appropriate single-instrumental music sequences and assigning their track-role is an indispensable task. However, manually determining the track-role for a myriad of music samples can be time-consuming and labor-intensive. This study introduces a deep learning model designed to automatically predict the track-role of single-instrumental music sequences. Our evaluations show a prediction accuracy of 87% in the symbolic domain and 84% in the audio domain. The proposed track-role prediction methods hold promise for future applications in AI music generation and analysis.


[66] 2404.13289

Double Mixture: Towards Continual Event Detection from Speech

Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events. Traditional ASR systems often overlook the interplay between these events, focusing solely on content, even though the interpretation of dialogue can vary with environmental context. This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events. We introduce a new task, continual event detection from speech, for which we also provide two benchmark datasets. To address the challenges of catastrophic forgetting and effective disentanglement, we propose a novel method, 'Double Mixture.' This method merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting. Our comprehensive experiments show that this task presents significant challenges that are not effectively addressed by current state-of-the-art methods in either computer vision or natural language processing. Our approach achieves the lowest rates of forgetting and the highest levels of generalization, proving robust across various continual learning sequences. Our code and data are available at https://anonymous.4open.science/status/Continual-SpeechED-6461.


[67] 2404.13298

MARec: Metadata Alignment for cold-start Recommendation

For many recommender systems, the primary data source is a historical record of user clicks. The associated click matrix is often very sparse, however, as the number of users x products can be far larger than the number of clicks, and such sparsity is accentuated in cold-start settings. The sparsity of the click matrix is the reason matrix factorization and autoencoder techniques remain highly competitive across collaborative filtering datasets. In this work, we propose a simple approach to address cold-start recommendations by leveraging content metadata: Metadata Alignment for cold-start Recommendation (MARec). We show that this approach can readily augment existing matrix factorization and autoencoder approaches, enabling a smooth transition to top-performing algorithms in warmer set-ups. Our experimental results indicate three separate contributions: first, we show that our proposed framework largely beats SOTA results on 4 cold-start datasets with different sparsity and scale characteristics, with gains ranging from +8.4% to +53.8% on reported ranking metrics; second, we provide an ablation study on the utility of semantic features, and show that the additional gain obtained by leveraging such features ranges between +46.8% and +105.5%; and third, our approach is by construction highly competitive in warm set-ups, and we propose a closed-form solution outperformed by SOTA results by only 0.8% on average.
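
One classical way to realize metadata-based cold-start augmentation, shown here only to convey the general idea (MARec's actual objective is not reproduced), is to fit a ridge-regression map from item metadata to item factors learned on warm items, then project cold items through that map.

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_warm, n_meta, k = 200, 1000, 50, 32
    M_warm = rng.standard_normal((n_warm, n_meta))   # metadata of warm items
    V_warm = rng.standard_normal((n_warm, k))        # item factors from factorization
    U = rng.standard_normal((n_users, k))            # user factors from the same model

    lam = 1.0                                        # ridge regularization strength
    W = np.linalg.solve(M_warm.T @ M_warm + lam * np.eye(n_meta), M_warm.T @ V_warm)

    M_cold = rng.standard_normal((20, n_meta))       # metadata of unseen (cold) items
    V_cold = M_cold @ W                              # aligned factors for cold items
    scores = U @ V_cold.T                            # rank cold items for each user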


[68] 2404.13321

Accelerated System-Reliability-based Disaster Resilience Analysis for Structural Systems

Resilience has emerged as a crucial concept for evaluating structural performance under disasters because of its ability to extend beyond traditional risk assessments, accounting for a system's ability to minimize disruptions and maintain functionality during recovery. To facilitate the holistic understanding of resilience performance in structural systems, a system-reliability-based disaster resilience analysis framework was developed. The framework describes resilience using three criteria: reliability, redundancy, and recoverability, and the system's internal resilience is evaluated by inspecting the characteristics of reliability and redundancy for different possible progressive failure modes. However, the practical application of this framework has been limited to complex structures with numerous sub-components, as it becomes intractable to evaluate the performances for all possible initial disruption scenarios. To bridge the gap between the theory and practical use, especially for evaluating reliability and redundancy, this study centers on the idea that the computational burden can be substantially alleviated by focusing on initial disruption scenarios that are practically significant. To achieve this research goal, we propose three methods to efficiently eliminate insignificant scenarios: the sequential search method, the n-ball sampling method, and the surrogate model-based adaptive sampling algorithm. Three numerical examples, including buildings and a bridge, are introduced to prove the applicability and efficiency of the proposed approaches. The findings of this study are expected to offer practical solutions to the challenges of assessing resilience performance in complex structural systems.


[69] 2404.13358

Music Consistency Models

Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. They have proven advantageous in mitigating the computational burdens associated with diffusion models. Nevertheless, the application of consistency models in music generation remains largely unexplored. To address this gap, we present Music Consistency Models (MusicCM), which leverages the concept of consistency models to efficiently synthesize mel-spectrograms for music clips, maintaining high quality while minimizing the number of sampling steps. Building upon existing text-to-music diffusion models, the MusicCM model incorporates consistency distillation and adversarial discriminator training. Moreover, we find it beneficial to generate extended coherent music by incorporating multiple diffusion processes with shared constraints. Experimental results reveal the effectiveness of our model in terms of computational efficiency, fidelity, and naturalness. Notably, MusicCM achieves seamless music synthesis with a mere four sampling steps, e.g., only one second per minute of the music clip, showcasing the potential for real-time application.


[70] 2404.13362

Semantically Corrected Amharic Automatic Speech Recognition

Automatic Speech Recognition (ASR) can play a crucial role in enhancing the accessibility of spoken languages worldwide. In this paper, we build a set of ASR tools for Amharic, a language spoken by more than 50 million people primarily in eastern Africa. Amharic is written in the Ge'ez script, a sequence of graphemes with spacings denoting word boundaries. This makes computational processing of Amharic challenging since the location of spacings can significantly impact the meaning of formed sentences. We find that existing benchmarks for Amharic ASR do not account for these spacings and only measure individual grapheme error rates, leading to significantly inflated measurements of in-the-wild performance. In this paper, we first release corrected transcriptions of existing Amharic ASR test datasets, enabling the community to accurately evaluate progress. Furthermore, we introduce a post-processing approach using a transformer encoder-decoder architecture to organize raw ASR outputs into a grammatically complete and semantically meaningful Amharic sentence. Through experiments on the corrected test dataset, our model enhances the semantic correctness of Amharic speech recognition systems, achieving a Character Error Rate (CER) of 5.5% and a Word Error Rate (WER) of 23.3%.


[71] 2404.13371

On Risk-Sensitive Decision Making Under Uncertainty

This paper studies a risk-sensitive decision-making problem under uncertainty. It considers a decision-making process that unfolds over a fixed number of stages, in which a decision-maker chooses among multiple alternatives, some of which are deterministic and others are stochastic. The decision-maker's cumulative value is updated at each stage, reflecting the outcomes of the chosen alternatives. After formulating this as a stochastic control problem, we delineate the necessary optimality conditions for it. Two illustrative examples from optimal betting and inventory management are provided to support our theory.


[72] 2404.13392

Beamforming Design for Integrated Sensing and Communications Using Uplink-Downlink Duality

This paper presents a novel optimization framework for beamforming design in integrated sensing and communication systems where a base station seeks to minimize the Bayesian Cramér-Rao bound of a sensing problem while satisfying quality-of-service constraints for the communication users. Prior approaches formulate the design problem as a semidefinite program for which acquiring a beamforming solution is computationally expensive. In this work, we show that the computational burden can be considerably alleviated. To achieve this, we transform the design problem into a tractable form that not only provides a new understanding of Cramér-Rao bound optimization, but also allows for an uplink-downlink duality relation to be developed. Such a duality result gives rise to an efficient algorithm that enables the beamforming design problem to be solved at a much lower complexity compared to state-of-the-art methods.


[73] 2404.13418

Interactive tools for making temporally variable, multiple-attributes, and multiple-instances morphing accessible: Flexible manipulation of divergent speech instances for explorational research and education

We generalized a voice morphing algorithm capable of handling temporally variable, multiple-attribute, and multiple-instance morphing. The generalized morphing provides a new strategy for investigating speech diversity. However, excessive complexity and the difficulty of preparation have prevented researchers and students from enjoying its benefits. To address this issue, we introduce a set of interactive tools that make preparation and testing less cumbersome. These tools are integrated into our previously reported interactive tools as extensions. The introduction of the extended tools into graduate-level lessons was successful. Finally, we outline further extensions to explore excessively complex morphing parameter settings.


[74] 2404.13428

Text-dependent Speaker Verification (TdSV) Challenge 2024: Challenge Evaluation Plan

This document outlines the Text-dependent Speaker Verification (TdSV) Challenge 2024, which centers on analyzing and exploring novel approaches for text-dependent speaker verification. The primary goal of this challenge is to motivate participants to develop single yet competitive systems, conduct thorough analyses, and explore innovative concepts such as multi-task learning, self-supervised learning, few-shot learning, and others, for text-dependent speaker verification.


[75] 2404.13454

Revolutionizing System Reliability: The Role of AI in Predictive Maintenance Strategies

The landscape of maintenance in distributed systems is rapidly evolving with the integration of Artificial Intelligence (AI). Moreover, as the complexity of computing continuum systems intensifies, the role of AI in predictive maintenance (Pd.M.) becomes increasingly pivotal. This paper presents a comprehensive survey of the current state of Pd.M. in the computing continuum, with a focus on the integration of scalable AI technologies. Recognizing the limitations of traditional maintenance practices in the face of increasingly complex and heterogeneous computing continuum systems, the study explores how AI, especially machine learning and neural networks, is being used to enhance Pd.M. strategies. The survey encompasses a thorough review of existing literature, highlighting key advancements, methodologies, and case studies in the field. It critically examines the role of AI in improving prediction accuracy for system failures and in optimizing maintenance schedules, thereby contributing to reduced downtime and enhanced system longevity. By synthesizing findings from the latest advancements in the field, the article provides insights into the effectiveness and challenges of implementing AI-driven predictive maintenance. It underscores the evolution of maintenance practices in response to technological advancements and the growing complexity of computing continuum systems. The conclusions drawn from this survey are instrumental for practitioners and researchers in understanding the current landscape and future directions of Pd.M. in distributed systems. It emphasizes the need for continued research and development in this area, pointing towards a trend of more intelligent, efficient, and cost-effective maintenance solutions in the era of AI.


[76] 2404.13456

Real-Time Safe Control of Neural Network Dynamic Models with Sound Approximation

Safe control of neural network dynamic models (NNDMs) is important for robotics and many applications. However, it remains challenging to compute an optimal safe control in real time for NNDMs. To enable real-time computation, we propose to use a sound approximation of the NNDM in the control synthesis. In particular, we propose Bernstein over-approximated neural dynamics (BOND) based on the Bernstein polynomial over-approximation (BPO) of ReLU activation functions in NNDMs. To mitigate the errors introduced by the approximation and to ensure persistent feasibility of the safe control problems, we synthesize a worst-case safety index using the most unsafe approximated state within the BPO relaxation of the NNDM offline. For online real-time optimization, we formulate the first-order Taylor approximation of the nonlinear worst-case safety constraint as an additional linear layer of the NNDM with an l2-bounded bias term for the higher-order remainder. Comprehensive experiments with different neural dynamics and safety constraints show that, with safety guaranteed, our NNDMs with sound approximation are 10-100 times faster than the safe control baseline that uses mixed integer programming (MIP), validating the effectiveness of the worst-case safety index and the scalability of the proposed BOND in real-time large-scale settings.
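
To convey the flavor of sound over-approximation of ReLU on an interval, the sketch below returns linear lower/upper bounds guaranteed to sandwich ReLU on [l, u]; this is the classic triangle relaxation, shown as a stand-in since the paper's Bernstein polynomial construction is not reproduced here.

    def relu_linear_bounds(l, u):
        """Return (slope, intercept) pairs lower/upper bounding ReLU on [l, u]."""
        if u <= 0:                      # ReLU is identically zero on [l, u]
            return (0.0, 0.0), (0.0, 0.0)
        if l >= 0:                      # ReLU is the identity on [l, u]
            return (1.0, 0.0), (1.0, 0.0)
        s = u / (u - l)                 # slope of the chord from (l, 0) to (u, u)
        upper = (s, -s * l)             # the chord upper-bounds the convex ReLU
        lower = (s, 0.0)                # a common sound lower-bound choice
        return lower, upper

    print(relu_linear_bounds(-1.0, 2.0))   # ((0.667, 0.0), (0.667, 0.667)) approximately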


[77] 2404.13494

Performance Analysis of RIS-Assisted Spectrum Sharing Systems

We propose a reconfigurable intelligent surface (RIS)-assisted underlay spectrum sharing system, in which a RIS-assisted secondary network shares the spectrum licensed for a primary network. The secondary network consists of a secondary source (SS), an RIS, and a secondary destination (SD), operating in a Rician fading environment. We study the performance of the secondary network while considering a peak power constraint at the SS and an interference power constraint at the primary receiver (PR). Initially, we characterize the statistics of the signal-to-noise ratio (SNR) of the RIS-assisted secondary network by deriving novel analytical expressions for the cumulative distribution function (CDF) and probability density function (PDF) in terms of the incomplete H-function. Building upon the SNR statistics, we analyze the outage probability, ergodic capacity, and average bit error rate, subsequently deriving novel exact expressions for these performance measures. Furthermore, we obtain novel asymptotic expressions for the performance measures of interest when the peak power of the SS is high. Finally, we conduct exhaustive Monte-Carlo simulations to confirm the correctness of our theoretical analysis.


[78] 2404.13509

MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention

Speech emotion recognition is crucial in human-computer interaction, but extracting and using emotional cues from audio poses challenges. This paper introduces MFHCA, a novel method for speech emotion recognition using Multi-Spatial Fusion and Hierarchical Cooperative Attention on spectrograms and raw audio. We employ the Multi-Spatial Fusion module (MF) to efficiently identify emotion-related spectrogram regions and integrate HuBERT features for higher-level acoustic information. Our approach also includes a Hierarchical Cooperative Attention module (HCA) to merge features from various auditory levels. We evaluate our method on the IEMOCAP dataset and achieve improvements of 2.6% and 1.87% in weighted accuracy and unweighted accuracy, respectively. Extensive experiments demonstrate the effectiveness of the proposed method.


[79] 2404.13529

Towards Parameter-free Distributed Optimization: a Port-Hamiltonian Approach

This paper introduces a novel distributed optimization technique for networked systems which removes the dependency on specific parameter choices, notably the learning rate. Traditional parameter selection strategies in distributed optimization often lead to conservative performance, characterized by slow convergence or even divergence if parameters are not properly chosen. In this work, we propose a systems-theory tool based on the port-Hamiltonian formalism to design algorithms for consensus optimization programs. Moreover, we propose the Mixed Implicit Discretization (MID), which transforms the continuous-time port-Hamiltonian system into a discrete-time one, maintaining the same convergence properties regardless of the step-size parameter. The consensus optimization algorithm enhances convergence speed without concern for the relationship between parameters and stability. Numerical experiments demonstrate the method's superior convergence speed, outperforming other methods, especially in scenarios where conventional methods fail due to step-size parameter limitations.


[80] 2404.13536

Joint Transmit and Reflective Beamforming for Multi-Active-IRS-Assisted Cooperative Sensing

This paper studies multi-active intelligent-reflecting-surface (IRS) cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to help the base station (BS) provide multi-view sensing. We focus on the scenario where the sensing target is located in the non-line-of-sight (NLoS) area of the BS. Based on the received echo signal, the BS aims to estimate the target's direction-of-arrival (DoA) with respect to each IRS. In addition, we leverage active IRSs to overcome the severe path loss induced by multi-hop reflections. Under this setup, we minimize the maximum Cramér-Rao bound (CRB) among all IRSs by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum transmit power and the maximum power amplification gain at individual IRSs. To tackle the resulting highly non-convex max-CRB minimization problem, we propose an efficient algorithm based on alternating optimization, successive convex approximation, and semi-definite relaxation, to obtain a high-quality solution. Finally, numerical results are provided to verify the effectiveness of our proposed design and the benefits of active IRS-assisted sensing compared to the counterpart with passive IRSs.


[81] 2404.13550

Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world applications. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high performance and extremely low decoding latency simultaneously. Inspired by the conventional Trisoup codec, a point-model-based strategy is devised to characterize local surfaces. Specifically, skin features are embedded from local windows via an attention-based encoder, and dilated windows are introduced as cross-scale priors to infer the distribution of quantized features in parallel. During decoding, features undergo fast refinement, followed by a folding-based point generator that reconstructs point coordinates at fairly fast speed. Experiments show that Pointsoup achieves state-of-the-art performance on multiple benchmarks with significantly lower decoding complexity, i.e., up to 90-160x faster than the G-PCCv23 Trisoup decoder on a comparatively low-end platform (e.g., an RTX 2080Ti). Furthermore, it offers variable-rate control with a single neural model (2.9 MB), which is attractive for industrial practitioners.


[82] 2404.13551

AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition

Recent research has successfully adapted vision-based convolutional neural network (CNN) architectures for audio recognition tasks using Mel-spectrograms. However, these CNNs have high computational costs and memory requirements, limiting their deployment on low-end edge devices. Motivated by the success of efficient vision models like InceptionNeXt and ConvNeXt, we propose AudioRepInceptionNeXt, a single-stream architecture. Its basic building block breaks down the parallel multi-branch depth-wise convolutions with descending scales of k x k kernels into a cascade of two multi-branch depth-wise convolutions. The first multi-branch consists of parallel multi-scale 1 x k depth-wise convolutional layers, followed by a similar multi-branch employing parallel multi-scale k x 1 depth-wise convolutional layers. This reduces the computational and memory footprint while separating the time and frequency processing of Mel-spectrograms. The large kernels capture global frequencies and long activities, while the small kernels capture local frequencies and short activities. We also reparameterize the multi-branch design during inference to further boost speed without losing accuracy. Experiments show that AudioRepInceptionNeXt reduces parameters and computations by over 50% and improves inference speed by 1.28x over state-of-the-art CNNs like Slow-Fast, while maintaining comparable accuracy. It also learns robustly across a variety of audio recognition tasks. Code is available at https://github.com/StevenLauHKHK/AudioRepInceptionNeXt.


[83] 2404.13568

Sparse Direction of Arrival Estimation Method Based on Vector Signal Reconstruction with a Single Vector Sensor

This study investigates the application of single vector hydrophones in underwater acoustic signal processing for Direction of Arrival (DOA) estimation. Addressing the limitations of traditional DOA estimation methods in multi-source environments and under noise interference, this research proposes a Vector Signal Reconstruction (VSR) technique. This technique transforms the covariance matrix of single vector hydrophone signals into a Toeplitz structure suitable for gridless sparse methods through complex calculations and vector signal reconstruction. Furthermore, two sparse DOA estimation algorithms based on vector signal reconstruction are introduced. Theoretical analysis and simulation experiments demonstrate that the proposed algorithms significantly improve the accuracy and resolution of DOA estimation in multi-source signals and low Signal-to-Noise Ratio (SNR) environments compared to traditional algorithms. The contribution of this study lies in providing an effective new method for DOA estimation with single vector hydrophones in complex environments, introducing new research directions and solutions in the field of vector hydrophone signal processing.


[84] 2404.13569

Musical Word Embedding for Music Tagging and Retrieval

Word embedding has become an essential means for text-based information retrieval. Typically, word embeddings are learned from large quantities of general and unstructured text data. However, in the domain of music, the word embedding may have difficulty understanding musical contexts or recognizing music-related entities like artists and tracks. To address this issue, we propose a new approach called Musical Word Embedding (MWE), which involves learning from various types of texts, including both everyday and music-related vocabulary. We integrate MWE into an audio-word joint representation framework for tagging and retrieving music, using words like tag, artist, and track that have different levels of musical specificity. Our experiments show that using a more specific musical word like track results in better retrieval performance, while using a less specific term like tag leads to better tagging performance. To balance this compromise, we suggest multi-prototype training that uses words with different levels of musical specificity jointly. We evaluate both word embedding and audio-word joint embedding on four tasks (tag rank prediction, music tagging, query-by-tag, and query-by-track) across two datasets (Million Song Dataset and MTG-Jamendo). Our findings show that the suggested MWE is more efficient and robust than the conventional word embedding.


[85] 2404.13598

An Integrated Communication and Computing Scheme for Wi-Fi Networks based on Generative AI and Reinforcement Learning

The continuous evolution of future mobile communication systems is heading towards the integration of communication and computing, with Mobile Edge Computing (MEC) emerging as a crucial means of implementing Artificial Intelligence (AI) computation. MEC could enhance the computational performance of wireless edge networks by offloading computing-intensive tasks to MEC servers. However, in edge computing scenarios, the sparse sample problem may lead to high costs of time-consuming model training. This paper proposes an MEC offloading decision and resource allocation solution that combines generative AI and deep reinforcement learning (DRL) for the communication-computing integration scenario in the 802.11ax Wi-Fi network. Initially, the optimal offloading policy is determined by the joint use of the Generative Diffusion Model (GDM) and the Twin Delayed DDPG (TD3) algorithm. Subsequently, resource allocation is accomplished by using the Hungarian algorithm. Simulation results demonstrate that the introduction of Generative AI significantly reduces model training costs, and the proposed solution exhibits significant reductions in system task processing latency and total energy consumption costs.


[86] 2404.13603

Beyond MMSE: Rank-1 Subspace Channel Estimator for Massive MIMO Systems

To glean the benefits offered by massive multi-input multi-output (MIMO) systems, channel state information must be accurately acquired. Despite the high accuracy, the computational complexity of classical linear minimum mean squared error (MMSE) estimator becomes prohibitively high in the context of massive MIMO, while the other low-complexity methods degrade the estimation accuracy seriously. In this paper, we develop a novel rank-1 subspace channel estimator to approximate the maximum likelihood (ML) estimator, which outperforms the linear MMSE estimator, but incurs a surprisingly low computational complexity. Our method first acquires the highly accurate angle-of-arrival (AoA) information via a constructed space-embedding matrix and the rank-1 subspace method. Then, it adopts the post-reception beamforming to acquire the unbiased estimate of channel gains. Furthermore, a fast method is designed to implement our new estimator. Theoretical analysis shows that the extra gain achieved by our method over the linear MMSE estimator grows according to the rule of O($\log_{10}M$), while its computational complexity is linearly scalable to the number of antennas $M$. Numerical simulations also validate the theoretical results. Our new method substantially extends the accuracy-complexity region and constitutes a promising channel estimation solution to the emerging massive MIMO communications.


[87] 2404.13605

Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence

Tackling image degradation due to atmospheric turbulence, particularly in dynamic environments, remains a challenge for long-range imaging systems. Existing techniques have been designed primarily for static scenes or scenes with small motion. This paper presents the first segment-then-restore pipeline for restoring videos of dynamic scenes in turbulent environments. We leverage mean optical flow with an unsupervised motion segmentation method to separate dynamic and static scene components prior to restoration. After camera-shake compensation and segmentation, we introduce foreground/background enhancement that leverages the statistics of turbulence strength and a transformer model trained on a novel noise-based procedural turbulence generator for fast dataset augmentation. Benchmarked against existing restoration methods, our approach removes most of the geometric distortion and enhances video sharpness. We make our code, simulator, and data publicly available to advance the field of video restoration from turbulence: riponcs.github.io/TurbSegRes
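
The segmentation step can be illustrated with a short OpenCV sketch: estimate dense optical flow, subtract the mean flow as a crude stand-in for camera- and turbulence-induced global motion, and threshold the residual magnitude to separate dynamic from static pixels. The input file, flow parameters, and threshold are illustrative assumptions, not the paper's unsupervised method.

    # Mean-flow motion segmentation sketch (illustrative, not the paper's method).
    import cv2
    import numpy as np

    cap = cv2.VideoCapture("turbulent_clip.mp4")     # hypothetical input video
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Remove the global (mean) flow so camera shake and bulk turbulence
        # drift do not masquerade as object motion.
        residual = flow - flow.reshape(-1, 2).mean(axis=0)
        mag = np.linalg.norm(residual, axis=2)
        dynamic_mask = (mag > 1.0).astype(np.uint8)  # threshold is a guess
        print("dynamic pixels:", int(dynamic_mask.sum()))
        prev_gray = gray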


[88] 2404.13626

Safe Force/Position Tracking Control via Control Barrier Functions for Floating Base Mobile Manipulator Systems

This paper introduces a safe force/position tracking control strategy designed for Free-Floating Mobile Manipulator Systems (MMSs) engaging in compliant contact with planar surfaces. The strategy uniquely integrates Control Barrier Functions (CBFs) to manage operational limitations and safety concerns. It effectively addresses safety-critical aspects at both the kinematic and dynamic levels, such as manipulator joint limits, system velocity constraints, and inherent system dynamic uncertainties. The proposed strategy remains robust to uncertainties in the MMS dynamic model, external disturbances, and variations in the contact stiffness model. The control method's low computational demand ensures easy implementation on onboard computing systems, supporting real-time operation. Simulation results verify the strategy's efficacy, reflecting enhanced system performance and safety.
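
A standard way to realize CBF-based constraint handling is a small quadratic program that minimally modifies a nominal command; the sketch below enforces a joint-limit barrier for a 1-DoF single-integrator toy system. This is a generic illustration of the CBF mechanism, not the paper's controller, and the limit, gain, and dynamics are assumed.

    # CBF safety filter sketch for a joint limit (generic 1-DoF integrator toy).
    def cbf_filter(q, u_nom, q_max=1.0, alpha=5.0):
        """Minimally modify u_nom so the barrier h(q) = q_max - q stays >= 0.
        For qdot = u, the CBF condition is hdot + alpha*h >= 0, i.e.
        u <= alpha*(q_max - q). The QP min ||u - u_nom||^2 subject to that
        single linear constraint has the closed-form clipping solution."""
        return min(u_nom, alpha * (q_max - q))

    # Track a reference that deliberately violates the limit; the filter
    # slows the joint so it stays within q <= q_max.
    q, dt = 0.0, 0.01
    for _ in range(300):
        u_nom = 2.0 * (1.5 - q)          # naive P-controller toward q_ref = 1.5
        q += cbf_filter(q, u_nom) * dt
    print("final q (should be <= 1.0):", round(q, 3))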


[89] 2404.13640

Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer

Multiple complex degradations are coupled in low-quality video faces in the real world. Blind video face restoration is therefore a highly challenging ill-posed problem, requiring not only the hallucination of high-fidelity details but also the enhancement of temporal coherence across diverse pose variations. Naively restoring each frame independently inevitably introduces temporal incoherence and artifacts from pose changes and keypoint localization errors. To address this, we propose the first blind video face restoration approach with a novel parsing-guided temporal-coherent transformer (PGTFormer) that requires no pre-alignment. PGTFormer leverages semantic parsing guidance to select optimal face priors for generating temporally coherent, artifact-free results. Specifically, we pre-train a temporal-spatial vector quantized auto-encoder on high-quality video face datasets to extract expressive context-rich priors. Then, the temporal parse-guided codebook predictor (TPCP) restores faces in different poses based on face parsing context cues without performing face pre-alignment. This strategy reduces artifacts and mitigates jitter caused by cumulative errors from face pre-alignment. Finally, the temporal fidelity regulator (TFR) enhances fidelity through temporal feature interaction and improves video temporal consistency. Extensive experiments on face videos show that our method outperforms previous face restoration baselines. The code will be released at https://github.com/kepengxu/PGTFormer.


[90] 2404.13677

A Dataset and Model for Realistic License Plate Deblurring

Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, accurate recognition remains challenging due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we introduce the first large-scale license plate deblurring dataset, named License Plate Blur (LPBlur), captured by a dual-camera system and processed through a post-processing pipeline to avoid misalignment issues. We then propose a License Plate Deblurring Generative Adversarial Network (LPDGAN) to tackle license plate deblurring, comprising: 1) a Feature Fusion Module to integrate multi-scale latent codes; 2) a Text Reconstruction Module to restore structure through the textual modality; 3) a Partition Discriminator Module to enhance the model's perception of details in each letter. Extensive experiments validate the reliability of the LPBlur dataset for both model training and testing, showing that our proposed model outperforms other state-of-the-art motion deblurring methods in realistic license plate deblurring scenarios. The dataset and code are available at https://github.com/haoyGONG/LPDGAN.


[91] 2404.13681

$\mathsf{QuITO}$ $\textsf{v.2}$: Trajectory Optimization with Uniform Error Guarantees under Path Constraints

This article introduces a new transcription, change-point localization, and mesh refinement scheme for direct optimization-based solutions and for uniform approximation of optimal control trajectories associated with a class of nonlinear constrained optimal control problems (OCPs). The base transcription algorithm for which we establish the refinement scheme is a $\textit{direct multiple shooting technique}$ -- $\mathsf{QuITO}$ $\textsf{v.2}$ (Quasi-Interpolation based Trajectory Optimization). The mesh refinement technique consists of two steps: localization of certain irregular regions in an optimal control trajectory via wavelets, followed by a targeted $h$-refinement around such regions of irregularity. Theoretical approximation guarantees on uniform grids are presented for optimal controls with certain regularity properties, along with guarantees that change points are localized by the wavelet transform. Numerical illustrations are provided for control profiles involving discontinuities to show the effectiveness of the localization and refinement strategy. We also announce a freely available software package based on $\mathsf{QuITO}$ $\textsf{v.2}$ and describe its functionalities.
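
To illustrate the localization step, the sketch below uses PyWavelets to flag change points in a control profile: large wavelet detail coefficients mark candidate irregularities around which the mesh would be refined. The signal, wavelet choice, and threshold are our illustrative assumptions, not the package's implementation.

    # Wavelet change-point localization sketch (signal, wavelet, threshold assumed).
    import numpy as np
    import pywt

    rng = np.random.default_rng(2)
    t = np.linspace(0.0, 1.0, 512)
    u = np.where(t < 0.6, 1.0, -0.5) + 0.01 * rng.standard_normal(t.size)  # toy control

    # Detail coefficients of a single-level DWT spike near irregularities.
    _, detail = pywt.dwt(u, "db4")
    thresh = 10.0 * np.median(np.abs(detail))         # crude robust threshold
    hits = np.where(np.abs(detail) > thresh)[0]

    # Each detail coefficient covers roughly two samples of the signal.
    change_points = t[np.clip(hits * 2, 0, t.size - 1)]
    print("irregularities near t =", np.round(change_points, 3))
    # A targeted h-refinement step would now subdivide mesh intervals here.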


[92] 2404.13742

Seamless Underwater Navigation with Limited Doppler Velocity Log Measurements

Autonomous Underwater Vehicles (AUVs) commonly utilize an inertial navigation system (INS) and a Doppler velocity log (DVL) for underwater navigation. To that end, their measurements are integrated through a nonlinear filter such as the extended Kalman filter (EKF). The DVL velocity vector estimate depends on retrieving reflections from the seabed, requiring at least three of its four transmitted acoustic beams to return successfully. When fewer than three beams are obtained, the DVL cannot provide a velocity update to bound the navigation solution drift. To cope with this challenge, in this paper we propose a hybrid neural coupled (HNC) approach for seamless AUV navigation under limited DVL measurements. First, we derive an approach to regress two or three missing DVL beams. These regressed beams, together with the measured beams, are then incorporated into the EKF. We examined INS/DVL fusion in both loosely and tightly coupled approaches. Our method was trained and evaluated on data recorded during AUV experiments conducted in the Mediterranean Sea on two different occasions. The results illustrate that our proposed method outperforms the baseline loosely and tightly coupled model-based approaches by an average of 96.15%. It also demonstrates superior performance compared to a model-based beam estimator, by an average of 12.41%, in terms of velocity accuracy for scenarios involving two or three missing beams. Our approach therefore offers seamless AUV navigation under limited beam measurements.
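
As a rough sketch of the regression idea, a small fully connected network can map the available beams plus inertial readings to the missing beams; the architecture, input features, and dimensions below are assumed, since the paper's network is not specified here.

    # Missing-DVL-beam regression sketch (architecture and features assumed).
    import torch
    import torch.nn as nn

    class BeamRegressor(nn.Module):
        """Regress 2 missing beam velocities from 2 measured beams plus
        6 inertial channels (3-axis accelerometer + gyro). Hypothetical I/O."""
        def __init__(self, in_dim=8, out_dim=2, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, x):
            return self.net(x)

    model = BeamRegressor()
    x = torch.randn(32, 8)        # batch of measured beams + IMU readings
    missing_beams = model(x)      # estimates to feed, with the measured
    print(missing_beams.shape)    # beams, into the EKF update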


[93] 2404.13789

Anchor-aware Deep Metric Learning for Audio-visual Retrieval

Metric learning minimizes the gap between similar (positive) pairs of data points and increases the separation of dissimilar (negative) pairs, aiming to capture the underlying data structure and enhance the performance of tasks like audio-visual cross-modal retrieval (AV-CMR). Recent works employ sampling methods to select impactful data points from the embedding space during training. However, model training fails to fully explore the space due to the scarcity of training data points, resulting in an incomplete representation of the overall positive and negative distributions. In this paper, we propose an innovative Anchor-aware Deep Metric Learning (AADML) method to address this challenge by uncovering the underlying correlations among existing data points, which enhances the quality of the shared embedding space. Specifically, our method establishes a correlation graph-based manifold structure by considering the dependencies between each sample as the anchor and its semantically similar samples. Through dynamic weighting of the correlations within this underlying manifold structure using an attention-driven mechanism, Anchor Awareness (AA) scores are obtained for each anchor. These AA scores serve as data proxies to compute relative distances in metric learning approaches. Extensive experiments conducted on two audio-visual benchmark datasets demonstrate the effectiveness of our proposed AADML method, significantly surpassing state-of-the-art models. Furthermore, we investigate the integration of AA proxies with various metric learning methods, further highlighting the efficacy of our approach.
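
One minimal reading of the anchor-awareness computation is attention over an anchor's most similar samples, with the attended similarity serving as the AA score; the PyTorch sketch below is our generic interpretation, and the neighborhood size and softmax temperature are assumed.

    # Anchor-awareness score sketch (generic interpretation, parameters assumed).
    import torch
    import torch.nn.functional as F

    def anchor_awareness(emb, k=5, tau=0.1):
        """emb: (N, D) L2-normalized embeddings. For each anchor, attend over
        its k most similar samples and return the attention-weighted mean
        similarity as its AA score."""
        sim = emb @ emb.t()                          # cosine similarities
        sim.fill_diagonal_(-1.0)                     # exclude self-matches
        topk_sim, _ = sim.topk(k, dim=1)             # each anchor's neighbors
        attn = F.softmax(topk_sim / tau, dim=1)      # attention over neighbors
        return (attn * topk_sim).sum(dim=1)          # (N,) AA scores

    emb = F.normalize(torch.randn(100, 128), dim=1)
    aa = anchor_awareness(emb)
    print(aa.shape)   # AA scores act as proxies for relative distances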


[94] 2404.13804

Adaptive Heterogeneous Client Sampling for Federated Learning over Wireless Networks

Federated learning (FL) algorithms usually sample a fraction of clients in each round (partial participation) when the number of participants is large and the server's communication bandwidth is limited. Recent works on the convergence analysis of FL have focused on unbiased client sampling, e.g., sampling uniformly at random, which suffers from slow wall-clock convergence due to high degrees of system heterogeneity and statistical heterogeneity. This paper aims to design an adaptive client sampling algorithm for FL over wireless networks that tackles both system and statistical heterogeneity to minimize the wall-clock convergence time. We obtain a new tractable convergence bound for FL algorithms with arbitrary client sampling probability. Based on the bound, we analytically establish the relationship between the total learning time and sampling probability with an adaptive bandwidth allocation scheme, which results in a non-convex optimization problem. We design an efficient algorithm for learning the unknown parameters in the convergence bound and develop a low-complexity algorithm to approximately solve the non-convex problem. Our solution reveals the impact of system and statistical heterogeneity parameters on the optimal client sampling design. Moreover, our solution shows that as the number of sampled clients increases, the total convergence time first decreases and then increases, because a larger sampling number reduces the number of rounds for convergence but results in a longer expected time per round due to limited wireless bandwidth. Experimental results from both hardware prototype and simulation demonstrate that our proposed sampling scheme significantly reduces the convergence time compared to several baseline sampling schemes.
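
For context, unbiased partial participation with arbitrary sampling probabilities is typically implemented by reweighting each sampled client's update by 1/(K q_i); the snippet below shows that bookkeeping on synthetic updates. The probabilities and updates are placeholders, not the paper's optimized design.

    # Unbiased client sampling with arbitrary probabilities (placeholder data).
    import numpy as np

    rng = np.random.default_rng(3)
    N, K, dim = 100, 10, 5                        # clients, sampled per round, model size
    q = rng.uniform(0.5, 1.5, N); q /= q.sum()    # sampling distribution (assumed)
    updates = rng.standard_normal((N, dim))       # stand-in client updates

    # Sample K clients with replacement and reweight by 1/(K * q_i) so the
    # aggregate is an unbiased estimate of sum_i updates_i.
    picked = rng.choice(N, size=K, p=q)
    agg = sum(updates[i] / (K * q[i]) for i in picked)

    print("estimate:", np.round(agg, 2))
    print("target  :", np.round(updates.sum(axis=0), 2))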


[95] 2404.13821

Robotic Blended Sonification: Consequential Robot Sound as Creative Material for Human-Robot Interaction

Current research in robotic sounds generally focuses on either masking the consequential sound produced by the robot or on sonifying data about the robot to create a synthetic robot sound. We propose to capture, modify, and utilise rather than mask the sounds that robots are already producing. In short, this approach relies on capturing a robot's sounds, processing them according to contextual information (e.g., collaborators' proximity or particular work sequences), and playing back the modified sound. Previous research indicates the usefulness of non-semantic, and even mechanical, sounds as a communication tool for conveying robotic affect and function. Adding to this, this paper presents a novel approach which makes two key contributions: (1) a technique for real-time capture and processing of consequential robot sounds, and (2) an approach to explore these sounds through direct human-robot interaction. Drawing on methodologies from design, human-robot interaction, and creative practice, the resulting 'Robotic Blended Sonification' is a concept which transforms the consequential robot sounds into a creative material that can be explored artistically and within application-based studies.


[96] 2404.13875

Active RIS-Aided Massive MIMO Uplink Systems with Low-Resolution ADCs

This letter considers an active reconfigurable intelligent surface (RIS)-aided multi-user uplink massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs). The letter derives a closed-form approximate expression for the sum achievable rate (AR), where maximum ratio combining (MRC) processing and low-resolution ADCs are applied at the base station. The system performance is analyzed, and a genetic algorithm (GA)-based method is proposed to optimize the RIS's phase shifts to enhance system performance. Numerical results verify the accuracy of the derivations and demonstrate that the active RIS achieves an evident performance gain over the passive RIS.
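
A bare-bones, mutation-only evolutionary search standing in for the paper's GA might look like the following, where the fitness is a placeholder rate function under a toy single-user cascaded channel; the population size, mutation scale, and channel model are all assumptions.

    # Evolutionary RIS phase-shift search sketch (toy model, assumed parameters).
    import numpy as np

    rng = np.random.default_rng(4)
    N_ris, pop, gens = 32, 40, 100

    # Toy cascaded channel: rate depends on |sum_n h_r[n] e^{j phi[n]} h_t[n]|.
    h_t = (rng.standard_normal(N_ris) + 1j * rng.standard_normal(N_ris)) / np.sqrt(2)
    h_r = (rng.standard_normal(N_ris) + 1j * rng.standard_normal(N_ris)) / np.sqrt(2)

    def fitness(phi):                     # achievable-rate proxy (placeholder)
        g = np.sum(h_r * np.exp(1j * phi) * h_t)
        return np.log2(1.0 + np.abs(g) ** 2)

    P = rng.uniform(0, 2 * np.pi, (pop, N_ris))          # initial population
    for _ in range(gens):
        f = np.array([fitness(p) for p in P])
        parents = P[np.argsort(f)[-pop // 2:]]           # truncation selection
        kids = parents[rng.integers(0, len(parents), pop - len(parents))]
        kids = kids + rng.normal(0.0, 0.3, kids.shape)   # Gaussian mutation
        P = np.vstack([parents, kids])
    print("best rate proxy:", max(fitness(p) for p in P))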


[97] 2404.13892

Retrieval-Augmented Audio Deepfake Detection

With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired by retrieval-augmented generation (RAG), we propose a retrieval-augmented detection (RAD) framework that augments test samples with similar retrieved samples for enhanced detection. We also extend the multi-fusion attentive classifier to integrate it with our proposed RAD framework. Extensive experiments show the superior performance of the proposed RAD framework over baseline methods, achieving state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the 2019 and 2021 LA sets. Further sample analysis indicates that the retriever consistently retrieves samples mostly from the same speaker with acoustic characteristics highly consistent with the query audio, thereby improving detection performance.
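
At its core, the retrieval step is nearest-neighbor search in an embedding space; a compact numpy version is sketched below, with the embeddings, datastore, and labels as random placeholders for the paper's learned components.

    # Retrieval-augmented detection sketch (embeddings and labels are placeholders).
    import numpy as np

    rng = np.random.default_rng(5)
    store = rng.standard_normal((1000, 256))             # embedded reference clips
    store /= np.linalg.norm(store, axis=1, keepdims=True)
    labels = rng.integers(0, 2, 1000)                    # 0 = bona fide, 1 = spoof

    def retrieve(query, k=5):
        """Return the k datastore entries most cosine-similar to the query."""
        q = query / np.linalg.norm(query)
        idx = np.argsort(store @ q)[-k:]
        return store[idx], labels[idx]

    query = rng.standard_normal(256)
    neighbors, neighbor_labels = retrieve(query)
    # A RAD-style classifier would attend over [query; neighbors] rather than
    # the query alone; here we just inspect the retrieved evidence.
    print("retrieved labels:", neighbor_labels)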


[98] 2404.13914

Audio Anti-Spoofing Detection: A Survey

The availability of smart devices leads to an exponential increase in multimedia content. However, rapid advancements in deep learning have given rise to sophisticated algorithms capable of manipulating or creating fake multimedia content, known as Deepfake. Audio Deepfakes pose a significant threat by producing highly realistic voices, thus facilitating the spread of misinformation. To address this issue, numerous audio anti-spoofing detection challenges have been organized to foster the development of anti-spoofing countermeasures. This survey paper presents a comprehensive review of every component within the detection pipeline, including algorithm architectures, optimization techniques, application generalizability, evaluation metrics, performance comparisons, available datasets, and open-source availability. For each aspect, we conduct a systematic evaluation of the recent advancements, along with discussions on existing challenges. Additionally, we explore emerging research topics on audio anti-spoofing, including partial spoofing detection, cross-dataset evaluation, and adversarial attack defence, and propose some promising research directions for future work. This survey paper not only identifies the current state-of-the-art to establish strong baselines for future experiments but also guides future researchers on a clear path for understanding and enhancing audio anti-spoofing detection mechanisms.


[99] 2404.14019

A Multimodal Feature Distillation with CNN-Transformer Network for Brain Tumor Segmentation with Incomplete Modalities

Existing brain tumor segmentation methods usually utilize multiple Magnetic Resonance Imaging (MRI) modalities in brain tumor images for segmentation, which can achieve better segmentation performance. However, in clinical applications, some modalities are missing due to resource constraints, leading to severe degradation in the performance of methods that assume complete-modality input. In this paper, we propose a Multimodal feature distillation with Convolutional Neural Network (CNN)-Transformer hybrid network (MCTSeg) for accurate brain tumor segmentation with missing modalities. We first design a Multimodal Feature Distillation (MFD) module that distills feature-level multimodal knowledge into each unimodal branch, allowing complete-modality information to be extracted. We further develop a Unimodal Feature Enhancement (UFE) module to model the relationship between global and local information semantically. Finally, we build a Cross-Modal Fusion (CMF) module to explicitly align the global correlations among different modalities even when some modalities are missing. Complementary features within and across modalities are refined via the CNN-Transformer hybrid architectures in both the UFE and CMF modules, where local and global dependencies are both captured. Our ablation study demonstrates the importance of the proposed modules with CNN-Transformer networks and the convolutional blocks in Transformer for improving the performance of brain tumor segmentation with missing modalities. Extensive experiments on the BraTS2018 and BraTS2020 datasets show that the proposed MCTSeg framework outperforms the state-of-the-art methods in missing-modality cases. Our code is available at: https://github.com/mkang315/MCTSeg.
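
Feature-level distillation of this kind is commonly realized as a regression loss between fused multimodal (teacher) features and each unimodal (student) branch's features; the PyTorch sketch below shows that generic pattern, not MCTSeg's exact MFD module, and the tensor shapes are assumed.

    # Multimodal-to-unimodal feature distillation sketch (generic pattern).
    import torch
    import torch.nn.functional as F

    def mfd_loss(multimodal_feat, unimodal_feats, available):
        """multimodal_feat: (B, C, H, W) teacher features from all modalities.
        unimodal_feats: list of (B, C, H, W) student features per modality.
        available: list of bools marking which modalities are present."""
        losses = [F.mse_loss(u, multimodal_feat.detach())
                  for u, ok in zip(unimodal_feats, available) if ok]
        return torch.stack(losses).mean()

    teacher = torch.randn(2, 16, 8, 8)
    students = [torch.randn(2, 16, 8, 8, requires_grad=True) for _ in range(4)]
    loss = mfd_loss(teacher, students, [True, True, False, True])
    loss.backward()     # gradients flow into the present unimodal branches
    print(loss.item())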


[100] 2404.14030

Towards Using Behavior Trees in Industrial Automation Controllers

The Industry 4.0 paradigm manifests the shift towards mass customization and cyber-physical production systems (CPPS) and sets new requirements for industrial automation software in terms of modularity, flexibility, and short development cycles of control programs. Though programmable logic controllers (PLCs) have been evolving into versatile and powerful edge devices, PLC software still lacks flexibility and integration between low-level programs and high-level task-oriented control frameworks. Behavior trees (BTs) are a novel framework that enables rapid design of modular hierarchical control structures, combining improved modularity with a simple and intuitive design of control logic. This paper proposes an approach for improving industrial control software design by integrating BTs into PLC programs and separating hardware-related functionalities from the coordination logic. Several strategies for integrating BTs into PLCs are shown. The first two integrate BTs with IEC 61131 based PLCs and build on the PLCopen Common Behavior Model. The third utilizes event-based BTs and shows the integration with IEC 61499 based controllers. An application example demonstrates the approach. The paper contributes in the following ways. First, we propose a new PLC software design, which improves modularity, supports better separation of concerns, and enables rapid development and reconfiguration of the control software. Second, we show and evaluate the integration of the BT framework into both IEC 61131 and IEC 61499 based PLCs, as well as the integration of the PLCopen function blocks with an external BT library. This leads to better integration of low-level PLC code with AI-based task-oriented frameworks and improves the skill-based programming approach for PLCs by using BTs for skill composition.
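
To make the BT concept concrete, here is a minimal Python rendering of the classic tick semantics with Sequence and Fallback composites; the action names mimic a hypothetical pick-and-place skill and come from neither the paper nor any PLC runtime.

    # Minimal behavior-tree tick semantics (hypothetical skill names).
    SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

    class Action:
        def __init__(self, name, fn):
            self.name, self.fn = name, fn
        def tick(self):
            return self.fn()

    class Sequence:
        """Succeeds only if all children succeed, ticked left to right."""
        def __init__(self, children):
            self.children = children
        def tick(self):
            for c in self.children:
                status = c.tick()
                if status != SUCCESS:
                    return status
            return SUCCESS

    class Fallback:
        """Returns the first non-FAILURE child status."""
        def __init__(self, children):
            self.children = children
        def tick(self):
            for c in self.children:
                status = c.tick()
                if status != FAILURE:
                    return status
            return FAILURE

    tree = Sequence([
        Fallback([Action("gripper_open?", lambda: FAILURE),
                  Action("open_gripper", lambda: SUCCESS)]),
        Action("move_to_part", lambda: RUNNING),   # still executing this cycle
    ])
    print(tree.tick())   # each PLC scan cycle would tick the root once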


[101] 2404.14036

Optimal Structure of Receive Beamforming for Over-the-Air Computation

We investigate fast data aggregation via over-the-air computation (AirComp) over wireless networks. In this scenario, an access point (AP) with multiple antennas aims to recover the arithmetic mean of sensory data from multiple wireless devices. To minimize estimation distortion, we formulate a mean-squared-error (MSE) minimization problem that jointly optimizes the transmit scalars at the wireless devices, the denoising factor, and the receive beamforming vector at the AP. We derive closed-form expressions for the transmit scalars and denoising factor, reducing the problem to a non-convex quadratically constrained quadratic programming (QCQP) problem in the receive beamforming vector. To tackle the computational complexity of the beamforming design, particularly relevant in massive multiple-input multiple-output (MIMO) AirComp systems, we characterize the optimal structure of receive beamforming using successive convex approximation (SCA) and Lagrange duality. By leveraging this optimal beamforming structure, we develop two efficient algorithms based on SCA and semi-definite relaxation (SDR). These algorithms enable fast wireless aggregation with low computational complexity and yield almost identical MSE performance compared to baseline algorithms. Simulation results validate the effectiveness of our proposed methods.
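
Assuming the common reduction of the AirComp MSE problem to maximizing the weakest effective channel gain over a unit-norm receive beamformer (our simplification for illustration, not necessarily the paper's exact formulation), the SDR step can be prototyped in cvxpy as follows, with synthetic channels and dimensions.

    # SDR sketch for AirComp receive beamforming (toy max-min reduction assumed).
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(6)
    N, K = 8, 4                                     # antennas, devices
    H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)

    # Lift m m^H to a PSD matrix M and drop the rank-1 constraint.
    M = cp.Variable((N, N), hermitian=True)
    t = cp.Variable()
    cons = [M >> 0, cp.real(cp.trace(M)) <= 1]
    cons += [cp.real(cp.trace(np.outer(h, h.conj()) @ M)) >= t for h in H]
    cp.Problem(cp.Maximize(t), cons).solve()

    # Gaussian randomization recovers a unit-norm beamforming vector.
    w, V = np.linalg.eigh(M.value)
    best_val = -np.inf
    for _ in range(100):
        g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
        z = V @ (np.sqrt(np.maximum(w, 0)) * g)     # sample z ~ CN(0, M)
        z /= np.linalg.norm(z)
        best_val = max(best_val, min(abs(h @ z.conj()) ** 2 for h in H))
    print("min effective gain:", best_val)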


[102] 2404.14063

LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search

Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as a novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can successfully generate diversified, novel audio samples under different mutation setups using different pre-trained RAVE models. The characteristics of the generation process can be easily controlled with the mutation parameters. The proposed algorithm can be a creative tool for sound artists and musicians.
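
Stripped of the audio models, latent vector novelty search is a short loop: mutate latents, score each candidate by its distance to an archive of past embeddings, and keep the most novel. The sketch below uses stand-in functions for the RAVE decoder and VGGish embedder, so only the search logic is real.

    # Latent-vector novelty search skeleton (decoder/embedder are stand-ins).
    import numpy as np

    rng = np.random.default_rng(7)
    decode = lambda z: z                      # stand-in for RAVE's decoder
    embed = lambda audio: audio[:32]          # stand-in for VGGish embedding

    def novelty(e, archive, k=3):
        """Mean distance to the k nearest archived embeddings."""
        d = np.sort(np.linalg.norm(archive - e, axis=1))
        return d[:k].mean()

    z = rng.standard_normal(128)              # initial latent vector
    archive = np.array([embed(decode(z))])
    for _ in range(50):
        candidates = z + rng.normal(0.0, 0.2, (16, 128))   # Gaussian mutation
        embs = np.array([embed(decode(c)) for c in candidates])
        scores = [novelty(e, archive) for e in embs]
        best = int(np.argmax(scores))
        z = candidates[best]                               # keep most novel
        archive = np.vstack([archive, embs[best]])
    print("archive size:", len(archive))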


[103] 2404.14092

Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems

Cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising technique for achieving high spectral efficiency (SE) using multiple distributed access points (APs). However, harsh propagation environments often lead to significant communication performance degradation due to high penetration loss. To overcome this issue, we introduce the reconfigurable intelligent surface (RIS) into the CF mMIMO system as a low-cost and power-efficient solution. In this paper, we focus on optimizing the joint precoding design of the RIS-aided CF mMIMO system to maximize the sum SE. This involves optimizing the precoding matrix at the APs and the reflection coefficients at the RIS. To tackle this problem, we propose a fully distributed multi-agent reinforcement learning (MARL) algorithm that incorporates fuzzy logic (FL). Unlike conventional approaches that rely on alternating optimization techniques, our FL-based MARL algorithm only requires local channel state information, which reduces the need for high backhaul capacity. Simulation results demonstrate that our proposed FL-MARL algorithm effectively reduces computational complexity while achieving similar performance as conventional MARL methods.


[104] 2404.14132

CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

In real-world scenarios, captured images often suffer from blur, noise, and other forms of degradation, and due to sensor limitations, usually only low-dynamic-range images can be obtained. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations, including denoising, deblurring, and high dynamic range imaging. However, performing a single type of enhancement alone still cannot yield satisfactory images. In this paper, we propose the Composite Refinement Network (CRNet) to address this challenge using multiple exposure images. By fully integrating information-rich multi-exposure inputs, CRNet performs unified image restoration and enhancement. To improve the quality of image details, CRNet explicitly separates and strengthens high- and low-frequency information through pooling layers, using specially designed Multi-Branch Blocks for effective fusion of these frequency bands. To increase the receptive field and fully integrate input features, CRNet employs a High-Frequency Enhancement Module, which includes large-kernel convolutions and an inverted-bottleneck ConvFFN. Our model secured third place in the first track of the Bracketing Image Restoration and Enhancement Challenge, surpassing previous SOTA models in both testing metrics and visual quality.
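
Pooling-based frequency separation is a simple trick: a pooled-and-upsampled copy of the feature map approximates the low-frequency band, and the residual holds the high frequencies. The PyTorch snippet below shows that decomposition in isolation; the kernel size is assumed and the fusion blocks are omitted, so this is not CRNet's exact module.

    # Pooling-based frequency separation sketch (kernel size assumed).
    import torch
    import torch.nn.functional as F

    def split_frequencies(x, kernel=4):
        """x: (B, C, H, W). Returns (low, high) with x == low + high."""
        low = F.avg_pool2d(x, kernel)                      # coarse structure
        low = F.interpolate(low, size=x.shape[-2:], mode="bilinear",
                            align_corners=False)
        high = x - low                                     # edges and texture
        return low, high

    x = torch.randn(1, 3, 64, 64)
    low, high = split_frequencies(x)
    print(torch.allclose(low + high, x))   # exact additive decomposition
    # A CRNet-style block would strengthen each band separately before fusion.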


[105] 2404.14143

Access-Point to Access-Point Connectivity for PON-based OWC Spine and Leaf Data Centre Architecture

In this paper, we propose incorporating Optical Wireless Communication (OWC) and Passive Optical Network (PON) technologies into next generation spine-and-leaf Data Centre Networks (DCNs). In this work, OWC systems are used to connect the Data Centre (DC) racks through Wavelength Division Multiplexing (WDM) Infrared (IR) transceivers. The transceivers are placed on top of the racks and at distributed Access Points (APs) in the ceiling. Each transceiver on a rack is connected to a leaf switch that connects the servers within the rack. We replace the spine switches by Optical Line Terminal (OLT) and Network Interface Cards (NIC) in the APs to achieve the desired connectivity. We benchmark the power consumption of the proposed OWC-PON-based spine-and-leaf DC against traditional spine-and-leaf DC and report 46% reduction in the power consumption when considering eight racks.