New articles on Electrical Engineering and Systems Science

[1] 2404.12405

Pneumonia Diagnosis through pixels -- A Deep Learning Model for detection and classification

Manual identification and classification of pneumonia and COVID-19 infection is a cumbersome process that, if delayed, can cause irreversible damage to the patient. We have compiled CT scan images from various sources, namely the China Consortium of Chest CT Image Investigation (CC-CCII), Negin Radiology in Sari, Iran, an open-access COVID-19 repository from Harvard Dataverse, and Sri Ramachandra University, Chennai, India. The images were preprocessed using various methods such as normalization, sharpening, median filter application, binarizing, and cropping to ensure uniformity while training the models. We present an ensemble classification approach using deep learning and machine learning methods to classify patients with the said diseases. Our ensemble model uses pre-trained networks such as ResNet-18 and ResNet-50 for classification and MobileNetV2 for feature extraction. The features from MobileNetV2 are used by the gradient-boosting classifier for the classification of patients. Using ResNet-18, ResNet-50, and the MobileNetV2-aided gradient-boosting classifier, we propose an ensemble model with an accuracy of 98 percent on unseen data.
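
The abstract does not say how the three classifiers are combined. As a minimal sketch, assuming simple plurality voting (one common ensembling rule, not necessarily the authors'), the combination step could look like this; the label names and sample predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by plurality vote.

    `predictions` is a list of equally long label lists, one list per model.
    """
    combined = []
    for votes in zip(*predictions):
        # Pick the label that most models agree on for this sample.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical per-sample outputs of the three classifiers (assumed labels).
resnet18 = ["covid", "normal", "pneumonia"]
resnet50 = ["covid", "pneumonia", "pneumonia"]
gboost = ["normal", "normal", "pneumonia"]

ensemble = majority_vote([resnet18, resnet50, gboost])
print(ensemble)
```

With three voters and three classes a three-way tie is possible; a real pipeline would need an explicit tie-breaking rule, for example falling back on the most confident model.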

[2] 2404.12415

Soil Fertility Prediction Using Combined USB-microscope Based Soil Image, Auxiliary Variables, and Portable X-Ray Fluorescence Spectrometry

This study explored the application of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis to rapidly assess soil fertility, focusing on critical parameters such as available B, organic carbon (OC), available Mn, available S, and the sulfur availability index (SAI). Analyzing 1,133 soil samples from various agro-climatic zones in Eastern India, the research combined color and texture features from microscopic soil images, PXRF data, and auxiliary soil variables (AVs) using a Random Forest model. Results indicated that integrating image features (IFs) with AVs significantly enhanced prediction accuracy for available B (R^2 = 0.80) and OC (R^2 = 0.88). A data fusion approach, incorporating IFs, AVs, and PXRF data, further improved predictions for available Mn and SAI with R^2 values of 0.72 and 0.70, respectively. The study demonstrated how these integrated technologies have the potential to provide quick and affordable options for soil testing, opening up access to more sophisticated prediction models and a better comprehension of the fertility and health of the soil. Future research should focus on the application of deep learning models on a larger dataset of soil images, developed using soils from a broader range of agro-climatic zones under field conditions.
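
The reported R^2 values are presumably the usual coefficient of determination. A minimal sketch of that metric follows, assuming the standard definition 1 - SS_res / SS_tot; the function name and sample values are illustrative only and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical values (assumed units), for illustration only.
observed = [1.0, 2.0, 3.0]
perfect = r_squared(observed, [1.0, 2.0, 3.0])
mean_only = r_squared(observed, [2.0, 2.0, 2.0])
```

A model that always predicts the mean of the observations scores 0, and a perfect model scores 1, which gives a scale for reading values such as 0.70 to 0.88.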

[3] 2404.12533

Plane-wave compounding with adaptive joint coherence factor weighting

Coherent Plane Wave Compounding (CPWC) is widely used for ultrasound imaging. This technique involves sending plane waves into a sample at different transmit angles and recording the resultant backscattered echo at different receive positions. The time-delayed signals from the different combinations of transmit angles and receive positions are then coherently summed to produce a beamformed image. Various techniques have been developed to characterize the quality of CPWC beamforming based on the measured coherence across the transmit or receive apertures. Here, we propose a more fine-grained approach where the signals from every transmit/receive combination are separately evaluated using a quality metric based on their joint spatio-angular coherence. The signals are then individually weighted according to their measured Joint Coherence Factor (JCF) prior to being coherently summed. To facilitate the comparison of JCF beamforming with alternative techniques, we further propose a method of image display standardization based on contrast matching. We show results from tissue-mimicking phantoms and human soft-tissue imaging. Fine-grained JCF weighting is found to improve CPWC image quality compared to alternative approaches.
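
For context, the sketch below shows the conventional single-aperture coherence factor that approaches of this kind build on. It is not the authors' JCF, which scores each transmit/receive combination individually. The function names are hypothetical, and the input is assumed to be the already time-delayed complex samples for one pixel.

```python
def coherence_factor(signals):
    """Conventional coherence factor |sum s|^2 / (N * sum |s|^2), in [0, 1]."""
    n = len(signals)
    incoherent = sum(abs(s) ** 2 for s in signals)
    if incoherent == 0:
        return 0.0  # no energy at this pixel, so treat it as incoherent
    coherent = abs(sum(signals)) ** 2
    return coherent / (n * incoherent)


def cf_weighted_pixel(signals):
    """Delay-and-sum pixel value scaled by its coherence factor."""
    return coherence_factor(signals) * sum(signals)


in_phase = coherence_factor([1 + 0j] * 4)      # perfectly aligned echoes
cancelling = coherence_factor([1, -1, 1, -1])  # echoes that sum to zero
```

The factor is 1 when all delayed signals agree in phase and falls towards 0 as they decorrelate, so weighting by it suppresses pixels dominated by clutter or noise.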

[4] 2404.12554

Learning Stable and Passive Neural Differential Equations

In this paper, we introduce a novel class of neural differential equations, which are intrinsically Lyapunov stable, exponentially stable or passive. We take a recently proposed Polyak-Lojasiewicz network (PLNet) as a Lyapunov function and then parameterize the vector field as the descent directions of the Lyapunov function. The resulting models have the same structure as the general Hamiltonian dynamics, where the Hamiltonian is lower- and upper-bounded by quadratic functions. Moreover, it is also positive definite w.r.t. either a known or learnable equilibrium. We illustrate the effectiveness of the proposed model on a damped double pendulum system.
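
A standard argument shows why a descent-direction parameterization yields stability by construction. In the sketch below the matrices J and R are illustrative assumptions (a skew-symmetric and a positive semidefinite matrix, as in port-Hamiltonian models) and may differ from the paper's exact parameterization; V denotes the learned Lyapunov function.

```latex
\begin{align}
\dot{x} &= \bigl(J(x) - R(x)\bigr)\,\nabla V(x),
  \qquad J(x)^\top = -J(x), \quad R(x) = R(x)^\top \succeq 0, \\
\frac{d}{dt} V\bigl(x(t)\bigr)
  &= \nabla V(x)^\top \bigl(J(x) - R(x)\bigr)\,\nabla V(x)
   = -\nabla V(x)^\top R(x)\,\nabla V(x) \le 0 .
\end{align}
```

The skew-symmetric term drops out because z^T J z = 0 for any vector z, so V cannot increase along trajectories regardless of the network weights.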

[5] 2404.12595

Deep Reinforcement Learning-aided Transmission Design for Energy-efficient Link Optimization in Vehicular Communications

This letter presents a deep reinforcement learning (DRL) approach for transmission design to optimize the energy efficiency in vehicle-to-vehicle (V2V) communication links. Considering the dynamic environment of vehicular communications, the optimization problem is non-convex and mathematically difficult to solve. Hence, we propose scenario identification-based double and dueling deep Q-Network (SI-D3QN), a DRL algorithm integrating both double deep Q-Network and dueling deep Q-Network, for the joint design of modulation and coding scheme (MCS) selection and power control. To be more specific, we employ the SI technique to enhance link performance and assist the D3QN agent in refining its decision-making processes. The experimental results demonstrate that, across various optimization tasks, our proposed SI-D3QN agent outperforms the benchmark algorithms in terms of the valid actions and link performance metrics. In particular, while ensuring significant improvement in energy efficiency, the agent facilitates a 29.6% enhancement in the link throughput under the same energy consumption.
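
The two ingredients named in D3QN can be sketched in a few lines. This is a generic illustration of the textbook dueling aggregation and double-DQN target, not the authors' SI-D3QN code; the function names and numeric values are hypothetical, and the scenario-identification step is omitted.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN: the online net selects the action, the target net evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Hypothetical numbers: one state value, three actions (e.g. MCS/power pairs).
q_values = dueling_q(1.0, [1.0, 2.0, 3.0])
target = double_dqn_target(1.0, 0.5, [0.1, 0.9], [4.0, 2.0], done=False)
```

Decoupling action selection from evaluation is what reduces the overestimation bias of plain Q-learning: here the target uses 2.0 (the target network's value of the online network's choice) rather than the maximum 4.0.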

[6] 2404.12629

Spreading Code Optimization for Low-Earth Orbit Satellites via Mixed-Integer Convex Programming

Optimizing the correlation properties of spreading codes is critical for minimizing inter-channel interference in satellite navigation systems. By improving the codes' correlation sidelobes, we can enhance navigation performance while minimizing the required spreading code lengths. In the case of low earth orbit (LEO) satellite navigation, shorter code lengths (on the order of a hundred) are preferred due to their ability to achieve fast signal acquisition. Additionally, the relatively high signal-to-noise ratio (SNR) in LEO systems reduces the need for longer spreading codes to mitigate inter-channel interference. In this work, we propose a two-stage block coordinate descent (BCD) method which optimizes the codes' correlation properties while enforcing the autocorrelation sidelobe zero (ACZ) property. In each iteration of the BCD method, we solve a mixed-integer convex program (MICP) over a block of 25 binary variables. Our method is applicable to spreading code families of arbitrary sizes and lengths, and we demonstrate its effectiveness for a problem with 66 length-127 codes and a problem with 130 length-257 codes.
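
To make "correlation sidelobes" concrete, the sketch below computes the periodic autocorrelation of a single ±1 code. It only evaluates a given code and does not implement the BCD/MICP optimization. The function names are hypothetical, and the example is a length-7 maximal-length sequence rather than one of the paper's codes.

```python
def periodic_autocorrelation(code):
    """Periodic autocorrelation of a +/-1 sequence at every cyclic shift."""
    n = len(code)
    return [
        sum(code[i] * code[(i + shift) % n] for i in range(n))
        for shift in range(n)
    ]


def max_sidelobe(code):
    """Largest absolute autocorrelation value at a non-zero shift."""
    return max(abs(v) for v in periodic_autocorrelation(code)[1:])


# Length-7 m-sequence mapped to +/-1 (illustrative, much shorter than 127).
m_seq = [1, 1, 1, -1, 1, -1, -1]
acf = periodic_autocorrelation(m_seq)
peak_sidelobe = max_sidelobe(m_seq)
```

The zero-shift value equals the code length, and the optimization in the abstract aims to keep the remaining values, together with the cross-correlations between different codes, as small as possible.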

[7] 2404.12650

F2FLDM: Latent Diffusion Models with Histopathology Pre-Trained Embeddings for Unpaired Frozen Section to FFPE Translation

The Frozen Section (FS) technique is a rapid and efficient method, taking only 15-30 minutes to prepare slides for pathologists' evaluation during surgery, enabling immediate decisions on further surgical interventions. However, the FS process often introduces artifacts and distortions like folds and ice-crystal effects. In contrast, these artifacts and distortions are absent in the higher-quality formalin-fixed paraffin-embedded (FFPE) slides, which require 2-3 days to prepare. While Generative Adversarial Network (GAN)-based methods have been used to translate FS to FFPE images (F2F), they may leave morphological inaccuracies with remaining FS artifacts or introduce new artifacts, reducing the quality of these translations for clinical assessments. In this study, we benchmark recent generative models, focusing on GANs and Latent Diffusion Models (LDMs), to overcome these limitations. We introduce a novel approach that combines LDMs with Histopathology Pre-Trained Embeddings to enhance restoration of FS images. Our framework leverages LDMs conditioned by both text and pre-trained embeddings to learn meaningful features of FS and FFPE histopathology images. Through diffusion and denoising techniques, our approach not only preserves essential diagnostic attributes like color staining and tissue morphology but also proposes an embedding translation mechanism to better predict the targeted FFPE representation of input FS images. As a result, this work achieves a significant improvement in classification performance, with the Area Under the Curve rising from 81.99% to 94.64%, accompanied by an advantageous CaseFD. This work establishes a new benchmark for FS to FFPE image translation quality, promising enhanced reliability and accuracy in histopathology FS image analysis. Our work is available at

[8] 2404.12651

Emerging NGSO Constellations: Spectral Coexistence with GSO Satellite Communication Systems

Global communications have undergone a paradigm shift with the rapid expansion of low-earth orbit (LEO) satellite constellations, offering a new space era of reduced latency and ubiquitous, high-speed broadband internet access. However, the fast developments in LEO orbits pose significant challenges, particularly the coexistence with geostationary earth orbit (GEO) satellite systems. This article presents an overview of the regulatory aspects that cover the spectrum sharing in the bands allocated to the Fixed Satellite Service between geostationary networks (GSO) and non-geostationary systems (NGSO), as well as the main interference mitigation techniques for their coexistence. Our work highlights the increased potential for inter-system interference. It explores the regulatory landscape following the World Radio Conference (WRC-23). We discuss the different interference management strategies proposed for the GSO-NGSO spectral coexistence, including on-board and ground-based approaches and more advanced mitigation techniques based on beamforming. Moving onto operational aspects related to the sharing of spectrum, we introduce recent work on interference detection, identification, and mitigation and provide our vision of the emerging role of artificial intelligence (AI) in the aforementioned tasks.

[9] 2404.12695

Electrification of Clay Calcination: A First Look into Dynamic Modeling and Energy Management for Integration with Sustainable Power Grids

This article explores the electrification of clay calcination, proposing a dynamic model and energy management strategy for the integration of electrified calcination plants into sustainable power grids. A theoretical dynamic model of the electrified calcination process is introduced, aiming at outlining temperature profiles and energy usage - thus exploring the feasibility of electrification. The model serves as a tool for optimizing parameters, estimating system behavior, and enabling model-based process control. An innovative energy management model is also presented, ensuring efficient assimilation of electrified calcination plants into the power grid. It encapsulates demand-supply balancing and optimizes renewable energy usage. In essence, we provide an insightful pathway to more sustainable cement production, underlining the value of renewable energy sources and effective energy management in the context of clay calcination.

[10] 2404.12705

Integrated Sensing and Communication enabled Multiple Base Stations Cooperative UAV Detection

Integrated sensing and communication (ISAC) exhibits notable potential for sensing unmanned aerial vehicles (UAVs), facilitating real-time monitoring of UAVs for security assurance. Due to the low sensing accuracy of a single base station (BS), a cooperative multi-BS UAV sensing method is proposed in this paper to achieve high-accuracy sensing. Specifically, a multiple signal classification (MUSIC)-based symbol-level fusion method is proposed for UAV localization and velocity estimation, consisting of a single-BS preprocessing step and a lattice point search step. The preprocessing procedure enhances the single-BS accuracy by superposing multiple spectral functions, thereby establishing a reference value for the subsequent lattice point search. Furthermore, the lattice point with minimal error compared to the preprocessing results is determined as the fusion result. Extensive simulation results reveal that the proposed symbol-level fusion method outperforms the benchmarking methods in localization and velocity estimation.

[11] 2404.12752

User-Centric Cell-Free (UCCF) Wireless Systems: Principles and Optimization

User-centric cell-free (UCCF) wireless networks have a range of distinguishing characteristics, which can be exploited to meet challenges that conventional cellular systems struggle to address. This chapter is devoted to delivering the fundamentals of wireless communications in UCCF systems, including channel modeling and estimation, uplink (UL) detection, downlink (DL) transmission, and resource optimization. Specifically, the advantages of cell-free networking are examined in contrast to conventional cellular systems. Global and location-aware distributed UL detection are explored based on the principles of minimum mean-square error (MMSE) and belief propagation. Correspondingly, global and distributed DL transmission schemes are designed based on MMSE precoding. The optimization of both UL and DL is analyzed with respect to system design and resource allocation. Furthermore, some challenges for the implementation of UCCF systems in practice are identified and analyzed.

[12] 2404.12769

Towards Accurate and Efficient Sorting of Retired Lithium-ion Batteries: A Data Driven Based Electrode Aging Assessment Approach

Retired batteries (RBs) for second-life applications offer promising economic and environmental benefits. However, accurate and efficient sorting of RBs with discrepant characteristics persists as a pressing challenge. In this study, we introduce a data-driven electrode aging assessment approach to address this concern. To this end, 15 feature points are extracted from battery open circuit voltage (OCV) curves to capture their characteristics at different levels of aging, and a convolutional neural network with an optimized structure and minimized input size is established to relocate the relative positions of these OCV feature points. Next, a rapid estimation algorithm is proposed to identify the three electrode aging parameters (EAPs) which best reconstruct the 15 OCV feature points over the entire usable capacity range. Utilizing the three EAPs as sorting indices, we employ an adaptive affinity propagation algorithm to cluster RBs without the need for pre-determining the clustering number. Unlike conventional sorting methods based solely on battery capacity, the proposed method provides profound insights into electrode aging behaviors, minimizes the need for constant-current charging data, and supports module/pack-level tests for the simultaneous processing of high volumes of RBs.

[13] 2404.12774

Recent Advancements in Battery State of Power Estimation Technology: A Comprehensive Overview and Error Source Analysis

Accurate state of power (SOP) estimation is of great importance for lithium-ion batteries in safety-critical and power-intensive applications for electric vehicles. This review article delves deeply into the entire development flow of current SOP estimation technology, offering a systematic breakdown of all key aspects with their recent advancements. First, we review the design of battery safe operation area, summarizing diverse limitation factors and furnishing a profound comprehension of battery safety across a broad operational scale. Second, we illustrate the unique discharge and charge characteristics of various peak operation modes, such as constant current, constant voltage, constant current-constant voltage, and constant power, and explore their impacts on battery peak power performance. Third, we extensively survey the aspects of battery modelling and algorithm development in current SOP estimation technology, highlighting their technical contributions and specific considerations. Fourth, we present an in-depth dissection of all error sources to unveil their propagation pathways, providing insightful analysis into how each type of error impacts the SOP estimation performance. Finally, the technical challenges and complexities inherent in this field of research are addressed, suggesting potential directions for future development. Our goal is to inspire further efforts towards developing more accurate and intelligent SOP estimation technology for next-generation battery management systems.

[14] 2404.12780

Piecewise Semi-Analytical Formulation for the Analysis of Coupled-Oscillator Systems

A new simulation technique to obtain the synchronized steady-state solutions existing in coupled oscillator systems is presented. The technique departs from a semi-analytical formulation presented in previous works. It extends the model of the admittance function describing each individual oscillator to a piecewise linear one. This provides a global formulation of the coupled system, considering the whole characteristic of each voltage-controlled oscillator (VCO) in the array. In comparison with the previous local formulation, the new formulation significantly improves the accuracy in the prediction of the system synchronization ranges. The technique has been tested by comparison with computationally demanding circuit-level Harmonic Balance simulations in an array of Van der Pol-type oscillators and then applied to a coupled system of FET based oscillators at 5 GHz, with very good agreement with measurements.

[15] 2404.12807

Leveraging P90 Requirement: Flexible Resources Bidding in Nordic Ancillary Service Markets

The P90 requirement of the Danish transmission system operator, Energinet, incentivizes flexible resources with a stochastic power consumption/production baseline to bid in Nordic ancillary service markets with a minimum reliability of 90%, i.e., letting them cause reserve shortfall with a probability of up to 10%. Leveraging this requirement, we develop a distributionally robust joint chance-constrained optimization model for aggregators of flexible resources to optimize their volume of reserve capacity to be offered. Having an aggregator of electric vehicles as a case study, we show how distributional robustness is key for the aggregator when making bidding decisions in a non-stationary uncertain environment. We also develop a heuristic based on a grid search for the system operator to adjust the P90 requirement and the level of conservativeness, aiming to procure the maximum reserve capacity from stochastic resources with the least expected shortfall.
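
The 90% reliability idea can be illustrated with a plain empirical quantile: bid the largest capacity that at least 90% of historical samples would have covered. This is a deliberately simplified single-period stand-in, not the distributionally robust joint chance-constrained model of the paper; the function name and sample data are hypothetical.

```python
import math


def p90_bid(available_capacity_samples, reliability=0.9):
    """Largest bid b such that at least `reliability` of the samples are >= b."""
    ordered = sorted(available_capacity_samples, reverse=True)
    # Number of samples that must cover the bid; the small tolerance guards
    # against floating-point round-up (e.g. 0.9 * n landing just above an integer).
    k = math.ceil(reliability * len(ordered) - 1e-9)
    return ordered[k - 1]


# Hypothetical historical available flexibility in kW, one value per period.
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
bid = p90_bid(samples)
```

On these ten samples the bid is 2 kW, which nine of the ten samples cover. An empirical quantile like this trusts the historical distribution completely, which is the weakness that distributional robustness is meant to address in a non-stationary setting.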

[16] 2404.12818

Aggregator of Electric Vehicles Bidding in Nordic FCR-D Markets: A Chance-Constrained Program

Recently, two new regulations in the Nordic ancillary service markets, the P90 rule and LER classification, were introduced to make the market more attractive for flexible stochastic resources. The regulations respectively relax market requirements related to the security and volume of flexible capacity from such resources. However, this incentivizes aggregators to exploit the rules when bidding flexible capacity. Focusing on the Nordic ancillary service Frequency Containment Reserve - Disturbance (FCR-D), we consider an aggregator with a portfolio of Electric Vehicles (EVs) using real-life data and present an optimization model that, new to the literature, uses Joint Chance-Constraints (JCCs) for bidding its flexible capacity while adhering to the new market regulations. Using different bundle sizes within the portfolio and two approximation methods for the JCCs, ALSO-X and Conditional Value at Risk (CVaR), we show that a significant synergy effect emerges when aggregating a portfolio of EVs, especially when applying ALSO-X, which exploits the rules more than CVaR. We show that EV owners can earn a significant profit when participating in the aggregator portfolio.

[17] 2404.12825

360° phase detector cell for measurement systems based on switched dual multipliers

This letter presents a 360° phase detector cell for performing phase-shift measurements on multiple output systems. An analog phase detector, capable of detecting a maximum range of ±90°, has been used to perform a double multiplication of two signals, both in-phase and phase-shifted. The proposed solution broadens the frequency range beyond other solutions that must fulfill the quadrature condition. Subsequently, the possibility of reaching the theoretical limit of phase shift within a hybrid coupler (Φ < 90° ± 90°) is discussed by using four straight-line equations to characterize the phase detector response. The proposed solution extends the phase detection range up to 360° and provides increased immunity to both impedance mismatching and phase deviations within the hybrid coupler. To demonstrate the feasibility of the proposed design, a phase detector cell prototype has been implemented using a commercial hybrid coupler with a phase shift of 92.5° ± 0.5° at 3.1-5.9 GHz, an external switch and a microcontroller with 2 kB of memory. Measurements show a range of detection of 360° (±180°) across the tested frequency band of 2.7-6 GHz.

[18] 2404.12830

Optimal Training Design for Over-the-Air Polynomial Power Amplifier Model Estimation

The current evolution towards a massive number of antennas and a large variety of transceiver architectures forces a revisit of the conventional techniques used to improve the fundamental power amplifier (PA) linearity-efficiency trade-off. Most digital linearization techniques rely on PA measurements using a dedicated feedback receiver. However, in modern systems with a large number of RF chains and high carrier frequencies, a dedicated receiver per RF chain is costly and complex to implement. This issue can be addressed by measuring PAs over the air, but in that case, the extra signalling shares resources with the actual data transmission. In this paper, we look at the problem from an estimation theory point of view so as to minimize pilot overhead while optimizing estimation performance. We show that conventional results from the mathematical statistics community can be used. We find the least squares (LS) optimal training design, minimizing the maximal mean squared error (MSE) of the reconstructed PA response over its whole input range. Compared to uniform training, simulations demonstrate a factor-of-10 reduction of the maximal MSE for an L = 7 PA polynomial order. Using prior information, the LMMSE estimator can achieve an additional gain of a factor of up to 300 at low signal-to-noise ratio (SNR).

[19] 2404.12863

Grid-aware Scheduling and Control of Electric Vehicle Charging Stations for Dispatching Active Distribution Networks. Part-I: Day-ahead and Numerical Validation

This paper proposes a grid-aware scheduling and control framework for Electric Vehicle Charging Stations (EVCSs) for dispatching the operation of an active power distribution network. The framework consists of two stages. In the first stage, we determine an optimal day-ahead power schedule at the grid connection point (GCP), referred to as the dispatch plan. Then, in the second stage, a real-time model predictive control is proposed to track the day-ahead dispatch plan using flexibility from EVCSs. The dispatch plan accounts for the uncertainties of vehicles connected to the EVCS, along with other uncontrollable power injections, through day-ahead predicted scenarios. We propose using a Gaussian Mixture Model (GMM) to forecast EVCS demand from a historical dataset of arrival and departure times, EV battery capacity, State-of-Charge (SoC) targets, etc. The framework ensures that the grid is operated within its voltage and branch power-flow operational bounds, modeled by a linearized optimal power-flow model, maintaining the tractability of the problem formulation. The scheme is numerically and experimentally validated on a real-life distribution network at the EPFL connected to two EVCSs, two batteries, three photovoltaic plants, and multiple heterogeneous loads. The day-ahead and real-time stages are described in the Part-I and Part-II papers, respectively.

[20] 2404.12870

Grid-aware Scheduling and Control of Electric Vehicle Charging Stations for Dispatching Active Distribution Networks. Part-II: Intra-day and Experimental Validation

In Part-I, we presented an optimal day-ahead scheduling scheme for dispatching active distribution networks (ADNs) accounting for the flexibility provided by electric vehicle charging stations (EVCSs) and other controllable resources such as battery energy storage systems (BESSs). Part-II presents the intra-day control layer for tracking the dispatch plan computed from the day-ahead scheduling stage. The control problem is formulated as model predictive control (MPC) with an objective to track the dispatch plan setpoint every 5 minutes, while actuated every 30 seconds. MPC accounts for the uncertainty of the power injections from stochastic resources (such as demand and generation from photovoltaic (PV) plants) by short-term forecasts. MPC also accounts for the grid's operational constraints (i.e., the limits on the nodal voltages and the line power-flows) by a linearized optimal power flow (LOPF) model based on the power-flow sensitivity coefficients, and for the operational constraints of the controllable resources (i.e., BESSs and EVCSs). The proposed framework is experimentally validated on a real-life ADN at the EPFL's Distributed Electrical Systems Laboratory, which is composed of a medium voltage (MV) bus connected to three low voltage distribution networks. It hosts two controllable EVCSs (172 kWp and 32 kWp), multiple PV plants (aggregated generation of 42 kWp), uncontrollable demand from office buildings (20 kWp), and two controllable BESSs (150 kW/300 kWh and 25 kW/25 kWh).

[21] 2404.12874

Physical Layer Authentication Using Information Reconciliation

User authentication in future wireless communication networks is expected to become more complicated due to their large scale and heterogeneity. Furthermore, the computational complexity of classical cryptographic approaches based on public key distribution can be a limiting factor for use in simple, low-end Internet of Things (IoT) devices. This paper proposes physical layer authentication (PLA), which is expected to complement existing traditional approaches, e.g., in multi-factor authentication protocols. The precision and consistency of PLA are affected by random variations of wireless channel realizations between different time slots, which can impair authentication performance. In order to address this, a method based on error-correcting codes in the form of reconciliation is considered in this work. In particular, we adopt distributed source coding (Slepian-Wolf) reconciliation using polar codes to reconcile channel measurements spread in time. Hypothesis testing is then applied to the reconciled vectors to accept or reject the device as authenticated. Simulation results show that the proposed PLA using reconciliation outperforms prior schemes even in low signal-to-noise ratio scenarios.

[22] 2404.12958

Improving Pediatric Pneumonia Diagnosis with Adult Chest X-ray Images Utilizing Contrastive Learning and Embedding Similarity

Despite the advancement of deep learning-based computer-aided diagnosis (CAD) methods for pneumonia from adult chest x-ray (CXR) images, the performance of CAD methods applied to pediatric images remains suboptimal, mainly due to the lack of large-scale annotated pediatric imaging datasets. Establishing a proper framework to leverage existing adult large-scale CXR datasets can thus enhance pediatric pneumonia detection performance. In this paper, we propose a three-branch parallel path learning-based framework that utilizes both adult and pediatric datasets to improve the performance of deep learning models on pediatric test datasets. The paths are trained with pediatric only, adult only, and both types of CXRs, respectively. Our proposed framework utilizes the multi-positive contrastive loss to cluster the classwise embeddings and the embedding similarity loss among these three parallel paths to make the classwise embeddings as close as possible to reduce the effect of domain shift. Experimental evaluations on open-access adult and pediatric CXR datasets show that the proposed method achieves a superior AUROC score of 0.8464 compared to 0.8348 obtained using the conventional approach of joint training on both datasets. The proposed approach thus paves the way for generalized CAD models that are effective for both adult and pediatric age groups.

[23] 2404.12973

Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

The recent advancement of spatial transcriptomics (ST) allows the characterization of spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. However, current super-resolution methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, it remains a challenge to integrate histology images and gene expression for super-resolved ST maps. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps with the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to utilize complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationship of multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.

[24] 2404.12986

Nuclei Instance Segmentation of Cryosectioned H&E Stained Histological Images using Triple U-Net Architecture

Nuclei instance segmentation is crucial in oncological diagnosis and cancer pathology research. H&E stained images are commonly used for medical diagnosis, but pre-processing is necessary before using them for image processing tasks. Two principal pre-processing methods are formalin-fixed paraffin-embedded samples (FFPE) and frozen tissue samples (FS). FFPE is widely used but time-consuming, whereas FS samples can be processed quickly. Analyzing H&E stained images derived from fast sample preparation, staining, and scanning can pose difficulties because the swift process can degrade image quality. This paper proposes a method that leverages the unique optical characteristics of H&E stained images. A three-branch U-Net architecture has been implemented, where each branch contributes to the final segmentation results. The process includes applying the watershed algorithm to separate overlapping regions and enhance accuracy. The Triple U-Net architecture comprises an RGB branch, a Hematoxylin branch, and a Segmentation branch. This study focuses on a novel dataset named CryoNuSeg. The results obtained through robust experiments outperform the state-of-the-art results across various metrics. The benchmark score for this dataset is AJI 52.5 and PQ 47.7, achieved through the implementation of the U-Net architecture. However, the proposed Triple U-Net architecture achieves an AJI score of 67.41 and PQ of 50.56. The proposed architecture improves more on AJI than on other evaluation metrics, which further justifies the superiority of the Triple U-Net architecture over the baseline U-Net model, as AJI is a stricter evaluation metric. The use of the three-branch U-Net model, followed by watershed post-processing, significantly surpasses the benchmark scores, showing substantial improvement in the AJI score.

[25] 2404.13000

RadRotator: 3D Rotation of Radiographs with Diffusion Models

Transforming two-dimensional (2D) images into three-dimensional (3D) volumes is a well-known yet challenging problem for the computer vision community. In the medical domain, a few previous studies attempted to convert two or more input radiographs into computed tomography (CT) volumes. Following their effort, we introduce a diffusion model-based technology that can rotate the anatomical content of any input radiograph in 3D space, potentially enabling the visualization of the entire anatomical content of the radiograph from any viewpoint in 3D. Similar to previous studies, we used CT volumes to create Digitally Reconstructed Radiographs (DRRs) as the training data for our model. However, we addressed two significant limitations encountered in previous studies: 1. We utilized conditional diffusion models with classifier-free guidance instead of Generative Adversarial Networks (GANs) to achieve higher mode coverage and improved output image quality, with the only trade-off being slower inference time, which is often less critical in medical applications; and 2. We demonstrated that the unreliable output of style transfer deep learning (DL) models, such as Cycle-GAN, to transfer the style of actual radiographs to DRRs could be replaced with a simple yet effective training transformation that randomly changes the pixel intensity histograms of the input and ground-truth imaging data during training. This transformation makes the diffusion model agnostic to any distribution variations of the input data pixel intensity, enabling the reliable training of a DL model on input DRRs and applying the exact same model to conventional radiographs (or DRRs) during inference.
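
The histogram-changing training transformation described above can be pictured with a small sketch. This is not the authors' implementation: it assumes a random gamma curve as the intensity remapping, assumes the same curve is applied to the input and the ground truth, and uses hypothetical names (`random_intensity_remap`, `gamma_range`) throughout.

```python
import random


def random_intensity_remap(image, target, rng, gamma_range=(0.5, 2.0)):
    """Apply one randomly drawn monotone intensity curve to both images.

    image, target: 2D lists of floats in [0, 1].
    rng:           a random.Random instance, so the draw is reproducible.
    gamma_range:   assumed range for the random gamma exponent.
    """
    gamma = rng.uniform(*gamma_range)

    def remap(img):
        # A power curve keeps 0 and 1 fixed and reshapes the histogram between them.
        return [[pixel ** gamma for pixel in row] for row in img]

    return remap(image), remap(target)


rng = random.Random(0)
drr = [[0.0, 0.25], [0.5, 1.0]]
ground_truth = [[1.0, 0.5], [0.25, 0.0]]
aug_drr, aug_gt = random_intensity_remap(drr, ground_truth, rng)
print(aug_drr, aug_gt)
```

Drawing a fresh curve for every training sample exposes the model to many intensity distributions, which is the mechanism the abstract credits for letting a model trained on DRRs run on conventional radiographs.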

[26] 2404.13018

A New Multi-Picture Architecture for Learned Video Deinterlacing and Demosaicing with Parallel Deformable Convolution and Self-Attention Blocks

Despite the fact that real-world video deinterlacing and demosaicing are well-suited to supervised learning from synthetically degraded data because the degradation models are known and fixed, learned video deinterlacing and demosaicing have received much less attention compared to denoising and super-resolution tasks. We propose a new multi-picture architecture for video deinterlacing or demosaicing by aligning multiple supporting pictures with missing data to a reference picture to be reconstructed, benefiting from both local and global spatio-temporal correlations in the feature space using modified deformable convolution blocks and a novel residual efficient top-$k$ self-attention (kSA) block, respectively. Separate reconstruction blocks are used to estimate different types of missing data. Our extensive experimental results, on synthetic or real-world datasets, demonstrate that the proposed novel architecture provides superior results that significantly exceed the state-of-the-art for both tasks in terms of PSNR, SSIM, and perceptual quality. Ablation studies are provided to justify and show the benefit of each novel modification made to the deformable convolution and residual efficient kSA blocks. Code is available:

[27] 2404.12441

Distributed Model Predictive Control for Heterogeneous Platoons with Affine Spacing Policies and Arbitrary Communication Topologies

This paper presents a distributed model predictive control (DMPC) algorithm for a heterogeneous platoon using arbitrary communication topologies, as long as each vehicle is able to communicate with a preceding vehicle in the platoon. The proposed DMPC algorithm is able to accommodate any spacing policy that is affine in a vehicle's velocity, which includes constant distance or constant time headway spacing policies. By analyzing the total cost for the entire platoon, a sufficient condition is derived to guarantee platoon asymptotic stability. Simulation experiments with a platoon of 50 vehicles and hardware experiments with a platoon of four 1/10th scale vehicles validate the algorithm and compare performance under different spacing policies and communication topologies.
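
A spacing policy that is affine in velocity can be written as a desired gap of the form offset plus headway times speed. The sketch below illustrates how the two special cases named in the abstract fall out of that form; the function and parameter names are illustrative assumptions and are not taken from the paper.

```python
def desired_gap(velocity, standstill_gap, time_headway):
    """Affine spacing policy: desired gap = standstill_gap + time_headway * velocity."""
    return standstill_gap + time_headway * velocity


def spacing_error(pos_pred, pos_self, vel_self, standstill_gap, time_headway):
    """Actual gap to the predecessor minus the desired gap (zero at the target spacing)."""
    return (pos_pred - pos_self) - desired_gap(vel_self, standstill_gap, time_headway)


# Constant distance policy: time_headway = 0, so the gap ignores speed.
constant_distance = desired_gap(velocity=20.0, standstill_gap=5.0, time_headway=0.0)
# Constant time headway policy: the gap grows linearly with speed.
constant_headway = desired_gap(velocity=20.0, standstill_gap=5.0, time_headway=0.5)
print(constant_distance, constant_headway)  # 5.0 15.0
```

Heterogeneity can be represented in this sketch by giving each vehicle its own `standstill_gap` and `time_headway`.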

[28] 2404.12474

Learning a Stable, Safe, Distributed Feedback Controller for a Heterogeneous Platoon of Vehicles

Platooning of autonomous vehicles has the potential to increase safety and fuel efficiency on highways. The goal of platooning is to have each vehicle drive at some speed (set by the leader) while maintaining a safe distance from its neighbors. Many prior works have analyzed various controllers for platooning, most commonly linear feedback and distributed model predictive controllers. In this work, we introduce an algorithm for learning a stable, safe, distributed controller for a heterogeneous platoon. Our algorithm relies on recent developments in learning neural network stability and safety certificates. We train a controller for autonomous platooning in simulation and evaluate its performance on hardware with a platoon of four F1Tenth vehicles. We then perform further analysis in simulation with a platoon of 100 vehicles. Experimental results demonstrate the practicality of the algorithm and the learned controller by comparing the performance of the neural network controller to linear feedback and distributed model predictive controllers.

[29] 2404.12498

A Configurable Pythonic Data Center Model for Sustainable Cooling and ML Integration

There have been growing discussions on estimating and subsequently reducing the operational carbon footprint of enterprise data centers. The design and intelligent control for data centers have an important impact on data center carbon footprint. In this paper, we showcase PyDCM, a Python library that enables extremely fast prototyping of data center design and applies reinforcement learning-enabled control with the purpose of evaluating key sustainability metrics including carbon footprint, energy consumption, and observing temperature hotspots. We demonstrate these capabilities of PyDCM and compare them to existing works in EnergyPlus for modeling data centers. PyDCM can also be used as a standalone Gymnasium environment for demonstrating sustainability-focused data center control.

[30] 2404.12501

SPIdepth: Strengthened Pose Information for Self-supervised Monocular Depth Estimation

Self-supervised monocular depth estimation has garnered considerable attention for its applications in autonomous driving and robotics. While recent methods have made strides in leveraging techniques like the Self Query Layer (SQL) to infer depth from motion, they often overlook the potential of strengthening pose information. In this paper, we introduce SPIdepth, a novel approach that prioritizes enhancing the pose network for improved depth estimation. Building upon the foundation laid by SQL, SPIdepth emphasizes the importance of pose information in capturing fine-grained scene structures. By enhancing the pose network's capabilities, SPIdepth achieves remarkable advancements in scene understanding and depth estimation. Experimental results on benchmark datasets such as KITTI and Cityscapes showcase SPIdepth's state-of-the-art performance, surpassing previous methods by significant margins. Notably, SPIdepth's performance exceeds that of unsupervised models and, after finetuning on metric data, outperforms all existing methods. Remarkably, SPIdepth achieves these results using only a single image for inference, surpassing even methods that utilize video sequences for inference, thus demonstrating its efficacy and efficiency in real-world applications. Our approach represents a significant leap forward in self-supervised monocular depth estimation, underscoring the importance of strengthening pose information for advancing scene understanding in real-world applications.

[31] 2404.12584

Multi-Objective Offloading Optimization in MEC and Vehicular-Fog Systems: A Distributed-TD3 Approach

The emergence of 5G networks has enabled the deployment of a two-tier edge and vehicular-fog network. It comprises Multi-access Edge Computing (MEC) and Vehicular-Fogs (VFs), strategically positioned closer to Internet of Things (IoT) devices, reducing propagation latency compared to cloud-based solutions and ensuring satisfactory quality of service (QoS). However, during high-traffic events like concerts or athletic contests, MEC sites may face congestion and become overloaded. Utilizing offloading techniques, we can transfer computationally intensive tasks from resource-constrained devices to those with sufficient capacity, to accelerate tasks and extend device battery life. In this research, we consider offloading within a two-tier MEC and VF architecture, involving offloading from MEC to MEC and from MEC to VF. The primary objective is to minimize the average system cost, considering both latency and energy consumption. To achieve this goal, we formulate a multi-objective optimization problem aimed at minimizing latency and energy while considering given resource constraints. To facilitate decision-making for nearly optimal computational offloading, we design an equivalent reinforcement learning environment that accurately represents the network architecture and the formulated problem. To accomplish this, we propose a Distributed-TD3 (DTD3) approach, which builds on the TD3 algorithm. Extensive simulations demonstrate that our strategy achieves faster convergence and higher efficiency compared to other benchmark solutions.
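
One common way to fold latency and energy into a single average system cost is a weighted sum. The sketch below assumes that scalarization; the weights, units, and function names are hypothetical, and the abstract does not state which cost form the paper uses.

```python
def task_cost(latency_s, energy_j, w_latency=0.5, w_energy=0.5):
    """Weighted sum of a task's latency (seconds) and energy (joules).

    The weights are illustrative assumptions, not values from the paper.
    """
    return w_latency * latency_s + w_energy * energy_j


def average_system_cost(tasks, w_latency=0.5, w_energy=0.5):
    """Mean weighted cost over a non-empty list of (latency_s, energy_j) pairs."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    total = sum(task_cost(lat, en, w_latency, w_energy) for lat, en in tasks)
    return total / len(tasks)


# Two tasks with mirrored latency and energy profiles.
print(average_system_cost([(2.0, 4.0), (4.0, 2.0)]))  # 3.0
```

In a reinforcement learning environment, the negative of such a cost is a natural candidate for the per-step reward, although the reward design in the paper may differ.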

[32] 2404.12598

Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty

This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either as the agent's risk attitude or as a distributionally robust approach against the model uncertainty. Owing to the martingale perspective in Jia and Zhou (2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, capturing the variability of the value-to-go along the trajectory. This characterization allows for the straightforward adaptation of existing RL algorithms developed for non-risk-sensitive scenarios to incorporate risk sensitivity by adding the realized variance of the value process. Additionally, I highlight that the conventional policy gradient representation is inadequate for risk-sensitive problems due to the nonlinear nature of quadratic variation; however, q-learning offers a solution and extends to infinite horizon settings. Finally, I prove the convergence of the proposed algorithm for Merton's investment problem and quantify the impact of the temperature parameter on the behavior of the learning procedure. I also conduct simulation experiments to demonstrate how risk-sensitive RL improves the finite-sample performance in the linear-quadratic control problem.
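
On a discretized trajectory, the realized quadratic variation of the value process is the sum of squared value increments. The sketch below computes that quantity and shows one way it could be added to a temporal-difference style increment. The names, the `penalty_weight` parameter, and its sign and scaling are assumptions for illustration; the paper fixes them through the risk-sensitivity coefficient, and the q-function and entropy terms are omitted here.

```python
def realized_quadratic_variation(values):
    """Sum of squared increments of a discretized value path."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


def penalized_td_increments(values, rewards, dt, penalty_weight):
    """Per-step increment dV + r * dt + penalty_weight * dV**2.

    values:         value estimates along one trajectory (length n + 1).
    rewards:        running rewards on each step (length n).
    penalty_weight: hypothetical weight on the squared value increment.
    """
    increments = []
    for k in range(len(values) - 1):
        dv = values[k + 1] - values[k]
        increments.append(dv + rewards[k] * dt + penalty_weight * dv ** 2)
    return increments


path = [0.0, 1.0, 3.0]
print(realized_quadratic_variation(path))  # 5.0
print(penalized_td_increments(path, [1.0, 1.0], dt=0.5, penalty_weight=-0.5))  # [1.0, 0.5]
```

Setting `penalty_weight` to zero recovers the usual non-risk-sensitive increments, which matches the abstract's point that existing algorithms can be adapted by adding one extra term.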

[33] 2404.12604

Transmitter Side Beyond-Diagonal RIS for mmWave Integrated Sensing and Communications

This work initiates the study of a beyond-diagonal reconfigurable intelligent surface (BD-RIS)-aided transmitter architecture for integrated sensing and communication (ISAC) in the millimeter-wave (mmWave) frequency band. Deploying BD-RIS at the transmitter side not only alleviates the need for extensive fully digital radio frequency (RF) chains but also enhances both communication and sensing performance. These benefits are facilitated by the additional design flexibility introduced by the fully-connected scattering matrix of BD-RIS. To achieve the aforementioned benefits, in this work, we propose an efficient two-stage algorithm to design the digital beamforming of the transmitter and the scattering matrix of the BD-RIS with the aim of jointly maximizing the sum rate for multiple communication users and minimizing the largest eigenvalue of the Cramer-Rao bound (CRB) matrix for multiple sensing targets. Numerical results show that the transmitter-side BD-RIS-aided mmWave ISAC outperforms the conventional diagonal-RIS-aided ones in both communication and sensing performance.

[34] 2404.12613

A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models

The purpose of this paper is twofold. First, we propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models (GMMs). The algorithm takes advantage of the Hankel structure inherent in the Fourier data obtained from independent and identically distributed (i.i.d.) samples of the mixture. For GMMs with a unified variance, a singular value ratio functional using the Fourier data is introduced and used to resolve the variance and component number simultaneously. The consistency of the estimator is derived. Compared to classic algorithms such as the method of moments and the maximum likelihood method, the proposed algorithm does not require prior knowledge of the number of Gaussian components or good initial guesses. Numerical experiments demonstrate its superior performance in estimation accuracy and computational cost. Second, we reveal that there exists a fundamental limit to the problem of estimating the number of Gaussian components or model order in the mixture model if the number of i.i.d. samples is finite. For the case of a single variance, we show that the model order can be successfully estimated only if the minimum separation distance between the component means exceeds a certain threshold value and can fail if below. We derive a lower bound for this threshold value, referred to as the computational resolution limit, in terms of the number of i.i.d. samples, the variance, and the number of Gaussian components. Numerical experiments confirm this phase transition phenomenon in estimating the model order. Moreover, we demonstrate that our algorithm achieves better scores in likelihood, AIC, and BIC when compared to the EM algorithm.
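
To make the "Fourier data" and "Hankel structure" concrete: the empirical characteristic function of the samples estimates E[exp(i w X)], which for a mixture with common variance s^2 equals exp(-s^2 w^2 / 2) times a sum of weighted complex exponentials exp(i mu_k w). Sampling it on an equispaced grid and arranging the values by index sum gives a Hankel matrix. The sketch below assumes a grid starting at zero and uses hypothetical names; the paper's sampling grid, variance handling, and singular value ratio functional are not reproduced.

```python
import cmath


def empirical_fourier(samples, omega):
    """Empirical characteristic function (1/n) * sum_j exp(i * omega * x_j)."""
    return sum(cmath.exp(1j * omega * x) for x in samples) / len(samples)


def hankel_from_fourier(samples, step, size):
    """Build a size x size Hankel matrix with H[i][j] = Y((i + j) * step).

    step and size are illustrative choices for the frequency grid.
    """
    data = [empirical_fourier(samples, k * step) for k in range(2 * size - 1)]
    return [[data[i + j] for j in range(size)] for i in range(size)]


samples = [0.3, -1.2, 2.0, 0.7]
H = hankel_from_fourier(samples, step=0.5, size=3)
print(H[0][0])                        # value at frequency 0, equal to 1
print(H[0][2] == H[1][1] == H[2][0])  # constant along anti-diagonals
```

For noiseless Fourier data with the Gaussian factor removed, the rank of such a matrix equals the number of distinct component means, which is why its singular values carry information about the model order.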

[35] 2404.12725

Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction

The integration of visual cues has revitalized the performance of the target speech extraction task, elevating it to the forefront of the field. Nevertheless, this multi-modal learning paradigm often encounters the challenge of modality imbalance. In audio-visual target speech extraction tasks, the audio modality tends to dominate, potentially overshadowing the importance of visual guidance. To tackle this issue, we propose AVSepChain, drawing inspiration from the speech chain concept. Our approach partitions the audio-visual target speech extraction task into two stages: speech perception and speech production. In the speech perception stage, audio serves as the dominant modality, while visual information acts as the conditional modality. Conversely, in the speech production stage, the roles are reversed. This transformation of modality status aims to alleviate the problem of modality imbalance. Additionally, we introduce a contrastive semantic matching loss to ensure that the semantic information conveyed by the generated speech aligns with the semantic information conveyed by lip movements during the speech production stage. Through extensive experiments conducted on multiple benchmark datasets for audio-visual target speech extraction, we showcase the superior performance achieved by our proposed method.

[36] 2404.12771

Phase-space analysis of a two-section InP laser as an all-optical spiking neuron: dependency on control and design parameters

Using a rate-equation model we numerically evaluate the carrier concentration and photon number in an integrated two-section semiconductor laser, and analyse its dynamics in three-dimensional phase space. The simulation comprises compact model descriptions extracted from a commercially-available generic InP technology platform, allowing us to model an applied reverse-bias voltage to the saturable absorber. We use the model to study the influence of the injected gain current, reverse-bias voltage, and cavity mirror reflectivity on the excitable operation state, which is the operation mode desired for the laser to act as an all-optical integrated neuron. We show in phase-space that our model is capable of demonstrating four different operation modes, i.e. cw, self-pulsating and an on-set and excitable mode under optical pulse injection. In addition, we show that lowering the reflectivity of one of the cavity mirrors greatly enhances the control parameter space for excitable operation, enabling more relaxed operation parameter control and lower power consumption of an integrated two-section laser neuron.

[37] 2404.12786

Unlocking the Potential of Local CSI in Cell-Free Networks with Channel Aging and Fronthaul Delays

It is generally believed that downlink cell-free networks perform best under centralized implementations where the local channel state information (CSI) acquired by the access points (APs) is forwarded to one or more central processing units (CPUs) for the computation of the joint precoders based on global CSI. However, mostly due to limited fronthaul capabilities, this procedure incurs some delay that may lead to partially outdated precoding decisions and hence performance degradation. In some scenarios, this may even lead to worse performance than distributed implementations where the precoders are locally computed by the APs based on partial yet timely local CSI. To address this issue, this study considers the problem of robust precoding design merging the benefits of timely local CSI and delayed global CSI. As the main result, we provide a novel distributed precoding design based on the recently proposed team minimum mean-square error method. As a byproduct, we also obtain novel insights related to the AP-CPU functional split problem. Our main conclusion, corroborated by simulations, is that the opportunity of performing some local precoding computations at the APs should not be neglected, even in centralized implementations.

[38] 2404.12794

MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in point clouds of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model, termed MambaMOS. Firstly, we develop a novel embedding module, the Time Clue Bootstrapping Embedding (TCBE), to enhance the coupling of temporal and spatial information in point clouds and alleviate the issue of overlooked temporal clues. Secondly, we introduce the Motion-aware State Space Model (MSSM) to endow the model with the capacity to understand the temporal correlations of the same object across different time steps. Specifically, MSSM emphasizes the motion states of the same object at different time steps through two distinct temporal modeling and correlation steps. We utilize an improved state space model to represent these motion differences and thereby model the motion states effectively. Finally, extensive experiments on the SemanticKITTI-MOS and KITTI-Road benchmarks demonstrate that the proposed MambaMOS achieves state-of-the-art performance. The source code of this work will be made publicly available at

[39] 2404.12804

Linearly-evolved Transformer for Pan-sharpening

The vision transformer family has dominated the satellite pan-sharpening field, driven by the global spatial information modeling of its core self-attention mechanism. The standard modeling rule within these promising pan-sharpening methods is to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may come at the huge cost of model parameters and FLOPs, thus preventing their application on low-resource satellites. To address this challenge between favorable performance and expensive computation, we tailor an efficient linearly-evolved transformer variant and employ it to construct a lightweight pan-sharpening framework. In detail, we delve into the popular cascaded transformer modeling used by cutting-edge methods and develop an alternative 1-order linearly-evolved transformer variant with a 1-dimensional linear convolution chain to achieve the same function. In this way, our proposed method is capable of benefiting from the cascaded modeling rule while achieving favorable performance in an efficient manner. Extensive experiments over multiple satellite datasets suggest that our proposed method achieves competitive performance against other state-of-the-art methods with fewer computational resources. Further, the consistently favorable performance has been verified on the hyper-spectral image fusion task. Our main focus is to provide an alternative global modeling framework with an efficient structure. The code will be publicly available.

[40] 2404.12813

Open Datasets for AI-Enabled Radio Resource Control in Non-Terrestrial Networks

By effectively implementing resource allocation strategies, the capabilities and reliability of non-terrestrial networks (NTN) can be enhanced. This leads to enhanced spectrum utilization performance while minimizing the unmet system capacity, meeting quality of service (QoS) requirements, and optimizing the overall system. In turn, a wide range of applications and services in various domains can be supported. However, allocating resources in a multi-constellation system with heterogeneous satellite links and highly dynamic user traffic demand poses challenges in ensuring sufficient and fair resource distribution. To mitigate these complexities and minimize the overhead, there is a growing shift towards utilizing artificial intelligence (AI) for its ability to handle such problems effectively. This calls for the development of an intelligent decision-making controller using AI to efficiently manage resources in this complex environment. In this context, real-world open datasets play a pivotal role in the development of AI models addressing radio control optimization problems. In practice, acquiring suitable datasets can be arduous. Therefore, this paper identifies pertinent real-world open datasets representing realistic traffic patterns, network performance, and demand for fixed and dynamic user terminals, enabling a variety of use cases. The aim of gathering and publishing information on these datasets is to inspire and assist the research community in crafting advanced resource management solutions. In a nutshell, this paper establishes a solid foundation of commercially accessible data, with the potential to set benchmarks and accelerate the resolution of resource allocation optimization challenges.

[41] 2404.12841

Explainable Deepfake Video Detection using Convolutional Neural Network and CapsuleNet

Deepfake technology, derived from deep learning, seamlessly inserts individuals into digital media, irrespective of their actual participation. Its foundation lies in machine learning and Artificial Intelligence (AI). Initially, deepfakes served research, industry, and entertainment. While the concept has existed for decades, recent advancements render deepfakes nearly indistinguishable from reality. Accessibility has soared, empowering even novices to create convincing deepfakes. However, this accessibility raises security concerns. The primary deepfake creation algorithm, GAN (Generative Adversarial Network), employs machine learning to craft realistic images or videos. Our objective is to utilize CNN (Convolutional Neural Network) and CapsuleNet with LSTM to differentiate between deepfake-generated frames and originals. Furthermore, we aim to elucidate our model's decision-making process through Explainable AI, fostering transparent human-AI relationships and offering practical examples for real-life scenarios.

[42] 2404.12887

3D Multi-frame Fusion for Video Stabilization

In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our RStab framework lies in Stabilized Rendering (SR), a volume rendering module that fuses multi-frame information in 3D space and extends beyond image fusion by incorporating feature fusion. Specifically, SR involves warping features and colors from multiple frames by projection, fusing them into descriptors to render the stabilized image. However, the precision of warped information depends on the projection accuracy, a factor significantly influenced by dynamic regions. In response, we introduce the Adaptive Ray Range (ARR) module to integrate depth priors, adaptively defining the sampling range for the projection process. Additionally, we propose Color Correction (CC), which assists geometric constraints with optical flow for accurate color aggregation. Thanks to the three modules, our RStab demonstrates superior performance compared with previous stabilizers in the field of view (FOV), image quality, and video stability across various datasets.

[43] 2404.12908

Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images

Diffusion models (DMs) have revolutionized image generation, producing high-quality images with applications spanning various fields. However, their ability to create hyper-realistic images poses significant challenges in distinguishing between real and synthetic content, raising concerns about digital authenticity and potential misuse in creating deepfakes. This work introduces a robust detection framework that integrates image and text features extracted by a CLIP model with a Multilayer Perceptron (MLP) classifier. We propose a novel loss that can improve the detector's robustness and handle imbalanced datasets. Additionally, we flatten the loss landscape during the model training to improve the detector's generalization capabilities. The effectiveness of our method, which outperforms traditional detection techniques, is demonstrated through extensive experiments, underscoring its potential to set a new state-of-the-art approach in DM-generated image detection. The code is available at

[44] 2404.12979

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition

One persistent challenge in Speech Emotion Recognition (SER) is the ubiquitous environmental noise, which frequently results in diminished SER performance in practical use. In this paper, we introduce a Two-level Refinement Network, dubbed TRNet, to address this challenge. Specifically, a pre-trained speech enhancement module is employed for front-end noise reduction and noise level estimation. Later, we utilize clean speech spectrograms and their corresponding deep representations as reference signals to refine the spectrogram distortion and representation shift of enhanced speech during model training. Experimental results validate that the proposed TRNet substantially increases the system's robustness in both matched and unmatched noisy environments, without compromising its performance in clean environments.

[45] 2404.13008

Enhancing Generalization in Audio Deepfake Detection: A Neural Collapse based Sampling and Training Approach

Generalization in audio deepfake detection presents a significant challenge, with models trained on specific datasets often struggling to detect deepfakes generated under varying conditions and unknown algorithms. While collectively training a model using diverse datasets can enhance its generalization ability, it comes with high computational costs. To address this, we propose a neural collapse-based sampling approach applied to pre-trained models trained on distinct datasets to create a new training database. Using the ASVspoof 2019 dataset as a proof-of-concept, we implement pre-trained models with ResNet and ConvNeXt architectures. Our approach demonstrates comparable generalization on unseen data while being computationally efficient, requiring less training data. Evaluation is conducted using the In-the-wild dataset.

[46] 2404.13024

BANF: Band-limited Neural Fields for Levels of Detail Reconstruction

Largely due to their implicit nature, neural fields lack a direct mechanism for filtering, as Fourier analysis from discrete signal processing is not directly applicable to these representations. Effective filtering of neural fields is critical to enable level-of-detail processing in downstream applications, and support operations that involve sampling the field on regular grids (e.g. marching cubes). Existing methods that attempt to decompose neural fields in the frequency domain either resort to heuristics or require extensive modifications to the neural field architecture. We show that via a simple modification, one can obtain neural fields that are low-pass filtered, and in turn show how this can be exploited to obtain a frequency decomposition of the entire signal. We demonstrate the validity of our technique by investigating level-of-detail reconstruction, and showing how coarser representations can be computed effectively.