This paper presents a novel approach for denoising Electron Backscatter Diffraction (EBSD) patterns using diffusion models. We propose a two-stage training process with a UNet-based architecture, incorporating an auxiliary regression head to predict the quality of the experimental pattern and assess the progress of the denoising process. The model uses an adaptive denoising strategy, which integrates quality prediction and feedback-driven iterative denoising process control. This adaptive feedback loop allows the model to adjust its schedule, providing fine control over the denoising process. Furthermore, our model can identify samples where no meaningful signal is present, thereby reducing the risk of hallucinations. We demonstrate the successful application of diffusion models to EBSD pattern denoising using a custom-collected dataset of EBSD patterns, their corresponding master patterns, and quality values.
Tokenized visual representations have shown great promise in image compression, yet their extension to video remains underexplored due to the challenges posed by complex temporal dynamics and stringent bitrate constraints. In this paper, we propose Tokenized Video Compression (TVC), the first token-based dual-stream video compression framework designed to operate effectively at ultra-low bitrates. TVC leverages the powerful Cosmos video tokenizer to extract both discrete and continuous token streams. The discrete tokens (i.e., code maps generated by FSQ) are partially masked using a strategic masking scheme, then compressed losslessly with a discrete checkerboard context model to reduce transmission overhead. The masked tokens are reconstructed by a decoder-only transformer with spatiotemporal token prediction. Meanwhile, the continuous tokens, produced via an autoencoder (AE), are quantized and compressed using a continuous checkerboard context model, providing complementary continuous information at ultra-low bitrate. At the Decoder side, both streams are fused using ControlNet, with multi-scale hierarchical integration to ensure high perceptual quality alongside strong fidelity in reconstruction. This work mitigates the long-standing skepticism about the practicality of tokenized video compression and opens up new avenues for semantics-aware, token-native video compression.
High-resolution medical images can provide more detailed information for better diagnosis. Conventional medical image super-resolution relies on a single task which first performs the extraction of the features and then upscaling based on the features. The features extracted may not be complete for super-resolution. Recent multi-task learning,including reconstruction and super-resolution, is a good solution to obtain additional relevant information. The interaction between the two tasks is often insufficient, which still leads to incomplete and less relevant deep features. To address above limitations, we propose an iterative collaboration network (ICONet) to improve communications between tasks by progressively incorporating reconstruction prior to the super-resolution learning procedure in an iterative collaboration way. It consists of a reconstruction branch, a super-resolution branch, and a SR-Rec fusion module. The reconstruction branch generates the artifact-free image as prior, which is followed by a super-resolution branch for prior knowledge-guided super-resolution. Unlike the widely-used convolutional neural networks for extracting local features and Transformers with quadratic computational complexity for modeling long-range dependencies, we develop a new residual spatial-channel feature learning (RSCFL) module of two branches to efficiently establish feature relationships in spatial and channel dimensions. Moreover, the designed SR-Rec fusion module fuses the reconstruction prior and super-resolution features with each other in an adaptive manner. Our ICONet is built with multi-stage models to iteratively upscale the low-resolution images using steps of 2x and simultaneously interact between two branches in multi-stage supervisions.
Compressed sensing Synthetic Aperture Radar (SAR) image formation, formulated as an inverse problem and solved with traditional iterative optimization methods can be very computationally expensive. We investigate the use of denoising diffusion probabilistic models for compressive SAR image reconstruction, where the diffusion model is guided by a poor initial reconstruction from sub-sampled data obtained via standard imaging methods. We present results on real SAR data and compare our compressively sampled diffusion model reconstruction with standard image reconstruction methods utilizing the full data set, demonstrating the potential performance gains in imaging quality.
Accurate kinetic analysis of [$^{18}$F]FDG distribution in dynamic positron emission tomography (PET) requires anatomically constrained modelling of image-derived input functions (IDIFs). Traditionally, IDIFs are obtained from the aorta, neglecting anatomical variations and complex vascular contributions. This study proposes a multi-organ segmentation-based approach that integrates IDIFs from the aorta, portal vein, pulmonary artery, and ureters. Using high-resolution CT segmentations of the liver, lungs, kidneys, and bladder, we incorporate organ-specific blood supply sources to improve kinetic modelling. Our method was evaluated on dynamic [$^{18}$F]FDG PET data from nine patients, resulting in a mean squared error (MSE) reduction of $13.39\%$ for the liver and $10.42\%$ for the lungs. These initial results highlight the potential of multiple IDIFs in improving anatomical modelling and fully leveraging dynamic PET imaging. This approach could facilitate the integration of tracer kinetic modelling into clinical routine.
This paper studies the synthesis and mitigation of stealthy attacks in nonlinear cyber-physical systems (CPS). To quantify stealthiness, we employ the Kullback-Leibler (KL) divergence, a measure rooted in hypothesis testing and detection theory, which captures the trade-off between an attacker's desire to remain stealthy and her goal of degrading system performance. First, we synthesize the worst-case stealthy attack in nonlinear CPS using the path integral approach. Second, we consider how a controller can mitigate the impact of such stealthy attacks by formulating a minimax KL control problem, yielding a zero-sum game between the attacker and the controller. Again, we leverage a path integral-based solution that computes saddle-point policies for both players through Monte Carlo simulations. We validate our approach using unicycle navigation and cruise control problems, demonstrating how an attacker can covertly drive the system into unsafe regions, and how the controller can adapt her policy to combat the worst-case attacks.
Dynamic positron emission tomography (PET) with [$^{18}$F]FDG enables non-invasive quantification of glucose metabolism through kinetic analysis, often modelled by the two-tissue compartment model (TCKM). However, voxel-wise kinetic parameter estimation using conventional methods is computationally intensive and limited by spatial resolution. Deep neural networks (DNNs) offer an alternative but require large training datasets and significant computational resources. To address these limitations, we propose a physiological neural representation based on implicit neural representations (INRs) for personalized kinetic parameter estimation. INRs, which learn continuous functions, allow for efficient, high-resolution parametric imaging with reduced data requirements. Our method also integrates anatomical priors from a 3D CT foundation model to enhance robustness and precision in kinetic modelling. We evaluate our approach on an [$^{18}$F]FDG dynamic PET/CT dataset and compare it to state-of-the-art DNNs. Results demonstrate superior spatial resolution, lower mean-squared error, and improved anatomical consistency, particularly in tumour and highly vascularized regions. Our findings highlight the potential of INRs for personalized, data-efficient tracer kinetic modelling, enabling applications in tumour characterization, segmentation, and prognostic assessment.
In this paper, we address the problem of a two-player linear quadratic differential game with incomplete information, a scenario commonly encountered in multi-agent control, human-robot interaction (HRI), and approximation methods for solving general-sum differential games. While solutions to such linear differential games are typically obtained through coupled Riccati equations, the complexity increases when agents have incomplete information, particularly when neither is aware of the other's cost function. To tackle this challenge, we propose a model-based Peer-Aware Cost Estimation (PACE) framework for learning the cost parameters of the other agent. In PACE, each agent treats its peer as a learning agent rather than a stationary optimal agent, models their learning dynamics, and leverages this dynamic to infer the cost function parameters of the other agent. This approach enables agents to infer each other's objective function in real time based solely on their previous state observations and dynamically adapt their control policies. Furthermore, we provide a theoretical guarantee for the convergence of parameter estimation and the stability of system states in PACE. Additionally, in our numerical studies, we demonstrate how modeling the learning dynamics of the other agent benefits PACE, compared to approaches that approximate the other agent as having complete information, particularly in terms of stability and convergence speed.
Human-robot interactions can be modeled as incomplete-information general-sum dynamic games since the objective functions of both agents are not explicitly known to each other. However, solving for equilibrium policies for such games presents a major challenge, especially if the games involve nonlinear underlying dynamics. To simplify the problem, existing work often assumes that one agent is an expert with complete information about its peer, which can lead to biased estimates and failures in coordination. To address this challenge, we propose a nonlinear peer-aware cost estimation (N-PACE) algorithm for general-sum dynamic games. In N-PACE, using iterative linear quadratic (LQ) approximation of the nonlinear general-sum game, each agent explicitly models the learning dynamics of its peer agent while inferring their objective functions, leading to unbiased fast learning in inferring the unknown objective function of the peer agent, which is critical for task completion and safety assurance. Additionally, we demonstrate how N-PACE enables \textbf{intent communication} in such multi-agent systems by explicitly modeling the peer's learning dynamics.
Designing controllers that achieve task objectives while ensuring safety is a key challenge in control systems. This work introduces Opt-ODENet, a Neural ODE framework with a differentiable Quadratic Programming (QP) optimization layer to enforce constraints as hard requirements. Eliminating the reliance on nominal controllers or large datasets, our framework solves the optimal control problem directly using Neural ODEs. Stability and convergence are ensured through Control Lyapunov Functions (CLFs) in the loss function, while Control Barrier Functions (CBFs) embedded in the QP layer enforce real-time safety. By integrating the differentiable QP layer with Neural ODEs, we demonstrate compatibility with the adjoint method for gradient computation, enabling the learning of the CBF class-$\mathcal{K}$ function and control network parameters. Experiments validate its effectiveness in balancing safety and performance.
This paper presents a method for the joint detection and tracking of weak targets in automotive radars using the multi-frame track-before-detect (MF-TBD) procedure. Generally, target tracking in automotive radars is challenging due to radar field of view (FOV) misalignment, nonlinear coordinate conversion, and self-positioning errors of the ego-vehicle, which are caused by platform motion. These issues significantly hinder the implementation of MF-TBD in automotive radars. To address these challenges, a new MF-TBD detection architecture is first proposed. It can adaptively adjust the detection threshold value based on the existence of moving targets within the radar FOV. Since the implementation of MF-TBD necessitates the inclusion of position, velocity, and yaw angle information of the ego-vehicle, each with varying degrees of measurement error, we further propose a multi-frame energy integration strategy for moving-platform radar and accurately derive the target energy integration path functions. The self-positioning errors of the ego-vehicle, which are usually not considered in some previous target tracking approaches, are well addressed. Numerical simulations and experimental results with real radar data demonstrate large detection and tracking gains over standard automotive radar processing in weak target environments.
The escalating overlap between non-geostationary orbit (NGSO) and geostationary orbit (GSO) satellite frequency allocations necessitates accurate interference detection methods that address two pivotal technical gaps: computationally efficient signal analysis for real-time operation, and robust anomaly discrimination under varying interference patterns. Existing deep learning approaches employ encoder-decoder anomaly detectors that threshold input-output discrepancies for robustness. While the transformer-based TrID model achieves state-of-the-art performance (AUC: 0.8318, F1: 0.8321), its multi-head attention incurs prohibitive computation time, and its decoupled training of time-frequency models overlooks cross-domain dependencies. To overcome these problems, we propose DualAttWaveNet. A bidirectional attention fusion layer dynamically correlates time-domain samples using parameter-efficient cross-attention routing. A wavelet-regularized reconstruction loss enforces multi-scale consistency. We train the model on public dataset which consists of 48 hours of satellite signals. Experiments show that compared to TrID, DualAttWaveNet improves AUC by 12% and reduces inference time by 50% to 540ms per batch while maintaining F1-score.
As water distribution networks (WDNs) become increasingly connected with digital infrastructures, they face greater exposure to cyberattacks that threaten their operational integrity. Stealthy False Data Injection Attacks (SFDIAs) are particularly concerning, as they manipulate sensor data to compromise system operations while avoiding detection. While existing studies have focused on either detection methods or specific attack formulations, the relationship between attack sophistication, system knowledge requirements, and achievable impact remains unexplored. This paper presents a systematic analysis of sensor attacks against WDNs, investigating different combinations of physical constraints, state monitoring requirements, and intrusion detection evasion conditions. We propose several attack formulations that range from tailored strategies satisfying both physical and detection constraints to simpler measurement manipulations. The proposed attacks are simple and local -- requiring knowledge only of targeted sensors and their hydraulic connections -- making them scalable and practical. Through case studies on Net1 and Net3 benchmark networks, we demonstrate how these attacks can persistently increase operational costs and alter water flows while remaining undetected by monitoring systems for extended periods. The analysis provides utilities with insights for vulnerability assessment and motivates the development of protection strategies that combine physical and statistical security mechanisms.
Skin, the primary regulator of heat exchange, relies on sweat glands for thermoregulation. Alterations in sweat gland morphology play a crucial role in various pathological conditions and clinical diagnoses. Current methods for observing sweat gland morphology are limited by their two-dimensional, in vitro, and destructive nature, underscoring the urgent need for real-time, non-invasive, quantifiable technologies. We proposed a novel three-dimensional (3D) transformer-based multi-object segmentation framework, integrating a sliding window approach, joint spatial-channel attention mechanism, and architectural heterogeneity between shallow and deep layers. Our proposed network enables precise 3D sweat gland segmentation from skin volume data captured by optical coherence tomography (OCT). For the first time, subtle variations of sweat gland 3D morphology in response to temperature changes, have been visualized and quantified. Our approach establishes a benchmark for normal sweat gland morphology and provides a real-time, non-invasive tool for quantifying 3D structural parameters. This enables the study of individual variability and pathological changes in sweat gland structure, advancing dermatological research and clinical applications, including thermoregulation and bromhidrosis treatment.
Channel knowledge map (CKM) is a promising technology to enable environment-aware wireless communications and sensing with greatly enhanced performance, by offering location-specific channel prior information for future wireless networks. One fundamental problem for CKM-enabled wireless systems lies in how to construct high-quality and complete CKM for all locations of interest, based on only limited and noisy on-site channel knowledge data. This problem resembles the long-standing ill-posed inverse problem, which tries to infer from a set of limited and noisy observations the cause factors that produced them. By utilizing the recent advances of solving inverse problems with learned priors using generative artificial intelligence (AI), we propose CKMDiff, a conditional diffusion model that can be applied to perform various tasks for CKM constructions such as denoising, inpainting, and super-resolution, without having to know the physical environment maps or transceiver locations. Furthermore, we propose an environment-aware data augmentation mechanism to enhance the model's ability to learn implicit relations between electromagnetic propagation patterns and spatial-geometric features. Extensive numerical results are provided based on the CKMImageNet and RadioMapSeer datasets, which demonstrate that the proposed CKMDiff achieves state-of-the-art performance, outperforming various benchmark methods.
This paper investigates the impact of false data injection attacks on data-driven control systems. Specifically, we consider an adversary injecting false data into the sensor channels during the learning phase. When the operator seeks to learn a stable state-feedback controller, we propose an attack strategy capable of misleading the operator into learning an unstable feedback gain. We also investigate the effects of constant-bias injection attacks on data-driven linear quadratic regulation (LQR). Finally, we explore potential mitigation strategies and support our findings with numerical examples.
Multiple instance learning (MIL) is a promising approach for weakly supervised classification in pathology using whole slide images (WSIs). However, conventional MIL methods such as Attention-Based Deep Multiple Instance Learning (ABMIL) typically disregard spatial interactions among patches that are crucial to pathological diagnosis. Recent advancements, such as Transformer based MIL (TransMIL), have incorporated spatial context and inter-patch relationships. However, it remains unclear whether explicitly modeling patch relationships yields similar performance gains in ABMIL, which relies solely on Multi-Layer Perceptrons (MLPs). In contrast, TransMIL employs Transformer-based layers, introducing a fundamental architectural shift at the cost of substantially increased computational complexity. In this work, we enhance the ABMIL framework by integrating interaction-aware representations to address this question. Our proposed model, Global ABMIL (GABMIL), explicitly captures inter-instance dependencies while preserving computational efficiency. Experimental results on two publicly available datasets for tumor subtyping in breast and lung cancers demonstrate that GABMIL achieves up to a 7 percentage point improvement in AUPRC and a 5 percentage point increase in the Kappa score over ABMIL, with minimal or no additional computational overhead. These findings underscore the importance of incorporating patch interactions within MIL frameworks.
This paper presents new graph-theoretic conditions for structural target controllability of directed networks. After reviewing existing conditions and highlighting some gaps in the literature, we introduce a new class of network systems, named Christmas trees, which generalizes trees and cacti. We then establish a graph-theoretic characterization of sets of nodes that are structurally target controllable for a simple subclass of Christmas trees. Our characterization applies to general network systems by considering spanning subgraphs of Christmas tree class and allows us to uncover target controllable sets that existing criteria fail to identify.
This paper presents a comprehensive analytical framework for evaluating filtering penalties in ASE-noise-limited coherent optical links. The model accounts for the cumulative effects of cascaded optical filters, amplifier-induced ASE noise, and transceiver noise, alongside digital equalization at the receiver. By developing a generalized channel representation, we derive closed-form expressions for signal-to-noise ratio degradation under various equalization strategies, including Zero-Forcing Equalizer, Minimum Mean Square Error Equalizer, and Fractionally Spaced Equalizer. These models capture the impact of colored noise resulting from linear filtering and provide both time and frequency-domain insights. The proposed framework is validated through experimental comparisons using accurately modeled optical filters, demonstrating close agreement between theory and practice and offering a robust foundation for system-level performance evaluation in metro-access networks.
Usually, a controller for path- or trajectory tracking is employed in autonomous driving. Typically, these controllers generate high-level commands like longitudinal acceleration or force. However, vehicles with combustion engines expect different actuation inputs. This paper proposes a longitudinal control concept that translates high-level trajectory-tracking commands to the required low-level vehicle commands such as throttle, brake pressure and a desired gear. We chose a modular structure to easily integrate different trajectory-tracking control algorithms and vehicles. The proposed control concept enables a close tracking of the high-level control command. An anti-lock braking system, traction control, and brake warmup control also ensure a safe operation during real-world tests. We provide experimental validation of our concept using real world data with longitudinal accelerations reaching up to $25 \, \frac{\mathrm{m}}{\mathrm{s}^2}$. The experiments were conducted using the EAV24 racecar during the first event of the Abu Dhabi Autonomous Racing League on the Yas Marina Formula 1 Circuit.
This paper considers a reconfigurable intelligent surface (RIS) aided wireless communication system where the transmitter employs one-sided amplitude-shift keying (ASK) for data modulation and the receiver employs an optimal noncoherent maximum-likelihood rule for symbol detection. A novel statistical analysis is presented to approximate the weighted sum of a central and a non-central chi-squared random variable, which is used to derive a novel closed-form expression for the symbol error probability (SEP) of the noncoherent system. Furthermore, an optimization problem to minimize the system's SEP under transmission energy constraint is proposed, and novel algorithms are presented to obtain the optimal ASK constellation minimizing the SEP. Numerical results indicate that the noncoherent system achieves superior error performance and higher diversity order with the optimal ASK constellation compared to the traditional equispaced ASK constellation. Further, the optimal ASK significantly differs from the traditional one, with the difference becoming significant with an increase in the average signal-to-noise ratio and at a lower number of RIS's reflecting elements.
We propose geometric shaping for IM-DD links dominated by relative intensity noise (RIN). For 400 Gbps links, our geometrically-shaped constellations result in error probability improvements that relaxes the RIN laser design by 3 dB.
Advanced sound zone control (SZC) techniques typically rely on massive multi-channel loudspeaker arrays to create high-contrast personal sound zones, making single-loudspeaker SZC seem impossible. In this Letter, we challenge this paradigm by introducing the multi-carrier parametric loudspeaker (MCPL), which enables SZC using only a single loudspeaker. In our approach, distinct audio signals are modulated onto separate ultrasonic carrier waves at different frequencies and combined into a single composite signal. This signal is emitted by a single-channel ultrasonic transducer, and through nonlinear demodulation in air, the audio signals interact to virtually form multi-channel outputs. This novel capability allows the application of existing SZC algorithms originally designed for multi-channel loudspeaker arrays. Simulations validate the effectiveness of our proposed single-channel MCPL, demonstrating its potential as a promising alternative to traditional multi-loudspeaker systems for achieving high-contrast SZC. Our work opens new avenues for simplifying SZC systems without compromising performance.
The increasing integration of inverter-based resources (IBRs) into the power grid introduces new challenges, requiring detailed electromagnetic transient (EMT) studies to analyze system interactions. Despite these needs, access to the internal firmware of power electronic devices remains restricted due to stringent nondisclosure agreements enforced by manufacturers. To address this, we explore three system identification techniques: sweep frequency response analysis (SFRA), step excitation method (SEM), and eigensystem realization algorithm (ERA). SFRA employs sinusoidal signals of varying frequencies to measure the system's frequency response, while SEM and ERA utilize step functions to derive time-domain responses and transform them into Laplace-domain transfer functions. All three approaches are shown to provide consistent results in identifying the dq admittance of grid-forming inverters (GFM) over a frequency range of 1 Hz to 100 Hz.
The conventional digital beamforming technique needs one radio frequency (RF) chain per antenna element. High power consumption, significantly high cost of RF chain components per antenna and complex signal processing task at base band makes digital beamforming unsuitable for implementation in massive MIMO system. Hybrid beamforming takes the benefits of both analog and digital beamforming schemes. Near optimal performance can be achieved using very few RF chains in hybrid beamforming method. By optimizing the hybrid transmit and receive beamformers, the spectral efficiency of a system can be maximized. The hybrid beamforming optimization problem is non-convex optimization problem due to the non-convexity of the constraints. In this research work, the millimeter wave massive MIMO system performance implementing the hybrid precoding based on the conventional methods and deep neural network is studied considering the spectral efficiency, bit error rate and complexity. It is shown that the deep neural network based approach for hybrid precoding optimization outperforms conventional techniques by solving the non-convex optimization problem. Moreover, the DNN model accuracy is also analyzed by observing the training accuracy and test accuracy.
The paper focuses on the problem of tracking eigenvalue trajectories in large-scale power system models as system parameters vary. A continuation-based formulation is presented for tracing any single eigenvalue of interest, which supports sparse matrix representations and accommodates both explicit and semi-implicit differential-algebraic models. Key implementation aspects, such as numerical integration, matrix updates, derivative approximations, and handling defective eigenvalues, are discussed in detail and practical recommendations are duly provided. The tracking approach is demonstrated through a comprehensive case study on the IEEE 39-bus system, as well as on a realistic dynamic model of the Irish transmission system.
This paper investigates semi-blind channel estimation for massive multiple-input multiple-output (MIMO) systems. To this end, we first estimate a subspace based on all received symbols (pilot and payload) to provide additional information for subsequent channel estimation. We show how this additional information enhances minimum mean square error (MMSE) channel estimation. Two variants of the linear MMSE (LMMSE) estimator are formulated, where the first one solves the estimation within the subspace, and the second one uses a subspace projection as a preprocessing step. Theoretical derivations show the superior estimation performance of the latter method in terms of mean square error for uncorrelated Rayleigh fading. Subsequently, we introduce parameterizations of this semi-blind LMMSE estimator based on two different conditional Gaussian latent models, i.e., the Gaussian mixture model and the variational autoencoder. Both models learn the underlying channel distribution of the propagation environment based on training data and serve as generative priors for semi-blind channel estimation. Extensive simulations for real-world measurement data and spatial channel models show the superior performance of the proposed methods compared to state-of-the-art semi-blind channel estimators with respect to the MSE.
Precise assembly of composite fuselages is critical for aircraft assembly to meet the ultra-high precision requirements. Due to dimensional variations, there is a gap when two fuselage assemble. In practice, actuators are required to adjust fuselage dimensions by applying forces to specific points on fuselage edge through pulling or pushing force actions. The positioning and force settings of these actuators significantly influence the efficiency of the shape adjustments. The current literature usually predetermines the fixed number of actuators, which is not optimal in terms of overall quality and corresponding actuator costs. However, optimal placement of actuators in terms of both locations and number is challenging due to compliant structures, complex material properties, and dimensional variabilities of incoming fuselages. To address these challenges, this paper introduces a reinforcement learning (RL) framework that enables sequential decision-making for actuator placement selection and optimal force computation. Specifically, our methodology employs the Dueling Double Deep Q-Learning (D3QN) algorithm to refine the decision-making capabilities of sequential actuator placements. The environment is meticulously crafted to enable sequential and incremental selection of an actuator based on system states. We formulate the actuator selection problem as a submodular function optimization problem, where the sub-modularity properties can be adopted to efficiently achieve near-optimal solutions. The proposed methodology has been comprehensively evaluated through numerical studies and comparison studies, demonstrating its effectiveness and outstanding performance in enhancing assembly precision with limited actuator numbers.
Diabetic foot ulcers (DFUs) pose a significant challenge in healthcare, requiring precise and efficient wound assessment to enhance patient outcomes. This study introduces the Attention Diffusion Zero-shot Unsupervised System (ADZUS), a novel text-guided diffusion model that performs wound segmentation without relying on labeled training data. Unlike conventional deep learning models, which require extensive annotation, ADZUS leverages zero-shot learning to dynamically adapt segmentation based on descriptive prompts, offering enhanced flexibility and adaptability in clinical applications. Experimental evaluations demonstrate that ADZUS surpasses traditional and state-of-the-art segmentation models, achieving an IoU of 86.68\% and the highest precision of 94.69\% on the chronic wound dataset, outperforming supervised approaches such as FUSegNet. Further validation on a custom-curated DFU dataset reinforces its robustness, with ADZUS achieving a median DSC of 75\%, significantly surpassing FUSegNet's 45\%. The model's text-guided segmentation capability enables real-time customization of segmentation outputs, allowing targeted analysis of wound characteristics based on clinical descriptions. Despite its competitive performance, the computational cost of diffusion-based inference and the need for potential fine-tuning remain areas for future improvement. ADZUS represents a transformative step in wound segmentation, providing a scalable, efficient, and adaptable AI-driven solution for medical imaging.
While electrifying transportation eliminates tailpipe greenhouse gas (GHG) emissions, electric vehicle (EV) adoption can create additional electricity sector emissions. To quantify this emissions impact, prior work typically employs short-run marginal emissions or average emissions rates calculated from historical data or power systems models that do not consider changes in installed capacity. In this work, we use an electricity system capacity expansion model to consider the full consequential GHG emissions impact from large-scale EV adoption in the western United States, accounting for induced changes in generation and storage capacity. We find that the metrics described above do not accurately reflect the true emissions impact of EV adoption-average emissions rates can either under- or over-estimate emission impacts, and short-run marginal emissions rates can significantly underestimate emission reductions, especially when charging timing is flexible. Our results also show that using short-run marginal emission rates as signals to coordinate EV charging could increase emissions relative to price-based charging signals, indicating the need for alternative control strategies to minimize consequential emissions.
Accurate mobile device localization is critical for emerging 5G/6G applications such as autonomous vehicles and augmented reality. In this paper, we propose a unified localization method that integrates model-based and machine learning (ML)-based methods to reap their respective advantages by exploiting available map information. In order to avoid supervised learning, we generate training labels automatically via optimal transport (OT) by fusing geometric estimates with building layouts. Ray-tracing based simulations are carried out to demonstrate that the proposed method significantly improves positioning accuracy for both line-of-sight (LoS) users (compared to ML-based methods) and non-line-of-sight (NLoS) users (compared to model-based methods). Remarkably, the unified method is able to achieve competitive overall performance with the fully-supervised fingerprinting, while eliminating the need for cumbersome labeled data measurement and collection.
Purpose: This work proposes a novel self-supervised noise-adaptive image denoising framework, called Repetition to Repetition (Rep2Rep) learning, for low-field (<1T) MRI applications. Methods: Rep2Rep learning extends the Noise2Noise framework by training a neural network on two repeated MRI acquisitions, using one repetition as input and another as target, without requiring ground-truth data. It incorporates noise-adaptive training, enabling denoising generalization across varying noise levels and flexible inference with any number of repetitions. Performance was evaluated on both synthetic noisy brain MRI and 0.55T prostate MRI data, and compared against supervised learning and Monte Carlo Stein's Unbiased Risk Estimator (MC-SURE). Results: Rep2Rep learning outperforms MC-SURE on both synthetic and 0.55T MRI datasets. On synthetic brain data, it achieved denoising quality comparable to supervised learning and surpassed MC-SURE, particularly in preserving structural details and reducing residual noise. On the 0.55T prostate MRI dataset, a reader study showed radiologists preferred Rep2Rep-denoised 2-average images over 8-average noisy images. Rep2Rep demonstrated robustness to noise-level discrepancies between training and inference, supporting its practical implementation. Conclusion: Rep2Rep learning offers an effective self-supervised denoising for low-field MRI by leveraging routinely acquired multi-repetition data. Its noise-adaptivity enables generalization to different SNR regimes without clean reference images. This makes Rep2Rep learning a promising tool for improving image quality and scan efficiency in low-field MRI.
We propose a fully unsupervised algorithm that detects from encephalography (EEG) recordings when a subject actively listens to sound, versus when the sound is ignored. This problem is known as absolute auditory attention decoding (aAAD). We propose an unsupervised discriminative CCA model for feature extraction and combine it with an unsupervised classifier called minimally informed linear discriminant analysis (MILDA) for aAAD classification. Remarkably, the proposed unsupervised algorithm performs significantly better than a state-of-the-art supervised model. A key reason is that the unsupervised algorithm can successfully adapt to the non-stationary test data at a low computational cost. This opens the door to the analysis of the auditory attention of a subject using EEG signals with a model that automatically tunes itself to the subject without requiring an arduous supervised training session beforehand.
Exosuits have recently been developed as alternatives to rigid exoskeletons and are increasingly adopted for both upper and lower limb therapy and assistance in clinical and home environments. Many cable-driven exosuits have been developed but little has been published on their electromechanical designs and performance. Therefore, this paper presents a comprehensive design and performance analysis of a two degree of freedom tendon driver unit (TDU) for cable-driven wearable exosuits. Detailed methodologies are presented to benchmark the functionality of the TDU. A static torque output test compares the commanded and measured torques. A velocity control test evaluates the attenuation and phase shift across velocities. A noise test evaluates how loud the TDU is for the wearer under different speeds. A thermal stress test captures the cooling performance of the TDU to ensure safe operation at higher loads. Finally, a battery endurance test evaluates the runtime of the TDU under various loading conditions to inform the usable time. To demonstrate these tests, a modular TDU system for cable-driven applications is introduced, which allows components such as motors, pulleys, and sensors to be adapted based on the requirements of the intended application. By sharing detailed methodologies and performance results, this study aims to provide a TDU design that may be leveraged by others and resources for researchers and engineers to better document the capabilities of their TDU designs.
Multi-modal large language models (MLLMs) have recently achieved great success in processing and understanding information from diverse modalities (e.g., text, audio, and visual signals). Despite their growing popularity, there remains a lack of comprehensive evaluation measuring the audio-visual capabilities of these models, especially in diverse scenarios (e.g., distribution shifts and adversarial attacks). In this paper, we present a multifaceted evaluation of the audio-visual capability of MLLMs, focusing on four key dimensions: effectiveness, efficiency, generalizability, and robustness. Through extensive experiments, we find that MLLMs exhibit strong zero-shot and few-shot generalization abilities, enabling them to achieve great performance with limited data. However, their success relies heavily on the vision modality, which impairs performance when visual input is corrupted or missing. Additionally, while MLLMs are susceptible to adversarial samples, they demonstrate greater robustness compared to traditional models. The experimental results and our findings provide insights into the audio-visual capabilities of MLLMs, highlighting areas for improvement and offering guidance for future research.
Free space optical (FSO) communication using lasers is a rapidly developing field in telecommunications that can offer advantages over traditional radio frequency technology. For example, optical laser links may allow transmissions at far higher data rates, require less operating power and smaller systems and have a smaller risk of interception. In recent years, FSO laser links have been demonstrated, tested or integrated in a range of environments and scenarios. These include FSO links for terrestrial communication, between ground stations and cube-sats in low Earth orbit, between ground and satellite in lunar orbit, as part of scientific or commercial space relay networks, and deep space communications beyond the moon. The possibility of FSO links from and to the surface of Mars could be a natural extension of these developments. In this paper we evaluate some effects of the Martian atmosphere on the propagation of optical communication links, with an emphasis on the impact of dust on the total link budget. We use the output of the Mars Climate Database to generate maps of the dust optical depth for a standard Mars climatology, as well as for a warm (dusty) atmosphere. These dust optical depths are then extrapolated to a wavelength of 1.55 um, and translated into total slant path optical depths to calculate link budgets and availability statistics for a link between the surface and a satellite in a sun-synchronous orbit. The outcomes of this study are relevant to potential future missions to Mars that may require laser communications to or from its surface. For example, the results could be used to constrain the design of communication terminals suitable to the Mars environment, or to assess the link performance as a function of ground station location.
As semantic communication (SemCom) gains increasing attention as a novel communication paradigm, ensuring the security of transmitted semantic information over open wireless channels becomes crucial. Existing secure SemCom solutions often lack explicit control over security. To address this, we propose a coding-enhanced jamming approach for secure SemCom over wiretap channels. This approach integrates deep joint source and channel coding (DeepJSCC) with neural network-based digital modulation, enabling controlled jamming through two-layer superposition coding. The outer constellation sequence encodes the source image, while the inner constellation sequence, derived from a secret image, acts as the jamming signal. By minimizing the mutual information between the outer and inner constellation sequences, the jamming effect is enhanced. The jamming signal is superposed on the outer constellation sequence, preventing the eavesdropper from recovering the source image. The power allocation coefficient (PAC) in the superposition coding can be adjusted to control system security. Experiments show that our approach matches existing methods in security while significantly improving reconstruction performance across varying channel signal-to-noise ratios (SNRs) and compression ratios.
The rapid growth of unlabeled time-series data in domains such as wireless communications, radar, biomedical engineering, and the Internet of Things (IoT) has driven advancements in unsupervised learning. This review synthesizes recent progress in applying autoencoders and vision transformers for unsupervised signal analysis, focusing on their architectures, applications, and emerging trends. We explore how these models enable feature extraction, anomaly detection, and classification across diverse signal types, including electrocardiograms, radar waveforms, and IoT sensor data. The review highlights the strengths of hybrid architectures and self-supervised learning, while identifying challenges in interpretability, scalability, and domain generalization. By bridging methodological innovations and practical applications, this work offers a roadmap for developing robust, adaptive models for signal intelligence.
In this paper, we present an impedance control framework on the SE(3) manifold, which enables force tracking while guaranteeing passivity. Building upon the unified force-impedance control (UFIC) and our previous work on geometric impedance control (GIC), we develop the geometric unified force impedance control (GUFIC) to account for the SE(3) manifold structure in the controller formulation using a differential geometric perspective. As in the case of the UFIC, the GUFIC utilizes energy tank augmentation for both force-tracking and impedance control to guarantee the manipulator's passivity relative to external forces. This ensures that the end effector maintains safe contact interaction with uncertain environments and tracks a desired interaction force. Moreover, we resolve a non-causal implementation problem in the UFIC formulation by introducing velocity and force fields. Due to its formulation on SE(3), the proposed GUFIC inherits the desirable SE(3) invariance and equivariance properties of the GIC, which helps increase sample efficiency in machine learning applications where a learning algorithm is incorporated into the control law. The proposed control law is validated in a simulation environment under scenarios requiring tracking an SE(3) trajectory, incorporating both position and orientation, while exerting a force on a surface. The codes are available at https://github.com/Joohwan-Seo/GUFIC_mujoco.
Singular arcs emerge in the solutions of Optimal Control Problems (OCPs) when the optimal inputs on some finite time intervals cannot be directly obtained via the optimality conditions. Solving OCPs with singular arcs often requires tailored treatments, suitable for offline trajectory optimization. This approach can become increasingly impractical for online closed-loop implementations, especially for large-scale engineering problems. Recent development of Integrated Residual Methods (IRM) have indicated their suitability for handling singular arcs; the convergence of error measures in IRM automatically suppresses singular arc-induced fluctuations and leads to non-fluctuating solutions more suitable for practical problems. Through several examples, we demonstrate the advantages of solving OCPs with singular arcs using {IRM} under an economic model predictive control framework. In particular, the following observations are made: (i) IRM does not require special treatment for singular arcs, (ii) it solves the OCPs reliably with singular arc fluctuation suppressed, and (iii) the closed-loop results closely match the analytic optimal solutions.
Contraction metrics are crucial in control theory because they provide a powerful framework for analyzing stability, robustness, and convergence of various dynamical systems. However, identifying these metrics for complex nonlinear systems remains an open challenge due to the lack of scalable and effective tools. This paper explores the approach of learning verifiable contraction metrics parametrized as neural networks (NNs) for discrete-time nonlinear dynamical systems. While prior works on formal verification of contraction metrics for general nonlinear systems have focused on convex optimization methods (e.g. linear matrix inequalities, etc) under the assumption of continuously differentiable dynamics, the growing prevalence of NN-based controllers, often utilizing ReLU activations, introduces challenges due to the non-smooth nature of the resulting closed-loop dynamics. To bridge this gap, we establish a new sufficient condition for establishing formal neural contraction metrics for general discrete-time nonlinear systems assuming only the continuity of the dynamics. We show that from a computational perspective, our sufficient condition can be efficiently verified using the state-of-the-art neural network verifier $\alpha,\!\beta$-CROWN, which scales up non-convex neural network verification via novel integration of symbolic linear bound propagation and branch-and-bound. Built upon our analysis tool, we further develop a learning method for synthesizing neural contraction metrics from sampled data. Finally, our approach is validated through the successful synthesis and verification of NN contraction metrics for various nonlinear examples.
This work presents a novel approach for analyzing and controlling bearing rigidity in multi-robot networks with dynamic topology. By decomposing the system's framework into subframeworks, we express bearing rigidity, a global property, as a set of local properties, with rigidity eigenvalues serving as natural local rigidity metrics. We propose a decentralized, scalable, gradient-based controller that uses only bearing measurements to execute mission-specific commands. The controller preserves bearing rigidity by maintaining rigidity eigenvalues above a threshold, and also avoids inter-robot collisions. Simulations confirm the scheme's effectiveness, with information exchange confined to subframeworks, underscoring its scalability and practicality.
X-ray absorption near edge structure (XANES) spectroscopy is a powerful technique for characterizing the chemical state and symmetry of individual elements within materials, but requires collecting data at many energy points which can be time-consuming. While adaptive sampling methods exist for efficiently collecting spectroscopic data, they often lack domain-specific knowledge about XANES spectra structure. Here we demonstrate a knowledge-injected Bayesian optimization approach for adaptive XANES data collection that incorporates understanding of spectral features like absorption edges and pre-edge peaks. We show this method accurately reconstructs the absorption edge of XANES spectra using only 15-20% of the measurement points typically needed for conventional sampling, while maintaining the ability to determine the x-ray energy of the sharp peak after absorption edge with errors less than 0.03 eV, the absorption edge with errors less than 0.1 eV; and overall root-mean-square errors less than 0.005 compared to compared to traditionally sampled spectra. Our experiments on battery materials and catalysts demonstrate the method's effectiveness for both static and dynamic XANES measurements, improving data collection efficiency and enabling better time resolution for tracking chemical changes. This approach advances the degree of automation in XANES experiments reducing the common errors of under- or over-sampling points in near the absorption edge and enabling dynamic experiments that require high temporal resolution or limited measurement time.
Stochastic Optimal Control (SOC) problems arise in systems influenced by uncertainty, such as autonomous robots or financial models. Traditional methods like dynamic programming are often intractable for high-dimensional, nonlinear systems due to the curse of dimensionality. This dissertation explores the path integral control framework as a scalable, sampling-based alternative. By reformulating SOC problems as expectations over stochastic trajectories, it enables efficient policy synthesis via Monte Carlo sampling and supports real-time implementation through GPU parallelization. We apply this framework to six classes of SOC problems: Chance-Constrained SOC, Stochastic Differential Games, Deceptive Control, Task Hierarchical Control, Risk Mitigation of Stealthy Attacks, and Discrete-Time LQR. A sample complexity analysis for the discrete-time case is also provided. These contributions establish a foundation for simulator-driven autonomy in complex, uncertain environments.
We consider dynamical models given by rational ODE systems. Parameter estimation is an important and challenging task of recovering parameter values from observed data. Recently, a method based on differential algebra and rational interpolation was proposed to express parameter estimation in terms of polynomial system solving. Typically, polynomial system solving is a bottleneck, hence the choice of the polynomial solver is crucial. In this contribution, we compare two polynomial system solvers applied to parameter estimation: homotopy continuation solver from HomotopyContinuation.jl and our new implementation of a certified solver based on rational univariate representation (RUR) and real root isolation. We show how the new RUR solver can tackle examples that are out of reach for the homotopy methods and vice versa.
A substantial amount of research has demonstrated the robustness and accuracy of the Riemannian minimum distance to mean (MDM) classifier for all kinds of EEG-based brain--computer interfaces (BCIs). This classifier is simple, fully deterministic, robust to noise, computationally efficient, and prone to transfer learning. Its training is very simple, requiring just the computation of a geometric mean of a symmetric positive-definite (SPD) matrix per class. We propose an improvement of the MDM involving a number of power means of SPD matrices instead of the sole geometric mean. By the analysis of 20 public databases, 10 for the motor-imagery BCI paradigm and 10 for the P300 BCI paradigm, comprising 587 individuals in total, we show that the proposed classifier clearly outperforms the MDM, approaching the state-of-the art in terms of performance while retaining the simplicity and the deterministic behavior. In order to promote reproducible research, our code will be released as open source.
As state of the art neural networks (NNs) continue to grow in size, their resource-efficient implementation becomes ever more important. In this paper, we introduce a compression scheme that reduces the number of computations required for NN inference on reconfigurable hardware such as FPGAs. This is achieved by combining pruning via regularized training, weight sharing and linear computation coding (LCC). Contrary to common NN compression techniques, where the objective is to reduce the memory used for storing the weights of the NNs, our approach is optimized to reduce the number of additions required for inference in a hardware-friendly manner. The proposed scheme achieves competitive performance for simple multilayer perceptrons, as well as for large scale deep NNs such as ResNet-34.
We consider a network of identical, first-order linear systems, and investigate how replacing a subset of the systems composing the network with higher-order ones, either taken to be generic or specifically designed, may affect its controllability. After establishing a correspondence between state controllability in networks of first-order systems with output controllability in networks of higher-order systems, we show that adding higher-order dynamics may require significantly fewer subsystem modifications to achieve structural controllability, when compared to first-order heterogeneous subsystems. Furthermore, we characterize the topology of networks (which we call X-networks) in which the introduction of heterogeneous local dynamics is not necessary for structural output controllability, as the latter can be attained by suitable higher-order subsystems with homogeneous internal dynamics.
The recent global spread of monkeypox, particularly in regions where it has not historically been prevalent, has raised significant public health concerns. Early and accurate diagnosis is critical for effective disease management and control. In response, this study proposes a novel deep learning-based framework for the automated detection of monkeypox from skin lesion images, leveraging the power of transfer learning, dimensionality reduction, and advanced machine learning techniques. We utilize the newly developed Monkeypox Skin Lesion Dataset (MSLD), which includes images of monkeypox, chickenpox, and measles, to train and evaluate our models. The proposed framework employs the Xception architecture for deep feature extraction, followed by Principal Component Analysis (PCA) for dimensionality reduction, and the Natural Gradient Boosting (NGBoost) algorithm for classification. To optimize the model's performance and generalization, we introduce the African Vultures Optimization Algorithm (AVOA) for hyperparameter tuning, ensuring efficient exploration of the parameter space. Our results demonstrate that the proposed AVOA-NGBoost model achieves state-of-the-art performance, with an accuracy of 97.53%, F1-score of 97.72% and an AUC of 97.47%. Additionally, we enhance model interpretability using Grad-CAM and LIME techniques, providing insights into the decision-making process and highlighting key features influencing classification. This framework offers a highly precise and efficient diagnostic tool, potentially aiding healthcare providers in early detection and diagnosis, particularly in resource-constrained environments.
Navigating unmanned aerial vehicles (UAVs) through cluttered and dynamic environments remains a significant challenge, particularly when dealing with fast-moving or sudden-appearing obstacles. This paper introduces a complete LiDAR-based system designed to enable UAVs to avoid various moving obstacles in complex environments. Benefiting the high computational efficiency of perception and planning, the system can operate in real time using onboard computing resources with low latency. For dynamic environment perception, we have integrated our previous work, M-detector, into the system. M-detector ensures that moving objects of different sizes, colors, and types are reliably detected. For dynamic environment planning, we incorporate dynamic object predictions into the integrated planning and control (IPC) framework, namely DynIPC. This integration allows the UAV to utilize predictions about dynamic obstacles to effectively evade them. We validate our proposed system through both simulations and real-world experiments. In simulation tests, our system outperforms state-of-the-art baselines across several metrics, including success rate, time consumption, average flight time, and maximum velocity. In real-world trials, our system successfully navigates through forests, avoiding moving obstacles along its path.
Safety-critical whole-body robot control demands reactive methods that ensure collision avoidance in real-time. Complementarity constraints and control barrier functions (CBF) have emerged as core tools for ensuring such safety constraints, and each represents a well-developed field. Despite addressing similar problems, their connection remains largely unexplored. This paper bridges this gap by formally proving the equivalence between these two methodologies for sampled-data, first-order systems, considering both single and multiple constraint scenarios. By demonstrating this equivalence, we provide a unified perspective on these techniques. This unification has theoretical and practical implications, facilitating the cross-application of robustness guarantees and algorithmic improvements between complementarity and CBF frameworks. We discuss these synergistic benefits and motivate future work in the comparison of the methods in more general cases.
Intelligent condition monitoring of wind turbines is essential for reducing downtimes. Machine learning models trained on wind turbine operation data are commonly used to detect anomalies and, eventually, operation faults. However, data-driven normal behavior models (NBMs) require a substantial amount of training data, as NBMs trained with scarce data may result in unreliable fault diagnosis. To overcome this limitation, we present a novel generative deep learning approach to make SCADA samples from one wind turbine lacking training data resemble SCADA data from wind turbines with representative training data. Through CycleGAN-based domain mapping, our method enables the application of an NBM trained on an existing wind turbine to one with severely limited data. We demonstrate our approach on field data mapping SCADA samples across 7 substantially different WTs. Our findings show significantly improved fault diagnosis in wind turbines with scarce data. Our method achieves the most similar anomaly scores to an NBM trained with abundant data, outperforming NBMs trained on scarce training data with improvements of +10.3% in F1-score when 1 month of training data is available and +16.8% when 2 weeks are available. The domain mapping approach outperforms conventional fine-tuning at all considered degrees of data scarcity, ranging from 1 to 8 weeks of training data. The proposed technique enables earlier and more reliable fault diagnosis in newly installed wind farms, demonstrating a novel and promising research direction to improve anomaly detection when faced with training data scarcity.
In this paper, we address the problem of designing stochastic model predictive control (SMPC) schemes for linear systems affected by unbounded disturbances. The contribution of the paper is rooted in a measured-state initialization strategy. First, due to the nonzero probability of violating chance-constraints in the case of unbounded noise, we introduce ellipsoidal-based probabilistic reachable sets and we include constraint relaxations to recover recursive feasibility conditioned to the measured state. Second, we prove that the solution of this novel SMPC scheme guarantees closed-loop chance constraints satisfaction under minimum relaxation. Last, we demonstrate that, in expectation, the need of relaxing the constraints vanishes over time, which leads the closed-loop trajectories steered towards the unconstrained LQR invariant region. This novel SMPC scheme is proven to satisfy the recursive feasibility conditioned to the state realization, and its superiority with respect to open-loop initialization schemes is shown through numerical examples.
Modern edge devices, such as cameras, drones, and Internet-of-Things nodes, rely on deep learning to enable a wide range of intelligent applications, including object recognition, environment perception, and autonomous navigation. However, deploying deep learning models directly on the often resource-constrained edge devices demands significant memory footprints and computational power for real-time inference using traditional digital computing architectures. In this paper, we present WISE, a novel computing architecture for wireless edge networks designed to overcome energy constraints in deep learning inference. WISE achieves this goal through two key innovations: disaggregated model access via wireless broadcasting and in-physics computation of general complex-valued matrix-vector multiplications directly at radio frequency. Using a software-defined radio platform with wirelessly broadcast model weights over the air, we demonstrate that WISE achieves 95.7% image classification accuracy with ultra-low operation power of 6.0 fJ/MAC per client, corresponding to a computation efficiency of 165.8 TOPS/W. This approach enables energy-efficient deep learning inference on wirelessly connected edge devices, achieving more than two orders of magnitude improvement in efficiency compared to traditional digital computing.