Existing cell-free integrated sensing and communication (CF-ISAC) beamforming algorithms predominantly rely on classical optimization techniques, which often entail high computational complexity and limited scalability. Meanwhile, recent learning-based approaches have difficulty capturing the global interactions and long-range dependencies among distributed access points (APs), communication users, and sensing targets. To address these limitations, we propose the first Set Transformer-based CF-ISAC beamforming framework (STCIB). By exploiting attention mechanisms, STCIB explicitly models global relationships among network entities, naturally handles unordered input sets, and preserves permutation invariance across APs, users, and targets. The proposed framework operates in an unsupervised manner, eliminating the need for labeled training data, and supports three design regimes: (i) sensing-centric, (ii) communication-centric, and (iii) joint ISAC optimization. We benchmark STCIB against a convolutional neural network (CNN) baseline and two state-of-the-art optimization algorithms: the convex-concave procedure algorithm (CCPA) and augmented Lagrangian manifold optimization (ALM-MO). Numerical results demonstrate that STCIB consistently outperforms the CNN, achieving substantially higher ISAC performance with only a negligible increase in runtime. For instance, in regime (iii), at $\eta = 0.4$, STCIB improves the sensing and communication sum rates by 14.8% and 31.6%, respectively, relative to the CNN, while increasing runtime by only 0.26%. Compared with CCPA and ALM-MO, STCIB offers significantly lower computational cost while still delivering modest performance gains. In regime (i), for a 3.0 bps/Hz communication threshold, the runtime of STCIB is only 0.1% and 0.3% of that required by CCPA and ALM-MO, respectively, while improving the sensing sum rate by 4.45% and 5.9%.
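To make the permutation property concrete, the following NumPy sketch (illustrative only, not the STCIB implementation) shows a single self-attention head acting on a set of entity embeddings: permuting the input rows permutes the output rows identically, and pooling over rows then yields the permutation invariance claimed across APs, users, and targets.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def set_attention(X, Wq, Wk, Wv):
    # Single-head self-attention over a set of entity embeddings X (n x d).
    # Permutation-equivariant: permuting rows of X permutes the output rows.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return A @ V

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))              # e.g., embeddings of APs/users/targets
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(5)
assert np.allclose(set_attention(X, Wq, Wk, Wv)[perm],
                   set_attention(X[perm], Wq, Wk, Wv))   # equivariance check
```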
Adversarial training is a defense method that trains machine learning models on intentionally perturbed attack inputs, so they learn to be robust against adversarial examples. This paper develops a robust voltage control framework for distribution networks with high penetration of distributed energy resources (DERs). Conventional voltage control methods are vulnerable to strategic cyber attacks, as they typically consider only random or black-box perturbations. To address this, we formulate white-box adversarial attacks using Projected Gradient Descent (PGD) and train a deep reinforcement learning (DRL) agent adversarially. The resulting policy adapts in real time to high-impact, strategically optimized perturbations. Simulations on DER-rich networks show that the approach maintains voltage stability and operational efficiency under realistic attack scenarios, highlighting the effectiveness of gradient-based adversarial DRL in enhancing robustness and adaptability in modern distribution system control.
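As a concrete illustration of the attack side, the PyTorch sketch below implements a generic PGD perturbation of the agent's observation; the attack objective (pushing the policy away from its nominal action) and all parameter values are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def pgd_attack(policy, obs, eps=0.05, alpha=0.01, steps=10):
    # White-box PGD on a (detached) observation tensor: signed-gradient ascent
    # steps, each projected back onto the L-infinity ball of radius eps.
    with torch.no_grad():
        a_nom = policy(obs)                       # nominal (unattacked) action
    obs_adv = obs.clone().detach()
    for _ in range(steps):
        obs_adv.requires_grad_(True)
        loss = (policy(obs_adv) - a_nom).pow(2).sum()  # illustrative objective
        grad = torch.autograd.grad(loss, obs_adv)[0]
        obs_adv = (obs_adv + alpha * grad.sign()).detach()
        obs_adv = obs + (obs_adv - obs).clamp(-eps, eps)  # project onto ball
    return obs_adv.detach()

# Adversarial training then updates the DRL agent on pgd_attack(policy, obs).
```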
We address target interception in contested environments in the presence of multiple defenders whose interception capability is limited by finite ranges. Conventional methods typically impose conservative stand-off constraints based on maximum engagement distance and neglect the interceptors' actuator limitations. Instead, we formulate safety constraints using defender-induced engagement zones (EZs). To account for actuator limits, the vehicle model is augmented with input saturation dynamics. A time-varying safe-set tightening parameter is introduced to compensate for transient constraint violations induced by actuator dynamics. To ensure scalable safety enforcement in multi-defender scenarios, a smooth aggregate safety function is constructed using a log-sum-exp operator combining individual threat measures associated with each defender's capability. A smooth switching guidance strategy is then developed to coordinate interception and safety objectives. The attacker pursues the target when sufficiently distant from threat boundaries and progressively activates evasive motion as the EZ boundaries are approached. The resulting controller relies only on relative measurements and does not require knowledge of defender control inputs, thus facilitating a fully distributed and scalable implementation. Rigorous analysis provides sufficient conditions guaranteeing target interception, practical safety with respect to all defender engagement zones, and satisfaction of actuator bounds. An input-constrained guidance law based on conservative stand-off distance is also developed to quantify the conservatism of maximum-range-based safety formulations. Simulations with stationary and maneuvering defenders demonstrate that the proposed formulation yields shorter interception paths and reduced interception time compared with conventional methods while maintaining safety throughout the engagement.
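A minimal sketch of the log-sum-exp aggregation (the sign convention and sharpness value are illustrative assumptions): the operator smoothly under-approximates the minimum per-defender margin, so a single scalar constraint covers all defenders.

```python
import numpy as np

def lse_aggregate(h, kappa=10.0):
    # Smooth under-approximation of min_i h_i: since sum(exp(-k*h)) >=
    # exp(-k*min(h)), we get  min(h) - log(len(h))/k <= agg <= min(h).
    h = np.asarray(h, dtype=float)
    return -np.log(np.sum(np.exp(-kappa * h))) / kappa

h = np.array([0.8, 0.3, 1.5])     # per-defender safety margins (illustrative)
agg = lse_aggregate(h)
assert h.min() - np.log(h.size) / 10.0 <= agg <= h.min()  # agg >= 0 implies all safe
```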
Speech Emotion Recognition (SER) in real-world scenarios remains challenging due to severe class imbalance and the prevalence of spontaneous, natural speech. While recent approaches leverage self-supervised learning (SSL) representations and multimodal fusion of speech and text, most existing methods apply supervision only at the final classification layer, limiting the discriminative power of intermediate representations. In this work, we propose Crab (Contrastive Representation and Multimodal Aligned Bottleneck), a bimodal Cross-Modal Transformer architecture that integrates speech representations from WavLM and textual representations from RoBERTa, together with a novel \textit{Multi Layer Contrastive Supervision} (MLCS) strategy. MLCS injects multi-positive contrastive learning signals at multiple layers of the network, encouraging emotionally discriminative representations throughout the model without introducing additional parameters at inference time. To further address data imbalance, we adopt weighted cross-entropy during training. We evaluate the proposed approach on three benchmark datasets covering different degrees of emotional naturalness: IEMOCAP, MELD, and MSP-Podcast 2.0. Experimental results demonstrate that Crab consistently outperforms strong unimodal and multimodal baselines across all datasets, with particularly large gains under naturalistic and highly imbalanced conditions. These findings highlight the effectiveness of \textit{Multi Layer Contrastive Supervision} as a general and robust strategy for SER. The official implementation is available at this https URL.
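For readers unfamiliar with multi-positive contrastive objectives, the PyTorch sketch below shows the kind of supervised contrastive term that could be injected at an intermediate layer; it is an illustrative stand-in, not the exact MLCS formulation.

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive(z, labels, tau=0.1):
    # All same-emotion samples in the batch act as positives for each anchor.
    z = F.normalize(z, dim=1)
    sim = z @ z.T / tau
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye
    denom = torch.logsumexp(sim.masked_fill(eye, float('-inf')), dim=1, keepdim=True)
    log_prob = sim - denom
    has_pos = pos.any(dim=1)                  # skip anchors without positives
    loss = -(log_prob * pos).sum(dim=1)[has_pos] / pos.sum(dim=1)[has_pos]
    return loss.mean()

z = torch.randn(16, 128)                      # one layer's embeddings
labels = torch.randint(0, 4, (16,))           # emotion classes
print(multi_positive_contrastive(z, labels))
```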
Deep spatially selective filters achieve high-quality enhancement with real-time capable architectures for stationary speakers of known directions. To retain this level of performance in dynamic scenarios when only the speakers' initial directions are given, accurate, yet computationally lightweight tracking algorithms become necessary. Assuming a frame-wise causal processing style, temporal feedback allows for leveraging the enhanced speech signal to improve tracking performance. In this work, we investigate strategies to incorporate the enhanced signal into lightweight tracking algorithms and autoregressively guide deep spatial filters. Our proposed Bayesian tracking algorithms are compatible with arbitrary deep spatial filters. To increase the realism of simulated trajectories during development and evaluation, we propose and publish a novel dataset based on the social force model. Results validate that the autoregressive incorporation significantly improves the accuracy of our Bayesian trackers, resulting in superior enhancement with no or only a negligible increase in computational overhead. Real-world recordings complement these findings and demonstrate the generalizability of our methods to unseen, challenging acoustic conditions.
Underwater observatories have recently emerged as an efficient means of marine biodiversity monitoring. In order to conduct data muling from the underwater sensors in an efficient and cost-effective way, we consider the use of optical wireless communications to transmit the data from the underwater sensors to an aerial node close to the water surface, such as an unmanned aerial vehicle (UAV). More specifically, we utilize a direct water-to-air (W2A) optical communication link between the sensor node equipped with an LED emitter and the UAV equipped with an ultra-sensitive receiver, i.e., a silicon photomultiplier (SiPM). To characterize this particularly complex communication channel, we introduce a ray-tracing algorithm based on the Monte Carlo method, incorporating the impact of bubbles modeled through the Mie scattering theory and a realistic sea surface representation derived from the JONSWAP spectrum. Additionally, we incorporate into this model the channel losses resulting from UAV instability under windy weather conditions. Furthermore, we conduct a comprehensive analysis of the wireless channel, examining the influence of key parameters such as wind speed, transmitter configurations, and receiver characteristics. Finally, we evaluate the end-to-end performance of the system by analyzing the average bit-error rate at varying depths and data rates, providing valuable insights into the feasibility and efficiency of the proposed approach.
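For reference, the JONSWAP elevation spectrum used to synthesize such sea surfaces can be evaluated as below (standard unscaled form with the Phillips constant; the peak period, peak-enhancement factor, and frequency grid are illustrative):

```python
import numpy as np

def jonswap(f, tp=8.0, gamma=3.3, g=9.81):
    # One-sided JONSWAP wave spectrum S(f) [m^2/Hz]; fp = 1/tp is the peak
    # frequency. Some implementations rescale alpha to a target wave height.
    fp = 1.0 / tp
    sigma = np.where(f <= fp, 0.07, 0.09)
    r = np.exp(-((f - fp) ** 2) / (2 * sigma**2 * fp**2))
    alpha = 0.0081                            # Phillips constant
    return (alpha * g**2 * (2 * np.pi) ** -4 * f ** -5
            * np.exp(-1.25 * (fp / f) ** 4) * gamma**r)

f = np.linspace(0.05, 1.0, 200)               # frequency grid [Hz]
S = jonswap(f)                # sample wave amplitudes from S to build a surface
```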
District heating systems (DHSs) require coordinated economic dispatch and temperature regulation under uncertain operating conditions. Existing DHS operation strategies often rely on disturbance forecasts and nominal models, so their economic and thermal performance may degrade when predictive information or model knowledge is inaccurate. This paper develops a data-driven online control framework for DHS operation by embedding steady-state economic optimality conditions into the temperature dynamics, so that the closed-loop system converges to the economically optimal operating point without relying on disturbance forecasts. Based on this formulation, we develop a Data-Enabled Policy Optimization (DeePO)-based online learning controller and incorporate Adaptive Moment Estimation (ADAM) to improve closed-loop performance. We further establish convergence and performance guarantees for the resulting closed-loop system. Simulations on an industrial-park DHS in Northern China show that the proposed method achieves stable near-optimal operation and strong empirical robustness to both static and time-varying model mismatch under practical disturbance conditions.
Accurate and timely crop yield estimation is critical for global food security, agricultural policy, and farm management. The Copernicus Sentinel-2 satellite constellation, with high spatial, temporal, and spectral resolution, has transformed agricultural monitoring by enabling field- and sub-field-scale analysis. This review synthesizes recent advances in Sentinel-2-based crop yield estimation. A key trend is the shift from regional models to high-resolution field-level assessments driven by three main approaches: (i) empirical models using vegetation indices combined with machine and deep learning methods such as Random Forest and Convolutional Neural Networks; (ii) integration of process-based crop growth models (e.g., WOFOST, SAFY) via data assimilation of Sentinel-2-derived variables like Leaf Area Index (LAI); and (iii) data fusion techniques combining Sentinel-2 optical data with Sentinel-1 SAR to mitigate cloud-related limitations. The review shows that machine learning, deep learning, and hybrid modeling frameworks can explain substantial within-field yield variability across crops and regions. However, performance remains constrained by limited ground-truth data, cloud-induced gaps, and challenges in model transferability across years and locations. Future directions include tighter integration of multi-modal data and improved in-season observations to support robust, operational decision-making in precision agriculture and sustainable intensification.
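A minimal sketch of approach (i), using synthetic reflectances and a contrived linear yield relation purely for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def ndvi(nir, red):
    # Normalized difference vegetation index from Sentinel-2-style bands B8/B4.
    return (nir - red) / (nir + red + 1e-9)

rng = np.random.default_rng(0)
n_fields, n_dates = 200, 12
nir = rng.uniform(0.2, 0.5, (n_fields, n_dates))   # synthetic B8 reflectance
red = rng.uniform(0.02, 0.2, (n_fields, n_dates))  # synthetic B4 reflectance
X = ndvi(nir, red)                           # seasonal NDVI trajectory per field
y = 4.0 + 6.0 * X.mean(axis=1) + rng.normal(0, 0.3, n_fields)  # synthetic yield [t/ha]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:150], y[:150])
print("held-out R^2:", model.score(X[150:], y[150:]))
```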
In this paper, we present DROP, high-Density Relocation-free sequential OPerations in automated valet parking. DROP addresses the challenges in high-density parking & vehicle retrieval without relocations. Each challenge is handled by jointly providing area-efficient layouts and relocation-free parking & exit sequences, considering accessibility with relocation-free sequential operations. To generate such sequences, relocation-free constraints are formulated as explicit logical conditions expressed in boolean variables. Recursive search strategies are employed to derive the logical conditions and enumerate relocation-free sequences under sequential constraints. We demonstrate the effectiveness of our framework through extensive simulations, showing its potential to significantly improve area utilization with relocation-free constraints. We also examine its viability on an application problem with prescribed operational order. The results from all experiments are available at: this https URL.
Since the introduction of Masked Autoencoders, various improvements to masking techniques have been explored. In this paper, we rethink masking strategies for audio representation learning using masked prediction-based self-supervised learning (SSL) on general audio spectrograms. While recent informed masking techniques have attracted attention, we observe that they incur substantial computational overhead. Motivated by this observation, we propose dispersion-weighted masking (DWM), a lightweight masking strategy that leverages the spectral sparsity inherent in the frequency structure of audio content. Our experiments show that inverse block masking, commonly used in recent SSL frameworks, improves audio event understanding performance while introducing a trade-off in generalization. The proposed DWM alleviates these limitations while reducing computational complexity, leading to consistent performance improvements. This work provides practical guidance on masking strategy design for masked prediction-based audio representation learning.
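The exact DWM weighting is defined in the paper; one plausible reading, sketched below, biases mask sampling toward time patches whose spectral content is more dispersed, so nearly empty regions of the spectrogram are masked less often (this mechanism is an assumption for illustration):

```python
import numpy as np

def dispersion_weighted_mask(spec, mask_ratio=0.5, rng=None):
    # spec: (freq_patches, time_patches) patch energies of a log-mel spectrogram.
    # Hypothetical weighting: mask probability proportional to per-time-patch
    # spectral variance, exploiting the sparsity of audio frequency structure.
    rng = rng or np.random.default_rng()
    disp = spec.var(axis=0)
    p = disp / disp.sum()
    n_mask = int(mask_ratio * spec.shape[1])
    chosen = rng.choice(spec.shape[1], size=n_mask, replace=False, p=p)
    mask = np.zeros(spec.shape[1], dtype=bool)
    mask[chosen] = True
    return mask
```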
The fading-memory (FM) property captures the progressive loss of influence of past inputs on a system's current output and was originally formalized by Boyd and Chua in an operator-theoretic framework. Despite its importance for systems approximation, reservoir computing, and recurrent neural networks, its connection with state-space notions of nonlinear stability, especially incremental ones, remains understudied. This paper introduces a state-space definition of FM. In state space, FM can be interpreted as an extension of incremental input-to-output stability ($\delta$IOS) that explicitly incorporates a memory kernel upper-bounding the decay of past input differences. It is also closely related to Boyd and Chua's FM definition, with the sole difference of requiring uniform, instead of general, continuity of the memory functional with respect to an input-fading norm. We demonstrate that incremental input-to-state stability ($\delta$ISS) implies FM semi-globally for time-invariant systems under an equibounded input assumption. Notably, Boyd and Chua's approximation theorems apply to $\delta$ISS state-space models. As a closing application, we show that, under mild assumptions, the state-space model of current-driven memristors possesses the FM property.
Active incoherent millimeter-wave (AIM) imaging is a recently developed technique that has been shown to enable fast millimeter-wave imaging using sparse apertures and Fourier domain sampling. In these systems, spatial frequency sampling is determined by cross-correlation between antenna pairs, making array geometry an important aspect that dictates the field of view (FOV) and image quality. This work investigates the impact of array redundancy and spatial sampling diversity on AIM image reconstruction performance. We present a comparative study of three receive array configurations, including one simple circular design and two arrays obtained through optimization strategies designed to maximize unique spatial samples while preserving system resolution and FOV. Performance is evaluated using the image-domain metrics of structural similarity index (SSIM) and peak sidelobe level (PSL), enabling a quantitative assessment of reconstruction fidelity and artifact suppression. We perform experimental validation using a 38-GHz AIM imaging system, implementing a 24-element receive array within a 48-position reconfigurable aperture. Results demonstrate that optimized array configurations improve spatial sampling efficiency and yield measurable gains in reconstruction quality compared to a conventional circular array, highlighting the importance of array design for AIM imaging systems.
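The spatial-frequency samples and a simple redundancy measure for a candidate geometry can be computed as below (conjugate-symmetric samples are folded together; the 24-element circular layout mirrors the experimental setup, while other values are illustrative):

```python
import numpy as np

def spatial_frequency_samples(positions, wavelength):
    # One (u,v) sample per antenna pair: the baseline vector in wavelengths.
    p = np.asarray(positions, dtype=float)
    i, j = np.triu_indices(len(p), k=1)
    b = (p[i] - p[j]) / wavelength
    # (u,v) and (-u,-v) are conjugate samples; canonicalize the sign first.
    sgn = np.where((b[:, 0] < 0) | ((b[:, 0] == 0) & (b[:, 1] < 0)), -1.0, 1.0)
    uniq = np.unique(np.round(b * sgn[:, None], 6), axis=0)
    redundancy = 1 - len(uniq) / len(b)       # fraction of repeated samples
    return uniq, redundancy

theta = np.linspace(0, 2 * np.pi, 24, endpoint=False)
pos = 0.1 * np.c_[np.cos(theta), np.sin(theta)]   # uniform circle, 10 cm radius
samples, rho = spatial_frequency_samples(pos, wavelength=3e8 / 38e9)
print(f"unique samples: {len(samples)}, redundancy: {rho:.2f}")
```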
Analog beamforming holds great potential for future terahertz (THz) communications due to its ability to generate high-gain directional beams with low-cost phase shifters. However, conventional analog beamforming may suffer substantial performance degradation in wideband systems due to the beam squint effect. Instead of relying on high-cost true-time delayers, we propose an efficient six-dimensional movable antenna (6DMA) architecture to mitigate the beam-squint effect. In particular, we study a wideband wide-beam coverage problem in this paper, aiming to maximize the minimum beamforming gain over a given range of azimuth/elevation angles and frequencies by jointly optimizing the analog beamforming vector, the MA positions within a two-dimensional (2D) region, and the three-dimensional (3D) rotation angles of the antenna array. However, this problem is non-convex and intractable to solve optimally due to the coupling between the spatial and frequency domains as well as among the antenna weights, positions, and rotations. To tackle this problem, we first derive an optimal solution to it in a special case with azimuth or elevation angle coverage only. It is shown that rotating a uniform linear array (ULA) is sufficient to achieve global optimality and eliminate beam-squint effects. For other, more general cases, an alternating optimization (AO) algorithm is proposed to obtain a high-quality suboptimal solution, where the antennas' beamforming weights, positions, and rotation angles are alternately optimized by combining successive convex approximation (SCA), sequential update with Gibbs sampling (GS), and hybrid coarse- and fine-grained search. Simulation results demonstrate that our proposed scheme can significantly outperform conventional antenna arrays without antenna movement or rotation, thus offering a cost-effective solution for wideband transmission over THz bands.
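The min-gain objective is easy to evaluate on an angle-frequency grid, which also exposes the squint effect for a conventional narrowband design (a ULA without movement or rotation; the carrier, sector, and band below are illustrative):

```python
import numpy as np

c = 3e8

def min_gain(w, d, angles, freqs):
    # Worst-case gain min |a(theta, f)^H w| of an N-element ULA with spacing d.
    n = np.arange(len(w))
    g = np.inf
    for f in freqs:
        a = np.exp(1j * 2 * np.pi * f / c * d * np.outer(np.sin(angles), n))
        g = min(g, np.abs(a.conj() @ w).min())
    return g

N, fc = 64, 300e9                         # 64 antennas at a 300 GHz carrier
d = c / fc / 2                            # half-wavelength spacing at fc
w = np.exp(1j * 2 * np.pi * fc / c * d * np.arange(N) * np.sin(0.3)) / np.sqrt(N)
angles = np.linspace(0.25, 0.35, 21)      # target sector around 0.3 rad
print(min_gain(w, d, angles, np.array([fc])))                # sector min at fc
print(min_gain(w, d, angles, np.linspace(280e9, 320e9, 9)))  # squint-degraded
```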
Semantic communication has emerged as a promising paradigm for improving transmission efficiency and task-level reliability, yet most existing reliability-enhancement approaches rely on retransmission strategies driven by semantic fidelity checking that require additional check codewords solely for retransmission triggering, thereby incurring substantial communication overhead. In this paper, we propose S3CHARQ, a Joint Source-Channel-Check Coding (JS3C) framework with hybrid automatic repeat request (HARQ) that fundamentally rethinks the role of check codewords in semantic communications. By integrating the check codeword into the joint source-channel coding (JSCC) process, S3CHARQ enables JS3C, allowing the check codeword to simultaneously support semantic fidelity verification and reconstruction enhancement. At the transmitter, a semantic fidelity-aware check encoder embeds auxiliary reconstruction information into the check codeword. At the receiver, the JSCC and check codewords are jointly decoded by a JS3C decoder, while the check codeword is additionally exploited for perceptual quality estimation. Moreover, because retransmission decisions are necessarily based on imperfect semantic quality estimation in the absence of ground-truth reconstruction, estimation errors are unavoidable and fundamentally limit the effectiveness of rule-based decision schemes. To overcome this limitation, we develop a reinforcement learning-based retransmission decision module that enables adaptive, sample-level retransmission decisions, effectively balancing recovery and refinement information under dynamic channel conditions. Experimental results demonstrate that compared with existing HARQ-based semantic communication systems, the proposed S3CHARQ framework achieves a 2.36 dB improvement in the 97th percentile PSNR, as well as a 37.45% reduction in outage probability.
Collaboration is a central theme in multi-robot systems as tasks and demands increasingly require capabilities that go beyond what any one individual robot possesses. Yet, despite extensive work on cooperative control and coordinated behaviors, the terminology surrounding collective multi-robot interaction remains inconsistent across research communities. In particular, cooperation, coordination, and collaboration are often treated interchangeably, without clearly articulating the differences among them. To address this gap, we propose definitions that distinguish and relate cooperation, coordination, and collaboration in multi-robot systems, highlighting the support of new capabilities in collaborative behaviors, and illustrate these concepts through representative examples. Building on this taxonomy, different frameworks for collaboration are reviewed, and technical challenges and promising future research directions are identified for collaborative multi-robot systems.
This paper addresses the modeling gap between complex dispersive-medium characterization and clutter statistical analysis in single-snapshot frequency diverse array multiple-input multiple-output ground-penetrating radar (FDA-MIMO-GPR). Existing FDA-MIMO clutter studies have rarely incorporated subsurface dispersion, dissipation, and random inhomogeneity in an explicit statistical framework. To bridge this gap, a continuous relaxation spectrum is adopted to describe complex media, and a statistical propagation chain is established from random relaxation-spectrum perturbations to complex permittivity, complex wavenumber, steering-vector perturbation, medium-induced additional clutter covariance, and total clutter covariance. On this basis, the effects of medium randomness on covariance spectral spreading, effective rank, effective clutter-subspace dimension, and target-clutter separability are further characterized. Numerical results show close agreement between the derived theory and Monte Carlo sample statistics across multiple stages of the propagation chain. The results further indicate that medium uncertainty not only changes clutter-covariance entries, but also reshapes its eigenspectrum and effective subspace, thereby influencing the geometric separation between target and clutter. The study provides an explicit and interpretable theoretical interface for embedding complex-medium uncertainty into FDA-MIMO-GPR clutter statistical analysis.
Highly directional mmWave/THz links require rapid beam alignment, yet exhaustive codebook sweeps incur prohibitive training overhead. This letter proposes a sensing-assisted adaptive probing policy that maps multimodal sensing (radar/LiDAR/camera) to a calibrated prior over beams, predicts per-beam reward with a deep Q-ensemble whose disagreement serves as a practical epistemic-uncertainty proxy, and schedules a small probe set using a Prior-Q upper-confidence score. The probing budget is adapted from the prior entropy, explicitly coupling sensing confidence to communication overhead, while a margin-based safety rule prevents low signal-to-noise ratio (SNR) locks. Experiments on DeepSense-6G (train: scenarios 42 and 44; test: 43) with a 21-beam discrete Fourier transform (DFT) codebook achieve Top-1/Top-3 accuracies of 0.81/0.99 with an expected probe count of 2 beams per sweep and zero observed outages at $\theta = 0$ dB with margin $\Delta = 3$ dB. The results show that multimodal priors with ensemble uncertainty match link quality and improve reliability compared to ablations, while cutting overhead through a better predictive model.
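A heavily simplified sketch of the scheduling step; the exact score, calibration, and budget rule are assumptions for illustration:

```python
import numpy as np

def select_probes(prior, q_ensemble, beta=1.0, k_max=8):
    # Prior-Q upper-confidence style score: prior log-probability + ensemble-
    # mean reward + a disagreement bonus; probe the top-k beams, with the
    # budget k grown from the prior's entropy (confident prior -> fewer probes).
    q_mean, q_std = q_ensemble.mean(axis=0), q_ensemble.std(axis=0)
    score = np.log(prior + 1e-12) + q_mean + beta * q_std
    entropy = -(prior * np.log(prior + 1e-12)).sum()
    k = int(np.clip(np.ceil(entropy / np.log(2)), 1, k_max))
    return np.argsort(score)[-k:][::-1]

rng = np.random.default_rng(0)
prior = rng.dirichlet(np.ones(21))        # sensing-derived prior over 21 DFT beams
q_ens = rng.normal(size=(5, 21))          # 5-member deep Q-ensemble predictions
print(select_probes(prior, q_ens))
```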
Point cloud compression often introduces noticeable reconstruction artifacts, which makes quality enhancement necessary. Existing approaches typically assume prior knowledge of the distortion level and train multiple models with identical architectures, each designed for a specific distortion setting. This significantly limits their practical applicability in scenarios where the distortion level is unknown and computational resources are limited. To overcome these limitations, we propose the first blind quality enhancement (BQE) model for compressed dynamic point clouds. BQE enhances compressed point clouds under unknown distortion levels by exploiting temporal dependencies and jointly modeling feature similarity and differences across multiple distortion levels. It consists of a joint progressive feature extraction branch and an adaptive feature fusion branch. In the joint progressive feature extraction branch, consecutive reconstructed frames are first fed into a recoloring-based motion compensation module to generate temporally aligned virtual reference frames. These frames are then fused by a temporal correlation-guided cross-attention module and processed by a progressive feature extraction module to obtain hierarchical features at different distortion levels. In the adaptive feature fusion branch, the current reconstructed frame is input to a quality estimation module to predict a weighting distribution that guides the adaptive weighted fusion of these hierarchical features. When applied to the latest geometry-based point cloud compression (G-PCC) reference software, i.e., test model category 13 version 28, BQE achieved average PSNR improvements of 0.535 dB, 0.403 dB, and 0.453 dB, with BD-rates of -17.4%, -20.5%, and -20.1% for the Luma, Cb, and Cr components, respectively.
General audio understanding is a fundamental goal for large audio-language models, with audio captioning serving as a cornerstone task for their development. However, progress in this domain is hindered by existing datasets, which lack the scale and descriptive granularity required to train truly versatile models. To address this gap, we introduce ACAVCaps, a new large-scale, fine-grained, and multi-faceted audio captioning dataset. Derived from the ACAV100M collection, ACAVCaps is constructed using a multi-expert pipeline that analyzes audio from diverse perspectives-including speech, music, and acoustic properties-which are then synthesized into rich, detailed descriptions by a large language model. Experimental results demonstrate that models pre-trained on ACAVCaps exhibit substantially stronger generalization capabilities on various downstream tasks compared to those trained on other leading captioning datasets. The dataset is available at this https URL.
This work presents a fully discrete, low-cost digital isolator requiring no specialized ICs and implemented entirely with general-purpose transistors and a two-layer PCB-embedded air-core transformer. The design avoids vendor lock-in and long-term component obsolescence risks, while providing >1 kV isolation, ~200 ns propagation delay, and validated NRZ data rates of 1 Mbps. A modified dual-oscillator architecture enables inherent hardware lockout suitable for half-bridge gate driver applications. Measured performance and PCB layout guidelines are provided.
The sub-THz spectrum offers numerous advantages and, combined with massive multiple-input multiple-output (MIMO) technology and large antenna arrays, can enhance the spectral efficiency (SE) of future systems. Hybrid precoding (HP) thus emerges as a cost-effective alternative to fully digital precoding in terms of complexity and energy consumption. However, sub-THz frequencies introduce hardware challenges, particularly phase noise (PN) from local oscillators (LOs). We analyze the impact of PN on MIMO systems using HP, leveraging singular value decomposition and a common LO architecture. We adopt the Gaussian PN (GPN) model, recognized as accurate for describing PN behavior in sub-THz transmissions. We derive a lower bound on the achievable SE and provide closed-form bit error rate expressions for quadrature amplitude modulation (QAM), specifically 4-QAM and 16-QAM, under high-SNR and strong GPN conditions. These analytical results are validated through Monte Carlo simulations. We show that GPN can be effectively counteracted with a single pilot symbol in single-user MIMO systems, unlike single-input single-output systems where mitigation proves infeasible. Simulation results compare conventional QAM against polar-QAM tailored for GPN-impaired systems. Finally, we introduce perspectives for further improvements in performance and energy efficiency.
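A minimal Monte Carlo check of the GPN model on 4-QAM, of the kind used to validate such closed-form BER expressions (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, snr_db, sigma_pn = 100_000, 12.0, 0.05      # symbols, SNR [dB], PN std [rad]
bits = rng.integers(0, 2, (n, 2))
s = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)  # unit-power 4-QAM
noise = (rng.normal(size=n) + 1j * rng.normal(size=n)) * np.sqrt(10 ** (-snr_db / 10) / 2)
r = s * np.exp(1j * rng.normal(0, sigma_pn, n)) + noise   # Gaussian phase noise
bits_hat = np.c_[r.real > 0, r.imag > 0]                  # per-quadrant decisions
print(f"BER: {np.mean(bits_hat != bits):.2e}")
```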
Individual head-related transfer functions (HRTFs) are essential for accurate spatial audio binaural rendering but remain difficult to obtain due to measurement complexity. This study investigates whether photogrammetry-reconstructed (PR) head and ear meshes, acquired with consumer hardware, can provide a practically useful baseline for individual HRTF synthesis. Using the SONICOM HRTF dataset, 72-image photogrammetry captures per subject were processed with Apple's Object Capture API to generate PR meshes for 150 subjects. Mesh2HRTF was used to compute PR synthetic HRTFs, which were compared against measured HRTFs, high-resolution 3D scan-derived HRTFs, KEMAR, and random HRTFs through numerical evaluation, auditory models, and a behavioural sound localisation experiment (N = 27). PR synthetic HRTFs preserved ITD cues but exhibited increased ILD and spectral errors. Auditory-model predictions and behavioural data showed substantially higher quadrant error rates, reduced elevation accuracy, and greater front-back confusions than measured HRTFs, performing worse than random HRTFs on perceptual metrics. Current photogrammetry pipelines can thus support individual HRTF synthesis in principle, but they lack the pinna-morphology detail and high-frequency spectral fidelity needed for accurate individual HRTFs containing monaural cues.
Multi-modal Satellite Image Time Series (SITS) analysis faces significant computational challenges for live land monitoring applications. While Transformer architectures excel at capturing temporal dependencies and fusing multi-modal data, their quadratic computational complexity and the need to reprocess entire sequences for each new acquisition limit their deployment for regular, large-area monitoring. This paper studies various dual-form attention mechanisms for efficient multi-modal SITS analysis that enable parallel training while supporting recurrent inference for incremental processing. We compare linear attention and retention mechanisms within a multi-modal spectro-temporal encoder. To address the SITS-specific challenges of temporal irregularity and misalignment, we develop temporal adaptations of dual-form mechanisms that compute token distances based on actual acquisition dates rather than sequence indices. Our approach is evaluated on two tasks using Sentinel-1 and Sentinel-2 data: multi-modal SITS forecasting as a proxy task, and real-world solar panel construction monitoring. Experimental results demonstrate that dual-form mechanisms achieve performance comparable to standard Transformers while enabling efficient recurrent inference. The multimodal framework consistently outperforms mono-modal approaches across both tasks, demonstrating the effectiveness of dual mechanisms for sensor fusion. The results presented in this work open new opportunities for operational land monitoring systems requiring regular updates over large geographic areas.
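The key property is that the dual form admits a recurrent update whose decay depends on real acquisition-date gaps rather than sequence indices; a simplified single-head sketch follows (the paper's exact mechanism may differ):

```python
import numpy as np

def recurrent_retention(q, k, v, dates, lam=0.1):
    # Recurrent inference form: one O(d^2) state update per new acquisition,
    # with decay exp(-lam * day_gap) instead of an index-based discount.
    d = q.shape[1]
    S = np.zeros((d, d))
    outputs, t_prev = [], dates[0]
    for qt, kt, vt, t in zip(q, k, v, dates):
        S = np.exp(-lam * (t - t_prev)) * S + np.outer(kt, vt)
        outputs.append(qt @ S)
        t_prev = t
    return np.array(outputs)

rng = np.random.default_rng(0)
T, d = 6, 16
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
dates = np.cumsum(rng.integers(3, 20, T))     # irregular revisit gaps in days
y = recurrent_retention(q, k, v, dates)       # new images extend y incrementally
```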
Open-source text-to-speech (TTS) frameworks have emerged as highly adaptable platforms for developing speech synthesis systems across a wide range of languages. However, their applicability is not uniform -- particularly when the target language is under-resourced or when computational resources are constrained. In this study, we systematically assess the feasibility of building novel TTS models using four widely adopted open-source architectures: FastPitch, VITS, Grad-TTS, and Matcha-TTS. Our evaluation spans multiple dimensions, including qualitative aspects such as ease of installation, dataset preparation, and hardware requirements, as well as quantitative assessments of synthesis quality for Romanian. We employ both objective metrics and subjective listening tests to evaluate intelligibility, speaker similarity, and naturalness of the generated speech. The results reveal significant challenges in tool chain setup, data preprocessing, and computational efficiency, which can hinder adoption in low-resource contexts. By grounding the analysis in reproducible protocols and accessible evaluation criteria, this work aims to inform best practices and promote more inclusive, language-diverse TTS development. All information needed to reproduce this study (i.e., code and data) is available in our git repository: this https URL
Capturing dynamic spatiotemporal neural activity is essential for understanding large-scale brain mechanisms. Functional magnetic resonance imaging (fMRI) provides high-resolution cortical representations that form a strong basis for characterizing fine-grained brain activity patterns. The high acquisition cost of fMRI limits large-scale applications, making high-quality fMRI reconstruction a crucial task. Electroencephalography (EEG) offers millisecond-level temporal cues that complement fMRI. Leveraging this complementarity, we present an EEG-conditioned framework for reconstructing dynamic fMRI as continuous neural sequences with high spatial fidelity and strong temporal coherence at the cortical-vertex level. To address sampling irregularities common in real fMRI acquisitions, we incorporate a null-space intermediate-frame reconstruction, enabling measurement-consistent completion of arbitrary intermediate frames and improving sequence continuity and practical applicability. Experiments on the CineBrain dataset demonstrate superior voxel-wise reconstruction quality and robust temporal consistency across whole-brain and functionally specific regions. The reconstructed fMRI also preserves essential functional information, supporting downstream visual decoding tasks. This work provides a new pathway for estimating high-resolution fMRI dynamics from EEG and advances multimodal neuroimaging toward more dynamic brain activity modeling.
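The null-space construction admits a compact linear-algebra sketch: the range-space component of each intermediate frame is pinned to the available measurements, and only the null-space component is supplied by the network (here a random stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 100                        # measurements << vertex dimension
A = rng.normal(size=(m, n))           # known sampling/degradation operator
x_true = rng.normal(size=n)
y = A @ x_true                        # observed measurements of the frame

A_pinv = np.linalg.pinv(A)
x_net = rng.normal(size=n)            # stand-in for the network's frame estimate
x_hat = A_pinv @ y + (np.eye(n) - A_pinv @ A) @ x_net   # null-space completion
assert np.allclose(A @ x_hat, y)      # measurement consistency by construction
```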
Safe navigation in complex environments remains a central challenge for reinforcement learning (RL) in robotics. This paper introduces Continuous Space-Time Empowerment for Physics-informed (C-STEP) safe RL, a novel measure of agent-centric safety tailored to deterministic, continuous domains. This measure can be used to design physics-informed intrinsic rewards by augmenting positive navigation reward functions. The reward incorporates the agent's internal states (e.g., initial velocity) and forward dynamics to differentiate safe from risky behavior. By integrating C-STEP with navigation rewards, we obtain an intrinsic reward function that jointly optimizes task completion and collision avoidance. Numerical results demonstrate fewer collisions, reduced proximity to obstacles, and only marginal increases in travel time. Overall, C-STEP offers an interpretable, physics-informed approach to reward shaping in RL, contributing to safety for agentic mobile robotic systems.
Low-altitude wireless platforms increasingly require lightweight, conformal, and densely sampled antenna array apertures with high array gain and spatial selectivity. However, when deployed on nonplanar surfaces, curvature alters the array manifold, local visibility, and propagation support, potentially invalidating spatial-stationarity assumptions. In this paper, we investigate a holographic curvature-reconfigurable aperture (HoloCuRA), modeled as a curvature-controllable holographic surface, and develop a visibility-aware spatial characterization framework for its low-altitude applications. Specifically, the framework jointly quantifies array-domain spatial non-stationarity (SnS) and spatial degrees of freedom (DoF) in line-of-sight, 3GPP non-line-of-sight, and isotropic-scattering propagation environments. For SnS, a novel Power-balanced, Visibility-aware Correlation-Matrix Distance (PoVi-CMD) and a two-stage subarray-screening procedure are introduced. For DoF, the Rényi-2 effective rank is adopted, and tractable spatial-correlation expressions under isotropic scattering are developed for efficient DoF analysis. Furthermore, a realizable antenna port mode is introduced to connect SnS with DoF. Numerical results reveal that curvature and propagation support are the primary determinants of both SnS and DoF in HoloCuRA: array-domain SnS determines whether subarray statistics can be treated as locally consistent, whereas DoF limits the global spatial modes. The findings provide useful guidance for low-altitude antenna-system design.
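The Rényi-2 effective rank used for the DoF analysis has a simple closed form, the participation ratio of the eigenvalue spectrum:

```python
import numpy as np

def renyi2_effective_rank(R):
    # exp of the order-2 Renyi entropy of the normalized eigenvalues:
    # (sum of eigenvalues)^2 / (sum of squared eigenvalues).
    lam = np.clip(np.linalg.eigvalsh(R), 0, None)
    return lam.sum() ** 2 / (lam**2).sum()

assert np.isclose(renyi2_effective_rank(np.eye(8)), 8.0)   # full-rank identity
u = np.ones((8, 1))
assert np.isclose(renyi2_effective_rank(u @ u.T), 1.0)     # rank-1 correlation
```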
The proliferation of civilian and commercial unmanned aerial vehicles (UAVs) has heightened the demand for reliable radio frequency (RF)-based drone identification systems that can operate under dynamic and uncertain airspace conditions. Most existing RF-based recognition methods adopt a closed-set assumption, where all UAV types are known during training. Such an assumption becomes unrealistic in practical deployments, as new or unknown UAVs frequently emerge, leading to overconfident misclassifications and inefficient retraining cycles. To address these challenges, this paper proposes a unified incremental open-set learning framework for RF-based UAV recognition that enables both novel class discovery and incremental adaptation. The framework first performs open-set recognition to separate unknown signals from known classes in the semantic feature space, followed by an unsupervised clustering module that discovers new UAV categories by selecting between K-Means and Gaussian Mixture Models (GMM) based on composite validity scores. Subsequently, a lightweight incremental learning module integrates the newly discovered classes through a memory-bounded replay mechanism that mitigates catastrophic forgetting. Experiments on a real-world UAV RF dataset comprising 24 classes (18 known and 6 unknown) show effective open-set detection, promising clustering performance under the evaluated noise settings, and stable incremental adaptation with minimal storage cost, supporting the potential of the proposed framework for open-world UAV recognition.
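A runnable sketch of the discovery step with scikit-learn; the composite validity score shown is an illustrative stand-in for the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import calinski_harabasz_score, silhouette_score

def discover_new_classes(features, k_range=range(2, 7)):
    # Fit K-Means and GMM over candidate cluster counts and keep the model/k
    # with the best composite validity score (illustrative combination).
    best = (None, None, -np.inf)
    for k in k_range:
        for model in (KMeans(n_clusters=k, n_init=10, random_state=0),
                      GaussianMixture(n_components=k, random_state=0)):
            labels = model.fit_predict(features)
            score = (silhouette_score(features, labels)
                     * np.log1p(calinski_harabasz_score(features, labels)))
            if score > best[2]:
                best = (model, k, score)
    return best

features = np.random.default_rng(0).normal(size=(300, 16))  # unknown-signal embeddings
model, k, score = discover_new_classes(features)
```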
A unified structural framework is presented for model-based fault diagnosis that explicitly incorporates both fault locations and constraints imposed by the residual generation methodology. Building on the concepts of proper and minimal structurally overdetermined (PSO/MSO) sets and Test Equation Supports (TES/MTES), the framework introduces testable PSO sets, Residual Generation (RG) sets, irreducible fault signatures (IFS), and Irreducible RG (IRG) sets to characterize which submodels are suitable for residual generation under given computational restrictions. An operator $M^*$ is defined to extract, from any model, the largest testable PSO subset consistent with a specified residual generation method. Using this operator, an algorithm is developed to compute all RG sets, and it is shown that irreducible fault signature sets form the join-irreducible elements of a join-semilattice of sets and fully capture the multiple-fault isolability properties in the method-constrained setting. The approach is exemplified on a semi-explicit linear DAE model, where low structural differential index can be used to define $M^*$. The results demonstrate that the proposed framework generalizes MTES-based analysis to residual generation scenarios with explicit computational limitations.
The International Telecommunication Union (ITU) identifies "Artificial Intelligence (AI) and Communication" as one of six key usage scenarios for 6G. Agentic AI, characterized by its capabilities in multi-modal environmental sensing, complex task coordination, and continuous self-optimization, is anticipated to drive the evolution toward agent-based communication networks. Semantic communication (SemCom), in turn, has emerged as a transformative paradigm that offers task-oriented efficiency, enhanced reliability in complex environments, and dynamic adaptation in resource allocation. However, comprehensive reviews that trace their technological evolution in the contexts of agent communications remain scarce. Addressing this gap, this paper systematically explores the role of semantics in agent communication networks. We first propose a novel architecture for semantic-based agent communication networks, structured into three layers, four entities, and four stages. Three wireless agent network layers define the logical structure and organization of entity interactions: the intention extraction and understanding layer, the semantic encoding and processing layer, and the distributed autonomy and collaboration layer. Across these layers, four AI agent entities, namely embodied agents, communication agents, network agents, and application agents, coexist and perform distinct tasks. Furthermore, four operational stages of semantic-enhanced agentic AI systems, namely perception, memory, reasoning, and action, form a cognitive cycle guiding agent behavior. Based on the proposed architecture, we provide a comprehensive review of the state-of-the-art on how semantics enhance agent communication networks. Finally, we identify key challenges and present potential solutions to offer directional guidance for future research in this emerging field.
Interleaved ADCs are critical for applications requiring multi-gigasample per second (GS/s) rates, but their performance is often limited by offset, gain, and timing skew mismatches across the sub-ADCs. We propose exact but compact expressions that describe the impact of each of those non-idealities on the output spectrum. We derive the distribution of the power of the induced spurs and replicas, critical for yield-oriented derivation of sub-ADC specifications. Finally, we provide a practical example in which calibration step sizes are derived under the constraint of a target production yield.
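The qualitative spur structure behind those expressions can be reproduced in a few lines of NumPy (4-way interleaving; mismatch magnitudes are illustrative): offset mismatch produces spurs at k·fs/M, while gain and timing skew produce spurs at fin ± k·fs/M.

```python
import numpy as np

M, N, fs = 4, 4096, 1.0                   # 4 sub-ADCs, normalized sample rate
fin = 101 / N * fs                        # coherent input frequency
t = np.arange(N) / fs

offset = np.array([0.0, 1e-3, -2e-3, 1.5e-3])      # per-channel offsets
gain = np.array([1.0, 1.002, 0.998, 1.001])        # per-channel gains
skew = np.array([0.0, 2e-3, -1e-3, 3e-3]) / fs     # per-channel timing skews

y = np.empty(N)
for m in range(M):                        # sub-ADC m samples every M-th point
    idx = np.arange(m, N, M)
    y[idx] = gain[m] * np.sin(2 * np.pi * fin * (t[idx] + skew[m])) + offset[m]

spec = 20 * np.log10(np.abs(np.fft.rfft(y * np.hanning(N))) + 1e-12)
# Spurs: offset at k*fs/M; gain/skew at fin +/- k*fs/M, visible above the floor.
```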
Multi-channel speech enhancement aims to recover clean speech from noisy multi-channel recordings. Most deep learning methods employ discriminative training, which can lead to non-linear distortions from regression-based objectives, especially under challenging environmental noise conditions. Inspired by ArrayDPS for unsupervised multi-channel source separation, we introduce ArrayDPS-Refine, a method designed to enhance the outputs of discriminative models using a clean speech diffusion prior. ArrayDPS-Refine is training-free, generative, and array-agnostic. It first estimates the noise spatial covariance matrix (SCM) from the enhanced speech produced by a discriminative model, then uses this estimated noise SCM for diffusion posterior sampling. This approach allows direct refinement of any discriminative model's output without retraining. Our results show that ArrayDPS-Refine consistently improves the performance of various discriminative models, including state-of-the-art waveform and STFT domain models. Audio demos are provided at this https URL.
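The SCM estimation step has a compact form: subtract the discriminative estimate from the mixture and average outer products across frames per frequency bin (shapes and values below are illustrative):

```python
import numpy as np

def estimate_noise_scm(Y, S_hat, eps=1e-6):
    # Y, S_hat: (channels, freqs, frames) complex STFTs of the mixture and the
    # discriminative model's enhanced speech. The residual N = Y - S_hat is
    # treated as noise; average n n^H over frames, with diagonal loading.
    N = Y - S_hat
    C, F, T = N.shape
    scm = np.einsum('cft,dft->fcd', N, N.conj()) / T
    return scm + eps * np.eye(C)[None, :, :]

rng = np.random.default_rng(0)
Y = rng.normal(size=(4, 257, 100)) + 1j * rng.normal(size=(4, 257, 100))
S_hat = 0.8 * Y                            # stand-in for a discriminative output
Phi = estimate_noise_scm(Y, S_hat)         # (freqs, C, C), used for posterior sampling
```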
The rapid deployment of distributed energy resources (DERs) is one of the essential efforts to mitigate global climate change. However, a vast number of small-scale DERs are difficult to manage individually, motivating the introduction of virtual power plants (VPPs). A VPP operator coordinates a group of DERs by setting suitable prices, and aggregates them for interaction with the power grid. In this context, optimal pricing plays a critical role in VPP operation. This paper proposes a robust optimal operation model for VPPs that considers uncertainty in the price elasticity of demand. Specifically, the demand elasticity is found to be influenced by the pricing decision, giving rise to decision-dependent uncertainty (DDU). An improved column-and-constraint generation (C&CG) algorithm, together with tailored transformation and reformulation techniques, is developed to solve the robust model with DDU efficiently. Case studies based on actual electricity consumption data of London households demonstrate the effectiveness of the proposed model and algorithm.
The deployment of extremely large-scale antenna arrays (ELAA) in sixth-generation (6G) communication systems introduces unique challenges for efficient near-field channel estimation. To tackle these issues, this paper presents a theory-guided approach that incorporates angular information into an attention-based estimation framework. A piecewise Fourier representation is proposed to implicitly encode the near-field channel's inherent nonlinearity, enabling the entire channel to be segmented into multiple subchannels, each mapped to the angular domain via the discrete Fourier transform (DFT). Then, we develop a joint subchannel-spatial-attention network (JSSAnet) to extract the spatial features both within and across subchannels. To theoretically guide the design of the joint attention mechanism, we derive upper and lower bounds based on approximation criteria and DFT quantization loss mitigation, respectively. Guided by these bounds, a JSSA layer of an attention block is constructed to assign independent and adaptive spatial attention weights to each subchannel in parallel. Subsequently, a feed-forward network (FFN) of an attention block further captures and refines the residual nonlinear dependencies across subchannels. Moreover, the proposed JSSA map is linearly computed via an element-wise product combining large-kernel convolutions (DLKC), maintaining strong contextual learning capability. Numerical results verify the effectiveness of embedding sparsity information into the attention network and demonstrate that JSSAnet achieves superior estimation performance compared with existing methods.
Communication-aware control is essential to reduce costs and complexity in large-scale networks. This work proposes a method to design dissipativity-augmented output feedback controllers with reduced online communication. The contributions of this work are threefold: a generalized well-posedness condition for the controller network, a convex relaxation for the constraints that infer stability of a network from dissipativity of its agents, and a synthesis algorithm integrating the Network Dissipativity Theorem, the alternating direction method of multipliers, and iterative convex overbounding. The proposed approach yields a sparsely interconnected controller that is both robust and applicable to networks with heterogeneous nonlinear agents. The efficiency of these methods is demonstrated on heterogeneous networks with uncertain and unstable agents, and is compared to standard $\mathcal{H}_\infty$ control.
This work presents a modular, Python-based simulator that simplifies the evaluation of novel vehicle control and coordination algorithms in complex traffic scenarios while keeping the implementation overhead low. It allows researchers to focus primarily on developing the control and coordination strategies themselves, while the simulator manages the setup of complex road networks, vehicle configuration, execution of the simulation and the generation of video visualizations of the results. It is thereby also well-suited to support control education by allowing instructors to create interactive exercises providing students with direct visual feedback. Thanks to its modular architecture, the simulator remains easily customizable and extensible, lowering the barrier for conducting advanced simulation studies in vehicle and traffic control research.
Time delays in feedback control loops can cause controllers to respond too late, and with excessively large corrective actions, leading to unsafe behavior (violation of state constraints) and controller infeasibility (violation of input constraints). To address this problem, we develop a safety-critical control framework for nonlinear systems with input delay using dynamically defined (integral) controllers. Building on the concept of Integral Control Barrier Functions (ICBFs), we concurrently address two fundamental challenges: compensating the effect of delays, while ensuring feasibility when state and input constraints are imposed jointly. To this end, we embed predictor feedback into a dynamically defined control law to compensate for delays, with the predicted state evolving according to delay-free dynamics. Then, utilizing ICBFs, we formulate a quadratic program for safe control design. For systems subject to simultaneous state and input constraints, we derive a closed-form feasibility condition for the resulting controller, yielding a compatible ICBF pair that guarantees forward invariance under delay. We also address robustness to prediction errors (e.g., caused by delay uncertainty) using tunable robust ICBFs. Our approach is validated on an adaptive cruise control example with actuation delay.
Regenerating singing voices with altered lyrics while preserving melody consistency remains challenging, as existing methods either offer limited controllability or require laborious manual alignment. We propose YingMusic-Singer, a fully diffusion-based model enabling melody-controllable singing voice synthesis with flexible lyric manipulation. The model takes three inputs: an optional timbre reference, a melody-providing singing clip, and modified lyrics, without manual alignment. Trained with curriculum learning and Group Relative Policy Optimization, YingMusic-Singer achieves stronger melody preservation and lyric adherence than Vevo2, the most comparable baseline supporting melody control without manual alignment. We also introduce LyricEditBench, the first benchmark for melody-preserving lyric modification evaluation. The code, weights, benchmark, and demos are publicly available at this https URL.
Peak-to-average power ratio (PAPR) remains a major limitation of multicarrier modulation schemes such as orthogonal frequency-division multiplexing (OFDM), reducing power amplifier efficiency and limiting practical transmit power. In this work, we propose DeepOFW, a deep learning-driven OFDM-flexible waveform modulation framework that enables data-driven waveform design while preserving the low-complexity hardware structure of conventional transceivers. The proposed architecture is fully differentiable, allowing end-to-end optimization of waveform generation and receiver processing under practical physical constraints. Unlike neural transceiver approaches that require deep learning inference at both ends of the link, DeepOFW confines the learning stage to an offline or centralized unit, enabling deployment on standard transmitter and receiver hardware without additional computational overhead. The framework jointly optimizes waveform representations and detection parameters while explicitly incorporating PAPR constraints during training. Extensive simulations over 3GPP multipath channels demonstrate that the learned waveforms significantly reduce PAPR compared with classical OFDM while simultaneously improving bit error rate (BER) performance relative to state-of-the-art transmission schemes. These results highlight the potential of data-driven waveform design to enhance multicarrier communication systems while maintaining hardware-efficient implementations. An open-source implementation of the proposed framework is released to facilitate reproducible research and practical adoption.
We introduce Echoes, a new dataset for music deepfake detection designed for training and benchmarking detectors under realistic and provider-diverse conditions. Echoes comprises 3,577 tracks (110 hours of audio) spanning multiple genres (pop, rock, electronic), and includes content generated by ten popular AI music generation systems. To prevent shortcut learning and promote robust generalization, the dataset is deliberately constructed to be challenging, enforcing semantic-level alignment between spoofed audio and bona fide references. This alignment is achieved by conditioning generated audio samples directly on bona-fide waveforms or song descriptors. We evaluate Echoes in a cross-dataset setting against three existing AI-generated music datasets using state-of-the-art Wav2Vec2 XLS-R 2B representations. Results show that (i) Echoes is the hardest in-domain dataset; (ii) detectors trained on existing datasets transfer poorly to Echoes; (iii) training on Echoes yields the strongest generalization performance. These findings suggest that provider diversity and semantic alignment help learn more transferable detection cues.
During human motor skill training and physical rehabilitation, there is an inherent trade-off between task difficulty and user performance. Characterizing this trade-off is crucial for evaluating user performance, designing assist-as-needed (AAN) protocols, and assessing the efficacy of training protocols. In this study, we propose a novel human-in-the-loop (HiL) Pareto optimization approach to characterize the trade-off between task performance and the perceived challenge level of motor learning or rehabilitation tasks. We adapt Bayesian multi-criteria optimization to systematically and efficiently perform HiL Pareto characterizations. Our HiL optimization employs a hybrid model that measures performance with a quantitative metric, while the perceived challenge level is captured with a qualitative metric. We demonstrate the feasibility of the proposed HiL Pareto characterization through a user study. Furthermore, we present the utility of the framework through three use cases in the context of a manual skill training task with haptic feedback. First, we demonstrate how the characterized trade-off can be used to design a sample AAN training protocol for a motor learning task and to evaluate the group-level efficacy of the proposed AAN protocol relative to a baseline adaptive assistance protocol. Second, we demonstrate that individual-level comparisons of the trade-offs characterized before and after the training session enable fair evaluation of training progress under different assistance levels. This evaluation method is more general than standard performance evaluations, as it can provide insights even when users cannot perform the task without assistance. Third, we show that the characterized trade-offs also enable fair performance comparisons among different users, as they capture the best possible performance of each user under all feasible assistance levels.
Deep neural networks (DNNs), particularly those using Rectified Linear Unit (ReLU) activation functions, have achieved remarkable success across diverse machine learning tasks, including image recognition, audio processing, and language modeling. Despite this success, the non-convex nature of DNN loss functions complicates optimization and limits theoretical understanding. In this paper, we highlight how recently developed convex equivalences of ReLU NNs and their connections to sparse signal processing models can address the challenges of training and understanding NNs. Recent research has uncovered several hidden convexities in the loss landscapes of certain NN architectures, notably two-layer ReLU networks and other deeper or varied architectures. This paper seeks to provide an accessible and educational overview that bridges recent advances in the mathematics of deep learning with traditional signal processing, encouraging broader signal processing applications.
This paper presents an open-source Software-in-the-Loop (SIL) simulation platform designed for autonomous Ackermann vehicle research and education. The proposed framework focuses on simplicity while making it easy to work with small-scale experimental setups, such as the XTENTH-CAR platform. The system was designed using open-source tools, creating an environment with a monocular camera vision system that captures visual stimuli with minimal computational overhead through a sliding-window-based lane detection method. The platform supports a flexible algorithm testing and validation environment, allowing researchers to implement and compare various control strategies within an easy-to-use virtual environment. To validate the platform, Model Predictive Control (MPC) and Proportional-Integral-Derivative (PID) algorithms were implemented within the SIL framework. The results confirm that the platform provides a reliable environment for algorithm verification, making it an ideal tool for future multi-agent system research, educational purposes, and low-cost AGV development. Our code is available at this https URL.
Recent advances in real-time interactive text-driven motion generation have enabled humanoids to perform diverse behaviors. However, kinematics-only generators often exhibit physical hallucinations, producing motion trajectories that are physically infeasible to track with a downstream motion tracking controller or unsafe for real-world deployment. These failures often arise from the lack of explicit physics-aware objectives for real-robot execution and become more severe under out-of-distribution (OOD) user inputs. Hence, we propose SafeFlow, a text-driven humanoid whole-body control framework that combines physics-guided motion generation with a 3-Stage Safety Gate driven by explicit risk indicators. SafeFlow adopts a two-level architecture. At the high level, we generate motion trajectories using Physics-Guided Rectified Flow Matching in a VAE latent space to improve real-robot executability, and further accelerate sampling via Reflow to reduce the number of function evaluations (NFE) for real-time control. The 3-Stage Safety Gate enables selective execution by detecting semantic OOD prompts using a Mahalanobis score in text-embedding space, filtering unstable generations via a directional sensitivity discrepancy metric, and enforcing final hard kinematic constraints such as joint and velocity limits before passing the generated trajectory to a low-level motion tracking controller. Extensive experiments on the Unitree G1 demonstrate that SafeFlow outperforms prior diffusion-based methods in success rate, physical compliance, and inference speed, while maintaining diverse expressiveness.
Deploying six-dimensional movable antenna (6DMA) systems in Internet-of-Vehicles (IoV) scenarios can greatly enhance spectral efficiency. However, the high mobility of vehicles causes rapid spatio-temporal channel variations, posing a significant challenge to real-time 6DMA optimization. In this work, we pioneer the application of 6DMA in IoV and propose a low-complexity, instantaneous channel state information (CSI)-free dynamic configuration method. By integrating vehicle motion prediction with offline directional response priors, the proposed approach optimizes antenna positions and orientations at each reconfiguration epoch to maximize the average sum rate over a future time window. Simulation results in a typical urban intersection scenario demonstrate that the proposed 6DMA scheme significantly outperforms conventional fixed antenna arrays and simplified 6DMA baseline schemes in terms of total sum rate.
Sensor placement for leakage detection in water distribution networks is an important and practical challenge for water utilities. Recent work has shown that graph neural networks (GNNs) can estimate and predict pressures and detect leaks, but their performance strongly depends on the available sensor measurements and configurations. In this paper, we investigate how sensor placement influences the performance of GNN-based leakage detection. We propose a novel PageRank-centrality-based sensor placement method and demonstrate that the choice of placement substantially affects pressure reconstruction, prediction, and leakage detection on the EPANET Net1 benchmark network.
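A minimal sketch of the PageRank-centrality placement idea using networkx; the paper's exact variant (e.g., edge weighting or personalization) is not specified in this summary:

```python
import networkx as nx

def pagerank_sensor_placement(G: nx.Graph, k: int, alpha: float = 0.85):
    """Rank nodes by PageRank centrality and return the top-k as sensor sites."""
    scores = nx.pagerank(G, alpha=alpha)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Example on a toy graph; Net1 itself would be loaded from its EPANET file.
G = nx.karate_club_graph()
print(pagerank_sensor_placement(G, k=5))
```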
Near-field beamfocusing with extremely large aperture arrays can effectively enhance physical layer security. Nevertheless, even small estimation errors of the eavesdropper's location may cause a pronounced focal shift, resulting in a severe degradation of the secrecy rate. In this letter, we propose a physics-informed robust beamforming strategy that leverages the electromagnetic (EM) caustic effect for near-field physical layer security provisioning, which can be implemented via phase shifts only. Specifically, we partition the transmit array into caustic and focusing subarrays to simultaneously bypass the potential eavesdropping region and illuminate the legitimate user, thereby significantly improving the robustness against the localization error of eavesdroppers. Moreover, by leveraging the connection between the phase gradient and the EM wave departing angle, we derive the corresponding piece-wise closed-form array phase profile for the subarrays. Simulation results demonstrate that the proposed scheme achieves up to an 80% reduction of the worst-case eavesdropping rate for a localization error of 0.25 m, highlighting its superiority for providing robust and secure communication.
Most algorithms for hyperspectral image unmixing produce point estimates of fractional abundances of the materials to be separated. However, in the absence of reliable ground truth, the ability to perform abundance uncertainty quantification (UQ) should be an important feature of algorithms, e.g., to evaluate how hard the unmixing problem is and how much the results should be trusted. The usual modeling assumptions in Bayesian models for unmixing rely heavily on the Euclidean geometry of the simplex and typically disregard spatial information. In addition, to our knowledge, abundance UQ is close to nonexistent. In this paper, we propose to leverage Aitchison geometry from the compositional data analysis literature to provide practitioners with alternative tools for modeling prior abundance distributions. In particular, we show how to design simplex-valued Gaussian process priors using this geometry. Then we link Aitchison geometry to constrained sampling algorithms in the literature, and propose UQ diagnostics that comply with the constraints on abundance vectors. We illustrate these concepts on real and simulated data.
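For readers unfamiliar with Aitchison geometry, the key device is a log-ratio coordinate system that maps the open simplex to an unconstrained Euclidean space, where standard Gaussian process machinery applies. A minimal sketch with the centered log-ratio (clr) transform (the paper may instead use the isometric variant, ilr):

```python
import numpy as np

def clr(x: np.ndarray) -> np.ndarray:
    """Centered log-ratio transform: maps the open simplex into a hyperplane."""
    g = np.exp(np.mean(np.log(x), axis=-1, keepdims=True))  # geometric mean
    return np.log(x / g)

def clr_inv(y: np.ndarray) -> np.ndarray:
    """Inverse clr (a softmax): maps Euclidean coordinates back to the simplex."""
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

# A GP sample drawn in clr coordinates maps back to valid abundance vectors:
rng = np.random.default_rng(0)
y = rng.normal(size=(4, 3))     # stand-in for a GP draw: 4 pixels, 3 materials
print(clr_inv(y).sum(axis=-1))  # each row sums to 1, as an abundance vector must
```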
This paper presents an equivariant filter (EqF) transformation approach for visual--inertial navigation. By establishing analytical links between EqFs with different symmetries, the proposed approach enables systematic consistency design and efficient implementation. First, we formalize the mapping from the global system state to the local error-state and prove that it induces a nonsingular linear transformation between the error-states of any two EqFs. Second, we derive transformation laws for the associated linearized error-state systems and unobservable subspaces. These results yield a general consistency design principle: for any unobservable system, a consistent EqF with a state-independent unobservable subspace can be synthesized by transforming the local coordinate chart, thereby avoiding ad hoc symmetry analysis. Third, to mitigate the computational burden arising from the non-block-diagonal Jacobians required for consistency, we propose two efficient implementation strategies. These strategies exploit the Jacobians of a simpler EqF with block-diagonal structure to accelerate covariance operations while preserving consistency. Extensive Monte Carlo simulations and real-world experiments validate the proposed approach in terms of both accuracy and runtime.
Tuning control policies manually to meet high-level objectives is often time-consuming. Bayesian optimization provides a data-efficient framework for automating this process using numerical evaluations of an objective function. However, many systems, particularly those involving humans, require optimization based on subjective criteria. Preferential Bayesian optimization addresses this by learning from pairwise comparisons instead of quantitative measurements, but relying solely on preference data can be inefficient. We propose a multi-fidelity, multi-modal Bayesian optimization framework that integrates low-fidelity numerical data with high-fidelity human preferences. Our approach employs Gaussian process surrogate models with both hierarchical (autoregressive) and non-hierarchical (coregionalization-based) structures, enabling efficient learning from mixed-modality data. We illustrate the framework by tuning an autonomous vehicle's trajectory planner, showing that combining numerical and preference data significantly reduces the need for experiments involving the human decision maker while effectively adapting driving style to individual preferences.
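A minimal sketch of the hierarchical, autoregressive surrogate structure on toy data, with direct high-fidelity observations standing in for the preference feedback the full framework uses (scikit-learn; the scale factor rho is fixed here for brevity, though it could be estimated jointly):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy data: many cheap numerical evaluations (low fidelity) and a few
# expensive "human" evaluations (high fidelity).
rng = np.random.default_rng(1)
X_lo = rng.uniform(0, 1, (30, 1)); y_lo = np.sin(6 * X_lo[:, 0])
X_hi = rng.uniform(0, 1, (6, 1));  y_hi = 1.2 * np.sin(6 * X_hi[:, 0]) + 0.3

gp_lo = GaussianProcessRegressor(RBF(0.2)).fit(X_lo, y_lo)

# Autoregressive structure: f_hi(x) = rho * f_lo(x) + delta(x).
rho = 1.0
residual = y_hi - rho * gp_lo.predict(X_hi)
gp_delta = GaussianProcessRegressor(RBF(0.2)).fit(X_hi, residual)

def predict_hi(X):
    """High-fidelity prediction from the two-stage (co-kriging) model."""
    return rho * gp_lo.predict(X) + gp_delta.predict(X)
```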
Achieving natural full-duplex interaction in spoken dialogue systems (SDS) remains a challenge due to the difficulty of accurately detecting user interruptions. Current solutions are polarized between "trigger-happy" VAD-based methods that misinterpret backchannels and robust end-to-end models that exhibit unacceptable response delays. Moreover, the absence of real-world benchmarks and holistic metrics hinders progress in the field. This paper presents a comprehensive framework to overcome these limitations. We first introduce SID-Bench, the first benchmark for semantic-aware interruption detection built entirely from real-world human dialogues. To provide a rigorous assessment of the responsiveness-robustness trade-off, we propose the Average Penalty Time (APT) metric, which assigns a temporal cost to both false alarms and late responses. Building on this framework, we design an LLM-based detection model optimized through a novel training paradigm to capture subtle semantic cues of intent. Experimental results show that our model significantly outperforms mainstream baselines, achieving a nearly threefold reduction in APT. By successfully resolving the long-standing tension between speed and stability, our work establishes a new state-of-the-art for intelligent interruption handling in SDS. To facilitate future research, SID-Bench and the associated code are available at: this https URL.
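Since APT is only defined informally above, here is a toy rendering of the idea, a fixed temporal cost for false alarms and misses plus a delay cost for late responses; the paper's exact weighting and normalization may differ:

```python
def average_penalty_time(events, c_false: float = 2.0):
    """Toy Average Penalty Time over events (t_true, t_pred), in seconds.

    t_true is None when no interruption occurred; t_pred is None when the
    system did not fire. This cost structure is our own illustrative guess.
    """
    penalties = []
    for t_true, t_pred in events:
        if t_true is None:            # no interruption: any firing is a false alarm
            penalties.append(c_false if t_pred is not None else 0.0)
        elif t_pred is None:          # missed interruption
            penalties.append(c_false)
        else:                         # detected: pay the response delay
            penalties.append(max(0.0, t_pred - t_true))
    return sum(penalties) / len(penalties)
```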
We propose an alternating optimization framework for maximizing energy efficiency (EE) in reconfigurable intelligent surface (RIS) assisted distributed MIMO (D-MIMO) systems under both coherent and non-coherent reception modes. The framework jointly optimizes access point (AP) power allocation and RIS phase configurations to improve EE under per-AP power and signal-to-interference-plus-noise ratio (SINR) constraints. Using majorization-minimization for power allocation together with per-element RIS adaptation, the framework achieves tractable optimization of this non-convex problem. Simulation results for indoor deployments with realistic power-consumption models show that the proposed scheme outperforms equal-power and random-scatterer baselines, with clear EE gains. We evaluate the performance of both reception modes and quantify the impact of RIS phase-shift optimization, RIS controller architectures (centralized vs. per-RIS control), and RIS size, providing design insights for practical RIS-assisted D-MIMO deployments in future 6G networks.
Designing effective auxiliary rewards for cooperative multi-agent systems remains a precarious task; misaligned incentives risk inducing suboptimal coordination, especially where sparse task feedback fails to provide sufficient grounding. This study introduces an automated reward design framework that leverages large language models to synthesize executable reward programs from environment instrumentation. The procedure constrains candidate programs within a formal validity envelope and evaluates their efficacy by training policies from scratch under a fixed computational budget; selection depends exclusively on the sparse task return. The framework is evaluated across four distinct Overcooked-AI layouts characterized by varied corridor congestion, handoff dependencies, and structural asymmetries. Iterative search generations consistently yield superior task returns and delivery counts, with the most pronounced gains occurring in environments dominated by interaction bottlenecks. Diagnostic analysis of the synthesized shaping components indicates increased interdependence in action selection and improved signal alignment in coordination-intensive tasks. These results demonstrate that the search for objective-grounded reward programs can mitigate the burden of manual engineering while producing shaping signals compatible with cooperative learning under finite budgets.
Leadership in social groups is often a dynamic characteristic that emerges from interactions and opinion exchange. Empirical evidence suggests that individuals with strong opinions tend to gain influence, while maintaining alignment with the social context is crucial for sustained leadership. Motivated by the social psychology literature that supports these empirical observations, we propose a novel dynamical system in which opinions and leadership co-evolve within a social network. Our model extends the Friedkin-Johnsen framework by making susceptibility to peer influence time-dependent, turning it into the leadership variable. Leadership strengthens when an agent holds strong yet socially aligned opinions, and declines when such alignment is lost, capturing the trade-off between conviction and social acceptance. After illustrating the emergent behavior of this complex system, we formally analyze the coupled dynamics, establishing sufficient conditions for convergence to a non-trivial equilibrium, and examining two time-scale separation regimes reflecting scenarios where opinion and leadership evolve at different speeds.
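An illustrative simulation of the coupled dynamics, with a Friedkin-Johnsen opinion step under time-varying susceptibility; the specific leadership update law below is our own stand-in, chosen only to reflect the stated conviction/alignment trade-off:

```python
import numpy as np

def simulate(W, u, T=200, eps=0.05):
    """Illustrative co-evolution of opinions and leadership on row-stochastic W.

    Friedkin-Johnsen step with time-varying susceptibility lam_t:
        x_{t+1} = lam_t * (W @ x_t) + (1 - lam_t) * u
    We identify leadership with low susceptibility (ell = 1 - lam) and let it
    grow with conviction |x| while decaying with misalignment from the local
    social average; this update is a stand-in for the paper's exact law.
    """
    x = u.astype(float).copy()
    ell = 0.5 * np.ones_like(x)                  # leadership in [0, 1]
    for _ in range(T):
        lam = 1.0 - ell
        social = W @ x
        x = lam * social + (1.0 - lam) * u
        ell = np.clip(ell + eps * (np.abs(x) - np.abs(x - social)), 0.0, 1.0)
    return x, ell
```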
Accurate forecasting of state-of-health (SOH) is essential for ensuring safe and reliable operation of lithium-ion cells. However, existing models calibrated on laboratory tests at specific conditions often fail to generalize to new cells that differ due to small manufacturing variations or operate under different conditions. To address this challenge, an uncertainty-aware transfer learning framework is proposed, combining a Long Short-Term Memory (LSTM) model with domain adaptation via Maximum Mean Discrepancy (MMD) and uncertainty quantification through Conformal Prediction (CP). The LSTM model is trained on a virtual battery dataset designed to capture real-world variability in electrode manufacturing and operating conditions. MMD aligns latent feature distributions between simulated and target domains to mitigate domain shift, while CP provides calibrated, distribution-free prediction intervals. This framework improves both the generalization and trustworthiness of SOH forecasts across heterogeneous cells.
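A minimal PyTorch sketch of the MMD term used for domain adaptation, with a single-bandwidth RBF kernel (practical implementations often mix several bandwidths and use the unbiased estimator):

```python
import torch

def mmd_rbf(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased estimate of squared MMD between feature batches x (n,d) and y (m,d)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Typical use: add lambda * mmd_rbf(h_sim, h_target) to the LSTM training loss,
# where h_* are latent features from the simulated and target-cell domains.
```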
Model Predictive Path Integral (MPPI) control is a popular sampling-based method for trajectory optimization in nonlinear and nonconvex settings, yet its optimization structure remains only partially understood. We develop a variational, optimization-theoretic interpretation of MPPI by lifting constrained trajectory optimization to a KL-regularized problem over distributions and reducing it to a negative log-partition (free-energy) objective over a tractable sampling family. For a general parametric family, this yields a preconditioned gradient method on the distribution parameters and a natural multi-step extension of MPPI. For the fixed-covariance Gaussian family, we show that classical MPPI is recovered exactly as a preconditioned gradient descent step with unit step size. This interpretation enables a direct convergence analysis: under bounded feasible sets, we derive an explicit upper bound on the smoothness constant and a simple sufficient condition guaranteeing descent of exact MPPI. Numerical experiments support the theory and illustrate the effect of key hyperparameters on performance.
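For reference, the classical MPPI update that the paper reinterprets, sketched for a fixed-covariance Gaussian sampling family; under the variational reading, this exponentially weighted average is exactly a preconditioned gradient step with unit step size on the negative log-partition (free-energy) objective:

```python
import numpy as np

def mppi_step(u_mean, rollout_cost, lam=1.0, sigma=0.5, K=256, rng=None):
    """One MPPI update of the control-sequence mean u_mean (shape (T, m))."""
    rng = rng or np.random.default_rng()
    noise = sigma * rng.standard_normal((K,) + u_mean.shape)   # K perturbations
    costs = np.array([rollout_cost(u_mean + e) for e in noise])
    w = np.exp(-(costs - costs.min()) / lam)                   # softmin weights
    w /= w.sum()
    return u_mean + np.tensordot(w, noise, axes=1)             # weighted update
```

The multi-step extension discussed in the abstract amounts to iterating this step (with a step size) before committing a control, rather than applying it once.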
The practical deployment of nonlinear model predictive control (NMPC) is often limited by online computation: solving a nonlinear program at high control rates can be expensive on embedded hardware, especially when models are complex or horizons are long. Learning-based NMPC approximations shift this computation offline but typically demand large expert datasets and costly training. We propose Sequential-AMPC, a sequential neural policy that generates MPC candidate control sequences by sharing parameters across the prediction horizon. For deployment, we wrap the policy in a safety-augmented online evaluation and fallback mechanism, yielding Safe Sequential-AMPC. Compared to a naive feedforward policy baseline across several benchmarks, Sequential-AMPC requires substantially fewer expert MPC rollouts and yields candidate sequences with higher feasibility rates and improved closed-loop safety. On high-dimensional systems, it also exhibits better learning dynamics and performance in fewer epochs while maintaining stable validation improvement where the feedforward baseline can stagnate.
Psychophysical experiments remain the most reliable approach for perceptual image quality assessment (IQA), yet their cost and limited scalability encourage automated approaches. We investigate whether Vision Language Models (VLMs) can approximate human perceptual judgments across three image quality scales: contrast, colorfulness, and overall preference. Six VLMs, four proprietary and two open-weight, are benchmarked against psychophysical data, yielding a systematic benchmark of VLMs for perceptual IQA. The results reveal strong attribute-dependent variability: models with high human alignment for colorfulness ($\rho$ up to 0.93) underperform on contrast, and vice versa. Attribute weighting analysis further shows that most VLMs assign higher weights to colorfulness than to contrast when evaluating overall preference, consistent with the psychophysical data. Intra-model consistency analysis reveals a counterintuitive trade-off: the most self-consistent models are not necessarily the most human-aligned, suggesting that response variability reflects sensitivity to scene-dependent perceptual cues. Furthermore, human-VLM agreement increases with perceptual separability, indicating VLMs are more reliable when stimulus differences are clearly expressed.
This paper discusses and summarizes some results on complex variables that are very useful in fractional-order systems analysis and design, specifically when the system is analyzed in the frequency domain. The author hopes that this document will serve as a handy reference when performing computations with complex variables, especially when working within the Laplace and Fourier domains. The reader can refer to Table I for the summary of these formulas.
Data augmentation (DA) has been widely leveraged in computer vision to alleviate data shortage, while its application in medical imaging faces multiple challenges. The prevalent DA approaches in medical image analysis encompass conventional DA, synthetic DA, and automatic DA. However, these approaches can result in experience-driven design and intensive computational costs. Here, we propose a general automatic DA method tailored to medical images, termed MedAugment. We propose pixel and spatial augmentation spaces and exclude operations that can break medical details and features. In addition, we propose a sampling strategy that draws a limited number of operations from the two spaces. Moreover, we present a hyperparameter mapping relationship that produces a rational augmentation level and makes MedAugment fully controllable through a single hyperparameter. These configurations account for the differences between natural and medical images. Extensive experimental results on four classification and four segmentation datasets demonstrate the superiority of MedAugment. Compared with existing approaches, MedAugment avoids color distortions and structural alterations while incurring negligible computational overhead. Our method can serve as a plugin without an extra training stage, offering significant benefits to the community and to medical experts lacking a deep learning foundation. The code is available at this https URL.
Cardiac MRI allows for a comprehensive assessment of myocardial structure, function and tissue characteristics. Here we describe a foundational vision system for cardiac MRI, capable of representing the breadth of human cardiovascular disease and health. Our deep-learning model is trained via self-supervised contrastive learning, in which visual concepts in cine-sequence cardiac MRI scans are learned from the raw text of the accompanying radiology reports. We train and evaluate our model on data from four large academic clinical institutions in the United States. We additionally showcase the performance of our models on the UK Biobank and two additional publicly available external datasets. We explore emergent capabilities of our system and demonstrate remarkable performance across a range of tasks, including left-ventricular ejection fraction regression and the diagnosis of 39 different conditions such as cardiac amyloidosis and hypertrophic cardiomyopathy. We show that our deep-learning system can not only contextualize the staggering complexity of human cardiovascular disease but can also be directed towards clinical problems of interest, yielding impressive, clinical-grade diagnostic accuracy with a fraction of the training data typically required for such tasks.
Determining conditions on the coupling strength for synchronization in networks of interconnected oscillators is a challenging problem in nonlinear dynamics. While sophisticated mathematical methods have been used to derive such conditions, they are usually only sufficient and/or based on numerical methods. We address the gap between sufficient coupling strengths and numerically observed ones using Lyapunov-Floquet theory and the Master Stability Function framework. We show that a positive coupling strength is a necessary and sufficient condition for local synchronization in a network of identical oscillators coupled linearly through the full state. For partial-state coupling, we show that a positive coupling constant results in an asymptotic contraction of the trajectories in the state space, which yields synchronization for two-dimensional oscillators. We extend the results to networks with non-identical coupling over directed graphs and show that a positive coupling constant is a sufficient condition for synchronization. These theoretical results are validated using numerical simulations and experimental implementations. Our results contribute to bridging the gap between theoretically derived sufficient coupling strengths and numerically observed ones.
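For context, the Master Stability Function analysis linearizes the network around the synchronous solution $s(t)$: with coupling strength $\sigma$, coupling-matrix eigenvalues $\lambda_k$, and $DF$, $DH$ the Jacobians of the node dynamics and the coupling function, each transverse mode obeys the variational equation

```latex
\dot{\xi}_k = \big[ DF(s(t)) - \sigma \lambda_k \, DH(s(t)) \big] \, \xi_k ,
```

and local synchronization holds when the largest Lyapunov exponent of this system is negative for every transverse mode; the necessity results above sharpen what is ordinarily only a sufficient-type test.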
The interaction between extreme weather events and interdependent critical infrastructure systems involves complex spatiotemporal dynamics. Multi-type emergency decisions within energy-transportation infrastructures significantly influence system performance throughout the extreme weather process. A comprehensive assessment of these factors faces challenges in model complexity, heterogeneous differences between energy and transportation systems, and cross-sector privacy. This paper proposes a risk assessment framework that integrates the heterogeneous energy and transportation systems in the form of a unified network flow model, which enables full accommodation of multiple types of energy-transportation emergency decisions while capturing the compound spatiotemporal impacts of extreme weather on both systems simultaneously. Based on this framework, a targeted method for identifying system vulnerabilities is further developed. This method employs neural network surrogates to achieve privacy protection and accelerated identification while maintaining consideration of system interdependencies. Numerical experiments demonstrate that the proposed framework and method can reveal the risk levels faced by urban infrastructure systems, identify vulnerabilities that should be prioritized for reinforcement, and strike a balance between accuracy and speed.
This paper introduces Smart Predict Then Control (SPC), a control-aware refinement procedure for model-based control. SPC refines a prediction-oriented model by optimizing a surrogate objective that evaluates candidate models through the control actions they induce. For a fixed surrogate variant under unconstrained control, we establish smoothness of the surrogate, projected-gradient convergence at a sublinear rate of order $1/K$, and a bias decomposition that yields a conditional transfer diagnostic. On a wind-disturbed quadrotor trajectory-tracking task, the SPC-refined model reduces tracking RMSE by 70% and closed-loop cost by 42% relative to the nominal baseline.
This paper investigates the design of output-feedback schemes for systems described by a class of recurrent neural networks. We propose a procedure based on linear matrix inequalities for designing an observer and a static state-feedback controller. The algorithm leverages global and regional incremental input-to-state stability (incremental ISS) and enables the tracking of constant setpoints, ensuring robustness to disturbances and state estimation uncertainty. To address the potential limitations of regional incremental ISS, we introduce an alternative scheme in which the static law is replaced with a tube-based nonlinear model predictive controller (NMPC) that exploits regional incremental ISS properties. We show that these conditions enable the formulation of a robust NMPC law with guarantees of convergence and recursive feasibility, leading to an enlarged region of attraction. Theoretical results are validated through numerical simulations on the pH-neutralisation process benchmark.
The synergy between integrated sensing and communication (ISAC) and reconfigurable intelligent surfaces (RISs) unlocks novel applications and advanced services for next-generation wireless networks, yet also introduces new security challenges. In this study, a novel dual target-mounted RISs-assisted ISAC scheme is proposed, where a base station with ISAC capability performs sensing of two unmanned aerial vehicle (UAV) targets, one legitimate and the other an eavesdropper, while communicating with the users through an RIS mounted on the legitimate UAV target. The proposed scheme addresses dual security threats posed by a hostile UAV target: eavesdropping on legitimate user communications and random interference attacks launched by a malicious RIS mounted on this eavesdropper UAV target, aiming to disrupt secure transmissions. Moreover, malicious RIS interference is also optimized for a worst-case scenario, in which both the channel state information (CSI) and the transmit beamforming of the base station are assumed to be fully compromised by the malicious RIS-mounted eavesdropper UAV. A non-convex optimization problem maximizing the secrecy rate of the users is formulated, and a semi-definite relaxation (SDR)-based two-stage solution is developed to optimize the transmit beamforming matrix of the base station and the phase shift coefficients of the legitimate RIS. Extensive computer simulations are conducted to evaluate the robustness of the proposed solution under various system configurations. The proposed system's communication performance is assessed using the secrecy rate metric, while the sensing performance is evaluated through the signal-to-interference-plus-noise ratio and the Cramér-Rao bound (CRB) for angle-of-departure (AoD) estimation of the eavesdropper UAV target.
Semantic communication (SemCom), as a novel paradigm for future communication systems, has recently attracted much attention due to its superiority in communication efficiency. However, similar to traditional communication, it also suffers from eavesdropping threats. Intelligent eavesdroppers could launch advanced semantic analysis techniques to infer secret semantic information. Therefore, some researchers have designed Semantic Steganography Communication (SemSteCom) schemes to confuse semantic eavesdroppers. However, the state-of-the-art SemSteCom schemes for image transmission rely on a pre-selected cover image, which limits generalization. To address this issue, we propose a Generative Diffusion Model-based Coverless Semantic Steganography Communication (SemSteDiff) scheme that hides secret images in generated stego images. Semantically related private and public keys enable the legitimate receiver to decode secret images correctly, while an eavesdropper without the correct key pairs fails to obtain them. Simulation results demonstrate the effectiveness of the plug-and-play design in different Joint Source-Channel Coding (JSCC) frameworks. Results under different eavesdropping settings show that, at Signal-to-Noise Ratio (SNR) = 0 dB, the peak signal-to-noise ratio (PSNR) of the legitimate receiver is 4.14 dB higher than that of the eavesdropper.
This paper presents a time-optimal Model Predictive Control (MPC) scheme for linear discrete-time systems subject to multiplicative uncertainties represented by interval matrices. To render the uncertainty propagation computationally tractable, the set-valued error system dynamics are approximated using a matrix-zonotope-based bounding operator. Recursive feasibility and finite-time convergence are ensured through an adaptive terminal constraint mechanism. A key advantage of the proposed approach is that all the necessary bounding sets can be computed offline, substantially reducing the online computational burden. The effectiveness of the method is illustrated via a numerical case study on an orbital rendezvous maneuver between two satellites.
In inkjet printing, optimal paper moisture is crucial for print quality, achieved through hot-air impingement in the fixation unit. This paper presents a modular digital twin of the fixation unit, modeling the thermo-fluidic drying process and monitoring its spatio-temporal performance. The novel approach formulates the digital twin as an infinite-dimensional state estimator that infers fixation states from limited sensor data, while remaining robust to disturbances. Modularity is achieved through a graph-theoretic model, where each node represents thermo-fluidic dynamics in different sections of the fixation unit. Evaporation is modeled as a nonlinear boundary effect coupled with node dynamics via Linear Fractional Representation. Using the Partial Integral Equation (PIE) framework, we develop a unified approach for stability, input-output analysis, simulation, and rapid prototyping, validated with operational data from a commercial printer. An $\mathcal{H}_{\infty}$-optimal Luenberger state estimator is then synthesized to estimate thermal states from available sensor data, enabling real-time monitoring of spatio-temporal thermal effects on paper sheets.
Real-time model-based control of high-dimensional nonlinear systems presents severe computational challenges. Conventional reduced-order model (ROM) control relies heavily on expert tuning or parameter adaptation and seldom offers mechanisms for online supervised reconstruction. We introduce AURORA (Autonomous Updating of ROM and Controller via Recursive Adaptation), a supervisory framework that automates ROM-based controller design and augments it with diagnostic-triggered structural adaptation. Five specialized agents collaborate through iterative generate-judge-revise cycles, while an Evaluation Agent classifies performance degradation into three operationally distinct categories, subspace inadequacy, parametric drift, and control inadequacy, and routes corrective action to the responsible agent. For linear ROMs, we analytically prove that this classification is correct under mild assumptions and that the supervisory switching cycle preserves exponential stability subject to a dwell-time condition. For nonlinear systems, the absence of a universal Lyapunov construction for autonomously discovered ROM structures precludes analogous analytical guarantees, so we validate the same classification empirically. Experiments on eight benchmark systems with state dimensions up to 5177 compare AURORA against expert-tuned baselines, gain-scheduled control, and online RLS adaptive alternatives. Controlled fault-injection experiments confirm 91% diagnostic routing accuracy. AURORA achieves 6-12% tracking improvement over expert baselines and 4-5% over classical adaptive alternatives.
Data-Enabled Predictive Control (DeePC) has emerged as a powerful framework for controlling unknown systems directly from input-output data. For nonlinear systems, recent work has proposed selecting relevant subsets of data columns based on geometric proximity to the current operating point. However, such proximity-based selection ignores the control objective: different reference trajectories may benefit from different data even at the same operating point. In this paper, we propose a datamodel-based approach that learns a context-dependent influence function mapping the current initial trajectory and reference trajectory to column importance scores. Adapting the linear datamodel framework from machine learning, we model closed-loop cost as a linear function of column inclusion indicators, with coefficients that depend on the control context. Training on closed-loop simulations, our method captures which data columns actually improve tracking performance for specific control tasks. Experimental results demonstrate that task-aware selection substantially outperforms geometry-based heuristics, particularly when using small data subsets.
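A minimal sketch of the linear datamodel fit at the core of the approach: closed-loop costs from randomized column subsets are regressed on binary inclusion indicators, and the coefficients serve as influence scores. In the full method these coefficients are produced by a learned function of the control context (initial and reference trajectory) rather than a single regression:

```python
import numpy as np
from sklearn.linear_model import Ridge

n_cols, n_runs = 200, 1000
rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(n_runs, n_cols))   # random column-inclusion masks
costs = rng.normal(size=n_runs)                 # stand-in for simulated closed-loop costs

# Linear datamodel: cost ~ theta^T z; coefficients estimate column influence.
theta = Ridge(alpha=1.0).fit(Z, costs).coef_
helpful = np.argsort(theta)[:20]                # columns that most reduce cost
```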
We extend the Datamodels framework from supervised learning to Model Predictive Path Integral (MPPI) control. Whereas Datamodels estimate sample influence via regression on a fixed dataset, we instead learn to predict influence directly from sample cost features, enabling real-time estimation for newly generated samples without online regression. Our influence predictor is trained offline using influence coefficients computed via the Datamodel framework across diverse MPPI instances, and is then deployed online for efficient sample pruning and adaptive constraint handling. A single learned model simultaneously addresses efficiency and safety: low-influence samples are pruned to reduce computational cost, while monitoring the influence of constraint-violating samples enables adaptive penalty tuning. Experiments on path-tracking with obstacle avoidance demonstrate up to a $5\times$ reduction in the number of samples while maintaining control performance and improving constraint satisfaction.
Power system security assessments, e.g., via cascading outage models, often use operational set-points based on optimal power flow (OPF) dispatch. However, driven by cost minimization, OPF provides an idealized, albeit unrealistic, clearing of the generating units that disregards the complex interactions among market participants. In addition, existing market modeling tools typically rely on economic dispatch and unit commitment to minimize total system costs, disregarding the profit-driven behavior of market participants. The security of the system may therefore be overestimated. To address this gap, we introduce a social-welfare-based day-ahead market-clearing model. The security implications are analyzed using Cascades, a model for cascading failure analysis. We apply this model to the IEEE 118-bus system with three independent control zones. The results show that market dispatch leads to demand not served (DNS) up to 80% higher than under OPF, highlighting a significant security overestimation. This is especially pronounced in large-scale cascading events with DNS above 100 MW. A key driver is the increased dispatch of storage and gas units, which can place the system in critical operating conditions. Operators can use this information to properly estimate the impact of the market on system security and plan efficient expansion strategies.
The temporal and spatial analysis of river dynamics is a key factor for studying and understanding human impacts on floodplains. To assess the changes taking place, it is necessary to have high-resolution images with a large spatial coverage and a high temporal revisit frequency over the long term. Satellite imagery meets several of these criteria. For instance, Sentinel data provide high-resolution images but only after 2015. Therefore, to study water surface evolution prior to this date, it is necessary to rely on other satellite images such as Landsat, which offers longer historical coverage, albeit with lower spatial resolution. In this study, we aim to increase the spatial resolution of Landsat data from 30 to 10 meters (resolution of Sentinel images). To achieve this goal, we develop an innovative single image super-resolution method based on a plug-and-play approach.
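A minimal sketch of a plug-and-play proximal-gradient loop for the 3x super-resolution task (30 m to 10 m), with a Gaussian filter standing in for the learned denoiser and an average-pooling forward model; the paper's exact splitting scheme and prior are not specified in this summary, and image dimensions are assumed divisible by the scale factor:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def A(x, s=3):
    """Forward model: s-fold average-pooling downsampling."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def At(y, s=3):
    """Adjoint of A: replicate each pixel into its block and rescale."""
    return np.repeat(np.repeat(y, s, axis=0), s, axis=1) / s**2

def pnp_sr(y, s=3, eta=1.0, sigma=1.0, iters=100):
    """Plug-and-play super-resolution of a low-res image y."""
    x = zoom(y, s, order=1)                          # bilinear initialization
    for _ in range(iters):
        grad = At(A(x, s) - y, s)                    # data-fidelity gradient step
        x = gaussian_filter(x - eta * grad, sigma)   # denoiser acts as the prior
    return x
```

Swapping the Gaussian filter for a pretrained denoising network is what makes the approach "plug-and-play": the data-fidelity loop stays unchanged.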
This paper develops a control-theoretic framework for analyzing agentic systems embedded within feedback control loops, where an AI agent may adapt controller parameters, select among control strategies, invoke external tools, reconfigure decision architectures, and modify control objectives during operation. These capabilities are formalized by interpreting agency as hierarchical runtime decision authority over elements of the control architecture, leading to an augmented closed-loop representation in which physical states, internal memory, tool outputs, interaction signals, and design variables evolve as a coupled dynamical system. A five-level hierarchy of agency is defined, ranging from fixed control laws to runtime synthesis of control architectures and objectives. The analysis shows that increasing agency introduces interacting dynamical mechanisms such as time-varying adaptation, endogenous switching, decision-induced delays, and structural reconfiguration. The framework is developed in both nonlinear and linear settings, providing explicit design constraints for AI-enabled control systems in safety-critical applications.
We present a formulation of an optimal control problem for a two-dimensional diffusion process governed by a Fokker-Planck equation to achieve a nonequilibrium steady state with a desired circulation while accelerating convergence toward the stationary distribution. To achieve the control objective, we introduce costs for both the probability density function and the flux rotation into the objective functional. We formulate the optimal control problem through dimensionality reduction of the Fokker-Planck equation via eigenfunction expansion, which incurs low computational cost. We demonstrate through numerical simulations that the proposed optimal control achieves the desired circulation while accelerating convergence to the stationary distribution.
Autonomous aerial vehicles (AAVs) empower sixth-generation (6G) Internet-of-Things (IoT) networks through mobility-driven data collection. However, conventional reward-driven reinforcement learning for AAV trajectory planning suffers from severe credit assignment issues and training instability, because sparse scalar rewards fail to capture the long-term and nonlinear effects of sequential movements. To address these challenges, this paper proposes Learn for Variation (L4V), a gradient-informed trajectory learning framework that replaces high-variance scalar reward signals with dense and analytically grounded policy gradients. Particularly, the coupled evolution of AAV kinematics, distance-dependent channel gains, and per-user data-collection progress is first unrolled into an end-to-end differentiable computational graph. Backpropagation through time then serves as a discrete adjoint solver, which propagates exact sensitivities from the cumulative mission objective to every control action and policy parameter. These structured gradients are used to train a deterministic neural policy with temporal smoothness regularization and gradient clipping. Extensive simulations demonstrate that L4V consistently outperforms representative baselines, including a genetic algorithm, DQN, A2C, and DDPG, in mission completion time, average transmission rate, and training cost.
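An illustrative PyTorch sketch of the gradient-informed idea: the mission objective is backpropagated through an unrolled, differentiable rollout. For brevity we optimize the action sequence directly rather than a policy network, and the kinematics and rate model below are simplified stand-ins:

```python
import torch

T, dt = 50, 1.0
actions = torch.zeros(T, 2, requires_grad=True)         # velocity commands
user = torch.tensor([80.0, 20.0])                       # single ground user
opt = torch.optim.Adam([actions], lr=0.1)

for _ in range(300):
    pos = torch.zeros(2)
    collected = torch.tensor(0.0)
    for t in range(T):                                  # unrolled graph enables BPTT
        pos = pos + dt * torch.tanh(actions[t]) * 10.0  # bounded-velocity kinematics
        d2 = ((pos - user) ** 2).sum()
        collected = collected + torch.log2(1 + 1e3 / (d2 + 1.0))  # rate proxy
    loss = -collected + 0.1 * (actions[1:] - actions[:-1]).pow(2).sum()  # smoothness
    opt.zero_grad()
    loss.backward()                                     # exact sensitivities via BPTT
    torch.nn.utils.clip_grad_norm_([actions], 1.0)      # gradient clipping
    opt.step()
```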
Radio maps (RMs) provide spatially continuous propagation characterizations essential for 6G network planning, but high-fidelity RM construction remains challenging. Rigorous electromagnetic solvers incur prohibitive computational latency, while data-driven models demand massive labeled datasets and generalize poorly from simplified simulations to complex multipath environments. This paper proposes RadioDiff-FS, a few-shot diffusion framework that adapts a pre-trained main-path generator to multipath-rich target domains with only a small number of high-fidelity samples. The adaptation is grounded in a theoretical decomposition of the multipath RM into a dominant main-path component and a directionally sparse residual. This decomposition shows that the cross-domain shift corresponds to a bounded and geometrically structured feature translation rather than an arbitrary distribution change. A Direction-Consistency Loss (DCL) is then introduced to constrain diffusion score updates along physically plausible propagation directions, thereby suppressing phase-inconsistent artifacts that arise in the low-data regime. Experiments show that RadioDiff-FS reduces NMSE by 59.5% on static RMs and by 74.0% on dynamic RMs relative to the vanilla diffusion baseline, achieving an SSIM of 0.9752 and a PSNR of 36.37 dB under severely limited supervision.
Efficient multi-user multi-task video transmission is an important research topic within the realm of current wireless communication systems. To reduce the transmission burden and save communication resources, we propose a goal-oriented semantic communication framework for optical flow-based multi-user multi-task video transmission (OF-GSC). At the transmitter, we design a semantic encoder that consists of a motion extractor and a patch-level optical flow-based semantic representation extractor to effectively identify and select important semantic representations. At the receiver, we design a transformer-based semantic decoder for high-quality video reconstruction and video classification tasks. To minimize the communication time, we develop a deep deterministic policy gradient (DDPG)-based bandwidth allocation algorithm for multi-user transmission. For video reconstruction tasks, our OF-GSC framework achieves a significant improvement in the received video quality, as evidenced by a 13.47% increase in the structural similarity index measure (SSIM) score in comparison to DeepJSCC. For video classification tasks, OF-GSC achieves a Top-1 accuracy slightly surpassing that of VideoMAE while requiring only 25% of the data under the same mask ratio of 0.3. For bandwidth allocation optimization, our DDPG-based algorithm reduces the maximum transmission time by 25.97% compared with the baseline equal-bandwidth allocation scheme.
Getting a real cybersecurity risk assessment for a small organization is expensive -- a NIST CSF-aligned engagement runs $15,000 on the low end, takes weeks, and depends on practitioners who are genuinely scarce. Most small companies skip it entirely. We built a six-agent AI system where each agent handles one analytical stage: profiling the organization, mapping assets, analyzing threats, evaluating controls, scoring risks, and generating recommendations. Agents share a persistent context that grows as the assessment proceeds, so later agents build on what earlier ones concluded -- the mechanism that distinguishes this from standard sequential agent pipelines. We tested it on a 15-person HIPAA-covered healthcare company and compared outputs to independent assessments by three CISSP practitioners -- the system agreed with them 85% of the time on severity classifications, covered 92% of identified risks, and finished in under 15 minutes. We then ran 30 repeated single-agent assessments across five synthetic but sector-realistic organizational profiles in healthcare, fintech, manufacturing, retail, and SaaS, comparing a general-purpose Mistral-7B against a domain fine-tuned model. Both completed every run. The fine-tuned model flagged threats the baseline could not see at all: PHI exposure in healthcare, OT/IIoT vulnerabilities in manufacturing, platform-specific risks in retail. The full multi-agent pipeline, however, failed every one of 30 attempts on a Tesla T4 with its 4,096-token default context window -- context capacity, not model quality, turned out to be the binding constraint.
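A minimal sketch of the persistent shared context that distinguishes this pipeline, with the LLM calls stubbed out; the stage names mirror the description above, and everything else is illustrative:

```python
# Each stage reads everything its predecessors wrote and appends its own findings.
STAGES = ["profile", "assets", "threats", "controls", "risks", "recommendations"]

def make_agent(stage):
    def agent(context):
        # A real agent would prompt an LLM with the full accumulated context here,
        # which is also why context-window capacity becomes the binding constraint.
        finding = f"{stage} analysis based on {len(context)} prior context sections"
        return {stage: finding}
    return agent

def run_assessment(org_description: str):
    context = {"organization": org_description}   # persistent, grows per stage
    for stage in STAGES:
        context.update(make_agent(stage)(context))
    return context

report = run_assessment("15-person HIPAA-covered healthcare company")
```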
Integrating Federated Learning (FL) with self-supervised learning (SSL) enables privacy-preserving fine-tuning for speech tasks. However, federated environments exhibit significant heterogeneity: clients differ in computational capacity, causing straggler effects under unified fine-tuning, while diverse downstream tasks require different representation depths, making full-model updates inefficient. To address these challenges, we propose an adaptive federated fine-tuning framework with early exits. Lightweight prediction heads are inserted at intermediate layers of the SSL backbone, allowing clients to terminate computation based on local constraints and task requirements. We further introduce a layer-wise, depth-aware partial aggregation strategy to better utilize representations from different network depths. Experiments show that the framework reduces edge overhead, supports heterogeneous hardware, and maintains competitive performance in resource-constrained federated environments.
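A minimal PyTorch sketch of the early-exit structure: lightweight heads at intermediate layers let a client stop computation at the depth its hardware and task allow. Dimensions and exit positions below are placeholders, and the depth-aware server aggregation is not shown:

```python
import torch.nn as nn

class EarlyExitBackbone(nn.Module):
    """SSL backbone with lightweight prediction heads at intermediate layers.
    A client picks exit_at to match its compute budget; layers past the
    chosen exit are never executed, avoiding straggler effects."""
    def __init__(self, dim=256, n_layers=12, n_classes=10, exits=(3, 6, 9, 12)):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(n_layers)])
        self.heads = nn.ModuleDict(
            {str(e): nn.Linear(dim, n_classes) for e in exits})

    def forward(self, x, exit_at: int = 12):
        for layer in self.layers[:exit_at]:
            x = layer(x)
        return self.heads[str(exit_at)](x.mean(dim=1))  # mean-pool, then classify
```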
Agrivoltaic systems--photovoltaic (PV) panels installed above agricultural land--have emerged as a promising dual-use solution to address competing land demands for food and energy production. In this paper, we propose a model predictive control (MPC) approach to dual-axis agrivoltaic panel tracking control that dynamically adjusts panel positions in real time to maximize power production and crop yield given solar irradiance and ambient temperature measurements. We apply convex relaxations and shading factor approximations to reformulate the MPC optimization problem as a convex second-order cone program that determines the PV panel position adjustments away from the sun-tracking trajectory. Through case studies, we demonstrate our approach, exploring the Pareto front between i) an approach that maximizes power production without considering crop needs and ii) crop yield with no agrivoltaics. We also conduct a case study exploring the impact of forecast error on MPC performance. We find that dynamically adjusting agrivoltaic panel position helps us actively manage the trade-offs between power production and crop yield, and that active panel control enables the agrivoltaic system to achieve land equivalent ratio values of up to 1.897.
We present a new analytical framework on the uplink data detection for massive multiple-input multiple-output systems with 1-bit analog-to-digital converters (ADCs). We first characterize the expected values of the soft-estimated symbols (after the linear receiver and prior to the data detection), which are affected by the 1-bit quantization during both the channel estimation and the uplink data transmission. In our analysis, we consider conventional receivers such as maximum ratio combining (MRC), zero forcing, and minimum mean squared error (MMSE), with multiple user equipments (UEs) and correlated Rayleigh fading. Additionally, we design a linear minimum mean dispersion (LMMD) receiver tailored for the data detection with 1-bit ADCs, which exploits the expected values of the soft-estimated symbols previously derived. Then, we propose a joint data detection (JD) strategy that exploits the interdependence among the soft-estimated symbols of the interfering UEs, along with its low-complexity variant. These strategies are compared with the robust maximum likelihood data detection with 1-bit ADCs. Numerical results examining the symbol error rate show that MMSE exhibits a considerable performance gain over MRC, whereas the proposed LMMD receiver significantly outperforms all the conventional receivers. Lastly, the proposed JD and its low-complexity variant provide a significant boost in comparison with the single-UE data detection.
Speech-based depression detection tools could aid early screening. Here, we propose an interpretable speech foundation model approach to enhance the clinical applicability of such tools. We introduce a speech-level Audio Spectrogram Transformer (AST) to detect depression using long-duration speech instead of short segments, along with a novel interpretation method that reveals prediction-relevant acoustic features for clinician interpretation. Our experiments show the proposed model outperforms a segment-level AST, highlighting the impact of segment-level labelling noise and the advantage of leveraging longer speech duration for more reliable depression detection. Through interpretation, we observe our model identifies reduced loudness and F0 as relevant depression signals, aligning with documented clinical findings. This interpretability supports a responsible AI approach for speech-based depression detection, rendering such tools more clinically applicable.
In this paper, we address the distributed prescribed-time convex optimization (DPTCO) problem for a class of nonlinear multi-agent systems (MASs) under an undirected connected graph. A cascade design framework is proposed in which the DPTCO implementation is divided into two parts: a distributed optimal trajectory generator and a local reference-trajectory tracking controller. The DPTCO problem is thereby transformed into a prescribed-time stabilization problem for a cascaded system. A changing-Lyapunov-function method and a time-varying state transformation, together with sufficient conditions, are proposed to prove prescribed-time stabilization of the cascaded system as well as uniform boundedness of the internal signals in the closed-loop systems. The proposed framework is then utilized to solve the robust DPTCO problem for a class of chain-integrator MASs with external disturbances by constructing novel variables and exploiting the properties of the time-varying gains. The framework is further utilized to solve the adaptive DPTCO problem for a class of strict-feedback MASs with parameter uncertainty, adopting a backstepping method with a prescribed-time dynamic filter. A descending-power state transformation is introduced to compensate for the growth induced by the derivatives of the time-varying gains in the recursive steps, and high-order derivatives of the local reference trajectory are not required. Finally, the theoretical results are verified by two numerical examples.
While initial applications of artificial intelligence (AI) in wireless communications over the past decade have demonstrated considerable potential using specialized models for targeted communication tasks, the revolutionary demands of sixth-generation (6G) networks for holographic communications, ubiquitous sensing, and native intelligence are propelling a necessary evolution towards AI-native wireless networks. The arrival of large AI models paves the way for the next phase of Wireless AI, driven by wireless foundation models (WFMs). In particular, pre-training on universal electromagnetic (EM) principles equips WFMs with the essential adaptability for a multitude of demanding 6G applications. However, existing large AI models face critical limitations, including pre-training strategies disconnected from EM-compliant constraints leading to physically inconsistent predictions, a lack of embedded understanding of wave propagation physics, and the inaccessibility of massive labeled datasets for comprehensive EM-aware training. To address these challenges, this article presents an electromagnetic information theory-guided self-supervised pre-training (EIT-SPT) framework designed to systematically inject EM physics into WFMs. The EIT-SPT framework aims to infuse WFMs with intrinsic EM knowledge, thereby enhancing their physical consistency, generalization capabilities across varied EM landscapes, and overall data efficiency. Building upon the proposed EIT-SPT framework, this article first elaborates on diverse potential applications in 6G scenarios of WFMs, then validates the efficacy of the proposed framework through illustrative case studies, and finally summarizes critical open research challenges and future directions for WFMs.
This work proposes a novel Alternating Direction Method of Multipliers (ADMM)-based Ensemble Kalman Inversion (EKI) algorithm for solving constrained nonlinear model predictive control (NMPC) problems. First, stage-wise nonlinear inequality constraints in the NMPC problem are embedded via an augmented Lagrangian with nonnegative slack variables. We then show that the resulting unconstrained augmented-Lagrangian primal subproblem admits a Bayesian interpretation: under independent Gaussian virtual observations, its minimizers coincide with MAP estimators, enabling solution via EKI. However, since the nonnegativity constraint on the slacks is a hard constraint not naturally encoded by a Gaussian model, our proposed algorithm yields a two-block ADMM scheme that alternates between (i) an inexact primal step that minimizes the augmented-Lagrangian objective (implemented via EKI rollouts), (ii) a nonnegativity projection for the slacks, and (iii) a dual ascent step. To balance exploration and convergence, an annealing schedule tempers sampling covariances while a penalty schedule increases constraint enforcement over outer iterations, encouraging global search early and precise constraint satisfaction later. We evaluate the proposed controller on a 6-DOF UR5e manipulation benchmark in MuJoCo, comparing it against DIAL-MPC (an iterative MPPI variant) as the arm traverses a cluttered tabletop environment.
This paper presents a measurement-driven study of early warning for reliability breakdown events in 5G non-standalone (NSA) railway networks. Using 10 Hz metro-train measurement traces with serving- and neighbor-cell indicators, we benchmark six representative learning models, including CNN, LSTM, XGBoost, Anomaly Transformer, PatchTST, and TimesNet, under multiple observation windows and prediction horizons. Rather than proposing a new prediction architecture, this study develops a measurement-driven benchmark to quantify the feasibility and operating trade-offs of seconds-ahead reliability prediction in 5G NSA railway environments. Experimental results show that learning models can anticipate RLF-related reliability breakdown events seconds in advance using lightweight radio features available on commercial devices. The presented benchmark provides insights for sensing-assisted communication control and offers an empirical foundation for integrating sensing and analytics into future mobility control.
Recent advances in diffusion models (DMs) have achieved exceptional visual quality in image editing tasks. However, the global denoising dynamics of DMs inherently conflate local editing targets with the full-image context, leading to unintended modifications in non-target regions. In this paper, we shift our attention beyond DMs and turn to Masked Generative Transformers (MGTs) as an alternative approach to tackle this challenge. By predicting multiple masked tokens rather than performing holistic refinement, MGTs exhibit a localized decoding paradigm that endows them with the inherent capacity to explicitly preserve non-relevant regions during the editing process. Building upon this insight, we introduce the first MGT-based image editing framework, termed EditMGT. We first demonstrate that MGT's cross-attention maps provide informative signals for localizing edit-relevant regions and devise a multi-layer attention consolidation scheme that refines these maps to achieve fine-grained and precise localization. On top of these adaptive localization results, we introduce region-hold sampling, which restricts token flipping within low-attention areas to suppress spurious edits, thereby confining modifications to the intended target regions and preserving the integrity of surrounding non-target areas. To train EditMGT, we construct CrispEdit-2M, a high-resolution dataset spanning seven diverse editing categories. Without introducing additional parameters, we adapt a pre-trained text-to-image MGT into an image editing model through attention injection. Extensive experiments across four standard benchmarks demonstrate that, with fewer than 1B parameters, our model achieves comparable performance while enabling editing roughly 6 times faster. Moreover, it delivers comparable or superior editing quality, with improvements of 3.6% and 17.6% on style change and style transfer tasks, respectively.
Existing mainstream video customization methods focus on generating identity-consistent videos based on given reference images and textual prompts. Benefiting from the rapid advancement of joint audio-video generation, this paper proposes a more compelling new task: sync audio-video customization, which aims to synchronously customize both video identity and audio timbre. Specifically, given a reference image $I^{r}$ and a reference audio $A^{r}$, this novel task requires generating videos that maintain the identity of the reference image while imitating the timbre of the reference audio, with spoken content freely specifiable through user-provided textual prompts. To this end, we propose OmniCustom, a powerful DiT-based audio-video customization framework that can synthesize a video following reference image identity, audio timbre, and text prompts all at once in a zero-shot manner. Our framework is built on three key contributions. First, identity and audio timbre control are achieved through separate reference identity and audio LoRA modules that operate through self-attention layers within the base audio-video generation model. Second, we introduce a contrastive learning objective alongside the standard flow matching objective. It uses predicted flows conditioned on reference inputs as positive examples and those without reference conditions as negative examples, thereby enhancing the model's ability to preserve identity and timbre. Third, we train OmniCustom on our constructed large-scale, high-quality audio-visual human dataset. Extensive experiments demonstrate that OmniCustom outperforms existing methods in generating audio-video content with consistent identity and timbre fidelity. Project page: this https URL.
Selecting the right deep learning model for power grid forecasting is challenging, as performance heavily depends on the data available to the operator. This paper presents a comprehensive benchmark of five modern neural architectures: two state space models (PowerMamba, S-Mamba), two Transformers (iTransformer, PatchTST), and a traditional LSTM. We evaluate these models on hourly electricity demand across six diverse US power grids for forecast windows between 24 and 168 hours. To ensure a fair comparison, we adapt each model with specialized temporal processing and a modular layer that cleanly integrates weather covariates. Our results reveal that there is no single best model for all situations. When forecasting using only historical load, PatchTST and the state space models provide the highest accuracy. However, when explicit weather data is added to the inputs, the rankings reverse: iTransformer improves its accuracy three times more efficiently than PatchTST. By controlling for model size, we confirm that this advantage stems from the architecture's inherent ability to mix information across different variables. Extending our evaluation to solar generation, wind power, and wholesale prices further demonstrates that model rankings depend on the forecast task: PatchTST excels on highly rhythmic signals like solar, while state space models are better suited for the chaotic fluctuations of wind and price. Ultimately, this benchmark provides grid operators with actionable guidelines for selecting the optimal forecasting architecture based on their specific data environments.