With the development of sixth-generation (6G) wireless communication networks, the security challenges are becoming increasingly prominent, especially for mobile users (MUs). As a promising solution, physical layer security (PLS) technology leverages the inherent characteristics of wireless channels to provide security assurance. Particularly, stacked intelligent metasurface (SIM) directly manipulates electromagnetic waves through their multilayer structures, offering significant potential for enhancing PLS performance in an energy efficient manner. Thus, in this work, we investigate an SIM-assisted secure communication system for MUs under the threat of an eavesdropper, addressing practical challenges such as channel uncertainty in mobile environments, multiple MU interference, and residual hardware impairments. Consequently, we formulate a joint power and phase shift optimization problem (JPPSOP), aiming at maximizing the achievable secrecy rate (ASR) of all MUs. Given the non-convexity and dynamic nature of this optimization problem, we propose an enhanced proximal policy optimization algorithm with a bidirectional long short-term memory mechanism, an off-policy data utilization mechanism, and a policy feedback mechanism (PPO-BOP). Through these mechanisms, the proposed algorithm can effectively capture short-term channel fading and long-term MU mobility, improve sample utilization efficiency, and enhance exploration capabilities. Extensive simulation results demonstrate that PPO-BOP significantly outperforms benchmark strategies and other deep reinforcement learning algorithms in terms of ASR.https://doi.org/10.1109/TWC.2026.3658332
Pneumonia is a serious global health problem, contributing to high morbidity and mortality, especially in areas with limited diagnostic tools and healthcare resources. This study develops a Convolutional Neural Network (CNN) based on deep learning to automatically detect pneumonia from chest X-ray images. The method involves training the model on labeled datasets with preprocessing techniques such as normalization, data augmentation, and image quality enhancement to improve robustness and generalization. Testing results show that the optimized model achieves 91.67% accuracy, ROC-AUC of 0.96, and PR-AUC of 0.95, demonstrating strong performance in distinguishing pneumonia from normal images. In conclusion, this CNN model has significant potential as a fast, consistent, and reliable diagnostic aid, supporting Society 5.0 by integrating artificial intelligence to improve healthcare services and public well-being.
Medical image analysis requires substantial labeled data for model training, yet expert annotation is expensive and time-consuming. Active learning (AL) addresses this challenge by strategically selecting the most informative samples for the annotation purpose, but traditional methods solely rely on predictive uncertainty while ignoring whether models learn from clinically meaningful features a critical requirement for clinical deployment. We propose an explainability-guided active learning framework that integrates spatial attention alignment into a sample acquisition process. Our approach advocates for a dual-criterion selection strategy combining: (i) classification uncertainty to identify informative examples, and (ii) attention misalignment with radiologist-defined regions-of-interest (ROIs) to target samples where the model focuses on incorrect features. By measuring misalignment between Grad-CAM attention maps and expert annotations using \emph{Dice similarity}, our acquisition function judiciously identifies samples that enhance both predictive performance and spatial interpretability. We evaluate the framework using three expert-annotated medical imaging datasets, namely, BraTS (MRI brain tumors), VinDr-CXR (chest X-rays), and SIIM-COVID-19 (chest X-rays). Using only 570 strategically selected samples, our explainability-guided approach consistently outperforms random sampling across all the datasets, achieving 77.22\% accuracy on BraTS, 52.37\% on VinDr-CXR, and 52.66\% on SIIM-COVID. Grad-CAM visualizations confirm that the models trained by our dual-criterion selection focus on diagnostically relevant regions, demonstrating that incorporating explanation guidance into sample acquisition yields superior data efficiency while maintaining clinical interpretability.
Implicit neural representations (INRs) have emerged as powerful tools for encoding signals, yet dominant MLP-based designs often suffer from slow convergence, overfitting to noise, and poor extrapolation. We introduce FUTON (Fourier Tensor Network), which models signals as generalized Fourier series whose coefficients are parameterized by a low-rank tensor decomposition. FUTON implicitly expresses signals as weighted combinations of orthonormal, separable basis functions, combining complementary inductive biases: Fourier bases capture smoothness and periodicity, while the low-rank parameterization enforces low-dimensional spectral structure. We provide theoretical guarantees through a universal approximation theorem and derive an inference algorithm with complexity linear in the spectral resolution and the input dimension. On image and volume representation, FUTON consistently outperforms state-of-the-art MLP-based INRs while training 2--5$\times$ faster. On inverse problems such as image denoising and super-resolution, FUTON generalizes better and converges faster.
Selective auditory attention decoding aims to identify the speaker of interest from listeners' neural signals, such as electroencephalography (EEG), in the presence of multiple concurrent speakers. Most existing methods operate at the window level, facing a trade-off between temporal resolution and decoding accuracy. Recent work has shown that hidden Markov model (HMM)-based post-processing can smooth window-level decoder outputs to improve this trade-off. Instead of using a separate smoothing step, we propose to integrate the decoding and smoothing components into a single probabilistic framework using a Markov switching model (MSM). It directly models the relationship between the EEG and speech envelopes under each attention state while incorporating the temporal dynamics of attention. This formulation enables sample-level attention decoding, with model parameters and attention states jointly estimated via the expectation-maximization algorithm. Experimental results demonstrate that this integrated MSM formulation achieves comparable decoding accuracy to HMM post-processing while providing faster attention switch detection. The code for the proposed method is available at this https URL.
Real-world power distribution data are often inaccessible due to privacy and security concerns, highlighting the need for tools for generating realistic synthetic networks. Existing methods typically overlook critical reliability metrics such as the Customer Average Interruption Frequency Index (CAIFI) and the Customer Average Interruption Duration Index (CAIDI). Moreover, these methods often neglect phase consistency during the design stage, necessitating the use of a separate phase assignment algorithm. This work proposes a Bayesian Hierarchical Model (BHM) that generates phase-consistent unbalanced three-phase distribution systems, and incorporates reliability indices. The BHM learns the joint distribution of phase configuration, power demand, and reliability indices from a reference network, conditioning these attributes on topological features. We apply the proposed methodology to generate synthetic power distribution networks in Brazil, and validated it on known Brazilian networks. The results show that the BHM accurately reproduces the distributions of phase allocation, power demand, and reliability metrics on the training system. Furthermore, in out-of-sample validation on unseen data, the model generates phase-consistent networks and accurately predicts the reliability indices for the synthetic systems. The generated networks are also electrically feasible: three-phase power flows converge and voltages remain within typical operating limits, enabling studies of planning, reliability, and resilience.
Robust characterization of dynamic causal interactions in multivariate biomedical signals is essential for advancing computational and algorithmic methods in biomedical imaging. Conventional approaches, such as Dynamic Bayesian Networks (DBNs), often assume linear or simple statistical dependencies, while manifold based techniques like Convergent Cross Mapping (CCM) capture nonlinear, lagged interactions but lack probabilistic quantification and interventional modeling. We introduce a DBN informed CCM framework that integrates geometric manifold reconstruction with probabilistic temporal modeling. Applied to multimodal EEG-EMG recordings from dystonic and neurotypical children, the method quantifies uncertainty, supports interventional simulation, and reveals distinct frequency specific reorganization of corticomuscular pathways in dystonia. Experimental results show marked improvements in predictive consistency and causal stability as compared to baseline approaches, demonstrating the potential of causality aware multimodal modeling for developing quantitative biomarkers and guiding targeted neuromodulatory interventions.
This paper focuses on solving a challenging problem of blind deconvolution demixing involving modulated inputs. Specifically, multiple input signals $s_n(t)$, each bandlimited to $B$ Hz, are modulated with known random sequences $r_n(t)$ that alter at rate $Q$. Each modulated signal is convolved with a different M tap channel of impulse response $h_n(t)$, and the outputs of each channel are added at a common receiver to give the observed signal $y(t)=\sum_{n=1}^N (r_n(t)\odot s_n(t))\circledast h_n(t)$, where $\odot$ is the point wise multiplication, and $\circledast$ is circular convolution. Given this observed signal $y(t)$, we are concerned with recovering $s_n(t)$ and $h_n(t)$. We employ deterministic subspace assumption for the input signal $s_n(t)$ and keep the channel impulse response $h_n(t)$ arbitrary. We show that if modulating sequence is altered at a rate $Q \geq N^2 (B+M)$ and sample complexity bound is obeyed then all the signals and the channels, $\{s_n(t),h_n(t)\}_{n=1}^N$, can be estimated from the observed mixture $y(t)$ using gradient descent algorithm. We have performed extensive simulations that show the robustness of our algorithm and used phase transitions to numerically investigate the theoretical guarantees provided by our algorithm.
Simultaneous recording of electroencephalography (EEG) and functional MRI (fMRI) can provide a more complete view of brain function by merging high temporal and spatial resolutions. High-field ($\geq$3T) systems are standard, and require technical trade-offs, including artifacts in the EEG signal, reduced compatibility with metallic implants, high acoustic noise, and artifacts around high-susceptibility areas such as the optic nerve and nasal sinus. This proof-of-concept study demonstrates the feasibility of simultaneous EEG-fMRI at 0.55T in a visual task. We characterize the gradient and ballistocardiogram (BCG) artifacts inherent to this environment and observe reduced BCG magnitude consistent with the expected scaling of pulse-related artifacts with static magnetic field strength. This reduction shows promise for facilitating effective denoising while preserving the alpha rhythm and signal integrity. Furthermore, we tested a multimodal integration pipeline and demonstrated that the EEG power envelope corresponds with the hemodynamic BOLD response, supporting the potential to measure neurovascular coupling in this environment. We demonstrate that combined EEG-fMRI at 0.55T is feasible and represents a promising environment for multimodal neuroimaging.
Detecting anomalies in hyperspectral image data, i.e. regions which are spectrally distinct from the image background, is a common task in hyperspectral imaging. Such regions may represent interesting objects to human operators, but obtaining results often requires post-processing of captured data, delaying insight. To address this limitation, we apply an anomaly detection algorithm to a visible and near-infrared (VNIR) push-broom hyperspectral image sensor in real time onboard a small uncrewed aerial system (UAS), exploring how UAS limitations affect the algorithm. As the generated anomaly information is much more concise than the raw hyperspectral data, it can feasibly be transmitted wirelessly. To detection, we couple an innovative and fast georectification algorithm that enables anomalous areas to be interactively investigated and characterized immediately by a human operator receiving the anomaly data at a ground station. Using these elements, we demonstrate a novel and complete end-to-end solution from data capture and preparation, through anomaly detection and transmission, to ground station display and interaction, all in real time and with relatively low cost components.
J. B. Fourier in his \emph{Théorie Analytique de la Chaleur} of 1822 introduced, amongst other things, two ideas that have made a fundamental impact in fields as diverse as Mathematical Physics, Electrical Engineering, Computer Science, and Music. The first one of these, a method to find the coefficients for a trigonometric series describing an arbitrary function, was very early on picked up by G. Ohm and H. Helmholtz as the foundation for a theory of \emph{musical tones}. The second one, which is described by Fourier's double integral, became the basis for treating certain kinds of infinity in discontinuous functions, as shown by A. De Morgan in his 1842 \emph{The Differential and Integral Calculus}. Both make up the fundamental basis for what is now commonly known as the \emph{Fourier theorem}. With the help of P. A. M. Dirac's insights into the nature of these infinities, we can have a compact description of the frequency spectrum of a function of time, or conversely of a waveform corresponding to a given function of frequency. This paper, using solely primary sources, takes us from the physics of heat propagation to the modern theory of musical signals. It concludes with some considerations on the inherent duality of time and frequency emerging from Fourier's theorem.
While Mamba models offer efficient sequence modeling, vanilla versions struggle with temporal correlations and boundary details in Arctic sea ice concentration (SIC) prediction. To address these limitations, we propose Frequency-enhanced Hilbert scanning Mamba Framework (FH-Mamba) for short-term Arctic SIC prediction. Specifically, we introduce a 3D Hilbert scan mechanism that traverses the 3D spatiotemporal grid along a locality-preserving path, ensuring that adjacent indices in the flattened sequence correspond to neighboring voxels in both spatial and temporal dimensions. Additionally, we incorporate wavelet transform to amplify high-frequency details and we also design a Hybrid Shuffle Attention module to adaptively aggregate sequence and frequency features. Experiments conducted on the OSI-450a1 and AMSR2 datasets demonstrate that our FH-Mamba achieves superior prediction performance compared with state-of-the-art baselines. The results confirm the effectiveness of Hilbert scanning and frequency-aware attention in improving both temporal consistency and edge reconstruction for Arctic SIC forecasting. Our codes are publicly available at this https URL.
Off-grid targets whose Doppler (or angle) does not lie on the discrete processing grid can severely degrade classical normalized matched-filter (NMF) detectors: even at high SNR, the detection probability may saturate at operationally relevant low false-alarm rates. A principled remedy is the continuous-parameter GLRT, which maximizes a normalized correlation over the parameter domain; however, dense scanning increases test-time cost and remains sensitive to covariance mismatch through whitening. We propose DopplerGLRTNet, an amortized off-grid GLRT: a lightweight regressor predicts the continuous Doppler within a resolution cell from the whitened observation, and the detector outputs a single GLRT/NMF-like score given by the normalized matched-filter energy at the predicted Doppler. Monte Carlo simulations in Gaussian and compound-Gaussian clutter show that DopplerGLRTNet mitigates off-grid saturation, approaches dense-scan performance at a fraction of its cost, and improves robustness to covariance estimation at the same empirically calibrated Pfa.
Spoofing-robust automatic speaker verification (SASV) seeks to build automatic speaker verification systems that are robust against both zero-effort impostor attacks and sophisticated spoofing techniques such as voice conversion (VC) and text-to-speech (TTS). In this work, we propose a novel SASV architecture that introduces score-aware gated attention (SAGA), SASV-SAGA, enabling dynamic modulation of speaker embeddings based on countermeasure (CM) scores. By integrating speaker embeddings and CM scores from pre-trained ECAPA-TDNN and AASIST models respectively, we explore several integration strategies including early, late, and full integration. We further introduce alternating training for multi-module (ATMM) and a refined variant, evading alternating training (EAT). Experimental results on the ASVspoof 2019 Logical Access (LA) and Spoofceleb datasets demonstrate significant improvements over baselines, achieving a spoofing aware speaker verification equal error rate (SASV-EER) of 1.22% and minimum normalized agnostic detection cost function (min a-DCF) of 0.0304 on the ASVspoof 2019 evaluation set. These results confirm the effectiveness of score-aware attention mechanisms and alternating training strategies in enhancing the robustness of SASV systems.
Large Language Models (LLMs) have demonstrated strong semantic reasoning across multimodal domains. However, their integration with graph-based models of brain connectivity remains limited. In addition, most existing fMRI analysis methods rely on static Functional Connectivity (FC) representations, which obscure transient neural dynamics critical for neurodevelopmental disorders such as autism. Recent state-space approaches, including Mamba, model temporal structure efficiently, but are typically used as standalone feature extractors without explicit high-level reasoning. We propose NeuroMambaLLM, an end-to-end framework that integrates dynamic latent graph learning and selective state-space temporal modelling with LLMs. The proposed method learns the functional connectivity dynamically from raw Blood-Oxygen-Level-Dependent (BOLD) time series, replacing fixed correlation graphs with adaptive latent connectivity while suppressing motion-related artifacts and capturing long-range temporal dependencies. The resulting dynamic brain representations are projected into the embedding space of an LLM model, where the base language model remains frozen and lightweight low-rank adaptation (LoRA) modules are trained for parameter-efficient alignment. This design enables the LLM to perform both diagnostic classification and language-based reasoning, allowing it to analyze dynamic fMRI patterns and generate clinically meaningful textual reports.
Recent advances in satellite and communication technologies have significantly improved geographical information and monitoring systems. Global System for Mobile Communications (GSM) and Global Navigation Satellite System (GNSS) technologies, which rely on electromagnetic signals transmitted from satellites and base stations, have long been utilized for geolocation applications. However, signal attenuation due to environmental conditions or intentional interference such as jamming may lead to severe degradation or complete loss of positioning capability. In such GNSS-denied environments, landmark extraction becomes critical for the navigation of unmanned aerial vehicles (UAVs) used in monitoring applications. By processing images captured from onboard UAV cameras, reliable visual landmarks can be identified to enable navigation without GNSS support. In this study, a convolution-based deep learning approach is proposed for the extraction of appropriate landmarks, and its effectiveness is examined.
This paper presents the history of the online simulation program Java-DSP (J-DSP) and the most recent function development and deployment. J-DSP was created to support online laboratories in DSP classes and was first deployed in our ASU DSP class in 2000. The development of the program and its extensions was supported by several NSF grants including CCLI and IUSE. The web-based software was developed by our team in Java and later transitioned to the more secure HTML5 environment. J-DSP supports laboratory exercises on: digital filters and their design, the FFT and its utility in spectral analysis, machine learning for signal classification, and more recently online simulations with the Quantum Fourier Transform. Throughout the J-DSP development and deployment of this tool and its associated laboratory exercises, we documented evaluations. Mobile versions of the program for iOS and Android were also developed. J-DSP is used to this day in several universities, and specific functions of the program have been used in NSF REU, IRES and RET workforce development and high school outreach.
Voltage stability in modern power systems involves coupled dynamics across multiple time scales. Conventional methods based on time-scale separation or static stability margins may overlook instabilities caused by the coupling of slow and fast transients. Uncertainty in operating conditions further complicates stability assessment, and high computational cost of Monte Carlo simulations limit its applicability to multi-scale dynamics. This paper presents a deep reinforcement learning-based framework for probabilistic reachability analysis of multi-scale voltage dynamics. By formulating each instability mechanism as a distinct absorbing state and introducing a multi-critic architecture for mechanism-specific learning, the proposed method enables consistent learning of risk probabilities associated with multiple instability types within a unified framework. The approach is demonstrated on a four-bus system with load tap changers and over-excitation limiters, illustrating effectiveness of the proposed learning-based reachability analysis in identifying and quantifying the mechanisms leading to voltage collapse.
In this paper, we propose a data-enabled moving horizon estimation (MHE) approach for nonlinear systems. While the approach is formulated by leveraging Koopman theory, its implementation does not require explicit Koopman modeling. Lifting functions are learned from the state and input data of the original nonlinear system to project the system trajectories into the lifted space, where the resulting trajectories implicitly describe the Koopman representation for the original nonlinear system. A convex data-enabled MHE formulation is developed to provide real-time state estimates of the Koopman representation, from which the states of the nonlinear system can be reconstructed. Sufficient conditions are derived to ensure the stability of the estimation error. The effectiveness of the proposed method is illustrated using a membrane-based biological water treatment process.
Cardiac MRI is limited by long acquisition times, which can lead to patient discomfort and motion artifacts. We aim to accelerate Cartesian dynamic cardiac MRI by learning efficient, scan-adaptive undersampling patterns that preserve diagnostic image quality. We develop a learning-based framework for designing scan- or slice-adaptive Cartesian undersampling masks tailored to dynamic cardiac MRI. Undersampling patterns are optimized using fully sampled training dynamic time-series data. At inference time, a nearest-neighbor search in low-frequency $k$-space selects an optimized mask from a dictionary of learned patterns. Our learned sampling approach improves reconstruction quality across multiple acceleration factors on public and in-house cardiac MRI datasets, including PSNR gains of 2-3 dB, reduced NMSE, improved SSIM, and higher radiologist ratings. The proposed scan-adaptive sampling framework enables faster and higher-quality dynamic cardiac MRI by adapting $k$-space sampling to individual scans.
Accurate cascaded channel state information is pivotal for extremely large-scale intelligent reflecting surfaces (XL-IRS) in next-generation wireless networks. However, the large XL-IRS aperture induces spherical wavefront propagation due to near-field (NF) effects, complicating cascaded channel estimation. Conventional dictionary-based methods suffer from cumulative quantization errors and high complexity, especially in uniform planar array (UPA) systems. To address these issues, we first propose a tensor modelization method for NF cascaded channels by exploiting the tensor product among the horizontal and vertical response vectors of the UPA-structured base station (BS) and the incident-reflective array response vector of the IRS. This structure leverages spatial characteristics, enabling independent estimation of factor matrices to improve efficiency. Meanwhile, to avoid quantization errors, we propose an off-grid cascaded channel estimation framework based on sparse Tucker decomposition. Specifically, we model the received signal as a Tucker tensor, where the sparse core tensor captures path gain-delay terms and three factor matrices are spanned by BS and NF IRS array responses. We then formulate a sparse core tensor minimization problem with tri-modal log-sum sparsity constraints to tackle the NP-hard challenge. Finally, the method is accelerated via higher-order singular value decomposition preprocessing, combined with majorization-minimization and a tailored tensor over-relaxation fast iterative shrinkage-thresholding technique. We derive the Cramér-Rao lower bound and conduct convergence analysis. Simulations show the proposed scheme achieves a 13.6 dB improvement in normalized mean square error over benchmarks with significantly reduced runtime.
Quasi-static human activities such as lying, standing or sitting produce very low Doppler shifts and highly spread radar signatures, making them difficult to detect with conventional constant-false-alarm rate (CFAR) detectors tuned for point targets. Moreover, privacy concerns and low lighting conditions limit the use of cameras in long-term care (LTC) facilities. This paper proposes a lightweight, non-visual image-based method for robust quasi-static human presence detection using a low-cost 60 GHz FMCW radar. On a dataset covering five semi-static activities, the proposed method improves average detection accuracy from 68.3% for Cell-Averaging CFAR (CA-CFAR) and 78.8% for Order-Statistics CFAR (OS-CFAR) to 93.24% for Subject 1, from 51.3%, 68.3% to 92.3% for Subject 2, and 57.72%, 69.94% to 94.82% for Subject 3, respectively. Finally, we benchmarked all three detectors across all activities on a Raspberry Pi 4B using a shared Range-Angle (RA) preprocessing pipeline. The proposed algorithm obtains an average 8.2 ms per frame, resulting in over 120 frames per second (FPS) and a 74 times speed-up over OS-CFAR. These results demonstrate that simple image-based processing can provide robust and deployable quasi-static human sensing in cluttered indoor environments.
The Received Signal Strength Indicator (RSSI) is widely available on commodity WiFi devices but is commonly regarded as too coarse for fine-grained sensing. This paper revisits its sensing potential and presents WiRSSI, a bistatic WiFi sensing framework for passive human tracking using only RSSI measurements. WiRSSI adopts a 1Tx-3Rx configuration and is readily extensible to Multiple-Input Multiple-Output (MIMO) deployments. We first reveal how CSI power implicitly encodes phase-related information and how this relationship carries over to RSSI, showing that RSSI preserves exploitable Doppler, Angle-of-Arrival (AoA), and delay cues associated with human motion. WiRSSI then extracts Doppler-AoA features via a 2D Fast Fourier Transform and infers delay from amplitude-only information in the absence of subcarrier-level phase. The estimated AoA and delay are then mapped to Cartesian coordinates and denoised to recover motion trajectories. Experiments in practical environments show that WiRSSI achieves median XY localization errors of 0.905 m, 0.784 m, and 0.785 m for elliptical, linear, and rectangular trajectories, respectively. In comparison, a representative CSI-based method attains median errors of 0.574 m, 0.599 m, and 0.514 m, corresponding to an average accuracy gap of 0.26 m. These results demonstrate that, despite its lower resolution, RSSI can support practical passive sensing and offers a low-cost alternative to CSI-based WiFi sensing.
Recent advances in deep learning (DL)-based joint source-channel coding (JSCC) have enabled efficient semantic communication in dynamic wireless environments. Among these approaches, vector quantization (VQ)-based JSCC effectively maps high-dimensional semantic feature vectors into compact codeword indices for digital modulation. However, existing methods, including universal JSCC (uJSCC), rely on fixed, modulation-specific encoders, decoders, and codebooks, limiting adaptability to fine-grained SNR variations. We propose an extended universal JSCC (euJSCC) framework that achieves SNR- and modulation-adaptive transmission within a single model. euJSCC employs a hypernetwork-based normalization layer for fine-grained feature vector normalization and a dynamic codebook generation (DCG) network that refines modulation-specific base codebooks according to block-wise SNR. To handle block fading channels, which consist of multiple coherence blocks, an inner-outer encoder-decoder architecture is adopted, where the outer encoder and decoder capture long-term channel statistics, and the inner encoder and decoder refine feature vectors to align with block-wise codebooks. A two-phase training strategy, i.e., pretraining on AWGN channels followed by finetuning on block fading channels, ensures stable convergence. Experiments on image transmission demonstrate that euJSCC consistently outperforms state-of-the-art channel-adaptive digital JSCC schemes under both block fading and AWGN channels.
Extra-large apertures, high carrier frequencies, and integrated sensing and communications (ISAC) are pushing array processing into the Fresnel region, where spherical wavefronts induce a range-dependent phase across the aperture. This curvature breaks the Fourier/Vandermonde structure behind classical subspace methods, and it is especially limiting with hybrid front-ends that provide only a small number of pilot measurements. Consequently, practical systems need continuous angle resolution and joint angle-range inference where many near-field approaches still rely on costly 2D gridding. We show that convexity can meet curvature via a lifted, gridless superresolution framework for near-field measurements. The key is a Bessel-Vandermonde factorization of the Fresnel-phase manifold that exposes a hidden Vandermonde structure in angle while isolating the range dependence into a compact coefficient map. Building on this, we introduce a lifting that maps each range bin and continuous angle to a structured rank-one atom, converting the nonlinear near-field model into a linear inverse problem over a row-sparse matrix. Recovery is posed as atomic-norm minimization and an explicit dual characterization via bounded trigonometric polynomials yields certificate-based localization that super-resolves off-grid angles and identifies active range bins. Simulations with strongly undersampled hybrid observations validate reliable joint angle-range recovery for next-generation wireless and ISAC systems.
Operating complex real-world systems, such as soft robots, can benefit from precise predictive control schemes that require accurate state and model knowledge. This knowledge is typically not available in practical settings and must be inferred from noisy measurements. In particular, it is challenging to simultaneously estimate unknown states and learn a model online from sequentially arriving measurements. In this paper, we show how a recently proposed gray-box system identification tool enables the estimation of a soft robot's current pose while at the same time learning a bending stiffness model. For estimation and learning, we rely solely on a nominal constant-curvature robot model and measurements of the robot's base reactions (e.g., base forces). The estimation scheme -- relying on a marginalized particle filter -- allows us to conveniently interface nominal constant-curvature equations with a Gaussian Process (GP) bending stiffness model to be learned. This, in contrast to estimation via a random walk over stiffness values, enables prediction of bending stiffness and improves overall model quality. We demonstrate, using real-world soft-robot data, that the method learns a bending stiffness model online while accurately estimating the robot's pose. Notably, reduced multi-step forward-prediction errors indicate that the learned bending-stiffness GP improves overall model quality.
Wireless communication systems exhibit structural and functional similarities to neural networks: signals propagate through cascaded elements, interact with the environment, and undergo transformations. Building upon this perspective, we introduce a unified paradigm, termed \textit{wireless physical neural networks (WPNNs)}, in which components of a wireless network, such as transceivers, relays, backscatter, and intelligent surfaces, are interpreted as computational layers within a learning architecture. By treating the wireless propagation environment and network elements as differentiable operators, new opportunities arise for joint communication-computation designs, where system optimization can be achieved through learning-based methods applied directly to the physical network. This approach may operate independently of, or in conjunction with, conventional digital neural layers, enabling hybrid communication learning pipelines. In the article, we outline representative architectures that embody this viewpoint and discuss the algorithmic and training considerations required to leverage the wireless medium as a computational resource. Through numerical examples, we highlight the potential performance gains in processing, adaptability, efficiency, and end-to-end optimization, demonstrating the promise of reconfiguring wireless systems as learning networks in next-generation communication frameworks.
A key question for most applications involving reconfigurable linear wave systems is how accurately a desired linear operator can be realized by configuring the system's tunable elements. The relevance of this question spans from hybrid-MIMO analog combiners via computational meta-imagers to programmable wave-domain signal processing. Yet, no electromagnetically consistent bounds have been derived for the fidelity with which a desired operator can be realized in a real-world reconfigurable wave system. Here, we derive such bounds based on an electromagnetically consistent multiport-network model (capturing mutual coupling between tunable elements) and accounting for real-world hardware constraints (lossy, 1-bit-programmable elements). Specifically, we formulate the operator-synthesis task as a quadratically constrained fractional-quadratic problem and compute rigorous fidelity upper bounds based on semidefinite relaxation. We apply our technique to three distinct experimental setups. The first two setups are, respectively, a free-space and a rich-scattering $4\times 4$ MIMO channel at 2.45 GHz parameterized by a reconfigurable intelligent surface (RIS) comprising 100 1-bit-programmable elements. The third setup is a $4\times 4$ MIMO channel at 19 GHz from four feeds of a dynamic metasurface antenna (DMA) to four users. We systematically study how the achievable fidelity scales with the number of tunable elements, and we probe the tightness of our bounds by trying to find optimized configurations approaching the bounds with standard discrete-optimization techniques. We observe a strong influence of the coupling strength between tunable elements on our fidelity bound. For the two RIS-based setups, our bound attests to insufficient wave-domain flexibility for the considered operator synthesis.
The detection of interictal epileptiform discharge (IED) is crucial for the diagnosis of epilepsy, but automated methods often lack interpretability. This study proposes IED-RAG, an explainable multimodal framework for joint IED detection and report generation. Our approach employs a dual-encoder to extract electrophysiological and semantic features, aligned via contrastive learning in a shared EEG-text embedding space. During inference, clinically relevant EEG-text pairs are retrieved from a vector database as explicit evidence to condition a large language model (LLM) for the generation of evidence-based reports. Evaluated on a private dataset from Wuhan Children's Hospital and the public TUH EEG Events Corpus (TUEV), the framework achieved balanced accuracies of 89.17\% and 71.38\%, with BLEU scores of 89.61\% and 64.14\%, respectively. The results demonstrate that retrieval of explicit evidence enhances both diagnostic performance and clinical interpretability compared to standard black-box methods.
Signal processing has played, and continues to play, a fundamental role in the evolution of modern localization technologies. Localization using spatial variations in the Earth's magnetic field is no exception. It relies on signal-processing methods for statistical state inference, magnetic-field modeling, and sensor calibration. Contemporary localization techniques based on spatial variations in the magnetic field can provide decimeter-level indoor localization accuracy and outdoor localization accuracy on par with strategic-grade inertial navigation systems. This article provides a broad, high-level overview of current signal-processing principles and open research challenges in localization using spatial variations in the Earth's magnetic field. The aim is to provide the reader with an understanding of the similarities and differences among existing key technologies from a statistical signal-processing perspective. To that end, existing key technologies will be presented within a common parametric signal-model framework compatible with well-established statistical inference methods.
This paper investigates secure downlink transmission in a UAV-assisted reconfigurable intelligent surface (RIS)-enabled multiuser multiple-input single-output network, where legitimate information-harvesting receivers coexist with untrusted energy-harvesting receivers (UEHRs) capable of eavesdropping. A UAV-mounted RIS provides blockage mitigation and passive beamforming, while the base station employs zero-forcing precoding for multiuser interference suppression. Due to limited feedback from UEHRs, their channel state information (CSI) is imperfect, leading to a worst-case secrecy energy efficiency (WCSEE) maximization problem. We jointly optimize the UAV horizontal position, RIS phase shifts, and transmit power allocation under both perfect and imperfect CSI, considering discrete RIS phases, UAV mobility, and energy-harvesting constraints. The resulting problem is highly nonconvex due to coupled channel geometry, robustness requirements, and discrete variables. To address this challenge, we propose a soft actor-critic (SAC)-based deep reinforcement learning framework that learns WCSEE-maximizing policies through interaction with the wireless environment. As a structured benchmark, a successive convex approximation (SCA) approach is developed for the perfect CSI case with continuous RIS phases. Simulation results show that the proposed SAC method achieves up to 28% and 16% secrecy energy efficiency gains over SCA and deep deterministic policy gradient baselines, respectively, while demonstrating superior robustness to CSI uncertainty and stable performance across varying transmit power levels and RIS sizes.
3D Gaussian Splatting (3DGS) has emerged as a powerful approach for novel view synthesis. However, the number of Gaussian primitives often grows substantially during training as finer scene details are reconstructed, leading to increased memory and storage costs. Recent coarse-to-fine strategies regulate Gaussian growth by modulating the frequency content of the ground-truth images. In particular, AutoOpti3DGS employs the learnable Discrete Wavelet Transform (DWT) to enable data-adaptive frequency modulation. Nevertheless, its modulation depth is limited by the 1-level DWT, and jointly optimizing wavelet regularization with 3D reconstruction introduces gradient competition that promotes excessive Gaussian densification. In this paper, we propose a multi-level DWT-based frequency modulation framework for 3DGS. By recursively decomposing the low-frequency subband, we construct a deeper curriculum that provides progressively coarser supervision during early training, consistently reducing Gaussian counts. Furthermore, we show that the modulation can be performed using only a single scaling parameter, rather than learning the full 2-tap high-pass filter. Experimental results on standard benchmarks demonstrate that our method further reduces Gaussian counts while maintaining competitive rendering quality.
Integrating massive multiple-input multiple-output (mMIMO) systems with intelligent reflecting surfaces (IRS) presents a promising paradigm for enhancing physical-layer security (PLS) in wireless communications. However, deploying high-resolution quantizers in large-scale mMIMO arrays, along with numerous IRS elements, leads to substantial hardware complexity. To address these challenges, this paper proposes a cost-effective PLS design for IRS-assisted mMIMO systems by employing one-bit digital-to-analog converters (DACs). The focus is on jointly optimizing one-bit quantized precoding at the transmitter and constant-modulus phase shifts at the IRS to maximize the secrecy rate. This leads to a highly non-convex fractional secrecy rate maximization (SRM) problem. To efficiently solve this problem, two algorithms are proposed: (1) the WMMSE-PDD algorithm, which reformulates the SRM problem into a sequence of non-fractional programs with auxiliary variables using the weighted minimum mean-square error (WMMSE) method and solves them via the penalty dual decomposition (PDD) approach, achieving superior secrecy performance; and (2) the exact penalty product Riemannian gradient descent (EPPRGD) algorithm, which transforms the SRM problem into an unconstrained optimization over a product Riemannian manifold, eliminating auxiliary variables and enabling faster convergence with a slight trade-off in secrecy performance. Both algorithms provide analytical solutions at each iteration and are proven to converge to Karush-Kuhn-Tucker (KKT) points. Simulation results confirm the effectiveness of the proposed methods and highlight their respective advantages.
This paper establishes a data-driven solution for infinite horizon linear quadratic Gaussian Mean Field Games with network-coupled heterogeneous agent populations where the dynamics of the agents are unknown. The solution technique relies on Integral Reinforcement Learning and Kleinman's iteration for solving algebraic Riccati equations (ARE). The resulting algorithm uses trajectory data to generate network-coupled MFG strategies for agents and does not require parameters of agents' dynamics. Under technical conditions on the persistency of excitation and on the existence of unique stabilizing solution to the corresponding AREs, the learned network-coupled MFG strategies are shown to converge to their true values.
This paper proposes a prescribed performance function aware hybrid gain finite time sliding mode control framework for a class of nonlinear systems subject to matched disturbances. The hybrid gain structure ensures bounded control effort while retaining finite time convergence, and the incorporation of PPFs enables explicit enforcement of transient performance requirements. Theoretical guarantees are first established for first order systems, characterizing finite time convergence, disturbance rejection, and residual bounds. The approach is then extended to second order dynamics, where a sliding manifold is designed using PPF constraints to facilitate controlled shaping of position and velocity transients. Simulation studies illustrate the proposed design under matched peak control conditions. Comparative results for second-order systems demonstrate that, while a well tuned non-PPF hybrid gain controller achieves competitive tracking performance, the PPF-aware formulation strictly enforces prescribed transient constraints and yields consistent reductions of approximately 9 to 12 percent in integral error and control energy metrics without increasing peak actuation effort.
AS-ISTA (Architecture Searched-Iterative Shrinkage Thresholding Algorithm) and AS-FISTA (AS-Fast ISTA) are compressed sensing algorithms introducing structural parameters to ISTA and FISTA to enable architecture search within the iterative process. The structural parameters are determined using deep unfolding, but this approach requires training data and the large overhead of training time. In this paper, we propose HGD-AS-ISTA (Hypergradient Descent-AS-ISTA) and HGD-AS-FISTA that use hypergradient descent, which is an online hyperparameter optimization method, to determine the structural parameters. Experimental results show that the proposed method improves performance of the conventional ISTA/FISTA while avoiding the need for re-training when the environment changes.
This paper investigates the problem of high-precision target localization in integrated sensing and communication (ISAC) systems, where the target is sensed via both a direct path and a reconfigurable intelligent surface (RIS)-assisted reflection path. We first develop a sequential matched-filter estimator to acquire coarse angular parameters, followed by a range recovery process based on subcarrier phase differences. Subsequently, we formulate the target localization problem as a non-linear least squares optimization, using the coarse estimates to initialize the target's position coordinates. To solve this efficiently, we introduce a fast iterative refinement algorithm tailored for RIS-aided ISAC environments. Recognizing that the signal model involves both linear path gains and non-linear geometric dependencies, we exploit the separable least-squares structure to decouple these parameters. Furthermore, we propose a modified Levenberg algorithm with an approximation strategy, which enables low-cost parameter updates without necessitating repeated evaluations of the full non-linear model. Simulation results show that the proposed refinement method achieves accuracy comparable to conventional approaches, while significantly reducing algorithmic complexity.
Collaborative virtual queueing has been proposed as a mechanism to mitigate airport surface congestion while preserving airline autonomy over aircraft-level pushback decisions. A central coordinator can regulate aggregate pushback capacity but cannot directly control which specific aircraft are released, limiting its ability to steer system-level performance. We propose a noncooperative coordination mechanism for collaborative virtual queueing based on the correlated equilibrium concept, which enables the coordinator to provide incentive-compatible recommendations on aircraft-level pushback decisions without overriding airline autonomy. To account for uncertainty in airlines' internal cost assessments, we introduce chance constraints into the correlated equilibrium formulation. This formulation provides explicit probabilistic guarantees on incentive compatibility, allowing the coordinator to adjust the confidence level with which airlines are expected to follow the recommended actions. We further propose a scalable algorithm for computing chance-constrained correlated equilibria by exploiting a reduced-rank structure. Numerical experiments demonstrate that the proposed method scales to realistic traffic levels up to 210 eligible pushbacks per hour, reduces accumulated delay by up to approximately 8.9% compared to current first-come-first-served schemes, and reveals a trade-off between confidence level, deviation robustness, and achievable cost efficiency.
Magnetic induction (MI) enables communication in RF-denied environments (underground, underwater, in-body), where the medium conductivity imprints a deterministic signature on the channel. This letter derives a closed-form Cramér--Rao bound (CRB) for the joint estimation of range and medium conductivity from MI pilot observations in an integrated sensing and communication (ISAC) framework. The Fisher information matrix reveals that the joint estimation penalty converges to 3\,dB in the near-field regime, meaning conductivity sensing adds at most a factor-of-two loss in ranging precision. Monte Carlo maximum-likelihood simulations confirm that the CRB is achievable under practical operating conditions.
In this work, we propose a method for computing centroids, or barycenters, in the spectral Wasserstein-2 metric for sets of power spectral densities, where the barycenters are restricted to belong to the set of all-pole spectra with a certain model order. This may be interpreted as finding an autoregressive representative for sets of second-order stationary Gaussian processes. While Wasserstein, or optimal transport, barycenters have been successfully used earlier in problems of spectral estimation and clustering, the resulting barycenters are non-parametric and the complexity of representing and storing them depends on, e.g., the choice of discretization grid. In contrast, the herein proposed method yields compact, low-dimensional, and interpretable spectral centroids that can be used in downstream tasks. Computing the all-pole centroids corresponds to solving a non-convex optimization problem in the model parameters, and we present a gradient descent scheme for addressing this. Although convergence to a globally optimal point cannot be guaranteed, the sub-optimality of the obtained centroids can be quantified. The proposed method is illustrated on a problem of phoneme classification.
Conventional automatic word-naming recognition systems struggle to recognize words from post-stroke patients with aphasia because of disfluencies and mispronunciations, limiting reliable automated assessment in this population. In this paper, we propose a Contrastive Language-Audio Pretraining (CLAP) based approach for automatic word-naming recognition to address this challenge by leveraging text-audio alignment. Our approach treats word-naming recognition as an audio-text matching problem, projecting speech signals and textual prompts into a shared embedding space to identify intended words even in challenging recordings. Evaluated on two speech datasets of French post-stroke patients with aphasia, our approach achieves up to 90% accuracy, outperforming existing classification-based and automatic speech recognition-based baselines.
The Dirac operator provides a unified framework for processing signals defined over different order topological domains, such as node and edge signals. Its eigenmodes define a spectral representation that inherently captures cross-domain interactions, in contrast to conventional Hodge-Laplacian eigenmodes that operate within a single topological dimension. In this paper, we compare the two alternatives in terms of the distortion/sparsity trade-off and we show how an overcomplete basis built concatenating the two dictionaries can provide better performance with respect to each approach. Then, we propose a parameterized nonredundant transform whose eigenmodes incorporate a mode-specific mass parameter that captures the interplay between node and edge modes. Interestingly, we show that learning the mass parameters from data makes the proposed transform able to achieve the best distortion-sparsity tradeoff with respect to both complete and overcomplete bases.
Long-duration audio is increasingly common in industrial and consumer settings, yet reviewing multi-hour recordings is impractical, motivating systems that answer natural-language queries with precise temporal grounding and minimal hallucination. Existing audio-language models show promise, but long-audio question answering remains difficult due to context-length limits. We introduce LongAudio-RAG (LA-RAG), a hybrid framework that grounds Large Language Model (LLM) outputs in retrieved, timestamped acoustic event detections rather than raw audio. Multi-hour streams are converted into structured event records stored in an SQL database, and at inference time the system resolves natural-language time references, classifies intent, retrieves only the relevant events, and generates answers using this constrained evidence. To evaluate performance, we construct a synthetic long-audio benchmark by concatenating recordings with preserved timestamps and generating template-based question-answer pairs for detection, counting, and summarization tasks. Finally, we demonstrate the practicality of our approach by deploying it in a hybrid edge-cloud environment, where the audio grounding model runs on-device on IoT-class hardware while the LLM is hosted on a GPU-backed server. This architecture enables low-latency event extraction at the edge and high-quality language reasoning in the cloud. Experiments show that structured, event-level retrieval significantly improves accuracy compared to vanilla Retrieval-Augmented Generation (RAG) or text-to-SQL approaches.
While synthetic aperture radar is widely adopted to provide high-resolution imaging at long distances using small arrays, the concept of coherent synthetic aperture communication (SAC) has not yet been explored. This article introduces the principles of SAC for direct satellite-to-device uplink, showcasing precise direction-of-arrival estimation for user equipment (UE) devices, facilitating spatial signal separation, localization, and easing link budget constraints. Simulations for a low Earth orbit satellite at 600 km orbit and two UE devices performing orthogonal frequency-division multiplexing-based transmission with polar coding at 3.5 GHz demonstrate block error rates below 0.1 with transmission powers as low as -10 dBm, even under strong interference when UE devices are resolved but fall on each other's strongest angular sidelobe. These results validate the ability of the proposed scheme to address mutual interference and stringent power limitations, paving the way for massive Internet of Things connectivity in non-terrestrial networks.
Proximity operations of rigid bodies, such as spacecraft rendezvous and docking, require precise tracking of both position and attitude over finite time intervals. These operations are often repeated under uncertain conditions, with unknown but repeatable parameters and disturbances. Adaptive iterative learning control (ILC) is well suited to such tasks, as it can track desired trajectories while learning unknown, iteration-invariant signals or parameters. However, conventional adaptive ILC faces two challenges: (i) the coupling between rotational and translational dynamics complicates the design of the two coordinated learning loops for position and attitude, and (ii) standard adaptive ILC designs cannot guarantee bounded control inputs. To address these issues, we propose a dual-number-based, segment-based two-loop adaptive ILC framework for simultaneous high-precision position and attitude tracking. The framework employs two learning loops that interact through a dual-number representation of tracking errors, combining position and attitude errors into a single mathematical object for unified control design. A segment-based dynamic projection mechanism ensures that both parameter estimates and control inputs remain bounded without prior knowledge of uncertainties. Mathematical analysis and numerical simulations demonstrate that the proposed framework significantly enhances tracking performance under unknown but repeatable uncertainties and strong rotational-translational coupling.
The performance of state-of-the-art speech enhancement (SE) models considerably degrades for pathological speech due to atypical acoustic characteristics and limited data availability. This paper systematically investigates data augmentation (DA) strategies to improve SE performance for pathological speakers, evaluating both predictive and generative SE models. We examine three DA categories, i.e., transformative, generative, and noise augmentation, assessing their impact with objective SE metrics. Experimental results show that noise augmentation consistently delivers the largest and most robust gains, transformative augmentations provide moderate improvements, while generative augmentation yields limited benefits and can harm performance as the amount of synthetic data increases. Furthermore, we show that the effectiveness of DA varies depending on the SE model, with DA being more beneficial for predictive SE models. While our results demonstrate that DA improves SE performance for pathological speakers, a performance gap between neurotypical and pathological speech persists, highlighting the need for future research on targeted DA strategies for pathological speech.
We introduce a system capable of faithfully modifying the perceptual voice quality of creak while preserving the speaker's perceived identity. While it is well known that high creak probability is typically correlated with low pitch, it is important to note that this is a property observed on a population of speakers but does not necessarily hold across all situations. Disentanglement of pitch from creak is achieved by augmentation of the training dataset of a speech synthesis system with a speaker manipulation block based on conditional continuous normalizing flow. The experiments show greatly improved speaker verification performance over a range of creak manipulation strengths.
We present a comprehensive overview of the Deep Image Prior (DIP) framework and its applications to image reconstruction in computed tomography. Unlike conventional deep learning methods that rely on large, supervised datasets, the DIP exploits the implicit bias of convolutional neural networks and operates in a fully unsupervised setting, requiring only a single measurement, even in the presence of noise. We describe the standard DIP formulation, outline key algorithmic design choices, and review several strategies to mitigate overfitting, including early stopping, explicit regularisation, and self-guided methods that adapt the network input. In addition, we examine computational improvements such as warm-start and stochastic optimisation methods to reduce the reconstruction time. The discussed methods are tested on real $\mu$CT measurements, which allows examination of trade-offs among the different modifications and extensions.
This paper investigates a cyber-physical DC microgrid employing a nonlinear distributed consensus-based control scheme for coordinated integration and management of distributed generating units within an expandable framework. Relying on nested primary andsecondary control loops; a (distributed) outer-loop and a (decentralized) inner-loop, the controller achieves proportional current sharing among all distributed generation units, while dynamically operating within predefined voltage limits. A rigorous Lyapunov-based stability analysis establishes a scalable global exponential stability certificate under some tuning conditions and sufficient time-scale separation between the control loops, based on singular perturbation theory. An optimization-based tuning strategy is then formulated to identify and subsequently diminish unstable operating conditions. In turn, various practical tuning strategies are introduced to provide stable operations while facilitating near-optimal proportional current sharing. The effectiveness of the proposed control framework and tuning approaches are finally supported through time-domain simulations of a case-specific low-voltage DC microgrid.
With the ongoing transition of electricity markets worldwide from hourly to intra-hourly bidding, market participants--especially Renewable Energy Sources (RES)--gain improved opportunities to adjust energy and reserve schedules and to benefit from more accurate higher-resolution forecasts. However, this shift requires participants to update decision-making frameworks and to strengthen uncertainty management in order to fully exploit the new market potential. In particular, Renewable-Based Virtual Power Plants (RVPPs) aggregating dispatchable and non-dispatchable RES must account for these changes through market-oriented scheduling methods that efficiently address multiple uncertainties, including electricity prices, RES generation, and demand consumption. In this vein, this paper proposes a multi-bound robust optimization framework to simultaneously capture these uncertainties, explicitly incorporate intra-hourly variability, and differentiate the deviation levels (frequent, moderate deviations and rare, extreme ones) of uncertain parameters. The proposed approach yields less conservative and more implementable bidding and scheduling decisions, thus improving RVPP profitability in both energy and reserve markets. Simulation studies compare the proposed method with standard robust optimization and evaluate the operational, market-strategy, and economic impacts of quarter-hourly versus hourly market resolution. Results indicate that the normalized absolute differences, across different uncertainty-handling strategies, between hourly and 15-minute schedules are 18.0--34.2% for day-ahead traded energy, and 28.7--65.6% and 10.1--16.3% for upward and downward reserve traded in the secondary reserve market, respectively. Furthermore, relative to classic robust optimization, the proposed multi-bound approach increases profit by 24.9--49.2% across the considered strategies.
This work introduces a novel two-stage distributed framework to globally estimate constant parameters in a networked system, separating shared information from local estimation. The first stage uses dynamic average consensus to aggregate agents' measurements into surrogates of centralized data. Using these surrogates, the second stage implements a local estimator to determine the parameters. By designing an appropriate consensus gain, the persistence of excitation of the regressor matrix is achieved, and thus, exponential convergence of a local Gradient Estimator (GE) is guaranteed. The framework facilitates its extension to switched network topologies, quantization, and the heterogeneous substitution of the GE with a Dynamic Regressor Extension and Mixing (DREM) estimator, which supports relaxed excitation requirements.
Designing a speech quality assessment (SQA) system for estimating mean-opinion-score (MOS) of multi-rate speech with varying sampling frequency (16-48 kHz) is a challenging task. The challenge arises due to the limited availability of a MOS-labeled training dataset comprising multi-rate speech samples. While self-supervised learning (SSL) models have been widely adopted in SQA to boost performance, a key limitation is that they are pretrained on 16 kHz speech and therefore discard high-frequency information present in higher sampling rates. To address this issue, we propose a spectrogram-augmented SSL method that incorporates high-frequency features (up to 48 kHz sampling rate) through a parallel-branch architecture. We further introduce a two-step training scheme: the model is first pre-trained on a large 48 kHz dataset and then fine-tuned on a smaller multi-rate dataset. Experimental results show that leveraging high-frequency information overlooked by SSL features is crucial for accurate multi-rate SQA, and that the proposed two-step training substantially improves generalization when multi-rate data is limited.
Large language models (LLMs) and multimodal models have become powerful general-purpose reasoning systems. However, radio-frequency (RF) signals, which underpin wireless systems, are still not natively supported by these models. Existing LLM-based approaches for telecom focus mainly on text and structured data, while conventional RF deep-learning models are built separately for specific signal-processing tasks, highlighting a clear gap between RF perception and high-level reasoning. To bridge this gap, we introduce RF-GPT, a radio-frequency language model (RFLM) that utilizes the visual encoders of multimodal LLMs to process and understand RF spectrograms. In this framework, complex in-phase/quadrature (IQ) waveforms are mapped to time-frequency spectrograms and then passed to pretrained visual encoders. The resulting representations are injected as RF tokens into a decoder-only LLM, which generates RF-grounded answers, explanations, and structured outputs. To train RF-GPT, we perform supervised instruction fine-tuning of a pretrained multimodal LLM using a fully synthetic RF corpus. Standards-compliant waveform generators produce wideband scenes for six wireless technologies, from which we derive time-frequency spectrograms, exact configuration metadata, and dense captions. A text-only LLM then converts these captions into RF-grounded instruction-answer pairs, yielding roughly 12,000 RF scenes and 0.625 million instruction examples without any manual labeling. Across benchmarks for wideband modulation classification, overlap analysis, wireless-technology recognition, WLAN user counting, and 5G NR information extraction, RF-GPT achieves strong multi-task performance, whereas general-purpose VLMs with no RF grounding largely fail.
Classical controllability and observability characterise reachability and reconstructibility of the full system state and admit equivalent geometric and eigenvalue-based Popov-Belevitch-Hautus (PBH) tests. Motivated by large-scale and networked systems where only selected linear combinations of the state are of interest, this paper studies functional generalisations of these properties. A PBH-style framework for functional system properties is developed, providing necessary and sufficient spectral characterisations. The results apply uniformly to diagonalizable and non-diagonalizable systems and recover the classical PBH tests as special cases. Two new intrinsic notions are introduced: intrinsic functional controllability, and intrinsic functional stabilizability. These intrinsic properties are formulated directly in terms of invariant subspaces associated with the functional and provide verifiable conditions for the existence of admissible augmentations required for functional controller design and observer-based functional controller design. The intrinsic framework enables the generalized separation principle at the functional level, establishing that functional controllers and functional observers can be designed independently. Illustrative examples demonstrate the theory and highlight situations where functional control and estimation are possible despite lack of full-state controllability or observability.
This work presents the demonstration of lattice filters based on laterally excited bulk acoustic resonators (XBARs). Two filter implementations, namely direct lattice and layout-balanced lattice topologies, are designed and fabricated in periodically poled piezoelectric film (P3F) thin-film lithium niobate (TFLN). By leveraging the strong electromechanical coupling of XBARs in P3F TFLN together with the inherently wideband nature of the lattice topology, 3-dB fractional bandwidths (FBWs) of 27.42\% and 39.11\% and low insertion losses (ILs) of 0.88 dB and 0.96 dB are achieved at approximately 20 GHz for the direct and layout-balanced lattice filters, respectively, under conjugate matching. Notably, all prototypes feature compact footprints smaller than 1.3 mm\textsuperscript{2}. These results highlight the potential of XBAR-based lattice architectures to enable low-loss, wideband acoustic filters for compact, high-performance RF front ends in next-generation wireless communication and sensing systems, while also identifying key challenges and directions for further optimization.
In recent times, there has been considerable interest in fault detection within electrical power systems, garnering attention from both academic researchers and industry professionals. Despite the development of numerous fault detection methods and their adaptations over the past decade, their practical application remains highly challenging. Given the probabilistic nature of fault occurrences and parameters, certain decision-making tasks could be approached from a probabilistic standpoint. Protective systems are tasked with the detection, classification, and localization of faulty voltage and current line magnitudes, culminating in the activation of circuit breakers to isolate the faulty line. An essential aspect of designing effective fault detection systems lies in obtaining reliable data for training and testing, which is often scarce. Leveraging deep learning techniques, particularly the powerful capabilities of pattern classifiers in learning, generalizing, and parallel processing, offers promising avenues for intelligent fault detection. To address this, our paper proposes an anomaly-based approach for fault detection in electrical power systems, employing deep autoencoders. Additionally, we utilize Convolutional Autoencoders (CAE) for dimensionality reduction, which, due to its fewer parameters, requires less training time compared to conventional autoencoders. The proposed method demonstrates superior performance and accuracy compared to alternative detection approaches by achieving an accuracy of 97.62% and 99.92% on simulated and publicly available datasets.
This paper presents a physics-informed neural network approach for dynamic modeling of saturable synchronous machines, including cases with spatial harmonics. We introduce an architecture that incorporates gradient networks directly into the fundamental machine equations, enabling accurate modeling of the nonlinear and coupled electromagnetic constitutive relationship. By learning the gradient of the magnetic field energy, the model inherently satisfies energy balance (reciprocity conditions). The proposed architecture can universally approximate any physically feasible magnetic behavior and offers several advantages over lookup tables and standard machine learning models: it requires less training data, ensures monotonicity and reliable extrapolation, and produces smooth outputs. These properties further enable robust model inversion and optimal trajectory generation, often needed in control applications. We validate the proposed approach using measured and finite-element method (FEM) datasets from a 5.6-kW permanent-magnet (PM) synchronous reluctance machine. Results demonstrate accurate and physically consistent models, even with limited training data.
Multi-static backscatter networks (BNs) are strong candidates for joint communication and localization in the ambient IoT paradigm for 6G. Enabling real-time localization in large-scale multi-static deployments with thousands of devices require highly efficient algorithms for estimating key parameters such as range and angle of arrival (AoA), and for fusing these parameters into location estimates. We propose two low-complexity algorithms, Joint Range-Angle Clustering (JRAC) and Stage-wise Range-Angle Estimation (SRAE). Both deliver range and angle estimation accuracy comparable to FFT- and subspace-based baselines while significantly reducing the computation. We then introduce two real-time localization algorithms that fuse the estimated ranges and AoAs: a maximum-likelihood (ML) method solved via gradient search and an iterative re-weighted least squares (IRLS) method. Both achieve localization accuracy comparable to ML-based brute force search albeit with far lower complexity. Experiments on a real-world large-scale multi-static testbed with 4 illuminators, 1 multi-antenna receiver, and 100 tags show that JRAC and SRAE reduce runtime by up to 40X and IRLS achieves up to 500X reduction over ML-based brute force search without degrading localization accuracy. The proposed methods achieve 3 m median localization error across all 100 tags in a sub-6GHz band with 40 MHz bandwidth. These results demonstrate that multi-static range-angle estimation and localization algorithms can make real-time, scalable backscatter localization practical for next-generation ambient IoT networks.
We study language-in-the-loop control for multi-drone systems that execute evolving, high-level missions while retaining formal robustness guarantees at the physical layer. We propose a three-layer architecture in which (i) a human operator issues natural-language instructions, (ii) an LLM-based supervisor periodically interprets, verifies, and corrects the commanded task in the context of the latest state and target estimates, and (iii) a distributed inner-loop controller tracks the resulting reference using only local relative information. We derive a theoretical guarantee that characterizes tracking performance under bounded disturbances and piecewise-smooth references with discrete jumps induced by LLM updates. Overall, our results illustrate how centralized language-based task reasoning can be combined with distributed feedback control to achieve complex behaviors with provable robustness and stability.
Speech emotion recognition (SER) is essential for humanoid robot tasks such as social robotic interactions and robotic psychological diagnosis, where interpretable and efficient models are critical for safety and performance. Existing deep models trained on large datasets remain largely uninterpretable, often insufficiently modeling underlying emotional acoustic signals and failing to capture and analyze the core physiology of emotional vocal behaviors. Physiological research on human voices shows that the dynamics of vocal amplitude and phase correlate with emotions through the vocal tract filter and the glottal source. However, most existing deep models solely involve amplitude but fail to couple the physiological features of and between amplitude and phase. Here, we propose PhysioSER, a physiology-informed vocal spectrotemporal representation learning method, to address these issues with a compact, plug-and-play design. PhysioSER constructs amplitude and phase views informed by voice anatomy and physiology (VAP) to complement SSL models for SER. This VAP-informed framework incorporates two parallel workflows: a vocal feature representation branch to decompose vocal signals based on VAP, embed them into a quaternion field, and use Hamilton-structured quaternion convolutions for modeling their dynamic interactions; and a latent representation branch based on a frozen SSL backbone. Then, utterance-level features from both workflows are aligned by a Contrastive Projection and Alignment framework, followed by a shallow attention fusion head for SER classification. PhysioSER is shown to be interpretable and efficient for SER through extensive evaluations across 14 datasets, 10 languages, and 6 backbones, and its practical efficacy is validated by real-time deployment on a humanoid robotic platform.
Automatic speech recognition (ASR) systems often degrade on accented speech because acoustic-phonetic and prosodic shifts induce a mismatch to training data, making labeled accent adaptation costly. However, common pseudo-label selection heuristics are largely text-centric (e.g., perplexity (PPL) filtering) and can prefer fluent yet acoustically mismatched hypotheses, leading to error amplification when fine-tuning. To address this, we introduce a multimodal consistency-guided, reference-free data selection pipeline for ASR accent adaptation under a transductive, label-free protocol. The pipeline starts with a target-aware preselection step based on submodular mutual information to improve query relevance and reduce downstream computation. It then generates multiple pseudo-transcriptions per utterance via perturbation-based decoding and scores each hypothesis using two reference-free signals: speech--text alignment in a shared embedding space and predicted word error rate (WER). A simple percentile-based selection rule retains reliable pseudo-labels for fine-tuning while discarding noisy utterances. In an in-domain setting, selecting ~1.5k utterances from a 30k pool achieves 10.91% WER, close to 10.45% obtained using 30k supervised labels. In a cross-domain setting with a mismatched candidate pool, consistency-filtered subsets avoid the degradation caused by unfiltered pseudo-labels under strong accent shift, and matched-hour experiments on a stronger ASR backbone further confirm gains over random sampling and recent selection baselines.
Localization is a fundamental capability in unmanned aerial vehicle (UAV) systems. Map-free LiDAR relocalization offers an effective solution for achieving high-precision positioning in environments with weak or unavailable GNSS signals. However, existing LiDAR relocalization methods are primarily tailored to autonomous driving, exhibiting significantly degraded accuracy in UAV scenarios. In this paper, we propose MAILS, a novel map-free LiDAR relocalization framework for UAVs. A Locality-Preserving Sliding Window Attention module is first introduced to extract locally discriminative geometric features from sparse point clouds. To handle substantial yaw rotations and altitude variations encountered during UAV flight, we then design a coordinate-independent feature initialization module and a locally invariant positional encoding mechanism, which together significantly enhance the robustness of feature extraction. Furthermore, existing LiDAR-based relocalization datasets fail to capture real-world UAV flight characteristics, such as irregular trajectories and varying altitudes. To address this gap, we construct a large-scale LiDAR localization dataset for UAVs, which comprises four scenes and various flight trajectories, designed to evaluate UAV relocalization performance under realistic conditions. Extensive experiments demonstrate that our method achieves satisfactory localization precision and consistently outperforms existing techniques by a significant margin. Our code and dataset will be released soon.
Vision Language Models (VLMs) have advanced perception in autonomous driving (AD), but they remain vulnerable to adversarial threats. These risks range from localized physical patches to imperceptible global perturbations. Existing defense methods for VLMs remain limited and often fail to reconcile robustness with clean-sample performance. To bridge these gaps, we propose NutVLM, a comprehensive self-adaptive defense framework designed to secure the entire perception-decision lifecycle. Specifically, we first employ NutNet++ as a sentinel, which is a unified detection-purification mechanism. It identifies benign samples, local patches, and global perturbations through three-way classification. Subsequently, localized threats are purified via efficient grayscale masking, while global perturbations trigger Expert-guided Adversarial Prompt Tuning (EAPT). Instead of the costly parameter updates of full-model fine-tuning, EAPT generates "corrective driving prompts" via gradient-based latent optimization and discrete projection. These prompts refocus the VLM's attention without requiring exhaustive full-model retraining. Evaluated on the Dolphins benchmark, our NutVLM yields a 4.89% improvement in overall metrics (e.g., Accuracy, Language Score, and GPT Score). These results validate NutVLM as a scalable security solution for intelligent transportation. Our code is available at this https URL.
Conditional diffusion inversion provides a powerful framework for unpaired image-to-image translation. However, we demonstrate through an extensive analysis that standard deterministic inversion (e.g. DDIM) fails when the source domain is spectrally sparse compared to the target domain (e.g., super-resolution, sketch-to-image). In these contexts, the recovered latent from the input does not follow the expected isotropic Gaussian distribution. Instead it exhibits a signal with lower frequencies, locking target sampling to oversmoothed and texture-poor generations. We term this phenomenon spectral collapse. We observe that stochastic alternatives attempting to restore the noise variance tend to break the semantic link to the input, leading to structural drift. To resolve this structure-texture trade-off, we propose Orthogonal Variance Guidance (OVG), an inference-time method that corrects the ODE dynamics to enforce the theoretical Gaussian noise magnitude within the null-space of the structural gradient. Extensive experiments on microscopy super-resolution (BBBC021) and sketch-to-image (Edges2Shoes) demonstrate that OVG effectively restores photorealistic textures while preserving structural fidelity.
We present FireRed-Image-Edit, a diffusion transformer for instruction-based image editing that achieves state-of-the-art performance through systematic optimization of data curation, training methodology, and evaluation design. We construct a 1.6B-sample training corpus, comprising 900M text-to-image and 700M image editing pairs from diverse sources. After rigorous cleaning, stratification, auto-labeling, and two-stage filtering, we retain over 100M high-quality samples balanced between generation and editing, ensuring strong semantic coverage and instruction alignment. Our multi-stage training pipeline progressively builds editing capability via pre-training, supervised fine-tuning, and reinforcement learning. To improve data efficiency, we introduce a Multi-Condition Aware Bucket Sampler for variable-resolution batching and Stochastic Instruction Alignment with dynamic prompt re-indexing. To stabilize optimization and enhance controllability, we propose Asymmetric Gradient Optimization for DPO, DiffusionNFT with layout-aware OCR rewards for text editing, and a differentiable Consistency Loss for identity preservation. We further establish REDEdit-Bench, a comprehensive benchmark spanning 15 editing categories, including newly introduced beautification and low-level enhancement tasks. Extensive experiments on REDEdit-Bench and public benchmarks (ImgEdit and GEdit) demonstrate competitive or superior performance against both open-source and proprietary systems. We release code, models, and the benchmark suite to support future research.
An end-to-end autoencoder (AE) framework is developed for downlink non-orthogonal multiple access (NOMA) over Rayleigh fading channels, which learns interference-aware and channel-adaptive super-constellations. While existing works either assume additive white Gaussian noise channels or treat fading channels without a fully end-to-end learning approach, our framework directly embeds the wireless channel into both training and inference. To account for practical channel state information (CSI), we further incorporate limited feedback via both uniform and Lloyd-Max quantization of channel gains and analyze their impact on AE training and bit error rate (BER) performance. Simulation results show that, with perfect CSI, the proposed AE outperforms the existing analytical NOMA schemes. In addition, Lloyd-Max quantization achieves superior BER performance compared to uniform quantization. These results demonstrate that end-to-end AEs trained directly over Rayleigh fading can effectively learn robust, interference-aware signaling strategies, paving the way for NOMA deployment in fading environments with realistic CSI constraints.
In this paper, we propose a fast algorithm for element selection, a multiplication-free form of dimension reduction that produces a dimension-reduced vector by simply selecting a subset of elements from the input. Dimension reduction is a fundamental technique for reducing unnecessary model parameters, mitigating overfitting, and accelerating training and inference. A standard approach is principal component analysis (PCA), but PCA relies on matrix multiplications; on resource-constrained systems, the multiplication count itself can become a bottleneck. Element selection eliminates this cost because the reduction consists only of selecting elements, and thus the key challenge is to determine which elements should be retained. We evaluate a candidate subset through the minimum mean-squared error of linear regression that predicts a target vector from the selected elements, where the target may be, for example, a one-hot label vector in classification. When an explicit target is unavailable, the input itself can be used as the target, yielding a reconstruction-based criterion. The resulting optimization is combinatorial, and exhaustive search is impractical. To address this, we derive an efficient formula for the objective change caused by swapping a selected and an unselected element, using the matrix inversion lemma, and we perform a swap-based local search that repeatedly applies objective-decreasing swaps until no further improvement is possible. Experiments on MNIST handwritten-digit images demonstrate the effectiveness of the proposed method.
We introduce Discernment, a semantic communication system that transmits the meaning of physical signals (baseband radio and audio) over a technical channel using GenAI models operating in discrete spaces. Discernment dynamically adapts to channel impairments - modeled as erasure channels - by switching between an autoregressive or a diffusion-based generative algorithm, depending on the erasure pattern. Our results show that Discernment maintains semantic integrity even as channel capacity severely degrades, exhibiting very small and graceful performance decline in both classification accuracy and statistical fidelity of the reconstructed meaning. These findings demonstrate Discernment's ability to adjust to diverse physical channel conditions while maintaining spectral efficiency and low model complexity, making it well suited for IoT deployments and strongly motivating further research on this semantic channel paradigm.
As deepfake audio becomes more realistic and diverse, developing generalizable countermeasure systems has become crucial. Existing detection methods primarily depend on XLS-R front-end features to improve generalization. Nonetheless, their performance remains limited, partly due to insufficient attention to fine-grained information, such as physiological cues or frequency-domain features. In this paper, we propose BreathNet, a novel audio deepfake detection framework that integrates fine-grained breath information to improve generalization. Specifically, we design BreathFiLM, a feature-wise linear modulation mechanism that selectively amplifies temporal representations based on the presence of breathing sounds. BreathFiLM is trained jointly with the XLS-R extractor, in turn encouraging the extractor to learn and encode breath-related cues into the temporal features. Then, we use the frequency front-end to extract spectral features, which are then fused with temporal features to provide complementary information introduced by vocoders or compression artifacts. Additionally, we propose a group of feature losses comprising Positive-only Supervised Contrastive Loss (PSCL), center loss, and contrast loss. These losses jointly enhance the discriminative ability, encouraging the model to separate bona fide and deepfake samples more effectively in the feature space. Extensive experiments on five benchmark datasets demonstrate state-of-the-art (SOTA) performance. Using the ASVspoof 2019 LA training set, our method attains 1.99% average EER across four related eval benchmarks, with particularly strong performance on the In-the-Wild dataset, where it achieves 4.70% EER. Moreover, under the ASVspoof5 evaluation protocol, our method achieves an EER of 4.94% on this latest benchmark.
This paper provides two results that are useful in the study of the existence and the stability properties of a periodic solution for a given dynamical system. The first result deals with scalar time-periodic systems and establishes the equivalence of the existence of a periodic solution and the existence of a bounded solution. The second result provides sufficient conditions for the existence and the stability of a periodic solution for a time-periodic dynamical system. Both results are applied to extremum seeking problems for a static output map with no plant dynamics and novel non-local results are provided without the use of averaging theorems and singular perturbation arguments.
Accurate, computationally efficient, and adaptive vehicle models are essential for autonomous vehicle control. Hybrid models that combine a nominal model with a Gaussian Process (GP)-based residual model have emerged as a promising approach. However, the GP-based residual model suffers from the curse of dimensionality, high evaluation complexity, and the inefficiency of online learning, which impede the deployment in real-time vehicle controllers. To address these challenges, we propose SPLIT, a sparse incremental learning framework for control-oriented vehicle dynamics modeling. SPLIT integrates three key innovations: (i) Model Decomposition. We decompose the vehicle model into invariant elements calibrated by experiments, and variant elements compensated by the residual model to reduce feature dimensionality. (ii) Local Incremental Learning. We define the valid region in the feature space and partition it into subregions, enabling efficient online learning from streaming data. (iii) GP Sparsification. We use bayesian committee machine to ensure scalable online evaluation. Integrated into model-based controllers, SPLIT is evaluated in aggressive simulations and real-vehicle experiments. Results demonstrate that SPLIT improves model accuracy and control performance online. Moreover, it enables rapid adaptation to vehicle dynamics deviations and exhibits robust generalization to previously unseen scenarios.
In safety-critical applications such as medical image segmentation, prediction systems must provide reliability guarantees that extend beyond conventional expected loss control. While risk-controlling prediction sets (RCPS) offer probabilistic guarantees on the expected risk, they fail to capture tail behavior and worst-case scenarios that are crucial in high-stakes settings. This paper introduces optimized certainty equivalent RCPS (OCE-RCPS), a novel framework that provides high-probability guarantees on general optimized certainty equivalent (OCE) risk measures, including conditional value-at-risk (CVaR) and entropic risk. OCE-RCPS leverages upper confidence bounds to identify prediction set parameters that satisfy user-specified risk tolerance levels with provable reliability. We establish theoretical guarantees showing that OCE-RCPS satisfies the desired probabilistic constraint for loss functions such as miscoverage and false negative rate. Experiments on image segmentation demonstrate that OCE-RCPS consistently meets target satisfaction rates across various risk measures and reliability configurations, while OCE-CRC fails to provide probabilistic guarantees.
Mobile robotic gas distribution mapping (GDM) provides critical situational awareness during emergency responses to hazardous gas releases. However, most systems still rely on teleoperation, limiting scalability and response speed. Autonomous active GDM is challenging in unknown and cluttered environments, because the robot must simultaneously explore traversable space, map the environment, and infer the gas distribution belief from sparse chemical measurements. We address this by formulating active GDM as a next-best-trajectory informative path planning (IPP) problem and propose XIT (Exploration-Exploitation Informed Trees), a sampling-based planner that balances exploration and exploitation by generating concurrent trajectories toward exploration-rich goals while collecting informative gas measurements en route. XIT draws batches of samples from an Upper Confidence Bound (UCB) information field derived from the current gas posterior and expands trees using a cost that trades off travel effort against gas concentration and uncertainty. To enable plume-aware exploration, we introduce the gas frontier concept, defined as unobserved regions adjacent to high gas concentrations, and propose the Wavefront Gas Frontier Detection (WGFD) algorithm for their identification. High-fidelity simulations and real-world experiments demonstrate the benefits of XIT in terms of GDM quality and efficiency. Although developed for active GDM, XIT is readily applicable to other robotic information-gathering tasks in unknown environments that face the exploration and exploitation trade-off.
Cochlear implants (CIs) have been developed to the point where they can restore hearing and speech understanding in a large proportion of patients. Although spatial hearing is central to controlling and directing attention and to enabling speech understanding in noisy environments, it has been largely neglected in the past. We propose here a multi-disciplinary research framework in which physicians, psychologists and engineers collaborate to improve spatial hearing for CI users.
We present a physics-informed voiced backend renderer for singing-voice synthesis. Given synthetic single-channel audio and a fund-amental--frequency trajectory, we train a time-domain Webster model as a physics-informed neural network to estimate an interpretable vocal-tract area function and an open-end radiation coefficient. Training enforces partial differential equation and boundary consistency; a lightweight DDSP path is used only to stabilize learning, while inference is purely physics-based. On sustained vowels (/a/, /i/, /u/), parameters rendered by an independent finite-difference time-domain Webster solver reproduce spectral envelopes competitively with a compact DDSP baseline and remain stable under changes in discretization, moderate source variations, and about ten percent pitch shifts. The in-graph waveform remains breathier than the reference, motivating periodicity-aware objectives and explicit glottal priors in future work.
Tasks ranging from sleep staging to clinical diagnosis traditionally rely on standard polysomnography (PSG) devices, bedside monitors and wearable devices, which capture diverse nocturnal biosignals (e.g., EEG, EOG, ECG, SpO$_2$). However, heterogeneity across devices and frequent sensor dropout pose significant challenges for unified modelling of these multimodal signals. We present \texttt{sleep2vec}, a foundation model for diverse and incomplete nocturnal biosignals that learns a shared representation via cross-modal alignment. \texttt{sleep2vec} is contrastively pre-trained on 42,249 overnight recordings spanning nine modalities using a \textit{Demography, Age, Site \& History-aware InfoNCE} objective that incorporates physiological and acquisition metadata (\textit{e.g.}, age, gender, recording site) to dynamically weight negatives and mitigate cohort-specific shortcuts. On downstream sleep staging and clinical outcome assessment, \texttt{sleep2vec} consistently outperforms strong baselines and remains robust to any subset of available modalities and sensor dropout. We further characterize, to our knowledge for the first time, scaling laws for nocturnal biosignals with respect to modality diversity and model capacity. Together, these results show that unified cross-modal alignment, coupled with principled scaling, enables label-efficient, general-purpose modelling of real-world nocturnal biosignals.
Rydberg atomic receivers hold extremely high sensitivity to electric fields, yet their effective 3-dB baseband bandwidth under conventional electromagnetically induced transparency (EIT) is typically constrained to tens to a few hundreds of kilohertz, which hinders wideband wireless applications. To relax this bottleneck, we investigate a six-wave mixing (SWM)-based Rydberg atomic receiver as a wideband radio frequency (RF)-to-optical quantum transducer. Specifically, we develop an explicit baseband input-output model spanning from the probe input to the output light field. Based upon this model, a closed-form 3-dB bandwidth expression is derived to expose its dependence on key optical and RF parameters. We further quantify the linear dynamic range by employing the 1-dB compression point (P1dB) and the input-referred third-order intercept point (IIP3), unveiling a communication-compatible characterization of the bandwidth-linearity trade-off. Finally, our numerical results demonstrate that, given identical optical driving conditions, the SWM configuration increases the 3-dB baseband bandwidth by more than an order of magnitude compared to the EIT-based counterpart, while retaining comparable electric-field sensitivity and revealing a broad, tunable linear operating region.
Decentralized federated learning (DFL) enables edge devices to collaboratively train models through local training and fully decentralized device-to-device (D2D) model exchanges. However, these energy-intensive operations often rapidly deplete limited device batteries, reducing their operational lifetime and degrading the learning performance. To address this limitation, we apply energy harvesting technique to DFL systems, allowing edge devices to extract ambient energy and operate sustainably. We first derive the convergence bound for wireless DFL with energy harvesting, showing that the convergence is influenced by partial device participation and transmission packet drops, both of which further depend on the available energy supply. To accelerate convergence, we formulate a joint device scheduling and power control problem and model it as a multi-agent Markov decision process (MDP). Traditional MDP algorithms (e.g., value or policy iteration) require a centralized coordinator with access to all device states and exhibit exponential complexity in the number of devices, making them impractical for large-scale decentralized networks. To overcome these challenges, we propose a fully decentralized policy iteration algorithm that leverages only local state information from two-hop neighboring devices, thereby substantially reducing both communication overhead and computational complexity. We further provide a theoretical analysis showing that the proposed decentralized algorithm achieves asymptotic optimality. Finally, comprehensive numerical experiments on real-world datasets are conducted to validate the theoretical results and corroborate the effectiveness of the proposed algorithm.
Precise in-hand manipulation of force-sensitive objects typically requires judicious coordinated force planning as well as accurate contact force feedback and control. Unlike multi-arm platforms with gripper end effectors, multi-fingered hands rely solely on fingertip point contacts and are not able to apply pull forces, therefore poses a more challenging problem. Furthermore, calibrated torque sensors are lacking in most commercial dexterous hands, adding to the difficulty. To address these challenges, we propose a dual-layer framework for multi-finger coordination, enabling high-precision manipulation of force-sensitive objects through joint control without tactile feedback. This approach solves coordinated contact force planning by incorporating graph rigidity and force closure constraints. By employing a force-to-position mapping, the planned force trajectory is converted to a joint trajectory. We validate the framework on a custom dexterous hand, demonstrating the capability to manipulate fragile objects-including a soft yarn, a plastic cup, and a raw egg-with high precision and safety.
This paper presents a comprehensive cryogenic analog signal processing architecture designed for superconducting qubit control and quantum state readout operating at 4 Kelvin. The proposed system implements a complete bidirectional signal path bridging room-temperature digital controllers with quantum processors at millikelvin stages. The control path incorporates a Phase-Locked Loop (PLL) for stable local oscillator generation, In-phase/Quadrature (I/Q) modulation for precise qubit gate operations, and a cryogenic power amplifier for signal conditioning. The readout path features a Low Noise Amplifier (LNA) with 14 dB gain and 8-Phase Shift Keying (8-PSK) demodulation for quantum state discrimination. All circuit blocks are designed and validated through SPICE simulations employing cryogenic MOSFET models at 180nm that account for carrier freeze-out, threshold voltage elevation, and enhanced mobility at 4 K. Simulation results demonstrate successful end-to-end signal integrity with I/Q phase error below 2°, image rejection ratio exceeding 35~dB, and symbol error rate below $10^{-6}$. This work provides a modular, simulation-validated framework for scalable cryogenic quantum control systems.
In robotics and human biomechanics, the tension between energy economy and kinematic readiness is well recognized; this work brings that fundamental principle to aerial multirotors. We show that the limited torque of the motors and the nonlinear aerodynamic map from rotor speed to thrust naturally give rise to the novel concept of promptness-a metric akin to dynamic aerodynamic manipulability. By treating energy consumption as a competing objective and introducing a geometric fiber-bundle formulation, we turn redundancy resolution into a principled multi-objective program on affine fibers. The use of the diffeomorphic transformation linearizing the signed-quadratic propulsion model allows us to lay the foundations for a rigorous study of the interplay between these costs. Through an illustrative case study on 4-DoF allocation on the hexarotor, we reveal that this interplay is fiber-dependent and physically shaped by hardware inequalities. For unidirectional thrusters, the feasible fibers are compact, yielding interior allocations and a short Pareto arc, while torque demands break symmetry and separate the optima. Conversely, with reversible propellers, the null space enables antagonistic rotor co-contraction that drives promptness to hardware limits, making optimal endurance and agility fundamentally incompatible in those regimes. Ultimately, rather than relying on heuristic tuning or black box algorithms to empirically improve task execution, this framework provides a foundational understanding of why and how to achieve agility through geometry-aware control allocation, offering possible guidance for vehicle design, certification metrics, and threat-aware flight operation.
A networked aerial robot team (NART) comprises a group of agents (e.g., unmanned aerial vehicles (UAVs), ground control stations, etc.) interconnected by wireless links. Inter-agent connectivity, even if intermittent (i.e. sparse), enables data exchanges between agents and supports cooperative behaviours in several NART missions. It can benefit online decentralised decision-making and group resilience, particularly when prior knowledge is inaccurate or incomplete. These requirements can be accounted for in the offline mission planning stages to incentivise cooperative behaviours and improve mission efficiency during the NART deployment. This paper proposes a novel path planning tool for a Sparse, Aware, and Cooperative Networked Aerial Robot Team (SpArC-NART) in exploration missions. It simultaneously considers different levels of prior information regarding the environment, limited agent energy, sensing, and communication, as well as distinct NART constitutions. The communication model takes into account the limitations of user-defined radio technology and physical phenomena. The proposed tool aims to maximise the mission goals (e.g., finding one or multiple targets, covering the full area of the environment, etc.), while cooperating with other agents to reduce agent reporting times, increase their global situational awareness (e.g., their knowledge of the environment), and facilitate mission replanning, if required. The developed cooperation mechanism leverages soft-motion constraints and dynamic rewards based on the Value of Movement and the expected communication availability between the agents at each time step. A ground sensing coverage use case was chosen to illustrate the current capabilities of this tool.
Pinching antennas systems (PASSs) have recently been proposed as a novel flexible-antenna technology. These systems are implemented by attaching low-cost pinching elements to dielectric waveguides. As the direct link is bypassed through waveguides, PASSs can effectively compensate large-scale effects of the wireless channel. This work explores the potential gains of employing PASSs for over-the-air federated learning (OTA-FL). For a PASS-assisted server, we develop a low-complexity algorithmic approach, which jointly tunes the PASS parameters and schedules the mobile devices for minimal energy consumption in OTA-FL. We study the efficiency of the proposed design and compare it against the conventional OTA-FL setting with MIMO server. Numerical experiments demonstrate that using a single-waveguide PASS at the server within a moderately sized area, the required energy for model aggregation is drastically reduced as compared to the case with fully-digital MIMO server. This introduces PASS as a potential technology for energy-efficient distributed learning in next generations of wireless systems.
Bengali (Bangla) remains under-resourced in long-form speech technology despite its wide use. We present Bengali-Loop, two community benchmarks to address this gap: (1) a long-form ASR corpus of 191 recordings (158.6 hours, 792k words) from 11 YouTube channels, collected via a reproducible subtitle-extraction pipeline and human-in-the-loop transcript verification; and (2) a speaker diarization corpus of 24 recordings (22 hours, 5,744 annotated segments) with fully manual speaker-turn labels in CSV format. Both benchmarks target realistic multi-speaker, long-duration content (e.g., Bangla drama/natok). We establish baselines (Tugstugi: 34.07% WER; this http URL: 40.08% DER) and provide standardized evaluation protocols (WER/CER, DER), annotation rules, and data formats to support reproducible benchmarking and future model development for Bangla long-form ASR and diarization.
Several applications of medical ultrasound can benefit from a larger imaging field of view (FOV). This study is aimed at increasing the FOV of linear array probes by increasing the element size rather than the element count. To investigate larger FOV, this study used coupled elements to imitate a larger element size. The effects of coupling on array beam patterns are examined with Fourier transforms of elements. The effects of coupling on resolution, contrast, and speckle signal-to-noise ratio are examined through phantom images and in-vivo images of a rabbit tumor reconstructed with plane-wave compounding. Furthermore, a positioning system was used to acquire data from a virtual large aperture with 120 mm FOV and 128 elements, collected in sections with a single probe. This study also investigates the Null Subtraction Imaging (NSI), Sign Coherence Factor (SCF), and Minimum Variance (MV) beamformers for regaining resolution lost by an increased F-number with large elements. The MV beamformer, while the most computationally expensive, was best for improving resolution without increasing speckle variance, decreasing Full-Width at Half-Max (FWHM) estimates of wire targets from 0.78 mm with DAS on a 2.5 wavelength element size to 0.54 mm with MV on a 5 wavelength element size.
Koopman linear representations have become a popular tool for control design of nonlinear systems, yet it remains unclear when such representations are exact. In this paper, we establish sufficient and necessary conditions under which a controlled nonlinear system admits an exact finite-dimensional Koopman linear representation, which we term Koopman linear embedding. We show that such a system must be transformable into a special control-affine preserved (CAP) structure, which enforces affine dependence of the state on the control input and isolates all nonlinearities into an autonomous subsystem. We further prove that this autonomous subsystem must itself admit a finite-dimensional Koopman linear model with a sufficiently-rich Koopman invariant subspace. Finally, we introduce a symbolic procedure to determine whether a given controlled nonlinear system admits the CAP structure, thereby elucidating whether Koopman approximation errors arise from intrinsic system dynamics or from the choice of lifting functions.
We investigate the optimal transport (OT) problem over networks, wherein supply and demand are conceptualized as temporal marginals governing departure rates of particles from source nodes and arrival rates at sink nodes. This setting extends the classical OT framework, where all mass is conventionally assumed to depart at $t = 0$ and arrive at $t = t_f$. Our generalization accommodates departures and arrivals at specified times, referred as departure--arrival(DA) constraints. In particular, we impose nodal-temporal flux constraints at source and sink nodes, characterizing two distinct scenarios: (i) Independent DA constraints, where departure and arrival rates are prescribed independently, and (ii) Coupled DA constraints, where each particle's transportation time span is explicitly specified. We establish that OT with independent DA constraints admits a multi-marginal optimal transport formulation, while the coupled DA case aligns with the unequal-dimensional OT framework. For line graphs, we analyze the existence and uniqueness of the solution path. For general graphs, we use a constructive path-based reduction and optimize over a prescribed set of paths. From a computational perspective, we consider entropic regularization of the original problem to efficiently provide solutions based on multi-marginal Sinkhorn method, making use of the graphical structure of the cost to further improve scalability. Our numerical simulation further illustrates the linear convergence rate in terms of marginal violation.
We propose a parameter-minimal neural architecture for solving differential equations by restricting the hypothesis class to Horner-factorized polynomials, yielding an implicit, differentiable trial solution with only a small set of learnable coefficients. Initial conditions are enforced exactly by construction by fixing the low-order polynomial degrees of freedom, so training focuses solely on matching the differential-equation residual at collocation points. To reduce approximation error without abandoning the low-parameter regime, we introduce a piecewise ("spline-like") extension that trains multiple small Horner models on subintervals while enforcing continuity (and first-derivative continuity) at segment boundaries. On illustrative ODE benchmarks and a heat-equation example, Horner networks with tens (or fewer) parameters accurately match the solution and its derivatives and outperform small MLP and sinusoidal-representation baselines under the same training settings, demonstrating a practical accuracy-parameter trade-off for resource-efficient scientific modeling.
We present ROSA -- Roundabout Optimized Speed Advisory -- a system that combines multi-agent trajectory prediction with coordinated speed guidance for multimodal, mixed traffic at roundabouts. Using a Transformer-based model, ROSA jointly predicts the future trajectories of vehicles and Vulnerable Road Users (VRUs) at roundabouts. Trained for single-step prediction and deployed autoregressively, it generates deterministic outputs, enabling actionable speed advisories. Incorporating motion dynamics, the model achieves high accuracy (ADE: 1.29m, FDE: 2.99m at a five-second prediction horizon), surpassing prior work. Adding route intention further improves performance (ADE: 1.10m, FDE: 2.36m), demonstrating the value of connected vehicle data. Based on predicted conflicts with VRUs and circulating vehicles, ROSA provides real-time, proactive speed advisories for approaching and entering the roundabout. Despite prediction uncertainty, ROSA significantly improves vehicle efficiency and safety, with positive effects even on perceived safety from a VRU perspective. The source code of this work is available under: this http URL.
In this paper, we present a detailed convergence analysis of a recently developed approximate Newton-type fully distributed optimization method for smooth, strongly convex local loss functions, called Network-GIANT, which has been empirically illustrated to show faster linear convergence properties while having the same communication complexity (per iteration) as its first order distributed counterparts. By using consensus based parameter updates, and a local Hessian based descent direction at the individual nodes with gradient tracking, we first explicitly characterize a global linear convergence rate for Network-GIANT, which can be computed as the spectral radius of a $3 \times 3$ matrix dependent on the Lipschitz continuity ($L$) and strong convexity ($\mu$) parameters of the objective functions, and the spectral norm ($\sigma$) of the underlying undirected graph represented by a doubly stochastic consensus matrix. We provide an explicit bound on the step size parameter $\eta$, below which this spectral radius is guaranteed to be less than $1$. Furthermore, we derive a mixed linear-quadratic inequality based upper bound for the optimality gap norm, which allows us to conclude that, under small step size values, asymptotically, as the algorithm approaches the global optimum, it achieves a locally linear convergence rate of $1-\eta(1 -\frac{\gamma}{\mu})$ for Network-GIANT, provided the Hessian approximation error $\gamma$ (between the harmonic mean of the local Hessians and the global hessian (the arithmetic mean of the local Hessians) is smaller than $\mu$. This asymptotically linear convergence rate of $\approx 1-\eta$ explains the faster convergence rate of Network-GIANT for the first time. Numerical experiments are carried out with a reduced CovType dataset for binary logistic regression over a variety of graphs to illustrate the above theoretical results.
Conformal prediction (CP) offers distribution-free marginal coverage guarantees under an exchangeability assumption, but these guarantees can fail if the data distribution shifts. We analyze the use of pseudo-calibration as a tool to counter this performance loss under a bounded label-conditional covariate shift model. Using tools from domain adaptation, we derive a lower bound on target coverage in terms of the source-domain loss of the classifier and a Wasserstein measure of the shift. Using this result, we provide a method to design pseudo-calibrated sets that inflate the conformal threshold by a slack parameter to keep target coverage above a prescribed level. Finally, we propose a source-tuned pseudo-calibration algorithm that interpolates between hard pseudo-labels and randomized labels as a function of classifier uncertainty. Numerical experiments show that our bounds qualitatively track pseudo-calibration behavior and that the source-tuned scheme mitigates coverage degradation under distribution shift while maintaining nontrivial prediction set sizes.
Advanced Aerial Mobility (AAM) operations require strategic flight planning services that predict both spatial and temporal uncertainties to safely validate flight plans against hazards such as weather cells, restricted airspaces, and CNS disruption areas. Current uncertainty estimation methods for AAM vehicles rely on conservative linear models due to limited real-world performance data. This paper presents a novel Kalman Filter-based uncertainty propagation method that models AAM Flight Management System (FMS) architectures through sigmoid-blended measurement noise covariance. Unlike existing approaches with fixed uncertainty thresholds, our method continuously adapts the filter's measurement trust based on progress toward waypoints, enabling FMS correction behavior to emerge naturally. The approach scales proportionally with control inputs and is tunable to match specific aircraft characteristics or route conditions. We validate the method using real ADS-B data from general aviation aircraft divided into training and verification sets. Uncertainty propagation parameters were tuned on the training set, achieving 76% accuracy in predicting arrival times when compared against the verification dataset, demonstrating the method's effectiveness for strategic flight plan validation in AAM operations.
We study randomized experiments in a service system when stochastic congestion can arise from temporarily limited supply or excess demand. Such congestion gives rise to cross-unit interference between the waiting customers, and analytic strategies that do not account for this interference may be biased. In current practice, one of the most widely used ways to address stochastic congestion is to use switchback experiments that alternatively turn a target intervention on and off for the whole system. We find, however, that under a queueing model for stochastic congestion, the standard way of analyzing switchbacks is inefficient, and that estimators that leverage the queueing model can be materially more accurate. Additionally, we show how the queueing model enables estimation of total policy gradients from unit-level randomized experiments, thus giving practitioners an alternative experimental approach they can use without needing to pre-commit to a fixed switchback length before data collection.
Quadrupedal robots exhibit a wide range of viable gaits, but generating specific footfall sequences often requires laborious expert tuning of numerous variables, such as touch-down and lift-off events and holonomic constraints for each leg. This paper presents a unified reinforcement learning framework for generating versatile quadrupedal gaits by leveraging the intrinsic symmetries and velocity-period relationship of dynamic legged systems. We propose a symmetry-guided reward function design that incorporates temporal, morphological, and time-reversal symmetries. By focusing on preserved symmetries and natural dynamics, our approach eliminates the need for predefined trajectories, enabling smooth transitions between diverse locomotion patterns such as trotting, bounding, half-bounding, and galloping. Implemented on the Unitree Go2 robot, our method demonstrates robust performance across a range of speeds in both simulations and hardware tests, significantly improving gait adaptability without extensive reward tuning or explicit foot placement control. This work provides insights into dynamic locomotion strategies and underscores the crucial role of symmetries in robotic gait design.
Cost-effectiveness analyses, based on decision-analytic models of disease progression and treatment, are routinely used to assess the economic value of a new intervention and consequently inform reimbursement decisions for the intervention. Many decision-analytic models developed to assess the economic value of highly effective directly acting antiviral (DAA) treatments for the hepatitis C virus (HCV) infection do not incorporate the transmission dynamics of HCV, accounting for which is required to estimate the number of downstream infections prevented by curing an infection. In this study, we develop and validate a comprehensive agent-based simulation (ABS) model of HCV transmission dynamics in the Indian context and use it to: (a) quantify the extent to which the cost-effectiveness of a DAA is underestimated - as a function of its uptake rate - if disease transmission dynamics are not considered in a cost-effectiveness analysis model; and (b) quantify the impact of the frequency and timing of treatment with DAAs, also as a function of their uptake rate, within a disease surveillance period on its cost-effectiveness.
Populations do not only interact over time but also age over time. It is therefore common to model them as age-structured PDEs, where age is the space variable. Since the models also involve integrals over age, both in the birth process and in the interaction among species, they are in fact integro-partial differential equations (IPDEs) with positive states. To regulate the population densities to desired profiles, harvesting is used as input. But non-discriminating harvesting, where wanting to repress one species will inevitably repress the other species as well, the positivity restriction on the input (no insertion of population), and the multiplicative nature of harvesting, makes control challenging even for ODE versions of such dynamics, let alone for their IPDE versions on an infinite-dimensional nonnegative state space. We introduce a design for a benchmark version of such a problem: a two-population predator-prey setup. The model is equivalent to two coupled ordinary differential equations (ODEs), actuated by harvesting which must not drop below zero, and strongly disturbed by two autonomous but exponentially stable integral delay equations (IDEs). We develop two control designs. With a modified Volterra-like control Lyapunov function, we design a simple feedback which employs possibly negative harvesting for global stabilization of the ODE model, while guaranteeing regional regulation with positive harvesting. With a more sophisticated, restrained controller we achieve regulation for the ODE model globally, with positive harvesting. For the full IPDE model, with the IDE dynamics acting as large disturbances, for both the simple and saturated feedback laws we provide explicit estimates of the regions of attraction. The paper charts a new pathway for control designs for infinite-dimensional multi-species dynamics and for nonlinear positive systems with positive controls.
Shared autonomous vehicles (SAVs) bring competition to traditional transit services but redesigning multimodal transit network can utilize SAVs as feeders to enhance service efficiency and coverage. This paper presents an optimization framework for the joint multimodal transit frequency and SAV fleet size problem, a variant of the transit network frequency setting problem. The objective is to maximize total transit ridership (including SAV-fed trips and subtracting boarding rejections) across multiple time periods under budget constraints, considering endogenous mode choice (transit, point-to-point SAVs, driving) and route selection, while allowing for strategic route removal by setting frequencies to zero. Due to the problem's non-linear, non-convex nature and the computational challenges of large-scale networks, we develop a hybrid solution approach that combines a metaheuristic approach (particle swarm optimization) with nonlinear programming for local solution refinement. To ensure computational tractability, the framework integrates analytical approximation models for SAV waiting times based on fleet utilization, multimodal network assignment for route choice, and multinomial logit mode choice behavior, bypassing the need for computationally intensive simulations within the main optimization loop. Applied to the Chicago metropolitan area's multimodal network, our method illustrates a 33.3% increase in transit ridership through optimized transit route frequencies and SAV integration, particularly enhancing off-peak service accessibility and strategically reallocating resources.
Motivated by large intelligent surface applications, the electric field properties of near field focusing using phase conjugation method are analyzed for cylindrical dipole arrays. Firstly, for the transmitting antennas featuring vertical polarization, the polarization characteristic is decomposed along the x, y, and z directions. Three typical cases are studied when the focal points are at (xf , 0, 0), (0, yf , 0), and (0, 0, zf ). When the length of the cylindrical dipole array is significantly larger compared to its radius, certain unique insights emerge. When the focal point is positioned along (0, 0, zf ), apart from the region on both sides, the ratio between Ez and Ex/Ey remains {\pi}/2. Additionally, When the focal point is located within the cylinder, the electric field of each polarization is approximately the same everywhere. In other words, beam focusing does not incur losses due to different positions. The focusing resolution of Ez is the same in the transverse and longitudinal directions. Different from the situation where the 3 - dB focal beam depth is much smaller than the focal beam width for the most of arrays, the resolution in the longitudinal can be improved, respectively. Through a comprehensive grasp of these design principles, we can gain a deeper understanding of the specific areas with significant potential for practical applications.
Recently, Artificial Intelligence (AI)-driven Physical-Layer Authentication (PLA), which focuses on achieving endogenous security and intelligent identity authentication, has attracted considerable interest. When compared with Discriminative AI (DAI), Generative AI (GAI) offers several advantages, such as fingerprint augmentation, reconstruction, and denoising. Inspired by these innovations, this paper provides a systematic exploration of GAI's integration with PLA. We commence with a concise review of identity authentication techniques and GAI models. Then, we contrast the limitations of DAI with the potential of GAI in addressing PLA challenges. Specifically, we introduce a structured taxonomy for GAI-enhanced PLA methodologies, encompassing three key stages: fingerprint collection, model training, and performance optimization within the PLA pipeline. Furthermore, we propose a novel PLA framework based on GAI and channel extrapolation for dynamic environments. To demonstrate GAI's efficacy in enhancing PLA robustness, we implement a case study using the Generative Diffusion Model (GDM). Finally, we outline potential future research directions for GAI-based PLA.
Affine frequency division multiplexing (AFDM) has recently emerged as an excellent backward-compatible 6G waveform. In this paper, we study matched filtering (MF) assisted channel estimation (CE) for AFDM systems in complex doubly selective channels. By deriving the complete input-output relationship of the continuous-time signal, the inter-chirp-carrier interference, signal-to-interference-plus-noise ratio (SINR), and the effective SINR loss of AFDM, are investigated in discrete affine Fourier transform (DAFT) domain. Further, we propose two low-complexity methods for constructing the channel matrix by taking advantage of its inherent discrete Fourier transform structure and the staircase structure of the piecewise functions in the channel matrix, respectively. It is shown that complexity reduction by at least two orders of magnitude can be achieved for a large number of chirp subcarriers. For the CE problem in doubly selective channels, we introduce an MF assisted CE scheme. This allows us to sequentially estimate the parameters of each path by exploiting the separability and approximate orthogonality of different paths in the DAFT domain, thus leading to significantly reduced complexity. Furthermore, based on generalized Fibonacci search (GFS), an MF-GFS scheme is proposed to avoid significantly redundant computation, which can be extended to typical wide-band systems. Extensive simulation results indicate that the proposed schemes offer superior advantages in terms of their improved communication performance and lower complexity.
4D live fluorescence microscopy is often compromised by prolonged high intensity illumination which induces photobleaching and phototoxic effects that generate photo-induced artifacts and severely impair image continuity and detail recovery. To address this challenge, we propose the CellINR framework, a case-specific optimization approach based on implicit neural representation. The method employs blind convolution and structure amplification strategies to map 3D spatial coordinates into the high frequency domain, enabling precise modeling and high-accuracy reconstruction of cellular structures while effectively distinguishing true signals from artifacts. Experimental results demonstrate that CellINR significantly outperforms existing techniques in artifact removal and restoration of structural continuity, and for the first time, a paired 4D live cell imaging dataset is provided for evaluating reconstruction performance, thereby offering a solid foundation for subsequent quantitative analyses and biological research. The code and dataset will be public.
Electrified roadways (ER) equipped with dynamic wireless power transfer (DWPT) capabilities can patently extend the driving range and reduce the battery size of electric vehicles (EVs). However, due to the spatial arrangement of the transmitter coils in the ER, the DWPT load exhibits frequency content that could excite power system frequency dynamics. In this context, this work aims to study the spectrum of DWPT loads under different traffic conditions. Under simplifying assumptions, we develop statistical models to identify the location and relative magnitude of DWPT load harmonics. Our analysis reveals that the fundamental frequency depends on ER coil spacing and average EV speed. In the worst-case yet unlikely scenario that EVs move in a synchronized fashion, the amplitude of harmonics scales with the EV count. On the contrary, when EVs move freely, harmonics scale with the square root of the EV count. Platoon formations can accentuate harmonics. The spectral content around harmonics decreases in magnitude and increases in bandwidth with the harmonic index. The load of a single EV moving at a time-varying speed can be modeled as a frequency-modulated (FM) signal. Despite the simplifying assumptions, the derived models offer valuable insights for ER planners and grid operators. Dynamic simulations of a modified WECC model with DWPT loads synthesized from realistic EV trajectories and ER specifications corroborate some of these insights.
This paper presents an optimal power splitting and beamforming design for co-located simultaneous wireless information and power transfer (SWIPT) users in Dynamic Metasurface Antenna (DMA)-aided multiuser multiple-input single-output (MISO) systems. The objective is to minimize transmit power while meeting users signal-to-interference-plus-noise ratio (SINR) and energy harvesting (EH) requirements. The problem is solved via an alternating optimization framework based on semidefinite programming (SDP), where metasurface tunability follows Lorentzian-constrained holography (LCH). In contrast to traditional beamforming architectures, DMA-assisted architectures reduce the need for RF chains and phase shifters but require optimization under the Lorentzian constraint limiting the amplitude and phase optimizations. Hence, the proposed method integrates several LCH schemes, including the recently proposed adaptive-radius LCH (ARLCH), and evaluates nonlinear EH models and circuit noise effects. Simulation results show that the proposed design significantly reduces transmit power compared with baseline methods, highlighting the efficiency of ARLCH and optimal power splitting in DMA-assisted SWIPT systems.
We study the energy efficiency of pinching-antenna systems (PASSs) by developing a consistent formulation for power distribution in these systems. The per-antenna power distribution in PASSs is not controlled explicitly by a power allocation policy, but rather implicitly through tuning of pinching couplings and locations. Both these factors are tunable: (i) pinching locations are tuned using movable elements, and (ii) couplings can be tuned by varying the effective coupling length of the pinching elements. While the former is feasible to be addressed dynamically in settings with low user mobility, the latter cannot be addressed at a high rate. We thus develop a class of hybrid dynamic-static algorithms, which maximize the energy efficiency by updating the system parameters at different rates. Our experimental results depict that dynamic tuning of pinching locations can significantly boost energy efficiency of PASSs.
The increasing integration of renewable energy sources into electrical grids necessitates a paradigm shift toward advanced control schemes that guarantee safe and stable operations with scalable properties. Accordingly, this paper investigates large-signal stability guarantees for cyber-physical DC microgrids employing a nonlinear distributed consensus-based control scheme to enable coordinated integration and management of distributed generation units within an expandable framework. The proposed control framework adopts nested control loops; inner (decentralized) and outer (distributed), specifically designed to simultaneously achieve uniform voltage containment within pre-specified limits, and proportional current sharing in steady state. Our scalable stability result relies on singular perturbation theory and Lyapunov arguments to prove global exponential stability when imposing a sufficient time-scale separation at the border between the nested control loops, while relying on some practical parameter-setting schemes. The effectiveness and versatility of the proposed control strategy are then validated through time-domain simulations performed on a case-specific low-voltage DC microgrid and the modified IEEE 33-bus radial distribution system. Moreover, a small-signal stability analysis is conducted to derive practical guidelines that enhance the applicability of the method.
Magnetic Resonance Imaging (MRI) diagnoses and manages a wide range of diseases, yet long scan times drive high costs and limit accessibility. AI methods have demonstrated substantial potential for reducing scan times, but despite rapid progress, clinical translation of AI often fails. One particular class of failure modes, referred to as implicit data crimes, are a result of hidden biases introduced when MRI datasets incompletely model the MRI physics of the acquisition. Previous work identified data crimes resulting from algorithmic completion of k-space with parallel imaging and drew on simulation to demonstrate the resulting downstream biases. This work proposes a mathematical framework to re-characterize the problem as one of error reduction during interpolation between sets of evaluation coordinates. We establish a generalized matrix-based definition of the reconstruction error upper bound as a function of the input sampling pattern. Experiments on relevant sampling pattern structures demonstrate the relevance of the framework and suggest future directions for analysis of data crimes.
Electroencephalography (EEG) signals are inherently non-linear, non-stationary, and vulnerable to noise sources, making the extraction of discriminative features a long-standing challenge. In this work, we investigate the non-linear Teager-Kaiser Energy Operator (TKEO) for modeling the underlying energy dynamics of EEG in three representative tasks: motor imagery, emotion recognition, and epilepsy detection. To accommodate the narrowband nature of the operator, we employ Gabor filterbanks to isolate canonical frequency bands, followed by the Energy Separation Algorithm to decompose the TKEO output into amplitude envelope and instantaneous frequency components. We then derive a set of energy descriptors based on this demodulation and compare their classification performance against established EEG features. The proposed TKEO-based pipeline offers an intuitive, physiologically grounded framework for capturing EEG signal dynamics, while remaining simple, training-free, and data-efficient. Our findings suggest that combining TKEO features with conventional ones improves Balanced Accuracy by approximately 15 percent in epilepsy detection, yields modest gains in motor imagery, and achieves on par performance in emotion recognition, reflecting the pipeline's ability to capture transient neural dynamics.
End-to-end speech-to-speech translation (S2ST) systems typically struggle with a critical data bottleneck: the scarcity of parallel speech-to-speech corpora. To overcome this, we introduce RosettaSpeech, a novel zero-shot framework trained exclusively on monolingual speech-text data augmented by machine translation supervision. Unlike prior works that rely on complex cascaded pseudo-labeling, our approach strategically utilizes text as a semantic bridge during training to synthesize translation targets, thereby eliminating the need for parallel speech pairs while maintaining a direct, end-to-end inference pipeline. Empirical evaluations on the CVSS-C benchmark demonstrate that RosettaSpeech achieves state-of-the-art zero-shot performance, surpassing leading baselines by significant margins - achieving ASR-BLEU scores of 25.17 for German-to-English (+27% relative gain) and 29.86 for Spanish-to-English (+14%). Crucially, our model effectively preserves the source speaker's voice without ever seeing paired speech data. We further analyze the impact of data scaling and demonstrate the model's capability in many-to-one translation, offering a scalable solution for extending high-quality S2ST to "text-rich, speech-poor" languages.
This paper investigates the problem of safety certification for black-box discrete-time stochastic systems, where both the system dynamics and disturbance distributions are unknown, and only sampled data are available. Under such limited information, ensuring robust or classical quantitative safety over finite or infinite horizons is generally infeasible. To address this challenge, we propose a data-driven framework that provides theoretical one-step safety guarantees in the Probably Approximately Correct (PAC) sense. This one-step guarantee can be applied recursively at each time step, thereby yielding step-by-step safety assurances over extended horizons. Our approach formulates barrier certificate conditions based solely on sampled data and establishes PAC safety guarantees by leveraging the VC dimension, scenario approaches, Markov's inequality, and Hoeffding's inequality. Two sampling procedures are proposed, and three methods are proposed to derive PAC safety guarantees. The properties and comparative advantages of these three methods are thoroughly discussed. Finally, the effectiveness of the proposed methods are demonstrated through several numerical examples.
Millimeter-wave (mmWave) radar has emerged as a compact and powerful sensing modality for advanced perception tasks that leverage machine learning. It is particularly effective in scenarios where vision-based sensors fail to capture reliable information, such as detecting occluded objects or distinguishing between different surface materials in indoor environments. Due to the nonlinear characteristics of mmWave radar signals, deep learning-based methods are well suited for extracting relevant information from in-phase and quadrature (IQ) data. However, the current state of the art in IQ signal-based occluded-object and material classification still offers substantial potential for further improvement. In this paper, we propose a bidirectional cross-attention fusion network that combines IQ signal and FFT-transformed radar features obtained by distinct complex-valued convolutional neural networks (CNNs). In our experiments, we achieve a material classification accuracy of 99.92% on samples collected at the same sensor distances used during training, and an accuracy of 65.56% on samples measured at previously unseen distances, demonstrating improved generalization across varying measurement conditions. Furthermore, our approach improves occluded object classification to 94.20%, outperforming all comparison and ablation models and underscoring the benefit of the proposed fusion strategy.
Next-generation wireless networks are envisioned to achieve reliable, low-latency connectivity within environments characterized by strong multipath and severe channel variability. Programmable wireless environments (PWEs) address this challenge by enabling deterministic control of electromagnetic (EM) propagation through software-defined reconfigurable intelligent surfaces (RISs). However, effectively configuring RISs in real time remains a major bottleneck, particularly under near-field conditions where mutual coupling and specular reflections alter the intended response. To overcome this limitation, this paper introduces MATCH, a physics-based codebook compilation algorithm that explicitly accounts for the EM coupling among RIS unit cells and the reflective interactions with surrounding structures, ensuring that the resulting codebooks remain consistent with the physical characteristics of the environment. Finally, MATCH is evaluated under a full-wave simulation framework incorporating mutual coupling and secondary reflections, demonstrating its ability to concentrate scattered energy within the focal region, confirming that physics-consistent, codebook-based optimization constitutes an effective approach for practical and efficient RIS configuration.
Current RF machine-learning pipelines rely on task-specific deep networks for modulation classification and related tasks, but these models require custom architectures and labeled datasets for each problem, generalize poorly across channel conditions and SNRs, and offer little interpretability. In contrast, modern multimodal large language models (MLLMs) can integrate heterogeneous visual and textual data and exhibit strong cross-domain generalization and explanation capabilities. Our goal in this work is to explore whether vision-language models (VLMs) can be adapted to directly perceive RF signals and reason about modulation patterns without redesigning their architectures or injecting RF-specific inductive biases. To achieve this, we convert complex IQ streams into time-series, spectrogram, and joint RF visualizations, build a 57-class RF visual question answering benchmark, and show that lightweight parameter-efficient fine-tuning can enhance the accuracy of a general-purpose VLM from around 10% to nearly 90%, while ensuring robustness to noise and out-of-vocabulary modulations and the ability to produce human-readable rationales. The obtained results show that combining RF-to-image conversion with promptable VLMs provides a scalable and practical foundation for RF-aware AI systems in future 6G networks.
Traffic Steering (TS) dynamically allocates user traffic across cells to enhance Quality of Experience (QoE), load balance, and spectrum efficiency in 5G networks. However, TS algorithms remain vulnerable to adversarial conditions such as interference spikes, handover storms, and localized outages. To address this, an AI-driven fuzz testing framework based on the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is proposed to systematically expose hidden vulnerabilities. Using NVIDIA Sionna, five TS algorithms are evaluated across six scenarios. Results show that AI-driven fuzzing detects 34.2% more total vulnerabilities and 5.8% more critical failures than traditional testing, achieving superior diversity and edge-case discovery. The observed variance in critical failure detection underscores the stochastic nature of rare vulnerabilities. These findings demonstrate that AI-driven fuzzing offers an effective and scalable validation approach for improving TS algorithm robustness and ensuring resilient 6G-ready networks.
This paper presents an approach for the modelling of dependent random variables using generalised polynomial chaos. This allows to write chance-constrained optimization problems with respect to a joint distribution modelling dependencies between different stochastic inputs. Arbitrary dependencies are modelled by using Gaussian copulas to construct the joint distribution. The paper exploits the problem structure and develops suitable transformations to ensure tractability. The proposed method is applied to a probabilistic power reserve procurement problem. The effectiveness of the method to capture dependencies is shown by comparing the approach with a standard approach considering independent random variables.
As the standardization of sixth generation (6G) wireless systems accelerates, there is a growing consensus in favor of evolutionary waveforms that offer new features while maximizing compatibility with orthogonal frequency division multiplexing (OFDM), which underpins the 4G and 5G systems. This article presents affine frequency division multiplexing (AFDM) as a premier candidate for 6G, offering intrinsic robustness for both high-mobility communications and integrated sensing and communication (ISAC) in doubly dispersive channels, while maintaining a high degree of synergy with the legacy OFDM. To this end, we provide a comprehensive analysis of AFDM, starting with a generalized fractional-delay-fractional-Doppler (FDFD) channel model that accounts for practical pulse shaping filters and inter-sample coupling. We then detail the AFDM transceiver architecture, demonstrating that it reuses nearly the entire OFDM pipeline and requires only lightweight digital pre- and post-processing. We also analyze the impact of hardware impairments, such as phase noise and carrier frequency offset, and explore advanced functionalities enabled by the chirp-parameter domain, including index modulation and physical-layer security. By evaluating the reusability across the radio-frequency, physical, and higher layers, the article demonstrates that AFDM provides a low-risk, feature-rich, and efficient path toward achieving high-fidelity communications in the later versions of 6G and beyond (6G+).
This paper presents C-POD, a cloud-native framework that automates the deployment and management of edge pods for seamless remote access and sharing of wireless testbeds. C-POD leverages public cloud resources and edge pods to lower the barrier to over-the-air (OTA) experimentation, enabling researchers to share and access distributed testbeds without extensive local infrastructure. A supporting toolkit has been developed for C-POD to enable flexible and scalable experimental workflows, including containerized edge environments, persistent Secure Shell (SSH) tunnels, and stable graphical interfaces. We prototype and deploy C-POD on the Amazon Web Services (AWS) public cloud to demonstrate its key features, including cloud-assisted edge pod deployment automation, elastic computing resource management, and experiment observability, by integrating two wireless testbeds that focus on RF signal generation and 5G(B) communications, respectively.
Continuous time (CT) and discrete time (DT) linear time invariant (LTI) systems are commonly introduced through distinct mathematical formalisms, which can obscure their underlying dynamical equivalence. This tutorial presents a unified treatment of firstorder CT and DT systems, emphasizing their shared modal structure and stability properties. Beginning with transfer functions and pole zero representations in the Laplace domain, canonical first order low pass and high-pass dynamics are examined from an operational perspective. The discussion then transitions to discrete-time sequences and the Z transform, highlighting geometric sequences as eigenfunctions of DT systems and establishing the correspondence between the left half of the s plane and the interior of the unit circle in the z plane. Practical discretization and sampled data implementations are analyzed to illustrate how continuous time dynamics are reinterpreted through recursion and accumulation in discrete time realizations. By maintaining structural symmetry between domains, the manuscript consolidates established concepts into a coherent framework linking mathematical representation, physical realizability, and implementation.
The DC-side ground fault (GF) poses significant risks to three-phase TN-earthed photovoltaic (PV) systems, as the resulting high fault current can directly damage both PV inverters and PV modules. Once a fault occurs, locating the faulty string through manual string-by-string inspection is highly time-consuming and inefficient. This work presents a comprehensive analysis of GF characteristics through fault-current analysis and a simulation-based case study covering multiple fault locations. Building on these insights, we propose an edge-AI-based GF localization approach tailored for three-phase TN-earthed PV systems. A PLECS-based simulation model that incorporates PV hysteresis effects is developed to generate diverse GF scenarios, from which correlation-based features are extracted throughout the inverter's four-stage shutdown sequence. Using the simulated dataset, a lightweight Variational Information Bottleneck (VIB)-based localization model is designed and trained, achieving over 93% localization accuracy at typical sampling rates with low computational cost, demonstrating strong potential for deployment on resource-constrained PV inverters.
This paper considers the infinite horizon optimal control problem for nonlinear systems. Under the condition of nonlinear controllability of the system to any terminal set containing the origin and forward invariance of the terminal set, we establish a regularized solution approach consisting of a ``finite free final time" optimal transfer problem to the terminal set, which renders the set globally asymptotically stable. Further, we show that the approximations converge to the optimal infinite horizon cost as the size of the terminal set decreases to zero. We also perform the analysis for the discounted problem and show that the terminal set is asymptotically stable only for a subset of the state space and not globally. The theory is empirically evaluated on various nonholonomic robotic systems to show that the cost of our approximate problem converges and the transfer time into the terminal set is dependent on the initial state of the system, necessitating the free final time formulation. We also do comparisons of our free-final time approach with nonlinear MPC.
Formant synthesis aims to generate speech with controllable formant structures, enabling precise control of vocal resonance and phonetic features. However, while existing formant synthesis approaches enable precise formant manipulation, they often yield an impoverished speech signal by failing to capture the complex co-occurring acoustic cues essential for naturalness. To address this issue, this letter presents HiFi-Glot, an end-to-end neural formant synthesis system that achieves both precise formant control and high-fidelity speech synthesis. Specifically, the proposed model adopts a source--filter architecture inspired by classical formant synthesis, where a neural vocoder generates the glottal excitation signal, and differentiable resonant filters model the formants to produce the speech waveform. Experiment results demonstrate that our proposed HiFi-Glot model can generate speech with higher perceptual quality and naturalness while exhibiting a more precise control over formant frequencies, outperforming industry-standard formant manipulation tools such as Praat. Code, checkpoints, and representative audio samples are available at this https URL.
This work considers infinite-horizon optimal control of positive linear systems applied to the case of network routing problems. We demonstrate the equivalence between Stochastic Shortest Path (SSP) problems and optimal control of a certain class of linear systems. This is used to construct a heuristic search framework for linear {positive} systems inspired by existing methods for SSP. {We propose a heuristics-based algorithm for {efficiently} finding local solutions to the analyzed class of optimal control problems with {a given initial} state and {positive} linear dynamics.} {By leveraging the bound on optimality in each state provided by the heuristics, we also derive a novel distributed algorithm for calculating local controllers within a specified performance bound, with a distributed condition for termination.} More fundamentally, the results allow for analysis of the conditions for explicit solutions to the Bellman equation utilized by heuristic search methods.
The denoising diffusion probabilistic model (DDPM) has emerged as a mainstream generative model in generative AI. While sharp convergence guarantees have been established for the DDPM, the iteration complexity is, in general, proportional to the ambient data dimension, resulting in overly conservative theory that fails to explain its practical efficiency. This has motivated the recent work Li and Yan (2024a) to investigate how the DDPM can achieve sampling speed-ups through automatic exploitation of intrinsic low dimensionality of data. We strengthen this line of work by demonstrating, in some sense, optimal adaptivity to unknown low dimensionality. For a broad class of data distributions with intrinsic dimension $k$, we prove that the iteration complexity of the DDPM scales nearly linearly with $k$, which is optimal when using KL divergence to measure distributional discrepancy. Notably, our work is closely aligned with the independent concurrent work Potaptchik et al. (2024) -- posted two weeks prior to ours -- in establishing nearly linear-$k$ convergence guarantees for the DDPM.
Explicit solutions to optimal control problems are rarely obtainable. Of particular interest are the explicit solutions derived for minimax problems, providing a framework to address adversarial conditions and uncertainty. This work considers a multi-disturbance minimax Linear Regulator (LR) framework for positive linear time-invariant systems in continuous time, which, analogous to the Linear-Quadratic Regulator (LQR) problem, can be utilized for the stabilization of positive systems. The problem is studied for nonnegative and state-bounded disturbances. Dynamic programming theory is leveraged to derive explicit solutions to the minimax LR problem for both finite and infinite time horizons. In addition, a fixed-point method is proposed that computes the solution for the infinite horizon case, and the minimum L1-induced gain of the system is studied. We motivate the prospective scalability properties of our framework with a large-scale water management network.
Deep-space habitats are complex systems that must operate autonomously over extended durations without ground-based maintenance. These systems are vulnerable to multiple, often unknown, failure modes that affect different subsystems and sensors in mode-specific ways. Developing accurate remaining useful life (RUL) prognostics is challenging, especially when failure labels are unavailable and sensor relevance varies by failure mode. In this paper, we propose an unsupervised prognostics framework that jointly identifies latent failure modes and selects informative sensors using only unlabeled training data. The methodology consists of two phases. In the offline phase, we model system failure times using a mixture of Gaussian regressions and apply an Expectation-Maximization algorithm to cluster degradation trajectories and select mode-specific sensors. In the online phase, we extract low-dimensional features from the selected sensors to diagnose the active failure mode and predict RUL using a weighted regression model. We demonstrate the effectiveness of our approach on a simulated dataset that reflects deep-space telemetry characteristics and on a real-world engine degradation dataset, showing improved accuracy and interpretability over existing methods.
Error accumulation is effective for gradient sparsification in distributed settings: initially-unselected gradient entries are eventually selected as their accumulated error exceeds a certain level. The accumulation essentially behaves as a scaling of the learning rate for the selected entries. Although this property prevents the slow-down of lateral movements in distributed gradient descent, it can deteriorate convergence in some settings. This work proposes a novel sparsification scheme that controls the learning rate scaling of error accumulation. The development of this scheme follows two major steps: first, gradient sparsification is formulated as an inverse probability (inference) problem, and the Bayesian optimal sparsification mask is derived as a maximum-a-posteriori estimator. Using the prior distribution inherited from Top-k, we derive a new sparsification algorithm which can be interpreted as a regularized form of Top-k. We call this algorithm regularized Top-k (RegTop-k). It utilizes past aggregated gradients to evaluate posterior statistics of the next aggregation. It then prioritizes the local accumulated gradient entries based on these posterior statistics. We validate our derivation through various numerical experiments. In distributed linear regression, it is observed that while Top-k remains at a fixed distance from the global optimum, RegTop-k converges to the global optimum at significantly higher compression ratios. We further demonstrate the generalization of this observation by employing RegTop-k in distributed training of ResNet-18 on CIFAR-10, as well as fine-tuning of multiple computer vision models on the ImageNette dataset. Our numerical results confirm that as the compression ratio increases, RegTop-k sparsification noticeably outperforms Top-k.
Audio and music generation based on flexible multimodal control signals is a widely applicable topic, with the following key challenges: 1) a unified multimodal modeling framework, and 2) large-scale, high-quality training data. As such, we propose AudioX, a unified framework for anything-to-audio generation that integrates varied multimodal conditions (i.e., text, video, and audio signals) in this work. The core design in this framework is a Multimodal Adaptive Fusion module, which enables the effective fusion of diverse multimodal inputs, enhancing cross-modal alignment and improving overall generation quality. To train this unified model, we construct a large-scale, high-quality dataset, IF-caps, comprising over 7 million samples curated through a structured data annotation pipeline. This dataset provides comprehensive supervision for multimodal-conditioned audio generation. We benchmark AudioX against state-of-the-art methods across a wide range of tasks, finding that our model achieves superior performance, especially in text-to-audio and text-to-music generation. These results demonstrate our method is capable of audio generation under multimodal control signals, showing powerful instruction-following potential. The code and datasets will be available at this https URL.
Diffusion-based speech generation has achieved remarkable fidelity, increasing the risk of misuse and unauthorized redistribution. However, most existing generative speech watermarking methods are developed for GAN-based pipelines, and watermarking for diffusion-based speech generation remains comparatively underexplored. In addition, prior work often focuses on content-level provenance, while support for model-level and user-level attribution is less mature. We propose \textbf{TriniMark}, a diffusion-based generative speech watermarking framework that targets trinity-level traceability, i.e., the ability to associate a generated speech sample with (i) the embedded watermark message (content-level provenance), (ii) the source generative model (model-level attribution), and (iii) the end user who requested generation (user-level traceability). TriniMark uses a lightweight encoder to embed watermark bits into time-domain speech features and reconstruct the waveform, and a temporal-aware gated convolutional decoder for reliable bit recovery. We further introduce a waveform-guided fine-tuning strategy to transfer watermarking capability into a diffusion model. Finally, we incorporate variable-watermark training so that a single trained model can embed different watermark messages at inference time, enabling scalable user-level traceability. Experiments on speech datasets indicate that TriniMark maintains speech quality while improving robustness to common single and compound signal-processing attacks, and it supports high-capacity watermarking for large-scale traceability.
Unmanned aerial vehicle (UAV) is regarded as a key enabling platform for low-altitude economy, due to its advantages such as 3D maneuverability, flexible deployment, and LoS air-to-air/ground communication links. In particular, the intrinsic high mobility renders UAV especially suitable for operating as a movable antenna (MA) from the sky. In this paper, by exploiting the flexible mobility of UAV swarm and antenna position adjustment of MA, we propose a novel UAV swarm enabled two-level MA system, where UAVs not only individually deploy a local MA array, but also form a larger-scale MA system with their individual MA arrays via swarm coordination. We formulate a general optimization problem to maximize the minimum achievable rate over all ground user equipments (UEs), by jointly optimizing the 3D UAV swarm placement positions, their individual MAs' positions, and receive beamforming for different UEs. To gain useful insights, we first consider the special case where each UAV has only one antenna, under different scenarios of one single UE, two UEs, and arbitrary number of UEs. In particular, for the two-UE case, we derive the optimal UAV swarm placement positions in closed-form that achieves IUI-free communication when the uniform plane wave (UPW) model holds, where the UAV swarm forms a uniform sparse array (USA) satisfying minimum safe distance constraint. While for the general case with arbitrary number of UEs, we propose an efficient alternating optimization algorithm to solve the formulated non-convex optimization problem. Then, we extend the results to the case where each UAV is equipped with multiple antennas. Numerical results verify that the proposed low-altitude UAV swarm enabled MA system significantly outperforms various benchmark schemes, thanks to the exploitation of two-level mobility to create more favorable channel conditions for multi-UE communications.
Algorithms for decentralized optimization and learning rely on local optimization steps coupled with combination steps over a graph. Recent works have demonstrated that using a time-varying sequence of matrices that achieves finite-time consensus can improve the communication and iteration complexity of decentralized optimization algorithms based on gradient tracking. In practice, a sequence of matrices satisfying the exact finite-time consensus property may not be available due to imperfect knowledge of the network topology, a limit on the length of the sequence, or numerical instabilities. In this work, we quantify the impact of approximate finite-time consensus sequences on the convergence of a gradient-tracking based decentralized optimization algorithm. Our results hold for any periodic sequence of combination matrices. We clarify the interplay between approximation error of the finite-time consensus sequence and the length of the sequence as well as typical problem parameters such as smoothness and gradient noise.
Infrared thermography has gained interest as a tool for non-contact measurement of blood circulation and skin blood flow due to cardiac activity. Partiularly, blood vessels on the surface, such as on the back of the hand, are suited for visualization. However, standardized methodologies have not yet been established for areas such as the face and neck, where many blood vessels are lie deeper beneath the surface, and external stimulation for measurement could be harmful. Here we propose Synchro-Thermography for stable monitoring of facial temperature changes associated with heart rate variability. We conducted experiments with eight subjects and measured minute temperature changes with an amplitude of about \SI{10}{mK} on the forehead and chin. The proposed method improves the temperature resolution by a factor of 2 or more, and can stably measure skin temperature changes caused by blood flow. This skin temperature change could be applied to physiological sensing such as blood flow changes due to injury or disease, or as an indicator of stress.
Global Climate Models (GCMs) are critical for simulating large-scale climate dynamics, but their coarse spatial resolution limits their applicability in regional studies. Regional Climate Models (RCMs) address this limitation through dynamical downscaling, albeit at considerable computational cost and with limited flexibility. Deep learning has emerged as an efficient data-driven alternative; however, most existing approaches focus on single-variable models that downscale one variable at a time. This paradigm can lead to redundant computation, limited contextual awareness, and weak cross-variable this http URL address these limitations, we propose a multi-variable Vision Transformer (ViT) architecture with a shared encoder and variable-specific decoders (1EMD). The proposed model jointly predicts six key climate variables: surface temperature, wind speed, 500 hPa geopotential height, total precipitation, surface downwelling shortwave radiation, and surface downwelling longwave radiation, directly from GCM-resolution inputs, emulating RCM-scale downscaling over Europe. Compared to single-variable ViT models, the 1EMD architecture improves performance across all six variables, achieving an average MSE reduction of approximately 5.5% under a fair and controlled comparison. It also consistently outperforms alternative multi-variable baselines, including a single-decoder ViT and a multi-variable U-Net. Moreover, multi-variable models substantially reduce computational cost, yielding a 29-32% lower inference time per variable compared to single-variable approaches. Overall, our results demonstrate that multi-variable modeling provides systematic advantages for high-resolution climate downscaling in terms of both accuracy and efficiency. Among the evaluated architectures, the proposed 1EMD ViT achieves the most favorable trade-off between predictive performance and computational cost.
The expansion of the low-altitude economy has underscored the significance of Low-Altitude Network Coverage (LANC) prediction for designing aerial corridors. While accurate LANC forecasting hinges on the antenna beam patterns of Base Stations (BSs), these patterns are typically proprietary and not readily accessible. Operational parameters of BSs, which inherently contain beam information, offer an opportunity for data-driven low-altitude coverage prediction. However, collecting extensive low-altitude road test data is cost-prohibitive, often yielding only sparse samples per BS. This scarcity results in two primary challenges: imbalanced feature sampling due to limited variability in high-dimensional operational parameters against the backdrop of substantial changes in low-dimensional sampling locations, and diminished generalizability stemming from insufficient data samples. To overcome these obstacles, we introduce a dual strategy comprising expert knowledge-based feature compression and disentangled representation learning. The former reduces feature space complexity by leveraging communications expertise, while the latter enhances model generalizability through the integration of propagation models and distinct subnetworks that capture and aggregate the semantic representations of latent features. Experimental evaluation confirms the efficacy of our framework, yielding a 7% reduction in error compared to the best baseline algorithm. Real-network validations further attest to its reliability, achieving practical prediction accuracy with MAE errors at the 5dB level.
Conformal prediction is a popular uncertainty quantification method that augments a base predictor to return sets of predictions with statistically valid coverage guarantees. However, current methods are often computationally expensive and data-intensive, as they require constructing an uncertainty model before calibration. Moreover, existing approaches typically represent the prediction sets with intervals, which limits their ability to capture dependencies in multi-dimensional outputs. We address these limitations by introducing zono-conformal prediction, a novel approach inspired by interval predictor models and reachset-conformant identification that constructs prediction zonotopes with assured coverage. By placing zonotopic uncertainty sets directly into the model of the base predictor, zono-conformal predictors can be identified via a single, data-efficient linear program. While we can apply zono-conformal prediction to arbitrary nonlinear base predictors, we focus on feed-forward neural networks in this work. Aside from regression tasks, we also construct optimal zono-conformal predictors in classification settings where the output of an uncertain predictor is a set of possible classes. We provide probabilistic coverage guarantees and present methods for detecting outliers in the identification data. In extensive numerical experiments, we show that zono-conformal predictors are less conservative than interval predictor models and standard conformal prediction methods, while achieving a similar coverage over the test data.
We propose an adaptive step-size rule for decentralized optimization. Choosing a step-size that balances convergence and stability is challenging. This is amplified in the decentralized setting as agents observe only local (possibly stochastic) gradients and global information (like smoothness) is unavailable. We derive a step-size rule from first principles. The resulting formulation reduces to the well-known Polyak's rule in the single-agent setting, and is suitable for use with stochastic gradients. The method is parameter free, apart from requiring the optimal objective value, which is readily available in many applications. Numerical simulations demonstrate that the performance is comparable to the optimally fine-tuned step-size.
We present EigenSafe, an operator-theoretic framework for safety assessment of learning-enabled stochastic systems. In many robotic applications, the dynamics are inherently stochastic due to factors such as sensing noise and environmental disturbances, and it is challenging for conventional methods such as Hamilton-Jacobi reachability and control barrier functions to provide a well-calibrated safety critic that is tied to the actual safety probability. We derive a linear operator that governs the dynamic programming principle for safety probability, and find that its dominant eigenpair provides critical safety information for both individual state-action pairs and the overall closed-loop system. The proposed framework learns this dominant eigenpair, which can be used to either inform or constrain policy updates. We demonstrate that the learned eigenpair effectively facilitates safe reinforcement learning. Further, we validate its applicability in enhancing the safety of learned policies from imitation learning through robot manipulation experiments using a UR3 robotic arm in a food preparation task.
Bridge models have been investigated in speech enhancement but are mostly single-task, with constrained general speech restoration (GSR) capability. In this work, we propose VoiceBridge, a one-step latent bridge model (LBM) for GSR, capable of efficiently reconstructing 48 kHz fullband speech from diverse distortions. To inherit the advantages of data-domain bridge models, we design an energy-preserving variational autoencoder, enhancing the waveform-latent space alignment over varying energy levels. By compressing waveform into continuous latent representations, VoiceBridge models~\textit{various} GSR tasks with a~\textit{single} latent-to-latent generative process backed by a scalable transformer. To alleviate the challenge of reconstructing the high-quality target from distinctively different low-quality priors, we propose a joint neural prior for GSR, uniformly reducing the burden of the LBM in diverse tasks. Building upon these designs, we further investigate bridge training objective by jointly tuning LBM, decoder and discriminator together, transforming the model from a denoiser to generator and enabling \textit{one-step GSR without distillation}. Extensive validation across in-domain (\textit{e.g.}, denoising and super-resolution) and out-of-domain tasks (\textit{e.g.}, refining synthesized speech) and datasets demonstrates the superior performance of VoiceBridge. Demos: this https URL.
This paper presents a comparative study of context management strategies for end-to-end Spoken Dialog State Tracking using Speech-LLMs. We systematically evaluate traditional multimodal context (combining text history and spoken current turn), full spoken history, and compressed spoken history approaches. Our experiments on the SpokenWOZ corpus demonstrate that providing the full spoken conversation as input yields the highest performance among models of similar size, significantly surpassing prior methods. Furthermore, we show that attention-pooling-based compression of the spoken history offers a strong trade-off, maintaining competitive accuracy with reduced context size. Detailed analysis confirms that improvements stem from more effective context utilization.
Let $A \in \mathbb{R}^{m \times n}$ be an arbitrary, known matrix and $e$ a $q$-sparse adversarial vector. Given $y = A x^\star + e$ and $q$, we seek the smallest robust solution set containing $x^\star$ that is uniformly recoverable from $y$ without knowing $e$. While exact recovery of $x^\star$ via strong (and often impractical) structural assumptions on $A$ or $x^\star$ (e.g., restricted isometry, sparsity) is well studied, recoverability for arbitrary $A$ and $x^\star$ remains open. Our main result shows that the smallest robust solution set is $x^\star + \ker(U)$, where $U$ is the unique projection matrix onto the intersection of rowspaces of all possible submatrices of $A$ obtained by deleting $2q$ rows. Moreover, we prove that every $x$ that minimizes the $\ell_0$-norm of $y - A x$ lies in $x^\star + \ker(U)$, which then gives a constructive approach to recover this set.
We consider a time-slotted job-assignment system with a central server, N users and a machine which changes its state according to a Markov chain (hence called a Markov machine). The users submit their jobs to the central server according to a stochastic job arrival process. For each user, the server has a dedicated job queue. Upon receiving a job from a user, the server stores that job in the corresponding queue. When the machine is not working on a job assigned by the server, the machine can be either in internally busy or in free state, and the dynamics of these states follow a binary symmetric Markov chain. Upon sampling the state information of the machine, if the server identifies that the machine is in the free state, it schedules a user and submits a job to the machine from the job queue of the scheduled user. To maximize the number of jobs completed per unit time, we introduce a new metric, referred to as the age of job completion. To minimize the age of job completion and the sampling cost, we propose two policies and numerically evaluate their performance. For both of these policies, we find sufficient conditions under which the job queues will remain stable.
Differentiable reinforcement learning (RL) frameworks like DiffRO offer a powerful approach for controllable text-to-speech (TTS), but are vulnerable to reward hacking, particularly for nuanced tasks like emotion control. The policy model can exploit a vanilla Reward Model (RM) by generating acoustic artifacts to achieve spurious rewards, but at the cost of degrading perceptual quality. To address this, we propose Robust Reward Policy Optimization (RRPO), a novel framework that employs a hybrid regularization scheme. This scheme develops a robust RM whose reward signal is more reliably aligned with human perception, compelling the policy to abandon detrimental shortcuts and instead learn the complex features of genuine emotions. Our ablation study confirms the enhanced robustness of our RM, as evidenced by its strong cross-lingual generalization. The subjective evaluation demonstrates that this robust RM effectively mitigates reward hacking, leading to significant improvements in both emotional expressiveness and naturalness over all baselines. Demo page: this https URL.
Implicit neural representations for videos (NeRV) have shown strong potential for video compression. However, applying NeRV to high-resolution 360-degree videos causes high memory usage and slow decoding, making real-time applications impractical. We propose NeRV360, an end-to-end framework that decodes only the user-selected viewport instead of reconstructing the entire panoramic frame. Unlike conventional pipelines, NeRV360 integrates viewport extraction into decoding and introduces a spatial-temporal affine transform module for conditional decoding based on viewpoint and time. Experiments on 6K-resolution videos show that NeRV360 achieves a 7-fold reduction in memory consumption and a 2.5-fold increase in decoding speed compared to HNeRV, a representative prior work, while delivering better image quality in terms of objective metrics.
Recent approaches in music generation rely on disentangled representations, often labeled as structure and timbre or local and global, to enable controllable synthesis. Yet the underlying properties of these embeddings remain underexplored. In this work, we evaluate such disentangled representations in a set of music audio models for controllable generation using a probing-based framework that goes beyond standard downstream tasks. The selected models reflect diverse unsupervised disentanglement strategies, including inductive biases, data augmentations, adversarial objectives, and staged training procedures. We further isolate specific strategies to analyze their effect. Our analysis spans four key axes: informativeness, equivariance, invariance, and disentanglement, which are assessed across datasets, tasks, and controlled transformations. Our findings reveal inconsistencies between intended and actual semantics of the embeddings, suggesting that current strategies fall short of producing truly disentangled representations, and prompting a re-examination of how controllability is approached in music generation.