Existing spectral benchmarks are limited in scale, modality alignment, and evaluation scope, and typically focus on either specialized models or multimodal language models (MLLMs). We introduce SpecX, a large-scale benchmark for multi-modal spectroscopy with cross-paradigm evaluation. SpecX contains 1.7M molecules with diverse spectral modalities, including NMR (1H, 13C, HSQC), IR, MS, UV, Raman, and FL, and is organized into three tiers: a large-scale dataset for pretraining, an aligned multi-spectral subset for benchmarking, and a high-quality experimental subset for evaluation. SpecX supports a range of tasks such as molecular elucidation, spectrum simulation, and spectral understanding, and enables unified evaluation across both specialized spectral models and MLLMs. Experiments show that specialized models excel at signal-level modeling, while MLLMs exhibit strengths in high-level reasoning but lack precise spectral grounding. SpecX establishes a unified benchmark for spectral intelligence and highlights the need for spectrum-native foundation models.
Cardiovascular stability estimation from wearable photoplethysmography (PPG) requires a principled nonlinear framework, yet major gaps persist in heuristic parameter selection and evaluation protocols that inflate reported performance. We introduce a Stability-Constrained Cardiovascular Stability Index (SCSI) grounded in Cardiac Stability Theory and validate it across 176,742 segments from four heterogeneous PPG datasets at three temporal scales. Cross-dataset analysis demonstrates a large Kruskal-Wallis effect size (eta2 = 0.351, p < 0.001), strong cross-scale consistency (kappa > 0.97), and significant correlation with respiratory rate across 53 ICU records (Spearman r = 0.346, p = 0.011). We identify three evaluation artifacts that inflate heuristic AUC from a true baseline of 0.573 to 0.752: segment-level cross-validation leakage, test-set normalization leakage, and pooled-AUC overweighting that conceals per-patient failure. Correcting these artifacts and applying Bayesian optimization over 15 joint parameters yields SCSI with cross-validation AUC of 0.720. On 18 held-out records, SCSI achieves pooled AUC of 0.757 (95% CI: 0.686-0.828) and negative predictive value of 0.966 for tachypnea screening, while per-record AUC of 0.497 +/- 0.207 is disclosed for transparency. External validation on 42 elective-surgery records yields AUC of 0.621, confirming cross-population generalization. Ablation analysis identifies the nonlinear complexity module as the dominant component. A sparse three-component architecture is proposed as the minimal deployable configuration. The corrected protocol provides a reproducible benchmark for future wearable cardiovascular stability indices.
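The gap between pooled and per-record AUC reported above can be illustrated with a small sketch. The toy records, scores, and labels below are hypothetical and are not drawn from the study; the point is only that pooling segments across records can reward between-record baseline differences even when ranking within every record fails.

```python
def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy records: (index scores, binary labels) per patient record.
records = {
    "rec_a": ([0.9, 0.8, 0.7, 0.6], [0, 1, 1, 1]),  # high baseline, mostly positive
    "rec_b": ([0.4, 0.3, 0.2, 0.1], [0, 0, 0, 1]),  # low baseline, mostly negative
}

# Pooled AUC: concatenate all segments and ignore record boundaries.
pooled_scores = [s for scores, _ in records.values() for s in scores]
pooled_labels = [y for _, labels in records.values() for y in labels]
pooled_auc = auc(pooled_scores, pooled_labels)

# Per-record AUC: rank segments only within each record, then average.
per_record = [auc(scores, labels) for scores, labels in records.values()]
mean_per_record_auc = sum(per_record) / len(per_record)

print(f"pooled AUC = {pooled_auc:.4f}, mean per-record AUC = {mean_per_record_auc:.4f}")
```

In this toy case the pooled AUC is above chance while the ranking inside each record is fully inverted, which is the kind of per-patient failure a pooled figure can conceal.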
Hospital readmission within 30 days of discharge is a leading driver of morbidity, mortality, and avoidable healthcare expenditure in congestive heart failure (CHF). Current clinical risk stratification tools rely primarily on non-imaging data and exhibit limited predictive performance. Point-of-care lung ultrasound (LUS) offers a sensitive, noninvasive window into the pulmonary congestion that characterizes CHF decompensation, yet its prognostic utility for readmission prediction remains largely unexplored. We present a pilot feasibility study, the first systematic machine learning study using B-mode LUS acquired during hospitalization to predict 30-day CHF readmission. Quantitative spatiotemporal embeddings are extracted from a pretrained Temporal Shift Module (TSM) ResNet-18 encoder, and interpretable biomarker features are separately evaluated. Through structured ablations over lung view, temporal representation, multi-view fusion, and cross-lung augmentation, we identify the key imaging factors driving readmission risk. Our findings reveal that (1) dependent lower-lung regions (Left-3, Right-3) carry the strongest prognostic signal, consistent with their greater susceptibility to hydrostatic congestion; (2) temporal difference features between sequential examinations substantially outperform single-timepoint representations, highlighting the importance of capturing disease trajectory; and (3) multi-view feature concatenation yields the best overall performance, with our top MLP model achieving an F1 score of 0.80 (95% CI: 0.62-0.96). Biomarker analysis further reveals that pleural-line abnormalities, including breaks and indentations, are as informative as the canonical A-line and B-line markers. These results support POCUS-derived biomarkers as practical, interpretable tools for noninvasive CHF risk stratification.
Intracranial EEG (iEEG) provides high-fidelity neural recordings essential for clinical and brain-computer interface applications, but acquiring these signals requires invasive surgery. While recent studies have attempted to estimate iEEG from non-invasive scalp EEG, most rely on patient-specific models, creating a circular dependency: if surgery is required to collect training data, the non-invasive model offers limited practical benefit. In this study, we address the challenge of cross-subject iEEG reconstruction by predicting intracranial signals for unseen patients using models trained on other individuals. We propose CAST (Cross-Attention Spatial-Temporal Transformer), a machine learning framework that translates scalp EEG into multi-channel iEEG waveforms through a two-stage transfer learning strategy. First, a temporal encoder extracts multi-scale neural representations at three different resolutions. Then, because electrode placements vary substantially across patients, a channel-aware decoder is calibrated using only a few minutes of data from the target subject. We evaluated the proposed method using leave-one-subject-out cross-validation on two public datasets comprising 1,282 iEEG channels. Experimental results demonstrate that CAST reconstructs cortical signals located near the scalp surface substantially better than deep subcortical activity. In highly observable sensorimotor regions, the model achieved peak correlations of up to r=0.864 in the precentral gyrus. Furthermore, with a channel selection strategy, CAST obtained a mean correlation of r=0.545 on viable subjects, outperforming previous within-subject baselines. These findings indicate that cortical iEEG signals can be reconstructed for unseen subjects from scalp EEG without extensive patient-specific training, and that only a brief calibration phase is sufficient to adapt the model to new hardware configurations.
We demonstrate a data-driven framework for emulating high-speed VCSEL-based 4-level Pulse Amplitude Modulation (PAM-4) optical interconnects using bidirectional Long Short-Term Memory (Bi-LSTM) networks. Unlike conventional rate-equation models, which are computationally intensive and often require difficult parameter tuning, our approach utilizes experimental waveforms to learn the end-to-end system dynamics. By employing transfer learning and weight interpolation, we extend the model to new operating regimes with a 20-fold reduction in computation time compared to independent training, while maintaining normalized mean squared error below 0.04. This emulator provides a rapid, accurate tool for the design and optimization of short-reach optical links.
Accurate selection of bovine embryos is a challenging task, as current practice relies on a single expert assessment on the seventh day after insemination, resulting in high rates of pregnancy loss. Time-lapse videomicroscopy provides detailed information on early development, but is difficult to exploit because of complex motion patterns and time-consuming analysis. We propose TransFACT, a transformer-based framework for modeling early developmental stages and embryo transferability using 2D time-lapse videos from the first four days of development. TransFACT combines frame-level temporal features with stage-level representations, using developmental stages as auxiliary supervision to predict transferability on day four. Our experiments demonstrate that TransFACT, by leveraging an existing method designed for action recognition, outperforms its competitor in predicting embryo transferability.
Safe motion planning in uncertain, time-varying environments is challenging because the safe region can change unpredictably across planning steps, often causing a loss of recursive feasibility. In this work, we present a Probabilistic Recursively Feasible Model Predictive Control (PRF-MPC) framework that guarantees recursive feasibility with a specified probability. We introduce properties that an ideal predictor should satisfy to ensure distributional consistency, and use these properties to derive closed-form expressions for the means and covariances of trajectories predicted at future time steps. Building on this analysis, we construct safety constraints that ensure, with high probability, that the current safe set is contained within the safe sets at future time steps, thereby probabilistically guaranteeing recursive feasibility. Simulation results on a lane-change scenario demonstrate that the proposed method significantly improves recursive feasibility.
Robust and accurate calibration of macroscopic traffic flow models such as METANET is critical for reliable prediction and effective control. While gradient-based methods are desirable for high-dimensional parameter spaces, their application to real-world traffic scenarios is hindered by highly nonconvex optimization landscapes. Consequently, standard static calibration frequently yields parameter sets that produce unstable, unrealistic traffic dynamics, undermining confidence in the estimated parameters and compromising the simulation's utility for counterfactual scenario testing. To address this, we propose a dynamic, rolling-horizon calibration framework. By reformulating static one-time estimation as a closed-loop control problem, parameters better maintain stability and accuracy in the presence of measurement noise. Using real-world data from the I-24 MOTION testbed, this work empirically characterizes the instability of standard methods. It then shows that the proposed approach simultaneously enhances robustness to perturbations and achieves a 48% improvement in predictive accuracy over conventional static calibration.
Radars are susceptible to interference from transmissions by other radars, leading to potential issues such as false target generation and masking of true targets. Currently, automotive radars are installed on a small percentage of vehicles, with interference managed under the assumption of infrequent occurrences. However, as radar adoption grows, this assumption will no longer hold, leading to increased severity and likelihood of interference. This paper analyzes the impact of interference in various future scenarios characterized by higher radar density on vehicles and a greater number of radars per vehicle. Conventional interference mitigation techniques are evaluated using a realistic radar processing flow simulation at the intermediate-frequency (IF) level, incorporating analytical interference modeling. To validate the simulation and assess radar performance under heavy interference conditions, experimental tests were conducted with host radars in environments with up to 30 interfering radars.
Mean field game equilibria are predicated on the assumption of immediate pairwise interactions within a population of homogeneous agents with asymptotically vanishing influence as population size increases. However, in many real-world cases, agents receive population-level information with a delay. In this paper, we characterize agent best responses under an information exchange structure whereby agents observe the empirical mean state only at discrete time instants with some delay. Sufficient conditions are presented for the existence of a Nash equilibrium within a finite population of agents, and the cost increase due to delayed discrete empirical mean observations relative to zero-latency discrete observations and continuous global-state observations is also evaluated.
Four-dimensional (4D; 3D + time) microscopic imaging has emerged as a powerful technique for investigating dynamic phenomena in complex systems, enabling direct visualization of structural evolution in space and time. However, when pushing the limits of spatiotemporal resolution, most time-resolved imaging techniques yield inherently sparse 4D datasets. While deep learning-based reconstruction methods have shown promise in reconstructing 4D from sparse spatiotemporal measurements, a practical approach for evaluating their performance in the absence of a 4D reference has, to the best of our knowledge, been lacking. Here, we present a bootstrapped cross-validation framework that estimates reconstruction performance by quantifying correlations between reconstructions generated from independently sampled subsets of the acquired data, as inspired by the 3D validation strategy in cryo-electron microscopy, where reconstructions from split datasets are compared to assess resolutions. This enables both qualitative and quantitative assessment in the absence of ground truth. We investigate two representative scenarios with sparse and ultra-sparse X-ray datasets and validate this approach using 4D-ONIX, a 4D deep-learning reconstruction method, on simulated water droplet collision experiments. The proposed approach provides a reference-free framework for performance estimation and support for better-informed experimental strategies across a wide range of ultrafast imaging applications.
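The split-data idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: `reconstruct` is a hypothetical placeholder (a voxel-wise average) standing in for a real reconstruction method such as 4D-ONIX, the data are synthetic, and the Pearson correlation is used as a stand-in for whatever correlation measure the framework adopts.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def reconstruct(measurements):
    """Hypothetical placeholder reconstruction: voxel-wise average."""
    n = len(measurements)
    return [sum(column) / n for column in zip(*measurements)]


def split_half_score(measurements, n_rounds=20, seed=0):
    """Average correlation between reconstructions from disjoint random halves."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        idx = list(range(len(measurements)))
        rng.shuffle(idx)
        half = len(idx) // 2
        rec_a = reconstruct([measurements[i] for i in idx[:half]])
        rec_b = reconstruct([measurements[i] for i in idx[half:]])
        scores.append(pearson(rec_a, rec_b))
    return sum(scores) / len(scores)


def simulate(noise_sigma, n_meas=40, seed=1):
    """Synthetic noisy measurements of a known 16-voxel signal (toy data)."""
    rng = random.Random(seed)
    truth = [float(v) for v in range(16)]
    return [[t + rng.gauss(0.0, noise_sigma) for t in truth] for _ in range(n_meas)]


low_noise_score = split_half_score(simulate(0.1))
high_noise_score = split_half_score(simulate(20.0))
print(f"low noise: {low_noise_score:.3f}, high noise: {high_noise_score:.3f}")
```

No ground truth enters the score: agreement between the two half-data reconstructions drops as the measurements become noisier, which is what makes it usable as a reference-free quality estimate.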
Route planning for military vehicles is a complex decision-making problem due to the simultaneous influence of environmental trafficability and tactical risks. This paper presents an optimization model that integrates soil trafficability and risk of enemy engagement into a decision-support model for planning activities in open terrain. Although a military application is the focus of this paper, other use cases include wildfire response, agricultural operations, and off-road vehicle recreation. The routing problem is formulated as a minimum cost mixed-integer linear program over a discretized representation of the operational environment. Each node represents a location and is connected by arcs to adjacent nodes whose traversal incurs a cost derived from a composite risk function that accounts for soil strength and the proximity to known enemy activity and prior convoy routes. Environmental inputs required for evaluating soil strength are obtained by integrating external models, which estimate spatial variations in the rating cone index (RCI) across the terrain. The model is evaluated through a case study conducted at a location in northern Colorado using fine-resolution environmental data and simulated tactical conditions. Scenario analyses demonstrate how variations in risk weighting, vehicle mobility characteristics, and operational conditions influence route geometry and mission risk. The objective function values achieved varied by five orders of magnitude based on the coefficients assigned to the terms in the cost function and the vehicle properties of the scenario. The results illustrate the capability of the proposed framework to quantify trade-offs between environmental mobility constraints and tactical considerations.
We propose ERFSL, an efficient reward function searcher using large language models (LLMs) for custom-environment, multi-objective learning-based methods (LB). ERFSL generates reward components based on explicit user requirements, rectifies them using a reward critic, and iteratively optimizes the weights of these components based on textual context generated by the training log analyzer. Applied to a simulation-based benchmark task, the reward critic corrects reward codes with only one feedback iteration per requirement, and the reward weight initializer acquires diverse reward functions within the Pareto set. Even when a weight is off by a factor of 500, an average of only 5.2 iterations is needed to meet user requirements. The approach works adequately with GPT-4o mini and does not require advanced understanding capabilities.
MRI reconstruction is an inherently ill-posed inverse problem, since incomplete measurements admit many plausible solutions. This ambiguity becomes more severe under high acceleration, where pixel-domain continuous predictors tend to average over feasible reconstructions and suppress high-frequency anatomy. We address this limitation by moving reconstruction to a discrete multi-scale latent space and posing it as autoregressive next-acceleration-scale prediction. Leveraging discrete priors proven effective in visual autoregressive modeling, our method restricts the solution to compact sequences of codebook tokens, enabling sharp reconstructions even from extremely sparse measurements. This discrete autoregressive formulation also aligns naturally with modern large language model post-training techniques. Building on this observation, we introduce on-policy privileged information distillation for visual autoregressive modeling, where a teacher is provided training-only privileged context that is unavailable at inference, in our case fully sampled acquisitions, and supervises a student trained on its own rollouts, leading to consistent reconstruction gains. Through extensive experiments on the fastMRI benchmark, we show that our approach delivers improved reconstruction performance across diverse sampling patterns under extreme undersampling. Project website is \hyperlink{this https URL}{here}.
This letter presents a comprehensive analysis of the stability phenomenon related to the ability of generators to remain in synchronism when subjected to small or large disturbances, in power systems with both synchronous machines and grid-forming voltage source converters (GFM-VSC). This phenomenon is associated with two stability classes in the IEEE/PES classification, namely, rotor-angle stability (when involving synchronous machines) and slow-interaction converter-driven stability (when involving power converters). However, this work shows that this phenomenon is fully characterised by the slow dynamics of the angle difference between the voltage sources connected to the power system, regardless of whether they are synchronous machines (with rotors) or GFM-VSCs. Therefore, we suggest using the term angle stability to refer to this phenomenon, while slow-interaction converter-driven stability should only include slow interactions of a different nature involving power converters.
Accurate channel modeling is fundamental to the design and evaluation of Terahertz (THz) ultra-massive multiple-input multiple-output (UM-MIMO) systems. However, existing model-based approaches typically rely on simplified assumptions, such as sparsity or predefined parametric structures, which are insufficient to capture the complex spatial variations and cross far-/near-field propagation characteristics of practical THz channels. In this paper, a conditional diffusion transformer (CDiT) framework is proposed for high-fidelity THz channel generation. By leveraging the state-of-the-art hybrid planar-spherical wave model (HPSM), THz channel modeling is formulated as a geometry-aware conditional generative learning problem in the sparse beamspace domain. Position information is incorporated as a conditioning signal within a diffusion-transformer architecture, enabling effective learning of the spatially dependent channel distribution. By combining the strong distribution modeling capability of diffusion models with the global dependency modeling strength of transformers, the proposed framework achieves controllable and high-fidelity THz channel synthesis. Extensive experiments on realistic THz channel datasets demonstrate that the proposed framework converges stably and significantly outperforms representative benchmark methods. The proposed framework provides a promising data-driven paradigm for THz channel modeling in next-generation wireless systems.
Distributed microphone arrays composed of multiple subarrays enable blind source separation over a wide spatial area. Directly applying fast multichannel nonnegative matrix factorization (FastMNMF) to all subarrays can exploit observations from all subarrays, but it requires repeated inversions of large matrices spanning all microphones, causing the computational cost to increase rapidly as the number of microphones grows. In contrast, applying FastMNMF to one subarray reduces the matrix size but cannot exploit observations from other subarrays. We propose distributed FastMNMF, which imposes a block-diagonal structure on the source spatial covariance matrices, so that matrix inversions are performed within subarrays. The NMF-based source spectrogram model is shared across subarrays, allowing the method to aggregate source activity information while discarding inter-subarray covariance. In synchronized, noiseless simulations with fixed room and array/source geometry, the method required less computation time than conventional FastMNMF using all subarrays, achieved a higher average source-to-distortion ratio than conventional FastMNMF using one subarray, and was applicable in the tested five-source condition, where each four-microphone subarray was locally underdetermined.
In this paper, we propose a quantum-native formulation of maximum likelihood detection (MLD) for overloaded multiple-input multiple-output (MIMO) systems in a random access channel, where numerous user terminals share the same channel resource and asynchronously transmit signals. Classical linear detectors suffer from significant performance degradation in this scenario, whereas the exhaustive-search MLD achieves the optimal performance but incurs an exponential computational complexity. To overcome this trade-off, we formulate the MLD as a binary optimization problem and solve it via Grover adaptive search (GAS) -- a quantum exhaustive search algorithm offering quadratic speedup in fault-tolerant quantum computing. We then introduce a search space reduction technique to substantially decrease the required computational resources. In addition, we investigate efficient parameter settings for GAS through probability analysis to improve convergence performance. We demonstrate that the proposed detector achieves the optimal detection performance while reducing the required Grover rotation count to reach the solution by up to approximately 65% compared with the conventional GAS, showing its potential as a viable solution for future quantum-accelerated wireless systems.
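To give a feel for why shrinking the search space lowers the Grover rotation count, the sketch below uses the textbook relations for plain Grover search with a known number of marked states. It is an illustration under that assumption only: Grover adaptive search does not know the number of solutions in advance and draws its rotation counts adaptively, and the function names here are hypothetical rather than taken from the paper.

```python
import math


def grover_angle(n_items, n_marked=1):
    """Rotation half-angle theta with sin(theta) = sqrt(M / N)."""
    return math.asin(math.sqrt(n_marked / n_items))


def success_probability(n_items, n_marked, rotations):
    """Probability of measuring a marked state after the given Grover rotations."""
    theta = grover_angle(n_items, n_marked)
    return math.sin((2 * rotations + 1) * theta) ** 2


def optimal_rotations(n_items, n_marked=1):
    """Near-optimal rotation count, floor(pi / (4 * theta))."""
    return math.floor(math.pi / (4 * grover_angle(n_items, n_marked)))


# Toy comparison: a 10-bit search space versus one reduced to 8 bits.
full_space = optimal_rotations(2 ** 10)
reduced_space = optimal_rotations(2 ** 8)
p_full = success_probability(2 ** 10, 1, full_space)

print(f"rotations: full = {full_space}, reduced = {reduced_space}")
print(f"success probability on the full space = {p_full:.4f}")
```

Because the rotation count grows with the square root of the search-space size, removing two bits roughly halves it in this toy setting.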
Ultra-high-resolution streaming and emerging immersive services are driving rapidly increasing wireless video traffic. However, perceptually pleasing video transmission over bandwidth-limited and latency-constrained wireless links remains challenging for conventional separated source-channel systems, which primarily target bit-level reliability and often suffer performance degradation under short-blocklength transmission. In addition, pixel-level distortion optimization does not necessarily align with human perception, while existing learned video codecs may incur high complexity and raise deployment issues. This paper proposes PVSC, a perception-aware video semantic communication framework for real-time wireless video transmission. PVSC eliminates explicit motion-vector transmission and exploits spatio-temporal feature coding to generate compact and channel-robust symbol streams. It also specifies side-information formatting, reference-buffer management, and lightweight rate control, enabling stable receiver-side reconstruction and bandwidth-adaptive inference with a single model. Extensive experiments demonstrate that PVSC achieves superior performance across diverse datasets, resolutions, GOP configurations, and channel conditions. Compared with the engineered ``VTM + 5G LDPC'' baseline, PVSC saves up to about 75% and 87% bandwidth at comparable LPIPS and DISTS, respectively, while enabling real-time inference on a single NVIDIA RTX 4090 GPU.
Time-based indoor positioning techniques rely on multiple access points (APs) and measurements between the user equipment (UE) and the APs. In dense indoor environments, occlusion-induced non-line-of-sight (NLoS) propagation introduces significant delays in these measurements, thereby degrading position estimation accuracy. To address this challenge, this paper proposes measurement selection strategies to improve position estimation accuracy. A ray-tracing (RT) simulator is employed to characterize the propagation environment and derive AP neighborhood information, which is subsequently used to design and evaluate different measurement selection strategies. The approaches explored include AP neighborhood-based cardinality selection, intersection and union of measurements from AP neighborhoods, and fixed measurement selection. Experiments demonstrate the efficacy of the proposed measurement selection strategies in environments under significant NLoS conditions.
Fluid antenna system (FAS), which continuously repositions a single physical element across a deployment region $[0, D]$, breaks this limit by freeing antenna positions from the discrete grid entirely. This paper establishes the theoretical foundations of sparse FAS design for direction-of-arrival (DOA) estimation and shows that continuous position freedom unlocks three compounding advantages over the classical designs. \emph{First}, we derive a universal dual DOF bound and prove that FAS-optimized positions can approach it, growing the DOF linearly with $D/\lambda$, where $\lambda$ is the signal wavelength, rather than saturating at $O(N^2)$. \emph{Second}, the CRB scales as $O(1/D^{2L})$ for $L$ sources, a $(D/(N^2 d_0))^{2L}$ improvement over the best grid design, with $d_0 = \lambda/2$ and D-optimal positions admitting a closed-form solution for single sources and an efficient Frank-Wolfe algorithm for multiple sources. \emph{Third}, we propose a two-stage FAS-MUSIC approach that combines coarray MUSIC disambiguation with full-aperture local maximum likelihood (ML) refinement to track the CRB, overcoming the grating-lobe ambiguity inherent in large-aperture non-uniform arrays. Robustness to minimum spacing constraints, mutual coupling, and finite position accuracy is also analyzed. Extensive simulations show that FAS-MUSIC achieves $17.5\times$ lower root mean squared error (RMSE) than uniform linear array (ULA) MUSIC and that FAS with $4$ antennas outperforms MRA with $8$ antennas, gains that are unattainable by any grid-constrained design.
Driven by the ultra-high throughput requirements of 6G, wireless communications are migrating to centimeter wave (cmWave) bands to overcome the limitations of current spectral resources. Massive multiple-input multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) systems aim to achieve high spectral efficiency in cmWave regimes but are often constrained by the heavy overhead of downlink channel state information (CSI) feedback. This paper proposes a deep learning scheme based on the multi-axis multi-layer perceptron for image processing (MAXIM) architecture for joint semantic CSI feedback and hybrid beamforming in multi-user cmWave MIMO-OFDM systems, which maximizes the downlink sum rate by end-to-end optimization. Specifically, distributed encoders at multiple user equipments (UEs) perform limited CSI feedback, while the decoder at the base station (BS) jointly designs the hybrid beamforming matrices without explicit CSI reconstruction. The uplink transmission is implemented via deep joint source-channel coding (DJSCC) to enhance CSI compression efficiency and noise robustness. Furthermore, considering the high correlation between vertical and horizontal polarization channels in dual-polarized massive MIMO systems, a cross-polarization interaction module is introduced at the UEs to exploit polarization correlations for joint CSI compression. Simulation results demonstrate that the proposed method improves the downlink sum rate under various signal-to-noise ratio (SNR) conditions with a limited number of feedback symbols, validating its robustness and superiority in multi-user dual-polarized cmWave MIMO-OFDM systems.
Numerous multicarrier modulation schemes have been proposed recently to enhance the performance in narrowband doubly dispersive channels for emerging high-mobility applications. However, the ultra-reliable modulation framework in wideband linear time-varying (LTV) channels remains an open problem, where the time dilations and contractions brought by the high mobility cannot be ignored for the baseband signal to obtain the constant Doppler shift across the whole transmission band. To solve this problem, we propose the hyperbolic frequency multicarrier (HFMC) waveform in this paper based on the inspiration from affine frequency division multiplexing (AFDM) modulation, where the delay and Doppler shift are absorbed into a 1D shift in the affine domain to provide a compact characterization of doubly dispersive discrete-time channels. By adopting the passband representation of wideband LTV channels and hyperbolic frequency modulated (HFM) signals, we reveal that the Doppler scaling factor brought by the relative mobility can be absorbed into an equivalent delay. The basic principle of HFMC modulation is established by investigating the approximate orthogonality among HFMC subcarriers, which are generated from a basic HFM signal by utilizing uniformly spaced equivalent delay. The spectrum of HFMC subcarriers is also analyzed to evaluate the system capacity, where the overlapping nature in the frequency domain can be observed. The input-output characterization in wideband LTV channels is then executed to confirm the 1D integration of time delay and Doppler scaling factor for each path, which demonstrates the ability to exploit potential multipath diversity. The parameter optimization based on the input-output relation and spectrum analysis is finally developed to balance the efficiency and reliability.
Multicellular coordination relies on broadcast-addressable receptors, yet engineered magnetic systems face an addressability bottleneck because global fields intrinsically conflate power and control. Here, we introduce MagCeptors to resolve this by encoding selectivity directly into magnetic topology. Establishing an energetic isomorphism with biological receptors, these arrays utilize local couplings to shape potential landscapes where global field vectors act as spatial keys, triggering deterministic snap-through instabilities. This architecture decouples force from source distance, achieving a density of 385 mN/mm3 (>50-fold increase over prior art). We validate this primitive through signal demultiplexing, embodied sequential logic, and untethered distributed networking. This framework enables distributed systems to orchestrate complex tasks without tethers or electronics, relying solely on the intrinsic logic of matter.
In a fluid antenna system (FAS), a single reconfigurable antenna is able to activate one of $N$ correlated ports to exploit spatial diversity. However, outage analysis is challenging because exact evaluation requires an $N$-dimensional multivariate integral, while existing closed-form approximations based on block-correlation models tend to underestimate the true outage probability. This paper shows that the spatial correlation matrix of a FAS with a normalized linear aperture length $W$ has at most $K^{*}=2\lceil W\rceil+1$ significant eigenmodes, regardless of the number of deployed ports. This is a spatial counterpart of the Slepian-Landau-Pollak spectral concentration theorem and reveals that the spatial degrees of freedom are determined by aperture size rather than port count. Motivated by this result, we derive an \emph{equivalent degree of freedom} (EDoF) approximation, under which the outage probability can be expressed in closed form as that of selection combining over $K^{*}$ independent branches. We further propose a refined \emph{weighted independent modes} (WIM) approximation that incorporates eigenvalue-dependent branch weights $\{\beta_k\}$ and yields a product-form closed-form expression with improved accuracy at moderate signal-to-noise ratio (SNR). Both approximations achieve the exact diversity order, become asymptotically exact at high SNR, and provably never underestimate the true outage probability by Anderson's inequality. The proposed framework is further extended to obtain closed-form expressions for ergodic capacity and to characterize multi-user fluid antenna multiple access (FAMA) with explicit interference-limited outage floors. We also analyze two-dimensional planar FAS, for which the diversity order scales multiplicatively with the aperture dimensions.
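The mode count and the selection-combining form mentioned above can be made concrete with a short sketch. It assumes i.i.d. Rayleigh-faded branches with equal average SNR, for which the outage probability of selection combining over $K$ branches is the standard $(1 - e^{-\gamma_{\mathrm{th}}/\bar{\gamma}})^{K}$; the paper's exact normalization of the EDoF expression may differ, and the function names are hypothetical.

```python
import math


def significant_modes(aperture_w):
    """K* = 2 * ceil(W) + 1 for a normalized linear aperture length W."""
    return 2 * math.ceil(aperture_w) + 1


def selection_combining_outage(n_branches, snr_threshold, mean_snr):
    """Outage of selection combining over i.i.d. Rayleigh branches (assumption).

    Each branch SNR is exponential with the given mean, so all branches fall
    below the threshold with probability (1 - exp(-threshold / mean)) ** K.
    """
    single_branch = 1.0 - math.exp(-snr_threshold / mean_snr)
    return single_branch ** n_branches


# Toy numbers: aperture W = 2 wavelengths, threshold 1, mean SNR 10 (linear).
k_star = significant_modes(2.0)
outage_single = selection_combining_outage(1, 1.0, 10.0)
outage_edof = selection_combining_outage(k_star, 1.0, 10.0)

print(f"K* = {k_star}")
print(f"single-branch outage = {outage_single:.4e}")
print(f"K*-branch outage     = {outage_edof:.4e}")
```

Under this assumption, adding ports beyond the aperture-limited $K^{*}$ would not change the approximation, which matches the abstract's point that aperture size rather than port count sets the spatial degrees of freedom.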
Deep generative models have emerged as state-of-the-art for solving inverse problems, but applying them to inverse problems for PDEs, such as electrical impedance tomography (EIT), remains challenging. Because physical domains are naturally discretized as unstructured meshes rather than regular grids, standard convolutional architectures are often inadequate. In this paper, we propose a novel framework that extends diffusion posterior sampling (DPS) to graph-structured data. We develop an unconditional score-based diffusion model directly on a 2D triangular mesh to learn an accurate prior over the physical solution space. Furthermore, we introduce a regularized variant, RDPS, which incorporates explicit regularization terms, such as total variation and generalized Tikhonov, to complement the implicit diffusion prior and mitigate severe ill-posedness. Extensive experiments on synthetic and real 2D EIT datasets demonstrate that RDPS produces stable, physically plausible reconstructions. Our approach generalizes well to out-of-distribution inclusion geometries, is highly robust to measurement noise, and outperforms current state-of-the-art solvers (e.g., GPnP-BM3D, DP-SGS) in reconstruction accuracy and artifact reduction.
Automated driving systems require monitoring mechanisms to ensure operation as intended, especially when system elements degrade and/or fail. Hence, capability monitoring is crucial in order to evaluate the system's remaining performance and implement capability-based behavior. In this paper, we investigate the dynamics of a highly over-actuated automated vehicle under actuator degradations and failures, affecting the vehicle's motion control capabilities. We propose a lightweight prediction model based on conformalized quantile regression that predicts whether an automated vehicle can be controlled with sufficiently low lateral deviation from a planned trajectory under nominal, degraded, and failed actuator conditions. We recognize that statistical guarantees should hold not only across all data (marginal coverage) but also for different regimes within the data (conditional coverage). We therefore employ equalized coverage methods to address this challenge. During runtime behavior generation our predictor can provide a heuristic for determining the admissible action space. Its application and limitations are discussed in this paper.
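The calibration step of conformalized quantile regression can be sketched as follows. This is a minimal split-conformal version, assuming that lower and upper quantile predictions from some already-trained regressor are available for a held-out calibration set; the function names and toy numbers are hypothetical, and the equalized-coverage variant mentioned above would, under the usual construction, repeat this calibration separately per regime (for example nominal, degraded, and failed actuators).

```python
import math


def cqr_margin(lo_preds, hi_preds, y_true, alpha):
    """Conformal margin from calibration data (split CQR).

    Conformity score per sample: max(lo - y, y - hi), which is positive
    when y falls outside the predicted quantile band. The margin is the
    ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lo_preds, hi_preds, y_true))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration samples for the requested coverage level.
        return math.inf
    return scores[rank - 1]


def conformalized_interval(lo, hi, margin):
    """Widen (or shrink, if margin < 0) a predicted band by the margin."""
    return lo - margin, hi + margin


if __name__ == "__main__":
    # Hypothetical lateral-deviation calibration data.
    margin = cqr_margin([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 1.2, -0.1], alpha=0.5)
    print(conformalized_interval(0.2, 0.8, margin))
```

A runtime check of the kind described in the abstract could then compare the upper end of the conformalized interval against an admissible lateral-deviation limit.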
In conversational speech separation and recognition tasks, close-talk microphones are typically attached to each speaker during training data collection to capture near-field, close-talk mixture signals, in addition to using far-field microphones to record far-field mixture signals. Each such close-talk mixture exhibits a reasonably high energy level for the wearer and could intuitively serve as weak supervision for training far-field speech separation models directly on real-recorded far-field signals. However, they are not sufficiently clean for this purpose, as they often contain strong cross-talk speech from other speakers in addition to background noise. To address this, we propose cross-talk reduction (CTR), a task aiming to isolate the wearer's speech from each close-talk mixture, and a novel method called CTRnet, which can be trained directly on real-recorded pairs of close-talk and far-field mixtures to accomplish CTR. Building on CTRnet, we further propose pseudo-label based far-field speech separation (PuLSS), which uses CTRnet's estimated clean speech as pseudo-labels to train models for separating far-field mixtures. A key advantage of the proposed framework is that both CTRnet and PuLSS can be trained on real-recorded data from the target domain, addressing the generalization gap commonly observed when models are trained exclusively on simulated data. On the CHiME-6 dataset, our framework achieves state-of-the-art ASR performance under both oracle and estimated speaker diarization, surpassing all CHiME-{7,8} challenge submissions. To our knowledge, it is the first neural speech separation method that substantially outperforms guided source separation on real conversational "speech-in-the-wild" data.
Discrete affine Fourier transform spread affine frequency division multiplexing (DAFT-s-AFDM) is a promising waveform for integrated sensing and communication (ISAC) due to its low peak-to-average power ratio, robustness to Doppler shifts, and reduced multiuser interference in the uplink transmission. This paper presents a comprehensive ambiguity function (AF) analysis of DAFT-s-AFDM and derives the closed-form expression for the AF magnitude expectation. Several key insights into the impact of DAFT-s-AFDM parameters on ISAC performance are revealed, thus providing concrete guidance for the subsequent waveform design. Building on these insights, a novel probabilistic constellation shaping (PCS) framework is proposed for ISAC waveform enhancement, where the communication throughput and the sensing AF characteristics are jointly optimized by addressing a multi-objective problem. An efficient algorithm based on a closed-form bit error rate expression is developed to obtain the Pareto-optimal solutions. Extensive simulations validate the theoretical results and show that the proposed PCS-enhanced DAFT-s-AFDM significantly outperforms its classical counterparts, achieving a superior and highly controllable tradeoff between communication and sensing performance.
In this paper, we theoretically analyze and experimentally demonstrate the performance gains achievable by integrating an in-house built reconfigurable intelligent surface (RIS) with a 5G new radio (NR) system implemented using the OpenAirInterface (OAI) software stack. Unlike conventional RIS-assisted systems that rely on explicit channel state information (CSI) estimation followed by RIS phase configuration optimization, we adopt a low-complexity approach in which the RIS phase states are randomly switched among predefined configurations. The resulting channel fluctuations are opportunistically exploited by the inherent proportional fair (PF) scheduling mechanism of 5G NR. We develop a theoretical framework that characterizes the interaction between RIS switching dynamics and PF scheduling. Based on this framework and the associated analysis, we provide design guidelines for selecting the RIS switching time $T_s$ and the PF throughput averaging window $T_c$ that maximize the system throughput. Experimental evaluations on the 5G NR testbed demonstrate improvements in key performance metrics, including reference signal received power (RSRP), block error rate (BLER), modulation and coding scheme (MCS) index, and throughput. Our key takeaway is that randomly configured RIS operation with appropriately chosen system parameters can achieve performance comparable to optimized RIS designs, with no additional overhead compared to a conventional 5G NR system. More importantly, it requires no coordination between the RIS and the 5G NR system.
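The interaction between random RIS switching and PF scheduling can be sketched with a toy slot-level simulation. The rate table, the uniform rate range, and all function names are hypothetical; only the switching time $T_s$ and the averaging window $T_c$ come from the abstract, and the exponential-average update is the textbook PF form rather than the paper's analytical framework.

```python
import random


def pf_select(rates, avg_tput):
    """Pick the user with the largest instantaneous-to-average rate ratio."""
    return max(range(len(rates)), key=lambda k: rates[k] / max(avg_tput[k], 1e-9))


def pf_update(rates, avg_tput, scheduled, t_c):
    """Exponentially averaged throughput with window t_c (in slots)."""
    return [
        (1 - 1 / t_c) * avg + (1 / t_c) * (rates[k] if k == scheduled else 0.0)
        for k, avg in enumerate(avg_tput)
    ]


def simulate(num_users, num_configs, t_s, t_c, num_slots, seed=0):
    """Per-user mean served rate when the RIS picks a random state every t_s slots."""
    rng = random.Random(seed)
    # Hypothetical achievable rate of each user under each RIS configuration.
    rate_table = [[rng.uniform(0.5, 2.0) for _ in range(num_users)]
                  for _ in range(num_configs)]
    avg = [1.0] * num_users
    served = [0.0] * num_users
    config = 0
    for slot in range(num_slots):
        if slot % t_s == 0:
            config = rng.randrange(num_configs)  # random RIS phase state
        rates = rate_table[config]
        k = pf_select(rates, avg)
        served[k] += rates[k]
        avg = pf_update(rates, avg, k, t_c)
    return [s / num_slots for s in served]


if __name__ == "__main__":
    print(simulate(num_users=4, num_configs=8, t_s=5, t_c=50, num_slots=2000))
```

Sweeping `t_s` and `t_c` in such a toy model gives a rough feel for why the two time scales need to be chosen jointly, although it says nothing about the specific guidelines derived in the paper.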
Beyond diagonal reconfigurable intelligent surface (BD-RIS) represents a promising architecture for advancing millimeter-wave (mmWave) communications. However, its intricate inter-element connections invalidate the conventional decoupled mathematical structure, thereby severely complicating cascaded channel estimation. In this paper, we formulate a novel block-Kronecker-structured cascaded channel model for a \textit{group-connected} BD-RIS-aided multi-user (MU) mmWave system equipped with uniform planar arrays (UPAs). By exploiting the cascaded channel sparsity, an efficient three-stage estimation protocol is proposed. Specifically, Stage I acquires the common angles of arrival (AoAs) at the base station (BS) via a discrete Fourier transform (DFT)-based approach. Stage II leverages the block-Kronecker structure alongside orthogonal matching pursuit (OMP) and correlation-based least squares (LS) to extract the complete cascaded channel for a designated typical user. Finally, Stage III utilizes a Hierarchical Block OMP (HBOMP) algorithm to estimate the other users' channels. This structurally reconstructs the common and user-specific components, which fundamentally reduces the computational complexity and substantially reduces the pilot overhead. Numerical simulations verify that the proposed protocol yields improved channel estimation accuracy while maintaining a relatively low pilot overhead.
This paper studies the use of Set Shaping Theory (SST) as a reversible payload-shaping layer for least significant bit (LSB) image steganography. The proposal is not intended to replace existing steganographic methods or to compete with them as a new embedding scheme. Instead, SST is positioned as a complementary preprocessing stage that makes an existing embedding method easier to apply with lower statistical disturbance. The SST transformation increases the message length by K symbols and is implemented with the approximate and fast transformation algorithm developed by Glen Tankersley. Although the embedded payload is lengthened from N to N+K bits, the selected representation can reduce D_KL(P||Q) and therefore make the subsequent steganographic insertion less detectable under histogram-based criteria. Across 1,800 controlled simulations on four synthetic cover-image models, SST reduced D_KL(P||Q) by an average of 25.16 percent relative to a fair N+K LSB baseline, with a 95 percent confidence interval of +/- 1.22 percent. For K=8, the average reduction reached 42.81 percent. Additional robustness simulations with keyed random embedding paths confirmed the effect across several distances: at K=8, SST reduced KL divergence by 42.44 percent, Jensen-Shannon divergence by 29.62 percent, total variation by 12.41 percent, and symmetric chi-square distance by 28.30 percent. An additional image-based matrix-embedding/STC-like simulation showed that SST also reduces the minimum weighted insertion cost: relative to the unshaped K=0 reference, K=8 reduced the cost by 6.93 percent.
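The histogram-based criterion can be made concrete with a short sketch that embeds bits by plain LSB replacement and measures $D_{KL}(P\|Q)$ between two pixel-value histograms. It assumes that $P$ is the cover histogram and $Q$ the stego histogram, which the abstract does not specify; the helper names and the `eps` floor are illustrative choices, and the SST transformation itself is not implemented here.

```python
import math
from collections import Counter


def histogram(values, num_bins=256):
    """Normalized histogram of integer pixel values in [0, num_bins)."""
    counts = Counter(values)
    total = len(values)
    return [counts.get(v, 0) / total for v in range(num_bins)]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) in nats; q is floored at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def lsb_embed(pixels, bits):
    """Replace the least significant bit of the first len(bits) pixels."""
    stego = [(px & ~1) | b for px, b in zip(pixels, bits)]
    return stego + list(pixels[len(bits):])


if __name__ == "__main__":
    cover = [10, 10, 11, 12, 12, 12, 13, 200]
    payload = [1, 0, 1, 1]
    stego = lsb_embed(cover, payload)
    print(kl_divergence(histogram(cover), histogram(stego)))
```

A shaped payload would simply be passed to `lsb_embed` in place of `payload`, and the two divergences compared at equal embedded length.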
This paper addresses the problem of tracking in-plane waves from image sequences using periodic surface patterns. Wave-induced deformation is modeled as a spatial phase modulation of a periodic carrier. We propose ADOPT (Analytical Demodulation of Periodic Texture), a method based on an oriented two-dimensional analytic signal to estimate displacement phase and orientation. The approach relies on a physical model describing longitudinal and transverse in-plane waves. Orientation-selective filtering isolates relevant spectral components, and phase extraction provides a stable reconstruction of the displacement field. A theoretical analysis using the Cramer--Rao bound evaluates performance limits of ADOPT. Simulations show that the proposed method outperforms state-of-the-art Digital Image Correlation (DIC) at high signal-to-noise ratios, especially for small displacements where DIC becomes limited. Moreover, ADOPT is more computationally efficient. Experiments on silicone membranes with periodic patterns confirm accurate estimation of wave fields and dispersion curves under impulsive excitation. Overall, the proposed framework provides a robust and efficient solution for wave-induced displacement estimation.
This paper implements deep reinforcement learning (DRL) with a safety filter for spacecraft reorientation control with a single pointing keep-out zone. A new state space representation is designed which includes a compact representation of the attitude constraint zone. A reward function is formulated to achieve the control objective while enforcing the attitude constraint. The soft actor-critic (SAC) algorithm is adopted to handle the continuous state and action spaces. A curriculum learning approach is implemented for agent training. To guarantee compliance with the attitude constraint, a control barrier function (CBF)-based safety filter is implemented for agent deployment. Simulation results demonstrate the effectiveness of the proposed state space representation and the designed reward function. Monte Carlo simulations show that reward shaping alone cannot guarantee safety during a reorientation maneuver. In contrast, with the CBF-based safety filter, the constraint is satisfied throughout the maneuvers.
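How a CBF-based safety filter modifies a learned action can be sketched for the simplest case of one affine constraint on the control. The sketch assumes a generic constraint of the form $a^\top u + b \ge 0$ (for a relative-degree-one barrier $h$, $a = L_g h$ and $b = L_f h + \alpha h$), for which the minimum-deviation quadratic program has a closed-form projection; the attitude-specific barrier used in the paper is not reproduced, and the function name is hypothetical.

```python
def cbf_filter(u_nom, a, b):
    """Closest control to u_nom (in Euclidean norm) satisfying a.u + b >= 0.

    Closed-form solution of: min ||u - u_nom||^2  s.t.  a.u + b >= 0.
    If the nominal action already satisfies the constraint, or the
    constraint does not depend on u (a == 0), u_nom is returned unchanged.
    """
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    norm_sq = sum(ai * ai for ai in a)
    if slack >= 0.0 or norm_sq == 0.0:
        return list(u_nom)
    scale = -slack / norm_sq  # smallest push along a that restores a.u + b = 0
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    # Hypothetical numbers: the policy's action violates the constraint.
    print(cbf_filter([0.0, 0.0], [1.0, 0.0], -1.0))
    # A safe action passes through untouched.
    print(cbf_filter([2.0, 3.0], [1.0, 0.0], -1.0))
```

The filter leaves the SAC policy's action unchanged whenever it is already safe, which is why it can be attached at deployment without retraining.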
This paper investigates robust synchronization for multi-agent systems (MASs) governed by parabolic partial differential equations in the presence of both observable and unobservable disturbances. Using only boundary output measurements, a disturbance observer is designed to estimate observable Dirichlet boundary disturbances while ensuring robustness of the observer error system with unobservable disturbances occurring in the domain. Using only the reference signal and local output information, distributed synchronization controllers are then constructed to enable all agents to track the reference trajectory. In particular, exponential tracking is achieved in the absence of unobservable disturbances, while robustness is preserved when additional unobservable disturbances occur during controller implementation. We further analyze the impact of unobservable Dirichlet-Robin boundary disturbances on synchronization performance by proving the boundedness of solutions to the synchronization error system. Moreover, to characterize the influence of all disturbances, input-to-state stability (ISS) is established for the closed-loop system. For the involved systems, the generalized Lyapunov method and the recursion technique are extensively employed in the stability analysis, and the lifting technique and semigroup theory are used to prove the well-posedness. Simulation results validate the proposed control scheme, demonstrating effective disturbance estimation and rejection, robust synchronization, and the ISS properties under various scenarios.
This paper proposes CAT-MoEformer, a context-aware transformer with scene-conditioned mixture-of-experts (MoE) feed-forward networks, for proactive mmWave beam prediction from compressed uplink pilot observations. The spatial encoder comprises a three-layer asymmetric convolutional network followed by a squeeze-and-excitation recalibration block, which extracts frequency-beam correlation features from pilot tensors without explicit channel reconstruction. A truncated pretrained GPT-2 backbone models the temporal evolution of beam sequences, with the feed-forward networks in the upper three transformer layers replaced by scene-conditioned MoE-FFN modules. A lightweight gating network maps the scenario label and normalized user equipment speed to expert mixing weights, conditioning the routing decision on physical propagation descriptors rather than on latent hidden states. This design yields interpretable expert assignments and eliminates the load imbalance associated with token-level routing. To prevent expert collapse under soft routing, a three-stage training strategy is introduced: hard expert assignment in the first stage establishes scene-specific specialization, isolated gating network training in the second stage aligns the soft routing distribution with the hard partition, and top-1 hard inference in the third stage fine-tunes the model under deterministic single-expert activation to maximize scene-specific precision. Simulation results on 3GPP TR 38.901 Urban Macro channel simulations with $64{,}000$ user samples demonstrate that CAT-MoEformer achieves a Top-1 beam prediction accuracy of $94.88\%$ and a beam switching instant accuracy of $80.62\%$, representing gains of $2.33\%$ and $9.55\%$ respectively over a CNN+GPT-2 baseline, with an inference latency of $0.52$~ms.
Short-form video poses new challenges to the quality assessment of user-generated content (UGC) due to its complex generation pipeline, rapid content variation, and mixed distortions. To address this challenge, we propose an end-to-end video quality assessment (VQA) framework that employs a dense visual encoder based on CLIP, and incorporates compression priors derived from the frequency domain to generate artifact- and structure-aware weight maps for feature aggregation. By explicitly decomposing artifact, structure, and original visual feature branches and adaptively fusing them over time through a learned gating module, the proposed method achieves accurate and efficient quality prediction. Experimental results show that our method achieves strong performance on short-form video datasets in terms of average rank and linear correlation (SRCC: 0.736, PLCC: 0.787), while maintaining efficient inference runtime. The code and additional results are available at: this https URL.
This paper presents a new stochastic relay-based extremum-seeking controller (ESC) for multi-input-single-output (MISO) systems. The goal of this work is to create an algorithm that is much simpler to configure than alternative approaches, making deployment to real-world problems easier. A solution is developed first for a static map and then adapted for a general class of dynamic systems. The number of configurable parameters is one per input channel for the static case, and only one additional parameter is needed for the dynamic version. The problem of gradient identification is solved via the use of stochastic relay gains, and a simple stability proof for the static case is presented. Simulation tests demonstrate the performance of the strategy for optimizing both static and dynamic systems.
While conventional (k=1) discrete-time barrier certificate conditions impose strict safety constraints by requiring the function to be non-increasing at every step, k-inductive barrier certificates relax this by allowing a temporary increase -- up to k-1 times, each within a threshold $\epsilon$ -- while maintaining overall safety and improving flexibility. This paper leverages neural networks and constructs k-inductive neural barrier certificates (k-NBCs) for (partially) unknown nonlinear systems. While neural networks offer scalability in the design process, they lack formal guarantees, requiring additional approaches such as counterexample-guided inductive synthesis (CEGIS) with satisfiability modulo theories (SMT) for verification. However, the CEGIS-SMT framework requires knowledge of system dynamics, which is unavailable in practical settings. To address this, we leverage a generalization of Willems et al.'s fundamental lemma, using a single state trajectory, to construct a data-driven representation of (partially) unknown models for SMT verification without sacrificing accuracy. Additionally, CEGIS-SMT removes the constraint of restricting barrier certificates to specific function classes, such as sum-of-squares, enabling greater flexibility in their design. We validate our approach on three nonlinear case studies with (partially) unknown dynamics.
Advanced Traffic Signal Control (TSC) algorithms require real-time phase control, yet existing Hardware-in-the-Loop Simulation (HILS) testbeds only support pre-programmed timing plans. In this paper, we present the first HILS testbed for real-time phase control. We develop a novel middleware architecture that translates dynamic phase actions (selection, switch, and duration) into commands for NTCIP-compliant commercial hardware controllers. This middleware manages phase transitions, synchronizes signal states, and handles errors without interrupting the hardware's internal operations. Experimental validation demonstrates that the system executes real-time phase commands, handles system conflicts, and achieves a sub-millisecond internal system latency on average.
This letter introduces attack-resilient Control Lyapunov Functions (AR-CLFs) and attack-resilient Control Barrier Functions (AR-CBFs) for nonlinear control-affine systems subject to control-input false data injection attacks (FDIA) satisfying an at-most-exponentially growing envelope. The proposed framework embeds a unified adaptive compensation term into both the CLF decrease and CBF safety constraints. In contrast to input-to-state stability/safety (ISS/ISSf)-based methods that certify disturbance-dependent enlarged safe sets, the proposed approach enables finite-time recovery to the nominal safe set without requiring a prior magnitude bound on the FDIA, relying instead on a growth-rate characterization used for analysis and an online gain tuning law that regulates the compensation term. A unified quadratic program (QP) is developed to enforce the AR-CLF and AR-CBF conditions simultaneously, guaranteeing uniformly ultimately bounded (UUB) stability and uniform ultimate safety (UUS) under unbounded FDIA. Numerical results demonstrate improved resilience compared to existing ISS-CLF, ISSf-CBF, and robust CLF-CBF-QP approaches.
This paper presents an innovative solution designed to facilitate safe and flexible operation of nuclear power plants (NPPs). The purpose of this new device, named the OAPS system, is to provide optimal strategies (e.g., axial offset control, xenon oscillation mitigation, effluent minimization) and real-time recommendations (e.g., dilution and boration flowrates, turbine power setpoints and variation rates) to help NPP operators perform power variations confidently and efficiently. Just as a GPS navigator optimizes and modifies its planned route according to the current position of the user, the OAPS system regularly updates its recommendations based on the latest plant measurements. To achieve this, the OAPS system relies on model predictive control, an advanced control technique that is well established yet cutting-edge in the nuclear industry. The conventional axial offset control strategy of the OAPS system was previously validated on both Framatome's full-scope PWR simulator and EDF's full-scope N4 simulator. In this paper, three new advanced strategies are showcased on an intermediate-complexity PWR simulator developed by Framatome: 1) determination of the fastest feasible power variation rates, 2) accelerated cancellation of axial power oscillations and 3) minimization of water and boron effluents.
Urban flooding is one of the most serious natural disasters. Numerous flood simulation models have been proposed and are now relatively mature. However, two major challenges persist: excessive simplification of the city system and high computational complexity. To overcome these limitations, this paper develops an Urban Flood Dynamical System Model (UFDSM) based on the concept of the Cellular Automata Urban Flood Model. This model allows flexible customization of cell types and selection of water motion or distribution rules based on actual urban environments, so as to incorporate as much urban system data as possible. The water motion and distribution rules can be simple, which reduces the computational complexity, but they cannot be arbitrary. We therefore provide a sufficient condition under which solutions of the dynamical system align with the macroscopic physical conditions governing water movement. Then, to preserve the evolutionary properties of the UFDSM, we propose a first-order conservation nonstandard finite difference algorithm. This numerical method ensures positive solutions and conservation of water while maintaining the same fixed-point characteristics as the dynamical system. The numerical method is validated by comparing it with an analytical solution. Finally, to verify the applicability of our model, we performed an urban flood simulation experiment and compared it to HEC-RAS. There is approximately a 2 mm discrepancy in distance dp' and a 0.02 mm discrepancy in distance d2', with the relative distance Rp about 7.5% and the relative distance R2 approximately 0.06%. Additionally, the proposed model is easily coupled with other hydrological processes and facilitates data assimilation, thereby offering promising practical applications.
Wearable devices enable continuous health monitoring from multimodal signals, but real-world deployment is hindered by limited labeled data and pervasive sensor incompleteness. While large-scale self-supervised pretraining reduces label dependence, most existing methods assume full modality availability. Current approaches for handling modality missingness often reconstruct entire absent signals, which can encourage hallucinating modality-specific details that are not inferable from the observed sensor signals and degrade robustness. We propose VCR, a self-supervised framework that learns to extract valid representations robust to modality missingness. VCR employs an orthogonal tokenizer to enforce strict orthogonal disentanglement by rectifying latent manifolds and applying a geometric projection, separating each modality into shared semantics and modality-specific residuals. This design preserves complete information integrity while serving as a structural foundation for robust learning under modality missingness. The resulting tokens are processed by a missing-aware mixture-of-experts backbone that adapts to varying patterns of modality availability. By constraining the objective to reconstruct only the shared components of missing modalities, VCR effectively mitigates hallucinations of non-inferable modality-specific details. Across multiple health monitoring tasks, VCR consistently improves performance and robustness under full, single-missing, and multiple-missing modality settings compared with strong supervised and self-supervised baselines.
Any system that models the world under finite representational capacity must compress; any compression entails a prior; and the prior is the system's bias. What has not been established is whether uncertainty participates in the dynamics governing future behavior, or merely describes the output distribution without consequence. We introduce a structural distinction between descriptive uncertainty, which does not recursively modulate the system's policy, and regulatory uncertainty, which directly enters the optimization landscape and drives persistent adaptive restructuring. We prove formally that current transformer architectures are confined to descriptive uncertainty at inference. We ground this in thermodynamics via Landauer's principle: for uncertainty to be regulatory, epistemic error must cost real energy; in a decoupled system, hallucinations and correct derivations dissipate identical energy. We test this empirically across three locally-deployed language models (3B, 8B, 70B parameters). Token-level Shannon entropy is statistically invariant across tasks spanning pattern retrieval, causal operator application, and out-of-distribution causal generalization in all three models (all pairwise p >= 0.568; within-model ranges 0.011-0.028 nats), while task accuracy varies substantially across the same conditions (0%-100%). Entropy and accuracy are orthogonal. The decoupling is scale-invariant: larger models achieve higher accuracy but identical entropy flatness. This structural incapacity is not resolvable by additional parameters or training data. Genuine epistemic grounding requires physical coupling between thermodynamic substrate state and information processing cost.
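The entropy measurement referred to above can be sketched as follows, assuming that per-token logits over the vocabulary are available from the model. The helper names are hypothetical, and how the study aggregates entropy across tokens and tasks is not specified in the abstract; a plain mean is used here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy_nats(logits):
    """Shannon entropy (natural log, so in nats) of one next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(per_token_logits):
    """Average token-level entropy over a generated sequence."""
    entropies = [token_entropy_nats(logits) for logits in per_token_logits]
    return sum(entropies) / len(entropies)


if __name__ == "__main__":
    # A uniform distribution over 4 tokens has entropy ln(4) nats.
    print(token_entropy_nats([0.0, 0.0, 0.0, 0.0]), math.log(4))
    print(mean_entropy([[2.0, 0.5, 0.1], [0.0, 0.0, 3.0]]))
```

Comparing such per-task means against per-task accuracy is the kind of analysis the abstract describes when it reports that entropy stays flat while accuracy varies.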
We investigate Counterfactual Video Foley Generation, which aims to adopt a sound-source identity that contradicts the visual evidence while remaining temporally synchronized to a silent video. Existing Video&Text-to-Audio (VT2A) models struggle with this, often remaining anchored to the visually implied sound source when video and text contents disagree. We present ConterFlow, an inference-time dual-phase sampling scheme for pretrained flow-matching VT2A models. Phase 1 builds a video-derived temporal structure while suppressing the visually implied source; Phase 2 drops video conditioning to focus entirely on shaping audio timbre toward the target prompt. ConterFlow substantially improves counterfactual Video Foley generation compared to naive negative prompting and state-of-the-art baselines. To evaluate replacement quality, we propose a metric leveraging a text-audio co-embedding space to measure both target-prompt evidence and residual visually implied source leakage. Video demonstrations and code are available at this https URL
Covert quantum communication (CQC) seeks to hide not only message content but also the existence of communication. Existing CQC models usually assume deterministic or worst-case channel conditions, which are difficult to justify in realistic free-space optical and quantum links affected by turbulence, fluctuating background radiance, and stochastic detector noise. We propose a stochastic risk-aware optimization framework for CQC under uncertain physical-layer conditions. By modeling transmissivity and background noise as random variables, we express covertness and reliability guarantees through chance constraints with explicit outage budgets $\epsilon_{\text{cov}}$ and $\epsilon_{\text{rel}}$. This recasts CQC design as a risk-calibrated resource-allocation problem balancing throughput, covertness, reliability, and communication privacy. We derive quantile-based reformulations of the outage constraints, characterize feasible operating regions under stochastic uncertainty, and introduce a complementary risk-adjusted utility formulation to expose throughput-risk trade-offs. The analysis reveals that modest relaxations in acceptable covertness-outage risk can yield large throughput gains, while aggressive optimization may break covertness outside sparse-transmission regimes. Monte Carlo results under log-normal fading and stochastic thermal noise show that the framework expands feasible operating regions, improves covert throughput by more than an order of magnitude, and identifies degradation boundaries beyond which covert operation becomes unreliable. These results move CQC closer to realistic secure quantum networking for free-space, satellite, and low-probability-of-detection applications.
This article presents a scalable implementation of nonlinear Gramian-based control synthesis for control-affine systems, including a minimum energy control construction. These synthesis advances are achieved by addressing key computational bottlenecks inherent to iterative synthesis map formulations, yielding a computational scheme that exhibits rapid convergence and high precision. The efficacy of this synthesis framework is demonstrated across five canonical nonlinear control systems, including underactuated systems, and 100-dimensional recurrent neural network models. Empirical scaling results further indicate that convergence is primarily governed by intrinsic system properties, such as nonlinearity and controllability, rather than by state-space dimensionality. This work provides a practical, scalable computational pathway for translating rigorous nonlinear synthesis theory into high-dimensional control applications.
Humanoid robots are difficult to deploy safely because they have high-dimensional bodies, many collision constraints, and must operate near people and obstacles. Safety filters help by modifying a nominal control action when it may violate collision-avoidance constraints. Still, nominal benchmark scores do not fully show how these filters behave in harder environments. In this work, we study the robustness of SPARK humanoid safety filters through replication and stress testing. We replicate the SPARK benchmark case G1SportMode_D1_WG_SO_v1 in MuJoCo and evaluate RSSA, RSSS, SSA, CBF, PFM, and SMA under controlled random seeds. We also built a post-processing pipeline that converts raw SPARK logs into goal-tracking, minimum-distance, and collision-step metrics. Our results show that some methods track the goal more closely, while others reduce collision steps more effectively. The stress tests further indicate that safety behavior can change under obstacle crowding, noisy distance estimates, and delayed obstacle information. These findings suggest that humanoid autonomy should be evaluated beyond nominal performance, using metrics that expose failure modes before deployment.
Kolmogorov-Arnold Networks (KANs) have demonstrated an exceptional ability to learn complex functions on clean, low-dimensional data but struggle to maintain performance on noisy and imperfect real-world datasets. In contrast, conventional multi-layer perceptrons (MLPs) are far more tolerant to noise and more computationally efficient. Replacing all MLP components with KANs in human activity recognition (HAR) models often degrades accuracy and computational efficiency, highlighting an open challenge: how to combine KANs' precision with MLPs' noise robustness and efficiency. To address this, we systematically explore various placements of KAN modules within deep HAR networks and propose a hybrid architecture that combines the strengths of both paradigms: it uses a KAN-based input embedding layer, retains MLP layers for intermediate feature mixing, and introduces a specialized LarctanKAN module for final activity classification. Across eight public HAR datasets, the hybrid KAN-MLP model achieves an average relative improvement in macro F1 score of 5.33\% over a pure-MLP model, significantly outperforming standalone KAN and MLP baselines. Furthermore, integrating this hybrid strategy into other state-of-the-art HAR architectures consistently boosts their performance. Our findings demonstrate that a carefully orchestrated combination of KAN, MLP, or other conventional neural components yields more robust and accurate HAR models for real-world wearable sensing environments.
This paper proposes a channel estimation method for Multiple-Input Multiple-Output (MIMO) systems based on Canonical Polyadic (CP) decomposition applied to a mode-factorized tensor representation of the channel. The proposed approach reshapes the original low-order channel tensor into a higher-order tensor by factorizing its modes into multiple virtual modes, thereby introducing additional dimensions. By exploiting the sparse structure of MIMO channels and the plane-wave propagation model in the far-field regime, the proposed mode tensorization enhances the separability of individual propagation paths. It is shown that increasing the number of tensor modes improves component separation and provides inherent denoising effects. Building on these properties, a mode-tensorized CP decomposition (MTCPD) algorithm is developed. In addition, a metric for analyzing the virtual factors obtained from MTCPD is proposed, enabling estimation of the canonical rank and selection of the most informative components contributing to overall system performance. Numerical results demonstrate that the proposed method improves channel estimation accuracy compared to conventional tensor-based approaches, particularly under low signal-to-noise ratio conditions.
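The separability claim rests on a simple property of plane-wave responses, which can be checked numerically. The sketch below is illustrative only and is not the MTCPD algorithm: it assumes a uniform linear array with a hypothetical per-element phase step `phase`, and shows that reshaping the length-`n1*n2` steering vector into an `n1 x n2` array yields an exact rank-one outer product of two shorter Vandermonde vectors.

```python
import cmath

def steering(n, phase):
    """Far-field plane-wave response of an n-element uniform linear array."""
    return [cmath.exp(1j * phase * k) for k in range(n)]

def tensorize(vec, n1, n2):
    """Reshape a length n1*n2 vector into an n1 x n2 nested list (row-major)."""
    assert len(vec) == n1 * n2
    return [vec[i * n2:(i + 1) * n2] for i in range(n1)]

def outer(u, v):
    """Outer product of two vectors as a nested list."""
    return [[a * b for b in v] for a in u]

# Hypothetical sizes and phase step, chosen only for illustration.
n1, n2, phase = 4, 8, 0.7
a = steering(n1 * n2, phase)
A = tensorize(a, n1, n2)

# Entry (i, j) equals exp(1j*phase*(i*n2 + j)), which factors into
# exp(1j*phase*n2*i) * exp(1j*phase*j): a product of two virtual-mode factors.
u = [cmath.exp(1j * phase * n2 * i) for i in range(n1)]
v = steering(n2, phase)
B = outer(u, v)

max_err = max(abs(A[i][j] - B[i][j]) for i in range(n1) for j in range(n2))
print(max_err)
```

Because each path contributes one such rank-one term per factorized mode, splitting a physical mode into several virtual modes raises the tensor order without raising the rank, which is the structure a CP decomposition exploits.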
High-resolution 3D medical image generation remains challenging because fully volumetric models are computationally expensive, while efficient 2D slice generators often fail to preserve anatomical consistency across the third dimension. We propose LiFT, a framework for Lifted inter-slice Feature Trajectories that factorizes 3D volume synthesis into per-slice image generation and inter-slice trajectory learning. Rather than modeling the volumetric distribution end-to-end, LiFT treats a volume as an ordered trajectory in feature space, capturing how anatomical structures appear, transform, and disappear across depth. A tri-planar drifting loss aligns the trajectory of generated slices with the trajectories of real volumes, enabling distributional learning over inter-slice progressions in unconditional generation; in paired translation, a bidirectional $z$-context mixer trained against the registered target supplies through-plane coherence while preserving per-slice fidelity. We evaluate LiFT on BraTS 2023 (unconditional and missing-modality MR) and SynthRAD2023 (MR-to-CT). Across these settings, LiFT preserves per-slice quality, approaches the reported cWDM missing-MR reconstruction quality at $\sim 135\times$ lower inference cost (without formal equivalence testing), and improves through-plane coherence on MR-to-CT relative to a no-mapper ablation, demonstrating that lightweight inter-slice trajectory learning is a viable route to high-resolution 3D medical synthesis.
Optimal path parameterization (OPP) is a fundamental problem for planning trajectories along a prescribed geometric path under kinodynamic constraints and task-dependent objectives. While TOPP minimizes traversal time, its saturating states and controls may induce vibration and tracking errors, which can be mitigated by introducing smoothness objectives. However, a key capability gap remains in OPP: feasibility guarantees, general-objective optimality certificates, and computational efficiency are difficult to achieve simultaneously in a unified framework, especially for third-order OPP (OPP3) with non-convex constraints. This paper proposes reachability-augmented dual dynamic programming (RDDP), a state-grid-free and objective-aware DP framework for OPP. The key idea is to replace the relatively complete recourse assumption used in classical dual DP (DDP) with OPP-specific backward reachable sets, and then generate both value-function cuts and trial trajectories only inside these reachable sets. For convex and non-convex OPP, we prove global optimality and Karush-Kuhn-Tucker convergence of RDDP under OPP-specific conditions, respectively. Efficient instantiations are developed for OPP2 and OPP3. Experiments show that RDDP achieves objective values comparable to convex-optimization baselines while reducing computation time by 28.6 times for OPP2 and 5.8 times for OPP3. RDDP also achieves faster convergence than grid-based DP. Compared with reachability-analysis methods, RDDP retains the reachability mechanism while replacing local maximum-control propagation with value-function-guided control selection, thereby enabling objectives beyond traversal time. In summary, RDDP addresses a key capability gap in OPP by unifying certifiable general-objective optimization, reachability-based feasibility preservation, and online-compatible low-dimensional DP computation in a single OPP framework.
Green hydrogen plays an essential role in decarbonization, with capacity projected to scale to 560 GW by 2030 (vs. 1.39 GW in 2023) in net-zero settings. Proton exchange membrane (PEM) electrolysis is one of the most promising technology routes to green hydrogen production, and real-time system health monitoring of PEM electrolyzers is essential for their scalable deployment. In lab settings, performance degradation can be characterized through electrochemical testing protocols by periodic pauses of normal operation. Such interruption is not practical for full-scale stack deployments, limiting system operators' ability to make real-time assessments of state-of-health (SoH). We present a machine learning (ML) framework that performs virtual electrochemical characterization during normal operation. The method uses an encoder-decoder transformer, conditioned on operational data, to reconstruct characterization outputs, focusing here on polarization curves. Inspired by patch-based sequence tokenization, we segment the inputs into patches and encode them to form meaningful tokens, which substantially improves learning efficiency. Across four longitudinal runs, lasting up to 478 hours on different test cells and loading cycles, the model accurately reconstructed polarization curves and achieved 10x reduction in mean squared error (MSE) compared to a vanilla transformer. This proof-of-concept demonstrates that ML models can enable continuous performance monitoring for PEM electrolyzers and that the encoder captures meaningful latent representations of SoH, opening up opportunities to derive interpretable indicators in future work.
Over-the-air federated learning (OTA-FL) improves communication efficiency by exploiting the superposition property of wireless channels, but this same property also creates a critical security vulnerability: the parameter server (PS) cannot access individual local updates, making it difficult to identify and exclude poisoned gradients. The challenge is further exacerbated under non-independent and identically distributed (Non-IID) training data, where benign gradient drift can closely resemble malicious updates. In this paper, we propose a two-stage robust aggregation framework for defending against backdoor attacks in OTA-FL. Under our scheme, each client is first assigned a modality-aware multi-indicator trust score, where the specific indicators are selected according to the data modality (e.g., waveform, text, image) and model architecture to capture the most discriminative footprint of backdoor updates. Based on this score, the PS then performs trust-based multiple access (TBMA) to separate clients into trusted, suspicious, and malicious categories. Suspicious clients are further examined through PS-side layer-wise inspection and a longitudinal reputation mechanism. Experimental results on several datasets demonstrate that the proposed methodology effectively suppresses stealthy backdoor attacks, including bounded-scaling attacks, Euclidean-constrained attacks, Cosine-constrained attacks, and Neurotoxin, while maintaining competitive main-task accuracy.
The low-altitude economy (LAE) is reshaping the industrial landscape by deploying unmanned aerial vehicles (UAVs) to facilitate a wide range of applications demanding flexible aerial mobility. Integrating edge artificial intelligence (AI) into LAE platforms creates a compelling paradigm where UAVs provide real-time AI-driven analysis while simultaneously executing their primary aerial mission duties. However, realizing this paradigm remains challenging due to the strict mission constraints imposed by these primary duties and the throughput bottlenecks of wireless links. To bridge this gap, we propose a UAV-assisted cooperative edge inference framework where UAVs execute mission-critical LAE duties, quantified by trajectory deviations from reference paths, while concurrently supporting ground devices via intermediate feature offloading. Within this framework, UAV trajectories, inference task offloading decisions, and feature compression ratios are jointly optimized to maximize the system performance. We cast this joint optimization task into a constrained partially observable Markov decision process (POMDP) framework. To efficiently solve it, we propose HDRL-MoE, a novel hierarchical deep reinforcement learning framework that decouples the optimization of slow-varying inference decisions from rapidly changing UAV trajectory control. Furthermore, HDRL-MoE integrates a mixture-of-experts (MoE) architecture, where a router network orchestrates discrete offloading decisions while expert networks independently optimize the feature compression ratios. Extensive simulations show that HDRL-MoE achieves significant inference accuracy gains over baselines and exhibits high scalability and efficiency through its MoE design.
The rapid adoption of deep learning has increasingly led to data-driven models replacing classical model-based algorithms, even in domains governed by well-understood physical laws. While data-driven models, such as long short-term memory (LSTM) networks, have become a popular choice for time-series analysis, their performance relative to model-based approaches in structured environments is rarely evaluated objectively. This paper presents a performance evaluation framework comparing an LSTM classifier against a model-based expectation maximization (EM) classifier for binary time-series classification. The evaluation is conducted on two scalar linear Gaussian state space models differing only in their noise statistics, where the Kalman filter likelihood ratio test with true parameters serves as a reference for the best achievable classification performance. In Monte Carlo simulations, the classifiers are evaluated across three axes: task difficulty, controlled by the separation in process or measurement noise between the two models; sequence length; and training dataset size. The results show that the EM classifier, which exploits the known model structure, performs strongly when the data conform to the assumed model class. The LSTM classifier requires a larger separation in noise statistics to achieve reliable classification, and its performance saturates below the reference classifier when the models differ only in measurement noise, regardless of sequence length or training dataset size.
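The reference classifier mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes the scalar model x[t+1] = a*x[t] + w[t], y[t] = x[t] + v[t] with w ~ N(0, q) and v ~ N(0, r), a unit observation gain, and hypothetical parameter values. The Kalman filter produces the innovations whose Gaussian log-densities sum to the sequence log-likelihood, and the likelihood ratio test picks the model with the larger value.

```python
import math
import random

def kalman_loglik(ys, a, q, r, x0=0.0, p0=1.0):
    """Log-likelihood of ys under x[t+1] = a*x[t] + w, y[t] = x[t] + v."""
    x_pred, p_pred = x0, p0
    ll = 0.0
    for y in ys:
        s = p_pred + r                      # innovation variance
        innov = y - x_pred                  # innovation
        ll += -0.5 * (math.log(2 * math.pi * s) + innov * innov / s)
        k = p_pred / s                      # Kalman gain
        x_filt = x_pred + k * innov         # measurement update
        p_filt = (1 - k) * p_pred
        x_pred = a * x_filt                 # time update
        p_pred = a * a * p_filt + q
    return ll

def simulate(n, a, q, r, rng):
    """Draw n observations from the scalar model, with x[0] ~ N(0, 1)."""
    x = rng.gauss(0.0, 1.0)
    ys = []
    for _ in range(n):
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
        x = a * x + rng.gauss(0.0, math.sqrt(q))
    return ys

def lrt_classify(ys, model0, model1):
    """Return 1 if model1 is more likely than model0 under the true parameters."""
    llr = kalman_loglik(ys, *model1) - kalman_loglik(ys, *model0)
    return 1 if llr > 0 else 0

# Hypothetical (a, q, r) triples that differ only in measurement noise r.
MODEL0 = (0.9, 0.1, 0.1)
MODEL1 = (0.9, 0.1, 4.0)
rng = random.Random(0)
print(lrt_classify(simulate(500, *MODEL1, rng), MODEL0, MODEL1))
```

Shrinking the gap between the two `r` values, or shortening the sequence, makes the log-likelihood ratio less decisive, which corresponds to the task-difficulty and sequence-length axes described in the abstract.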
Batteryless IoT systems have largely followed two paths: ambient-energy sensing, where energy arrival is decoupled from the event being monitored, and kinetic event telegrams, where a user actuation powers a short report of the actuation itself. Mechanically gated states expose a third case: the access motion is not only an event to report, but the moment at which a latent physical state may have changed and must be measured. We show that routine hinge motion can supply enough energy for one bounded wake-sense-transmit transaction, including ultrasonic sensing and a long-range LoRa uplink. We call this principle motion-coupled sensing and instantiate it with an open-source compact electromagnetic harvester that retrofits to bins, doors, and cabinets with no structural modification. We size the platform for the most demanding workload, waste-bin monitoring, where each actuation must power both an ultrasonic measurement and a long-range LoRa uplink. Across five campus locations and 5,945 lid actuations, the bin deployment achieves 99.3% per-event transmission reliability. Field deployments on room doors with 1,870 actuations and office cabinets with 1,636 actuations achieve 92% and 94% transmission success respectively, demonstrating that the same energy envelope transfers across hinge geometries without hardware redesign. These results show that mechanical access can be treated as a self-powered sensing transaction, removing periodic polling and scheduled battery maintenance for IoT deployments.
Despite rapid advances in automatic speech recognition (ASR) and large audio-language models, robust recognition in real-world environments remains limited by an "acoustic robustness bottleneck": models often lose acoustic grounding and produce omissions or hallucinations under severe, compositional distortions. We propose Mega-ASR, a unified ASR-in-the-wild framework that combines scalable compound-data construction with progressive acoustic-to-semantic optimization. We introduce Voices-in-the-Wild-2M, covering 7 classic acoustic phenomena and 54 physically plausible compound scenarios, and train Mega-ASR with Acoustic-to-Semantic Progressive Supervised Fine-Tuning and Dual-Granularity WER-Gated Policy Optimization. Extensive experiments demonstrate that Mega-ASR achieves significant advantages over prior state-of-the-art systems on adverse-condition ASR benchmarks (45.69% vs. 54.01% on VOiCES R4-B-F, and 21.49% vs. 29.34% on NOIZEUS Sta-0). On complex compositional acoustic scenarios, Mega-ASR further delivers over 30% relative WER reduction against strong open- and closed-source baselines, establishing a scalable paradigm for robust ASR in-the-wild.
To support operations and passenger-facing services, transit agencies need reliable passenger load trajectories. Currently, load estimates are typically inferred from imperfect sensing systems rather than fully observed, and the accuracy of modern automatic passenger counting (APC) systems still varies with station layout, flow intensity, and operating conditions. To address the challenges of robust passenger load estimation from heterogeneous data streams, including incremental count errors, evidence conflicts, and context-dependent sensor reliability, we propose a closed-loop, state-centric, multi-agent framework. This method enforces physical feasibility at every step, allocates trust dynamically among evidence sources, and feeds physics-derived violation residuals back into training for robustness improvement. The architecture consists of a unified stop-event backbone, a coupled Perception--Physical--Fusion loop for stop-by-stop inference, and optional trip-level macro-correction and closed-loop calibration modules.
Current Physical AI (PAI) relies heavily on closed-loop visual-servoing pipelines, whose perception and planning stages may become computationally intensive onboard due to complex models embedded on robots. In practice, offloading the perception task to on-site edges statically is inappropriate for latency-sensitive, precise industrial settings over a standardized industrial network. This emphasizes the importance of Control-Communication-Computing (3C) co-design in industrial automation: monolithic local execution saturates AI-accelerated machine and robot hardware, while static edge offloading exposes the control loop to network jitter. Existing adaptive task placement (ATP) controllers can partially address the gap by relocating a single pipeline stage on binary threshold rules, without a multi-stage model and an explicit cost on placement switching. In this Work-in-Progress (WiP) paper, we propose a directed acyclic graph (DAG) based quality-of-service (QoS)-aware dynamic task placement (DTP) framework for sensing-perception-planning-control pipelines in networked robotics. This pipeline is formalized as a DAG with task-level and node-level attributes for compute cost, communication delay, and feasible placement sets; over a small interpretable candidate set (fully local, static offload, hybrid), a window-based cost function combines tail end-to-end latency, deadline violation rate, hardware utilization, and a Hamming-distance switching penalty, and a DTP algorithm with hysteresis and a minimum dwell-time bounds placement chatter. Our WiP paper presents the theoretical framework, a structured qualitative analysis, and a two-phase simulation plus hardware-in-the-loop validation roadmap.
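A rough sketch of the placement rule described above follows. It is an assumption-laden illustration rather than the proposed DTP algorithm: the placement encoding (one `"L"` or `"E"` per pipeline stage), the metric names, the weights, and the parameters `switch_weight`, `hysteresis`, and `min_dwell` are all hypothetical. It combines a weighted window cost with a Hamming-distance switching penalty, and only switches when the dwell time has elapsed and the improvement exceeds the hysteresis margin.

```python
def hamming(p, q):
    """Number of pipeline stages whose placement differs between p and q."""
    return sum(a != b for a, b in zip(p, q))

def window_cost(metrics, weights):
    """Weighted sum of window statistics for one candidate placement."""
    return sum(weights[name] * metrics[name] for name in weights)

def choose_placement(current, dwell, candidates, metrics, weights,
                     switch_weight=1.0, hysteresis=0.1, min_dwell=3):
    """Return (placement, dwell) for the next window."""
    if dwell < min_dwell:
        # Minimum dwell time not reached: hold the current placement.
        return current, dwell + 1

    def total(p):
        return window_cost(metrics[p], weights) + switch_weight * hamming(current, p)

    best = min(candidates, key=total)
    # Switch only if the gain exceeds the hysteresis margin.
    if best != current and total(best) < total(current) - hysteresis:
        return best, 0
    return current, dwell + 1

# Stages: sensing, perception, planning, control ("L" = local, "E" = edge).
LOCAL = ("L", "L", "L", "L")
HYBRID = ("L", "E", "L", "L")
OFFLOAD = ("L", "E", "E", "L")
WEIGHTS = {"tail_latency": 1.0, "miss_rate": 10.0, "utilization": 0.5}
METRICS = {
    LOCAL: {"tail_latency": 8.0, "miss_rate": 0.30, "utilization": 0.9},
    HYBRID: {"tail_latency": 5.0, "miss_rate": 0.05, "utilization": 0.6},
    OFFLOAD: {"tail_latency": 6.0, "miss_rate": 0.10, "utilization": 0.4},
}
print(choose_placement(LOCAL, 3, [LOCAL, HYBRID, OFFLOAD], METRICS, WEIGHTS))
```

The dwell counter and the hysteresis margin act on different time scales: the first bounds how often a switch can occur, the second prevents switching on marginal cost differences.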
This paper presents a method to approximate regions of attraction of unknown nonlinear dynamical systems from data. Assuming point-wise evaluations of the vector field and known Lipschitz bounds, a polyhedral uncertainty set of admissible dynamics is constructed. This uncertainty description enables the synthesis of a continuous piecewise affine (PWA) Lyapunov candidate via a linear program, enforcing a robust decrease condition for all admissible vector fields. The approach allows certification of a region of attraction consistent with the available data. Numerical examples illustrate the effectiveness of the proposed method in extracting certified regions of attraction from sparse data.
Blind source separation (BSS) is a natural framework for studying how latent causes may be recovered from sensory mixtures, but deriving online and biologically plausible algorithms for structured (i.e., constrained to known domains) and potentially correlated sources remains challenging. Recent work has derived neural networks for BSS from maximization of an entropy measure, yet its online implementations involve complex and nonlocal recurrent dynamics. Motivated by this perspective, we propose Predictive Entropy Maximization, which achieves competitive performance in BSS, using only local weight updates. The method employs a close approximation of an entropy measure, yielding an objective function with easily interpretable components. Minimizing this objective leads to a predictive neural architecture in which feedforward synapses follow an error-driven rule (that can be realized through dendritic mechanisms), lateral inhibitory connections are learned with local Hebbian plasticity, and source-domain constraints are enforced through simple output nonlinearities. We derive explicit spectral bounds on the surrogate error, characterizing when the approximation is accurate. Empirically, Predictive Entropy Maximization remains robust under increasing source correlation and observation noise, outperforms biologically plausible algorithms that rely on stronger independence or decorrelation assumptions, and remains competitive with exact determinant- and correlative-information-based baselines. These results show how local plasticity and adaptive lateral inhibition can emerge from maximizing a regularized second-order entropy over structured source domains. Our implementation code is available at this https URL.
Diffusion and flow-based generative models dominate visual synthesis, with guidance aligning samples to user input and improving perceptual quality. However, Classifier-Free Guidance (CFG) and extrapolation-based methods are heuristic linear combinations of velocities/scores that ignore the generative manifold geometry, breaking probability conservation and driving samples off the learned manifold under strong guidance. We analyse guidance through the continuity equation and show its effect decomposes into a divergence term and a score-parallel term defined invariantly across parameterisations. We prove the divergence term blows up structurally as sampling approaches the data manifold, motivating a time-dependent schedule alongside score-parallel attenuation. The resulting plug-and-play rule, Adaptive Manifold Guidance (AdaMaG), bounds both terms at no additional inference cost. Finally, we show that most empirical heuristics for reducing saturation or improving generation quality correspond directly to the two terms in our decomposition. Across image generation benchmarks, AdaMaG improves realism, reduces hallucinations, and induces controlled desaturation in high-guidance regimes.
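For reference, the "heuristic linear combination" that the abstract criticizes can be written down directly. The sketch below shows only the standard Classifier-Free Guidance rule, not AdaMaG; plain Python lists stand in for model outputs, and the scale convention (where `scale = 1` recovers the conditional prediction) is one common choice among several.

```python
def cfg_velocity(v_uncond, v_cond, scale):
    """Standard CFG: extrapolate from the unconditional toward the conditional prediction."""
    return [u + scale * (c - u) for u, c in zip(v_uncond, v_cond)]

# Toy two-dimensional predictions, chosen only for illustration.
v_u = [0.0, 1.0]
v_c = [1.0, 3.0]
print(cfg_velocity(v_u, v_c, 3.0))
```

With `scale > 1` the result lies outside the segment between the two predictions, which is the extrapolation regime where the abstract argues samples are driven off the learned manifold.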
Distributed acoustic sensing (DAS) systems generate continuous, ultra-high-channel-count data streams at rates that exceed the capabilities of conventional batch-oriented analysis frameworks. As a result, essential tasks such as interactive exploration of long-duration recordings, scalable event annotation, and real-time algorithm-in-the-loop monitoring remain inadequately supported by workflows built around manually selected data segments and offline processing. This paper presents FiLark (Fiber Lark), a Python framework that applies a streaming-first principle uniformly across data access, signal processing, visualization, and monitoring for DAS. Instead of operating on manually selected data segments, FiLark presents any DAS source, including continuous multi-file recordings, as a unified stream and builds all system components around that abstraction. An OpenGL-based ring-buffer renderer enables interactive browsing and visualization of arbitrarily long recordings with constant memory usage. An integrated annotation interface supports event labeling directly within continuous data streams, facilitating the creation of reproducible machine-learning-ready labeled datasets without offline preprocessing. The signal processing library includes temporal, spatial, spectral, and decomposition-based operators, with both CPU implementations and GPU-accelerated variants via PyTorch, alongside stateful chunked execution that preserves processing continuity and application semantics across segment boundaries. A standardized monitor interface further integrates streaming detectors and learning-based models into the visualization workflow. By sharing a common streaming abstraction across all layers, FiLark allows processing configurations and workflows developed interactively to transfer directly to scalable production pipelines without modification.
This article presents a Hamilton--Jacobi (HJ) reachability framework for collision avoidance between two satellites operating in the same circular orbit, where relative motion is modeled in the radial--tangential--normal (RTN) frame using planar Hill--Clohessy--Wiltshire (HCW) dynamics. We define the target set as the unsafe relative configurations in the orbit plane corresponding to minimum separation requirements consistent with Federal Communications Commission (FCC) orbital standards. The interaction between spacecraft is formulated as a zero-sum differential game, where Player 1 is the controlled satellite and Player 2 is modeled as a bounded adversarial disturbance with unknown intent. We present the HJ formulation and compute backward reachable sets that characterize relative states from which collision cannot be avoided under worst-case disturbances, while states outside this set admit provably collision-free trajectories. These reachable sets are integrated with supervisory hybrid control logic to determine when evasive maneuvers must be initiated, enabling mathematically grounded safety guarantees for scalability.
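The planar HCW model referred to above has a standard closed form, sketched here for orientation. The axis convention (x radial, y along-track), the additive way control and disturbance accelerations enter, and the example orbit radius are assumptions made for illustration and may differ from the article's formulation.

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2

def mean_motion(orbit_radius_m):
    """Mean motion n = sqrt(mu / r^3) of a circular orbit, in rad/s."""
    return math.sqrt(MU_EARTH / orbit_radius_m ** 3)

def hcw_planar(state, control, disturbance, n):
    """Time derivative of (x, y, vx, vy) under planar HCW dynamics.

    x is radial and y is along-track; control and disturbance are
    accelerations of the two players, assumed here to enter additively.
    """
    x, y, vx, vy = state
    ax = 3 * n * n * x + 2 * n * vy + control[0] + disturbance[0]
    ay = -2 * n * vx + control[1] + disturbance[1]
    return (vx, vy, ax, ay)

# Hypothetical orbit radius of 6778 km (roughly 400 km altitude).
n = mean_motion(6778e3)
# Pure along-track offset with zero relative velocity: a relative equilibrium.
print(hcw_planar((0.0, 100.0, 0.0, 0.0), (0.0, 0.0), (0.0, 0.0), n))
```

A pure along-track separation in the same circular orbit is an equilibrium of these dynamics, so in this model any approach toward the unsafe set has to come from control or disturbance inputs or from a nonzero initial relative velocity or radial offset.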
Efficient use of spatio-temporal resources, including sensor arrays and transmit waveforms, is a key challenge in modern MIMO active sensing systems. This paper studies the impact of array redundancy and waveform rank (WR) on active sensing performance. Specifically, we show that parameter identifiability at reduced WR critically depends on subspace properties of the so-called array redundancy pattern. We show that array geometries with identical sum co-arrays can exhibit markedly different identifiability properties at low WR. We derive a novel necessary condition for maximizing identifiability at reduced WR, which reveals that the unfavorable redundancy patterns of certain redundant arrays fundamentally limit their performance. The results yield new insights into resource-efficient sensing systems, motivating redundancy-aware array and waveform design.
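As a small illustration of the terminology, the sum co-array of a collocated transmit/receive array is the set of pairwise position sums, and the number of sensor pairs producing each sum gives a simple view of redundancy. The two geometries below are examples chosen for this sketch, not arrays from the paper, and the pair-count multiset is only a simple proxy for the paper's redundancy pattern, whose exact definition is not given in the abstract.

```python
from collections import Counter

def sum_coarray(positions):
    """Count ordered (transmit, receive) pairs producing each sum p + q."""
    return Counter(p + q for p in positions for q in positions)

# Sensor positions in units of the base spacing (illustrative choices).
ula = [0, 1, 2, 3, 4]
sparse = [0, 1, 3, 4]

co_ula = sum_coarray(ula)
co_sparse = sum_coarray(sparse)

print(sorted(co_ula) == sorted(co_sparse))  # same virtual element positions
print(co_ula[4], co_sparse[4])              # different multiplicities at sum 4
```

Both geometries cover the same virtual positions 0 through 8, yet the multiplicities differ, which shows how arrays can share a sum co-array while having distinct redundancy structure.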
Conventional optimization frameworks for power-system operation and planning primarily focus on steady-state conditions, which become increasingly inadequate as rising penetrations of inverter-based resources (IBRs) strengthen the coupling between stability and steady-state operating conditions. Meanwhile, the software-defined nature of IBRs provides additional flexibility to co-optimize operating points and dynamic behavior. This paper proposes a unified stability-constrained optimization framework that incorporates synchronization, voltage, and frequency stability within a single scheduling model. Established stability criteria are selected and translated into explicit operational limits, after which a general formulation is developed to embed all three criteria in a common structure. The resulting second-order cone (SOC) constraints are convex and can be integrated seamlessly into existing optimization models. The proposed framework enables the simultaneous pursuit of economic efficiency and multi-dimensional stability enhancement, providing a tractable pathway for secure operation in future IBR-dominated power systems.
Robust signal detection in colored noise with unknown covariance is essential in radar, cognitive radio, integrated sensing and communication (ISAC), and quantum sensing applications. This paper develops a unified analytical framework for the Standard Condition Number (SCN) detector, which employs the ratio of the largest to smallest eigenvalues of the whitened sample covariance matrix. The framework jointly covers both ideal conditions in which the training and sensing noise statistics are identical and disturbed conditions in which interference or jamming alters the sensing covariance. Despite the SCN's practical relevance, its finite-sample false-alarm and detection behavior has not been analytically characterized. Using random matrix theory (RMT), we derive general expressions for these probabilities, provide closed-form results for special cases, and show that the SCN preserves the Constant False Alarm Rate (CFAR) property under covariance mismatch. Analytical and simulation results confirm that the proposed unified framework delivers consistent detection performance and greater robustness than conventional eigenvalue- and LRT-based detectors.
Advanced navigation techniques in image-guided interventions and surgical robotics require the rapid and precise alignment of 3D preoperative volumes (e.g., CT, MRI) to 2D intraoperative images (e.g., X-ray fluoroscopy). However, existing 2D/3D registration methods fail to generalize across the broad spectrum of fluoroscopy-guided procedures: traditional intensity-based optimizers require careful hyperparameter tuning for each subject, while deep learning approaches demand extensive manually labeled datasets and remain constrained to the specific anatomy on which they were trained. To address these limitations, we present xvr, a self-supervised framework that combines patient-specific neural networks with gradient-based optimization for automatic 2D/3D registration. xvr leverages physics-based simulation to generate training data from a patient's own preoperative scan, eliminating the need for manual annotation. We present a foundation model pretrained on thousands of whole-body scans, achieving patient-specific adaptation for any anatomical region in only 5 minutes of finetuning. In the largest evaluation of 2D/3D registration on real fluoroscopy to date, xvr achieves high accuracy in seconds across diverse anatomical structures, imaging modalities, and hospitals, improving upon the accuracy of existing methods by an order of magnitude. xvr makes pan-anatomical 2D/3D rigid registration accessible to broad clinical and research communities through open-source software at this https URL.
Deep neural networks (DNNs) have emerged as a powerful tool, with a growing body of literature exploring Lyapunov-based approaches for real-time system identification and control. These methods depend on establishing bounds for the second partial derivatives of DNNs with respect to their parameters, a requirement often assumed but rarely addressed explicitly. This paper provides rigorous mathematical formulations of polynomial bounds on both the first and second partial derivatives of DNNs with respect to their parameters. We present lemmas that characterize these bounds for fully-connected DNNs, while accommodating various classes of activation functions, including sigmoidal and ReLU-like functions. Our analysis yields closed-form expressions that enable precise stability guarantees for Lyapunov-based deep neural networks (Lb-DNNs). Furthermore, we extend our results to bound the higher-order terms in first-order Taylor approximations of DNNs, providing important tools for convergence analysis in gradient-based learning algorithms. The resulting theoretical framework provides explicit, computable expressions for previously assumed bounds, thereby strengthening the mathematical foundation of neural network applications in safety-critical control systems.
Epileptic seizures are transient neurological events characterized by abnormal and excessive neuron activity in the brain, which are often associated with measurable disturbances in the cardiovascular system. Traditionally, electroencephalogram (EEG) signals have served as the primary modality for seizure prediction due to their direct measurement of brain activity and high diagnostic precision. However, their cost, sensitivity to noise, and practical deployment constraints limit their applicability outside controlled clinical environments. To overcome these challenges, recent studies have increasingly investigated electrocardiogram (ECG) signals as a practical and non-invasive alternative for seizure prediction in real-world settings. Evidence suggests that ECG-derived cardiac signatures may precede clinical seizure onset, offering a viable window for early detection. In this paper, we propose a reconstruction-based anomaly detection framework that integrates time-frequency representations with advanced deep learning models to capture deviations in heart rate dynamics associated with seizure onset. Afterward, reconstruction error is smoothed, and an adaptive thresholding strategy is applied to reduce false alarms. The method was evaluated on the Siena database, achieving a specificity of 99.16%, accuracy of 76.05%, and a false positive rate (FPR) of 0.01/h, with an average prediction horizon of 45 minutes prior to seizure onset. These results demonstrate that ECG-based prediction can provide clinically actionable early warnings while improving patient accessibility and comfort. Nevertheless, this performance reflects a trade-off favoring high specificity over sensitivity, resulting in reduced FPR and aligning with clinical requirements for reliable deployment.
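The post-processing step described above (smooth the reconstruction error, then apply an adaptive threshold) could look roughly like the following. This is a generic sketch under stated assumptions, not the paper's method: the trailing moving average, the mean-plus-`k`-standard-deviations threshold over a preceding baseline window, and all parameter values are hypothetical choices.

```python
import statistics

def moving_average(xs, width):
    """Trailing moving average; early samples use the shorter available window."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def adaptive_alarms(errors, smooth_width=5, baseline=20, k=3.0):
    """Indices where the smoothed error exceeds mean + k*std of the preceding baseline window."""
    smoothed = moving_average(errors, smooth_width)
    alarms = []
    for i in range(baseline, len(smoothed)):
        ref = smoothed[i - baseline:i]
        threshold = statistics.mean(ref) + k * statistics.pstdev(ref)
        if smoothed[i] > threshold:
            alarms.append(i)
    return alarms

# Synthetic reconstruction errors: flat baseline followed by a sustained rise.
errors = [1.0] * 40 + [5.0] * 10
print(adaptive_alarms(errors)[0])
```

Because the threshold tracks recent error statistics rather than a fixed level, a larger `k` or a longer baseline trades sensitivity for fewer false alarms, which matches the specificity-over-sensitivity trade-off reported in the abstract.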
Reliable automatic seizure detection from long-term electroencephalography (EEG) remains an unsolved challenge, as current models often fail to generalize across patients or clinical settings. Manual EEG review is still the standard of care, highlighting the need for robust models and standardized evaluation. The current literature often reports high efficacy, yet these models frequently fail when deployed to unseen patient populations. To rigorously assess this generalization gap, we conducted a large-scale empirical study evaluating 28 state-of-the-art algorithmic architectures, ranging from classical feature engineering to modern deep learning. These algorithms were collected by organizing a competition. A strictly held-out private dataset of continuous EEG recordings from 65 subjects, totaling 4,360 hours of data, was used to evaluate algorithm performance. Expert neurophysiologists annotated these recordings, establishing the ground truth for seizure events. Algorithms were evaluated using event-based metrics from the SzCORE framework, including sensitivity, precision, F1-score, and false positive rate per day. Results revealed significant performance variability among state-of-the-art approaches, with a top F1-score of 32% (sensitivity 37%, precision 29%), highlighting the persistent difficulty of this task. Analysis uncovered a discordance between peak performance and population-level stability: the algorithms achieving the highest aggregate F1-scores did not achieve the most consistent ranking across subjects. This independent evaluation exposed a notable gap between self-reported efficacies and hold-out performance, underscoring the critical need for standardized, rigorous benchmarking. The evaluation infrastructure transitions into a continuously open benchmarking platform, fostering reproducible research and accelerating robust seizure detection algorithm development.
Directed acyclic graphs (DAGs) are central to science and engineering applications including causal inference, scheduling, and neural architecture search. In this work, we introduce the DAG Convolutional Network (DCN), a novel graph neural network (GNN) architecture designed specifically for convolutional learning from signals supported on DAGs. The DCN leverages causal graph filters to learn nodal representations that account for the partial ordering inherent to DAGs, a strong inductive bias not present in conventional GNNs. Unlike prior art in machine learning over DAGs, DCN builds on formal convolutional operations that admit spectral-domain representations. We further propose the Parallel DCN (PDCN), a model that feeds input DAG signals to a parallel bank of causal graph-shift operators and processes these DAG-aware features using a shared multilayer perceptron. This way, PDCN decouples model complexity from graph size while maintaining satisfactory predictive performance. The architectures' permutation equivariance and expressive power properties are also established. Comprehensive numerical tests across several tasks, datasets, and experimental conditions demonstrate that (P)DCN compares favorably with state-of-the-art baselines in terms of accuracy, robustness, and computational efficiency. These results position (P)DCN as a viable framework for deep learning from DAG-structured data that is designed from first (graph) signal processing principles.
Automatic Speech Recognition (ASR) is an integral component of modern technology, powering applications such as voice-activated assistants, transcription services, and accessibility tools. Yet ASR systems continue to struggle with the inherent variability of human speech, such as accents, dialects, and speaking styles, as well as environmental interference, including background noise. Moreover, domain-specific conversations often employ specialized terminology, which can exacerbate transcription errors. These shortcomings not only degrade raw ASR accuracy but also propagate mistakes through subsequent natural language processing pipelines. Because redesigning an ASR model is costly and time-consuming, non-intrusive refinement techniques that leave the model's architecture intact have become increasingly popular. In this survey, we review current non-intrusive refinement approaches and group them into five classes: fusion, re-scoring, correction, distillation, and training adjustment. For each class, we outline the main methods, advantages, drawbacks, and ideal application scenarios. Beyond method classification, this work surveys adaptation techniques aimed at refining ASR in domain-specific contexts, reviews commonly used evaluation datasets along with their construction processes, and proposes a standardized set of metrics to facilitate fair comparisons. Finally, we identify open research gaps and suggest promising directions for future work. By providing this structured overview, we aim to equip researchers and practitioners with a clear foundation for developing more robust, accurate ASR refinement pipelines.
A pinching antenna system (PASS)-assisted cell-free communication system is proposed. A sum rate maximization problem under the BS power budget constraint and PA deployment constraint is formulated. To tackle the resulting non-convex optimization problem, an alternating optimization (AO) algorithm is developed. In particular, the digital beamforming sub-problem is solved using the weighted minimum mean square error (WMMSE) method, whereas the pinching beamforming sub-problem is handled via a penalty-based approach combined with element-wise optimization. Simulation results demonstrate that: 1) PASS-assisted cell-free systems achieve superior performance over benchmark schemes; 2) increasing the number of PAs per waveguide can improve the advantage of PASS-assisted cell-free systems; and 3) the cell-free architecture mitigates the average user rate degradation as the number of users increases.
Battery-less Internet of Things (IoT) devices rely on ambient energy harvesting and therefore require scheduling policies that jointly account for energy intermittency and hard timing constraints. This challenge is especially acute in periodic monitoring applications, where a sensing--computing--transmitting task chain must be completed within each reporting cycle. In this paper, we formulate this problem within a setting characterized by independently and identically distributed (i.i.d.) energy arrivals as a long-term average-reward Markov decision process (MDP) that explicitly captures capacitor-voltage evolution, task ordering, permissible start windows, and safe-execution requirements. We further propose rewards that promote reliable task completion while penalizing risky low-energy execution. We prove that the considered MDP is unichain and that the optimal stationary policy has a threshold structure, which leads to an optimal stationary threshold-based (OSTB) scheduler. To account for more realistic energy sources, we additionally study a correlated harvesting model based on a finite-state Markov process and show that the proposed framework can be applied to this richer setting under conservative sufficient conditions. Finally, numerical results show that OSTB outperforms representative baselines in terms of long-term full-chain completion rate, power failures, and latency, particularly when harvested energy is scarce.
Model Predictive Control (MPC) is a powerful framework for optimal control but can be too slow for low-latency applications. We present a data-driven framework to accelerate MPC by replacing online optimization with a nonparametric policy constructed from offline MPC solutions. Our policy is greedy with respect to a constructed upper bound on the optimal cost-to-go, and can be implemented as a nonparametric lookup rule that is orders of magnitude faster than solving MPC online. Our analysis shows that, under sufficient coverage conditions on the offline data, the policy is recursively feasible and admits a provable, bounded optimality gap. These conditions establish an explicit trade-off between the amount of data collected and the tightness of the bounds. New solutions can be incorporated straightforwardly without the need for retraining, enabling continual improvement. Our experiments show that this policy is between 100 and 1000 times faster than standard MPC with only a modest loss of optimality, showing potential for real-time control tasks.
Effective medical simulators necessitate realistic haptic rendering of biological tissues that exhibit viscoelastic material properties, such as creep and stress relaxation. Fractional-order models provide an effective means of describing intrinsically time-dependent viscoelastic dynamics with few parameters, as they naturally capture memory effects. However, due to the unintuitive, frequency-dependent coupling among the order of the fractional element and other parameters, determining appropriate parameter values for fractional-order models that yield high perceived realism remains a significant challenge. In this study, we propose a systematic means of determining the parameters of fractional-order viscoelastic models that optimizes the perceived realism of haptic rendering across general populations. First, we demonstrate that the parameters of fractional-order models can be effectively optimized through active learning, using qualitative feedback-based human-in-the-loop (HiL) optimization, to ensure consistently high realism ratings for each individual. Second, we propose a rigorous method to combine HiL optimization results into an aggregate perceptual map trained on the entire dataset, and demonstrate how to select population-level optimal parameters from this representation that are broadly perceived as realistic across general populations. Finally, we provide evidence of the effectiveness of the generalized fractional-order viscoelastic model parameters for three viscoelastic materials by characterizing their perceived realism through human-subject experiments. Overall, generalized fractional-order viscoelastic models established through the proposed HiL optimization and aggregation approach possess the potential to significantly improve the sim-to-real transition performance of medical training simulators.
This paper proposes an encrypted state observer that is capable of detecting sensor attacks without decryption. We first design a state observer that operates over a finite field of integers with modular arithmetic. The observer generates a residue signal that indicates the presence of attacks under sparse attack and sensing redundancy conditions. Then, we develop a homomorphic encryption scheme that enables the observer to operate over encrypted data while automatically disclosing the residue signal. Unlike our previous work, which was restricted to single-input single-output systems, the proposed scheme is applicable to general multi-input multi-output systems. Provided that the disclosed residue signal remains below a prescribed threshold, the full state can be recovered as an encrypted message.
This paper investigates structural herdability in a special class of temporally switching networks with fixed topology. We show that when the underlying digraph remains unchanged across all snapshots, the network attains complete $\mathcal{SS}$ herdability even in the presence of signed or layer dilations, a condition not applicable to static networks. This reveals a fundamental structural advantage of temporal dynamics and highlights a novel mechanism through which switching can overcome classical obstructions to herdability. To validate these conclusions, we utilize a more relaxed form of sign matching within each snapshot of the temporal network. Furthermore, we show that when all snapshots share the same underlying topology, the temporally switching network achieves $\mathcal{SS}$ herdability within just two snapshots, which is fewer than the number required for structural controllability. Several examples are included to demonstrate these results.
Due to the directive property of each antenna element, the received signal power can be severely attenuated when the emitter deviates from the array boresight, which leads to a severe degradation in sensing performance along the corresponding direction. Although existing rotatable array sensing methods such as recursive rotation (RR-Root-MUSIC) can mitigate this issue by iteratively rotating and sensing, they require several mechanical rotations and repeated eigendecomposition operations, resulting in high computational complexity and low time efficiency. To address this problem, a pre-rotation initialization that uses receive power as its selection rule is proposed to significantly reduce the computational complexity and improve the time efficiency. Using this idea, a low-complexity enhanced direction-sensing framework with pre-rotation initialization and iterative greedy spatial-spectrum search (PRI-IGSS) is developed with three stages: (1) the normal vector of the array is rotated to a set of candidates to find the optimal direction with the maximum sensing energy, with the corresponding DOA value computed by the Root-MUSIC algorithm; (2) the array is mechanically rotated to the initial estimated direction and kept fixed; (3) an iterative greedy spatial-spectrum search or receive beamforming method, motivated by reinforcement learning, is designed with a reduced search range, and the sum of all previous sampling variance matrices and the current one is used to provide an increasing performance gain as the iteration process continues. To assess the performance of the proposed method, the corresponding CRLB is derived with a simplified rotation model. Simulation results demonstrate that the proposed PRI-IGSS method performs much better than RR-Root-MUSIC and achieves the CRLB in terms of mean squared error, since RR-Root-MUSIC performs no sample accumulation.
In this paper, we derive a novel procedure for set-membership estimation of dynamical systems affected by stochastic noise with unbounded support. Employing a bound on the sample covariance matrix, we are able to provide a finite-sample uncertainty set containing the true system parameters with high probability. Our approach can be natively applied to a wide class of nonlinear systems affected by sub-Gaussian noise. Our analysis provides conditions under which the proposed uncertainty set converges to the true system parameters and establishes an upper bound on the convergence rate. The proposed uncertainty set can be used directly for robust controller synthesis with probabilistic stability and performance guarantees. Concluding numerical examples demonstrate the advantages of the proposed formulation over established approaches.
Speech foundation models have shown strong transferability across a wide range of speech applications. However, their robustness to age-related domain shift in speaker diarization remains underexplored. In this work, we present a cross-lifespan evaluation within a unified end-to-end neural diarization framework (EEND-VC), covering speech samples from conversations involving children, adults, and older adults. We compare models under zero-shot cross-age inference, joint multi-age training, and domain-specific adaptation. Results show substantial performance degradation when models trained on adult-specific speech are applied to child and older-adult conversational data. Moreover, joint multi-age training across different age groups improves robustness without reducing diarization performance in canonical adult conversations, while targeted age group adaptation yields further gains in diarization performance, particularly when using the Whisper encoder.
This study presents two analytical closed-form PI controller tuning solutions for second-order plants with real poles, each achieving monotonic step response and minimum settling time. The first solution employs pole-zero cancellation, placing the controller zero at the slower plant pole and reducing the closed-loop dynamics to a critically damped second-order system. The second solution, applicable when the plant pole ratio is less than two, places all three closed-loop poles at a common location without cancelling any plant pole, yielding a closed-loop transfer function with a triple real pole and a zero. Despite retaining a closed-loop zero, this solution achieves strictly faster settling time than the pole-zero cancellation method in its region of applicability. The two solutions coincide at the boundary pole ratio of two and together form a continuous piecewise-analytical tuning covering the full range of plant pole ratios. This study further establishes that closed-loop transfer functions of the form $a^n/(s + a)^n$ possess a maximum sensitivity $M_s$ together with phase margin and gain margin that are independent of the pole location $a$ and depend solely on the order $n$, yielding universal robustness constants for each $n$. A closed-form expression $\mathrm{GM}(n) = 1 + \sec^n(\pi/n)$ is established for the gain margin of the family. Numerical verification confirms the analytical results across multiple plant configurations.
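The gain-margin expression quoted in the abstract above can be checked numerically with a short sketch. It assumes a unity negative-feedback loop, so that the open-loop transfer function is L = T/(1 - T) with T = a^n/(s + a)^n, and it assumes n >= 3 (for n = 2 the phase never reaches -180 degrees). The function names are hypothetical and the code is an illustration, not the authors' verification procedure. Working in the normalized frequency x = w/a makes the independence from the pole location a visible.

```python
import math


def first_phase_crossover(n, step=1e-3):
    """Smallest normalized frequency x = w/a > 0 at which L(jx) is real and negative.

    With T = 1/(1 + jx)^n and L = T/(1 - T) = 1/((1 + jx)^n - 1), this is the
    first x where Im((1 + jx)^n) changes sign from positive to negative.
    """
    if n < 3:
        raise ValueError("a finite phase crossover requires n >= 3")

    def imag_part(x):
        return ((1 + 1j * x) ** n).imag

    # Coarse scan to bracket the first sign change.
    lo, hi = step, 2 * step
    while imag_part(hi) > 0:
        lo, hi = hi, hi + step
    # Bisection to refine the bracket.
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if imag_part(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gain_margin_numeric(n):
    """Gain margin 1/|L(jx)| evaluated at the numerically located phase crossover."""
    x = first_phase_crossover(n)
    loop = 1.0 / ((1 + 1j * x) ** n - 1)
    return 1.0 / abs(loop)


def gain_margin_closed_form(n):
    """GM(n) = 1 + sec^n(pi/n), as quoted in the abstract."""
    return 1.0 + (1.0 / math.cos(math.pi / n)) ** n


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, gain_margin_numeric(n), gain_margin_closed_form(n))
```

Under these assumptions the two columns should agree to numerical precision; for example, n = 3 gives sec(pi/3) = 2 and hence a gain margin of 1 + 2^3 = 9.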
Accurate and continuous estimation of cognitive workload is fundamental to creating adaptive human-machine systems. However, designing architectures that balance representational capacity with computational efficiency has been challenging for practical deployment. This paper introduces 1BT, a One-Block Transformer for compact and efficient EEG-based cognitive workload assessment. The model aggregates multi-channel temporal sequences via a minimal latent bottleneck, using a single cross-attention module followed by lightweight self-attention. A controlled study involving 11 participants performing three cognitively diverse tasks (abstract reasoning, numerical problem-solving, and an interactive video game) was conducted with continuous EEG recordings across two workload levels. Systematic architectural analysis identifies the most compact configuration that preserves high performance, while substantially lowering computational cost. The final model achieves high workload classification performance with under 0.5 million parameters and 0.02 GFLOPs, paving the way for a design direction for real-time cognitive workload monitoring in resource-constrained settings.
Fingerprinting-based localization often suffers from poor cross-environment generalization, especially when only a few labeled samples are available in the target environment. Existing methods mitigate distribution shifts through domain adaptation or improved signal representations, but they usually ignore environmental geometry or use it in a deterministic manner, limiting their ability to capture diverse multipath variations in complex propagation conditions. To address this issue, we propose EnvCoLoc, an environment-conditioned diffusion meta-learning framework for few-shot fingerprinting localization. EnvCoLoc extracts structured descriptors from 3D point clouds and uses them to condition a latent diffusion generator, which produces environment-specific parameter offsets to modulate a shared meta-learned initialization. This design injects geometry-aware priors into the adaptation process and provides more informative initializations for new environments. To learn the stochastic mapping from coarse environmental descriptors to high-dimensional parameter corrections under limited data, the diffusion generator and localization network are jointly optimized within a two-loop meta-learning framework. The generated offsets capture systematic environment-dependent variations, while gradient-based inner-loop adaptation further refines the model to reduce residual task-specific mismatch. We also provide an excess-loss analysis for finite-step adaptation, theoretically supporting the benefit of geometry-aware initialization. Real-world experiments show that EnvCoLoc consistently improves localization accuracy over baseline methods, achieving up to a 20.0% reduction in mean localization error in NLOS scenarios with only 10 support samples.
In this paper, we present a learning-based control for a class of nonlinear systems that guarantees exponential stability as well as bounded output errors. The control is based on the Gaussian Process Submodel Online Learning (GPSOL) algorithm and the Disturbance Error Rate Limiting (DERL) algorithm, both of which were developed in previous work. The GPSOL algorithm provides a method to learn Gaussian Process (GP) models for subsystems online, whereas the DERL algorithm makes it possible to limit the rate of the prediction error of these GP models. The focus of this paper is the utilization of the GP model within an adaptive controller and the derivation of corresponding stability conditions and system peak-to-peak gains by means of linear matrix inequalities (LMIs). These peak-to-peak gains are then used to prescribe a desired prediction error rate for the DERL algorithm to achieve user-defined output error bounds. The gains and the related bounds were successfully verified using a simulation model. Furthermore, results from a successful experimental validation of the bounds and the overall control structure on a pneumatic test rig are presented. While the control scheme and error bounds proposed in this paper are limited to first-order single-input-single-output systems, an extension to certain classes of higher-order and multiple-input-multiple-output systems is expected to be forthcoming.
Log-homotopy particle flow filters realize nonlinear Bayesian estimation by continuously migrating samples from the prior to the posterior distribution. This transport is governed by a pseudo-time ordinary differential equation (ODE). A major practical challenge of these filters is the need for numerical integration, which suffers from high computational cost and susceptibility to stiffness. This paper develops an exact, integration-free closed-form solution for the exact Daum--Huang deterministic particle flow under vector linear Gaussian measurements. By transforming the ODE into a specific eigenspace, we derive closed-form algebraic expressions for both the homogeneous state transition matrix and the inhomogeneous forcing term. We prove that this analytic solution is equivalent to the exact Kalman measurement update. We embed this closed-form evaluation within an $N$-step piecewise method for nonlinear measurement models. We further propose a constant contraction rate substep schedule that equalizes the per-step contraction along the eigendirection of $D$ associated with the largest eigenvalue $\alpha_{\max}$. The result is a stiffness-mitigating, integration-free particle update for highly nonlinear measurement models. On a bearings-only tracking benchmark, it achieves the lowest error among the compared filters, at a per-update cost comparable to deterministic particle flow baselines and substantially lower than stochastic flows.
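The equivalence between the exact flow and the Kalman measurement update stated in the abstract above can be illustrated in the scalar linear-Gaussian case. The sketch below is only an illustration under stated assumptions: it uses the standard exact Daum-Huang drift dx/dl = A(l) x + b(l) with A(l) = -0.5 P H (l H P H + R)^(-1) H and b(l) = (1 + 2 l A)[(1 + l A) P H z / R + A x_bar], integrates it with a generic fourth-order Runge-Kutta scheme (the numerical step that a closed-form solution would replace), and compares the end point with the Kalman posterior. All function names and numerical values are hypothetical.

```python
import math


def flow_rhs(x, lam, x_bar, p, h, r, z):
    """Scalar exact-flow drift A(lam) * x + b(lam) for a linear Gaussian measurement."""
    s = h * h * p
    a = -0.5 * s / (lam * s + r)
    b = (1 + 2 * lam * a) * ((1 + lam * a) * p * h * z / r + a * x_bar)
    return a * x + b


def integrate_flow(x0, x_bar, p, h, r, z, steps=200):
    """Move one particle from pseudo-time 0 to 1 with classical RK4."""
    x, dl = x0, 1.0 / steps
    for k in range(steps):
        lam = k * dl
        k1 = flow_rhs(x, lam, x_bar, p, h, r, z)
        k2 = flow_rhs(x + 0.5 * dl * k1, lam + 0.5 * dl, x_bar, p, h, r, z)
        k3 = flow_rhs(x + 0.5 * dl * k2, lam + 0.5 * dl, x_bar, p, h, r, z)
        k4 = flow_rhs(x + dl * k3, lam + dl, x_bar, p, h, r, z)
        x += dl * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x


def kalman_update(x_bar, p, h, r, z):
    """Scalar Kalman measurement update: posterior mean and variance."""
    gain = p * h / (h * h * p + r)
    return x_bar + gain * (z - h * x_bar), (1 - gain * h) * p


if __name__ == "__main__":
    x_bar, p, h, r, z = 0.0, 4.0, 1.0, 1.0, 3.0  # illustrative values only
    post_mean, post_var = kalman_update(x_bar, p, h, r, z)
    # A particle at the prior mean should land on the posterior mean, and a
    # particle one prior standard deviation away should land one posterior
    # standard deviation away.
    print(integrate_flow(x_bar, x_bar, p, h, r, z), post_mean)
    print(integrate_flow(x_bar + math.sqrt(p), x_bar, p, h, r, z),
          post_mean + math.sqrt(post_var))
```

Because the flow is affine in the state, checking the mean particle and one offset particle is enough to confirm that the prior Gaussian is mapped onto the Kalman posterior in this scalar setting.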
Shear-horizontal surface acoustic wave (SH-SAW) filters have shown strong potential for low-loss, compact, GHz-frequency RF front ends. In this work, we demonstrate a high-performance SH-SAW filter design at 4.35 GHz utilizing 42°Y-cut thin-film lithium tantalate (LiTaO3) on a SiO2/Si platform. Despite the limitations of thin aluminum metallization and its associated ohmic losses, we show that implementing a Bartlett window apodization technique, primarily intended for in-band spurious-mode suppression, yields a significantly improved quality factor (Q) of 1,522 from 688 in conventional interdigitated SH-SAW resonators. This enhancement enables a third-order ladder filter at 4.3 GHz with an insertion loss of 1.59 dB, compared with 1.65 dB for a conventional SH-SAW filter. In addition, our filter with apodized resonator designs achieves a 3 dB fractional bandwidth (FBW) of 3.24% and out-of-band rejection exceeding 14 dB, all within a compact footprint of 0.4 mm2. These results suggest that apodized thin-film LiTaO3 designs are highly promising for low-loss, miniaturized, cost-effective radio frequency acoustic solutions in next-generation communication and sensing applications.
Contextual biasing is essential to improving the recognition of rare and domain-specific words in an automatic speech recognition (ASR) system. While numerous methods have been proposed in recent years, most of them focus on offline settings and do not explicitly address the challenges of streaming ASR. For example, CTC-based word spotting (CTC-WS) has demonstrated strong performance by directly detecting keywords from CTC log-probabilities, but it is limited to offline processing and requires access to the full utterance. In this work, we present a streaming extension of CTC-WS for real-time contextual biasing. Our method maintains active keyword paths across audio chunks using a stateful token passing algorithm, enabling the detection of keywords that span multiple chunks. To ensure low latency and stable output, we introduce an incremental commitment mechanism that only emits segments guaranteed not to be affected by future audio, while deferring uncertain regions. This method naturally integrates with streaming ASR pipelines and does not require modifications to the underlying acoustic model or additional training, making it practical for real-world deployment. Experimental results show that our method reduces overall WER and improves keyword F-score, demonstrating its effectiveness for real-time ASR applications.
The task of efficient automatic music classification is of vital importance and forms the basis for various advanced applications of AI in the musical domain. Musical instrument recognition is the task of identifying an instrument from its audio. The model uses this audio, i.e., the sound vibrations, to match the input to the instrument classes. In this paper, we use an artificial neural network (ANN) model trained to classify twenty different classes of musical instruments. We use only the mel-frequency cepstral coefficients (MFCCs) of the audio data. Our proposed model trains on the full London Philharmonic Orchestra dataset, which contains twenty classes of instruments belonging to four families, viz. woodwinds, brass, percussion, and strings. Based on experimental results, our model achieves state-of-the-art accuracy on this dataset.
Planning and control for high-dimensional robot manipulators in cluttered dynamic environments require computational efficiency and robust safety guarantees. Inspired by recent advances in learning configuration-space distance functions (CDFs) as representations of robot bodies, we propose a unified approach for motion planning and control that formulates safety constraints as CDF barriers. A CDF barrier approximates the local free configuration space, substantially reducing the number of collision-checking operations during motion planning. However, learning a CDF barrier with a neural network and relying on online sensor observations introduces uncertainties that must be considered during control synthesis. To address this, we develop a distributionally robust CDF barrier formulation for control that accounts for modeling errors and sensor noise without assuming a known underlying distribution. Simulations and hardware experiments on a UFactory xArm6 manipulator show that our neural CDF barrier formulation enables efficient planning and robust safe control in cluttered and dynamic environments, relying only on onboard point-cloud observations.
This paper introduces the Compliant Explicit Reference Governor (CERG), a modular reference management system that enables robots to interact physically with their environment under provable guarantees. The CERG is an intermediate layer that can be placed between a high-level planner and a low-level controller: it enforces operational constraints and enables smooth transitions between free-motion and contact operations. The CERG ensures safety by limiting the total energy available to the robotic arm at the time of contact. In the absence of contact, however, the CERG does not penalize the system performance. Simulation and hardware experiments validate the CERG on increasingly complex systems.
This paper focuses on the general linearly constrained optimization problem: $\min_{x \in \mathbb{R}^d} f(x) \ \text{s.t.} \ Ax = b$, where $f: \mathbb{R}^d \rightarrow \mathbb{R} \cup \{+\infty\}$ is a closed proper convex function, $A \in \mathbb{R}^{p \times d}$, and $b \in \mathbb{R}^p$. We define the standard dual function $\phi(\lambda) = \inf_x \{f(x) + \langle \lambda, A x - b \rangle\}$, the augmented Lagrangian $\mathcal{L}_{\rho}(x, \lambda) = f(x) + \langle \lambda, Ax - b \rangle + \frac{\rho}{2}\|Ax - b\|^2$ ($\rho > 0$), and the augmented Lagrangian dual function $\phi_{\rho}(\lambda) = \inf_x \mathcal{L}_{\rho}(x, \lambda)$. Under the fundamental condition that $\text{dom} \ \phi \neq \emptyset$, we establish that: (1) $\phi_{\rho}$ is $\frac{1}{\rho}$-smooth everywhere; and (2) the solution to $\min_{x \in \mathbb{R}^d} \mathcal{L}_{\rho}(x, \lambda)$ exists for any $\lambda \in \mathbb{R}^p$. These theoretical findings substantially weaken the stringent assumptions typically imposed in the literature to ensure such properties.
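As a small worked illustration of the two properties claimed in the abstract above (a hypothetical scalar instance chosen here, not an example from the paper), take $d = p = 1$, $A = 1$, and $f(x) = \tfrac{1}{2}x^2$. Then $\phi(\lambda) = -\tfrac{1}{2}\lambda^2 - \lambda b$ is finite everywhere, so $\text{dom} \ \phi \neq \emptyset$, and the augmented Lagrangian is a strongly convex quadratic in $x$:

$$\mathcal{L}_{\rho}(x, \lambda) = \tfrac{1}{2}x^2 + \lambda (x - b) + \tfrac{\rho}{2}(x - b)^2, \qquad x^\star(\lambda) = \frac{\rho b - \lambda}{1 + \rho}.$$

The minimizer $x^\star(\lambda)$ exists for every $\lambda \in \mathbb{R}$, matching property (2). Differentiating $\phi_{\rho}(\lambda) = \mathcal{L}_{\rho}(x^\star(\lambda), \lambda)$ and using the stationarity of $x^\star(\lambda)$ gives

$$\phi_{\rho}'(\lambda) = x^\star(\lambda) - b = -\frac{\lambda + b}{1 + \rho}, \qquad |\phi_{\rho}'(\lambda) - \phi_{\rho}'(\mu)| = \frac{|\lambda - \mu|}{1 + \rho} \le \frac{1}{\rho}|\lambda - \mu|,$$

so in this instance $\phi_{\rho}$ is indeed $\frac{1}{\rho}$-smooth, consistent with property (1).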
In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.
This paper presents a novel non-invasive object classification approach using acoustic scattering, demonstrated through a case study on hair assessment. When an incident wave interacts with an object, it generates a scattered acoustic field encoding structural and material properties. By emitting acoustic stimuli and capturing the scattered signals from head-with-hair-sample objects, we classify hair type and moisture using AI-driven, deep-learning-based sound classification. We benchmark comprehensive methods, including (i) fully supervised deep learning, (ii) embedding-based classification, (iii) supervised foundation model fine-tuning, and (iv) self-supervised model fine-tuning. Our best strategy achieves nearly 90% classification accuracy by fine-tuning all parameters of a self-supervised model. These results highlight acoustic scattering as a privacy-preserving, non-contact alternative to visual classification, opening huge potential for applications in various industries.
Landslides pose severe threats to infrastructure, economies, and human lives, necessitating accurate detection and predictive mapping across diverse geographic regions. With advancements in deep learning and remote sensing, automated landslide detection has become increasingly effective. This study presents a comprehensive approach integrating multi-source satellite imagery and deep learning models to enhance landslide identification and prediction. We leverage Sentinel-2 multispectral data and ALOS PALSAR-derived slope and Digital Elevation Model (DEM) layers to capture critical environmental features influencing landslide occurrences. Various geospatial analysis techniques are employed to assess the impact of terrain characteristics, vegetation cover, and rainfall on detection accuracy. Additionally, we evaluate the performance of multiple state-of-the-art deep learning segmentation models, including U-Net, DeepLabV3+, and ResNet, to determine their effectiveness in landslide detection. The proposed framework contributes to the development of reliable early warning systems, improved disaster risk management, and sustainable land-use planning. Our findings provide valuable insights into the potential of deep learning and multi-source remote sensing in creating robust, scalable, and transferable landslide prediction models.
Physical processes evolving in both time and space are often modeled using Partial Differential Equations (PDEs). Recently, it has been shown how stability analysis and control of coupled PDEs in a single spatial variable can be more conveniently performed using an equivalent Partial Integral Equation (PIE) representation. The construction of this PIE representation is based on an analytic expression for the inverse of the spatial differential operator, $\partial_s^{d}$, on the domain defined by boundary conditions. In this paper, we show how this univariate representation may be extended inductively to multiple spatial variables by representing the domain as the intersection of lifted univariate domains. Specifically, we show that if each univariate domain is well-posed, then there exists a readily verified consistency condition which is necessary and sufficient for existence of an inverse to the multivariate spatial differential operator, $D^\alpha=\partial_{s_1}^{\alpha_1}\cdots\partial_{s_N}^{\alpha_N}$, on the PDE domain. Furthermore, we show that this inverse is an element of a $*$-algebra of Partial Integral (PI) operators defined by polynomial semi-separable kernels. Based on this operator algebra, we show that the evolution of any suitably well-posed linear multivariate PDE may be described by a PIE, parameterized by elements of the PI algebra. A convex computational test for PDE stability is then proposed using a positive matrix parameterization of positive PI operators, and software (PIETOOLS) is provided which automates the process of representation and stability analysis of such PDEs. This software is used to analyze stability of 2D heat, wave, and plate equations, obtaining accurate bounds on the rate of decay.
The rapid growth of space-based services has established LEO satellite networks as a promising option for global broadband connectivity. Next-generation LEO networks leverage inter-satellite links (ISLs) to provide faster and more reliable communications compared to traditional bent-pipe architectures, even in remote regions. However, the high mobility of satellites, dynamic traffic patterns, and potential link failures pose significant challenges for efficient and resilient routing. To address these challenges, we model the LEO satellite network as a time-varying graph comprising a constellation of satellites and ground stations. Our objective is to minimize a weighted sum of average delay and packet drop rate. Each satellite independently decides how to distribute its incoming traffic to neighboring nodes in real time. Given the infeasibility of finding optimal solutions at scale, due to the exponential growth of routing options and uncertainties in link capacities, we propose SKYLINK, a novel fully distributed learning strategy for link management in LEO satellite networks. SKYLINK enables each satellite to adapt to the time-varying network conditions, ensuring real-time responsiveness, scalability to millions of users, and resilience to network failures, while maintaining low communication overhead and computational complexity. To support the evaluation of SKYLINK at global scale, we develop a new simulator for large-scale LEO satellite networks. For 25.4 million users, SKYLINK reduces the weighted sum of average delay and drop rate by 29% compared to the bent-pipe approach, and by 92% compared to Dijkstra. It lowers drop rates by 95% relative to k-shortest paths, 99% relative to Dijkstra, and 74% compared to the bent-pipe baseline, while achieving up to 46% higher throughput. At the same time, SKYLINK maintains constant computational complexity with respect to constellation size.
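The objective described above, a weighted sum of average delay and packet drop rate, can be sketched as follows. This is a minimal illustration only: the function names, the weights, and the proportional traffic split are hypothetical and are not SKYLINK's learning rule, which the abstract does not specify.

```python
def routing_cost(delays_ms, num_dropped, delay_weight=1.0, drop_weight=1.0):
    """Weighted sum of average delay and packet drop rate.

    delays_ms holds the delays of delivered packets; the weights are
    assumed values, not ones reported in the paper.
    """
    num_delivered = len(delays_ms)
    total = num_delivered + num_dropped
    if total == 0:
        return 0.0
    avg_delay = sum(delays_ms) / num_delivered if num_delivered else 0.0
    drop_rate = num_dropped / total
    return delay_weight * avg_delay + drop_weight * drop_rate


def split_traffic(volume, neighbor_scores):
    """Split incoming traffic across neighbors in proportion to their scores.

    The scores stand in for whatever a satellite has learned locally;
    proportional splitting is an illustrative assumption.
    """
    total_score = sum(neighbor_scores.values())
    if total_score <= 0:
        raise ValueError("at least one neighbor needs a positive score")
    return {name: volume * score / total_score
            for name, score in neighbor_scores.items()}
```

For example, `routing_cost([10.0, 20.0, 30.0], 1, 1.0, 100.0)` combines an average delay of 20 ms with a drop rate of 0.25, and `split_traffic(100.0, {"a": 1.0, "b": 3.0})` sends one quarter of the traffic to neighbor `a`.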
We generalize the robotics equation describing the dynamics of open kinematic chains by including the effect of time-dependent change of inertial parameters as well as the effects of causative mass-density redistribution, triggered by internal movement of mass-carrying particles relative to their body-fixed frames. Time dependency of inertial parameters that results solely from the addition of mass to the robot occurs prominently during the loading of end-effectors, a scenario our model covers without requiring the kinematic parameters of the robot to remain constant. Further, our model also includes internal mass-density redistributions that do keep the kinematic parameters constant, such as trolleys attached to the robot or the movement of passengers. To accompany the generalized robotics equation with some theoretical infrastructure, we then introduce the concepts of uniform physical consistency and upper boundedness of inertial parameters, under which desirable structural properties regarding the existence of finite, positive uniform bounds of the mass matrix can be shown to carry over to the more involved case of time-dependent inertial parameters. These findings have implications for adaptive control, as they facilitate more realistic testing for robustness against unforeseen time dependencies. Moreover, the results in this paper also provide a pathway to ensuring the desirable existence of finite, positive uniform bounds of the estimated mass matrix under upper bounded, uniformly physically consistent estimation regimes.
High-mobility wireless communication systems suffer from severe Doppler spread and multi-path delay, which degrade the reliability and spectral efficiency of conventional modulation schemes. Orthogonal time frequency space (OTFS) modulation offers strong robustness in such environments by representing symbols in the delay-Doppler (DD) domain, while faster-than-Nyquist (FTN) signaling can further enhance spectral efficiency through intentional symbol packing. Meanwhile, reconfigurable intelligent surfaces (RIS) provide a promising means to improve link quality via passive beamforming. Motivated by these advantages, we propose a novel RIS-empowered OTFS modulation with FTN signaling (RIS-OTFS-FTN) scheme. First, we establish a unified DD-domain input-output relationship that jointly accounts for RIS passive beamforming, FTN-induced inter-symbol interference, and DD-domain channel characteristics. Based on this model, we provide a comprehensive analytical performance characterization, including the frame error rate, spectral efficiency, and peak-to-average power ratio (PAPR). Furthermore, a practical RIS phase adjustment strategy with quantized phase selection is designed to maximize the effective channel gain. Extensive Monte Carlo simulations under a standardized extended vehicular A (EVA) channel model validate the theoretical results and provide key insights into the trade-offs among spectral efficiency, PAPR, input back-off (IBO), and error performance. The proposed RIS-OTFS-FTN scheme demonstrates notable performance gains in both reliability and spectral efficiency, offering a viable solution for future high-mobility and spectrum-constrained wireless systems.
The rapid progress in 6G communication and high-bandwidth radar has driven an unprecedented surge in the spatial density of signal sources, resulting in an increasingly congested electromagnetic (EM) environment. When resolving closely spaced signals and interference, existing architectures are strictly bounded by the inherent diffraction limits of two-dimensional (2D) physical apertures, hindering super-resolution sensing and multi-interference mitigation in complex scenarios. Here, we present a 3D aperture-engineered diffractive neural network (AE-DNN) that achieves super-resolution sensing and computing by extending the traditional 2D aperture into 3D. The 3D aperture engineering framework is realized by constructing deep cascaded metasurface layers so that the diffractive propagation from oblique incident fields can be layer-wise modulated and piecewise encoded for perceiving EM fields far exceeding physical aperture limits. The N-layer AE-DNN has the capability to achieve ~N times higher angular resolution than the 2D aperture diffraction limit. The multi-dimensional synthetic aperture (MSA) training is developed to achieve speed-of-light coherent synthesis of the 3D aperture and integrate neural network-based modeling of multi-dimensional metasurface modulation. By orthogonalizing array response vectors in the analog domain, AE-DNN performs parallel super-resolution angle estimation, source number estimation, and source separation for up to 10 independent coherent or incoherent sources. Experimental results across the 36-41 GHz band demonstrate that AE-DNN resolves and suppresses closely spaced multi-interference by ~20 dB, enhances communication capacity by 13.5X, and reduces latency by three orders of magnitude. AE-DNN heralds a paradigm shift in signal processing for advanced radar and 6G communications.
Deep learning has achieved transformative performance across diverse domains, largely driven by large-scale and high-quality training data. In contrast, the development of brain-computer interfaces (BCIs) is fundamentally constrained by limited, heterogeneous, and privacy-sensitive neural recordings. Generating synthetic yet physiologically plausible brain signals has therefore emerged as a promising strategy to mitigate data scarcity, improve model generalization, and support data-efficient BCIs. This survey provides a comprehensive review of synthetic brain data generation for BCIs, covering methodological taxonomies, benchmark experiments, evaluation metrics, key applications, and future directions. We systematically categorize existing generation approaches into four types: signal-transformation-based, feature-based, model-based, and translation-based generation, and discuss their characteristics, advantages, and limitations. Furthermore, we benchmark representative brain signal generation approaches across four BCI paradigms, including motor imagery, epileptic seizure detection, steady-state visually evoked potentials, and auditory attention detection, to provide an objective comparison of their downstream utility. We also summarize evaluation principles for generated brain signals from multiple perspectives, including signal realism, physiological plausibility, downstream utility, and privacy preservation. Finally, we discuss the potential and challenges of current generation approaches and outline future research directions toward accurate, data-efficient, generalizable, and privacy-aware BCI systems. The benchmark codebase is available at this https URL.
In this letter, we study a model-based inverse problem for infinite-horizon linear-quadratic differential games with descriptor dynamics. Given an observed feedback strategy profile, we seek to identify all cost functions that rationalize it as a feedback Nash equilibrium; this collection is referred to as the solution set. We characterize the solution set, show that it is rectangular and convex, and provide an algorithm for computing an admissible realization whenever it is nonempty. We also show that, compared with the corresponding inverse problem for standard state-space dynamics, descriptor dynamics modify the geometry of the solution set and may reduce identifiability. Finally, we illustrate the results with numerical examples.
In this work, we study how to ensure probabilistic safety for nonlinear systems under distributional ambiguity. Our approach builds on a backup-based safety filtering framework that switches between a high-performance nominal policy and a certified backup policy to ensure safety. To handle arbitrary uncertainties from ambiguous distributions, i.e., where the distribution has no specific structure and the true distribution is unknown, we adopt a distributionally robust (DR) formulation using Wasserstein ambiguity sets. Rather than solving a high-dimensional DR trajectory optimization problem online, we exploit the structure of backup-based safety filtering to reduce safety certification to a one-dimensional search over the switching time between nominal and backup policies. We then develop a sampling-based certification procedure with finite-sample guarantees, where empirical failure probabilities are compared against a Wasserstein-inflated threshold. We validate our method through simulations across three systems, from a Dubins vehicle to a high-speed racing car and a fighter jet, demonstrating its broad applicability and computational efficiency.
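The one-dimensional search over switching times with a sampling-based check can be sketched roughly as below. This is a sketch under stated assumptions, not the paper's procedure: the function names, the toy failure model, and the form of the threshold (a risk tolerance reduced by a fixed `inflation` margin standing in for the Wasserstein term) are all hypothetical.

```python
def empirical_failure_rate(switch_time, samples, is_failure):
    """Fraction of sampled disturbances for which switching at switch_time fails."""
    failures = sum(1 for w in samples if is_failure(switch_time, w))
    return failures / len(samples)


def latest_certified_switch_time(candidates, samples, is_failure,
                                 risk_tolerance, inflation):
    """Return the largest candidate switching time that passes the check.

    A candidate passes when its empirical failure rate does not exceed
    risk_tolerance - inflation. The inflation value is an assumed
    placeholder for the Wasserstein-dependent margin. Returns None if
    no candidate passes.
    """
    threshold = risk_tolerance - inflation
    certified = None
    for t in sorted(candidates):
        if empirical_failure_rate(t, samples, is_failure) <= threshold:
            certified = t
    return certified


# Toy model (assumed): a vehicle moves toward a wall at position 1.0 with
# speed 1 + w under the nominal policy, and the backup policy stops it
# instantly. Switching at time t fails if the vehicle has passed the wall.
def hits_wall(switch_time, w):
    return switch_time * (1.0 + w) > 1.0


disturbances = [0.0, 0.1, 0.2, 0.3, 0.4]
switch_times = [0.5, 0.7, 0.8, 0.9]
robust_choice = latest_certified_switch_time(
    switch_times, disturbances, hits_wall, risk_tolerance=0.5, inflation=0.2)
nominal_choice = latest_certified_switch_time(
    switch_times, disturbances, hits_wall, risk_tolerance=0.5, inflation=0.0)
```

In this toy setting, a larger inflation margin makes the check stricter, so the filter hands control to the backup policy earlier than it would with no margin.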
Image super-resolution (SR) aims to reconstruct high-quality, high-resolution (HR) images from low-resolution (LR) inputs and plays a critical role in various downstream applications. Despite recent advancements, balancing reconstruction fidelity and computational efficiency remains a fundamental challenge, particularly in resource-constrained scenarios. While existing lightweight methods attempt to expand receptive fields, many of them either incur substantial computational overhead, naively scale up kernel sizes, or lack mechanisms for coherent multi-scale integration, limiting their overall effectiveness and scalability. To address these limitations, we propose EchoSR, an efficient context-harnessing framework for lightweight image super-resolution, which unifies multi-scale receptive field modeling and hierarchical context fusion. EchoSR decouples feature learning into disentangled local, multi-scale, and global modeling stages through an efficient context-harnessing strategy, and further promotes seamless cross-scale integration via a cross-scale overlapping fusion mechanism. Extensive experiments have shown that EchoSR consistently outperforms state-of-the-art lightweight super-resolution methods across multiple benchmarks, while also achieving a faster speed $(\sim 2\times)$. The source code is available at this https URL.