Underwater acoustic target recognition is critical for maritime applications, yet it faces challenges arising from the complex and diverse nature of ship-radiated noise. To address these issues, we propose a robust deep learning-based framework. First, we introduce a feature extraction and fusion method based on variational mode decomposition (VMD) and the 3/2-D spectrum to generate high-fidelity 2-D DEMON spectral features, which effectively capture modulation envelope information. To further enhance feature representation, we design a one-dimensional convolutional neural network (1-D CNN) integrated with a novel Multi-Stage Multi-Type Attention Mechanism (MMATT) that adaptively refines features at different network depths. Within this mechanism, we propose a Residual Channel-Independent Spectral Attention Mechanism (R-CISAM) and a Multi-Scale Separate-and-Fuse Spectral Attention Mechanism (MS-SFSAM). Moreover, to mitigate performance degradation caused by severe class imbalance inherent in real-world ship-radiated noise data, we devise an Adjustable Class-Balanced Focal Loss (ACBFL), which provides flexibility across tasks with varying degrees of imbalance. Experimental results on a real-world ship-radiated noise dataset demonstrate that the proposed solutions effectively enhance underwater acoustic target recognition performance.
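The abstract does not define ACBFL, so the following is only a minimal sketch of the general idea it builds on: class-balanced reweighting by the "effective number" of samples, combined with a focal term. The function names and the two knobs `beta` and `gamma` are assumptions for illustration, not the authors' formulation.

```python
import math

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights proportional to (1 - beta) / (1 - beta**n_c)."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)  # rescale so the weights sum to the number of classes
    return [w * scale for w in raw]

def focal_loss(prob_true, gamma=2.0):
    """Focal loss on the probability assigned to the true class."""
    return -((1.0 - prob_true) ** gamma) * math.log(prob_true)

def cb_focal_loss(probs, label, weights, gamma=2.0):
    """Class-balanced focal loss for one sample (hypothetical combination)."""
    return weights[label] * focal_loss(probs[label], gamma)

# Hypothetical long-tailed class counts: head, middle, tail.
counts = [5000, 500, 50]
weights = class_balanced_weights(counts, beta=0.999)
loss_head = cb_focal_loss([0.7, 0.2, 0.1], 0, weights)
loss_tail = cb_focal_loss([0.2, 0.1, 0.7], 2, weights)
```

With equal confidence in the true class, the tail-class sample receives the larger loss. Moving `beta` toward 0 flattens the weights, which is one plausible way to make such a loss adjustable across imbalance levels.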
Risk-aware navigation in unknown environments is a fundamental challenge for autonomous vehicles operating in complex urban systems. To address this issue, this paper presents a differentiable optimization layered safety-critical control method based on conformal prediction. First, to handle uncertainties arising from sensor noise, the conformal prediction method is employed to generate risk-aware obstacle ellipsoids around an elliptical-shaped robot. Second, two nested differentiable optimization layers are introduced to build the control barrier functions for obstacle avoidance and feasibility guarantee, respectively. Then, a quadratic program based safety-critical control law is proposed to integrate the above control barrier function constraints as well as input constraints. In the end, the effectiveness of the proposed framework is demonstrated through numerical simulations.
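As a rough illustration of the conformal prediction step, the sketch below computes a standard split-conformal quantile from calibration errors. How the resulting radius sizes the obstacle ellipsoids is not stated in the abstract; the score definition, the sample values, and the function name are assumptions.

```python
import math

def conformal_radius(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf  # too few calibration samples for this miscoverage level
    return sorted(scores)[k - 1]

# Hypothetical calibration set: distances between measured and true obstacle positions (m).
calibration_errors = [0.05, 0.12, 0.08, 0.20, 0.15, 0.03, 0.10, 0.18, 0.07]
radius = conformal_radius(calibration_errors, alpha=0.25)
```

Under exchangeability, a new error falls below `radius` with probability at least `1 - alpha`, which is what makes the bound usable as a risk-aware margin.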
WiFi fingerprint-based indoor localization has been widely studied, but most existing approaches focus on absolute positioning and rely on dense coordinate annotations, which are costly to obtain at scale. In this paper, we study a fundamentally different problem: relative localization, where the goal is to directly estimate the displacement between two WiFi fingerprint traces without predicting their absolute positions. To reduce annotation overhead, we adopt weak supervision in the form of stepwise motion vectors obtained from inertial sensing. We propose Intersection Pathway (IP), a cross-modal learning framework that aligns fingerprint traces (f-traces) and displacement traces (d-traces) in a shared latent space. The key idea is to enforce an additive structure in the latent space, such that latent addition and subtraction correspond to physical motion composition, enabling direct relative-displacement inference. Experiments on a synthesized dataset derived from real measurements demonstrate that the proposed method learns displacement-aware WiFi representations and achieves accurate relative localization across varying displacement ranges. Furthermore, the learned model can be extended to few-shot absolute localization with sparse anchors.
Kelvin is a lightweight learned pre-encoder that sits in front of an unmodified libx264 encoder. It applies content-adaptive pixel adjustments, bounded at +/-1/255 per channel, so that the encoder allocates bits where they matter most perceptually, while emitting a standard H.264 bitstream compatible with every existing decoder, player, and CDN. On the seven-sequence 1080p UVG benchmark, Kelvin v1.0 achieves a mean BD-VMAF of -27.62% (7 of 7 wins) and BD-VMAF-NEG of -5.18% (6 of 7 wins) relative to baseline libx264 at preset medium. On the 30-sequence MCL-JCV public set (28 unseen by training), the same checkpoint wins on 28 of 30 clips by BD-VMAF; with the two diagnosable failures removed the mean is -27.70% BD-VMAF and -5.37% BD-VMAF-NEG, consistent with UVG to within one percentage point. A central engineering challenge is the non-differentiability of H.264: we describe a hybrid codec proxy that combines a calibrated differentiable rate estimator (Spearman rho = 0.986 vs. real libx264 bits-per-pixel) with a U-Net distortion proxy trained on real encoder outputs. We publish full per-sequence rate-distortion data, a named failure-mode taxonomy on MCL-JCV (rate-floor violation, distribution shift, metric saturation), a five-baseline sanity panel (hqdn3d, unsharp, -tune psnr, -tune ssim, x265 medium), and honest positioning: x265 medium beats Kelvin on every metric on the same corpus. Kelvin is therefore designed for workloads where remaining on H.264 is a constraint rather than a choice.
Inertial measurement units (IMUs) are fundamental sensing components in multi-source integrated navigation systems, and their performance directly determines the accuracy and reliability of solutions. However, the precision of low-cost IMUs is inherently constrained by hardware limitations. Recently, generative artificial intelligence has demonstrated remarkable capability in modeling complex data distributions and reconstructing high-fidelity signals. Motivated by this, we propose a diffusion-based generative learning framework for synthesizing high-fidelity virtual IMU data from low-cost IMU measurements. Specifically, a conditional diffusion model based on a U-Net architecture is constructed, where high-grade IMU measurements are utilized as ground-truth priors and low-cost IMU measurements are employed as conditional inputs. The virtual IMU data generated by the model are used for subsequent navigation and localization tasks. Experimental results demonstrate that the generated virtual IMU data significantly outperform the original low-cost IMU measurements in both positioning and attitude estimation. Furthermore, we transfer the model to airborne mapping experiments, where the proposed method produces thinner and more consistent point clouds. Overall, the proposed framework breaks the performance limits of low-cost IMUs and demonstrates the potential of diffusion-based generative learning for virtual high-grade IMU data.
Unsourced random access (URA) has emerged as a promising paradigm for enabling massive connectivity in Internet-of-Things (IoT) networks. However, since URA transmissions do not contain device identifiers, the receiver cannot associate decoded messages with their originating devices, introducing a security vulnerability: forged messages may be decoded as legitimate. To address this problem, this paper proposes a one-hot coding (OHC)-based URA framework that enables message authentication while preserving the unsourced transmission principle. Specifically, distinct messages are mapped onto orthogonal channel uses via an OHC-based common codebook and transmitted using on-off keying modulation. The resulting orthogonal channel structure enables radio-frequency fingerprint identification to authenticate received signals by exploiting device-specific hardware impairments, thereby authenticating decoded messages without introducing an additional authentication payload. Analytical expressions for the per-user probability of error and the probability of successful spoofing are derived. Numerical results demonstrate that the proposed scheme enables secure URA transmission while maintaining reliable communication performance in ultra-short-payload IoT scenarios.
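The mapping of messages onto orthogonal channel uses can be sketched as follows, assuming a noise-free channel and a simple energy detector. The threshold, the function names, and the superposition model are illustrative assumptions. The fingerprint-based authentication step is not modeled.

```python
def ohc_encode(message, num_bits):
    """Map a num_bits-bit message to a one-hot on-off-keying codeword of length 2**num_bits."""
    length = 1 << num_bits
    if not 0 <= message < length:
        raise ValueError("message out of range")
    return [1.0 if slot == message else 0.0 for slot in range(length)]

def superimpose(codewords):
    """Noise-free multiple-access channel: per-slot received value is the sum of transmissions."""
    return [sum(slot_values) for slot_values in zip(*codewords)]

def detect_messages(received, threshold=0.5):
    """Energy detection: every slot above the threshold is reported as a decoded message."""
    return [slot for slot, energy in enumerate(received) if energy > threshold]

# Three hypothetical devices share the common codebook; two of them send the same message.
active = [ohc_encode(m, 3) for m in (1, 6, 6)]
decoded = detect_messages(superimpose(active))
```

The decoder recovers the set of transmitted messages without learning which device sent which, consistent with the unsourced principle. Because each message occupies its own channel use, a per-slot signal is available for any later device-level check.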
Rare diseases dominate the diagnostic challenge in medical imaging yet are severely underrepresented in clinical datasets, causing classifiers to fail on exactly the conditions where reliable detection matters most. Generative augmentation can supply the missing tail-class coverage, but coarse disease labels aggregate diverse subtypes and acquisition settings into multi-modal conditionals that bias generators toward dominant submodes, while a shared Gaussian source forces rare subpopulations through disproportionately long transport paths. We propose an offline strategy that introduces informative priors at two levels: first, we partition each coarse label into coherent submodes via Gaussian mixture modeling in the generative model's latent space; second, we learn subclass-conditioned source distributions that re-center and re-scale the starting distribution per submode, shortening trajectories and reducing within-subclass dispersion. To prevent degenerate solutions we impose explicit geometric control, moderately concentrating normalized displacement directions around learnable prototypes while capping path-length outliers. On long-tailed chest X-ray (MIMIC-LT, NIH-LT) and CT slice (CT-RATE) benchmarks the proposed method consistently improves tail-class generation fidelity and diversity (FID, IRS) and is a promising augmentation strategy that reliably improves downstream balanced accuracy and macro-F1 over a non-augmented baseline across modalities.
Through-plane resolution in clinical MRI is typically much coarser than in-plane resolution, limiting diagnostic utility. This work investigates deep learning approaches to interpolate intermediate MRI slices in prostate imaging, effectively doubling through-plane resolution. I evaluated five architectures (CNN, U-Net, two GAN variants, and DDPM) and discovered that problem formulation has dramatically more impact than architectural complexity. By reformulating the interpolation task to use adjacent slices (i-1, i+1) rather than distant slices (i-2, i+2), I achieved a 58% improvement in SSIM performance across all deterministic architectures. The U-Net model achieved the best results with a PSNR of 30.08 dB and an SSIM of 0.898, representing a 10.1% improvement over the linear interpolation baseline. A DDPM was also evaluated but showed poor reconstruction quality due to a fundamental mismatch between stochastic generation and deterministic reconstruction requirements. These findings demonstrate that problem formulation can have 290x more impact than architectural sophistication in medical imaging tasks.
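For reference, the linear interpolation baseline with adjacent slices (i-1, i+1) and the PSNR metric can be sketched as below. The tiny 2x2 "slices" and the intensity range of 1.0 are made-up values for illustration only.

```python
import math

def interpolate_middle(prev_slice, next_slice):
    """Linear-interpolation baseline: slice i is the pixel-wise mean of slices i-1 and i+1."""
    return [[(a + b) / 2.0 for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev_slice, next_slice)]

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images given as nested lists."""
    errors = [(r - e) ** 2
              for row_r, row_e in zip(reference, estimate)
              for r, e in zip(row_r, row_e)]
    mse = sum(errors) / len(errors)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical slices i-1, i (ground truth), and i+1.
slice_prev = [[0.0, 0.2], [0.4, 0.6]]
slice_true = [[0.1, 0.3], [0.5, 0.6]]
slice_next = [[0.2, 0.4], [0.6, 0.8]]
estimate = interpolate_middle(slice_prev, slice_next)
score = psnr(slice_true, estimate)
```

Only one pixel deviates from the average of its neighbors here, giving a PSNR of roughly 26 dB. A learned model takes the same two slices as input and is scored against the same held-out middle slice.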
In adversarial settings, a mobile agent may strategically plan its motion to influence an opponent's inference about its intended goal. We study deceptive path planning in a scenario where a mobile agent aims to reach a privately selected goal while an adversarial observer allocates limited defensive resources based on the observed trajectory. Unlike classical path-planning and goal-recognition approaches that model observers as passive inference processes, our game-theoretic formulation models them as strategic decision-makers. For the resulting dynamic asymmetric-information game, we develop an efficient solution method that combines a linear programming formulation with the Double Oracle algorithm. To evaluate performance, we introduce metrics that quantify both the risk and the effectiveness of deception and provide illustrative numerical examples.
We present MedASR, an open-source 105M-parameter model engineered for high-accuracy medical dictation. Prioritizing a "small, fast, and accurate" design, MedASR addresses three core pillars: (1) Data: overcoming clinical corpora scarcity and class imbalance; (2) Modeling: efficient long-form training; and (3) Inference: accurate transcription via a pseudo-streaming sliding-window approach. Our evaluation shows that MedASR achieves a 58% relative WER reduction on Eye Gaze compared to Whisper Large-v3. By open-sourcing MedASR, we provide a transparent, high-performance backbone for specialized health-care applications, breaking down the barriers to clinical documentation often obscured by proprietary systems.
This paper focuses on learning efficient sensor allocations that ensure observability of unknown high-dimensional linear systems using only a small number of sensors. Existing methods either require an impractically large number of sensors or assume access to an observable allocation in advance. We propose a two-stage framework that overcomes these limitations: first, a novel system identification algorithm integrates information from multiple trajectories, each observing different subsets of state coordinates; then, a classic sensor allocation method is adapted to operate on the learned system parameters. Our non-asymptotic guarantees show that the proposed approach learns a sensor allocation with a near-optimal number of sensors when sensors can be allocated on any state coordinate. We further extend the results to settings with inaccessible state coordinates that are unavailable for sensor allocation.
Shear-horizontal surface acoustic wave (SH-SAW) filters have shown strong potential for low-loss, compact, GHz-frequency RF front ends. In this work, we demonstrate a high-performance SH-SAW filter design at 4.35 GHz utilizing 42°Y-cut thin-film lithium tantalate (LiTaO3) on a SiO2/Si platform. Despite the limitations of thin aluminum metallization and its associated ohmic losses, we show that implementing a Bartlett window apodization technique, primarily intended for in-band spurious-mode suppression, raises the quality factor (Q) from 688 in conventional interdigitated SH-SAW resonators to 1,522. This enhancement enables a third-order ladder filter at 4.3 GHz with an insertion loss of 1.59 dB, compared with 1.65 dB for a conventional SH-SAW filter. In addition, our filter with apodized resonator designs achieves a 3 dB fractional bandwidth (FBW) of 3.24% and out-of-band rejection exceeding 14 dB, all within a compact footprint of 0.4 mm². These results suggest that apodized thin-film LiTaO3 designs are highly promising for low-loss, miniaturized, cost-effective radio frequency acoustic solutions in next-generation communication and sensing applications.
Fluid reconfigurable intelligent surfaces (FRISs) have recently emerged as a promising paradigm for wireless communications, wherein the reflecting elements can dynamically select their effective radiating positions from a dense preset grid, thereby introducing an additional degree of freedom. In contrast to conventional RIS architectures, FRISs can achieve spatial diversity with fewer physical elements. However, beyond the cascaded channel structure, FRIS-assisted systems are also affected by uncertainties arising from element-position mismatches caused by calibration inaccuracies or motion errors, which may degrade channel state information. To the best of our knowledge, channel estimation (CE) for FRIS-assisted systems under position uncertainty remains unexplored. To fill this gap, we propose a CE framework for a multi-user FRIS-assisted uplink system based on a two-time-scale FRIS configuration protocol that captures both reflection phase-shift and element-motion dynamics. By capitalizing on orthogonal pilot sequences and tensor modeling, we derive a closed-form solution that jointly estimates the individual channels and the motion-induced phase coefficients. Numerical results demonstrate notable performance in the presence of unknown position deviations.
Electrified powertrains rely heavily on magnetics for power conversion, where cost, volume, and weight concerns make integrated multi-use designs an attractive solution. With EV powertrain architectures requiring a boost stage being a major market segment, the proposed Coupled Inductor-Based Multi-Port DC-DC Converter (CI-MPC) leverages the existing magnetic framework of a conventional topology to realize independent, isolated, and simultaneously regulated converters without additional magnetic cores or cascaded stages. Unlike existing architectures that use secondary windings solely for voltage gain or passive rectification, the proposed topology integrates an actively controlled full bridge on the secondary side to create a distinct, independently regulated auxiliary converter. Primary output regulation is achieved via duty-cycle control, while the auxiliary converter employs phase-shift modulation synchronized with the primary switching to enable active rectification and flexible voltage or current regulation. A unified control framework ensures decoupled operation with minimal interaction between the primary and auxiliary loops, while also avoiding high step-down conversion ratios from high voltages to lower auxiliary levels. The operating principles and coordinated control strategies are validated through simulation and experimental results on a hardware prototype, demonstrating enhanced controllability, decoupled regulation, and a scalable pathway toward generalized multi-port power conversion within a unified magnetic framework.
A central obstacle in nonlinear Bayesian filtering is representing the belief distribution. Moment-based filters address this by propagating polynomial moments and reconstructing a density from them. Recent work completes the predict-update loop via the maximum-entropy (MaxEnt) principle, but each step requires the partition function and its gradient, both $n$-dimensional integrals whose cost scales exponentially, restricting the demonstrated MaxEnt moment filtering to $n \le 4$. We avoid the partition function entirely by combining score matching with Stein's identity. In our setting, score matching reduces the density fit to a single linear solve whose coefficients are assembled directly from the propagated moments. The same parameters then drive Stein's identity to close the moment hierarchy during prediction and to recover posterior moments after each Bayesian update, keeping the full predict-update loop free of partition function evaluation. The resulting Score Kalman Filter (SKF) reduces to the classical information-form Kalman filter as a special case and performs every step through linear algebra. On nonlinear coupled-oscillator networks, the SKF runs through $n=20$ and reports lower RMSE than the EKF, UKF, EnKF, and particle-filter baselines on the tested synthetic benchmarks.
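To make the "single linear solve assembled from moments" concrete, here is a one-dimensional sketch under the assumption of quadratic sufficient statistics, $T(x) = (x, x^2)$. It illustrates generic score matching for an exponential-family density and is not the SKF itself; the function name and the worked example are assumptions.

```python
def gaussian_score_matching(m1, m2):
    """Fit p(x) proportional to exp(t1*x + t2*x**2) from the raw moments E[x] and E[x**2].

    Score matching minimises 0.5 * theta^T A theta + theta^T b with
    A = E[grad T grad T^T] and b = E[laplacian T], so theta = -A^{-1} b.
    For T = (x, x**2): grad T = (1, 2x) and laplacian T = (0, 2).
    """
    a11, a12, a22 = 1.0, 2.0 * m1, 4.0 * m2  # entries of A, built only from moments
    b1, b2 = 0.0, 2.0                        # entries of b
    det = a11 * a22 - a12 * a12              # equals 4 * variance
    t1 = -(a22 * b1 - a12 * b2) / det        # first row of -A^{-1} b
    t2 = -(-a12 * b1 + a11 * b2) / det       # second row of -A^{-1} b
    return t1, t2

# Moments of a Gaussian with mean 1.5 and variance 4: E[x] = 1.5, E[x**2] = 6.25.
theta = gaussian_score_matching(m1=1.5, m2=6.25)
```

The solve returns the Gaussian natural parameters, mean/variance and -1/(2*variance), without ever evaluating a normalizing constant. This is the property the abstract relies on to avoid the partition function.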
This paper presents a novel data-driven framework for the robust safety verification and safe control synthesis of unknown monotone discrete-time systems. While existing data-driven safety analysis approaches are often either heuristic in nature or require large amounts of data to provide rigorous guarantees, we leverage the structural property of monotonicity to significantly reduce data requirements while still ensuring formal safety guarantees. Our approach is built upon a new class of certificates called dominance functions, constructed directly from collected system trajectories, which themselves need not be safe. By exploiting the monotone structure of the dynamics, we show that dominance functions are (i) dissipative, meaning that they decrease monotonically along system trajectories, and (ii) sufficiently expressive to characterize safety certificates for monotone systems. Together, these properties establish dominance functions as principled building blocks for the systematic construction of formal safety certificates directly from trajectory data. For both robust safety verification and safe control synthesis, we develop an efficient sampling-based optimization framework that searches for safety certificates represented as linear combinations of dominance functions constructed from collected trajectories. We validate our data-driven framework on two monotone systems by successfully deriving safety certificates from a small number of trajectories.
Audio super-resolution (SR), also referred to as bandwidth extension (BWE), aims to reconstruct high-fidelity signals from low-resolution (LR) or band-limited (BL) observations, an inherently ill-posed task due to the ambiguity of missing high-frequency (HF) content. This survey provides a comprehensive overview of the field, with a particular focus on the paradigm shift from discriminative mapping to modern generative modeling. We first review early discriminative deep neural network (DNN) models, which formulate BWE/SR as a deterministic mapping problem and are prone to regression-to-the-mean effects and spectral over-smoothing. We then systematically review generative approaches, including autoregressive (AR) models, variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion and score-based models, flow-based methods, and Schrödinger bridges. Across these approaches, we examine key design aspects, including representation domain, architecture, conditioning mechanisms, and trade-offs among reconstruction fidelity, perceptual quality, robustness, and computational efficiency. Furthermore, we discuss emerging directions involving large language models (LLMs) and multimodal foundation models, and highlight open challenges in perceptual evaluation, phase modeling, and real-world generalization. By providing a structured taxonomy and unified perspective, this survey establishes a comprehensive foundation and offers a practical roadmap for advancing BWE/SR from deterministic point estimation toward distribution-aware generative modeling.
Fast charging of lithium-ion batteries is limited by lithium plating, which occurs when the anode potential drops below 0 V vs Li/Li+. Model-based control aims to maximize charging current while maintaining anode potentials above this threshold. In this work, a plating-free fast charging strategy is demonstrated using a Homogenized Model (HM) coupled with a classical PID controller. The HM, derived from homogenization theory applied to the Poisson-Nernst-Planck equations, retains the physics of the Doyle-Fuller-Newman model while capturing electrode microstructural heterogeneity in a one-dimensional double-continua formulation. By reconstructing three-dimensional distributions of electrochemical variables from precomputed closure variables, the HM enables non-invasive estimation of heterogeneous anode potentials, acting as a virtual sensor. Through MATLAB-COMSOL co-simulation, a PID controller regulates current to maintain the full 3D anode potential distribution above the plating limit, achieving model-based fast charging at a fraction of the computational cost of high-fidelity models. The results demonstrate the potential of HM-based control for safe, degradation-aware, and efficient fast charging of lithium-ion batteries.
AI-native 6G visions increasingly invoke wireless foundation models, large multimodal models, and wireless world models as the natural endpoint of AI-native networking, drawing an analogy to recent developments in large language models (LLMs). We argue that this analogy is structurally incomplete. The success of LLMs is based on a broad, reusable, and largely self-contained tokenized data substrate, whereas the wireless domain lacks an equivalent data foundation. Unlike text, code, or images, wireless data such as CSI tensors, IQ samples, or scheduler logs are not self-contained: their meaning is configuration-dependent, simulator-conditioned, task-disaggregated, and weakly grounded in operational feedback, all structural bottlenecks that undermine current pre- and post-training recipes. We therefore argue that monolithic models, including mixture-of-experts (MoE) and wireless world models, are not the most realistic near-term path toward deployable AI-native networks. Instead, emerging evidence points toward composable and agentic network architectures, where general reasoning models orchestrate specialized signal processing models, classical algorithms, digital twins, standards-aware retrieval, and safety checks through explicit programmable interfaces.
The rapid advancement of Vehicle-to-Everything (V2X) communications and Tele-Operated Driving (ToD) demands ultra-low-latency, 8K60 video telemetry. However, deploying modern hardware at the vehicular edge is frequently hindered by supply chain constraints, high power budgets, and growing e-waste concerns. This paper investigates a highly sustainable alternative: repurposing legacy NVIDIA Pascal GPUs for real-time 8K HEVC edge encoding. We demonstrate that triggering 2-Way Split Frame Encoding (SFE) on dual-NVENC GP104 and GP102 silicon successfully unlocks real-time 8K60 throughput with a negligible Rate-Distortion penalty of under 1%. Crucially, our micro-architectural analysis reveals that smaller GPU dies significantly outperform larger flagship models in both raw throughput and energy efficiency. Because fixed-function encoding forces general-purpose Streaming Multiprocessor (SM) cores to sustain maximum frequencies while remaining idle, GPUs with fewer CUDA cores waste drastically less power. While benchmarking against the state-of-the-art RTX PRO 6000 Blackwell highlights a generational compression efficiency gap, Pascal's functional HEVC architecture and native lack of B-frames align perfectly with ultra-low-latency V2X pipelines. Ultimately, repurposed mid-range Pascal GPUs present a highly capable, cost-effective, and e-waste mitigating solution for modern Intelligent Transportation Systems.
Learning-based dynamical models face a persistent tension between expressiveness and formal guarantees: richer model classes improve predictive accuracy, but their stability properties are typically verified only empirically, if at all. This paper proposes \emph{Stable Fiber-Koopman Residual Dynamics} (SFKD), a unified framework that simultaneously addresses environment-aware geometric consistency, latent-space stability certification, and bounded residual perturbation propagation. Concretely, SFKD constructs a fiber bundle latent manifold whose fibers encode environment-specific dynamics; an environment-conditioned Koopman operator governs the dominant linear evolution on each fiber; and a contraction-constrained residual neural network captures unmodeled nonlinear effects while admitting an explicit input-to-state stability (ISS) certificate. The resulting model is embedded in a sampling-based MPPI controller for autonomous vehicle path tracking under variable surface conditions and wind disturbances. Theoretical analysis establishes ISS of the latent dynamics and a finite ultimate bound on tracking error. Numerical experiments against five baselines -- Koopman MPC, Neural ODE, ICODE, ControlSynth, and ICODE-MPPI -- demonstrate a 31\% reduction in tracking RMSE, a 44\% improvement in control smoothness, and near-zero latent stability violation rate across environment-switching scenarios.
Vehicle-to-grid (V2G) technology empowers electric vehicles (EVs) to act as mobile energy resources, providing critical support to power systems, especially under stressed conditions. To understand the economic mechanism driving V2G participation and its benefits to the power grid, this paper proposes a multi-player coupled equilibrium framework that models the bidirectional interactions between power grid operations and EV routing, incorporating charging and discharging choice in a preprocessed feasible path generation procedure. Energy prices are endogenously determined by market clearance conditions. We formulate the overall problem as a Variational Inequality that unites the decision-making of the Distribution System Operator, Charging Network Operator, Load Serving Entities, and EV drivers. Numerical studies validate the framework under two stress scenarios: increased household load and power line outages. Results show that when EVs are incentivized by reduced generalized path costs, V2G is particularly effective in eliminating load shedding and reducing distribution locational marginal electricity prices. On the transportation side, V2G can lead to divergence in EV behavior between normal and scarcity conditions, and alter route choices yet improve overall trip economics.
Ocean exploration places high demands on autonomous underwater vehicles, especially when there is observation delay. We propose an age-of-information-optimized Markov decision process (AoI-MDP) to enhance underwater tasks by modeling observation delay as signal delay and including it in the state space. AoI-MDP also introduces wait time in the action space and integrates AoI with reward functions, optimizing information freshness and decision-making using reinforcement learning. Simulations show AoI-MDP outperforms the standard MDP, demonstrating superior performance, feasibility, and generalization in underwater tasks. To accelerate relevant research, we have made the code available as open-source at this https URL.
Coincident Peak (CP) pricing is widely used in U.S. electricity markets to allocate capacity and transmission costs. This paper develops a behavioral game-theoretic framework for CP-driven load shifting that couples a nonlinear cost-allocation model with day-ahead (one-shot) and real-time (sequential-learning) decision processes. We examine two update rules, namely best-response dynamics (BRD) and fictitious-play dynamics (FPD), across continuous and finite action spaces to quantify how flexibility, action resolution, and participation influence peak outcomes. Using ERCOT peak-day data, we find that FPD reliably reduces system peaks, whereas BRD is more variable and can increase peaks under tight-capacity conditions. Finer action resolution improves peak shaving, while the number of participants is largely neutral when aggregate flexibility is fixed. Meanwhile, information-provider signals can induce herding, whereas response-aware or diverse signals improve peak shaving. These results highlight both the potential and limits of CP pricing: smoothing information and enabling granular control are as important as the amount of available flexibility. The framework offers practical guidance for system operators and consumers: For ISOs, broadcasting smoothed CP signals and setting minimum controllable-capacity thresholds enhance coordination. For consumers, greater flexibility and finer control resolution improve both cost savings and peak-shaving performance.
As connected and autonomous driving technologies advance, vehicles increasingly rely on data from external sensors. Although this information can enhance state estimation, processing all available streams imposes significant communication and computational costs. To address this challenge, we introduce a Sensor Management Center (SMC) that selects a low-cost subset of external sensors in real time while satisfying chance-constrained error bounds derived from an Extended Kalman Filter (EKF) covariance. We formulate the selection problem as a multidimensional minimum knapsack problem and adopt a deficiency-weighted greedy algorithm as an approximate yet efficient solution. The proposed approach is validated through MATLAB simulations and experiments on a 1:15-scale cooperative driving testbed.
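The abstract does not give the greedy rule, so the sketch below shows one plausible reading of a deficiency-weighted greedy heuristic for a multidimensional minimum knapsack. Each candidate is scored by its useful contribution, weighted by how far each constraint still is from being met, per unit cost. The scoring formula, names, and numbers are all assumptions.

```python
def greedy_min_knapsack(costs, gains, demands):
    """Pick a low-cost item subset whose summed gains meet every demand.

    costs[j]: cost of item j; gains[j][i]: contribution of item j to dimension i;
    demands[i]: required total in dimension i. Returns chosen indices, or None
    if the demands cannot be met with the available items.
    """
    covered = [0.0] * len(demands)
    chosen = []
    remaining = set(range(len(costs)))
    while True:
        deficit = [max(0.0, b - c) for b, c in zip(demands, covered)]
        if all(d == 0.0 for d in deficit):
            return chosen
        best, best_score = None, 0.0
        for j in sorted(remaining):
            # Weight each dimension by its relative deficit d / b; cap the gain at the deficit.
            useful = sum((d / b) * min(g, d)
                         for d, b, g in zip(deficit, demands, gains[j]) if d > 0.0)
            score = useful / costs[j]
            if score > best_score:
                best, best_score = j, score
        if best is None:
            return None  # no remaining item reduces any deficit
        chosen.append(best)
        remaining.discard(best)
        covered = [c + g for c, g in zip(covered, gains[best])]

# Hypothetical sensors: cost, and contribution to two error-bound requirements.
costs = [6.0, 1.0, 2.0, 2.0]
gains = [[3.0, 3.0], [1.0, 0.0], [0.0, 2.0], [2.0, 1.0]]
selection = greedy_min_knapsack(costs, gains, demands=[3.0, 2.0])
total_cost = sum(costs[j] for j in selection)
```

In this toy instance the heuristic picks three cheap sensors (total cost 5) instead of the single expensive one (cost 6). Being greedy, it gives no optimality guarantee in general, which matches the abstract's description of an approximate yet efficient solution.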
Evaluating resilience in electric distribution systems under severe weather requires models that can connect network topology, hazard simulation, fragility modeling, restoration assumptions, repair strategy, and downstream consequences. This paper extends our prior graph-based resilience evaluation framework for power distribution systems in three ways: it adds analysis conditioned on historical events with real outage and weather data, introduces sensitivity studies for key modeling assumptions, and includes a coupled power-flooding extension for sewage-backup assessment. Historical wind events drive Monte Carlo simulations conditioned on real weather, and the observed outage trajectories are treated as realized historical samples for comparison. Wind-event resilience metrics stabilize at approximately 256 episodes, and outage peak, duration, and outage intensity change systematically with fragility parameters, network topology, restoration assumptions, and repair strategies. In a separate 1000-episode joint power-flooding simulation, at least one customer is flooded in 1.9% of episodes, and both flood occurrence and flood intensity increase with outage intensity, showing a selective power-to-flood consequence pathway. Overall, the framework provides a practical basis for resilience assessment, comparative scenario analysis, and coupled power-flooding studies in a limited public-data setting, while also suggesting that more detailed utility data could further improve simulation realism.
Learned image compression has achieved competitive rate-distortion performance, but very-low-bitrate reconstruction remains difficult because the transmitted representation often cannot preserve fine textures and local structures. Perceptual and generative codecs address this problem by using learned reconstruction priors, and controllable codecs allow one model to cover different bitrate and reconstruction preferences. However, controllability alone does not resolve the decoder-side reconstruction-prior problem: under severe bit constraints, the decoder must infer missing details from limited transmitted information, while existing codebook-based controllable designs generally rely on single-codebook token-based priors. This paper proposes Adaptive Fused Prior Transfer for Controllable Generative Image Compression (AFP-GIC), a controllable codec that transfers an adaptive fused prior from a frozen pretrained AdaCode model. Encoder-side fused-prior features guide latent formation, while the decoder predicts a compatible fused prior from the compressed representation and selected control variables, enabling prior-guided reconstruction without transmitting the fused prior itself. A motivating analysis relates decoder-side fused-prior alignment to a reconstruction-error upper bound and shows that the fused-prior family contains single-codebook choices as special cases. Under the unified benchmark, AFP-GIC reduces decoder latency by 18.1% and the overall parameter count by 31.10 million (20.5%) relative to DC-VIC. Experiments on Kodak, CLIC2020, and DIV2K show competitive PSNR, with the clearest perceptual gains in NIQE scores and very-low-bitrate visual comparisons.
Orthogonal frequency division multiplexing (OFDM) is a key waveform for integrated sensing and communication (ISAC) due to its spectral efficiency and compatibility with modern wireless standards. In multi-target and clutter-rich environments, however, payload-based OFDM-ISAC can suffer from data-dependent sidelobes induced by non-constant-modulus modulation symbols. To overcome this limitation, this paper proposes a region-of-interest mismatched filter (ROI-MMF) that suppresses sidelobes within a prescribed delay region while preserving the mainlobe response. By leveraging the Woodbury identity, the proposed design admits an efficient closed-form implementation whose complexity scales with the ROI size rather than the number of subcarriers. We derive the ranging mean-square error (MSE) of the proposed ROI-MMF analytically and show that it outperforms conventional matched-filtering (MF) and reciprocal-filtering (RF) sensing receivers. Simulations across various constellations show that the proposed sensing receiver achieves a ranging MSE approaching the Cramér-Rao bound (CRB), confirming that the design preserves target ranging performance even with non-constant-modulus constellations. Finally, the framework is experimentally validated on our over-the-air OFDM-ISAC testbed.
This paper provides a thorough mathematical analysis of continuous movable antenna (MA) arrays. Focusing on the multiple antenna case, we consider a linear antenna array with multiple fixed antenna elements that moves along a line. We assume a full, spatially coherent correlation model and continuous positioning of the array. We provide asymptotically exact approximations to the upper tail of the cumulative distribution function (cdf) of the signal-to-noise ratio (SNR), considering both correlated and uncorrelated antenna elements in the array. We also obtain a novel closed-form expression for the level crossing rate (LCR) of the SNR under correlated array elements, where a non-separable two-dimensional correlation is present. The analysis is validated through simulations, confirming both the accuracy of the LCR expressions and the tightness of the cdf bounds in the upper tail. Numerical results show that the proposed MA array outperforms single fluid antenna and fixed array systems, with reduced inter-element spacing providing further performance gains.
We analyze multi-user fluid antenna systems with continuous positioning over a track of length L under a spatial correlation model, where exact performance distributions become analytically intractable. We develop a level-crossing-rate (LCR) framework that yields asymptotically exact approximations and tight bounds for the cumulative distribution function (cdf) of the optimized metric S* = sup_{0 <= l <= L} S(l), where S(l) denotes the performance metric at antenna position l. For a single fluid antenna, we characterize the cdfs of the signal-to-noise ratio (SNR), signal-to-interference ratio (SIR), and signal-to-interference-plus-noise ratio (SINR) under Rayleigh fading and extend the approach to Ricean desired channels. We further treat two multi-antenna receiver layouts with maximum-ratio combining: (i) a fluid antenna with a fixed antenna and (ii) a two-element moving array, deriving new LCR results for the practically important case where array-element correlation and positional correlation are inherently coupled. The analysis provides actionable insights: high-threshold tail probabilities scale linearly with L, we derive the required L to neutralize a co-channel interferer, and we show that about one wavelength of movement can reduce outage by three orders of magnitude. Monte Carlo results validate the accuracy across the considered scenarios and regimes.
Integrated Sensing And Communication (ISAC) is recognized as a key enabler for future 6th Generation (6G) networks, combining communication capabilities with pervasive sensing. In such systems, the estimation of the Doppler shift plays a crucial role for target characterization. However, typical real-world ISAC scenarios largely involve bistatic or multistatic configurations and mobile ISAC nodes. Under these conditions, Doppler estimation becomes particularly challenging, as clock asynchrony between the Transmitter (TX) and the Receivers (RXs), combined with their mobility, introduces additional Doppler components and phase offsets that distort or disrupt the target-induced frequency shift. Existing works have considered these challenges separately or relied on external reference reflectors. In this paper, we present the first method to estimate the Doppler frequency of a target with mobile and asynchronous ISAC nodes in a multistatic configuration, considering the case of a mobile TX and multiple static RXs, and without leveraging any external reflector. By leveraging the invariance of the phase offsets across multipath components and exploiting geometrical relationships, we show that the problem is solvable if at least 4 RXs are present. We evaluate the proposed solution through numerical simulations in various scenarios, showing that it is a valid approach for estimating target Doppler shifts in unsynchronized multistatic ISAC deployments with mobile nodes.
A hard real-time system cannot miss any deadline. A weakly-hard real-time system, on the contrary, is designed to tolerate a specific number of deadline misses. For instance, the AnyMiss(2, 300) weakly-hard constraint stipulates that in every window of 300 consecutive jobs, at most 2 deadlines are missed. The weakly-hard model is the state-of-the-art for industrial dependability-by-design of control systems that tolerate deterministic failures. Weakly-hard constraints correspond to regular languages. The size of the minimal finite state machine that recognizes whether a string satisfies the constraint (about 45k states for AnyMiss(2, 300)) is a notorious impediment for the verification of control system properties. This paper discusses an over-approximation of the language that allows us to provide sound safety guarantees for control systems under deadline misses that would be out of reach using the minimal finite state machine. We present a compressed language acceptor and prove that it simulates the original finite state machine. We study language cardinality properties, and report on empirical results that show how the new acceptor can be embedded in the control design workflow, leading to verifying safety for systems for which the state-of-the-art tools do not provide answers.
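As a rough illustration of the constraint itself (not of the compressed acceptor the paper proposes), the sketch below checks AnyMiss(m, K) on a finite sequence of job outcomes with a sliding window. The function names are hypothetical. The second helper counts the states of one naive acceptor encoding, which is an assumption made here: it remembers which of the last K-1 jobs were misses, allowing at most m of them. For AnyMiss(2, 300) that count is of the same order as the roughly 45k states quoted above.

```python
from collections import deque
from math import comb


def satisfies_any_miss(outcomes, max_misses, window):
    """Check that every `window` consecutive jobs contain at most `max_misses` misses.

    outcomes: iterable of booleans, True = deadline met, False = deadline missed.
    Prefixes shorter than `window` are also checked, since any full window
    containing such a prefix would inherit its misses.
    """
    recent = deque()   # outcomes of the last `window` jobs
    misses = 0         # number of misses currently inside `recent`
    for met in outcomes:
        recent.append(met)
        if not met:
            misses += 1
        if len(recent) > window:
            # Slide the window: forget the oldest job.
            if not recent.popleft():
                misses -= 1
        if misses > max_misses:
            return False
    return True


def naive_state_count(max_misses, window):
    """States of a naive acceptor that tracks miss positions in the last window-1 jobs."""
    return sum(comb(window - 1, i) for i in range(max_misses + 1))


# Two early misses followed by a third miss just outside the same window: accepted.
ok_trace = [False, False] + [True] * 298 + [False]
print(satisfies_any_miss(ok_trace, 2, 300))   # True
print(naive_state_count(2, 300))              # 44851
```

The sliding-window check runs in linear time on a single trace. The difficulty the abstract refers to arises when the acceptor must be composed with a control-system model during verification, where the state count matters.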
Continuous autoregressive speech synthesis has recently emerged as a promising direction for zero-shot text-to-speech (TTS). However, existing methods still suffer from a fundamental mismatch between semantic-prosodic modeling and reconstruction-driven continuous speech representations. This mismatch causes TTS models to focus excessively on low-level acoustic textures at the expense of high-level semantic coherence, further exacerbating error accumulation in autoregressive generation. To address this challenge, we propose SemaVoice, a semantic-aware continuous autoregressive framework for high-fidelity zero-shot TTS. SemaVoice introduces a Speech Foundation Model (SFM)-guided alignment mechanism that refines continuous speech representations to better capture both local semantic consistency and global structural relationships. These representations condition a patch-wise diffusion head within the autoregressive framework for high-quality speech synthesis. Experimental results on the Seed-TTS benchmark show that SemaVoice achieves an English WER of 1.71% and remains highly competitive with state-of-the-art open-source systems in both objective and subjective evaluations. The effectiveness of SFM-guided alignment is further confirmed by significant improvements under varying representation granularities with a fixed information-rate constraint.
Reconfigurable intelligent surface detection and identification (RIS-ID) is a critical process that enables a base station (BS) to adaptively assign the appropriate RIS to a given user equipment (UE). This work proposes a novel modulation scheme that enhances the reliability of RIS-ID by reducing the miss-detection and false-alarm probabilities. Specifically, we leverage the RIS's passive beamforming gain to enable over-the-air modulation of the RIS ID, combined with passive beam sweeping to extend detection coverage in angular space. The proposed modulation scheme is validated through computer simulations and prototype experiments, demonstrating its effectiveness in reducing miss-detection and false-alarm probabilities.
Recent advances in Time Series Foundation Models (TSFMs) promise zero-shot forecasting capabilities with minimal task-specific training. While these models have shown strong performance across generic benchmarks, their applicability in volatile, complex electricity markets remains underexplored. Addressing this gap, this study provides a systematic empirical evaluation of several TSFMs, specifically Chronos-2 and Chronos-Bolt (developed by Amazon), and TimesFM 2.5 (provided by Google), for forecasting Belgian day-ahead and imbalance electricity prices. For both considered markets, Chronos-2 in ARX mode produces the most accurate forecasts. Compared with the best ensemble prediction from other machine learning methods, Chronos-2's Mean Absolute Error (MAE) is 5% lower for the day-ahead market. In contrast, the model yields 10% higher MAE predicting imbalance prices across all forecast horizons, except for the two-hour-ahead horizon. Moreover, we find that TSFMs exhibit genuine zero-shot forecasting skills but still struggle under extreme market conditions.
Off-grid microgrids powered entirely by renewable energy sources face substantial challenges in achieving utility-grade reliability standards. Existing microgrid planning frameworks often prioritize cost minimization while treating reliability as a secondary metric, thereby leading to suboptimal designs. This paper presents a comprehensive scenario-based optimization framework that simultaneously addresses long-term capacity planning and short-term operational dispatch in two stages for 100%-renewable microgrids. The developed two-stage stochastic programming model co-optimizes the investment and operation of photovoltaic generation and battery energy storage, while ensuring compliance with stringent reliability constraints following utility grid standards. Network modeling with operational constraints, such as line capacities and voltage limits, is incorporated to allow distributed resource placement leveraging power sharing between microgrid nodes. A novel scenario generation approach captures critical uncertainties, including seasonal demand fluctuations, solar output variations, and probabilistic equipment failures, through the statistical clustering of historical data. The optimization framework integrates utility-grade reliability constraints limiting the expected energy not served to below 0.002% of the annual demand while minimizing the total system costs. Numerical simulations demonstrate the effectiveness of the proposed framework, achieving 99.998% supply reliability using only photovoltaic power and battery energy storage. The optimized network-aware distributed resource allocation provides inherent resilience through power rerouting during component outages, maintaining load continuity even under simultaneous equipment failures. This study confirms the feasibility of 100%-renewable microgrids to support remote communities while meeting utility-grade reliability benchmarks.
Renewable electricity generation has grown significantly across many European power systems, leading to a greener energy mix, but also additional complexity in balancing electricity supply and demand. Unexpected differences between forecasts and actual output can lead to fluctuations in the system imbalance, which causes volatile imbalance prices. Accurate imbalance price forecasts are crucial for market players to choose a strategic balancing position. In early works, most forecasting methods combined fundamental and statistical approaches, but currently there is a clear trend towards data-driven machine learning models. This review compares forecasting algorithms in European markets with a focus on methodology. We emphasize the importance of high-quality input data, including intraday information and per-minute system data. Next, we identify the need for a common benchmark to compare novel forecasting methods developed for different markets and time periods. Finally, we argue that forecasts should be evaluated in terms of both downstream value and accuracy.
Bluetooth Core Specification v6.0 introduces Channel Sounding (CS) as a standardized high-accuracy ranging primitive for Bluetooth Low Energy (BLE). However, standard CS usage remains tied to per-pair LE asynchronous connection logical transport (LE ACL) connections, which adds initiation overhead, limits concurrent partners, and transfers results over the connection itself. We present a connectionless CS architecture that combines the LE CS Test command with Periodic Advertising with Responses (PAwR). A Central Orchestrator, a Gateway, and synchronized Tag/Anchor devices coordinate measurement configurations and aggregate results at the application layer. Each device derives its role, channel sequence, and response slot assignment from its device index and a Peer-to-Peer Assignment Matrix distributed via PAwR. The deterministic channel sequence prevents same-step collisions across parallel CS procedures, while matrix updates reconfigure arbitrary device-to-device pairings within a PAwR subevent group. A compact data plane omits fields recoverable from the shared measurement configuration and reduces the serialized ranging-data payload by approximately 69%, enabling result reporting through PAwR response slots. A proof-of-concept evaluation on the Nordic nRF54L15 platform shows that deterministic channel management eliminates the collision-induced outliers observed under simulated dense-deployment channel overlaps. At a 1 s update cycle, the architecture reduces steady-state active charge by 40-48% relative to a fair connected baseline and cuts per-switch initiation overhead by approximately 98%. Under per-cycle partner switching, these effects combine to up to 88% lower total charge over a 24 h horizon. An empirical timing model projects a capacity upper bound of 16,384 active devices per PAwR train at four CS procedures per device per cycle, 37 channels, and a single antenna path.
This paper develops a fault detection and identification (FDI) method for nonlinear control-affine systems under simultaneous actuator and sensor faults. We adopt a geometric approach to study the isolability of faults in the sense of the principal angles between subspaces corresponding to each actuator and sensor fault. As for the fault identification, a hybrid estimator that consists of a Luenberger-like observer with contraction guarantees is developed. Moreover, neural networks are embedded in the mentioned observer to estimate actuator and sensor faults. Considering that the training dataset for neural networks cannot be representative of every fault scenario, the last layer of each network is adapted using mirror descent-based laws. The mirror descent-based adaptive laws impose isolability conditions for fault channels and do not assume a quadratic parameter estimation space to consider the geometry of the fault subspaces. A Lyapunov-based analysis establishes that the state and parameter estimation errors are uniformly ultimately bounded. The effectiveness of our proposed FDI method is illustrated on the 3-axis attitude control system of a spacecraft.
Second-generation Starlink Direct-to-Cell (DTC) satellites carry an additional payload for direct cellular phone connectivity whose unintended electromagnetic radiation (UEMR) at sub-300 MHz frequencies has not been individually characterised. We reanalyse 112,534 detections from 1,806 Starlink satellites observed with the Engineering Development Array version 2 (EDA2) at 21 frequencies between 72.685 and 234.375 MHz (Grigg et al. 2025), separating 175 DTC and 1,623 Ku-only v2-Mini comparison satellites via the McDowell General Catalogue (McDowell 2020). DTC satellites emit a range-corrected flux density 1.45x that of the Ku-only comparison (Cliff's delta = +0.30, p = 2.6e-11). At 230.469 MHz the XX detection fraction reaches 0.811 against a 0.481 baseline (p ~ 1e-274), and 11 of 21 frequency channels show Benjamini-Hochberg-significant polarisation anomalies. The DTC population is brighter in eclipse than in sunlight (illuminated/eclipsed flux density ratio 0.47) while the Ku-only comparison shows the opposite sense (1.18); the reversal persists across altitude, sub-satellite latitude, frequency, and launch-epoch matching. The reversal strongly disfavours UEMR mechanisms that scale monotonically with instantaneous solar photocurrent and favours an active on-board source whose effective duty cycle is larger at lower equilibrium temperature. Within the 230.469 MHz coarse channel, fine-channel inspection isolates the excess to a single ~24 kHz bin near 230.627 MHz, tail-driven and absent at five control channels. Three falsifiable mechanism-discrimination tests show this feature is not coincident with the LOFAR-resolved Bassa et al. (2024) clock fundamentals, is unresolved at the EDA2 24 kHz resolution, and is heterogeneously expressed across the v2-Mini fleet rather than driven by a few permanently bright units or by uniform thermal scaling.
Low-dose CT (LDCT) denoising remains an important yet challenging problem in medical imaging. Although recent learning-based methods have shown promising performance, those optimized using classical pixel-level objectives often produce over-smoothed reconstructions. Existing mainstream generative models, such as diffusion models, have improved fidelity at the cost of expensive multi-step iterative inference, which limits their practicality for real-time use. To address this gap, we propose a Residual-Driven Drifting Model (RDDM) for effective, efficient, and high-fidelity LDCT denoising. Inspired by the recently proposed Drifting Models, RDDM incorporates the multi-step distribution evolution into the training dynamics through a residual drifting field, thereby enabling one-step denoising. Specifically, the residual drifting field is formed by an attractive force induced by the residuals between LDCT and normal-dose CT (NDCT) and a repulsive force induced by the generated residuals. In addition, by adjusting the parameter settings and incorporating pixel-level supervision, we develop three RDDM variants, covering application needs from detail preservation to stronger noise suppression. Extensive experiments demonstrate that RDDM achieves state-of-the-art denoising performance among supervised baselines. In particular, RDDM-Fine produces reconstructions that are highly consistent with NDCT, achieving superior PSNR and SSIM together with the best FID of 5.87 while preserving realistic anatomical textures. Moreover, RDDM enables on-the-fly inference, requiring only about 15 ms to denoise a single 512 x 512 LDCT slice. These results establish RDDM as a promising solution for high-fidelity and real-time LDCT denoising in clinical applications.
In 2024, Texas operators observed 23-Hz oscillations in real power measurements close to a large electronic load (LEL). The oscillations emerged when the load's power consumption reached approximately 320 MW and subsided as the active power demand decreased. This paper analyzes the event and reproduces the oscillations using electromagnetic transient (EMT) simulations. In the first stage, a representative feedback system is developed, and frequency-domain analysis is conducted to examine the phenomenon and identify its key influencing factors. Next, detailed EMT simulations are performed to further validate the proposed analytical approach. The results show that the feedback system effectively captures and characterizes the critical features of the 23-Hz oscillation incident. In addition, the EMT simulations successfully reproduce the real-world event, with the simulated results closely matching the fault recorder data.
Grid-forming (GFM) converters are generally expected to exhibit low impedance near the fundamental frequency due to their voltage-source behavior. However, an impedance peak and a negative-resistance region are consistently observed in this range, which contradicts this expectation and lacks a clear physical explanation. This paper reveals that these phenomena originate from the inherent dynamics of the active power control loop, where the mapping from power disturbance to the synchronous angle inherently involves an integrative action, intrinsically preventing a positive-resistance characteristic near the fundamental frequency. This finding explains why existing grid codes in China, the United States, and Europe exclude a narrow band around the fundamental frequency in impedance-based evaluations. It is further shown that the width of the excluded frequency band (e.g., +/- 3~5 Hz) is governed by the power-to-frequency dynamics. Based on this insight, a quantitative index is proposed to determine the exclusion bandwidth from the corner frequencies of the impedance magnitude curve. The proposed index provides a concise and theoretically grounded criterion for voltage-source assessment and impedance standardization of GFM converters.
Robust selective auditory attention under multilingual interference is critical for reliable deployment of Large Audio Language Models (LALMs). We introduce MUSA, a cocktail party-inspired multilingual benchmark for source-grounded spoken-language understanding and reasoning. Each item pairs an English target dialogue with a semantically plausible distractor in English, Spanish, Korean, or Chinese, and evaluates models across (1) single, (2) source separation-based two-stage, and (3) end-to-end cocktail party settings under controlled SNRs. Evaluating two closed-source and four open-weight LALMs, we find that strong single performance does not ensure robust selective auditory attention: cocktail party accuracy degrades under severe SNRs, and errors are dominated by distractor-grounded source confusion. In addition, separation reduces acoustic overlap but leaves source attribution unresolved, often yielding confident wrong-stream answers. Data and code will be released upon publication.
Control science is a core representative of the third industrial revolution and is vital to modern civilization. Control systems are the main subject of control science and involve many considerations: hardware, software, operation, maintenance, economic, and societal. Beyond all of these, however, the consideration most essential to a control system is methodology in the mathematical sense, and knowledge of this is what we refer to as control theory. Besides its importance from the mathematical perspective, control theory is all the more appealing because it is deeply rooted in practical applications. Its charm lies in both know-why and know-how, and it is the fusion of control theory with practical applications that brings this charm out. Control theory for practical applications, especially with a so-called "advanced" flavour, involves several fundamental aspects. This article introduces the Handling Control System Uncertainty aspect of Advanced Control Theory for Practical Applications.
This work introduces a latency-aware benchmarking framework for evaluating deep learning models in power system anomaly detection using high-fidelity, time-domain signals generated from an industry-grade electromagnetic transient simulator. Eight neural network architectures, ranging from MLPs to Transformers, were systematically evaluated on streaming datasets representing both physical faults and cyber-attacks in inverter-dominated networks. All models successfully classified two representative multi-event sequences in real time with sub-cycle response times below 15 ms. However, although classification decisions occurred within one cycle, the end-to-end inference latency consistently exceeded three cycles, ranging from 50 to 90 ms. These results highlight a critical gap between algorithmic capability and protection-grade deployment, pointing to the need for further optimization and hardware acceleration. The findings establish a reproducible benchmark for sub-cycle anomaly detection and provide guidance for transitioning machine learning methods from research prototypes to real-world protection applications.
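The cycle figures above can be sanity-checked with a short conversion. The abstract does not state the nominal grid frequency; the sketch assumes 60 Hz, which is the value under which 50 ms corresponds to exactly three cycles and 15 ms is below one cycle. The helper name `latency_in_cycles` is hypothetical.

```python
def latency_in_cycles(latency_ms, grid_hz=60.0):
    """Convert a latency in milliseconds to power-system cycles.

    grid_hz is an assumed nominal frequency (60 Hz), not a value from the abstract.
    """
    return latency_ms * grid_hz / 1000.0


# Reported classification response bound and end-to-end latency range.
for ms in (15.0, 50.0, 90.0):
    print(f"{ms:.0f} ms = {latency_in_cycles(ms):.2f} cycles")
```

Under this assumption the 15 ms response time is about 0.9 cycles, while the 50 to 90 ms end-to-end latency spans roughly 3 to 5.4 cycles.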
Clustered cell-free networking paves a new way for enabling scalable joint transmission among access points (APs) by partitioning the whole network into non-overlapping subnetworks. Previous works adopted clustering algorithms, graph partitioning methods, or conventional continuous optimization theories to partition a network based on the channels between all users and all APs, resulting in huge channel measurement and computational costs. This makes these methods difficult to implement in practical systems, since the optimal network partition can vary frequently due to user mobility. In addition, existing methods were usually designed for specific clustered cell-free networking problems, each with a different optimization algorithm. In this paper, we leverage deep reinforcement learning (DRL) for clustered cell-free networking so as to rapidly adapt to user movements in dynamic environments, and propose a deep deterministic policy gradient based clustered cell-free networking (DDPG-C$^{2}$F) framework that can be adapted to various application scenarios. Moreover, in our framework, only one single channel needs to be estimated at each AP as the input of the neural network, which greatly reduces the channel measurement costs for clustered cell-free networking as well as the training and inference costs of our framework. The proposed DDPG-C$^{2}$F framework is then applied to various clustered cell-free networking problems with different objectives and constraints to demonstrate its performance. Simulation results show that our framework outperforms existing baselines in all scenarios. Moreover, we show that the proposed framework can reduce the handover cost under user mobility and is robust to dynamic scenarios with users randomly joining or leaving.
We introduce UPSim (UxNB Propagation Simulator), a ray tracing-calibrated, semi-deterministic solution for spatially consistent FR3 air-to-ground propagation modeling in uncrewed aerial vehicle (UAV) networks. Instead of launching rays for every receiver position, UPSim derives deterministic visibility regions from 3D building geometry via shadow projection. It then augments these regions with line-of-sight (LOS) state-specific and altitude-aware path loss, correlated large-scale fading, and small-scale fading. Calibration and validation against FR3 ray tracing data using the global 3D-GloBFP building dataset demonstrate that UPSim accurately reproduces empirical channel distributions. Furthermore, the resulting maps support route-based analysis of channel evolution over complex urban layouts, exposing critical trajectory-level statistics such as outage distances. Consequently, UPSim offers a highly scalable, practical middle ground between computationally expensive full ray tracing and purely stochastic channel generation for mobility-aware planning and radio-map construction in aerial access scenarios.
Spatially selective active noise control (SSANC) hearables aim to attenuate noise from certain directions at the eardrum while preserving desired speech arriving from selected directions. Existing SSANC systems typically assume an accurate estimate of the secondary path from the loudspeaker to the inner error microphone. In practice, however, this path varies across users and device fits, which can degrade performance and compromise system stability. This paper proposes a robust soft-constrained optimization framework that computes a single control filter by minimizing the average cost over a set of secondary path estimates derived from human measurements. Simulations and experiments on a real-time control platform show that the proposed approach slightly reduces mean performance relative to the matched case but substantially narrows the performance spread under secondary path mismatch. The proposed framework therefore provides a practical design strategy when accurate secondary path estimates are unavailable.
High-fidelity text-to-music generation typically relies on massive proprietary datasets and immense computational resources. Existing models often struggle to generate coherent pure musical accompaniments and lack precise, localized semantic control due to their reliance on coarse, track-level annotations. To address these limitations under constrained data and computing resources, we propose S2Accompanist, a Semantic-Aware and Structure-Guided Diffusion Model developed for the ICME2026 ATTM Grand Challenge. Specifically, we design an automated data pipeline comprising structural segmentation, Large Audio-Language Model driven segment-level captioning, and dual-metric quality grading to overcome the absence of localized metadata in raw datasets. Furthermore, we propose a semantic-aware Variational Autoencoder fine-tuning strategy that explicitly distills foundational LeadSheet structures into the acoustic latent space, effectively improving the overall audio fidelity. Extensive experiments demonstrate that S2Accompanist achieves state-of-the-art objective performance on the ATTM Grand Challenge benchmark across both the Efficiency and Performance Tracks. With only 402M parameters, our model remains competitive compared to larger-scale unconstrained models and secured first place in the Efficiency Track.
Finding sound effects or environmental sounds that match a creator's intended impression remains a largely manual process in multimedia production. This is especially relevant for comics and other visual media, where visually stylized onomatopoeic expressions convey auditory impressions through letter shapes, strokes, layouts, and decorative patterns. However, cross-modal retrieval between onomatopoeic images and general sounds has been largely unexplored. This paper thus introduces a bidirectional retrieval framework between onomatopoeic images and the corresponding sound clips. Instead of directly comparing embeddings extracted from pretrained image and audio encoders, we train modality-specific projection heads that re-align the embeddings for visual onomatopoeia and corresponding sounds. We then construct the Multimodal Image-Audio Onomatopoeia dataset (MIAO), which contains paired onomatopoeic images and sound clips across 50 sound event classes. Experimental results show that the proposed method substantially outperforms a zero-shot baseline using pretrained CLIP and CLAP embeddings. These results demonstrate that adapting pretrained representations enables effective retrieval in both directions: from onomatopoeic images to sounds and from sounds to onomatopoeic images.
Weakly labeled datasets such as AudioSet have driven recent progress in audio tagging. However, annotation quality varies across sound classes. Labels may be incomplete, ambiguous, or unreliable, which introduces class-dependent supervision bias during optimisation. The issue becomes harder as real and generated audio are increasingly mixed in training, and generated samples do not always match their intended semantic labels. Prior work mainly addressed unreliable supervision from missing-positive labels, while this paper targets three other sources of unreliable supervision: spurious additions, misassignments between similar classes, and weakened label evidence. These effects introduce class-dependent optimisation bias that is not explicitly modeled by most existing methods. To bridge this gap, the paper proposes a Class-wise Supervision Unreliability (CSU) framework that controls supervision strength at the class level during training. CSU learns a separate unreliability parameter for each class and down-weights less reliable supervision without changing the model architecture or inference process. To support evaluations, this paper also introduces ESC-FreeGen50, a manually verified benchmark of 50 sound classes that combines real and generated audio. Experiments on controlled benchmarks and AudioSet show that CSU improves robustness across different architectures and different sources of supervision unreliability. The results indicate that explicit class-wise modeling of supervision unreliability is an effective and practical strategy for robust audio tagging under large-scale weakly labeled training. Code and data are available at: this https URL
We experimentally evaluate the sensing-communication trade-off of a fixed-point-precision MIMO equalizer implemented on an FPGA. At 7-bit precision, the noise floor drops by 100x and the angular error by 63%, but communication performance saturates while hardware complexity rises.
This letter proposes a distributed 3D leader-follower formation (3D-LFF) control framework for multi-UAV systems that achieves formation tracking while enforcing perception safety constraints. Maintaining safe, vision-based 3D-LFF is challenging because onboard cameras impose strict Field-of-View (FOV) limitations, and demanding formation commands can drive the leader outside the follower's camera frustum, resulting in loss of visibility. To address this issue, we develop a perception-aware safe control architecture that guarantees visibility by construction. First, we derive a relative kinematic model in a line-of-sight coordinate representation and design a distributed 3D-LFF tracking controller using only locally available relative states. Next, we embed the nominal formation controller within a Control Barrier Function-based Quadratic Program (CBF-QP) safety filter that minimally modifies the commanded velocities to maintain the leader inside the follower's camera frustum while preserving formation tracking whenever feasible. Gazebo simulations and Crazyflie hardware experiments validate the proposed approach, demonstrating accurate formation tracking and effective FOV enforcement, including scenarios in which the nominal desired formation conflicts with visibility constraints.
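The minimal-modification behaviour of a CBF-QP safety filter can be illustrated with a small sketch. It assumes a single affine constraint a·u + b >= 0 on the commanded velocity u, in which case the QP min ||u - u_nom||^2 reduces to a projection onto a half-space with a closed-form solution. The function name `cbf_qp_filter` and the symbols `a` and `b` are hypothetical; the letter's actual constraints are derived from the camera frustum and are not reproduced here.

```python
def cbf_qp_filter(u_nom, a, b):
    """Closed-form solution of: min ||u - u_nom||^2  s.t.  a.u + b >= 0.

    Assumption: one affine CBF constraint, so the QP is a half-space projection.
    """
    # Constraint value at the nominal command; non-negative means already safe.
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    if slack >= 0.0:
        return list(u_nom)  # nominal command passes through unchanged
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("infeasible constraint: a is zero and b < 0")
    # Smallest correction along a that brings the constraint to equality.
    scale = -slack / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    # Hypothetical constraint u_x <= 0.5, written as -u_x + 0.5 >= 0.
    print(cbf_qp_filter([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], 0.5))
```

When the nominal command already satisfies the constraint it is returned untouched, which mirrors the idea of preserving formation tracking whenever feasible.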
This letter investigates the problem of output synchronisation in heterogeneous dynamical networks with nonlinear diffusive couplings in the presence of disturbances on the coupling links. By exploiting relative dissipativity properties between adjacent agents, distributed conditions are established to guarantee output synchronisation. Specifically, these conditions can be verified using only local information associated with neighbouring agents and coupling links. As an illustration, a heterogeneous network of Goodwin oscillators is considered, where the relative dissipativity properties between neighbouring oscillators are characterised and used to analyse synchronisation.
Distributed controller synthesis offers scalable and privacy-preserving control design, but typical state-of-the-art approaches either assume white-box models or resort to centralized synthesis. In this paper, we combine partially known model knowledge and an input-state dataset within a distributed gray-box scheme to design \(\mathcal{H}_2\) controllers. Our method can handle unknown dynamics and offers scalable synthesis. Each agent communicates with a set of neighbors determined by the physical coupling topology of the system such that we can apply the Alternating Direction Method of Multipliers (ADMM) to solve the problem iteratively in a fully distributed fashion (i.e., without a central server). The effectiveness and flexibility of the proposed approach are demonstrated in simulations of the IEEE 39-bus power system test case.
The sixth generation (6G) of mobile communications and beyond is expected to enable advanced functionalities, such as integrated sensing and communication (ISAC), while involving diverse terminal/user equipment types from terrestrial to non-terrestrial networks. As waveforms are acknowledged as a fundamental technology driving 6G and beyond, this article presents a contribution in this technical domain. First, it provides an overview of several standardized communication waveforms, as well as chirp-based waveforms for radar sensing and Internet of Things (IoT) applications. This article then presents a single-carrier chirping waveform: discrete Fourier transform spread orthogonal frequency division multiplexing (DFT-s-OFDM) with chirping. Its fundamental principles, key properties, performance, and advantages are examined from both communication and sensing perspectives. Finally, several future research directions are outlined to further explore its potential and opportunities for ISAC.
This work establishes a framework of near-field communication under different array geometries of extremely large-scale multiple-input multiple-output (XL-MIMO). We first formulate the near-field spatial non-stationary channel model which is characterized by the distance between the user and each antenna on uniform and modular curved arrays. By fixing the total number of antennas while varying the degree of curvature, we investigate a fair case where the horizontal arc length of the curved array is the same as the planar array. We explicitly unveil the non-trivial impact of array curvature on extending the near-field region for cell edges. Then, for arbitrary array geometries and arbitrary-field channels, we estimate the spatial-domain channel by tackling a compressed sensing problem with a learned regularizer. Without relying on specific codebooks, we propose a denoising autoencoder (AE)-aided approximated message passing (AMP) algorithm and provide the corresponding theoretical replica bound. Finally, based on the estimated channel, we propose an optimization algorithm to maximize the sum user rate for sub-connected XL-MIMO systems by jointly designing the array geometry and hybrid precoding in the downlink. Numerical results demonstrate that the proposed AE-AMP algorithm can effectively estimate the spatial non-stationary near-field channels with robustness and generalities compared to several conventional and deep-learning-based benchmarks. The improvement of data rate by using modular curved arrays with the estimated channel is also validated.
Edge inference systems are typically evaluated with software-reported latency collected under controlled conditions. We argue, and demonstrate empirically, that deployment interference can corrupt not only the inference timing being measured but the timing observability infrastructure that measures it, and that the two failures can occur independently. We pair software-reported timing with externally observable GPIO intervals captured by a Saleae Logic Pro 8 logic analyzer on an NVIDIA Jetson Orin Nano, running MobileNetV2 under two inference architectures (TensorRT FP16 GPU and ONNX Runtime CPU) across baseline, light memory pressure, and storage writeback stress. Across 35 paired capture runs (3500 samples) plus 3 storage-stress runs where external pairing failed (300 software-only samples), we observe three findings the software-only view does not surface. (1) The two architectures differ not only in mean latency but in distributional structure: TensorRT baseline clusters tightly near 1.23 ms (run-mean SD 15 us) while ORT CPU baseline is multimodal with run-mean SD 31.8 ms. (2) Light memory pressure inflates TensorRT P99 from 1.28 ms to 1.61 ms, while one of five ORT memory-stress runs collapses into a deterministic 198 ms regime rather than uniformly inflating variance. (3) All three TensorRT storage-stress runs produce complete software timing logs (100/100 iterations) alongside externally observable timing failures of three different kinds (full post-marker collapse, ~40% transition loss, and complete acquisition failure) -- while the runtime reports normal completion in every case. We claim, narrowly, that timing observability is itself an interference-sensitive resource, and that summary statistics from a single timing source can hide failure modes an independent external observer makes visible.
We address the conditions and design of controllers and observers for homogeneous networks of linear MIMO agents. We develop networked controllers and observers that ensure the stability of both the system state and the estimation error, leveraging the concept of generalized frequency variables. A separation principle for networks is then established, showing that the observer and controller can be designed independently and combined to achieve a stable output feedback. Our results are illustrated via a highly unstable, oscillatory network of locally actuated pendulums on carts. Finally, necessary conditions for controllability and observability -- derived from agent properties and network structure -- are established and discussed.
Residential batteries increasingly serve two roles: they can earn money by arbitraging wholesale prices and providing grid services, and they provide backup power during outages. This dual use creates a basic tradeoff between earning market value and preserving outage readiness. Coordination across many batteries can help, but a provider cannot treat the fleet as a single virtual battery when each household is promised its own backup protection. We compare standalone control, in which each home is dispatched independently, with pooling, in which homes are coordinated while each battery retains its own state of charge and household-specific backup requirement. Both regimes are implemented as model predictive control problems with 15-minute decision intervals and evaluated using household telemetry together with ERCOT market inputs. The empirical design focuses on the 543 homes in our sample that can support at least one backup product in standalone operation and studies backup caps ranging from 2 to 24 hours. Lower caps relax backup obligations, while the 24-hour cap coincides with assigning each home its own longest feasible backup tier. Pooling remains beneficial in this service-constrained setting, but its value declines smoothly as backup obligations tighten. Standalone firm margin ranges from \$11.06 per home per week at the 2-hour cap to \$10.79 at the 24-hour cap, while pooling benefit falls from \$1.49 to \$1.27 per home per week. Relative to standalone firm margin, pooling is worth about 13.5% at the 2-hour cap and about 11.8% at the 24-hour cap. Coordination therefore still helps after preserving household-level backup guarantees, but its value declines as backup obligations tighten.
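The relative pooling values quoted above follow directly from the reported dollar figures. The short check below recomputes them; the function name `pooling_share` is illustrative, and the only inputs are the four numbers stated in the abstract.

```python
def pooling_share(benefit, standalone_margin):
    """Pooling benefit as a percentage of standalone firm margin."""
    return 100.0 * benefit / standalone_margin


# Dollars per home per week, as reported for the 2-hour and 24-hour backup caps.
standalone = {2: 11.06, 24: 10.79}
pooling = {2: 1.49, 24: 1.27}

if __name__ == "__main__":
    for cap in (2, 24):
        share = pooling_share(pooling[cap], standalone[cap])
        print(f"{cap}-hour cap: {share:.1f}% of standalone margin")
```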
Reference media are widely used in distorted-Born-approximation-based GPR imaging to represent partially known propagation effects. When the true host background differs from the chosen reference medium, the difference enters the observations and propagates into anomaly estimates. For single-snapshot FDA-MIMO-GPR, this paper establishes a reference-state observation model under the distorted Born approximation and defines that difference as the reference--background medium residual, namely, the effective residual between the reference medium and the physical background medium. Hereafter, this quantity is abbreviated as the reference--background residual. Its response is derived from the Cole--Cole dispersive mapping, the reference propagation kernels, and the FDA frequency--transmit organization. The paper then constructs its observation-domain covariance, analyzes the off-diagonal channel-block structure, and uses a standard Tikhonov estimator to show how the response transfers to reconstruction error and covariance over an anomaly candidate region. Numerical results show pronounced cross-frequency and cross-channel covariance under mismatched reference states. After Tikhonov reconstruction, these structures appear as low-dimensional, concentrated pseudo-anomaly errors. Right-hand-side coherence and inter-channel correlation arise mainly because multiple transmit--receive channels jointly observe the same residual field, while FDA space-frequency coding determines their organization in the observation and reconstruction domains. The reference--background residual should therefore be modeled explicitly in reference-state selection, background suppression, and channel-covariance analysis for single-snapshot FDA-MIMO-GPR.
Wireless resource allocation in digital-twin-enabled unmanned aerial vehicle (UAV) swarms must be both network-feasible and certifiably safe for closed-loop control. Existing packet-level or scalar-priority schedulers cannot meaningfully compare heterogeneous multi-hop actions that differ simultaneously in route, retransmission depth, blocklength, bidirectional delay, delivery probability, and TDMA slot cost. This paper introduces a certificate-guided resource allocation framework for low-altitude multi-hop UAV swarms. A digital twin maps predicted topology, channel, route, and controller-side state into a shared five-dimensional quality-of-service (QoS) certificate comprising uplink/downlink delay bounds, directional delivery guarantees, and a certified upper bound on the interval between successful bidirectional interactions. A state-conditioned stochastic drift test then admits only certificates whose augmented Lyapunov drift is nonpositive under the current controller state. Admitted actions are reduced to certified supply frontiers by removing dominated route-slot configurations, and the online scheduler maximizes Lyapunov-drift reduction under a shared TDMA slot budget via exact dynamic programming. Closed-loop ns-3 simulations demonstrate that the proposed framework outperforms fixed-service, certificate-filtered fixed-priority, dynamic-transmission-count, and value-of-information baselines in both tracking accuracy and high-risk state suppression under identical communication budgets.
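The last two steps, pruning dominated route-slot configurations and scheduling under a shared slot budget, can be sketched as a Pareto filter followed by a multiple-choice knapsack. This is an illustration under stated assumptions, not the paper's algorithm: each configuration is reduced to a hypothetical pair `(slots, gain)`, where `gain` stands in for Lyapunov-drift reduction, and the certificate admission test is assumed to have already been applied.

```python
def pareto_frontier(configs):
    """Drop (slots, gain) pairs dominated by a pair that costs no more and gains no less."""
    frontier = []
    best_gain = float("-inf")
    # Cheapest first; among equal slot costs, highest gain first.
    for slots, gain in sorted(configs, key=lambda c: (c[0], -c[1])):
        if gain > best_gain:
            frontier.append((slots, gain))
            best_gain = gain
    return frontier


def schedule(frontiers, budget):
    """Pick at most one configuration per agent to maximise total gain within the budget."""
    best = [0.0] * (budget + 1)  # best[s]: max total gain using at most s slots
    for frontier in frontiers:
        new = best[:]  # leaving this agent unserved is always allowed
        for s in range(budget + 1):
            for slots, gain in frontier:
                if slots <= s:
                    new[s] = max(new[s], best[s - slots] + gain)
        best = new
    return best[budget]


if __name__ == "__main__":
    agent_a = pareto_frontier([(1, 0.5), (2, 0.4), (2, 1.0), (3, 1.0)])
    agent_b = pareto_frontier([(1, 0.8), (3, 1.5)])
    print(agent_a, agent_b, schedule([agent_a, agent_b], budget=3))
```

Pruning first keeps the inner loop of the dynamic program short, since a dominated configuration can never appear in an optimal schedule.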
Despite 230 million speakers, Urdu remains critically under-resourced in speech technology. We introduce UrduSpeech: a large high-fidelity Urdu corpus comprising 156 hours of audio with 12-dimension paralinguistic metadata, encompassing US-Std, US-CS, and US-EngPk. To address Right-to-Left script constraints and frequent code-switching, we developed UrduSpeech, an LLM-driven pipeline to curate data across 12 diverse categories, including news, drama, and rare literary forms like Bait-Bazi. We also release a 9-hour US-Benchmark set, manually corrected by native annotators to serve as a standard. Human quality assessment of the primary 156-hour corpus yielded a Mean Opinion Score (MOS) of 4.6 (std = 0.7) with inter-rater reliability confirmed by a 0.68 Cohen's Kappa, validating our curation pipeline's 97.6% confidence score. The corpus maintains a 60-40 gender balance across 71,792 utterances. Our work represents a significant leap toward linguistic inclusivity in global AI. The corpus and code are open-sourced, and a demo page is available.
This tutorial presents cooperative and noncooperative game-theoretic frameworks for modeling, learning, and control in socio-technical systems, where human behavior, incentives, institutions, and social interactions are coupled with cyber-physical and networked infrastructures. The paper reviews strategic, dynamic, cooperative, matching, learning, and feedback-control approaches for analyzing how local decision-making, adaptation, and strategic interactions shape collective system outcomes. The tutorial further develops feedback-learning and incentive-design perspectives that connect equilibrium analysis with adaptation, distributed control, and mechanism design under information and coordination constraints. We also examine resilience and security challenges arising from adversarial behavior, misinformation, disruptions, and cascading failures in interconnected socio-technical networks. Finally, we discuss emerging research directions at the intersection of game theory, control, learning, and network science for resilient and adaptive socio-technical systems.
Low-light image enhancement remains a challenging problem due to severe noise, color distortion, contrast degradation, and loss of structural details under insufficient illumination. Existing methods typically apply uniform enhancement without considering the depth-dependent nature of light attenuation and sensor noise in real-world scenes. To address this limitation, we propose LUMEN, a multi-stage enhancement framework that integrates virtual flash simulation with transformer-based feature fusion. The proposed framework first estimates scene depth from low-light inputs using a dedicated encoder-decoder network, after which a soft clustering module partitions pixels into depth-aware regions, enabling depth-dependent flash simulation. The simulated flash features, together with depth representations, are fused with image features through efficient attention-based fusion blocks to enhance global context while preserving fine details. A composite loss function combining reconstruction, perceptual, structural, color, edge, and depth consistency objectives ensures both visual fidelity and perceptual quality. Extensive experiments on LOL-v1 and LOL-v2 benchmarks demonstrate that LUMEN achieves state-of-the-art performance and produces visually natural results compared with several state-of-the-art methods.
Seamlessly unifying communication and sensing, sixth-generation (6G) networks are poised to transform into intelligent platforms with high spectral-energy efficiency and real-time environmental awareness. In the low-altitude economy, unmanned aerial vehicles (UAVs) enable air-ground integrated sensing and communication (ISAC) for applications such as logistics and inspection, yet most studies focus on single-UAV or homogeneous-agent designs. In contrast, this paper proposes a multi-UAV cooperative ISAC system that enables heterogeneous-agent collaboration between multiple UAVs and a ground base station (BS) for joint target sensing, tracking, and communication. The system is formulated as a posterior Cramer-Rao bound (PCRB) minimization problem under communication performance constraints, utilizing joint trajectory-beamforming optimization. To tackle the NP-hard nature of this problem, we design a curriculum-based heterogeneous-agent proximal policy optimization (C-HAPPO) algorithm, where curriculum learning guides progressive policy refinement and Kronecker/QR decomposition mitigates action dimensionality. Simulation results show that the proposed approach achieves more than a 30% improvement in sensing performance, faster convergence, and higher tracking accuracy than existing baselines, demonstrating its scalability and effectiveness for complex multi-UAV ISAC scenarios.
The conventional normalized subband p-norm (NSPN) algorithm achieves robustness in $\alpha$-stable noise ($1<\alpha \leq 2$) by utilizing low-order error moments. However, its performance degrades significantly under three scenarios: (1) non-Gaussian inputs, (2) $\alpha$-stable noise with $0<\alpha \leq 1$, and (3) sparse system identification. To address these limitations, this paper proposes a fractional-order NSPN algorithm based on the nearest Kronecker product (NKP) decomposition and fractional-order stochastic gradient descent, termed NKP-FoNSPN. Theoretical bounds for the fractional-order parameter $\beta$ are also derived. Notably, when $\beta=1$, the NKP-FoNSPN reduces to a new NKP-NSPN algorithm, while its non-NKP decomposition variant becomes the fractional-order NSPN (FoNSPN) algorithm. Furthermore, a novel transformation-based NKP (TNKP) decomposition technique is designed, which exhibits lower computational complexity than conventional NKP for specific filter structures. The resulting TNKP-based FoNSPN (TNKP-FoNSPN) achieves lower steady-state misadjustment and multiplication cost compared with the NKP-FoNSPN algorithm. Additionally, complete computational complexity analyses are provided. For active noise control (ANC) scenarios, we develop filtered-x variants: NKP-FxFoNSPN and TNKP-FxFoNSPN. From the former, two additional variants are derived: NKP-FxNSPN and FxFoNSPN. Simulations using diverse noise sources (pink, helicopter, gunshot, pile driver, and traction substation noise) demonstrate the superiority of the proposed algorithms. Finally, we validate their noise reduction performance in a physically constructed single-channel duct ANC system and a simulated multi-channel ANC system.
Quasi-bimodal objects, such as text, road signs, and barcodes, play a basic yet vital role in daily visual communication. By boiling these down to clear silhouettes, binarization uses a minimal language to convey essential vision cues for maximum downstream efficiency. The catch is that frame-based imaging often struggles on mobile platforms like drones, self-driving cars, and underwater vehicles. In these dynamic scenes, rapid motion and harsh lighting can make it blind, causing severe motion blur and erasing crucial details. To overcome these limits, neuromorphic vision via event cameras, featuring microsecond-level temporal resolution and high dynamic range, steps in as a natural solution. Building upon this event-driven sensing paradigm, we introduce a simple yet effective dual-modal approach that harnesses the synergy between frames and events to achieve real-time, high-frame-rate binarization on CPU-only devices. Extensive evaluations show that it achieves competitive performance against leading techniques in reducing motion blur, while delivering impressive improvements under challenging illumination. Besides, our asynchronous workflow bypasses the event scarcity that breaks traditional time-binning reconstruction, maintaining clear target shapes even at extreme kilohertz frame rates. Its binary results further serve as reliable representations that facilitate a range of downstream tasks. This work paves the way towards lightweight perception and interaction in embodied intelligence on resource-constrained edge platforms.
Ray-tracing (RT) has become central to site-specific electromagnetic propagation modeling in dynamic complex environments. Yet its computational burden grows sharply as high-fidelity digital twins of these environments scale to millions of facets whose material parameters must be continuously updated as the environment changes. The challenge is amplified at mmWave and sub-THz frequencies, where surface roughness becomes comparable to the wavelength and so diffuse scattering can account for up to 40% of the received power, making accurate yet tractable models essential. The popular Effective Roughness (ER) approach offers physical consistency but becomes increasingly costly when highly directive lobes are required or when parameters must be iteratively tuned. This communication introduces a directive, reciprocal diffuse scattering model that preserves the structure of the ER while enabling an order-of-magnitude reduction in computational cost. Validation across eight materials shows no loss in accuracy - and a slight improvement - demonstrating a scalable and physically meaningful solution for RT in scenarios where diffuse scattering is non-negligible.
Because LiDAR sensors acquire point clouds with a fixed angular resolution, the resulting data can be systematically parameterized and efficiently compressed in the spherical coordinate system. Traditional spherical coordinate-based point cloud compression methods have demonstrated strong rate-distortion (RD) performance, with the predictive geometry coding (PredGeom) method in the geometry-based point cloud compression (G-PCC) standard being a prominent example. Although PredGeom includes an inter-frame prediction mode, it relies on a simple linear model, which limits its ability to capture complex motion patterns and structural dependencies. Meanwhile, existing learning-based compression methods in the spherical domain do not exploit inter-frame correlations to reduce geometry redundancy. To address these limitations, we propose a learning-based inter-frame predictive coding method, termed Inter-LPCM. For azimuth prediction, we employ a delta coding strategy based on the predefined angular resolution. To improve radius compression, we introduce an inter-frame radius predictive (Inter-RP) model that estimates the current point's radius using neighboring points from both the current frame and the registered reference frame. In addition, we design a lightweight attention-based prediction (LAEP) model to predict elevation angles by capturing long-range geometric correlations across different coordinates. For quantization, we propose an RD-optimized method to select quantization steps in the spherical coordinate system. For entropy coding, we design distinct models for each spherical coordinate component. These models are adapted to the statistical priors of each coordinate, enabling more accurate probability estimation. Our source code is publicly available at this https URL
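The idea of delta coding azimuth against a predefined angular resolution can be sketched as follows. This is a simplified illustration, not the Inter-LPCM implementation: it assumes azimuths lie on (or are rounded to) a uniform grid with spacing `resolution`, and the function names are hypothetical.

```python
def encode_azimuth(azimuths, resolution):
    """Return the first grid index and the integer step deltas between consecutive points."""
    # Snap each azimuth to the sensor's angular grid (assumed uniform).
    indices = [round(a / resolution) for a in azimuths]
    deltas = [cur - prev for prev, cur in zip(indices, indices[1:])]
    return indices[0], deltas


def decode_azimuth(first_index, deltas, resolution):
    """Rebuild grid-aligned azimuths from the first index and the step deltas."""
    indices = [first_index]
    for d in deltas:
        indices.append(indices[-1] + d)
    return [i * resolution for i in indices]


if __name__ == "__main__":
    first, deltas = encode_azimuth([0.0, 0.2, 0.4, 0.8], resolution=0.2)
    print(first, deltas)
    print(decode_azimuth(first, deltas, resolution=0.2))
```

Because consecutive points usually advance by one grid step, the deltas concentrate on a few small integers, which is what makes them cheap to entropy code.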
Volumetric media promises next-generation content delivery applications, but its bandwidth demand remains a key bottleneck. Implicit and hybrid volumetric representations reduce model sizes, yet still require careful coding to reach 2D video-like bitrates. We present CATRF, a standard-codec-in-the-loop compression framework for plane-factorized radiance fields. During training, we quantize and pack 2D feature planes into codec-friendly canvases, run a standard codec roundtrip (JPEG/VP9/HEVC/AV1), then unpack and dequantize the decoded features before volume rendering. We use a straight-through estimator (STE) to insert the non-differentiable, standard codec pipeline into the training loop, allowing radiance-field features to adapt directly to the real, client-side codec distortions without introducing any learned codec parameters. On both static and dynamic benchmarks, CATRF consistently achieves a better rate-distortion trade-off over codec-agnostic and learned-codec-in-the-loop baselines, and also outperforms recent compressed 3DGS methods in both compression efficiency and decoding speed. These results highlight a practical path toward low-bitrate, compression-resilient volumetric representations for free-viewpoint video streaming.
EEG foundation models aim to learn reusable representations across heterogeneous paradigms, yet existing approaches often use uniform adaptation mechanisms and are typically reported under separate downstream fine-tuning protocols. In this work, we first analyze dense EEG Transformers from two complementary perspectives. Gradient similarity across six downstream datasets reveals substantial optimization conflicts among EEG paradigms, while CKA analysis on mixed-paradigm batches shows a consistent depth-wise transition: shallow layers preserve stronger cross-paradigm similarity, whereas deeper layers become increasingly specialized. Motivated by these findings, we propose PRiSE-EEG, a prior-guided EEG foundation model with CKA-calibrated Depth-Stratified Experts. PRiSE-EEG forms continuous multi-channel EEG patches using weak static cortical and network priors and dynamic short-time channel interactions, then allocates shared and specialized experts across MoE Transformer blocks according to a sigmoid mapping from layer-wise CKA sharedness. This design preserves common EEG regularities in early blocks while assigning more specialized capacity to later task-specific transformations. Experiments on 12 public EEG benchmarks show strong cross-paradigm performance under matched protocols. Compact ablations further show that CKA-derived expert allocation improves over dense Transformers, uniform MoE, and manually fixed shared-specific expert ratios.
Future 6G systems are expected to exploit upper midband spectrum in frequency range 3 (FR3) not only for high throughput communications, but also for sensing services such as localization, detection, and situational awareness. This paper develops a concrete path from today's coverage-oriented deployments to FR3 networks that treat sensing as a native function. We first show how existing FR2 radars can be time-multiplexed and coordinated under a 6G medium access control as radar-as-a-service, forming a bridge between legacy sensing and network-managed integrated sensing and communications (ISAC). We then propose a hierarchical FR3 beam-alignment strategy in which coarse access occurs at lower frequencies and refinement occurs at upper FR3, and quantify the resulting sensing and communication capabilities via range-angle Cramér-Rao bounds in the near field. We identify intra- and inter-beam squint phenomena specific to wideband FR3 arrays, and discuss design approaches to mitigate them. On the signal-processing side, we argue that FR3 sensing cannot rely solely on pilot resources and discuss how much sensing information can be extracted from payload resource elements. We further highlight the role of calibrated FR3 channel simulators and real-time models as the core of wireless digital twins for training and evaluating ISAC algorithms, and discuss how massive MIMO and dense or distributed deployments at FR3 naturally act as large reconfigurable sensor arrays.
Acquiring channel knowledge is required by many applications. For instance, handover in cellular networks is mainly decided based on the knowledge of pathloss. In contrast to traditional statistical distance-determined models that might provide misleading pathloss estimates, researchers started to explore deep learning methods recently to accurately estimate the radio map that characterizes the spatial distribution of pathloss according to the specific physical wireless propagation environment. However, existing works mainly focused on 2D radio map estimation by assuming that all receivers are at the same height. In fact, radio maps could be significantly different at different receiver heights, highlighting the importance of 3D radio map estimation. In this paper, we first propose a method to embed height information into 2D images, and then propose a general 2D radio residual network (R$^{2}$Net) for 3D radio map estimation. Since pathloss exhibits different characteristics in indoor and outdoor scenarios, we specifically propose R$^{2}$Net-In for indoor scenarios and R$^{2}$Net-Out for outdoor scenarios to better capture penetration loss and diffraction loss, respectively. Extensive experimental results show that our R$^{2}$Net significantly outperforms the state-of-the-art benchmarks in terms of estimation accuracy, computational and storage costs, and inference speed. In addition, due to the lack of publicly available 3D radio map datasets, a 3D indoor radio map dataset (3DiRM3200) is created, which took more than $1,000$ labour hours. The dataset and codes will be available at this https URL.
Signal integrity (SI) analysis in printed circuit board (PCB) interconnects faces increasing complexity due to diverse integrated circuit (IC) buffer technologies, varying operating conditions, and manufacturing tolerances. Existing machine learning (ML) surrogate models for predicting SI metrics such as the inner eye contour, eye-height (EH), eye-width (EW), and transient waveform features typically rely on fixed buffer parameters, requiring costly new data generation and retraining cycles for every technology shift. This paper introduces a buffer-parameterized ML surrogate modeling methodology capable of handling cross-technology variations without retraining by treating IC buffer characteristics, e.g., clock frequency, supply voltage, rise/fall times, jitter, and internal resistors and capacitors, as dynamic model inputs alongside PCB parameters. To identify the optimal surrogate architecture for this high-dimensional space, a comprehensive benchmarking study compares tree-based methods (RFR/GBM), kernel methods (SVR/KRR), Gaussian process regression (GPR), and neural networks. The framework is subsequently validated on a complex interconnect with 44 design parameters. Results show that while anisotropic GPR excels in low-data regimes, neural networks heavily outperform other models on large datasets. Finally, the practical value of the ML surrogate models is demonstrated through a cross-technology design space exploration and optimization scenario, showcasing massive computational speedups for eye mask compliance checking compared to simulation.
This paper proposes a framework for fast signal acquisition based on deterministic non-uniform sampling, with emphasis on multi-coset architectures and receivers driven by known synchronization sequences, pilots, or preambles. Unlike conventional sampling theory, which is formulated from a waveform-reconstruction perspective, the proposed approach is derived from the observation that acquisition is fundamentally a parametric inference problem in delay-Doppler space. Accordingly, the objective is not to reconstruct the full Nyquist-rate signal, but to preserve the statistics required for detection and estimation. The paper formulates compressed-domain acquisition through a generalized likelihood ratio test and shows how multi-coset sampling leads to reduced correlator structures operating directly on the retained samples. An offline exhaustive design procedure is introduced to select the coset pattern for a given sampling ratio by minimizing a cost that jointly enforces peak isolation in the acquisition surface and uniform retained-energy coverage over the delay search interval. The framework is evaluated on 5G NR synchronization using the PSS/SSS signals under a worst-case Doppler scenario. Results show that substantial reductions in mean acquisition time can be achieved relative to uniform sampling, with measured gains ranging from 2.8x to 34.2x, depending on the selected compression ratio. The corresponding delay and Doppler root-mean-square errors quantify the estimation penalty introduced by aggressive sample reduction. These results demonstrate a clear complexity-performance trade-off and confirm the potential of multi-coset sampling for fast synchronization-oriented receivers.
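As a rough illustration of the reduced correlator idea described above, the following sketch keeps only samples whose index falls in a chosen set of cosets and correlates a known sequence on those retained samples alone. It is a simplified, delay-only sketch (no Doppler search, no GLRT normalization); the function names, the period, and the coset pattern are assumptions for illustration, not the paper's design.

```python
def multicoset_indices(n_samples, period, cosets):
    """Indices n kept by a multi-coset pattern: n mod period must be in `cosets`."""
    return [n for n in range(n_samples) if n % period in cosets]


def reduced_correlation(rx, ref, delay, kept):
    """Correlate the known sequence `ref` with `rx` at one delay hypothesis,
    touching only the retained sample indices in `kept`."""
    total = 0
    for n in kept:
        m = n - delay
        if 0 <= m < len(ref):
            total += rx[n] * ref[m].conjugate()
    return total


def acquire(rx, ref, kept, delays):
    """Return the delay hypothesis with the largest correlation magnitude."""
    return max(delays, key=lambda d: abs(reduced_correlation(rx, ref, d, kept)))
```

Because the sum runs over `kept` only, the work per delay hypothesis shrinks in proportion to the fraction of retained cosets, which is the source of the complexity-performance trade-off.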
Contextual biasing is essential to improving the recognition of rare and domain-specific words in an automatic speech recognition (ASR) system. While numerous methods have been proposed in recent years, most of them focus on offline settings and do not explicitly address the challenges of streaming ASR. For example, CTC-based word spotting (CTC-WS) has demonstrated strong performance by directly detecting keywords from CTC log-probabilities, but it is limited to offline processing and requires access to the full utterance. In this work, we present a streaming extension of CTC-WS for real-time contextual biasing. Our method maintains active keyword paths across audio chunks using a stateful token passing algorithm, enabling the detection of keywords that span multiple chunks. To ensure low latency and stable output, we introduce an incremental commitment mechanism that only emits segments guaranteed not to be affected by future audio, while deferring uncertain regions. This method naturally integrates with streaming ASR pipelines and does not require modifications to the underlying acoustic model or additional training, making it practical for real-world deployment. Experimental results show that our method reduces overall WER and effectively improves keyword F-score, demonstrating its effectiveness for real-time ASR applications.
Self-initiated attention shifts play a critical role in voluntary behavior but are difficult to study due to the absence of explicit temporal markers. While previous studies have examined their neural correlates, it remains unclear how multi-dimensional electroencephalography (EEG) features contribute to their characterization within an interpretable computational framework. In this study, we build on an experimental paradigm developed in our previous work, which enables controlled comparison between task-constrained self-initiated shifts and externally instructed shifts under identical visual stimulation. Within this setting, we investigate whether preparatory EEG activity can distinguish these two types of attention shifts. We adopt a machine learning-based approach and conduct two complementary analyses: (1) a performance-oriented assessment of frequency-specific topographic patterns, and (2) a model-based feature attribution analysis using SHapley Additive exPlanations (SHAP). These analyses provide a structured view of how spectral features across regions of interest contribute to model behavior. Our results demonstrate reliable within-subject classification performance, indicating that preparatory EEG activity contains subject-specific discriminative information within this paradigm. The analysis shows that higher-frequency bands and frontal regions contribute strongly to model decisions, although such contributions should be interpreted cautiously due to the potential influence of non-neural artifacts in high-frequency EEG signals. Overall, this work highlights the value of interpretable machine learning for analyzing subject-specific EEG signal patterns in a controlled experimental setting, with potential applications in personalized and asynchronous brain-machine interface systems.
This paper presents a method that learns a regionally stable recurrent neural network model from a set of input-output data generated by an unknown dynamical system. Relying on generalized sector conditions on the deadzone activation function, we first derive sufficient conditions that guarantee forward invariance on a compact set of the state space for any inputs from a given set. Such regional properties lead to less conservative conditions compared to variants that offer a global form of stability, and are in line with the system data that is only observed regionally. Our learning method derives conditions for regional stability using a barrier function approach, leading to models equipped with a certificate of regional stability in a subset of the state space and for a given input set. We illustrate our theoretical result with a numerical example and compare it to methods that impose a global form of stability, which fail to identify the system, and with a method that imposes no stability constraints at all, which does not guarantee a stable behavior within any state or input set.
Channel estimation is essential to massive multiple-input multiple-output (MIMO) systems. While recent generative model-based approaches using lightweight diffusion models (DMs) have achieved superior performance, they typically rely on a single data-driven prior, which limits their adaptability to varying channel distributions in real-world scenarios. To address this deficiency, we propose a mixture-of-experts (MoE) diffusion model (DM) framework combined with variational Bayesian inference. Specifically, our approach employs multiple pre-trained DMs, with each trained on a specific type of propagation channels. We then propose a probabilistic graphical model in which the channel is modeled as a latent variable drawn from one of these candidate generative priors with a certain probability. By integrating variational Bayesian inference with DM-based data priors, the underlying channel along with the expert indicator variable are jointly inferred, thus enabling automatic model adaptation for channel estimation. The effectiveness of our approach is evaluated on 3GPP CDL channels. Simulation results demonstrate that our proposed approach achieves a clear performance improvement over the standard DM-based method that employs a single prior trained on aggregated data from all channel types, particularly when the channel samples from different propagation environments are imbalanced.
Sparse recovery algorithms are of utmost importance for estimation processes in wireless communications. However, communication systems such as massive multiple input multiple output (MIMO) systems are rapidly growing in dimension, which consequently increases the computational complexity of these algorithms. This work proposes a low-complexity strategy for the efficient implementation of the "atom selection step" in these greedy sparse recovery algorithms, based on the structural features of these systems. A theoretical justification is presented along with tests using realistic channel data, to demonstrate the computational gain induced by the proposed approach and compare it to the classical sparse recovery approach.
For downlink transmission in massive multi-user multiple-input multiple-output (MU-MIMO) systems, conventional precoding research heavily focuses on reducing the computational complexity of precoding matrix design, while largely overlooking another critical bottleneck: the substantial signal weighting cost incurred by repeatedly applying the precoder to high-speed data streams. To address both challenges simultaneously, this paper proposes a novel sparse precoding framework tailored for fully-digital architectures. Within this framework, from the sum-rate maximization perspective, we design two sparse precoding architectures: a common-support row-sparse architecture and a user-specific row-sparse architecture, so as to reduce the number of multiplication operations required in baseband signal weighting without sacrificing system capacity. For the formulated mixed-integer non-linear programming (MINLP) problem, we rigorously prove, for the first time, that the optimal precoder under both sparse architectures strictly resides in a specific low-dimensional subspace determined by the channel matrices, thereby reducing the dimensionality of the optimization variables. Based on this insight, an alternating optimization algorithm is developed within the weighted minimum mean square error (WMMSE) framework to jointly optimize sparse beam selection and low-dimensional precoding coefficients. The combinatorial beam selection problem is handled using an efficient penalty-based majorize-minimization (MM) method, yielding a low-complexity closed-form solution. Simulation results demonstrate that the proposed scheme achieves near-optimal sum-rate performance while substantially reducing both the precoding computation complexity and the overall signal weighting cost.
While video compression algorithms effectively reduce bitrate, aggressive quantization often compromises temporal coherence, introducing artifacts such as flicker, motion inconsistency, and unstable textures. Although spatial quality degradation is well-documented, the relationship between compression intensity and temporal stability remains insufficiently characterized. This paper systematically examines the progression of frame-to-frame coherence errors across different bitrate regimes, utilizing multiple codecs (AV1, HEVC, VP9, H.264) and content types. Our findings reveal that temporal consistency degrades non-linearly with increasing compression. Most critically, we identify a "Predictability anomaly" where sequences with unpredictable or irregular dynamics experience disproportionately higher instability than sequences with higher, but more predictable, motion magnitude. This challenges the conventional assumption that motion volume alone dictates encoding difficulty and highlights the necessity of temporal-aware metrics in compression pipelines.
Adaptive filtering in complex scenarios demands algorithms that integrate fast convergence, low complexity, and robust performance under diverse noise conditions. To address this challenge, we propose an online censoring robust total generalized adaptive filter using an improved data-reuse method (RTGA-IDROC). The proposed RTGA variant possesses the advantages of both the total least squares (TLS) strategy and the robust generalized adaptive (RGA) function. This algorithm not only effectively handles input noise under the errors-in-variables (EIV) model but also achieves excellent performance across diverse noise environments. Furthermore, to meet the high demand for convergence speed in practical applications, an improved data reuse (IDR) method is introduced, enabling faster convergence in the early stages of iteration without compromising steady-state performance. The increased computational complexity brought by the IDR method is mitigated using the online censoring (OC) strategy. We also modify the OC threshold for real-valued algorithms, as the original threshold was defined for the complex domain. Beyond these algorithmic enhancements, a local stability analysis for the proposed algorithm is provided, and the theoretical steady-state mean-square deviation (MSD) is derived. Finally, simulation experiments in system identification and acoustic echo cancellation (AEC) scenarios validate the superior performance of the proposed algorithm.
The augmented affine projection algorithm (AAPA) performs very well for highly colored input signals. However, its direct matrix inversion leads to high computational complexity, especially at high projection orders. Inspired by the properties of set-membership filtering (SMF), this paper proposes the augmented set-membership affine projection algorithm (ASM-APA), which not only has low computational complexity but also offers improved performance compared with AAPA. The computational complexity and stability of ASM-APA are then analyzed, and the condition for maintaining the stability of the algorithm is provided. Finally, simulation experiments demonstrate that ASM-APA outperforms AAPA.
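To make the set-membership idea concrete, the sketch below shows the data-selective update in its simplest, real-valued, projection-order-one form (set-membership NLMS). This is a swapped-in simplification for illustration and not the ASM-APA of the abstract; the function name, the error bound `gamma`, and the regularizer `delta` are assumptions.

```python
def sm_nlms_step(w, x, d, gamma, delta=1e-12):
    """One set-membership NLMS step. Returns (new_weights, updated_flag).

    The filter is updated only when the a priori error exceeds the bound
    `gamma`; otherwise the coefficients are left untouched, which is where
    the computational saving comes from.
    """
    e = d - sum(wi * xi for wi, xi in zip(w, x))   # a priori error
    if abs(e) <= gamma:
        return list(w), False                      # inside the bound: skip update
    mu = 1.0 - gamma / abs(e)                      # data-dependent step size
    norm = sum(xi * xi for xi in x) + delta        # regularized input energy
    return [wi + mu * e * xi / norm for wi, xi in zip(w, x)], True
```

With this step size, an update moves the coefficients just far enough that the a posteriori error magnitude lands on the bound `gamma` (up to the small regularizer).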
Recently, a spatially selective non-linear filter (SSF) has been proposed for target speaker extraction, using the target direction-of-arrival (DOA) as a spatial cue. Since learned intermediate features are tied to the microphone geometry, the performance of the SSF degrades significantly when evaluated on mismatched array geometries. In this paper, we propose a geometry-conditioned SSF (GC-SSF), which incorporates a geometry-conditioning branch based on FiLM layers. Furthermore, we propose a feature that jointly encodes the DOA and the microphone positions (DOA-MPE). The conditioning branch modulates the intermediate feature maps of the SSF using the DOA-MPE feature to capture the spatial relationship between the microphone positions and the target speaker. Experimental results across circular, uniform linear, and random microphone arrays show that the proposed GC-SSF generalizes better to mismatched geometries while maintaining high spatial selectivity, demonstrating its ability to effectively adapt the filtering process to different array geometries.
Electric Vehicle (EV) fast charging stations require forecasting techniques at both the single-charger level and the aggregated level. While several models exist for the latter, forecasting individual EV charging profiles is still underexplored in the literature. However, such methods could be used by battery-aware scheduling, leading to a more granular update of the charging station's aggregated forecast and providing a more accurate estimation of EV departure times. Nonetheless, the variable extent of available information in time and in different settings could jeopardize these benefits. For this reason, we propose a hybrid and lightweight method to estimate the EV charging profile before and during the charging process. Besides evaluating this method on multiple EVs from a public dataset, we also assess the impact of different levels of information in the time transposition of the charging profile.
Edge perception has emerged as a foundational capability for future wireless networks, enabling the network edge to proactively sense, interpret, and interact with the physical environment in a task-oriented and resource-aware manner. This survey provides a comprehensive and structured overview of edge perception. We first review representative sensing modalities and edge artificial intelligence (AI) techniques as the fundamental building blocks. We then examine their synergistic interactions. We systematically analyze how edge AI enhances sensing capabilities, encompassing both in-band and out-of-band modalities, as well as multi-modal sensor data fusion. Moreover, we discuss the role of task-driven sensing in facilitating edge AI, including integrated sensing-communication-computation designs, and active perception frameworks that dynamically adapt sensing strategies for downstream applications. Finally, we identify key challenges and open issues. By consolidating fragmented research across sensing, communication, and edge AI, this survey provides forward-looking insights for the design and implementation of edge perception systems for sixth-generation (6G) networks.
Advanced regulatory control (ARC), also known as advanced PID architectures, is a simple and robust way of controlling processes with changing and possibly conflicting constraints, where it was previously believed - at least in academia - that model-based solutions, such as MPC, were the only effective solution. To illustrate this, ARC is applied in two case studies. The first is a gas-liquid separation process, in which selectors and split-parallel control are combined to achieve bidirectional inventory control in which the throughput manipulator moves automatically to the optimal position. The second case study is on keeping acceptable air quality (CO2-level) and temperature in a room (in this case, a barn for cows). The CO2 and temperature constraints can be conflicting, leading to a hierarchical switching network of PID controllers. Note: this is an extended version (with simulations) of a paper at the IFAC World Congress, August 2026, Korea.
The computation of chance constraints in stochastic model predictive control is often numerically challenging due to the non-Gaussian nature of the disturbances. To overcome this problem, we propose a computational optimization framework applicable to non-Gaussian disturbances. This framework employs a numerical inversion method, utilizing the characteristic function of the disturbance distribution to compute the probability in the chance constraint as well as its gradient. To improve efficiency, it vectorizes the integration points and reuses intermediate computations in Gauss-Kronrod quadrature. The framework is implemented within the YALMIP toolbox to perform chance constraint calculations for arbitrary non-Gaussian disturbances, applicable to both single-component distributions and mixture models. It allows the user to simply specify a distribution type and its parameters for the disturbance and directly compute the probability and its gradient to solve the optimization problem. The method is validated through a numerical example of a stochastic model predictive control application.
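One standard way to obtain a probability from a characteristic function is the Gil-Pelaez inversion formula, $P(X \le x) = \tfrac{1}{2} - \tfrac{1}{\pi}\int_0^\infty \operatorname{Im}\!\left[e^{-itx}\varphi(t)\right]/t \, dt$. The sketch below evaluates it with a plain midpoint rule rather than the Gauss-Kronrod quadrature named in the abstract, and it is not the YALMIP implementation; the function names, the truncation point `t_max`, and the number of nodes `n` are assumptions.

```python
import cmath
import math


def cdf_from_cf(cf, x, t_max=40.0, n=20000):
    """Gil-Pelaez inversion of a characteristic function `cf` at point `x`.

    Uses a midpoint rule on (0, t_max], which avoids evaluating the
    integrand at t = 0. `t_max` and `n` are illustrative choices that must
    be adapted to how quickly `cf` decays.
    """
    h = t_max / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += (cmath.exp(-1j * t * x) * cf(t)).imag / t
    return 0.5 - total * h / math.pi


def normal_cf(t):
    """Characteristic function of a standard normal variable."""
    return math.exp(-0.5 * t * t)


def mixture_cf(t):
    """Characteristic function of an equal-weight mixture of N(-1, 1) and N(1, 1)."""
    base = math.exp(-0.5 * t * t)
    return 0.5 * base * (cmath.exp(-1j * t) + cmath.exp(1j * t))
```

A mixture model is handled by passing the weighted sum of the component characteristic functions, as `mixture_cf` does, so the same inversion routine serves single-component and mixture disturbances alike.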
This paper investigates a multiple unmanned aerial vehicle (UAV)-assisted integrated sensing and communication (ISAC) system equipped with movable antenna (MA) arrays. To align with practical scenarios, we simulate the dynamic roaming of ground users and the three-dimensional deployment of UAVs in the airspace. We aim to maximize the total data rate by jointly optimizing key operational variables, including UAV trajectories, user association, antenna positions, and beamforming. This formulated problem is subject to constraints on transmission power and the sensing signal-to-noise ratio. To address the challenge of dynamically unknown state transitions due to user mobility, the original problem is decomposed into two steps and solved using different algorithms. First, we utilize the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm to address the ground-to-air association problem, periodically updating clusters and re-associating during training. The clustering hotspots are used to suggest flight directions for the UAVs. Second, we develop the soft actor-critic algorithm to solve the joint optimization problem of UAV trajectories, antenna positions, and beamforming. Experimental results demonstrate that UAVs equipped with MA arrays outperform those with traditional fixed antenna arrays in ISAC systems, and the proposed optimization strategy effectively enhances communication rates while ensuring sensing performance.
This paper presents a novel approach to synthesize stabilizing terminal ingredients for linear model predictive control (MPC) schemes, with the aim of increasing the region of attraction while reducing suboptimality with respect to the solution of the infinite-horizon optimal control problem. It is based on the construction of a novel terminal region using methods from the field of configuration-constrained polytopic computing, along with a terminal cost that is exactly equal to the infinite-horizon linear-quadratic regulator cost in a nontrivial neighborhood of the steady-state. The practical performance of the controller is illustrated through various case studies, and comparisons with state-of-the-art approaches are presented.
Composed of multiple interconnected pixels controlled by on/off RF switches, the pixel antenna can generate reconfigurable radiation patterns that can be further exploited to construct diverse pilot sequences for effective channel estimation. However, such pilot sequences inherently have rank deficiency, making it difficult to effectively and efficiently acquire the full channel state information (CSI) across all available radiation patterns. To tackle this difficulty, we consider a sparse environment with a limited number of propagation paths for a pixel antenna system, where a user equipped with a pixel antenna transmits only a limited number of pilots to recover the CSI under all radiation patterns. The proposed algorithm exploits the limited number of propagation paths that are invariant with the pixel antenna patterns, and then formulates the full channel estimation as a sparse recovery problem in the angular domain solved by Generalized Approximate Message Passing (GAMP). Moreover, to mitigate the rank deficiency of pilot sequences, we additionally incorporate a Multipath Matching Pursuit (MMP) algorithm for robust initialization. The overall proposed scheme, termed MMP-GAMP, achieves higher estimation accuracy than other algorithm baselines, while requiring lower pilot overhead.
Data center electricity consumption reached 4.4% of U.S. total in 2023 and is projected to grow to 6.7--12% by 2028, imposing increasing stress on transmission networks while representing a largely untapped source of controllable demand-side flexibility. This paper proposes a modular security-constrained unit commitment (SCUC) framework that coordinates flexible data center workloads with system-level scheduling to reduce renewable curtailment, alleviate congestion, and lower operating costs. Three mixed-integer linear programming (MILP) models are formulated: the Data Center Spatial model (DC-S), enabling instantaneous workload redistribution across geographically distributed sites; the Data Center Temporal model (DC-T), permitting each site to shift its deferrable load across time while preserving the daily energy balance; and the Data Center Spatio-Temporal model (DC-ST), jointly activating both mechanisms and spanning the largest feasible operating region. Case studies on a modified IEEE 24-bus reliability test system show that DC-ST eliminates all base-case and post-contingency transmission violations at a flexibility ratio of 40%, and reduces renewable curtailment by up to 84.4% at 30% relative to the inflexible baseline. Sensitivity analysis further reveals that moderate flexibility levels of 20%--30% already capture most of the achievable benefits, supporting practical deployment with limited operational burden on data center operators.
Vision-based and event-based tactile sensors are important in robotic manipulation research. However, they suffer from a fundamental tradeoff: vision-based sensors have low sampling rates, while event-based sensors are prone to drift during long-term static force estimation. To address this tradeoff and move toward human-level tactile perception, this paper proposes a novel hybrid event-frame tactile sensor (Mixtac) for normal force estimation, which emulates the synergistic function of biological mechanoreceptors. The prototype leverages events for high-frequency force tracking and frames for long-term accuracy. The Frame Guided Event Recurrent Network (FGER-Net) is proposed to fuse the two data streams. The network uses frames to correct event drift during training and to guide high-frequency predictions during inference. Experiments demonstrate an MAE of 0.04 N. This work could bridge the sampling rate gap from 0 to 500 Hz in current vision-based tactile sensors and pave the way for human-level robotic manipulation.
Backward reachable tubes (BRTs), computed via viscous Hamilton-Jacobi (HJ) partial differential equations, provide principled safety certificates for learned controllers and planning algorithms in trustworthy machine learning. However, classical grid-based HJ solvers require an $O(M^n)$ memory footprint for $M$ grid points per dimension and $n$ state dimensions, which renders them impractical for high-dimensional systems. We address this bottleneck with a local PDE linearization that enables a frozen-coefficient sampling scheme for the viscous HJ PDE: a generalized Cole-Hopf-type transformation reduces the nonlinear HJ equation to a sequence of linear heat equations whose solutions admit Gaussian heat-kernel representations. The value function and its spatial gradient are then recovered via roll-outs of Monte Carlo expectations on Gaussian densities, yielding a storage and grid-free algorithm that scales as $N\cdot n$ for $N$ samples. This decoupling of memory from dimensionality enables reachability analysis on problems where grid-based methods are simply impossible. We prove a finite-sample concentration bound with $O(N^{-1/2})$ error and conditional linear convergence for the introduced Monte-Carlo Picard iterative scheme. Numerical validation on pursuit-evasion games demonstrates relative errors $L^2_{\text{rel}}$ of $0.03 - 0.20$, with $14-26$ second wall-clock times per 2D slice on a CPU. Crucially, the method scales, with validation on up to (but not limited to) $n=45$-dimensional multi-agent games.
Long-duration energy storage (LDES) faces significant revenue volatility that impedes investment. This paper evaluates four contract-based support mechanisms using an equilibrium model with risk-averse investors and incomplete risk markets. Applied to a stylized 2035 Great Britain case, we find that all mechanisms can achieve the targeted LDES capacity but differ substantially in cost-effectiveness and risk-aversion sensitivity. Contracts that eliminate revenue volatility achieve the lowest costs but may weaken operational incentives, while contracts that preserve market exposure maintain incentives at higher costs.
We present a controlled benchmark evaluating three LLMs -- Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-3.5 Turbo -- across four prompt formats (from concise narrative to structured JSON with explicit iteration trace) on Gauss--Seidel AC power flow computation for a three-bus system. Against 50 test cases with reference solutions computed numerically, Gemini 2.5 Pro with the simplest narrative prompt achieves the lowest mean absolute error (MAE = 0.257 MW/MVar, 54\% of cases within 5\% relative error), while the same model with a JSON-structured prompt raises MAE to 0.789 -- a 3.1$\times$ increase. Adding a worked example degrades accuracy for Gemini but provides a marginal gain for Claude. GPT-3.5 Turbo fails on at least 90\% of cases under all prompt formats. An independent 100-case replication with related prompt-format families confirms the qualitative ordering (Gemini $>$ Claude $>$ GPT-3.5): the best 100-case configuration (Gemini with explicit iteration trace) achieves MAE = 0.402 and 53\% within 5\%, while Claude Sonnet 4.5's near-flat accuracy profile ($\approx$38\% within 5\% across formats) and GPT-3.5's near total ineffectiveness (92--97\% above 20\% error) both replicate. In neither evaluation does any configuration achieve sufficient reliability for use as a direct numerical solver. These findings offer a diagnostic baseline for practitioners and researchers evaluating LLMs for smart-grid decision-support assistance.
Unmanned Aerial Vehicles in dynamic environments face telemetry outages, structural vibrations, and regime-dependent noise that invalidate the stationary covariance assumptions of classical Kalman filters. The Sage-Husa Kalman Filter (SHKF) estimates noise statistics online, but its reliance on a static, scalar forgetting factor forces a strict compromise between steady-state stability and transient responsiveness. We introduce the N-Deep Recurrent Sage-Husa Filter (NDR-SHKF), which replaces this scalar parameter with a vector-valued memory attenuation policy learned by a hierarchical recurrent network operating on whitened innovation sequences. A bifurcated architecture routes shallow recurrent states to capture instantaneous sensor anomalies and deep states to encode sustained dynamic trends, while an auxiliary reconstruction objective prevents feature collapse. The complete filter, including recursive covariance updates, is trained end-to-end via backpropagation through time to directly minimize state estimation error. Evaluations on topologically distinct chaotic attractors demonstrate cross-domain generalization, outperforming purely data-driven baselines that diverge under out-of-distribution dynamics. Furthermore, evaluations on recorded real-world UAV flight datasets validate the framework's practical viability, demonstrating its capacity to bridge transitions into proprioceptive dead reckoning and outperform classical adaptive estimators during sensor outages.
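For context on the scalar forgetting factor that the abstract above replaces, a commonly used form of the classical Sage-Husa weighting is $d_k = (1-b)/(1-b^{k+1})$ with $0 < b < 1$. The sketch below applies it to a scalar measurement-noise variance estimate; index conventions differ between references, and the function names and the default `b` are assumptions. It illustrates the classical baseline only, not the learned NDR-SHKF policy.

```python
def sage_husa_weight(k, b):
    """Time-varying weight d_k = (1 - b) / (1 - b**(k + 1)) for forgetting factor 0 < b < 1."""
    return (1.0 - b) / (1.0 - b ** (k + 1))


def update_measurement_noise(r_prev, innovation, hpht, k, b=0.97):
    """One scalar Sage-Husa style update of the measurement-noise variance estimate.

    `hpht` stands for H P H^T, the contribution of the predicted state
    covariance to the innovation variance. The subtraction can make the
    estimate negative in practice, so real filters usually guard against it.
    """
    d = sage_husa_weight(k, b)
    return (1.0 - d) * r_prev + d * (innovation ** 2 - hpht)
```

The weight starts at 1 and settles near `1 - b`, so a single `b` fixes how fast old innovations are forgotten in every regime, which is the compromise between steady-state stability and transient responsiveness noted in the abstract.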
Dynamic MRI reconstruction from undersampled measurements is a challenging inverse problem that requires preserving both spatial reconstruction quality and temporal consistency across the frames of the cine series. While recent learning-based approaches achieve strong performance, they rely heavily on large, mostly fully sampled, training datasets, and may otherwise generalize poorly. In contrast, training-data-free methods such as deep image prior (DIP) adapt directly to individual scans but often fail to fully exploit temporal structure and are prone to overfitting. Such methods are nonetheless particularly attractive for dynamic MRI because large, public, high-quality datasets are scarce. In this work, we propose a structured DIP framework for dynamic MRI reconstruction that explicitly models spatiotemporal correlations through a low-rank plus sparse (L+S) decomposition. Instead of directly reconstructing the cine image series, we parameterize the low-rank background and sparse dynamic components using two untrained DIP convolutional neural networks, jointly optimized using accelerated extrapolated ADMM (eADMM). This formulation combines the implicit regularization of DIP with the interpretability of classical L+S regularization. We provide a convergence analysis for the proposed eADMM algorithm in the presence of DIP-based nonconvex parameterizations. In particular, we establish a sufficient descent property and show that every cluster point of the generated sequence is a critical point of the associated Lyapunov function. Across various acceleration factors, our numerical results demonstrate that the proposed method consistently outperforms classical reconstruction and existing supervised and unsupervised MRI reconstruction techniques.
Recent proposals for datacenters in sun-synchronous Low Earth Orbit rely on a large number of compute satellites formation-flying in dense clusters. Designing such satellite clusters requires optimizing the satellites' orbital geometry under several safety and operational constraints applied throughout the cluster's entire orbit. These constraints include guaranteeing a minimum inter-satellite spacing, obstruction-less solar power for every satellite, and that each satellite have a stable set of nearest neighbors with which it can maintain inter-satellite links (ISLs). In this work, we propose two main cluster orbital designs, parametrized by the minimum inter-satellite spacing $R_{min}$ and the cluster radius $R_{max}$: a planar cluster, and a 3D cluster. We show by construction and numerical analysis that both cluster orbital designs are consistent with the inter-satellite spacing, unobstructed sun-vector, and inter-satellite line of sight constraints. The proposed planar architecture is the most efficient packing of satellites in a plane for given $R_{min}$ and $R_{max}$ values, and our 3D architecture allows for the number of datacenter satellites to scale proportional to $(R_{max}/R_{min})^3$, an improvement over all previous LEO datacenter cluster designs. Finally, for a given satellite cluster, we formulate and solve an integer optimization problem that maps a VL2-like Clos network datacenter switching fabric onto the satellites and their corresponding set of feasible ISLs. We confirm that for both the planar and 3D architectures, there are sufficiently many permanently unobstructed ISLs within the cluster to replicate the switching fabric of terrestrial datacenters. We also examine the tradeoff between the number of ISLs each satellite can simultaneously sustain, and the corresponding number of cluster satellites that must be dedicated as aggregation and intermediate switches.
Chronic neck pain is a leading cause of disability worldwide, and current treatment selection remains largely trial and error. We present a machine learning framework that uses electroencephalography to predict treatment efficacy in patients with chronic neck pain, with the goal of supporting individualized therapy and reducing the burden on healthcare systems. The framework centers on a rigorous data preprocessing stage tailored to the characteristics of each EEG recording type. For resting-state EEG, the preprocessing pipeline comprises baseline signal removal, bad channel identification and exclusion, re-referencing, bandpass and notch filtering, Independent Component Analysis, and power spectral density analysis. For motor execution and motor imagery recordings, the same initial steps are applied, after which signals are aligned to trigger events so that event-related desynchronization (ERD) and event-related synchronization (ERS) can be quantified. Synchronously recorded electromyography (EMG) data are bandpass filtered and smoothed with a moving average, then correlated with the corresponding EEG channels to characterize the EEG-EMG relationship during attempted movement. In parallel, we performed an extensive literature review of machine learning models applied to clinical EEG (763 records initially screened, 16 patient and 47 healthy-control studies retained) to inform the post-processing strategy. Through this combined preprocessing and review effort, we aim to develop a robust predictive model that can support personalized healthcare strategies in chronic pain management.
Haptic rendering of viscoelastic materials that exhibit creep and stress relaxation is crucial for many applications, such as medical training with realistic biological tissue models. Fractional-order viscoelastic models provide an effective means of describing intrinsically time-dependent dynamics with few parameters, as these models can naturally capture memory effects. In this study, we present analyses of passivity and rendering performance for fractional-order viscoelastic models under finite-memory discretization. We derive closed-form expressions to ensure the passivity of haptic rendering with a fractional-order (FO) standard linear solid (SLS) model based on the Grünwald-Letnikov derivative under short-memory discretization. We also provide symbolic expressions for the effective stiffness and damping of such FO-SLS models. The resulting passivity conditions constitute a unified framework that generalizes previously reported results for integer-order Kelvin-Voigt, Maxwell, and SLS models, since these results are special cases of the newly derived condition. Furthermore, we provide experimental validations of the theoretical passivity bounds and human-subject evaluations of the perceived realism of FO-SLS models. Overall, this study establishes a unified theoretical framework and experimental evaluations for FO viscoelastic rendering under short-memory discretization.
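As a rough illustration of the short-memory Grünwald-Letnikov discretization named in the abstract above, the fractional derivative of order alpha can be approximated by a finite weighted sum over the most recent samples. This is only a sketch under stated assumptions, not the authors' implementation and not their passivity conditions: the function names, the sampling period `h`, and the memory length `memory_len` are hypothetical.

```python
def gl_weights(alpha, memory_len):
    """Grünwald-Letnikov weights w_0..w_L via the standard recurrence
    w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k)."""
    weights = [1.0]
    for k in range(1, memory_len + 1):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def gl_derivative(samples, alpha, h, memory_len):
    """Approximate the order-alpha derivative at the newest sample.

    samples: uniformly sampled signal, oldest first, newest last.
    h: sampling period (assumed constant).
    memory_len: number of past samples kept (short-memory truncation).
    """
    weights = gl_weights(alpha, memory_len)
    newest = len(samples) - 1
    # Use at most memory_len past samples, fewer if the history is shorter.
    k_max = min(memory_len, newest)
    acc = sum(weights[k] * samples[newest - k] for k in range(k_max + 1))
    return acc / (h ** alpha)


# Example: a ramp x(t) = t sampled every 0.1 s.
ramp = [0.1 * n for n in range(6)]
first_order = gl_derivative(ramp, alpha=1.0, h=0.1, memory_len=4)
half_order = gl_derivative(ramp, alpha=0.5, h=0.1, memory_len=4)
```

With `alpha = 1` the weights reduce to 1, -1, 0, ..., so the sum becomes the ordinary backward difference; for fractional orders the tail weights decay slowly, which is why the truncation length matters for both accuracy and, per the abstract, passivity.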
Seismic prediction remains challenging due to the highly nonlinear and chaotic dynamics of earthquake signals. While classical deep learning models such as LSTMs and CNNs capture local temporal features, and quantum models offer richer state representations, their integration with chaos-driven mechanisms is underexplored. We introduce QuChaTeR, a hybrid architecture that combines wavelet-based preprocessing, chaotic maps, and variational quantum circuits with recurrent structures to enhance temporal feature extraction. Implemented in PyTorch and PennyLane, QuChaTeR is benchmarked against classical (LSTM, GRU, RNN, 1D-CNN, Reservoir Computing) and quantum-inspired (Quantum LSTM) baselines. On real-world seismic datasets, QuChaTeR consistently converges faster and achieves superior performance across multiple evaluation criteria. Despite promising results, scalability and quantum hardware limitations remain challenges. Overall, this work demonstrates how quantum-chaotic hybridization provides a practical pathway toward more accurate and robust earthquake prediction.
Accurate polyp segmentation in colonoscopy is essential for early colorectal cancer detection, yet real-world clinical environments pose persistent challenges such as motion blur, specular reflections, and illumination instability. Most existing methods are optimized on clean benchmark images and suffer noticeable performance degradation when deployed in authentic surgical scenarios. We propose DepthPolyp, a lightweight and robust segmentation framework based on pseudo-depth-guided multi-task learning and efficient feature modulation. The architecture combines hierarchical Ghost factorization for compact feature generation, Interleaved Shuffle Fusion for low-cost cross-scale interaction, and Dynamic Group Gating for adaptive group-wise feature weighting. Extensive experiments demonstrate that DepthPolyp achieves strong cross-dataset generalization when trained on degraded data and evaluated on both clean and noisy target domains, consistently outperforming lightweight baselines and remaining competitive with substantially larger models. In real surgical video evaluation on PolypGen, DepthPolyp achieves better segmentation performance than models up to $20\times$ larger while preserving real-time inference speed. With only 3.57M parameters and 0.86 GMACs, the proposed method runs at over 180 FPS on mobile devices, making it well suited for real-time deployment in resource-constrained clinical environments. Code and pretrained weights are available at: this https URL
Conventional techniques for compression and encryption are often slow and resource-intensive, which makes them unsuitable for real-time applications. Although many studies have tried to address compression and encryption together, none offers a suitable strategy. This study therefore introduces a simultaneous data compression and encryption (SDCE) system designed specifically for large video files. The method combines chaotic map-based encryption with Huffman encoding for lossless compression in a single framework, reducing computational overhead and processing time while improving data security. The logistic map is used to produce a pseudo-random chaotic sequence for XOR-based encryption, providing security against unauthorized access. The results show that the system improves data privacy over existing related strategies, particularly by producing higher entropy and stronger avalanche effects. It also achieves higher throughput, compression ratio, and peak signal-to-noise ratio (PSNR), reduced bits per rate (BPC), and a smaller percentage of data loss, which further supports its ability to preserve data integrity better than existing methods.
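The two building blocks named in the abstract above, Huffman coding and a logistic-map keystream for XOR encryption, can be sketched as follows. This is a toy illustration and not the authors' SDCE system: it applies the two steps one after the other rather than simultaneously, the seed `x0` and parameter `r` are hypothetical, non-empty input is assumed, and a floating-point logistic map is not a vetted cipher.

```python
import heapq
from collections import Counter


def huffman_table(data: bytes) -> dict:
    """Build a prefix-code table {byte value: bit string} from byte counts."""
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def compress(data: bytes, table: dict):
    """Encode data with the table; return packed bytes and padding bit count."""
    bits = "".join(table[b] for b in data)
    pad = (-len(bits)) % 8
    bits += "0" * pad
    packed = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return packed, pad


def decompress(packed: bytes, pad: int, table: dict) -> bytes:
    """Invert compress() by walking the bit string against the code table."""
    bits = "".join(f"{b:08b}" for b in packed)
    if pad:
        bits = bits[:-pad]
    inverse = {code: sym for sym, code in table.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)


def logistic_keystream(n: int, x0: float, r: float = 3.99) -> bytes:
    """Iterate x <- r*x*(1-x) and quantize each state to one byte."""
    x, out = x0, bytearray()
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)
    return bytes(out)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (encryption and decryption)."""
    return bytes(a ^ b for a, b in zip(data, key))


message = b"abracadabra" * 20
table = huffman_table(message)
packed, pad = compress(message, table)
key = logistic_keystream(len(packed), x0=0.3141)
cipher = xor_bytes(packed, key)
recovered = decompress(xor_bytes(cipher, key), pad, table)
```

Decryption regenerates the same keystream from the shared seed, so `x0` and `r` play the role of the key in this sketch; the code table would also have to be shared or transmitted.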
Safety-critical autonomy in unstructured environments poses significant challenges for online safety certification under evolving constraints. We propose Policy Library Control Barrier Function~(PL-CBF), a runtime safety filter that evaluates a library of fallback policies via parallel finite-horizon rollouts, selects the least invasive safe mode, and enforces safety by solving a quadratic program that minimally modifies a nominal policy. We provide a theoretical analysis based on a finite-horizon language metric over closed-loop behaviors, characterizing policy-library coverage requirements for certifying finite-horizon safety. Simulations on a planar double-integrator (4 states), highway driving with abrupt friction changes using a realistic nonlinear vehicle model (8 states), and 3D quadrotor navigation in crowded dynamic environments (12 states) demonstrate improved safety coverage over single-policy safety filters while retaining millisecond-level runtime.
Quantum key distribution (QKD) provides information-theoretic security, and satellite-based quantum key distribution (SatQKD) has demonstrated the potential to extend this communication security to intercontinental scales. However, atmospheric turbulence induces significant distortion in the spatial distribution of received optical beams, while background noise remains approximately uniform across the detector plane. As a result, single-element qubit (quantum bit) detection can frequently be dominated by noise due to the random spatial pattern of the imaged wavefront, thereby degrading the system performance. To address this limitation, we propose to exploit the spatial degrees of freedom of single-photon detector arrays to reject the excessive noise while adapting to channel variations induced by turbulence. We develop a threshold-based selection method that activates only those detector elements that have a higher probability of registering qubits. We evaluate the performance of the proposed noise-rejection QKD schemes using Monte Carlo simulations considering the impact of diffraction and atmospheric turbulence on the transmitted optical beam in the presence of background and dark noise. The results show that, compared to conventional schemes, the proposed noise-rejection strategy effectively reduces the quantum bit error rate (QBER) and improves the secret key rate (SKR) performance, while the performance gains depend on the turbulence condition. These findings demonstrate the potential of adaptive array receiver design to enhance the robustness of the SatQKD system under realistic atmospheric conditions.
Coded computing is a distributed paradigm that uses coding theory to introduce \textit{redundancy} and overcome bottlenecks in large-scale systems. In the same vein, randomized numerical linear algebra employs probabilistic methods to \textit{compress} and accelerate linear algebraic operations, addressing challenges in high-dimensional data analysis. This article reviews the foundations of both fields and presents distributed schemes that combine techniques from both to speed up optimization and machine learning algorithms, in the presence of slow or non-responsive servers. Along the way, we touch on various related topics and mathematical concepts.
Convolutional neural networks (CNNs) have been extensively and successfully applied to the task of synthetic aperture radar (SAR) image change detection. However, conventional convolutional layers are inherently limited by their local receptive fields, which mainly capture spatially localized patterns while neglecting the global context that is often crucial for accurately distinguishing subtle or large-scale changes in SAR imagery. To address these limitations, we propose a novel Global Dynamic Context-Aware Network (GDNet) specifically tailored for SAR image change detection. At the core of our approach lies a novel global dynamic convolution module, which adaptively modulates convolution kernel weights according to the global semantic information extracted from the input features. By dynamically incorporating long-range dependencies, this mechanism enables the network to integrate both local detail and global context, thus improving its ability to detect diverse change patterns. In addition, we introduce a carefully designed two-stage Mixup strategy for model training. Unlike conventional single-stage Mixup, our two-stage design generates more diverse and informative training samples, effectively regularizing the model and yielding more stable and reliable classification results even under limited data scenarios. Extensive experiments on three SAR datasets demonstrate the superiority of the proposed GDNet compared to other state-of-the-art methods. These findings highlight the potential of global dynamic modeling and advanced data augmentation strategies for advancing SAR image interpretation. Source codes are available at \url{this https URL}.
Semantic segmentation of multi-source remote sensing images is a fundamental task for Earth observation applications. Existing methods often struggle with insufficient multi-scale context modeling and suboptimal cross-modal feature fusion, limiting their performance in complex high-resolution scenes. To this end, we propose Axial-Relation Guided Fusion Mamba (ARG-Mamba), a state space model-based framework for optical-elevation remote sensing image segmentation. Specifically, we introduce a Multi-Scale State Space Module to capture both fine-grained local details and global contextual dependencies with linear computational complexity. Moreover, an Axial-Relation Guided Fusion Module is designed to explicitly model global cross-modal correlations along horizontal and vertical axes, enabling efficient feature fusion between optical and elevation modalities. Extensive experiments conducted on the ISPRS Vaihingen and Potsdam datasets demonstrate that our ARG-Mamba consistently outperforms state-of-the-art methods while maintaining favorable computational efficiency. The code will be made publicly available at \url{this https URL}.
Reinforcement Learning (RL) uses rewards to guide learning, yet reward design is typically hand-crafted using heuristics that can be difficult to tune. We propose a Control Barrier Function (CBF)-informed reward design for Multi-Agent RL (MARL) that converts CBF constraint values under joint MARL actions into a reward signal that explicitly guides safe learning. We compare against two heuristic reward baselines in a four-way multi-lane intersection with connected and automated vehicles. Results show that our method achieves the highest task performance and is less sensitive to reward hyperparameters, yielding consistently strong performance across the tested hyperparameter range. Code for reproducing the experimental results and a video demonstration are available at this https URL.
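One way to picture how a CBF constraint value could be turned into a reward term is sketched below. This is not the reward from the paper above, whose exact form the abstract does not give. The sketch assumes two agents with single-integrator dynamics (the action is the velocity), a distance-based barrier h = ||p_i - p_j||^2 - d_safe^2, and a penalty equal to the amount by which the condition h_dot + alpha * h >= 0 is violated. All names and parameters are hypothetical.

```python
def cbf_value(p_i, p_j, v_i, v_j, d_safe, alpha):
    """Return h_dot + alpha * h for a pairwise distance barrier.

    p_i, p_j: 2-D positions; v_i, v_j: 2-D velocities chosen as actions.
    A non-negative return value means the CBF condition holds.
    """
    dx, dy = p_i[0] - p_j[0], p_i[1] - p_j[1]
    dvx, dvy = v_i[0] - v_j[0], v_i[1] - v_j[1]
    h = dx * dx + dy * dy - d_safe * d_safe   # barrier: positive when safe
    h_dot = 2.0 * (dx * dvx + dy * dvy)       # time derivative of h
    return h_dot + alpha * h


def cbf_reward(p_i, p_j, v_i, v_j, d_safe=1.0, alpha=1.0):
    """Zero reward when the joint action satisfies the CBF condition,
    otherwise a penalty equal to the size of the violation."""
    return min(0.0, cbf_value(p_i, p_j, v_i, v_j, d_safe, alpha))


# Agent i approaches agent j slowly (condition holds) and quickly (violated).
slow = cbf_reward((0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (0.0, 0.0))
fast = cbf_reward((0.0, 0.0), (3.0, 0.0), (5.0, 0.0), (0.0, 0.0))
```

Because the penalty scales with the size of the violation rather than firing only at collision, the signal is dense, which is one plausible reason such a reward could be less sensitive to hand-tuned weights.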
Immersive video delivery is bottlenecked by pixel-rate constraints, making the transmission of high-resolution depth maps or explicit 3D volumetric data expensive. Decoder-Side Depth Estimation (DSDE) shifts depth computation to the client, but struggles with complex geometries, inter-view flickering, and non-Lambertian reflections. Conversely, 3D Gaussian Splatting (3DGS) offers state-of-the-art view synthesis, but transmitting splats (or their projected 2D maps) incurs prohibitive bandwidth costs and is poorly aligned with standard video codecs. We propose Decoder-Side Gaussian Splatting (DSGS), a framework that natively replaces the depth-estimation stage of DSDE with feed-forward 3DGS inference, optimizing volumetric scenes entirely on the decoder side from compressed textures and metadata. A central, counterintuitive finding is that lossy compression acts as an implicit low-pass filter stabilizing feed-forward splat prediction: compressed bitstreams exceed lossless quality while shrinking tenfold. Under extreme view sparsity (one 2D atlas comprising 4 input views), DSGS achieves a +5.79 dB BD-PSNR and +0.054 BD-SSIM gain over the DSDE anchor while reducing maximum inter-view Delta IV-PSNR from 17.2 dB to 6.4 dB, minimizing the domain shift between transmitted and virtual viewports.
This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce the agent bullwhip effect, the amplification of decision unreliability across echelons, manifesting along two dimensions: decision variance increases both across facilities at the same point in time and within the same facility across time. We develop a mathematical framework showing that this phenomenon is inherent to multi-agent systems that involve coordination and information delays, and we demonstrate that repeated sampling fails to meaningfully reduce it. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. GRPO post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.
Latent diffusion models have emerged as the dominant paradigm for many generation tasks, including audio generation tasks such as text-to-audio, text-to-music, and text-to-speech. A key component of latent diffusion is an autoencoder (VAE) that compresses high-dimensional signals into a low-frame-rate continuous representation that is conducive to downstream prediction. Regularizing these VAEs is challenging, as there is a trade-off between over-regularized (poor output quality) and under-regularized (difficult to predict) latent representations. We propose a framework for studying this trade-off through compression and train audio VAEs at specific bitrates via target-KL regularization. This allows direct comparison to well-studied discrete neural audio codec models, and the construction of rate-distortion curves for audio VAEs. We evaluate the impact of target-KL regularization on text-to-sound generation and find that sweeping compression rates is helpful in identifying the optimal generation setting.
We develop a class of data-adaptive shrinkage estimators for high-dimensional covariance estimation in which the shrinkage target is a Reynolds projection of the sample covariance under a finite symmetry group selected from a candidate library by held-out predictive performance. The class generalizes the convex shrinkage estimator of Ledoit and Wolf by replacing the scalar-identity target with a structured target derived from a symmetry group when one is available, and generalizes the group-symmetric maximum-likelihood estimator of Shah and Chandrasekaran by combining structural targeting with adaptive convex shrinkage and by selecting the group from data rather than treating it as prespecified. A two-tier procedure performs the group selection: a universal per-candidate evaluation based on held-out negative log-likelihood, optionally preceded by a domain-specific step that constructs the candidate library from structural priors. We establish a finite-sample regret bound for the held-out calibration of the convex combination weight, an oracle inequality for the data-driven group selection, and a quantitative sufficient-match condition under which the proposed estimator dominates Ledoit-Wolf shrinkage in Frobenius mean-squared error. The procedure is illustrated on six real-data problems spanning finance (S&P~500 daily returns), climate (NOAA OISST sea-surface temperature anomalies), genomics (TCGA-BRCA gene expression), radio signal processing (RadioML 2018.A), astronomical imaging (Galaxy10 DECaLS), and natural image patches (CIFAR-10 with a CIFAR-10.1 distribution-shift companion). An empirical comparison is also made against the Bayesian permutation-symmetry estimator of Chojecki and colleagues. Outside the few-shot regime, where structural priors carry the most information per observation, Ledoit-Wolf shrinkage remains the appropriate baseline.
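The structured shrinkage target described in the abstract above can be illustrated with a small sketch: the Reynolds projection averages the sample covariance over the action of a finite group, and the estimator is a convex combination of the sample covariance and that projection. The cyclic group, the fixed weight `lam`, and the function names are illustrative assumptions. The held-out selection of the group and the calibration of the weight, which are central to the paper, are not reproduced here.

```python
def cyclic_group(p):
    """All cyclic shifts of p coordinates, each as a tuple perm with
    perm[i] = index that coordinate i is mapped to."""
    return [tuple((i + s) % p for i in range(p)) for s in range(p)]


def reynolds_projection(S, group):
    """Average S over a finite permutation group:
    T[i][j] = (1/|G|) * sum over perm in G of S[perm[i]][perm[j]]."""
    p = len(S)
    acc = [[0.0] * p for _ in range(p)]
    for perm in group:
        for i in range(p):
            for j in range(p):
                acc[i][j] += S[perm[i]][perm[j]]
    return [[value / len(group) for value in row] for row in acc]


def shrink_toward(S, target, lam):
    """Convex shrinkage (1 - lam) * S + lam * target with 0 <= lam <= 1."""
    p = len(S)
    return [[(1.0 - lam) * S[i][j] + lam * target[i][j] for j in range(p)]
            for i in range(p)]


# A 3x3 sample covariance projected onto cyclic-shift-invariant matrices.
S = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 4.0]]
target = reynolds_projection(S, cyclic_group(3))
estimate = shrink_toward(S, target, lam=0.25)
```

For the cyclic group the target is a circulant matrix (constant diagonal, equal entries along each wrapped off-diagonal), so it has far fewer free parameters than `S`; that reduction is what makes a well-matched symmetry informative when observations are scarce.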
Digital twins (DTs) rely on continuous synchronization between physical systems and their virtual counterparts through online parameter estimation under uncertainty. In many practical settings, however, this task is challenged by low observability, weak excitation, nonlinear dynamics, and noisy or biased measurements. In this work, we develop a new mathematical framework that integrates Weighted Flow Matching (WFM) generative modeling with physics-informed nonlinear filtering to enhance parameter estimation in DTs. WFM relies on dynamic reweighting of training samples, which guides the generative model toward parameter regimes most informative of the evolving system state. This generative component is tightly coupled with a physics-informed filtering architecture based on the Unscented Kalman Filter (UKF), yielding a unified DT framework that combines data-driven probability transport with physically consistent state and parameter estimation. The effectiveness of the new integrated framework is demonstrated within a spacecraft DT architecture, where stable moment of inertia estimation is achieved under uncertain and noisy sensing, with significant performance improvements over established approaches such as Extended Kalman Filtering (EKF) and Ensemble Kalman Filtering (EnKF). These results highlight the potential of weighted generative modeling as a core mechanism for real-time DT synchronization in operational and mission-critical systems.
Artificial Intelligence (AI) is widely adopted today for its ability to detect patterns, automate tasks, and reduce time and cost across various applications. Its integration into cybersecurity has garnered significant attention, particularly in areas such as intrusion detection, malware analysis, and phishing or spam detection. As AI and cybersecurity evolve, new methods and approaches emerge regularly. Current trends include the use of Generative AI, Natural Language Processing, Federated Learning for privacy-preserving collaborative training, and eXplainable AI to ensure interpretability and trust, which are vital in cybersecurity. This paper reviews current AI-based cybersecurity trends, focusing on intrusion detection approaches, and draws insights from a comparative analysis based on the AI techniques employed and the performance reported.
Automated driving system (ADS) deployment requires rigorous validation across safety-critical vehicle-pedestrian interactions, yet real-world datasets rarely capture high-risk scenarios, while simulation platforms lack realistic behavior. In response, this study proposes a three-stage framework that combines real-world grounding with adaptive simulation to generate behaviorally realistic safety-critical scenarios at scale. Stage 1 pre-trains multi-agent state-space Transformer-enhanced DDPG (MA-SST-DDPG) agents on real-world safety-critical data to learn human-like interactive evasive behaviors. Stage 2 deploys the pre-trained agents in CARLA for online reinforcement learning to generalize across diverse scenarios, integrating real-world knowledge with simulation experience to produce a refined MA-SST-DDPG model. Stage 3 uses CARLA with the refined model to generate over 198,000 high-resolution interaction episodes from eight intersection scenarios, culminating in the Vehicle-Pedestrian Safety-Critical Interaction (VPSCI) dataset. The refined MA-SST-DDPG model outperformed baseline methods in reproducing realistic evasive behaviors, achieving the lowest trajectory errors (ADE = 0.072 m, FDE = 0.142 m). Statistical comparison confirmed distributional equivalence between the generated and real-world data in both conflict severity and behavioral response. A Turing test confirmed that the evasive behaviors generated by the three-stage framework were indistinguishable from real-world interactions. These results demonstrate the framework's effectiveness in producing high-fidelity safety-critical data, offering valuable resources for the development of ADS and for simulation-based safety evaluations.
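For readers unfamiliar with the two trajectory metrics quoted in the abstract above, the following sketch computes them under their usual definitions: average displacement error (ADE) is the mean Euclidean distance between predicted and reference positions over all time steps, and final displacement error (FDE) is that distance at the last step. The function name and the 2-D point format are assumptions; how the paper aggregates over agents and episodes is not stated in the abstract.

```python
import math


def displacement_errors(predicted, reference):
    """Return (ADE, FDE) for two equally long lists of (x, y) positions."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [
        math.hypot(px - rx, py - ry)
        for (px, py), (rx, ry) in zip(predicted, reference)
    ]
    ade = sum(distances) / len(distances)  # mean error over all steps
    fde = distances[-1]                    # error at the final step only
    return ade, fde


# Prediction drifts away from the reference by 0 m, 1 m, then 2 m.
ade, fde = displacement_errors(
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
)
```

FDE is typically larger than ADE when errors accumulate over the horizon, which matches the ordering of the two reported values.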
Tactile sensing is a fundamental modality for embodied intelligence, offering unique and direct feedback on contact geometry, material properties, and interaction dynamics that remote sensors cannot replace. However, unimodal tactile perception is inherently limited by its sparse spatial coverage and lack of global semantic context. With the recent explosion in deep learning and large language models, integrating tactile sensing with vision and language has become essential to bridge physical interaction with semantic reasoning, leading to the emergence of Multimodal Tactile Fusion. Despite rapid progress, existing research remains fragmented across disparate datasets, sensing modalities, and tasks, and lacks a unified theoretical framework. To address this gap, this paper provides a comprehensive survey of multimodal tactile fusion research up to the first quarter of 2026. We propose a hierarchical taxonomy that organizes the field along two primary dimensions: multimodal datasets and multimodal methods. On the data side, we categorize resources into Tactile-Vision, Tactile-Language, Tactile-Vision-Language, and Tactile-Vision-Other datasets. On the method side, we structure prior work into three core pillars: (1) Multimodal Perception and Recognition, which focuses on object understanding and grasp prediction; (2) Cross-Modal Generation, which focuses on bidirectional translation among tactile, visual, and textual data; and (3) Multimodal Interaction, which emphasizes feedback control and language-guided manipulation. Furthermore, we summarize representative tactile sensing hardware, review commonly used evaluation metrics and benchmark settings, and discuss current challenges and promising future directions.
Monitoring deployments often require reliable long-range wireless links to intermittently upload sensor logs and short video snapshots. Wi-Fi HaLow (IEEE~802.11ah) is a promising candidate due to sub-1 GHz propagation and bandwidth-flexible PHY modes. This summary paper reports a field characterization organized around three deployment-driven regimes: (i) point-to-point Non-Line-of-Sight (NLoS) links; (ii) point-to-point Line-of-Sight (LoS) links over several-hundred-meter distances; and (iii) LoS mesh networking with fixed relay nodes for range extension. Using commodity HaLow dongle-class nodes in all regimes, we report application-layer goodput and monitoring-centric update latency based on transferring a representative ``heavy'' object (a $\sim$30 s video file). The measurements reveal (a) a clear bandwidth--range tradeoff and an NLoS coverage boundary around $\sim$120 m, (b) gradual throughput decay under LoS up to 814 m in single-hop with 0.15 Mbps at the farthest point, and (c) kilometer-class extension under LoS when fixed relays are introduced, reaching 901 m (two fixed relays) and 1110 m (three fixed relays
High-density LED arrays enable high-speed transmission in image-sensor-based visible-light communication (VLC) systems. However, when optical spots become blurred and spatially overlapped due to focal shift, resolution limitations, or interference, severe inter-symbol interference (ISI) occurs, significantly degrading decoding performance. Furthermore, radial distortion introduces geometric deformation of the LED grid, while vignetting leads to incomplete and asymmetric spot shapes at the periphery, both of which further hinder reliable signal detection. Existing methods mitigate ISI by reducing LED transmission signaling density. This paper proposes a robust decoding framework that maintains full LED signaling density. We introduce a pilot-aided geometric recognition method that uses a PSF-constrained Hough transform and circle-center alignment refinement. In addition, radial distortion correction and vignetting-aware compensation are incorporated to restore geometric consistency and suppress edge-related detection errors. By leveraging prior structural knowledge from pilot frames, the system effectively separates overlapping LED signals under severe optical distortion. Experimental results on a real-world VLC testbed confirm that the proposed method achieves superior decoding accuracy and throughput compared to conventional Hough-based and low-density baseline methods. The results highlight its potential for high-efficiency VLC applications in interference-prone environments.
We analyze how automatic speech recognition (ASR) errors propagate through ASR-LLM cascades in Korean spoken question answering (SQA), focusing on downstream semantic failures that conventional ASR metrics cannot fully capture. Our analysis shows that the relative downstream degradation caused by ASR errors is consistent across LLMs with different absolute performance, suggesting that cascade degradation largely tracks ASR-stage information loss. We further identify single-character Korean ASR errors as a distinct semantic-failure channel, where the gold answer becomes entirely absent from the downstream prediction despite only a minimal transcription difference. Finally, an auxiliary comparison shows that a large audio language model outperforms an ASR-LLM pipeline with a matched language backbone in noisy Korean SQA, indicating the potential of direct audio input to mitigate transcript-induced information loss.
Image super-resolution (SR) aims to reconstruct high-quality, high-resolution (HR) images from low-resolution (LR) inputs and plays a critical role in various downstream applications. Despite recent advancements, balancing reconstruction fidelity and computational efficiency remains a fundamental challenge, particularly in resource-constrained scenarios. While existing lightweight methods attempt to expand receptive fields, many of them either incur substantial computational overhead, naively scale up kernel sizes, or lack mechanisms for coherent multi-scale integration, limiting their overall effectiveness and scalability. To address these limitations, we propose EchoSR, an efficient context-harnessing framework for lightweight image super-resolution, which unifies multi-scale receptive field modeling and hierarchical context fusion. EchoSR decouples feature learning into disentangled local, multi-scale, and global modeling stages through an efficient context-harnessing strategy, and further promotes seamless cross-scale integration via a cross-scale overlapping fusion mechanism. Extensive experiments have shown that EchoSR consistently outperforms state-of-the-art lightweight super-resolution methods across multiple benchmarks, while also achieving a faster speed $(\sim 2\times)$. The source code is available at \url{this https URL}.
Electrocardiogram (ECG)-based biometric recognition has emerged as a promising solution for secure authentication and liveness detection. However, most existing methods rely on unimodal deep learning architectures that independently process either one-dimensional (1D) temporal signals or two-dimensional (2D) time-frequency representations, limiting robustness and generalization. To address this issue, this paper proposes a hybrid framework integrating 1D and 2D convolutional neural networks (CNNs) within a unified end-to-end architecture. The 1D branch extracts temporal and morphological features from raw ECG signals, while the 2D branch captures discriminative spectral information from time-frequency representations. An attention-guided fusion mechanism dynamically weights both modalities according to input characteristics, overcoming the limitations of conventional static fusion strategies. The framework was evaluated on three benchmark datasets (ECG-ID, MIT-BIH, and PTB), including healthy subjects and patients with cardiac pathologies, achieving identification accuracies of 99.56%, 100.00%, and 99.89%, respectively. To assess long-term biometric permanence, experiments were also conducted on the multi-session Heartprint dataset spanning ten years. The proposed approach achieved same-session accuracies of 98.54% (S1), 99.09% (S2), 94.93% (S3R), and 96.08% (S3L), while cross-session evaluations reached 56.33% (S1-S2) and 53.27% (S2-S3R), demonstrating the ability to capture stable biometric signatures over time. The optimal configuration combines InceptionTime for 1D processing, ResNet-34 for 2D analysis, and attention-based fusion. Ablation studies confirm that the proposed attention mechanism consistently outperforms conventional fusion approaches. Overall, the proposed framework provides a robust, scalable, and high-performance solution for ECG biometric recognition.
Although modern multilingual Automatic Speech Recognition (ASR) systems support several Nigerian languages, their performance consistently lags behind high-resource languages like English and French. Nigerian languages present unique modelling hurdles, including acute data scarcity, inconsistent orthography, tonal diacritics, diverse accents, frequent code-switching, and localized named entities. To address these challenges, we developed a multilingual ASR framework utilizing a two-stage distillation process. First, we employ student-teacher knowledge distillation from existing monolingual models, conditioned on robust language-specific N-gram language models. Second, we perform iterative self-improvement using pseudo-labelled data to further refine accuracy. Our method significantly narrows the performance gap, achieving on average a relative Word Error Rate (WER) reduction of 29% over monolingual baselines. Our models also outperform state-of-the-art multilingual models across major benchmarks, including Common Voice and Fleurs. We introduce Sometin Beta Pass Notin (SBPN), a foundational multilingual ASR model covering Yorùbá, Hausa, Igbo, Nigerian Pidgin, and Nigerian English. SBPN is released in two sizes: SBPN-Base (120 M parameters) and SBPN-Large (600 M parameters). By releasing these as open foundation models, we aim to provide ASR resources for further research into the rich phonetic and cultural landscape of the region.
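To make the reported metric concrete, the following minimal sketch shows how a word error rate and a relative WER reduction are commonly computed. The function names and the example sentences are illustrative assumptions, not taken from the paper, and the sketch applies none of the text normalization a real ASR evaluation would need.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (0.29 means 29%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Under this definition, a hypothetical baseline WER of 0.40 that drops to 0.284 corresponds to a relative reduction of 29%; these two numbers are made up for illustration only.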
Generative models have demonstrated remarkable potential in time series analysis tasks such as synthesis, forecasting, and imputation. However, existing time series libraries offer limited coverage of generative models: they are mainly engineered for discriminative models, with standardized workflows for specific tasks, such as optimizing mean squared error for time series forecasting. This rigid structure is fundamentally incompatible with the distinct and often complex paradigms of generative models (e.g., adversarial training, diffusion processes), which learn the underlying data distribution rather than a direct input-output mapping. To this end, we propose GenTS, a comprehensive and extensible benchmark library designed for the systematic assessment of generative time series models. GenTS features a unified data preprocessing pipeline, a collection of versatile models, and panoramic evaluation metrics. Its modular design also enables researchers to flexibly customize beyond our built-in datasets and models. Based on GenTS, we conducted benchmarking experiments under diverse tasks, offering suggestions for model selection and identifying potential directions for future research. Our code is open-source at this https URL. The official tutorials and documentation are available at this https URL.
Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in end-effector behavior without raising alarms. This paper addresses the resilience of manipulators against finite-horizon FDIAs by formalizing two defense methods, namely anomaly-aware virtual damping and manipulability reduction, with probabilistic guarantees on nominal task execution. Simulations on a 7-DOF redundant manipulator show that the proposed defenses substantially reduce the impact of FDIA compared to using solely a threshold-based ADS like the Chi-squared, while preserving nominal task performance in the absence of attack.
Formation control of wheeled mobile robots (WMRs) has been extensively studied due to its broad applications in fields such as logistics transportation, environmental monitoring, and search and rescue. However, most existing works mainly focus on tracking predefined formations, which limits their adaptability to complex real-world environments. To address this, we propose REACT (Real-time Environment-Adaptive architecture for Continuous formation navigaTion), a hierarchical architecture integrating centralized formation generation and distributed formation maintenance. Specifically, our upper layer generates new environment-adaptive formations when necessary and uses our proposed TCF-R2T (Trajectory-Conflict-Free Robot-to-Target assignment) algorithm to compute conflict-free WMR-to-target assignments in polynomial time, enabling timely formation transitions without trajectory conflicts. At the lower layer, each WMR executes our developed JSTP (Joint Spatio-Temporal trajectory Planning) method to maintain the generated formation by simultaneously optimizing spatial positions and temporal durations, thereby enhancing coordination among WMRs and enabling continuous navigation in obstacle-rich environments and dynamic-obstacle scenarios. Both simulation and real-world experiments validate the effectiveness and practical applicability of REACT. Experimental videos are available on our project website: this https URL.
A lightweight and reproducible denoising pipeline for high-throughput Raman spectroscopy is presented. The approach relies on a one-dimensional convolutional autoencoder trained using a Noise2Noise strategy, requiring neither external spectral libraries nor high signal-to-noise reference spectra for training. From a reduced training subset composed of repeated short-exposure acquisitions, the model learns to reconstruct Raman spectra while efficiently suppressing stochastic noise. The method is evaluated on a heterogeneous mineral sample, using both quantitative spectral fidelity metrics (RMSE, SNR, SSIM) and task-oriented criteria based on unsupervised K-means classification. Results demonstrate that integration times as short as 5 ms per spectrum, which are typically insufficient for reliable interpretation, yield denoised spectra with high fidelity to the reference data while preserving chemically coherent maps. This work provides a practical trade-off between spectral quality and acquisition speed, enabling fast, adaptable Raman workflows compatible with routine laboratory use. It also offers a transferable framework for other one-dimensional spectroscopic modalities.
Fringe projection profilometry (FPP) is a widely used technique for measuring object surface form and three-dimensional (3D) geometry, capable of delivering high-precision, high-resolution measurements when paired with suitable cameras and projectors. However, in practical deployments, identifying parameter configurations that maximise precision while satisfying real-world constraints remains challenging. To address this, we present an automated digital twin framework implemented in Blender, an open-source 3D software package that provides a ray-traced rendering environment that enables accurate simulation of physical systems. We replicated the physical setup in our digital twin by matching characterisation quality, gamma response, and characterisation images. Accurate system characterisation using Zhang's method [1], to obtain intrinsic and extrinsic parameters, is shown to be critical for achieving high precision. Using this digital twin, we then demonstrate systematic exploration and optimisation of key parameters, including phase-shift count, camera-projector spacing, and fringe density. These parameters span both system geometry (e.g. camera-projector positioning) and algorithmic choices, such as 2D phase-shifting and unwrapping methods [2]. Three measurement artefacts, representative of real world metrology scenarios, were used to benchmark the system. The symmetrical mean Chamfer distance (SMCD), computed between ground-truth and reconstructed meshes, was used to evaluate reconstruction quality. After optimisation within the digital twin, transferring the optimal parameters to the physical system reduced the number of required images per measurement by 48% (from 36 to 21). A reduction of 74.0% mean SMCD was also achieved for fringe pattern stripe count alteration. A 36.9% mean SMCD was obtained for adjusting the camera and projector spacing purely in the digital-twin.
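As an illustration of the evaluation metric, the sketch below computes one common form of a symmetric mean Chamfer distance between two point sets. It assumes the meshes have already been sampled into 3D points and that the two directed mean nearest-neighbour distances are averaged; the abstract does not state the exact definition, so both choices and the function names are assumptions.

```python
import math


def mean_nearest_distance(source, target):
    """Mean Euclidean distance from each source point to its nearest target point."""
    return sum(min(math.dist(p, q) for q in target) for p in source) / len(source)


def symmetric_mean_chamfer(points_a, points_b):
    """Average of the two directed mean nearest-neighbour distances (brute force)."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")
    return 0.5 * (
        mean_nearest_distance(points_a, points_b)
        + mean_nearest_distance(points_b, points_a)
    )
```

The brute-force search is quadratic in the number of points, so a spatial index such as a k-d tree would be the usual choice for dense meshes.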
In this paper, we propose frameworks for generalized performance evaluation and generalized controller synthesis. To this end, we give a truly concurrent process calculus as the model of systems, and present a lattice-valued performance evaluation language as the performance specification of systems. We give a framework for generalized performance evaluation based on the process calculus and the performance evaluation language. We show that several problems in computer science are special cases of generalized performance evaluation. A generalized performance evaluation algorithm is presented. Furthermore, we present a framework for generalized controller synthesis, which is the inverse problem of generalized performance evaluation. We show several special cases of generalized controller synthesis in computer science, and give an outline of a generalized controller synthesis algorithm.
Designing systems is typically uncertain and ambiguous at early stages. Set-based design supports alternative exploration and gradual uncertainty reduction during the early lifecycle, making it practical for complex systems design. In parallel, the functional requirements decomposition helps to advance the design incrementally. However, current literature on set-based design lacks formal guidance in how to decompose functional requirements. To bridge this gap, we introduce a four-step method to decompose functional requirements for set-based design hierarchically. We systematically define, reason, and narrow the sets, breaking down the functional requirements into formal sub-requirements. This method allows parallel abstraction, ensuring the resulting system satisfies the top-level functional requirements.
We describe a family of iterative algorithms that involve the repeated execution of discrete and inverse discrete Fourier transforms. One interesting member of this family is motivated by the discrete Fourier transform uncertainty principle and involves the application of a sparsification operation to both the real domain and frequency domain data with convergence obtained when real domain sparsity hits a stable pattern. This sparsification variant has practical utility for signal denoising, in particular the recovery of a periodic spike signal in the presence of Gaussian noise. General convergence properties and denoising performance relative to existing methods are demonstrated using simulation studies. An R package implementing this technique and related resources can be found at this https URL.
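The iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the sparsification operator is assumed to keep the `k` largest-magnitude entries, the real part is taken after each inverse transform, and the names `keep_largest`, `iterate_sparsify`, `k_time`, and `k_freq` are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def keep_largest(values, k):
    """Assumed sparsifier: zero all but the k largest-magnitude entries."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0 for i, v in enumerate(values)]


def iterate_sparsify(signal, k_time, k_freq, max_iter=50):
    """Alternate time- and frequency-domain sparsification until the
    time-domain support (set of non-zero positions) stops changing."""
    x = list(signal)
    support = None
    for iteration in range(1, max_iter + 1):
        x = keep_largest(x, k_time)
        new_support = frozenset(i for i, v in enumerate(x) if v != 0)
        if new_support == support:
            return x, iteration  # stable sparsity pattern reached
        support = new_support
        spectrum = keep_largest(dft(x), k_freq)
        # Assumption: keep only the real part of the inverse transform.
        x = [v.real for v in dft(spectrum, inverse=True)]
    return x, max_iter
```

For a length-16 signal with unit spikes every four samples plus small perturbations, calling `iterate_sparsify(signal, 4, 4)` recovers the spike positions after two passes. How `k_time` and `k_freq` should be chosen in practice is not specified in the abstract.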
A core aim of neurocritical care is to prevent secondary brain injury. Spreading depolarizations (SDs) have been identified as an important independent cause of secondary brain injury. SDs are usually detected using invasive electrocorticography recorded at high sampling frequency. Recent pilot studies suggest a possible utility of scalp-electrode electroencephalography (EEG) for non-invasive SD detection. However, noise and attenuation of EEG signals make this detection task extremely challenging. Previous methods focus on detecting temporal power changes in EEG over a fixed high-density map of scalp electrodes, which is not always clinically feasible. Using a specialized spectrogram as an input to the automatic SD detection model, this study is the first to transform the SD identification problem from a detection task on a 1-D time-series wave to a task on sequential 2-D rendered images. This study presents a novel ultra-lightweight multi-modal deep-learning network that fuses EEG spectrogram imaging and temporal power vectors to enhance SD identification accuracy over each single electrode, allowing flexible EEG maps and paving the way for SD detection on ultra-low-density EEG with variable electrode positioning. Our proposed model has an ultra-fast processing speed (<0.3 sec). Compared to conventional methods (2 hours), this is a major advance towards early SD detection and facilitates instant brain injury prognosis. By viewing SDs in a new dimension, frequency on spectrograms, we demonstrate that this additional dimension can improve SD detection accuracy, providing preliminary evidence to support the hypothesis that SDs may show implicit features over the frequency profile.
In this paper, coordination control of discrete event systems under joint sensor and actuator attacks is investigated. Sensor attacks are described by a set of attack languages using a proposed ALTER model. Several local supervisors are used to control the system. The goal is to design local supervisors to ensure safety of the system even under cyber attacks (CA). The necessary and sufficient conditions for the existence of such supervisors are derived in terms of conditional decomposability, CA-controllability and CA-observability. A method is developed to calculate local state estimates under sensor attacks. Two methods are also developed to design local supervisors, one for discrete event systems satisfying conditional decomposability, CA-controllability and CA-observability, and one for discrete event systems satisfying conditional decomposability only. The approach works for both stealthy and non-stealthy attacks. A practical example is given to illustrate the results.
Diffusion Tensor Cardiac Magnetic Resonance (DT-CMR) is the only in vivo method to non-invasively examine the microstructure of the human heart. Current research in DT-CMR aims to improve the understanding of how the cardiac microstructure relates to the macroscopic function of the healthy heart as well as how microstructural dysfunction contributes to disease. To obtain the final DT-CMR metrics, diffusion-weighted images (DWIs) in at least 6 directions must be acquired. However, due to the low signal-to-noise ratio of DWI, the standard voxel size is large relative to the scale of the microstructure. In this study, we explore the potential of deep-learning-based methods for improving the image quality volumetrically (x4 in all dimensions). We propose a novel framework to enable volumetric super-resolution, with a high-resolution b0 DWI as an additional model input. We demonstrate that the additional input yields higher super-resolved image quality. Furthermore, the model is able to super-resolve DWIs of unseen b-values, demonstrating the framework's generalizability for cardiac DWI super-resolution. In conclusion, we recommend giving the model a high-resolution reference image as an additional input alongside the low-resolution image, for both training and inference, in all super-resolution frameworks for parametric imaging where a reference image is available.
We present a novel filtering algorithm that employs Bayesian transfer learning to address the challenges posed by mismatched intensity of the noise in a pair of sensors, each of which tracks an object using a nonlinear dynamic system model. In this setting, the primary sensor experiences a higher noise intensity in tracking the object than the source sensor. To improve the estimation accuracy of the primary sensor, we propose a framework that integrates Bayesian transfer learning into an Unscented Kalman Filter (UKF) and a Cubature Kalman Filter (CKF). In this approach, the parameters of the predicted observations in the source sensor are transferred to the primary sensor and used as an additional prior in the filtering process. Our simulation results show that the transfer learning approach significantly outperforms the conventional isolated UKF and CKF. Comparisons to a form of measurement vector fusion are also presented.
Ocean exploration utilizing autonomous underwater vehicles (AUVs) via reinforcement learning (RL) has emerged as a significant research focus. However, underwater tasks have mostly failed due to the observation delay caused by information limitation in the information updating networks. In this study, we present an AoI optimized Markov decision process (AoI-MDP) to improve the performance of underwater tasks. Specifically, AoI-MDP models observation delay as timing delay through statistical delay formulation, and includes this delay as a new component in the state space. Additionally, we introduce wait time in the action space, and integrate AoI with reward functions to achieve joint optimization of information freshness and decision-making for AUVs leveraging RL for training. Finally, we apply this approach to the multi-AUV data collection task scenario as an example. Simulation results highlight the feasibility of AoI-MDP, which effectively minimizes AoI while showcasing superior performance in the task. To accelerate relevant research in this field, we have made the simulation codes available as open-source.
Information Updating Networks (IUNs) offer significant potential for ocean exploration but encounter challenges due to dynamic underwater environments and severe system attenuation. Current methods that rely on Autonomous Underwater Vehicles (AUVs) trained with online reinforcement learning (RL) incur high computational costs and low data utilization. To address these issues and the constraints of turbulent ocean environments, we propose a multi-AUV assisted data collection framework for IUNs based on multi-agent offline RL. This framework maximizes data rate and the value of information (VoI), minimizes energy consumption, and ensures collision avoidance by utilizing environmental and equipment status data. We introduce a semi-communication decentralized training with decentralized execution (SC-DTDE) paradigm and a multi-agent independent conservative Q-learning algorithm (MAICQL) to effectively tackle the problem. Extensive simulations demonstrate the high applicability, robustness, and data collection efficiency of the proposed framework.
The following paper introduces Dual beam-similarity awaRe Integrated sensing and communications (ISAC) with controlled Peak-to-average power ratio (DRIP) waveforms. DRIP is a novel family of space-time ISAC waveforms designed for dynamic peak-to-average power ratio (PAPR) adjustment. The proposed DRIP waveforms are designed to conform to specified PAPR levels while exhibiting beampattern properties, effectively targeting multiple desired directions and suppressing interference for multi-target sensing applications, while closely resembling radar chirps. For communication purposes, the proposed DRIP waveforms aim to minimize multi-user interference across various constellations. Addressing the non-convexity of the optimization framework required for generating DRIP waveforms, we introduce a block cyclic coordinate descent algorithm. This iterative approach ensures convergence to an optimal ISAC waveform solution. Simulation results validate the DRIP waveforms' superior performance, versatility, and favorable ISAC trade-offs, highlighting their potential in advanced multi-target sensing and communication systems.
Signal Temporal Logic (STL) robustness is a common objective for optimal robot control, but its dependence on history limits the robot's decision-making capabilities when used in Model Predictive Control (MPC) approaches. In this work, we introduce Signal Temporal Logic robustness-to-go (Ro-To-Go), a new quantitative semantics for the logic that isolates the contributions of suffix trajectories. We prove its relationship to formula progression for Metric Temporal Logic, and show that the robustness-to-go depends only on the suffix trajectory and progressed formula. We implement robustness-to-go as the objective in an MPC algorithm and use formula progression to efficiently evaluate it online. We test the algorithm in simulation and compare it to MPC using traditional STL robustness. Our experiments show that using robustness-to-go results in a higher success rate.
Automatic anonymization is increasingly used to enable ethical sharing of clinical speech, yet its perceptual and clinical consequences remain undercharacterized. We present a human-centered evaluation of automatically anonymized pathological speech, using a structured protocol with ten native and non-native German listeners spanning clinical and signal-processing expertise. The cohort comprised 180 German speakers from CLP, Dysarthria, Dysglossia, Dysphonia, and adult and child controls. Each original recording and its automatically-anonymized counterpart was evaluated on four tasks: zero-shot Turing-style discrimination, few-shot discrimination after brief familiarization, 5-point quality rating, and 4-point blinded clinical severity rating by a senior phoniatrician. Listeners detected anonymization at 91% zero-shot and 93% few-shot accuracy, with significant variation across disorders (p=0.008) that attenuated with familiarization. Perceived quality dropped by 30 ppts on a 0-100 scale (p<0.001), reorganizing the perceived-quality hierarchy across groups. Native language modulated detectability but not quality degradation, while domain expertise modulated quality degradation but not detectability, a double dissociation between the two listener attributes; speaker sex and age produced no detectable bias. Clinical severity ratings were preserved at near-perfect agreement in Dysarthria, Dysglossia, and Dysphonia (quadratic-weighted Cohen's kappa 0.87-0.94), with no recording shifting by more than one grade. Crucially, perceptual outcomes were decoupled from the standard computational privacy metric: the pathology with the strongest computational anonymization was the least perceptually conspicuous, and vice versa. These findings argue for disorder-stratified, listener-stratified, clinician-validated evaluation as the minimum standard for licensing anonymized speech for clinical use.
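For readers unfamiliar with the agreement statistic reported above, the following minimal sketch computes a quadratic-weighted Cohen's kappa for two sets of ordinal ratings. It assumes grades are encoded as integers `0 .. num_grades - 1`; the function name and the encoding are illustrative assumptions, not the study's analysis code.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, num_grades):
    """Cohen's kappa with quadratic disagreement weights for ordinal grades."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if num_grades < 2:
        raise ValueError("num_grades must be at least 2")
    n = len(ratings_a)
    # observed[i][j] counts items graded i in the first set and j in the second.
    observed = [[0] * num_grades for _ in range(num_grades)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(num_grades)) for j in range(num_grades)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_grades):
        for j in range(num_grades):
            weight = (i - j) ** 2 / (num_grades - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two sets of ratings.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when all ratings fall in one grade")
    return 1.0 - weighted_observed / weighted_expected
```

With quadratic weights, a one-grade shift is penalized far less than a multi-grade shift, which is why this statistic suits ordinal scales such as a 4-point severity rating.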
With the fast development of zero-shot text-to-speech technologies, it is possible to generate high-quality speech signals that are indistinguishable from the real ones. Speech editing, including speech insertion and replacement, appeals to researchers due to its potential applications. However, existing studies only considered clean speech scenarios. In real-world applications, the existence of environmental noise could significantly degrade the quality of generation. In this study, we propose a noise-resilient speech editing framework, SeamlessEdit, for noisy speech editing. SeamlessEdit adopts a frequency-band-aware noise suppression module and an in-content refinement strategy. It can well address the scenario where the frequency bands of voice and background noise are not separated. The proposed SeamlessEdit framework outperforms state-of-the-art approaches in multiple quantitative and qualitative evaluations.
Ultrasound Coherent Plane-Wave Compounding (CPWC) enhances image contrast by combining echoes from multiple steered transmissions. While increasing the number of steering angles generally improves image quality, it significantly reduces frame rate and may introduce blurring artifacts in fast-moving targets. In addition, compounded images remain susceptible to noise, particularly when acquired using a limited number of transmissions. In this work, we propose a lightweight physics-aware zero-shot denoising framework for low-angle CPWC ultrasound imaging that improves image quality without requiring external training datasets or clean reference images. The proposed approach partitions the available steering angles into two disjoint subsets, each used to reconstruct compounded images with different angle-dependent artifacts and noise characteristics. These reconstructed images are then used as pseudo-pairs within a self-supervised residual learning framework to train a lightweight convolutional neural network directly on the test sample. Because the underlying tissue structures remain consistent across the subsets while the incoherent artifacts vary with steering angle selection, the proposed physics-aware pairing strategy enables the network to distinguish anatomical information from inconsistent noise and artifacts. Unlike supervised approaches, the proposed method does not require domain-specific fine-tuning or paired datasets, making it adaptable across different anatomical regions and acquisition settings. Furthermore, the proposed framework employs an efficient architecture composed of only two convolutional layers, enabling fast and computationally inexpensive training.
Diffusion models have shown strong performance in speech enhancement, but their real-time applicability has been limited by multi-step iterative sampling. Consistency distillation has recently emerged as a promising alternative by distilling a one-step consistency model from a multi-step diffusion-based teacher model. However, distilled consistency models are inherently biased towards the sampling trajectory of the teacher model, making them less robust to noise and prone to inheriting inaccuracies from the teacher model. To address this limitation, we propose ROSE-CD: Robust One-step Speech Enhancement via Consistency Distillation, a novel approach for distilling a one-step consistency model. Specifically, we introduce a randomized learning trajectory to improve the model's robustness to noise. Furthermore, we jointly optimize the one-step model with two time-domain auxiliary losses, enabling it to recover from teacher-induced errors and surpass the teacher model in overall performance. This is the first pure one-step consistency distillation model for diffusion-based speech enhancement, achieving 54 times faster inference speed and superior performance compared to its 30-step teacher model. Experiments on the VoiceBank-DEMAND dataset demonstrate that the proposed model achieves state-of-the-art performance in terms of speech quality. Moreover, its generalization ability is validated on both an out-of-domain dataset and real-world noisy recordings.
Objective: Latent diffusion models (LDMs) could mitigate data scarcity challenges affecting machine learning development for medical image interpretation. The recent CCELLA LDM improved prostate cancer detection performance using synthetic MRI for classifier training but was limited to the axial T2-weighted (AxT2) sequence, did not investigate inter-institutional domain shift, and prioritized PI-RADS over histopathology outcomes. Methods: We propose CCELLA++, a novel LDM pipeline for simultaneous 3D biparametric prostate MRI (bpMRI) generation, including the AxT2, high b-value diffusion series (HighB) and apparent diffusion coefficient map (ADC), to overcome these limitations. We investigated source-free domain adaptation with classifiers pretrained on single institution real or LDM-generated synthetic data prior to fine-tuning on fractions of an out-of-distribution, external dataset. Results: CCELLA++ achieved comparable AxT2 Kernel Inception Distance to CCELLA (0.0128, 0.0131 respectively). CCELLA++ synthetic bpMRI pretraining outperformed real bpMRI in AP and AUC up to 12.5% (n<=166) external dataset volume (p<0.01 all), no pretraining in AUC up to 25% external volume (n=332, p<0.05 all), and CCELLA AxT2-only pretraining in both data-scarce (n=83, p<0.001 AP and AUC) and full data (n=1329, p<0.05 AP and AUC) scenarios. Conclusion: CCELLA++ synthetic bpMRI can improve downstream classifier generalization and performance beyond real bpMRI or CCELLA-generated AxT2-only images. Future work should quantify medical image quality, balance bpMRI LDM training, and condition the LDM with additional information. Significance: CCELLA++ can generate synthetic bpMRI that outperforms real data for domain adaptation with data-scarce external institutions, advancing machine learning development for medical imaging. Our code is available at this https URL
Infinite networks are complex interconnected systems comprising a countably infinite number of subsystems, for which no fixed upper bound on the number of participating subsystems is specified a priori since it may vary over time as agents join or leave (e.g., vehicles in traffic). In such scenarios, the presence of infinitely many subsystems within the network renders the existing analysis frameworks tailored for finite networks inapplicable to infinite ones. This paper is concerned with offering a data-driven approach, within a compositional framework, for the safety certification of infinite networks with both unknown mathematical models and unknown interconnection topologies. Given the immense computational complexity stemming from the extensive dimension of infinite networks, our approach capitalizes on the joint dissipativity-type properties of subsystems, characterized by storage certificates. We introduce innovative compositional data-driven conditions to construct a barrier certificate for the infinite network leveraging storage certificates of its unknown subsystems derived from data, while offering correctness guarantees for network safety. We demonstrate that our compositional data-driven reasoning eliminates the requirement for checking the traditional dissipativity condition, which typically mandates precise knowledge of the interconnection topology. We illustrate our data-driven results on two physical infinite networks with unknown models and interconnection topologies.
Standard random projection techniques typically operate as a black box, mapping high-dimensional structures directly to a lower-dimensional space where the target dimension must be specified \textit{a priori}. To address scenarios where the optimal ultimate dimension is unknown, this paper investigates the retention of information through a sequential, step-by-step dimension reduction process. We examine a fixed, bounded convex body as it undergoes successive random orthogonal projections, systematically reducing the ambient dimension by one at each step. By demonstrating that this sequence of observed bodies forms a Markov chain, we quantify the information preserved through these reductions using the conditional mutual information between successive projections given the original convex body. We derive a theoretical upper bound on this conditional mutual information, parameterized by the Haar measure of the projection spaces that yield the same observed body. Leveraging the established Markov property, we extend these results to an arbitrary number of iterations, proving that the initial two-step bound characterizes information retention across the entire sequence of projections. Furthermore, by analyzing the projection space under the symmetry group of the initial body, we demonstrate that geometric asymmetry serves as a beneficial asset, resulting in higher overall information retention.
This paper introduces a dimension-decomposed geometric learning framework called Sliced Learning for disturbance identification in quadrotor geometric attitude control. Instead of conventional learning-from-states, this framework adopts a learning-from-error strategy by using the Lie-algebraic error representation as the input feature, enabling axis-wise space decomposition (``slicing") while preserving the SO(3) structure. This is highly consistent with the geometric mechanism of cognitive control observed in neuroscience, where neural systems organize adaptive representations within structured subspaces to enable cognitive flexibility and efficiency. Based on this framework, we develop a lightweight and structurally interpretable Sliced Adaptive-Neuro Mapping (SANM) module. The high-dimensional mapping for online identification is axially ``sliced" into multiple low-dimensional submappings (``slices"), implemented by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation within their respective shared subspaces. To enhance interpretability, we prove exponential convergence despite time-varying disturbances and inertia uncertainties. To our knowledge, Sliced Learning is among the first frameworks to demonstrate lightweight online neural adaptation at 400 Hz on resource-constrained microcontroller units (MCUs), such as STM32, with real-world experimental validation.
Driven by intelligent reflecting surface (IRS) and movable antenna (MA) technologies, movable IRS (MIRS) has been proposed to improve the adaptability and performance of conventional IRS, enabling flexible adjustment of the IRS reflecting element positions. This paper investigates MIRS-aided integrated sensing and communication (ISAC) systems. The objective is to minimize the power required for satisfying the quality-of-service (QoS) of sensing and communication by jointly optimizing the MIRS element positions, IRS reflection coefficients, transmit beamforming, and receive filters. To balance the performance-cost trade-off, we propose two MIRS schemes: element-wise control and array-wise control, where the positions of individual reflecting elements and arrays consisting of multiple elements are controllable, respectively. To address the joint beamforming and position optimization, a product Riemannian manifold optimization (PRMO) method is proposed, where the variables are updated in parallel over a constructed product Riemannian manifold space (PRMS) via a penalty-based transformation and the Riemannian Broyden-Fletcher-Goldfarb-Shanno (RBFGS) algorithm. Simulation results demonstrate that the proposed MIRS outperforms conventional IRS in power minimization with both element-wise control and array-wise control. Specifically, across different system parameters, the minimum power is achieved by the MIRS with the element-wise control scheme, while the MIRS with the array-wise control scheme yields a suboptimal solution at higher computational efficiency.
Distributed systems require fusing heterogeneous local probability distributions into a global summary over sparse and unreliable communication networks. Traditional consensus algorithms, which average distributions in Euclidean space, ignore their inherent geometric structure, leading to misleading results. Wasserstein barycenters offer a geometry-aware alternative by minimizing optimal transport costs, but their entropic approximations via the Sinkhorn algorithm typically require centralized coordination. This paper proposes a fully decentralized Sinkhorn algorithm that reformulates the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols. Agents exchange log-messages with neighbors, interleaving consensus phases with local updates to mimic centralized iterations without a coordinator. To optimize bandwidth, we integrate event-triggered transmissions and b-bit quantization, providing tunable trade-offs between accuracy and communication while accommodating asynchrony and packet loss. Under mild assumptions, we prove convergence to a neighborhood of the centralized entropic barycenter, with bias linearly dependent on consensus tolerance, trigger threshold, and quantization error. Complexity scales near-linearly with network size. Simulations confirm near-centralized accuracy with significantly fewer messages, across various topologies and conditions.
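The log-domain reformulation in the abstract above rests on a simple identity: a geometric mean is the exponential of the arithmetic mean of logarithms, so average consensus on log-values recovers it without a coordinator. The following is a minimal sketch of that identity only, assuming a hypothetical ring of four agents, scalar values, synchronous lossless exchanges, and a fixed step size; the paper's actual Sinkhorn updates, event triggering, and quantization are not reproduced here.

```python
import math

def gossip_geometric_mean(values, neighbors, rounds=200, step=0.3):
    """Approximate the geometric mean of positive values by running
    average consensus on their logarithms (hypothetical toy setup)."""
    logs = [math.log(v) for v in values]
    for _ in range(rounds):
        new = []
        for i, x in enumerate(logs):
            # Each agent moves toward its neighbors' log-values.
            # Symmetric weights keep the network-wide average unchanged.
            new.append(x + step * sum(logs[j] - x for j in neighbors[i]))
        logs = new
    return [math.exp(x) for x in logs]

values = [1.0, 2.0, 4.0, 8.0]                       # local positive quantities
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # assumed ring topology
estimates = gossip_geometric_mean(values, ring)

# Centralized reference: exp of the arithmetic mean of the logs.
exact = math.exp(sum(math.log(v) for v in values) / len(values))
```

Every entry of `estimates` approaches `exact` as the number of rounds grows. Stopping the consensus phase early leaves a residual disagreement, which is one intuition for why a bias term would depend on the consensus tolerance.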
This work addresses the challenge of making generative models suitable for resource-constrained environments like mobile wireless communication systems. We propose a generative model that integrates Autoregressive (AR) parameterization into a Gaussian Mixture Model (GMM) for modeling Wide-Sense Stationary (WSS) processes. By exploiting model-based insights allowing for structural constraints, the approach significantly reduces parameters while maintaining high modeling accuracy. Channel estimation experiments show that the model can outperform standard GMMs and variants using Toeplitz or circulant covariances, particularly with small sample sizes. For larger datasets, it matches the performance of conventional methods while improving computational efficiency and reducing the memory requirements.
Tube-based Model Predictive Control (MPC) is a widely adopted robust control framework for constrained linear systems under additive disturbance. The paper is focused on reducing the numerical complexity associated with the tube parameterization, described as a sequence of elastically-scaled zonotopic sets. A new class of scaled-zonotope inclusion conditions is proposed, alleviating the need for a priori specification of certain set-containment constraints and achieving significant reductions in complexity. A comprehensive complexity analysis is provided for both the polyhedral and the zonotopic setting, illustrating the trade-off between an enlarged domain of attraction and the required computational effort. The proposed approach is validated through extensive numerical experiments.
The rapid expansion of oceanic applications such as underwater surveillance and mineral exploration is driving the need for real-time wireless backhaul of massive observational data. Such demands are challenging to meet using the narrowband acoustic approach. Alternatively, with the participation of low-altitude platforms (LAPs), water-air optical wireless communication (OWC) has emerged as a promising solution owing to its high potential for broadband transmission. However, implementing water-air OWC remains challenging, particularly when signals penetrate the fluctuating interface, where dynamic refraction induces severe beam misalignment with airborne stations. This necessitates real-time transceiver alignment capable of adapting to complex oceanic dynamics, which remains largely unaddressed. Against this background, this paper establishes a mathematical channel model for water-air optical transmission across a time-varying sea surface. Based on the model, a vision-based beam tracking algorithm combining convolutional neural network and bi-directional long short-term memory with an attention mechanism is developed to extract key spatio-temporal features. Simulations verify that the proposed algorithm outperforms classical methods in maintaining received signal strength and suppressing vision noise, demonstrating its robustness for water-air OWC systems.
Speech Continuation (SC) is the task of generating a coherent extension of a spoken prompt while preserving both semantic context and speaker identity. Because SC is constrained to a single audio stream, it offers a more direct setting for probing biases in speech foundation models than dialogue does. In this work we present the first systematic evaluation of bias in SC, investigating how gender and phonation type (breathy, creaky, end-creak) affect continuation behaviour. We evaluate three recent models: SpiritLM (base and expressive), VAE-GSLM, and SpeechGPT across speaker similarity, voice quality preservation, and text-based bias metrics. Results show that while both speaker similarity and coherence remain a challenge, textual evaluations reveal significant model and gender interactions: once coherence is sufficiently high (for VAE-GSLM), gender effects emerge on text-metrics such as agency and sentence polarity. In addition, continuations revert toward modal phonation more strongly for female prompts than for male ones, revealing a systematic voice-quality bias. These findings highlight SC as a controlled probe of socially relevant representational biases in speech foundation models, and suggest that it will become an increasingly informative diagnostic as continuation quality improves.
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, an assumption that is often violated in real-world settings where only partial state information is available. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems with input constraints. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
Even though more than 30 years have passed since the seminal Rudin--Osher--Fatemi (ROF) paper on total variation (TV) denoising, it remains relevant, in particular in scientific applications such as astronomical imaging. However, it is known to suffer from artifacts such as the staircasing effect. Many variants of the model have been proposed with the aim of countering this. Recently, against the backdrop of immense research output on double-phase problems in the mathematical analysis community, a double-phase type integral functional, comprising TV and a weighted term of quadratic growth, was suggested as a regularizer for image restoration. Here, we propose an adaptive variant of the ROF denoising model based on that regularizer. It is designed to reduce staircasing with respect to the classical ROF model, while preserving the edges of the image in a similar fashion. We implement the model and test its performance on synthetic and natural images over a range of noise levels. Compared to established models with similar interpretability to ROF, we observe an improved or similar performance in terms of the similarity metrics SSIM, PSNR, and LPIPS, while the staircasing effect is visibly reduced.
Transmission Expansion Planning (TEP) optimizes power grid upgrades and investments to ensure reliable, efficient, and cost-effective electricity delivery while addressing grid constraints. To support growing demand and renewable energy integration, energy storage is emerging as a pivotal asset that provides temporal flexibility and alleviates congestion. This paper develops a multiperiod, two-stage PTDF formulation that co-optimizes transmission upgrades and storage siting/sizing. To ensure scalability, a trust-region, multicut Benders scheme warm-started from per-representative-day optima is proposed. Applied to a 2,000-bus synthetic Texas system under high-renewable projections, the method attains final optimality gaps below 2% and yields a plan with storage at 167 nodes (32% of peak renewable capacity). These results demonstrate that the proposed PTDF-based methodology efficiently handles large distributed storage fleets and scales to high spatial resolution.
Summary: SAMRI is an MRI-specialized adaptation of the Segment Anything Model achieving superior whole-body MRI segmentation, particularly for small and clinically critical structures, through box and point prompts for rapid annotation. Purpose: Existing SAM adaptations treat MRI as a generic modality, overlooking variable tissue contrast, intensity inhomogeneity, and clinically important small structures. We propose an MRI-specialized foundation model with strong whole-body segmentation and zero-shot generalization for direct use on any MRI annotation task. Methods: SAMRI fine-tunes only the mask decoder of SAM (ViT-B/16), keeping encoders frozen to preserve pretrained representations and eliminate redundant passes, reducing training time by 94%, trainable parameters by 96%, and FLOPs by ~99% versus full-model retraining. Training used 1.1 million 2D slice-mask pairs from 30 datasets spanning 47 targets, T1/T2/FLAIR/DWI contrasts, and whole-body anatomy, with focal-Dice loss and bounding-box (with optional point) prompts. Sizes were stratified by mask area (small: <0.5%; medium: 0.5-3.5%; large: >3.5%), and significance was assessed by the Wilcoxon signed-rank test. Results: SAMRI with box+point prompts achieved mean DSC 0.87 +/- 0.11 across 47 targets, outperforming MedSAM (0.74 +/- 0.24) by 17.6% (p < 0.05), with the largest gains for small (+42.4%) and medium (+26.9%) structures. On six zero-shot datasets, SAMRI achieved mean DSC 0.85, outperforming baselines. Inference requires only ~4.5 GB VRAM through an interactive interface on standard hardware. Conclusion: Decoder-only fine-tuning on a large, MRI-specific corpus delivers superior whole-body segmentation with strong zero-shot generalization, particularly for small and clinically salient structures. Public code, pretrained models, and an interactive interface make SAMRI deployable for MRI segmentation research and clinical workflows.
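The evaluation quantities in the abstract above can be made concrete with a short sketch. It shows the Dice similarity coefficient (DSC) on binary masks, the size stratification by mask-area fraction using the stated thresholds, and the relative-gain arithmetic that appears to underlie the reported 17.6% (an interpretation that matches the quoted means 0.87 and 0.74). The function names, the flat-list mask representation, and the empty-mask convention are assumptions for illustration, not the authors' evaluation code.

```python
def dice(pred, truth):
    """Dice similarity coefficient for two binary masks given as flat 0/1 lists.
    Returning 1.0 when both masks are empty is an assumed convention."""
    inter = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * inter / total

def size_class(mask, small=0.005, large=0.035):
    """Stratify a mask by its area fraction: small < 0.5%, large > 3.5%."""
    frac = sum(mask) / len(mask)
    if frac < small:
        return "small"
    return "medium" if frac <= large else "large"

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

d = dice([1, 1, 0, 0], [1, 0, 0, 0])   # overlap 1, sizes 2 and 1 -> 2/3
gain = relative_gain(0.87, 0.74)        # about 17.6 percent
```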
Velocity estimation is a cornerstone of the recently introduced near-field predictive beamforming. This paper derives the Cramer-Rao bounds (CRBs) for joint radial and transverse velocity estimation within a predictive beamforming framework employing a modular linear array (MLA). We obtain closed-form expressions that characterize the interplay between array geometry and estimation accuracy, showing that increasing the inter-module separation enlarges the effective aperture and reduces the transverse-velocity CRB, while the radial-velocity CRB remains largely insensitive to this separation. Furthermore, we show that an MLA can achieve the same accuracy as a collocated ULA with fewer antennas and quantify the relation between inter-module spacing and antenna savings. The derived expressions are validated through simulations by comparing them with the mean-squared error (MSE) of the maximum likelihood estimator (MLE) reported in the literature.
Computed tomography perfusion (CTP) and magnetic resonance perfusion (MRP) are widely used in acute ischemic stroke assessment and other cerebrovascular conditions to generate quantitative maps of cerebral hemodynamics. While commercial perfusion analysis software exists, it is often costly, closed source, and lacks customizability. This work introduces PyPeT, an openly available Python Perfusion Tool for head CTP and MRP processing. PyPeT is capable of producing cerebral blood flow (CBF), cerebral blood volume (CBV), mean transit time (MTT), time-to-peak (TTP), and time-to-maximum (Tmax) maps from raw four-dimensional perfusion data. PyPeT aims to make perfusion research as accessible and customizable as possible. This is achieved through a unified framework in which both CTP and MRP data can be processed, with a strong focus on modularity, low computational burden, and extensive inline documentation. PyPeT's outputs can be validated through an extensive debug mode in which every step of the process is visualized. Additional validation was performed via visual and quantitative comparison with reference perfusion maps generated by three FDA-approved commercial perfusion tools and a research tool. These comparisons show a mean SSIM of around 0.8 in all cases, indicating a good and stable correlation with FDA-approved tools. The code for PyPeT is openly available on GitHub at this https URL.
In the process industries, MPC (Model Predictive Control) is typically implemented as a two-stage controller with a Linear Program (LP) steady-state optimizer that generates economically optimal targets for the MPC algorithm. Abnormal behaviors in industrial LP optimizers are often difficult to rationalize, especially when a large number of manipulated variables (MVs) and controlled variables (CVs) are involved. We introduce a novel, post-hoc LP explainability method by recasting the role of shadow prices in the LP solution as an attribution mechanism for MV-CV relationships. The core idea is that the shadow price of a constrained CV is not just an intrinsic property of the LP solution, but can be split into contributions from individual unconstrained MVs and resolved into one-to-one MV-CV pairings using a linear sum assignment algorithm. The proposed MV-CV pairing framework serves as a practical explainability tool for online LP-MPC systems, enabling practitioners to diagnose suboptimal constraints and verify alignment of the controller's behavior with its original design.
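The pairing step described in the abstract above can be illustrated with a small sketch. It assumes that the per-MV contributions to each constrained CV's shadow price are already available as a square matrix (the numbers below are hypothetical, and how the paper computes these contributions is not reproduced), and that the pairing objective is the total absolute contribution. For clarity it searches all permutations; a production implementation would more likely call a Hungarian-type solver such as SciPy's `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations

def pair_mv_cv(contrib):
    """Return the one-to-one CV -> MV pairing that maximizes the total
    absolute contribution. contrib[i][j] is the (hypothetical) share of
    CV i's shadow price attributed to MV j; the matrix must be square."""
    n = len(contrib)
    best, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        # perm[i] is the MV assigned to CV i under this candidate pairing.
        score = sum(abs(contrib[i][perm[i]]) for i in range(n))
        if score > best_score:
            best, best_score = perm, score
    return list(best), best_score

contrib = [
    [0.9, 0.1, 0.0],   # CV 0 is driven mostly by MV 0
    [0.2, 0.3, 0.5],   # CV 1 is driven mostly by MV 2
    [0.6, 0.4, 0.0],   # CV 2 prefers MV 0, but MV 0 is taken by CV 0
]
pairing, score = pair_mv_cv(contrib)
```

In this toy case CV 2 is paired with MV 1 rather than its individually largest contributor, which shows why a global assignment can differ from a row-wise argmax.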
Federated learning (FL) enables a privacy-preserving training paradigm for audio classification but is highly sensitive to client heterogeneity and poisoning attacks, where adversarially compromised clients can bias the global model and hinder the performance of audio classifiers. To mitigate the effects of model poisoning for audio signal classification, we present REVERB-FL, a lightweight, server-side defense that couples a small reserve set (approximately 5%) with pre- and post-aggregation retraining and adversarial training. After each local training round, the server refines the global model on the reserve set with either clean or additional adversarially perturbed data, thereby counteracting non-IID drift and mitigating potential model poisoning without adding substantial client-side cost or altering the aggregation process. We theoretically demonstrate the feasibility of our framework, showing faster convergence and a reduced steady-state error relative to baseline federated averaging. We validate our framework on two open-source audio classification datasets with varying IID and Dirichlet non-IID partitions and demonstrate that REVERB-FL mitigates global model poisoning under multiple designs of local data poisoning.
We derive a state-space characterization of all dynamic state-feedback controllers that make an equilibrium of a nonlinear input-affine continuous-time system locally exponentially stable. Specifically, any controller obtained as the sum of a linear state-feedback $u=Kx$, with $K$ stabilizing the linearized system, and the output of internal locally exponentially stable controller dynamics is itself locally exponentially stabilizing. Conversely, every dynamic state-feedback controller that locally exponentially stabilizes the equilibrium admits such a decomposition. The result can be viewed as a state-space nonlinear Youla-type parametrization specialized to local, rather than global, and exponential, rather than asymptotic, closed-loop stability. The residual locally exponentially stable controller dynamics can be implemented with stable recurrent neural networks and trained as neural ODEs to achieve high closed-loop performance in nonlinear control tasks.
Future wireless communication systems will increasingly rely on the integration of millimeter wave (mmWave) and sub-6 GHz bands to meet heterogeneous demands on high-speed data transmission and extensive coverage. To fully exploit the benefits of mmWave bands in massive multiple-input multiple-output (MIMO) systems, highly accurate channel state information (CSI) is required. However, directly estimating the mmWave channel demands substantial pilot overhead due to the large CSI dimension and the low signal-to-noise ratio (SNR) caused by severe path loss and blockage attenuation. In this paper, we propose an efficient \textbf{M}ulti-\textbf{D}omain \textbf{F}usion \textbf{C}hannel \textbf{E}xtrapolator (MDFCE) to extrapolate sub-6 GHz band CSI to mmWave band CSI, so as to reduce the pilot overhead for mmWave CSI acquisition in dual-band massive MIMO systems. Unlike traditional channel extrapolation methods based on mathematical modeling, the proposed MDFCE combines the mixture-of-experts framework and the multi-head self-attention mechanism to fuse multi-domain features of sub-6 GHz CSI, aiming to characterize the mapping from sub-6 GHz CSI to mmWave CSI effectively and efficiently. The simulation results demonstrate that MDFCE can achieve superior performance with fewer training pilots compared with existing methods across various antenna array scales and SNR levels while showing a much higher computational efficiency.
In this paper, we propose a data-enabled moving horizon estimation (MHE) approach for a class of nonlinear systems without explicit modeling, by leveraging Koopman operator theory and Willems' fundamental lemma. Specifically, the nonlinear system is lifted to a linear parameter-varying Koopman surrogate, in which the lifting functions and scheduling mappings are learned directly from data using neural networks. Willems' fundamental lemma is then employed to construct a trajectory-based representation of the Koopman surrogate, which bypasses the explicit identification of the matrices of the Koopman surrogate. Based on this representation, we formulate a convex data-enabled MHE design, which provides real-time estimates of the Koopman surrogate states, from which the states of the original nonlinear system are reconstructed. Sufficient conditions are derived to ensure the stability of the estimation error. The effectiveness of the proposed method is illustrated using a simulated membrane-based biological wastewater treatment process.
Signal processing has played, and continues to play, a fundamental role in the evolution of modern localization technologies. Localization using spatial variations in the Earth's magnetic field is no exception. It relies on signal-processing methods for statistical state inference, magnetic-field modeling, and sensor calibration. Contemporary localization techniques based on spatial variations in the magnetic field can provide decimeter-level indoor localization accuracy and outdoor localization accuracy on par with strategic-grade inertial navigation systems. This article provides a broad, high-level overview of current signal-processing principles and open research challenges in localization using spatial variations in the Earth's magnetic field. The aim is to provide the reader with an understanding of the similarities and differences among existing key technologies from a statistical signal-processing perspective. To that end, existing key technologies will be presented within a common parametric signal-model framework compatible with well-established statistical inference methods.
Uniform quantization has been studied extensively. However, although an analytical description of quantization noise has been proposed, most treatments of the spectral properties of quantization error resort to statistical descriptions. In this paper, we show how the spectrum of a quantized signal can be expressed using pulse frequency modulation. We first establish the equivalence of a uniform quantizer with a system based on bipolar pulse frequency modulation, and we then define the Fourier transform of the quantized signal using pulse frequency modulation properties. This model brings a more intuitive understanding of the spectral structure of quantization noise and complements prior research on the topic. The results of the paper can be directly applied to level-crossing ADCs with zero-order-hold interpolators, giving an accurate estimation of their performance.
The graph fractional Fourier transform (GFRFT) for unitary graph Fourier transform (GFT) matrices can be interpreted through the scalar function $e^{j\alpha\theta}$ on the unit circle. Under the principal branch, its Fourier-series representation encounters an intrinsic obstruction at the spectral point $\lambda=-1$ for non-integer orders. To address this issue, we propose a fast graph fractional Fourier transform (FGFRFT) based on exact spectral splitting: the $\lambda=-1$ component is treated exactly, and the complementary component is approximated by a truncated Fourier series in integer powers of the GFT matrix. This construction yields an offline--online implementation that reduces the online complexity of repeated operator updates from $O(N^3)$ to $O(2LN^2)$ for truncation order $L$, while preserving differentiability with respect to the transform order. We further derive truncation-error bounds, approximate unitarity and additivity, and reconstruction-error bounds. Experiments on approximation accuracy, transform-order learning, image denoising, and point-cloud denoising show that FGFRFT provides substantial online acceleration while remaining close to the exact GFRFT under the tested settings.
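The scalar obstruction mentioned in the abstract above can be written out explicitly. The block below is a sketch of the standard Fourier-series calculation for $e^{j\alpha\theta}$ on $(-\pi,\pi)$; the symbols $c_k(\alpha)$ and $\mathbf{F}$ (the unitary GFT matrix), the symmetric truncation range $|k|\le L$, and the reading of the $O(2LN^2)$ count are assumptions made for illustration, not notation taken from the paper.

```latex
% Fourier series of the scalar function on the open arc, non-integer \alpha:
\[
  e^{j\alpha\theta} \;=\; \sum_{k\in\mathbb{Z}} c_k(\alpha)\, e^{jk\theta},
  \qquad
  c_k(\alpha) \;=\; \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j(\alpha-k)\theta}\,d\theta
  \;=\; \frac{\sin\!\big(\pi(\alpha-k)\big)}{\pi(\alpha-k)},
  \qquad \theta\in(-\pi,\pi).
\]
% At the branch point \theta=\pm\pi (i.e. \lambda=e^{j\theta}=-1) the periodic
% extension jumps, so the series converges to the midpoint of the jump,
\[
  \tfrac{1}{2}\big(e^{j\alpha\pi}+e^{-j\alpha\pi}\big) \;=\; \cos(\alpha\pi)
  \;\neq\; e^{j\alpha\pi},
\]
% which is why the \lambda=-1 eigencomponent has to be handled separately.
% On the complementary spectral component, substituting \lambda \mapsto \mathbf{F}
% and truncating gives (assumed symmetric truncation)
\[
  \mathbf{F}^{\alpha} \;\approx\; \sum_{|k|\le L} c_k(\alpha)\,\mathbf{F}^{k},
  \qquad \mathbf{F}^{-k} = (\mathbf{F}^{k})^{\mathsf{H}} .
\]
```

Under this reading, the integer powers $\mathbf{F}^{k}$ are computed once offline, and each new order $\alpha$ only requires a weighted sum of roughly $2L$ matrices of size $N\times N$, which is consistent with the stated $O(2LN^2)$ online cost. The coefficients $c_k(\alpha)$ are smooth in $\alpha$, which is consistent with the differentiability claim.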
State-of-the-art learned reconstruction methods often rely on black-box modules that, despite their strong performance, raise questions about their interpretability and robustness. Here, we build on a recently proposed image reconstruction method, which is based on embedding data-driven information into a model-based convolutional dictionary regularization via neural network-inferred spatially adaptive sparsity level maps. By means of improved network design and dedicated training strategies, we extend the method to achieve filter-permutation invariance as well as the possibility to change the convolutional dictionary at inference time. We apply our method to low-field MRI and compare it to several other recent deep learning-based methods, also on in vivo data, where the benefit of using a different dictionary is demonstrated. We further assess the method's robustness when tested on in- and out-of-distribution data. When tested on the latter, the proposed method suffers less from the data distribution shift compared to the other learned methods, which we attribute to its reduced reliance on training data due to its underlying model-based reconstruction component.
Sensing is an integral part of 6G and beyond systems, providing exceptional environmental perception along with communication. Radio frequency (RF)-based sensing often relies on simplified geometric assumptions (e.g., point scatterers or planar surfaces) to model specular multipath and keep inference tractable. However, such representations are limited in their ability to capture extended objects with complex geometries and properties. This paper presents a probabilistic occupancy grid framework for radio-based simultaneous localization and mapping (SLAM), jointly reconstructing geometric structures and their RF-related properties. The proposed occupancy grid map representation is integrated into a multipath-based SLAM formulation to enable simultaneous mobile-agent localization and environment mapping using multipath measurements. To connect RF measurements with the grid map, a surface model is employed to describe candidate reflection paths, while occupancy grid cell states capture measurement uncertainties and fine-grained geometric details. RF-related object properties are represented through reflection coefficients. The proposed framework offers a principled, proof-of-concept approach to physically interpretable radio-based mapping, and simulation results demonstrate accurate reconstruction of geometry and material properties, as well as high-accuracy localization. In addition, the results highlight the potential to use prior occupancy maps obtained from other radio devices or complementary sensors for subsequent map extension and refinement.
The Hawkes process models self-exciting event streams, requiring a strictly non-negative and stable stochastic intensity. Standard identification methods enforce these properties using non-negative causal bases, yielding conservative parameter constraints and severely ill-conditioned least-squares Gram matrices at higher model orders. To overcome this, we introduce a system-theoretic identification framework utilizing the sign-indefinite orthonormal Laguerre basis, which guarantees a well-conditioned asymptotic Gram matrix independent of model order. We formulate a constrained least-squares problem enforcing the necessary and sufficient conditions for positivity and stability. By constructing the empirical Gram matrix via a Lyapunov equation and representing the constraints through a sum-of-squares trace equivalence, the proposed estimator is efficiently computed via semidefinite programming.
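To make the two properties named in the abstract above concrete, the following sketch evaluates a Hawkes intensity with a single exponential kernel, where non-negativity and stability are easy to read off. This is a textbook special case chosen for illustration, with hypothetical parameter values; it is not the Laguerre-basis estimator of the paper, where the kernel can change sign and positivity has to be enforced by explicit constraints.

```python
import math

def hawkes_intensity(t, events, mu=0.5, a=0.6, b=2.0):
    """Conditional intensity lambda(t) = mu + sum over past events t_i < t
    of a*b*exp(-b*(t - t_i)). With mu > 0 and a >= 0 it is non-negative."""
    return mu + sum(a * b * math.exp(-b * (t - ti)) for ti in events if ti < t)

def is_stable(a):
    """The kernel a*b*exp(-b*s) integrates to a over s >= 0 (the branching
    ratio); for a non-negative kernel, stability requires a < 1."""
    return 0.0 <= a < 1.0

events = [1.0, 1.5]                      # hypothetical event times
before = hawkes_intensity(0.5, events)   # no past events: baseline only
after = hawkes_intensity(2.0, events)    # excited by both earlier events
```

Each past event raises the intensity by `a*b` and the excitation then decays at rate `b`, which is the self-exciting behavior the abstract refers to.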
We consider a pursuit-evasion scenario involving a group of pursuers and a single evader in a two-dimensional unbounded environment. The pursuers aim to capture the evader in finite time while ensuring the evader remains enclosed within the convex hull of their positions until capture, without knowledge of the evader's heading angle. Prior works have addressed the problem of encirclement and capture separately in different contexts. In this paper, we present a class of strategies for the pursuers that guarantee capture in finite time while maintaining encirclement, irrespective of the evader's strategy. Furthermore, we derive an upper bound on the time to capture. Numerical results highlight the effectiveness of the proposed framework against a range of evader strategies.
Affine frequency division multiplexing (AFDM) has recently emerged as a promising waveform for doubly-selective channels [1],[2], owing to its ability to fully exploit time-frequency diversity through appropriate tuning of the chirp-rate parameter [3],[4]. In [5], a direct-windowing-based pulse shaping transceiver was proposed for AFDM systems to suppress the Doppler sidelobes, thus improving the accuracy of channel estimation. Inspired by the theory of ``matrix equilibration'' (Thm.~4.3 in [6]), we observe that the legacy AFDM pulse shaping method in [5] significantly increases the condition number of the effective channel matrix when path delay and Doppler parameters are randomly distributed. Such ill-conditioning degrades the solution stability of channel equalization under noisy conditions and, in turn, the bit error rate (BER). To address this issue, this letter proposes a novel overlap-summation-based pulse shaping transceiver for AFDM systems (OS-PS-AFDM) that suppresses the pulse sidelobes in the discrete affine Fourier transform (DAFT) domain while maintaining the condition number of the channel matrix, at the cost of time-domain prefix overhead. Consequently, the proposed OS-PS-AFDM transceiver simultaneously achieves accurate channel estimation and robust equalization performance. The simulation source code is provided at this https URL.
This study demonstrates, for the first time, how a network of cellular base stations (BSs), the infrastructure of mobile radio networks, can be used as a distributed opportunistic radar for rainfall remote sensing. By adapting signal-processing techniques traditionally employed in Doppler weather radar systems, we demonstrate that BS signals can be used to retrieve typical weather radar products, including reflectivity factor, mean Doppler velocity, and spectral width. Due to the high spatial density of BS infrastructure in urban environments, combined with intrinsic technical features such as electronically steerable antenna arrays and wide receiver bandwidths, the proposed approach achieves unprecedented spatial and temporal resolutions, on the order of a few meters and several tens of seconds, respectively. Beyond limitations related to low transmitted power, limited antenna gain, and other system constraints, a major challenge arises from ground clutter contamination, which is exacerbated by the nearly horizontal orientation of BS antenna beams. This work provides a thorough assessment of clutter impact and demonstrates that, through appropriate processing, the resulting clutter-filtered radar moments reach a satisfactory level of quality when compared with raw observations and with measurements from independent BSs with overlapping fields of view. The findings highlight a transformative opportunity for urban hydrometeorology: leveraging existing telecommunications infrastructure to obtain rainfall information with a spatial granularity and temporal immediacy not previously available.
In this work, we propose a zero constellation for binary modulation on conjugate-reciprocal zeros (BMOCZ), called jutted BMOCZ (J-BMOCZ), and study its application to non-coherent orthogonal frequency division multiplexing (OFDM). With J-BMOCZ, we introduce asymmetry to the zero constellation for Huffman BMOCZ, which removes ambiguity at the receiver under a uniform rotation of the zeros. The asymmetry is controlled by the magnitude of "jutted" zeros and enables the receiver to estimate zero rotation using a simple cross-correlation. The proposed method, however, leads to a natural trade-off between asymmetry and zero stability. Accordingly, we introduce a reliability metric to measure the stability of a polynomial's zeros under an additive perturbation of the coefficients, and we apply the metric to optimize the J-BMOCZ zero constellation parameters. We then combine the advantages of J-BMOCZ and Huffman BMOCZ to design a hybrid waveform for OFDM with BMOCZ (OFDM-BMOCZ). The pilot-free waveform enables blind synchronization/detection and has a fixed peak-to-average power ratio that is independent of the message. Finally, we assess the proposed scheme through simulation and demonstrate non-coherent OFDM-BMOCZ using low-cost software-defined radios.
We present principles of algebraic diversity (AD), a group-theoretic approach to signal processing exploiting signal symmetry to extract more information per observation, complementing classical methods that use temporal and spatial diversity. The transformations under which a signal's statistics are invariant form a matched group; this group determines the natural transform for analysis, and averaging an estimator over the group action reduces variance without requiring additional snapshots. The viewpoint is broadened in five directions beyond the single-observation measurement of a companion paper. Rank promotion admits AD on scalar data streams and identifies the law of large numbers as the trivial-group case of a $(G, L)$ continuum combining sample-count with group-orbit averaging. An eigentensor hierarchy handles signals with nested symmetry. A blind group-matching methodology identifies the matched group from data via a polynomial-time generalized eigenvalue problem on the unitary Lie algebra, placing the DFT, DCT, and Karhunen--Loève transforms as distinguished points on a transform manifold. A cost-symmetry matching principle then extends AD from measurement to blind and adaptive signal processing generally; blind equalization is given as a detailed example, with the Constant Modulus Algorithm's residual phase ambiguity predicted analytically and matched within two degrees on 3GPP TDL multipath channels, and other blind problems in signal processing are mapped into the framework. Four theorems formalize a structural capacity $\kappa$, the Rényi-2 analog of Shannon and von Neumann's Rényi-1 entropies, quantifying how a signal's information is organized rather than how much information it contains. AD complements prior algebraic approaches including invariant estimation, minimax robust estimation, algebraic signal processing, and compressed sensing.
Learning methods are increasingly used to synthesize controllers from data, yet existing sample-complexity characterizations for continuous control are sharp only in the fully observed setting. This paper studies the partially observed case by deriving information-theoretic lower bounds for learning Linear Quadratic Gaussian (LQG) controllers from offline trajectories generated by a (linear) exploration policy. We prove an $\varepsilon$-local minimax excess-cost lower bound that applies to any algorithm mapping the offline dataset to a stabilizing linear controller. The bound is expressed in terms of the Hessian of the LQG cost with respect to model parameters and the inverse Fisher Information induced by the exploration policy. We further provide system-theoretic characterizations of these objects, enabling transparent construction of hard instances. Instantiating the bound on classical fragile robust-control examples, including variants of the Doyle LQG fragility counterexample and non-minimum-phase systems, demonstrates when fragile robust control problems translate into high sample complexity for learning-enabled control. These results suggest the asymptotic optimality of certainty-equivalent synthesis and motivate the importance of both task-directed experiment design and system co-design for sample-efficient learning in partially observed control.
Rapid growth of large loads led by data centers is straining grid capacity. These loads increasingly accept curtailment risk through non-firm interconnection agreements to gain faster grid access, expanding the pool of consumers subject to mandatory disconnection during supply shortfalls. Yet, blunt rules assign curtailment without reference to the wide variation in the value consumers place on avoiding curtailment, often captured by the value of lost load (VOLL). This paper introduces the network-constrained Curtailment Credit Market (CCM), a mechanism in which agents submit bids that determine bilateral credit flows, subject to transmission network constraints. We prove that the bilateral credit flow representation can reach every curtailment allocation available to an omniscient central planner (feasible-set equivalence), so the bilateral flow structure introduces no loss of allocative capability. Under truthful bidding, the CCM achieves the planner's total value of served load, matching the planner's allocative benchmark when bids reflect true interruption costs. The CCM is formulated as a bilevel clearing problem that admits an exact single-level mixed-integer linear program (MILP), solved in 0.009 to 0.034 seconds. Numerical experiments on three test systems validate the mechanism at increasing scale and complexity: a 3-bus toy network that isolates the core trading logic, the IEEE 24-bus reliability test system as a standard benchmark, and a reduced New York (NY) grid that captures coordination across NY load zones. Our simulations show that the CCM increases the total value of served load by 1.24 to 1.83 times relative to pro-rata curtailment. On the three test systems examined here, no participant is worse off under incentive-compatible benchmark payments than under the administrative baseline.
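The gap between pro-rata curtailment and a value-aware allocation can be illustrated with a minimal single-bus sketch. It is not the CCM itself: it ignores transmission constraints, bidding, and credit flows, and simply compares pro-rata cuts with an idealized planner that curtails the lowest-VOLL load first. The function names and the two-load numbers are hypothetical.

```python
def pro_rata_cuts(demands, shortfall):
    """Spread the shortfall across loads in proportion to their demand."""
    total = sum(demands)
    return [d * shortfall / total for d in demands]


def value_ordered_cuts(demands, volls, shortfall):
    """Idealized planner: curtail the loads with the lowest VOLL first."""
    cuts = [0.0] * len(demands)
    remaining = shortfall
    for i in sorted(range(len(demands)), key=lambda k: volls[k]):
        cut = min(demands[i], remaining)
        cuts[i] = cut
        remaining -= cut
    return cuts


def served_value(demands, volls, cuts):
    """Total value of served load: sum of VOLL times energy still served."""
    return sum(v * (d - c) for d, v, c in zip(demands, volls, cuts))


# Hypothetical example: two 10 MW loads, 10 MW shortfall.
demands = [10.0, 10.0]
volls = [1.0, 5.0]
shortfall = 10.0

baseline = served_value(demands, volls, pro_rata_cuts(demands, shortfall))
planner = served_value(demands, volls, value_ordered_cuts(demands, volls, shortfall))
```

In this toy case the value-ordered allocation serves the high-VOLL load in full, so its served value exceeds the pro-rata baseline; the ratios reported in the abstract come from the networked test systems, not from a sketch like this one.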
With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising problem of low-dose computed tomography using deep learning. Although low-dose computed tomography reduces radiation exposure to patients, it also introduces more noise, which may interfere with visual interpretation by physicians and affect diagnostic results. To address this problem, inspired by Cycle-GAN for unsupervised learning, this paper proposes an end-to-end unsupervised low-dose computed tomography denoising framework. The proposed framework combines a U-Net structure for multi-scale feature extraction, an attention mechanism for feature fusion, and a residual network for feature transformation. It also introduces perceptual loss to improve the network for the characteristics of medical images. In addition, we construct a real low-dose computed tomography dataset and design a large number of comparative experiments to validate the proposed method, using both image-based evaluation metrics and medical evaluation criteria. Compared with classical methods, the main advantage of this paper is that it addresses the limitation that real clinical data cannot be directly used for supervised learning, while still achieving excellent performance. The experimental results are also professionally evaluated by imaging physicians and meet clinical needs.
Accurate and continuous estimation of cognitive workload is fundamental to creating adaptive human-machine systems. However, designing architectures that balance representational capacity with computational efficiency has been challenging for practical deployment. This paper introduces 1BT, a One-Block Transformer for compact and efficient EEG-based cognitive workload assessment. The model aggregates multi-channel temporal sequences via a minimal latent bottleneck, using a single cross-attention module followed by lightweight self-attention. A controlled study involving 11 participants performing three cognitively diverse tasks (abstract reasoning, numerical problem-solving, and an interactive video game) was conducted with continuous EEG recordings across two workload levels. Systematic architectural analysis identifies the most compact configuration that preserves high performance, while substantially lowering computational cost. The final model achieves high workload classification performance with under 0.5 million parameters and 0.02 GFLOPs, paving the way for a design direction for real-time cognitive workload monitoring in resource-constrained settings.
To unleash the full potential of AI for Science, we must untether the agents from a purely digital environment. The agent's ability to control and explore in real-world labs is essential because the physical lab remains foundational to scientific discovery. While some tasks can be performed on a computer (e.g., data analysis, running simulated experiments), Eureka moments could occur at any time while operating lab instruments (e.g., when a scientist notices unexpected clues, intuition may prompt a real-time course change). Although autonomous labs, which expose programmable APIs to control scientific instruments via software, are on the rise, bridging the gap between increasingly powerful AI agents and automated lab equipment requires innovation that draws insights from computer systems. We propose a new paradigm called "Experiment-as-Code (EaC) Labs," where a core concept is to encode experiments as declarative configurations that can be compiled down to device-level APIs. AI agents come up with hypotheses and experiments, written as an ensemble of declarative configurations. The systems layer performs program analysis, safety checks, resource assignment, and job orchestration. Finally, programmatic experimentation occurs via actuating the device APIs. This is a general stack that is science-, lab-, and instrument-independent, representing a novel synthesis across the physical, systems, and intelligence layers to unleash the next breakthrough in AI for Science.
Over-the-air federated learning (OTA-FL) reduces uplink latency by aggregating client updates directly over the wireless multiple-access channel. Coherent analog aggregation realizes this idea by aligning the phases and amplitudes of simultaneously transmitted waveforms, which typically requires synchronization, instantaneous channel-state information (CSI), phase compensation, and power control. Noncoherent energy detection removes the need for phase-coherent combining, but a single energy measurement is nonnegative and, therefore, cannot represent signed model updates. This paper introduces resource-element energy difference (REED), a noncoherent physical-layer primitive for continuous signed aggregation. REED maps the positive and negative parts of each real-valued update to transmit energies on paired orthogonal resource elements and estimates the signed sum by subtracting the corresponding received energies. The construction uses slow-timescale calibration of average channel powers, but does not require instantaneous transmitter- or receiver-side CSI or channel inversion. For independent Rayleigh fading, we derive exact first- and second-moment expressions for single-shot REED and for a chip-diverse extension that spreads each coordinate over multiple independently faded paired chips. The resulting variance laws separate fading-induced self-noise, signal-noise interaction, and receiver-noise fluctuation, giving an explicit diversity-resource tradeoff.
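The energy-difference idea described above can be sketched with a small Monte Carlo simulation. This is a minimal illustration under stated assumptions, not the paper's construction: channels are i.i.d. Rayleigh with unit average power (standing in for the slow-timescale calibration), both resource elements see the same noise variance, and the chip-diverse variant is modeled as a plain average over independently faded chips. All function names are hypothetical.

```python
import math
import random


def complex_gaussian(rng, variance=1.0):
    """Draw a circularly symmetric complex Gaussian sample with the given variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def reed_single_shot(updates, noise_var, rng):
    """One energy-difference measurement over a pair of resource elements.

    Each client sends amplitude sqrt(max(x, 0)) on the "positive" element and
    sqrt(max(-x, 0)) on the "negative" element; the receiver subtracts energies.
    """
    y_pos = complex_gaussian(rng, noise_var)  # receiver noise, positive element
    y_neg = complex_gaussian(rng, noise_var)  # receiver noise, negative element
    for x in updates:
        # Independent unit-power Rayleigh fading per client and per element.
        y_pos += complex_gaussian(rng) * math.sqrt(max(x, 0.0))
        y_neg += complex_gaussian(rng) * math.sqrt(max(-x, 0.0))
    # Equal noise variances cancel in expectation, leaving the signed sum.
    return abs(y_pos) ** 2 - abs(y_neg) ** 2


def reed_average(updates, noise_var, chips, seed=0):
    """Average the energy difference over independently faded paired chips."""
    rng = random.Random(seed)
    total = sum(reed_single_shot(updates, noise_var, rng) for _ in range(chips))
    return total / chips


# Hypothetical client updates whose true signed sum is 0.5.
client_updates = [0.5, -0.2, 0.3, -0.1]
estimate = reed_average(client_updates, noise_var=0.1, chips=20000, seed=1)
```

Because the fading coefficients are zero-mean and independent, the cross terms between clients vanish in expectation, so the averaged energy difference approaches the signed sum as the number of chips grows. A single chip is unbiased but noisy, which is the diversity-resource tradeoff the abstract refers to.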
This work introduces a fully tunable, ultra-low power unipolar memory cell inspired by the Schmitt-trigger comparator and designed in CMOS using only nine transistors. The proposed circuit operates entirely in the current domain and exploits a novel feedback configuration between two interdependent Heaviside-like thresholding elements to produce tunable bistable switching behavior. Its three key parameters (threshold current, hysteresis width, and output gain) are independently tunable via programmable bias currents, enabling flexibility across diverse analog computing applications. Unlike prior Schmitt-trigger designs, it simultaneously achieves current-mode operation, nanowatt-range power consumption, temperature stability, and full tunability, solely using standard MOSFET elements. Schematic-level simulations in a 180 nm CMOS process confirm robust hysteresis and resilience to device mismatch. Building on this circuit, we develop a complete family of spike-based logic gates using three-level current encoding, where the bistable memory retains the polarity of the last spike on each input indefinitely, enabling asynchronous logic operations without temporal windowing or refresh mechanisms. The same circuit also serves as the primitive for Bistable Memory Recurrent Units in analog neural networks, where the quantized hidden states provide inherent noise immunity. Together, these capabilities position the design as a versatile building block for next-generation neuromorphic processors integrating memory, logic, and recurrent computation.
Purpose: Access to electroencephalography (EEG) remains limited across low- and middle-income countries (LMICs) due to cost, infrastructure requirements, and a shortage of trained staff. This study evaluated the feasibility and clinical utility of a smartphone-based EEG system in a real-world setting. Methods: We conducted a multicenter observational study (November 2023 to April 2026) across 29 clinical sites in Kenya. A smartphone-based 27-lead EEG system enabled trained healthcare workers to acquire standardized recordings with remote expert interpretation. Results: 3,036 EEG sessions were performed. Male patients constituted 57.8% of the cohort, with representation across pediatric and adult populations. The most common referral indication was seizures or convulsions (68.5%). Overall, 2,915 (96%) recordings were interpretable, while 121 (4%) were uninterpretable, primarily due to high electrode impedance and insufficient recording duration. Uninterpretable recordings were significantly shorter than interpretable recordings (mean 18.5 vs. 33.8 minutes; median 15.1 vs. 31.6 minutes; p < 0.0001). Mean turnaround time for interpretation was 107 minutes. Among interpretable recordings, 917 (30.2%) were abnormal, including 701 (76.4%) with epileptiform abnormalities, 215 (23.4%) with non-epileptiform findings, and 1 (0.1%) indeterminate finding. Epileptiform abnormalities were highest in children aged 4-9 years (33.1%) and less frequent in adults (14-21%). Non-epileptiform abnormalities were more common in patients aged 60+ years (19.2%) compared to younger age groups (3-9%). Conclusion: Large-scale, point-of-care EEG acquisition by non-specialist operators in a resource-limited setting is feasible. Expansion of smartphone-based EEG systems may improve equitable access to neurological diagnosis and care in LMICs.
RADAR Challenge 2026 is an APSIPA Grand Challenge on Robust Audio Deepfake Recognition under Media Transformations, designed to simulate realistic media conditions in real-world audio distribution pipelines, including compression, resampling, noise, and reverberation. It consists of two phases: an English development phase with labeled data for analysis and paper writing, and a multilingual evaluation phase containing more than 100,000 utterances in English, Singapore English, Mandarin Chinese, Taiwanese Mandarin, Japanese, and Vietnamese. Systems are evaluated using equal error rate (EER) for binary real/fake classification. This paper describes the challenge task, the construction of the data set, the evaluation protocol, and the overall results. During the challenge, 33 teams submitted to the development phase and 22 teams submitted to the final evaluation phase. The reported results highlight the remaining challenges of robust audio deepfake detection under multilingual and media-transformed conditions.
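For readers unfamiliar with the metric, the equal error rate can be computed with a simple threshold sweep. The sketch below is a generic illustration, not the challenge's official scoring script; it assumes that a higher score means "more likely the positive class" and that label 1 marks the positive class, both of which are conventions chosen here.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    labels: 1 for the positive class, 0 for the negative class (assumed convention).
    Returns the average of false-acceptance and false-rejection rates at the
    threshold where the two are closest.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    best_gap, eer = None, None
    for t in sorted(set(scores)):
        far = sum(s >= t for s in neg) / len(neg)  # negatives accepted
        frr = sum(s < t for s in pos) / len(pos)   # positives rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Hypothetical scores: perfectly separated, then partially overlapping.
separable = equal_error_rate([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
overlapping = equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

With few trials the two error curves may not cross exactly, so this sketch reports the midpoint at the closest threshold; production scoring tools typically interpolate instead.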
In this paper, we present a learning-based control for a class of nonlinear systems that guarantees exponential stability as well as bounded output errors. The control is based on the Gaussian Process Submodel Online Learning (GPSOL) algorithm and the Disturbance Error Rate Limiting (DERL) algorithm, both of which were developed in previous work. The GPSOL algorithm provides a method to learn Gaussian Process (GP) models for subsystems online, whereas the DERL algorithm limits the rate of the prediction error of these GP models. The focus of this paper is the utilization of the GP model within an adaptive controller and the derivation of corresponding stability conditions and system peak-to-peak gains by means of linear matrix inequalities (LMIs). These peak-to-peak gains are then used to prescribe a desired prediction error rate for the DERL algorithm to achieve user-defined output error bounds. The gains and the related bounds were successfully verified using a simulation model. Furthermore, results from a successful experimental validation of the bounds and the overall control structure on a pneumatic test rig are presented. While the control scheme and error bounds proposed in this paper are limited to first-order single-input-single-output systems, an extension to certain classes of higher-order and multiple-input-multiple-output systems is expected to be forthcoming.
We unify the discrete Fourier transform (DFT), discrete cosine transform (DCT), Walsh-Hadamard, Haar wavelet, Karhunen-Loève transform, and several others along with their continuous counterparts (Fourier transform, Fourier series, spherical harmonics, fractional Fourier transform) under one representation-theoretic principle: each is the eigenbasis of every covariance invariant under a specific finite or compact group, with columns constructed from the irreducible matrix elements of the group via the Peter-Weyl theorem. The unification rests on the Algebraic Diversity (AD) framework, which identifies the matched group of a covariance as the foundational object of second-order signal processing. The data-dependent KLT emerges as the trivial-matched-group limit; classical transforms emerge as the cyclic, dihedral, elementary abelian, iterated wreath, and hybrid wreath cases. Composition rules cover direct, wreath, and semidirect products. The Reed-Muller and arithmetic transforms appear as related change-of-basis transforms on the matched group of Walsh-Hadamard. A polynomial-time algorithm for matched-group discovery, the DAD-CAD relaxation cast as a generalized eigenvalue problem in double-commutator form, closes the operational loop: the matched group of any empirical covariance is discovered without expert judgment, with noise-aware variants via the commutativity residual $\delta$ and algebraic coloring index $\alpha$ for finite-SNR settings. The fractional Fourier transform is treated as the metaplectic $SO(2)$ case with Hermite-Gauss matched basis, and a structural principle relates matched group size inversely to transform resolution. Modern applications (massive-MIMO, graph neural networks, transformer attention, point cloud and 3D vision, brain connectivity, single-cell genomics, quantum informatics) are sketched with their matched groups.
As sixth-generation (6G) wireless networks evolve toward increasingly heterogeneous scenarios, tasks, and service requirements, conventional artificial intelligence (AI) models remain limited in task-aware decision-making and autonomous adaptation. To address this issue, this paper first proposes a ChannelAgent-empowered electromagnetic space world model, in which wireless intelligence is organized into a closed-loop process consisting of multi-modal sensing, ChannelAgent as the intelligent core, and execution with feedback update. As a case study, agent-driven channel generation is instantiated through path loss prediction. Specifically, a task-oriented intelligent feature selection mechanism is designed by integrating reinforcement-learning-inspired policy adaptation with evolutionary search, enabling the agent to iteratively derive compact and task-suitable feature subsets according to the current scenario and performance feedback. Simulation results demonstrate superior performance in both single-scenario and multi-scenario tasks, highlighting the potential of the proposed model for autonomous, adaptive, task-oriented, and closed-loop wireless intelligence.
This paper investigates causal influences between agents linked by a social graph and interacting over time. In particular, the work examines the dynamics of social learning models and distributed decision-making protocols, and derives expressions that reveal the causal relations between pairs of agents and explain the flow of influence over the network. The results turn out to be dependent on the graph topology and the level of information that each agent has about the inference problem they are trying to solve. Using these conclusions, the paper proposes an algorithm to rank the overall influence between agents to discover highly influential agents. It also provides a method to learn the necessary model parameters from raw observational data. The results and the proposed algorithm are illustrated by considering both synthetic data and real social media data.
Automatic Speaker Verification (ASV) systems, which identify speakers based on their voice characteristics, have numerous applications, such as user authentication in financial transactions, exclusive access control in smart devices, and forensic fraud detection. However, the advancement of deep learning algorithms has enabled the generation of synthetic audio through Text-to-Speech (TTS) and Voice Conversion (VC) systems, exposing ASV systems to potential vulnerabilities. To counteract this, we propose a novel architecture named AASIST3. By enhancing the existing AASIST framework with Kolmogorov-Arnold networks, additional layers, encoders, and pre-emphasis techniques, AASIST3 achieves a more than twofold improvement in performance. It demonstrates minDCF results of 0.5357 in the closed condition and 0.1414 in the open condition, significantly enhancing the detection of synthetic voices and improving ASV security. The new version of the model is publicly available on HuggingFace (2026): this https URL
Achieving the effective design and improvement of reward functions in reinforcement learning (RL) tasks with complex custom environments and multiple requirements presents considerable challenges. In this paper, we propose ERFSL, an efficient reward function searcher using LLMs, which enables LLMs to be effective white-box searchers and highlights their advanced semantic understanding capabilities. Specifically, we generate reward components for each numerically explicit user requirement and employ a reward critic to identify the correct code form. Then, LLMs assign weights to the reward components to balance their values and iteratively adjust the weights without ambiguity and redundant adjustments by flexibly adopting directional mutation and crossover strategies, similar to genetic algorithms, based on the context provided by the training log analyzer. We applied the framework to a customized data collection RL task without direct human feedback or reward examples (zero-shot learning). The reward critic successfully corrects the reward code with only one feedback instance for each requirement, effectively preventing unrectifiable errors. The initialization of weights enables the acquisition of different reward functions within the Pareto solution set without the need for weight search. Even in cases where a weight is 500 times off, on average, only 5.2 iterations are needed to meet user requirements. The ERFSL also works well with most prompts utilizing GPT-4o mini, as we decompose the weight searching process to reduce the requirement for numerical and long-context understanding capabilities.
Learned image compression methods have shown impressive performance but are often highly specialized for either human perception or specific machine vision tasks. This specialization limits their versatility and requires costly retraining for new applications. To address this, we introduce UniCodec, a universal codec built on a novel paradigm of semantic disentanglement at the encoder and compositional generation at the decoder. This framework is designed to simultaneously serve both human and machine needs, eliminating the need for task-specific retraining. At the encoder, UniCodec leverages pre-generated, task-specific label codebooks created by a Large Language Model (LLM). For any given task, a grounding model uses the corresponding codebook to perform task-aware disentanglement, compressing only the most relevant image regions. This mechanism not only saves significant bits but is also the key to our system's rapid, zero-retraining adaptation: switching to a new task is as simple as selecting a new codebook. The decoder then performs compositional generation: it combines the compact, disentangled components with powerful priors from a generative diffusion model. This process reconstructs a high-quality, complete image optimized with rich detail for human perception and precise features for machine vision tasks. Extensive experiments demonstrate that UniCodec consistently outperforms existing methods, effectively bridging the gap between human-centric and machine-centric compression.
Efficient channel state information (CSI) compression is essential in frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems due to the substantial feedback overhead. Recently, deep learning-based compression techniques have demonstrated superior performance for CSI feedback. However, their performance often degrades under distribution shifts across wireless environments, largely due to limited generalization capability. To address this challenge, we consider a full-model fine-tuning scheme, in which both the encoder and decoder are jointly updated using a small number of recent CSI samples from the target environment. A key challenge in this setting is the transmission of updated decoder parameters to the receiver, which introduces additional communication overhead. To mitigate this bottleneck, we explicitly incorporate the bit rate of model updates into the fine-tuning objective and entropy-code the model updates jointly with the compressed CSI. Furthermore, we employ a structured prior that promotes sparse and selective parameter updates, thereby significantly reducing the model-update communication cost. Simulation results across multiple CSI datasets demonstrate that full-model fine-tuning substantially improves the rate-distortion performance of neural CSI compression, despite the additional cost of model updates. We further analyze the impact of the evaluation horizon, the quantization resolution of model updates, and the size of the target-domain dataset on the overall feedback efficiency.
We propose a real-time implementable motion planning framework for cooperative object transportation by nonholonomic mobile manipulator robots (MMRs) in dynamic environments. Our global planner finds a path from start to goal through the static, obstacle-free regions in the environment and generates a set of convex, static, obstacle-free regions around the path using a novel, fast, and computationally lightweight ellipse-based technique. We introduce a nonlinear Model Predictive Control (NMPC) based real-time implementable planning technique that jointly plans feasible motion for the mobile base and the manipulator's arm and generates a kinodynamic feasible, collision-free trajectory for cooperative object transportation. Simulation and hardware experiments validate the efficiency of our proposed planning framework.
Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data? In DexWild, a diverse team of data collectors uses their hands to collect hours of interactions across a multitude of environments and objects. To record this data, we create DexWild-System, a low-cost, mobile, and easy-to-use device. The DexWild learning framework co-trains on both human and robot demonstrations, leading to improved performance compared to training on each dataset individually. This combination results in robust robot policies capable of generalizing to novel environments, tasks, and embodiments with minimal additional robot-specific data. Experimental results demonstrate that DexWild significantly improves performance, achieving a 68.5% success rate in unseen environments (nearly four times higher than policies trained with robot data only) and offering 5.8x better cross-embodiment generalization. Video results, codebases, and instructions at this https URL
This paper studies the multi-reference alignment (MRA) problem of estimating a signal function from shifted, noisy observations. Our functional formulation reveals a new connection between MRA and deconvolution: the signal can be estimated from second-order statistics via Kotlarski's formula, an important identification result in deconvolution with replicated measurements. To design our MRA algorithms, we extend Kotlarski's formula to general dimension and study the estimation of signals with vanishing Fourier transform, thus also contributing to the deconvolution literature. We validate our deconvolution approach to MRA through both theory and numerical experiments.
Jumping poses a significant challenge for quadruped robots, despite being crucial for many operational scenarios. While optimisation methods exist for controlling such motions, they are often time-consuming and demand extensive knowledge of robot and terrain parameters, making them less robust in real-world scenarios. Reinforcement learning (RL) is emerging as a viable alternative, yet conventional end-to-end approaches lack efficiency in terms of sample complexity, requiring extensive training in simulations, and predictability of the final motion, which makes it difficult to certify the safety of the final motion. To overcome these limitations, this paper introduces a novel guided reinforcement learning approach that leverages physical intuition for efficient and explainable jumping, by combining Bézier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model. Extensive simulation and experimental results clearly demonstrate the advantages of our approach over existing alternatives.
Purpose: To develop a deep learning method for the automatic segmentation of spinal nerve rootlets on various MRI scans. Material and Methods: This retrospective study included MRI scans from two open-access and one private dataset, consisting of 3D isotropic 3T TSE T2-weighted (T2w) and 7T MP2RAGE (T1-weighted [T1w] INV1 and INV2, and UNIT1) MRI scans. A deep learning model, RootletSeg, was developed to segment C2-T1 dorsal and ventral spinal rootlets. Training was performed on 76 scans and testing on 17 scans. The Dice score was used to compare the model performance with an existing open-source method. Spinal levels derived from RootletSeg segmentations were compared with vertebral levels defined by intervertebral discs using Bland-Altman analysis. Results: The RootletSeg model developed on 93 MRI scans from 50 healthy adults (mean age, 28.70 years $\pm$ 6.53 [SD]; 28 [56%] males, 22 [44%] females) achieved a mean $\pm$ SD Dice score of 0.67 $\pm$ 0.09 for T1w-INV2, 0.65 $\pm$ 0.11 for UNIT1, 0.64 $\pm$ 0.08 for T2w, and 0.62 $\pm$ 0.10 for T1w-INV1 contrasts. Spinal-vertebral level correspondence showed a progressively increasing rostrocaudal shift, with Bland-Altman bias ranging from 0.00 to 8.15 mm (median difference between level midpoints). Conclusion: RootletSeg accurately segmented C2-T1 spinal rootlets across MRI contrasts, enabling the determination of spinal levels directly from MRI scans. The method is open-source and can be used for a variety of downstream analyses, including lesion classification, neuromodulation therapy, and functional MRI group analysis.
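The Dice score used above to compare segmentations is twice the overlap divided by the total size of the two masks. A minimal sketch follows, assuming each mask is given as a collection of voxel coordinates; the function name and the convention of returning 1.0 for two empty masks are choices made here, not taken from RootletSeg.

```python
def dice_score(pred_voxels, truth_voxels):
    """Dice = 2 * |A & B| / (|A| + |B|) for two sets of voxel coordinates."""
    pred, truth = set(pred_voxels), set(truth_voxels)
    denom = len(pred) + len(truth)
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(pred & truth) / denom


# Hypothetical masks: three voxels each, two of them shared.
predicted = {(0, 0, 1), (0, 0, 2), (0, 0, 3)}
reference = {(0, 0, 2), (0, 0, 3), (0, 0, 4)}
score = dice_score(predicted, reference)
```

Thin structures such as rootlets occupy few voxels, so a one-voxel boundary disagreement removes a large fraction of the overlap; this is worth keeping in mind when reading Dice values in the 0.6 to 0.7 range.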
Offline reinforcement learning (RL) enables policy optimization from fixed datasets, making it suitable for safety-critical applications where online exploration is infeasible. However, these datasets are often contaminated by adversarial poisoning, system errors, or low-quality samples, leading to degraded policy performance in standard behavioral cloning (BC) and offline RL methods. This paper introduces Density-Ratio Weighted Behavioral Cloning (Weighted BC), a robust imitation learning approach that uses a small, verified clean reference set to estimate trajectory-level density ratios via a binary discriminator. These ratios are clipped and used as weights in the BC objective to prioritize clean expert behavior while down-weighting or discarding corrupted data, without requiring knowledge of the contamination mechanism. We establish theoretical guarantees showing convergence to the clean expert policy with finite-sample bounds that are independent of the contamination rate. A comprehensive evaluation framework is established, which incorporates various poisoning protocols (reward, state, transition, and action) on continuous control benchmarks. Experiments demonstrate that Weighted BC maintains near-optimal performance even at high contamination ratios, outperforming baselines such as traditional BC, batch-constrained Q-learning (BCQ), and behavior regularized actor-critic (BRAC).
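The weighting step described above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: it assumes the discriminator outputs the probability that a trajectory is clean, that the clean and contaminated sets are balanced so the ratio is d / (1 - d), and that the loss is normalized by the sum of weights. The clip value and all function names are hypothetical.

```python
def density_ratio(prob_clean, eps=1e-6):
    """Turn a discriminator probability into a density-ratio estimate d / (1 - d).

    Assumes balanced classes; with unbalanced sets a prior-correction factor
    would multiply this ratio.
    """
    d = min(max(prob_clean, eps), 1.0 - eps)  # keep the ratio finite
    return d / (1.0 - d)


def clipped_weights(disc_probs, clip_max=5.0):
    """Clip each ratio from above so no single trajectory dominates the loss."""
    return [min(density_ratio(d), clip_max) for d in disc_probs]


def weighted_bc_loss(log_probs, weights):
    """Weighted negative log-likelihood of the demonstrated actions."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total


# Hypothetical discriminator outputs for three trajectories:
# ambiguous, confidently clean, and confidently corrupted.
weights = clipped_weights([0.5, 0.9, 0.01])
loss = weighted_bc_loss([-1.0, -2.0, -3.0], [1.0, 1.0, 0.0])
```

A trajectory the discriminator flags as corrupted receives a weight near zero and contributes almost nothing to the objective, while clipping caps the influence of trajectories scored as very clean.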
Recent neural audio codecs have achieved impressive reconstruction quality, typically relying on quantization methods such as Residual Vector Quantization (RVQ), Vector Quantization (VQ), and Finite Scalar Quantization (FSQ). However, these quantization techniques limit the geometric structure of the latent space and make it harder to capture correlations between features, leading to inefficiency in representation learning, codebook utilization, and token rate. In this paper, we introduce Two-Dimensional Quantization (Q2D2), a quantization scheme in which feature pairs are projected onto structured 2D grids, such as hexagonal, rhombic, or rectangular tilings, and quantized to the nearest grid values, yielding an implicit codebook defined by the product of grid levels, with codebook sizes comparable to conventional methods. Despite its simple geometric formulation, Q2D2 improves audio compression efficiency, with low token rates and high codebook utilization, while maintaining state-of-the-art reconstruction quality. Specifically, Q2D2 achieves competitive or superior performance on various objective and subjective reconstruction metrics across extensive experiments in the speech, audio, and music domains compared to state-of-the-art models. Comprehensive ablation studies further confirm the effectiveness of our design choices.
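As a rough illustration of pairwise grid quantization, the sketch below covers only the rectangular-tiling case. It is an assumption-laden example, not the Q2D2 implementation: bounding features with `tanh`, the function name `quantize_pairs_rect`, and the `levels` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def quantize_pairs_rect(z, levels=(8, 8)):
    """Quantize consecutive feature pairs onto a rectangular grid in [-1, 1]^2.

    z: array of shape (..., 2k). Returns (quantized features, integer code
    per pair). The implicit codebook size per pair is levels[0] * levels[1].
    """
    z = np.asarray(z, dtype=float)
    # Assumption: bound features to (-1, 1) before snapping to the grid.
    pairs = np.tanh(z).reshape(*z.shape[:-1], -1, 2)
    lv = np.asarray(levels)
    # Map (-1, 1) to [0, L-1] per axis and round to the nearest grid index.
    idx = np.rint((pairs + 1.0) / 2.0 * (lv - 1)).astype(int)
    # Map grid indices back to grid coordinates in [-1, 1].
    quant = idx / (lv - 1) * 2.0 - 1.0
    # One integer token per pair: row-major index into the 2D grid.
    codes = idx[..., 0] * lv[1] + idx[..., 1]
    return quant.reshape(z.shape), codes
```

Hexagonal or rhombic tilings would replace the per-axis rounding with a nearest-lattice-point search; the rectangular case is shown only because it is the simplest.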
The Brunovsky canonical form provides sparse structural representations that are beneficial for computational optimal control, yet existing methods fail to compute it reliably. We propose a technique that produces Brunovsky transformations with substantially lower construction errors and improved conditioning. A controllable linear system is first reduced to the staircase form via an orthogonal similarity transformation. We then derive a simple linear parametrization of the transformations yielding the unique Brunovsky form. Numerical stability is further enhanced by applying a deadbeat gain before computing system matrix powers and by optimizing the linear parameters to minimize condition numbers.
We introduce a voice-agentic framework that learns one critical omni-understanding skill: knowing when to trust itself versus when to consult external audio perception. Our work is motivated by a crucial yet counterintuitive finding: naively fine-tuning an omni-model on both speech recognition and external sound understanding tasks often degrades performance, as the model can be easily misled by noisy hypotheses. To address this, our framework, Speech-Hands, recasts the problem as an explicit self-reflection decision. This learnable reflection primitive proves effective in preventing the model from being derailed by flawed external candidates. We show that this agentic action mechanism generalizes naturally from speech recognition to complex, multiple-choice audio reasoning. Across the OpenASR leaderboard, Speech-Hands consistently outperforms strong baselines by 12.1% WER on seven benchmarks. The model also achieves 77.37% accuracy and high F1 on audio QA decisions, showing robust generalization and reliability across diverse audio question answering datasets. By unifying perception and decision-making, our work offers a practical path toward more reliable and resilient audio intelligence.
We study online fine-tuning of pretrained control policies for autonomous driving using Real-Time Recurrent Reinforcement Learning (RTRRL), a memory-efficient algorithm that updates policy parameters at every time step without backpropagation through time. We extend RTRRL to support LrcSSM, a recently proposed nonlinear diagonal state-space model, and combine offline behavioral cloning with online RTRRL fine-tuning to adapt policies to distribution shifts at deployment. We validate the approach in the CarRacing simulation and on a 1:10-scale RoboRacer platform equipped with an event camera, where a pretrained policy is fine-tuned online during real-world line-following. To our knowledge, this is the first demonstration of online RL fine-tuning with event-camera observations on standard (non-spiking) hardware in closed-loop control. LrcSSM-based policies improve fastest and most consistently across both settings.
We present the PLATO Hand, a dexterous robotic hand with a hybrid fingertip that combines a rigid fingernail, embedded distal phalanx, and compliant pulp to shape contact behavior during manipulation. By mechanically organizing how contact is initiated, supported, and transmitted at the fingertip, this structure creates stable and task-relevant contact conditions across diverse object geometries and grasp orientations. We develop a strain-energy-based bending-indentation model to guide the fingertip design and to explain how material stiffness and contact geometry govern deformation partitioning within the fingertip. Experiments show improved pinch stability, improved fingernail-mediated dorsal-contact force transmission and proprioceptive observability, and successful execution of edge-sensitive manipulation tasks, including paper singulation, card picking, and orange peeling. These results show that coupling a mechanically structured contact interface with a force-motion-transparent finger mechanism provides a principled approach to precise manipulation. Our project page is at: this https URL
Robust trajectory optimization enables autonomous systems to operate safely under uncertainty by computing control policies that satisfy the constraints for all bounded disturbances. However, these problems often lead to large sets of second-order cone programming (SOCP) constraints, which are computationally expensive to handle. In this work, we propose the CUDA Nonlinear Robust Trajectory Optimization (cuNRTO) framework by introducing two dynamic optimization architectures that have direct application to robust decision-making and are implemented on CUDA. The first architecture, NRTO-DR, leverages the Douglas-Rachford (DR) splitting method to solve the SOCP inner subproblems of NRTO, thereby significantly reducing the computational burden through parallel SOCP projections and sparse direct solves. The second architecture, NRTO-FullADMM, is a novel variant that further exploits the problem structure to improve scalability using the Alternating Direction Method of Multipliers (ADMM). Finally, we provide GPU implementations of the proposed methodologies using custom CUDA kernels for SOC projection steps and cuBLAS GEMM chains for feedback gain updates. We validate the performance of cuNRTO through simulated experiments on unicycle, quadcopter, and Franka manipulator models, demonstrating speedups of up to 139.6$\times$. More details are available at this https URL.
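The SOC projection step mentioned above has a standard closed form, sketched here in NumPy for a single cone. This is a CPU illustration of the textbook projection, not the cuNRTO CUDA kernel; the function name `project_soc` and the (t, x) argument layout are assumptions made for this example.

```python
import numpy as np

def project_soc(t, x):
    """Euclidean projection of the point (t, x) onto the cone {(t, x): ||x||_2 <= t}."""
    x = np.asarray(x, dtype=float)
    nx = np.linalg.norm(x)
    if nx <= t:
        # Already inside the cone: the point is its own projection.
        return float(t), x.copy()
    if nx <= -t:
        # Inside the polar cone: the projection is the origin.
        return 0.0, np.zeros_like(x)
    # Otherwise project onto the cone boundary.
    alpha = 0.5 * (nx + t)
    return alpha, alpha * x / nx
```

Because each cone is projected independently, many such projections can run in parallel, which is what makes the step a natural fit for a GPU.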
In emerging mixed traffic environments, Connected and Autonomous Vehicles (CAVs) have to interact with surrounding human-driven vehicles (HDVs). This paper introduces MSH-MCCT (Multi-Source Human-in-the-Loop Mixed Cloud Control Testbed), a novel CAV testbed that captures complex interactions between various CAVs and HDVs. Utilizing the Mixed Digital Twin concept, which combines Mixed Reality with Digital Twin, MSH-MCCT integrates physical, virtual, and mixed platforms, along with multi-source control inputs. Bridged by the mixed platform, MSH-MCCT allows human drivers and CAV algorithms to operate both physical and virtual vehicles within multiple fields of view. In particular, this testbed facilitates the coexistence and real-time interaction of physical and virtual CAVs and HDVs, significantly enhancing experimental flexibility and scalability. Experiments on vehicle platooning in mixed traffic showcase the potential of MSH-MCCT to conduct CAV testing with multi-source real human drivers in the loop through driving simulators of diverse fidelity. The videos for the experiments are available at our project website: this https URL.
Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision-language models (VLMs) suffer from high inference latency due to sequential token decoding. Diffusion-based models offer a promising alternative through parallel generation, but they still require multiple denoising iterations. Compressing multi-step denoising to a single step could further reduce latency, but often degrades textual coherence due to the mean-field bias introduced by token-factorized denoisers. To address this challenge, we propose ECHO, an efficient diffusion-based VLM (dVLM) for chest X-ray report generation. ECHO enables stable one-step-per-block inference via a novel Direct Conditional Distillation (DCD) framework, which mitigates the mean-field limitation by constructing unfactorized supervision from on-policy diffusion trajectories to encode joint token dependencies. In addition, we introduce a Response-Asymmetric Diffusion (RAD) training strategy that further improves training efficiency while maintaining model effectiveness. Extensive experiments demonstrate that ECHO surpasses state-of-the-art autoregressive methods, improving RaTE and SemScore by 64.33% and 60.58%, respectively, while achieving up to $8\times$ inference speedup with negligible degradation in clinical accuracy.
Motivated by sensing modalities in modern autonomous systems that involve hardware-constrained spatial sampling over large arrays with limited coherence time, we develop a novel framework for rapid super-resolution multi-signal direction-of-arrival (DoA) estimation based on Hankel-structured sensing and data matrix decomposition of arbitrary rank, under both the $L_2$-norm and $L_1$-norm formulations. The resulting $L_2$-norm estimator is shown to be maximum-likelihood optimal in white Gaussian noise. The $L_1$-norm estimator is shown to be maximum-likelihood optimal in independent, identically distributed (i.i.d.) isotropic Laplace noise, offering broad robustness to impulsive interference and corrupted measurements commonly encountered in practice. Extensive simulations demonstrate that the proposed methods exhibit powerful super-resolution capabilities, requiring significantly lower SNR and achieving substantially higher resolution probability than recent competing approaches.
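To give a feel for why Hankel structure is useful here, the sketch below builds a Hankel matrix from one noise-free snapshot of a uniform linear array and checks that its rank equals the number of sources. This illustrates a general property of sums of complex exponentials, not the estimator proposed in the abstract. The half-wavelength spacing, the array size, the source angles, and the helper name `hankel_from_snapshot` are all assumptions made for this example.

```python
import numpy as np

def hankel_from_snapshot(x, rows):
    """Arrange a 1-D snapshot x into a Hankel matrix with the given number of rows."""
    x = np.asarray(x)
    cols = x.size - rows + 1
    # Entry (i, j) is x[i + j], so anti-diagonals are constant.
    idx = np.arange(rows)[:, None] + np.arange(cols)[None, :]
    return x[idx]

m = 16                                   # number of array elements (assumed)
angles_deg = np.array([-10.0, 25.0])     # hypothetical source directions
n = np.arange(m)
# Assumed half-wavelength element spacing: spatial phase is pi * sin(theta).
phases = np.pi * np.sin(np.deg2rad(angles_deg))
x = np.exp(1j * np.outer(n, phases)).sum(axis=1)

H = hankel_from_snapshot(x, rows=8)
rank = np.linalg.matrix_rank(H, tol=1e-8)  # equals the number of sources when noise-free
```

With noise, the matrix becomes full rank and a low-rank decomposition is needed to recover the signal subspace, which is where the choice of norm in the fitting criterion matters.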
In this letter, we study a model-based inverse problem for infinite-horizon linear-quadratic differential games with descriptor dynamics. Given an observed feedback strategy profile, we seek to identify all cost functions that rationalize it as a feedback Nash equilibrium; this collection is referred to as the solution set. We characterize the solution set, show that it is rectangular and convex, and provide an algorithm for computing an admissible realization whenever it is nonempty. We also show that descriptor dynamics modify the geometry of the solution set and may reduce identifiability. Finally, we illustrate the results with numerical examples.
Evaluating canine electrocardiograms (ECGs) is challenging due to noise that can obscure clinically relevant cardiac electrical activity. Common sources of interference include respiration, muscle activity, poor lead contact, and external electrical artifacts. Classical signal denoising techniques, such as filtering and wavelet-based methods, struggle to suppress diverse noise patterns while preserving morphological features critical for accurate ECG delineation. We propose an autoencoder-based neural network model and training strategy for ECG denoising as a preprocessing step for canine ECG analysis. The model is trained to reconstruct clean cardiac signals from noisy inputs, enabling effective noise reduction without degrading diagnostically important waveforms. Our approach demonstrates strong performance across both noisy and clean ECG recordings, indicating robustness to varying signal conditions and suitability for downstream delineation tasks.
Preprocessing screening is often the most expensive part of a near-infrared spectroscopy calibration workflow. It works because smoothing, derivatives, detrending and related filters change the spectral directions seen by PLS or Ridge regression, but a full external search repeatedly refits nearly the same linear model. This paper studies the case where that search can be collapsed into one calibration step. For strict linear preprocessing operators, the transformed PLS cross-covariance satisfies (X A^T)^T Y = A X^T Y, and Ridge regression depends on the operator-induced kernel X A^T A X^T. These identities allow a finite operator bank to be screened inside the model while retaining original-wavelength coefficients. Sample-adaptive or fitted corrections such as SNV, MSC, EMSC and ASLS remain fold-local branches, not absorbed into the algebra. The study uses the AOM benchmark cohort: 61 regression rows and 17 classification rows in the manifest. On the main regression denominator (N=32), plain compact-bank AOM-PLS records median RMSEP ratios of 0.991 against PLS-default and 0.990 against PLS-HPO; the selected ASLS-AOM-compact-cv5 branch records 0.985 and 1.002 on the same two references. The plain AOMRidge-global-compact-none baseline records 0.974 against Ridge-default and 0.984 against Ridge-HPO, while the selected AOMRidge-Blender-headline-spxy3 records 0.918 and 0.966. The selected classifier, AOM-PLS-DA-global-simpls-covariance, improves balanced accuracy by 0.159 on N=13 datasets with 12/13 wins. The runtime gap is the practical result: PLS-HPO takes a median total time of 710.81 s per run, whereas the selected AOM-PLS branch takes 1.63 s. Linear operator-adaptive calibration therefore gives comparable prediction quality to exhaustive preprocessing screening, with orders-of-magnitude less fitting time for PLS.
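The two identities quoted above can be checked numerically. The sketch below uses random data and a hypothetical first-difference operator as the linear preprocessing matrix A; the dimensions and the choice of operator are assumptions made for this example and are not taken from the benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 20, 50, 1                      # samples, wavelengths, responses (assumed)
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))

# Hypothetical strict linear preprocessing operator: a first-difference matrix.
A = np.eye(p) - np.eye(p, k=1)

# Preprocessing applied to every spectrum (row of X).
X_pre = X @ A.T

# PLS cross-covariance identity: (X A^T)^T Y = A X^T Y.
cross_cov_ok = np.allclose(X_pre.T @ Y, A @ (X.T @ Y))

# Ridge kernel identity: the Gram matrix of the preprocessed data is X A^T A X^T.
kernel_ok = np.allclose(X_pre @ X_pre.T, X @ A.T @ A @ X.T)

# Coefficients fitted on preprocessed spectra map back to original wavelengths:
# X_pre @ b = X @ (A^T b).
beta_pre = rng.normal(size=(p, q))
beta_orig = A.T @ beta_pre
coef_ok = np.allclose(X_pre @ beta_pre, X @ beta_orig)
```

Since X^T Y is computed once, screening a bank of operators only requires multiplying that fixed matrix by each A, rather than refitting on each transformed copy of X.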