Stain variation across hospitals degrades histopathology models at deployment. Existing augmentation methods perturb color spaces with arbitrary hyperparameters, lacking both a principled budget and coverage guarantees for unseen centers. We propose \textbf{C}alibrated \textbf{A}dversarial \textbf{S}tain \textbf{A}ugmentation (\textbf{CASA}), which performs adversarial augmentation in the Macenko stain parameter space with a budget calibrated from multi-center statistics via the DKW inequality. On Camelyon17-WILDS (5 seeds), CASA achieves $93.9\% \pm 1.6\%$ slide-level accuracy -- outperforming HED-strong ($88.4\% \pm 7.3\%$), RandStainNA ($85.2\% \pm 6.7\%$), and ERM ($63.9\% \pm 11.3\%$) -- with the highest worst-group accuracy ($84.9\% \pm 0.9\%$) among all 10 compared methods.
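The DKW-calibrated budget admits a compact sketch. Assuming the budget is set per scalar Macenko stain parameter from multi-center samples (the quantile-widening scheme and function names here are illustrative, not necessarily the paper's exact procedure):

```python
import numpy as np

def dkw_band(samples, delta=0.05):
    """Half-width of a uniform confidence band on the empirical CDF via the
    Dvoretzky-Kiefer-Wolfowitz inequality: with probability >= 1 - delta,
    sup_x |F_n(x) - F(x)| <= eps."""
    n = len(samples)
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))

def calibrated_budget(samples, coverage=0.95, delta=0.05):
    """Perturbation budget for one scalar stain parameter: widen the empirical
    quantile interval by the DKW margin so that, with confidence 1 - delta,
    the interval covers the target mass of the unseen-center distribution."""
    eps = dkw_band(samples, delta)
    lo_q = max(0.0, (1.0 - coverage) / 2.0 - eps)
    hi_q = min(1.0, (1.0 + coverage) / 2.0 + eps)
    lo, hi = np.quantile(samples, [lo_q, hi_q])
    return lo, hi

# hypothetical multi-center samples of one Macenko stain coefficient
rng = np.random.default_rng(0)
params = rng.normal(1.0, 0.1, size=500)
lo, hi = calibrated_budget(params)
```

The key property is that the band width shrinks as $O(1/\sqrt{n})$ with the number of calibration samples, which is what makes the budget principled rather than an arbitrary hyperparameter.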
High-quality training datasets are essential for the performance of neural networks. However, the audio domain still lacks a large-scale, strongly-labeled, and single-source sound event dataset. The FSD50K dataset, despite being relatively large and open, contains a considerable fraction of multi-source samples where background interference or overlapping events could limit the usefulness of the data. To address this challenge, we introduce a data curation framework designed for large-scale open audio corpora. Our approach leverages a generative diffusion model to synthesize clean single-class events to construct controlled noisy mixtures for supervision. We subsequently employ a pre-trained audio encoder coupled with a discriminative classifier to automatically identify and filter out multi-source samples. Experiments show that our framework achieves strong performance on a human expert-curated test set. Finally, we release FSD50K-Solo, a model-curated subset of FSD50K containing single-source audio samples identified by our method. Beyond FSD50K, our method establishes a scalable paradigm for curating open source audio corpora.
Frequency-modulated continuous-wave (FMCW) lidar conventionally estimates distance and velocity from constant beat frequencies generated through interferometry. Existing FMCW implementations emphasize simple signal processing -- e.g., beat frequency estimation via a fast Fourier transform (FFT) algorithm plus peak-finding -- which results in hardware-focused solutions requiring linear swept-frequency laser sources or linearized resampling. However, the maximum achievable distance by this method is limited by the need to sample the interference signal without aliasing. In this work, we propose two signal processing methods: matched filtering and instantaneous frequency fitting. These two methods can recover larger ranges of distance and velocity by considering the full waveform despite aliasing in the frequency domain. Furthermore, the FMCW lidar signal is often corrupted by phase noise, and we show that the instantaneous frequency fitting approach is more robust than matched filtering by considering the deviation in the phase. We present comprehensive simulation studies along with theoretical analysis using the misspecified Cramér--Rao bound. As these methods are flexible to arbitrary frequency modulation, we also show results for non-linear modulations that could yield better sensitivity to distance and velocity compared to the popular triangular modulation.
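The conventional pipeline the authors contrast against, beat-frequency estimation via an FFT plus peak-finding, can be sketched in a few lines (all parameter values are illustrative):

```python
import numpy as np

# Conventional FMCW processing: mixing the transmitted and received chirps
# yields an (ideally) constant-frequency beat tone; distance follows from
# the beat frequency via R = c * f_b / (2 * slope).
fs = 1e6          # sample rate (Hz)
T = 1e-3          # chirp duration (s)
slope = 5e11      # chirp slope (Hz/s)
c = 3e8           # speed of light (m/s)

R_true = 30.0                       # target range (m)
f_beat = 2 * R_true * slope / c     # expected beat frequency (100 kHz)

t = np.arange(0, T, 1 / fs)
sig = np.cos(2 * np.pi * f_beat * t)

spec = np.abs(np.fft.rfft(sig))
freqs = np.fft.rfftfreq(len(sig), 1 / fs)
f_hat = freqs[np.argmax(spec)]      # peak-finding
R_hat = c * f_hat / (2 * slope)     # recovered range
```

The aliasing limitation discussed above is visible here: once `f_beat` exceeds `fs / 2`, the argmax lands on an aliased bin and the recovered range is wrong, which is precisely what the proposed full-waveform methods avoid.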
The rapid growth of variable renewable energy has increased the need for flexible and efficiently coordinated energy resources. In this context, hybrid resources that combine renewable generation and battery storage within a single market-participating entity have attracted growing attention. Such hybrid resources can have multiple revenue streams, while allocating limited power and energy capacity across multiple electricity markets including energy and ancillary services. This multi-market coordination increases operational complexity and complicates profitability assessment, making optimal system sizing a challenging design problem. In addition, uncertainty in renewable generation and market prices makes it difficult for conventional optimization approaches to determine system designs that remain effective under stochastic operating conditions. To address these challenges, this paper proposes a deep reinforcement learning-based co-optimization framework for hybrid solar-battery resources. The framework embeds system design variables directly into the policy learning process, enabling joint optimization of hybrid system sizing and coordinated multi-market bidding strategies within a unified stochastic formulation. Case studies using historical renewable generation and market data demonstrate the effectiveness of the proposed framework in identifying economically rational hybrid system designs under multi-market operation.
Early-stage Parkinson's disease (EarlyPD) detection from speech is clinically meaningful yet underexplored, and published results are hard to compare because studies differ in datasets, languages, tasks, evaluation protocols, and EarlyPD definitions. To address this issue, we propose the first benchmark for speech-based EarlyPD detection, with a speaker-independent split designed for fair and replicable cross-method evaluation on researcher-accessible datasets. The benchmark covers three common speech tasks and evaluates methods under different training-resource settings. We also present multi-dimensional evaluation breakdowns by dataset, aggregation level, gender, and disease stage to support fine-grained comparisons and clinical adoption. Our results provide a replicable reference and actionable insights, encouraging the adoption of this publicly available benchmark to advance robust and clinically meaningful EarlyPD detection from speech.
Deceptive path planning enables autonomous agents to obscure their true goals from observers by deviating from an expected optimal path. Prior work largely solves full-horizon, end-to-end optimization for single agents, which is expensive to recompute online and difficult to scale or adapt en route. We propose a unified framework for deceptive path planning using a Boltzmann distribution, computing over short-horizon candidate trajectories within a receding-horizon loop. By parameterizing a user-defined cost that captures deception, resources, and smoothness, and optionally includes coupling terms between agents, the framework yields stochastic policies that balance the tradeoff between optimal paths and deceptive deviation. Policies are updated locally and do not require training. The level of deception and adherence to constraints can be dynamically tuned, enabling online adaptation to changes in goals and constraints such as obstacles. This step-by-step tuning opens the door to new forms of dynamic deception. Simulation studies demonstrate the flexibility of our approach, maintaining deception while adapting to environmental and constraint updates, avoiding the recomputation required by full-horizon methods, and supporting intuitive tuning via a small set of parameters.
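The core of such a Boltzmann policy over short-horizon candidates is a temperature-weighted softmax over trajectory costs; a minimal sketch (the candidate costs and weights are invented for illustration):

```python
import numpy as np

def boltzmann_policy(costs, temperature=1.0):
    """Softmax over negated trajectory costs: lower cost means higher
    sampling probability; temperature tunes how strongly the agent prefers
    the cost-optimal candidate over deceptive deviations."""
    c = np.asarray(costs, dtype=float)
    logits = -(c - c.min()) / temperature   # shift by min for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# one receding-horizon step over three hypothetical short-horizon candidates,
# where each cost aggregates goal-progress, deception, and smoothness terms
candidate_costs = [2.0, 2.5, 4.0]
p = boltzmann_policy(candidate_costs, temperature=0.5)
rng = np.random.default_rng(0)
chosen = rng.choice(len(candidate_costs), p=p)   # sample the next trajectory
```

Raising the temperature flattens the distribution (more deceptive randomness); lowering it concentrates mass on the cheapest candidate, which is the tuning knob the abstract refers to.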
Power system restoration following blackouts must ensure frequency stability throughout the recovery process. This paper proposes a frequency-constrained mixed-integer linear programming (MILP) framework for black-start restoration planning in transmission systems with synchronous machines and energy storage systems. To prevent excessive frequency deviations caused by restorative actions, a frequency nadir prediction method is developed for power systems with energy storage system (ESS) integration and incorporated into a multiperiod optimization framework. The formulation ensures that frequency deviations resulting from restorative actions remain within prescribed safe limits. Furthermore, the presented framework leverages ESSs to enhance frequency security and recovery speed. Case studies on a modified IEEE 9-bus system demonstrate that the computed restoration plan maintains frequency security, as validated through MATLAB and PSS/E simulations, while reducing restoration time through ESS coordination.
Coordinating growing grid flexibility under uncertainty is becoming increasingly important for efficient and reliable power-system operation. A core computational requirement is the efficient large-scale batched evaluation of AC power flow across candidate operating actions and uncertainty scenarios. Previous work has explored GPU-based batched power-flow evaluation, but has largely relied on hand-written C or CUDA code, creating barriers to customisation, efficient kernel optimisation, and long-term maintenance. JAX is a Python-based framework that enables efficient accelerator execution while keeping implementations in Python. This letter therefore proposes a JAX-based batched AC power-flow solver that uses current JAX functionality to implement Newton--Raphson for transmission networks and Z-Bus power flow for three-phase unbalanced distribution networks, achieving more than 10x speed-ups relative to pandapower and OpenDSS. In addition, JAX integrates seamlessly with the broader JAX-based AI ecosystem, making it straightforward to embed power-flow evaluation within AI methods for future larger-scale and more complex power-system operation.
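The batched-evaluation pattern can be illustrated on a toy single-line power flow. The letter's solver targets full networks and relies on JAX `vmap`/`jit`; plain NumPy broadcasting conveys the same idea with illustrative values:

```python
import numpy as np

# Batched Newton-Raphson on a toy 2-bus power flow: solve
#   P = (V1 * V2 / X) * sin(theta)
# for theta, simultaneously for many load scenarios via array broadcasting.
V1, V2, X = 1.0, 1.0, 0.1
P = np.linspace(0.5, 5.0, 1000)   # batch of active-power injections (p.u.)
theta = np.zeros_like(P)          # flat start for every scenario

for _ in range(20):               # Newton iterations, all scenarios in parallel
    mismatch = (V1 * V2 / X) * np.sin(theta) - P
    jac = (V1 * V2 / X) * np.cos(theta)      # per-scenario scalar Jacobian
    theta -= mismatch / jac

max_res = np.abs((V1 * V2 / X) * np.sin(theta) - P).max()
```

In JAX the same loop body would be written once for a single scenario and lifted over the batch with `jax.vmap`, then compiled with `jax.jit` for accelerator execution; that is the mechanism behind the reported speed-ups.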
Emerging connect-and-manage practices allow new transmission-connected mega-loads to connect while enforcing time-varying admissible power exchange limits at the point of common coupling (PCC) in real time. Hyperscale artificial intelligence data centers (AIDCs), whose demand can reach hundreds of megawatts and whose internal computing-cooling dynamics evolve rapidly, can therefore face frequent conflicts between workload continuity requirements and externally imposed PCC envelopes. This paper proposes a battery-assisted operational framework in which on-site battery energy storage (BESS) serves as a physical buffering interface to reconcile fast internal dynamics with time-varying interconnection limits. A continuity-aware energy-computation model is developed to jointly capture checkpoint-constrained AI training workloads, information technology (IT) computing power-throughput characteristics, and IT-cooling thermal dynamics. A two-stage decision framework is then formulated, consisting of scenario-based day-ahead workload commitment and a real-time receding-horizon delivery assurance controller that enforces battery, thermal, and grid-interaction constraints. Case studies on the IEEE 39-bus system with Australian real data demonstrate that BESS substantially increases credible day-ahead workload commitment and improves real-time delivery robustness under transmission congestion. Sensitivity analyses further reveal a regime-dependent role transition of BESS -- from feasibility-oriented continuity support when PCC limits are binding to economy-driven flexibility provision as transmission constraints are relaxed.
Emerging connect-and-manage interconnection practices allow gigawatt-scale artificial intelligence data centers (AIDCs) to connect to the transmission network without prior network upgrades, at the cost of real-time curtailment during grid stress. This paper formalizes the resulting AIDC-transmission system operator (TSO) coordination as a sequential request-acceptance protocol with an explicit curtailment variable and a strict information boundary between the two parties. Physical models are developed on both sides of the point of common coupling: the AIDC is decomposed into frontier training, batch training, and inference serving subclasses sharing on-site battery energy storage, capturing differentiated temporal flexibility; the transmission network is modeled via DC power flow with generator constraints and budget-constrained demand uncertainty. Because the TSO's acceptance mapping is opaque to the AIDC, a three-layer hierarchical architecture is formulated in which a learning-based planning layer generates power requests, the TSO evaluates each request through a robust acceptance mechanism, and a single-step execution optimizer enforces internal feasibility under the realized power budget. Case studies with a gigawatt-scale AIDC on the IEEE 39-bus system with Australian market data show that the framework reduces curtailment from 9.1% to 2.8% while preserving 98.1% of frontier training workload, that batch training acts as the primary grid-elastic resource with the largest throughput swing during peak demand, and that the on-site battery provides curtailment buffering through active discharge and charge deferral.
Uncrewed aerial vehicles (UAVs) are gaining increasing attention in wireless systems, providing new opportunities to expand the reach and improve the quality of wireless services. Despite their versatility, UAVs are limited by available energy onboard, which results in significant challenges in deploying UAV-enabled wireless systems. Modeling energy consumption is an essential component of the deployment and trajectory optimization of UAVs. This article presents a comprehensive overview of UAV energy consumption models, with a focus on their relevance to wireless systems research. We deliberately exclude data-driven and overly complex models to provide clear and practical guidelines for their use in wireless systems research. We begin by categorizing the most common types of UAVs and describing the typical flight phases considered in the literature. We then review existing energy consumption models, focusing on their scope with respect to UAV types and flight phases. We also discuss common mistakes in the use of these models and highlight the existing gaps in the literature. In particular, we show how the use of an incorrect model can lead to significant errors in energy consumption calculations. Finally, we emphasize the need to develop energy consumption models for missing scenarios.
This paper studies the synthesis of control policies for heterogeneous and interconnected multi-agent systems that collaborate through data exchange over a communication network to minimize a collective cost. We propose a distributed encoded corrective double actor-critic framework that integrates a novel message-passing mechanism. Existing methods assume noise-free and delay-free access to the global or partial states and overlook the fact that the global states, though noisy and delayed, can be progressively reconstructed and refined over time. In contrast, this work explicitly models communication sampling asynchrony, delay, and link noise based on the network configuration. The proposed message-passing mechanism characterizes timing and information flow to refine and time-shift global state information, which is then used to incrementally correct the Q-networks. The double Q-network design mitigates overestimation bias, while the shared encoder coupling the actor-critic networks captures inter-agent dependencies. We evaluate our approach in multiple test cases, demonstrate its effectiveness over various baselines, and provide a numerical regret analysis.
Feature sharing via split inference offers a lightweight alternative to federated learning for resource-constrained hospitals, but transmitted features still leak patient identity information and lack practical mechanisms for controlled feature sharing. We propose Keyed Nonlinear Transform (KNT), a drop-in feature transformation that applies key-conditioned obfuscation to intermediate representations. KNT reduces re-identification AUC from 0.635 to 0.586, corresponding to a 36% reduction in above-chance identity signal, while introducing only 0.15 ms CPU overhead, without backbone retraining, and preserving classification performance within 1.0 pp. Our analysis shows that KNT's nonlinear transform prevents closed-form inversion and shifts recovery to iterative gradient-based optimization under full key compromise, substantially increasing inversion difficulty. The same transform generalizes to dense prediction tasks, incurring only a 4.4 pp Dice reduction on skin-lesion segmentation without retraining. These results position KNT as a practical and efficient privacy layer for split inference deployments.
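A minimal sketch of key-conditioned feature obfuscation in the spirit of KNT (this is not the paper's exact transform; the permutation/sign-flip mixing and the particular nonlinearity are invented for illustration):

```python
import numpy as np

def keyed_transform(features, key, alpha=0.5):
    """Key-seeded channel permutation and sign flips followed by an
    elementwise nonlinearity. Without the key, the mixing cannot be undone
    in closed form; with the key, each element is monotone and invertible."""
    rng = np.random.default_rng(key)          # key seeds the obfuscation
    d = features.shape[-1]
    perm = rng.permutation(d)                 # secret channel shuffle
    signs = rng.choice([-1.0, 1.0], size=d)   # secret sign flips
    z = features[..., perm] * signs
    return np.tanh(z) + alpha * z             # nonlinear, strictly monotone map

feats = np.random.default_rng(42).standard_normal((4, 16))
out_a = keyed_transform(feats, key=1)
out_b = keyed_transform(feats, key=2)         # a different key gives different features
```

The drop-in property shows in the shape: the transform preserves feature dimensionality, so it can sit between the client-side backbone and the transmitted representation without retraining either side.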
Grid-forming (GFM) inverters are essential for enhancing stability in modern power systems with high penetration of inverter-based resources (IBRs). However, their performance depends heavily on control parameter tuning, particularly the active power-frequency droop coefficient. This parameter presents a trade-off among competing objectives, including damping, settling time, rate of change of frequency (RoCoF), and frequency nadir. This paper proposes a real-time, adaptive optimization framework based on Extremum Seeking Control (ESC) to dynamically tune the GFM droop gain. A multi-objective cost function balances conflicting performance goals such as oscillation energy, frequency nadir, RoCoF, and post-disturbance settling performance. The approach is validated through numerical simulations on a modified IEEE 68-bus system. Results demonstrate that the cost function is convex with respect to the droop parameter, justifying gradient-based optimization. Furthermore, the ESC algorithm successfully tracks the time-varying optimal droop coefficient in real-time as network conditions change, thereby ensuring robust and near-optimal system performance without requiring an analytical grid model.
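The ESC mechanism itself is simple enough to sketch: a sinusoidal dither probes the unknown convex cost, and demodulating the measured cost with the same dither yields a gradient estimate that drives the gain toward the optimum. A toy quadratic stands in for the measured multi-objective cost (all constants below are invented):

```python
import numpy as np

def cost(k):
    # stand-in for the measured cost (oscillation energy, nadir, RoCoF,
    # settling performance); hypothetical optimum at droop gain 0.05
    return (k - 0.05) ** 2

k_hat = 0.2              # initial droop-gain estimate
a, omega = 0.05, 5.0     # dither amplitude and frequency (rad/s)
gain, dt = 0.5, 0.01     # adaptation gain and time step

for step in range(20000):
    t = step * dt
    dither = a * np.sin(omega * t)
    J = cost(k_hat + dither)                       # probe the cost with the dither
    grad_est = (2.0 / a) * J * np.sin(omega * t)   # demodulated gradient estimate
    k_hat -= gain * grad_est * dt                  # averaged gradient descent
```

Averaging over dither periods, the update behaves like gradient descent on the cost, which is why the convexity result in the abstract matters: it guarantees the ESC loop has a single attractor.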
Connected and autonomous vehicles and smart mobility services increasingly use digital route guidance as an operational input to traffic network management. When this information becomes unreliable or adversarial, day-to-day traffic models must represent not only flow adaptation but also the evolution of user trust in the information source. This paper develops a coupled day-to-day traffic assignment and trust-evolution framework for route-guidance misinformation. Within-day congestion is represented by Lighthill-Whitham-Richards network loading, while day-to-day route choice follows bounded-rationality logit learning with trust-dependent reliance on external guidance. Trust is modeled as an aggregate class-level behavioral reliance state encoded by a Beta evidence model and updated from repeated guidance errors. Theoretical analysis establishes stationary equilibria, a conservative stability guide, a weighted compliance index for population-level vulnerability, and an asymmetric recovery law that explains post-attack trust hysteresis. Numerical experiments on Sioux Falls, with an Anaheim robustness check, show that endogenous trust creates a threshold-based resilience mechanism. Below the trust-activation threshold, the attack remains behaviorally stealthy and dynamic trust provides almost no attenuation. Above the threshold, trust erosion reduces the impact of the fixed-trust attack by about 91 percent in Sioux Falls and 85 percent in Anaheim. The experiments also show that CAV penetration increases fixed-trust vulnerability while preserving dynamic attenuation, and that traffic performance can recover before trust, resulting in a 77-day hidden vulnerability window. The results provide a trust-aware modeling basis for resilience analysis in CAV-enabled traffic networks.
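The Beta evidence model behind the trust state admits a compact sketch. Trust is the mean of a Beta distribution over guidance reliability, and asymmetric evidence weights (the specific numbers below are illustrative, not the paper's calibration) reproduce the fast-erosion/slow-recovery hysteresis:

```python
def update_trust(alpha, beta, guidance_correct, gain=1.0, loss=3.0):
    """Beta-evidence trust update: correct guidance adds to alpha, a guidance
    error adds (more heavily) to beta. Trust is the Beta mean; the asymmetric
    weights produce fast erosion and slow recovery."""
    if guidance_correct:
        alpha += gain
    else:
        beta += loss
    return alpha, beta, alpha / (alpha + beta)

alpha, beta = 8.0, 2.0            # prior reliance state, trust = 0.8
for _ in range(5):                # five days of misinformation
    alpha, beta, trust = update_trust(alpha, beta, False)
eroded = trust                    # trust collapses quickly
for _ in range(5):                # five days of accurate guidance
    alpha, beta, trust = update_trust(alpha, beta, True)
recovered = trust                 # recovers only partially: hysteresis
```

Because the accumulated evidence counts persist, trust after the attack stays below its pre-attack level for many days even under flawless guidance, which is the mechanism behind the post-attack hidden vulnerability window described above.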
This paper presents a flexible energy management system (EMS) for an electric bus charging station (EBCS) that integrates renewable generation, energy storage, and electric bus (EB) charging while accounting for uncertainties in solar PV output, electricity prices, and EB arrival/departure state of charge. A data-driven polynomial chaos expansion surrogate is developed from a limited set of uncertainty samples, and a nonparametric inference method is used to enrich the input data when historical data is limited. Case studies on a solar-powered EBCS with 20 EBs demonstrate the effectiveness of the proposed EMS and data-driven method.
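A data-driven polynomial chaos surrogate of the kind described can be fitted by least squares on a Hermite basis; the sketch below uses a hypothetical scalar response in place of the EMS cost model (germ, response, and degree are all illustrative):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

# Uncertainty samples of a standard-normal "germ" and a hypothetical model
# response (e.g. mapping PV-output uncertainty to operating cost).
rng = np.random.default_rng(1)
xi = rng.standard_normal(200)
y = np.exp(0.3 * xi)

# Least-squares fit of a degree-4 polynomial chaos surrogate in
# probabilists' Hermite polynomials He_0..He_4.
deg = 4
Psi = hermevander(xi, deg)                 # (200, 5) design matrix
coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)

pce_mean = coeffs[0]   # zeroth PCE coefficient approximates E[y]
```

Once fitted, moments and cheap resamples come directly from the coefficients, which is what makes the surrogate useful when only a limited set of uncertainty samples is available.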
Data assimilation (DA) estimates the state of an evolving dynamical system from noisy, partial observations, and is widely used in scientific simulation as well as weather and climate science. In practice, filtering methods rely on frame-to-frame transition models. However, these models are fragile when observations are non-Markovian (when they form only a partial slice of a higher-dimensional latent state as in real-world weather data): they tend to accumulate errors over long horizons. At the same time, learned DA methods typically commit to a single regime, either filtering (nowcasting, real-time forecasting) or smoothing (retrospective reanalysis), which splits what should be a shared prior across application-specific pipelines. To address both issues, we introduce ForcingDAS, a unified and robust DA framework. Built on Diffusion Forcing with an independent noise level assigned to each frame, ForcingDAS learns a joint-trajectory prior instead of frame-to-frame transitions. This allows it to capture long-horizon temporal dependencies and reduce error accumulation. In addition, the same trained model spans the full filtering to smoothing spectrum at inference time. Specifically, nowcasting, fixed-lag smoothing, and batch reanalysis are selected through the inference schedule alone, without retraining. We evaluate ForcingDAS on 2D Navier-Stokes vorticity, precipitation nowcasting, and global atmospheric state estimation. Across all settings, a single model is competitive with or outperforms both learned and classical baselines that are specialized for individual regimes, with the largest gains observed on real-world weather benchmarks.
This letter investigates the problem of designing an energy-efficient collaborative strategy for a mobile embodied artificial intelligence network (MEAN) over wireless communication. In the considered model, the agents execute the tasks through collaboration, and they can switch between two operating modes based on the signal-to-noise ratio (SNR) and global collaboration. The dual-mode comprises the base station (BS)-assisted collaborative mode, in which agents make decisions through semantic communication with the BS and then collaborate on tasks, and the local computing mode, in which the agents make decisions and execute tasks independently. Due to the dynamic wireless communication and flexible collaboration strategy, we jointly consider computation energy, communication energy, and task-execution energy with specific collaborative gains in a mixed-integer nonlinear programming (MINLP) optimization problem whose goal is to minimize the total system energy consumption. To solve it, we propose a low-complexity enumeration algorithm: first, we derive the optimal closed-form solutions for the semantic compression ratio and transmit power by proving strict convexity. Second, we determine the scale of collaboration and the operating mode of each agent by a greedy sorting algorithm based on individual energy-saving potentials. Simulation results show that the proposed algorithm can significantly reduce the total energy consumption compared to benchmark schemes.
Modern edge devices increasingly rely on neural networks for intelligent applications. However, conventional digital computing-based edge inference requires substantial memory and energy consumption. In analog radio frequency (RF) computing, a base station (BS) encodes the weights of the neural networks and broadcasts the RF waveforms to the clients. Each client reuses its passive mixer to multiply the received weight-encoded waveform with a locally generated input-encoded waveform. This enables wireless receivers to perform the matrix-vector multiplications (MVMs) that account for most of the computation burden in edge inference with ultra-low energy consumption. Unlike conventional downlink transmissions which are optimized for communications, analog RF computing requires a computing-centric physical layer that controls both the analog MVM accuracy and the energy consumption for inference. Motivated by this, in this paper, we propose a physical layer design framework for analog RF computing in MU-MIMO wireless systems. We derive tractable models for computing accuracy and energy consumption for inference, formulate a joint BS beamforming and client-side scaling problem subject to computing accuracy, transmit power, and hardware constraints, and develop a low-complexity algorithm to solve the non-convex problem. The proposed design provides client- and layer-specific accuracy control for both uniform- and mixed-precision inference. Simulations under 3GPP specifications show that analog RF computing can significantly reduce client-side energy consumption by nearly two orders of magnitude compared to digital computing, while mixed-precision inference requires even lower energy consumption than uniform-precision inference. Overall, these results establish analog RF computing over wireless networks as a promising paradigm for energy-efficient edge inference.
We present a physics-informed framework for system identification based on randomized stable atomic features. Impulse responses are represented as random superpositions of stable atoms, namely damped complex exponentials associated with poles sampled inside a prescribed disk. Identification is then cast as a convex regularized least-squares problem with optional linear, second-order-cone, and KYP constraints. The approach generalizes random Fourier and random Laplace features to the damped, nonstationary regime relevant to engineering systems while retaining modal interpretability and scalable finite-dimensional computation. The main analytic point is an operator-theoretic Disk-Bochner viewpoint: positive measures over stable poles generate positive-definite kernels with a radius-dependent shift defect, while a converse scalar disk moment representation for an arbitrary kernel is characterized by subnormality of the canonical shift. We prove this statement, establish an RKHS-to-$\ell_1$ embedding, show that sampled poles induce a valid finite atomic gauge, discuss random-feature convergence, and state sparse-recovery guarantees conditionally on the restricted-eigenvalue properties of the realized disk-Vandermonde or input-output design matrix. We also connect the normalized transfer function problem to Nevanlinna-Pick interpolation and LFT set-membership. The framework directly encodes stability margins, modal localization, DC-gain bounds, monotonicity, passivity, relative degree, settling-time targets, and time/frequency-domain error bounds. Numerical comparisons illustrate how physically meaningful priors can compensate for poor excitation and improve constrained impulse-response recovery in an under-informative data setting.
Designing effective practice schedules for high-dimensional motor learning tasks remains a challenge, especially when skill states are unobservable and task performance may not reflect the true learning. We propose an automated curriculum design framework that combines a human motor learning model and personalized real-time skill estimation with Stochastic Nonlinear Model Predictive Control in \emph{de-novo} (novel) motor learning paradigms. We validated our framework both through simulations and human-subject studies (N = 36) using a hand exoskeleton. Our proposed approach accelerates skill acquisition by $\sim23\%$ and $\sim17\%$ compared to a random curriculum and a performance-heuristics-based curriculum, respectively. These significant gains in learning efficiency highlight the potential of model-based, individualized curricula for motor rehabilitation and complex skill training.
Rydberg atomic quantum receivers have emerged as a novel radio frequency measurement technology, and their high sensitivity across a large range of frequencies makes them attractive for communications reception. However, their performance can be significantly degraded by hardware-induced noise, particularly laser noise, which raises the overall system noise floor and exhibits correlation. To address this challenge, this paper proposes a weight hybrid (WH) architecture for Rydberg-atomic sensors, a novel four-channel combining scheme designed for atomic sensors operating in correlated noise environments. By jointly processing dual signal channels and dual noise reference channels, the WH architecture effectively mitigates noise contributions from lasers and other hardware components. All channels are optimally combined via maximum likelihood estimation within an expectation maximization framework, enabling robust signal extraction under correlated noise. Moreover, the proposed WH architecture is universal and can be readily extended to other types of Rydberg receivers to achieve consistent performance improvements.
We propose Walsh-Hadamard Transform Division Multiplexing (WHTDM), a multicarrier waveform that replaces the conventional IFFT/FFT pair in OFDM with a real-valued, unitary Walsh-Hadamard transform (WHT). WHTDM inherits the CP-OFDM transceiver structure while eliminating all complex multiplications from the transform stage, yielding a transmitter with zero real multipliers in the core modulation block. For detection under doubly-selective channels, we adopt a cross-domain memory approximate message passing (CD-MAMP) equalizer that operates on the banded structure of the equivalent WHT-domain channel matrix. Simulation results under the 3GPP TDL-C channel model at 28 GHz demonstrate that WHTDM with CD-MAMP significantly outperforms conventional OFDM 1-tap MMSE at high mobility, achieving over an order of magnitude lower BER at 120 km/h. Among the compared CD-MAMP-equalized new waveforms, WHTDM achieves the best BER performance while maintaining a transmitter complexity 2.5 $\times$ lower than OFDM and completely eliminating complex multipliers from the transform stage, making it well-suited for low-power IoT terminals.
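The multiplier-free property of the transform stage is easy to see in a direct sketch of the fast Walsh-Hadamard transform: the butterfly structure uses only additions and subtractions, and a unitary scaling plays the role of the IFFT/FFT pair (illustrative code, not the paper's implementation):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform over a length-2^m sequence: a butterfly
    recursion using only additions and subtractions, no multiplications."""
    a = np.array(x, dtype=float)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def wht_unitary(x):
    """Orthonormal (unitary) WHT, the drop-in replacement for the IFFT/FFT
    pair in the CP-OFDM transceiver structure."""
    x = np.asarray(x, dtype=float)
    return fwht(x) / np.sqrt(len(x))
```

Because the unscaled WHT is self-inverse up to a factor of $N$, the same butterfly serves both modulation and demodulation, and the unitary version preserves signal energy exactly as the FFT does.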
This paper proposes a fully dynamic Deep Reinforcement Learning (DRL) method for rebalancing dockless bike-sharing systems, overcoming the limitations of periodic, system-wide interventions. We model the service through a graph-based simulator and cast rebalancing as a Markov decision process. A DRL agent routes a single truck in real time, executing localized pick-up, drop-off, and charging actions guided by spatiotemporal criticality scores. Experiments on real-world data show significant reductions in availability failures with a minimal fleet size, while limiting spatial inequality and mobility deserts. Our approach demonstrates the value of learning-based rebalancing for efficient and reliable shared micromobility.
Power electronics systems are increasingly exposed to cyber threats due to their integration with digital controllers and communication networks. However, an attacker-oriented metric is still lacking to quantify the extent to which a node can be pushed toward instability within a privilege-constrained action space. This letter proposes an impedance-based Attack Reachable Domain (ARD) framework that maps feasible adversarial actions to critical-eigenvalue migration through impedance reshaping. Based on the ARD, an Attack Penetration Index is defined to quantify node-level cyber-vulnerability by jointly characterizing the penetration of the nominal stability margin and the accessibility of successful destabilizing attacks within a privilege-constrained action space. To make the proposed assessment computable when inverter models are unavailable, a practical gray-box workflow is further established by integrating existing impedance identification and differentiable surrogate tools. Case studies on a 4-bus system and a modified IEEE 39-bus system show that coordinated cross-layer manipulations are markedly more damaging than isolated single-layer attacks, and that the proposed metric reveals vulnerability patterns that cannot be inferred from grid-strength indicators.
The high penetration of voltage source converters in modern smart microgrids enhances operational flexibility while introducing complex cyber-physical vulnerabilities. Existing cyber-attack studies either require detailed knowledge of system topology and controller dynamics or depend on repeated online interactions, which may compromise practicality by generating operationally infeasible or limit-violating commands. This article investigates a dispatch command manipulation attack and develops an admittance-guided framework to identify the vulnerable inverter and the worst-case dispatch command that most severely degrades system stability. A compromised inverter is utilized to inject controlled harmonic perturbations for sparse admittance measurement, and a physics-informed neural network is then employed to reconstruct the operating-point-dependent admittance of target inverters over the feasible dispatch region. Based on the reconstructed admittance, a stability-margin-oriented optimization is formulated to locate the most vulnerable inverter and the corresponding worst-case dispatch command. Controller hardware-in-the-loop experiments on a five-inverter microgrid demonstrate that the identified command can drive the system into severe sub-synchronous oscillations while remaining within nominal dispatch bounds, highlighting the need for stability-aware command screening beyond static limit checking.
This paper addresses the critical sensitivity of narrow-beam communication systems to physical misalignments and exploits Integrated Sensing and Communications (ISAC) technology to propose a sensor-free antenna tilt failure detection and estimation framework. The proposed methods utilize environmental static clutter as geometric anchors to monitor systematic gain shifts in clutter heat maps, enabling precise antenna tilt detection and estimation using the standard 5G NR frame structure and two different waveforms. Numerical results show the potential of the proposed framework to enable autonomous, self-healing network maintenance without the need for external sensors.
3D Gaussian Splatting (3DGS) has emerged as a prominent framework for real-time, photorealistic scene reconstruction, offering significant speed-ups over Neural Radiance Fields (NeRF). However, the fidelity of 3DGS representations remains heavily dependent on the quality of the initial point cloud. While standard Structure-from-Motion (SfM) pipelines using COLMAP provide adequate initialisation, they often suffer from high computational costs and sparsity in textureless regions, which degrades subsequent reconstruction accuracy and convergence speed. In this work, we introduce an AV1-based feature detection and matching pipeline that significantly reduces SfM processing overhead. By leveraging motion vectors inherent to the AV1 video codec, we bypass computationally expensive exhaustive matching while maintaining geometric robustness. Our pipeline produces substantially denser point clouds, with up to eight times as many points as classical SfM. We demonstrate that this enhanced initialisation directly improves 3DGS performance, yielding a 9-point increase in VMAF and a 63% average reduction in training time required to reach baseline quality. The project page is available at this https URL
This paper presents a distributionally robust model predictive control (DR-MPC) framework for optimal Virtual Power Plant (VPP) operation under electricity price uncertainty. A unified VPP model is formulated that captures the interaction between buildings, battery storage, and renewable generation, all influenced by exogenous weather and market signals. The proposed approach integrates data-driven forecasting with quantile-based uncertainty quantification to construct time-varying Wasserstein ambiguity sets that adapt to forecast dispersion and distributional shifts. This yields a tractable DR-MPC formulation that incorporates predictive distribution information directly into real-time decision making. The method is evaluated using real weather and market data from a Nordic case study across two seasonal scenarios. The results show that DR-MPC improves economic performance relative to standard forecast-based MPC when the ambiguity radius is chosen appropriately, with consistent gains of up to 0.8% for small radii across both seasonal scenarios. Larger radii become overly conservative and reduce revenue, underscoring the importance of proper radius selection. These findings demonstrate the practical value of distributionally robust optimization for uncertainty-aware VPP operation.
Accurate beam prediction is essential for mitigating signalling overhead and latency in integrated sensing and communication-enabled massive multi-input multi-output systems. With the aid of multimodal learning, the prediction accuracy can be enhanced by leveraging the complementary information from other existing sensors, but the practical deployment is often constrained by the high cost of acquiring semantically aligned multimodal datasets. This paper proposes a variational-inference-based multimodal framework that decouples the optimization problem into modular feature extraction and cross-modal semantic alignment. Specifically, we develop a two-stage training strategy where the model utilises abundant unimodal data for representation learning before performing refined alignment on limited multimodal samples. This design enhances data efficiency and ensures robust feature fusion under sensing uncertainties. Experimental results on the DeepSense6G dataset demonstrate that the proposed framework achieves competitive beam prediction accuracy and maintains high reliability, while only requiring 20% of the multimodal training data compared to conventional end-to-end benchmarks.
Underwater images suffer from severe wavelength-dependent light absorption and scattering, and turbidity due to suspended particles, degrading visual quality for applications in autonomous underwater vehicles (AUVs), marine biology, archaeology, and offshore infrastructure inspection. Classical image formation models (IFMs) inadequately capture nonlinear underwater light behavior, while purely data-driven methods lack physical interpretability. This paper proposes a three-stage network, named ADR, that extends the underwater image formation model with additional terms to perform underwater dehazing, followed by Retinex-based enhancement and attention-enabled U-Net++ refinement. Experiments on the UIEB and UFO-120 benchmark datasets demonstrate performance competitive with state-of-the-art methods.
This paper addresses the dynamic event-triggered control for a class of discrete-time nonlinear systems described by a difference-algebraic representation (DAR), using a gain-scheduled controller. A distinguishing aspect of the proposed method is the incorporation of information about the system's nonlinearities into the control law and the trigger function. The proposed event-triggered mechanism also incorporates information on the asynchronous terms induced by the event-based sampling. All these ingredients enable the derivation of a less conservative co-design condition for the simultaneous design of the gain-scheduled control law and the dynamic triggering mechanism to ensure the asymptotic stability of the closed-loop system. An estimate of the region of attraction of the origin of the closed-loop system is obtained to guarantee the closed-loop system's operation within the domain of validity of the DAR. Then, an optimization problem is formulated to reduce the number of events and enlarge the estimated region of attraction. Finally, the effectiveness of the proposed condition is illustrated by a numerical example.
Explosive growth in energy-intensive AI data centers is outstripping the pace of power grid interconnection and transmission expansion. While operational flexibility has been proposed to mitigate this stress, existing processes are often reactive and evaluate projects only after they enter a multi-year interconnection queue. To address this, we introduce a planner-initiated siting framework that integrates (i) reliability-gated screening, (ii) system-wide market-impact assessment under standardized flexibility envelopes (firm, pause, and shift), and (iii) entropy-weighted multi-criteria scoring to produce ranked, pre-certified catalogues of interconnection-ready locations. Applied to a synthetic 2,000-bus Texas power system, the framework demonstrates that operational flexibility expands the siting frontier by 9-17% at 1 GW and 19-21% at 2 GW compared to firm operation. Median all-hour average prices remain essentially unchanged (USD 24.32/MWh for the 2 GW cases), and the shift envelope attenuates peak-hour price dispersion by approximately 3.4% with minimal side effects during off-peak hours. Utilizing pre-certified envelopes to bypass major transmission reinforcements, this workflow enables first energization in 12-18 months, a conservative reduction of 3.5-4 years versus the conventional 5-8 year project-led process. This technology-agnostic framework provides a proactive decision-making tool for system operators and regulators to fast-track large flexible loads while preserving grid reliability and market stability.
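The entropy-weighted multi-criteria scoring step mentioned above admits a compact sketch. The following is a minimal illustration of the standard entropy-weight method over benefit-type (higher-is-better) siting criteria; the criteria values and the min-max score aggregation are invented for illustration and are not taken from the paper.

```python
import math

def entropy_weights(matrix):
    """Entropy-weight method: criteria whose values are more dispersed
    across alternatives carry more information and get larger weights.
    `matrix` is a list of rows (alternatives) of positive criterion values."""
    m, n = len(matrix), len(matrix[0])
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    k = 1.0 / math.log(m)  # normalizes entropy to [0, 1]
    divergence = []
    for j in range(n):
        e = 0.0
        for row in matrix:
            p = row[j] / col_sums[j]
            if p > 0:
                e -= p * math.log(p)
        divergence.append(1.0 - k * e)  # 0 for a constant criterion
    total = sum(divergence)
    return [d / total for d in divergence]

def rank_sites(matrix):
    """Score alternatives as the entropy-weighted sum of min-max
    normalized criteria; returns (indices sorted best-first, weights)."""
    w = entropy_weights(matrix)
    n = len(matrix[0])
    lo = [min(row[j] for row in matrix) for j in range(n)]
    hi = [max(row[j] for row in matrix) for j in range(n)]
    scores = [sum(w[j] * (row[j] - lo[j]) / (hi[j] - lo[j] + 1e-12)
                  for j in range(n)) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: -scores[i]), w

# Illustrative candidate sites with three benefit-type criteria.
sites = [[0.90, 120.0, 0.70],
         [0.80,  80.0, 0.95],
         [0.95,  60.0, 0.60]]
order, weights = rank_sites(sites)
```

A constant criterion receives zero weight automatically, which is the method's main appeal for combining heterogeneous reliability and market-impact scores without hand-tuned weights.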
This paper investigates the effect of oscillator phase noise in orthogonal time frequency space (OTFS) systems. The paper provides an in-depth analysis of the interference due to phase noise in the delay-Doppler domain and derives expressions for the signal-to-interference-plus-noise ratio (SINR) for three different oscillator types, namely free-running oscillators, continuous-time phase-locked loops (PLLs), and discrete-time PLLs. The analysis demonstrates that OTFS is sensitive to phase noise and requires appropriate estimation and compensation. In particular, the analysis shows that phase-noise-induced inter-Doppler interference (IDI) is severe and that existing phase noise estimation techniques, which only consider the common phase error (CPE), cannot compensate for this IDI effectively. Additionally, the existing methods in the OTFS literature on phase noise assume the channel to be a known single-tap channel. Hence, in this paper, we propose a method for joint channel and phase noise estimation using a Wiener filtering approach. Our proposed method exploits the statistical nature of both the phase noise and the Doppler-spread channel. Our numerical results demonstrate the superior performance of our proposed technique, with gains of up to 8 dB in terms of bit error rate (BER) over existing methods in the literature.
Neuromorphic cameras, also known as event-based cameras, can detect changes in the environmental brightness asynchronously and independently for each pixel. They output the brightness changes, i.e., events, as 3-D (2-D pixel coordinates + time) streaming data. While event-based cameras are used in many applications because of their desirable characteristics, e.g., high temporal resolution, low latency, low power consumption, and high dynamic range, their measurements contain considerable noise due to their high sensitivity. In this paper, we propose a denoising method for event-based cameras based on graph spectral features. In the proposed method, we first construct a graph where nodes represent events and edges represent the spatiotemporal distance between the events. To set the graph parameter that controls the connectivity of the constructed graph, we utilize a prior on the density of 3-D events. We then calculate the eigenvectors of the graph Laplacian. The obtained eigenvectors are used to extract noiseless events directly. In the calculation of the eigenvectors, we customize the graph Laplacian to reorder its eigenvalues. This allows us to leverage fast eigensolver algorithms instead of the naive eigendecomposition and thereby reduce computational complexity. In experiments on synthetic and real-world event data, we demonstrate that the proposed method effectively removes noise events from the raw events compared to alternative methods.
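As a rough sketch of the graph construction underlying such a pipeline, the toy code below builds a Gaussian-weighted spatiotemporal graph over synthetic events and eigendecomposes its combinatorial Laplacian with a dense solver. The paper's density-derived parameter choice, Laplacian customization, and fast eigensolvers are not reproduced; `sigma` and `r` are illustrative values.

```python
import numpy as np

def event_graph_laplacian(events, sigma=1.0, r=2.0):
    """Graph over events (rows of [x, y, t]): Gaussian edge weights on
    spatiotemporal distances within radius r, combinatorial Laplacian
    L = D - W. `sigma` and `r` are illustrative, not density-derived."""
    d2 = ((events[:, None, :] - events[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma**2))
    W[d2 > r**2] = 0.0            # sparsify: drop distant event pairs
    np.fill_diagonal(W, 0.0)      # no self-loops
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
# A dense cluster of "signal" events plus scattered "noise" events.
signal = rng.normal([5.0, 5.0, 5.0], 0.3, size=(30, 3))
noise = rng.uniform(0.0, 10.0, size=(5, 3))
events = np.vstack([signal, noise])

L = event_graph_laplacian(events)
eigvals, eigvecs = np.linalg.eigh(L)  # spectral features of the graph
```

Low-frequency eigenvectors are smooth on densely connected (signal) events and can therefore separate them from weakly connected noise events, which is the intuition the abstract's spectral-feature extraction builds on.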
Exposure to time-varying electricity markets incentivizes electrified chemical processes to operate flexibly, but participating in demand response schemes can require satisfying terminal constraints over long horizons. Specifically, terminal constraints may be required when computing optimal schedules in order to preserve dynamic stability. Model-based optimization methods are computationally costly, and data-driven scheduling via reinforcement learning (RL) faces severe credit-assignment challenges. We integrate Goal-Space Planning (GSP) with Deep Deterministic Policy Gradient (DDPG), using learned temporally abstract models over discrete subgoals to propagate value across extended horizons. On a simulated air separation benchmark, we demonstrate that the proposed approach improves sample efficiency over standard DDPG while satisfying terminal storage constraints and mitigating myopic control behavior.
As sixth-generation (6G) wireless networks evolve toward increasingly heterogeneous scenarios, tasks, and service requirements, conventional artificial intelligence (AI) models remain limited in task-aware decision-making and autonomous adaptation. To address this issue, this paper first proposes a ChannelAgent-empowered electromagnetic space world model, in which wireless intelligence is organized into a closed-loop process consisting of multi-modal sensing, ChannelAgent as the intelligent core, and execution with feedback update. As a case study, agent-driven channel generation is instantiated through path loss prediction. Specifically, a task-oriented intelligent feature selection mechanism is designed by integrating reinforcement-learning-inspired policy adaptation with evolutionary search, enabling the agent to iteratively derive compact and task-suitable feature subsets according to the current scenario and performance feedback. Simulation results demonstrate superior performance in both single-scenario and multi-scenario tasks, highlighting the potential of the proposed model for autonomous, adaptive, task-oriented, and closed-loop wireless intelligence.
Large-scale chemical plants rely on distributed process control systems (PCS) comprising numerous processing units, communication modules, and I/O devices interconnected via industrial networks. The design of a cost-efficient and reliable hardware architecture under partial uncertainty in plant parameters remains a challenging combinatorial optimization problem. This paper proposes a formal model for distributed control system hardware architecture synthesis. A hybrid ant colony-based metaheuristic framework is developed to construct feasible hierarchical architectures. The proposed approach is validated on a large-scale sulfuric acid plant control system case study. Plant parameters are identified from operational data, system stability is analyzed, and controller synthesis is performed based on the optimized architecture. The results demonstrate the feasibility of the approach and confirm that the obtained architecture satisfies structural and dynamic performance requirements.
Affine frequency division multiplexing (AFDM) has recently emerged as a promising waveform for high-mobility communications due to its resilience to Doppler effects and its advantages for integrated sensing and communication (ISAC). AFDM modulates transmit data symbols using chirp subcarriers with two adjustable parameters. One is used for dealing with the Doppler effect, and the second parameter can be used for physical layer security (PLS). In this paper, we focus on designing the second chirp parameter in the form of a generic phase function to enhance the robustness of the waveform against brute-force demodulation by the eavesdropper. In particular, we first derive a design criterion that reveals that the brute-force demodulation complexity depends on the first derivative of the phase function. Then, we introduce a family of phase functions that can increase the brute-force demodulation complexity in an unbounded and controllable manner, while preserving the chirp structure of AFDM. Our simulation results demonstrate that the proposed phase function design enhances the PLS performance of AFDM by several orders of magnitude compared with the conventional AFDM in terms of brute-force demodulation complexity.
Log-homotopy particle flow filters realize nonlinear Bayesian estimation by continuously migrating samples from the prior to the posterior distribution. This transport is governed by a pseudo-time ordinary differential equation (ODE). A major practical challenge of these filters is the need for numerical integration, which suffers from high computational cost and susceptibility to stiffness. This paper develops an exact, integration-free closed-form solution for the exact Daum--Huang (EDH) deterministic particle flow under vector linear Gaussian measurements. By transforming the ODE into a specific eigenspace, closed-form algebraic expressions are derived for both the homogeneous state transition matrix and the inhomogeneous forcing term. We prove that this analytic solution is mathematically equivalent to the exact Kalman measurement update. Furthermore, we demonstrate how this closed-form evaluation can be embedded within an $N$-step slicing method, providing a stiffness-mitigating, integration-free particle update for highly nonlinear measurement models.
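The claimed equivalence between the EDH flow and the Kalman measurement update can be checked numerically on a small example. The sketch below integrates the standard EDH ODE, dx/dλ = A(λ)x + b(λ) with A(λ) = -½PHᵀ(λHPHᵀ+R)⁻¹H and b(λ) = (I+2λA)[(I+λA)PHᵀR⁻¹z + Ax̄], using RK4 as a numeric stand-in for the paper's closed-form solution; the scalar example, step count, and function names are illustrative choices, not the paper's.

```python
import numpy as np

def edh_rhs(x, lam, xbar, P, H, R, z):
    """Right-hand side of the EDH flow ODE for a linear Gaussian
    measurement z = H x + v, v ~ N(0, R), prior N(xbar, P)."""
    S = lam * H @ P @ H.T + R
    A = -0.5 * P @ H.T @ np.linalg.solve(S, H)
    I = np.eye(len(x))
    b = (I + 2.0 * lam * A) @ ((I + lam * A) @ P @ H.T
                               @ np.linalg.solve(R, z) + A @ xbar)
    return A @ x + b

def edh_flow(xbar, P, H, R, z, n_steps=200):
    """Migrate a particle from the prior mean through pseudo-time
    lam in [0, 1] with classical RK4 steps."""
    x, h = xbar.copy(), 1.0 / n_steps
    for i in range(n_steps):
        lam = i * h
        f = lambda xx, ll: edh_rhs(xx, ll, xbar, P, H, R, z)
        k1 = f(x, lam)
        k2 = f(x + 0.5 * h * k1, lam + 0.5 * h)
        k3 = f(x + 0.5 * h * k2, lam + 0.5 * h)
        k4 = f(x + h * k3, lam + h)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

# Scalar prior N(0, 1), measurement z = x + v with v ~ N(0, 0.5).
xbar = np.array([0.0]); P = np.array([[1.0]])
H = np.array([[1.0]]); R = np.array([[0.5]]); z = np.array([1.0])
x_flow = edh_flow(xbar, P, H, R, z)
# Kalman measurement update for comparison.
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
x_kal = xbar + K @ (z - H @ xbar)
```

Because the flow is affine in x, the prior mean is transported exactly to the Kalman posterior mean; the numeric integration above should agree with it up to RK4 discretization error.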
Mild traumatic brain injury (mTBI) is a prevalent condition that remains difficult to diagnose in its early stages. Oculomotor dysfunction is a well-established marker of mTBI, motivating the development of portable tools that capture both eye-movement behavior and underlying neurophysiology. In this work, we present an initial framework that integrates electroencephalogram (EEG) with augmented-reality (AR)-based Vestibular/Ocular Motor Screening (VOMS) tasks to estimate subject-specific ocular response times. Pre-processed EEG signals, obtained through band-pass filtering and average referencing, are analyzed using a Redundant Discrete Wavelet Transform (RDWT)-driven deep neural framework. The RDWT coefficients are subjected to trainable zero-phase convolutional filtering and reconstructed into the time domain via inverse RDWT, followed by channel-wise temporal and spatial filtering using 2D convolution layers and convolutional-LSTM-based decoding. An ablation study demonstrates that wavelet-domain filtering serves as an effective denoising strategy, improving prediction performance. Sliding-window predictions were validated using Pearson correlation (>= 0.5), and Dynamic Time Warping (DTW) was subsequently used to estimate ocular response times. DTW-derived metrics revealed significant inter-subject differences across all VOMS tasks, supported by Mann-Whitney U tests. Cross-correlation analysis further revealed task-dependent temporal behaviors: pursuit tasks exhibited reactive tracking, whereas saccades showed anticipatory responses. Overall, the results highlight pursuit tasks as particularly informative for distinguishing timing differences and demonstrate the potential of RDWT-based EEG features combined with DTW metrics for multimodal mTBI assessment.
Transmit beamforming for underwater acoustic communication is challenging because it requires perfect knowledge of the channel to the receiver in advance. In practice, channel estimates must be learned through feedback and are often noisy or outdated because of feedback delay and channel variation. In this paper, we investigate angle-based beamforming strategies for a single-user link that reduce dependence on full channel knowledge by exploiting stable components of the geometric structure in the propagation field. In particular, we focus on scenarios in which there exists a dominant path that remains relatively stable over time, making it a suitable candidate for transmit beamforming. Experimental results using the SPACE and MACE data sets demonstrate the effectiveness of the proposed method in terms of data-detection mean-squared error and bit error rate.
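A minimal sketch of angle-based transmit beamforming for a uniform linear array, assuming only an estimate of the dominant path's angle rather than full channel state: the array size, spacing, and angle below are illustrative toy values, not the SPACE/MACE experimental setup from the paper.

```python
import numpy as np

def steering_vector(n_elems, theta_deg, d_over_lambda=0.5):
    """ULA steering vector for departure angle theta (degrees),
    with element spacing d expressed in wavelengths."""
    n = np.arange(n_elems)
    phase = 2j * np.pi * d_over_lambda * n * np.sin(np.deg2rad(theta_deg))
    return np.exp(phase)

def angle_based_weights(n_elems, theta_deg):
    """Unit-norm transmit weights that phase-align the array toward the
    dominant path's angle; requires only the angle estimate, which is
    the geometrically stable quantity the abstract exploits."""
    a = steering_vector(n_elems, theta_deg)
    return a.conj() / np.sqrt(n_elems)

N, theta = 8, 20.0
w = angle_based_weights(N, theta)
a = steering_vector(N, theta)
gain_on_path = abs(a @ w) ** 2   # coherent gain delivered on that path
```

Steering toward a persistent dominant arrival yields the full coherent array gain N on that path while tolerating staleness in the remaining (fast-varying) channel components.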
Electroencephalogram (EEG) signals are highly susceptible to artifacts, resulting in a low signal-to-noise ratio which makes extraction of meaningful neural information challenging. Artifact Subspace Reconstruction (ASR) is one of the most widely used artifact filtering techniques in EEG-based BCI applications, owing to its real-time applicability. ASR reconstructs artifact-free signals by operating in Principal Component (PC) space within sliding windows. However, ASR performance is critically sensitive to its threshold parameter: an incorrect threshold risks removing task-relevant neural features alongside artifacts. Furthermore, since PCs are linear combinations of all channels, subspace reconstruction in PC space may alter the underlying data structure, potentially discarding essential neural information. To address these limitations, we propose nASR, a novel end-to-end trainable Keras layer that jointly optimizes artifact rejection and downstream decoding. nASR introduces two trainable threshold parameters: K, which governs artifact detection in PC variance space, and L, which quantifies eigen-spread to pinpoint the primary artifact-contributing channels, enabling selective channel-level reconstruction that preserves clean channel information. An ablation study comprising five model variants (m01-m05), evaluated across two subjects from the BCI Competition IV Dataset 1, confirms that nASR variants consistently outperform traditional ASR on test classification metrics, while achieving a 6-8x reduction in inference time, making nASR a strong candidate for real-time BCI applications demanding both low latency and high decoding performance.
The study addresses the problem of quadcopter motion control using output feedback. By applying a geometric approach, the quadcopter model is transformed into a normal form with a time-varying gain coefficient, which is subsequently made stationary through double integration of the control input. A robust output feedback control law is synthesised based on the extended observer method.
Angle power spectrum (APS) characterizes the directional distribution of received signal power and is directly relevant to beam management and MIMO processing. While environment-aware learning has been widely studied for radio maps and path loss, direct map-to-APS prediction still lacks a standardized large-scale benchmark. This paper presents Map2APS, a physically grounded benchmark constructed from intelligent ray-tracing (IRT) path-level propagation records. Map2APS covers 51 equal-height urban maps and approximately 2.55 million Tx--Rx samples, with a strict cross-map split for evaluating generalization to unseen urban layouts. We benchmark representative model families and introduce MS-AReg as a strong reference baseline. On the full held-out test set of 249,993 samples, MS-AReg achieves a cosine similarity of 0.948, a peak location error of 1.20$^\circ$, and an inference latency of 0.101 ms/sample. We further report dominant-direction metrics, including Top-1 dominant peak hit rate and dominant peak recall, to evaluate whether predicted spectra preserve decision-relevant arrival directions. The benchmark, code, and evaluation scripts are released at this https URL.
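The two headline metrics, cosine similarity and peak location error, can be sketched as below; the Gaussian toy spectra and 1-degree grid are illustrative stand-ins for the benchmark's predicted and ground-truth APS vectors, and the wrap-around handling is a common convention rather than a detail confirmed by the abstract.

```python
import numpy as np

def aps_metrics(pred, true, grid_deg):
    """Cosine similarity and circular peak-location error (degrees)
    between a predicted and a reference angle power spectrum sampled
    on the angular grid `grid_deg`."""
    cos = float(pred @ true /
                (np.linalg.norm(pred) * np.linalg.norm(true)))
    d = abs(grid_deg[np.argmax(pred)] - grid_deg[np.argmax(true)])
    return cos, min(d, 360.0 - d)    # wrap-around angular error

grid = np.arange(0.0, 360.0, 1.0)
# Toy spectra: reference peak at 90 deg, prediction offset by 2 deg.
true = np.exp(-0.5 * ((grid - 90.0) / 5.0) ** 2)
pred = np.exp(-0.5 * ((grid - 92.0) / 5.0) ** 2)
cos_sim, peak_err = aps_metrics(pred, true, grid)
```

Cosine similarity rewards matching the overall power profile, while peak location error isolates whether the decision-relevant dominant direction survives, which is why the benchmark reports both.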
Intelligent Reflecting Surfaces (IRSs) are a promising technology for enhancing the spectral and energy efficiency of millimeter-wave (mmWave) multiple-input multiple-output (MIMO) systems. In these systems, accurate channel estimation remains challenging due to the passive nature of IRS elements and the high pilot overhead in large-scale deployments. This paper presents a deep learning-based Multi-Block Attention (MBA) framework for efficient cascaded channel estimation in IRS-assisted mmWave MIMO systems that utilize orthogonal frequency division multiplexing (OFDM). First, we show the optimality of the discrete Fourier transform (DFT) and Hadamard matrices as phase configurations for least squares (LS) estimation. To reduce training overhead, we selectively deactivate IRS elements and compensate for induced feature loss using a two-stage architecture: (i) a Convolutional Attention Network (CAN) for spatial correlation recovery and (ii) a Complex Multi-Convolutional Network (CMN) for noise suppression. The MBA architecture mitigates error propagation through attention-guided feature refinement and denoising. Simulation results indicate that the MBA method reduces pilot overhead by up to 87% compared to the LS estimator. Additionally, at a signal-to-noise ratio of 10 dB, our proposed method achieves approximately 51% lower normalized mean squared error (NMSE) than leading methods. It also maintains low computational complexity and adapts effectively to various propagation environments.
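The appeal of DFT phase configurations for LS estimation can be illustrated in miniature: with an n x n DFT matrix of unit-modulus entries (realizable by passive phase shifts), Phi^H Phi = n I, so the LS pseudo-inverse collapses to a scaled conjugate transpose and recovers the channel exactly in the noiseless case. The flat n-dimensional channel below is a simplification of the paper's cascaded mmWave MIMO-OFDM setting.

```python
import numpy as np

def dft_phase_matrix(n):
    """n x n DFT matrix: each row is a unit-modulus phase configuration
    applied across one of n training slots."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def ls_estimate(y, phi):
    """Least-squares channel estimate. For the DFT training matrix,
    phi^H phi = n I, so the pseudo-inverse is simply phi^H / n."""
    n = phi.shape[0]
    return phi.conj().T @ y / n

rng = np.random.default_rng(3)
n = 16
h = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
Phi = dft_phase_matrix(n)
y = Phi @ h                       # noiseless pilot observations
h_hat = ls_estimate(y, Phi)
```

Orthogonality of the training matrix also means noise is not amplified by matrix inversion, which is the essence of the optimality argument for DFT (and Hadamard) configurations.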
Data-dependent secondary transforms, which aim to decorrelate coefficients of a separable primary transform, can improve residual coding efficiency; however, their deployment is often constrained by computational complexity. Recent video codecs use variants of the low-frequency non-separable transform (LFNST), which discards some high-frequency secondary transform coefficients, limiting achievable coding gains. Moreover, existing data-dependent secondary transforms lack explicit rate-distortion (RD) optimal design criteria. In this work, we propose a framework for designing low-complexity data-dependent secondary transforms, termed Fast Sparsifying Secondary Transforms (FaSSTs). Our approach approximates data-driven sparse orthonormal transforms (SOTs) by factorizing them into a sequence of Givens rotations. The rotations are efficiently determined using an alternating minimization strategy combined with an approximate Givens factorization procedure. Our method adapts the number of rotations based on the prediction mode, further reducing computational complexity. We design mode-dependent secondary transforms for intra-prediction residuals in AV2 using FaSST. Experimental results show that mode-adaptive FaSST matches the RD performance of LFNST while reducing the number of computations by 83.67%. Moreover, by avoiding fixed-coefficient truncation, FaSST achieves up to 1.80% BD-rate savings relative to LFNST while operating at 66.24% lower complexity.
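The complexity argument behind Givens factorization can be sketched independently of the paper's FaSST design procedure: each rotation touches only two coefficients, so a K-rotation cascade costs O(K) operations instead of a dense matrix multiply, while the composed transform remains exactly orthonormal. The rotation plan below is an arbitrary illustration, not one learned by the alternating-minimization procedure.

```python
import numpy as np

def givens(n, i, j, theta):
    """Dense n x n Givens rotation acting on coordinate pair (i, j)."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G

def apply_rotations(x, rotations):
    """Apply a cascade of Givens rotations to coefficient vector x
    without forming any matrix: O(1) work per rotation."""
    y = np.array(x, dtype=float)
    for (i, j, theta) in rotations:
        c, s = np.cos(theta), np.sin(theta)
        yi, yj = y[i], y[j]
        y[i] = c * yi - s * yj
        y[j] = s * yi + c * yj
    return y

rng = np.random.default_rng(1)
x = rng.normal(size=8)
rots = [(0, 1, 0.3), (2, 3, -0.7), (0, 2, 1.1), (4, 7, 0.25)]
y = apply_rotations(x, rots)
# Equivalent dense transform: product of the Givens factors.
T = np.eye(8)
for (i, j, th) in rots:
    T = givens(8, i, j, th) @ T
```

Truncating the rotation count (here, per prediction mode) trades sparsification quality for complexity without ever losing orthonormality, unlike truncating coefficients as LFNST does.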
This paper presents an analytical framework for downlink pinching antenna systems (PASS) employing waveguide division multiple access (WDMA) and non-orthogonal multiple access (NOMA). A unified channel model is developed to capture antenna deployment, user spatial distribution, and path loss. Closed-form and single-integral expressions for the outage probability and average achievable rate are derived and validated via Monte Carlo simulations. The results show that NOMA achieves higher spectral efficiency at high transmit signal-to-noise ratio (SNR) due to successive interference cancellation (SIC), whereas WDMA offers more reliable performance at low to moderate SNR but suffers from an outage floor and rate saturation at high SNR. Moreover, WDMA performance is more sensitive to the user spatial distribution due to the spatially dependent inter-waveguide interference. These findings provide design insights for access-scheme selection and antenna placement in PASS.
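The validate-closed-form-by-Monte-Carlo pattern used above can be shown on the simplest possible case, a single Rayleigh-fading link; the paper's PASS/WDMA/NOMA channel model with antenna deployment and path loss is far richer, so this is a generic illustration of the methodology only.

```python
import numpy as np

def outage_closed_form(snr, rate):
    """Closed-form outage probability of a Rayleigh-fading link:
    P[log2(1 + snr*|h|^2) < rate] with channel gain |h|^2 ~ Exp(1)."""
    return 1.0 - np.exp(-(2.0 ** rate - 1.0) / snr)

def outage_monte_carlo(snr, rate, n=200_000, seed=0):
    """Monte Carlo estimate used to validate the closed form."""
    rng = np.random.default_rng(seed)
    g = rng.exponential(1.0, size=n)   # |h|^2 draws for Rayleigh fading
    return float(np.mean(np.log2(1.0 + snr * g) < rate))

snr, rate = 10.0, 2.0                  # linear SNR, target rate in bits/s/Hz
p_cf = outage_closed_form(snr, rate)
p_mc = outage_monte_carlo(snr, rate)
```

With 200,000 samples the Monte Carlo standard error is below 0.001 here, so agreement to a few parts in a thousand is the expected validation outcome.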
As a critical component of sixth-generation (6G) wireless networks, ultra-reliable and low-latency communication (URLLC) is expected to support real-time and reliable information exchange in low-altitude environments. However, achieving URLLC often incurs significant resource overhead, including increased bandwidth consumption, higher transmit power, and denser access point (AP) deployment, which pose significant challenges to both spectral efficiency (SE) and energy efficiency (EE). Moreover, existing iterative optimization algorithms are computationally intensive and struggle to meet the latency requirements of URLLC. To address these challenges, we propose a hybrid aerial-terrestrial cell-free massive MIMO (CF-mMIMO) network to support diverse services, along with a channel prediction network and a deep mixture of experts (MoE) network for uplink optimization. First, we design a channel prediction network (CP-Net) to mitigate channel aging caused by high-mobility user equipment (UE). CP-Net employs three Transformer-based sub-networks for aged channel state information (CSI) prediction, while a channel quality-aware loss function is introduced to improve the prediction accuracy of weak links. Based on the predicted CSI, we develop a deep MoE network (MoE-Net) for power allocation comprising three expert models targeting different objectives. Then, we introduce a weighted gating network (WT-Net) to learn an efficient adaptive combination of expert outputs. The proposed framework better captures heterogeneous UE requirements and improves communication performance under URLLC constraints. Numerical results demonstrate the effectiveness of the proposed method.
Understanding when linear immersions of nonlinear dynamical systems exist is important since such immersions allow us to leverage the rich tools of linear system theory to analyze nonlinear dynamics. Recently, Liu et al. (2023) showed that continuous-time dynamical systems that admit countably many, but more than one, $\omega$-limit sets cannot be immersed into finite dimensional linear systems with a one-to-one and continuous mapping. In this paper, we extend these results to discrete-time dynamics and show that similar obstructions exist also in discrete time. We further consider a generalization involving $\alpha$-limit sets. Several examples are provided to demonstrate the results.
Multi-person 3D reconstruction is pivotal for real-world interaction analysis, yet remains challenging due to severe occlusions and depth ambiguity. Current approaches typically rely on single-modality inputs, which inherently lack geometric guidance. Furthermore, these methods often reconstruct subjects in isolation, neglecting the collective group context essential for resolving ambiguities in crowded scenes. To address these limitations, we propose Contrastive Multi-modal Hypergraph Reasoning to synergize semantic, geometric, and pose cues for crowd reconstruction. We first initialize robust node representations by combining RGB features, geometric priors, and occlusion-aware incomplete poses. Additionally, we introduce a pelvis depth indicator as a global spatial anchor, aligning visual features with a metric-scale-agnostic depth ordering. Subsequently, we construct a shared-topology hypergraph that moves beyond pairwise constraints to model higher-order crowd dynamics. To improve feature fusion, we design a hypergraph-based contrastive learning scheme that jointly enhances intra-modal discriminability and enforces cross-modal orthogonality. This mechanism enables the network to propagate global context effectively, allowing it to infer missing information even under severe occlusion. Extensive experiments on the Panoptic and GigaCrowd benchmarks confirm that our method achieves new state-of-the-art performance. Code and pre-trained models are available at this https URL.
We present Seed3D 2.0, an advanced 3D content generation system built on Seed3D 1.0, with substantial improvements across generation fidelity, simulation-ready capabilities, and application coverage. For geometry, a coarse-to-fine two-stage pipeline decouples global structure learning from high-frequency detail recovery, while a locality-aware VAE achieves higher spatial compression and more efficient decoding. For texture and material generation, we replace the cascaded pipeline of Seed3D 1.0 with a unified PBR model that directly generates multi-view albedo and metallic-roughness maps, enhanced by Mixture-of-Experts scaling and VLM-based semantic conditioning for improved material precision and visual fidelity. Beyond single-object generation, Seed3D 2.0 introduces a simulation-ready model suite comprising scene layout planning, part-aware decomposition, and training-free articulation generation, enabling coherent scene construction and part-level physical interaction across physics and graphics engines. A large-scale human preference study against five recent commercial models shows that Seed3D 2.0 achieves consistent win rates of 69.0% to 89.9% in textured 3D asset generation. Seed3D 2.0 is available on this https URL
We study certified runtime monitoring of past-time signal temporal logic (ptSTL) from visual observations under partial observability. The monitor must infer safety-relevant quantities from images and provide finite-sample guarantees, while being \emph{reusable}: once trained and calibrated, it should certify any formula in a target fragment without per-formula retraining. For fragments induced by a finite dictionary of temporal atoms, we prove that the \emph{semantic basis}, the vector of atom robustness scores, is the minimum prediction target within the class of monotone, 1-Lipschitz reusable interfaces: any formula is evaluated by a deterministic decoder derived from the parse tree, and a single conformal calibration pass certifies the entire fragment with no union bound. We also introduce a \emph{rolling prediction monitor} that predicts only current predicate values and reconstructs temporal history online; this is easier to learn but grows conservative at long horizons. On a pedestrian-crossroad benchmark, rolling achieves tighter certified bounds at short horizons while the semantic-basis monitor is up to 4-times tighter at long horizons. We validate the presented monitors on real-world Waymo driving data, where both monitors satisfy the conformal coverage guarantee empirically.
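The single conformal calibration pass can be sketched for one scalar robustness score; the paper certifies the whole vector of atom scores with one pass, and the calibration errors and alpha below are invented numbers for illustration (exchangeability of calibration and test data is assumed, as in standard split conformal prediction).

```python
import math

def conformal_margin(cal_errors, alpha=0.1):
    """Split-conformal margin: the ceil((n+1)(1-alpha))-th smallest
    calibration error |predicted score - true score|. Subtracting it
    from a predicted robustness score gives a lower bound that holds
    with probability >= 1 - alpha under exchangeability."""
    n = len(cal_errors)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return float("inf")   # too few calibration points for this alpha
    return sorted(cal_errors)[k - 1]

def certified_lower_bound(pred_score, margin):
    """Certified robustness bound: the monitored formula is declared
    satisfied when this bound is positive."""
    return pred_score - margin

# Invented calibration errors from a held-out set of 10 runs.
cal = [0.05, 0.20, 0.10, 0.02, 0.15, 0.08, 0.12, 0.03, 0.07, 0.30]
m = conformal_margin(cal, alpha=0.2)
bound = certified_lower_bound(0.5, m)
```

Because monotone 1-Lipschitz decoders preserve this margin, calibrating the basis scores once is enough to certify every formula in the fragment, which is the "no union bound" point of the abstract.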
The increasing computational demand of AI workloads has intensified the need for energy-efficient in-memory and near-memory computing architectures, particularly because data movement often consumes significantly more energy than computation itself. While fully digital architectures provide robust scalability and support higher-resolution computation, analog in-memory computing has demonstrated improved energy efficiency for low-precision workloads. However, its reliance on peripheral DACs and ADCs introduces additional power, area, and design overhead. To address these challenges, this work presents a time-domain near-memory computing architecture for low-precision multiply-and-accumulate (MAC) operations. In the proposed approach, digital weight bits stored in SRAM are converted using a current-steering DAC, while the digital input vector is encoded by an N-pulse generator. This enables multiplication to be performed in the time domain while maintaining a digital-friendly interface. Two accumulation schemes, a delay-cell-based architecture and a counter-based architecture, are investigated and compared in terms of design trade-offs, linearity, scalability, and power efficiency. To improve technology portability, the N-pulse generator and counters are implemented using RTL synthesis, while the current-steering DAC remains in the analog domain. A 4×4 MAC prototype is implemented with a 1 V supply, achieving an operating frequency of 40 MHz, a power consumption of 42 µW, and an energy efficiency of 7.62 TOPS/W.
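A behavioural sketch of the pulse-count multiplication idea (a pure abstraction of the circuit with hypothetical values; the real design operates on DAC currents and accumulated delay or counts, not Python integers):

```python
# Toy behavioural model of the time-domain MAC: each input x_j is encoded as
# a train of x_j pulses by the N-pulse generator, and the weight w_j (read
# from SRAM through the current-steering DAC) sets the contribution per
# pulse. A counter-style accumulator tallies the weighted pulses.

def time_domain_mac(weights, inputs):
    acc = 0
    for w, x in zip(weights, inputs):
        for _ in range(x):   # x pulses, each adding a weight-sized increment
            acc += w
    return acc

w = [3, 1, 2, 0]   # one row of a 4x4 weight tile
x = [2, 5, 1, 7]   # input vector encoded as pulse counts
print(time_domain_mac(w, x))  # 3*2 + 1*5 + 2*1 + 0*7 = 13
```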
Motion planning for autonomous vehicles requires generating collision-free and dynamically feasible trajectories in complex environments under real-time constraints. While nonlinear optimal control formulations provide high-fidelity solutions, they are computationally demanding and sensitive to initialization, whereas geometric planning methods scale well but often decouple path selection from trajectory optimization. This paper studies the extent to which optimization over Graphs of Convex Sets (GCS) can approximate solutions of nonlinear optimal control problems in the context of autonomous driving. The free space is represented as a finite union of convex regions organized as a directed graph, allowing nonconvex geometry to be handled through discrete connectivity decisions while maintaining convex trajectory constraints within each region. Vehicle motion is parameterized using Bézier curves for the spatial path and a polynomial time-scaling function for temporal evolution. Under small-slip and linear tire assumptions, a simplified dynamic bicycle model enables approximate enforcement of dynamic feasibility through convex constraints on trajectory derivatives. The approach is evaluated in CommonRoad scenarios involving static obstacle avoidance and lane-changing maneuvers, and is compared against a nonlinear discrete-time optimal control formulation. The results indicate that the GCS-based method generates collision-free and dynamically consistent trajectories that closely match those obtained from the nonlinear program, while exhibiting improved computational efficiency and reduced sensitivity to initialization. These findings suggest that GCS provides a structured approximation of nonlinear motion planning problems, capturing dominant geometric and dynamic effects while preserving convexity in the continuous relaxation.
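The convexity that GCS exploits comes from the convex-hull property of Bézier curves: since Bernstein weights are nonnegative and sum to one, constraining the control points to a convex region confines the entire path to that region. A quick numerical check (the control points are arbitrary illustrative values):

```python
import numpy as np
from math import comb

# A Bezier curve point is a convex combination of its control points, so the
# whole curve stays inside any convex set containing the control points.

def bezier(P, t):
    n = len(P) - 1
    w = np.array([comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1)])
    return w @ P   # Bernstein weights are >= 0 and sum to 1

P = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])  # control pts
pts = np.array([bezier(P, t) for t in np.linspace(0, 1, 50)])

# every sampled point stays within the bounding box of the control points
print(pts[:, 0].min() >= 0.0 and pts[:, 0].max() <= 4.0)  # True
print(pts[:, 1].min() >= 0.0 and pts[:, 1].max() <= 2.0)  # True
```

This is why "control points inside a convex free-space region" is a convex constraint that guarantees containment of the continuous trajectory, with no pointwise collision checks needed inside a region.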
Large language models can now be personalised efficiently at scale using parameter efficient finetuning methods (PEFTs), but serving user-specific PEFTs harms throughput, even with specialised kernels and memory management techniques. This is because, theoretically and empirically, a mismatch exists between prefill (processing a large number of tokens at once) and decode (generating a single token autoregressively): the latter has far lower throughput when serving multiple adapters. For efficient multi-adapter serving, we argue that performance should be optimised relative to serving throughput rather than to parameter count. We therefore propose PreFT (Prefill-only Finetuning), wherein we only apply the adapter to prefill tokens and discard it afterwards. PreFT significantly increases throughput with minimal effect on performance. We develop and release an efficient implementation of two prefill-only PEFTs, LoRA and ReFT, on the vLLM inference engine. We first show that serving multi-user PreFTs is more efficient than traditional PEFTs ($1.9\times$ the throughput when serving $512$ adapters on Llama 3.1 70B). Then, we compare the performance of prefill-only vs. all-token adapters on a variety of supervised finetuning and reinforcement learning tasks with LMs at varying scales. On SFT, we observe that the evaluation loss of PreFTs is higher than that of standard PEFTs, but this can be compensated for by increasing the rank with nearly no reduction in throughput. On RL, we consistently find that PreFTs approach parity with standard PEFTs. Together, this work validates prefill-only adaptation of LLMs as a more favourable accuracy-throughput tradeoff than existing PEFTs for personalised serving.
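A minimal sketch of the prefill-only idea, assuming a toy linear layer with a LoRA-style low-rank delta (all shapes, names, and values are hypothetical, not the released vLLM implementation):

```python
import numpy as np

# PreFT sketch: the low-rank adapter delta (B @ A) is applied while
# processing the prompt (prefill) and simply skipped during autoregressive
# decode, so decode throughput matches the unadapted base model.

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.1    # LoRA down-projection
B = rng.normal(size=(d, r)) * 0.1    # LoRA up-projection

def forward(x, prefill):
    y = x @ W.T
    if prefill:                      # adapter active only during prefill
        y = y + x @ (B @ A).T
    return y

prompt = rng.normal(size=(5, d))     # 5 prompt tokens: adapted
step = rng.normal(size=(1, d))       # 1 decode-step token: base weights only
h_prefill = forward(prompt, prefill=True)
h_decode = forward(step, prefill=False)
print(h_prefill.shape, h_decode.shape)
```

The personalisation still influences generation because the adapted prefill pass shapes the KV cache that decode attends to, while per-token decode cost is adapter-free.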
The security of networked control systems (NCS) is receiving increasing attention from both cyber-security and system-theoretic perspectives. The former focuses on classical IT security goals such as confidentiality, integrity, and availability of process data, while the latter investigates tailored attacks (and detection schemes), including covert and zero-dynamics attacks. Confidentiality in control systems can, for instance, be achieved by securely outsourcing the evaluation of the controller to third-party platforms, such as cloud services. The underlying technology enabling such secure computation is often homomorphic encryption (HE). Recent works in encrypted control have proposed modifications to underlying HE schemes to achieve not only confidentiality but also resilience to certain types of integrity attacks. While extensions in this direction are desirable in principle, we show that the integrity problem in encrypted control cannot be solved by public-key HE schemes alone due to their inherent malleability. In other words, the same homomorphisms that enable encrypted control in the first place can be leveraged not only constructively but also destructively. More precisely, we demonstrate that NCS are vulnerable to covert attacks, even when encrypted control is employed. Remarkably, this remains possible without knowledge of an unencrypted model. Yet, resilience to such attacks can still be achieved through complementary techniques. We present an approach based on verifiable computation that integrates with modern homomorphic cryptosystems and is asymptotically secure while incurring no communication overhead.
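The malleability argument can be illustrated with a toy Paillier cryptosystem (tiny, insecure parameters chosen purely for illustration): using only the public key, an attacker shifts the encrypted control signal by a chosen offset, and the decryptor cannot tell:

```python
from math import gcd
import random

# Toy Paillier (additively homomorphic): multiplying ciphertexts adds
# plaintexts. The very homomorphism that enables encrypted controller
# evaluation also lets an attacker inject an offset with public data only.

p, q = 2011, 2003
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1                                      # standard choice of generator

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def enc(m):                                    # public-key operation
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):                                    # requires the secret key
    return (L(pow(c, lam, n2)) * mu) % n

u = 42                          # control input computed under encryption
c = enc(u)
c_attacked = (c * enc(100)) % n2   # attacker adds +100 using the public key
print(dec(c_attacked))             # 142: integrity violated, confidentiality intact
```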
Many safety-critical control problems are modeled as risk-sensitive partially observable Markov decision processes, where the controller must make decisions from incomplete observations while balancing task performance against safety risk. Although belief-space planning provides a principled solution, maintaining and planning over beliefs can be computationally costly and sensitive to model specification in practical domains. We propose a lightweight risk-gated reinforcement learning approximation for risk-sensitive control under partial observability. The method constructs a compact finite-history proxy state and learns an action-conditioned predictor of near-term safety violation. This predicted candidate-action risk is used in two complementary ways: as a risk penalty during value learning, and as a decision-time gate that interpolates between optimistic and conservative ensemble value estimates. As a result, low-risk actions are evaluated closer to reward-seeking estimates, while high-risk actions are evaluated more conservatively. We evaluate the approach in two safety-critical partially observable domains: automated glucose regulation and safety-constrained navigation. Across adult and adolescent glucose-control cohorts, the method improves overall glycemic tradeoffs and substantially reduces runtime relative to a belief-space planning baseline. On Safety-Gym navigation benchmarks, it achieves a more favorable reward-cost balance than unconstrained RL and several standard safe-RL baselines. These results suggest that action-conditioned near-term risk can provide an effective local signal for approximate risk-sensitive POMDP control when full belief-space planning is impractical.
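The decision-time gate can be sketched directly; the ensemble values and risk scores below are hypothetical numbers, and the interpolation rule is an illustrative reading of the abstract rather than the paper's exact formula:

```python
# Risk-gated value estimate: an action-conditioned near-term violation risk
# interpolates between optimistic and conservative ensemble estimates, so
# low-risk actions are scored near the reward-seeking value and high-risk
# actions near the pessimistic one.

def gated_value(q_ensemble, risk):
    """q_ensemble: Q-value estimates for one (state, action) pair;
    risk: predicted near-term safety-violation probability in [0, 1]."""
    q_opt = max(q_ensemble)    # optimistic ensemble estimate
    q_cons = min(q_ensemble)   # conservative ensemble estimate
    return (1.0 - risk) * q_opt + risk * q_cons

qs = [1.0, 0.6, 0.8]
print(gated_value(qs, risk=0.0))  # 1.0: pure optimism for safe actions
print(gated_value(qs, risk=1.0))  # 0.6: pure pessimism for risky actions
print(gated_value(qs, risk=0.5))  # 0.8: interpolated
```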
As artificial intelligence (AI) is increasingly embedded in wireless networks, models are becoming core components that influence signal processing, resource scheduling and network control. However, model anomalies, tampering and malicious functions also introduce new security risks. In this article, we focus on model forensics in AI-native wireless networks. Specifically, we first discuss key problems including model authenticity verification, malicious function identification and accountability tracing, and summarize the main categories of model forensics. We then explain the role of model forensics in AI-native wireless networks and review representative application scenarios. In the case study, we use RF fingerprinting as an example and present two concrete workflows based on watermark authentication and backdoor detection, illustrating how provenance authentication and malicious behavior identification can be implemented in practice. The results show that model forensics can provide important support for anomaly assessment, provenance tracing and trustworthy operation in AI-native wireless networks. Finally, we outline several promising directions for future research in this emerging area.
The inherent randomness of communication symbols creates a fundamental tension in Integrated Sensing and Communications (ISAC). On the one hand, they enable data transmission while allowing sensing to fully reuse communication resources. On the other hand, their randomness induces waveform-dependent fluctuations that directly affect sensing accuracy. This paper investigates a foundational question arising from this tradeoff: \textit{How does the modulation waveform affect the ranging Cramér--Rao Bound (CRB) when sensing reuses random data symbols?} We address this question by revealing a structural factorization of the Fisher information matrix (FIM) for joint delay-amplitude estimation, which separates the deterministic Jacobian of the target geometry from the random frequency-domain signal power induced by the data symbols. This structure yields a Jensen-type universal lower bound on the CRB, which is exactly attained by CP-OFDM under PSK constellations. For QAM and broader sub-Gaussian constellations, we develop an asymptotic perturbation analysis of the inverse FIM and prove that, when the number of transmitted symbols $N$ grows large, CP-OFDM achieves a lower ranging CRB than any frequency-spread orthogonal waveform over the almost-sure event where the random FIM is invertible. This superiority is further extended to amplitude estimation and full joint delay-amplitude estimation. We also characterize the local geometry of the stochastic CRB minimization problem over the unitary group. The analysis reveals that CP-OFDM is a stationary point for finite $N$, and its Riemannian Hessian is positive semidefinite for sufficiently large $N$, establishing its asymptotic local optimality. Numerical results confirm that OFDM outperforms representative waveforms including SC, OTFS, and AFDM.
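In the scalar case, the Jensen-type argument can be sketched as follows (an illustrative special case under assumed notation, not the paper's full matrix-valued statement):

```latex
% With symbol-dependent Fisher information J(\mathbf{s}) > 0, convexity of
% x \mapsto 1/x on (0,\infty) gives
\mathbb{E}_{\mathbf{s}}\!\left[\mathrm{CRB}(\mathbf{s})\right]
  = \mathbb{E}_{\mathbf{s}}\!\left[J(\mathbf{s})^{-1}\right]
  \;\ge\; \left(\mathbb{E}_{\mathbf{s}}\!\left[J(\mathbf{s})\right]\right)^{-1},
% with equality iff J(\mathbf{s}) is almost surely constant -- the
% constant-modulus regime (PSK) in which CP-OFDM attains the bound.
```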
Building black-box models for dynamical systems from data is a challenging problem in machine learning, especially when asymptotic stability guarantees are required. In this paper, we introduce a novel stability-ensuring and backpropagation-compatible projection scheme based on the Schur decomposition for the state matrix of linear discrete-time state-space layers, as well as an alternative pre-factorized formulation of the methodology. The proposed methods dynamically project the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable peer, ensuring stable dynamics with minimal overparameterization. Experiments on synthetic linear systems demonstrate that the method achieves accuracy and convergence rates comparable to those of state-of-the-art stable-system identification techniques, despite a marginal increase in computational complexity. Furthermore, the lower weight count facilitates convergence during training without sacrificing accuracy in stacked neural-network architectures with static nonlinearities targeting real-world datasets. These results suggest that the Schur-based projection provides a numerically robust framework for identifying complex dynamics on par with the state of the art while satisfying strict asymptotic-stability requirements.
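A sketch of such a projection, assuming SciPy's real Schur decomposition and a simple rescaling of the diagonal blocks of the quasi-triangular factor (the paper's differentiable scheme may differ in detail):

```python
import numpy as np
from scipy.linalg import schur

# Project a state matrix onto a strictly stable peer via its real Schur form
# A = Z T Z^T: shrink the eigenvalues carried by the 1x1 and 2x2 diagonal
# blocks of T into the unit disk, then reassemble.

def project_stable(A, margin=1e-3):
    T, Z = schur(A, output='real')        # T quasi-triangular, Z orthogonal
    T = T.copy()
    n, i = T.shape[0], 0
    while i < n:
        if i + 1 < n and abs(T[i + 1, i]) > 1e-12:
            # 2x2 block: complex pair with |lambda| = sqrt(det)
            blk = T[i:i + 2, i:i + 2]
            r = np.sqrt(abs(np.linalg.det(blk)))
            if r >= 1.0 - margin:
                T[i:i + 2, i:i + 2] = blk * (1.0 - margin) / r
            i += 2
        else:
            # 1x1 block: real eigenvalue on the diagonal
            if abs(T[i, i]) >= 1.0 - margin:
                T[i, i] = np.sign(T[i, i]) * (1.0 - margin)
            i += 1
    return Z @ T @ Z.T

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))               # generically unstable
A_stable = project_stable(A)
print(np.max(np.abs(np.linalg.eigvals(A_stable))))  # spectral radius < 1
```

Since the eigenvalues of a block-triangular matrix are the union of the diagonal blocks' eigenvalues, rescaling only those blocks is enough to bound the spectral radius while leaving the rest of the factorization untouched.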
Subretinal injection is a delicate vitreoretinal procedure requiring precise needle placement within the subretinal space while avoiding perforation of the retinal pigment epithelium (RPE), a layer directly beneath the target with extremely limited regenerative capacity. To enhance depth perception during cannula advancement, intraoperative optical coherence tomography (iOCT) offers high-resolution cross-sectional visualization of needle-tissue interaction; however, interpreting these images requires sustained visual attention alongside the en face microscope view, thereby increasing cognitive load during critical phases and placing additional demands on the surgeon's proprioceptive control. In this paper, we propose a structured, real-time sonification framework designed for extensible mapping of iOCT-derived anatomical features into perceptual auditory feedback. The method employs a physics-inspired acoustic model driven by segmented retinal layers from a stream of iOCT B-scans, with needle motion and injection-induced retinal layer displacements serving as excitation inputs to the sound model, enabling perception of tool position and retinal deformation. In a controlled user study (n=34), the proposed sonification achieved high retinal layer identification accuracy and robust detection of retinal deformation-related events, significantly outperforming a state-of-the-art baseline in overall event identification (83.4% vs. 60.6%, p < 0.001), with gains driven primarily by enhanced detection of injection-induced retinal deformation. Evaluation by experts (n=4) confirmed the clinical relevance and potential intraoperative applicability of the method. These results establish structured iOCT sonification as a viable complementary modality for real-time surgical guidance in subretinal injection.
Kunchenko's method of polynomial maximization provides a semiparametric apparatus for parameter estimation under non-Gaussian errors, but its classical power basis relies on finite higher-order integer moments. This paper introduces the Parametrically Adaptive Transition Polynomial (PATP), a signed-parity fractional-power family controlled by a continuous parameter $\alpha \in [0,1]$. The quadratic exponent map $p_i(\alpha)$ connects the fractal regime $p_i(0)=1/i$, the degenerate linear point $p_i(1/2)=1$, and the signed-parity integer-power regime $p_i(1)=i$. For the degree-$S=2$ case we derive a closed-form variance-reduction coefficient $g_2(\alpha)$ in terms of signed and absolute fractional moments, identify the singular behavior at $\alpha=1/2$, and state the moment and regularity conditions under which the formula is meaningful. The construction should be read as a Form-B PATP analogue within Kunchenko's generalized apparatus, not as an exact recovery of the canonical even-power PMM basis at $\alpha=1$. Numerical illustrations on canonical distributions are used to examine the finite-sample behavior of the signed-parity estimator and to mark the boundary of applicability for extremely heavy-tailed cases such as the Cauchy distribution.
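For concreteness, three anchor values determine a quadratic uniquely; the unique quadratic through the stated anchors is (whether the paper uses exactly this form is not stated in the abstract):

```latex
% Unique quadratic interpolant through p_i(0)=1/i, p_i(1/2)=1, p_i(1)=i:
p_i(\alpha) \;=\; \frac{1}{i}
  \;+\; \Bigl(4 - i - \tfrac{3}{i}\Bigr)\alpha
  \;+\; 2\Bigl(i + \tfrac{1}{i} - 2\Bigr)\alpha^{2},
% which indeed satisfies
\qquad p_i(0)=\tfrac{1}{i}, \qquad p_i\!\bigl(\tfrac12\bigr)=1, \qquad p_i(1)=i.
```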
High-resolution seafloor mapping necessitates stable and precise positioning for underwater robots. This paper introduces a novel mathematical model for SeaVis remotely operated towed vehicles (ROTVs) and develops a gain-scheduled linear-quadratic regulator (LQR) for robust depth and attitude control. We validate the approach in a high-fidelity simulation, benchmarking the LQR against a conventional PID controller over a challenging seabed profile. The presented results demonstrate the LQR's superior performance, with significantly enhanced robustness to disturbances, greater control efficiency, and substantially reduced flap actuation. The gain scheduling also confirms the controller's effectiveness across the full operational velocity range. The complete simulation environment and controller are open-sourced.
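The per-operating-point LQR synthesis behind gain scheduling can be sketched with a stand-in model (the double integrator below is illustrative, not the SeaVis ROTV dynamics; at deployment one would solve this at each scheduling point, e.g. each tow velocity, and interpolate the gains):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Continuous-time LQR: solve the algebraic Riccati equation for P and form
# the state-feedback gain K = R^{-1} B^T P.

def lqr_gain(A, B, Q, R):
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # toy depth/heave double integrator
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 1.0])                 # penalize depth error over rate
R = np.array([[1.0]])                    # actuation (flap) effort penalty
K = lqr_gain(A, B, Q, R)

# Closed loop A - B K should be Hurwitz (all eigenvalues in the left half-plane)
print(np.linalg.eigvals(A - B @ K).real)
```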
Normally, a system that translates speech in one language into text in another consists of separate modules for speech recognition and text-to-text translation. Combining those tasks into a SpeechLLM promises to exploit paralinguistic information in the speech and to reduce cascaded errors. But existing SpeechLLM systems are slow since they do not work in a truly streaming fashion: they wait for a complete utterance of audio before outputting a translation, or output tokens at fixed intervals, which is unsuitable for real-world applications. This work proposes an LLM-based architecture for real streaming speech-to-text translation. The LLM learns not just to emit output tokens, but also to decide whether it has seen enough audio to do so. The system is trained using automatic alignments of the input speech and the output text. In experiments on different language pairs, the system achieves a translation quality close to the non-streaming baseline, but with a latency of only 1-2 seconds.
Traditional methods for classifying global navigation satellite system (GNSS) jamming signals typically involve post-processing raw or spectral data streams, requiring complex and costly data transmission to cloud-based interference classification systems. In contrast, our proposed approach efficiently compresses GNSS data streams directly at the hardware receiver while simultaneously classifying jamming and spoofing attacks in real time. Given the growing prevalence of GNSS jamming, there is a critical need for real-time solutions suitable for power-constrained environments. This paper introduces a novel method for compressing and classifying GNSS jamming threats using generative artificial intelligence (GenAI), specifically variational autoencoders (VAEs), deployed on Google Edge tensor processing units (TPUs). The study evaluates various autoencoder (AE) architectures to compress and reconstruct GNSS signals, focusing on preserving interference characteristics while minimizing data size near the receiver hardware. The pipeline adapts large-scale AE models for Google Edge TPUs through 8-bit quantization to ensure energy-efficient deployment. Tests on raw in-phase and quadrature-phase (IQ) data, Fast Fourier Transform (FFT) data, and handcrafted features show the system achieves significant compression (>42x) and accurate classification of approximately 72 interference types on reconstructed signals (F2-score 0.915), closely matching the original signals (F2-score 0.923). The hardware-centric GenAI approach also substantially reduces jammer signal transmission costs, offering a practical solution for interference mitigation. Ablation studies on conditional and factorized VAEs (i.e., FactorVAE) explore latent feature disentanglement for data generation, enhancing model interpretability and fostering trust in machine learning (ML) solutions for sensitive interference applications.
Forecasting within signal processing pipelines is crucial for mitigating delays, particularly in predicting the dynamic movements of objects such as NBA players. This task poses significant challenges due to the inherently interactive and unpredictable nature of sports, where abrupt changes in velocity and direction are prevalent. Traditional approaches, including (S)ARIMA(X), Kalman filters (KF), and Particle filters (PF), often struggle to model the non-linear dynamics present in such scenarios. Machine learning (ML) methods, such as long short-term memory (LSTM) networks, graph neural networks (GNNs), and Transformers, offer greater flexibility and accuracy but frequently fail to explicitly capture the interplay between temporal dependencies and contextual interactions, which are critical in chaotic sports environments. In this paper, we evaluate these models and assess their strengths and weaknesses. Experimental results reveal key performance trade-offs across input history length, generalizability, and the ability to incorporate contextual information. ML-based methods demonstrated substantial improvements over linear models across forecast horizons of up to 2s. Among the tested architectures, our hybrid LSTM augmented with contextual information achieved the lowest final displacement error (FDE) of 1.51m, outperforming temporal convolutional neural network (TCNN), graph attention network (GAT), and Transformers, while also requiring less data and training time compared to GAT and Transformers. Our findings indicate that no single architecture excels across all metrics, emphasizing the need for task-specific considerations in trajectory prediction for fast-paced, dynamic environments such as NBA gameplay.
Semantic communication systems for goal-oriented transmission must protect task-relevant information not only through source compression but also via physical layer mapping. Existing approaches decouple constellation design and semantic encoding, exposing critical symbols to channel errors at the same rate as irrelevant ones. By contrast, this paper proposes a joint semantic-physical layer framework composed of a vector quantized-variational autoencoder that extracts discrete latent concepts, a semantic criticality indicator (SCI) that scores each concept by task relevance, and a deep reinforcement learning agent that dynamically selects the transmission subset based on instantaneous channel conditions. At the physical layer, a learned semantic-aware M-QAM constellation assigns symbol positions according to joint co-occurrence statistics and SCI scores, departing from the uniform spacing and Gray coding of standard M-QAM, which minimizes average BER without regard for semantic content. We introduce a novel semantic symbol vulnerability (SSV) metric and a semantic protection probability (SPP) to quantify the exposure of task-critical symbols to decoding errors, and prove that any Gray-coded constellation is strictly suboptimal in SCI-weighted SSV whenever the source exhibits non-uniform semantic importance and co-occurrence statistics. Simulation results demonstrate that the proposed constellation achieves near-100% SPP across modulation orders from 4-QAM to 1024-QAM (versus 50% for standard constellations at high spectral efficiency) and a 21:1 compression ratio with semantic quality above 0.9, generalizing across MNIST, Fashion-MNIST, and FSDD without modification.
The use of mobile robotics in radioactive source seeking has become an important part of modern radiation-safety practices, supporting timely mitigation of contamination risks and helping protect public health. However, measuring radiation is often time-consuming, rendering traditional gradient-based source-seeking methods less effective due to lower sample efficiency. This paper proposes a sample-efficient Bayesian-optimisation source-seeking strategy that utilises a heteroscedastic Gaussian process surrogate to balance exploration and exploitation. Excessive inter-sample travel is discouraged through a movement switching cost. The strategy is shown to achieve sublinear regret in the source-seeking task, while simulations demonstrate its effectiveness in localising radioactive sources.
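A toy sketch of the acquisition rule, using a homoscedastic NumPy-only GP for brevity (the paper's surrogate is heteroscedastic, and the field, kernel, and weights below are all illustrative assumptions):

```python
import numpy as np

# Movement-penalised Bayesian optimisation: a GP surrogate scores candidate
# locations with a UCB acquisition minus a switching cost proportional to
# the travel distance from the robot's current position.

def rbf(X1, X2, ell=0.5):
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(Xobs, yobs, Xq, noise=0.05):
    K = rbf(Xobs, Xobs) + noise * np.eye(len(Xobs))
    Ks = rbf(Xq, Xobs)
    mu = Ks @ np.linalg.solve(K, yobs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.maximum(var, 1e-12)

def next_waypoint(Xobs, yobs, Xq, x_now, beta=2.0, move_cost=0.3):
    mu, var = gp_posterior(Xobs, yobs, Xq)
    acq = mu + beta * np.sqrt(var) - move_cost * np.abs(Xq - x_now)
    return Xq[np.argmax(acq)]

field = lambda x: np.exp(-((x - 0.7) ** 2) / 0.02)  # unknown intensity field
Xobs = np.array([0.1, 0.4])
yobs = field(Xobs)
Xq = np.linspace(0.0, 1.0, 101)
print(next_waypoint(Xobs, yobs, Xq, x_now=0.4))
```

The movement term trades informativeness against travel, which is what discourages the excessive inter-sample motion mentioned in the abstract.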
With the growth of the construction industry and the global shortage of skilled labor, the automation of crane control has become increasingly important for safe and efficient operations. A central challenge in automatic crane control is the reduction of load oscillations during motion, which is primarily addressed through appropriate slewing trajectories. In this context, classical model-based control methods rely on accurate dynamical models and expert tuning, and often struggle to meet safety and precision requirements, while many learning-based approaches require large data sets and significant computational resources. This paper proposes a behavioral data-driven framework for generating open-loop slewing trajectories for rotary cranes that suppress load sway while reducing operation time and energy consumption. The approach builds on Willems' fundamental lemma and its generalizations to bypass explicit system modeling and operate directly on measured input-output data. A practical workflow is presented in this paper to reduce the need for expert knowledge. Despite the underactuated nature of the crane dynamics, the method identifies a nonparametric representation of the system behavior and generates smooth, optimal trajectories using limited data and convex optimization. The proposed trajectory generation method is validated on a laboratory crane setup and compared against an established model-based approach, achieving up to 35% reduction in load sway, 43% reduction in tracking error, and 50% reduction in travel time.
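The data-driven representation at the heart of the fundamental lemma can be illustrated on a toy first-order system (not the crane dynamics): with a persistently exciting input, every length-$L$ trajectory of the system is a linear combination of columns of the Hankel matrices built from a single measured trajectory.

```python
import numpy as np

# Willems' fundamental lemma, toy check: the stacked input/output Hankel
# matrix of one sufficiently exciting trajectory has rank m*L + n, i.e. its
# column space is exactly the set of length-L system trajectories.

def hankel(w, L):
    """Depth-L Hankel matrix of a scalar signal w."""
    T = len(w)
    return np.array([w[i:i + T - L + 1] for i in range(L)])

# one measured trajectory of x+ = 0.9 x + u, y = x (order n = 1, m = 1 input)
rng = np.random.default_rng(1)
T = 40
u = rng.normal(size=T)
x, y = 0.0, []
for t in range(T):
    y.append(x)
    x = 0.9 * x + u[t]
y = np.array(y)

L = 5
H = np.vstack([hankel(u, L), hankel(y, L)])   # stacked input/output Hankels
rank = np.linalg.matrix_rank(H)
print(rank)   # m*L + n = 1*5 + 1 = 6, although H has 10 rows
```

This rank deficiency is exactly the nonparametric model: any candidate length-$L$ input/output window can be tested for (or constrained to) membership in the column space of $H$, with no identified state-space model in between.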
Carotid atherosclerosis is a major contributor to ischemic stroke and transient ischemic attack. Conventional ultrasound assessment is commonly based on intima-media thickness, plaque appearance, stenosis degree, and peak systolic velocity, but these morphology- and velocity-based indicators may not fully capture patient-specific vascular risk. This study presents AtheroFlow-XNet, a CUBS-compatible ultrasound morphology and uncertainty-aware learning baseline for carotid intima-media segmentation and preliminary risk prediction. Using the Carotid Ultrasound Boundary Study dataset, manual lumen-intima and media-adventitia boundary annotations were converted into dense intima-media masks for supervised segmentation. Clinical variables were incorporated into an auxiliary risk-prediction branch, and Monte Carlo dropout was used for uncertainty-aware inference. The model was evaluated using a patient-level train-validation-test split with 1,522 training images, 326 validation images, and 328 testing images. The proposed model achieved a Dice coefficient of 0.7930 for LI-MA mask segmentation, a segmentation loss of 0.2359, and an area under the receiver operating characteristic curve of 0.6910 for preliminary risk prediction. Qualitative results showed that predicted masks were generally aligned with manual annotations, while uncertainty maps highlighted ambiguous wall-boundary regions. These results suggest that ultrasound-derived carotid morphology can support automated wall analysis and uncertainty-aware interpretation. Since CUBS does not provide Doppler waveforms or CFD-derived hemodynamic biomarkers, this work should be interpreted as a reproducible morphology-driven baseline. Future work will incorporate Doppler-derived flow profiles, patient-specific vascular reconstruction, and CFD-based wall shear biomarkers.
From subcellular structures to entire organisms, many natural systems generate complex organisation through self-organisation: local interactions that collectively give rise to global structure without any blueprint of the outcome. Yet a significant portion of the information driving such processes is not produced by self-organisation itself; instead, it is often offloaded to the initial conditions of the system. Biological development is a prime example, where maternal pre-patterns encode positional and symmetry-breaking information that scaffolds the self-organising process. From maternal morphogen gradients in early embryogenesis to tissue-level morphogenetic pre-patterns guiding organ formation, this transfer of information to initial conditions, analogous to a memory-compute trade-off in computational systems, is a fundamental part of developmental processes. In this work, we study this offloading phenomenon by introducing a model that jointly learns both the self-organisation rules and the pre-patterns, allowing their interplay to be varied and measured under controlled conditions: a Neural Cellular Automaton (NCA) paired with a learned coordinate-based pattern generator (SIREN), both trained simultaneously to generate a set of patterns. We provide information-theoretic analyses of how information is distributed between pre-patterns and the self-organising process, and show that jointly learning both components yields improvements in robustness, encoding capacity, and symmetry breaking over purely self-organising alternatives. Our analysis further suggests that effective pre-patterns do not simply approximate their targets; rather, they bias the developmental dynamics in ways that facilitate convergence, pointing to a non-trivial relationship between the structure of initial conditions and the dynamics of self-organisation.
As audio-first agents become increasingly common in physical AI, conversational robots, and screenless wearables, audio large language models (audio-LLMs) must integrate speaker-specific understanding to support user authorization, personalization, and context-aware interaction. This requires modeling who is speaking, how the voice sounds, and how recording conditions affect speaker cues. Conventional speaker verification systems provide strong scalar scores but little linguistic evidence, while current audio-LLMs and speaker-aware language models have limited ability to organize speaker information beyond binary labels or descriptive profiles. We present SpeakerLLM, a speaker-specialized audio-LLM framework that unifies single-utterance speaker profiling, recording-condition understanding, utterance-pair speaker comparison, and evidence-organized verification reasoning within a natural-language interface. We construct verification-reasoning targets and a decision-composition policy that separate profile-level evidence from the final same-or-different decision and organize recording condition, profile evidence, and the decision into a structured trace. At its core, SpeakerLLM uses a hierarchical speaker tokenizer designed to capture multiple granularities of speaker evidence. Utterance-level speaker embeddings summarize identity and profile-level cues, whereas frame-level speaker features preserve fine-grained acoustic descriptors. Experiments show that SpeakerLLM-Base improves speaker-profile and recording-condition understanding over general audio-LLMs, while SpeakerLLM-VR preserves strong generated-verdict accuracy and produces decision traces grounded in the supervised verification reasoning schema. We will release the metadata-enriched supervision dataset and target-construction code for reproducibility.
This paper presents a prototyping framework for distributed control of multi-robot systems, aimed at bridging theory and practical testing of distributed optimization algorithms. Using the Single Program, Multiple Data (SPMD) paradigm, the framework emulates distributed control on a single computer, with each core running the same algorithm using local states and neighbour-to-neighbour communication. We demonstrate the framework on a four-quadrotor position-swapping task using a non-cooperative game-theoretic distributed algorithm. Computational time and trajectory data are compared across the supported dynamics levels: a point-mass model, a high-fidelity quadrotor model, and an experimental hardware testbed using Crazyflie quadcopters. The results show that the framework provides a low-cost and accessible approach for validating distributed algorithms.
Robust state estimation for highly dynamic motion of legged robots remains challenging, especially in dynamic, contact-rich scenarios. Traditional approaches often rely on binary contact states that fail to capture the nuances of partial contact or directional slippage. This paper presents CoCo-InEKF, a differentiable invariant extended Kalman filter that utilizes continuous contact velocity covariances instead of binary contact states. These learned covariances allow the method to dynamically modulate contact confidence, accounting for more nuanced conditions ranging from firm contact to directional slippage or no contact. To predict these covariances for a set of predefined contact candidate points, we employ a lightweight neural network trained end-to-end using a state-error loss. This approach eliminates the need for heuristic ground-truth contact labels. In addition, we propose an automated contact candidate selection procedure and demonstrate that our method is insensitive to their exact placement. Experiments on a bipedal robot demonstrate a superior accuracy-efficiency tradeoff for linear velocity estimation, as well as improved filter consistency compared to baseline methods. This enables the robust execution of challenging motions, including dancing and complex ground interactions -- both in simulation and in the real world.
The design of direct data-based controllers has become a fundamental part of control theory research in recent years. In this paper, we consider three classes of data-based state feedback control problems for linear systems. These control problems are such that, besides stabilization, some additional performance requirements must be satisfied. First, we formulate and solve a trajectory-reference control problem, in which desired closed-loop trajectories are known and a controller that allows the system to closely follow those trajectories is computed. Then, the solution of the LQR problem for continuous-time systems is presented. Finally, we consider the case in which the precise positions of the desired closed-loop poles are known, and introduce a data-based variant of a robust pole-placement procedure. The applicability of the proposed methods is tested using numerical simulations.
We derive a closed-form approximation of the stationary distribution of the Age of Information (AoI) of the semi-persistent scheduling (SPS) protocol, which is a core part of NR-V2X, an important standard for vehicular communications. While prior works have studied the average AoI under similar assumptions, in this work we provide a full statistical characterization of the AoI by deriving an approximation of its probability mass function. As a result, besides the average AoI, we are able to evaluate the age-violation probability, which is of particular relevance for safety-critical applications in vehicular domains, where the priority is to ensure that the AoI does not exceed a predefined threshold during system operation. The study reveals complementary behavior of the age-violation probability compared to the average AoI and highlights the reservation duration as a key parameter of the SPS protocol. We use this insight to demonstrate how this crucial parameter should be tuned according to the performance requirements of the application.
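To make the age-violation metric concrete, here is a minimal sketch (with a toy pmf, not the SPS-derived one from the paper; the function name is my own) of how a threshold-violation probability and the average AoI follow from a pmf of the AoI:

```python
import numpy as np

def age_violation_probability(pmf, threshold):
    """P(AoI > threshold) from a pmf over integer ages 0, 1, 2, ...

    pmf[k] is the stationary probability that the AoI equals k slots.
    """
    pmf = np.asarray(pmf, dtype=float)
    ages = np.arange(len(pmf))
    return float(pmf[ages > threshold].sum())

# Toy pmf, purely illustrative.
pmf = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
avg_aoi = float(np.sum(np.arange(len(pmf)) * pmf))
print(avg_aoi)                             # 1.1
print(age_violation_probability(pmf, 2))   # 0.15, i.e. P(AoI in {3, 4})
```

The point of the abstract is that two pmfs with the same mean can have very different tails, so the violation probability is the quantity that matters for safety-critical thresholds.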
Rheumatoid arthritis (RA) is a common autoimmune disease that has been the focus of research in computer-aided diagnosis (CAD) and disease monitoring. In clinical settings, conventional radiography (CR) is widely used for the screening and evaluation of RA due to its low cost and accessibility. The wrist is a critical region for the diagnosis of RA. However, CAD research in this area remains limited, primarily due to the challenges in acquiring high-quality instance-level annotations. (i) The wrist comprises numerous small bones with narrow joint spaces, complex structures, and frequent overlaps, requiring detailed anatomical knowledge for accurate annotation. (ii) Disease progression in RA often leads to osteophytes, bone erosion (BE), and even bony ankylosis, which alter bone morphology and increase annotation difficulty, necessitating expertise in rheumatology. This work presents a multi-task dataset for wrist bones in CR, including two tasks: (i) wrist bone instance segmentation and (ii) Sharp/van der Heijde (SvdH) BE scoring, making it the first public resource for wrist bone instance segmentation. This dataset comprises 1048 wrist conventional radiographs of 388 patients from six medical centers, with pixel-level instance segmentation annotations for 618 images and SvdH BE scores for 800 images. This dataset can potentially support a wide range of research tasks related to RA, including joint space narrowing (JSN) progression quantification, BE detection, bone deformity evaluation, and osteophyte detection. It may also be applied to other wrist-related tasks, such as carpal bone fracture localization. We hope this dataset will significantly lower the barrier to research on wrist RA and accelerate progress in CAD research within the RA-related domain.
This paper introduces The Spheres dataset, multitrack orchestral recordings designed to advance machine learning research in music source separation and related MIR tasks within the classical music domain. The dataset is composed of over one hour of recordings of musical pieces performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works - Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40 - along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, enabling the creation of realistic stereo mixes with controlled bleeding and providing isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering valuable acoustic characterization of the recording space. We present the dataset structure, acoustic analysis, and baseline evaluations using X-UMX based models for orchestral family separation and microphone debleeding. Results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.
The global capacity for mineral processing must expand rapidly to meet the demand for critical minerals, which are essential for building the clean energy technologies necessary to mitigate climate change. However, the efficiency of mineral processing is severely limited by uncertainty, which arises from both the variability of feedstock and the complexity of process dynamics. To optimize mineral processing circuits under uncertainty, we introduce an AI-driven approach that formulates mineral processing as a Partially Observable Markov Decision Process (POMDP). We demonstrate the capabilities of this approach in handling both feedstock uncertainty and process model uncertainty to optimize the operation of a simulated, simplified flotation cell as an example. We show that by integrating the process of information gathering (i.e., uncertainty reduction) and process optimization, this approach has the potential to consistently perform better than traditional approaches at maximizing an overall objective, such as net present value (NPV). Our methodological demonstration of this optimization-under-uncertainty approach for a synthetic case provides a mathematical and computational framework for later real-world application, with the potential to improve both the laboratory-scale design of experiments and industrial-scale operation of mineral processing circuits without any additional hardware.
This paper presents the modeling of autonomous vehicles with high maneuverability used in an experimental framework for educational purposes. Since standard bicycle models typically neglect wide steering angles, we develop modified planar bicycle models and combine them with both parametric and non-parametric identification techniques that progressively incorporate physical knowledge. The resulting models are systematically compared to evaluate the tradeoff between model accuracy and computational requirements, showing that physics-informed neural network models surpass the purely physical baseline in accuracy at lower computational cost.
This paper considers data-based solutions of linear-quadratic nonzero-sum differential games. Two cases are considered. First, the deterministic game is solved and Nash equilibrium strategies are obtained by using persistently exciting data from the multiagent system. Then, a stochastic formulation of the game is considered, where each agent measures a different noisy output signal and state observers must be designed for each player. It is shown that the proposed data-based solutions of these games are equivalent to known model-based procedures. The resulting data-based solutions are validated in a numerical experiment.
Large Language Models (LLMs) have demonstrated strong semantic reasoning across multimodal domains. However, their integration with graph-based models of brain connectivity remains limited. In addition, most existing fMRI analysis methods rely on static Functional Connectivity (FC) representations, which obscure transient neural dynamics critical for neurodevelopmental disorders such as autism. Recent state-space approaches, including Mamba, model temporal structure efficiently, but are typically used as standalone feature extractors without explicit high-level reasoning. We propose NeuroMambaLLM, an end-to-end framework that integrates dynamic latent graph learning and selective state-space temporal modelling with LLMs. The proposed method learns the functional connectivity dynamically from raw Blood-Oxygen-Level-Dependent (BOLD) time series, replacing fixed correlation graphs with adaptive latent connectivity while suppressing motion-related artifacts and capturing long-range temporal dependencies. The resulting dynamic brain representations are projected into the embedding space of an LLM, where the base language model remains frozen and lightweight low-rank adaptation (LoRA) modules are trained for parameter-efficient alignment. This design enables the LLM to perform both diagnostic classification and language-based reasoning, allowing it to analyze dynamic fMRI patterns and generate clinically meaningful textual reports.
Phased-array Bluetooth systems have emerged as a low-cost alternative for performing aided inertial navigation in GNSS-denied use cases such as warehouse logistics, drone landings, and autonomous docking. Building a navigation system from commercial off-the-shelf components may lower the barrier to entry for phased-array radio navigation, albeit at the cost of significantly noisier measurements and a relatively short feasible range. In this paper, we compare robust estimation strategies for a factor graph optimisation-based estimator using experimental data collected from multirotor drone flights. We evaluate performance in loss-of-GNSS scenarios when aided by Bluetooth angular measurements, as well as range or barometric pressure.
Speech separation in realistic acoustic environments remains challenging because overlapping speakers, background noise, and reverberation must be resolved simultaneously. Although recent time-frequency (TF) domain models have shown strong performance, most still rely on late-split architectures, where speaker disentanglement is deferred to the final stage, creating an information bottleneck and weakening discriminability under adverse conditions. To address this issue, we propose SR-CorrNet, an asymmetric encoder-decoder framework that introduces the separation-reconstruction (SepRe) strategy into a TF dual-path backbone. The encoder performs coarse separation from mixture observations, while the weight-shared decoder progressively reconstructs speaker-discriminative features with cross-speaker interaction, enabling stage-wise refinement. To complement this architecture, we formulate speech separation as a structured correlation-to-filter problem: spatio-spectro-temporal correlations computed from the observations are used as input features, and the corresponding deep filters are estimated to recover target signals. We further incorporate an attractor-based dynamic split module to adapt the number of output streams to the actual speaker configuration. Experimental results on WSJ0-{2,3,4,5}Mix, WHAMR!, and LibriCSS demonstrate consistent improvements across anechoic, noisy-reverberant, and real-recorded conditions in both single- and multi-channel settings, highlighting the effectiveness of TF-domain SepRe with correlation-based filter estimation for speech separation.
This study presents two analytical closed-form PI controller tuning solutions for second-order plants with real poles, each achieving monotonic step response and minimum settling time. The first solution employs pole-zero cancellation, placing the controller zero at the slower plant pole and reducing the closed-loop dynamics to a critically damped second-order system. The second solution, applicable when the plant pole ratio is less than two, places all three closed-loop poles at a common location without cancelling any plant pole, yielding a closed-loop transfer function with a triple real pole and a zero. Despite retaining a closed-loop zero, this solution achieves strictly faster settling time than the pole-zero cancellation method in its region of applicability. The two solutions coincide at the boundary pole ratio of two and together form a continuous piecewise-analytical tuning covering the full range of plant pole ratios. This study further establishes that closed-loop transfer functions of the form a^n/(s + a)^n possess a maximum sensitivity Ms that is independent of the pole location a and depends solely on the order n, yielding universal robustness constants for each n. Numerical verification confirms the analytical results across multiple plant configurations.
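The claimed invariance of the maximum sensitivity $M_s$ with respect to the pole location $a$ is easy to check numerically. The sketch below (function name is my own) evaluates $|1 - T(j\omega)|$ for $T(s) = a^n/(s+a)^n$ on a frequency grid scaled with $a$; the substitution $\omega = a v$ removes $a$ from the expression entirely, so the peak depends only on $n$:

```python
import numpy as np

def max_sensitivity(a, n, num=20000):
    """Peak |S(jw)| for the closed loop T(s) = a^n / (s + a)^n,
    with S = 1 - T, evaluated on a log-spaced grid scaled with a."""
    w = np.logspace(-3, 3, num) * a          # scale the grid with the pole location
    T = (a / (1j * w + a)) ** n
    return np.max(np.abs(1.0 - T))

# Ms depends only on the order n, not on the pole location a.
for n in (2, 3):
    print(n, [round(max_sensitivity(a, n), 4) for a in (0.5, 1.0, 10.0)])
```

Each row prints three identical values (up to floating-point noise), one universal robustness constant per order $n$, consistent with the abstract's claim.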
Computational constraints permeate the controller design process, and yet are rarely treated as explicit design constraints. Towards addressing this gap, we propose a quantitative framework that captures the effects of common design approximations, such as model order reduction, temporal discretization, horizon truncation, and solver accuracy, on both controller performance and computational requirements. Our framework highlights that these approximations are tunable parameters within an overall controller design process. By leveraging incremental input-to-state stability, we show that bounding the aggregate effects of these approximations reduces to verifying a design-dependent sector bound on the difference between the deployed policy and an idealized baseline, with stability enforced via a small-gain condition. We operationalize these insights via a Design Meta-Problem in which the performance gap is minimized subject to stability, real-time compute, and timing constraints. Finally, we instantiate the framework on a receding horizon LQR case study, and demonstrate a principled near-optimal navigation of tradeoffs among sampling rate, model order, horizon length, and solver iterations.
We propose a novel extremum seeking control (ESC) method that operates in a lifted Koopman state space to minimize the filtered RMS energy in the dominant subspace. The lifted representation provides linear embeddings of nonlinear dynamics, enabling more accurate gradient estimation and dampening of state interference for more consistent ESC performance. Applied to a parameterized, forced, and time-varying Van der Pol oscillator, we show that the approach yields faster and more robust performance than operating ESC on the measured states. These advantages position the method for a diverse range of applications including vibration suppression, motion control, and subsynchronous oscillation mitigation in inverter-dominated power systems.
While U-Net architectures remain the gold standard for medical image segmentation, their deployment in resource-constrained environments demands aggressive model compression. However, finding an optimally efficient configuration is computationally prohibitive, typically requiring exhaustive train-and-evaluate cycles to find the smallest model that maintains peak performance. In this paper, we introduce a training-free selection framework to automatically identify ultralightweight, dataset-specific U-Net configurations directly at initialization. We observe that systematically scaling down U-Net channel width induces a sharp transition from a stable performance plateau to representational capacity collapse. To pinpoint this boundary without training, we propose a Jacobian-based sensitivity metric that scores discrete, width-capped U-Net variants using a small set of unlabeled images. By analyzing the total variation of this sensitivity curve, we isolate the smallest stable configuration, which we denote as XTinyU-Net. Evaluated across six diverse medical datasets within the nnU-Net framework, XTinyU-Net achieves segmentation accuracy comparable to the heavy nnU-Net baseline with 400x-1600x fewer parameters, and outperforms contemporary lightweight architectures while utilizing 5x-72x fewer parameters. Code is publicly accessible on this https URL.
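As an illustration of the selection idea -- an assumed heuristic, not the paper's exact criterion -- a collapse boundary in a sensitivity-versus-width curve can be located from the curve's total variation, picking the last width on the stable plateau:

```python
import numpy as np

def smallest_stable_width(widths, sensitivity, rel_jump=0.5):
    """Pick the smallest channel width before the sensitivity curve collapses.

    widths: candidate widths in decreasing order; sensitivity: matching scores.
    The plateau is taken to end at the first step whose change exceeds
    rel_jump times the curve's total variation (illustrative heuristic).
    """
    diffs = np.abs(np.diff(sensitivity))
    total_variation = diffs.sum()
    for i, d in enumerate(diffs):
        if d > rel_jump * total_variation:
            return widths[i]          # last width still on the plateau
    return widths[-1]                 # no collapse detected

widths = [64, 32, 16, 8, 4, 2]
sens = [1.00, 0.99, 0.98, 0.97, 0.40, 0.10]   # flat plateau, then collapse
print(smallest_stable_width(widths, sens))     # 8
```

The appeal of this style of selection is that the scores can come from unlabeled forward/backward passes at initialization, so no train-and-evaluate cycle is needed.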
This paper treats the inverse problem of retrieving the electrical conductivity of a material from boundary measurements in the framework of Electrical Resistance Tomography (ERT). In particular, the focus is on non-iterative reconstruction methods suitable for real-time applications. In this work, the Kernel Method, a new non-iterative reconstruction method for ERT, is presented. The imaging algorithm addresses the problem of retrieving one or more anomalies of arbitrary shape, topology, and size embedded in a known background (the inverse obstacle problem). The Kernel Method rests on the following idea: if a current density applied at the boundary (Neumann data) of the domain produces the same measurements with and without the anomaly, then this boundary source produces a power density that vanishes in the region occupied by the anomaly, when applied to the problem involving the background material only. This new tomographic method has a simple numerical implementation that requires a very low computational cost. In this paper, the theoretical foundation of the Kernel Method is provided, and an extensive numerical campaign demonstrates the effectiveness of this new imaging method.
We study the remote estimation of a linear Gaussian system over a channel that wears out over time and with every use. The sensor can either transmit a fresh measurement in the current time slot, restore the channel quality at the cost of downtime, or remain silent. Frequent transmissions yield accurate estimates but incur significant wear on the channel. Renewing the channel too often improves channel conditions but results in poor estimation quality. What is the optimal timing to transmit measurements and restore the channel? This problem is formulated as a semi-Markov decision process (SMDP). We establish monotonicity properties of the optimal policy and propose structure-aware solution methods.
Foundation models are reshaping EEG analysis, yet EEG tokenization remains an open challenge. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from single-channel EEG signals and encodes them into discrete tokens. We propose a dual-path architecture with time-frequency masking to capture robust motif representations; the tokenizer is model-agnostic, supporting both lightweight transformers and existing foundation models for downstream tasks. Our study demonstrates three key benefits. Accuracy: experiments on four diverse EEG benchmarks demonstrate consistent performance gains across both single- and multi-dataset pretraining settings, achieving up to $11\%$ improvement in Cohen's Kappa over strong baselines. Generalization: as a plug-and-play component, the tokenizer consistently boosts the performance of diverse foundation models, including BIOT and LaBraM. Scalability: by operating at the single-channel level rather than relying on the strict 10-20 EEG system, our method has the potential to be device-agnostic. Experiments on ear-EEG sleep staging, which differs from the pretraining data in signal format, channel configuration, recording device, and task, show that our tokenizer outperforms baselines by $14\%$. A comprehensive token analysis reveals strong class-discriminative, frequency-aware, and consistent structure, enabling improved representation quality and interpretability. Code is available at this https URL.
Using nearest neighbor interpolation to avoid the risk of undefined categorical labels overlooks the risk of exacerbating pixel-level annotation errors in augmented training data. Additionally, the inherent low-pass filtering effects of interpolation algorithms risk degrading high-frequency structural details within annotated regions of interest. To avoid these risks, the author modified convolutional neural network data transformation functions by incorporating a modified geometric transformation function that removes the reliance on nearest neighbor interpolation and integrates a mean-based class filtering mechanism to handle undefined categorical labels with alternative interpolation algorithms. The author also implemented an offline data augmentation pipeline to generate interpolation-specific augmented training data, enabling quantitative assessment of interpolation-specific low-pass filtering effects on augmented training data. Experimental evaluation on three medical image segmentation datasets and the XBAT+ datasets demonstrated performance gains across multiple quantitative metrics.
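A minimal sketch of handling categorical masks without nearest neighbor interpolation, assuming smooth resampling of one-hot class channels followed by an argmax over class memberships (the paper's exact pipeline and function names are not given, so this is purely illustrative):

```python
import numpy as np

def resample_mask_bilinear(mask, num_classes, scale=2):
    """Resample a categorical mask without nearest-neighbour interpolation.

    Each class is resampled as a one-hot channel with a naive bilinear
    scheme; the output label is the class with the largest interpolated
    membership, so no undefined categorical labels can appear."""
    onehot = np.stack([(mask == c).astype(float) for c in range(num_classes)])
    h, w = mask.shape
    ys = np.linspace(0, h - 1, h * scale)
    xs = np.linspace(0, w - 1, w * scale)
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    up = ((1 - wy) * (1 - wx) * onehot[:, y0][:, :, x0]
          + (1 - wy) * wx * onehot[:, y0][:, :, x1]
          + wy * (1 - wx) * onehot[:, y1][:, :, x0]
          + wy * wx * onehot[:, y1][:, :, x1])
    return np.argmax(up, axis=0)       # class with largest mean membership

mask = np.array([[0, 0], [1, 1]])
print(resample_mask_bilinear(mask, num_classes=2))
```

Because the class decision is taken after smooth interpolation of memberships, boundaries land where the interpolated membership crosses one half, rather than snapping to the nearest source pixel.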
Ill-posed inverse problems are fundamental in many domains, ranging from astrophysics to medical imaging. Emerging diffusion models provide a powerful prior for solving these problems. Existing maximum-a-posteriori (MAP) or posterior sampling approaches, however, rely on different computational approximations, leading to inaccurate or suboptimal samples. To address this issue, we introduce a new approach to solving MAP problems with diffusion model priors using a dual ascent optimization framework. Our framework achieves better image quality as measured by various metrics for image restoration problems, it is more robust to high levels of measurement noise, it is faster, and it estimates solutions that represent the observations more faithfully than the state of the art.
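On a toy problem, a dual ascent scheme for a constrained MAP objective looks as follows -- here with a quadratic stand-in for the diffusion prior and a hard linear measurement constraint, so this is a sketch of the optimization pattern, not the paper's algorithm:

```python
import numpy as np

# Toy MAP problem:  minimize 0.5 * ||x - mu||^2   subject to   A @ x = b.
# Dual ascent alternates a closed-form primal minimization of the
# Lagrangian with a gradient ascent step on the dual variable.
mu = np.array([1.0, -2.0, 0.5, 0.0, 3.0])   # prior mode (diffusion-prior stand-in)
A = np.array([[1.0, 0.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0, 1.0]])   # measurement operator
b = np.array([1.0, 2.0])

y = np.zeros(2)                                   # dual variable
step = 1.0 / np.linalg.eigvalsh(A @ A.T).max()    # safe ascent step size
for _ in range(200):
    x = mu - A.T @ y               # argmin_x of the Lagrangian (closed form here)
    y = y + step * (A @ x - b)     # ascend on the constraint residual

print(np.linalg.norm(A @ x - b))   # residual shrinks geometrically toward zero
```

With a diffusion prior, the primal step is no longer closed-form and is what the paper's approximations address; the dual update structure, however, stays the same.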
High-resolution sinogram completion is critical for computed tomography reconstruction, as missing projections can introduce severe artifacts. While diffusion models provide strong generative priors for this task, their inference cost grows prohibitively with resolution. We propose HRSino, a training-free and efficient diffusion inference approach for high-resolution sinogram completion. By explicitly accounting for spatial heterogeneity in signal characteristics, such as spectral sparsity and local complexity, HRSino allocates inference effort adaptively across spatial regions and resolutions, rather than applying uniform high-resolution diffusion steps. This enables global consistency to be captured at coarse scales while refining local details only where necessary. Experimental results show that HRSino reduces peak memory usage by up to 30.81% and inference time by up to 17.58% compared to the state-of-the-art framework, and maintains completion accuracy across datasets and resolutions.
Recent publications have suggested using the Shapley value for anomaly localization in sensor data systems. Using a reasonable mathematical anomaly model for full control, experiments indicate that a single fixed term of the Shapley value calculation yields a lower-complexity anomaly localization test with the same probability of error as a test using the full Shapley value, in all cases tested. A proof demonstrates that these conclusions must hold for all independent-observation cases. For dependent-observation cases, no proof is available.
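For context, the exact Shapley value averages each player's marginal contribution over all orderings. The sketch below (names and the toy value function are my own) computes it exactly for a small set of sensors; when the value function is additive over sensors, every marginal term is identical, which is the intuition for why a single fixed term can match the full calculation in the independent case:

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values by averaging marginal contributions over
    all orderings (feasible only for small player sets)."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        seen = set()
        for p in order:
            phi[p] += v(seen | {p}) - v(seen)   # marginal contribution of p
            seen.add(p)
    return {p: phi[p] / len(orderings) for p in phi}

# Toy additive value function: v(S) = sum of per-sensor anomaly scores.
scores = {"s1": 0.1, "s2": 0.7, "s3": 0.2}
v = lambda S: sum(scores[p] for p in S)
print(shapley_values(list(scores), v))   # each sensor recovers its own score
```

For dependent observations the value function is no longer additive, the marginal terms differ across orderings, and this shortcut has no accompanying proof -- matching the abstract's caveat.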
We propose a control barrier function (CBF) formulation for enforcing equality and inequality constraints in variational inference. The key idea is to define a barrier functional on the space of probability density functions that encodes the desired constraints imposed on the variational density. By leveraging the Liouville equation, we establish a connection between the time derivative of the variational density and the particle drift, which enables the systematic construction of corresponding CBFs associated with the particle drift. Enforcing these CBFs gives rise to the safe particle flow and ensures that the variational density satisfies the original constraints imposed by the barrier functional. This formulation provides a principled and computationally tractable solution to constrained variational inference, with theoretical guarantees of constraint satisfaction. The effectiveness of the method is demonstrated through numerical simulations.
This paper considers the beamforming and power optimization problem for a class of integrated sensing and communications (ISAC) problems that utilize the communication signals simultaneously for sensing. We formulate the problem of minimizing the Bayesian Cramér-Rao bound (BCRB) on the mean-squared error of estimating a vector of parameters, while satisfying downlink signal-to-interference-and-noise-ratio constraints for a set of communication users at the same time. The proposed optimization framework comprises two key new ingredients. First, we show that the BCRB minimization problem corresponds to maximizing beamforming power along certain sensing directions of interest. Second, the classical uplink-downlink duality for multiple-input multiple-output communications can be extended to the ISAC setting, but unlike the classical communication problem, the dual uplink problem for ISAC may entail negative noise power and needs to include an extra condition on the uplink beamformers. This new duality theory opens the door to efficient iterative algorithms for optimizing power and beamformers for ISAC.
Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term EEG recordings remains labor-intensive. We present CELM, the first clinical EEG-to-Language foundation model capable of summarizing long-duration, variable-length EEG recordings and performing end-to-end clinical report generation at multiple scales. CELM integrates pretrained EEG foundation models with language models to enable scalable multimodal learning. We curate a large-scale clinical EEG dataset containing 9,922 reports paired with approximately 11,000 hours of EEG recordings from 9,048 patients to train CELM, and release the benchmark with an automated report-structuring pipeline to facilitate future research. Experimental results show that CELM consistently outperforms existing methods across all evaluation settings. Importantly, we further conduct human evaluation with clinical experts, demonstrating that CELM generates reports that are more clinically coherent, diagnostically reliable, and better aligned with expert interpretation. We release our model and benchmark construction pipeline at this https URL.
We analyze Stackelberg Gaussian signaling games where the encoder and decoder have a linear sensitivity mismatch. Unlike the standard additive-bias model, a sensitivity mismatch means the encoder prefers the decoder to track a linear transformation of the state rather than a shifted one. We derive the equilibrium structure for both noiseless (cheap-talk) and noisy signaling channels. In the noiseless case, the equilibrium admits a spectral characterization: the encoder transmits information only along eigenspaces associated with the negative eigenvalues of a mismatch matrix. In the noisy regime, we derive analytical thresholds for informative signaling, showing that communication collapses if the sensitivity mismatch or transmission cost exceeds a channel-dependent threshold.
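The noiseless spectral characterization can be sketched as a projection onto the eigenspaces of the mismatch matrix with negative eigenvalues -- the only directions along which, per the abstract, the encoder transmits information. A toy sketch (assuming a symmetric mismatch matrix; names are my own):

```python
import numpy as np

def informative_projection(M):
    """Orthogonal projector onto the span of eigenvectors of the
    symmetric mismatch matrix M with negative eigenvalues."""
    vals, vecs = np.linalg.eigh(M)
    V = vecs[:, vals < 0]           # keep only negative-eigenvalue directions
    return V @ V.T

# Toy mismatch matrix: one negative and one positive eigenvalue.
M = np.diag([-1.0, 2.0])
P = informative_projection(M)
x = np.array([3.0, 4.0])
print(P @ x)   # only the component in the negative eigenspace survives: [3. 0.]
```

In the full game, the state component annihilated by this projector carries no information at equilibrium; the noisy regime then adds the threshold conditions under which even the surviving directions go silent.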
Real-world tasks involve nuanced combinations of goal and safety specifications. In high dimensions, the challenge is exacerbated: formal automata become cumbersome, and the combination of sparse rewards tends to require laborious tuning. In this work, we consider the innate structure of the Bellman Value as a means to naturally organize the problem for improved automatic performance. Namely, we prove the Bellman Value for a complex task defined in temporal logic can be decomposed into a graph of Bellman Values, connected by a set of well-known Bellman equations (BEs): the Reach-Avoid BE, the Avoid BE, and a novel type, the Reach-Avoid-Loop BE. To solve the Value and optimal policy, we propose VDPPO, which embeds the decomposed Value graph into a two-layer neural net, bootstrapping the implicit dependencies. We conduct a variety of simulated and hardware experiments to test our method on complex, high-dimensional tasks involving heterogeneous teams and nonlinear dynamics. Ultimately, we find this approach greatly improves performance over existing baselines, balancing safety and liveness automatically.
As LLM applications grow more complex, developers are increasingly adopting multi-agent architectures to decompose workflows into specialized, collaborative components, introducing structure that constrains agent behavior and exposes useful semantic predictability. Unlike traditional LLM serving, which operates under highly dynamic and uncertain conditions, this structured topology enables opportunities to reduce runtime uncertainty -- yet existing systems fail to exploit it, treating agentic workloads as generic traffic and incurring significant inefficiencies. Our analysis of production traces from an agent-serving platform and an internal coding assistant reveals key bottlenecks, including low prefix cache hit rates, severe resource contention from long-context requests, and substantial queuing delays due to suboptimal scaling. To address these challenges, we propose Pythia, a multi-agent serving system that captures workflow semantics through a simple interface at the serving layer, unlocking new optimization opportunities and substantially improving throughput and job completion time over state-of-the-art baselines.