Global Navigation Satellite Systems (GNSS) underpin positioning, navigation, and timing (PNT), yet their low-power signals are easily blocked or disrupted, leaving gaps in PNT availability in contested environments (e.g. maritime settings) where interference, spoofing, or denial can occur. A key practical need is an independent, ubiquitous aiding signal that can be tracked passively and fused with inertial sensing to sustain full navigation-state estimation without dedicated or cooperative infrastructure. This paper presents an end-to-end LEO-aided hybrid framework that fuses GPS, Starlink downlink beacons, and an inertial measurement unit (IMU) in a 9D (3D position, 3D velocity, and 3D attitude) PNT system using an extended Kalman filter (EKF). We (i) extract Doppler-rate from Starlink downlink beacon tones by associating measurements with satellite IDs, (ii) benchmark beacon Doppler-rate against OFDM-derived range observables under a common processing/estimation pipeline, and (iii) integrate the resulting observable into inertial navigation. We evaluate GPS/IMU, Starlink/IMU, and GPS-Starlink-IMU using Fisher-information predictions, Monte Carlo simulations, and hardware measurements. Results show that Starlink Doppler-rate provides meaningful complementary PNT information, and can aid 9D estimation when GNSS is degraded or intermittently unavailable.
In sound field control applications, it is commonly assumed that one has access to an accurate representation of the sound field in the region of interest. This is a problematic assumption since the reconstruction of a sound field from available microphone measurements is especially challenging in real-time applications where only causal measurements are available. Notably, causal time-windowed observations introduce correlation between frequency components, making sound field reconstruction methods that process each frequency band independently sub-optimal. In this work, we formulate a causal finite-window spatio-temporal linear minimum mean-square error estimator for sound field reconstruction. The sound field is modeled as the solution to the wave equation driven by a stationary stochastic spatio-temporal source distribution, which induces a physically interpretable covariance function. It is shown that this covariance function is closely related to the classical diffuse-field coherence model. Since the computational complexity grows rapidly with the number of spatio-temporal observations, we formulate a budget-constrained spatio-temporal sample selection approach to minimize the posterior reconstruction variance. The proposed estimator and sampling strategy are evaluated using both simulated and measured sound fields, demonstrating improved short-window reconstruction compared to frequency domain finite-window baselines.
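For orientation, the generic forms of the two ingredients named in this abstract can be written down as follows. This is a textbook sketch, not the paper's causal finite-window estimator: the symbols $u$ (field value to reconstruct), $\mathbf{y}$ (stacked zero-mean observations), and the covariance blocks $\mathbf{C}$ are notation assumed here.

```latex
% Linear minimum mean-square error estimate of u from zero-mean observations y
\hat{u} = \mathbf{C}_{uy}\,\mathbf{C}_{yy}^{-1}\,\mathbf{y},
\qquad
\sigma^2_{u \mid y} = C_{uu} - \mathbf{C}_{uy}\,\mathbf{C}_{yy}^{-1}\,\mathbf{C}_{yu}

% Classical diffuse-field spatial coherence between two points a distance r apart
\gamma(r, \omega) = \frac{\sin(kr)}{kr}, \qquad k = \frac{\omega}{c}
```

The posterior variance $\sigma^2_{u \mid y}$ is the kind of quantity a budget-constrained sample selection would try to minimize.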
Class imbalance is a fundamental challenge in medical image segmentation, where frequent classes typically dominate training at the expense of rare classes. Loss-based approaches mitigate imbalance by reweighting the per-pixel loss within the batch, while sampling strategies control which images enter the batch. Yet neither explicitly controls which classes appear within the batch, leaving rare-class exposure only partially rebalanced. In this work, we adopt episodic sampling from few-shot learning to promote class-balanced batch construction in a fully supervised setting. We decouple episodic sampling from its conventional metric-learning context and evaluate it on body composition segmentation in CT. We compare episodic sampling against random and weighted sampling on nine muscle and adipose tissues, derived from 210 scans of the public SAROS dataset. Training is performed under full- and low-data regimes, with additional comparisons under matched training iteration budgets. Under full-data training, all three strategies performed comparably (mean Dice 0.882 for episodic, 0.878 for random and weighted). Under low-data training, episodic sampling outperformed random and weighted (0.787 vs. 0.758 and 0.762), driven by a 12-fold difference in training iterations. Under matched training budgets, random and weighted overfit earlier, while episodic improved for approximately three times more iterations before plateauing. Our findings identify the training iteration budget as an under-recognized confound in sampling strategies, motivating iteration-aware evaluation protocols for small datasets. Furthermore, the residual advantage of episodic sampling is consistent with an implicit regularization effect of class-balanced batches, offering a low-cost, model-agnostic strategy for class-imbalanced medical image segmentation. Code is available at this https URL.
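The class-balanced batch construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the function names, the toy annotation table, and the choice to sample with replacement for very rare classes are all assumptions.

```python
import random
from collections import defaultdict


def build_class_index(image_labels):
    """Map each class ID to the list of image IDs that contain it."""
    index = defaultdict(list)
    for image_id, classes in image_labels.items():
        for c in classes:
            index[c].append(image_id)
    return index


def sample_episode(class_index, n_classes, k_per_class, rng):
    """Draw n_classes classes, then k_per_class images containing each one."""
    chosen = rng.sample(sorted(class_index), n_classes)
    batch = []
    for c in chosen:
        pool = class_index[c]
        if len(pool) >= k_per_class:
            picks = rng.sample(pool, k_per_class)
        else:
            # Assumption: fall back to sampling with replacement for rare classes.
            picks = [rng.choice(pool) for _ in range(k_per_class)]
        batch.extend((image_id, c) for image_id in picks)
    return chosen, batch


# Hypothetical annotation table: image ID -> set of tissue classes present.
image_labels = {
    "scan_a": {"muscle", "sat"},
    "scan_b": {"muscle"},
    "scan_c": {"muscle", "vat"},
    "scan_d": {"muscle", "sat", "imat"},
}
rng = random.Random(0)
class_index = build_class_index(image_labels)
chosen, batch = sample_episode(class_index, n_classes=3, k_per_class=2, rng=rng)
```

Every class drawn for the episode contributes the same number of batch entries, which is what distinguishes this from image-level random or weighted sampling.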
Stochastic hybrid systems combine continuous-time stochastic dynamics with discrete reset events, producing intrinsically non-Gaussian and often multimodal uncertainty. A consistent propagation law must also account for boundary-induced probability flux across guard sets, making direct density propagation through hybrid Fokker-Planck equations expensive. We develop a hybrid extension of the Max-Entropy Moment Kalman Filter (MEM-KF) that performs filtering from partial statistical information by propagating a finite collection of moments through stochastic hybrid dynamics and reconstructing beliefs using moment-constrained maximum-entropy distributions. The key step is a moment propagation rule derived from Dynkin's formula with a jump-sum, in which reset effects appear as a boundary-flux correction over the guard set. This yields tractable moment dynamics without solving the underlying hybrid PDE. In a stochastic bouncing-ball example, the proposed method captures reset-induced non-Gaussianity through corrected moment equations while retaining the MEM-KF's optimization-based maximum-entropy representation.
Long-form audio understanding poses significant challenges for large audio language models (LALMs) due to the extreme length of audio sequences and the need to reason over heterogeneous acoustic cues distributed over time, such as speech content, speaker identity, emotion, and sound events. To address these challenges, we propose PlanRAG-Audio, a planning-based retrieval-augmented generation framework for scalable long-form audio understanding. Rather than having LALMs process entire recordings directly, PlanRAG-Audio explicitly plans which modalities and temporal spans are required for a given query, and retrieves only query-relevant information from a structured text and audio database. This retrieval planning enables effective reasoning over complex, cross-domain audio queries while substantially reducing the input length passed to the large language models. Experiments across a wide range of speech/audio retrieval tasks demonstrate that PlanRAG-Audio improves reasoning accuracy and stabilizes performance as audio duration increases by decoupling inference cost from raw audio length.
The paper describes a new method for estimating the poles of an ARMA model using higher-order crossings. The method involves transforming counts of crossing events into estimates of ARMA poles via the autocorrelation domain. An important advantage of the method is that the crossing counts are the only features that need to be stored from the original data. The poles of an ARMA model of a control loop correspond to the roots of the characteristic equation and are thus useful for evaluating control performance.
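The link between crossing counts and the autocorrelation domain can be illustrated with the simplest case. The sketch below is not the paper's estimator: it uses only first-order (zero) crossings and the classical cosine formula for zero-mean stationary Gaussian series, applied to a simulated AR(1) process whose single pole equals its lag-1 autocorrelation. The function names, the simulation settings, and the seed are assumptions.

```python
import math
import random


def zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))


def rho1_from_crossings(count, n):
    """Cosine formula: lag-1 autocorrelation implied by a zero-crossing count."""
    return math.cos(math.pi * count / (n - 1))


# Simulate a zero-mean Gaussian AR(1) process x[t] = phi * x[t-1] + e[t].
rng = random.Random(1)
phi, n = 0.6, 20000
x, prev = [], 0.0
for _ in range(n):
    prev = phi * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

d1 = zero_crossings(x)                      # the only statistic kept from the data
pole_estimate = rho1_from_crossings(d1, n)  # for AR(1), rho(1) equals the pole
```

Higher-order crossings would repeat the count after applying difference filters to the series, giving further constraints for models with more than one pole.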
Advanced Air Mobility (AAM) operations are expected to significantly increase aerial traffic in urban airspace, requiring autonomous traffic management systems to ensure collision-free operations in highly congested environments. In this paper, we propose a multi-agent coordination framework that uses minimum time-to-reach (TTR) as a unifying metric for priority assignment, temporal separation, and safety filtering. We focus on the problem of coordinating multiple aerial vehicles merging into an air corridor while maintaining safe separation between vehicles. Vehicles are assigned arrival-consistent priority based on TTR, and target TTR values are used to enforce temporal spacing that induces spatial separation. A priority-consistent safety filtering layer based on Hamilton-Jacobi reachability value functions ensures collision avoidance while minimally modifying the reference guidance. Simulation results in a highly congested corridor merging scenario show that the proposed method improves safety, fairness, and efficiency compared to time-optimal guidance and priority-agnostic safety filtering.
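A rough sketch of TTR-based priority assignment and temporal spacing is given below. It assumes each vehicle's minimum TTR is already available (in the paper this would come from reachability analysis); the function name, the vehicle IDs, and the fixed `min_gap` spacing rule are hypothetical.

```python
def assign_target_ttr(ttr_by_vehicle, min_gap):
    """Order vehicles by minimum TTR and space their target TTRs by min_gap."""
    # Ties are broken by vehicle ID so the ordering is deterministic.
    order = sorted(ttr_by_vehicle, key=lambda v: (ttr_by_vehicle[v], v))
    targets = {}
    previous = None
    for vehicle in order:
        earliest = ttr_by_vehicle[vehicle]
        # A vehicle cannot arrive before its own minimum TTR,
        # nor closer than min_gap behind the vehicle ahead of it.
        target = earliest if previous is None else max(earliest, previous + min_gap)
        targets[vehicle] = target
        previous = target
    return order, targets


order, targets = assign_target_ttr(
    {"uav_a": 10.5, "uav_b": 10.0, "uav_c": 14.0}, min_gap=2.0
)
```

Here `uav_a` is delayed from 10.5 s to 12.0 s to keep a 2 s gap behind `uav_b`, while `uav_c` keeps its own minimum TTR because the gap is already satisfied.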
Wi-Fi-based human activity recognition (HAR) has emerged as a promising approach for contactless sensing, leveraging channel state information (CSI) collected from wireless transceivers. While existing studies have primarily concentrated on single-user scenarios, real-world deployments often involve multi-user settings where concurrent users' movements induce overlapping CSI patterns that challenge conventional classification methods. To address this limitation, this paper introduces an attention-based multi-user activity recognition (AMAR) framework that formulates HAR as a set prediction problem. The transformer-based architecture in AMAR leverages learnable query embeddings acting as specialized activity detectors, enabling the simultaneous identification of multiple activities from composite CSI representations. Moreover, to address deployment constraints, AMAR adopts an edge-cloud split architecture in which lightweight convolutional networks on edge devices perform initial feature extraction, followed by residual vector quantization that achieves substantial bandwidth reduction while preserving activity-discriminative information. The cloud component performs final activity prediction through attention-based set matching, enabling the system to handle varying occupancy levels. Across classroom, meeting-room, and empty-room environments, AMAR on average nearly doubles the rate of perfectly predicting all concurrent activities compared to the best baseline. Moreover, it achieves an $F_1$-score of 53.4% compared to 45.6% for the best benchmark, and reduces occupancy estimation error by 74%, while substantially reducing bandwidth.
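Residual vector quantization, the compression step mentioned above, can be sketched in a few lines. The codebooks and the feature vector here are toy values chosen for illustration, not anything from AMAR; in practice the codebooks would be learned and only the stage indices would be sent from edge to cloud.

```python
def nearest(codebook, vector):
    """Return (index, codeword) of the codeword closest to vector."""
    def sq_dist(codeword):
        return sum((c - v) ** 2 for c, v in zip(codeword, vector))
    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]


def rvq_encode(vector, codebooks):
    """Quantize stage by stage; each stage codes the previous stage's residual."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        index, codeword = nearest(codebook, residual)
        indices.append(index)
        residual = [r - c for r, c in zip(residual, codeword)]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected codewords from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out


# Hypothetical two-stage codebooks for 2-D features.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # fine stage
]
feature = [1.25, 1.0]
indices = rvq_encode(feature, codebooks)
reconstruction = rvq_decode(indices, codebooks)
```

Only the short list of indices needs to cross the network, which is where the bandwidth saving comes from.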
Thermal management is a major challenge in next-generation high-performance computing systems, particularly for heterogeneous multi-chip packages such as the NVIDIA GB200 Grace Blackwell Superchip. In this work, a physics-based computational framework is developed to optimize embedded cooling channel layouts for high-power multi-chip modules. The model couples steady-state heat conduction with a porous-media representation of coolant transport and a row-wise coolant energy balance to estimate chip temperature fields within microchannel networks. Unlike conventional designs, an interdigitated cooling architecture is parameterized using geometric variables, including channel count, width, and expansion over chip regions, enabling systematic design exploration. To enable efficient optimization, a surrogate-based approach is employed to approximate the relationship between geometric parameters and temperature metrics. The resulting model is optimized using a mixed-integer quadratic programming algorithm to minimize a weighted objective based on peak and average chip temperatures. To improve physical relevance, channel placement is further constrained to increase cooling coverage near GPU regions, where thermal loads are highest. The framework is applied to a representative multi-chip configuration based on the NVIDIA GB200 architecture, consisting of two graphics processing units and one central processing unit. The results demonstrate that the optimal design reduces the peak chip temperature by 140.45°C and the average chip temperature by 35.87°C compared to the baseline configuration.
This paper introduces Locally Adaptive Neural Context Estimation (LANCE), a novel extension for overfitted image compression (OIC) frameworks like Cool-Chic. While traditional OIC methods rely on lightweight autoregressive networks with globally signaled parameters, they struggle with non-stationary image statistics. LANCE addresses this by incorporating a forward-signaled spatial hyperprior that enables regional adaptation of the entropy model. To minimize overhead, we employ a predictive coding scheme that combines a static Median Edge Detector (MED) with a lightweight learned context model. Experiments demonstrate that LANCE achieves BD-rate reductions of 1.40% on the Kodak dataset and 1.97% on CLIC 2020 over Cool-Chic 4.0 at the high end of our decoder complexity range of 606-1481 MAC/pixel. At the low end of the complexity range, we outperform Cool-Chic 4.0 by 2.41% and 2.99% on Kodak and CLIC, respectively. Qualitative analysis reveals that the learned spatial hyperprior effectively segments image regions into areas of similar image statistics, providing an automated, content-aware adaptation layer.
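The static Median Edge Detector mentioned above is the fixed predictor known from LOCO-I/JPEG-LS. A minimal sketch follows; how LANCE combines it with the learned context model is not shown, and the argument names are chosen here for readability.

```python
def med_predict(left, above, upper_left):
    """Median Edge Detector: predict a sample from three causal neighbours."""
    if upper_left >= max(left, above):
        # Likely edge: fall back to the smaller neighbour.
        return min(left, above)
    if upper_left <= min(left, above):
        # Likely edge in the other direction: use the larger neighbour.
        return max(left, above)
    # Smooth region: planar prediction.
    return left + above - upper_left
```

For example, `med_predict(10, 20, 15)` takes the planar branch, while `med_predict(10, 20, 25)` and `med_predict(10, 20, 5)` take the two edge branches.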
Conventional cardiac cine MRI relies on breath-hold Cartesian acquisitions, which are vulnerable to motion artifacts and can be uncomfortable or infeasible, particularly for pediatric and other noncompliant patients who cannot reliably hold their breath. Free-breathing radial acquisitions can alleviate these limitations, but robust reconstruction at high acceleration remains challenging due to prominent streak artifacts. To address these limitations, we propose Cine-DL, a clinically oriented framework that couples targeted k-space preprocessing with fast, model-based deep reconstruction. In this pipeline, raw free-breathing radial data undergo retrospective cardiac binning and respiratory gating to resolve cardiac phases and discard motion-corrupted spokes. We then introduce Streak Optimized Coil Compression (SOC), which explicitly preserves cardiac signals while suppressing peripheral interference that typically drives the streak artifacts. The resulting 2D+t cine series is reconstructed with an unrolled network that alternates a ResNet proximal operator with physics-based data consistency updates solved via conjugate gradient. We further employ a memory-efficient training strategy that reduces peak memory usage. We evaluate Cine-DL on free-breathing volunteer data against established baselines (k-t SENSE and iGRASP) and demonstrate clinical translation via hospital deployment on newly acquired patient data. Our experiments show that Cine-DL consistently improves quantitative metrics and visual fidelity, supporting a practical route toward routine, time-sensitive clinical adoption of free-breathing cine MRI.
Recent advances in spoken dialogue language models have shifted from turn-based to full-duplex designs, where the model continuously listens to the user while generating responses. However, existing duplex backbones still lack a native channel for in-conversation planning and tool calling, leaving real-time agentic behaviour either tied to turn boundaries or relegated to an external cascade. We propose DuplexSLA, a native full-duplex Speech-Language-Action foundation model that decodes assistant audio together with a structured action stream on a shared 160 ms chunk timeline. DuplexSLA is built on a dual-stream three-channel formulation: a continuous user audio channel, a discrete assistant audio channel, and a rate-limited textual action channel, all decoded jointly by a single backbone, so that listening, speaking, planning, and tool calling unfold on one shared clock. Two capabilities define the model: (1) semantic-driven turn-taking control, where interruption, pause, and backchannel are handled inside the same backbone instead of by an external semantic VAD; and (2) in-conversation planning and tool calling, where planning text and structured tool calls are emitted on the action channel without halting assistant audio, so that multi-action and backchannel-triggered tool use are interleaved with ongoing speech. To evaluate these capabilities together, we further construct DuplexSLA-Bench, a duplex benchmark covering pause, interrupt, and backchannel turn-taking together with three styles of in-conversation tool calling. Our project page, interactive demos, and the DuplexSLA-Bench evaluation suite are publicly available at this https URL.
Recent advances in text-to-speech (TTS) models show impressive speech naturalness and quality, yet the role of large-scale open data in driving this progress remains underexplored. In this work, we introduce Raon-OpenTTS, an open TTS model that performs competitively with state-of-the-art closed-data TTS models, and Raon-OpenTTS-Pool, a large-scale open dataset for reproducible TTS training. Raon-OpenTTS-Pool consists of 615K hours of 240M speech segments aggregated from publicly available English speech corpora and web-sourced recordings. With a model-based filtering pipeline applied to Raon-OpenTTS-Pool, we derive Raon-OpenTTS-Core, a curated, high-quality subset of 510K hours and 194M speech segments. Using Raon-OpenTTS-Core, we train Raon-OpenTTS, a series of diffusion transformer (DiT)-based TTS models from 0.3B to 1B parameters. On multiple benchmarks, Raon-OpenTTS-1B shows comparable performance to state-of-the-art models such as Qwen3-TTS and CosyVoice 3, which are trained on several million hours of proprietary speech data. Notably, on Seed-TTS-Eval, Raon-OpenTTS-1B achieves a word error rate (WER) of 1.78% and a speaker similarity (SIM) of 0.749, ranking second on WER and first on SIM among recent open-weight TTS baselines. On CV3-Hard-EN, Raon-OpenTTS-1B achieves a WER of 6.15% and a SIM of 0.775, ranking first on both metrics. Furthermore, to support robust evaluation, we introduce Raon-OpenTTS-Eval, a structured benchmark for assessing TTS robustness across diverse acoustic conditions including clean, noisy, in-the-wild, and expressive speech. On Raon-OpenTTS-Eval, Raon-OpenTTS-1B achieves the best average WER and SIM among all evaluated models, and the second-best human preference, as measured by comparative mean opinion score (CMOS). Our data pool, filtering pipeline, training code, and checkpoints are publicly available at this https URL.
Predicting Room Impulse Responses (RIRs) remains a challenge due to the high dimensionality of audio signals and the need for perceptual accuracy. This paper introduces a neural network framework that predicts multi-band Energy Decay Curves (EDCs) directly from room geometry and material properties. Unlike standard models, our framework employs a custom composite loss function that optimizes for both energy levels and decay slopes in the log-domain. This ensures the predicted curves adhere to physical decay principles while maintaining high sensitivity to reverberation time and early reflections. Results demonstrate that the model successfully approximates ground-truth acoustics with minimal error in T30 and clarity indices. The approach offers a computationally efficient alternative to traditional simulations, facilitating realistic audio rendering for interactive virtual environments.
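To make the target quantity concrete, the sketch below computes an Energy Decay Curve by Schroeder backward integration and evaluates a log-domain loss with a level term and a slope term. The loss is only a guess at the general shape of such a composite objective: its exact terms, the `slope_weight` parameter, and the synthetic impulse response are assumptions, not the paper's definition.

```python
import math


def edc_db(rir):
    """Schroeder backward integration, normalised to 0 dB at the first sample."""
    energy = 0.0
    tail = []
    for sample in reversed(rir):
        energy += sample * sample
        tail.append(energy)
    tail.reverse()
    return [10.0 * math.log10(e / tail[0]) for e in tail]


def composite_loss(pred_db, true_db, slope_weight=1.0):
    """Mean absolute error on dB levels plus mean absolute error on dB slopes."""
    level = sum(abs(p - t) for p, t in zip(pred_db, true_db)) / len(true_db)
    pred_slope = [b - a for a, b in zip(pred_db, pred_db[1:])]
    true_slope = [b - a for a, b in zip(true_db, true_db[1:])]
    slope = sum(abs(p - t) for p, t in zip(pred_slope, true_slope)) / len(true_slope)
    return level + slope_weight * slope


# Synthetic exponentially decaying impulse response (illustrative only).
rir = [0.9 ** n for n in range(200)]
curve = edc_db(rir)
shifted = [value - 1.0 for value in curve]  # same slope, 1 dB lower level
```

A curve that is offset by a constant, like `shifted`, is penalised only by the level term, whereas a curve with the wrong decay rate would also be penalised by the slope term.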
Most neural video codecs rely on temporal conditioning, which makes them susceptible to error propagation over long sequences. While Transformer-based architectures like the VCT offer a drift-free alternative, they suffer from high computational complexity and inferior RD performance. The recent SWA addresses these shortcomings by reducing complexity and enhancing RD performance, yet it restricts decoding to a strictly sequential raster-scan order, creating a critical bottleneck in decoding latency. To resolve this, we propose P-SWA, utilizing diagonal wavefronts to enable parallel decoding. By embedding a hyperprior and introducing an accumulator to fuse side information and local spatial context, our method increases decoding speed by 36% over the parallel VCT. Simultaneously, it achieves Bjøntegaard Delta-rate savings of up to 10.0% for I-frames and 7.1% for P-frames over the SWA baseline.
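The idea of replacing raster-scan decoding with diagonal wavefronts can be sketched as a scheduling problem. The example assumes each position depends only on its left and top neighbours, so all positions on an anti-diagonal are independent; the actual dependency pattern and wavefront slope used by P-SWA are not given in the abstract.

```python
def diagonal_wavefronts(rows, cols):
    """Group grid positions into anti-diagonals that can be decoded in parallel."""
    fronts = [[] for _ in range(rows + cols - 1)]
    for r in range(rows):
        for c in range(cols):
            # Position (r, c) only needs (r, c-1) and (r-1, c), both on front r + c - 1.
            fronts[r + c].append((r, c))
    return fronts


fronts = diagonal_wavefronts(2, 3)
sequential_steps = 2 * 3       # raster scan: one position per step
parallel_steps = len(fronts)   # wavefront: one anti-diagonal per step
```

For an R x C grid this reduces the number of decoding steps from R*C to R + C - 1 under the stated dependency assumption.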
Reasoning has become a defining capability of modern foundation models, yet its development in the audio modality remains limited. Audio poses challenges that are distinct from those of text and vision. It is continuous, temporally dense, and contains linguistic, paralinguistic, and environmental information at multiple time scales. As a result, audio reasoning models must align acoustic signals with the discrete semantic space of large language models, while still preserving fine-grained information needed for reliable inference. Progress is also limited by three major obstacles: the scarcity of genuinely audio-grounded reasoning data, shortcut learning and modality hallucination, and the tension between reasoning depth and real-time latency in spoken interaction. In this paper, we present the first dedicated survey of audio reasoning. We provide a unified formulation that distinguishes direct predictive modeling from reasoning-augmented generation, review the architectural and training foundations of audio reasoning models, and systematically organize recent advances in Audio-to-Text, Audio-to-Speech, Audio-Visual Reasoning and Agentic Audio Reasoning. We further examine emerging paradigms such as Chain-of-Thought prompting, supervised fine-tuning, reinforcement learning, and latency-aware spoken interaction, and discuss evaluation practices, open challenges, and future directions. Our goal is to offer a coherent roadmap for developing robust, efficient, and natively grounded audio reasoning systems.
Multiple-input multiple-output (MIMO) radar has waveform diversity and large spatial degrees of freedom (DoFs), making it attractive for high-resolution sensing. Scaling MIMO radar to massive arrays can further improve sensing performance, but it also increases hardware cost, power consumption, and digital processing complexity. The microwave linear analog computer (MiLAC) can tackle these challenges by moving linear operations from the digital domain to the analog domain. MiLAC has shown promising benefits for communications in recent studies, and this paper identifies its potential for radar sensing. Specifically, we consider both MiLAC-aided transmit beamforming and receiver-side two-dimensional discrete Fourier transform (2D-DFT)-based direction-of-arrival (DoA) estimation. For transmit beamforming, we formulate a weighted Cramér-Rao bound (CRB) minimization problem under lossless and reciprocal MiLAC constraints and propose a penalty dual decomposition (PDD)-based iterative algorithm to address the non-convex problem. We further prove that MiLAC-aided and fully-digital beamforming achieve the same CRB. For receiver processing, we show that the 2D DFT can be implemented by a lossless reciprocal MiLAC, which enables analog-domain DoA estimation without digital optimization. Numerical results confirm the theoretical finding and show that the MiLAC-aided approach achieves the same CRB and DoA estimation performance as the fully-digital benchmark. Meanwhile, hardware cost and power consumption are reduced because only low-resolution DACs are required at the transmitter, while RF chains and ADCs are eliminated at the receiver. Moreover, performing the 2D DFT in the analog domain eliminates all digital DFT operations for DoA estimation.
In this paper, we propose an end-to-end transcoding pipeline to create 3D Gaussian splatting (3DGS) models from existing 3D plenoptic point cloud or mesh models when the original multi-view images of the captured 3D object or scene are not available. We also propose a custom initialisation to guide the 3DGS model learning, with constraints to ensure that the final 3DGS model aligns closely with the input point cloud or mesh surface. Tests on a high-quality, standard plenoptic point cloud dataset show that our pipeline produces 3DGS models of high visual quality, with far fewer splats than there are points in the original dense point clouds. Additionally, our custom initialisation leads to much faster convergence and a cleaner surface representation than the default SfM-based initialisation that is typically used for 3DGS model learning.
This paper proposes a joint alignment and denoising method for event-based vision sensors (EVSs). Existing signal processing methods for EVSs typically perform event alignment (EA) and event denoising (ED) as separate modules. However, this separation creates a dilemma: without ED, EA is biased by noise, whereas without EA, ED struggles to distinguish signal events from noise events. To address this dilemma, we jointly optimize EA and ED by formulating a bi-objective Pareto optimization problem. Our formulation is built upon a contrast map that counts the number of events localized in each pixel. With a contrast map, we can formulate EA as maximizing its variance and ED as minimizing the variance. We cast these two conflicting problems as a Pareto optimization and use a regret strategy to obtain a solution. Experimental results on denoising and motion estimation demonstrate that our method improves on alternative methods.
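The contrast map and its variance can be illustrated with a toy alignment example. The sketch assumes a constant-velocity warp and nearest-pixel accumulation, both common choices but not confirmed by the abstract; it covers only the alignment objective, not the denoising objective or the Pareto/regret solution.

```python
def contrast_map(events, velocity, width, height):
    """Count events per pixel after warping each event back to time zero."""
    vx, vy = velocity
    counts = [[0] * width for _ in range(height)]
    for x, y, t in events:
        wx = round(x - vx * t)
        wy = round(y - vy * t)
        if 0 <= wx < width and 0 <= wy < height:
            counts[wy][wx] += 1
    return counts


def variance(counts):
    """Population variance of the per-pixel event counts."""
    values = [v for row in counts for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Toy data: one point moving at 1 pixel per time unit along x.
events = [(float(t), 0.0, float(t)) for t in range(5)]
aligned = variance(contrast_map(events, (1.0, 0.0), width=8, height=1))
misaligned = variance(contrast_map(events, (0.0, 0.0), width=8, height=1))
```

With the correct velocity all five events collapse onto one pixel, so the map is sharp and its variance is high; with the wrong velocity they stay spread out and the variance drops.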
Existing Synthetic Aperture Radar (SAR) image generation methods still lack reliable controllability over key imaging parameters, particularly azimuth angle, depression angle, and polarization mode. Our preliminary GeoDiff-SAR supported limited azimuth completion, but remained ineffective for large missing azimuth sectors and did not provide unified control over multiple imaging conditions. To address this problem, we propose GeoDiff-SAR II, a 3D model-guided decoupled framework for controllable SAR image generation. The proposed framework imposes controllability through physically grounded geometric-electromagnetic cues rather than image intensity alone. We introduce a Geometric-Electromagnetic Conditioning Map (GECM), a structured intermediate representation that encodes the target pose map and dominant scattering centers, thereby decoupling macroscopic geometry from microscopic scattering responses. During training, GECMs are derived from real sparse-azimuth SAR images. During inference, the same representation is rendered directly from a 3D CAD model under specified azimuth, depression angle, and polarization conditions, enabling physically consistent control across large viewpoint gaps. The imaging parameters are further converted into text conditions, while the GECM is injected through ControlNet to provide explicit spatial guidance. Combined with Low-Rank Adaptation (LoRA) on a FLUX backbone, the proposed framework unifies geometric-electromagnetic conditioning and parameter-aware generation within a single process. Experiments on simulated and real datasets demonstrate controllable generation over key SAR imaging parameters, stable generalization across large azimuth gaps, and consistent improvements in image fidelity, physical consistency, and downstream Automatic Target Recognition (ATR) performance.
We propose a deep beamforming framework for enhancing target speaker(s) in multi-speaker environments. A deep neural network (DNN) is trained to estimate beamforming weights directly from noisy multichannel inputs while satisfying linear spatial constraints through an adaptive multi-term loss inspired by the augmented Lagrangian framework. The loss combines signal reconstruction with penalties that enforce a distortionless response toward the target and suppress the interference subspace. The model is further guided by the target relative transfer function (RTF) and the estimated interference subspace. The proposed model can direct a beam toward the target speaker while directing nulls toward the interfering sources, achieving superior overall enhancement performance compared with the classical LCMV beamformer constructed from the same estimated spatial signatures. Furthermore, compared with the LCMV beamformer, the proposed model produces more controlled sidelobes and improved background-noise attenuation.
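For reference, the classical LCMV baseline mentioned above has a closed-form solution. The notation is assumed here: $\mathbf{R}$ is the noisy-input spatial covariance, $\mathbf{C}$ stacks the target RTF and a basis of the interference subspace as columns, and $\mathbf{g}$ is the desired response vector.

```latex
% LCMV: minimise output power subject to linear spatial constraints
\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
\quad \text{s.t.} \quad \mathbf{C}^{H}\mathbf{w} = \mathbf{g},
\qquad \mathbf{g} = [\,1,\ 0,\ \dots,\ 0\,]^{T}

% Closed-form solution
\mathbf{w}_{\mathrm{LCMV}}
  = \mathbf{R}^{-1}\mathbf{C}\,\bigl(\mathbf{C}^{H}\mathbf{R}^{-1}\mathbf{C}\bigr)^{-1}\mathbf{g}
```

The first constraint gives the distortionless response toward the target, and the remaining ones place nulls on the interference subspace; the proposed loss enforces the same constraints softly through penalty terms instead of in closed form.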
This letter proposes a network-wide coordinated optimization model to mitigate voltage unbalance (VU) by unleashing the remaining capacity of community inverter-based resources (IBRs). Existing single-sequence strategies ignore coupled capacity constraints, leaving headroom idle, and fail to harness the collective governance capabilities of community IBRs. To close this gap and exploit the unused potential, we develop a sequence-domain network model in dual commonly shared synchronous reference frames. Strict phase current and apparent power limits are formulated and convexified via polyhedral approximations. A quadratic objective function flexibly balances sequence capacity allocation. Simulation and experimental results validate the effectiveness of the proposed strategy.
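A polyhedral approximation of a magnitude limit such as the apparent power bound can be sketched as a set of linear half-planes. The example builds a regular polygon inscribed in the circle of radius `s_max`, which is a conservative (inner) approximation; whether the letter uses an inner or outer polygon, and how many sides, is not stated, so those choices are assumptions.

```python
import math


def polyhedral_limit(s_max, n_sides):
    """Half-planes a*p + b*q <= c forming a regular polygon inscribed in p^2 + q^2 <= s_max^2."""
    offset = s_max * math.cos(math.pi / n_sides)  # apothem of the inscribed polygon
    constraints = []
    for k in range(n_sides):
        angle = 2.0 * math.pi * k / n_sides
        constraints.append((math.cos(angle), math.sin(angle), offset))
    return constraints


def satisfies(constraints, p, q, tol=1e-9):
    """Check an operating point (p, q) against every linear constraint."""
    return all(a * p + b * q <= c + tol for a, b, c in constraints)


limits = polyhedral_limit(s_max=1.0, n_sides=12)
```

Increasing `n_sides` tightens the approximation at the cost of more linear constraints, which keeps the overall problem a quadratic program with linear constraints.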
Against the backdrop of the burgeoning global low-altitude economy, countries have successively introduced a series of policies to accelerate the application and commercialization of electric vertical take-off and landing (eVTOL) aircraft. Nevertheless, purely electric eVTOLs confront constraints including limited battery energy density, high operational power requirements, and challenges associated with rapid energy replenishment, which collectively restrict their flight endurance and application scenarios. Furthermore, while eVTOL deployment is scaling up, supporting charging infrastructure and regulations remain underdeveloped. This situation presents emerging power distribution networks with new challenges in maintaining adequate electricity supply and ensuring operational continuity. To tackle these issues, following an investigation into battery energy replenishment strategies, a closed-loop supply chain-based model for eVTOL battery charging and swapping is proposed. Time-space network methods are utilized to characterize the scheduling of batteries and logistics throughout the system. Subsequently, aiming to maximize the operational revenue of the model, optimized management of battery swapping, transportation, and charging processes is implemented, facilitating coordinated operation among eVTOLs, swapping stations, and charging stations. Finally, the model is solved by Gurobi, verifying its feasibility. Simulation results further indicate that the model alleviates range anxiety for eVTOLs, offering strong support for their commercialization. Moreover, it enables coordinated scheduling between eVTOLs and the distribution network, thereby facilitating the network's gradual improvement and upgrading.
In this work, we present an efficient and practically implementable approach for applying reinforcement learning (RL)-based control in chemical process systems. This area has yet to widely adopt RL-based control, largely due to inherent challenges in trusting RL algorithms and the time-consuming process of training reliable agents. To address these challenges, we leverage a class of RL algorithms termed Y-wise Affine Neural Network (YANN)-RL, which we developed in our prior work (Braniff and Tian, 2025a). By strategically initializing actor and critic networks, YANN-RL algorithms provide confident and interpretable starting points within control schemes. We apply this RL-based control approach to three different process engineering case studies publicly available in the PC-Gym library (Bloor et al., 2026): (i) a continuous stirred tank reactor (CSTR), (ii) a four-tank system, and (iii) a multistage extraction column. Our approach is compared to several popular RL algorithms (PPO, SAC, DDPG, and TD3) and is benchmarked against nonlinear model predictive control (NMPC). These case studies demonstrate that YANN-RL can greatly reduce the training time and data needed, can be deployed with confidence for chemical process systems, and can approach the performance of NMPC without knowledge of a full nonlinear model.
Retinal vessel analysis is a procedure that can be used to assess risks to the eye. This work proposes an unsupervised multimodal approach that improves the response of the Frangi filter, enabling automatic vessel segmentation. We propose a filter that computes pixel-level vessel continuity while introducing a local tolerance heuristic to fill in vessel discontinuities produced by the Frangi response. This proposal, called the local-sensitive connectivity filter (LS-CF), is compared against a naive connectivity filter, the baseline thresholded Frangi filter response, the naive connectivity filter response combined with morphological closing, and current approaches in the literature. The proposal achieved competitive results on a variety of multimodal datasets. It was robust enough to outperform all the state-of-the-art approaches in the literature on the OSIRIX angiographic dataset in terms of accuracy and 4 out of 5 works on the IOSTAR dataset, while also outperforming several works on the DRIVE and STARE datasets and 6 out of 10 works on the CHASE-DB dataset. On CHASE-DB, it also outperformed all the state-of-the-art unsupervised methods.
Motor-imagery (MI) EEG can be classified using supervised machine learning techniques such as Linear Discriminant Analysis applied to features extracted by Common Spatial Patterns. Performance of these models varies widely, possibly because MI studies commonly use different post-cue time windows and frequency bands. This study aims to assess how the simultaneous optimisation of both these parameters impacts MI classification performance. This is done by iteratively training and testing a series of subject-specific models on different combinations of frequency bandwidth and time window options across 109 subjects. This is followed by a statistical analysis using repeated measures ANOVA to uncover significant differences between different bandwidths and time windows in terms of accuracy across the cohort. The resulting visualisations and statistical tests show that there are, indeed, significant differences between both specific time windows and specific bandwidths in terms of accuracy. While the comparison of classification accuracies across 23 frequency bandwidths during five different time windows demonstrates an optimal temporal and spectral scale combination of (0, 4) s at the range of (4, 12) Hz across all subjects, the subjects demonstrate similar accuracies for other parameter combinations. These findings highlight the efficacy of personalised models to detect optimal temporal and spectral parameter combinations to best classify MI EEG signals that inherently vary across subjects.
Automatic subjective speech quality assessment (SSQA) traditionally estimates speech quality on an utterance or system level. While this resolution was adequate for older transmission or synthesis systems that produced speech signals of mediocre quality, modern systems generate high-quality speech with degradations that may occur only locally. With suitable model architectures and regularization losses, SSQA models trained with utterance-level targets can also yield useful local predictions of speech quality. In this work, we extend such models to produce frame-level embeddings that cluster by degradation type. Specifically, we employ a partial mix-up strategy on a parallel corpus of clean and degraded utterances and apply a contrastive loss to distinguish between degradation types. Through experiments on both in- and out-of-domain data, we demonstrate that our approach improves degradation detection and enables the identification of degradation types by analyzing embedding clusters.
This study investigates radar technology for non-invasive brain imaging and tumour detection, offering an alternative to MRI and CT scans. Using Ansys HFSS to simulate electromagnetic interactions in brain tissues, we evaluate the penetration, signal strength, and safety of Patch and Vivaldi antennas. Results show Patch antennas are optimal for tumour localization, while Vivaldi antennas suit broader scanning applications. Although promising for safer, more accessible imaging, especially in resource-limited environments, further research with diverse models and actual patient data is essential to advance this technology in non-invasive medical diagnostics.
Distribution networks are transitioning from passive to active systems due to the growing integration of distributed energy resources (DERs). Peer-to-peer (P2P) energy trading has emerged as a viable framework that enables local energy exchange among participants, represented here as aggregated microgrids (MGs). Incorporating network constraints is essential to ensure that P2P transactions remain physically feasible and consistent with the grid's operating limits. However, existing P2P frameworks still lack advanced predictive mechanisms that allow prosumers to anticipate network feasibility or the distribution system operator (DSO) response during trade formulation. This paper proposes a learning-augmented P2P and DSO interface that predicts the DSO's response to the proposed P2P trades, allowing prosumers to self-assess and refine their trading decisions. A supervised transformer-based regression model is trained to enable MGs to locally predict the DSO's response without sharing their proposed trades, thereby reducing transaction overhead, alleviating DSO burden, and preserving information privacy. The proposed framework is validated on the modified IEEE 33-bus distribution power system with interconnected microgrids. Case studies are presented to validate the effectiveness of the proposed model in terms of market efficiency, trade acceptance, and computational burden.
This paper introduces a systematic method for designing robust linear controllers using output feedback in the presence of operational constraints. The design uses Nagumo's Theorem and the Comparison Lemma to guarantee constraint satisfaction, while incorporating min-norm optimal control principles inspired by Control Barrier Functions. The resulting controller is a continuous piecewise-linear output feedback policy that preserves the closed-loop system's analyzability using linear systems theory. Due to the linear control design, multi-input multi-output (MIMO) robustness margins can be derived with and without active operational constraints. This paper shows that operational constraints on the system's state can be satisfied using an observer-based output feedback control design. Through flight control trade studies, we demonstrate the practical relevance of the framework in safety-critical aircraft control applications.
This paper presents a low-complexity, model-free, output-feedback controller for a class of unknown time-varying nonlinear systems with unknown input constraints. The controller achieves the preset control accuracy when the actuator is not saturated and maintains flexible control accuracy after actuator saturation. This result extends existing constraint control methods for linear manifolds to a more general form, including the construction of nonlinear manifolds and various types of constraints, thereby achieving preset control accuracy within finite or fixed time. Additionally, flexible control under unknown saturation is achieved through the construction of an error-driven flexible constraint. Finally, second-order and higher-order control examples and simulations are provided.
Quadratic constraints (QCs) are widely used to characterize nonlinearities and uncertainties, but generic analytical characterizations can be conservative on bounded domains. This paper develops a framework for constructing verified quadratic characterizations of scalar relations in the two-dimensional real plane. Candidate quadratic inequalities are locally generated by solving convex quadratic programs using samples from the relation and exterior sample points. They are then verified globally using sum-of-squares certificates over an exact semialgebraic description or, in the case of nonpolynomial relations, over relaxed polynomial descriptions. The resulting verified constraints define a sound overapproximation of the scalar relations over the considered domains. These constraints are directly compatible with existing analysis frameworks based on QCs and pointwise integral quadratic constraints (IQCs) for static nonlinearities and uncertainties, and they can also be embedded in QC-based semidefinite programs for reachability and safety analysis of feedforward neural networks. For smooth activations such as $\tanh$, the method yields domain-dependent quadratic characterizations that constitute an alternative to generic sector- or slope-based descriptions. For ReLU networks, we give methods to reduce conservatism in QC-based reachability analysis of feedforward networks by exploiting dependencies between neurons and tighter local bounds. Numerical examples demonstrate improved reachability results for smooth activations, reduced conservatism for ReLU networks, and applicability beyond neural networks through an example involving saturation.
Modern artificial intelligence systems require calibrated uncertainty estimates that remain reliable in sequential and non-stationary environments. Online conformal prediction (OCP) addresses this challenge through adaptively updated prediction sets that provide deterministic long-run miscoverage guarantees. These guarantees, however, hinge on the assumption of perfect feedback about the coverage of past prediction sets. In practice, the observed miscoverage indicator may be corrupted by noise, communication failures, or adversarial manipulation, which can severely degrade OCP's calibration guarantees. In this paper, we study OCP under corrupted feedback. We first model feedback corruption as an arbitrary binary flip sequence, and analyze how feedback corruption affects and degrades the miscoverage performance of standard OCP. We then propose two robust schemes: robust OCP via filtering, which leverages the structural properties of the predicted threshold to filter corrupted feedback, and robust OCP via active compensation, which incorporates an active compensation mechanism to mitigate the effect of corrupted feedback. For both methods, we establish explicit miscoverage guarantees, which are further specialized for an independent stochastic flip model and for an arbitrary error model with memory bounds. Experiments on real-world datasets validate the proposed approach, showing markedly improved calibration and significantly smaller prediction sets compared with baseline OCP methods under corrupted feedback.
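To make the effect of flipped feedback concrete, the following is a minimal sketch rather than the authors' method. It assumes the commonly used online threshold update q_{t+1} = q_t + eta * (err_t - alpha), which the abstract does not state; the function name `ocp_thresholds`, the step size `eta`, and the toy scores are hypothetical. The robust filtering and compensation schemes are not reproduced here.

```python
def ocp_thresholds(scores, alpha, eta, q0=0.0, flips=None):
    """Run a standard online conformal threshold update (assumed form).

    scores: nonconformity scores s_t; the set covers y_t iff s_t <= q_t.
    flips:  optional 0/1 sequence; a 1 flips the observed miscoverage bit.
    Returns (thresholds used at each step, true miscoverage indicators).
    """
    flips = flips or [0] * len(scores)
    q, thresholds, true_errors = q0, [], []
    for s, f in zip(scores, flips):
        thresholds.append(q)
        err = 1 if s > q else 0      # true miscoverage indicator
        observed = err ^ f           # corrupted feedback seen by the learner
        true_errors.append(err)
        q = q + eta * (observed - alpha)  # update driven by the observed bit
    return thresholds, true_errors


if __name__ == "__main__":
    scores = [1.0, 1.0, 1.0]
    print(ocp_thresholds(scores, alpha=0.5, eta=1.0))
    print(ocp_thresholds(scores, alpha=0.5, eta=1.0, flips=[1, 0, 0]))
```

In this toy run a single flipped bit pushes the threshold in the wrong direction, so the learner miscovers on a step where clean feedback would have produced coverage.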
We present NeuroQA, a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI), with 56,953 QA pairs from 12,977 subjects across 12 datasets. It spans ages 5-104 and five clinical domains: Alzheimer's, Parkinson's, tumors, white matter disease, and neurodevelopment. Unlike prior medical Visual Question Answering (VQA) efforts that operate on 2D slices or rely on narrow diagnostic labels, NeuroQA pairs every item with a full 3D volume. It evaluates 11 clinically grounded reasoning skills across Yes/No, multiple-choice, and open-ended formats. Of the 203 templates, 131 are image-grounded (answerable from a 3-plane viewer) and 72 are image-informed (ground truth from quantitative volumetry or clinical instruments). To remove text-only shortcuts, we apply answer-distribution refinement, reducing closed-format text-only accuracy from $>$80% to 44.6%; image necessity is assessed separately through an image-grounding protocol released with the benchmark. A 38-rule deterministic pipeline and two rounds of expert review verify every QA pair against FreeSurfer measurements, metadata, or radiology report fields, with zero same-subject contradictions across templates. We conduct a clinician evaluation in which two clinicians independently assess 100 frozen test items on a three-plane viewer. On closed-format (Yes/No + multiple-choice) test-public items, the best zero-shot vision-language model and a supervised 3D CNN baseline reach 47.5% and 43.7% accuracy respectively, both below the 49.4% text-only majority-template floor. NeuroQA adopts a two-tier release with public QA pairs for open-access datasets and reproducible generation scripts for datasets restricted by data use agreements (DUAs), plus subject-level splits, a held-out private test set, and an online leaderboard.
Flexible coupler antenna systems have recently received significant research interest due to their capability to intelligently reconfigure wireless channels by controlling coupler positions and/or rotations and dynamically exploiting mutual coupling. In this paper, we investigate a new type of flexible coupler antenna, termed rotatable coupler antenna (RCA), for enabling spectrum and energy efficient wireless communication cost-effectively. Specifically, an RCA consists of one fixed active antenna and multiple low-cost passive couplers, each of which can independently rotate in three-dimensional (3D) space, so as to collaboratively achieve mechanical beamforming without requiring additional radio-frequency (RF) chains for the couplers. We study an RCA-enhanced point-to-point communication system, where one RCA is deployed at the transmitter to serve a single user equipped with a fixed antenna. Based on multi-port circuit theory, we establish the channel model and characterize the mutual coupling coefficients as a function of coupler rotations. We formulate a new problem to maximize the received signal-to-noise ratio (SNR) at the user by optimizing the 3D rotations of all couplers, subject to practical coupler rotation constraints. To tackle this nonconvex problem, we develop a spherical-cap conditional-gradient-based algorithm with cross-entropy-method initialization. Simulation results demonstrate that the proposed RCA system can significantly improve communication performance in comparison with benchmark schemes, while requiring substantially fewer active antennas and RF chains.
Flexible coupler antenna (FCA) is a new technique that aims to improve the performance of wireless communication networks by smartly translating low-cost passive couplers around fixed-position active antennas to reshape the induced currents on the passive elements for radiation. Specifically, different couplers can independently control their positions/rotations at the transceiver and thereby collaboratively achieve mechanical beamforming for directional signal enhancement or nulling. The position and/or rotation reconfiguration of passive couplers provides a new and cost-effective means of enhancing wireless communication performance, while significantly reducing the antenna and radio-frequency (RF) chain costs of conventional active arrays. The compact and low form-factor structure of the FCA makes it particularly appealing for devices with stringent size, weight, and power (SWAP) constraints. In this article, we provide an overview of FCA to reveal its promising capabilities in wireless networks, including its system modeling, practical implementation, and competitive advantages over existing techniques. We present a variety of FCA-enabled performance enhancements in terms of mechanical beamforming gain, path-loss reduction, fading mitigation, spatial multiplexing gain, interference suppression, and geometric gain. Furthermore, we elaborate on the design challenges of FCA as well as promising solutions, and discuss the key applications of FCA in wireless networks. Finally, numerical results are presented to verify the substantial capacity gains enabled by FCA-aided transmission in wireless networks.
Distributed optimization has found widespread applications in smart grids, optimal control, and machine learning. This paper studies distributed consensus optimization. We extend the Augmented Lagrangian-based Alternating Direction Inexact Newton (ALADIN) framework to propose Consensus ALADIN (C-ALADIN) with a central coordinator, which directly handles consensus constraints. Our C-ALADIN algorithm admits both a first-order variant and a second-order variant that employs a Hessian approximation, avoiding direct transmission of second-order information while preserving fast local convergence. We then develop a decentralized version of C-ALADIN that operates over directed graphs with quantized communication, using a finite-time coordination protocol. For both versions, we establish global convergence guarantees for convex problems and local convergence guarantees for non-convex problems. For the decentralized case, the iterates converge to a neighborhood of the optimum determined by the quantization level. Numerical results demonstrate that our methods retain fast convergence while substantially reducing communication and computational costs compared to existing decentralized approaches.
This work presents E-ReCON, a 16 Kb energy and resource-efficient digital compute-in-memory (DCIM) macro based on a compact 3T1R ReRAM bitcell for edge-AI inference. The proposed bitcell occupies only 0.85 um^2 and supports reliable AND-based in-memory multiplication for both conventional convolutional neural network (CNN) and spiking neural network (SNN) workloads. To reduce accumulation overhead, a novel interleaved 10T/28T adder tree is introduced, reducing transistor count and power consumption by 37% and 28%, respectively, compared to a conventional 28T RCA-based design. Implemented in 65 nm CMOS at 1.2 V, the proposed macro achieves a minimum latency of 0.48 ns, throughput of 2.31-3.1 TOPS, and energy efficiency of up to 419 TOPS/W. When evaluated on LeNet-5, AlexNet, and CNN-8 models, the macro achieves 97.81%, 93.23%, and 96.51% accuracy on MNIST/A-Z, CIFAR10, and SVHN datasets, respectively. In addition, 40% pruning preserves nearly 99.8% of the original accuracy while reducing MAC operations and computation cycles. For SNN-oriented workloads, the proposed AND-type bitcell efficiently supports spike-weight multiplication with low switching activity, where the 2A2W configuration achieves accuracy close to the FP32 baseline across VGG-8, VGG-16, and ResNet-18 networks on CIFAR-10, CIFAR-100, and ImageNet-1K datasets. Compared to prior ADC-based ReRAM-CIM designs, the proposed architecture improves latency and energy efficiency by nearly 30-40% while maintaining robust operation under full PVT and ReRAM variability. Overall, E-ReCON provides a scalable, low-latency, and energy-efficient nvCIM platform for next-generation edge-AI, IoT, biomedical sensing, and neuromorphic applications.
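The AND-based multiplication mentioned above can be illustrated in software. The sketch below is only a functional model under stated assumptions: operands are unsigned, the bit width `bits` is a free parameter, and the names `and_multiply` and `and_mac` are hypothetical. It does not model the 3T1R bitcell, the interleaved adder tree, or any timing or energy behaviour.

```python
def and_multiply(a, w, bits=4):
    """Multiply two unsigned `bits`-wide integers using only AND, shift, add."""
    if not (0 <= a < 1 << bits and 0 <= w < 1 << bits):
        raise ValueError("operands must fit in `bits` unsigned bits")
    total = 0
    for i in range(bits):
        a_i = (a >> i) & 1                   # i-th activation bit
        for j in range(bits):
            w_j = (w >> j) & 1               # j-th stored weight bit
            total += (a_i & w_j) << (i + j)  # 1-bit AND product, then weighted
    return total


def and_mac(acts, weights, bits=4):
    """Multiply-accumulate built from the AND-based products above."""
    return sum(and_multiply(a, w, bits) for a, w in zip(acts, weights))


if __name__ == "__main__":
    print(and_multiply(5, 3))
    print(and_mac([1, 2, 3], [3, 2, 1], bits=2))
```

Each 1-bit AND is the only "multiplication" performed; everything else is shifting and accumulation, which is the part an adder tree would handle in hardware.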
Estimation under model misspecification arises in many signal processing problems, where the assumed observation model deviates from the true data-generating mechanism due to errors or simplifications. The misspecified Cramér-Rao bound (MCRB) is a widely recognized mean-squared-error (MSE) lower bound for this case, which was originally used to describe the asymptotic behavior of the misspecified maximum likelihood (MML) estimator. Despite its widespread use, the MCRB lacks a rigorous characterization of the class of estimators for which it is valid. In this paper, we revisit the theory of parameter estimation under model misspecification and re-examine the foundations of the MCRB. We first demonstrate these limitations and examine a naive version of the MCRB, which relies only on local misspecified unbiasedness. We show that this bound is generally not tight and may be unattainable. To obtain a meaningful bound, we develop a new derivation based on the concept of pointwise equivalent models. By maximizing the naive bound for these models, we recover the classical MCRB, now supported by a constructive derivation, an explicit characterization of the associated estimator class, and an equality condition. This formulation establishes a formal link between local unbiasedness conditions and achievable bounds, offering new insights into the MCRB structure and its relevance to practical estimators. Finally, we define the notion of an efficient misspecified estimator and show that if it exists, it is achieved by the MML estimator.
Effective diabetes management requires continuous monitoring of glycemic levels. Clinically, glycemic control is assessed using metrics such as Time in Range (TIR), Time Below Range (TBR), and Time Above Range (TAR), typically derived from continuous glucose monitoring (CGM). However, many patients rely on self-monitoring of blood glucose (SMBG) due to the high cost and limited accessibility of CGM. Unlike CGM, SMBG provides sparse and irregular measurements, making accurate estimation of these metrics challenging. Conventional supervised learning approaches struggle under such sparsity, leading to poor generalization and unstable performance. To address this, we propose PACD-Net, a self-supervised contrastive knowledge distillation framework for estimating glycemic control from SMBG. Pseudo-SMBG samples with richer temporal coverage are used as teacher signals to guide learning from sparse observations. In addition, multi-view contrastive learning enforces representation consistency across diverse sampling patterns. The model adopts a hybrid Swin Transformer-CNN backbone to capture temporal dependencies in sparse SMBG sequences. Experimental results demonstrate that PACD-Net consistently outperforms existing methods in estimating TAR, TIR, and TBR from real-world SMBG data, achieving improved accuracy as well as enhanced stability and generalization under extremely sparse observation settings. The proposed framework provides a practical tool for clinical SMBG interpretation and offers a generalizable approach for learning from sparse and irregularly sampled sensor data in broader applications.
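For reference, the three metrics can be computed from a list of glucose readings as in the sketch below. It assumes the commonly used 70-180 mg/dL target range, which the abstract does not specify, and it weights every reading equally; the function name `glycemic_metrics` is hypothetical. Equal weighting is a reasonable proxy for time fractions with evenly sampled CGM, but it is exactly the step that becomes unreliable with sparse, irregular SMBG.

```python
def glycemic_metrics(readings_mg_dl, low=70.0, high=180.0):
    """Return TBR, TIR, and TAR as percentages of readings.

    low/high are assumed range limits in mg/dL (70-180 by default).
    Each reading counts equally, which approximates time fractions only
    when samples are evenly spaced.
    """
    n = len(readings_mg_dl)
    if n == 0:
        raise ValueError("at least one glucose reading is required")
    below = sum(1 for g in readings_mg_dl if g < low)
    above = sum(1 for g in readings_mg_dl if g > high)
    in_range = n - below - above
    return {
        "TBR": 100.0 * below / n,
        "TIR": 100.0 * in_range / n,
        "TAR": 100.0 * above / n,
    }


if __name__ == "__main__":
    print(glycemic_metrics([60, 100, 150, 200]))
```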
Passive acoustic monitoring (PAM) enables large-scale biodiversity assessment, but continuous recording generates large amounts of non-informative audio, creating challenges for storage, power consumption, and long-term edge deployment. Bird audio detection (BAD), which identifies bird vocalizations, can reduce this burden by filtering irrelevant recordings before downstream analysis. However, most BAD systems are trained on temperate datasets despite tropical soundscapes being denser, more species-rich, and acoustically unpredictable. To address this gap, we introduce SEABAD (Southeast Asian Bird Activity Detection), a dataset of 50,000 curated three-second clips from Southeast Asian soundscapes, evenly balanced between bird-present and bird-absent samples. The dataset spans 1,677 bird species and is standardized to 16 kHz mono audio for embedded and low-power inference. We developed a dual-branch curation pipeline: a six-stage positive-label workflow applied to Xeno-Canto recordings, alongside six source-specific negative-label extractions from environmental datasets. These procedures reduced class imbalance by 13.7% (Gini coefficient: 0.601 to 0.519). A manual audit of 1,000 positive clips confirmed 97.8% +/- 0.9% labeling accuracy. Baseline experiments using MobileNetV3-Small achieved 99.57% +/- 0.25% accuracy and 0.9985 +/- 0.0002 AUC across three random seeds. SEABAD and the full curation pipeline are publicly released to support tropical BAD research and energy-efficient acoustic monitoring.
KV cache quantization reduces the memory cost of long-context LLM inference, but introduces approximation error that is typically validated only empirically. Existing systems rely on average-case robustness, with no mechanism to detect or recover from failures at runtime. We present a tiered KV cache architecture that enables runtime-certified attention: INT8 keys and INT4 values are stored in GPU memory, while FP16 originals are retained in system RAM for deterministic fallback. A two-term error decomposition yields per-head, per-step bounds on (i) attention distribution distortion from key quantization and (ii) value reconstruction error. These bounds are computed online and used to drive adaptive precision selection and a multi-stage fallback ladder, which guarantees recovery to the exact dense attention output when required. Across PG-19, NIAH, and RULER benchmarks on LLaMA 3.1-8B with contexts up to 128K, the system matches dense FP16 KV quality within noise for language modelling and retrieval tasks, while recovering catastrophic failures observed in naive INT8/INT4 baselines. Value-sensitive tasks at short context expose a controlled trade-off between compression and fidelity, which can be eliminated via tighter value tolerances or FP16-value fallback. The certification is local (per-head, per-step) and does not guarantee end-to-end model correctness, but ensures that each attention computation is either bounded relative to an FP16 reference or exactly recovered via fallback. This reframes KV cache quantization as a runtime-verified computation rather than a fixed approximation. The goal is not raw speedups, but enabling safe deployment of aggressive KV compression under strict quality constraints.
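The idea of "bounded or exactly recovered" can be shown on a single attention logit. The sketch below is an illustration under assumptions, not the paper's two-term decomposition: it uses symmetric per-vector INT8 quantization, bounds the logit error with a simple Hölder-type inequality, and falls back to the original key when the bound exceeds a tolerance. The names `quantize_int8`, `certified_logit`, and `tol` are hypothetical, and plain Python floats stand in for FP16.

```python
def quantize_int8(vec):
    """Symmetric per-vector INT8 quantization; returns (codes, scale)."""
    peak = max(abs(x) for x in vec)
    scale = peak / 127.0 if peak > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def certified_logit(query, key, tol):
    """Return (logit, error_bound, mode) for one query-key pair.

    Rounding moves each dequantized key element by at most scale / 2
    (up to floating-point rounding), so
        |q . (k - k_hat)| <= ||q||_1 * scale / 2.
    If that bound exceeds `tol`, the original key is used instead.
    """
    codes, scale = quantize_int8(key)
    bound = 0.5 * scale * sum(abs(q) for q in query)
    if bound <= tol:
        approx = sum(q * c * scale for q, c in zip(query, codes))
        return approx, bound, "int8"
    exact = sum(q * k for q, k in zip(query, key))
    return exact, 0.0, "fallback"


if __name__ == "__main__":
    q, k = [1.0, -2.0, 0.5], [0.3, -1.27, 0.9]
    print(certified_logit(q, k, tol=0.05))
    print(certified_logit(q, k, tol=0.001))
```

The bound is computed from the query and the quantization scale alone, so the check can run before deciding whether the full-precision key has to be fetched.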
Feedforward steering control is a key component of hierarchical control architectures for autonomous racing. The goal is to reduce steering corrections from the feedback controllers by predicting the vehicle's inverse lateral dynamics. This paper presents a systematic benchmark of two learning-based and two empirical (analytical) feedforward steering controllers. We introduce a new EHD formulation based on a polynomial surface fit that captures velocity-dependent nonlinear steering behavior with minimal parametrization. We test the feedforward controllers in a high-fidelity simulation framework based on the real-world Abu Dhabi Autonomous Racing League competition, using a high-fidelity double-track vehicle dynamics simulator. Open-loop evaluation shows that the learning-based controllers achieve the lowest prediction errors; however, closed-loop testing reveals that this improved accuracy does not translate into superior path tracking performance or lap times, even after iterative fine-tuning. In contrast, the proposed EHD approach achieves the best overall closed-loop robustness and lap time, highlighting the necessity of evaluating feedforward strategies within the complete trajectory planning and control software stack. Our code is available at this https URL.
Reset systems can overcome fundamental limitations of linear time-invariant control. The recently introduced notion of scaled (relative) graphs provides a promising framework for developing graphical analysis and design tools for reset systems, in line with widely adopted loopshaping methods for linear systems. The aim of this paper is to derive techniques for over-bounding the scaled graph of reset systems and to obtain insight into their accuracy. We exploit connections between quadratic dissipativity and scaled graphs to recast the over-bounding problem as the search for piecewise quadratic storage functions. Using specific sampling techniques, we reveal a fundamental limitation of general scaled graph approximation methods that are based on quadratic dissipativity.
Existing LoRaWAN/LoRa simulators consist of large, complicated C++ codebases and often do not support all device classes. This paper presents the design of a simple-to-use, Python-based discrete-event simulator that addresses these gaps while also introducing a novel method for evaluating real device firmware in the simulator. The simulator is built on a custom asyncio-based simulation kernel, a three-phase packet delivery model that reproduces the capture effect, a full LoRaWAN 1.0.4 stack, and a containerized firmware system that cross-compiles real STM32 C firmware and redirects HAL calls into the simulator via CFFI. The simulator is distributed as a Python package via GitHub (this https URL) and requires no external simulation framework or dependencies.
Semi-blind joint channel estimation and data detection (JCD) is a promising approach to mitigate pilot contamination in cell-free massive multiple-input multiple-output (CF-MaMIMO) networks. The effectiveness of such methods fundamentally depends on identifiability, i.e., the ability to unambiguously recover the unknown channel coefficients and transmitted data signals from the received uplink observations. In this work, we investigate the identifiability of semi-blind JCD from a large-scale system design perspective. We consider a CF-MaMIMO network in which access points (APs) and user equipments (UEs) are spatially distributed according to Poisson point processes (PPPs). The resulting network topology is modeled as a bipartite random geometric graph (BRGG) that captures local connectivity induced by wireless propagation. To enable a tractable analysis, the spatially dependent graph model is approximated by a surrogate independent-edge random graph with matched degree distributions. Building on this model, we develop a recursive probabilistic analysis that characterizes the conditions under which semi-blind recovery succeeds with high probability. The proposed analysis reveals an identifiability region as a function of key system parameters, including AP and UE densities and the connectivity radius beyond which channel coefficients are assumed negligible. Monte Carlo simulations validate the predicted identifiability region and assess the accuracy of the proposed graph approximation. The proposed framework provides system-level insights into how network density and connectivity affect identifiability in large-scale CF-MaMIMO systems and offers guidelines for selecting deployment parameters and pilot sequence lengths that enable reliable semi-blind recovery.
As a rapidly emerging interdisciplinary field that intrinsically integrates microwave and photonics, microwave photonics (MWP) provides disruptive solutions to overcome the fundamental bandwidth limits of conventional electronic systems. By exploiting the inherently ultra-wide bandwidth and low-loss characteristics of photonic technologies, MWP enables the generation, transmission, processing, and detection of microwave, millimeter-wave, and terahertz signals. Representative breakthroughs include fully photonic microwave radar systems, photonic analog-to-digital converters with bandwidth up to 320 GHz, and photonic wireless communication systems achieving data rates as high as 616 Gbit/s. Meanwhile, the rapid growth of artificial intelligence (AI) is reshaping scientific research, engineering, and daily life in unprecedented ways, such as AI for science/engineering and AI co-scientist/assistant. Correspondingly, AI is profoundly reshaping MWP in all aspects, ranging from signal generation and transmission to signal processing and detection. AI has revolutionized the design, simulation, fabrication, testing, deployment, and maintenance of MWP systems, delivering autonomous operation and exceptional efficiency beyond traditional systems. Motivated by these developments, this Review Paper provides the first comprehensive overview of AI-enabled MWP, systematically summarizing the state-of-the-art advances and presenting insights for both the academic community and the broader public.
Eduardo Sontag and coauthors studied Input-to-Output Stability (IOS) and the output asymptotic gain property. These notions changed control theory and recently had an impact on robust adaptive control through the Deadzone-Adapted Disturbance Suppression (DADS) control scheme. Moreover, recently the notion of IOS was extended to systems described by Partial Differential Equations (PDEs). In this work, we celebrate Eduardo Sontag by combining DADS and IOS for PDEs: we study the partial-state regulation problem for a scalar Ordinary Differential Equation (ODE) which is interconnected with a possibly infinite-dimensional system. In such a case the DADS control scheme can allow an escape from the requirements of the small-gain theorem that is mainly used for partial-state feedback. We show the design procedure of partial-state DADS controllers and we prove robust regulation even in the presence of external inputs (disturbances) without assuming knowledge of any disturbance/parameter bounds. The DADS controller is applied to three different cases of the interconnection of an ODE with an almost completely unknown: (a) heat PDE, (b) transport PDE, and (c) wave PDE with viscous damping. We show that the same DADS controller can achieve robust regulation in all three cases.
The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15.
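The Dice score used to rank segmentations above follows the standard definition 2|P ∩ T| / (|P| + |T|). A minimal sketch for flattened binary masks is given below; the function name `dice_score` is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something the abstract specifies.

```python
def dice_score(pred, target):
    """Dice coefficient 2|P & T| / (|P| + |T|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * inter / total


if __name__ == "__main__":
    print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because thin vessels occupy few voxels, a small number of missed or spurious voxels changes `inter` and `total` noticeably, which is one reason scores on small-vessel data tend to sit below those seen for large organs.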
Speaker anonymization systems hide the identity of speakers while preserving other information such as linguistic content and emotions. To evaluate their privacy benefits, attacks in the form of automatic speaker verification (ASV) systems are employed. In this study, we assess the impact of intra-speaker linguistic content similarity in the attacker training and evaluation datasets, by adapting BERT, a language model, as an ASV system. On the VoicePrivacy Attacker Challenge datasets, our method achieves a mean equal error rate (EER) of 35%, with certain speakers attaining EERs as low as 2%, based solely on the textual content of their utterances. Our explainability study reveals that the system decisions are linked to semantically similar keywords within utterances, stemming from how LibriSpeech is curated. Our study suggests reworking the VoicePrivacy datasets to ensure a fair and unbiased evaluation and challenges the reliance on global EER for privacy evaluations.
This paper presents a PID tuning method based on step response curve fitting (PID-SRCF) that utilizes L2-norm minimization for precise reference tracking and explicit transient response shaping. The algorithm optimizes controller parameters by minimizing the root-mean-square error between desired and actual step responses. The proposed approach determines optimal PID parameters by matching any closed-loop response to a desired system step response. In practice, a first-order plus time delay model or a second-order system with defined settling time and overshoot requirements is preferred as the desired response. The method has an open-source implementation using constrained nonlinear optimization in MATLAB. Comparative evaluations demonstrate that PID-SRCF can replace known analytical methods like Ziegler-Nichols, Lambda Tuning, Pole Placement, Dominant Pole and MATLAB proprietary PID tuning applications.
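The curve-fitting idea in the preceding abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the plant is a first-order lag 2/(s + 1), the controller is PI rather than full PID, the desired response is a first-order step with time constant 0.5 s, the loop is simulated with forward Euler, and a simple compass (pattern) search stands in for the constrained nonlinear optimizer. The names `simulate_pi_step`, `desired_step`, and `fit_pi` are hypothetical.

```python
import math

def simulate_pi_step(kp, ki, plant_gain=2.0, plant_tau=1.0, dt=0.01, t_end=5.0):
    """Unit-step response of a PI loop around K/(tau*s + 1), forward Euler."""
    y, integ, out = 0.0, 0.0, []
    for _ in range(int(round(t_end / dt))):
        out.append(y)
        err = 1.0 - y                      # reference is a unit step
        u = kp * err + ki * integ          # PI control law
        integ += dt * err                  # integrate the error
        y += dt * (-y + plant_gain * u) / plant_tau
    return out

def desired_step(tau_d=0.5, dt=0.01, t_end=5.0):
    """Desired closed-loop response: first-order step with time constant tau_d."""
    return [1.0 - math.exp(-k * dt / tau_d) for k in range(int(round(t_end / dt)))]

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def fit_pi(start=(0.5, 0.5), step=0.5, tol=1e-3, max_iter=2000):
    """Compass search over (kp, ki) >= 0 minimizing RMSE to the desired response."""
    target = desired_step()
    cost = lambda p: rmse(simulate_pi_step(p[0], p[1]), target)
    best, best_cost = list(start), cost(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in (0, 1):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[i] += sign * step
                if trial[i] < 0.0:         # keep gains non-negative
                    continue
                c = cost(trial)
                if c < best_cost:
                    best, best_cost, improved = trial, c, True
        if not improved:
            step *= 0.5                    # shrink the pattern when stuck
    return best[0], best[1], best_cost

if __name__ == "__main__":
    kp, ki, err = fit_pi()
    print(f"kp={kp:.3f} ki={ki:.3f} rmse={err:.5f}")
```

For this assumed plant and target, pole-zero cancellation gives kp = ki = 1 in continuous time, so the fitted gains should land close to those values; the small remaining gap comes from the Euler discretization.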
High altitude platform stations (HAPS) offer a promising solution for achieving ubiquitous connectivity in next-generation wireless networks (xG). Integrating HAPS with terrestrial networks, creating HAPS-empowered vertical heterogeneous networks (vHetNets), significantly improves coverage and capacity and supports emerging novel use cases. In HAPS-empowered vHetNets, HAPS and terrestrial network tiers can share the same spectrum, forming harmonized spectrum vHetNets that enhance spectral efficiency (SE). However, harmonized spectrum vHetNets face major challenges, including severe co-channel interference and scalability in large-scale deployments. To address the first challenge, we adopt a cell-free multiple-input multiple-output (MIMO) network architecture in which users are simultaneously served by multiple base stations using beamforming. However, beamforming weight design leads to a nonconvex, high-dimensional optimization problem, highlighting the scalability challenge. To address this second challenge, we develop a two-level distributed proportional fairness beamforming weight design (PFBWD) algorithm. This algorithm combines the augmented Lagrangian method (ALM) with a three-block ADMM framework. Simulation results demonstrate the performance improvements achieved by integrating HAPS with standalone terrestrial networks, as well as the reduced complexity and signaling overhead of the distributed algorithm compared to centralized algorithms.
Voice anonymization systems aim to protect speaker privacy by obscuring vocal traits while preserving the linguistic content relevant for downstream applications. However, because these linguistic cues remain intact, they can be exploited to identify semantic speech patterns associated with specific speakers. In this work, we present VoxATtack, a novel multimodal de-anonymization model that incorporates both acoustic and textual information to attack anonymization systems. While previous research has focused on refining speaker representations extracted from speech, we show that incorporating textual information with a standard ECAPA-TDNN improves the attacker's performance. Our proposed VoxATtack model employs a dual-branch architecture, with an ECAPA-TDNN processing anonymized speech and a pretrained BERT encoding the transcriptions. Both outputs are projected into embeddings of equal dimensionality and then fused based on confidence weights computed on a per-utterance basis. When evaluating our approach on the VoicePrivacy Attacker Challenge (VPAC) dataset, it outperforms the top-ranking attackers on five out of seven benchmarks, namely B3, B4, B5, T8-5, and T12-5. To further boost performance, we leverage anonymized speech and SpecAugment as augmentation techniques. This enhancement enables VoxATtack to achieve state-of-the-art on all VPAC benchmarks, after scoring 20.6% and 27.2% average equal error rate on T10-2 and T25-1, respectively. Our results demonstrate that incorporating textual information and selective data augmentation reveals critical vulnerabilities in current voice anonymization methods and exposes potential weaknesses in the datasets used to evaluate them.
Distributed formation maneuver control refers to the problem of maneuvering a group of agents to change their formation shape by adjusting the motions of a subset of the agents, where the controller of each agent only requires local information measured from its neighbors. Although this problem has been extensively investigated, existing approaches are mostly limited to uniform scaling transformations. This article proposes a new type of local matrix-valued constraints, via which non-uniform scaling control of position formation can be achieved by tuning the positions of only two agents (i.e., leaders). Here, the non-uniform scaling transformation refers to globally scaling the position formation with different ratios along different orthogonal coordinate directions. Moreover, by defining scaling and translation of attitudes, we propose a distributed control scheme for scaling and translation maneuver control of joint position-attitude formations. It is proven that the proposed controller achieves global convergence, provided that the sensing graph among agents is a 2-rooted bidirectional graph. Compared with the affine formation maneuver control approach, the proposed approach leverages a sparser sensing graph, requires fewer leaders, and additionally enables scaling transformations of the attitude formation. A simulation example demonstrates our theoretical results.
This paper introduces a Markov chain-based approach for the analysis and optimization of spare-management policies in large-scale satellite constellations. Focusing on the direct strategy, we model spare replenishment as a periodic-review reorder-point/order-quantity policy, where spares are deployed directly to constellation planes. The stochastic behavior of satellite failures and launch vehicle lead times is captured through Markov representations of both failure and replenishment dynamics. Based on this efficient and accurate framework, we construct and solve an optimization problem aimed at minimizing operational costs. The effectiveness of the proposed method is demonstrated through a case study using a real-world mega-constellation.
Infrastructure-mounted sensors can capture rich environmental information to enhance communications and facilitate beamforming in millimeter-wave systems. This work presents an efficient sensing-assisted long-term beam tracking framework that selects optimal beams from a codebook for current and multiple future time slots. We first design a large attention-enhanced neural network (NN) to fully exploit past visual observations for beam tracking. A convolutional NN extracts compact image features, while gated recurrent units with attention capture the temporal dependencies within sequences. The large NN then acts as the teacher to guide the training of a lightweight student NN via knowledge distillation. The student requires shorter input sequences yet preserves long-term beam prediction ability. Numerical results demonstrate that the teacher achieves Top-5 accuracies exceeding 93% for current and six future time slots, approaching state-of-the-art performance with a 90% reduction of model parameters. The student closely matches the teacher's performance while reducing the number of model parameters by over 1670% and cutting complexity by over 450%, despite operating with 60% shorter input sequences. This improvement significantly enhances data efficiency, reduces latency, and reduces power consumption in sensing and processing.
This work investigates an integrated sensing and edge artificial intelligence (ISEA) system, where multiple devices first transmit probing signals for target sensing and then offload locally extracted features to the access point (AP) via analog over-the-air computation (AirComp) for collaborative inference. To characterize the relationship between AirComp error and inference performance, two proxies are established: the \emph{computation-optimal} proxy that minimizes the aggregation distortion, and the \emph{decision-optimal} proxy that maximizes the inter-class separability, respectively. Optimal transceiver designs in terms of closed-form power allocation are derived for both time-division multiplexing (TDM) and frequency-division multiplexing (FDM) settings, revealing threshold-based and dual-decomposition structures, respectively. Experimental results validate the theoretical findings.
This paper investigates the parameter identification for multi-participant autoregressive exogenous input (ARX) systems while protecting the system input and output. To do so, the discrete Gaussian noise in the standard Cheon-Kim-Kim-Song (CKKS) cryptosystem is replaced with a truncated one. By using the CKKS cryptosystem with the truncated discrete Gaussian noise and the key-switching technique, a proxy re-encryption scheme is developed. Based on this scheme, a secure parameter identification algorithm is proposed for multi-participant ARX systems. By rigorously proving that the statistical distance between the discrete Gaussian noise and the truncated one is negligible, the polynomial-time reduction between the standard Ring-Learning with Errors (RLWE) problem and the RLWE problem with the truncated discrete Gaussian noise is established. This result ensures the indistinguishability under chosen-plaintext attacks (IND-CPA) security of the algorithm. By giving a lower bound condition on the size of the plaintext space, the computational overflow in encryption is avoided. Based on this condition, the mean square convergence and convergence rate of the algorithm are given. The trade-off between the security level and the convergence of the algorithm is presented. Finally, a numerical example is given to verify the effectiveness of the algorithm.
This paper investigates a distributed integrated sensing and edge artificial intelligence (ISEA) system under a Bayesian framework, focusing on incorporating task-relevant priors to maximize inference performance. At the sensing level, an RWB estimator with a Gaussian mixture (GM) prior is designed. By weighting class-conditional posterior means with responsibilities, RWB effectively denoises features and outperforms maximum likelihood (ML) estimation at low signal-to-noise ratio (SNR). At the communication level, two theoretical proxies are introduced: the computation-optimal and decision-optimal proxies. Optimal transceiver designs in terms of closed-form power allocation are derived for both time-division multiplexing (TDM) and frequency-division multiplexing (FDM) settings, revealing threshold-based and dual-decomposition structures. Results show that the discriminant-aware allocation yields additional inference gains.
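The phrase "weighting class-conditional posterior means with responsibilities" in the preceding abstract can be made concrete with a small sketch. It assumes a scalar feature observed in additive Gaussian noise and a Gaussian-mixture prior with one component per class; under those assumptions the responsibility-weighted combination is the standard posterior-mean estimator for a mixture prior. The function name `gm_posterior_mean` and all parameter values are hypothetical, and the paper's actual estimator may differ in detail.

```python
import math

def gm_posterior_mean(y, weights, means, variances, noise_var):
    """Posterior mean of x given y = x + n, n ~ N(0, noise_var),
    with prior x ~ sum_k weights[k] * N(means[k], variances[k]).

    Returns (estimate, responsibilities).
    """
    log_terms, cond_means = [], []
    for w, m, v in zip(weights, means, variances):
        s = v + noise_var                  # marginal variance of y under class k
        # log of weights[k] * N(y; m, s), kept in the log domain for stability
        log_terms.append(math.log(w) - 0.5 * math.log(2 * math.pi * s)
                         - 0.5 * (y - m) ** 2 / s)
        # class-conditional posterior mean (linear shrinkage toward m)
        cond_means.append(m + (v / s) * (y - m))
    peak = max(log_terms)
    unnorm = [math.exp(t - peak) for t in log_terms]
    total = sum(unnorm)
    resp = [u / total for u in unnorm]     # responsibilities, summing to 1
    estimate = sum(r * c for r, c in zip(resp, cond_means))
    return estimate, resp

if __name__ == "__main__":
    est, resp = gm_posterior_mean(0.4, [0.5, 0.5], [-1.0, 1.0], [0.1, 0.1], 1.0)
    print(est, resp)
```

By contrast, the maximum likelihood estimate in this scalar model is simply the raw observation `y`, so the mixture prior is what pulls noisy features toward the class means.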
Reliable optimal control is challenging when the dynamics of a nonlinear system are unknown and only infrequent, noisy output measurements are available. This work addresses this setting of limited sensing by formulating a Bayesian prior over the continuous-time dynamics and latent state trajectory in state-space form and updating it through a targeted Metropolis-Hastings sampler equipped with a numerical ODE integrator. The resulting posterior samples are used to formulate a scenario-based optimal control problem that accounts for the uncertainty in the dynamics and latent state and is solved using standard nonlinear programming methods. The approach is validated in a numerical case study on glucose regulation using a Type 1 diabetes model.
Fetal ultrasound is the cornerstone of antenatal care, and accurate recognition of a small set of standard anatomical planes underpins biometry, growth surveillance, and detection of structural anomalies. Deep learning classifiers now match or exceed expert accuracy on curated benchmarks, but most remain opaque and miscalibrated, leaving clinicians without the calibrated confidence or faithful explanations needed for safe decision support. We systematically reviewed 78 studies published between January 1, 2015 and April 30, 2026 that paired automated fetal plane classification with explainability or predictive uncertainty quantification, following PRISMA 2020. Pooled balanced accuracy across six standard planes was 0.93 (95% CI 0.91 to 0.95), but only 19 studies (24%) reported calibration and 14 (18%) reported selective prediction. We propose CALIB-XFUS, a 22-item reporting framework that operationalises calibration, explanation faithfulness, and fairness for regulated fetal ultrasound artificial intelligence. The framework spans six domains: clinical task and indication for use; dataset provenance and representativeness; model and training pipeline; calibration and selective prediction; explanation faithfulness and clinician validation; and post-market surveillance. We argue that uncertainty-calibrated, faithfully explained, and fairness-audited fetal ultrasound AI is now both technically feasible and regulatorily expected under the FDA Good Machine Learning Practice principles and the EU AI Act high-risk obligations.
Target speaker extraction (TSE) aims to recover the speech of a desired speaker from a mixture given a short enrollment utterance, while speech enhancement (SE) focuses on improving speech quality under noisy conditions. Most existing TSE and SE systems are based on discriminative modeling and have shown strong interference suppression ability, but they often remain limited in perceptual quality and naturalness. To address this issue, we first introduce LauraTSE, a generative TSE model built on an autoregressive decoder-only language model. Although generative modeling is promising for quality enhancement, purely generative TSE may suffer from hallucination, content drift, and limited controllability in complex acoustic conditions. We therefore propose a discriminative-generative two-stage framework, where a discriminative front-end first produces target-related representations with strong interference suppression, and a generative back-end then reconstructs high-quality speech in the neural audio codec representation space. This design combines the controllability of discriminative extraction with the reconstruction capability of generative modeling. We further investigate several collaboration strategies for the two-stage framework, including front-end freezing, joint fine-tuning, SI-SDR regularization, and autoregressive/non-autoregressive inference. Experimental results on both TSE and SE benchmarks show that the proposed framework achieves a better balance among perceptual quality, intelligibility, and speaker consistency than purely discriminative or purely generative baselines.
Regulators and voluntary corporate sustainability efforts are increasingly adopting time-matching requirements (TMRs) for clean electricity procurement for large loads, such as data centers, and electricity-intensive fuel production, such as hydrogen. We use a stochastic capacity expansion model (CEM) framework to assess how inter-annual weather variability affects the cost, composition, and emissions of procurement-driven infrastructure to meet annual and hourly TMRs using the case study of a grid-connected hydrogen producer in Texas. Our approach, which relies on co-optimizing investments and hourly operations over nine weather scenarios, reveals that hourly TMR comes at a higher cost premium compared to annual TMR than previously estimated by single-scenario deterministic modeling, while emissions outcomes remain directionally consistent. Demand flexibility and partial hourly TMR (80-90%) lower the cost premium while preserving emissions benefits. We further examine how binding renewable portfolio standards (RPS) interact with TMR costs and emissions outcomes. When an RPS is applied to non-H2 electricity demand, annual TMR reduces emissions comparably to hourly TMR at a lower cost. Incorporating H2-related electricity demand directly into the RPS constraint, rather than imposing a separate TMR, achieves similar emissions outcomes at still lower cost, suggesting that TMR-based clean electricity procurement, particularly hourly matching, offers limited additional value in regions with stringent grid decarbonization policies.
Low-Earth-orbit (LEO) satellites and vehicle-to-everything (V2X) networks are driving integrated communication and navigation (ICAN) toward next-generation intelligent transportation. Affine frequency division multiplexing (AFDM) is a promising waveform for high-mobility LEO scenarios owing to its Doppler robustness, simple modulation, and low pilot overhead. However, applying existing high-accuracy AFDM fractional delay-Doppler estimators to LEO-ICAN entails substantial search or inference complexity, while the spectrum-wrapping-induced envelope structure in line-of-sight (LOS)-dominated channels remains underexploited. This paper analyzes and exploits the spectrum-wrapping-induced envelope structure of the fractional AFDM response, and proposes a low-complexity joint estimator that combines minimum-entropy fractional Doppler estimation with closed-form fractional delay estimation. Simulation results show that the proposed estimator approaches the root Cramér--Rao lower bound (RCRLB) and achieves root-mean-square error (RMSE) performance comparable to that of matched filtering (MF), matched filtering with generalized Fibonacci search (MF-GFS), and off-grid sparse Bayesian learning (OG-SBL), while requiring substantially lower computational complexity and runtime. This favorable accuracy-complexity profile highlights the potential of the proposed estimator for real-time ICAN processing in high-mobility LEO-assisted vehicular networks.
Replay attacks remain a critical vulnerability for automatic speaker verification systems, particularly in real-time voice assistant applications. In this work, we propose acoustic maps as a novel spatial feature representation for replay speech detection from multi-channel recordings. Derived from classical beamforming over discrete azimuth and elevation grids, acoustic maps encode directional energy distributions that reflect physical differences between human speech radiation and loudspeaker-based replay. A lightweight convolutional neural network is designed to operate on this representation, achieving competitive performance on the ReMASC dataset with approximately 6k trainable parameters. Experimental results show that acoustic maps provide a compact and physically interpretable feature space for replay attack detection across different devices and acoustic environments.
During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cognitive phenomenon, we propose a novel Full-duplex LAtent and Internal Reasoning method named FLAIR that conducts latent thinking simultaneously with speech perception. Unlike conventional "thinking" mechanisms in NLP, which require post-hoc generation, our approach aligns seamlessly with spoken dialogue systems: during the user's speaking phase, it recursively feeds the latent embedding output from the previous step into the next step, enabling continuous reasoning that strictly adheres to causality without introducing additional latency. To enable this latent reasoning, we design an Evidence Lower Bound-based objective that supports efficient supervised finetuning via teacher forcing, circumventing the need for explicit reasoning annotations. Experiments demonstrate the effectiveness of this think-while-listening design, which achieves competitive results on a range of speech benchmarks. Furthermore, FLAIR robustly handles conversational dynamics and attains competitive performance on full-duplex interaction metrics.
Many core problems in nonlinear systems analysis and control can be recast as solving partial differential equations (PDEs) such as Lyapunov and Hamilton-Jacobi-Bellman (HJB) equations. Physics-informed neural networks (PINNs) have emerged as a promising mesh-free approach for approximating their solutions, but in most existing works there is no rigorous guarantee that a small PDE residual implies a small solution error. This paper develops verifiable error bounds for approximate solutions of Lyapunov and HJB equations, with particular emphasis on PINN-based approximations. For both the Lyapunov and HJB PDEs, we show that a verifiable residual bound yields relative error bounds with respect to the true solutions as well as computable a posteriori estimates in terms of the approximate solutions. For the HJB equation, this also yields certified upper and lower bounds on the optimal value function on compact sublevel sets and quantifies the optimality gap of the induced feedback policy. We further show that one-sided residual bounds already imply that the approximation itself defines a valid Lyapunov or control Lyapunov function. We illustrate the results with numerical examples.
LLM-assisted modeling holds the potential to rapidly build executable Digital Twins of complex systems from only coarse descriptions and sensor data. However, resilience to LLM hallucination, human oversight, and real-time model adaptability remain challenging and often mutually conflicting requirements. We present three critical design principles for integrating resilience and oversight into such workflows, derived from insights gained through our work on FactoryFlow, an open-source LLM-assisted framework for building simulation-based Digital Twins of manufacturing systems. First, orthogonalize structural modeling and parameter fitting. Structural descriptions (components, interconnections) are LLM-translated from coarse natural language to an intermediate representation (IR) with human visualization and validation, which is algorithmically converted to the final model. Parameter inference, in contrast, operates continuously on sensor data streams with expert-tunable controls. Second, restrict the model IR to interconnections of parameterized, pre-validated library components rather than monolithic simulation code, enabling interpretability and error-resilience. Third, and most important, use a density-preserving IR. When IR descriptions expand dramatically from compact inputs, hallucination errors accumulate proportionally. We present the case for Python as a density-preserving IR: loops express regularity compactly, classes capture hierarchy and composition, and the result remains highly readable while exploiting LLMs' strong code generation capabilities. A key contribution is a detailed characterization of LLM-induced errors across model descriptions of varying detail and complexity, revealing how IR choice critically impacts error rates. These insights provide actionable guidance for building resilient and transparent LLM-assisted simulation automation workflows.
Nonlinear underactuated systems such as two-wheeled inverted pendulums (TWIPs) exhibit a limited region of attraction (RoA), which defines the set of initial conditions from which the closed-loop system converges to the equilibrium. The RoA of nonlinear and constrained systems is generally nonconvex and analytically intractable, requiring numerical or approximate estimation methods. This work investigates the estimation of the RoA for a TWIP stabilized under three model-based control strategies: saturated linear quadratic regulator (LQR), linear model predictive control (MPC), and constraint tightening MPC (CTMPC). We first derive a Lyapunov-based invariant set that provides a certified inner approximation of the RoA. Since this analytical bound is highly conservative, a Monte Carlo-based estimation procedure is then employed to obtain a more representative approximation of the RoA, capturing how the controllers behave beyond the analytically guaranteed region. The proposed methodology combines analytical guarantees with data-driven estimation, providing both a formally certified inner bound and an empirical characterization of the RoA, offering a practical way to evaluate controller performance without relying solely on conservative analytical bounds or purely empirical simulation.
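The Monte Carlo estimation step in the preceding abstract can be sketched on a toy problem. This is not the TWIP model or any of the three controllers from the paper: it assumes a normalized inverted pendulum (theta'' = sin(theta) + u), a saturated linear state feedback with hypothetical gains, forward-Euler integration, and a "fall" threshold of pi/2. Initial conditions are sampled uniformly from a box, and those whose trajectories settle at the origin are counted as inside the empirical region of attraction.

```python
import math
import random

def converges(theta0, omega0, k_theta=4.0, k_omega=2.0, u_max=0.5,
              dt=0.01, t_end=10.0, theta_fall=math.pi / 2, tol=1e-2):
    """Simulate the toy closed loop and report whether it settles at the origin."""
    theta, omega = theta0, omega0
    for _ in range(int(round(t_end / dt))):
        if abs(theta) > theta_fall:        # pendulum has fallen over
            return False
        u = -k_theta * theta - k_omega * omega
        u = max(-u_max, min(u_max, u))     # actuator saturation
        theta, omega = (theta + dt * omega,
                        omega + dt * (math.sin(theta) + u))
    return abs(theta) < tol and abs(omega) < tol

def estimate_roa(n_samples=500, theta_range=1.5, omega_range=3.0, seed=0):
    """Sample initial states uniformly in a box; return converging ones and their fraction."""
    rng = random.Random(seed)
    samples = [(rng.uniform(-theta_range, theta_range),
                rng.uniform(-omega_range, omega_range))
               for _ in range(n_samples)]
    inside = [s for s in samples if converges(*s)]
    return inside, len(inside) / n_samples

if __name__ == "__main__":
    inside, fraction = estimate_roa()
    print(f"{len(inside)} converging samples, fraction {fraction:.3f}")
```

The converging samples give an empirical outline of the region of attraction that can be compared with a certified Lyapunov sublevel set, which would lie inside it if the certificate is valid.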
The reproduction of automobile components through additive manufacturing presents significant geometric challenges, as many automotive parts feature complex, organically shaped surfaces that are difficult to fabricate accurately using conventional 3D printing approaches without wasteful support structures. Multi-axis Digital Light Processing (DLP) 3D printing addresses this by orienting a robotic arm to cure resin layers at varying angles and positions, enabling the fabrication of geometries that fixed-axis systems cannot reliably reproduce. However, this flexibility introduces a key challenge: layers printed at non-orthogonal orientations exhibit non-uniform thickness across their cross-section, which traditional DLP systems cannot accommodate without subdividing the layer, increasing total layer count, print time, and the need for supporting structures. This paper introduces a variable exposure method to address this challenge. Rather than splitting a non-uniform layer into multiple uniform ones, our approach divides each layer into sublayers and modulates the UV illumination duration for each sublayer proportionally to its local thickness. This is governed by an established cure-depth equation relating exposure time to material penetration depth, allowing precise control over curing without additional hardware. The result is a meaningful reduction in total layer count for printed objects. Fewer layers directly translate to faster print times and a reduction in wasteful support structures. Our contribution is a practical and low-overhead extension to existing multi-axis DLP pipelines that improves print efficiency without sacrificing geometric accuracy, with clear applications in the rapid prototyping and reproduction of automotive components.
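The per-sublayer exposure calculation described in the preceding abstract might look like the sketch below. The abstract does not name the cure-depth equation it uses; this sketch assumes the widely used Jacobs working-curve relation, C_d = D_p ln(E / E_c), with exposure dose E equal to irradiance times time. The function names and all resin parameters (`dp_um`, `ec_mj_cm2`, `irradiance_mw_cm2`) are hypothetical values chosen for illustration.

```python
import math

def exposure_time(thickness_um, dp_um, ec_mj_cm2, irradiance_mw_cm2):
    """Exposure time (s) needed to cure a given depth under the Jacobs relation.

    Solving C_d = D_p * ln(E / E_c) for the dose gives E = E_c * exp(C_d / D_p);
    dividing the dose (mJ/cm^2) by the irradiance (mW/cm^2) gives seconds.
    """
    if thickness_um <= 0:
        raise ValueError("thickness must be positive")
    dose = ec_mj_cm2 * math.exp(thickness_um / dp_um)
    return dose / irradiance_mw_cm2

def sublayer_exposures(thicknesses_um, dp_um=100.0, ec_mj_cm2=10.0,
                       irradiance_mw_cm2=5.0):
    """Exposure time for each sublayer of a non-uniform layer."""
    return [exposure_time(t, dp_um, ec_mj_cm2, irradiance_mw_cm2)
            for t in thicknesses_um]

if __name__ == "__main__":
    # A wedge-shaped layer split into four sublayers of increasing thickness.
    for t, sec in zip([25, 50, 75, 100], sublayer_exposures([25, 50, 75, 100])):
        print(f"{t:>4} um -> {sec:.2f} s")
```

Under this assumed model the required exposure grows exponentially with thickness, so thicker regions of a wedge-shaped layer need disproportionately longer illumination than thinner ones.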
This study presents two analytical closed-form PI controller tuning solutions for second-order plants with real poles, each achieving monotonic step response and minimum settling time. The first solution employs pole-zero cancellation, placing the controller zero at the slower plant pole and reducing the closed-loop dynamics to a critically damped second-order system. The second solution, applicable when the plant pole ratio is less than two, places all three closed-loop poles at a common location without cancelling any plant pole, yielding a closed-loop transfer function with a triple real pole and a zero. Despite retaining a closed-loop zero, this solution achieves strictly faster settling time than the pole-zero cancellation method in its region of applicability. The two solutions coincide at the boundary pole ratio of two and together form a continuous piecewise-analytical tuning covering the full range of plant pole ratios. This study further establishes that closed-loop transfer functions of the form a^n/(s + a)^n possess a maximum sensitivity Ms together with phase margin and gain margin that are independent of the pole location a and depend solely on the order n, yielding universal robustness constants for each n. A closed-form expression GM(n) = 1 + sec^n(pi/n) is established for the gain margin of the family. Numerical verification confirms the analytical results across multiple plant configurations.
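The gain-margin expression quoted in the preceding abstract can be checked numerically. The sketch assumes unity feedback, so the loop transfer function is L(s) = T(s) / (1 - T(s)) = a^n / ((s + a)^n - a^n), and it assumes n >= 3 so that the phase of L actually crosses -180 degrees. It locates the phase-crossover frequency by a scan followed by bisection and compares 1/|L| there with 1 + sec^n(pi/n). The function names are hypothetical.

```python
import math

def gain_margin_closed_form(n):
    """GM(n) = 1 + sec^n(pi/n), as stated in the abstract."""
    return 1.0 + (1.0 / math.cos(math.pi / n)) ** n

def gain_margin_numeric(n, a=1.0):
    """Gain margin of L(s) = a^n / ((s + a)^n - a^n), found numerically."""
    if n < 3:
        raise ValueError("phase never reaches -180 degrees for n < 3")

    def denom(w):
        return (1j * w + a) ** n - a ** n   # L(jw) = a^n / denom(w)

    # Im(denom) is positive at low frequency; scan for its first sign change.
    lo = 1e-3 * a
    while denom(lo * 1.01).imag > 0.0:
        lo *= 1.01
    hi = lo * 1.01
    for _ in range(100):                    # bisection on Im(denom) = 0
        mid = 0.5 * (lo + hi)
        if denom(mid).imag > 0.0:
            lo = mid
        else:
            hi = mid
    w_pc = 0.5 * (lo + hi)                  # phase-crossover frequency
    return abs(denom(w_pc)) / a ** n        # 1 / |L(j w_pc)|

if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, gain_margin_closed_form(n),
              gain_margin_numeric(n, a=1.0), gain_margin_numeric(n, a=5.0))
```

Running the check for two different values of `a` also illustrates the abstract's claim that the margin depends only on the order n and not on the pole location.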
As data rates scale, intra-pair skew has become a critical bottleneck for high-speed differential signaling. Current analytical models are often limited, while 3D electromagnetic simulations are computationally intensive. This paper presents a comprehensive analytical framework for intra-pair skew in generic asymmetric coupled transmission lines, explicitly integrating skew into S-parameter formulations. We introduce the Intra-pair Skew Propagation Graph (ISPG), a novel graph-based methodology for calculating cumulative skew in complex, cascaded channels. The proposed framework is validated against both S-parameter simulations and empirical measurements of a 2 m bulk twinax cable assembly, demonstrating excellent accuracy and robustness for high-speed interconnect design.
Wake-Up Radio (WUR) enables resource-constrained, battery-powered sensor nodes to remain in a low-power deep sleep state while continuously listening for a Wake-Up Signal (WUS). Sensor nodes only wake and transmit data after receiving the WUS, significantly reducing energy consumption. However, polling nodes whose transmitted data provides little or no meaningful update to the remote monitor can still result in unnecessary energy usage and increased storage overhead. To address this issue, this paper uses the Age of Incorrect Information (AoII) metric to prioritise the polling of nodes that provide informative updates to the remote monitor. Determining the optimal set of nodes to poll based on AoII can be formulated as a Restless Multi-Armed Bandit (RMAB) problem, which traditionally requires prior knowledge of the monitored process transition dynamics. Since such dynamics are often unknown in practical deployments, we propose an online learning framework based on state estimation to derive Whittle Index AoII (WAoII) and Fair Whittle Index AoII (FWAoII) policies without assuming known transition probabilities. The proposed policies efficiently schedule node polling while adapting to unknown process behaviour. Experimental evaluation using both real-world and synthetic datasets demonstrates that the proposed online WAoII policy can reduce packet transmissions by up to 70\% compared to the widely used Round Robin (RR) polling strategy, while maintaining Root Mean Squared Error (RMSE) values within acceptable application error tolerances. These results demonstrate the effectiveness of WAoII and FWAoII as energy-efficient polling techniques for low-power WUR sensor networks.
Neuromorphic cameras, also known as event-based cameras, can detect changes in the environmental brightness asynchronously and independently for each pixel. They output the brightness changes, i.e., events, as 3-D (2-D pixel coordinates + time) streaming data. While event-based cameras are used in many applications because of their desirable characteristics, e.g., high temporal resolution, low latency, low power consumption, and high dynamic range, their measurements contain considerable noise due to their high sensitivity. In this paper, we propose a denoising method for event-based cameras based on graph spectral features. In the proposed method, we first construct a graph where nodes represent events and edges represent the spatiotemporal distance between the events. To calculate the graph-specified parameter that controls the connectivities of a constructed graph, we utilize the prior on the density of 3-D events. We then calculate the eigenvectors of the graph Laplacian. The obtained eigenvectors are used to extract noiseless events directly. In the calculation of the eigenvectors, we customize the graph Laplacian to reorder its eigenvalues. This allows us to leverage fast eigensolver algorithms instead of the naive eigendecomposition and thereby reduce computational complexity. In experiments on synthetic and real-world event data, we demonstrate that the proposed method effectively removes noise events from the raw events compared to alternative methods.
Inferring latent dynamics from multivariate time-series defined over topological cell complexes is crucial for capturing the complex, higher-order interactions inherent in real-world systems such as in water, sensor, and transportation networks. However, reconstructing these latent states is challenging because the signals are coupled across higher-order topologies, while high dimensionality, nonlinear observations, and unknown structures increase the difficulty. To address this, we propose a topology-aware state space framework derived from stochastic partial differential equations on cell complexes. State evolution follows heat-like topological diffusion, with perturbations propagating along boundary operators. Under partial observability, we model observations using a cell complex convolution of latent states coupled with a nonlinear mapping. We perform recursive state estimation via an Extended Kalman Filter, simultaneously learning model parameters and uncertainties through an online Expectation-Maximization algorithm. Finally, for scenarios where only lower-order topological structure is known, e.g., nodes and edges, as in critical infrastructure networks, we introduce a heuristic cell identification algorithm to explicitly infer the second-order cell structures. Validations on synthetic and real datasets from water, sensor and transportation networks demonstrate that our approach yields reliable estimates under partial observability and successfully recovers the underlying topological structures.
Automotive radars are increasingly susceptible to mutual interference from neighboring radar systems, which can lead to false target detections and the masking of valid targets. While current interference levels remain manageable due to the relatively low penetration of radar-equipped vehicles, this assumption is expected to break down as radar adoption and per-vehicle radar density continue to increase. This paper presents a comprehensive analysis of automotive radar performance in high-density interference environments. A realistic end-to-end simulation framework is developed at the intermediate frequency (IF) level, incorporating analytical interference modeling and detailed radar signal processing. The study evaluates the impact of interference across a range of future scenarios characterized by increased radar density and multiple radar configurations per vehicle. Conventional interference mitigation techniques are systematically assessed. To validate the simulation results, controlled experiments were conducted using a host radar exposed to up to 30 interfering radars in both anechoic and real-world environments. The results demonstrate significant performance degradation under high interference conditions, with substantial reductions in detection probability and effective range. Among the evaluated techniques, time-frequency coding consistently provides the most robust performance, maintaining high detection probability even at elevated radar penetration rates. These findings highlight the limitations of current mitigation approaches and emphasize the need for coordinated and scalable interference management strategies in future automotive radar systems.
In this paper, we theoretically analyze and experimentally demonstrate the performance gains achievable by integrating an in-house built reconfigurable intelligent surface (RIS) with a 5G new radio (NR) system implemented using the OpenAirInterface (OAI) software stack. Unlike conventional RIS-assisted systems that rely on explicit channel state information (CSI) estimation followed by RIS phase configuration optimization, we adopt a low-complexity approach in which the RIS phase states are randomly switched among predefined configurations. The resulting channel fluctuations are opportunistically exploited by the inherent proportional fair (PF) scheduling mechanism of 5G NR. We develop a theoretical framework that characterizes the interaction between RIS switching dynamics and PF scheduling. Based on this framework and the associated analysis, we provide design guidelines for selecting the RIS switching time $T_s$ and the PF throughput averaging window $T_c$ that maximize the system throughput. Experimental evaluations on the 5G NR testbed demonstrate improvements in key performance metrics, including reference signal received power (RSRP), block error rate (BLER), modulation and coding scheme (MCS) index, and throughput. Our key takeaway is that randomly configured RIS operation with appropriately chosen system parameters can achieve performance comparable to optimized RIS designs, with no additional overhead compared to a conventional 5G NR system. More importantly, it requires no coordination between the RIS and the 5G NR system.
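The interplay between random RIS switching and PF scheduling can be sketched with the textbook PF rule: schedule the user with the largest ratio of instantaneous rate to averaged throughput, then update the averages with window $T_c$. The code below is only an illustration under stated assumptions. The function names, the exponential rate model, and the idea that each RIS switch simply redraws all user rates every `t_s` slots are hypothetical stand-ins, not the paper's channel model or the OAI scheduler implementation.

```python
import random


def pf_step(rates, avg_tput, t_c):
    """One proportional-fair scheduling slot.

    rates    : instantaneous achievable rate of each user in this slot
    avg_tput : exponentially averaged throughput of each user (must be > 0)
    t_c      : throughput averaging window in slots
    Returns (scheduled_user_index, updated_avg_tput).
    """
    # PF metric: instantaneous rate relative to the user's own average.
    metrics = [r / a for r, a in zip(rates, avg_tput)]
    chosen = max(range(len(rates)), key=lambda k: metrics[k])
    # Only the scheduled user receives data; every average decays.
    new_avg = [
        (1 - 1 / t_c) * a + (1 / t_c) * (r if k == chosen else 0.0)
        for k, (r, a) in enumerate(zip(rates, avg_tput))
    ]
    return chosen, new_avg


def simulate(num_users=3, slots=1000, t_s=5, t_c=50, seed=0):
    """Average served rate per user when rates are redrawn every t_s slots."""
    rng = random.Random(seed)
    avg = [1e-3] * num_users
    served = [0.0] * num_users
    rates = [rng.expovariate(1.0) for _ in range(num_users)]
    for n in range(slots):
        if n % t_s == 0:
            # Hypothetical model: a new random RIS state gives a new channel draw.
            rates = [rng.expovariate(1.0) for _ in range(num_users)]
        k, avg = pf_step(rates, avg, t_c)
        served[k] += rates[k]
    return [s / slots for s in served]


print(simulate())
```

Sweeping `t_s` and `t_c` in such a toy model gives a feel for why the two time scales have to be chosen jointly, although it says nothing about the specific guidelines derived in the paper.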
The recent advancements of Large Language Models (LLMs) have spurred considerable research interest in extending their linguistic capabilities beyond text to other modalities, leading to the emergence of speech-based LLMs (SpeechLMs) capable of processing user requests in either speech or text. However, owing to inter-modal discrepancies, these SpeechLMs still exhibit a significant performance gap compared to their text-based LLM counterparts in instruction-following, particularly when confronted with the dynamic and variable nature of user speech. To address this challenge, this paper introduces a framework termed Reinforced Behavior Alignment (RBA), designed to bolster the language generation proficiency of SpeechLMs. Instead of relying on supervised fine-tuning from human annotations, RBA employs a self-synthesis methodology to generate extensive, high-fidelity alignment data using a powerful teacher LLM. The SpeechLM's behavior is then aligned with that of the teacher using a reinforcement learning-based approach. Experimental results demonstrate that this method effectively enhances the instruction-following capabilities of SpeechLMs, outperforming conventional distillation baselines. Crucially, we demonstrate that RBA can be seamlessly extended to tasks including spoken question answering and speech-to-text translation, attaining state-of-the-art performance on open benchmarks with only self-generated data.
Python is widely used for agent-based modelling because it is accessible and has a mature scientific ecosystem, but object-per-agent execution incurs interpreter overhead that restricts the population sizes feasible in interactive modelling, calibration, and parameter sweeps. This paper presents AMBER, a Python framework that stores agent state in a Polars-backed columnar table and exposes population operations through a compact view API. The framework preserves conventional model and agent abstractions while translating common population updates into compiled column operations; behaviours that do not vectorise remain expressible through a buffered object-oriented path. We evaluate AMBER on wealth transfer, random walk, and spatial SIR benchmarks against Mesa, AgentPy, SimPy, Melodie, this http URL, and AMBER's own loop path, using invariant checks to verify comparable model outputs before timing. Across the tested workloads, AMBER has the lowest execution time among Python-hosted implementations and achieves speedups of up to $1118\times$ over Mesa; on the largest SIR benchmark it is also faster than the Julia-based this http URL implementation.
Quantum hardware suffers from intrinsic device heterogeneity and environmental drift, forcing practitioners to choose between suboptimal non-adaptive controllers or costly per-device recalibration. We derive a scaling law lower bound for meta-learning showing that the adaptation gain (expected fidelity improvement from task-specific gradient steps) saturates exponentially with gradient steps and scales linearly with task variance, providing a quantitative criterion for when adaptation justifies its overhead. Validation on quantum gate calibration shows negligible benefits for low-variance tasks but >40% fidelity gains on two-qubit gates under extreme out-of-distribution conditions (10$\times$ the training noise), with implications for reducing per-device calibration time on cloud quantum processors. Further validation on classical linear-quadratic control confirms these laws emerge from general optimization geometry rather than quantum-specific physics. We further introduce a few-shot pre-adaptation protocol that estimates the optimal adaptation budget from $N{=}3$-5 probe steps within 3-19% relative error across out-of-distribution regimes.
Automatic speech recognition for French medical conversations remains challenging, with word error rates often exceeding 30% in spontaneous clinical speech. This study proposes a multi-pass LLM post-processing architecture alternating between Speaker Recognition and Word Recognition passes to improve transcription accuracy and speaker attribution. Ablation studies on two French clinical datasets (suicide prevention telephone counseling and preoperative awake neurosurgery consultations) investigate four design choices: model selection, prompting strategy, pass ordering, and iteration depth. Using Qwen3-Next-80B, Wilcoxon signed-rank tests confirm significant WDER reductions on suicide prevention conversations (p<0.05, n=18), while maintaining stability on awake neurosurgery consultations (n=10), with zero output failures and acceptable computational cost (RTF 0.32), suggesting feasibility for offline clinical deployment, pending validation on larger corpora.
Accurate inter-vehicle distance estimation is a cornerstone of Advanced Driver Assistance Systems (ADAS) and autonomous driving. While LiDAR and radar provide high precision, their high cost prohibits widespread adoption in mass-market vehicles. Monocular camera-based estimation offers a low-cost alternative but suffers from fundamental scale ambiguity. Recent deep learning methods for monocular depth achieve impressive results yet require expensive supervised training, suffer from domain shift, and produce predictions that are difficult to certify for safety-critical deployment. This paper presents a framework that exploits the standardized typography of United States license plates as passive fiducial markers for metric ranging, resolving scale ambiguity through explicit geometric priors without any training data or active illumination. First, a four-method parallel plate detector achieves robust plate reading across the full automotive lighting range. Second, a three-stage state identification engine fusing optical character recognition text matching, multi-design color scoring, and a lightweight neural network classifier provides robust identification across all ambient conditions. Third, hybrid depth fusion with inverse-variance weighting and online scale alignment, combined with a one-dimensional constant-velocity Kalman filter, delivers smoothed distance, relative velocity, and time-to-collision for collision warning. Baseline validation on a controlled static dataset reproduces a 2.3% coefficient of variation in character height measurements and a 36% reduction in distance-estimate variance compared with plate-width methods from prior work.
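The geometric idea behind fiducial-based ranging, together with the inverse-variance fusion and time-to-collision steps mentioned above, can be sketched as follows. This is a minimal illustration that assumes an ideal pinhole camera and independent estimates. The function names and every numeric value (focal length, character height, variances, closing speed) are hypothetical and are not taken from the paper.

```python
def pinhole_distance(focal_px, real_height_m, pixel_height):
    """Range from the pinhole model: Z = f * H / h."""
    return focal_px * real_height_m / pixel_height


def fuse_inverse_variance(estimates):
    """Fuse independent (value, variance) pairs by inverse-variance weighting.

    Returns (fused_value, fused_variance).
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


def time_to_collision(distance_m, closing_speed_mps):
    """TTC in seconds; infinite when the gap is not closing."""
    if closing_speed_mps <= 0:
        return float("inf")
    return distance_m / closing_speed_mps


# Hypothetical numbers: 1000 px focal length, 0.07 m character height
# imaged at 7 px, fused with a noisier second estimate of 12 m.
d_plate = pinhole_distance(1000.0, 0.07, 7.0)
d_fused, var_fused = fuse_inverse_variance([(d_plate, 1.0), (12.0, 4.0)])
print(d_plate, d_fused, var_fused, time_to_collision(d_fused, 5.0))
```

The fused variance is never larger than the smallest input variance, which is why a known-size marker with a tight height prior can anchor a noisier learned depth estimate.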
We propose Speech Enhancement based on Drifting Models (DriftSE), a novel generative framework that formulates denoising as an equilibrium problem. Rather than relying on iterative sampling, DriftSE natively achieves one-step inference by evolving the pushforward distribution of a mapping function to directly match the clean speech distribution. This evolution is driven by a Drifting Field, a learned correction vector that guides samples toward the high-density regions of the clean distribution, which naturally facilitates training on unpaired data by matching distributions rather than paired samples. We investigate the framework under two formulations: a direct mapping from the noisy observation, and a stochastic conditional generative model from a Gaussian prior. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DriftSE achieves high-fidelity enhancement in a single step, outperforming multi-step diffusion baselines and establishing a new paradigm for speech enhancement.
Modern deep learning architectures increasingly contend with sophisticated signals that are natively infinite-dimensional, such as time series, probability distributions, or operators, and are defined over irregular domains. Yet, a unified learning theory for these settings has been lacking. To start addressing this gap, we introduce a novel convolutional learning framework for possibly infinite-dimensional signals supported on a manifold. Namely, we use the connection Laplacian associated with a Hilbert bundle as a convolutional operator, and we derive filters and neural networks, dubbed \textit{HilbNets}. We make HilbNets and, more generally, the convolution operation, implementable via a two-stage sampling procedure. First, we show that sampling the manifold induces a Hilbert Cellular Sheaf, a generalized graph structure with Hilbert feature spaces and edge-wise coupling rules, and we prove that its sheaf Laplacian converges in probability to the underlying connection Laplacian as the sampling density increases. Notably, this result is a generalization to the infinite-dimensional bundle setting of the Belkin \& Niyogi \cite{BELKIN20081289} convergence result for the graph Laplacian to the manifold Laplacian, a theoretical cornerstone of geometric learning methods. Second, we discretize the signals and prove that the discretized (implementable) HilbNets converge to the underlying continuous architectures and are transferable across different samplings of the same bundle, providing consistency for learning. Finally, we validate our framework on synthetic and real-world tasks. Overall, our results broaden the scope of geometric learning as a whole by lifting classical Laplacian-based frameworks to settings where the signal at each point lives in its own Hilbert space.
This paper extends safety guarantees for multi-task Bayesian optimization with uncertain co-regionalization matrices from intrinsic co-regionalization models to linear models of co-regionalization. The latter allows for more flexible modeling of the inter-task correlations by composing multiple features. We derive uniform error bounds for vector-valued functions sampled from a Gaussian process with a linear model of co-regionalization kernel. Furthermore, we show the potential performance gains of linear models of co-regionalization in a numerical comparison on a safe multi-task Bayesian optimization benchmark.
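The difference between the two model classes can be illustrated with the standard definition of the linear model of co-regionalization (LMC) kernel, $k((x,i),(x',j)) = \sum_q [B_q]_{ij}\, k_q(x,x')$, of which the intrinsic co-regionalization model is the special case with a single term. The sketch below assumes scalar inputs and squared-exponential base kernels; the function names and the matrices `B1` and `B2` are hypothetical.

```python
import math


def rbf(x, y, lengthscale):
    """Squared-exponential base kernel on scalar inputs."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def lmc_kernel(x, i, y, j, coreg_mats, lengthscales):
    """LMC covariance between task i at input x and task j at input y.

    coreg_mats   : list of co-regionalization matrices B_q (nested lists)
    lengthscales : one lengthscale per base kernel k_q
    With a single (B, lengthscale) pair this reduces to the intrinsic model.
    """
    return sum(
        B[i][j] * rbf(x, y, ell) for B, ell in zip(coreg_mats, lengthscales)
    )


# Two hypothetical features: a smooth one shared across both tasks and a
# short-lengthscale one that is task-specific (diagonal B2).
B1 = [[1.0, 0.5], [0.5, 1.0]]
B2 = [[0.2, 0.0], [0.0, 0.3]]
print(lmc_kernel(0.0, 0, 0.3, 1, [B1, B2], [1.0, 0.1]))
```

Because each feature carries its own matrix, inter-task correlation can differ across length scales, which a single co-regionalization matrix cannot express.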
Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in end-effector behavior without raising alarms. This paper addresses the resilience of manipulators against finite-horizon FDIAs by formalizing two defense methods, namely anomaly-aware virtual damping and manipulability reduction, with probabilistic guarantees on nominal task execution. Simulations on a 7-DOF redundant manipulator show that the proposed defenses substantially reduce the impact of FDIAs compared to relying solely on a threshold-based anomaly detection system (ADS) such as the chi-squared detector, while preserving nominal task performance in the absence of attack.
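For readers unfamiliar with the baseline detector, a chi-squared test compares a normalized residual energy against a threshold chosen for a target false-alarm rate. The minimal sketch below assumes a two-dimensional residual with independent Gaussian components; with two degrees of freedom the chi-squared quantile has the closed form $-2\ln\alpha$. The function names and numeric values are hypothetical and unrelated to the paper's simulation setup.

```python
import math


def chi2_statistic(residual, variances):
    """Normalized residual energy r^T S^{-1} r for a diagonal covariance S."""
    return sum(r * r / v for r, v in zip(residual, variances))


def chi2_threshold_2dof(false_alarm_rate):
    """Threshold for 2 degrees of freedom: P(chi2_2 > t) = exp(-t / 2)."""
    return -2.0 * math.log(false_alarm_rate)


def is_alarm(residual, variances, false_alarm_rate=0.05):
    """Raise an alarm when the statistic exceeds the threshold."""
    threshold = chi2_threshold_2dof(false_alarm_rate)
    return chi2_statistic(residual, variances) > threshold


# A small injected bias stays below the threshold (stealthy), a large one
# does not. Both residuals are made-up numbers.
print(is_alarm([0.5, 0.5], [1.0, 1.0]))
print(is_alarm([3.0, 3.0], [1.0, 1.0]))
```

An attacker who keeps each residual under the threshold is never flagged, yet the injected bias can still accumulate through an integrator, which is the gap the additional defenses are meant to close.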