Event cameras have emerged as a high-bandwidth, low-latency sensing modality for safety-critical perception in automated driving systems (ADS), offering microsecond temporal resolution, 120-140 dB dynamic range, and intrinsic absence of motion blur. However, no task-agnostic quality metric currently operates directly on the asynchronous event stream: state-of-the-art proxies require a downstream task (e.g., detection accuracy, tracking error) to assess stream integrity, which is incompatible with the certification requirements of ISO 21448 (SOTIF) and ISO/PAS 8800:2024. The recent BiasBench benchmark (CVPR 2025) explicitly identifies this gap. This work proposes a unified algebraic framework that lifts the Pearson Correlation Coefficient (PCC), historically used in two prior works for redundancy filtering and ROI selection on frame-based images, to the three standard event representations: Time Surface, Event Frame, and Voxel Grid. The framework yields three metrics: (i) r-TS for stream integrity monitoring against an ego-motion-predicted Time Surface, (ii) r2-EF for adaptive ROI selection requiring only integer comparisons, and (iii) r-VG for temporal redundancy gating. A structural isomorphism is established between the contrast-threshold mechanism of the event camera (|Delta L| >= C) and the PCC-based change criterion, the three lifted metrics are formalized, and pipeline latency and information loss are analyzed symmetrically against the raw stream. Illustrative behavior of each metric is demonstrated on a procedural-synthetic event stream, generated by direct simulation of the emission model rather than drawn from any real or video-derived dataset, including a tunnel-dip integrity-anomaly scenario in which r_C drops from 0.93 (coherent flow) to below 0 (alarm). An explicit epistemic convention ([ESTABLISHED], [SOLID], [HYPOTH.], [OPEN]) delineates the status of every contribution.
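As a rough illustration of the integrity-monitoring idea above, the following sketch computes a plain Pearson correlation between an observed and a predicted Time Surface. The abstract does not give the exact definition of r-TS, so the exponentially decayed surface, the flattening over all pixels, and the names `time_surface` and `pearson` are assumptions made for illustration only.

```python
import numpy as np

def time_surface(events, shape, t_ref, tau):
    """Build a time surface from (x, y, t) events.

    Assumed form: exp(-(t_ref - t_last) / tau) per pixel; pixels that never
    fired map to 0. This is a common convention, not taken from the paper.
    """
    last_t = np.full(shape, -np.inf)
    for x, y, t in events:
        if t <= t_ref:
            last_t[y, x] = max(last_t[y, x], t)
    return np.exp((last_t - t_ref) / tau)

def pearson(a, b):
    """Pearson correlation of two equally sized arrays, flattened.

    Returns 0.0 when either input has zero variance (hypothetical convention).
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)

# Toy 2x3 sensor: the predicted surface differs in one event timestamp.
observed = time_surface([(0, 0, 0.1), (1, 0, 0.5), (2, 1, 0.9)], (2, 3), t_ref=1.0, tau=0.5)
predicted = time_surface([(0, 0, 0.1), (1, 0, 0.5), (2, 1, 0.8)], (2, 3), t_ref=1.0, tau=0.5)
r = pearson(observed, predicted)
```

A monitor built on this sketch would compare `r` against an alarm threshold; how that threshold is chosen is not specified in the abstract.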
Steganalysis models excel on benchmark datasets but struggle in the wild when analyzed images are produced by a processing pipeline unseen during training. This problem, known as Cover Source Mismatch (CSM), is particularly hard in realistic settings where practitioners (1) have access to only a small, unlabeled dataset, (2) are unsure of the processing techniques applied to these images, and (3) lack information on the proportion of covers and stegos in that set. To answer this challenge, we introduce TADA (Target Alignment through Data Adaptation), a framework that learns to emulate the unknown processing pipeline from a small unlabeled target set. This architecture is trained with a loss combining residual covariance alignment, residual distribution matching, and an $\ell^2$ loss constraining the emulator to produce realistic images. Across toy and operational targets, TADA yields substantial gains in robustness to CSM and improves operational generalization compared to strong holistic and atomistic baselines. Additional resources are available at this link: this https URL
The Versatile Video Coding (VVC) standard, introduced in 2020, offers 40-50% bitrate savings for equivalent visual quality of reconstructed videos over its predecessor, High Efficiency Video Coding (HEVC), at the cost of significantly increased encoding complexity. This growth in encoding complexity is mainly due to the addition of the Quad Tree Multi Type Tree (QTMTT) partitioning structure, which increases the split combinatorial complexity. This paper presents a critical evaluation of state-of-the-art (SOTA) partitioning acceleration techniques designed to reduce the complexity of the partitioning search in VVC. Particular attention is given to how these methods have evolved alongside successive versions of the VVC Test Model (VTM), which serves as the reference software for benchmarking coding tools. These techniques are analyzed in the context of their adaptation to internal changes in VTM, such as updated heuristics for fast partitioning decisions. The study also highlights the challenges involved in improving the trade-off between encoding complexity and compression efficiency. This challenge becomes more pronounced when evaluating methods across diverse VTM configurations and multiple software versions.
Glaciers play a critical role as freshwater reserves and indicators of climate change, yet their automatic delineation, especially for debris-covered glaciers, remains challenging due to spectral similarity with surrounding terrain. This study introduces CryoNet, a deep learning framework that leverages a rich multi-modal dataset combining Sentinel-2 optical imagery, DEM-derived topographic variables, spectral indices, Principal Component Analysis (PCA), InSAR coherence and phase, tasseled-cap features, and GLCM texture to discriminate clean-ice glaciers, debris-covered glaciers, and glacial lakes. CryoNet is an encoder-decoder CNN with nested skip connections and spatial-channel Squeeze-and-Excitation (scSE) attention, built upon a ResNet101 encoder to capture hierarchical contextual and spatial features. The study is conducted in the Poiqu Basin in the central Himalaya, and transferability is evaluated by applying the trained model to the Mont Blanc Massif in the Alps. We additionally analyse the importance of each data layer in improving glacier mapping performance. The proposed model achieves an overall IoU of 90.52%, mean Recall of 98.08%, and mean Precision of 92.26%. For debris-covered glaciers specifically, CryoNet obtains an IoU of 90.46%, a recall of 95.79%, and a precision of 94.21%. Across both per-class and overall metrics, CryoNet surpasses DeepLabV3+, SegFormer, and U-Net, taken as state-of-the-art (SOTA) references, demonstrating its effectiveness for robust glacier mapping in complex high-mountain environments.
This work proposes a variable neighbourhood search (FTS) that uses a fractal-based local search primarily designed for images. Searching for specific content in images is posed as an optimisation problem, where evidence elements are expected to be present. Evidence elements improve the odds of finding the desired content and are closely associated with it in terms of spatial location. The proposed local search algorithm takes the form of a chain of triangles that engulf each other and grow indefinitely in a fractal fashion, while their orientation varies in each iteration. The authors carried out an extensive set of experiments, which confirmed that FTS outperforms state-of-the-art metaheuristics. On average, FTS was able to locate content faster, visiting fewer incorrect image locations. In the first group of experiments, FTS was faster in seven out of nine cases, being >8% faster on average, when compared to the second best search method. In the second group, FTS was faster in six out of seven cases, and it was >22% faster on average when compared to the approach ranked second best. FTS tends to outperform other metaheuristics substantially as the size of the image increases.
Accurately estimating the point spread function (PSF) of an optical system requires solving free-space wave propagation, which entails evaluating a diffraction integral. This integral is traditionally computed numerically using the Fast Fourier Transform (FFT) or the Hankel transform, as it lacks a closed-form solution. We show that, under defocus and spherical aberration, the diffraction integral admits an approximate closed-form solution by combining a piecewise Bessel approximation with Gaussian-type integrals. Based on this result, we develop a fast wave-based PSF simulator with linear complexity in the radial resolution. The proposed simulator, although unoptimized, achieves up to a 2x speedup over Hankel-based integration and a 4x speedup over FFT while closely matching wave-optical PSFs, enabling efficient large-scale depth-of-field synthesis.
When the blood supply to the brain is obstructed by a clot, oxygen delivery to brain tissues becomes insufficient, leading to cellular necrosis. In healthcare settings, accurately identifying and delineating ischemic lesion boundaries is essential for treatment and surgical planning. However, ischemic stroke lesions vary widely in shape, size, and location, and in grayscale MRI modalities such as T1W they may resemble surrounding brain structures. This makes lesion detection and segmentation a challenging task for clinicians. This study introduces a novel VRU-Net architecture, derived from visual features, residual connections, and a U-shaped network, for detecting and segmenting ischemic stroke lesions in 3D magnetic resonance imaging scans. The proposed method first uses a modified VGG model to identify ischemic stroke in separate 2D slices. Then, a U-shaped segmentation model with residual blocks segments the lesion in each slice. This procedure is applied independently to the axial, sagittal, and coronal planes, and the final output is generated by aggregating the three segmentation results. To improve both performance and processing speed, a high-performance classifier is applied before the segmentation model in a sequential framework. This strategy reduces unnecessary segmentation of non-lesion slices and improves overall accuracy. In addition, decomposing 3D images into 2D slices reduces model complexity while allowing information from three anatomical planes to support more accurate lesion localization. The proposed model is trained on the Anatomical Tracings of Lesions After Stroke dataset and outperforms state-of-the-art models in terms of accuracy and Dice coefficient. Moreover, the segmentation output provides feedback that helps the classification model reduce false-positive predictions.
Hyperspectral super-resolution (HSR) reconstructs a high-spatial-resolution hyperspectral image by fusing a low-resolution hyperspectral image (LR-HSI) with a high-resolution multispectral image (HR-MSI). In the absence of real-world paired data, HSR methods are evaluated almost exclusively on synthetic experiments derived from hyperspectral datasets through Wald's protocol. Despite the protocol's widespread adoption, its practical implementation varies markedly across research works, typically relying on a single (usually Gaussian) or very few point spread functions (PSFs), one or two spectral response functions (SRFs), and a couple of spatial downsampling factors. As a result, reported performance figures are difficult to compare across the literature, in addition to being often difficult to reproduce; furthermore, they may not generalize across realistic sensing conditions. We introduce HyperBench, a unified and extensible framework that standardizes synthetic experimentation for HSR. HyperBench supports diverse degradation configurations spanning ten PSFs, four SRFs derived from operational multispectral sensors, configurable spatial downsampling factors, and matched additive white Gaussian noise; its goal is to automate large-scale evaluation and structured logging. By decoupling model development from experimental design, the framework enables reproducible, apples-to-apples cross-method comparison with minimal friction. We use HyperBench to evaluate six recently proposed HSR methods across a 70-configuration sweep on four widely used hyperspectral scenes and observe that the inter-method PSNR spread widens from approximately 5 dB on the easiest PSF to over 13 dB on the hardest, a fragility that is structurally invisible to the prevailing single-configuration evaluation protocol. HyperBench code is available at this https URL.
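The synthetic degradation described above can be sketched as follows: the reference image is blurred per band with a PSF and spatially decimated to give the LR-HSI, and spectrally projected through an SRF to give the HR-MSI. This is a minimal sketch and not HyperBench's actual API: the function names, the periodic-boundary blur, plain decimation, and the use of one shared SNR for both outputs are all assumptions.

```python
import numpy as np

def gaussian_psf(size, sigma):
    """Normalized 2D Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def blur_band(band, psf):
    """Circular convolution of one band with the PSF (periodic boundary assumed)."""
    h, w = band.shape
    kh, kw = psf.shape
    padded = np.zeros((h, w))
    padded[:kh, :kw] = psf
    # Shift the kernel so that its centre sits at the array origin.
    padded = np.roll(padded, shift=(-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(band) * np.fft.fft2(padded)))

def add_awgn(x, snr_db, rng):
    """Add white Gaussian noise at the requested SNR (in dB)."""
    noise_power = np.mean(x ** 2) / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def wald_degrade(hsi, psf, factor, srf, snr_db=None, seed=0):
    """hsi: (H, W, B) reference; srf: (b, B). Returns (LR-HSI, HR-MSI)."""
    bands = hsi.shape[-1]
    blurred = np.stack([blur_band(hsi[..., i], psf) for i in range(bands)], axis=-1)
    lr_hsi = blurred[::factor, ::factor, :]   # spatial decimation
    hr_msi = hsi @ srf.T                      # spectral projection
    if snr_db is not None:
        rng = np.random.default_rng(seed)
        lr_hsi = add_awgn(lr_hsi, snr_db, rng)
        hr_msi = add_awgn(hr_msi, snr_db, rng)
    return lr_hsi, hr_msi
```

Sweeping `psf`, `srf`, `factor`, and `snr_db` over a grid is what turns a single-configuration experiment into a multi-configuration sweep of the kind the abstract describes.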
This paper introduces a Doppler domain localized (DDL) implementation of the adaptive matched filter (AMF) for radar target detection in severely heterogeneous clutter environments with limited training data. The proposed detector uses the concept of a region of possible target detection (RPTD), a small set of Doppler bins that capture most of the target signal power. This RPTD-based DDL-AMF detector outperforms an earlier suggested DDL implementation of the generalized likelihood ratio (GLR) test, which employs the region of detection improvement (RODI) concept. Unlike the RODI-based DDL-GLR detector, the proposed DDL-AMF detector requires no information on clutter spectrum parameters and no measurements to determine the number and locations of RODIs. Moreover, the performance of the RODI-based DDL-GLR detector falls far below the optimum when the target Doppler frequency is unknown. In contrast, the RPTD-based DDL-AMF detector ensures rapid adaptive detection with near-optimum performance under unknown target Doppler frequency and multimodal clutter spectra.
This paper presents a passivity-based control framework for AC-DC converters supplying non-passive Information Technology rack loads in DC data centers. Unlike conventional cascaded proportional-integral controllers that ensure stability only near nominal operating points, the proposed method is derived from the system total energy balance using the Port-Hamiltonian formulation. By shaping the stored energy and injecting virtual damping through a lossless interconnection with a PH controller, the converter behaves as a passive system even when interfaced with non-passive loads or under grid disturbances. The closed-loop system guarantees asymptotic voltage regulation and strict energy dissipation without assuming constant grid voltage or frequency. Simulation studies under realistic load and fault scenarios validate that the proposed controller achieves smaller voltage deviations, faster recovery, and superior robustness, demonstrating its suitability for future high-efficiency DC data-center architectures.
Neural control offers strong potential for handling highly nonlinear dynamics in shipboard microgrids (SMGs), yet its black-box nature can trigger abrupt control spikes and actuator saturation during initial transient shocks. This letter devises a formal verification method for SMG neural controllers to assess their shock responses. Our contributions include: 1) a set-based SMG differential-algebraic equation (DAE) model compatible with set propagation; 2) a DAE-embedded bound propagation approach to compute tight envelopes of all possible neural control outputs. Extensive case studies demonstrate the effectiveness of the devised method in formally certifying SMG control performance under uncertain disturbances.
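To give a flavour of what bounding all possible neural control outputs means, the sketch below applies generic interval bound propagation to a small ReLU network. It is not the letter's DAE-embedded approach and generally yields looser envelopes; the network sizes, the input box, and the function names are hypothetical.

```python
import numpy as np

def affine_bounds(lo, hi, W, b):
    """Propagate an axis-aligned box [lo, hi] through x -> W @ x + b."""
    W_pos = np.maximum(W, 0.0)
    W_neg = np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def network_bounds(lo, hi, layers):
    """Sound output bounds for a ReLU network (ReLU on all but the last layer)."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def forward(x, layers):
    """Ordinary forward pass of the same network, used to sanity-check the bounds."""
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

# Hypothetical 2-input, 1-output controller with random weights.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 2)), rng.normal(size=4)),
          (rng.normal(size=(1, 4)), rng.normal(size=1))]
x_lo, x_hi = np.array([-0.1, 0.9]), np.array([0.1, 1.1])
u_lo, u_hi = network_bounds(x_lo, x_hi, layers)
```

Every control output produced for an input inside the box lies within `[u_lo, u_hi]`, which is the property a certification argument would rely on.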
This paper investigates the control synthesis for continuous-time uncertain systems under Signal Temporal Logic (STL) specifications containing nested temporal operators. Control Barrier Functions (CBFs) are utilized herein to encode STL formulas into system constraints. However, traditional CBF designs fail to encode nested STL formulas, whereas recent reachability analysis-based methods capable of handling such formulas are inapplicable to uncertain systems and suffer from a severe computational burden. To overcome these challenges, a novel recursive CBF design procedure based on a modified STL tree (sTLT) is proposed to yield explicit parameterized CBFs. Within this framework, sliding window variables are introduced to capture complex temporal relationships. Crucially, satisfying the resulting CBF constraints is proven to guarantee the fulfillment of the STL specifications. To render the proposed recursive CBF design applicable to systems subject to uncertain disturbance, a novel controller based on reconstructed CBF using quadratic programming (QP) is proposed, ensuring strict CBF constraint satisfaction under disturbances. In contrast to existing methods, the proposed reconstructed CBF approach requires no prior knowledge of the disturbances while relaxing initial safety assumptions. Simulation results validate the efficacy of the proposed approach.
Achieving effective uplink bistatic ISAC over an OFDM waveform gives rise to challenging interference structures. These are mostly due to unequal direct- and echo-path contributions and Doppler-induced ICI, rendering orthogonal resource separation and fixed SIC strategies inadequate. To address this problem, we propose an RS-inspired framework where the transmitter splits each communication message into a robust and a supplementary stream, which are jointly superposed over a sensing signal. Furthermore, we present the design of a staged sensing-communication receiver. Based on this framework, we derive tractable per-subcarrier SINR expressions and establish the relation between sensing accuracy and communication reliability based on the Fisher information. Building on these, we formulate a joint power-allocation problem for SE maximization under sensing-performance and power constraints. The resulting non-convex formulation is solved using convex surrogates and fractional programming. Numerical results demonstrate that, compared to NOMA-inspired baselines, the proposed framework provides more effective IFI management and improved robustness to Doppler-induced ICI.
Short-packet communication alters the fundamental performance limits of reconfigurable intelligent surface (RIS)-assisted systems, making conventional analyses based on the infinite blocklength regime insufficient. This work investigates RIS-assisted transmission in the finite blocklength (FBL) regime while explicitly incorporating thermal noise generated by passive RIS elements, an effect commonly neglected in existing models. A unified analytical framework is developed to characterize the block-error rate (BLER), its asymptotic behavior, and the resulting goodput under both uniform and non-uniform RIS reflection coefficients. Our results show that ignoring RIS thermal noise leads to a pronounced overestimation of reliability, with the mismatch increasing as the number of reflecting elements grows. Furthermore, increasing the RIS size does not always improve performance, particularly in the low transmit power regime where accumulated noise becomes dominant. Overall, the results highlight fundamental limitations of idealized RIS models and demonstrate the need for incorporating thermal noise for accurate system evaluation.
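For readers unfamiliar with FBL analysis, the sketch below evaluates the widely used normal approximation of the BLER for a complex AWGN channel at a given SNR, together with a simple goodput figure. How the RIS size and RIS thermal noise enter the effective SNR is specific to the paper and is not modelled here; the function names and the goodput definition (rate times success probability) are assumptions.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bler_normal_approx(snr, n, k):
    """Normal approximation of the BLER for k bits over n channel uses.

    snr is linear (not dB) and must be positive; the log(n) correction
    term of the approximation is omitted for simplicity.
    """
    capacity = math.log2(1.0 + snr)
    dispersion = (1.0 - 1.0 / (1.0 + snr) ** 2) * math.log2(math.e) ** 2
    rate = k / n
    return q_function((capacity - rate) * math.sqrt(n / dispersion))

def goodput(snr, n, k):
    """Bits per channel use delivered without error (assumed definition)."""
    return (k / n) * (1.0 - bler_normal_approx(snr, n, k))

# A lower effective SNR, for example from unmodelled extra noise, raises the BLER.
eps_high_snr = bler_normal_approx(3.0, 200, 250)
eps_low_snr = bler_normal_approx(1.5, 200, 250)
```

Plugging an effective SNR that ignores an additional noise source into `bler_normal_approx` gives an optimistic BLER, which is the kind of mismatch the abstract warns about.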
Advanced air mobility operations will require reliable coordination mechanisms for managing dense traffic near vertiports. However, sequencing decisions may become vulnerable when they rely on potentially falsified self-reported information such as estimated time of arrival. Self-interested vehicles may misreport their arrival times to obtain favorable landing priority, while malicious actors may spoof information to disrupt sequencing decisions or induce unnecessary congestion. This paper studies secure coordination for vertiport sequencing under sensing uncertainty. We consider a coordinator that combines self-reported Remote-ID information with externally obtained surveillance measurements to check reports and assign separation-feasible arrival schedules. Since surveillance-based estimates are uncertain, falsified reports may remain consistent with the sensing uncertainty region and cannot always be rejected outright. We therefore formulate sequencing as a robust design problem over this uncertainty region. Self-interested misreporting is modeled as a strategic deviation that improves the reporting vehicle's own sequencing outcome, whereas malicious spoofing is modeled as an adversarial disturbance that degrades the system-level objective. The final paper will develop robust sequencing rules over surveillance-consistent uncertainty sets and evaluate their performance in representative vertiport sequencing scenarios.
Purpose: Diffusion MRI (dMRI) provides a diverse set of quantitative measures and derived datatypes to assess white matter microstructure and macrostructure. Coupled with the increasing size of imaging studies using dMRI, the number of downstream outputs requiring quality control (QC) will continue to grow. Previous work has shown that failure modes which are often not evident from aggregate metrics or summary statistics can be identified through structured visual inspection. This work aims to better understand common failure modes and the expected characteristics of valid dMRI processing outputs to ensure the validity and interpretability of quantitative findings. Approach: We deployed a structured QC framework to assess 18,328 dMRI scans across nine datasets, visually evaluating the outputs of seven processing pipelines representative of conventional dMRI analyses. Results: Downstream outputs that pass visual QC may still rely on failed upstream dependencies; such failures may only be visually detectable through systematic inspection of the full pipeline hierarchy. Additionally, appropriate QC granularity is algorithm-specific, as the spatial structure of each algorithm's outputs determines whether failures warrant selective or global exclusion. Conclusion: This work demonstrates the feasibility and analytical value of large-scale, structured QC for dMRI processing pipelines. Our results highlight the need for systematic QC spanning the full processing hierarchy to ensure the validity and interpretability of quantitative findings.
Field-scale crop maps support supply-chain forecasting and policy, yet statewide crop identification still often depends on retrospective surveys or remote-sensing workflows built around hand-engineered spectral features. Those pipelines can be accurate, but they require repeated preprocessing and often lose robustness across years. This study evaluated whether Google DeepMind's AlphaEarth geospatial embeddings can serve as an analysis-ready alternative for mapping processing tomato systems in California. LandIQ 2018 crop polygons were used to assemble a balanced reference dataset of 4,742 tomato and 4,742 non-tomato fields. For each polygon, 64-band AlphaEarth embedding chips were extracted and aligned with binary masks, then divided into spatially independent training (n = 6,638), validation (n = 1,422), and test (n = 1,424) sets. A U-Net segmentation model was trained on AWS SageMaker using a composite masked binary cross-entropy and soft Dice loss. To complement hard predictions, Monte Carlo dropout was retained at inference and repeated 100 times per chip to estimate predictive mean and variance. On the independent test set, the model achieved 99.19% pixel accuracy, 98.69% precision, 99.40% recall, 99.04% F1 score, 98.11% intersection over union, and 99.02% chip accuracy. Uncertainty maps were consistently highest near field edges and low within field interiors. The results show that AlphaEarth embeddings retain crop-relevant spatial and temporal structure and can support accurate, field-scale tomato mapping without manual feature engineering.
Beamforming has proven to be valuable in enabling full-duplex massive MIMO base stations, but doing so effectively often requires knowledge of the self-interference channel matrix H. Estimating this high-dimensional channel is costly in practice, however, since it requires a prohibitive number of measurements, especially in fast-fading conditions. In this work, we overcome this dilemma by designing full-duplex beams using implicit channel knowledge gathered from a relatively small number of measurements across H. These measurements are collected by the base station using a sequence of beams tailored to both the deployment environment and the particular users being served. This is accomplished through site-specific training of a transformer-based deep learning model that learns to efficiently probe portions of H most relevant to the particular users being served by exploiting the underlying structure of the surrounding environment. The deep learning model then uses these probing measurements to design transmit and receive beams that couple low self-interference while delivering high gain to a pair of downlink and uplink users. For favorable multi-user scaling, a single set of probing measurements can be used by the model to serve several users throughout the coherence time of H by leveraging correlations across those users' channels. Simulation results using ray-tracing demonstrate that our proposed approach exceeds the best possible performance with explicit channel estimation across a wide range of scenarios, especially with large antenna arrays.
The synergistic interpretation of anatomical information from computed tomography (CT) and metabolic information from positron emission tomography (PET) is important to oncologic imaging. However, existing deep learning methods for PET/CT remain largely task-specific, are often trained on single-center cohorts, or adopt dual-branch fusion schemes that delay cross-modal interaction and underutilize early spatial correspondence between PET and CT. To address these limitations, we present an open-source, multi-center, whole-body FDG PET/CT foundation model utilizing 4,997 harmonized scans from four public datasets. Our framework employs hierarchical UNet-shaped backbones with early channel-wise concatenation, enabling anatomical and metabolic features to interact from the first embedding layer onward. We further introduce a masked autoencoding objective based on zero-mean imputation, combined with a weighted global reconstruction loss. This design avoids non-physical intensity discontinuities at masked-region boundaries that arise from learnable mask tokens. On downstream AutoPET lesion segmentation, the proposed models demonstrate strong label efficiency: with only 10% of the labeled training data, they achieve performance comparable to models trained from scratch on the full dataset. Under extreme 5-shot linear probing, joint PET/CT pretraining also achieves higher Dice scores than separated-modality pretraining. This multi-center foundation model demonstrates label efficiency and cross-modality representation learning for PET/CT tumor segmentation. It provides a robust, open-source basis for advancing automated oncologic imaging, significantly reducing the need for large-scale manual annotations in clinical practice.
Objective: Conventional urodynamics (UDS) provides critical diagnostic information, but requires invasive dual catheterization and manual labeling of clinically important events. Wireless, catheter-free bladder function tests are becoming available for home use, but only provide vesical pressure (Pves). We developed a machine learning framework that was trained and externally validated on UDS data for automated urological event classification from single-channel (Pves) recordings. Methods: We analyzed 118 annotated UDS traces segmented into 0.8-second Pves intervals. Using the discrete wavelet transform, we extracted 55 statistical features per segment. Consecutive segments (233,338 segments; three classes) sharing the same class, abdominal (ABD), detrusor overactivity (DO), or voiding contraction (VOID), were grouped into events, and median feature aggregation was applied to derive event-level representations. Using an imbalanced dataset, we trained a two-stage multilayer perceptron (MLP): Stage 1 distinguished VOID vs non-VOID, and Stage 2 classified non-VOID into ABD and DO. The model was trained on two independent datasets and externally validated on a third independent dataset. Additional cross-dataset training-validation permutations were performed to assess generalizability. Performance was evaluated using accuracy, F1-macro, sensitivity, specificity, and area under the curve (AUC). Results: Stage 1 (VOID vs. non-VOID) achieved 84% accuracy (balanced accuracy 76%), F1-macro 0.74, and AUC 0.85, while Stage 2 (ABD vs. DO) reached 90% accuracy (balanced accuracy 80%), F1-macro 0.80, and AUC 0.87. Permutation feature importance indicated that most features contributed meaningfully. Conclusion: Our machine learning approach enables accurate automated detection of urological events from Pves, demonstrating feasibility for single-channel monitoring and future ambulatory applications.
Coordinate-conditioned neural networks can generate head-tracked personal sound zone (PSZ) loudspeaker filters in real time, but they are sensitive to localization uncertainty. Small fluctuations in estimated listener coordinates, caused by optical distortion, temporary occlusions, or tracking jitter, may produce large filter changes even when listeners are physically stationary. This paper proposes neighbor-consistent neural filters that regularize the coordinate-to-filter mapping by penalizing filter differences at randomly perturbed neighboring coordinates during training. To evaluate robustness against tracking noise, we introduce a decoupled protocol that fixes the acoustic transfer functions at a physical anchor while perturbing only the coordinate inputs used for filter generation. Isolation quality and local stability are evaluated using neighborhood median and lower-tail statistics of inter-zone and inter-program isolation, together with spatial variation rates that quantify metric sensitivity within a coordinate neighborhood. In simulation with a split-band woofer-tweeter system and 25 randomly sampled anchor positions, neighbor consistency reduces the root-mean-square (RMS) variation rate by up to 55.9% in the woofer band and 30.3% in the tweeter band while largely preserving isolation quality and improving lower-tail robustness. In in-situ measurements using a 24-driver array and two stationary head-and-torso simulators, the proposed regularization improves worst-case neighborhood isolation by up to 16.9% and reduces spatial variation rates by up to 61.8%. These results demonstrate that neighbor-consistency regularization effectively stabilizes PSZ rendering under localization uncertainty.
Orbital debris is a pressing problem which presents a danger to global space operations and a barrier to continued development of the space economy and space infrastructure. As research continues regarding orbital debris, there is a need for tools to understand the system-level implications of orbital debris solutions. This research considers the orbital debris problem as a dynamic process. Based on dynamic system theories, time-series variables of the numbers of orbital debris, orbital objects, and object launches should be causally linked, which means they share a common system attractor manifold. We propose a data-driven method based on complexity science to reconstruct a shadow attractor of the dynamic system using limited observable variables. The reconstructed shadow attractor helps us to understand the fundamental system dynamics for orbital debris and enables us to simulate the future of the orbital debris system based on changes to policy. These findings represent a significant advancement in our ability to understand high level impacts of space system policy with limited data available.
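The abstract does not name the reconstruction technique; one common way to build a shadow attractor from a single observed time series is delay-coordinate embedding, sketched below. The embedding dimension, the lag, and the toy series are hypothetical, and this may differ from the method actually used in the work.

```python
import numpy as np

def delay_embed(series, dim, lag):
    """Return the delay-coordinate embedding of a 1D series.

    Row t is (x[t], x[t + lag], ..., x[t + (dim - 1) * lag]).
    """
    x = np.asarray(series, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("series too short for the requested dim and lag")
    return np.column_stack([x[i * lag : i * lag + n_points] for i in range(dim)])

# Toy yearly counts standing in for an observed variable (made-up values).
counts = np.arange(10)
shadow = delay_embed(counts, dim=3, lag=2)
```

Each row of `shadow` is one point on the reconstructed manifold; nearest-neighbour relations between such points are what data-driven forecasting and causality tests on shadow attractors typically exploit.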
In this letter, we propose a new wireless sensing system equipped with a rotatable antenna (RA) array to enhance the sensing performance of a uniform sparse array (USA). To tackle the severe spatial undersampling issues, we propose a novel tensor decomposition-based direction-of-arrival (DOA) estimation algorithm. Specifically, we introduce a synchronous multiple rotation pattern for active target probing such that the received signals across multiple rotations capture diverse spatial degrees of freedom. Subsequently, we mathematically formulate the received signals across successive rotations as a third-order tensor, and leverage the canonical polyadic decomposition to obtain the factor matrices incorporating the DOA of targets. By analyzing the extrema distribution laws of array steering vector correlation (SVC) and gain SVC of RAs, we propose to combine the array and gain factor matrices via the Kronecker product, which theoretically guarantees unambiguous DOA estimation. Simulation results demonstrate that the proposed RA-enhanced tensor decomposition-based algorithm achieves high-precision and unambiguous sensing performance compared to conventional uniform dense arrays and omnidirectional antenna systems.
The highly dynamic nature of vehicular networks necessitates proactive and site-specific radio resource management (RRM) to achieve ultra-reliable low-latency communications. While Network Digital Twins (NDTs) have emerged as a promising enabler, ray-tracing remains time-consuming, challenging accurate RRM under latency constraints. We propose AdaPTwin, an adaptive multi-fidelity predictive NDT for proactive and latency-aware RRM in vehicular networks. Unlike single- and multi-fidelity NDTs with fixed fidelity levels, AdaPTwin dynamically adjusts NDT fidelity based on network conditions. The framework adopts a hierarchical cloud-edge architecture, where computationally intensive fidelity selection is performed periodically in the cloud, and the proactive RRM loop operates in real-time at the edge. The edge-based proactive RRM task consists of channel prediction between vehicles and roadside units (RSUs) via trajectory forecasting and look-ahead ray tracing, followed by RRM execution. A transformer model enhanced with continual and transfer learning enables vehicular trajectory prediction while adapting to new environments and traffic patterns. Ray-tracing is performed using NVIDIA Sionna by exploiting a dynamically updated virtual environment to ensure realistic radio propagation within the NDT. Furthermore, a joint RSU beamforming and vehicle-RSU association problem is formulated to maximize proportionally fair sum-rate, and it is efficiently solved using a scalable multi-start iterative coordinate descent algorithm. Comparisons against reactive, single-fidelity, and non-adaptive predictive NDTs under realistic vehicular conditions confirm that AdaPTwin successfully adapts to diverse scenarios where other frameworks fail. Ultimately, AdaPTwin achieves up to 90% sum-rate gain and 80% outage probability reduction compared to non-adaptive NDTs, while maintaining real-time performance.
The integration of machine learning with domain-specific physics is transforming the design, monitoring, and control of electricity systems, where data scarcity, limited interpretability, and the need to enforce physical laws constrain purely data-driven models. Physics-informed machine learning (PIML) addresses these limitations by embedding governing equations directly into the learning process, yielding accurate, efficient, and scalable solutions for Industry 4.0 applications. This article reviews hybrid PIML architectures for electricity systems, including physics-informed neural networks (PINNs), Deep Operator Networks (DeepONets), Fourier Neural Operators, Extreme Learning Machine-enhanced PINNs, graph-based PINNs (PIGNNs), and domain-decomposition PINNs. Each approach is examined through case studies spanning field analysis, fault detection, digital twins, surrogate modeling, and control optimization. The review shows that embedding Maxwell's equations and other first-principles constraints substantially improves predictive accuracy under sparse and noisy data, reduces simulation time by orders of magnitude relative to finite element methods, and enhances generalization across operating regimes. Hybrid frameworks consistently outperform purely data-driven baselines on parameter sensitivity, dynamic behavior, and robustness, while supporting real-time digital-twin calibration and uncertainty quantification. Persistent challenges include training instability for stiff multi-scale problems, computational cost of high-fidelity models, and the absence of standardized benchmarks. The findings demonstrate that PIML enables a paradigm shift from black-box data-driven methods to transparent, physics-informed strategies, positioning the field for sustained innovation in resilient and intelligent electricity systems.
Accurate and robust medical image classification is paramount for early disease diagnosis and treatment planning. However, challenges such as limited annotated data, high intra-class variability, and subtle inter-class differences often hinder the performance of deep learning models. This paper introduces a synergistic deep learning framework that leverages the strengths of self-supervised learning and transfer learning for enhanced medical image classification. Our approach employs two distinct ConvNeXt-Tiny models: one pre-trained on a large-scale natural image dataset (ImageNet) and another pre-trained using an entropy-guided Masked Autoencoder (MAE) on the target medical dataset. Both models are then fine-tuned on specific medical image classification tasks. A final ensemble strategy, based on averaging predicted probabilities, is utilized to combine the complementary insights from these two models. Rigorous experimental validation across four diverse medical imaging datasets (Breast Ultrasound Images (BUSI), International Skin Imaging Collaboration (ISIC) 2018, Kvasir, and COVID) demonstrates the superior performance and robustness of our ensemble approach. The MAE pre-training significantly improves feature learning on domain-specific data, while the ImageNet pre-training provides strong generalizable features. The ensemble consistently achieves state-of-the-art results, outperforming individual models and existing methods, highlighting the efficacy of combining diverse pre-training strategies for challenging medical image analysis.
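The ensemble step described above (averaging the predicted probabilities of the two fine-tuned models) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ensemble_average`, the array shapes, and the example probabilities are all assumptions made for the sketch.

```python
import numpy as np

def ensemble_average(prob_a, prob_b):
    """Average two (n_samples, n_classes) probability arrays.

    Returns the averaged probabilities and the arg-max class per sample.
    """
    prob_a = np.asarray(prob_a, dtype=float)
    prob_b = np.asarray(prob_b, dtype=float)
    if prob_a.shape != prob_b.shape:
        raise ValueError("both models must score the same samples and classes")
    avg = (prob_a + prob_b) / 2.0  # unweighted mean of the two models
    return avg, avg.argmax(axis=1)

# Hypothetical softmax outputs for 2 samples and 3 classes (made-up numbers).
p_imagenet = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_mae = np.array([[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]])
avg, labels = ensemble_average(p_imagenet, p_mae)
```

Because each input row sums to one, each averaged row does too, so the output can still be read as a probability vector.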
In modern industrial systems, machinery frequently operates under dynamic environments with continuously varying loads and speeds. Consequently, deep learning-based fault diagnosis models often suffer from severe performance degradation under unseen operating conditions due to complex data distribution shifts. Since existing methods predominantly rely on static offline training, they lack the capability to dynamically adapt to these continuous variations. To address this issue, an integrated framework combining offline domain generalization (DG) and online test-time adaptation (OTTA) is proposed. Initially, a model with preliminary generalization capability is obtained offline by extracting domain-invariant features via adversarial learning. During the online phase, a dual-memory replay mechanism is developed. By selectively storing high-confidence online pseudo-labeled samples and replaying them with historical offline data, the model facilitates adaptation to changing data distributions and helps reduce forgetting of previously learned knowledge. Experiments on a real-world motor dataset show that the proposed approach achieves competitive performance under the considered unseen operating conditions.
This paper investigates the joint optimization of power allocation and antenna activation in sparse extremely large aperture array systems operating under power amplifier non-linearities. We first derive an analytical expression for the achievable spectral efficiency (SE) of point-to-point MIMO channels affected by non-linear distortions using the Bussgang decomposition. To address the combinatorial and non-convex nature of the energy-efficiency (EE) maximization problem, we employ an unsupervised deep neural network (DNN) that learns the non-linear mapping between the channel state information and the optimal EE operating point. The DNN jointly predicts distortion-aware power allocation, total transmit power scaling, and modular sub-array activation based on singular-value and geometric channel features. Numerical results demonstrate that the proposed DNN-based arrays achieve significant EE gains over the conventional sparse arrays.
Since beam squint and near-field effects both inherently exist in upper-6 GHz (U6G) extremely large-scale multiple-input multiple-output (XL-MIMO) systems, wideband near-field channel estimation faces severe challenges, such as higher computational complexity and higher pilot overhead, particularly in hybrid architectures with fewer radio frequency (RF) chains. To reduce the complexity and the number of pilots, the parametric symmetry of wideband near-field channels is explored, such that the channel parameters, including angle, distance, and range, can be decoupled based on the delay variations observed by different antennas. Based on this, a distributed parametric symmetry-based (DPS) algorithm, applicable to U6G XL-MIMO, is proposed. The delays observed by different subarrays are first estimated and extrapolated across the local processing units (LPUs); the channel parameters are then decoupled and estimated at the central processing unit (CPU) by only linearly combining the delays from different LPUs. The path gains are calculated at the respective LPUs to reconstruct the channel with low complexity. Since the proposed algorithm does not rely on scanning the polar-domain dictionary, only a single pilot is required even with hybrid architectures. Furthermore, the computational complexity, multiple-path resolution, Cramer-Rao lower bound (CRLB) and lower bound (LB) of the estimates in hybrid architectures and the DPS algorithm, respectively, are analyzed, to evaluate the realizable potential of the proposed algorithm. The simulation results show that the proposed algorithm achieves higher estimation accuracy while requiring lower complexity and fewer pilots.
This paper addresses bearing-only algorithms for solving the Fermat-Weber Location Problem (FWLP) with a unicycle agent. Unlike existing FWLP solutions for single- or double-integrator agents, our approach accounts for the nonholonomic constraints of wheeled robots. We first develop a bearing-only control law for the case with stationary beacons. Next, we consider saturated control inputs and propose a corresponding bearing-only control law. Finally, we address moving beacons with constant velocities and develop a control law that enables the unicycle agent to track the moving Fermat-Weber point. Both simulations and experiments are provided to demonstrate the effectiveness of the proposed methods.
Extremely large aperture arrays (ELAAs) and millimeter-wave (mmWave) technologies are essential for achieving high data rates in future wireless communication systems. To perform precise beamforming, these systems require accurate channel estimation, in which the near-field wavefront curvature effect must be taken into account. Existing channel estimation methods rely on the spherical wavefront channel (SWC) model, which is suitable for near-field propagation with point sources, scatterers, and reflection planes. However, when a near-field curved reflecting surface exists, the wavefront of the reflected wave becomes anisotropic rather than spherical, causing the SWC model to become inaccurate. To address this problem, in this paper, we formulate a parameterized model for the anisotropic wavefront channel (AWC). Using this model, we propose a channel estimation algorithm based on physical parameter recovery for the AWC. Simulation results reveal that the AWC no longer retains sparsity in the angle-distance domain. Furthermore, the results demonstrate how different physical characteristics of the propagation scenario affect the degree of wavefront anisotropy, and confirm the effectiveness of our proposed algorithm in AWC scenarios.
User-defined keyword spotting (KWS) is crucial for personalized voice interaction, yet existing methods face several challenges: (1) insufficient discriminability among confusable words, (2) performance inconsistency across speakers with varying pronunciations, and (3) high data cost to ensure reliable wake-word performance. In this paper, we introduce DMA-KWS, an efficient and robust framework for user-defined keyword spotting. First, it adopts a dual-stage matching pipeline: CTC decoding with streaming phoneme search to locate candidate segments, followed by QbyT with a phoneme matcher for fine-grained verification, enabling it to better distinguish confusable words. Next, multi-modal enrollment fuses user-specific speech with text embeddings to further improve accuracy for registered users. Finally, a parameter-efficient continual adaptation mechanism performs lightweight updates using synthetic and real data. Extensive experiments demonstrate the superior performance of DMA-KWS. On the LibriPhrase Hard subset, it achieves 97.85% AUC and 6.13% EER, reaching state-of-the-art performance. In speaker-dependent settings, DMA-KWS consistently outperforms text-only enrollment, demonstrating significant performance gains. Moreover, the proposed parameter-efficient fine-tuning mechanism adapts DMA-KWS with only 187k updated parameters, further enhancing KWS performance while ensuring suitability for on-device deployment.
Renewable-driven microgrids dominated by grid-forming (GFM) converters are subject to persistent power fluctuations, making equilibrium-known stability assessments restrictive. This paper develops an equilibrium-free contraction stability method based on semi-contraction theory. By formulating the system in a symmetry-aware projected state space, the intrinsic rotational mode induced by uniform angle shifts is removed. A blockwise Jacobian decomposition is introduced to characterize the coupled active and reactive power dynamics, yielding a computable regional contraction condition. This condition is then converted into forward-invariant stability certificates that provide trajectory-level performance guarantees. For autonomous operation without disturbances, the method provides an equilibrium-free nonlinear stability characterization together with an estimation of the region of attraction (ROA). For non-autonomous operation under disturbances, it derives explicit bounds for quasi-steady tracking under slowly varying injections and for robustness under fast or composite disturbances. Case studies on a 9-bus system validate the proposed method.
Safety has been a major concern when deploying deep reinforcement learning algorithms in the real world. A promising direction that ensures that the learned policy does not visit unsafe regions is to learn a \emph{barrier function} along with the policy. A barrier is a function from states to reals that assigns low values to the initial states, high values to the unsafe states, and decreases in expectation on each transition; such a function can be used to bound the probability of reaching unsafe states. Previous attempts learned a barrier function directly from exploration data, but this required either large amounts of data or restrictions on the system dynamics. In this paper, we show how kernel embeddings can be used to learn barrier functions during deep reinforcement learning for stochastic systems with unknown dynamics. Our algorithm, \emph{kernel-based safe exploration (KBSE)}, learns an optimal policy and a barrier simultaneously during exploration. The barriers are computed iteratively, represented as conditional mean embeddings, and provide better probabilistic safety guarantees with more exploration. The exploration algorithm uses the learned barrier functions to identify safety violations. In the case of violation, it intervenes to modify the unsafe action to a safe action, thereby ensuring that the exploration is restricted to actions that bound the probability of reaching unsafe states. We evaluate KBSE on several complex continuous control benchmarks. Experimental results establish our new algorithm to be suitable for synthesizing control policies that are probabilistically safe without degradation in reward accumulation.
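The way such a barrier bounds the probability of reaching unsafe states can be made explicit with a standard supermartingale argument. The statement below is a generic sketch under assumed conditions (a nonnegative barrier $B$, a level $\epsilon$ on the initial set $\mathcal{X}_0$, and level $1$ on the unsafe set $\mathcal{X}_u$); the exact conditions used by KBSE are not given in the abstract and may differ.

```latex
% Assumed barrier conditions (illustrative, not taken from the paper):
%   B(x) \ge 0 for all x,   B(x) \le \epsilon for x \in \mathcal{X}_0,
%   B(x) \ge 1 for x \in \mathcal{X}_u,
%   \mathbb{E}[B(X_{t+1}) \mid X_t = x] \le B(x)   (decrease in expectation).
% Then B(X_t) is a nonnegative supermartingale, and Ville's inequality gives
\[
  \Pr\bigl(\exists\, t \ge 0 : X_t \in \mathcal{X}_u\bigr)
  \;\le\; \Pr\Bigl(\sup_{t \ge 0} B(X_t) \ge 1\Bigr)
  \;\le\; \mathbb{E}[B(X_0)]
  \;\le\; \epsilon .
\]
```

A smaller value of $\epsilon$ on the initial states therefore translates directly into a tighter safety guarantee.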
We address joint active and passive beamforming for uplink RIS-assisted multi-user multi-stream MIMO systems with joint detection. The coupled design of the receive combiner, block-diagonal user precoders, and RIS phase vector is formulated through a third-order composite channel tensor. Exploiting this multilinear structure, we propose a multi-stream tensor alternating optimization method that updates the combiner, user precoders, and RIS coefficients via low-dimensional tensor projections. Simulations show that the proposed method approaches a multi-start alternating-optimization benchmark while reducing computational complexity and improving large-RIS scaling.
Digital twins (DTs) are promising for wireless deployment, optimization, and data generation, but building a propagation-faithful twin from sparse real measurements remains difficult. This paper proposes a wireless environment digital twin (WEDT) construction paradigm that evolves a reconstructed geometric DT into a propagation-consistent wireless environment representation through calibration of a scene-level electromagnetic (EM) property field. Instead of directly fitting link-specific channel responses, the proposed paradigm first constructs a geometry-prior Bayesian channel map (BCM) to convert sparse position-labeled channel state information (CSI) into dense probabilistic supervision with uncertainty estimates. It then embeds the learnable EM property field into differentiable ray tracing (RT) based channel computation, thereby enabling calibration through an explicit propagation chain. Experiments in both public and real-world scenes show that WEDT achieves accurate channel prediction, generalizes to unseen transceiver topologies, and remains effective across different sampling conditions. WEDT also offers utility for material-related environment sensing, more reliable physical-layer planning, and higher-quality synthetic data generation for wireless AI. These results demonstrate the value of the proposed paradigm for propagation-consistent WEDT construction and related wireless applications.
Communication performance and channel estimation accuracy in MIMO systems are known to be limited by hardware impairments. Specifically, the presence of phase impairments, such as phase noise, makes real-time coherent transmission a challenging task. While phase impairment compensation is typically performed at the receiver, practical methods for enabling coherent transmission at the transmitter side remain underexplored. Established methods for OTA calibration of MIMO systems face several limitations such as assumptions of phase stationarity and accurate channel knowledge. In this work, a real-time local phase calibration method is experimentally compared with OTA calibration on a fully digital array of USRP X310 software-defined radios. Using RMS cycle-to-cycle jitter as a metric, it is shown that for low and high synchronization signal bandwidths, both approaches effectively eliminate phase drift and whiten the phase noise. Local calibration achieves higher phase stability and is channel-independent, whereas OTA calibration requires no additional hardware but is sensitive to multipath effects and channel-induced impairments. Practical deployment trade-offs are discussed based on the measurement results.
Remote photoplethysmography (rPPG) enables non-contact measurement of cardiac pulse signals by analyzing subtle color changes in facial videos. Nevertheless, extracting rPPG signals remains challenging because of their extremely weak signal strength and susceptibility to illumination noise. In this paper, we propose an rPPG signal extraction method that exploits the quasi-periodic characteristics of rPPG signals. Our approach models quasi-periodicity of the rPPG signal, which arises from the stable cardiac cycle, as a block-sparse structure in the time-frequency domain. To incorporate a block-sparse model and enable adaptive signal separation under illumination fluctuations, we construct a time-varying signal separation framework. Experiments using a public dataset demonstrate the effectiveness of our method.
Dynamic line rating (DLR) is a methodology that requires timely monitoring data to determine the real-time ampacity of power lines. However, DLR monitoring devices (MD) are vulnerable to connectivity disruptions, leading to missing or delayed data. Although unmanned aerial vehicles (UAV) can enable resilient data collection from MD, their limited onboard energy challenges timely monitoring over extended transmission corridors with flight hazards. This paper proposes a cooperative UAV-based data collection framework with integrated sensing and communication (ISAC) to support timely DLR updates. In this framework, ISAC is employed to maintain the sensing and communication quality required for safe and cooperative UAV data collection. Accordingly, a joint energy minimization problem is formulated over UAV trajectories and collection scheduling under ISAC constraints. To solve it, a hybrid algorithm combining deep reinforcement learning (DRL) and semidefinite relaxation (SDR) is proposed, where DRL optimizes the trajectory and collection scheduling, while SDR is used to handle the non-convex ISAC constraints. Simulation results show that the proposed scheme reduces energy consumption by up to 34.6% compared with offline benchmarks and by about 2.2% compared with the separated sensing-and-communication baseline, while satisfying the minute-level timescale requirement of DLR.
With the rising number of interactions between autonomous or sensor-assisted vehicles -- especially in poor weather conditions -- come the need and opportunity for a new class of bicycle safety reflectors designed to enhance cyclist visibility to radars. To this effect, the first retrodirective planar metalens-based tag operating in the millimeter-wave automotive frequency range is proposed. The compact, lightweight ($0.61~\mathrm{g}$) design consists of two layers: a metalens layer and a patch antenna pixel layer. The metalens focuses incoming plane waves from different incidence angles onto corresponding patch antenna pixels on the second layer, which re-radiate the signal back through the metalens, enabling retrodirective operation. The proposed tag was thoroughly evaluated, demonstrating reliable detection beyond 70 m and a peak monostatic radar cross section (RCS) of $3.54~\mathrm{dBsm}$ with stable retrodirectivity over $\pm 40^\circ$, providing an average gain improvement of $7.58~\mathrm{dB}$ and an RCS enhancement of $15.16~\mathrm{dB}$ relative to a lens-less reference. A realistic deployment scenario on a metallic bicycle demonstrated up to a 110x improvement in its detectability at broadside. These results highlight the potential of the proposed passive tag to operate as a low-cost, lightweight, and easily integrable bicycle safety reflector for next-generation autonomous vehicle radar systems.
This article presents dynamic directional lane allocation in urban air mobility (UAM) corridors as a discrete-time mixed-integer linear program (MILP). This formulation activates, deactivates, and reverses lane direction as bi-directional airspace demand evolves. We model demand from disaggregate ground travel data by decomposing each trip into a multi-modal sequence with first-, middle-, and last-mile legs and routing the UAM-served middle-mile segment through a vertiport-side dispatch model. We use the San Francisco Bay Area as a case study by placing a multi-region spanning corridor between Contra Costa county and Silicon Valley. We find that the dynamic policy cuts unused airspace capacity by 5x, increases mean lane utilization from 36-48% to 67% at the same service level relative to baselines, and reduces commuting-population mean travel time by up to 21.6%. These results show that dynamic configuration of airspace capacity alleviates a significant percentage of the under-utilization issue of lane-based UAM airspace design and UAM concept of operations. This dynamic allocation also provides a safe, structural way to increase throughput, making UAM a more viable complement to multimodal door-to-door mobility systems.
Models of opinion dynamics aim to capture how individuals' opinions change when they interact with each other. One well-known model of opinion dynamics is the Deffuant--Weisbuch (DW) model, which is a type of bounded-confidence model (BCM). In the DW model, agents have pairwise interactions, and they are receptive to other agents' opinions when their opinions are sufficiently close to each other. In this paper, we extend the DW model by studying it on networks with heterogeneous and adaptive edge weights between pairs of agents. These edge weights govern the interaction probabilities between the agents and thereby encode the idea that people are more likely to communicate with individuals with whom they have previously compromised or had other positive interactions. We prove theoretical guarantees of our adaptive edge-weighted DW model's convergence properties, the long-time dynamics of its edge weights, and the model's associated ``effective graph'', which is a time-dependent subgraph that includes edges only between agents that are receptive to each other's opinions. We support our theoretical results with numerical simulations of our adaptive edge-weighted DW model on a variety of networks and find that including adaptive edge weights yields different qualitative dynamics for different types of networks. In particular, for small confidence bounds, we observe that incorporating adaptive edge weights decreases the convergence time for dense networks but increases the convergence time for sparse networks.
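A single pairwise interaction of a DW-type model with weight-dependent interaction probabilities can be sketched as below. The compromise step is the standard DW update; drawing the interacting edge with probability proportional to its weight is one plausible reading of "edge weights govern the interaction probabilities". The abstract does not specify how the weights adapt, so the weights are held fixed here, and the names `dw_step`, `confidence`, and `mu` are assumptions made for the sketch.

```python
import random

def dw_step(opinions, edges, weights, confidence, mu=0.5, rng=random):
    """Run one pairwise Deffuant-Weisbuch interaction on a weighted edge list.

    An edge is drawn with probability proportional to its weight. The two
    endpoints compromise only if their opinions differ by less than
    `confidence`. Returns (edge index, whether the pair compromised).
    Weight adaptation is deliberately omitted (not specified in the abstract).
    """
    k = rng.choices(range(len(edges)), weights=weights, k=1)[0]
    i, j = edges[k]
    xi, xj = opinions[i], opinions[j]
    if abs(xi - xj) < confidence:
        # Standard DW compromise: each agent moves a fraction mu toward the other.
        opinions[i] = xi + mu * (xj - xi)
        opinions[j] = xj + mu * (xi - xj)
        return k, True
    return k, False

# Hypothetical 3-agent path graph; the zero weight means edge (1, 2) is never drawn.
opinions = [0.1, 0.3, 0.9]
edges = [(0, 1), (1, 2)]
edge_index, compromised = dw_step(opinions, edges, weights=[1.0, 0.0], confidence=0.25)
```

With `mu = 0.5` the two interacting agents meet at their midpoint, and any symmetric `mu` leaves the mean opinion of the pair unchanged.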
Near-infrared spectroscopy is increasingly used as a rapid, non-destructive chemical sensing technology for the analysis of food, pharmaceutical, biological, and environmental samples. However, the practical deployment of NIR sensors still depends on calibration models able to handle high-dimensional, collinear spectra, limited sample sizes, preprocessing dependence, spectral outliers, and extrapolation beyond the calibration domain. Here, we evaluate whether tabular foundation models can provide a new calibration strategy for NIR chemical sensing. We benchmark TabPFN on 66 NIR datasets covering 54 regression and 12 classification tasks, and compare direct inference on raw spectra with preprocessing-optimized inference against PLS/PLS-DA, Ridge, CatBoost, and one-dimensional convolutional neural networks. The study uses a unified validation framework in which preprocessing and model selection are performed exclusively on calibration data before external test evaluation. In regression, preprocessing-optimized TabPFN achieves the best overall average rank and significantly outperforms PLS, CatBoost, TabPFN on raw spectra, and CNN-1D, while remaining statistically comparable to Ridge. In classification, TabPFN applied directly to raw spectra provides the best average rank, with performance close to the optimized variant. Robustness analyses show that TabPFN provides strong average predictive performance but that its advantage decreases on spectral outliers and extrapolated samples, where classical chemometric models remain competitive. These results suggest that tabular foundation models can complement established chemometric workflows for NIR chemical sensing, especially in small- to medium-sized calibration settings, while highlighting the need for spectroscopy-specific priors and uncertainty-aware deployment strategies.
Tokens are becoming the basic units through which foundation models represent and process information for understanding and inference. However, traditional wireless communication, centered on bit-level fidelity, faces a mismatch between what is transmitted reliably and what downstream models actually consume. This mismatch calls for a communication design that directly accounts for token-level task relevance and downstream model requirements, rather than treating all transmitted bits as equally important. In this paper, we propose TONIC, a token-centric semantic communication framework for task-oriented wireless systems. The transmitter converts each source sample into a sequence of tokens, estimates token-level task relevance, and allocates protection through utility-aware unequal error protection under a fixed channel-use budget. At the receiver, token-level confidence is used to gate unreliable decisions, turning harmful substitutions into recoverable erasures before a Transformer-based completion model restores the masked tokens for final task inference. Our framework combines transmitter-side semantic-aware protection with receiver-side confidence-aware gating in a modular and interpretable architecture, rather than relying solely on fully black-box end-to-end learning. We further establish a utility-aware Bayes-risk interpretation for the receiver-side gating rule and study its interaction with unequal protection and completion. Experimental results on image classification show that TONIC consistently outperforms separation-based schemes, the pixel-domain DeepJSCC baseline, and token-domain baselines under matched communication budgets over AWGN, Rayleigh, and Rician channels.
Autonomous contact-based micromanipulation is challenging because surface and interfacial interactions at the microscale are difficult to model accurately, limiting the use of conventional model-based control and sim-to-real learning. We present a closed-loop sim-to-real reinforcement learning (RL) approach for microfiber shape control on a surface. The central idea is to train geometric shape regulation in a simplified frictionless simulator and rely on real-time visual feedback during deployment to iteratively correct the observed effects of unmodeled surface interactions. An RL policy trained entirely in simulation is transferred directly to a physical dual-gripper micromanipulation system operating at 40 Hz, without retraining or domain adaptation. Using silk microfibers as a testbed, the policy achieves a mean point-wise shape error of 270 $\pm$ 80 $\mu$m across twenty-four diverse initial configurations. Across nine specimens covering all combinations of three fiber diameters (50, 80, and 120 $\mu$m) and three manipulated lengths (10 mm, 15 mm, and 20 mm), the same policy achieves sub-millimeter final shape error without any retraining or retuning. These results show that a policy learned in a simplified simulator can achieve repeatable real-world microfiber shape regulation under surface contact, provided that the task-relevant effects of the sim-to-real mismatch remain observable and correctable within the closed feedback loop.
Locomotion in microgravity often relies on sparsely and irregularly arranged anchors, motivating grasp-based mobility with multiple limbs. In this setting, dynamic locomotion is feasible only through deliberate regulation of both anchored interactions and whole-body coordination under coupled dynamic and kinematic constraints. This paper presents design insights for grasp-based dynamic locomotion with multi-limbed robotic systems in microgravity, targeting scenarios that require 6D limb manipulation to establish contacts with candidate anchors. The investigated design parameters include gait pattern, stride length, locomotion speed, and nominal posture. A parameterizable locomotion planning framework is proposed to support variations of these parameters and to evaluate the resulting locomotion performance in terms of stability and actuation demand. Two representative quadruped morphologies are adopted for evaluation in physics-based simulation. The results demonstrate that enlarging the feasible contact wrench space and attenuating impulsive whole-body dynamics improve locomotion performance. These findings inform strategies for contact configuration selection and whole-body coordination in microgravity locomotion with multi-limbed systems.
In this work, we address the problem of multi-robot adaptive coverage, where teams of robots perform dynamic sampling by continuously adjusting their positions to collect data in an environment. This task can be challenging, particularly when robots must be efficiently allocated to new sampling locations over time. Ergodic search methods optimize robot trajectories by ensuring that the robots' time-averaged spatial distribution aligns with the spatial distribution of environmental information. While these methods promote effective exploration provided a target distribution, they often fail to account for unknown prior distributions of the environment. To overcome this limitation, we propose an adaptive coverage strategy that utilizes real-time feedback from an environmental model to adjust robot sampling behavior in response to unknown conditions. Our approach enhances traditional ergodic trajectory optimization by constructing a target spatial information distribution based on parametric models of the environment, which are updated online. This strategy assumes that the environment is either static or changes slowly compared to the robot's motion. Our framework allows robots to dynamically prioritize regions of high interest, improving coverage efficiency, synthesizing effective control policies for individual agents, and optimizing resource use in settings with unknown prior distributions. We validate our approach through simulations, demonstrating its effectiveness in enhancing coverage and resource allocation.
This paper studies heterogeneous multi-team collaboration through dynamic robot allocation, where robots are treated as transferable resources. Leveraging Hamilton's rule from ecology as an altruistic decision-making mechanism, we propose a multi-team collaborative resource allocation framework with heterogeneous capabilities, transfer costs, and capability-dependent contributions. The resulting allocation problem is combinatorial and is shown to be NP-hard. To address scalability, we develop a graph neural network policy under centralized training and decentralized execution that approximates the altruistic allocations based on Hamilton's rule. The model operates over the team interaction graph and predicts robot-level transfer decisions and next robot-to-team assignments. The proposed approach is validated in a firefighting scenario through simulations and experiments, demonstrating that the learned policy achieves near-optimal performance while scaling to larger systems.
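Hamilton's rule states that an altruistic act is favored when $rB > C$, where $r$ is relatedness, $B$ the benefit to the recipient, and $C$ the cost to the donor. A minimal sketch of using it as a transfer criterion is given below. How the paper maps $r$, $B$, and $C$ to team capabilities and transfer costs is not stated in the abstract, so the tuple layout, the team names, and the helper `best_recipient` are purely hypothetical.

```python
def hamilton_net_gain(relatedness, benefit, cost):
    """Net gain r * B - C; Hamilton's rule favors the act when this is positive."""
    return relatedness * benefit - cost

def best_recipient(candidates):
    """Pick the team with the largest positive net gain, or None if no team qualifies.

    `candidates` maps a team name to an (r, B, C) tuple (hypothetical layout).
    """
    best_team, best_gain = None, 0.0
    for team, (r, b, c) in candidates.items():
        gain = hamilton_net_gain(r, b, c)
        if gain > best_gain:
            best_team, best_gain = team, gain
    return best_team

# Made-up numbers: team_a satisfies r * B > C, team_b does not.
choice = best_recipient({"team_a": (0.5, 4.0, 1.0), "team_b": (0.9, 1.0, 1.0)})
no_choice = best_recipient({"team_c": (0.1, 1.0, 1.0)})
```

Exhaustively scoring every robot-to-team assignment in this way grows combinatorially, which is consistent with the abstract's motivation for a learned policy.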
We study risk-sensitive reinforcement learning in finite discounted MDPs, where a generative model of the MDP is assumed to be available. We consider a family of risk measures called the optimized certainty equivalent (OCE), which includes important risk measures such as entropic risk, CVaR, and mean-variance. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive OCE. We provide an exact characterization of utility functions $u$ for which the corresponding OCE defines an objective that is PAC-learnable. We analyze a simple model-based approach and derive PAC sample complexity bounds. We establish that whenever $u$ does not have full domain, i.e., $\text{dom}(u)\neq \mathbb{R}$, the corresponding problem is not PAC-learnable. Finally, we establish corresponding lower bounds for both value and policy learning, demonstrating tightness in the size $SA$ of the state-action space, and for a more restricted class of utilities, we derive lower bounds that make the dependence on the effective horizon $\frac{1}{1-\gamma}$ explicit. Specifically, for $\text{CVaR}_\tau$ we show that the correct dependence on $\tau$ is $\frac{1}{\tau^2}$, thus improving by a factor of $\frac{1}{\tau}$ over the state of the art, although our bound has a suboptimal dependence on $\frac{1}{1-\gamma}$.
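To make the OCE family in the abstract above concrete, the following is a minimal sketch of how CVaR arises as an OCE, assuming the common formulation $\text{OCE}_u(X)=\sup_\lambda\{\lambda+\mathbb{E}[u(X-\lambda)]\}$ with utility $u(t)=\min(t,0)/\tau$ on rewards. The sign convention, the function names `oce_cvar` and `tail_mean`, and the sample data are illustrative assumptions and are not taken from the paper.

```python
def oce_cvar(samples, tau):
    """Lower-tail CVaR_tau of rewards via sup_lam { lam - E[(lam - X)_+] / tau }."""
    n = len(samples)
    best = float("-inf")
    # The objective is concave and piecewise linear in lam with kinks at the
    # sample values, so it suffices to evaluate it at each sample point.
    for lam in samples:
        shortfall = sum(max(lam - x, 0.0) for x in samples) / n
        best = max(best, lam - shortfall / tau)
    return best


def tail_mean(samples, tau):
    """Direct check: mean of the worst tau-fraction (assumes tau * n is an integer)."""
    k = round(tau * len(samples))
    return sum(sorted(samples)[:k]) / k


rewards = [float(v) for v in range(1, 11)]  # hypothetical equally likely rewards
cvar_oce = oce_cvar(rewards, 0.2)
cvar_direct = tail_mean(rewards, 0.2)
```

Under these assumptions both quantities agree on the empirical distribution, which is the sense in which CVaR is a member of the OCE family.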
Near-field beamfocusing enabled by extremely large-aperture arrays (ELAA) is a promising 6G technique for massive connectivity and high spectrum efficiency. While beamfocusing concentrates energy at an intended user, the radiated field outside the focal point exhibits a structured leakage that varies with the focal-point coordinates. This paper shows that this leakage enables a new form of passive user localization in which distributed far-field sensors measuring only received power can infer the user's location by exploiting this location-dependent power signature. Using the induced noncentral chi-square statistics, we derive a Bayesian Cramér-Rao lower bound (BCRLB) that establishes the fundamental limits of this inference problem. We then evaluate a model-based grid-search estimator and an attention-based permutation-invariant deep learning regressor (DeepSet). Results under both line-of-sight (LoS) and multipath propagation confirm that reliable location inference is feasible, with accuracy improving as more sensors and snapshots are used.
Standard transformer attention computes pairwise similarity between queries and keys, treating all tokens as equally salient regardless of their intrinsic informational content. In turbulent fluid dynamics, coherent structures -- the energetically dominant, spatially organized patterns that persist amid background chaos -- carry a disproportionate fraction of total energy and govern all transport. We propose that tokens play an analogous role in transformer attention: informationally dense positions (morphological boundaries, syntactic heads, discourse markers) concentrate spectral energy and should attract proportionally more attention than background tokens (function words, repeated patterns, low-information filler). We propose Energy-Gated Attention (EGA): a simple modification that gates value aggregation by the spectral energy of key token embeddings, computed by a single learned linear projection that discovers the dominant spectral mode of the embedding field. On TinyShakespeare, EGA achieves +0.103 validation loss improvement with only 12,480 additional parameters (<0.26% overhead) and no measurable computational cost. The result is consistent on Penn Treebank (+0.101), demonstrating dataset independence. A systematic ablation across three wavelet families (fixed Morlet, Daubechies db2/db4, and a parametric Morlet) establishes that fixed structured bases are suboptimal -- the optimal energy direction is data-adaptive and non-sinusoidal -- while identifying learned wavelet packets as a promising open direction. The learned energy threshold converges to tau ~= 0.35 independently of initialization, corresponding to the fraction (~36%) of tokens carrying above-average spectral energy in English text, a stable linguistic property consistent with the fraction of content words in running English text.
Data leakage from API responses has drawn wide attention. APIs are often not fully regulated, making them easy to abuse. One common solution is to embed watermarks into API responses for traceability. However, existing watermarking methods often require modifying database content or API response data. This forces changes to business system code, and may even disrupt normal business operations because data values are altered. In this paper, we propose an original pluggable watermarking scheme based on a watermark proxy gateway and PEMark (Position Encoding-based Watermarking). The key novelty of our approach is exploiting the inherent permutation redundancy in the ordering of JSON/XML key-value pairs -- an overlooked dimension that carries no semantic information yet provides abundant encoding capacity. First, we forward server responses to the watermark proxy gateway, a design that requires zero modification to existing business systems. Then, we embed a watermark into each API response using position encoding, which reorders keys without altering any data values. To the best of our knowledge, this is the first work to achieve distortion-free API response watermarking via position encoding over a proxy gateway. Our method does not modify any data values, so normal business operations continue seamlessly after watermark embedding. Experimental results show that our framework maintains business usability while ensuring that returned API data is traceable. Compared with current mainstream schemes, our method is robust against tampering and insertion attacks (100\% similarity), and can withstand certain levels of deletion attacks.
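The idea of carrying a watermark in key order, described in the abstract above, can be sketched as follows. This is not the PEMark scheme itself: it is a generic illustration that maps an integer to a permutation of JSON keys using the factorial number system (Lehmer code), with hypothetical function names `embed` and `extract` and a hypothetical example record. It assumes the receiver sees the keys in transmitted order and that no keys are added or removed.

```python
import json
from math import factorial


def embed(record, mark):
    """Return a JSON string whose key order encodes the integer `mark`."""
    keys = sorted(record)  # canonical order shared by embedder and extractor
    n = len(keys)
    if not 0 <= mark < factorial(n):
        raise ValueError("mark exceeds the capacity of n! orderings")
    ordered = []
    for i in range(n, 0, -1):
        idx, mark = divmod(mark, factorial(i - 1))
        ordered.append(keys.pop(idx))
    # Values are copied unchanged; only the key order differs.
    return json.dumps({k: record[k] for k in ordered})


def extract(payload):
    """Recover the integer encoded in the key order of a JSON object."""
    seen = list(json.loads(payload))  # dicts keep insertion order (Python 3.7+)
    keys = sorted(seen)
    mark = 0
    for i, key in enumerate(seen):
        idx = keys.index(key)
        mark += idx * factorial(len(seen) - 1 - i)
        keys.pop(idx)
    return mark


record = {"id": 7, "name": "a", "email": "x@y", "role": "u"}  # hypothetical response
payload = embed(record, 17)
recovered = extract(payload)
```

An object with n distinct keys admits n! orderings, so this sketch carries at most floor(log2(n!)) bits per object while leaving every value untouched.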
While flow-matching text-to-speech (TTS) achieves strong zero-shot speaker similarity and naturalness, it remains susceptible to content fidelity issues, particularly skip and repeat errors from imperfect alignment. We propose RobustSpeechFlow, a training strategy that improves alignment robustness by extending contrastive flow matching with length-preserving repeat and skip latent augmentations. Requiring no external aligners or preference data, our method directly penalizes realistic failure modes and readily integrates into existing pipelines. On Seed-TTS-eval, it reduces the word error rate (WER) from 1.44 to 1.38 using only 0.06B parameters. On our ZERO500 benchmark, it delivers consistent intelligibility improvements across diverse speaker and prosody conditions; at NFE=24, it reduces English character error rate (CER) from 0.48\% to 0.35\% and Korean CER from 0.81\% to 0.57\%. Audio samples: this https URL
We study online optimization problems in which the cost function depends on latent, time-varying parameters that are unmeasurable and governed by unknown dynamics. Specifically, we consider a strongly convex cost function whose linear term evolves according to unknown linear stochastic dynamics, while the algorithm has access only to finite noisy gradient measurements. We propose a solution that uses control theoretic tools to reconstruct the latent parameters from gradient observations using a Gauss-Markov estimator, then identifies the parameter dynamics using an instrumental-variable estimator, and finally forecasts the parameters to compute the future minimizer. We provide a bound on the expected tracking error. We illustrate the effectiveness of our algorithm on a series of numerical examples.
Audio context determines which sound components and sources are relevant and which can be perceived as irrelevant (noise) by listeners. For example, traffic noise is informative in urban surveillance but is noise for a phone call at the same location. Most current audio denoising systems apply fixed target-noise definitions, often removing useful components in one context while failing to suppress irrelevant components in another. To address this, we introduce the concept of automatic contextual audio denoising (ACAD), which defines target and noise based on the inferred context. In this work, we restrict context to be associated with an acoustic scene class. We label sound events outside the event distribution of a scene class (noise) as out-of-context (OC) and events typical for that scene as in-context (IC). We implement a deep learning method that automatically infers the context of the audio signal and removes OC components, and benchmark it against variants: without context inference, with oracle context, and with separately provided uninformative context. On paired clean/noisy data across diverse contexts, where OC components in one context may be IC in another, our proposed method outperforms other approaches across standard objective metrics, indicating that the model can infer context and that context-dependent processing can enhance denoising.
In this paper, we investigate a multi-cell six-dimensional movable antenna (6DMA) network for enhancing downlink communication performance under inter-cell interference (ICI). Each base station (BS) is equipped with multiple 6DMA surfaces, and the 6DMA rotations affect both the desired-signal enhancement for in-cell users and the interference leakage toward neighboring cells, which makes the antenna-rotation design and transmit precoding intrinsically coupled across BSs. To address this issue, we formulate an average weighted sum-rate maximization problem for the multi-cell system by jointly optimizing the short-term downlink precoders and long-term 6DMA rotations under practical antenna geometric constraints. To tackle the resulting nonconvex problem, we propose a distributed two-timescale design based on inter-cell interference power constraint (IPC) coordination among neighboring BSs, under which each BS performs local short-term precoder optimization based on instantaneous channel state information (CSI) and long-term 6DMA rotation update according to statistical CSI with limited inter-BS information exchange. In particular, an edge-wise IPC coordination mechanism based on two-stage one-dimensional grid search and random maximal matching is developed to enable scalable distributed implementation. A centralized offline benchmark is also provided for performance comparison. Numerical results show that the proposed distributed design achieves performance close to the centralized benchmark under different interference conditions, while maintaining favorable scalability as the network size increases.
This paper reconstructs the half-century evolution of the scientific school founded by Yuriy P. Kunchenko (1939--2006) as the development of a semiparametric methodology for non-Gaussian estimation. Starting with Kunchenko's 1972/1973 dissertation applying Volterra series to estimate parameters of random processes, the trajectory is followed through 2006--2026. Kunchenko stochastic polynomials are presented as a coherent family of moment-cumulant procedures: the polynomial maximization method (PMM) for parameter estimation, polynomial criteria for hypothesis testing, and decomposition in spaces with a generating element. The paper details the school's structure: a verified genealogy of 15 defended dissertations, collaborations in Poland, Slovakia, and Germany, and the R package EstemPMM. A recent 2026 paper on Volterra-based signal processing is analyzed, showing how Kunchenko's nonlinear formulation reappears in applied radio engineering. We build a formal bridge between finite Volterra models and generalized Kunchenko polynomials, while separating the MMSE/L2 criterion from PMM: the former is a covariance projection for kernel adaptation, whereas PMM is a parameter-dependent moment procedure. PMM efficiency claims are stated conditionally: gains require that moments exist, the centered correlant matrix is nondegenerate, and the variance reduction coefficient is below one. The concluding research program operationalizes the historical reconstruction into testable statistical and signal-processing tasks.
As trapped-ion quantum computing scales to larger qubit registers and more complex control protocols, classical control systems face a fundamental tradeoff: sub-microsecond board-level feedback requires tight hardware coupling, whereas maintainability and extensibility require clean, modular software abstractions. This paper presents QuCtrl-BELL (Bell), a compiler-driven software stack for trapped-ion quantum control. The design resolves this tradeoff by decoupling control flow -- including loops, branches, and synchronization -- from hardware state data. A Python-embedded domain-specific language (DSL) is lowered through a six-stage transpilation pipeline covering control flow graph (CFG) construction, static single-assignment (SSA) conversion, liveness analysis, and graph-coloring register allocation. The compiler generates deterministic distributed board-level programs and compact step-table data. A cross-board synchronization protocol supports feedback loops with latency below 700~ns without host intervention. Bell is deployed and evaluated on the QuCtrl-BELL platform (RISC-V + PXIe), demonstrating that a compiler-based infrastructure can provide programmability, deterministic timing, and modularity for scalable trapped-ion quantum control.
While linear manufacturing relies on homogeneous materials and predefined process sequences, circular manufacturing reintroduces used products with heterogeneous and uncertain conditions. This shift demands manufacturing systems capable of handling variable product states, dynamically reconfigurable processes, and the integration of human and machine knowledge. Conventional manufacturing IT architectures, designed for stable structures and deterministic execution, are unable to meet these requirements, as they cannot adequately represent and manage the uniqueness of individual components at runtime. Following a design science methodology for developing a Cyber Physical Production System for circular manufacturing, we derive 14 requirements from five complementary perspectives. Based on these requirements, we design KAPPS, a knowledge-based architecture that uses an ontology-grounded knowledge graph as a unifying data backbone, combined with a semantic interface layer to enable consistent data and information integration, reasoning, and communication across heterogeneous systems and services, turning the knowledge graph from an integration layer into the factory's authoritative write-time state. KAPPS incorporates modules for constraint enforcement and event-driven planning, enabling incremental adaptation of execution plans under uncertainty and human-machine knowledge exchange. The applicability of KAPPS is demonstrated through two implemented use cases: (i) anomaly detection and learning through knowledge-graph-mediated services and (ii) runtime constraint enforcement in a modular conveyor system. Subsequently, the architecture is evaluated against the 14 requirements (ed. abstract shortened)
This paper studies the flows of continuous-time dynamics for equality-constrained optimization based on control-theoretic Lagrangian methods. In particular, we consider dynamics induced by proportional-integral and feedback linearization controllers, which have been recently proposed as alternatives to primal-dual gradient methods. Unlike existing convergence results, which rely on strong convexity of the objective function or boundedness assumptions, we exploit the geometric structure induced by the constraints. Specifically, we show global exponential convergence for non-convex problems that satisfy a suitable convexity property when restricted to the constraint manifold.
From a multi-input-multi-output (MIMO) discrete-time linear system, we collect input-output data affected by noise in the form of an unknown exosignal and, from these data points (without knowledge of the system model), we design a feedback controller that asymptotically annihilates the effect of that exosignal on the output. This amounts to solving an output regulation problem purely from input-output data, for MIMO linear systems. The design of the controller corresponds to a semidefinite program and is pursued on a suitable auxiliary system. Such design carries over from the auxiliary system to the original one by a rigorous examination of the relation between the solutions of the two systems.
While large language models provide strong compositional reasoning, existing reasoning segmentation pipelines fail to transparently connect this reasoning to visual perception. Current methods, such as latent query alignment, are end-to-end yet opaque "black boxes". Conversely, textual localization readout is merely readable, not truly interpretable, often functioning as an unconstrained post-hoc step. To bridge this interpretability gap, we propose SegCompass, an end-to-end model that leverages a Sparse Autoencoder (SAE) to forge an explicit, interpretable, and differentiable alignment pathway. Given an image-instruction pair, SegCompass first generates a chain-of-thought (CoT) trace. The core of our method is an SAE that maps both the CoT and visual tokens into a shared, high-dimensional sparse concept space. A query codebook selects salient concepts from this space, which are then spatially grounded by a slot mapper into a multi-slot heatmap that guides the final mask decoder. The entire model is trained jointly, unifying reinforcement learning for the reasoning path with standard segmentation supervision. This SAE-driven interface provides a "white-box" connection that is significantly more traceable than latent queries and more coherent than textual readouts. Extensive experiments on five challenging benchmarks demonstrate that SegCompass matches or surpasses state-of-the-art performance. Crucially, our visual and quantitative analyses show a strong correlation between the quality of the learned sparse concepts and final mask accuracy, confirming that SegCompass achieves superior results through its enhanced and inspectable alignment. Code is available at this https URL.
Trauma resuscitation is a clinical process for treating life-threatening physiological disorders in safety-critical environments, driven by the experience of healthcare workers (HCWs). Designing and optimizing quantifiable metrics that accurately capture HCW decisions may augment current resuscitation procedures with the potential to improve patient outcomes. This motivates our socio-technical formulation of trauma resuscitation as a distributed generalized Nash equilibrium (GNE)-seeking game with coupled inequality constraints. This method is optimized over a time-varying communication graph. We introduce novel insights from clinical experience to model HCWs' behavior. This work facilitates the best possible resuscitation outcome given HCWs' workloads, schedules, competencies, and limited resources.
Electromagnetic (EM) side-channel analysis traditionally assumes a stationary, close-proximity probe - a threat model that underestimates aerial adversaries. TriSweep is a simulation framework that designs and evaluates a four-drone swarm architecture for autonomous standoff EM-SCA of embedded microcontrollers at 0.25-1.5 m. Three spatially specialized collector drones - Anchor (full-spectrum), Mask Probe (mask-register loading leakage), and Cipher Probe (masked SubBytes output leakage) - feed a stationary Accumulator drone that performs coherent combining (+4.8 dB SNR gain) and second-order mask cancellation via a centered product of the two spatially separated leakage streams. Evaluated against three real ANSSI ASCAD datasets (ATmega8515 masked AES-128 and 50/100-sample desynchronized variants), the framework achieves a simulated key rank of 18 +/- 1.7 (five-seed) at 0.25 m on the primary masked dataset. Profiling-trace cross-correlation alignment reduces single-drone rank from 89 to 21 on the 100-sample-jitter variant, demonstrating compensation for drone hover vibration. A two-channel CNN in the Accumulator converges to a loss of 0.454 (vs. random baseline 5.545) and improves rank on desynchronized datasets. No physical hardware has been fabricated; prototype construction is the planned next step.
Autonomous parking requires efficient path planning that ensures kinematic feasibility and collision avoidance in constrained environments. Hybrid A* is widely used but computationally expensive, while reinforcement learning (RL) methods lack reliability and often struggle with long-horizon geometric constraints, leading to suboptimal trajectories. We present N3P, a fast learning-based three-stage framework for automated parking. By introducing an intermediate preparatory pose and using a learning module to predict it, N3P decomposes the maneuver into simpler subproblems, thereby reducing computational complexity and accelerating path generation. We validate the framework by integrating it with Hybrid A* algorithms. Experiments in perpendicular and parallel parking scenarios show that N3P-enhanced Hybrid A* speeds up planning by more than 80%. It also outperforms RL baselines in success rate and trajectory quality, producing shorter trajectories with fewer gear changes, while achieving comparable or lower planning time in most cases.
We investigate whether acoustic emotion recognition models can serve as proxies for the Pathos dimension in political speech analysis, as operationalised by the TRUST multi-agent large language model (LLM) pipeline. Using a Bundestag plenary speech by Felix Banaszak (51 segments, 245 s) as a case study, we compare three analysis modalities: (1) emotion2vec_plus_large, an acoustic speech emotion recognition (SER) model whose continuous Arousal and Valence values are derived via post-hoc Russell Circumplex projection; (2) Gemini 2.5 Flash, an LLM analysing the full speech audio together with its transcript in an open-ended, context-aware fashion; and (3) TRUST-Pathos scores from a three-advocate LLM supervisor ensemble. Spearman rank correlations reveal that Gemini Valence correlates strongly with TRUST-Pathos (rho = +0.664, p < 0.001), whereas emotion2vec Valence does not (rho = +0.097, p = 0.499). We further demonstrate, via a systematic quality evaluation of the Berlin Database of Emotional Speech (EMO-DB) using Gemini in an open-ended annotation paradigm, that standard SER benchmark corpora suffer from acted speech, cultural bias, and category incompatibility. Our results suggest that LLM-based multimodal analysis captures semantically defined political emotion substantially better than acoustic models alone, while acoustic features remain informative for low-level Arousal estimation. Future work will extend this approach to video-based analysis incorporating facial expression and gaze.
Real-world sensor-based learning systems require uncertainty estimation that is both reliable and computationally efficient. Evidential Deep Learning (EDL) provides single-pass uncertainty estimation by modeling the class probabilities via Dirichlet distributions, where the Dirichlet parameters are predicted by a learned neural network mapping. However, this approach can lead to computational challenges, as Dirichlet expected objectives are more complex than standard supervised learning losses, complicating their analysis and implementation. We address this issue by approximating the objective of the first-order empirical risk minimization problem induced by EDL with a plug-in loss evaluated at the Dirichlet mean and show that, under mild assumptions, the approximation error decays with growing evidence for a broad class of loss functions, including mean-squared error and cross-entropy loss. As a special case, our analysis provides justification for the use of softmax in the context of uncertainty estimation, since under a particular evidence-to-Dirichlet mapping, our framework includes the standard softmax classifier. We validate the proposed simplified objectives on the Google Speech Commands dataset and show that they achieve predictive accuracy and selective prediction performance comparable to classical EDL, while being simpler to implement using standard deep learning losses and training pipelines. To the best of our knowledge, this empirical analysis is the first to obtain coverage-accuracy trade-offs for speech recognition tasks through EDL.
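For the mean-squared-error case mentioned in the abstract above, the gap between the Dirichlet-expected loss and the plug-in loss at the Dirichlet mean has a simple closed form, which the sketch below evaluates. It relies only on the standard Dirichlet moments (mean $\alpha_k/S$ and variance $m_k(1-m_k)/(S+1)$ with $S=\sum_k\alpha_k$); the function name, the label, and the concentration values are illustrative assumptions, and the paper's general bound and assumptions are not reproduced here.

```python
def dirichlet_mse_terms(alpha, y):
    """Return (plug-in MSE at the Dirichlet mean, expected MSE under Dir(alpha))."""
    s = sum(alpha)
    mean = [a / s for a in alpha]
    plug_in = sum((yk - mk) ** 2 for yk, mk in zip(y, mean))
    # E[(y_k - p_k)^2] = (y_k - m_k)^2 + Var(p_k), Var(p_k) = m_k (1 - m_k) / (S + 1)
    variance = sum(mk * (1.0 - mk) / (s + 1.0) for mk in mean)
    return plug_in, plug_in + variance


y = [1.0, 0.0, 0.0]        # hypothetical one-hot label
base = (4.0, 1.0, 1.0)     # hypothetical Dirichlet parameters
gaps = []
for scale in (1, 10, 100):  # scaling alpha keeps the mean and grows the evidence
    alpha = [scale * a for a in base]
    plug_in, expected = dirichlet_mse_terms(alpha, y)
    gaps.append(expected - plug_in)
```

With the mean held fixed, the gap equals the summed variance term and shrinks roughly like 1/(S+1), which is one concrete instance of an approximation error that decays with growing evidence.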
Although encrypted control systems ensure confidentiality of private data, it is challenging to detect anomalies without the secret key as all signals remain encrypted. To address this issue, we propose a homomorphic encryption scheme for dynamic controllers that automatically discloses the residue signal for anomaly detection, while keeping all other signals private. To this end, we characterize the zero-dynamics of an encrypted dynamic system over a finite field of integers and incorporate it into a Learning With Errors (LWE) based scheme. We then present a method to further utilize the disclosed residue signal for implementing dynamic controllers over encrypted data, which does not involve re-encryption even when they have non-integer state matrices.
Grid-forming (GFM) technology is widely regarded as a promising solution for future power systems dominated by power electronics. However, a universally accepted definition of GFM behavior and a precise method for its quantification remain elusive. Moreover, the impact of GFM converters on system stability is not precisely quantified, creating a significant disconnect between device and system levels. To address these gaps from a small-signal perspective, at the device level, the paper introduces a novel metric, the Forming Index (FI), to quantify a converter's response to grid voltage fluctuations. Rather than enumerating various control architectures, the FI provides a metric for the converter's GFM ability by quantifying its sensitivity to grid variations. At the system level, a new quantitative measure of system strength that captures the multi-bus voltage stiffness is proposed, which quantifies the voltage and phase angle responses of multiple buses to current or power disturbances. The paper further extends this concept to grid strength and bus strength to identify weak areas within the system. Finally, the device and system levels are bridged by formally proving that GFM converters enhance system strength. The proposed framework provides a unified benchmark for GFM converter design, optimal placement, and system stability assessment.
This paper proposes a general framework to evaluate power system strength. The formulation features twelve indicators, grouped in three dynamical orders, that quantify the resistance of bus voltage phasors and their first- and second-order rates of change to sudden changes in current injection. To quantify such changes, the paper introduces a novel finite differentiation technique, which we name the Delta operator, that properly captures "jumps" of algebraic variables and utilizes the recently developed concept of complex frequency. The paper also shows how the proposed framework can be systematically applied to any system device, and provides a variety of examples based on synchronous machine, converter, and load models. Numerical results in a benchmark system validate the exactness of the formulation.
Grid-forming (GFM) technology is widely regarded as a promising solution for future power systems dominated by power electronics. However, a precise method for quantifying GFM converter behavior and a universally accepted GFM definition remain elusive. Moreover, the impact of GFM on system stability is not precisely quantified, creating a significant disconnect between device and system levels. To address these gaps from a small-signal perspective, at the device level, we introduce a novel metric, the Forming Index (FI), to quantify a converter's response to grid voltage fluctuations. Rather than enumerating various control architectures, the FI provides a metric for the converter's GFM ability by quantifying its sensitivity to grid variations. At the system level, we propose a new quantitative measure of system strength that captures the multi-bus voltage stiffness, which quantifies the voltage and phase angle responses of multiple buses to current or power disturbances. We further extend this concept to grid strength and bus strength to identify weak areas within the system. Finally, we bridge the device and system levels by formally proving that GFM converters enhance system strength. Our proposed framework provides a unified benchmark for GFM converter design, optimal placement, and system stability assessment.
Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demand. To this end, we present a unified, cross-library evaluation of post-training quantization (PTQ) on Whisper-small that disentangles the impact of quantization scheme, method, granularity, and bit-width. Our study is based on four libraries: PyTorch, Optimum-Quanto, HQQ, and bitsandbytes. Experiments on LibriSpeech test-clean and test-other show that dynamic int8 quantization with Quanto offers the best trade-off, reducing model size by 57% while improving on the baseline's word error rate. Static quantization performed worse, likely due to Whisper's Transformer architecture, while more aggressive formats (e.g., nf4, int3) achieved up to 71% compression at the cost of accuracy in noisy conditions. Overall, our results demonstrate that carefully chosen PTQ methods can substantially reduce model size and inference cost without retraining, enabling efficient deployment of Whisper-small on constrained hardware.
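As background for the int8 results in the abstract above, the sketch below shows the arithmetic behind symmetric per-tensor int8 quantization in plain Python. It does not use or reproduce the behavior of PyTorch, Optimum-Quanto, HQQ, or bitsandbytes; the function names and the example values are hypothetical, and real libraries differ in granularity, rounding, and calibration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    # The scale is derived from the data at call time, as in dynamic quantization.
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # fall back to 1.0 for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical tensor values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each value is stored in one byte instead of four, and the round-trip error per element is bounded by half the scale, which is why outliers that inflate the scale tend to hurt accuracy.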
Event-based control, unlike analogue control, poses significant analytical challenges due to its hybrid dynamics. This work investigates the stability and inter-event time properties of a control-affine system under event-based impulsive control. The controller consists of multiple neuronal units with leaky integrate-and-fire dynamics acting on a time-invariant, multivariable plant in closed loop. Both the plant state and the neuronal units exhibit discontinuities that cancel if combined linearly, enabling a direct correspondence between the event-based impulsive controller and a corresponding analogue controller. Leveraging this observation, we prove global practical stability of the event-based impulsive control system. In the general nonlinear case, we show that the event-based impulsive controller ensures global practical asymptotic stability if the analogue system is input-to-state stable (ISS) with respect to specific disturbances. In the linear case, we further show global practical exponential stability if the analogue system is stable. We illustrate our results with numerical simulations. The findings reveal a fundamental link between analogue and event-based impulsive control, providing new insights for the design of neuromorphic controllers.
This paper presents the E-Rocket, an electric-powered, low-cost rocket prototype for validation of Guidance, Navigation & Control (GNC) algorithms based on Thrust Vector Control (TVC). Relying on commercially available components and 3D printed parts, a pair of contra-rotating DC brushless motors is assembled on a servo-actuated gimbal mechanism that provides thrust vectoring capability. A custom avionics hardware and software stack is developed considering a dual computer setup which leverages the capabilities of the PX4 autopilot and the modularity of ROS 2 to accommodate for tailored GNC algorithms. The platform is validated in an indoor motion-capture arena using a baseline PID-based trajectory tracking controller. Results demonstrate accurate trajectory tracking and confirm the suitability of the E-Rocket as a versatile testbed for rocket GNC algorithms.
Next-generation communication and localization systems increasingly rely on extremely large-scale arrays (XL-arrays), which promise unprecedented spatial resolution and new functionalities. These gains arise from their inherent operation in the near field (NF) regime, where the spherical nature of the wavefront can no longer be ignored; consequently, characterizing the ambiguity function -- which amounts to the matched beam pattern -- is considerably more challenging. Implementing very wide apertures with half-wavelength element spacing is costly and complex. This motivates thinning the array (removing elements), which introduces intricate aliasing structures, i.e., grating lobes. Whereas prior work has addressed this challenge using approximations tailored to specific array geometries, this paper develops a general framework that reveals the fundamental origins and geometric behavior of grating lobes in near-field ambiguity functions. Using a local spatial-frequency analysis of steering signals, we derive a systematic methodology to model NF grating lobes as aliasing artifacts, quantifying their structure on the AF, and providing design guidelines for XL-arrays that operate within aliasing-safe regions. We further connect our framework to established far-field principles. Finally, we demonstrate the practical value of the approach by deriving closed-form expressions for aliasing-free regions in canonical uniform linear arrays and uniform circular arrays.
Existing dynamics prediction frameworks for transient stability analysis (TSA) fail to achieve multi-scenario "universality": the inherent ability of a single, pre-trained architecture to generalize across diverse operating conditions, unseen faults, and heterogeneous systems. To address this, this paper proposes Uni-TSA, a pre-trained generative Transformer-enabled universal framework that models multivariate transient dynamics prediction as a univariate generative task with three key innovations: First, a novel data processing pipeline featuring channel independence decomposition to resolve dimensional heterogeneity, sample-wise normalization to eliminate separate stable/unstable pipelines, and temporal patching for efficient long-sequence modeling; Second, a parameter-efficient freeze-and-finetune strategy that augments the pre-trained generative Transformer backbone with dedicated input embedding and output projection layers while freezing core transformer blocks to preserve generic feature extraction capabilities; Third, a two-stage fine-tuning scheme that combines teacher forcing, which feeds the model ground-truth data during initial training, with scheduled sampling, which gradually shifts to leveraging model-generated predictions, to mitigate cumulative errors in long-horizon iterative prediction. Comprehensive testing demonstrates the framework's universality, as Uni-TSA trained solely on the New England 39-bus system achieves zero-shot generalization to mixed stability conditions and unseen faults, and matches expert performance on the Iceland 189-bus system with only 5% fine-tuning data. Additional cross-system experiments on the IEEE 68-bus and IEEE 118-bus systems, together with stability metrics and PEBS comparison, further confirm Uni-TSA's strong zero-shot transferability and data-efficient adaptation.
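The three data-processing steps named above (channel independence, sample-wise normalization, temporal patching) can be sketched as follows. This is only one plausible reading, not the Uni-TSA implementation: the function names are hypothetical, normalization is assumed to be per-series z-scoring, and patches are assumed to be non-overlapping with any trailing remainder dropped.

```python
from statistics import mean, pstdev


def channel_independent(sample):
    """Split a T x C multivariate sample (list of rows) into C univariate series."""
    return [list(channel) for channel in zip(*sample)]


def normalize(series, eps=1e-8):
    """Z-score one series with its own mean and standard deviation (assumed scheme)."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / (sigma + eps) for x in series]


def patch(series, patch_len):
    """Cut a series into non-overlapping patches; a trailing remainder is dropped."""
    n_patches = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]


def preprocess(sample, patch_len):
    """Return, for each channel, its list of normalized patches."""
    return [patch(normalize(ch), patch_len) for ch in channel_independent(sample)]


if __name__ == "__main__":
    # Two channels with very different scales (hypothetical data).
    sample = [[float(t), 10.0 * t + 3.0] for t in range(8)]
    for channel_patches in preprocess(sample, patch_len=4):
        print(channel_patches)
```

Because each channel is normalized on its own, the two channels in the usage example end up on the same scale, which is the property that lets a univariate model treat heterogeneous variables uniformly.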
Decarbonizing the global energy supply requires more efficient heating and cooling systems. Model predictive control enhances the operation of cooling and heating systems but depends on accurate system models, often based on control volumes. We present an automated framework including time discretization to generate model predictive controllers for such models. To ensure scalability, a primal decomposition exploiting the model structure is applied. The approach is validated on an underground heating system with varying numbers of states, demonstrating the primal decomposition's advantage regarding scalability.
This paper develops a generalized finite horizon recursive solution to the discrete time signal bound disturbance attenuation regulator (SiDAR) for state feedback control. This problem addresses linear dynamical systems subject to signal bound disturbances, i.e., disturbance sequences whose squared signal two-norm is bounded by a fixed budget. The term generalized indicates that the results accommodate arbitrary initial states. By combining game theory and dynamic programming, we derive a recursive solution for the optimal state feedback policy valid for arbitrary initial states. The optimal policy is nonlinear in the state and requires solving a tractable convex scalar optimization for the Lagrange multiplier at each stage; the control is then explicit. For fixed disturbance budget $\alpha$, the state space partitions into two distinct regions: $\mathcal{X}_L(\alpha)$, where the optimal control policy is linear and coincides with the standard linear $H_{\infty}$ state feedback control, and $\mathcal{X}_{NL}(\alpha)$, where the optimal control policy is nonlinear. We establish monotonicity and boundedness of the associated Riccati recursions and characterize the geometry of the solution regions. A numerical example illustrates the theoretical properties. This work provides a complete feedback solution to the finite horizon SiDAR for arbitrary initial states. Companion papers address the steady-state problem and convergence properties for the signal bound case, and the stage bound disturbance attenuation regulator (StDAR).
This paper establishes convergence and steady-state properties for the signal bound disturbance attenuation regulator (SiDAR). Building on the finite horizon recursive solution developed in a companion paper, we introduce the steady-state SiDAR and derive its tractable linear matrix inequality (LMI) with $O(n^3)$ complexity. Systems are classified as degenerate or nondegenerate based on steady-state solution properties. For nondegenerate systems, the finite horizon solution converges to the steady-state solution for all states as the horizon approaches infinity. For degenerate systems, convergence holds in one region of the state space, while a turnpike arises in the complementary region. When convergence holds, the optimal multiplier and control gain are obtained directly from the LMI solution. Numerical examples illustrate convergence behavior and turnpike phenomena. Companion papers address the finite horizon SiDAR solution and the stage bound disturbance attenuation regulator (StDAR).
Recent progress in voice conversion (VC) has reached a new milestone in speaker cloning and linguistic preservation. However, the field remains fragmented, relying on specialized models for linguistic-preserving, expressive, and singing scenarios. We propose OneVoice, a unified zero-shot framework capable of handling all three scenarios within a single model. OneVoice is built upon a continuous language model trained with VAE-free next-patch diffusion, ensuring high fidelity and efficient sequence modeling. Unification rests on a Mixture-of-Experts (MoE) design that explicitly models shared conversion knowledge and scenario-specific expressivity. Expert selection is coordinated by a dual-path routing mechanism, including shared expert isolation and scenario-aware domain expert assignment with global-local cues. For precise conditioning, scenario-specific prosodic features are fused into each layer via a gated mechanism, allowing adaptive usage of prosody information. Furthermore, to realize this design and alleviate data imbalance (abundant speech vs. scarce singing), we adopt two-stage progressive training comprising foundational pre-training and scenario enhancement with LoRA-based domain experts. Experiments show that OneVoice matches or surpasses specialized models across all three scenarios, while verifying flexible control over scenarios and offering a fast decoding variant that needs as few as 2 steps. Audio samples are available on the demo page.
Growing renewable penetration introduces substantial uncertainty into power system operations, necessitating frequent adaptation of dispatch objectives and constraints and challenging expertise-intensive, near-real-time modeling workflows. Large Language Models (LLMs) provide a promising avenue for automating this process by translating natural-language (NL) operational requirements into executable optimization models via semantic reasoning and code synthesis. Yet existing LLM datasets and benchmarks for optimization modeling primarily target coarse-grained cross-domain generalization, offering limited rigorous evaluation in power-system settings, particularly for Optimal Power Flow (OPF). We therefore introduce ProOPF-D and ProOPF-B, a dataset and benchmark for professional-grade OPF modeling: ProOPF-D contains 12K instances pairing NL requests with parameter adjustments and structural extensions to a canonical OPF, together with executable implementations; ProOPF-B provides 121 expert-annotated test cases with ground-truth code, enabling end-to-end evaluation under both concrete and abstract OPF modeling regimes.
Future wireless networks, deploying thousands of antenna elements, may operate in the radiative near-field (NF), enabling spatial multiplexing across both angle and range domains. Sparse arrays have the potential to achieve comparable performance with fewer antenna elements. However, fixed sparse array designs are generally suboptimal under dynamic user distributions, while movable antenna architectures rely on mechanically reconfigurable elements, introducing latency and increased hardware complexity. To address these limitations, we propose a reconfigurable array thinning approach that selectively activates a subset of antennas to form a flexible sparse array design without physical repositioning. We first analyze grating lobes for uniform sparse arrays in the angle and range domains, showing their absence along the range dimension. Based on the analysis, we develop two particle swarm optimization-based strategies: a grating-lobe-based thinned array (GTA) for grating-lobe suppression and a sum-rate-based thinned array (STA) for multiuser sum-rate maximization. Simulation results demonstrate that GTA outperforms conventional uniform sparse arrays, while STA achieves performance comparable to movable antennas, thereby offering a practical and efficient array deployment strategy without the associated mechanical complexity.
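For the angle domain, the classical far-field picture of grating lobes in a uniform sparse array can be sketched as below. This is a textbook far-field approximation, not the near-field analysis or the GTA/STA optimization of the paper; the function names and the convention of expressing element spacing in wavelengths are assumptions for illustration. Grating lobes appear where sin(theta) = sin(theta0) + m / spacing for a nonzero integer m, provided the value lies in the visible region [-1, 1].

```python
import cmath
import math


def array_factor(sin_theta, sin_theta0, n_elements, spacing_wl):
    """Normalized far-field array factor magnitude of a uniform linear array.

    spacing_wl is the element spacing in wavelengths; sin_theta0 is the
    steering direction. Both conventions are assumptions of this sketch.
    """
    phase = 2.0 * math.pi * spacing_wl * (sin_theta - sin_theta0)
    total = sum(cmath.exp(1j * phase * n) for n in range(n_elements))
    return abs(total) / n_elements


def grating_lobes(sin_theta0, spacing_wl):
    """Return sin(theta) positions of grating lobes inside the visible region."""
    lobes = []
    m_max = int(math.floor(2.0 * spacing_wl)) + 1   # |m| cannot exceed 2 * spacing
    for m in range(-m_max, m_max + 1):
        if m == 0:
            continue                                # m = 0 is the main lobe
        s = sin_theta0 + m / spacing_wl
        if -1.0 <= s <= 1.0:
            lobes.append(s)
    return lobes


if __name__ == "__main__":
    print(grating_lobes(0.0, 0.5))   # half-wavelength spacing
    print(grating_lobes(0.0, 2.0))   # sparse spacing of two wavelengths
    print(array_factor(0.5, 0.0, 16, 2.0))
```

At half-wavelength spacing the list of grating lobes is empty for broadside steering, whereas a spacing of two wavelengths produces full-strength replicas of the main lobe, which is the aliasing that thinning strategies try to suppress.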
Automated identification of DICOM image series is essential for large-scale medical image analysis, quality control, protocol harmonization, and reliable downstream processing. However, DICOM series classification remains challenging due to heterogeneous slice content, variable series length, and entirely missing, incomplete, or inconsistent DICOM metadata. We propose an end-to-end multimodal framework for DICOM series classification that jointly models image content and acquisition metadata while explicitly accounting for all these challenges. (i) Images and metadata are encoded with modality-aware modules and fused using a bi-directional cross-modal attention mechanism. (ii) Metadata is processed by a sparse, missingness-aware encoder based on learnable feature dictionaries and value-conditioned modulation. By design, the approach does not require any form of imputation. (iii) Variability in series length and image data dimensions is handled via a 2.5D visual encoder and attention operating on equidistantly sampled slices. We evaluate the proposed approach on the publicly available Duke Liver MRI dataset and a large multi-institutional in-house cohort, assessing both in-domain performance and out-of-domain generalization. Across all evaluation settings, the proposed method consistently outperforms relevant image-only, metadata-only, and multimodal 2D/3D baselines. The results demonstrate that explicitly modeling metadata sparsity and cross-modal interactions improves robustness for DICOM series classification.
This paper investigates distributed safety-critical control for multi-agent systems (MASs) in the presence of uncontrollable agents with uncertain behaviors. To ensure system safety, a control barrier function (CBF) is employed. However, a key challenge is that the CBF constraints are coupled when MASs perform collaborative tasks: they depend on information from multiple agents, which impedes the design of a fully distributed safe control scheme. To overcome this, a novel reconstructed CBF approach is proposed. In this method, the coupled CBF is reconstructed by leveraging state estimates of other agents obtained from a distributed adaptive observer. Furthermore, a prescribed performance adaptive parameter is designed to modify this reconstruction, ensuring that satisfying the reconstructed CBF constraint is sufficient to meet the original coupled one. Based on the reconstructed CBF, we design a safety-critical quadratic programming (QP) controller and prove that the proposed distributed control scheme rigorously guarantees the safety of the MAS, even in uncertain dynamic environments involving uncontrollable agents. The effectiveness of the proposed method is illustrated through a simulation.
This paper shows that the concept of complex frequency, originally introduced to characterize the dynamics of signals with complex values, constitutes a generalization of eigenvalues when applied to the states of linear time-invariant (LTI) systems. Starting from the definition of geometric frequency, which provides a geometrical interpretation of frequency in electric circuits that admits a natural decomposition into symmetric and antisymmetric components associated with amplitude variation and rotational motion, respectively, we show that complex frequency arises as its restriction to the two-dimensional Euclidean plane. For LTI systems, it is shown that the complex frequencies computed from the system's states subject to a non-isometric transformation, coincide with the original system's eigenvalues. This equivalence is demonstrated for diagonalizable systems of any order. The paper provides a unified geometric interpretation of eigenvalues, bridging classical linear system theory with differential geometry of curves. The paper also highlights that this equivalence does not generally hold for nonlinear systems. On the other hand, the geometric frequency of the system can always be defined, providing a geometrical interpretation of the system flow. A variety of examples based on linear and nonlinear circuits illustrate the proposed framework.
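The stated equivalence for diagonalizable LTI systems can be illustrated with a short worked derivation. It is a sketch under two assumptions not spelled out in the abstract: the complex frequency of a nonzero complex signal $z_i$ is taken as its logarithmic derivative $\dot{z}_i / z_i$, and the transformation is the eigenvector change of coordinates $z = V^{-1} x$, which is generally non-isometric because $V$ need not be unitary.

```latex
\begin{align*}
\dot{x} &= A x, \qquad A = V \Lambda V^{-1}, \qquad
  \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \\
z &= V^{-1} x \;\Longrightarrow\; \dot{z} = \Lambda z
  \;\Longrightarrow\; z_i(t) = z_i(0)\, e^{\lambda_i t}, \\
\eta_i &= \frac{\dot{z}_i}{z_i} = \lambda_i = \sigma_i + j \omega_i,
  \qquad \sigma_i = \frac{d}{dt} \ln |z_i|, \qquad
  \omega_i = \frac{d}{dt} \arg z_i .
\end{align*}
```

The last line holds whenever $z_i(0) \neq 0$: the real part of each eigenvalue is the relative rate of amplitude variation of the corresponding modal coordinate, and the imaginary part is its rotation rate.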
Environment-aware 6G wireless networks demand the deep integration of multimodal and wireless data. However, most existing datasets are confined to 2D terrestrial far-field scenarios, lacking the 3D spatial context and near-field characteristics crucial for low-altitude extremely large-scale multiple-input multiple-output (XL-MIMO) systems. To bridge this gap, this letter introduces Multimodal-NF, a large-scale dataset and specialized generation framework. Operating in the upper midband, it synchronizes high-fidelity near-field channel state information (CSI) and precise wireless labels (e.g., Top-5 beam indices, LoS/NLoS) with comprehensive sensory modalities (RGB images, LiDAR point clouds, and GPS). Crucially, these multimodal priors provide spatial semantics that help reduce the near-field search space and thereby lower the overhead of wireless sensing and communication tasks. Finally, we validate the dataset through representative case studies, demonstrating its utility and effectiveness. The open-source generator and dataset are available at this https URL.
Production logistics (PL) is increasingly exposed to variability, dynamic interdependencies, and operational disturbances that challenge conventional centralized planning and control. These characteristics are particularly pronounced in circular production systems, but are increasingly relevant across PL more generally. This paper addresses this challenge through the concept of Self-Organizing Production Logistics (SOPL) using the Design Science Research Methodology (DSRM) as a structuring framework. The paper identifies key technological and systemic drivers motivating SOPL, including autonomous logistics resources, distributed AI-based decision-making, and increasing operational uncertainty in circular production. Based on these drivers, system-level objectives and design requirements for SOPL are derived. Building on these requirements, an initial multi-agent architecture is proposed that combines embodied and non-embodied agents, event-driven coordination, semantic knowledge structures, and digital twins. In addition, a three-phase demonstration roadmap is presented, ranging from an initial laboratory demonstrator toward increasingly distributed and adaptive SOPL systems. The Phase I demonstrator serves as an experimental setup for investigating disturbance handling, human involvement, and supervisory coordination in an order-driven kitting and supply scenario. Overall, the paper contributes a conceptual foundation for the design, implementation, and experimental evaluation of SOPL systems.
Clinicians lack a principled framework to quantify diagnostic utility in ultrasound reconstructions. Existing standards like PSNR and VGG-LPIPS are inadequate, failing to account for modality-specific physics or the structural nuances of acoustic imaging. We close this gap with a TinyUSFM-based evaluation framework featuring two distinct metrics: TinyUSFM-uLPIPS, a full-reference perceptual distance based on multi-layer token relations, and TinyUSFM-NRQ, a deployable no-reference quality score utilizing clean-manifold modeling and worst-region aggregation to detect localized harmful artifacts. We demonstrate that the presented metrics have four unique advantages: 1) Task-linked quality, where TinyUSFM-uLPIPS achieves superior calibration with semantic task damage, accurately reflecting Dice-score drops in segmentation where VGG-based metrics fail; 2) Cross-organ comparability, maintaining stable scoring scales and consistent severity rankings across diverse anatomical sites and domain-shifted data; 3) PSNR-consistent sensitivity, with TinyUSFM-NRQ providing a reliable quality score without ground-truth images that remains consistent with traditional fidelity benchmarks (i.e., PSNR); and 4) Clinical utility, improving the prediction of expert preference from 47.2% to 72.8% accuracy and producing super-resolution reconstructions preferred by sonographers. By integrating these advantages into a unified assessment and optimization loop, this work establishes a modality-aligned standard that finally bridges the gap between algorithmic performance and diagnostic utility. Our code is available at this https URL.
Deep learning (DL) has been widely adopted for Wi-Fi channel state information (CSI)-based human activity recognition (HAR) due to its ability to learn spatio-temporal features in a privacy-preserving and cost-effective manner. However, DL-based models generalize poorly across environments, a challenge amplified in multi-user settings where overlapping activities cause CSI entanglement and domain shifts. Practical deployments often limit access to labeled source data due to privacy constraints, motivating source-free adaptation using only unlabeled target-domain CSI and a pre-trained source model. In this paper, we propose MU-SHOT-Fi, a source-free unsupervised domain adaptation framework for single- and multi-user Wi-Fi sensing. MU-SHOT-Fi employs permutation-invariant set prediction with Hungarian matching during source training, followed by frozen-classifier backbone adaptation in the target domain. To enable stable adaptation without labels, we introduce occupancy-weighted information maximization that prevents model collapse by focusing diversity regularization on likely-occupied slots while excluding the dominant class from marginal entropy. Additionally, we employ binary rotation prediction as spatial self-supervision that exploits CSI frequency-time structure to learn domain-invariant features. For single-user scenarios, we introduce SU-SHOT-Fi by replacing occupancy weighting with standard information maximization and incorporating contrastive predictive coding to exploit temporal consistency. Extensive experiments on the WiMANS and Widar 3.0 datasets across cross-environment, cross-frequency, cross-orientation, and combined domain shifts demonstrate that MU-SHOT-Fi effectively recovers multi-user exact-activity classification performance under large domain shifts while maintaining accurate occupancy estimation and preventing collapse toward dominant classes.
Underwater acoustic target recognition is critical for maritime applications, yet it faces challenges arising from the complex and diverse nature of ship-radiated noise. To address these issues, we propose a robust deep learning-based framework. First, we introduce a feature extraction and fusion method based on variational mode decomposition (VMD) and the 3/2-D spectrum to generate high-fidelity 2-D DEMON spectral features, which effectively capture modulation envelope information. To further enhance feature representation, we design a one-dimensional convolutional neural network (1-D CNN) integrated with a novel Multi-Stage Multi-Type Attention Mechanism (MMATT) that adaptively refines features at different network depths. Within this mechanism, we propose a Residual Channel-Independent Spectral Attention Mechanism (R-CISAM) and a Multi-Scale Separate-and-Fuse Spectral Attention Mechanism (MS-SFSAM). Moreover, to mitigate performance degradation caused by severe class imbalance inherent in real-world ship-radiated noise data, we devise an Adjustable Class-Balanced Focal Loss (ACBFL), which provides flexibility across tasks with varying degrees of imbalance. Experimental results on a real-world ship-radiated noise dataset demonstrate that the proposed solutions effectively enhance underwater acoustic target recognition performance.
Automotive radars are increasingly susceptible to mutual interference from neighboring radar systems, which can lead to false target detections and the masking of valid targets. While current interference levels remain manageable due to the relatively low penetration of radar-equipped vehicles, this assumption is expected to break down as radar adoption and per-vehicle radar density continue to increase. This paper presents a comprehensive analysis of automotive radar performance in high-density interference environments. A realistic end-to-end simulation framework is developed at the intermediate frequency (IF) level, incorporating analytical interference modeling and detailed radar signal processing. The study evaluates the impact of interference across a range of future scenarios characterized by increased radar density and multiple radar configurations per vehicle. Conventional interference mitigation techniques are systematically assessed. To validate the simulation results, controlled experiments were conducted using a host radar exposed to up to 30 interfering radars in both anechoic and real-world environments. The results demonstrate significant performance degradation under high interference conditions, with substantial reductions in detection probability and effective range. Among the evaluated techniques, time-frequency coding consistently provides the most robust performance, maintaining high detection probability even at elevated radar penetration rates. These findings highlight the limitations of current mitigation approaches and emphasize the need for coordinated and scalable interference management strategies in future automotive radar systems.
MRI reconstruction is an inherently ill-posed inverse problem, since incomplete measurements admit many plausible solutions. This ambiguity becomes more severe under high acceleration, where pixel-domain continuous predictors tend to average over feasible reconstructions and suppress high-frequency anatomy. We address this limitation by moving reconstruction to a discrete multi-scale latent space and posing it as autoregressive next-acceleration-scale prediction. Leveraging discrete priors proven effective in visual autoregressive modeling, our method restricts the solution to compact sequences of codebook tokens, enabling sharp reconstructions even from extremely sparse measurements. This discrete autoregressive formulation also aligns naturally with modern large language model post-training techniques. Building on this observation, we introduce on-policy privileged information distillation for visual autoregressive modeling, where a teacher is given training-only privileged context that is unavailable at inference (in our case, fully sampled acquisitions) and supervises a student trained on its own rollouts, leading to consistent reconstruction gains. Through extensive experiments on the fastMRI benchmark, we show that our approach delivers improved reconstruction performance across diverse sampling patterns under extreme undersampling. The project website is available at this https URL.
Light field cameras capture multi-view observations within a single exposure. However, existing studies are typically tailored to specific LF representations, leaving the field without a unified learning framework. To bridge this gap, we present LFX, the first unified framework for LF perception. LFX establishes a representation-invariant feature modulation space, enabling it to adapt to heterogeneous LF representations and diverse perception tasks. Specifically, we propose Field-of-Parallax Angular Subspace Modeling (FoP-ASM), which assigns an independent angular marker to each auxiliary view, enabling view-wise independent modeling. Meanwhile, shared manifold subspace constraints and regularization losses enforce globally consistent semantic modulation across views. Extensive evaluations across three LF benchmarks show that LFX achieves state-of-the-art results across distinct LF representations, outperforming representation-specific methods by up to 12% and 20% with 0.029/0.027 MAE for salient object detection, and achieving 84.37 mIoU for semantic segmentation. The source code will be made publicly available at this https URL.
This paper investigates a distributed robust Nash Equilibrium (NE) seeking problem for second-order players subject to external disturbances and uncertain dynamics while communicating via semi-Markov switching topologies. To address these concerns, the following targets must be reached simultaneously: (1) rejection of disturbances and uncertain dynamics in finite time; (2) NE seeking for the second-order players; (3) distributed action estimation of non-neighboring players under semi-Markov switching. By combining supertwisting-based Integral Sliding-Mode Control (ISMC) with a leader-follower consensus protocol, a novel robust NE seeking algorithm is constructed. Furthermore, to reduce unnecessary information transmission, a sampled-data-based event-triggered mechanism is introduced. Incorporating the advantages of both semi-Markov switching and the event-triggered mechanism, another NE seeking algorithm is proposed. Theoretical analysis via a Lyapunov-Krasovskii functional proves that leader-follower consensus can be achieved in the mean-square sense. Finally, a connectivity control game is formulated to validate the algorithms.
Wireless indoor localization using predictive models with received signal strength information (RSSI) requires proper calibration for reliable position estimates. One remedy is to employ synthetic labels produced by a (generally different) predictive model. But fine-tuning an additional predictor, as well as estimating residual bias of the synthetic labels, demands additional data, aggravating calibration data scarcity in wireless environments. This letter proposes an approach that efficiently uses limited calibration data to simultaneously fine-tune a predictor and estimate the bias of synthetic labels, yielding prediction sets with rigorous coverage guarantees. Experiments on a fingerprinting dataset validate the effectiveness of the proposed method.
As multi-agent systems (MAS) become increasingly prevalent in autonomous systems, distributed control, and edge intelligence, efficient communication under resource constraints has emerged as a critical challenge. Traditional communication paradigms often emphasize message fidelity or bandwidth optimization, overlooking the task relevance of the exchanged information. In contrast, goal-oriented communication prioritizes the importance of information with respect to the agents' shared objectives. This review provides a comprehensive survey of goal-oriented communication in MAS, bridging perspectives from information theory, communication theory, and machine learning. We examine foundational concepts alongside learning-based approaches and emergent protocols. Special attention is given to coordination under communication constraints, as well as applications in domains such as swarm robotics, federated learning, and edge computing. The paper concludes with a discussion of open challenges and future research directions at the intersection of communication theory, machine learning, and multi-agent decision making.
We study the problem of learning the optimal policy in a discounted, infinite-horizon reinforcement learning (RL) setting in the presence of adversarially corrupted rewards. To address this problem, we develop a novel robust variant of the \(Q\)-learning algorithm and analyze it under the challenging asynchronous sampling model with time-correlated data. Despite corruption, we prove that the finite-time guarantees of our approach match existing bounds, up to an additive term that scales with the fraction of corrupted samples. We also establish an information-theoretic lower bound, revealing that our guarantees are near-optimal. Notably, our algorithm is agnostic to the underlying reward distribution and provides the first finite-time robustness guarantees for asynchronous \(Q\)-learning. A key element of our analysis is a refined Azuma-Hoeffding inequality for almost-martingales, which may have broader applicability in the study of RL algorithms.
Music performance is a distinctly human activity, intrinsically linked to the performer's ability to convey, evoke, or express emotion. Machines cannot perform music in the human sense; they can produce, reproduce, execute, or synthesize music, but they lack the capacity for affective or emotional experience. As such, music performance is an ideal candidate through which to explore aspects of collaboration between humans and machines. In this paper, we introduce the witheFlow system, designed to enhance real-time music performance by automatically modulating audio effects based on features extracted from both biosignals and the audio itself. The system, currently in a proof-of-concept phase, is designed to be lightweight, able to run locally on a laptop, and is open-source given the availability of a compatible Digital Audio Workstation and sensors.
Optimal control problems with discrete-valued inputs are inherently challenging due to their mixed-integer nature, rendering them generally intractable for real-time, safety-critical aerospace applications. Lossless convexification offers a powerful alternative by reformulating these mixed-integer programs into computationally efficient convex programs. This paper develops a lossless convexification framework for the optimal control of linear time-varying systems with discrete-valued inputs. We extend existing theoretical results by demonstrating that system normality is preserved when reformulating Lagrange-form problems into Mayer-form via an epigraph transformation. Furthermore, we establish that under simple geometric conditions on the input set, the solution to the relaxed convex problem strictly satisfies the original non-convex input constraints. This framework enables the real-time computation of optimal discrete-valued controls without resorting to mixed-integer optimization. The proposed algorithm is validated on a spacecraft rendezvous maneuver utilizing discrete-valued reaction thrusters in an elliptical orbit. Numerical results from Monte Carlo simulations confirm that the algorithm consistently yields exact discrete-valued control inputs with computational timelines compatible with safety-critical, on-board applications.
Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs. We show that scaling model capacity, data, and compute yields a generalist humanoid controller capable of natural, robust whole-body movements. We position motion tracking as a scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (1.2M to 42M parameters), dataset volume (100M+ frames from 700 hours of motion capture), and compute (21k GPU hours). Beyond demonstrating the benefits of scale, we further show downstream utility through: (1) a real-time kinematic planner bridging motion tracking to tasks such as navigation, enabling natural and interactive control, and (2) a unified token space supporting VR teleoperation and vision-language-action (VLA) models with a single policy. Through this interface, we demonstrate autonomous VLA-driven whole-body loco-manipulation requiring coordinated hand and foot placement. Scaling motion tracking exhibits favorable properties: performance improves steadily with compute and data diversity, and learned policies generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.
Inspired by contagion models of social belief formation, we develop an epistemically-informed modeling framework, SIS-Vo, in which vaccine-related information propagates on a signed opinion network. Our model allows for heterogeneous treatment effects of policy messages across subpopulations through demographic-specific responses. We derive fixed-point characterizations of the healthy (disease-free) and endemic equilibria of this model, and obtain conditions for local stability of the healthy state in terms of the contact network and opinion-dependent vaccination capacities. Using numerical simulations, we illustrate how suitably targeted policy interventions, acting through opinion dynamics, can stabilize the epidemic process by moving the system towards the healthy regime. The SIS-Vo framework thus provides a natural basis for control-theoretic analysis of vaccination policies that remain robust even when misinformation targets specific subgroups.
Standard formulations of prescribed worst-case disturbance energy-gain control policies for linear time-varying systems depend on all forward model data. In discrete time, this dependence arises through a backward Riccati recursion. This article is about the infinite-horizon $\ell_2$ gain performance of state feedback policies with only finite receding-horizon preview of the model parameters. The proposed synthesis of controllers subject to such a constraint leverages the strict contraction of lifted Riccati operators under uniform controllability and observability. The main approximation result is a sufficient number of preview steps for the incurred performance loss to remain below any set tolerance, relative to the baseline gain bound of the associated infinite-preview controller. Aspects of the result are explored in a numerical example.
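The backward dependence on model data, and the role contraction plays in limiting it, can be illustrated on a scalar example. The sketch below is a minimal illustration under stated assumptions: it uses the standard linear-quadratic Riccati recursion as a stand-in (the $\ell_2$-gain recursion studied in the article differs), and the parameters `a_seq`, `b`, `q`, `r` and the terminal values are hypothetical. It shows that the value at time 0, computed backward over a finite preview window, becomes insensitive to the terminal condition as the window grows.

```python
import math

def riccati_backward(a_seq, b, q, r, p_terminal):
    """Scalar LQ Riccati recursion run backward over a preview window.

    a_seq holds hypothetical time-varying dynamics a_0, ..., a_{T-1}.
    The returned P_0 depends on every entry of a_seq and on p_terminal.
    """
    p = p_terminal
    for a in reversed(a_seq):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p

def terminal_gap(horizon):
    """Spread in P_0 caused by two very different terminal conditions."""
    a_seq = [1.0 + 0.3 * math.sin(t) for t in range(horizon)]
    low = riccati_backward(a_seq, b=1.0, q=1.0, r=1.0, p_terminal=0.0)
    high = riccati_backward(a_seq, b=1.0, q=1.0, r=1.0, p_terminal=10.0)
    return abs(high - low)

# The gap shrinks as the preview window gets longer.
gaps = {T: terminal_gap(T) for T in (1, 5, 40)}
```

In this toy setting the shrinking gap is what makes a finite preview sufficient: beyond some window length, data further in the future no longer changes the computed value appreciably.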
In this work, we investigate a blockage-aware pinching antenna (PA) system designed for secure and robust wireless communication. The considered system comprises a base station equipped with multiple waveguides, each hosting multiple PAs, and serves multiple single-antenna legitimate users in the presence of multi-antenna eavesdroppers under imperfect channel state information (CSI). To safeguard confidential transmissions, artificial noise (AN) is deliberately injected to degrade the eavesdropping channels. Recognizing that conventional linear CSI error bounds become overly conservative for spatially distributed PA architectures, we develop new geometry-aware uncertainty sets that jointly characterize eavesdropper position and array-orientation errors. Building upon these sets, we formulate a robust joint optimization problem that determines per-waveguide beamforming and AN covariance, individual PA power ratio allocation, and PA positions to maximize the system sum rate subject to secrecy constraints. The highly nonconvex design problem is efficiently addressed via a low-complexity iterative algorithm that capitalizes on block coordinate descent, penalty-based methods, majorization-minimization, the S-procedure, and Lipschitz-based surrogate functions. Simulation results demonstrate that the sum rate achieved by the proposed algorithm exceeds that of conventional fixed-antenna systems by 4.7 dB, offering substantially improved rate and secrecy performance. In particular, (i) adaptive PA positioning preserves LoS to legitimate users while effectively exploiting waveguide geometry to disrupt eavesdropper channels, and (ii) neglecting blockage effects in the PA system significantly impacts the system design, leading to performance degradation and inadequate secrecy guarantees.
The alternating direction method of multipliers (ADMM) has gained increasing popularity in embedded model predictive control (MPC) due to its code simplicity and pain-free parameter selection. However, existing ADMM solvers either target general quadratic programming (QP) problems or exploit sparse MPC formulations via Riccati recursions, which are inherently sequential and therefore difficult to parallelize for long prediction horizons. This technical note proposes a novel \textit{parallel-in-horizon} and \textit{construction-free} nonlinear MPC algorithm, termed $\pi$MPC, which combines a new variable-splitting scheme with a velocity-based system representation in the ADMM framework, enabling horizon-wise parallel execution while operating directly on system matrices without explicit MPC-to-QP construction. Numerical experiments and accompanying code are provided to validate the effectiveness of the proposed method.
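For readers unfamiliar with the ADMM building blocks referred to above, the following is a minimal sketch of generic ADMM on a tiny box-constrained QP. It is not the $\pi$MPC algorithm: the splitting `x = z`, the function name `admm_box_qp`, the penalty `rho`, and the problem data are all assumptions chosen for illustration, and the 2x2 closed-form inverse stands in for whatever linear solve a real solver would use.

```python
def admm_box_qp(P, q, lo, hi, rho=1.0, iters=200):
    """ADMM for min 0.5*x'Px + q'x subject to lo <= x <= hi, with 2x2 P.

    Splitting: x carries the quadratic cost, z carries the box constraint,
    and the scaled dual variable u drives x and z to agree.
    """
    # Invert (P + rho*I) once, using the closed form for a 2x2 matrix.
    a, b = P[0][0] + rho, P[0][1]
    c, d = P[1][0], P[1][1] + rho
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    z = [0.0, 0.0]
    u = [0.0, 0.0]
    x = [0.0, 0.0]
    for _ in range(iters):
        # x-update: solve (P + rho*I) x = rho*(z - u) - q
        rhs = [rho * (z[i] - u[i]) - q[i] for i in range(2)]
        x = [inv[i][0] * rhs[0] + inv[i][1] * rhs[1] for i in range(2)]
        # z-update: project x + u onto the box
        z = [min(max(x[i] + u[i], lo[i]), hi[i]) for i in range(2)]
        # scaled dual update
        u = [u[i] + x[i] - z[i] for i in range(2)]
    return x, z

# Hypothetical data: the unconstrained minimizer lies outside the unit box,
# so both upper bounds are active at the solution.
x_sol, z_sol = admm_box_qp(P=[[2.0, 0.5], [0.5, 1.0]], q=[-4.0, -3.0],
                           lo=[0.0, 0.0], hi=[1.0, 1.0])
```

Each iteration needs only one linear solve with a fixed matrix, a clipping step, and a vector addition, which is the kind of code simplicity the abstract attributes to ADMM in embedded settings.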
Stochastic variance-reduced algorithms such as Stochastic Average Gradient (SAG) and SAGA, and their deterministic counterparts like the Incremental Aggregated Gradient (IAG) method, have been extensively studied in large-scale machine learning. Despite their popularity, existing analyses for these algorithms are disparate, relying on different proof techniques tailored to each method. Furthermore, the original proof of SAG is notoriously involved, requiring computer-aided analysis. Focusing on finite-sum optimization with smooth and strongly convex objective functions, our main contribution is to develop a single unified convergence analysis that applies to all three algorithms: SAG, SAGA, and IAG. Our analysis features two key steps: (i) establishing a bound on delays due to stochastic sub-sampling using simple concentration tools, and (ii) carefully designing a novel Lyapunov function that accounts for such delays. The resulting proof is short and modular, providing the first high-probability bounds for SAG and SAGA that can be seamlessly extended to non-convex objectives and Markov sampling. As an immediate byproduct of our new analysis technique, we obtain the best known rates for the IAG algorithm, significantly improving upon prior bounds.
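As a concrete reference point for the algorithms named above, the following is a minimal sketch of the SAGA update on a small least-squares finite sum. It illustrates the gradient table whose staleness the delay bound is concerned with; it does not reproduce the unified analysis. The problem data, the step size `1/(3L)`, the iteration count, and the function name `saga_least_squares` are assumptions chosen for illustration.

```python
import random

def saga_least_squares(A, b, step, iters, seed=0):
    """SAGA for f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    # Gradient table: grad f_i(x) = (a_i . x - b_i) * a_i, so it is enough
    # to store the scalar residual last seen for each component.
    resid = [sum(a * v for a, v in zip(A[i], x)) - b[i] for i in range(n)]
    # Average of all stored gradients.
    avg = [sum(resid[i] * A[i][k] for i in range(n)) / n for k in range(d)]
    for _ in range(iters):
        j = rng.randrange(n)
        new_r = sum(a * v for a, v in zip(A[j], x)) - b[j]
        delta = new_r - resid[j]
        for k in range(d):
            # Direction: fresh gradient - stored gradient + table average.
            x[k] -= step * (delta * A[j][k] + avg[k])
        for k in range(d):
            avg[k] += delta * A[j][k] / n  # keep the average in sync
        resid[j] = new_r  # refresh the table entry for component j
    return x

# Hypothetical consistent system, so the minimizer is x_true.
data_rng = random.Random(1)
n, d = 20, 3
A = [[data_rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
x_true = [1.0, -2.0, 0.5]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
L = max(sum(a * a for a in row) for row in A)  # largest component smoothness
x_hat = saga_least_squares(A, b, step=1.0 / (3.0 * L), iters=30000)
```

Entries of the table that have not been sampled recently hold gradients evaluated at older iterates; SAG and IAG differ from this sketch in how the table is combined into a direction and in how the index `j` is chosen.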
Mapping is essential in robotics and autonomous systems because it provides the spatial foundation for path planning. Efficient mapping enables planning algorithms to generate reliable paths while ensuring safety and adapting in real time to complex environments. Fixed-resolution mapping methods often produce overly conservative obstacle representations that lead to suboptimal paths or planning failures in cluttered scenes. To address this issue, we introduce Parallel OctoMapping (POMP), an efficient OctoMap-based mapping technique that maximizes available free space and supports multi-threaded computation. To the best of our knowledge, POMP is the first method that, at a fixed occupancy-grid resolution, refines the representation of free space while preserving map fidelity and compatibility with existing search-based planners. It can therefore be integrated into existing planning pipelines, yielding higher pathfinding success rates and shorter path lengths, especially in cluttered environments, while substantially improving computational efficiency.
Predictive safety filters (PSFs) leverage model predictive control to enforce constraint satisfaction during deep reinforcement learning (RL) exploration, yet their reliance on first-principles models or Gaussian processes limits scalability and broader applicability. Meanwhile, model-based RL (MBRL) methods routinely employ probabilistic ensemble (PE) neural networks to capture complex, high-dimensional dynamics from data with minimal prior knowledge. However, existing attempts to integrate PEs into PSFs lack rigorous uncertainty quantification. We introduce the Uncertainty-Aware Predictive Safety Filter (UPSi), a PSF that provides rigorous safety predictions using PE dynamics models by formulating future outcomes as reachable sets. UPSi introduces an explicit certainty constraint that prevents model exploitation and integrates seamlessly into common MBRL frameworks. We evaluate UPSi within Dyna-style MBRL on standard safe RL benchmarks and report substantial improvements in exploration safety over prior neural network PSFs while maintaining performance on par with standard MBRL. UPSi bridges the gap between the scalability and generality of modern MBRL and the safety guarantees of predictive safety filters.
The "binding problem" of how distributed neural activity unifies into conscious experience has remained an open challenge since its articulation in 1890. We present evidence that conscious integration relies on self-organized criticality maintained by brain-body resonance, placing human cognition within the universality class of critical systems. Using 64-channel EEG data, we demonstrate that conventional preprocessing inadvertently eliminates the very integrative dynamics it seeks to measure. Removing physiological signals conventionally treated as "artifacts" drastically reduces the shared variance between global phase synchronization and stimulus-evoked amplitude, an effect highly specific to physiological components. We trace this to a fundamental brain-body resonance at 78 milliseconds that establishes zero-lag synchronization driven by robust bidirectional causality. Crucially, raw data exhibits heavy-tailed avalanche dynamics indicative of a near-critical regime, whereas conventionally cleaned data definitively rejects power-law distributions, signaling an artificial shift to subcriticality. Finally, we show these critical dynamics enable holographic information encoding, evidenced by a significant emergence of spatial interference patterns post-resonance. Together, these findings indicate that physiological signals actively and selectively support the coupling between large-scale neural coordination and event-related processing.
This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high-probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve performance comparable to LQG.
This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce agent bullwhip: the amplification of run-to-run decision instability in autonomous multi-echelon systems. A central component is decision bullwhip, the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. We show that decision instability can amplify both across facilities at a fixed point in time and within the same facility over time, even when the demand path is held fixed. Repeated sampling, a natural test-time remedy, fails to meaningfully reduce this instability, suggesting that reliability requires changing the underlying decision policy rather than merely averaging over model outputs. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. Post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.
Robotic systems are vulnerable to False Data Injection Attacks (FDIAs), where adversaries corrupt sensor signals to gain malicious control. Feedback linearization exposes robotic systems to integrator vulnerability, making them susceptible to stealthy attacks that can cause significant deviations in end-effector behavior without raising alarms. This paper addresses the resilience of manipulators against finite-horizon FDIAs by formalizing two defense methods, namely anomaly-aware virtual damping and manipulability reduction, with probabilistic guarantees on nominal task execution. Simulations on a 7-DOF redundant manipulator show that the proposed defenses substantially reduce the impact of FDIAs compared to using solely a threshold-based anomaly detection system (ADS) such as the chi-squared detector, while preserving nominal task performance in the absence of attack.
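The threshold-based baseline mentioned above can be sketched as a chi-squared test on a residual. This is a minimal illustration under stated assumptions: a two-dimensional residual with a diagonal covariance, hypothetical numerical values, and hypothetical function names. It relies on the fact that a chi-squared variable with 2 degrees of freedom has CDF 1 - exp(-x/2), so the threshold for a false-alarm rate alpha is -2 ln(alpha).

```python
import math

def chi2_threshold_2dof(false_alarm_rate):
    """Exact (1 - alpha) quantile of a chi-squared variable with 2 dof."""
    return -2.0 * math.log(false_alarm_rate)

def chi2_alarm(residual, variances, threshold):
    """Return (alarm, statistic) for a residual with diagonal covariance.

    The diagonal covariance is a simplifying assumption for this sketch.
    """
    g = sum(r * r / v for r, v in zip(residual, variances))
    return g > threshold, g

tau = chi2_threshold_2dof(0.01)  # roughly 9.21
var = [0.04, 0.04]               # hypothetical residual variances

nominal = chi2_alarm([0.1, -0.2], var, tau)   # ordinary noise
stealthy = chi2_alarm([0.3, 0.3], var, tau)   # small bias kept under tau
overt = chi2_alarm([0.7, 0.5], var, tau)      # large injected error
```

The `stealthy` case illustrates why a threshold alone is a weak defense: an injected bias sized to keep the statistic below `tau` raises no alarm at any step, yet it can accumulate through an integrator.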