Reliable seizure prediction is a prerequisite for closed-loop neurostimulation therapy, yet existing methods rarely account for the variability in EEG signal quality encountered in real-world deployment, and the overwhelming majority adopt non-strict evaluation protocols that overestimate generalisation performance. We propose CLSP-REQA (Closed-Loop Seizure Prediction with Real-time EEG Quality Assessment), a unified framework that embeds a lightweight signal quality estimator directly within the prediction pipeline. A Real-time EEG Quality Assessment (REQA) module runs in parallel with a Mamba-BiLSTM backbone, producing a scalar quality score q in [0,1] that modulates output confidence through a tiered non-linear fusion function (ECLO). Under strict cross-patient evaluation on the CHB-MIT Scalp EEG Database (n = 23 subjects, 198 seizures), CLSP-REQA achieves an AUC-ROC of 0.7426 +- 0.0199, outperforming the unadapted cross-patient baseline of 0.69 reported by Jemal et al., using only 16 EEG channels compared to 23 in prior work, and without requiring any target-patient data or domain adaptation. On the SIENA Scalp EEG Database (n = 14 subjects, 47 seizures), CLSP-REQA achieves AUC 0.7012 +- 0.0249, substantially surpassing the best domain-adapted cross-patient result of 0.61 on the same dataset, demonstrating strong cross-dataset generalisation. The framework outputs a structured four-tuple (p, q, c, Phi_SHAP) directly compatible with closed-loop neurostimulator interfaces.
This paper presents a proof-of-concept system for localising ground-based WiFi access points, acting as IEEE~802.11mc Fine Time Measurement (FTM) responders, from an uncrewed aerial vehicle using FTM ranging and Global Navigation Satellite System (GNSS)-referenced moving-baseline multilateration. Each associated GNSS-referenced FTM-initiator pose supplies a known reference point, turning the flight trajectory into a temporal multilateration problem. The real-time smartphone pipeline performs GNSS--ranging time association, robust outlier gating, a two-stage Gauss-Newton bootstrap, and sequential Bayesian filtering with bias tracking. Six measurement-noise configurations, including empirical and adaptive models, are evaluated on field data collected in unstructured, mountainous terrain. For a line-of-sight access point with \num{455} ranging measurements, the online Android pipeline achieves a final horizontal error of \SI{4.4}{\metre}, while offline replay of the same flight yields a time-weighted mean horizontal error of \SI{4.7}{\metre} and a best-case final horizontal error of \SI{1.1}{\metre} under the best noise model after a close flyby. For non-line-of-sight targets, the real-time pipeline does not converge because of limited measurement availability, weak geometry, and signal attenuation, although an offline robust least-squares solver recovers a coarse solution for the vegetation-only case. The system is intended as a building block for Networked One Search Agent architectures, and preliminary middleware tests demonstrate software-level interoperability, while quantitative multi-agent accuracy is left for future work.
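The moving-baseline multilateration idea can be illustrated with a minimal Gauss-Newton sketch. It assumes a flat 2-D horizontal problem, noise-free ranges, and no ranging bias or outlier gating, none of which is claimed by the abstract. The function name `gauss_newton_2d` and the sample coordinates are hypothetical and are not taken from the paper's pipeline.

```python
import math

def gauss_newton_2d(anchors, ranges, guess, iterations=20):
    """Estimate a 2-D position from ranges measured at known anchor points.

    anchors: list of (x, y) reference positions (e.g. GNSS-referenced poses)
    ranges:  list of measured distances, one per anchor
    guess:   initial (x, y) estimate
    """
    x, y = guess
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) step = -J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ax, ay), rng in zip(anchors, ranges):
            dx, dy = x - ax, y - ay
            dist = math.hypot(dx, dy)
            if dist < 1e-9:
                continue  # Jacobian is undefined exactly at an anchor
            jx, jy = dx / dist, dy / dist  # gradient of the predicted range
            res = dist - rng               # residual: predicted minus measured
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 -= jx * res
            b2 -= jy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate geometry, e.g. all anchors on one line through the estimate
        step_x = (a22 * b1 - a12 * b2) / det
        step_y = (a11 * b2 - a12 * b1) / det
        x += step_x
        y += step_y
        if math.hypot(step_x, step_y) < 1e-9:
            break
    return x, y

# Hypothetical flight path (metres) and a hypothetical access-point position.
path = [(0.0, 0.0), (50.0, 0.0), (100.0, 20.0), (100.0, 80.0), (40.0, 100.0)]
target = (60.0, 40.0)
measured = [math.hypot(target[0] - px, target[1] - py) for px, py in path]
estimate = gauss_newton_2d(path, measured, guess=(50.0, 50.0))
```

The determinant check hints at why weak geometry matters: when the anchor directions are nearly collinear, the normal matrix becomes ill-conditioned and the step is poorly determined.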
Brain-computer interfaces (BCIs) are limited by low signal-to-noise ratio in modalities such as electroencephalography, which requires multiple trials to reliably decode user intentions. This induces a speed-accuracy trade-off, whereby higher accuracy comes at the cost of speed. The speed-accuracy balance is application-dependent, motivating controllable trade-offs. Conventional metrics, such as the Information Transfer Rate, combine speed and accuracy, obscuring their dependence and potentially introducing biases. In this study, we propose an evaluation framework independent of classifier, paradigm, and early-stopping strategy that separates speed and accuracy. We employ two measures, Gain (relative speed improvement) and Conservation (relative accuracy preservation), and combine them into a tunable Gain-Cons Balance controlled by {\alpha}, regulating the speed-accuracy trade-off. The parameter adjusts the operating point without modifying the classifier, facilitating deployment across scenarios. The framework was evaluated on P300 event-related potential paradigms using public recordings from 63 subjects, with multiple classifiers and early-stopping strategies used to achieve distinct operating points in speed, accuracy, and bitrate. Results show that tuning {\alpha} yields fast, accurate, or balanced BCI behaviours, demonstrating explicit control of the speed-accuracy trade-off. The method supports subject-level performance prediction and improves explainability of BCI behaviour. Further analysis of the Information Transfer Rate reveals a systematic bias toward speed, explained by the proposed framework through the Gain and Conservation measurements. Overall, this work establishes the speed-accuracy trade-off as a controllable design variable validated on public P300-based paradigms, enabling transparent evaluation and application-specific optimization of BCIs.
Electrocardiography (ECG) remains central to cardiovascular screening, yet interpretation remains largely manual and episodic. Clinical practice relies on brief resting ECGs and, when required, long-duration ambulatory recordings, both generating data that require resource-intensive review. Consequently, subtle morphological changes or progressive drift preceding clinically apparent abnormalities may go unnoticed. We propose a motif-based framework that defines beat-aligned ECG motifs as interpretable cardiac signatures and quantifies morphological drift and deviation across short and long-term monitoring. Motifs are representative cardiac cycles capturing dominant morphology. We introduce three interpretable drift metrics: deviation from a normal sinus rhythm (NSR), deviation from a personalised baseline, and a motif instability index. Motifs are extracted by selecting beats that minimise Dynamic Time Warping (DTW) distance within fixed windows. We evaluate these metrics on short (PTB-XL) and long-duration (MIT-BIH Arrhythmia) ECG datasets. Interpretability is achieved through representative motif overlays and fiducial-based visualisations, enabling direct inspection of morphological changes. In MIT-BIH, the proposed metrics significantly separated predominantly normal from arrhythmic subjects (p<0.01). In PTB-XL, NSR deviation distinguished normal from abnormal ECGs across major diagnostic subtypes (p<1e-4, Cliff's delta up to 0.93). ECG motifs provide an interpretable representation of cardiac morphology, supporting scalable longitudinal monitoring and early detection of morphology-driven change.
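Selecting a motif by minimising DTW distance within a window can be sketched as a medoid search. The version below assumes that "minimise" means the smallest total DTW distance to all other beats in the window and uses an absolute-difference local cost; the abstract does not state either detail. The names `dtw_distance` and `select_motif` are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] also matched b[j-1]'s predecessor step
                                 cost[i][j - 1],      # b[j-1] repeated against a[i-1]
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

def select_motif(beats):
    """Return the index of the beat with the smallest total DTW distance
    to every other beat in the window (the medoid)."""
    totals = [sum(dtw_distance(beat, other) for other in beats) for beat in beats]
    return min(range(len(beats)), key=totals.__getitem__)

# Toy window of three flat "beats" (illustrative values, not ECG data).
window = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
motif_index = select_motif(window)
```

In this toy window the middle beat is selected because it is closest overall to the other two. A per-window motif chosen this way could then be compared against a reference motif to produce drift scores like those the abstract describes.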
Global biodiversity is declining at unprecedented rates, yet the tools available to monitor and protect ecosystems remain limited by constraints in power, connectivity, and accessibility. We present SPARROW, a hardware and software open-source platform that integrates solar energy, edge artificial intelligence, and satellite communication to enable continuous, autonomous biodiversity monitoring in remote environments. Each SPARROW node combines a low-power Graphics Processing Unit (GPU) with modular visual, acoustic, and environmental sensors, performing on-device deep learning inference and transmitting summarized results through Low-Earth-Orbit (LEO) satellite or Global System for Mobile Communications (GSM) networks. We deployed SPARROW across tropical, temperate, and montane ecosystems in Colombia, Peru, Tanzania, and the United States, where it sustained 24/7 operation under variable environmental conditions and collected more than two million images and acoustic recordings in the first 190 days. The system demonstrated robust real-time classification and adaptive power management, achieving full autonomy without on-site human intervention. By integrating renewable energy, on-edge AI, and open-source design, SPARROW lowers the technical and financial barriers to ecological monitoring and establishes a scalable foundation for a distributed, intelligent network of sensors, an emerging "Internet of Living Things" for planetary biodiversity monitoring.
State-of-the-art learned image compression (LIC) schemes are increasingly based on hybrid CNN-transformer architectures. To further improve rate-distortion performance, we introduce channel-wise wavelet transforms into both the transformer and entropy-coding components. First, we propose a channel-wise wavelet-domain transformer attention (ChWDTA) mechanism. ChWDTA keeps the efficient windowed spatial self-attention used in modern LIC backbones, but computes the Q/K/V projections on channel-wise wavelet-transformed features before mapping the attention output back with the inverse transform. The resulting Channel-wise Wavelet-Domain Transformer Block (ChWDTB) therefore preserves the spatial tokenization pattern of windowed attention while sparsifying the channel covariance seen by the attention projections. Second, in the entropy-coding stage, we introduce a channel-wise wavelet packet (ChWP) decomposition that produces four equal-sized subbands, which better fit channel-wise slice-based autoregressive entropy modeling. When each channel-wise subband is divided into two slices, we use eight slices for entropy coding. With this configuration, the proposed scheme obtains BD-rate reductions of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively. Even when each channel-wise subband is coded as a single slice, the scheme still retains most of the coding gains with lower complexity. The results confirm the advantage of introducing wavelet transform in CNN-transformer-based LIC schemes.
This paper proposes SpikeWFM, a novel hybrid architecture that integrates spiking neural networks (SNNs) with conventional artificial neural network (ANN)-based transformers for wireless foundation models (WFMs). Inspired by the noise-robust and energy-efficient information processing in the human brain, SpikeWFM aims to enhance the resilience of WFMs against noise and interference while maintaining strong generalization capabilities across diverse wireless scenarios. Drawing from the success of large language models, WFMs leverage self-supervised pre-training on large-scale datasets spanning various wireless environments to learn a unified embedding that supports a wide range of downstream tasks, including channel prediction, channel estimation, beam prediction, and positioning. Such models typically outperform task-specific designs and exhibit superior adaptability to unseen conditions. However, existing WFMs remain vulnerable to realistic noise and interference in practical wireless systems. To address this limitation, we incorporate spiking neurons into the transformer-based WFM architecture. We provide a brief theoretical analysis demonstrating how the SNN-ANN hybrid effectively mitigates noise and interference through temporal sparsity and event-driven processing. Experimental results show that SpikeWFM consistently outperforms conventional ANN-based WFMs in both pre-training convergence and channel prediction accuracy. Additional results on communication and sensing tasks will be presented in the full journal version of this work.
We approach image denoising from a perception-driven perspective: how can we select the parameters that are best suited for human visual perception? We combine research methods in mathematics and psychology to develop a mathematical framework for measuring perceived similarity. We construct a sample set of differently denoised photographs by using the same base image as input data and by tuning the parameter value in a total variation denoising algorithm. A comparison test is conducted with human participants to survey perceived differences between the images. Analyzing the results with psychometric scaling provides us with a HaarPSI value to use as a threshold in discretizing parameter grids. As a result, we obtain psychometrically scaled, openly available image sets that are ready to use in further experiments in perception-driven imaging, as well as a framework for ensuing experiments involving comparison tests.
Differentiable signal parameterizations such as implicit neural representations (INRs) and hybrid models are increasingly central to computational imaging, yet principled tools for evaluating reconstruction fidelity at finite model size remain limited when ground truth is unavailable. We introduce a framework for predicting the reconstruction error of compressive signal parameterizations, yielding non-asymptotic, signal-specific bounds that are both theoretically sound and efficiently computable without access to the ground truth signal. Specifically, we prove that when parameterization-based compression satisfies certain natural properties, the compression error at any compression level is bounded by a simple scaled difference between model predictions at different compression levels. We verify these properties for representative model families including interpolated grids, Fourier feature networks, multi-resolution hash encodings, and tensor factorizations, and show empirically that the resulting worst-case guarantees can be efficiently adapted into signal-specific error predictors that are tight and generalizable. Across direct fitting of synthetic and natural signals, and inverse problems including radiance field and MRI reconstruction, our method closely tracks global error curves and yields informative local error heatmaps without ground-truth access. Code is available at this https URL.
Motion artifacts in magnetic resonance imaging (MRI) degrade diagnostic reliability. Existing deep learning methods are typically contrast-specific and fail to generalize across diverse modalities and artifact severities. We propose a unified framework combining parameter-informed contrast disentanglement with severity-aware adaptive correction. ScanCLIP, pretrained on over 30,000 MRI text-image pairs, derives contrast embeddings from acquisition parameters to disentangle contrast style from anatomical content, yielding contrast-free features. A Vision Transformer then estimates motion severity and routes features through a Mixture-of-Experts network, enabling targeted artifact correction. A dual-pathway decoder reconstructs both the clean image and residual artifact map, enforcing image-space consistency. On IXI and HCP benchmarks, our method improves PSNR by 0.75 dB and SSIM by up to 0.0279 over state-of-the-art approaches, with larger gains at higher artifact severities. It further demonstrates robust zero-shot generalization on real-world clinical data acquired with unseen scanning parameters, where existing methods either fail to remove artifacts or introduce additional distortions.
Understanding the human brain requires access to its microscopic tissue architecture. Diffusion magnetic resonance imaging (MRI) provides the only noninvasive window into whole-brain microstructure in vivo, yet reliable quantitative mapping remains confined to specialized research settings requiring dense sampling and optimized acquisition protocols. To address this gap, we present a physics-informed generative microstructure network (PIGMENT) that learns a universal generative prior of human brain microstructure and adapts it zero-shot to each participant's measured data to recover subject-specific maps. Trained on 11375 scans spanning multiple sites, vendors, and field strengths, PIGMENT enabled reliable quantitative mapping for tensor, kurtosis, and NODDI models across external datasets from five independent centers. It remains effective where conventional fitting becomes unreliable, recovering meaningful maps from extremely sparse acquisitions while supporting downstream tractography and structural connectivity mapping. PIGMENT estimates demonstrated strong biological validity, preserving submillimeter cortical microarchitectural patterns and early-childhood white matter developmental trajectories from 10-fold accelerated scans. Furthermore, PIGMENT enables reliable quantitative tensor mapping on cost-efficient low-field systems and the extraction of tumor-related biomarkers using ultra-fast clinical protocols. Together, these results establish PIGMENT as a physics-informed foundation model that extends quantitative diffusion MRI into regimes traditionally too sparse, heterogeneous, or clinically constrained for reliable analysis.
Continuous variable-rate compression is in high demand in real-world applications, but remains underexplored in scalable image coding for humans and machines. In this paper, we propose a training-free variable-rate scalable image coding framework. By adjusting quantization steps based on predicted scale values, the proposed method achieves continuous bitrate control while preserving high-scale information in the machine and enhancement layers. Experimental results demonstrate the effectiveness of the proposed method and highlight the importance of bitrate allocation between the two layers.
In this paper, we investigate contraction analysis for nonlinear time-delay systems described by functional differential equations. We first extend the concept of Lyapunov-Krasovskii functionals within the differential framework. We then show that its existence is equivalent to that of an incremental Lyapunov-Krasovskii functional and guarantees uniform incremental exponential stability. Next, we extend the concept of Lyapunov-Razumikhin functions within the differential framework, whose existence also ensures uniform incremental exponential stability. As an application of our results, we formulate stabilizing feedback control design for nonlinear time-delay systems with single delays in terms of linear matrix inequalities.
This letter studies CSI denoising for MIMO--OFDM with variable NR resource block (RB) allocations. ReFLEX is a length-generalizable Transformer whose frequency attention uses a relative-frequency position bias (RFPB) generated from subcarrier offsets. A single checkpoint handles unseen RB lengths and can be applied to sparse DM-RS observations in the tested RB5/RB10 PUSCH setup without retraining. In a 3GPP~TR~38.901 UMa NLOS channel, ReFLEX achieves about $-9.6$~dB NMSE on unseen RB lengths. In NR PUSCH/UL-SCH simulations, ReFLEX denoising followed by time-frequency interpolation reduces the 10\% BLER threshold by about 2--3~dB.
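For readers less familiar with the metric, the NMSE figure can be related to a linear error ratio with a short sketch. It assumes the usual definition, error energy divided by true-channel energy, which the abstract does not spell out. The helper names and sample values are hypothetical.

```python
import math

def nmse_db(estimate, truth):
    """NMSE in dB: 10*log10(sum|est - true|^2 / sum|true|^2).

    Raises ValueError for a perfect estimate, since log10(0) is undefined.
    """
    err = sum(abs(e - t) ** 2 for e, t in zip(estimate, truth))
    ref = sum(abs(t) ** 2 for t in truth)
    return 10.0 * math.log10(err / ref)

def db_to_linear(value_db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10.0 ** (value_db / 10.0)

# Toy complex channel coefficients (illustrative only).
true_channel = [1 + 0j, 0 + 1j]
noisy_estimate = [1.1 + 0j, 0 + 1j]
example_db = nmse_db(noisy_estimate, true_channel)

# The abstract's roughly -9.6 dB corresponds to residual error energy
# of about 11% of the channel energy under this definition.
reported_linear = db_to_linear(-9.6)
```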
Fast and low-overhead beam management is a critical requirement for the practical deployment of non-terrestrial networks (NTNs) operating at millimeter-wave and higher frequencies. In this paper, we propose a radar-assisted beam selection framework for NTNs that limits the set of candidate beams by utilizing spatial sensing information such as the angle-of-departure (AoD) and distance estimations. To provide theoretical insight into the expected worst-case overhead, we conduct a probabilistic analysis under idealized conditions, where an approximation of the worst-case beam selection overhead is proposed and its statistics are derived under Gaussian error. Additionally, the proposed framework is applied to a physical-layer security (PLS) scenario by leveraging the radar's capability to detect passive targets that represent unintended users. The simulation results show that the unintended user's power is suppressed below -135 dBm, while an additional beamforming gain of roughly 2 dB is attained for the legitimate users.
Control barrier functions (CBFs) provide real-time safety guarantees through pointwise conditions on the state. However, synthesizing a valid CBF is difficult and the resulting controllers are myopic. To address myopia, this article introduces predicted-flow control barrier functions (P-CBFs), which generalize the CBF from a function of the current state to a functional of a predicted flow under a parametrized control plan over a finite prediction horizon. For safety, a P-CBF can certify that the predicted flow is in a safe set over the entire prediction horizon. However, candidate P-CBFs suffer from the same challenge as candidate CBFs, namely, control constraints make it difficult to guarantee that the P-CBF is valid. This article resolves this challenge by introducing a terminal candidate P-CBF requiring that the predicted flow end in a backup safe set at the terminal time, and a planning-time shift that modulates the prediction horizon, providing an additional degree of freedom to ensure feasibility. The real-time control and the evolution of the control-plan parameter and planning-time shift are determined jointly by a single convex optimization that is guaranteed to be feasible and renders the associated safe set forward invariant. The resulting safe optimal flow control provides a safety certificate over the entire prediction horizon and unifies finite-horizon integral-cost optimization with safety certification. This optimization reduces to a quadratic program (QP) if the control constraints are a convex polytope. The QP implementation, termed FlowBarrier, is validated on a nonholonomic ground robot navigating a dense environment. FlowBarrier is compared to nonlinear model predictive control and two CBF-based safety filter methods across 100 trials, where FlowBarrier achieves the highest goal-reaching rate, zero safety violations, and the lowest computation time.
Model Predictive Path Integral (MPPI) control is a powerful sampling-based method for solving stochastic optimal control problems and has enabled real-time control in complex robotic systems. Despite its empirical success, its theoretical understanding remains limited. In this work, we show that MPPI can be interpreted as a special case of the Expectation-Maximization (EM) algorithm applied to a probabilistic inference formulation of optimal control. This perspective leads to a generalized EM-MPPI framework that extends MPPI beyond the commonly used Gaussian parameterization. We analyze the convergence behavior of this algorithm and characterize the local convergence rate in terms of the covariance of the posterior trajectory distribution and the exploration distribution. For exponential-family distributions, we establish a sufficient increase property of the log-likelihood when the log-partition function is strongly convex. Specializing the analysis to Gaussian MPPI yields explicit global and local convergence characterizations. The code for the experiments will be available upon acceptance.
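For orientation, the commonly used Gaussian MPPI step is an exponentially weighted average of sampled control perturbations. The sketch below shows only that standard update, with perturbations and costs supplied by the caller. It does not implement the generalized EM-MPPI framework or the convergence analysis, and the function name `mppi_update` is hypothetical.

```python
import math

def mppi_update(nominal, perturbations, costs, temperature):
    """One MPPI-style update of a nominal control sequence.

    nominal:       list of controls, one per time step
    perturbations: list of sampled perturbation sequences (same length as nominal)
    costs:         trajectory cost of each perturbed rollout
    temperature:   positive scalar; smaller values favour the best sample
    """
    best = min(costs)
    # Subtracting the minimum cost keeps the exponentials numerically stable
    # and does not change the normalised weights.
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    updated = list(nominal)
    for w, eps in zip(weights, perturbations):
        for t, e in enumerate(eps):
            updated[t] += w * e
    return updated, weights

# Two symmetric samples with equal cost cancel out.
same, w_same = mppi_update([0.0, 0.0], [[1.0, 1.0], [-1.0, -1.0]], [2.0, 2.0], 1.0)
# A much cheaper first sample dominates the update.
greedy, w_greedy = mppi_update([0.0, 0.0], [[1.0, 1.0], [-1.0, -1.0]], [0.0, 1000.0], 1.0)
```

The normalised weights play the role of a posterior over sampled trajectories, which is the quantity an inference-based reading of MPPI works with.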
Recursive systems can enter collapse-like regimes -- self-reinforcing amplification, persistent recursion, and narrowing diversity that mask accelerating internal degradation -- before overt failure becomes visible. We introduce Loopzero, a claim-bounded benchmark framework for testing whether recursive failures follow a directional telemetry pattern: rising gain (G), recursive persistence (p), and declining diversity ($\delta$). The claim boundary is specified in Lean; the Lean artifact does not verify real telemetry, benchmark validity, or detector performance. We evaluate the bridge on two frozen public-artifact benchmarks: a segmented public-markets benchmark (Volmageddon 2018, COVID MWCB 2020) and a MovieLens-25M offline deterministic recommender replay. Detectors are evaluated under a locked equal-false-positive contract (FP $\in$ [0.03, 0.07], pre-registered) so all configurations face the same alert budget. Neither tested standard comparators nor Loopzero's pre-registered quantile detector achieved an accepted operating point. Directional witness alignment held on both canonical benchmarks, with adjacent-horizon and row-level limitations disclosed. Digitized Shumailov et al. (2024) LLM training-loop trajectories are directionally consistent with the pattern; matched-FP evaluation in that domain is deferred. The contribution is a reproducible, falsifiable benchmark framework for evaluating recursive-collapse warning claims under an explicit alert-budget contract -- non-acceptance reported as a first-class scientific outcome.
Sparse linear arrays obtained by thinning a uniform linear array (ULA) achieve large effective apertures with a reduced number of physical sensors and have become a key enabling technology across radar, sonar, communications, and integrated sensing and communications. The price of thinning, however, is the emergence of ambiguities in the array manifold: distinct sets of directions of arrival that produce identical sensor measurements, precluding unique identification of multiple sources. Conventional sparse-array design criteria, based on beampattern shaping or estimation-performance optimization, do not fully capture how multiple steering vectors interact jointly to produce such ambiguities. This paper develops a scalable algebraic framework for the multi-source identifiability analysis of thinned ULAs. By relating the rank deficiency of the generalized Vandermonde matrix associated with the sparse steering matrix to that of a thinned Toeplitz matrix, and further to a rank condition on an augmented full-ULA steering matrix with prescribed generators, we obtain a systematic characterization of the ambiguity sets in large sparse arrays together with constructive design guidelines for ambiguity-free geometries. Algebraic and numerical examples demonstrate that the proposed framework characterizes ambiguity sets at scales well beyond the practical reach of previous sparse-array design and synthesis methods.
This paper proposes a tensor-based channel estimation framework for an uplink MIMO system assisted by a movable intelligent surface. The considered architecture combines a fixed transmissive metasurface with a smaller movable layer, whose discrete positions create an additional structured training dimension. By jointly exploiting fixed-layer phase patterns and movable-layer positions, the received pilots are modeled as a fourth-order PARAFAC tensor. A trilinear alternating least-squares receiver is then derived to estimate the individual channels and the position-dependent response. Importantly, the proposed method does not require prior knowledge of the movable-layer phase response at the receiver, since this unknown factor is estimated from the tensor structure of the received signal. Simulation results show that increasing the training length improves the NMSE of the estimated factors and the reconstructed cascaded channel.
Fermentation-derived side streams represent an underutilised resource for sustainable protein production. This study investigates the potential of centrate from industrial Fusarium venenatum fermentation as a nutrient source for fungal biomass generation. Following compositional characterisation, a synthetic centrate medium was formulated and evaluated using a Box-Behnken design combined with response surface methodology. Across 46 experimental runs, cell dry weight (CDW) ranged from 0.22 to 3.87 g per liter, demonstrating a strong dependence on nutrient composition. Ammonia and glucose were identified as the dominant factors influencing biomass production, with significant nonlinear effects. The model predicted a maximum CDW of 4.17 g per liter under optimised conditions, which was experimentally validated at 3.99 g per liter. Carbon conversion efficiency reached up to 29.02%, indicating effective substrate utilisation. These findings demonstrate that fermentation-derived centrate can support substantial fungal growth, while highlighting its potential to enhance nutrient recovery and influence the biochemical composition of sustainable mycoprotein.
Geometric distortion in prostate diffusion-weighted imaging (DWI) can impair lesion localization and reduce the reliability of MRI-based clinical assessment. We propose AutoIQ, an ensemble machine learning framework for automatic quantification and classification of DWI geometric distortion severity. A total of 140 retrospective prostate biparametric MRI examinations were analyzed, including 33 scans with severe distortion requiring repeat acquisition and 107 scans with acceptable distortion based on expert radiologist assessment. AutoIQ combines two complementary distortion quantification strategies: a segmentation-based method measuring prostate boundary mismatch between T2-weighted imaging (T2WI) and DWI, and a registration-based method estimating deformation magnitude after DWI-to-T2WI alignment. The resulting distortion scores were used to train individual classifiers and a logistic-regression ensemble model. Both computational methods significantly differentiated severe from acceptable distortion cases (p < 0.001). On an independent test set, the ensemble model achieved an accuracy of 0.95, F1-score of 0.93, and AUC of 0.98, outperforming individual models. These results suggest that AutoIQ can provide automated, quantitative quality assessment for prostate DWI and may help identify scans that require repeat acquisition.
Speech representations that capture prosodic information can be useful for both understanding and generation. However, speaker characteristics are reflected in acoustic-prosodic features (e.g., pitch). To address privacy concerns from the leakage of identity information, we propose a new self-supervised approach to learning prosody representations that incorporates speaker disentanglement strategies. We evaluate our encoder on three tasks to probe representation capabilities, including pitch reconstruction and detection of different prosodic events. Our encoder outperforms raw prosody and HuBERT-base baselines, achieving strong speaker disentanglement without adverse impact on prosody-related downstream tasks.
This paper addresses joint channel and symbol estimation in reconfigurable intelligent surface (RIS)-aided multiuser uplink systems with fluid antennas (FAs) at the base station. We propose the Nested Tucker for Fluid Antenna Systems (NTFAS) protocol, in which FA port selection and user-dependent coding vary across blocks while the transmitted symbol matrix is shared across observations. This structure yields coupled Tucker models with common channel and data factors. A two-stage semi-blind bilinear alternating least squares (BALS) receiver is then developed to estimate the cascaded channel and symbols, and to separate the user-to-RIS and RIS-to-BS channels through the embedded PARAFAC structure. Simulations show that NTFAS improves cascaded-channel NMSE and spectral efficiency (SE) with respect to a competing semi-blind benchmark, while maintaining comparable BER performance.
This paper characterizes the triggering behaviors of event-triggered control systems from a geometric-algebraic perspective. We first model the feasibility of inter-event time transition relations as a nonconvex quadratic constraint satisfaction problem and reformulate it as an equivalent linear cone problem, which provides a clearer geometric description of the feasible region, making subsequent analysis more reliable. Building on this formulation, we establish necessary and sufficient conditions that rigorously determine whether a given transition relation is feasible. Based on this condition, we propose an algorithm that computes the set of all feasible transition relations. Numerical simulations further demonstrate how the feasibility of specific transitions evolves with the control parameter \sigma, with visualizations of the feasible state space offering intuitive insight into parameter selection and system design.
The airport access use case is a promising early-stage application for Urban Air Mobility (UAM). Understanding the operational paradigm of UAM at airports is crucial for making equitable and effective regulatory and management decisions. A central open question is whether UAM will be integrated into the airport transportation network as a conventional scheduled transit service, such as subways and rail, or as a Transportation Network Company (TNC) characterized by dynamic supply-demand matching. In this paper, we propose a two-stage framework for conducting an economic feasibility analysis of UAM networks. In the first stage, we introduce a joint-supply-demand variable pricing problem to evaluate the impact of dynamic pricing on UAM operations. This model uses a binary logit formulation to capture the trade-off between travel time advantages and fare levels. In the second stage, the determined demand is used as input for the Electric Urban Air Mobility Vehicle Routing Problem with Non-linear Charging Time (eUAMVRP-NL), which optimizes fleet scheduling and charging decisions to derive operating revenue and cost estimates. We apply this framework to a case study of the Los Angeles International Airport (LAX) access market with an eight-spoke vertiport network. Our results indicate that UAM operations benefit significantly from TNC-like management; a variable pricing policy can increase operating profits by more than 100\% compared to fixed-pricing schemes. Furthermore, we identify economies of stage length in longer UAM flights.
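The binary logit trade-off between travel-time advantage and fare can be sketched as follows. The utility specification, the coefficient values, and the function name `uam_choice_probability` are illustrative assumptions; the abstract gives no calibrated parameters.

```python
import math

def uam_choice_probability(time_saved_min, fare_premium_usd,
                           beta_time=0.05, beta_cost=0.03):
    """Probability that a traveller chooses UAM over the ground alternative.

    time_saved_min:   travel-time advantage of UAM in minutes
    fare_premium_usd: extra fare of UAM over the ground alternative
    beta_time, beta_cost: hypothetical sensitivities, not values from the paper
    """
    utility_gap = beta_time * time_saved_min - beta_cost * fare_premium_usd
    return 1.0 / (1.0 + math.exp(-utility_gap))

# With no time or fare difference the traveller is indifferent.
indifferent = uam_choice_probability(0.0, 0.0)
# Raising the fare premium at a fixed time saving lowers the UAM share.
cheap = uam_choice_probability(30.0, 20.0)
expensive = uam_choice_probability(30.0, 60.0)
```

A pricing stage could evaluate such a probability over candidate fares to obtain demand, which the routing stage would then take as input.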
The Machine Learning Core (MLC) embedded in the STMicroelectronics LSM6DSOX IMU is widely cited as a low-latency alternative to host-side inference, yet wire-level decision-delivery latency is rarely measured. Using a Saleae Logic Pro 8 logic analyzer on an NVIDIA Jetson Orin Nano, we measured interrupt-to-decision latency (sensor INT1 edge to host decision GPIO) for three pipelines (a host-side decision-tree classifier, the standard MLC bank-switch read protocol, and an MLC binary-fast variant) under idle, I2C bus contention, and CPU stress. The protocol was pre-registered with 12 externally-timestamped Zenodo amendments before confirmatory data collection (4,770 of 4,860 trials included, 98.15%, across nine cells). The host pipeline exhibits lower median latency than the MLC pipeline under all conditions: 321.7 vs 681.5 us at idle (2.1x faster) and 574.5 vs 1,325.4 us under I2C contention (2.3x faster). The three-transaction I2C read protocol, not the silicon's classification, is the dominant latency contributor. We additionally characterize a reproducible 706.5 ms MLC decision cadence that bounds full stimulus-to-decision latency. Code, data, and pre-registration: this http URL.
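The reported speed-up factors and inclusion rate follow directly from the figures quoted above; the short check below only reproduces that arithmetic and uses no data beyond the abstract. The variable names are illustrative.

```python
# Median latencies (microseconds) and trial counts quoted in the abstract.
host_idle_us, mlc_idle_us = 321.7, 681.5
host_i2c_us, mlc_i2c_us = 574.5, 1325.4
included_trials, total_trials = 4770, 4860

idle_ratio = mlc_idle_us / host_idle_us   # MLC median over host median, idle
i2c_ratio = mlc_i2c_us / host_i2c_us      # same ratio under I2C contention
inclusion_pct = 100.0 * included_trials / total_trials

print(f"idle: host is {idle_ratio:.1f}x faster")
print(f"I2C contention: host is {i2c_ratio:.1f}x faster")
print(f"trials included: {inclusion_pct:.2f}%")
```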
In this work, transmit antenna selection (TAS) combined with generalized selection combining (GSC), i.e., TAS/GSC, is revisited over independent and identically distributed Nakagami-$m$ flat fading channels, and simple new closed-form expressions are derived for the outage probability (OP), symbol error rate (SER), and ergodic capacity. Compared with their multinomial theorem-based counterparts for GSC and TAS/GSC, our derivations are easier to interpret and apply, which facilitates TAS/GSC implementations in various fields. As an example, a performance analysis of decode-and-forward multihop networks with TAS/GSC in each hop is presented over independent non-identically distributed Nakagami-$m$ fading channels, with closed-form expressions for OP, SER, and ergodic capacity. Finally, all derived analytical expressions are validated via Monte Carlo simulation.
Control science is a core representative of the third industrial revolution and is of great importance to modern civilization. Control systems are the main subject of control science and may involve many considerations, such as hardware, software, operation, maintenance, economy, and society. Beyond all of these, however, the consideration most essential to a control system is methodology in the mathematical sense, and knowledge of it is what we refer to as control theory. Besides its importance from the mathematical perspective, control theory is all the more appealing because it is deeply rooted in practical applications. The appeal of control theory lies in both know-why and know-how, and it is the fusion of control theory with practical applications that highlights this appeal. Control theory for practical applications, especially with a so-called ``advanced'' flavour, involves several fundamental aspects. This article introduces the Handling Control System Optimality aspect of Advanced Control Theory for Practical Applications [1,2].
This paper proposes a recursive algorithm, rARX-DIPCA, for identifying errors-in-variables autoregressive models with exogenous input (EIV-ARX) and tracking time-varying SISO processes. Building on a recently developed recursive iterative PCA method, the proposed algorithm recursively updates model parameters and noise variances as new measurements arrive, without storing historical data beyond a specified lag window. The method enables real-time adaptation to sensor degradation and to changes in model coefficients. The algorithm simultaneously identifies process order, time delay, and noise variances while maintaining computational efficiency through online covariance updates. Simulation studies on benchmark systems demonstrate effective tracking performance and practical applicability.
Curved beams, that is, beams that are able to propagate on nonlinear trajectories, are often envisioned as ideal candidates for blockage avoidance in future wireless connectivity. Owing to this unique feature, they are considered as ideal beams for bending around and behind corners to reach users beyond the line-of-sight (LoS), thus offering unprecedented connectivity. In this work, we explain the various mechanisms of beam propagation beyond the LoS, and we demonstrate that beam bending behind corners results from an interplay between wavefront engineering and edge diffraction, with distinct characteristics that depend on the extent of blockage and the beam formation efficiency. We identify three distinct regimes of operation, namely the unblocked, the partially blocked, and the fully blocked regime, and we show that beam bending through wavefront engineering dominates in the unblocked and partially blocked regimes, while edge diffraction dominates in the fully blocked regime; as a result, curved beams cannot really bend behind the corner, unless there is some LoS between the user and the transmitter. Based on our findings, we compare curved beams with focused beams, and we demonstrate that they perform similarly in the partially blocked regime, while focused beams outperform curved beams in the unblocked and fully blocked regimes.
We address the problem of out-of-distribution (OOD) detection for target observations embedded in a subspace of the high-dimensional data space. Using continuous normalizing flows (CNFs), we propose a Lagrangian sub-flow (LSF) framework designed to isolate and estimate the density of the relevant components in the representation while using the remaining components as context. Through experimentation with models for speech synthesis, we show that CNFs, similarly to other deep generative models (DGMs), are susceptible to the "likelihood paradox", where high likelihood is erroneously assigned to OOD samples. This is attributed to the inductive bias of DGMs, which prioritize low-level structural details over high-level semantic coherence. To mitigate this phenomenon, we propose a number of geometric diagnostic signals based on the velocity field over the sub-flow trajectory. Based on these signals, we design metrics for the challenging task of zero-shot phoneme-level mispronunciation detection. Finally, we demonstrate the superiority of these metrics compared to likelihood-based methods on a real-world mispronunciation detection benchmark.
Control science is a core representative of the third industrial revolution and is of great importance to modern civilization. Control systems are the main subject of control science and may involve many considerations, such as hardware, software, operation, maintenance, economy, and society. Beyond all of these, however, the consideration most essential to a control system is methodology in the mathematical sense, and knowledge of it is what we refer to as control theory. Besides its importance from the mathematical perspective, control theory is all the more appealing because it is deeply rooted in practical applications. The appeal of control theory lies in both know-why and know-how, and it is the fusion of control theory with practical applications that highlights this appeal. Control theory for practical applications, especially with a so-called ``advanced'' flavour, involves several fundamental aspects. This article introduces the State-Space Modelling and Analysis aspect of Advanced Control Theory for Practical Applications [1,2].
Narrowband Internet of Things (NB-IoT) over non-terrestrial networks (NTN) is a key enabler for massive Internet of Things (IoT) in 6G, but in low Earth orbit (LEO) scenarios, large and time-varying Doppler shifts generate carrier frequency offset (CFO) beyond the correction range of standard user equipment (UE), making initial downlink synchronization a major bottleneck. This paper analyzes Doppler characteristics in realistic NB-IoT LEO scenarios, reviews Doppler mitigation strategies, and proposes a standard-compliant, low-overhead search-space optimization method for downlink acquisition. Results under realistic LEO conditions with real-time measurements show reduced acquisition overhead while maintaining synchronization reliability, supporting NB-IoT adaptation to 6G NTN deployment.
Low-Earth orbit (LEO) satellite beam-hopping (BH) technology is emerging as a promising approach to meet the ever-increasing global connectivity demands, enabling agile, on-demand coverage. LEO satellite BH can address the spatio-temporal non-uniformity of ground user traffic by dynamically allocating capacity and optimizing network performance. Cooperative multi-satellite BH enables joint transmission and interference avoidance to improve received signal quality. This article provides a comprehensive paradigm of BH, detailing its key dimensions, strategies, and architectures. Through exploration of key challenges, including beam pattern design, on-demand scheduling, and interference management, this paper identifies the potential applications of BH, ranging from adaptive capacity allocation for hotspot areas, low-power Internet-of-Things (IoT), delay-sensitive services, to massive connectivity support. Furthermore, a system-level analysis is presented, including key metrics, models of inter-beam and inter-satellite interference, and cooperative joint transmission, and a case study is provided to demonstrate the performance benefits of BH with cooperative transmission. Several promising future research directions are discussed to guide the future development of LEO satellite BH networks.
The main focus of this talk is to present the mathematical fundamentals, state of the art, technical challenges, and open problems in flight control for atmospheric vehicles, such as aircraft and other aerial platforms. Key features of reduced-order modeling and flight simulation for control applications will be discussed. The emphasis is on the theoretical and engineering aspects of creating guidance and flight control systems with guarantees of closed-loop stability, robustness, and performance, and of transitioning them to practice.
We present a framework for planning trajectories that avoid obstacles and satisfy logical precedence constraints expressed with a fragment of signal temporal logic (STL). Our approach models environments containing obstacles, keys, and doors, where collecting a key unlocks its associated door and potentially opens shorter paths to a goal. Based on an exact convex partitioning of the free space that encodes connectivity among convex free space, key, and door regions, we construct an augmented graph of convex sets (GCS) whose layered structure exactly encodes the key-door precedence logic. A shortest path in the augmented GCS simultaneously selects an optimal key collection sequence and computes an optimal continuous trajectory, providing an exact solution up to a finite Bézier curve parameterization.
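The layered encoding of key-door precedence can be illustrated with a purely discrete analogue: a shortest-path search over (node, collected-keys) states, where each distinct key set plays the role of one layer. This is only a sketch of the layering idea, not the graph-of-convex-sets formulation itself; it has no continuous trajectories or Bézier parameterization, and the function name, graph, and costs are hypothetical.

```python
import heapq
import itertools

def shortest_key_door_path(edges, keys_at, start, goal):
    """Dijkstra over (node, collected-keys) states.

    edges:   dict node -> list of (neighbour, cost, required_key or None)
    keys_at: dict node -> key collected on arrival at that node
    Returns (total_cost, node_path), or None if the goal is unreachable.
    """
    start_keys = frozenset([keys_at[start]]) if start in keys_at else frozenset()
    tie = itertools.count()  # tie-breaker so heap entries never compare sets
    queue = [(0.0, next(tie), start, start_keys, [start])]
    settled = set()
    while queue:
        cost, _, node, keys, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if (node, keys) in settled:
            continue
        settled.add((node, keys))
        for nxt, step, required in edges.get(node, []):
            if required is not None and required not in keys:
                continue  # this door is still locked in the current layer
            nxt_keys = keys | {keys_at[nxt]} if nxt in keys_at else keys
            heapq.heappush(queue, (cost + step, next(tie), nxt, nxt_keys, path + [nxt]))
    return None

# Hypothetical map: a long open route S->G, or a short route behind door "k".
edges = {
    "S": [("G", 10.0, None), ("K", 2.0, None), ("G", 3.0, "k")],
    "K": [("S", 2.0, None)],
}
result = shortest_key_door_path(edges, {"K": "k"}, "S", "G")
print(result)
```

In this toy map the detour to collect the key costs less than the open route, so the search returns the key-collecting path; without the key location it falls back to the long route.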
In modern radio networks, nodes frequently access multiple communication interfaces such as WiFi, cellular, LoRa, and Zigbee. Optimal utilization of such heterogeneous networks (HetNets) at link and network levels is essential for ensuring efficient and secure communication. Some applications demand a level of security at which the signal must be completely undetectable. Previous works have considered such covertness, but it often results in limited achievable rates. Physical layer analysis shows that friendly jamming can significantly improve covert data rates, motivating its incorporation into HetNets. Here, we analyze a scenario where a jammer assists communication in a HetNet in the presence of an adversary attempting signal detection. We first optimize the physical layer (PHY) for a single link and then incorporate those results into an optimal routing and link configuration approach that accounts for an adversary observing the aggregate signals from all links. Numerical results demonstrate significant performance gains when compared to alternative approaches. In fact, the rate observed for the proposed approach is high enough to question the optimality of the low-rate design approach employed; we address this concern through revised algorithms and characterize their performance.
The growing penetration of grid-connected inverters renders Transient Stability Analysis (TSA) increasingly challenging in modern power systems. Existing TSA methodologies encounter an intrinsic trade-off between accuracy and scalability when dealing with these networked inverter-based resources (IBRs). To bridge this gap, this paper proposes a Lipschitz-enforced machine learning framework that leverages Lipschitz continuity to restructure the transient stability certification mechanism. By replacing computationally intensive verification procedures with a deterministic and efficient algebraic check, the proposed method enables rigorous stability guarantees for complex multi-inverter systems, effectively bypassing the complexity limits of traditional analytical approximations. Validated on networked Grid-Forming (GFM) inverter systems, the proposed framework accelerates the training process by over 5 times compared to existing methods. Notably, the proposed framework substantially outperforms traditional transient stability analysis approaches (e.g., Linear Matrix Inequality and Sum-of-Squares methods) by capturing up to 30\% larger Regions of Attraction (ROA), effectively shattering the conservativeness bottleneck that has long constrained traditional analytical tools. This advancement provides a scalable and theoretically rigorous solution for the TSA of networked IBRs in modern power grids.
Stacked intelligent metasurfaces (SIMs) are emerging as a promising architecture for the sixth generation (6G) and beyond of wireless systems, enabling richer electromagnetic-wave manipulation than conventional single-layer metasurfaces. However, realizing these gains requires accurate and scalable channel estimation under the strong inter-layer coupling and multilinear parameter interactions introduced by the stacked programmable metasurface layers. This paper proposes TenSIM, a tensor-based channel-estimation framework for SIM-assisted multiple-input multiple-output (MIMO) systems. By exploiting a structured SIM training protocol, TenSIM derives two parity-dependent observation models: a PARAllel FACtor (PARAFAC) model for odd-layer SIMs and a Tucker model for even-layer SIMs. These formulations decouple the transmitter-SIM and SIM-receiver channels while explicitly accounting for inter-layer wave coupling. Based on the resulting tensor models, we develop alternating least squares estimators, establish identifiability conditions using the associated design matrices, and characterize practical sufficient conditions for full-column-rank training designs, including those involving scaling ambiguities. The proposed framework is validated through extensive numerical experiments and reveals the main operating trade-offs. We show that both TenSIM-PARAFAC and TenSIM-Tucker outperform unstructured least squares baselines by exploiting the tensor structure of the SIM cascade. Moreover, TenSIM-PARAFAC offers better scalability, lower computational complexity, and stronger robustness to inter-layer spacing, while TenSIM-Tucker can provide more accurate channel reconstruction when sufficient training and strong layer coupling are available. Finally, it is shown that the proposed TenSIM framework remains effective under imperfect or blind SIM training when additional pilot diversity is available.
This article addresses recent advances in artificial intelligence, which have set off an astounding race among technology frontiers to build large data centers. It provides insights into impacts of large data centers on the planning and operation of the power grid.
Digital twin (DT) is envisioned as a key enabler of sixth-generation (6G) communication systems, evolving from offline descriptive replicas for monitoring and analysis to in-the-loop agents within digital twin networks (DTNs) that couple physical and digital worlds. Recent advances in integrated sensing and communication (ISAC)-driven electromagnetic (EM) scattering methods enable environment twinning by linking channel behaviors to EM properties of the scatterers, supporting interpretable DT states and EM-grounded optimization. However, existing studies primarily focus on DT construction and lack mechanisms for closed-loop control in wireless systems. Moreover, array-geometry mismatch can bias DT reconstruction and degrade control performance, while prior works assume known arrays. To address these gaps, we propose an EM-ISAC-based closed-loop DTN framework with a hierarchical design integrating environment twinning, prior injection, and control decision into an end-to-end loop. Leveraging ISAC measurements, the proposed framework jointly reconstructs scatterer information and array-dependent forward operator and employs a low-complexity Bayesian message-passing algorithm to perform contrast inference and array calibration. The reconstructed DT guides codebook preselection to reduce training overhead and narrow candidate beams. Subsequently, downlink beamforming (BF) is performed based on DT-predicted channels, enabling latency-bounded closed-loop control. Simulation results demonstrate improved robustness and control performance under array mismatch.
Multicast transmission in millimeter-wave (mmWave) networks is fundamentally limited by the weakest user, and blockages further exacerbate this problem. Large-scale reconfigurable intelligent surfaces (XL-RIS) offer a promising solution by providing high array gain to overcome blockages. However, the large aperture of XL-RIS significantly expands the near-field region, creating a hybrid-field scenario where some users lie in the near-field while others remain in the far-field. Existing hybrid-field studies on XL-RIS have primarily focused on channel estimation and deployment optimization, leaving multicast capacity analysis unexplored. This paper investigates the fundamental capacity limits of XL-RIS-assisted multicast communications in hybrid-field scenarios. For the fundamental two-user case consisting of one near-field and one far-field user, we derive the optimal closed-form covariance matrix and optimize the RIS phase shifts via manifold optimization. We establish that the multicast capacity scales as $\Theta(\log_2(MN))$ as the number of transmit antennas $M$ and/or RIS elements $N$ grow large, and prove this scaling is order-tight. Numerical results validate the bounds and show the impact of $M$, $N$, and distance on the multicast rate.
Automatically distinguishing child-directed speech from adult-directed speech in long-form recordings is key to scalable analyses of children's language environments. Existing approaches process utterances in isolation and have been evaluated primarily on English. We address these gaps along three dimensions. First, we fine-tune and evaluate six self-supervised models on a multilingual dataset of 182 children, showing that in-domain pre-training on child-centered recordings substantially outperforms models trained on adult speech. Second, we demonstrate that incorporating surrounding context substantially improves classification, with an absolute gain of 13.8% in average F1-score. Third, we evaluate our model in a realistic end-to-end pipeline, from adult speech detection to addressee classification, showing that performance drops under automatic segmentation but still consistently outperforms a rule-based baseline.
Joint unicast and multi-group multicast transmission with RIS and RSMA is a promising enabler for 6G services. However, existing RSMA schemes for such scenarios split only unicast messages while leaving multicast messages intact, limiting the degree of freedom of interference management. To this end, we propose a joint rate splitting framework that splits both unicast and multicast information and two RSMA schemes. The common-common fusion (CCF-RSMA) scheme encodes the unicast common part into the global multicast common stream, while the private-common fusion (PCF-RSMA) scheme merges it with the group-specific multicast private part. For each scheme, we formulate energy efficiency (EE) maximization problems under both perfect and imperfect channel state information, and jointly optimize active beamforming, RIS phase shifts and rate allocation parameters. Simulation results demonstrate that the proposed schemes significantly outperform the comparative schemes in terms of EE, thereby proving the effectiveness of the proposed framework. Moreover, CCF-RSMA is more favorable in scenarios with larger groups and higher unicast QoS demands, whereas PCF-RSMA is better suited for scenarios with smaller groups and higher multicast QoS.
Matrix multiplication is a fundamental computational kernel underlying a wide range of real-world applications, including machine learning, scientific computing, signal processing, and computer graphics. Its performance directly impacts the efficiency, scalability, and energy consumption of modern computing systems. This paper presents a comparative analysis of several matrix multiplication algorithms implemented in software and examined in the context of their hardware execution characteristics. Naive, NumPy, Strassen, and Winograd algorithms are evaluated based on execution time, user time, and CPU time across increasing matrix sizes. The performance metrics reveal computational bottlenecks and highlight the benefits of algorithmic optimizations. Furthermore, the study investigates the mathematical operations underlying each algorithm and analyzes how matrix dimensions influence MAC (Multiply-Accumulate) behavior and overall computational efficiency in the hardware domain. The results provide a performance benchmark and contribute to understanding how algorithmic choices interact with modern computing architectures for applications in computer architecture, data science, and real-time embedded systems.
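To make the algorithmic difference concrete, the sketch below contrasts the naive triple-loop product with Strassen's recursion, which replaces the eight half-size block products of the straightforward block scheme with seven. It is a plain-Python illustration restricted to square matrices whose size is a power of two, not the implementation benchmarked in the paper; the helper names and the `cutoff` parameter are assumptions.

```python
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def naive_matmul(A, B):
    """Textbook product: one multiply-accumulate per (i, j, k) triple."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

def strassen(A, B, cutoff=2):
    """Strassen product for n x n matrices, n a power of two."""
    n = len(A)
    if n <= cutoff:
        return naive_matmul(A, B)
    h = n // 2

    def quadrants(M):
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

    a11, a12, a21, a22 = quadrants(A)
    b11, b12, b21, b22 = quadrants(B)
    # Seven recursive products instead of eight.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), cutoff)
    m2 = strassen(mat_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, mat_sub(b12, b22), cutoff)
    m4 = strassen(a22, mat_sub(b21, b11), cutoff)
    m5 = strassen(mat_add(a11, a12), b22, cutoff)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), cutoff)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), cutoff)
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    # Stitch the four quadrants back into one matrix.
    return ([r1 + r2 for r1, r2 in zip(c11, c12)] +
            [r1 + r2 for r1, r2 in zip(c21, c22)])

A = [[(i + 2 * j) % 5 for j in range(4)] for i in range(4)]
B = [[(3 * i - j) % 7 for j in range(4)] for i in range(4)]
print(strassen(A, B, cutoff=1) == naive_matmul(A, B))
```

The recursion trades multiplications for extra additions and temporary blocks, which is one reason a larger `cutoff` is commonly used in practice before falling back to the naive kernel.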
Neuromorphic computing relies on low-power, high-reliability hardware, yet the integrity of input/output pads (IOPADs) remains an underexplored factor affecting system performance. This chapter examines the role of IOPAD integrity in neuromorphic VLSI design and connects algorithmic development with practical hardware implementation. While much attention has been given to spiking neural networks (SNNs) and ultra-low-power core logic, the electrical and functional robustness of the I/O interface is equally critical for ensuring signal fidelity and minimizing energy consumption. We review the structure and function of IOPADs, outline their influence on power, performance, and reliability, and discuss design trade-offs involving pad libraries, pad ring architectures, and bonding strategies. The chapter also introduces the fundamentals of SNNs and summarizes the digital hardware design flow from behavioral description to physical layout. Physical implementation considerations are highlighted using the SkyWater 130 nm CMOS process as a practical platform for neuromorphic prototyping. Real-world examples illustrate how early-stage I/O planning can prevent redesign, reduce yield loss, and improve overall system efficiency. This work emphasizes that IOPAD integrity is a key enabler of scalable, energy-efficient neuromorphic systems.
Multi-modal sensing is an important enabler for future environment-aware wireless systems, since a single sensing modality is generally insufficient to provide accurate metric geometry, material awareness, and semantic interpretability in complex environments. This paper presents a measurement-based multi-modal THz sensing and vision framework for indoor environment reconstruction. A three-dimensional monostatic THz channel sounding system operating at 290-310 GHz is integrated with an omnidirectional fisheye camera to acquire radio-frequency and visual observations from a common sensing viewpoint. From the measured THz data, a signal processing pipeline extracts multipath components and infers geometry- and material-consistent structural primitives through trajectory tracking-assisted parameter estimation, graph-based structure discovery, planar reconstruction, and reflection-loss analysis. In parallel, AI-based visual perception modules extract object-level semantic masks and depth priors from panoramic images. To associate these heterogeneous representations, an agentic-AI-based task-driven THz-agent module is developed to select appropriate integration tools according to the attributes of the modality-specific outputs. Through angular alignment and consistency analysis, THz-derived metric geometry and material information are associated with vision-derived semantic regions and depth priors, enabling geometry-consistent and semantically interpretable environment reconstruction directly from measurements. Experimental validation in the indoor L-shaped hallway demonstrates that the proposed framework reconstructs dominant structural elements with centimeter-level accuracy while identifying semantic categories and material attributes of representative indoor objects.
Sub-terahertz (sub-THz) wireless communications and their potential applications continue to attract significant attention and foster debate on the usage of their unused frequency bands to relieve existing spectrum congestion. However, for these next-generation networks, experimental research is limited by the lack of flexible, real-time testbeds. This study presents a real-time, multi-radio-frequency (RF) channel, cascaded software-defined radio (SDR)-based Orthogonal Frequency-Division Multiplexing (OFDM) transmission platform achieving an aggregate sampling rate of 2 x 3.84 GSPS and approximately 1.1 GHz of instantaneous bandwidth, targeting sub-THz and THz SDR testbeds. The digital transmitter architecture, including the OFDM signal processing chain and instrumentation workflow, is described in detail. A comparative case study between a conventional sub-6 GHz implementation and a 180 GHz configuration is conducted, evaluating phase noise, spectral occupancy, and received average and peak power. The direct impact of sub-THz bandpass filtering, necessitated by harmonic-mixer-based upconversion, is also experimentally analyzed. Measurement results show conversion and filtering losses of up to 24.1 dB, while the system exhibits stationary phase noise levels on the order of -60 dBc/Hz, demonstrating the feasibility and limitations of real-time wideband OFDM transmission at 180 GHz. Beyond its current capabilities, the platform builds a foundation for the scalable integration of multiple transmitters and receivers, which is essential for the implementation, conformance, and testing of emerging sub-THz communication systems.
Data-driven control of nonlinear systems with rigorous guarantees is a challenging control problem. Integral quadratic constraints (IQCs) provide a powerful framework for modeling nonlinearities. This paper presents a data-driven min-max model predictive control (MPC) synthesis method for unknown systems subject to (nonlinear) uncertainties using the IQC framework. The unknown system matrices are characterized by a set-membership representation using the input-state data and the knowledge of the IQCs. We derive two semidefinite programs (SDPs) that minimize an upper bound on the worst-case cost over all possible system dynamics and uncertainties. By iteratively solving these SDPs, the proposed state-feedback control law is obtained. We further prove that the resulting closed-loop system is exponentially stable and satisfies the input and state constraints. A numerical example demonstrates the validity of the proposed method.
This paper investigates near-field propagation in optical reconfigurable intelligent surface (ORIS)-assisted free-space optical (FSO) communication systems. Unlike conventional far-field scenarios, near-field propagation involves complex diffraction effects that hinder tractable closed-form analysis. To address this issue, a numerical framework for evaluating the optical field distribution of ORIS-assisted FSO links is proposed. Specifically, two numerical approaches are considered: direct Riemann-sum evaluation and a fast Fourier transform (FFT)-based method. Although the Riemann sum approach provides accurate field estimation, it incurs extremely high computational complexity due to the fine spatial discretization of the ORIS surface required at optical wavelengths. To improve computational efficiency, the optical-field calculation is reformulated as a convolution in the spatial-frequency domain, enabling efficient FFT-based propagation analysis. Simulation results demonstrate that the proposed FFT-based method achieves accuracy comparable to that of the Riemann-sum approach while significantly reducing computational complexity.
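The computational contrast between the two approaches can be sketched in one dimension, assuming the field calculation can be written as a discrete convolution of an aperture field with a propagation kernel. The direct sum costs O(N^2) operations, while the FFT route evaluates the same convolution as a pointwise product of transforms in O(N log N). The chirp-like kernel, the sample counts, and all function names are hypothetical placeholders, not the paper's ORIS model.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def direct_convolution(field, kernel):
    """Riemann-sum style evaluation: every input sample hits every kernel tap."""
    out = [0j] * (len(field) + len(kernel) - 1)
    for i, f in enumerate(field):
        for j, h in enumerate(kernel):
            out[i + j] += f * h
    return out

def fft_convolution(field, kernel):
    """Same linear convolution via zero-padded FFTs and a pointwise product."""
    n_out = len(field) + len(kernel) - 1
    size = 1
    while size < n_out:
        size *= 2
    F = fft(list(field) + [0j] * (size - len(field)))
    H = fft(list(kernel) + [0j] * (size - len(kernel)))
    y = fft([a * b for a, b in zip(F, H)], inverse=True)
    return [v / size for v in y[:n_out]]  # 1/size completes the inverse FFT

# Hypothetical chirp-like kernel and a uniformly illuminated aperture.
kernel = [cmath.exp(1j * 0.05 * n * n) for n in range(32)]
aperture = [1.0 + 0j] * 32
reference = direct_convolution(aperture, kernel)
fast = fft_convolution(aperture, kernel)
max_error = max(abs(a - b) for a, b in zip(reference, fast))
print(f"max deviation between the two methods: {max_error:.2e}")
```

Zero-padding to at least the full output length keeps the circular convolution implied by the FFT equal to the linear one computed by the direct sum.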
While Large Language Models (LLMs) offer a promising path toward intent-driven network management by translating natural language human intents into machine-readable configurations, they often suffer from hallucinations and structural inconsistencies in multi-step and complex tasks. To address these challenges, this paper proposes a retrieval-augmented and task decomposition-based multi-agent LLM framework for Beyond 5G network auto-configuration. The framework employs a semantic retrieval-augmented generation pipeline to ensure that its outputs are aligned with technical standards and vendor-specific manuals. Furthermore, it introduces a modular architecture for configuration generation, closed-loop configuration verification, and network deployment, in which complex tasks are decomposed into smaller sub-tasks handled by specialized agents. In this architecture, hallucinated configuration parameters are identified by the configuration verifier agent and corrected through computationally inexpensive segment-level regeneration. The performance evaluation experiments with the OpenAirInterface emulator demonstrate that the proposed task decomposition-based configuration and verification approach improves the average success rate by 22.7% over monolithic methods, achieving 94.4% success in network configuration.
Accurate prediction of fruit sugar content is essential for quality control and market valuation in agriculture. Conventional measurement techniques rely on destructive, time-consuming processes (e.g., juicing and refractometry) or direct contact instruments, which hinder high-throughput operations. This paper introduces SweetFruit, a mobile two-stage system that leverages low-cost sensors to estimate fruit sugar content without contact. In Stage 1, we implement a lightweight 3D deep learning model (SF-PointNet) that uses point clouds from a Time-of-Flight (ToF) depth camera to classify fruit as high or low sugar. In Stage 2, a regression network (SF-Net) predicts the fruit's Brix value using measurements from a compact 18-channel near-infrared (NIR) spectrometer. The system uses simple off-the-shelf sensors (AS7265x NIR and Arducam ToF) with efficient processing pipelines for real-time execution on embedded platforms. Experiments on green 'Granny Smith' apples and strawberries demonstrate the system's effectiveness. Stage 1 achieves over 90% classification accuracy, enabling rapid prescreening, while Stage 2 delivers precise sugar estimates, with a root mean square error (RMSE) of 0.57 Brix, reducing error by 22% compared to using NIR sensing alone. SweetFruit offers a scalable, field-ready solution for rapid fruit quality screening, showcasing the benefits of task-specific multimodal sensing in mobile agricultural applications.
Recently, under the assumption of a noise-free input, the augmented complex least lncosh (ACLlncosh) method was introduced for power system frequency estimation and showed robust performance when impulsive noise polluted the output signal. In practice, however, noise often contaminates the input signal as well, which drastically degrades the performance of the ACLlncosh method. To enhance robustness against noisy input and output while maintaining resilience to impulsive noise in the output signal, this paper proposes an online censoring-based widely linear total least lncosh (OC-WL-TLlnC) method. This method improves performance under both balanced and unbalanced settings by filtering out less informative data via online censoring, which also reduces the computational burden. Furthermore, a variable parameter approach is incorporated to accelerate convergence and improve steady-state accuracy, thereby ensuring adaptability to dynamic power system conditions. The proposed approach significantly enhances frequency estimation performance by addressing the limitations of current techniques and offering a computationally efficient, noise-resilient solution for real-time power system monitoring.
Real-scene indoor millimeter-wave simulation requires efficient modeling of radio frequency (RF)-computable geometry and electromagnetic material properties. To address the low efficiency of manual scene modeling, the limited RF adaptability of visually reconstructed meshes, and the lack of material binding in 28 GHz ray-tracing simulation, RFDT-Channel is developed as an RF digital twin scene construction workflow based on red-green-blue (RGB) images and light detection and ranging (LiDAR) point clouds. Indoor videos and point clouds are collected by a Jetson Orin platform with LiDAR and GMSL cameras. An initial triangular mesh is generated through COLMAP, 3D Gaussian Splatting, and SuGaR. The LiDAR point cloud then provides geometric and scale references for RF-oriented regularization in Blender, including alignment, wall solidification, door/window opening construction, and topology repair. OpenScene semantic segmentation maps major indoor structures to concrete, glass, wood, and metal materials, and Sionna RT performs 28 GHz ray tracing. Under a fixed transmitter-receiver deployment, the generated channel impulse response (CIR), channel frequency response (CFR), and Radio Map results show that material binding mainly changes weak reflection, transmission, and scattering paths, reducing the number of effective paths from about 742 to about 52 while keeping the dominant path amplitude nearly unchanged.
Next-generation Internet-of-Things (IoT) is evolving toward a ubiquitous, ultra-low-power, and multi-band heterogeneous networking paradigm that seamlessly integrates terrestrial, non-terrestrial, and ambient devices. This vision places unprecedented demands on conventional radio frequency (RF) receivers, whose fundamental bottlenecks in sensitivity, power consumption, coverage, and multi-band operation are rooted in the RF antenna. To tackle these issues, we show that the quantum properties of Rydberg atomic quantum receivers (RAQRs), including ultra-high sensitivity, broad frequency agility, and diverse reception modalities, provide a physically distinct receiver-side path that replaces the conventional antenna-and-low-noise-amplifier chain. Using LoRa, narrowband IoT, and ambient IoT as case studies, this article shows that RAQRs deliver significant gains in weak-uplink, low-power, and battery-free regimes. A stochastic-geometry analysis in cellular and cell-free architectures then maps these device-level gains onto network coverage, where the RAQR retains roughly a 4 dB half-coverage advantage over the RF receiver in sparse deployments at \(\lambda \sim 10^{-5}~{\mathrm m}^{-2}\), with the gain eroding as device density grows. Finally, we discuss the open challenges that stand between current RAQR prototypes and deployable IoT infrastructure.
The transition to electric mobility calls for charging infrastructure that is both efficient and socially equitable. This paper examines fairness in electric vehicle (EV) charging station pricing and capacity through a game-theoretic perspective. We model a non-cooperative market in which competing charging service providers set prices and capacities while customers choose stations based on generalized cost, leading to a market equilibrium. We then benchmark this decentralized outcome against an idealized planner solution that jointly optimizes efficiency and equity. To align market outcomes with socially desirable goals, we design targeted incentives that guide operators toward fairer charger placement. Case studies demonstrate that unregulated competition tends to exacerbate disparities in charger access across demographic groups, whereas carefully calibrated incentives can reduce inequities without significant efficiency loss. The framework provides insights for policymakers on reconciling free-market dynamics with the broader societal goal of fairness in electrified mobility systems.
Accurate segmentation of fetal brain tissues in Magnetic Resonance Imaging (MRI) is critical for early diagnosis of congenital abnormalities and improving prenatal care. However, the task remains difficult because of fetal motion, low tissue contrast, and major anatomical variability throughout gestational ages, particularly in segmenting complex structures such as white matter, gray matter, lateral ventricles, deep gray matter, extra-cerebrospinal fluid, cerebellum, and brainstem. As a solution to these difficulties, this research introduces a novel deep learning model that combines a ResNet-34 encoder with a lightweight decoder leveraging multi-layer perceptron (MLP) modules for adaptive feature refinement. This design specifically enhances the model's ability to preserve anatomical boundaries and mitigate segmentation errors caused by motion artifacts and intensity inhomogeneities. Computational efficiency is achieved by reducing parameter count, employing bilinear upsampling instead of transposed convolutions, and optimizing the decoder for speed without sacrificing accuracy. Trained and validated on the FeTA 2021 dataset using 5-fold cross-validation, the proposed model outperforms baseline architectures such as UNet, UNet++, DeepLabV3, and DeepLabV3+, achieving an average Accuracy of 97.37% with a mean Dice Similarity Coefficient (DSC) of 90.33%, mean Intersection over Union (IoU) of 86.93%, and Precision of 90.83%. Additionally, its fast inference time and reduced computational load make it well-suited for integration into real-time clinical workflows.
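As a reminder of how the two overlap metrics reported above are defined, the following is a minimal sketch for a single binary mask. It is not the authors' evaluation code; the function name `dice_and_iou` and the flat 0/1 mask representation are assumptions made for illustration, and a multi-class evaluation would typically apply this per tissue label and then average.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks given as 0/1 sequences.

    Dice = 2|A ∩ B| / (|A| + |B|),  IoU = |A ∩ B| / |A ∪ B|.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection
    if union == 0:
        # Both masks empty: scored as perfect agreement (a convention, assumed here).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_size + target_size)
    iou = intersection / union
    return dice, iou


# Toy example: two 6-voxel masks that overlap in 2 voxels.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single mask the two scores are tied by IoU = Dice / (2 - Dice), but averages over classes or subjects do not obey that identity, which is why both are usually reported.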
The integration of Artificial Intelligence (AI) and emerging 6G networks introduces new opportunities for scalable coordination in tactical autonomous vehicle systems. This paper proposes a communication-centric hierarchical architecture for Tactical Autonomous Defense Vehicle Networks (TADVNs) that models the integration of edge-assisted Large Language Model (LLM) reasoning with 6G-enabled connectivity and semantic communication. The framework is designed to improve coordination efficiency, reduce communication overhead, and enhance latency resilience under increasing fleet-scale operation. Unlike conventional task-specific AI pipelines that rely on structured feature processing and rule-based coordination, the proposed approach incorporates semantic abstraction and context-aware decision support within a layered edge-cloud communication architecture. We evaluate communication and coordination performance via Monte Carlo simulations across fleet sizes of 5-30 vehicles under contested network conditions. Results indicate that at a 30-vehicle scale, the 6G-LLM configuration achieves 75.2% latency reduction (29.1 ms vs. 117.5 ms), a 68.7 percentage point increase in mission success rate (82.9% vs. 14.2%), and an 88.6% reduction in communication overhead compared to a 5G-based conventional AI baseline. These findings demonstrate measurable benefits in coordination and communication when semantic reasoning is combined with low-latency 6G connectivity.
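The headline figures above mix a relative reduction (latency) with an absolute percentage-point difference (mission success). The short check below recomputes both from the values quoted in the abstract; the helper names are hypothetical and only the four quoted numbers come from the source.

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


def percentage_point_gain(baseline_pct, value_pct):
    """Absolute difference between two percentages, in percentage points."""
    return value_pct - baseline_pct


# Values quoted in the abstract for the 30-vehicle scale.
latency_cut = relative_reduction(baseline=117.5, value=29.1)        # ms
success_gain = percentage_point_gain(baseline_pct=14.2, value_pct=82.9)

print(f"latency reduction: {latency_cut:.1f}%")
print(f"mission success gain: {success_gain:.1f} percentage points")
```

Both results agree with the reported 75.2% and 68.7 percentage points after rounding to one decimal place.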
This paper proposes a data-driven controller design method for unknown nonlinear systems based on a Koopman bilinear realization. Using Koopman operator theory, the nonlinear system can be represented as a bilinear discrete-time system with a residual error term. The residual error is proportionally bounded by the norm of the lifted state and input, while the system matrices of the bilinear model are unknown. Assuming that bounds on the residual error are available, the unknown system matrices are characterized via a set-membership representation using the collected input-state data pairs of the nonlinear system. A data-driven controller design method is proposed to ensure stability for all bilinear systems within this set-membership description and for all admissible residual errors. More specifically, we design a rational state-feedback controller that stabilizes the bilinear model with residual error and, consequently, the original nonlinear system, by solving a sum-of-squares (SOS) program. The effectiveness of the proposed approach is demonstrated through numerical examples.
Unmanned aerial vehicles (UAVs) are emerging as a key enabler of next-generation wireless networks, particularly for applications that require ultra-reliable and low-latency communication (URLLC), such as emergency response, industrial automation, and autonomous systems. In these scenarios, maintaining reliable connectivity under strict transmission time constraints is challenging due to dynamic environments, mobility, and limited onboard energy. In particular, communication performance and energy are closely coupled with UAV movement, making trajectory design a critical component of system operation. Most existing approaches rely on offline joint communication and trajectory optimization, where the UAV trajectory and communication parameters are optimized prior to execution based on assumed system information. Although effective under ideal assumptions, such designs cannot adapt to real-time variations in user demand, channel conditions, or environmental disturbances, which are particularly critical in URLLC settings. To address these challenges, this article investigates model predictive control (MPC) as an adaptive framework for UAV-enabled communications. Using a receding-horizon strategy, MPC enables the UAV to continuously update its trajectory based on real-time information, improving reliability and robustness in dynamic environments. Representative application scenarios are discussed to highlight the role of MPC in UAV-enabled URLLC systems. Furthermore, a case study is presented to illustrate key design trade-offs and performance insights under finite blocklength-based URLLC transmission, followed by a discussion on open challenges and future research directions for practical and scalable MPC-enabled UAV communication systems.
With the increasing demand for edge computing and AI-driven workloads, integrating small and medium-sized edge data centers into distribution networks has become increasingly important. This paper investigates the hosting capacity of distribution networks for data center integration and identifies the key physical mechanisms that limit the maximum allowable data center load. The baseline analysis shows that data center hosting capacity varies significantly across candidate buses due to network topology and electrical distance. Three dominant limiting mechanisms are identified: current-constrained locations, voltage-constrained locations, and mixed-constrained locations where both current loading and voltage deviation jointly affect hosting capacity. To increase the hosting capacity, this study evaluates multiple flexible resources, including battery energy storage systems (BESS), dispatchable distributed generators (DDG), and static synchronous compensators (STATCOM). Numerical results demonstrate that these resources provide complementary benefits through active power support, sustained local generation, and reactive power compensation, effectively expanding data center hosting capacity in distribution systems.
Radio-over-fiber centralizes radio access networks by using a low-loss optical fiber link between the remote radio head and the central unit. Analog radio-over-fiber (A-RoF) transmits RF signals modulated directly onto an optical carrier, avoiding digitization and digital signal processing at the remote radio head. In this way, A-RoF shifts power-hungry processing from the antenna to the baseband unit. This paper outlines a mathematical framework to analyze the effect of fiber nonlinearity in an uplink wireless system supported by A-RoF. We model an input/output relationship that incorporates the wireless channel, thermal noise, and impairments encountered in the optical fiber channel: chromatic dispersion, electrical-to-optical conversion loss, amplification noise, and fiber nonlinear interference. We compare A-RoF with DSP-assisted A-RoF and digital radio receivers. Our results show that A-RoF achieves higher energy efficiency as compared to digital receivers with 8- and 16-bit analog-to-digital converters and DSP-assisted A-RoF. We further characterize the trade-offs among transmit power, nonlinear interference, and spectral efficiency, demonstrating that nonlinear effects fundamentally limit achievable rates. These results identify the linear operating regions where A-RoF is most effective for uplink wireless communication.
Radio frequency spectrum awareness requires the ability to detect, localize, and characterize emitters in dense and contested wireless environments. In this work, we propose a task-oriented distributed compression framework for joint multi-emitter localization and characterization using spatially distributed receivers. Each receiver observes a short window of complex IQ samples, converts the observation to a time--frequency representation, and encodes it into a compact latent vector. A central fusion decoder combines the receiver latents to estimate an unordered set of active emitters, including their locations, center-frequency offsets, occupied bandwidths, and waveform families. A permutation-invariant training objective is used to handle the arbitrary ordering of emitters and predictions. Experiments on synthetic multi-emitter scenes with spectral overlap show that even extremely compact receiver-side representations can preserve useful information for emitter counting and waveform-family estimation. However, accurate localization and spectral-parameter regression require larger latent dimensions. Increasing the receiver latent dimension from $d_{\mathrm{rx}}=1$ to $d_{\mathrm{rx}}=16$ provides the largest improvement, while further increasing to $d_{\mathrm{rx}}=64$ gives smaller gains. These results demonstrate the potential of learned task-oriented compression for communication-efficient distributed spectrum awareness.
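The idea of a permutation-invariant objective can be illustrated with a small sketch: the loss is evaluated under every assignment of predictions to reference emitters and the smallest value is kept, so the order in which emitters are listed does not matter. This is an illustrative assumption, not the paper's loss; the function name, the squared-error cost, and the brute-force search are all chosen for clarity. Brute force costs O(n!) and is only sensible for a handful of emitters; an assignment solver such as the Hungarian algorithm is the usual choice at larger set sizes.

```python
from itertools import permutations


def permutation_invariant_loss(predicted, reference):
    """Minimum mean squared error over all one-to-one matchings.

    `predicted` and `reference` are equal-length lists of equal-length
    numeric tuples (for example, emitter coordinates). Returns the best
    loss and the matching as a tuple `perm`, where predicted[i] is paired
    with reference[perm[i]].
    """
    if len(predicted) != len(reference):
        raise ValueError("sets must have the same number of elements")
    n = len(reference)
    if n == 0:
        return 0.0, ()

    def squared_error(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best_loss, best_perm = None, None
    for perm in permutations(range(n)):
        loss = sum(squared_error(predicted[i], reference[j])
                   for i, j in enumerate(perm)) / n
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# The predictions list the same two emitters in the opposite order.
reference = [(0.0, 0.0), (10.0, 5.0)]
predicted = [(10.0, 5.0), (0.0, 1.0)]
loss, matching = permutation_invariant_loss(predicted, reference)
print(loss, matching)
```

Swapping the order of `predicted` leaves the loss unchanged, which is the property the training objective needs.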
The next generation of 6G networks aims to utilize ultra-wideband spectrum and massive antenna arrays to serve multiple users with both control and data channels at low latency and high efficiency. However, phased arrays at mmWave and mid-bands are fundamentally constrained to a single beam or suffer sharp beamforming loss when split across directions, limiting simultaneous control-data support. In FlexLink, we introduce and prototype a novel delay-phased array architecture that overcomes this limitation by redistributing energy jointly across frequency and space, enabling multiple narrow beams without sacrificing per-beam gain or requiring additional power. We design and prototype FlexLink on a custom 4-7 GHz hardware testbed, demonstrating for the first time that control and data beams can be decoupled in practice, achieving nearly double spectral efficiency compared to conventional phased arrays.
Adversarial feature extraction and blocking jamming threaten tactical continuous phase modulation (CPM) links. This paper presents a unitary spreading-based Transmission Security (TRANSEC) enhancement to obscure physical-layer signatures and improve anti-jamming (AJ) resilience. The enhancement can augment existing techniques and preserves the constant-envelope (0 dB PAPR) nature of CPM, ensuring compatibility with high-efficiency tactical amplifiers. Analysis of symbol distributions, spectra, and cyclostationary features demonstrates that the technique masks inherent signatures, preventing modulation classification. We leverage convex optimization to recover symbols under blocking jamming, reducing uncoded BER from 6.25% to 0.04%. Finally, we characterize the engineering trade-offs between security, bandwidth, and BER.
Aggregated battery energy storage systems (BESS) enable large fleets of heterogeneous battery elements to participate in system-level optimization and electricity markets. Scheduling each element independently is computationally impractical at scale. While many aggregate battery models rely on convex relaxations, they often ignore element complementarity constraints, leading to dispatch solutions that may be infeasible when implemented on individual battery elements. This paper develops a realizable composite battery model for parameter-heterogeneous BESS fleets that guarantees feasibility at the element-level while preserving computational tractability. We derive simple linear conditions under which aggregate charging and discharging trajectories can be safely disaggregated while respecting individual power limits, energy limits, and complementarity constraints under a priority-based controller. Numerical experiments in a unit-commitment setting demonstrate that the proposed realizable composite battery formulation produces feasible dispatch solutions. Solve times are effectively independent of system size, unlike micro-model mixed-integer formulations. Solutions obtained from the proposed formulation converge to the optimal benchmark as control granularity is refined. Additional studies illustrate the robustness of the framework to moderate violations of key modeling assumptions, including heterogeneous power-to-energy ratios.
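To make the disaggregation step concrete, the sketch below splits one aggregate power request across elements in a fixed priority order. It is a hypothetical illustration and not the controller from the paper: the function name, the dictionary fields (`p_max`, `energy`, `capacity`), the lossless efficiency, and the single time step are all assumptions. What it does show is the complementarity property mentioned above, since every element receives a setpoint of the same sign as the aggregate request.

```python
def disaggregate(power_request, elements, dt_hours=1.0):
    """Split an aggregate power request across elements in priority order.

    power_request > 0 means charging, < 0 means discharging (kW).
    `elements` is a list of dicts ordered from highest to lowest priority,
    each with 'p_max' (kW), 'energy' (kWh) and 'capacity' (kWh).
    Losses are ignored (assumption). Returns (setpoints, unserved), where
    `unserved` is the magnitude of the request that could not be placed.
    """
    charging = power_request > 0
    remaining = abs(power_request)
    setpoints = []
    for el in elements:
        if charging:
            # Power that fills the remaining energy headroom within one step.
            headroom = (el["capacity"] - el["energy"]) / dt_hours
        else:
            # Power that empties the stored energy within one step.
            headroom = el["energy"] / dt_hours
        share = min(remaining, el["p_max"], max(headroom, 0.0))
        remaining -= share
        # All elements move in the same direction as the aggregate request.
        setpoints.append(share if charging else -share)
    return setpoints, remaining


fleet = [
    {"p_max": 5.0, "energy": 2.0, "capacity": 10.0},  # highest priority
    {"p_max": 5.0, "energy": 9.0, "capacity": 10.0},
]
charge_plan, charge_unserved = disaggregate(8.0, fleet)
discharge_plan, discharge_unserved = disaggregate(-8.0, fleet)
print(charge_plan, charge_unserved)
print(discharge_plan, discharge_unserved)
```

A non-zero `unserved` value flags an aggregate trajectory that this simple rule cannot realize on the elements, which is the kind of infeasibility that the linear conditions in the abstract are meant to rule out in advance.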
As beyond-diagonal reconfigurable intelligent surfaces (BD-RISs) gain increasing attention in high-frequency wireless communications, accurate and scalable channel-estimation methods become essential. This paper develops a parametric channel-estimation and beamforming framework that deconstructs the composite BD-RIS channel into its generating directional factors, revealing the tensor structure induced jointly by propagation geometry and beyond-diagonal scattering. We propose two tensor-based estimators: Fourth-Order Tucker Channel Estimation (FORTE), which models the partially structured channel as a fourth-order Tucker tensor, and Fourth-Order PARAFAC Channel Estimation (FORPE), which captures the fully structured channel through a fourth-order PARAFAC model. By exploiting partial and full channel geometry, the proposed methods achieve higher estimation accuracy than Least Squares and Block Tucker Kronecker Factorization benchmarks. In particular, FORTE outperforms FORPE due to its more compact representation, attaining an NMSE of about 10^{-4} at 5 dB SNR. In contrast, FORPE provides essentially unique estimates of the composite-channel factor matrices, whereas FORTE identifies their subspaces. The proposed deconstruction also provides a structured representation useful for sensing-oriented parameter extraction and tensor-structured system optimization. Finally, the Tensor Optimization Framework for Beamforming, Combining, and Scattering (TenFormer) achieves spectral efficiency comparable to the benchmark design while significantly reducing computational complexity through parallel tensor-structured optimization.
Predicting patient-specific facial soft-tissue deformation is critical for iterative orthognathic surgery planning. However, current computational methods face a strict accuracy-efficiency trade-off: high-fidelity Finite Element Methods (FEM) are computationally prohibitive, whereas pure deep learning models often produce biomechanically inconsistent results. While Physics-Informed Neural Networks (PINNs) offer a promising avenue, learning the complex heterogeneous mechanics of bone--soft-tissue interactions with only partial clinical supervision (i.e., outer facial surfaces) remains highly unstable. To overcome these challenges, we present PINNOCHIO, a novel physics-informed framework for facial soft-tissue simulation. PINNOCHIO introduces a hybrid sequential decomposition that explicitly decouples discontinuous bone--soft-tissue interface movements from continuous volumetric hyperelastic deformation. This structural separation enables stable training and facilitates a physics-enabled sim-to-real adaptation strategy, ensuring internal biomechanical consistency without requiring volumetric ground truth. Evaluated on a 40-patient clinical cohort, PINNOCHIO outperforms existing baselines in both surface accuracy and physical validity. Furthermore, it achieves a substantial speedup over FEM, successfully resolving the accuracy-efficiency trade-off to provide a highly reliable and practical tool for interactive surgical planning.
Terahertz (THz) communication and extremely large-scale MIMO (XL-MIMO) are essential for achieving ultra-high data rates in future 6G systems. However, at sub-millimeter wavelengths, typical indoor materials exhibit significant roughness that invalidates the conventional assumption of ideally smooth surfaces, while massive array apertures introduce pronounced near-field effects and spatial non-stationarity. To address these challenges, this paper proposes a hybrid near-field channel model that incorporates surface scattering characteristics and is based on distinct measurement campaigns. First, based on scattering measurements of typical indoor materials across the 260-400 GHz band, an improved Beckmann-Kirchhoff (B-K) model is developed to accurately characterize surface roughness and diffuse scattering behavior. The model treats single-bounce (SB) and multi-bounce (MB) clusters separately, applying deterministic rough-surface scattering theory to the former and a geometry-based statistical approach to the latter. Then, using near-field spatial non-stationarity measurements from a 630-element virtual array in the 330-360 GHz band, a Dual-Gaussian Mixture Model (DMM) and a Negative Binomial (NB) distribution are adopted to describe the lengths and the number of spatial visibility regions (VRs), respectively. Additionally, a Weibull distribution is employed to model the intra-region power fluctuations. Finally, comprehensive XL-MIMO channel evaluations within the same band demonstrate that the proposed model aligns closely with measured results in terms of the spatial cross-correlation function (SCCF), frequency cross-correlation function (FCF), and channel capacity. By reproducing the spatial sparsity of the THz band, the proposed model overcomes a limitation of conventional standard models, such as 3GPP 38.901 and WINNER II, which significantly overestimate channel capacity.
This paper presents an overview of DCASE 2026 Challenge Task 2, titled "Noise-aware unsupervised anomalous sound detection (UASD) for machine condition monitoring." The task aims to advance noise-robust anomalous sound detection for machine condition monitoring under the unsupervised setting, where only normal machine sounds are available for training. Reliable detection under noisy conditions is crucial for practical deployment, but previous DCASE Task 2 settings provided limited information about environmental noise, potentially limiting UASD performance in highly noisy situations. To address this limitation, DCASE 2026 allows participants to exploit two-channel audio samples simultaneously captured at locations near and far from the target machine. Since the distant microphone is expected to contain relatively stronger environmental noise and weaker direct machine sounds, it may help distinguish environmental noise components from the target machine sounds. After the challenge submission deadline, challenge results and an analysis of the submitted systems will be added.
This work presents Energy-based Profile Encoding, EPEN, a joint reconstruction framework for high-resolution diffusion-weighted MRI from undersampled 3D multi-slab k-space acquisitions, designed to suppress slab-boundary artifacts while preserving fine anatomical detail. EPEN formulates the multi-slab acquisition process using a bilinear forward model in which both the diffusion-weighted image volume and slab excitation profiles are treated as unknown variables. Reconstruction is posed as a maximum a posteriori optimization problem with three components: a Gaussian data-fidelity term enforcing consistency with the acquired k-space measurements, a CNN-based deep energy prior that represents the negative log distribution of clean diffusion-weighted images, and a quadratic regularization term that constrains the estimated slab profiles toward an initial profile estimate. The gradient of the learned energy prior guides accelerated reconstruction toward an artifact-free image distribution. The resulting nonconvex objective is solved using alternating minimization, with image-volume updates performed through a majorize-minimize scheme using conjugate-gradient optimization and slab-profile updates estimated by regularized least squares. Across multiple acceleration factors and slab configurations, EPEN substantially reduced slab-boundary artifacts compared with conventional slab-boundary correction methods, while improving structural consistency and preserving diffusion-weighted contrast. These results demonstrate that EPEN enables robust joint 3D multi-slab diffusion MRI reconstruction and slab-profile correction within a unified optimization framework supported by deep energy-based image priors.
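The three-term MAP objective described above can be written schematically as follows. The notation is assumed for illustration and is not taken from the paper: $x$ is the diffusion-weighted image volume, $p$ the slab excitation profiles, $y$ the acquired k-space data, $\mathcal{A}(p)$ the forward operator (linear in $x$ for fixed $p$ and linear in $p$ for fixed $x$, hence bilinear), $E_\theta$ the learned CNN energy, $p_0$ the initial profile estimate, $\sigma^2$ the noise variance, and $\lambda$ a regularization weight.

```latex
\begin{equation}
(\hat{x}, \hat{p}) \;=\; \arg\min_{x,\,p}\;
\underbrace{\frac{1}{2\sigma^{2}} \bigl\| y - \mathcal{A}(p)\,x \bigr\|_2^2}_{\text{Gaussian data fidelity}}
\;+\; \underbrace{E_\theta(x)}_{\text{deep energy prior}}
\;+\; \underbrace{\frac{\lambda}{2} \bigl\| p - p_0 \bigr\|_2^2}_{\text{profile regularization}}
\end{equation}

% Alternating minimization: each subproblem fixes one block of variables.
\begin{align}
x^{(k+1)} &= \arg\min_{x}\; \frac{1}{2\sigma^{2}} \bigl\| y - \mathcal{A}\bigl(p^{(k)}\bigr)\,x \bigr\|_2^2 + E_\theta(x), \\
p^{(k+1)} &= \arg\min_{p}\; \frac{1}{2\sigma^{2}} \bigl\| y - \mathcal{A}(p)\,x^{(k+1)} \bigr\|_2^2 + \frac{\lambda}{2} \bigl\| p - p_0 \bigr\|_2^2 .
\end{align}
```

Under these assumptions the profile update is a regularized linear least-squares problem, because the forward model is linear in $p$ once $x$ is fixed, while the image update remains nonconvex through $E_\theta$, which is where the majorize-minimize scheme with conjugate-gradient steps comes in.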
Deep learning has advanced pathological voice detection rapidly, yet rare laryngeal diseases remain underexplored due to data scarcity. Recurrent Respiratory Papillomatosis (RRP) exemplifies this gap: an HPV-induced disease of the larynx in which patients oscillate between recurrence and post-surgical remission over the years. RRP demands continuous voice monitoring that existing cross-sectional corpora cannot support. We introduce the first longitudinal voice dataset for RRP, comprising recordings from 26 patients with up to ten years of follow-up. Each session pairs sustained vowels with sentence-level utterances, which are annotated by otolaryngologists and confirmed synchronously with laryngoscopy. Building on this resource, we establish a systematic benchmark spanning handcrafted features, end-to-end deep networks, self-supervised pretrained models, and recent audio large language models, all evaluated under session-level cross-validation with patient-level audit. Per-subject longitudinal analyses further confirm that the cross-sectional discriminative signal reflects laryngoscopic disease state rather than stable speaker attributes. This work lays a foundation for rare longitudinal pathological voice tasks in low-resource clinical settings.
Recently, partial differential equations (PDEs) have been used to directly model the measurement process in signal processing, although their evaluation is costly. In this paper, we propose a novel alternating direction method of multipliers (ADMM)-based algorithm, called physics-aware linearized ADMM (PA-LADMM), for inverse problems arising from PDE-based measurement processes. The key idea is the linearization of the subproblem with PDEs, leading to a cost-efficient update rule that calls only a PDE solver and its gradient evaluation per iteration. The algorithm has a theoretical convergence guarantee under certain conditions. In addition, we combine it with deep unfolding to unroll the PA-LADMM and train its internal parameters using supervised data. Two distinct experiments, compressed sensing with optical fiber communication and image restoration from noisy anisotropic diffusion, demonstrate the effectiveness of the proposed algorithms.
Kinship verification (KV) from voice, the task of determining whether two speakers are biologically related, has received little attention. Our work establishes a foundational basis for this emerging frontier, contributing to both performance evaluation and detection methodologies. First, leveraging the speech recordings of the large-scale audio-visual dataset KAN-AV, we propose a revised evaluation protocol that controls for various confounders and adopts a family-disjoint train--test split to address open-set KV. Second, we analyze the close connection between speaker verification and KV, showing that genealogical similarity of speaker pairs plays opposite roles in the two tasks. Third, we tackle KV using three neural speaker embedding extractors (ECAPA-TDNN, WavLM-ECAPA, and ReDimNet) combined with various back-ends. In zero-shot KV including same-speaker target trials, ReDimNet achieves the lowest equal error rate (EER) of $20.8\%$; however, performance degrades to $39.7\%$ under strict kin trials, where same-speaker target trials are excluded. Our best trainable back-end, which applies asymmetric processing of the embedding pair to mitigate age-difference effects, obtains an EER of $32.0\%$ ($18.6\%$ with same-speaker target trials included). These results highlight the difficulty of KV while showing that speaker embeddings encode familial cues, offering a promising foundation for voice-based kinship analysis.
Cell-free massive multiple-input multiple-output (CF-mMIMO) systems provide enhanced coverage and capacity for next-generation wireless networks. However, CF-mMIMO systems face significant challenges in downlink power allocation (PA) due to imperfect channel state information (CSI), severe multi-user interference (MUI), and high computational complexity. To address these issues, rate-splitting multiple access (RSMA) is adopted as a robust interference management strategy. Accordingly, this paper proposes an unsupervised and scalable graph neural network (GNN) framework for PA in rate-splitting CF-mMIMO (RS-CF-mMIMO) systems, relying exclusively on large-scale fading (LSF) coefficients without instantaneous CSI. To resolve the dimensionality mismatch in dynamic networks, we introduce a slice-based adaptive layer that projects variable-dimension features into a fixed latent space. This mechanism enables a unified model to generalize across diverse topologies without retraining. Within this architecture, the sum spectral efficiency (SE) is maximized under per-AP power constraints, assuming maximum-ratio precoding for common streams and regularized zero-forcing precoding for private streams. We also derive a weighted minimum mean-square error-alternating direction method of multipliers (WMMSE-ADMM) algorithm as a performance upper bound. Extensive simulations verify that the proposed GNN framework achieves near-optimal SE and outperforms unsupervised deep neural networks (DNNs) across diverse system sizes and pilot assignment schemes. Furthermore, the scalable variant maintains robust performance while reducing the trainable parameter count by over 57% relative to DNNs and decreasing inference latency by up to three orders of magnitude compared with WMMSE-ADMM.
Direct-to-cell (DTC) satellite communication is regarded as one of the most recent technologies for providing global connectivity. However, with the growing number of wireless users and devices, the design of DTC communications must satisfy the requirements of high scalability and efficient spectrum utilization. To this end, integrating satellite communications with advanced multiple-access techniques, such as non-orthogonal multiple access (NOMA), has attracted considerable interest, motivating the development of NOMA-DTC communications. In this article, we first introduce the fundamentals of NOMA-DTC communications, including architectural fundamentals, system design aspects, and potential applications. Given the various cooperative modes and the still-evolving satellite network (SatNet) architectures, such as cooperative SatNets and multi-tier SatNets, we explore protocols that suit future SatNets and enhance system performance. Furthermore, a case study is conducted to investigate the benefits of NOMA schemes for DTC communications and to compare them with orthogonal multiple access (OMA) schemes. Finally, to inspire further research, several opportunities for NOMA-DTC communications are presented.
With the scaling of sensor channel counts, systems confront challenges in frontend data sensing and on-implant data processing. This work presents a 32-channel fully event-based iBMI SoC in 65 nm CMOS for an efficient neuromorphic signal processing pipeline. The SoC integrates a 32-channel dual-threshold delta modulation (DTDM) frontend array that provides up to 26x data compression at the frontend, an in-memory computing (IMC) spike detector (SPD) for efficient in-pixel spike detection, and a bipolar LIF-based spiking neural network (Bi-SNN) decoder for on-chip motor intention decoding (MID). Consuming only 3.53 $\mu$W per channel and achieving ~0.62 decoding $R^2$ with a compact 0.034 mm$^2$ per-channel area, the chip enables high-efficiency signal recording, processing, and decoding for implantable devices.
Although accelerated MRI reconstruction has advanced rapidly through end-to-end learning, deploying a single unified network that generalizes across diverse anatomies and contrasts under constrained computational resources remains challenging. In this paper, we introduce MoRE, a sparsely activated mixture-of-experts (MoE) module integrated into an end-to-end variational network. MoRE couples a shared encoder with sample-wise, unsupervised routing to activate a minimal subset of expert decoders while strictly preserving physics-based data consistency. Evaluated on the fastMRI multi-coil brain and knee datasets under 8x undersampling, MoRE achieves highly stable SSIM and PSNR performance across multi-contrast datasets. Furthermore, t-SNE visualization of the routing embeddings reveals interpretable, modality-aware expert specialization. The sparse conditional computation mechanism ensures that the architectural overhead remains modest. These results demonstrate that MoE-style capacity scaling can significantly enhance general-purpose MRI reconstruction without requiring proportional increases in computational power.
Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unrelated characteristics. Despite rapid progress in Speech Large Language Models (Speech LLMs), systematic evaluation of this capability remains challenging, as existing benchmarks are fragmented across isolated editing tasks. To bridge this gap, we introduce \textbf{SpeechEditBench}, a bilingual multi-attribute benchmark for instruction-guided speech editing. SpeechEditBench encompasses seven atomic editing tasks, as well as compositional editing tasks that integrate multiple operations within a single instruction. We propose an anchor-based evaluation protocol that separately assesses the edit success of target attributes and the preservation of untargeted attributes, leading to three metrics: target success, preservation success, and joint success. Using this benchmark, we evaluate mainstream Speech LLMs and specialized speech editing systems. The results reveal three key findings: (1) no single model performs well across all editing dimensions; (2) closed-source Speech LLMs generally outperform open-source models; (3) compositional editing remains highly challenging, with even the most advanced models struggling to achieve high joint success. SpeechEditBench provides a rigorous diagnostic framework to identify bottlenecks in Speech LLMs, thereby facilitating the development of next-generation Speech LLMs with more robust and precise instruction-guided editing capabilities. Data and code will be released upon acceptance.
Face detection with visible-spectrum cameras can capture facial features, but it often fails to distinguish live subjects from spoof sources such as photographs, masks, or statues. Previous approaches based on texture, motion, or physiological cues are sensitive to illumination changes and show limited robustness against spoofing attacks. Thermal imaging helps overcome these limitations by detecting heat emissions, naturally excluding spoof faces. This study proposes a hybrid approach that fuses the edge information of RGB images with corresponding thermal images using a custom ARISTOF dataset containing live and spoof faces. The fused images are first evaluated using the YOLOv8-Face model to compare face detection performance across RGB, thermal, and fused modalities. The results show that the proposed method enhances the face detection accuracy of thermal images. The fused images are subsequently used to train a YOLOv8-Face model for live and spoof classification, demonstrating that the proposed multimodal fusion effectively supports robust face liveness detection.
Fluid antenna systems (FAS) represent a paradigm shift in which antenna elements (ports) emulate the illusion of motion or fluidity within a spatial aperture to optimize performance. One of FAS's key use cases is the provision of open-loop fluid antenna multiple access (FAMA), enabling multiplexing gains through spatial interference nulling without requiring channel state information (CSI) at the transmitter side. However, this comes at the price of requiring a precise channel reconstruction at the receiver to successfully identify the optimal port. Current research efforts map this sensing task to a legacy MIMO-style estimation problem focused on minimizing global reconstruction errors such as normalized mean-squared error (NMSE). In this work, we argue that because FAS is inherently selection-based, NMSE-like approaches often lead to excessive training overhead and reduced net throughput. We revisit the problem of channel estimation and reconstruction in FAS, challenging some prevalent myths related to (i) the adequacy of global error metrics; (ii) the convenience of reconstructing channels or aggregate interference; (iii) the need for spatial oversampling; and (iv) the impact of port selection accuracy. We also identify four critical questions that must be answered for successfully enabling FAMA deployments: (i) the definition of a selection-optimal sampling law; (ii) the identification of proper reconstruction methodologies; (iii) the inherent trade-offs between multi-port sensing and selection gain; and (iv) the challenges introduced when moving towards electronically reconfigurable FAS.
Wireless localization is a fundamental capability of sixth-generation (6G) networks. Conventional model-based methods require accurate modeling of the propagation environment and degrade in complex multipath and non-line-of-sight scenarios, while learning-based methods couple model parameters tightly to the training scene, requiring costly retraining whenever the base station (BS) configuration or propagation environment changes. In this paper, we propose RA-LWLM, a retrieval-augmented in-context localization framework that achieves training-free cross-scene adaptation by externalizing scene-specific information into a per-scene fingerprint database rather than encoding it in model weights. The framework consists of three components: a frozen wireless foundation model (FM) encoder that maps raw channel state information into a scene-agnostic representation; a retrieval module that selects the most informative references from the per-scene database via similarity search in the representation space; and a transformer-based in-context learning (ICL) module that fuses the query with the retrieved references to predict the user equipment (UE) position. To accommodate varying retrieval quality and propagation complexity across queries, the ICL module adopts a mixture-of-experts design in which experts specialize in different context sizes and are softly combined by a learnable selector. Extensive ray-tracing-based experiments across heterogeneous scenes with diverse BS configurations show that RA-LWLM achieves nearly identical accuracy on seen and unseen scenes without any per-scene retraining, substantially outperforming end-to-end and FM-based baselines. These results validate the proposed retrieval-augmented in-context paradigm as a scalable solution for cross-scene localization in 6G networks.
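The retrieval step described above can be sketched as a top-k similarity search over a per-scene fingerprint database. This is a minimal illustration only: the cosine-similarity metric, the `(embedding, position)` database layout, and the names `cosine_similarity` and `retrieve_references` are assumptions, not details taken from the abstract, and the foundation-model encoder and in-context learning module are not modelled.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_references(query, database, k=2):
    """Return the k database entries most similar to the query embedding.

    `database` is a list of (embedding, position) pairs; this layout is a
    hypothetical stand-in for the per-scene fingerprint database.
    """
    ranked = sorted(database,
                    key=lambda entry: cosine_similarity(query, entry[0]),
                    reverse=True)
    return ranked[:k]

# Toy database: 2-D embeddings paired with 2-D positions (made-up values).
toy_database = [
    ([1.0, 0.0], (0.0, 0.0)),
    ([0.9, 0.1], (1.0, 0.0)),
    ([0.0, 1.0], (5.0, 5.0)),
    ([-1.0, 0.0], (9.0, 9.0)),
]
nearest = retrieve_references([1.0, 0.0], toy_database, k=2)
```

Because scene-specific information lives only in `toy_database`, adapting to a new scene in this sketch means swapping the database rather than changing any model parameters, which mirrors the training-free adaptation idea.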
Joint communication and sensing (JCAS) typically relies on coherent downconversion to recover the phase relationships required for array processing. Meanwhile, Local Oscillators (LOs) are a major source of cost, power consumption, and implementation complexity in millimeter-wave (mmWave) and sub-THz receivers. Existing LO-free receiver designs are typically based on envelope detection or related non-coherent operations that do not preserve inter-branch phase information, which limits their applicability to JCAS. This work proposes an LO-free JCAS receiver architecture that leverages pairwise inter-branch correlation processing to suppress the common carrier component and to synthesize relative-phase observables across the antenna array, enabling both data communication and Direction-of-Arrival (DoA) estimation. The transmitted symbols are designed to induce distinct phase-difference patterns, such that the resulting correlation phases contain both a data-dependent component and a DoA-dependent component. We formulate recovery as inference over a correlation graph, where branches are nodes and pairwise correlations are edges, and show that the resulting cycle-consistent redundancy enables robust relative-phase recovery under noise and perturbations. We further derive a topology-aware Cramér-Rao lower bound for DoA estimation under a locally unwrapped approximation. Numerical results confirm that increasing graph connectivity improves both bit-error rate and DoA accuracy, with sensing performance approaching the derived bound.
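The carrier-suppression idea can be illustrated with a toy, noise-free model. The sketch assumes each branch output carries the same unknown carrier phase plus a linear per-branch phase ramp (a uniform-linear-array assumption that the abstract does not state); the function names are hypothetical. Multiplying one branch by the conjugate of another removes the common phase and leaves only the relative phase, and the edge phases around any cycle of the correlation graph sum to zero modulo 2π.

```python
import cmath
import math

def branch_samples(num_branches, carrier_phase, spatial_phase, amplitude=1.0):
    """Toy noise-free branch outputs: a common carrier phase plus a phase ramp
    of `spatial_phase` per branch (assumed array model, for illustration)."""
    return [amplitude * cmath.exp(1j * (carrier_phase + m * spatial_phase))
            for m in range(num_branches)]

def pairwise_correlation_phases(samples):
    """Phase of y_i * conj(y_j) for every pair i < j (the graph edges)."""
    phases = {}
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            phases[(i, j)] = cmath.phase(samples[i] * samples[j].conjugate())
    return phases

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

# Same array geometry, two different unknown carrier phases.
edges_a = pairwise_correlation_phases(branch_samples(4, 0.3, 0.5))
edges_b = pairwise_correlation_phases(branch_samples(4, 2.1, 0.5))
```

In this toy model `edges_a` and `edges_b` agree edge by edge even though the carrier phases differ, which is the property that makes a local oscillator unnecessary for recovering relative phase.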
Objective: laryngectomees depend on an electromechanical device to generate electrolaryngeal (EL) speech. Compared with normal speech, EL speech suffers from severe distortion, limited phonetic variation, unnatural prosody, and temporal shifts, degrading naturalness and intelligibility. Although sequence-to-sequence (seq2seq) voice conversion (VC) based EL-speech-to-normal-speech conversion (EL2SP) is promising, substantial mismatches between EL and normal speech inevitably cause cumulative mapping errors that limit performance. To address this, we describe a novel representation learning framework integrating speech and text representations to improve mapping and reconstruction quality within a seq2seq VC model. Methods: our methodology comprises two main stages: 1) representation integration and learning, and 2) reconstruction training. A network capable of incorporating auxiliary text information is first constructed with pretrained modules to learn speech--text-based integrated representations. Then, an autoencoder-style reconstruction strategy finalizes the EL2SP model so that it inherits these representations without increasing model complexity. We introduce three fusion strategies (middle-, input-, and hybrid-level fusion) that progressively enhance learning. Moreover, besides standard seq2seq VC objectives, an additional reconstruction loss on the integrated representation is introduced to refine representation transfer. Results: experiments on different EL2SP datasets consistently demonstrate that our methods, combined with data augmentations, outperform baselines relying solely on speech representations. Furthermore, progressive improvements with system design depth validate the effectiveness of our methods. Significance: the proposed methods provide an extensible and practical methodology for EL speech enhancement and assistive communication technologies.
This paper investigates the secrecy sum rate (SSR) of rate-splitting multiple access (RSMA)-based visible light communication (VLC) systems considering internal eavesdropping, where legitimate users may intercept private data intended for others. We formulate an optimization problem to maximize the SSR of the system, which is inherently non-convex due to the complex coupling of the objective function and constraints. To this end, two different approaches based on the convex-concave procedure (CCCP) and semidefinite relaxation (SDR) are leveraged to solve the non-convex parameterized problem. A central focus of this work is the investigation of channel similarity (CS), which serves as a metric for quantifying spatial correlation, and its impact on SSR performance. To mitigate the performance degradation caused by high spatial correlation, we propose a channel similarity reduction (CSR) clustering strategy that proactively minimizes CS to restore the system's degrees of freedom (DoF). Numerical results are provided to demonstrate the performance of the two proposed algorithms under various levels of CS. More importantly, the findings reveal that our proposed CSR-clustering strategy significantly outperforms existing baselines, effectively overcoming the secrecy performance ceiling caused by high spatial correlation.
Actuator saturation is a fundamental nonlinearity that significantly degrades the performance of PID-controlled systems by inducing integrator windup, leading to overshoot, slow recovery, and even instability. Although numerous anti-windup strategies have been proposed, their practical tuning remains largely heuristic and suboptimal in many industrial scenarios. This paper presents a comprehensive comparative study of classical and advanced anti-windup techniques for PI-controlled first-order-plus-dead-time (FOPDT) processes under a wide range of operating conditions. The analysis includes dynamic and instantaneous back-calculation, conditional integration, and adapted schemes. In addition, a novel hybrid anti-windup strategy is proposed, combining conditional integration with dynamic back-calculation to improve responsiveness during saturation, whilst preserving smooth recovery dynamics. Moreover, a key contribution of this work is the development of systematic tuning rules for the tracking time constant in back-calculation schemes, specifically optimised for load-disturbance rejection. These rules are derived from an extensive optimisation study that considers the saturation ratio, controller aggressiveness, and disturbance characteristics. The resulting guidelines provide simple yet effective formulas that achieve near-optimal performance without requiring complex computations. Simulation results demonstrate that the proposed methods significantly outperform commonly used heuristic rules, particularly in disturbance rejection scenarios, and provide clear, practical recommendations for selecting and tuning anti-windup strategies in industrial applications.
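Back-calculation can be sketched with a small simulation. The example below is a simplified illustration, not the paper's setup: it uses a first-order plant without dead time, forward-Euler integration, and arbitrary gains, and the name `simulate_pi` and all parameter values are assumptions. The integrator is driven by `ki * error` plus a correction `(u - v) / tracking_time`, where `v` is the unsaturated controller output and `u` the saturated one. The setpoint is deliberately unreachable so that the actuator stays saturated and the effect on the integrator state is visible.

```python
def simulate_pi(kp, ki, tracking_time=None, u_max=1.0, setpoint=2.0,
                plant_gain=1.0, plant_tau=1.0, dt=0.01, steps=2000):
    """Simulate a PI loop around a first-order plant with actuator saturation.

    tracking_time=None disables anti-windup; otherwise the integrator receives
    the back-calculation correction (u - v) / tracking_time.
    Returns the final plant output and the largest integrator magnitude seen.
    """
    y = 0.0
    integral = 0.0
    peak_integral = 0.0
    for _ in range(steps):
        error = setpoint - y
        v = kp * error + integral            # unsaturated controller output
        u = max(-u_max, min(u_max, v))       # actuator saturation
        d_integral = ki * error
        if tracking_time is not None:
            d_integral += (u - v) / tracking_time  # back-calculation term
        integral += d_integral * dt
        peak_integral = max(peak_integral, abs(integral))
        y += dt * (-y + plant_gain * u) / plant_tau  # first-order plant step
    return y, peak_integral

_, peak_without = simulate_pi(kp=1.0, ki=1.0)
_, peak_with = simulate_pi(kp=1.0, ki=1.0, tracking_time=0.5)
```

Without the correction the integrator keeps accumulating for as long as the actuator is saturated, whereas with back-calculation it settles to a bounded value; the tracking time constant controls how quickly that happens, which is the parameter the tuning rules above address.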
The separation of multicomponent signals with crossing instantaneous frequency (IF) curves remains a fundamental challenge in time-frequency analysis. Although the synchrosqueezed wavelet-chirplet transform (SWCT) enhances time-frequency readability by introducing a chirprate variable, its effectiveness is constrained by the underlying assumption of a locally linear chirp. Consequently, this method does not perform well when analyzing signals characterized by strong frequency modulation. This paper extends the SWCT framework by relaxing the linear chirp assumption. We model signal components as having polynomial phase behavior over short intervals and derive compact expressions for high-order IF and chirprate reassignment operators. The proposed high-order synchrosqueezed wavelet-chirplet transform (HSWCT) enables accurate estimation of both IF and chirprate, and supports robust mode retrieval even with intersecting IF curves. Another key contribution is a rigorous mathematical analysis of the approximation errors of arbitrary-order reassignment operators for IF and chirprate estimation. When the chirprate vanishes, HSWCT simplifies to the traditional high-order synchrosqueezed wavelet transform. To the best of our knowledge, no theoretical analysis exists in the literature on the approximation of arbitrary-order synchrosqueezing transform (SST) IF reassignment operators to the IF. As a by-product of this work, our established theorem provides such an analysis, thereby filling a gap in the theoretical framework of high-order SSTs.
The evolution from 5G to 5G-Advanced and the vision of 6G demand unprecedented levels of network performance, in which meeting stringent network Key Performance Indicators (KPIs), including capacity, latency, coverage, and reliability, is critical to supporting emerging applications such as autonomous driving, industrial automation, and immersive communications. Traditional reactive network management is insufficient in this context, driving the need for predictive, data-driven approaches. Machine Learning (ML) has emerged as a key enabler, allowing KPI trends to be forecast from diverse data sources and thereby supporting proactive, AI-native automation in mobile networks. This survey provides the first comprehensive and systematic review of data-driven KPI prediction methods for future 6G networks. We introduce a multi-dimensional taxonomy that classifies prediction approaches by KPI type, data source, the layer of the network protocol stack at which the KPI is predicted, prediction horizon, model family, and prediction objective. Using this taxonomy, we analyze the state of the art across various KPIs, highlighting representative methods ranging from classical statistical models to deep learning and reinforcement learning. We further discuss enabling system aspects, including data collection and learning architectures, and examine deployment challenges, including data availability, scalability, privacy, and sustainability. Finally, we outline open research directions, including new KPI definitions and probabilistic and explainable predictions. This survey aims to provide researchers and practitioners with a structured understanding of the KPI prediction landscape and a roadmap toward predictive network automation in future 6G systems.
The Automatic Generation Control (AGC) system, reliant on real-time measurements over communication networks, is susceptible to stealthy false data injection attacks (FDIAs), risking equipment damage and economic losses. We propose a robust FDIA detection method using maximum likelihood estimation (MLE) of a drifted multivariate Ornstein-Uhlenbeck (OU) process. Independent of load observability, in various cyberattack scenarios, the proposed FDIA detection method delivers accurate and rapid detection of sophisticated FDIAs, outperforming traditional unknown input observer (UIO) methods, which miss detections, and Long Short-Term Memory Autoencoder (LSTM-AE) approaches, which suffer from prolonged detection times.
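The estimation step can be illustrated for the scalar case. The sketch below fits a one-dimensional OU process dX = theta (mu - X) dt + sigma dW by conditional maximum likelihood, using its exact AR(1) discretisation X[k+1] = mu + b (X[k] - mu) + eps with b = exp(-theta dt). It is a simplification under stated assumptions: the method above uses a drifted multivariate process, and the function name `fit_ou` and the sample values are hypothetical. The detection logic itself is not shown.

```python
import math

def fit_ou(samples, dt):
    """Conditional MLE of (theta, mu, sigma) for a scalar OU process from
    equally spaced samples. Requires an estimated AR(1) coefficient in (0, 1)."""
    x_prev, x_next = samples[:-1], samples[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    sxx = sum((x - mean_prev) ** 2 for x in x_prev)
    sxy = sum((x - mean_prev) * (y - mean_next)
              for x, y in zip(x_prev, x_next))
    b = sxy / sxx                      # AR(1) coefficient, equals exp(-theta*dt)
    a = mean_next - b * mean_prev      # AR(1) intercept, equals mu * (1 - b)
    theta = -math.log(b) / dt
    mu = a / (1.0 - b)
    resid_var = sum((y - a - b * x) ** 2
                    for x, y in zip(x_prev, x_next)) / n
    # Var(eps) = sigma^2 * (1 - b^2) / (2 * theta), solved for sigma.
    sigma = math.sqrt(resid_var * 2.0 * theta / (1.0 - b * b))
    return theta, mu, sigma

# Noise-free trajectory relaxing from 5.0 toward mu = 1.0 with theta = 0.7.
true_theta, true_mu, step = 0.7, 1.0, 0.1
trajectory = [5.0]
for _ in range(49):
    trajectory.append(true_mu + (trajectory[-1] - true_mu)
                      * math.exp(-true_theta * step))
theta_hat, mu_hat, sigma_hat = fit_ou(trajectory, step)
```

One plausible use of such a fit, stated here only as an assumption, is to flag measurements whose likelihood under the fitted process drops sharply.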
The rise of low-altitude economies and 6G is driving the evolution of low-altitude networks (LANs), making communication security a pressing concern. Unlike traditional security approaches, covert communication offers enhanced protection by hiding the transmission behavior itself. Integrated sensing and communication (ISAC), a key technology of 6G, efficiently supports both sensing and communication tasks through hardware integration, thereby promising significant gains for covert communication. Nevertheless, the complexity and dynamics of urban environments pose critical challenges. Drawing on the latest advances in smart radio environment (SRE) technologies, this paper introduces them into integrated sensing and covert communication (ISACC) to suppress covert channel fading and counteract sensing precision loss in LANs. We first survey the applications and state-of-the-art findings of ISACC in LANs, highlighting key practical challenges. Subsequently, we introduce the core concept of SRE and elaborate on its enabling techniques across four dimensions. To deliver more insights, we explore potential pathways for integrating SRE into ISACC. To maximize covert throughput, a reinforcement learning-based case study is conducted by jointly optimizing flight trajectory, jamming power, movable antenna position, bandwidth allocation, and beamforming vectors. Simulation results show that the proposed scheme achieves superior performance compared to the benchmark. Finally, some open challenges and potential directions are discussed.
Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones with computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant (just 1.6M parameters) reaches within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, 17x fewer GMACs, and delivering 1.8x higher throughput.
Wireless connectivity underpins modern society and industry, enabling critical applications such as 5G ultra-reliable low-latency communication (URLLC) for industrial automation. However, the openness of the wireless medium exposes it to spectrum anomalies, including unintentional interference and malicious jamming, which threaten communication and sensing functionalities in 5G and emerging 6G networks. Despite its importance, spectrum anomaly detection research is hindered by a lack of publicly available datasets reflecting real-world scenarios. To address this, we present a benchmark dataset for spectrum anomaly detection in orthogonal frequency-division multiple access (OFDMA) systems, a core technology for 5G and beyond. The dataset includes spectrograms generated across a distributed network of sensing units, covering five distinct jammer types, from simple noise to advanced pilot-aware attacks. These anomalies are simulated in an industrial factory environment using a versatile open-source framework developed and published as part of this work, enabling extensibility to new scenarios and interference types. We provide baseline evaluations for supervised and unsupervised learning methods, demonstrating the challenges posed by different jammers and highlighting areas for further research. The dataset and framework support reproducible studies and serve as a foundation for advancing spectrum anomaly detection, with applications extending to network digital twins. By bridging the gap in open dataset availability, this work empowers the research community to validate and compare advanced detection methods for resilient next-generation wireless systems.
The localization of moving sound sources using a microphone array is typically based on modifying the signal to compensate for the Doppler effect. In the time domain, this compensation is done on a sample-by-sample basis. In the frequency domain, short time segments need to be used in which the Doppler effect is assumed to be approximately constant, and a discrete Fourier transform is done on each segment. In contrast, the authors developed an inverse 2.5D localization method for uniformly moving single-frequency sources that works in the spectral domain and allows for the use of longer windows. This was achieved by modifying the 2.5D forward model to directly compute the effect of the motion at the static observer position. The method requires neither modification of the measured signal nor quasi-stationarity of the measurements within the window used. Unfortunately, this approach is not directly suitable for broad-band stochastic sources, and in the present work we investigate how the statistical properties of a uniformly moving stochastic source change when observed by a static observer. Using a 2.5D setting, the relation between the power spectral density of the moving source and the Loève spectrum, which is a generalization of the cross-spectral density at the static receivers, was derived. Based on simulated data with speeds up to 100 m\,s$^{-1}$, the work presented here provides a proof of concept for a method based on multi-taper estimates for the Loève spectrum to localize moving broad-band stochastic sources. Currently, the method requires a stationary source signal and a spectral density that is flat within a certain range around the frequency of interest. Also, correlations between sources are currently not considered.
Anastomotic leak remains one of the most serious complications following colorectal cancer surgery, substantially affecting patient outcomes, recovery trajectories, and healthcare costs. Despite advances in imaging technology, current preoperative evaluation relies only on clinical assessment, a process that is subjective, error-prone, and highly dependent on individual expertise. To date, no validated CT-based method exists to predict anastomotic leak risk prior to surgery. This protocol paper outlines a comprehensive framework for developing and validating an AI-driven system for preoperative risk assessment using pre- and post-contrast CT imaging. The study describes the stages of data collection, ethical handling, and preprocessing of patient data in accordance with GDPR, image preprocessing, and the exploration of deep learning architectures designed to generate clinically interpretable outputs. Two integrated tools constitute the main deliverables of this workflow: 1) a risk assessment module, which quantifies the likelihood of leakage by analyzing vascular and tissue features in CT scans, and 2) a Content-Based Medical Image Retrieval (CBMIR) module, which identifies and displays similar historical cases to support evidence-based surgical decision making. Carrying out the protocol requires close collaboration between hospitals and universities; the protocol demonstrates that such a system is technically feasible and clinically implementable within existing healthcare infrastructures. By following the proposed methodological stages and regulatory principles, other institutions can reproduce this workflow to develop analogous decision-support tools. Ultimately, this interdisciplinary framework aims to enhance surgical planning, reduce leak incidence, and contribute to a broader paradigm shift toward explainable, data-driven precision surgery.
This paper presents the Domain-Agnostic Incremental Learning for Audio Classification Task of the DCASE 2026 Challenge. Incremental learning refers to sequentially learning new tasks with the same system while maintaining its knowledge and performance on the previously learned task. Domain-incremental learning for sound classification refers to learning the same sound classes but in different acoustic domains, and was formalized as a data challenge for the first time in DCASE 2026. Participants will train a system to learn ten sound classes in three different domains, with learning at each incremental task not having access to previous task data. Submitted systems will be ranked by the overall average accuracy calculated over the three domains. The provided baseline system obtains a modest performance of 44.9\% accuracy over the three domains, mostly due to erroneous inference of the domain for the test sample.
Speakers in dialogue continuously adapt their communicative behavior across acoustic, lexical, and semantic dimensions, a phenomenon known as conversational entrainment. Modeling this process requires representations that capture the global structure of interaction, yet prior approaches fail to disentangle dyad-specific patterns from speaker-specific traits, limiting their ability to capture true conversational adaptation. We address this with the Dyadic Distance Matrix (DDM), which encodes all pairwise similarities between the turns of two speakers over an entire conversation, capturing long-range cross-speaker dependencies. This raises a key question: does the DDM represent genuine interaction, or merely reflect individual speaker characteristics? We propose the speaker-switch test, a principled control in which one speaker's turns are replaced with those from an unrelated speaker drawn from a different conversation. This preserves turn-level statistics while disrupting the original dyadic coadaptation. The ability to distinguish real from switched DDMs thus directly evaluates whether the representation encodes interaction-specific structure. Across four embedding types and classifiers including ResNet-50 on the CANDOR corpus, real DDMs are consistently distinguishable from their switched counterparts. Comparisons with LibriSpeech show higher discriminability in read speech, highlighting the role of prosodic variability in naturalistic conversations. GradCAM analysis further reveals distinct structural signatures driving classification. These results establish the speaker-switch test as a robust diagnostic for validating representations of dyadic conversational interaction.
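The construction of a DDM and of its switched control can be sketched as follows. This is a minimal illustration under assumptions: turn embeddings are plain lists of floats, cosine distance stands in for whatever similarity measure the authors use, and the function names are hypothetical. The classifier that separates real from switched matrices is not shown.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length turn embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def dyadic_distance_matrix(turns_a, turns_b):
    """Entry [i][j] compares speaker A's i-th turn with speaker B's j-th turn,
    so every cross-speaker pair in the conversation is covered."""
    return [[cosine_distance(a, b) for b in turns_b] for a in turns_a]

def speaker_switch(turns_a, unrelated_turns):
    """Switched control: keep speaker A's turns and replace speaker B's turns
    with those of an unrelated speaker from a different conversation."""
    return dyadic_distance_matrix(turns_a, unrelated_turns)

# Made-up 2-D embeddings for illustration only.
speaker_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
speaker_b = [[1.0, 0.0], [0.0, 2.0]]
unrelated = [[-1.0, 0.0], [0.0, -1.0]]
real_ddm = dyadic_distance_matrix(speaker_a, speaker_b)
switched_ddm = speaker_switch(speaker_a, unrelated)
```

Both matrices have the same shape and are built from genuine turn embeddings, so any systematic difference a classifier finds between them has to come from the pairing of the two speakers rather than from either speaker alone.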
Self-supervised speech representation learning has made significant progress through Siamese networks, which leverage different views of the same input. However, existing methods often require frame-wise alignment between these views, overlooking the broader linguistic context invariance across different speaking styles. We introduce SiamCTC, a framework that integrates Siamese networks with Connectionist Temporal Classification (CTC) to learn speech representations without strict frame-level correspondence. By employing CTC loss to establish flexible, monotonic alignments between differing temporal realizations of the same content, SiamCTC accommodates speed perturbations and other temporal augmentations. This design relaxes frame-wise constraints while preserving temporal coherence and enhancing robustness to speaking-rate variations in downstream tasks. Our experiments demonstrate that SiamCTC leads to more adaptable speech representations, particularly at diverse speaking rates.
Clock asynchronism between base stations (BSs) and users significantly degrades scatterer localization accuracy. To address this issue, this paper proposes a multi-BS joint channel estimation and localization scheme that exploits shared scatterer information among multiple BSs. First, channel modeling in the location domain is performed by leveraging the joint sparsity of multi-BS channels. Subsequently, a multi-BS scatterer association algorithm is developed based solely on Angle of Arrival (AoA) estimates. By utilizing the shared scatterers and the geometric relationships among the scatterers, BSs, and the user equipment (UE), coarse estimates of the UE location and timing offsets are obtained. Based on these coarse estimates of scatterer locations, UE location, and timing offsets, an expectation-maximization (EM) framework is employed. Specifically, the UE location and timing offsets are iteratively refined while jointly enabling high-precision estimation of scatterer locations and channel coefficients. Simulation results demonstrate that the proposed scheme achieves significant improvements in both channel estimation and localization accuracy compared with baseline methods.
State-space models are traditionally based on physical knowledge, but multi-step predictions from these physical models can be poor due to model inaccuracy. Black-box deep learning has shown promise as an alternative. However, these methods rely on the availability of large datasets and potentially available physical knowledge is neglected. We propose the PG-RSSNN, a physics-guided recurrent state-space neural network that incorporates recurrent structures to enable the use of non-saturating activation functions in multi-step prediction. It mitigates the vanishing gradients and eliminates the risk of numerical divergence in training seen in existing structures that feed back state estimates. Results across multiple systems with various physical model imperfections, from linear state-space models with Gaussian noise to a robotic arm and a cascaded water tank system, show that the proposed PG-RSSNN maintains stable training behavior, and improves multi-step predictions, as compared with black-box neural networks and physics-only models, even with limited training data and when physical models are only partially known.
Cell-free massive multiple-input multiple-output (MIMO) is recognized as a key technology for beyond-5G networks, where distributed access points (APs) jointly serve user equipments (UEs) to address the inter-cell interference issue inherent in cellular systems. While conventional distributed signal detection methods offer a practical balance between performance and fronthaul load, they are fundamentally limited by linear processing constraints. In this paper, we propose a novel deep-learning-based uplink detection framework by introducing the distributed mixture of experts detection network (DMoE-DetNet). In this architecture, each AP acts as a local expert employing convolutional neural networks (CNNs) for non-linear feature extraction, and transmits the local minimum mean square error (MMSE) detection results and statistical channel information to the central processing unit (CPU). In the CPU, an attention-based encoder module captures complex spatio-temporal dependencies among users for global feature fusion, while a gating network dynamically weights the contributions from different APs. Finally, a linear detector outputs the symbol probability. Simulation results demonstrate that the proposed DMoE-DetNet significantly outperforms conventional linear-processing-based cell-free signal detection methods in terms of symbol error rate, showcasing the potential of artificial intelligence-enabled communication systems.
Speech denoising is an often necessary step not only for human listening, but also for downstream processing by systems lacking robustness to noisy, real-world acoustic conditions. Unfortunately, denoising is a problem where conventional in-domain supervised training is not trivial, as the training targets cannot be annotated by humans: producing a clean version of a naturally-noisy speech recording is itself the task to solve. Supervised training is typically performed through the artificial addition of noise to clean speech recordings, which can only be sourced from controlled domains, a significant limitation due to the poor out-of-domain generalization of neural networks. An alternative is noisy target training (NyTT), which simply replaces the clean speech with in-domain noisy recordings, with the hope that learning to remove the artificial noise will extend to the natural. Though having shown promising results, NyTT's training objective is not minimized by clean speech estimates. We show that by estimating the artificial noise in addition to the naturally-noisy speech, the undesirable optimum can actually be exploited: the residual noise in the speech estimate can be canceled by the noise estimate via simple subtraction. Crucially, the optimum is fully compatible with conventional artificial mixtures, enabling joint training using both types of data with consistent optimization targets, opening the door to improved domain adaptability. The effectiveness of our approach is demonstrated through WHAM! and CHiME-3-based benchmarks.
Microwave linear analog computers (MiLACs) offer a transformative paradigm for future multiple-input multiple-output (MIMO) systems by shifting complex signal processing into the analog domain, thereby significantly reducing computational complexity, radio-frequency chains, and analog-digital converters, while speeding up computation. However, the practical deployment of MiLACs is severely constrained by the inherent hardware losses of the tunable admittance components (TACs) interconnecting MiLAC ports, which introduce severe inter-stream interference and fundamentally limit the spectral efficiency (SE) of the system. In addition, while denser architectures offer greater spatial degrees of freedom to mitigate inter-stream interference, the cumulative hardware losses and power consumption of massive TACs severely degrade the system's energy efficiency (EE). Consequently, designing architectures for lossy MiLACs emerges as a critical yet unresolved challenge, as it necessitates striking a delicate tradeoff between interference suppression and cumulative hardware losses/power consumption. To address this challenge, this paper investigates the joint MiLAC architecture design and performance (SE/EE) maximization in lossy MiLAC-aided MIMO systems. We propose a novel learning-based joint architecture and performance optimization framework (LJAPOF) that unifies the design of MiLAC architectures and analog beamforming configurations for lossy MiLACs under both SE- and EE-oriented objectives. Numerical results demonstrate that by intelligently navigating the fundamental tradeoff between interference suppression and hardware/power consumption, the proposed LJAPOF can design optimal MiLAC architectures that consistently outperform stem-connected and fully-connected MiLACs in maximizing the system's SE and EE.
Recent advances in Automatic Speech Recognition (ASR) and Large Language Models (LLMs) have significantly improved speech understanding capabilities. However, multi-speaker speech transcription remains a challenging task, constrained by highly similar speaker voices, rapid turn-taking transitions, overlapping utterances, and inaccurate speaker boundary segmentation. These challenges become particularly pronounced in real-world conversational audio, where speaker dynamics and acoustic conditions are highly variable. This technical report presents SoulX-Transcriber, a unified multi-speaker transcription system that jointly models speaker diarization (SD) and ASR within an LLM-based framework. SoulX-Transcriber adopts a two-stage training strategy to improve both speaker discrimination and transcription robustness. In the first stage, speaker-aware multi-task continuous pre-training enhances speaker representation learning and boundary perception. In the second stage, supervised fine-tuning further optimizes the model for accurate end-to-end speaker-attributed transcription under complex multi-speaker conditions. SoulX-Transcriber delivers strong performance and robustness across multiple public benchmarks, including AliMeeting, AISHELL-4, and AMI, while maintaining high adaptability to multi-domain scenarios.
Publicly available phonocardiogram (PCG) datasets remain limited in size and pathological diversity, constraining both auscultation training and the generalisation of automated heart-sound classifiers. A class-conditional diffusion model for PCG generation is developed in the log-mel domain and synthetic fidelity is assessed using complementary (i) physiology-inspired plausibility metrics, (ii) downstream label-consistency evaluation, and (iii) expert listening. Experiments use the PhysioNet/Computing in Cardiology Challenge 2016 dataset (3240 recordings) with recording-level splits. After preprocessing and quality control, 16,749 non-overlapping 4 s clips are mapped to a normalised 1 x 128 x 128 log-mel representation to train a conditional 2D U-Net denoiser with classifier-free guidance. Signal-level plausibility is quantified on reconstructed waveforms using three lightweight metrics: an envelope-autocorrelation rhythm score, an amplitude-based explosion score, and the dominant cycle lag. Synthetic clips preserve similar dominant cycle durations but exhibit reduced envelope periodicity and increased transient burstiness relative to real clips. For downstream evaluation, a ResNet-50 classifier achieves 92.24% accuracy on the held-out real test set and 82.8% accuracy on class-balanced synthetic batches, indicating that generated signals retain discriminative structure relevant to normal/abnormal classification. In a pilot expert listening study (60 clips, two clinicians), most synthetic clips are judged as heart-sound-like, while abnormality sensitivity is low for both real and synthetic 4 s excerpts. Overall, the results provide a practical baseline for diffusion-based PCG generation while highlighting remaining challenges in retaining abnormal acoustic cues and reducing reconstruction-induced artefacts.
Public transportation (PT) agencies generate vast amounts of heterogeneous data from automatic fare collection (AFC), automatic passenger counting (APC), vehicle location (AVL/CAD), schedule and real-time feeds (GTFS/GTFS-RT), and proprietary platforms. These datasets offer unprecedented opportunities for data-driven planning, operations, and passenger services, but their potential is constrained by fragmentation, inconsistent update frequencies, and the lack of reproducible, interoperable pipelines. While contemporary data platform patterns and architectural styles from enterprise computing address analogous challenges in other sectors, their adaptation to the PT domain remains mostly underexplored. Transit systems present unique conditions, including the convergence of Information Technology (IT) and Operational Technology (OT), long asset lifecycles, rigorous security requirements, multi-agency coordination requirements, and the need to operate on live systems that preclude controlled experimentation.
Diffusion and flow-matching based text-to-speech (TTS) models excel in naturalness but often lack explicit emotion control, as emotional signals remain entangled with speaker identity. We discover that emotion emerges as a linearly decodable direction in frozen hidden states, nearly orthogonal to the direction encoding speaker identity. This inspires DUET, a plug-and-play framework for emotion control over pretrained diffusion and flow-matching based TTS models. During generation, DUET unifies dual-space control to achieve fine-grained emotion intervention in a single per-step update: hidden-space steering shifts generation along the target emotion direction, while mel-space guidance refines spectral details through gradients backpropagated from a differentiable vocoder. We validate DUET on five architecturally diverse pretrained TTS backbones across three datasets, where it outperforms 10 supervised state-of-the-art emotional TTS baselines across paradigms and achieves the highest human-rated emotion appropriateness. To further showcase its qualitative behavior, we deploy DUET on an Ameca humanoid robot, where it produces richly expressive emotional speech, demonstrating strong potential for plug-and-play affective interaction in embodied agents.
LiDAR semantic segmentation is a core perception capability for autonomous vehicles and mobile robots. However, safe operation also depends on knowing when predictions are unreliable. Existing approaches typically rely on softmax confidence, which is often miscalibrated and overconfident, while stronger uncertainty estimates from Monte Carlo dropout or ensembles are often computationally expensive for real-time use. To this end, we introduce a novel, architecture-agnostic uncertainty-aware Adapter Head. It decomposes the prediction into a Preference Head for class ranking and a Strength Head that refines uncertainty assessment, thereby enabling a principled construction of evidential Dirichlet representations. Building on this design, we propose our inverse-vacuity self-calibration objective (Invascal), which directly supervises the strength signal to produce reliable and well-calibrated uncertainty estimates while preventing runaway evidence growth. We evaluate our framework across multiple LiDAR datasets and backbone architectures. We compare against deterministic training, Monte Carlo dropout and ensembles, and prior evidential methods. Our approach consistently improves uncertainty calibration over traditional deterministic methods with minimal computational overhead. At the same time, it preserves competitive segmentation accuracy, where prior evidential methods often suffer performance degradation.
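The split into a class-ranking signal and a strength signal can be illustrated with a small sketch. The construction below (alpha_k = 1 + strength * p_k, with p the softmax of the preference logits) is an assumption chosen for illustration and is not taken from the abstract; the vacuity K / sum(alpha) is the usual subjective-logic definition. The function name `dirichlet_from_heads` is hypothetical.

```python
import numpy as np

def dirichlet_from_heads(preference_logits, strength):
    """Combine class preferences and a scalar strength into Dirichlet parameters.

    Assumed construction (illustrative, not the paper's): alpha_k = 1 + strength * p_k,
    where p = softmax(preference_logits) and strength >= 0 acts as total evidence.
    """
    logits = np.asarray(preference_logits, dtype=float)
    p = np.exp(logits - logits.max())      # numerically stable softmax
    p /= p.sum()
    alpha = 1.0 + strength * p             # Dirichlet concentration parameters
    total = alpha.sum()                    # Dirichlet strength S = K + strength
    vacuity = len(alpha) / total           # K / S: high when evidence is scarce
    expected_prob = alpha / total          # mean of the Dirichlet
    return alpha, vacuity, expected_prob

# Three classes with equal preference and total evidence 3.
alpha, vacuity, prob = dirichlet_from_heads([0.0, 0.0, 0.0], strength=3.0)
```

Under this assumed form the class ranking depends only on the preference logits, while the vacuity depends only on the strength, which is the kind of decoupling the two-head design suggests.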
We introduce segmentation-guided spatial indexing for generalizable and explainable deepfake detection. The key idea reverses the standard design order: rather than pooling all facial tokens and classifying afterward, we first select semantically meaningful patch tokens, then pool only those. A frozen FaRL parser assigns each DINOv3 ViT-L/16 patch token a semantic label; non-target tokens are discarded; a linear probe classifies the retained region. This spatial indexing exploits DINOv3's patch-level spatial consistency, the same property that enables emergent segmentation, to present the probe with a purer regional subspace where manipulation-relevant evidence is less diluted by whole-face cues. Region attribution is structural: when the mouth model predicts fake, the decision used only mouth tokens, not an overlaid saliency map. On Celeb-DF v2, the mouth-indexed probe achieves AUC 0.905, outperforming LipForensics (+8.1 pp) and Xception (+16.9 pp), with no DINOv3 or FaRL fine-tuning and no target-domain data. Ablations isolate the mechanism: replacing regional selection with DINOv3's CLS token drops Celeb-DF v2 AUC by 26.4 pp; replacing DINOv3 with FaRL features drops it by 20.9 pp. Both DINOv3 representation and the spatial index are independently necessary; neither alone approaches the full system.
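The select-then-pool order described above can be sketched in a few lines. This is a minimal illustration with toy arrays standing in for DINOv3 patch tokens and FaRL labels; the function names, the label id 2 for "mouth", and the use of mean pooling with a logistic probe are assumptions, not details confirmed by the abstract.

```python
import numpy as np

def region_pooled_feature(tokens, labels, target_label):
    """Keep only patch tokens with the target semantic label, then mean-pool them."""
    mask = labels == target_label
    if not mask.any():
        raise ValueError("no patch tokens carry the target label")
    return tokens[mask].mean(axis=0)

def linear_probe_score(feature, weight, bias):
    """Logistic linear probe applied to the pooled regional feature."""
    return 1.0 / (1.0 + np.exp(-(feature @ weight + bias)))

# Toy stand-ins: 6 patch tokens of dimension 4; hypothetical label 2 = "mouth".
tokens = np.arange(24, dtype=float).reshape(6, 4)
labels = np.array([0, 2, 2, 1, 0, 2])
mouth_feature = region_pooled_feature(tokens, labels, target_label=2)
score = linear_probe_score(mouth_feature, weight=np.zeros(4), bias=0.0)
```

Because non-target tokens never reach the probe, any decision it makes is attributable to the selected region by construction.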
In terrestrial networks, especially in urban areas, cell-edge users often face significant capacity limitations due to high path loss, shadowing, and inter-cell interference (ICI). This paper proposes integrating a high-altitude platform station (HAPS) into terrestrial networks, where terrestrial base stations (BSs) can alleviate these issues by relaying data intended for cell-edge users via HAPS, thereby leveraging line-of-sight (LoS) links. We formulate an energy-efficiency (EE) maximization problem to jointly design beamforming vectors at the BS and HAPS with the goal of improving cell-edge user performance. Since the resulting problem is non-convex, we develop an online optimization framework based on a graph neural network (GNN), which effectively captures the network topology. Numerical results show that the proposed HAPS-assisted architecture improves network performance, particularly by increasing the 5th-percentile EE, thereby enhancing service for cell-edge users.
We propose a new architecture for practical quantum computing that combines three established principles: symmetry protection of relative-motion qubits via the generalized Kohn theorem, control via twisted-light orbital angular momentum, and metamaterial nanofocusing (e.g. using Weyl-semimetal plasmonics). Crucially, the core mechanism is generic: it applies to any current or future quantum computing system involving parabolic confinement, including cold atoms, ions, and semiconductor dots.
Conventional community detection requires centralized network data, making it unsuitable for distributed or privacy-preserving systems. In this paper, we demonstrate that macroscopic graph partitioning can emerge purely from strictly local, privacy-preserving interactions driven by social learning. By reframing clustering as a symmetry-breaking process within nonlinear opinion dynamics, we show that exchanging saturated, state-dependent signals (such as public actions) forces a network to naturally fracture along its sparsest cuts. We mathematically establish the spectral conditions under which dense core communities lock into stable, polarized states, robustly resisting external influence. To apply this mechanism, we propose three decentralized algorithms, leading up to the Score-based Edge Reliability (SER) framework. By evaluating network ties across multiple independent discussion topics, SER statistically bypasses the errors of traditional greedy bisections and naturally isolates structurally ambiguous frontier nodes. Validations on the ABCD benchmark and the real-world Ngogo chimpanzee network confirm that our fully decentralized approach matches the accuracy of globally optimized heuristics (e.g., Louvain, Leiden) up to a theoretical limit of detectable graphs.
Data-driven reduced-order modeling is an essential component in the computer-aided design of control systems. In this work, we present a novel symmetric Hermite formulation of the quadrature-based balanced truncation algorithm that constructs linear reduced-order models from evaluations of the full-order system's transfer function and its derivative. Significantly, the Hermite formulation preserves desirable qualitative properties of the system used to generate the data, such as state-space Hermiticity and, consequently, asymptotic stability.
While Model Predictive Control (MPC) provides strong stability and robustness, it imposes a significant computational burden on real-time systems. This paper investigates the application of Behavior Cloning to approximate MPC policies for the real-time control of a 3-degree-of-freedom robotic manipulator. We present a baseline controller combining Inverse Kinematics with MPC and evaluate neural network architectures, ranging from classical regression algorithms to deep learning models including Deep MLPs and RNNs, to derive computationally efficient surrogate policies. We analyze generalization capabilities, stability considerations, and the trade-offs inherent in different architectural choices. Our empirical study employs both online and offline evaluations to assess performance regarding accuracy, computational efficiency, and fidelity to the original MPC policy. Our results demonstrate that Behavior Cloning can effectively reduce the computational burden of MPC policies for 3-DOF robotic manipulators, achieving a 3x reduction in inference latency with an 84.98% success rate under relaxed tolerances. Notably, we find that static architectures outperform temporal variants, confirming the sufficiency of instantaneous state observations for this task. However, we observe a precision gap under strict tolerances, which suggests that while Behavior Cloning captures the global optimal trajectory, further research is needed to minimize terminal steady-state error.
High-resolution energy system capacity expansion models (CEMs) for energy transition planning often result in large-scale mixed-integer linear programming (MILP) formulations. Benders decomposition (BD) offers a scalable solution approach by iteratively solving a master problem (MP) for investment decisions and multiple subproblems (SPs) for operational decisions. However, accumulated Benders cuts generated by the SPs can make MP solution a major computational bottleneck. Incomplete SP parallelization can also introduce further bottlenecks when SPs exceed available CPUs. We develop clustering-enhanced BD methods to address these challenges, by using clustering to group similar SPs for: a) aggregated Benders cut construction and b) identification of representative SPs to be solved most frequently. For grouped-cuts, we examine two adaptive formulations based on dual variables and a fixed-grouping formulation based on exogenous time-series inputs. We evaluate these methods in an electricity-sector CEM across varying system sizes, temporal SP lengths, inter-SP coupling strengths represented by CO2 policy, computational resources, and stochastic settings. Relative to a benchmark regularized multi-cut formulation, adaptive grouped cuts outperform fixed grouping and provide substantial benefits under weak inter-temporal coupling. The largest gains occur in larger systems with shorter SP horizons, where the MP accounts for a greater share of runtime. Their effectiveness declines under strong inter-temporal coupling, such as annual CO2 emissions limits, where the benchmark multi-cut performs best. The representative-SP method outperforms the benchmark under limited parallelization when SP solution dominates runtime. Overall, the preferred BD strategy depends on inter-SP coupling strength and whether computational burden lies in the MP or the SPs.
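The idea of aggregated Benders cuts can be made concrete with a short sketch. It assumes each subproblem s has produced a cut of the form theta_s >= a_s + b_s^T x; summing the cuts of all subproblems in a group gives a single valid cut on the group's total cost. The function name `aggregate_cuts` and the toy numbers are illustrative, and the grouping is supplied as given labels rather than computed by any of the clustering schemes evaluated in the abstract.

```python
import numpy as np

def aggregate_cuts(intercepts, slopes, groups):
    """Sum per-subproblem cuts (a_s, b_s) within each group.

    If theta_s >= a_s + b_s @ x holds for every subproblem s in a group g, then
    sum_s theta_s >= (sum_s a_s) + (sum_s b_s) @ x is a valid cut for the group total.
    """
    intercepts = np.asarray(intercepts, dtype=float)
    slopes = np.asarray(slopes, dtype=float)
    groups = np.asarray(groups)
    grouped = {}
    for g in np.unique(groups):
        members = groups == g
        grouped[int(g)] = (intercepts[members].sum(), slopes[members].sum(axis=0))
    return grouped

# Three subproblem cuts over two investment variables, clustered into two groups.
cuts = aggregate_cuts(
    intercepts=[1.0, 2.0, 3.0],
    slopes=[[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
    groups=[0, 0, 1],
)
```

Here the master problem would receive two cuts per iteration instead of three; with many subproblems per group the reduction is what relieves the master-problem bottleneck, at the price of a looser approximation.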
Soft wearable robotic systems have emerged as a promising solution for assisting individuals with reduced hand function. This paper presents SoFiE, a modular soft finger exoskeleton designed to assist index-finger flexion during grasping tasks. The proposed system is primarily fabricated using 3D-printed flexible materials, enabling a lightweight, low-profile, and modular design. Actuation is achieved through a tendon-driven mechanism powered by a compact DC motor, while passive extension is provided by a compliant conductive spring. This element, termed StretchSense, also functions as a proprioceptive sensor by exhibiting resistance changes under deformation. Furthermore, a novel tactile sensing approach, MagSense, is introduced, using a magnet and magnetometer pair embedded in a soft fingertip structure to estimate contact force and object compliance. The system is fully untethered and controlled by an embedded microcontroller. In addition, actuator-level sensing through motor encoder feedback enables estimation of the system state, providing a foundation for safe and adaptive control strategies. Experimental validation demonstrates the capability of the system to provide reliable pose estimation, distinguish between materials with different stiffness, and generate distinct sensor signatures across different grasping tasks. This paper details the design, fabrication, and sensing concepts of the proposed exoskeleton as a proof of concept toward modular, soft, and assistive wearable robotics.
Global parameter identifiability is the property of a parametric ODE model that its parameter values can be recovered uniquely from the input-output data. Not all parametric ODE models have this property, and checking for parameter identifiability is a prerequisite for performing numerical parameter estimation. There are many algorithms and software packages for global parameter identifiability, and their runtime is frequently large. However, the computational complexity of this problem has not yet been analyzed, though there are complexity results for local (finitely many values fit the data) parameter identifiability. In this paper, we estimate the complexity of checking global parameter identifiability over real fields for ODE models that depend linearly on the state variables and rationally on the parameters. In particular, we prove that it is equivalent to the injectivity problem.
Compliant force and torque control are often investigated as approaches to safe physical human-robot interaction (pHRI). However, these approaches have limitations. Force control requires a robot to be equipped with external force sensors to track the amplitude and direction of applied forces. Torque control requires torque sensing or estimation in each joint. As this is not available on every robot, energy-based approaches offer a promising alternative. Such approaches aim to achieve safe pHRI by limiting the mechanical energy of the robot. Current schemes leveraging an energy-based approach tend to have a complex implementation, and some may require further stability verification. We hence propose an adaptive proportional-derivative (PD) controller that can keep a robot's energy below any given limit to achieve safe pHRI. The proposed controller can limit both the kinetic and potential energy of a robot, and the behaviour of the controller gains can be shaped using various parameters that precisely define the cutoff limit and sharpness. We construct a stability proof for the controller and define a condition to ensure the controller's stability. The proposed controller's behaviour and compliance are tested on the TALOS robot from PAL Robotics both in simulation and on hardware, verifying the expected compliant and energy-limiting behaviour of the controller.
Speech-aware large language models often generalize poorly to out-of-domain settings. We propose SALSA (Speech-Aware LLM Adaptation via Learned Steering Activations), a lightweight adaptation method that learns layer-wise steering vectors. Unlike commonly used steering approaches that rely on contrastive activation differences, SALSA directly optimizes steering vectors using a supervised objective. Across children's speech, multilingual speech, and Mandarin-English code-switching benchmarks, SALSA substantially improves performance over zero-shot inference and speech in-context learning baselines, achieving up to 46.8% relative improvements over zero-shot. Analysis further demonstrates that steering the encoder, particularly the later layers, is more effective than steering the LLM backbone. These findings suggest that steering improves downstream ASR performance by adapting higher-level acoustic and phonetic representations to better align with the pretrained language model representation space, rather than by modifying the decoder itself.
Tremor is a common movement disorder associated with conditions like Parkinson's disease and essential tremor, traditionally diagnosed through expert clinician assessment. Current automated detection methods rely on frequency-domain features informed by clinical expertise. In this work, we present an explainable, two-stage hierarchical framework for tremor detection in the time domain that learns tremor patterns directly from 3D kinematic marker time-series data across entire tremor-provoking trials. Our framework combines a deep convolutional and long short-term memory network, which learns tremor representations from short, discrete, non-overlapping time segments of the kinematic time-series data, with a vision transformer that models the long-term temporal dynamics of the segment features for trial (session) level classification. Evaluated across nine body parts, the framework achieved F1-scores of 0.594 to 0.947 depending on the body part (average: 0.765), falling short of the frequency-domain state-of-the-art performance (0.909) while requiring minimal preprocessing. Attention weights and gradient-based class activation maps (Grad-CAM) identified time-domain features of tremor across body parts. This proof of concept demonstrated the feasibility of data-driven time-domain modeling for tremor detection across anatomically diverse body parts, while reducing reliance on expert-engineered spectral features and providing post hoc interpretability of temporal and anatomical patterns of tremor.
This research presents a novel stochastic framework for proactive cybersecurity defense timing under a single attack scenario. The approach models the defense process as a continuous observation mechanism in which the defense instant and the subsequent observation slot follow independent exponential distributions. Laplace-Carson transforms combined with first-excess theory yield the joint detection function that brackets the attack moment. Marginalization under Markovian Poisson arrivals then produces the probability density of the defense moment and conditional expectations of pre-attack and post-attack observation times. These closed-form results enable quantitative assessment of defense timing sensitivity to threat intensity and support precise calibration of observation parameters for low-latency proactive measures. Major contributions include the explicit derivation of marginal distributions and expected values, visualization of defense moment density, and the bridging of stochastic duel methodology with practical cybersecurity applications.
Contact-rich manipulation demands both high-level semantic reasoning and the safe regulation of high-frequency contact dynamics. While Vision-Language-Action (VLA) models provide unprecedented semantic generalization, their low-rate outputs lack the reliability required for direct plant authority in force-sensitive tasks. To bridge this semantic-to-control gap, we introduce PaCo-VLA, a passivity-shielded compliance prior that recasts the VLA interface. Rather than trusting VLAs with direct motor commands, PaCo-VLA treats network outputs as task-level compliance proposals: semantic bindings, task stages, and admittance schedules. A high-frequency, proposal-independent passivity shield governs these proposals through energy-tank accounting and boundary checks, preventing invalid, stale, or unverified model predictions from bypassing low-level contact physics. This decoupled architecture also enables causal evaluation, isolating semantic contributions from geometric shortcuts. Extensive simulated and real-world connector-insertion experiments demonstrate that PaCo-VLA achieves superior precision over unshielded VLA baselines, sustaining zero passivity violations even under adversarial compliance shifts. This framework establishes a provably sampled-passive runtime contract at the admittance port and provides a runtime interface for deploying foundation models in contact-rich domains.
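The energy-tank accounting mentioned above can be illustrated with a generic sketch of a tank-based passivity gate. It is not PaCo-VLA's actual shield: the function name `shield_step`, the scaling rule, and the sign convention (positive power means the proposed command would inject energy at the contact port) are all assumptions made for illustration.

```python
def shield_step(tank_energy, proposed_power, dt, e_min=0.0):
    """Gate one control step against an energy tank.

    Returns (new_tank_energy, scale), where scale in [0, 1] is the fraction of the
    proposed command that may be applied without draining the tank below e_min.
    """
    demand = proposed_power * dt
    if demand <= 0.0:
        # Dissipative step: the absorbed energy refills the tank, command passes.
        return tank_energy - demand, 1.0
    available = max(tank_energy - e_min, 0.0)
    scale = min(1.0, available / demand)
    return tank_energy - scale * demand, scale

# A proposal asking for 2 J when only 1 J is stored is scaled down to 50%.
tank, scale = shield_step(tank_energy=1.0, proposed_power=4.0, dt=0.5)
```

The check runs on measured quantities only, so it behaves the same whether the proposal is valid, stale, or adversarial, which is the sense in which such a shield is proposal-independent.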
Multi-robot systems (MRS) increasingly offload compute-intensive perception tasks to edge nodes to meet strict time-sensitive Quality-of-Service (QoS) constraints. However, static task orchestration on a shared edge node can severely degrade QoS due to network latency, jitter, and edge-resource contention. We present a pilot edge-centric MRS testbed using Raspberry Pi nodes to evaluate a camera-to-manipulator pipeline under three modes: local execution, static offloading, and a QoS-aware Adaptive Task Placement (ATP) controller. ATP scores candidate placements using a multi-metric cost (normalized latency, CPU utilization, and switching overhead) over two-second control windows. The closed-loop visual servoing testbed is instrumented with sub-millisecond clock synchronization, network emulation, and detailed monitoring of multiple metrics across nodes to capture realistic jitter. Experimental results under compute-stress and network-fault scenarios show that static edge offloading reduces on-board CPU load but amplifies tail latency and deadline misses. In contrast, the QoS-aware ATP controller, by switching task placement based on measured latency and utilization thresholds, consistently lowers deadline violations and tail latency. Overall, the results position ATP as a practical edge-side control primitive for MRS and offer concrete design guidelines for Cloud-Edge Robotics deployments within broader cloud-fog automation, while motivating QoS-aware multi-objective workload orchestration for industrial cyber-physical systems.
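A multi-metric placement cost of the kind described above might look like the following sketch. The weights, the deadline-based latency normalization, the binary switching penalty, and all names (`WindowStats`, `placement_cost`, `choose_placement`) are assumptions for illustration; the abstract names the three metrics but not how they are combined.

```python
from dataclasses import dataclass

@dataclass
class WindowStats:
    latency_ms: float  # latency measured over one control window
    cpu_util: float    # CPU utilization in [0, 1]

def placement_cost(stats, deadline_ms, switching, w_lat=0.6, w_cpu=0.3, w_sw=0.1):
    """Weighted sum of normalized latency, CPU utilization, and a switching penalty."""
    norm_latency = stats.latency_ms / deadline_ms
    penalty = 1.0 if switching else 0.0
    return w_lat * norm_latency + w_cpu * stats.cpu_util + w_sw * penalty

def choose_placement(candidates, current, deadline_ms):
    """Score every candidate placement and return the cheapest one with all costs."""
    costs = {
        name: placement_cost(stats, deadline_ms, switching=(name != current))
        for name, stats in candidates.items()
    }
    return min(costs, key=costs.get), costs

# Hypothetical window: the local node is overloaded, the edge node is not.
candidates = {"local": WindowStats(80.0, 0.9), "edge": WindowStats(40.0, 0.5)}
best, costs = choose_placement(candidates, current="local", deadline_ms=100.0)
```

With these toy numbers the local placement costs 0.75 and the edge placement 0.49, so the controller would switch to the edge despite the switching penalty. That penalty is what keeps the controller from oscillating when two placements score nearly the same.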
Large, fast-controllable loads such as Bitcoin mining facilities are increasingly viewed as potential sources of flexibility in modern power systems, yet the conditions under which this flexibility is realized remain incompletely understood. Using the Texas power market as an empirical setting, we examine how Bitcoin-mining load responds to two distinct electricity-sector cost channels: contemporaneous wholesale electricity prices and incentives created by coincident-peak-based transmission charges. We find that mining load responds to both cost channels in a manner consistent with miners operating around a breakeven point. At the aggregate level, we observe that mining load decreases as electricity-sector costs rise, but the strength of this response depends on hashprice, a measure of expected mining revenue from the crypto-financial sector. When hashprice is higher, aggregate load responsiveness is weaker. This mechanism is especially evident in the wholesale-price response. Mining load remains largely online at low prices and begins to decline only when electricity costs become large relative to expected mining revenue, with higher hashprice shifting the implied curtailment threshold toward higher wholesale prices. These findings indicate that Bitcoin-mining demand response to electricity-sector costs is economically state-dependent and shaped by revenue conditions in the crypto-financial sector. Treating such loads as stable demand-response resources may therefore overstate available grid flexibility, with implications for power-system planning, market design, and reliability assessment.
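The breakeven logic can be illustrated with a back-of-the-envelope calculation. The sketch assumes hashprice is quoted in USD per TH/s per day and machine efficiency in J/TH, and it ignores every cost other than energy; the function name and the example figures are illustrative and are not values from the study.

```python
def breakeven_price_usd_per_mwh(hashprice_usd_per_ths_day, efficiency_j_per_th):
    """Electricity price at which daily mining revenue equals daily energy cost.

    A machine hashing at 1 TH/s draws efficiency_j_per_th watts (J/TH * TH/s = W),
    i.e. efficiency * 24 Wh per day, or efficiency * 24 / 1e6 MWh per day.
    """
    mwh_per_ths_day = efficiency_j_per_th * 24.0 / 1e6
    return hashprice_usd_per_ths_day / mwh_per_ths_day

# Hypothetical fleet efficiency of 25 J/TH at two revenue levels.
low = breakeven_price_usd_per_mwh(0.05, 25.0)   # about 83.3 USD/MWh
high = breakeven_price_usd_per_mwh(0.10, 25.0)  # about 166.7 USD/MWh
```

Doubling hashprice doubles the price at which curtailment becomes rational, which matches the direction of the threshold shift reported above.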
Sound design workflows frequently oscillate between time-consuming library searches and the complexity of procedural synthesis, with practitioners typically relying on disconnected tools to address each challenge separately. This paper introduces Quality Audio Prototyping (QuAP), a working prototype that unifies content-based audio retrieval and procedural sound generation within a single interface, reducing the procedural distance between a narrative concept and its sonic realisation. QuAP integrates a similarity-based retrieval engine with real-time procedural audio models, complemented by a rule-based assistant that provides perceptually informed parameter guidance, offering definitions and recommendations derived from empirical optimisation rather than requiring prior synthesis knowledge. Preliminary evaluation confirms the viability of this approach: subjective assessment demonstrated statistically significant quality improvements in five of six embedded synthesis models, and an encoder ablation study established the preferred retrieval architecture on a sound effect dataset. A user evaluation with 16 practitioners confirmed the tool's workflow utility, with all participants agreeing that the parameter assistant preserved creative agency while lowering the barrier to procedural interaction.
Semantic communications, which can significantly reduce spectrum consumption in wireless networks, have recently become a popular research area. When combined with wireless power transfer (WPT), semantic communications can help achieve high spectral efficiency for energy-limited devices in wireless communications. In energy-constrained and link budget-limited scenarios such as UAV networks, the integration of semantic communications and WPT enables highly energy-efficient transmission mechanisms. In this paper, we investigate semantic communications in UAV-enabled WPT networks. To achieve adaptability to varying signal-to-noise ratio (SNR) and task requirements, we introduce a multi-layer hybrid bit and semantic communication framework. We adopt a semantic communication efficiency metric and aim to maximize it by jointly optimizing UAV trajectory, energy harvesting base station (EHBS) selection, user association, semantic mode selection, and energy harvesting time allocation. To address this complex long-term optimization problem, we adopt the distributional soft actor-critic (DSAC) algorithm and add a decision assistant to further enhance its convergence performance. Simulation results validate the effectiveness of the proposed method and framework and demonstrate that our algorithm can achieve superior long-term optimization performance in dynamic network environments.
Inference and control in engineered physical systems pay a heavy physics cost at deployment: state estimators, inverse-problem solvers, model-predictive controllers, schedulers, and observers are often not closed-form and must re-solve a numerical optimization per instance, with the operator re-supplied each time. Physics-informed learning moves this cost to training, but uses a single encoder pathway whose latent geometry de-learns under fine-tuning and admits no quantitative transfer guarantee. We propose an asymmetric two-pathway architecture that resolves both issues. A teacher encoder consumes privileged dense states from a high-fidelity simulator and represents the system through operator-polynomial features stable under spectral perturbation; a student encoder learns the same latent geometry from sparse field data and operator descriptors. At deployment the teacher is discarded, and the frozen student runs in a single forward pass with a transfer certificate. The design connects to privileged-information learning, knowledge distillation, and cross-modal distillation, but targets cross-instance transfer rather than fixed-instance prediction: topology and operator may change, while the latent task does not. We establish sufficient and near-necessary transfer conditions via Wasserstein proximity between latent laws, yielding a zero-shot error bound, and develop a finite-sample certification protocol with active expansion when coverage is incomplete. The framework applies wherever a system admits an operator with reportable spectrum. On power-system estimation, it achieves zero-shot transfer to 100 unseen topologies, a 95% certificate pass rate, accuracy competitive with topology-aware Newton--Raphson, and sub-millisecond inference. These results suggest asymmetric pathways plus operator-anchored latent geometry provide a foundation for certified zero-shot inference and control.
In recent years, graph signal processing has emerged as a powerful framework at the intersection of signal processing and graph theory, providing tools for the analysis of signals defined on nodes while accounting for their relationships represented by edges. These tools have been successfully applied to various settings, including statistical hypothesis testing. In particular, non-parametric approaches based on surrogate generation have been proposed for signals on undirected graphs. However, they are yet to be extended to directed graphs. In this work, we first revisit the notion of stationary graph signals on directed graphs. Specifically, and through the eigendecomposition of the graph shift operator, we define directed graph wide-sense stationary signals. Then, we propose a new framework to generate surrogate graph signals that preserve covariance structure under stationarity assumptions. Null distributions of the test metric can then be constructed from these surrogates and serve as a reference for the empirical data. Finally, we provide guiding examples and an application on real data, in which we compare the performance of our framework with existing techniques for undirected graphs or based on naive permutation, demonstrating feasibility and superiority of the proposed approach.
Empathetic spoken dialogue systems must infer a user's emotional state to respond appropriately, yet everyday speech often carries weak, neutral, or ambiguous affective cues. To address this, we introduce Sympatheia, a speech-to-speech dialogue framework conditioned on affect inferred from the user's speech and, when available, explicit affect specifications provided as a continuous valence--arousal (VA) control signal by a multimodal sensing module or user interface. To train our model, we construct Sympatheia-18k, an emotion-conditioned synthetic spoken dialogue corpus with 12 emotion anchors. This dataset includes an emotional split for learning affective speech behavior, and a neutral split that pairs emotionally neutral queries with multiple emotion-conditioned responses to isolate explicit emotion control in emotionally ambiguous cases. Empirical results show that Sympatheia outperforms speech conversational baselines in generating responses whose semantic content and spoken delivery are both emotionally appropriate. We further show that the same VA interface can integrate emotion estimates from diverse sensing modules, including facial expression, biosignals, and textual affect descriptions, improving response alignment when speech alone provides limited emotional evidence. These results suggest that continuous affect conditioning is an effective practical step for building emotionally adaptive voice assistants.
Recent publications have suggested using the Shapley value for sensor anomaly/attack localization. We study the performance of such an approach by using mathematically defined optimum binary classifiers in the Shapley value calculation. To judge localization performance, we study the ability of the Shapley value of a given sensor observation to determine if that observation is anomalous. First, we prove that for cases with independent sensor observations, an optimized anomaly test using the Shapley value is equivalent to an optimized lower-complexity anomaly test using a single term in the Shapley value calculation, yielding the exact same probability of error. For some popular dependent observation cases involving two sensors, including correlated bivariate Gaussian/Laplacian probability density functions and constant/Gaussian attacks/anomalies, we prove that these two tests are fundamentally different, yielding different decision regions and error probabilities. Further, we prove that the Shapley value test is sometimes strictly inferior to the other (single term in Shapley calculation) test in certain statistically dependent bivariate Gaussian scenarios with large correlation magnitude and additive attacks/anomalies, while it is strictly superior in others, depending on the sign of the correlation. One can combine these two approaches to obtain a strictly better approach in these cases. These results, which provide the first theoretical statistical analysis of Shapley-based localization, are of particular interest given the wide acceptance of the Shapley value by many researchers and should encourage further research on this topic. Numerical results are provided which illustrate our findings.
The convergence of Artificial Intelligence, the Internet of Things, and Robotics is no longer a futuristic vision; it is rapidly becoming the foundation of real-time, intelligent, and context-aware systems. AI enables perception and reasoning, IoT provides scalable sensing and communication, and robotics delivers embodied actuation. Despite significant progress in pairwise combinations such as AIoT and the Internet of Robotic Things (IoRT), there remains a lack of unified design frameworks that fully integrate all three. This survey synthesizes the state-of-the-art across these domains, emphasizing the emerging role of Small Language Models (SLMs) at the edge and Large Language Models (LLMs) in the cloud for distributed cognition and autonomous decision-making. We propose a modular system architecture that aligns with these trends, analyze persistent gaps in interoperability and feedback control, and classify existing work by integration depth. Our review highlights how hybrid SLM-LLM systems, when coupled with IoT infrastructure and robotic agents, can address challenges in real-time adaptation, scalability, and reliability. This work offers a conceptual and technical roadmap for designing next-generation AI-IoT-Robotic ecosystems that are modular, interpretable, and capable of learning within dynamic environments, paving the way for the emerging paradigm of Connected Robotics and Physical AI.
While End-to-End (E2E) Speech-Large Language Models (Speech-LLMs) are rapidly evolving, their evaluation methodologies remain limited to the era of simple transcription. Existing benchmarks suffer from three critical limitations: a pronounced bias towards high-resource languages, a focus on low-level recognition (ASR) rather than semantic reasoning, and a neglect of regional dialects. To bridge this gap, we introduce PolySpeech-100, a massive-scale benchmark designed to assess 'native-level' speech comprehension across 110 linguistic variants. We employ a novel hybrid construction pipeline that augments gold-standard human recordings with instruction-driven synthetic speech, allowing us to cover 19 distinct Chinese dialects and over 80 low-resource languages. Extensive evaluation of 22 state-of-the-art models (including Gemini-3, GPT-Audio, and Qwen2.5-Omni) yields pivotal insights. First, we demonstrate that open-source E2E models outperform Cascade (ASR+LLM) systems on heavy dialects, proving that direct audio processing preserves critical paralinguistic cues and prosodic features (e.g., intonation, stress) that are often lost in standard transcription. Second, we reveal a significant performance gap: while commercial models maintain robustness, open-source models suffer catastrophic degradation on low-resource languages. Finally, counter-intuitively, we observe that under standard zero-shot settings, Chain-of-Thought prompting frequently degrades speech understanding performance for most evaluated models, revealing a potential modality alignment gap in current architectures. PolySpeech-100 establishes a rigorous standard for the next generation of inclusive, omni-capable Speech-LLMs. The data, demo, and code are publicly available at this https URL.
For near-field communications, forming an extremely large-scale array (XL-array) by concatenating multiple modular arrays (also referred to as subarrays) is a hardware-efficient approach. In this letter, we investigate the effect of time synchronization errors among transmissions of different subarrays on the beam-focusing performance. To this end, we first characterize the beam pattern function when the transmit beamforming is designed based on maximum ratio transmission (MRT) under the premise of perfect time synchronization. As this function is difficult to analyze, we then consider a typical case with two subarrays. Interestingly, we show that for this case, the beam-focusing effect persists even in the presence of time synchronization errors, while the focused location deviates from the user location by an angle offset upper-bounded by 1/M, where M denotes the number of antennas in each subarray. Subsequently, for the general case with multiple subarrays, despite analytical intractability, we numerically show that time synchronization errors give rise to an imbricated (instead of focused) beam pattern. This may significantly degrade multi-user communication performance in practice due to the reduced desired signal power and increased inter-user interference.
Standard positional encodings for transformers - sinusoidal and rotary (RoPE) - treat every position as equally local: they encode where a token is, but not how far its positional influence should extend. We propose that the Morlet wavelet, which simultaneously minimises uncertainty in position and frequency, is the natural basis for positional encoding, and introduce Morlet Positional Encoding (MoPE): each embedding dimension learns its own frequency and locality bandwidth from data. The main theoretical result is a unification: sinusoidal PE and the RoPE correlation kernel both emerge as limiting cases of MoPE when locality is switched off (sigma_i -> infinity). The phase of MoPE recovers the RoPE rotation angle exactly; the amplitude adds a learned Gaussian locality kernel that standard encodings lack. Empirically, MoPE combined with Energy-Gated Attention achieves +0.119 improvement over standard attention on TinyShakespeare, outperforming either component alone. Analysis of the learned parameters reveals that all 128 frequency-bandwidth pairs converge to the wavelet admissibility boundary - an empirical observation consistent with a companion result on energy gating, suggesting a reproducible property of character-level language signals that warrants further investigation.
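The limiting behaviour described in the abstract above can be illustrated with a small sketch of a Morlet-style kernel over a relative token offset. The abstract does not give the exact parameterisation, so the form below (a Gaussian envelope times a complex exponential, without any admissibility correction term) and the name `morlet_kernel` are assumptions, not the authors' definition.

```python
import cmath
import math


def morlet_kernel(offset: float, omega: float, sigma: float) -> complex:
    """Gaussian-windowed complex sinusoid evaluated at a relative offset.

    The phase omega * offset plays the role of a rotation angle; the
    envelope exp(-offset^2 / (2 sigma^2)) limits how far influence extends.
    """
    envelope = math.exp(-(offset ** 2) / (2.0 * sigma ** 2))
    return envelope * cmath.exp(1j * omega * offset)


# Finite bandwidth: the magnitude decays with distance, the phase is omega * offset.
near = morlet_kernel(1.0, omega=0.3, sigma=2.0)
far = morlet_kernel(8.0, omega=0.3, sigma=2.0)

# Locality switched off (sigma -> infinity): a pure unit-magnitude sinusoid remains.
unwindowed = morlet_kernel(8.0, omega=0.3, sigma=math.inf)
print(abs(near), abs(far), abs(unwindowed))
```

With `sigma` set to infinity the envelope is identically one, so only the phase term survives, which is the sense in which purely sinusoidal encodings appear as a limiting case.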
We present a multimodal dataset of 1020 hours of simultaneously recorded scalp electroencephalography (EEG), facial electromyography (EMG), and speech audio from three healthy native Japanese speakers during open-vocabulary overt speech. Recordings were acquired across many sessions over several months with three EEG systems spanning 62-128 channels: an ultra-high-density system (this http URL) and two cap-type systems (this http URL and eegosports). Each session provides time-synchronized EEG, facial EMG, and audio, together with speech-event annotations and transcriptions. Although collected with speech decoding as a primary motivation, the dataset also supports work on multimodal signal processing, artifact modeling, longitudinal and cross-device adaptation, and EEG representation learning. Technical validation included power spectral density and event-related potential analyses across participants, devices, and tasks, which showed the expected 1/f spectral profile, task-related alpha-band attenuation, and time-locked evoked responses. The dataset is released in Brain Imaging Data Structure (BIDS) format via OpenNeuro under a CC0 waiver to support both speech-related and broader EEG research.
Current end-to-end autonomous driving systems predominantly rely on frame-based sensors, which suffer from inherent perception latency and motion blur during highly dynamic encounters, specifically sudden pedestrian crossings. To address this critical safety vulnerability, we propose DeepIPCv3, a novel multi-modal autonomous navigation framework that synergizes the dense 3D spatial geometry of LiDAR point clouds with the microsecond-level asynchronous event streams of a Dynamic Vision Sensor (DVS). We introduce a Transformer-inspired cross-modal attention mechanism to dynamically correlate these distinct modalities, allowing the network to instantaneously prioritize high-speed dynamic updates without sacrificing structural scene awareness. The fused latent representations are then mapped to safe local waypoints and executable control commands via a hybrid policy network that blends heuristic trajectory tracking with direct neural predictions. Due to the severe physical risks associated with live testing of these sudden crossing scenarios, the framework is rigorously evaluated offline using a custom multi-modal dataset collected across both well-illuminated noon and challenging evening conditions. Extensive comparative and ablation studies demonstrate that DeepIPCv3 achieves state-of-the-art predictive performance. By effectively eliminating exposure failures and motion blur, the proposed LiDAR and DVS fusion yields the lowest trajectory and control command errors, enabling highly reactive, mathematically bounded evasive maneuvers regardless of ambient illumination. To support future research, we will release the code in our GitHub repository at this https URL.
Semantic communication has emerged as a promising paradigm for improving transmission efficiency by conveying task-relevant semantics rather than raw data. Although recent studies have achieved notable gains in communication efficiency and average task performance, reliability remains a fundamental bottleneck in dynamic and uncertain environments. In particular, most existing designs are still optimized mainly for average-case behavior, while lower-tail performance under adverse transmission conditions remains insufficiently understood and inadequately protected. In this article, we present a unified perspective on reliable semantic communication beyond average performance. We first review three reliability-oriented design categories: channel-aware adaptation, robustness-oriented codec design, and hybrid automatic repeat request (HARQ)-based retransmission. We show that these approaches address reliability from complementary perspectives, but each still has inherent limitations. Motivated by these observations, we discuss two solution directions: robust adaptive semantic communication under imperfect CSI, and joint source-channel-check coding with adaptive retransmission for sample-level reliability enhancement. Finally, we outline several future research directions, including the joint design of robustness and retransmission, reliability metrics beyond averages, and compatibility with existing digital wireless networks.
Subpacketization remains a major obstacle to the practical deployment of coded caching (CC) in multi-antenna wireless networks. In this paper, we propose a low-complexity multiple-input multiple-output (MIMO) CC scheme that enables flexible delivery rate adaptation while substantially reducing subpacketization requirements. The proposed design builds on a virtual decomposition of the broadcast channel and extends the shared-cache model to multi-antenna receivers, enabling adaptive selection of feasible user and stream configurations and thereby providing explicit control over the spatial multiplexing gain under linear decodability constraints. Analytical results show that the proposed framework can asymptotically approach the best-known achievable degrees of freedom (DoF) under linear decodability constraints while requiring orders-of-magnitude lower subpacketization than existing schemes. Numerical evaluations further demonstrate that this flexibility yields notable throughput improvements at practical signal-to-noise ratios.
Model-based reinforcement learning (MBRL) infers information about the environment from a learned dynamics model and has the potential to address open problems such as data-efficient and safe learning in robotics. However, inaccuracies of the learned dynamics model are typically exploited by the agent, substantially hampering the capabilities of MBRL methods. We present a framework for dealing with inaccuracies of probabilistic models through targeted handling of uncertainty that effectively mitigates model exploitation. We present recent successes in learning directly on hardware and safe exploration, and discuss future directions for uncertainty-aware MBRL.
A fixed-wing UAV must hold airspeed, altitude, and heading references under wind, gusts, and turbulence, channels coupled so that correcting one can degrade another. Classical autopilots stabilize the airframe well but adapt poorly when a hard crosswind meets an aggressive turn, while reinforcement-learning (RL) policies acting directly on the surfaces concentrate exploration risk at the actuator interface. We place a learned supervisor above an unchanged autopilot rather than inside it: it selects a residual from a finite, bounded action set on the commanded airspeed, altitude, and heading; the modified reference is projected into an admissible command envelope before reaching the autopilot, which stays the only actuator-facing controller. What is new is how the residual is chosen. HJB residual scores candidates with a semi-discrete value-iteration critic in the spirit of the Hamilton-Jacobi-Bellman (HJB) equation, ranks them by a no-op-relative Hamiltonian advantage, and filters them through a control-Lyapunov- and control-barrier-inspired finite-action shield that always keeps a no-op fallback. On a shared 12-state runtime holding the plant, autopilot, and actuator model fixed, so the comparison is at the package level, HJB residual lowers mean RMS path-tracking error to 44.809 m, against 338.617 m for the baseline autopilot and 88.809 m for a tabular-Q residual, an 86.77% reduction over the baseline and 49.54% over Q-learning. The gain concentrates where the baseline fails worst and comes with a measured rise in airspeed error, so no method dominates every metric. We present this autopilot-preserving residual command-supervision design and benchmark with its trade-offs reported intact.
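The relative improvements quoted in the abstract above follow directly from the three reported RMS path-tracking errors. The short check below reproduces that arithmetic; only the helper name `percent_reduction` is introduced here.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Mean RMS path-tracking errors in metres, as reported in the abstract.
hjb_residual = 44.809
baseline_autopilot = 338.617
tabular_q_residual = 88.809

vs_autopilot = percent_reduction(baseline_autopilot, hjb_residual)
vs_tabular_q = percent_reduction(tabular_q_residual, hjb_residual)
print(f"{vs_autopilot:.2f}% {vs_tabular_q:.2f}%")  # prints "86.77% 49.54%"
```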
Accurate modeling of leaf spectral reflectance from physiological and biochemical traits is essential for advancing remote sensing applications in plant science and precision agriculture. Widely used radiative transfer models, such as PROSPECT-PRO, rely on generalized trait-reflectance relationships developed from a wide range of species, which may not fully capture the spectral behavior of specific crops like grapevines. In this study, we developed a trait-to-spectra prediction model using a multi-head attention neural network trained on a grapevine-specific dataset that includes 16 leaf traits measured across multiple varieties, growth stages, and years. The model was evaluated using stratified 5-fold cross-validation and achieved an average coefficient of determination (R^2) of 0.84 and normalized root mean squared error (NRMSE) of 1.52 percent, demonstrating high accuracy and generalizability. When compared to PROSPECT-PRO in forward mode, the neural network exhibited lower mean absolute error (MAE), especially in the near-infrared (NIR) and shortwave-infrared (SWIR) regions. These results emphasize the importance of species-specific modeling approaches and show that integrating biochemical and structural traits into data-driven architectures can significantly improve spectral prediction. The proposed model provides a robust framework for generating accurate leaf-level reflectance data, with potential applications in canopy trait retrieval, vineyard monitoring, and remote sensing-driven crop management.
Multi-pitch estimation (MPE) typically predicts which pitches are active in a mixture, but not which instrument or source produced them. This paper investigates a lightweight slot-attention framework for multi-instrument MPE (MI-MPE), where a mixture CQT is mapped to an unordered set of source-like pitch maps. The model uses permutation-invariant Hungarian matching to avoid fixed output semantics and treats the number of slots as an upper bound on the number of active sources. We further study two modular extensions: a self-supervised timbre encoder that provides training-time targets for slot-level timbre embeddings, and a polyphony branch that regularizes the pitch density of mixture- and slot-level predictions. Experiments show that Hungarian matching substantially improves instrument family decomposition on URMP. Stem-level prediction remains more challenging: timbre and polyphony supervision improve selected configurations, but do not consistently resolve source assignment. The results suggest that slot-based architectures are a promising direction for source-aware MPE, while highlighting the need to couple auxiliary musical cues to slot identity more carefully.
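The permutation-invariant matching mentioned in the abstract above can be sketched as follows. For brevity this example searches all assignments by brute force rather than running the Hungarian algorithm; for small slot counts it returns the same minimum-cost assignment. The function names, the L1 pair cost, and the toy pitch maps are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def l1_cost(pred_map, target_map):
    """Sum of absolute differences between two flattened pitch maps."""
    return sum(abs(p - t) for p, t in zip(pred_map, target_map))


def matching_loss(pred_slots, targets, pair_cost=l1_cost):
    """Minimum total cost over all one-to-one slot-to-target assignments.

    The number of slots is an upper bound on the number of sources, so
    len(pred_slots) >= len(targets); unmatched slots are simply ignored here.
    Returns (cost, assignment) where assignment[t] is the slot for target t.
    """
    best_cost, best_assignment = float("inf"), None
    for assignment in permutations(range(len(pred_slots)), len(targets)):
        cost = sum(pair_cost(pred_slots[s], targets[t])
                   for t, s in enumerate(assignment))
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Three slots, two active sources; the slot order differs from the target order.
pred_slots = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
targets = [[1, 0, 0], [0, 1, 0]]
cost, assignment = matching_loss(pred_slots, targets)
print(cost, assignment)  # prints "0 (1, 0)"
```

Because the loss is evaluated under the best assignment, the model is not penalised for emitting sources in a different order from the reference, which is what frees it from fixed output semantics.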
High-quality, large-scale synthetic data from simulations is becoming a cornerstone for pushing the capabilities of robot algorithms. While aerial robotics simulators have evolved to support specialized needs such as fidelity, differentiability, and swarms independently, a unified platform that can synthesize data across all these domains is missing. In this work, we propose Crazyflow, a simulator designed to push the limits of aerial-robotics algorithm development, from model-based to data-driven methods, gradient-based to sampling-based approaches, and single-agent to multi-agent systems. Compared to existing state-of-the-art drone simulators, it achieves speeds more than an order of magnitude faster for a single drone and can simulate thousands of swarms of 4000 drones each. Real-world experiments show Crazyflow supports both analytical-gradient-based policy learning, achieving sub-centimeter trajectory tracking accuracy without domain randomization, and sampling-based obstacle avoidance at speeds exceeding half a billion steps per second. Breaking the traditional train-then-deploy paradigm, we show that its unprecedented speed even enables in-flight reinforcement learning; we demonstrate this by throwing a physical drone into the air and training a recovery policy from scratch in 0.38 seconds, successfully stabilizing the drone. Crazyflow supports multiple levels of simulation abstraction, is directly compatible with all open-source Crazyflie models, and enables rapid reconfiguration across custom drone platforms and applications by providing a light-weight system identification pipeline. By pushing accuracy, speed, and differentiability simultaneously, Crazyflow serves as an open-source resource for synthetic data generation, with emerging capabilities for large-scale parallelization for online, in-execution learning and optimization, opening the door to novel algorithm development.
Long-form automatic speech recognition (ASR) requires both high accuracy and low latency, but existing systems force a trade-off between the two. Chunk-based pipelines process audio in parallel windows for low latency, but lose cross-chunk context and need brittle heuristics to align speakers and timestamps at boundaries. Long-context ASR models resolve everything in a single pass for better accuracy, but are an order of magnitude slower. We propose Murmur, an inference system that overcomes this trade-off by operating at two levels. At the inter-chunk level, we revisit the chunk-based pipeline for modern long-context ASR, treating chunk size as a tunable hyperparameter, and show that intermediate chunk sizes strike a good balance of accuracy and latency. At the intra-chunk level, we exploit attention sparsity through a sliding window KV cache eviction policy applied to both output and speech tokens. On AMI-IHM, Murmur matches single-pass accuracy while reducing latency by 4.2x, with further gains from token eviction at less than 1% relative tcpWER degradation. The code of Murmur is available at this https URL.
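As a rough illustration of the sliding-window eviction idea in the abstract above, the sketch below keeps only the most recent entries of a key-value cache. The abstract does not specify Murmur's actual policy (for example, how speech and output tokens are windowed), so the class name, the single shared window, and the placeholder keys and values are all assumptions.

```python
from collections import deque


class SlidingWindowKVCache:
    """Keeps the `window` most recent (position, key, value) entries."""

    def __init__(self, window: int):
        # A deque with maxlen drops the oldest entry when a new one arrives.
        self.entries = deque(maxlen=window)

    def append(self, position: int, key, value) -> None:
        self.entries.append((position, key, value))

    def positions(self):
        """Token positions still available to attention."""
        return [position for position, _, _ in self.entries]


cache = SlidingWindowKVCache(window=3)
for pos in range(5):
    cache.append(pos, key=f"k{pos}", value=f"v{pos}")
print(cache.positions())  # prints "[2, 3, 4]"
```

Memory and attention cost per step are then bounded by the window size rather than by the length of the chunk.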
In this article, we establish the global convergence properties of the FilterDDP algorithm, which extends the discrete-time differential dynamic programming (DDP) algorithm of Mayne and Jacobson [\emph{International Journal of Control}, 3, (1966), pp. 85-95] to handle nonlinear constraints over states and controls, in addition to the dynamics. FilterDDP adopts a line-search filter procedure for step acceptance. However, instead of a damped Newton step applied in the general nonlinear programming setting, the computation of a trial point involves applying a backward recursion and a forward simulation. We establish the global convergence of FilterDDP by showing that for a subset of constrained optimal control problems, this backward-forward procedure satisfies the same properties as a Newton step for the purpose of establishing global convergence of a line-search filter method, following the analysis of Wächter and Biegler [\emph{SIAM Journal on Optimization}, 16 (2005), pp. 1-31].
Everywhere learning is a new paradigm whereby Artificial Intelligence (AI) systems are trained to satisfy loss constraints with probability one over the data distribution. This is in contrast to the standard paradigm of training AI systems to minimize average losses. We develop an approximate duality theory to substantiate a generalization analysis that establishes the proximity between solutions of empirical and statistical everywhere learning problems. Our results show that dual variables reweigh the data distribution towards points in which loss constraints are more difficult to satisfy and that generalization is controlled by the mismatch between the concentration of mass of the data distribution and the concentration of mass on points where constraints are more difficult to satisfy. We further show that we can control generalization with a sparse L1 penalty on constraint relaxations. We illustrate the merits of everywhere learning with an experiment in agentic classification for language model tasks.
We propose FlipItRight, a framework for stable planar pose-targeted throw-flip with a high-DoF manipulator. The task is decomposed into an object-level planner, which generates candidate release states satisfying the desired landing pose, and a robot-level planner, which evaluates executability and constructs a feasible swing motion. Treating the release state as an explicit intermediate representation enables principled candidate filtering, adaptive selection of release and pre-swing configurations, and structured near-release motion design -- in particular, approximately constant end-effector velocities during the final swing phase to improve robustness to release-timing uncertainty. We validate on a real platform across objects of varying shape, size, and mass, achieving a 90% success rate across 120 trials. Ablation studies confirm that each design choice contributes to throwing performance, and the framework requires no prior data or learned model, enabling direct deployment on new objects and targets without environment-specific calibration or data collection.
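To make the role of the release state in the abstract above concrete, the sketch below maps a planar release state to a landing position and orientation under drag-free ballistic flight with constant angular velocity. This is a simplified stand-in, not the authors' object-level planner: the function name, the state layout, and the neglect of drag and bouncing are assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def landing_state(x0, z0, theta0, vx, vz, omega, z_land=0.0):
    """Propagate a planar release state to the landing height.

    Returns (x_land, theta_land, flight_time). Solves
    z0 + vz * t - 0.5 * G * t^2 = z_land for the positive root.
    """
    discriminant = vz * vz + 2.0 * G * (z0 - z_land)
    if discriminant < 0.0:
        raise ValueError("object never reaches the landing height")
    flight_time = (vz + math.sqrt(discriminant)) / G
    x_land = x0 + vx * flight_time
    theta_land = theta0 + omega * flight_time
    return x_land, theta_land, flight_time


# Hypothetical release: 1 m/s forward, 4.905 m/s upward, half a turn per second.
x_land, theta_land, t_flight = landing_state(
    x0=0.0, z0=0.0, theta0=0.0, vx=1.0, vz=4.905, omega=math.pi)
print(round(x_land, 3), round(theta_land, 3), round(t_flight, 3))
```

A planner in this spirit could invert the mapping: enumerate candidate release states whose propagated landing pose matches the target, then pass them on for executability checks.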
Photorealistic style transfer aims to match the color and tone of an input image to that of a style target while preserving the content and details of the original scene. Although existing large image models can facilitate these kinds of appearance edits, their high computational demands, potential for hallucinations, and limited user control make them unsuitable for high-resolution, real-time workflows. We introduce Hist2Style, a bilateral-grid formulation for fast, edge-aware stylization that preserves visual fidelity by constraining operations to locally affine transforms in bilateral space. Our model distills a large image editing model into a lightweight network by training on a large supervised corpus generated with language and vision-language models, targeting spatially varying color edits. The network conditions on a histogram-based embedding of the style target to provide an interpretable interface for adjusting the output style by modifying the target color distribution. Overall, Hist2Style maintains content structure by construction, avoids hallucinations, and supports real-time, high-resolution photorealistic stylization with interactive user-controllable color and tone adjustments.
We present Echo, a proof-of-concept audio system built around a single 25 M-parameter ViT encoder. The encoder is pretrained with a JEPA objective and then specialised by stages to carry speaker identity, phonetic content, and dynamic source routing in the same 512-dimensional latent space, with no per-task fine-tuning at deployment. Light heads handle diarization (ArcFace + VBx) and dynamic source separation (null-target K-set prediction). On synthetic VoxCeleb2 mixtures with unknown K, the canonical stack reaches 15.00% blind DER, 97.80% PIT separation accuracy with +9.52 dB latent SI-SDR, and a +53.50-point speaker/content factorisation gap on a held-out k-NN probe. The point of Echo is not a new SOTA on any single task but the joint coexistence of three tasks on one encoder at this footprint. We document the design stage by stage, report the dead-ends, and identify the structural wall on end-to-end ASR through the VQ bottleneck that still bounds the PoC.
Reliable autonomous UAV swarms in Search and Rescue (SAR) missions require fault-tolerant coordination capable of sustaining operations despite agent degradation. This paper introduces the Intelligent Replanning Drone Swarm (IRDS), a distributed coordination architecture designed for resource-constrained environments. The proposed framework employs a Reverse-Auction market mechanism where agents bid to service search sectors based on a distance-weighted cost function, coupled with a geometric consensus protocol for target verification. We evaluate the approach through physics-based simulations (N=8 agents, 8x8 grid) subjected to stochastic fault injection. Results indicate that the swarm autonomously reallocates tasks from failed agents with low latency relative to the total mission duration, maintaining a mission success rate of 93% under 25% workforce degradation. The proposed framework demonstrates a robust, empirically tested method for self-healing aerial robotic coordination.
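The reverse-auction reallocation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the IRDS implementation: the cost function (Euclidean distance plus a per-task load penalty), the `weight` parameter, and the agent and sector data layout are all hypothetical.

```python
import math

def bid(agent_pos, sector_pos, load, weight=1.0):
    """Hypothetical distance-weighted cost: travel distance plus a load penalty."""
    return math.dist(agent_pos, sector_pos) + weight * load

def reverse_auction(agents, sectors):
    """Assign each sector to the lowest-bidding agent that is still alive.

    agents:  dict name -> {"pos": (x, y), "alive": bool}
    sectors: dict sector_id -> (x, y)
    """
    # Only healthy agents take part; load counts sectors already won.
    load = {name: 0 for name, a in agents.items() if a["alive"]}
    assignment = {}
    for sid in sorted(sectors):
        if not load:
            break  # no healthy agent left to bid
        # Lowest bid wins; the agent name breaks ties deterministically.
        winner = min(
            load,
            key=lambda n: (bid(agents[n]["pos"], sectors[sid], load[n]), n),
        )
        assignment[sid] = winner
        load[winner] += 1
    return assignment

agents = {
    "a": {"pos": (0.0, 0.0), "alive": True},
    "b": {"pos": (7.0, 7.0), "alive": True},
}
sectors = {0: (1.0, 0.0), 1: (6.0, 7.0)}

before = reverse_auction(agents, sectors)   # each agent wins its nearby sector
agents["b"]["alive"] = False                # inject a fault on agent "b"
after = reverse_auction(agents, sectors)    # agent "a" takes over both sectors
```

Re-running the auction after a fault is the simplest form of self-healing: the failed agent stops bidding, so its sectors go to the next-cheapest bidder.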
Diffusion models have shown remarkable success in video generation. However, whether such models are truly aware of the 3D structure underlying visual observations, rather than simply reproducing plausible 2D projections, remains an open question. In this work, we investigate this question through human motion control, a task that requires precise modelling of 3D human geometry, motion, camera viewpoint, and scene context. Unlike prior methods that rely on rendered 2D motion guidance videos, we propose a render-free framework that conditions video generation directly on compressed 3D human mesh tokens. This representation preserves full 3D geometric information while enabling a unified token-based generation pipeline that processes video tokens jointly with motion tokens in a DiT-based architecture. This design requires the model to reason jointly about appearance, 3D structure, and camera viewpoint during video generation. Experimental results demonstrate strong performance on human motion control benchmarks, while reducing artifacts induced by view-dependent 2D guidance and trajectory-pose mismatches during editing. These findings suggest that video diffusion models, when equipped with mesh tokenization, can better capture complex 3D human structures and their interactions with the surrounding environment.
This paper develops a switched event-triggered adaptive boundary control for a class of reaction-diffusion PDE-ODE cascade systems, where the system and input matrices in the ODE as well as the spatially-varying reaction coefficient in the PDE are uncertain. A two-step backstepping transformation is constructed to derive the continuous-time control law. Then a novel dynamic event-triggered control strategy for the PDE-ODE cascade is proposed based on a switched event-triggering mechanism, ensuring global exponential stability of the closed-loop system in place of the exponential convergence commonly achieved with backstepping-based classical dynamic ETC, while inherently excluding Zeno behavior. To address the uncertainties in the PDE-ODE cascade, adaptive update laws are developed, leading to time-varying gain kernels that are adaptively scheduled through the event-triggered control mechanism. Furthermore, to facilitate efficient real-time implementation, deep neural operators (DeepONet) are employed to approximate the backstepping kernels as mappings from the estimated parameters to kernel functions, thereby eliminating the need to repeatedly solve kernel PDEs online. Through a Lyapunov analysis that incorporates the effects of the event-triggering mechanism, parameter adaptation, and kernel approximation errors, we prove the $L^2$ global asymptotic regulation of the resulting closed-loop system. In summary, the key contributions of the paper are threefold: (i) developing an adaptive DeepONet-based framework for reaction-diffusion PDE-ODE cascade systems; (ii) extending the existing adaptive event-triggered control design for reaction-diffusion PDEs to the case with more complex uncertainties; and (iii) generalizing switched dynamic ETC with global exponential stability to PDE-ODE cascades. The effectiveness of the proposed approach is demonstrated through numerical simulations.
Effective scheduling in the energy sector is essential to ensure the reliable operation of electrical grids and their connected assets by, for instance, optimizing the dispatch of generation units and storage systems. An effective planning strategy must (a) accommodate advanced and potentially non-linear system models -- exploiting the increasing data availability of modern grids, and (b) explicitly handle uncertainties arising, for instance, from the integration of renewable energy sources. While existing approaches can address either non-linearity (e.g., Monte Carlo Tree Search) or uncertainty (e.g., stochastic mathematical optimization), there is a lack of planning techniques capable of addressing both challenges simultaneously. To bridge this gap, we propose a Stochastic Scenario-Structured Tree Search (S3TS) algorithm that explicitly represents uncertainty through scenario trees while enabling the integration of advanced non-linear models. We evaluate S3TS on a simulated demand response signal publication problem, largely mimicking the imbalance settlement mechanism in Belgium. The results demonstrate near-optimal performance in linear, analytically tractable settings, with costs within 14% of the mathematically optimal solution conditioned to the scenario trees. In highly non-linear scenarios, S3TS significantly outperforms baseline methods, achieving cost reductions of up to 51% and 5.4% compared to a myopic algorithm and deterministic MCTS, respectively.
Robust state estimation is central to robotic autonomy, yet classical Kalman filters struggle with frequency-dependent disturbances and model mismatch such as sensor vibrations, electromagnetic interference, and periodic noise. Although Deep Kalman Filter (DKF) variants extend the Extended Kalman Filter (EKF) framework by learning latent transitions, they lack explicit mechanisms to suppress band-limited noise components that typically corrupt sensor measurements in real-world scenarios. We introduce the Frequency-Weighted Neural Kalman Filter (FW-NKF), a unified hybrid approach that embeds a causal spectral-shaping operator into the Kalman measurement residual and jointly learns observation and transition networks. By adapting both the filter spectrum and the latent state representation, FW-NKF attenuates the noise-dominated frequency bands while capturing complex residual structures. We conduct extensive experiments on four heterogeneous benchmarks, including chaotic systems such as multi-dimensional Lorenz systems and full-body inertial pose estimation, and find a reduction in localization error of up to 10% as well as marked improvements in orientation accuracy. Our ablation studies confirm that frequency weighting and deep latent-state modeling both contribute to overall performance.
Packet networks are controlled dynamical systems with discontinuities, delayed observations, and partial state information. Adaptive or learning-driven proposers can improve performance, but an unsafe proposal may still cause starvation, tail-delay spikes, or unstable queue behaviour. This paper treats packet-network control as an executed-action certification problem. A certified operator sits between any proposer and the dataplane. At each control tick, the proposer emits an arbitrary candidate action $\tilde u(t)$. The operator either projects it to an executable action $u(t)$ that satisfies a configuration-compiled certificate, or reports INFEASIBLE and executes an always-defined fallback with quantified slack. The certificate also exports an auditable envelope $\bar z(t)$ for downstream composition. The guarantees are conditional and explicit. They apply on ticks where the operator reports CERTIFIED, the declared arrival envelope and backlog bound are valid, and the platform realises the assumed service lower bound. Under these conditions, one mechanism covers backlog caps, service floors, mitigation caps, Foster--Lyapunov drift constraints, and compositional envelope contracts. We prove operator-level safety, feed-forward compositional safety and stability using exported envelopes, and a cyclic closure result under a small-gain condition. We also define breach and infeasibility semantics, discuss calibration of the service-tracking factor that links certified targets to realised scheduler behaviour, and evaluate the design under delayed telemetry, delayed actuation, weak proposers, envelope mismatch, overload, and millisecond-scale certification. The present evaluation validates the certified execution boundary in a byte-level closed-loop backend; deployment-level scheduler tracking is left to future Linux or hardware experiments.
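The project-or-fall-back behaviour of the certified operator can be illustrated for a single queue. This is a toy sketch under assumptions that are not taken from the paper: the action is a scalar service rate, the certificate combines a service floor with a one-step backlog cap, and the fallback is to serve at full capacity. The function name `certify` and all of its parameters are hypothetical.

```python
def certify(u_candidate, backlog, arrival_bound, backlog_cap,
            service_floor, capacity):
    """Project a candidate service rate onto a certified interval.

    Assumed toy certificate for one control tick:
      service_floor <= u <= capacity                  (service floor, hardware limit)
      backlog + arrival_bound - u <= backlog_cap      (one-step backlog cap)
    Returns (status, executed_action, slack).
    """
    # The smallest rate that satisfies both the floor and the backlog cap.
    lo = max(service_floor, backlog + arrival_bound - backlog_cap)
    hi = capacity
    if lo > hi:
        # No executable action satisfies the certificate: run the fallback
        # (full capacity here) and report by how much the cap is missed.
        return "INFEASIBLE", capacity, lo - hi
    # Euclidean projection of a scalar onto [lo, hi] is a clamp.
    return "CERTIFIED", min(max(u_candidate, lo), hi), 0.0

# A proposer asks for zero service; the operator raises it to the certified minimum.
ok = certify(0.0, backlog=8, arrival_bound=3, backlog_cap=10,
             service_floor=0.5, capacity=5)

# Overload: even full capacity cannot keep the backlog under the cap.
overload = certify(9.0, backlog=20, arrival_bound=3, backlog_cap=10,
                   service_floor=0.5, capacity=5)
```

The proposer is never trusted in this arrangement: whatever it emits, the executed action is either inside the certified interval or an explicitly flagged fallback with a reported slack.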
Detecting coordination among unmanned aerial vehicle (UAV) fleets operating in shared airspace and identifying the route-lead aircraft whose navigation decisions govern fleet behavior presents a fundamental speed--accuracy trade-off: fast methods enable real-time traffic management but sacrifice detection fidelity, while accurate methods may exceed the time budget for actionable airspace deconfliction. This paper presents a game-theoretic decision framework that resolves this trade-off by formulating method selection as a two-player zero-sum game between a Monitor (selecting computational methods and parameters) and Nature (selecting the unknown traffic scenario). We construct an end-to-end pipeline from trajectory surveillance data through eight candidate detection algorithms, a Monte Carlo sensitivity analysis characterizing their stochastic performance, and finally a multi-objective optimization layer that identifies Pareto-optimal method portfolios. The minimax solution provides a robust mixed strategy with a probability distribution over methods that guarantees worst-case performance regardless of scenario uncertainty. Experimental evaluation across 200 randomized configurations spanning 5--50 aircraft demonstrates that the framework recommends distinct method portfolios depending on operational priority: Koopman Phase dominates balanced (70.6%) and speed-priority (79.7%) profiles, while CRQA emerges as primary (47.4%) when route-lead identification is prioritized. The framework achieves a guaranteed game value of 0.29--0.53 (normalized utility) across all tested preference profiles, providing the first principled, scenario-adaptive methodology for computational method selection in UTM fleet monitoring operations.
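The minimax mixed strategy can be made concrete on a toy game with two methods and two scenarios. The sketch below uses the standard closed-form solution for 2x2 zero-sum games; the utility values are invented for illustration, and the paper's setting with eight candidate methods would need a general solver such as a linear program.

```python
def solve_2x2_zero_sum(A):
    """Maximin mixed strategy for the row player of a 2x2 zero-sum game.

    A[i][j] is the row player's (Monitor's) utility when the Monitor picks
    method i and Nature picks scenario j.
    Returns (probability of playing row 0, game value).
    """
    (a, b), (c, d) = A
    maximin = max(min(a, b), min(c, d))   # best guaranteed pure-strategy payoff
    minimax = min(max(a, c), max(b, d))   # Nature's best pure-strategy cap
    if maximin == minimax:
        # A pure saddle point exists: play the row that attains it.
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        return p, maximin
    # No saddle point: mix so that Nature is indifferent between columns.
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return p, value

# Hypothetical normalised utilities: rows = methods, columns = scenarios.
A = [[0.8, 0.2],
     [0.3, 0.6]]
p, value = solve_2x2_zero_sum(A)

# Expected utility of the mixed strategy against each pure scenario.
against_col0 = p * A[0][0] + (1 - p) * A[1][0]
against_col1 = p * A[0][1] + (1 - p) * A[1][1]
```

Because the mixed strategy yields the same expected utility against either scenario, the game value is a worst-case guarantee that holds whichever scenario Nature selects.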
Incentive design studies how a central authority can influence strategic agents through payments, subsidies, or taxes, so that individual objectives align with collective welfare. This paper introduces a No-Regret Adaptive Incentive Design (RAID) framework for nonlinear games with continuous action spaces and private agent costs. In this framework, the authority (planner) designs incentives that regulate the Nash equilibrium toward a socially optimal action profile, while simultaneously learning agents' unknown preferences from repeated strategic responses. We formulate the RAID problem and construct a least-squares estimator whose strong consistency requires only diminishing excitation. Leveraging this weak excitation requirement, we propose a switching incentive policy that alternates between probing (exploration) and estimate-based (exploitation) incentives. The resulting policy achieves an $O(t^{-0.5})$ parameter estimation rate and accumulates $O(t^{0.5}\log t)$ squared social-cost regret, almost surely. We further extend the framework to an endogenous-noise response model, where standard least-squares estimation is biased due to an error-in-variables correlation between the noise and agent responses. We utilize a repeated-sampling estimator and corresponding switching policy that retain the same almost-sure convergence and regret rates. Numerical experiments validate the effectiveness and predicted convergence rates of the method.
Autonomous robots that interact with people must make safe and efficient decisions under human-induced uncertainty, such as their preferences, goals, competency, and willingness to cooperate. Safety filters are a popular approach for ensuring safety in interactive robotics, since their modular design separates safety from performance, allowing robots to operate safely around people with minimal impact on task efficiency. While traditional safety filters typically operate only in the physical space, neglecting the robot's ability to learn and adapt online, the recently proposed belief-space safety filter (BeliefSF) reasons about robot safety in closed-loop with runtime inference that actively reduces the robot's uncertainty online, thereby reducing conservativeness in filtering. However, providing formal safety guarantees for robots deploying BeliefSF remains a significant challenge due to errors in runtime inference and neural approximation of safety filters required to handle the high dimensionality of belief spaces. In this paper, we propose an algorithmic approach to certify high-probability safety of BeliefSF using conformal prediction, while explicitly accounting for the reliability of the robot's runtime inference module. Our method leverages the structure of belief-space safety filtering by focusing verification on a region where inference is expected to be reliable. It preserves the simplicity and sample complexity of standard conformal prediction, yet can certify a substantially less conservative safety filter. Through a simulated human-vehicle interaction benchmark, we show that our approach verifies a significantly more permissive belief-space safety filter than a standard conformal prediction baseline.
This paper presents an autoencoder with ordered variance (AEO), in which the conventional reconstruction loss is augmented by a variance-based regularization term that promotes an ordered structure within the latent space. In this structure, the latent variables are ordered by their variance computed over the training data, facilitating systematic determination of the latent space dimensionality. The AEO is further extended using residual networks, resulting in a ResNet-based AEO (RAEO). Both AEO and RAEO lead to the discovery of nonlinear relationships among variables in unlabeled datasets, thereby enabling unsupervised static model extraction. Theoretical contributions include formal guarantees on the ordering of latent variances. The practical utility of the framework is demonstrated through its application to the identification of nonlinear steady-state models and their use in real-time optimization, with a continuous stirred tank reactor process serving as a representative case study.
This paper presents a method to synthesize neural network controllers to maximize reward subject to the hard constraint that the feedback system of plant and controller be dissipative, certifying requirements such as stability and $L_2$ gain bounds. It considers nonlinear and uncertain plants, modeled as the interconnection of a linear time-invariant (LTI) system and an uncertainty block, which incorporates nonlinearities. The uncertainty of the plant and the activation functions of the neural network are both described using integral quadratic constraints (IQCs). First, a dissipativity condition is derived for uncertain LTI systems. Second, this condition is used to construct a linear matrix inequality (LMI) which can be used to synthesize neural network controllers. Finally, this convex condition is used in a projection-based training method to synthesize neural network controllers with dissipativity guarantees. Numerical examples on an inverted pendulum and a flexible rod on a cart are provided to demonstrate the effectiveness of this approach.
This paper addresses the problem of identifying a nonlinear state-space model, along with an adequate model order, from a given input-output training dataset. To this end, a novel framework, termed state-space neural network with ordered variance (SSNNO), is proposed. In SSNNO, the state variables are ordered according to their variances computed using the training data. This ordering is achieved by introducing a variance-regularization term into the loss function used for SSNNO training, and it facilitates a distinction between significant states, which exhibit high variance, and the remaining residual states, which have near-zero variance. The number of significant states is indicative of a suitable model order. The variance-regularization mechanism is designed to minimize the number of significant state variables, thereby promoting a minimal order of the identified state-space model without significantly compromising its prediction accuracy. A systematic procedure is then introduced to obtain a reduced-order state-space model from the trained SSNNO, yielding a reduced-order SSNNO (R-SSNNO). The existence of an SSNNO with variance-ordered states, based solely on input-output data, as well as an upper bound on its output prediction error, are formally established. A practical and robust method is proposed for ensuring variance-ordered states in an SSNNO, even when the network is trained using local optimization algorithms. The effectiveness of the proposed method for identification of nonlinear state-space models is demonstrated through simulation studies on a nonlinear continuous stirred-tank reactor process. The identified model is further used for state estimation and prediction in a model predictive control implementation.
Spatial transcriptomics (ST) provides essential spatial context by mapping gene expression within tissue, enabling detailed study of cellular heterogeneity and tissue organization. However, aligning ST data with histology images poses challenges due to inherent spatial distortions and modality-specific variations. Existing methods largely rely on direct alignment, which often fails to capture complex cross-modal relationships. To address these limitations, we propose a novel framework that aligns gene and image features using a ranking-based alignment loss, preserving relative similarity across modalities and enabling robust multi-scale alignment. To further enhance the alignment's stability, we employ self-supervised knowledge distillation with a teacher-student network architecture, effectively mitigating disruptions from high dimensionality, sparsity, and noise in gene expression data. Extensive experiments on seven public datasets that encompass gene expression prediction, slide-level classification, and survival analysis demonstrate the efficacy of our method, showing improved alignment and predictive performance over existing methods.
Wireless time-sensitive networking (WTSN) is essential for Industrial Internet of Things. We address the problem of minimizing time slots needed for WTSN transmissions while ensuring reliability subject to interference constraints -- an NP-hard task. Existing semidefinite programming (SDP) methods can relax and solve the problem but suffer from high polynomial complexity. We propose a sparse interference graph-aided SDP (SIG-SDP) framework that exploits the interference's sparsity arising from attenuated signals between distant user pairs. First, the framework utilizes the sparsity to establish the upper and lower bounds of the minimum number of slots and uses binary search to locate the minimum within the bounds. Here, for each searched slot number, the framework optimizes a positive semidefinite (PSD) matrix indicating how likely user pairs share the same slot, and the constraint feasibility with the optimized PSD matrix further refines the slot search range. Second, the framework designs a matrix multiplicative weights (MMW) algorithm that accelerates the optimization, achieved by only sparsely adjusting interfering user pairs' elements in the PSD matrix while skipping the non-interfering pairs. We also design an online architecture to deploy the framework to adjust slot assignments based on real-time interference measurements. Simulations show that the SIG-SDP framework converges in near-linear complexity and is highly scalable to large networks. The framework minimizes the number of slots with up to 10 times faster computation and up to 100 times lower packet loss rates than compared methods. The online architecture demonstrates how the algorithm complexity impacts dynamic networks' performance.
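The outer loop described above, a binary search over the slot count between a lower and an upper bound, can be sketched independently of the SDP machinery. In the sketch below the feasibility oracle is an exact backtracking graph colouring, which is a swapped-in stand-in for the paper's PSD-matrix optimisation and only practical for small graphs. The bounds (1 and maximum degree plus 1) are generic graph-colouring bounds, not the paper's sparsity-derived ones, and all function names are hypothetical.

```python
def colorable(adj, k):
    """Return a slot assignment using at most k slots, or None if impossible.

    adj maps each user pair (node) to the list of nodes it interferes with.
    Exact backtracking search, used here in place of the SDP relaxation.
    """
    nodes = sorted(adj, key=lambda v: -len(adj[v]))  # most-constrained first
    color = {}

    def place(i):
        if i == len(nodes):
            return True
        v = nodes[i]
        used = {color[u] for u in adj[v] if u in color}
        for c in range(k):
            if c not in used:
                color[v] = c
                if place(i + 1):
                    return True
                del color[v]
        return False

    return dict(color) if place(0) else None

def min_slots(adj):
    """Binary search for the smallest feasible number of slots."""
    lo = 1
    hi = max((len(n) for n in adj.values()), default=0) + 1  # greedy bound
    best = colorable(adj, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        found = colorable(adj, mid)
        if found is not None:
            hi, best = mid, found   # feasible: try fewer slots
        else:
            lo = mid + 1            # infeasible: need more slots
    return hi, best

# Five user pairs interfering in a ring (an odd cycle needs three slots).
ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
slots, assignment = min_slots(ring)
```

The binary search is valid because feasibility is monotone in the slot count: any assignment that works with k slots also works with k + 1.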
Wi-Fi 7 introduces the restricted target wake time (RTWT) mechanism, which is vital for Industrial IoT (IIoT) applications requiring periodic, reliable, and low-latency communication. RTWT enables deterministic channel access by assigning scheduled transmission slots to stations (STAs), minimizing contention and interference. However, determining efficient RTWT slot assignments remains challenging in dense networks, where conventional interference graph-based models lack flexibility and scalability. To overcome this, we propose a scalable interference graph learning (IGL) framework that learns optimal interference graph representations for graph coloring-based RTWT scheduling. The IGL leverages an evolution strategy (ES) to train a neural network (NN) using a single network-wide reward, avoiding costly edge-wise feedback. Furthermore, a deep hashing function (DHF) groups interfering STAs, limiting training and inference to relevant subsets and greatly reducing complexity. Simulation results demonstrate that the proposed IGL improves slot efficiency by up to 25\% and reduces packet losses by up to 30\% in dynamic environments. Thanks to DHF, it also reduces the training and inference time of IGL by 4 and 8 times, respectively, and the online slot assignment time by 3 times in large networks.
A radiomap represents the spatial distribution of wireless signal strength, which is critical for applications like network optimization. However, constructing a radiomap relies on measuring radio signal power across the entire system, which is costly in outdoor environments due to large network scales. We present RadCloudSplat, a framework that extends 3D Gaussian Splatting (3DGS) to radio frequencies for efficient and accurate radiomap extrapolation from sparse measurements. RadCloudSplat models environmental scatterers and radio paths using 3D Gaussians, capturing key factors of radio wave propagation. It employs a relaxed-mean (RM) scheme to reparameterize the positions of 3D Gaussians from noisy and dense 3D point clouds. A camera-free 3DGS-based projection is proposed to map 3D Gaussians onto 2D radio beam patterns. Furthermore, a regularized loss function and recursive fine-tuning using highly structured sparse measurements in real-world settings are applied to ensure robust generalization. Experiments on synthetic and real-world data show state-of-the-art extrapolation accuracy and execution speed, solidifying the framework's credibility for real-world deployment.
Hyperspectral image (HSI) analysis plays a critical role in remote sensing, agriculture, and environmental monitoring. However, traditional methods often struggle to handle the high dimensionality, spectral redundancy, and noise inherent in HSI data, limiting their accuracy and scalability. Recently, diffusion models, including denoising diffusion probabilistic models and other generative frameworks based on stochastic differential equations, have shown strong potential in capturing complex spectral-spatial structures and generating high-fidelity HSI data. These models offer effective solutions for tasks such as noise suppression, data augmentation, classification, and anomaly detection. This review presents a systematic summary of recent advances in diffusion models for HSI processing. We categorize existing methods, highlight their strengths in handling high-dimensional data, and compare their performance with conventional approaches. Special attention is given to critical applications such as change detection and post-disaster anomaly identification. The review also discusses current limitations, such as computational cost and training stability, and outlines potential research directions. Our main contributions can be summarized as follows: we provide a systematic taxonomy of diffusion-based HSI methods, examine their applications across major remote sensing tasks, and offer perspectives on potential directions for future research. With these efforts, this review seeks to support the community in harnessing deep learning models to achieve more effective and efficient hyperspectral image analysis.
We study parameterizations of stabilizing nonlinear policies for learning-based control. We propose a structure based on a nonlinear version of the Youla-Kucera parameterization combined with robust neural networks such as the recurrent equilibrium network (REN). The resulting parameterizations are unconstrained, and hence can be searched over with first-order optimization methods, while always ensuring closed-loop stability by construction. We study the combination of (a) nonlinear dynamics, (b) partial observation, and (c) incremental closed-loop stability requirements (contraction and Lipschitzness). We find that for the combination of (c) with either (a) or (b), a contracting and Lipschitz Youla parameter always leads to contracting and Lipschitz closed loops. However, if all three hold, then incremental stability can be lost with exogenous disturbances. Instead, a weaker condition is maintained, which we call d-tube contraction and Lipschitzness. We further obtain converse results showing that the proposed parameterization covers all contracting and Lipschitz closed loops for certain classes of nonlinear systems. Numerical experiments illustrate the utility of our parameterization when learning controllers with built-in stability certificates for: (i) ``economic'' rewards without stabilizing effects; (ii) short training horizons; and (iii) uncertain systems.
We present a true-dynamics-agnostic, statistically rigorous framework for establishing exponential stability and safety guarantees of closed-loop, data-driven nonlinear control. Central to our approach is the novel concept of conformal robustness, which robustifies the Lyapunov and zeroing barrier certificates of data-driven dynamical systems against model prediction uncertainties using conformal prediction. It quantifies these uncertainties by leveraging rank statistics of prediction scores over system trajectories, without assuming any specific underlying structure of the prediction model or distribution of the uncertainties. With the quantified uncertainty information, we further construct the conformally robust control Lyapunov function (CR-CLF) and control barrier function (CR-CBF), data-driven counterparts of the CLF and CBF, for fully data-driven control with statistical guarantees of finite-horizon exponential stability and safety. The performance of the proposed concept is validated in numerical simulations with four benchmark nonlinear control problems.
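The rank-statistic step behind conformal prediction can be written down directly. The sketch below is standard split conformal calibration, not the paper's CR-CLF or CR-CBF construction: given calibration scores (for example, model prediction errors along held-out trajectories, which is an assumption here), it returns the threshold that a fresh exchangeable score stays below with probability at least 1 - alpha. The score values are invented.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal threshold from calibration scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when the calibration set is too small for the requested coverage.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # not enough calibration data for this alpha
    return sorted(scores)[rank - 1]

# Hypothetical prediction-error scores from seven calibration trajectories.
scores = [0.30, 0.10, 0.50, 0.20, 0.40, 0.70, 0.60]
margin = conformal_quantile(scores, alpha=0.25)    # rank 6 of 7
too_few = conformal_quantile(scores[:3], alpha=0.1)  # rank 4 > n = 3
```

One plausible use of `margin` is as a tightening term in a certificate inequality, so that a Lyapunov decrease or barrier condition checked on the learned model still holds for the true system with the stated probability. That step depends on the certificate's regularity and is not shown here.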
Medical image segmentation is a fundamental task in computer-aided diagnosis and treatment. Existing approaches based on CNNs, ViTs, Mamba, and hybrid models still suffer from limitations such as restricted receptive fields, high computational cost, or insufficient accuracy. Recently, Vision Receptive-field Weighted Key-Value (VRWKV) models have emerged as a promising alternative, delivering strong long-range dependency modeling for visual tasks. However, current studies on VRWKV-based medical image segmentation mainly focus on hybrid architectures trained from scratch, while the potential of large-scale pretrained pure VRWKV models remains unexplored. In this work, we systematically investigate the effectiveness of pure VRWKV architectures for medical image segmentation. We construct Med-URWKV-T and Med-URWKV-S by reusing pretrained VRWKV encoders at different scales and pairing them with pure VRWKV decoders, enabling a comprehensive evaluation of pretrained pure VRWKV models in this domain. To further enhance performance, we propose two VRWKV-compatible modules: a Frequency-Aware Wavelet Attention (FAWA) module, which exploits wavelet transforms to capture edge details and structural characteristics, and a Multi-Scale Channel Fusion (MSCF) module, which integrates multi-scale features to strengthen informative channel representations. By incorporating them into Med-URWKV-T, we obtain the enhanced model Med-URWKV†. Extensive experiments on five medical image segmentation datasets demonstrate that Med-URWKV achieves performance comparable or superior to state-of-the-art methods and carefully designed hybrid VRWKV architectures. Moreover, Med-URWKV† further improves segmentation accuracy, surpassing Med-URWKV-S while using only half of its parameter count, and achieves the highest average Dice similarity coefficient of 88.00%. The code will be released.
Recent developments in the Internet of Bio-Nano-Things (IoBNT) are laying the foundation for innovative healthcare applications that envision a network of remotely coordinated nanodevices within the human body to monitor and actuate over potential diseases. However, interconnecting such nanodevices requires communication strategies that can cope with molecular communication (MC) channels, whose complex, stochastic, and dynamic behavior often makes accurate physical modeling infeasible. To explore the limits of nanodevice interconnectivity under these conditions, this survey focuses on data-driven communication strategies for MC systems, with particular emphasis on machine learning (ML) methods and neural network (NN) architectures for a robust and adaptive communication scheme at the nanoscale. Research on NN-enabled MC spans several aspects covered in this survey, including NNs for communication in IoBNT networks, the feasibility of biocompatible NN realization, explainable approaches, and the generation of training datasets. We also include open-source code examples to support reproducible research across key MC scenarios. Finally, we identify emerging challenges, including the need for robust NN architectures, biologically integrated NN modules, and scalable training strategies.
Low-altitude wireless networks (LAWNs) have been envisioned as flexible and transformative platforms for enabling delay-sensitive control applications in Internet of Things (IoT) systems. In this work, we investigate the real-time wireless control over a LAWN system, where an aerial drone is employed to serve multiple mobile automated guided vehicles (AGVs) via finite blocklength (FBL) transmission. Toward this end, we adopt the model predictive control (MPC) to ensure accurate trajectory tracking, while we analyze the communication reliability using the outage probability. Subsequently, we formulate an optimization problem to jointly determine control policy, transmit power allocation, and drone trajectory by accounting for the maximum travel distance and control input constraints. To address the resultant non-convex optimization problem, we first derive the closed-form expression of the outage probability under FBL transmission. Based on this, we reformulate the original problem as a quadratic programming (QP) problem, followed by developing an alternating optimization (AO) framework. Specifically, we employ the projected gradient descent (PGD) method and the successive convex approximation (SCA) technique to achieve computationally efficient sub-optimal solutions. Furthermore, we thoroughly analyze the convergence and computational complexity of the proposed algorithm. Extensive simulations and AirSim-based experiments are conducted to validate the superiority of our proposed approach compared to the baseline schemes in terms of control performance.
This work addresses the critical challenge of guaranteeing safety for complex dynamical systems where precise mathematical models are uncertain and data measurements are corrupted by noise. We develop a physics-guided, direct data-driven framework for synthesizing robust safety controllers for discrete-time nonlinear polynomial systems that are subject to unknown-but-bounded disturbances. To do so, we introduce a notion of safety through robust control barrier certificates, which ensure avoidance of unsafe regions, offering a less conservative alternative to existing methods based on robust invariant sets. To achieve data efficiency, we further integrate physical information, formulated as quadratic constraints on system and control matrices, with observed noisy data. This integration drastically reduces data requirements, enabling robust safety analysis with significantly shorter trajectories compared to purely data-driven methods. The proposed synthesis procedure is formulated as a sum-of-squares optimization program that systematically designs the barrier and its associated controller by leveraging both collected data and underlying physical laws. The efficacy of our framework is demonstrated on three benchmark systems, confirming its ability to offer robust safety guarantees with reduced data demands.
Oscillometry is the standard method for non-invasive, cuff-based blood pressure (BP) measurement, but it introduces systematic errors that may impact clinical accuracy. This study investigates the sources of these errors--primarily the limitations of oscillometry itself and respiration-induced fluctuations--using BP waveform data from the MIMIC database. Oscillometry tends to underestimate systolic BP and overestimate diastolic BP, while respiration introduces cyclical variations that further degrade measurement precision. To mitigate these effects, we propose an estimation-theoretic framework employing least squares (LS) and maximum likelihood (ML) methods for correcting both single and repeated BP measurements. LS estimation supports conventional multi-measurement averaging protocols, whereas the ML approach incorporates prior knowledge of measurement errors, offering improved performance. Our results demonstrate that leveraging statistical priors across multiple readings can enhance the accuracy of non-invasive BP monitoring, with potential implications for improving cardiovascular diagnosis and treatment.
Millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems operate over wide bandwidths and frequency-selective channels, making orthogonal frequency-division multiplexing (OFDM) a natural transmission scheme. In such systems, fully digital precoding is often impractical because the large antenna arrays require high hardware cost and power consumption, so hybrid precoding that combines digital and radio frequency (RF) processing is an attractive alternative. However, OFDM introduces high signal peaks that may cause clipping and generate out-of-band (OOB) emissions, while practical, nonideal phase shifters (PSs) at the RF precoder and user combiner suffer from phase errors. We study the problem of robust digital-RF precoding optimization for the downlink sum-rate maximization in multi-user (MU) MIMO-OFDM systems under maximum transmit power, clipping, and OOB emission mask constraints. The formulated maximization problem is nonconvex and difficult to solve. We propose a weighted minimum mean squared error (WMMSE) based block coordinate descent (BCD) method to iteratively optimize digital-RF precoders at the transmitter and digital-RF combiners at the users. Low-cost and scalable optimization approaches are proposed to efficiently solve the BCD subproblems. Extensive simulations demonstrate the efficiency of the proposed approaches and their superiority over well-known benchmarks.
Artificial intelligence (AI) is anticipated to emerge as a pivotal enabler for the forthcoming sixth-generation (6G) wireless communication systems. However, current research efforts regarding large AI models for wireless communications primarily focus on fine-tuning pre-trained large language models (LLMs) for specific tasks. This paper investigates the large-scale AI model designed for beamforming optimization to adapt and generalize to diverse tasks defined by system utilities and scales. We propose a novel framework based on bidirectional encoder representations from transformers (BERT), termed BERT4beam. We aim to formulate the beamforming optimization problem as a token-level sequence learning task, perform tokenization of the channel state information, construct the BERT model, and conduct task-specific pre-training and fine-tuning strategies. Based on the framework, we propose two BERT-based approaches for single-task and multi-task beamforming optimization, respectively. Both approaches are generalizable for varying user scales. Moreover, the former can adapt to varying system utilities and antenna configurations by re-configuring the input and output module of the BERT model, while the latter, termed UBERT, can directly generalize to diverse tasks, due to a finer-grained tokenization strategy. Extensive simulation results demonstrate that the two proposed approaches can achieve near-optimal performance and outperform existing AI models across various beamforming optimization tasks, showcasing strong adaptability and generalizability.
Designing socially optimal policies in multi-agent environments is a fundamental challenge in both economics and artificial intelligence. This paper studies a general framework for learning Stackelberg equilibria in dynamic and uncertain environments, where a single leader interacts with a population of adaptive followers. Motivated by pressing real-world challenges such as equitable electricity tariff design for consumers with distributed energy resources (such as rooftop solar and energy storage), we formalize a class of Stackelberg Markov games and establish the existence and uniqueness of stationary Stackelberg equilibria under mild continuity and monotonicity conditions. We then extend the framework to incorporate a continuum of agents via mean-field approximation, yielding a tractable Stackelberg-Mean Field Equilibrium (S-MFE) formulation. To address the computational intractability of exact best-response dynamics, we introduce a softmax-based approximation and rigorously bound its error relative to the true Stackelberg equilibrium. Our approach enables scalable and stable learning through policy iteration without requiring full knowledge of follower objectives. We validate the framework on an energy market simulation, where a public utility or a state utility commission sets time-varying rates for a heterogeneous population of prosumers. Our results demonstrate that learned policies can simultaneously achieve economic efficiency, equity across income groups, and stability in energy systems. This work demonstrates how game-theoretic learning frameworks can support data-driven policy design in large-scale strategic environments, with applications to real-world systems like energy markets.
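The softmax-based approximation of best-response dynamics mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact form used, so the temperature parameter and the follower payoffs below are assumptions chosen for illustration; the sketch only shows that a softmax over action values approaches the exact (argmax) best response as the temperature shrinks.

```python
import math

def softmax_response(action_values, temperature):
    """Smooth stand-in for a best response: higher-valued actions get more probability."""
    # Subtract the maximum before exponentiating for numerical stability.
    best = max(action_values)
    weights = [math.exp((v - best) / temperature) for v in action_values]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical follower payoffs for three actions (not taken from the paper).
values = [1.0, 2.0, 0.5]
smooth = softmax_response(values, temperature=1.0)   # spreads probability mass
sharp = softmax_response(values, temperature=0.05)   # nearly the argmax response
```

Lower temperatures track the exact best response more closely but make the response less smooth, which is the trade-off an error bound of this kind would have to quantify.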
Pathology whole-slide images (WSIs) are widely used for cancer survival analysis because of their comprehensive histopathological information at both cellular and tissue levels, enabling quantitative, large-scale, and prognostically rich tumor feature analysis. However, most existing methods in WSI survival analysis struggle with limited interpretability and often overlook predictive uncertainty in heterogeneous slide images. In this paper, we propose DPsurv, a dual-prototype whole-slide image evidential fusion network that outputs uncertainty-aware survival intervals, while enabling interpretation of predictions through patch prototype assignment maps, component prototypes, and component-wise relative risk aggregation. Experiments on five publicly available datasets achieve the highest mean concordance index and the lowest mean integrated Brier score, validating the effectiveness and reliability of DPsurv. The interpretation of prediction results provides transparency at the feature, reasoning, and decision levels, thereby enhancing the trustworthiness and interpretability of DPsurv.
Spatial audio enhances immersion by reproducing 3D sound fields, with Ambisonics offering a scalable format for this purpose. While first-order Ambisonics (FOA) notably facilitates hardware-efficient acquisition and storage of sound fields as compared to high-order Ambisonics (HOA), its low spatial resolution limits realism, highlighting the need for Ambisonics upscaling (AU) as an approach for increasing the order of Ambisonics signals. In this work, we propose DiffAU, a cascaded AU method that leverages recent developments in diffusion models combined with novel adaptation to spatial audio to generate third-order Ambisonics from FOA. By learning data distributions, DiffAU provides a principled approach that rapidly and reliably reproduces HOA in various settings. Experiments in anechoic conditions with multiple speakers show strong objective and perceptual performance.
Digital control has become increasingly widespread in modern power electronic converters. When acquiring feedback signals such as the inductor current, synchronizing the analog-to-digital converter (ADC) with the digital pulse-width modulator (DPWM) is commonly employed to accurately track their steady-state average. However, the small-signal implications of such synchronization have not been investigated. This paper presents an exact small-signal model for digitally controlled buck converters operating in forced continuous-conduction mode (FCCM) under constant-frequency current-mode control, explicitly accounting for DPWM-ADC synchronization. Using a sampled-data framework, the proposed model captures all sideband effects introduced by the sampling process, yielding precise predictions of both analog and digital loop gains, even at frequencies beyond the switching and sampling frequencies. Both asymmetrical and symmetrical carrier modulations are considered. Furthermore, the digital loop gain is derived in closed form using the modified z-transform, enabling low-complexity compensator design and stability assessment. Within this framework, the analog loop gain can be directly obtained from the digital loop gain, thereby eliminating the need for computationally intensive infinite series evaluations. The validity of the proposed model is confirmed through both simulation and experimental results.
Networked integrated sensing and communication (ISAC) has emerged as a pivotal paradigm for next-generation wireless networks, where dedicated target monitoring terminals (TMTs) can be extensively leveraged for their low-cost flexible deployment and capability to facilitate bistatic and multistatic sensing. Nevertheless, the coordinated beamforming design for networked ISAC tailored for time-of-arrival (ToA)-based multi-TMT localization remains largely unexplored. To address this gap, we present a comprehensive study in this paper. Specifically, we first establish signal models for both communication and localization, and, for the first time, derive a closed-form Cramer-Rao lower bound (CRLB) to quantify the localization performance. Leveraging this CRLB, we formulate two optimization problems focusing on sensing-centric and communication-centric criteria, respectively, to thoroughly investigate the fundamental communication-localization trade-offs. For the sensing-centric problem, we develop a globally optimal algorithm based on semidefinite relaxation (SDR), applicable to scenarios where the number of BS antennas exceeds the total number of communication users. In parallel, for the communication-centric problem, we design a globally optimal algorithm for the single-BS case utilizing bisection search. To address the general cases of both problems, we propose a unified and efficient successive convex approximation (SCA)-based algorithm, which is further extended to multi-target scenarios. Finally, simulation results demonstrate the effectiveness of our proposed algorithms, reveal the intrinsic trade-offs between communication and localization, and further show that deploying more TMTs is more beneficial than deploying more BSs in networked ISAC systems.
This paper presents a fully data-driven 3-D path-following framework for autonomous underwater vehicles (AUVs), a representative class of underwater field robotics, based on Data-Enabled Predictive Control (DeePC). The approach eliminates explicit hydrodynamic modeling by exploiting measured input-output trajectories to predict and optimize future system behavior. Classic DeePC is employed for heading control, while a cascaded DeePC architecture with loop-frequency separation is proposed for depth regulation, extending DeePC to plants whose dominant output evolves significantly slower than the actuator bandwidth. For 3-D waypoint path following, the Adaptive Line-of-Sight (ALOS) guidance law is extended to a predictive multistep formulation (PALOS) that supplies the horizon-consistent reference required by receding-horizon predictive controllers. All methods are validated in high-fidelity 6-degree-of-freedom simulation on the REMUS~100 AUV under nominal operation, ocean-current disturbances, operation beyond the data regime, and 3-D waypoint path following, consistently outperforming the corresponding state-of-the-art benchmarks. In 3-D waypoint path following, the framework reduces cross-track error by approximately 28\% relative to the ALOS-PI/PID baseline.
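DeePC-style predictors are built from Hankel matrices of recorded input-output data. The following sketch shows only that data structure for a scalar signal; the function name and the toy trajectory are hypothetical, and the cascaded architecture described above is not reproduced here.

```python
def hankel_matrix(trajectory, depth):
    """Stack length-`depth` windows of the trajectory as columns of a Hankel matrix."""
    columns = len(trajectory) - depth + 1
    if columns < 1:
        raise ValueError("trajectory is shorter than the requested depth")
    # Row i, column j holds trajectory[i + j], so anti-diagonals are constant.
    return [[trajectory[i + j] for j in range(columns)] for i in range(depth)]

# Toy recorded output samples (illustrative only).
recorded_output = [1, 2, 3, 4, 5]
H = hankel_matrix(recorded_output, depth=3)
```

Each column of `H` is one observed window of the signal; a data-driven predictor of this kind expresses future behavior as a linear combination of such columns instead of using an explicit model.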
The Koopman operator and extended dynamic mode decomposition (EDMD) as a data-driven technique for its approximation have attracted considerable attention as a key tool for modeling, analysis, and control of complex dynamical systems. However, extensions towards control-affine systems resulting in bilinear surrogate models suffer from demanding data requirements, which complicates their application. In this paper, we propose a framework for data-fitting of control-affine mappings to increase the robustness margin in the associated system identification problem and, thus, to provide reliable bilinear EDMD schemes. In particular, guidelines for input selection based on subspace angles are deduced such that a desired threshold with respect to the minimal singular value is ensured. Moreover, we derive necessary and sufficient conditions of optimality for maximizing the minimal singular value. Further, we demonstrate the usefulness of the proposed approach using bilinear EDMD with control for nonholonomic robots.
Accurate Channel State Information (CSI) is critical for Hybrid Beamforming (HBF) tasks. However, obtaining high-resolution CSI remains challenging in practical wireless communication systems. To address this issue, we propose to utilize Graph Neural Networks (GNNs) and score-based generative models to enable robust HBF under imperfect CSI conditions. Firstly, we develop the Hybrid Message Graph Attention Network (HMGAT) which updates both node and edge features through node-level and edge-level message passing. Secondly, we design a Bidirectional Encoder Representations from Transformers (BERT)-based Noise Conditional Score Network (NCSN) to learn the distribution of high-resolution CSI, facilitating CSI generation and data augmentation to further improve HMGAT's performance. Finally, we present a Denoising Score Network (DSN) framework and its instantiation, termed DeBERT, which can denoise imperfect CSI under arbitrary channel error levels, thereby facilitating robust HBF. Experiments on DeepMIMO urban datasets demonstrate the proposed models' superior generalization, scalability, and robustness across various HBF tasks with perfect and imperfect CSI.
This study presents a systematic evaluation of time-frequency feature design for binaural sound source localization (SSL), focusing on how feature selection influences model performance across diverse conditions. We investigate the performance of a convolutional neural network (CNN) model using various combinations of amplitude-based features (magnitude spectrogram, interaural level difference - ILD) and phase-based features (phase spectrogram, interaural phase difference - IPD). Evaluations on in-domain and out-of-domain data with mismatched head-related transfer functions (HRTFs) reveal that carefully chosen feature combinations often outperform increases in model complexity. While two-feature sets such as ILD + IPD are sufficient for in-domain SSL, generalization to diverse content requires richer inputs combining channel spectrograms with both ILD and IPD. Using the optimal feature sets, our low-complexity CNN model achieves competitive performance. Our findings underscore the importance of feature design in binaural SSL and provide practical guidance for both domain-specific and general-purpose localization.
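For readers unfamiliar with the two interaural features, the sketch below computes ILD and IPD for a single time-frequency bin from complex left and right spectrogram values. The left-over-right sign convention, the `eps` guard, and the example values are assumptions for illustration, not details taken from the study.

```python
import cmath
import math

def ild_db(left, right, eps=1e-12):
    """Interaural level difference in dB for one time-frequency bin."""
    # eps avoids division by zero and log of zero in silent bins.
    return 20.0 * math.log10((abs(left) + eps) / (abs(right) + eps))

def ipd_rad(left, right):
    """Interaural phase difference in radians, wrapped to (-pi, pi]."""
    return cmath.phase(left * right.conjugate())

# Hypothetical bin: left channel twice as loud and 0.3 rad ahead in phase.
left_bin = cmath.rect(2.0, 0.5)
right_bin = cmath.rect(1.0, 0.2)
ild = ild_db(left_bin, right_bin)
ipd = ipd_rad(left_bin, right_bin)
```

ILD depends only on magnitudes and IPD only on phases, which is why the two are natural complements to the magnitude and phase spectrograms listed above.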
This paper considers multi-view imaging in a sixth-generation (6G) integrated sensing and communication network, which consists of a transmit base-station (BS), multiple receive BSs connected to a central processing unit (CPU), and multiple extended targets. Our goal is to devise an effective multi-view imaging technique that can jointly leverage the targets' echo signals at all the receive BSs to precisely construct the image of these targets. To achieve this goal, we propose a two-phase approach. In Phase I, each receive BS recovers an individual image based on the sample covariance matrix of its received signals. Specifically, we propose a novel covariance-based imaging framework to jointly estimate effective scattering intensity and grid positions, which reduces the number of estimated parameters leveraging channel statistical properties and allows grid adjustment to conform to target geometry. In Phase II, the CPU fuses the individual images of all the receivers to construct a high-quality image of all the targets. Specifically, we design edge-preserving natural neighbor interpolation (EP-NNI) to map individual heterogeneous images onto common and finer grids, and then propose a joint optimization framework to estimate fused scattering intensity and BS fields of view. Extensive numerical results show that the proposed scheme significantly enhances imaging performance, facilitating high-quality environment reconstruction for future 6G networks.
Electroencephalography (EEG) signals are inherently non-linear, non-stationary, and vulnerable to noise sources, making the extraction of discriminative features a long-standing challenge. In this work, we investigate the non-linear Teager-Kaiser Energy Operator (TKEO) for modeling the underlying energy dynamics of EEG in three representative tasks: motor imagery, emotion recognition, and epilepsy detection. To accommodate the narrowband nature of the operator, we employ Gabor filterbanks to isolate canonical frequency bands, followed by the Energy Separation Algorithm to decompose the TKEO output into an amplitude envelope and instantaneous frequency components. We then derive a set of energy descriptors based on this demodulation and compare their classification performance against established EEG features. The proposed TKEO-based pipeline offers an intuitive, physiologically grounded framework for capturing EEG signal dynamics, while remaining simple, training-free, and data-efficient. Our findings suggest that combining TKEO features with conventional ones improves Balanced Accuracy by approximately 15 percent in epilepsy detection, yields modest gains in motor imagery, and achieves on-par performance in emotion recognition, reflecting the pipeline's ability to capture transient neural dynamics.
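As a point of reference, the standard discrete-time TKEO is psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below applies it to a pure tone, where the output is the constant A^2 * sin^2(omega); this coupling of amplitude and frequency is what makes the operator suited to narrowband inputs. The test tone and its parameters are illustrative assumptions, and the Gabor filterbank and Energy Separation Algorithm stages of the pipeline are not shown.

```python
import math

def tkeo(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    # The operator needs one neighbour on each side, so the output is two samples shorter.
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Pure tone A * cos(omega * n) with illustrative amplitude and digital frequency.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(50)]
energy = tkeo(signal)

# For a pure tone the TKEO output is constant: A^2 * sin(omega)^2.
expected = (amplitude * math.sin(omega)) ** 2
```

Because the output mixes amplitude and frequency, a separate demodulation step is needed to recover the envelope and the instantaneous frequency individually.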
This paper introduces a novel Lyapunov-based small-gain methodology for establishing fixed-time stability (FxTS) guarantees in interconnected dynamical systems. Specifically, we consider interconnections in which each subsystem admits an individual fixed-time input-to-state stability (ISS) Lyapunov function that certifies FxT-ISS. We then show that if a nonlinear small-gain condition is satisfied, then the entire interconnected system is FxTS. Our results are analogous to existing Lyapunov-based small-gain theorems developed for asymptotic and finite-time stability, thereby filling an important gap in the stability analysis of interconnected dynamical systems. The proposed theoretical tools are further illustrated through analytical and numerical examples, including the first result on fixed-time feedback optimization of dynamical systems without time-scale separation between the plant and the controller.
The excellent structural and piezoresistive properties of continuous carbon fiber make it suitable for both structural and sensing applications. This work studies the use of 3D printed, continuous carbon fiber reinforced beams as self-sensing structures. It is demonstrated how the sensitivity of these carbon fiber strain gauges can be increased irreversibly by a pretreatment in which the sensors are pre-stressed with a large compressive bending load. The increase in the gauge factor is attributed to local progressive fiber failure, due to the combination of the thermal residual stress from the printing process and external loading. The coextrusion of conductive filament around the carbon fibers is demonstrated as a means of improving the reliability, noise performance, and electrical connection of the sensors. A micrograph of the sensor cross section shows that the conductive filament contacts the various carbon fiber bundles. All in all, pre-stressing carbon fiber strain gauges in combination with coextrusion of conductive filament holds promise for 3D printed structural sensors with high sensitivity.
This paper presents an analysis and rigorous procedure for determining the optimal lengths of line standards in multiline thru-reflect-line (TRL) calibration of vector network analyzers (VNAs). The solution is obtained through nonlinear constrained optimization of the eigenvalue problem in multiline TRL calibration. Additionally, we propose a simplified approach for near-optimal length selection based on predefined sparse rulers. Alongside the length calculation, we discuss the required number of lines to meet bandwidth requirements. The proposed methods are validated through measurements of multiple multiline TRL calibration kits on printed circuit boards of different materials and stackups, covering frequencies up to 150 GHz. A measurement-based Monte Carlo uncertainty analysis, using error boxes derived from impedance standard substrate measurements, demonstrates that the proposed line lengths distribute calibration uncertainty more evenly across lines compared to a commercial calibration kit. Practical examples are provided for various applications, including lossy and dispersive lines, as well as banded solutions for waveguides.
The temporal evolution of the propagation environment plays a central role in integrated sensing and communication (ISAC) systems. A slow-time evolution manifests as channel aging in communication links, while a fast-time one is associated with non-zero Doppler clutter. Nevertheless, the joint impact of these two phenomena on ISAC performance has been largely overlooked. This paper addresses this research gap in a network utilizing orthogonal frequency division multiplexing waveforms. Here, a base station simultaneously serves a user equipment (UE) device and performs monostatic sensing. Channel aging is captured through an autoregressive model with exponential correlation decay. Clutter is modeled as a collection of uncorrelated, coherent patches with non-zero Doppler, resulting in a Kronecker-separable covariance structure. We propose an aging-aware channel estimator that uses prior pilot observations to estimate the time-varying UE channel, characterized by a non-isotropic multipath fading structure. The clutter's structure enables a novel low-complexity pre-detection radar processing pipeline: clutter statistics are estimated from raw data and subsequently used to suppress the clutter's action, after which range-angle and range-velocity maps are computed. We evaluate the influence of frame length and pilot history on channel estimation accuracy and demonstrate substantial performance gains over block fading in low-to-moderate mobility regimes. The sensing pipeline is implemented in a clutter-dominated environment, demonstrating that effective clutter suppression can be achieved under practical configurations. We analyze the robustness of our proposed pipeline against non-separable clutter by introducing a controllable degree of non-separability. Our results highlight the benefit of sensing streams and that our pipeline can withstand a moderate degree of non-separability.
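To make the exponential correlation decay concrete, the sketch below simulates a first-order autoregressive process and compares its empirical lag correlation with rho^lag. The model order, the real-valued scalar channel, and the value of `rho` are simplifying assumptions; the abstract does not specify them.

```python
import math
import random

def simulate_ar1(rho, steps, seed=0):
    """Real-valued AR(1) process h[t] = rho*h[t-1] + sqrt(1-rho^2)*w[t] with unit variance."""
    rng = random.Random(seed)
    h = rng.gauss(0.0, 1.0)
    path = [h]
    innovation_scale = math.sqrt(1.0 - rho ** 2)
    for _ in range(steps - 1):
        h = rho * h + innovation_scale * rng.gauss(0.0, 1.0)
        path.append(h)
    return path

def empirical_correlation(path, lag):
    """Sample correlation between the zero-mean process and its copy shifted by lag >= 1."""
    pairs = list(zip(path[:-lag], path[lag:]))
    mean_product = sum(a * b for a, b in pairs) / len(pairs)
    variance = sum(a * a for a in path) / len(path)
    return mean_product / variance

rho = 0.9  # illustrative per-step correlation
path = simulate_ar1(rho, steps=50000)
lag3 = empirical_correlation(path, lag=3)  # should be close to rho ** 3
```

The farther apart two channel samples are, the less an old pilot observation says about the current channel, which is the effect an aging-aware estimator has to weigh.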
Electromagnetic formation flying (EMFF) is challenging due to the complex coupling between the electromagnetic fields generated by each satellite in the formation. To address this challenge, this article uses alternating magnetic field forces (AMFF) to decouple the electromagnetic forces between each pair of satellites. The key idea of AMFF is that a pair of alternating (e.g., sinusoidal) magnetic moments results in a nonzero time-averaged interaction force if and only if those alternating magnetic moments have the same frequency. Hence, the approach in this article is to drive each satellite's electromagnetic actuation system with a sum of sinusoids, where each frequency is common to only a pair of satellites. Then, the amplitudes of each sinusoid are modulated (i.e., controlled) to achieve the desired forces between each pair of satellites. The main contribution of this article is an experimental demonstration of 3-satellite decentralized closed-loop EMFF using AMFF. To the authors' knowledge, this is the first demonstration of AMFF with at least 3 satellites in open or closed loop. This is noteworthy because the coupling challenges of EMFF are only present with more than 2 satellites, and thus, a formation of at least 3 is necessary to evaluate the effectiveness of AMFF. The experiments are conducted on a ground-based testbed consisting of 3 electromagnetically actuated satellites on linear air tracks. The closed-loop experiments demonstrate decentralized EMFF with AMFF where the maximum steady-state formation error stays within $\pm$0.01 m and the settling time is less than 30 s. These experiments validate the decoupling of intersatellite forces through frequency-multiplexed AMFF. The closed-loop experimental results are compared with the behavior of numerical simulations.
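The frequency-matching idea behind AMFF can be checked numerically with a scalar stand-in: the interaction force is bilinear in the two magnetic moments, so its time average follows the time average of the product of the two sinusoids. The frequencies, duration, and sample count below are illustrative assumptions, and the geometry-dependent force coefficient is omitted.

```python
import math

def time_averaged_product(freq_a, freq_b, duration=1.0, samples=1000):
    """Average of sin(2*pi*freq_a*t) * sin(2*pi*freq_b*t) over a whole number of periods."""
    total = 0.0
    for k in range(samples):
        t = duration * k / samples
        total += math.sin(2 * math.pi * freq_a * t) * math.sin(2 * math.pi * freq_b * t)
    return total / samples

# Same frequency on both satellites: the average is nonzero (one half for unit amplitudes).
same = time_averaged_product(3.0, 3.0)
# Different frequencies: the product averages out to zero.
different = time_averaged_product(3.0, 5.0)
```

Assigning one frequency to each satellite pair therefore lets every pairwise force be commanded through the amplitudes at that frequency, without cross-talk from the other pairs in the time-averaged sense.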
Scaling Multimodal Large Language Models (MLLMs) to long-form speech is bottlenecked by the explosive growth of input tokens. Unlike images or videos, audio lacks overlapping information, making extreme 1-token compression highly susceptible to the loss of fine-grained acoustic cues. To overcome this, we propose FastSLM, a token-efficient architecture featuring the Hierarchical Temporal Abstractor (HTA). HTA progressively distills non-overlapping acoustic features across multiple temporal scales, achieving an extreme compression rate of 1.67 tokens per second (a 97% reduction) without losing critical context. Experimental results show that FastSLM achieves competitive performance with state-of-the-art models on long-form benchmarks despite operating with significantly fewer FLOPs and parameters. The source code and model checkpoints are available at this https URL.
Camera-based visible light positioning (VLP) is a promising technique for accurate and low-cost indoor camera pose estimation (CPE). To reduce the number of required light-emitting diodes (LEDs), advanced methods commonly exploit LED shape features for positioning. Although interesting, they are typically restricted to a single LED geometry, leading to failure in heterogeneous LED-shape scenarios. To address this challenge, this paper investigates Lamé curves as a unified representation of common LED shapes and proposes a generic VLP algorithm using Lamé curve-shaped LEDs, termed LC-VLP. In the considered system, multiple ceiling-mounted Lamé curve-shaped LEDs periodically broadcast their curve parameters via visible light communication, which are captured by a camera-equipped receiver. Based on the received LED images and curve parameters, the receiver can estimate the camera pose using LC-VLP. Specifically, an LED database is constructed offline to store the curve parameters, while online positioning is formulated as a nonlinear least-squares problem and solved iteratively. To provide a reliable initialization, a correspondence-free perspective-n-points (FreePnP) algorithm is further developed, enabling approximate CPE without any pre-calibrated reference points. The performance of LC-VLP is verified by both simulations and experiments. Simulations show that LC-VLP outperforms state-of-the-art methods in both circular- and rectangular-LED scenarios. Compared to a perspective arcs algorithm, LC-VLP reduces both the average position error and the average rotation error by over 30%. Experiments further show that LC-VLP can achieve an average position accuracy of less than 4 cm.
In recent years, unmanned aerial vehicles (UAVs) equipped with imaging sensors and automated processing algorithms have emerged as a promising tool to accelerate large-area surveys while reducing risk to human operators. Although hyperspectral imaging (HSI) enables material discrimination using spectral signatures, standardized benchmarks for UAV-based landmine detection remain scarce. In this work, we present a systematic benchmark of four classical statistical detection algorithms, namely Spectral Angle Mapper (SAM), Matched Filter (MF), Adaptive Cosine Estimator (ACE), and Constrained Energy Minimization (CEM), alongside a proposed lightweight Spectral Neural Network utilizing Parametric Mish activations for PFM-1 landmine detection. We also release pixel-level binary ground truth masks (target/background) to enable standardized, reproducible evaluation. Evaluations were conducted on inert PFM-1 targets across multiple scene crops using a recently released VNIR hyperspectral dataset. Metrics such as receiver operating characteristic (ROC) curve, area under the curve (AUC), precision-recall (PR) curve, and average precision (AP) were used. While all methods achieve high ROC-AUC on an independent test set, the ACE method attains the highest AUC of 0.989. However, because target pixels are extremely sparse relative to background, ROC-AUC alone can be misleading; under precision-focused evaluation (PR and AP), the Spectral-NN outperforms classical detectors, achieving the highest AP. These results emphasize the need for precision-focused evaluation, scene-aware benchmarking, and learning-based spectral models for reliable UAV-based hyperspectral landmine detection. The code and pixel-level annotations will be released.
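A short worked example shows why ROC-based metrics can look strong while precision stays low when targets are sparse. The operating point and target prevalence below are hypothetical numbers chosen for illustration, not results from the benchmark.

```python
def precision_from_rates(tpr, fpr, prevalence):
    """Precision implied by a ROC operating point at a given target prevalence."""
    true_positives = tpr * prevalence            # fraction of all pixels correctly flagged
    false_positives = fpr * (1.0 - prevalence)   # fraction of all pixels wrongly flagged
    return true_positives / (true_positives + false_positives)

# Hypothetical detector: 90% detection rate at a 1% false-alarm rate.
sparse = precision_from_rates(tpr=0.9, fpr=0.01, prevalence=0.001)   # 1 target pixel in 1000
balanced = precision_from_rates(tpr=0.9, fpr=0.01, prevalence=0.5)   # balanced classes
```

At the same ROC operating point, roughly 8% of flagged pixels are true targets in the sparse case versus about 99% in the balanced case, which is the gap that PR curves and AP expose.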
Credit-based congestion pricing (CBCP) and discount-based congestion pricing (DBCP), which respectively allot travel credits and toll discounts to subsidize low-income users' access to tolled roads, have emerged as promising policies for alleviating the societal inequity concerns of congestion pricing. However, since real-world implementations of CBCP and DBCP are nascent, their relative merits remain unclear. In this work, we compare the efficacy of deploying CBCP and DBCP in reducing user costs and increasing toll revenues. We first formulate a non-atomic congestion game in which low-income users receive a travel credit or toll discount for accessing tolled lanes. We establish that, in our formulation, Nash equilibrium flows always exist and can be computed or well approximated via convex programming. Our main result establishes a set of practically relevant conditions under which DBCP provably outperforms CBCP in inducing equilibrium outcomes that minimize a given societal cost, which encodes user cost reduction and toll revenue maximization. Finally, we validate our theoretical contributions via a case study of the 101 Express Lanes Project, a CBCP program implemented in the San Francisco Bay Area.
Attitude estimation methods typically rely on full vector measurements from inertial sensors such as accelerometers and magnetometers. This paper shows that reliable estimation can also be achieved using only scalar measurements, which naturally arise either as components of vector readings or as independent constraints from other sensing modalities. We propose nonlinear deterministic observers on $\mathbf{SO}(3)$ that incorporate gyroscope bias compensation and guarantee uniform local exponential stability under suitable observability conditions. A key feature of the framework is its robustness to partial sensing: accurate estimation is maintained even when only a subset of vector components is available. Experimental validation on the BROAD dataset confirms consistent performance across progressively reduced measurement configurations, with estimation errors remaining small even under severe information loss. To the best of our knowledge, this is the first work to establish fundamental observability results showing that two scalar measurements under suitable excitation suffice for attitude estimation, and that three are enough in the static case. These results position scalar-measurement-based observers as a practical and reliable alternative to conventional vector-based approaches.
Vertical farming is a controlled-environment agriculture (CEA) approach in which crops are grown in stacked layers under regulated climate and lighting, enabling predictable production but requiring high electricity input. This study quantifies the techno-economic impact of roof-mounted daylighting in a three-tier container vertical farm using a light-pipe (LP) system that delivers sunlight to the upper tier. The optical chain, comprising a straight duct and a tilting aluminum-coated mirror within a rotating dome, was modelled in Tonatiuh to estimate crop-level photon delivery and solar gains. These outputs were coupled with a transient AGRI-Energy model to perform year-round simulations for Dubai. Tier-3 strategies were compared against a fully LED benchmark, including daylight-only operation, on/off supplementation, PWM dimming, UV-IR filtering, variable-transmittance control, and simple glazing. Ray-tracing predicted an overall LP optical efficiency of 45%-75%, depending on solar position, quantifying the fraction of incident daylight at the collector aperture delivered to the target growing zone. Daylight-only operation reduced the total three-tier yield by 17% and was not economically viable despite 27-29% electricity savings. Hybrid daylight-LED strategies preserved benchmark yield while reducing electricity use. PWM dimming combined with UV-IR filtering achieved the lowest specific electricity energy consumption (6.32 kWh/kg), 14% below the benchmark. Overall, viability remains CAPEX-limited because achievable electricity savings are insufficient to offset the added investment and thus improves mainly under high electricity and carbon-price contexts, although the LP system delivers a 15-38% lower light cost than an optical-fiber reference under identical incident daylight.
In this study, we propose a two-party computation protocol for approximate matrix multiplication of fixed-point numbers. The proposed protocol is provably secure under standard lattice-based cryptographic assumptions and enables matrix multiplication at a desired approximation level within a single round of communication. We demonstrate the feasibility of the protocol by applying it to the secure implementation of a linear control law. Our evaluation reveals that the client achieves lower online computational complexity compared to the original controller computation, while ensuring the privacy of controller inputs, outputs, and parameters. Furthermore, a numerical example confirms that the proposed method maintains sufficient precision of control inputs even in the presence of approximation and quantization errors.
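The quantization error that the abstract above refers to can be illustrated without any cryptography. The following is a minimal plaintext sketch, not the proposed protocol: the helper names `to_fixed` and `fixed_matmul`, the choice of 12 fractional bits, and the gain and state values are all assumptions made for illustration.

```python
def to_fixed(value, frac_bits):
    """Encode a real number as an integer with `frac_bits` fractional bits."""
    return round(value * (1 << frac_bits))


def fixed_matmul(a, b, frac_bits):
    """Multiply two real matrices through their fixed-point integer encodings.

    Each product of two encoded entries carries 2 * frac_bits fractional
    bits, so the integer result is divided by 2 ** (2 * frac_bits).
    """
    a_fx = [[to_fixed(v, frac_bits) for v in row] for row in a]
    b_fx = [[to_fixed(v, frac_bits) for v in row] for row in b]
    scale = 1 << (2 * frac_bits)
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a_fx[i][k] * b_fx[k][j] for k in range(inner)) / scale
         for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical controller gain K and state x (illustrative values only).
K = [[0.5, -1.25], [2.0, 0.125]]
x = [[0.3], [-0.7]]
u = fixed_matmul(K, x, frac_bits=12)  # approximates the linear control law u = K x
```

Increasing `frac_bits` shrinks the rounding error at the cost of larger integers, which is the kind of precision trade-off a numerical example would need to check.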
This paper investigates the nonlinear dynamics and phase transitions of power packet networks connected by routers, which are conceptualized as macroscopic information ratchets. In the emerging paradigm of cyber-physical energy systems, the interplay between stochastic energy fluctuations and the thermodynamic cost of control information defines fundamental operational limits. We first formulate the dynamics of a single router using a Langevin framework, incorporating an exponential cost function for information acquisition. Our analysis reveals a discontinuous (first-order) phase transition, in which the system strategically abandons regulation once the noise intensity exceeds a critical threshold $D_c$. This transition represents a fundamental information barrier inherent to autonomous energy management. We then extend this model to network configurations in which multiple routers are linked through diffusive coupling and share energy with one another. We demonstrate that the network topology and coupling strength significantly extend the bifurcation points, yielding collectively resilient behavior against local fluctuations. These results provide a rigorous mathematical basis for the design of future complex communication-energy networks, including internal energy management for autonomous robots, and suggest that the stability of the proposed systems is governed by the synergistic balance between physical energy flow and the thermodynamics of information exchange.
This paper presents an overview of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2026 Challenge Task 4, Spatial Semantic Segmentation of Sound Scenes (S5). The S5 task focuses on the joint detection and separation of sound events in complex spatial audio mixtures, contributing to the foundation of immersive communication. First introduced in DCASE 2025, the S5 task continues in DCASE 2026 Task 4 with key changes to better reflect real-world conditions, including allowing mixtures to contain multiple sources of the same class and to contain no target sources. In this paper, we describe the task setting, along with the corresponding updates to the evaluation metrics and dataset. The experimental results of the submitted systems are also reported and analyzed. The official access point for data and code is this https URL.
This paper presents a framework for abstracting uncertain or non-polynomial components of dynamical systems using polynomial constraints. This enables the application of polynomial-based analysis tools, such as sum-of-squares programming, to a broader class of non-polynomial systems. A numerical method for constructing these constraints is proposed. The relationship between polynomial constraints and existing integral quadratic constraints (IQCs) is investigated, providing transformations of IQCs into polynomial constraints. The effectiveness of polynomial constraints in characterizing nonlinearities is validated via numerical examples to compute inner estimates of the region of attraction for two systems.
The growing integration of renewable and decentralized generation increases the need for flexibility in distribution systems. This flexibility, typically represented in a PQ capability curve, is constrained by network limits and topology. Distribution system reconfiguration (DSR) introduces additional degrees of freedom through switching actions. This paper proposes an AC-constrained methodology to assess flexibility under network reconfiguration, explicitly considering radial operation. The impact of topology changes on PQ capability curves, which serve as a measure of flexibility potential, is analyzed. To that end, a novel measure called location-invariant flexibility potential (LI-FP) is introduced. Results show that reconfiguration can significantly influence and improve operational flexibility. The approach presented enables transparency for system operators, facilitating improved coordination of flexibility providers.
Recent advancements in large audio language models have extended Chain-of-Thought (CoT) reasoning into the auditory domain, enabling models to tackle increasingly complex acoustic and spoken tasks. To elicit and sustain these extended reasoning chains, the prevailing paradigm -- driven by the success of text-based reasoning models -- overwhelmingly relies on Reinforcement Learning with Verified Rewards (RLVR). However, as models are strictly optimized to distill rich, continuous auditory contexts into isolated, verifiable text labels, a fundamental question arises: are we fostering true audio intelligence, or merely reducing a continuous sensory medium into a discrete puzzle? We identify this as the "verifiable reward trap." While RLVR yields remarkable scores on standardized objective benchmarks, it systematically degrades the real-world conversational feel of audio models. By prioritizing isolated correctness over acoustic nuance, RLVR reduces dynamic interactions to mechanical "answering machines," severely compromising prosodic naturalness, emotional continuity, and user immersion, particularly in long-turn dialogues. To bridge the gap between mechanical objective verification and genuine sensory empathy, we introduce Step-Audio-R1.5, marking a paradigm shift toward Reinforcement Learning from Human Feedback (RLHF) in audio reasoning. Comprehensive evaluations demonstrate that Step-Audio-R1.5 not only maintains robust analytical reasoning but profoundly transforms the interactive experience, redefining the boundaries of deeply immersive long-turn spoken dialogue.
Robust and accurate calibration of macroscopic traffic flow models such as METANET is critical for reliable prediction and effective control. While gradient-based methods are desirable for high-dimensional parameter spaces, their application to real-world traffic scenarios is hindered by highly nonconvex optimization landscapes. Consequently, standard static calibration frequently yields parameter sets that produce unstable, unrealistic traffic dynamics, undermining confidence in the estimated parameters and compromising the simulation's utility for counterfactual scenario testing. To address this, we propose a dynamic, rolling-horizon calibration framework. By reformulating static one-time estimation into a dynamic control problem, parameters better maintain stability and accuracy amid measurement noise. Using real-world data from the I-24 MOTION testbed, this work empirically characterizes the instability of standard methods. It then shows that the proposed approach simultaneously enhances robustness to perturbations and achieves a 48% improvement in predictive accuracy over conventional static calibration.
This paper presents an efficient implementation of the extended object Poisson multi-Bernoulli (PMB) filter under the zero-inflated Poisson (ZIP) object measurement model using particle belief propagation (BP). The ZIP measurement model separates a Bernoulli object detection event from the conditional Poisson generation of object measurements, enabling principled handling of empty measurement sets. Building upon the PMB mixture posterior, we present a factorized joint posterior over the set of objects with object detection variables and a dual representation of data association using both object-oriented and measurement-oriented association variables. Notably, this representation replaces the implicit high-order global hypothesis constraint with local consistency factors, yielding a factor graph amenable to BP. In addition, we present a particle-based implementation, in which the Poisson intensity for undetected objects is analytic, whereas the single-object densities of the Bernoulli components for the detected objects are represented using particles. Simulation results demonstrate that the proposed method outperforms existing sampling-based implementations of the extended object PMB filter with the ZIP model in terms of both estimation accuracy and runtime.
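To make the role of the ZIP model in handling empty measurement sets concrete, the sketch below evaluates the distribution of the number of measurements generated by one object under a standard zero-inflated Poisson count model. The function name `zip_count_pmf` and the parameters `p_detect` and `rate` are assumed names for illustration; the abstract does not specify the paper's notation.

```python
import math


def zip_count_pmf(k, p_detect, rate):
    """P(object generates exactly k measurements) under a ZIP model.

    With probability 1 - p_detect the object is not detected and produces
    no measurements; with probability p_detect it is detected and produces
    a Poisson(rate) number of measurements, which may also be zero.
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        return (1.0 - p_detect) + p_detect * poisson
    return p_detect * poisson


# Probability of an empty measurement set for illustrative parameter values.
p_empty = zip_count_pmf(0, p_detect=0.9, rate=3.0)
```

The zero count therefore receives mass from two sources, a missed detection and a detected object that happens to generate nothing, which is what distinguishes this model from a plain Poisson count.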
In this paper, for Markov Decision Processes (MDPs) with standard Borel spaces, (i) we first provide a discretization-based approximation method for MDPs with continuous spaces under average cost criteria, and provide error bounds for approximations when the dynamics are only weakly continuous (for asymptotic convergence of errors as the grid sizes vanish) or Wasserstein continuous (with a rate in approximation as the grid sizes vanish) under certain ergodicity assumptions. In particular, we relax the total variation condition given in prior work to weak continuity or Wasserstein continuity. (ii) We provide synchronous and asynchronous (quantized) Q-learning algorithms for continuous spaces via quantization (where the quantized state is taken to be the actual state in the corresponding Q-learning algorithms presented in the paper), and establish their convergence. (iii) We finally show that the convergence is to the optimal Q values of a finite approximate model constructed via quantization, which implies near optimality of the resulting solution.
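The idea of treating the quantized state as the actual state can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm analysed in the paper: the uniform one-dimensional quantizer, the function names, and the use of a relative (reference-state) update for the average-cost setting are all choices made here for the example.

```python
def quantize(x, low, high, n_bins):
    """Map a continuous state in [low, high] to a bin index in 0..n_bins-1."""
    clipped = min(max(x, low), high)
    index = int((clipped - low) / (high - low) * n_bins)
    return min(index, n_bins - 1)  # the upper endpoint falls in the last bin


def relative_q_update(q, s_bin, action, cost, next_bin, step, ref=(0, 0)):
    """One relative Q-learning step on the quantized table (cost minimization).

    The value at the reference entry `ref` is subtracted from the target,
    a common device for average-cost problems (assumed here).
    """
    target = cost + min(q[next_bin]) - q[ref[0]][ref[1]]
    q[s_bin][action] += step * (target - q[s_bin][action])
    return q


# Illustrative transition: continuous states 0.37 -> 0.61 with 4 bins on [0, 1].
q_table = [[0.0, 0.0] for _ in range(4)]
s, s_next = quantize(0.37, 0.0, 1.0, 4), quantize(0.61, 0.0, 1.0, 4)
relative_q_update(q_table, s, action=0, cost=2.0, next_bin=s_next, step=0.5)
```

The table is indexed only by bin indices, so the learner never sees the underlying continuous state; refining the grid corresponds to increasing `n_bins`.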
Major government studies and policy reports project that substantial expansion of interregional transmission will be needed to integrate clean energy and ensure reliability in decarbonized power systems. Using the open-source Switch capacity expansion model with detailed representation of existing U.S. generation and transmission infrastructure, solar, wind, and storage resources, and hourly operations, we evaluate the role of interregional transmission across least-cost, carbon-priced, and zero-emissions scenarios for 2050. An optimal nationwide plan would more than triple interregional transmission capacity, yet this reduces the cost of a zero-emissions system by only 7% relative to relying on existing interregional transmission, as storage, solar and wind siting, and nuclear generation serve as close substitutes. Regional cost and rent effects vary, with transmission generally favoring wind and hydrogen resources over solar and batteries. Sensitivity analysis shows diminishing returns: one-fifth of the benefits of full expansion can be achieved with one-twelfth of the added capacity, while cost reductions for batteries and hydrogen provide comparable or greater system savings than interregional transmission. Upgrading existing interregional corridors with advanced conductors, which roughly doubles capacity per link at half the cost of new builds, reduces system costs by only 1.6%, suggesting that reconductoring benefits are modest and that realizing their full potential likely requires pairing with new connections on key corridors or complementary reductions in battery costs. These results suggest that while substantial transmission expansion is economically justified, a diverse set of flexibility resources can substitute for large-scale grid build-out, and the relative value of transmission is highly contingent on technological and cost developments.
We present the design, development, and experimental validation of BlueME, a compact magnetoelectric (ME) antenna array system for underwater robot-to-robot communication. BlueME employs ME antennas operating at their natural mechanical resonance frequency to efficiently transmit and receive very-low-frequency (VLF) electromagnetic signals underwater. We outline the design, simulation, fabrication, and integration of the proposed system on low-power embedded platforms, focusing on portable and scalable applications. For performance evaluation, we deployed BlueME on an autonomous surface vehicle (ASV) and a remotely operated vehicle (ROV) in open-water field trials. Ocean trials demonstrate that BlueME maintains reliable signal transmission at distances beyond 700 meters while consuming only 10 watts of power. Field trials show that the system operates effectively in challenging underwater conditions such as turbidity, obstacles, and multipath interference -- conditions that generally affect acoustics and optics. Our analysis also examines the impact of complete submersion on system performance and identifies key deployment considerations. This work represents the first practical underwater deployment of ME antennas outside the laboratory and implements the largest VLF ME array system to date. BlueME demonstrates significant potential for marine robotics and automation in multi-robot cooperative systems and remote sensor networks.
Zero-shot learning enables models to generalise to unseen classes by leveraging semantic information, bridging the gap between training and testing sets with non-overlapping classes. While much research has focused on zero-shot learning in computer vision, the application of these methods to environmental audio remains underexplored, with poor performance in existing studies. Generative methods, which have demonstrated success in computer vision, are notably absent from zero-shot environmental sound classification studies. To address this gap, this work investigates generative methods for zero-shot learning in environmental audio. Two successful generative models from computer vision are adapted: a cross-aligned and distribution-aligned variational autoencoder (CADA-VAE) and a leveraging invariant side generative adversarial network (LisGAN). Additionally, we introduce a novel diffusion model conditioned on class auxiliary data. Synthetic embeddings generated by the diffusion model are combined with seen class embeddings to train a classifier. Experiments are conducted on five environmental audio datasets, ESC-50, ARCA23K-FSD, FSC22, UrbanSound8k and TAU Urban Acoustics 2019, and one music classification dataset, GTZAN. Results show that the diffusion model outperforms all baseline methods on average across the six audio datasets. This work establishes the diffusion model as a promising approach for zero-shot learning and introduces the first benchmark of generative methods for zero-shot environmental sound classification, providing a foundation for future research.
We study binary-action pairwise-separable graphical games that encompass both coordination and anti-coordination network games. Our model is grounded in an underlying directed signed graph, where each link is associated with a signed weight that describes both the nature and the strength of the strategic pairwise interaction. Specifically, a positive link weight corresponds to a strategic-complement interaction, whereas a negative link weight corresponds to a strategic-substitute interaction. The utility for each player is then an aggregation of pairwise terms determined by the weights of the signed graph in addition to an individual bias term. We consider a scenario that assumes the presence of a prominent cohesive subset of players, who are either connected exclusively by positive weights, or form a structurally balanced subset that can be bipartitioned into two adversarial subcommunities with positive intra-community and negative inter-community edges. Under suitable properties of the game restricted to the remaining players, our results guarantee the existence of Nash equilibria characterized by either consensus or polarization within the first group, as well as their stability under best response transitions. Our results can be interpreted as robustness results, building on the super-modular properties of network coordination games and on a novel use of the concept of graph cohesiveness.
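A small sketch may help fix ideas about consensus and polarization equilibria on a signed graph. It assumes actions in {-1, +1} and the utility u_i(x) = x_i (b_i + sum_j W_ij x_j), which is one common pairwise-separable form and not necessarily the one used in the paper; the function names and the three-player example are hypothetical.

```python
def local_field(weights, bias, actions, i):
    """Bias of player i plus the signed, weighted sum of the others' actions."""
    return bias[i] + sum(
        weights[i][j] * actions[j] for j in range(len(actions)) if j != i
    )


def is_nash(weights, bias, actions):
    """Check that no player gains by flipping its own action.

    Under the assumed utility, flipping x_i changes u_i from x_i * field
    to -x_i * field, so x_i is a best response iff x_i * field >= 0.
    """
    return all(
        actions[i] * local_field(weights, bias, actions, i) >= 0
        for i in range(len(actions))
    )


# Structurally balanced example: players 0 and 1 are allies, player 2 opposes both.
W = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
b = [0, 0, 0]
polarized = is_nash(W, b, [1, 1, -1])   # camps {0, 1} and {2} take opposite actions
consensus = is_nash(W, b, [1, 1, 1])    # player 2 would prefer to deviate
```

In this toy instance the polarized profile is an equilibrium while consensus is not, matching the intuition that positive links reward matching actions and negative links reward differing ones.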
To steer the behavior of selfish, resource-sharing agents in a socio-technical system towards the direction of higher efficiency, the system designer requires accurate models of both agent behaviors and the underlying system infrastructure. For instance, traffic controllers often use road latency models to design tolls whose deployment can effectively mitigate traffic congestion. However, misspecifications of system parameters may restrict a system designer's ability to influence collective agent behavior toward efficient outcomes. In this work, we study the impact of system misspecifications on toll design for atomic congestion games. We prove that tolls designed under sufficiently minor system misspecifications, when deployed, do not introduce new Nash equilibria in atomic congestion games compared to tolls designed in the noise-free setting, implying a form of local robustness. We then upper bound the degree to which the worst-case equilibrium system performance could decrease when tolls designed under a given level of system misspecification are deployed. We validate our theoretical results via Monte-Carlo simulations as well as realizations of our worst-case guarantees.
Lyrics translation requires both accurate semantic transfer and preservation of musical rhythm, syllabic structure, and poetic style. In animated musicals, the challenge intensifies due to alignment with visual and auditory cues. We introduce Multilingual Audio-Video Lyrics Benchmark for Animated Song Translation (MAVL), the first multilingual, multimodal benchmark for singable lyrics translation. By integrating text, audio, and video, MAVL enables richer and more expressive translations than text-only approaches. Building on this, we propose Syllable-Constrained Audio-Video LLM with Chain-of-Thought (SylAVL-CoT), which leverages audio-video cues and enforces syllabic constraints to produce natural-sounding lyrics. Experimental results demonstrate that SylAVL-CoT significantly outperforms text-based models in singability and contextual accuracy, emphasizing the value of multimodal, multilingual approaches for lyrics translation.
Due to possible devastating consequences, counteracting sensor data attacks is an extremely important topic, which has not seen sufficient study. To the best of our knowledge, this paper develops the first methods that accurately identify/eliminate only the problematic attacked sensor data presented to a sequence estimation/regression algorithm under any attack from our attack model. The approach does not assume a known form for the statistical model of the sensor data, allowing data-driven and machine learning sequence estimation/regression algorithms to be protected. A simple protection approach for attackers not endowed with knowledge of the details of our protection approach is first developed, followed by additional processing for attacks based on protection system knowledge. Experimental results show that the simple approach achieves performance indistinguishable from that of an approach which knows which sensors are attacked. For cases where the attacker has knowledge of the protection approach, experimental results indicate that the additional processing can be configured so that the worst-case degradation under the additional processing with a large number of sensors attacked is significantly smaller than the worst-case degradation of the simple approach, and close to that of an approach which knows which sensors are attacked, with just a slight degradation under no attacks. Mathematical descriptions of the worst-case attacks are used to demonstrate that the additional processing will provide similar advantages for cases for which we do not have numerical results. All the data-driven/machine learning processing used in our approaches employs only unattacked training data.
The development of sixth-generation (6G) mobile networks imposes unprecedented latency and reliability demands on multiple-input multiple-output (MIMO) communication systems, a key enabler of high-speed radio access. Recently, deep unfolding-based detectors, which map iterative algorithms onto neural network architectures, have emerged as a promising approach, combining the strengths of model-driven and data-driven methods to achieve high detection accuracy with relatively low complexity. However, algorithmic innovation alone is insufficient; software-hardware co-design is essential to meet the extreme latency requirements of 6G (i.e., 0.1 milliseconds). This motivates us to propose leveraging in-memory computing, which is an analog computing technology that integrates memory and computation within memristor circuits, to perform the intensive matrix-vector multiplication (MVM) operations inherent in deep MIMO detection at the nanosecond scale. Specifically, we introduce a novel architecture, called the deep in-memory MIMO (IM-MIMO) detector, characterized by two key features. First, each of its cascaded computational blocks is decomposed into channel-dependent and channel-independent neural network modules. Such a design minimizes the latency of memristor reprogramming in response to channel variations, which significantly exceeds computation time. Second, we develop a customized detector-training method that exploits prior knowledge of memristor-value statistics to enhance robustness against programming noise. Furthermore, we conduct a comprehensive analysis of the IM-MIMO detector's performance, evaluating detection accuracy, processing latency, and hardware complexity. Our study quantifies detection error as a function of various factors, including channel noise, memristor programming noise, and neural network size.
Intermittent faults are transient errors that sporadically appear and disappear. Although intermittent faults pose substantial challenges to reliability and coordination, existing studies of fault tolerance in robot swarms focus instead on permanent faults. One reason for this is that intermittent faults are prohibitively difficult to detect in the fully self-organized ad-hoc networks typical of robot swarms, as their network topologies are transient and often unpredictable. However, in the recently introduced self-organizing nervous systems (SoNS) approach, robot swarms are able to self-organize persistent network structures for the first time, easing the problem of detecting intermittent faults. To address intermittent faults in robot swarms that have persistent networks, we propose a novel proactive-reactive strategy for detection and mitigation, based on self-organized backup layers and distributed consensus in a multiplex network. Proactively, the robots self-organize dynamic backup paths before faults occur, adapting to changes in the primary network topology and the robots' relative positions. Reactively, robots use one-shot likelihood ratio tests to compare information received along different paths in the multiplex network, enabling early fault detection. Upon detection, communication is temporarily rerouted in a self-organized way until the detected fault resolves. We validate the approach in representative scenarios of faulty positional data occurring during formation control, demonstrating that intermittent faults are prevented from disrupting convergence to desired formations, with high fault detection accuracy and low rates of false positives.
Individual Head-Related Transfer Functions (HRTFs) are starting to be introduced in many commercial immersive audio applications and are crucial for realistic spatial audio rendering. However, one of the main hesitations regarding their introduction is that creating individual HRTFs is impractical at scale due to the complexities of the HRTF measurement process. To mitigate this drawback, HRTF spatial upsampling has been proposed with the aim of reducing the measurements required. While prior work has seen success with different machine learning (ML) approaches, these models often struggle with long-range preservation of local spatial variation patterns across neighbouring source directions and generalization at high upsampling factors. In this paper, we propose a novel transformer-based architecture for HRTF upsampling, leveraging the attention mechanism to better capture spatial correlations across the HRTF sphere. Working in the spherical harmonic (SH) domain, our model learns to reconstruct high-resolution HRTFs from sparse input measurements with significantly improved accuracy. To enhance spatial coherence, we introduce a neighbour dissimilarity loss that promotes magnitude smoothness, yielding more realistic upsampling. We evaluate our method using both perceptual localization models and objective spectral distortion metrics. Experiments show that our model outperforms existing methods across several evaluation metrics in generating realistic, high-fidelity HRTFs.
We present Seq-DeepIPC, a sequential end-to-end perception-to-control model for legged robot navigation in real-world environments. Seq-DeepIPC advances intelligent sensing for autonomous legged navigation by tightly integrating multi-modal perception (RGB-D + GNSS) with temporal fusion and control. The model jointly predicts semantic segmentation and depth estimation, giving richer spatial features for planning and control. For efficient deployment on edge devices, we use a lightweight model as the encoder, reducing computation while maintaining accuracy. Heading estimation is simplified by removing the noisy IMU and instead deriving global heading via differential analysis of sequential GNSS coordinates. We collected a larger and more diverse dataset that includes both road and grass terrains, and validated Seq-DeepIPC on a robot dog. Comparative and ablation studies show that sequential inputs improve perception and control in our models, while other baselines do not benefit. Seq-DeepIPC achieves competitive or better results with reasonable model size; although GNSS-only heading is less reliable near tall buildings, it is robust in open areas. Overall, Seq-DeepIPC extends end-to-end navigation beyond wheeled robots to more versatile and temporally aware systems. To support future research, we will release the code in our GitHub repository at this https URL.
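One way to derive a global heading from two consecutive GNSS fixes is the standard initial-bearing formula on a sphere, sketched below. The abstract does not say which differencing scheme the authors use, so this is an assumed stand-in; the function name `gnss_heading_deg` and the sample coordinates are hypothetical.

```python
import math


def gnss_heading_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from fix 1 to fix 2, in degrees clockwise from north.

    Inputs are latitude and longitude in decimal degrees. The result is
    only meaningful when the two fixes are far enough apart that the
    displacement exceeds the GNSS position noise.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0


# Two illustrative fixes: moving due north, then due east along the equator.
heading_north = gnss_heading_deg(48.000, 11.000, 48.001, 11.000)
heading_east = gnss_heading_deg(0.0, 0.0, 0.0, 0.001)
```

Because the estimate depends on the displacement between fixes, it degrades when the robot is nearly stationary or when multipath corrupts the positions, which is consistent with the reduced reliability near tall buildings noted above.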
This paper presents a measurement-driven study of early warning for reliability breakdown events in 5G non-standalone (NSA) railway networks. Using 10~Hz metro-train measurement traces with serving- and neighbor-cell indicators, we benchmark six representative learning models, including CNN, LSTM, XGBoost, Anomaly Transformer, PatchTST, and TimesNet, under multiple observation windows and prediction horizons. Rather than proposing a new prediction architecture, this study develops a measurement-driven benchmark to quantify the feasibility and operating trade-offs of seconds-ahead reliability prediction in 5G NSA railway environments. Experimental results show that learning models can anticipate radio link failure (RLF)-related reliability breakdown events seconds in advance using lightweight radio features available on commercial devices. The presented benchmark provides insights for sensing-assisted communication control and offers an empirical foundation for integrating sensing and analytics into future mobility control.
Transmitting information about quantum states over classical noisy channels is an important problem with applications to science, computing, and sensing. This task, however, poses fundamental challenges due to the exponential scaling of state space with system size. We introduce shadow tomography-based transmission with unequal error protection (STT-UEP), a novel communication protocol that enables efficient transmission of properties of quantum states, allowing decoder-side estimation of arbitrary local Pauli observables. Unlike conventional approaches requiring the transmission of a number of bits that is exponential in the number of qubits, STT-UEP achieves communication complexity that scales logarithmically with the number of observables, depending on the observable weight. The protocol exploits classical shadow tomography for measurement efficiency, and applies unequal error protection by encoding measurement bases with stronger channel codes than measurement outcomes. We provide theoretical guarantees on estimation accuracy as a function of the bit error probability of the classical channel, and validate the approach against several benchmarks via numerical results.
Autonomous driving in interactive traffic scenarios remains challenging because of the mutual influence among vehicles and the inherent uncertainty of surrounding agents. Several model predictive control (MPC) formulations have been proposed to address this challenge, each adopting a different model of inter-agent interaction. While higher-fidelity interaction models enable more intelligent behavior, they incur substantially greater computational cost. Since strong interactions arise only occasionally in real traffic, a practical strategy for balancing performance and computational overhead is to invoke an appropriate controller based on situational demands. To this end, we first conduct a comparative study to assess and hierarchize the interactive capabilities of different MPC formulations. Building on this hierarchy, we then develop a neural network-based classifier for situation-aware switching among these controllers. We demonstrate that, by invoking the most advanced interactive MPC only in rare but critical situations and relying on a basic MPC in the majority of situations, situation-aware switching substantially improves overall performance while significantly reducing computational load.
Unified hyperspectral image (HSI) restoration aims to recover diverse degradations within a single model. However, current methods often rely on impractical explicit priors or opaque black-box representations that overfit to training distributions, hampering generalization to unseen scenarios. To bridge this gap, we propose Degradation-Aware Metric Prompting (DAMP), a novel framework that characterizes multi-dimensional degradations through interpretable spatial-spectral metrics. These metrics serve as Degradation Prompts (DP), enabling the model to capture shared characteristics across tasks and adapt to unknown corruptions. Central to our framework is the Degradation-Adaptive Mixture-of-Experts (DAMoE), where Spatial-Spectral Adaptive Modules (SSAMs) serve as experts that utilize learnable fusion coefficients to specialize in distinct degradation degrees. By using DP as a gating router, DAMoE dynamically activates specialized experts tailored to the specific degradation profile. Extensive experiments on natural and remote sensing HSI datasets demonstrate that DAMP achieves state-of-the-art performance and exhibits exceptional zero-shot generalization on unseen restoration tasks. Code is publicly available at this https URL.
Audio Language Models (ALMs) offer a promising shift towards explainable audio deepfake detection (ADD), moving beyond \textit{black-box} classifiers by providing transparency to their predictions via reasoning traces. However, such reasoning may not support the model predictions, reflecting poor coherence, or, worse, may rationalize incorrect predictions with plausible but misleading explanations. Moreover, the behavior of ALM reasoning under adversarial attacks remains under-explored, raising questions about the practical reliability of such explanation capabilities. To address this gap, this study introduces \textbf{SARA} (\textbf{S}hift \textbf{A}nalysis of \textbf{R}easoning in \textbf{A}udio), a diagnostic framework that evaluates ALM reasoning across three dimensions: acoustic perception, reasoning-verdict coherence, and dissonance. We test five open-source ALMs against both acoustic and linguistic adversarial attacks. We show that acoustic attacks significantly degrade reasoning-verdict coherence (average decrease of 14.20\%), frequently inducing internal logical conflicts. Conversely, linguistic attacks achieve higher attack success rates while maintaining reasoning coherence. We further demonstrate that the textual coherence of generated reasoning traces also serves as a latent indicator of adversarial inputs, enabling effective detection of perturbed audio (0.78 in F1) \textit{without accessing the raw acoustic signal}. These findings suggest that reasoning traces provide diagnostic utility that persists even when final classification outputs are compromised.
Speech tokenizers are a key building block of fully discrete Speech LLMs. Existing tokenizers either prioritize semantic encoding, fuse semantic content with acoustic style inseparably, or achieve incomplete semantic-acoustic disentanglement. To achieve better disentanglement, we propose DSA-Tokenizer, which explicitly disentangles speech into discrete semantic and acoustic tokens via distinct optimization this http URL, semantic tokens are supervised by ASR to capture linguistic content, while acoustic tokens focus on mel-spectrogram restoration to encode this http URL further introduce a hierarchical Flow Matching decoder and a joint reconstruction-context inpainting training strategy, allowing the model to support both high-fidelity reconstruction and cross-utterance voice this http URL speed up inference, we distill the DiT decoder to reduce the number of sampling steps to 4 and improve synthesis quality with GAN this http URL demonstrate that DSA-Tokenizer provides strong semantic-acoustic disentanglement, reliable controllable voice cloning, and efficient high-fidelity generation with low WER/CER. Moreover, our results suggest that disentangled tokenization provides a more effective interface for downstream large-model speech this http URL samples are available at this https URL.
Balancing safety, efficiency, and operational costs in highway driving poses a challenging decision-making problem for heavy-duty vehicles. A central difficulty is that conventional scalar reward formulations, obtained by aggregating these competing objectives, often obscure the structure of their trade-offs. We present a Proximal Policy Optimization based multi-objective reinforcement learning framework that learns a set of policies explicitly representing these trade-offs, and we evaluate the framework on a scalable simulation platform for tactical decision making in trucks. The proposed approach learns a set of Pareto-optimal policies that capture the trade-offs among three conflicting objectives: safety, quantified in terms of collisions and successful completion; energy efficiency, quantified using energy cost; and time efficiency, quantified using driver cost. The resulting Pareto frontier is smooth and interpretable, enabling flexibility in choosing driving behavior along different conflicting objectives. This framework allows seamless transitions between different driving policies without retraining, yielding a robust and adaptive decision-making strategy for autonomous trucking applications.
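As an illustration of what a set of Pareto-optimal policies means here, the following minimal sketch filters candidate policies down to the non-dominated ones. It is not the authors' training procedure: the cost tuples (collision rate, energy cost, driver cost) and the helper names `dominates` and `pareto_front` are hypothetical, and every objective is assumed to be expressed as a cost to minimise.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if cost vector a is no worse than b in every objective
    and strictly better in at least one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(costs: List[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated cost vectors."""
    return [
        i
        for i, c in enumerate(costs)
        if not any(dominates(o, c) for j, o in enumerate(costs) if j != i)
    ]


# Hypothetical per-policy costs: (collision rate, energy cost, driver cost).
policies = [
    (0.02, 10.0, 8.0),
    (0.01, 12.0, 9.0),
    (0.03, 11.0, 9.0),  # worse than the first policy in every objective
    (0.02, 9.0, 10.0),
]
front = pareto_front(policies)
print(front)
```

Only the third policy is dropped, because the first one is at least as good on all three costs; the remaining policies each win on at least one objective, which is the trade-off structure a scalarised reward would hide.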
In the presence of occlusions and measurement noise, geometrically accurate scene reconstructions -- which fit the sensor data -- can still be physically incorrect. For instance, when estimating the poses and shapes of objects in the scene and importing the resulting estimates into a simulator, small errors might translate to implausible configurations including object interpenetration or unstable equilibrium. This makes it difficult to predict the dynamic behavior of the scene using a digital twin, an important step in simulation-based planning and control of contact-rich behaviors. In this paper, we posit that object pose and shape estimation requires reasoning holistically over the scene (instead of reasoning about each object in isolation), accounting for object interactions and physical plausibility. Towards this goal, our first contribution is Picasso, a physics-constrained reconstruction pipeline that builds multi-object scene reconstructions by considering geometry, non-penetration, and physics. Picasso relies on a fast rejection sampling method that reasons over multi-object interactions, leveraging an inferred object contact graph to guide samples. Second, we propose the Picasso dataset, a collection of 10 contact-rich real-world scenes with ground truth annotations, as well as a metric to quantify physical plausibility, which we open-source as part of our benchmark. Finally, we provide an extensive evaluation of Picasso on our newly introduced dataset and on the YCB-V dataset, and show it largely outperforms the state of the art while providing reconstructions that are both physically plausible and more aligned with human intuition.
Facial feminization surgery (FFS) is a key component of gender affirmation for transgender and gender diverse patients, aiming to reshape craniofacial structures toward a female morphology. Current surgical planning procedures largely rely on subjective clinical assessment, lacking quantitative and reproducible anatomical guidance. We therefore propose AutoFFS, a novel data-driven framework that generates counterfactual skull morphologies through adversarial free-form deformations. Our method performs a deformation-based targeted adversarial attack on an ensemble of pre-trained binary sex classifiers that learned sexual dimorphism, effectively transforming individual skull shapes toward the target sex. The generated counterfactual skull morphologies provide a quantitative foundation for preoperative planning in FFS, driving advances in this largely overlooked patient group. We validate our approach through classifier-based evaluation, propose Morphological Fréchet Distance (MFD) and Morphological Kernel Distance (MKD) to evaluate distributional alignment of generated and real populations, and perform a human perceptual study, confirming that the generated morphologies exhibit target sex characteristics.
Decoding natural language from non-invasive EEG signals is a promising yet challenging task. However, current state-of-the-art models remain constrained by three fundamental issues: Semantic Bias, where outputs collapse into generic linguistic templates; Signal Neglect, where models rely heavily on LLM priors to hallucinate fluent text even in the absence of meaningful signals; and the "BLEU Trap", where high-frequency stopwords inflate n-gram metrics, masking a lack of true semantic fidelity. To resolve these challenges, we move beyond conventional end-to-end pipelines and propose SemKey, a novel multi-stage framework that enforces signal-grounded generation through four decoupled semantic objectives: sentiment, topic, length, and surprisal. We extract these semantic anchors from EEG embeddings directly, then unify them with an Active Retrieval Decoding mechanism, compelling the LLM to ground its token generation in the neural signals rather than defaulting to linguistic priors. Furthermore, we break the BLEU Trap by establishing a comprehensive evaluation protocol using rigorous retrieval and distribution-based metrics such as Fréchet Distance. Extensive experiments demonstrate that SemKey effectively mitigates hallucinations on noise inputs and achieves SOTA performance on these robust protocols. Code will be released upon acceptance at this https URL.
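For readers unfamiliar with distribution-based metrics, the sketch below computes the Fréchet distance between Gaussian fits of two embedding sets, in the squared form commonly used by FID-style metrics. Whether SemKey uses exactly this form, and on which embeddings, is an assumption; the function name `frechet_distance` and the random data are illustrative only.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Squared Frechet (2-Wasserstein) distance between Gaussian fits of
    two sample sets with shape (n_samples, n_features), n_features >= 2."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative
    # for positive semi-definite inputs (clip guards round-off).
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(
        np.sum((mu_x - mu_y) ** 2)
        + np.trace(cov_x)
        + np.trace(cov_y)
        - 2.0 * tr_sqrt
    )


rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 4))   # stand-in for reference embeddings
shifted = reference + 3.0               # same spread, mean moved by 3 per axis
d_same = frechet_distance(reference, reference)
d_shift = frechet_distance(reference, shifted)
```

Unlike n-gram overlap, this score depends on the whole distribution of outputs, so a decoder that collapses to a few generic templates is penalised through the covariance terms.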
Designing effective auxiliary rewards for cooperative multi-agent systems remains challenging, as misaligned incentives can induce suboptimal coordination, particularly when sparse task rewards provide insufficient grounding for coordinated behavior. This study introduces an autonomous reward design framework that uses large language models (LLMs) to synthesize executable reward programs from environment instrumentation. The procedure constrains candidate programs within a formal validity envelope and trains policies from scratch using Multi-Agent Proximal Policy Optimization (MAPPO) under a fixed computational budget. The candidates are then evaluated on their performance, and selection across generations is based solely on the sparse task returns. The framework is evaluated in four Overcooked-AI layouts characterized by varying levels of corridor congestion, handoff dependencies, and structural asymmetries. The proposed reward design approach consistently yields higher task returns and delivery counts, with the most pronounced gains observed in environments dominated by interaction bottlenecks. Diagnostic analysis of the synthesized shaping components reveals stronger interdependence in action selection and improved signal alignment in coordination-intensive tasks. These results demonstrate that the proposed LLM-guided reward search framework mitigates the need for manual engineering while producing shaping signals compatible with cooperative learning under finite budgets.
State-of-the-art optical wireless positioning (OWP) commonly reaches centimeter-level accuracy by depending on dense multi-light-emitting-diode (LED) infrastructures, photodiode (PD) arrays, or image-sensor receivers, incurring hardware complexity and deployment cost. This paper introduces a single beam-steered LED, single-PD OWP architecture that achieves three-dimensional (3D) localization without receiver rotation, cameras, or PD arrays; the core idea is to steer the transmitter through K known orientations and exploit the resulting received-signal-strength variations at the PD to estimate LED-to-PD direction and distance. We derive a composite Cramér-Rao lower bound and position-error bound (PEB) for the joint observation model, and cast the steering-pattern design as a genetic algorithm that minimizes the PEB over a 3D testbed. We develop both a model-based constrained nonlinear estimator and closed-form direction estimators: a statistically efficient generalized least squares solution and a lightweight weighted least squares approximation. Simulations demonstrate centimeter-level accuracy for 3D OWP with a single beam-steered LED and a single PD.
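The core idea of recovering direction and distance from signal strength under K known steering orientations can be sketched with an ordinary least squares fit. This is a deliberately simplified stand-in for the paper's generalized and weighted least squares estimators: it assumes a Lambertian emitter of order 1, a known gain constant `c` that absorbs the receiver-side terms, noise-free measurements, and a PD that stays inside the emitting half-space for every orientation. The function name `estimate_position` is hypothetical.

```python
import numpy as np


def estimate_position(led_pos, orientations, rss, c=1.0):
    """Estimate the PD position from RSS under K known LED orientations.

    Assumed model: rss_k = c * (n_k . u) / d**2, with n_k the unit
    steering direction, u the unit LED-to-PD vector, d the distance.
    This is linear in g = c * u / d**2, so g follows from least squares.
    """
    N = np.array(orientations, dtype=float)        # copy, shape (K, 3)
    N /= np.linalg.norm(N, axis=1, keepdims=True)  # normalise each row
    g, *_ = np.linalg.lstsq(N, np.asarray(rss, dtype=float), rcond=None)
    g_norm = np.linalg.norm(g)
    u = g / g_norm                 # direction estimate
    d = np.sqrt(c / g_norm)        # distance estimate, since |g| = c / d**2
    return np.asarray(led_pos, dtype=float) + d * u


# Synthetic check: LED on the ceiling, PD on the floor (made-up geometry).
led = np.array([0.0, 0.0, 3.0])
pd_true = np.array([1.0, 0.5, 0.0])
steer = np.array([[0, 0, -1], [0.3, 0, -1], [0, 0.3, -1],
                  [-0.3, 0, -1], [0.3, 0.3, -1]], dtype=float)
steer_unit = steer / np.linalg.norm(steer, axis=1, keepdims=True)
d_true = np.linalg.norm(pd_true - led)
u_true = (pd_true - led) / d_true
rss = (steer_unit @ u_true) / d_true**2    # c = 1, noise-free
est = estimate_position(led, steer, rss)
```

At least three linearly independent orientations are needed for the fit to be well posed; with noisy measurements the choice of orientations affects the error, which is the role of the steering-pattern design described in the abstract.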
In Global Navigation Satellite System (GNSS)-denied underwater environments, individual unmanned underwater vehicles (UUVs) suffer from unbounded dead-reckoning drift, making collaborative navigation (CN) crucial for accurate state estimation. However, the severe communication delay inherent in underwater acoustic channels poses serious challenges to real-time state estimation. Traditional filters, such as Extended Kalman Filters (EKFs) or Unscented Kalman Filters (UKFs), usually block the main control loop while waiting for delayed data, or effectively discard Out-of-Sequence Measurements (OOSMs), resulting in serious drift. To address this, we propose an Asynchronous Two-Speed Kalman Filter (TSKF) enhanced by a novel projection mechanism, which we term Variational History Distillation (VHD). The proposed architecture decouples the estimation process into two parallel threads: a fast-rate thread that utilizes Gaussian Process (GP) compensated dead reckoning to guarantee high-frequency real-time control, and a slow-rate thread dedicated to processing asynchronously delayed collaborative information. By introducing a Finite-Length Circular State Buffer (FLCSB), the algorithm applies delayed measurements to their corresponding historical states, and utilizes a VHD-based projection to fast-forward the correction to the current time without computationally heavy recalculations. Simulation results demonstrate that the proposed TSKF maintains a trajectory error comparable to computationally intensive batch-optimization methods under severe delays (up to 30\,s). Executing in sub-millisecond time, it significantly outperforms standard EKF/UKF. The results demonstrate an effective control, communication, and computing (3C) co-design that significantly enhances the resilience of autonomous marine automation systems.
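The buffer-and-project idea can be illustrated with a plain linear Kalman filter. The sketch below is not the proposed VHD projection: it assumes linear time-invariant dynamics, forwards the historical correction with the transition matrix power, and leaves the covariance and the buffered states after the corrected step untouched. The class name `BufferedFilter` and all numeric values are hypothetical.

```python
from collections import deque

import numpy as np


class BufferedFilter:
    """Linear Kalman predictor with a fixed-length circular state buffer."""

    def __init__(self, x0, P0, F, Q, H, R, buffer_len=64):
        self.x, self.P = np.array(x0, float), np.array(P0, float)
        self.F, self.Q, self.H, self.R = (np.array(m, float) for m in (F, Q, H, R))
        self.k = 0
        # Oldest entries fall out automatically once maxlen is reached.
        self.history = deque(maxlen=buffer_len)
        self.history.append((0, self.x.copy(), self.P.copy()))

    def predict(self):
        """Fast path: one dead-reckoning step, stored in the buffer."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        self.k += 1
        self.history.append((self.k, self.x.copy(), self.P.copy()))

    def apply_delayed(self, z, step):
        """Slow path: correct the state at `step`, then project to now."""
        for k_hist, x_hist, P_hist in self.history:
            if k_hist == step:
                break
        else:
            return False  # measurement is older than the buffer: drop it
        S = self.H @ P_hist @ self.H.T + self.R
        K = P_hist @ self.H.T @ np.linalg.inv(S)
        dx = K @ (np.atleast_1d(z) - self.H @ x_hist)
        # Simplification: only the current mean is corrected.
        self.x = self.x + np.linalg.matrix_power(self.F, self.k - step) @ dx
        return True


F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity model
Q, H, R = 0.01 * np.eye(2), np.array([[1.0, 0.0]]), np.array([[0.5]])
x0, P0 = np.array([0.0, 1.0]), np.eye(2)
flt = BufferedFilter(x0, P0, F, Q, H, R)
for _ in range(5):
    flt.predict()
applied = flt.apply_delayed(2.3, step=2)  # measurement taken 3 steps ago
```

In this linear setting with no intermediate updates, the projected mean matches what re-running the filter from the historical step would give, without repeating the intermediate predictions.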
Impulse-to-peak response (I2P) analysis for state-space ordinary differential equation (ODE) systems is a well-studied classical problem. However, the techniques employed for I2P optimal control of ODEs have not been extended to partial differential equation (PDE) systems due to the lack of a universal transfer function and state-space representation. Recently, however, partial integral equation (PIE) representation was proposed as the desired state-space representation of a PDE, and Lyapunov stability theory was used to solve various control problems, such as stability and optimal ${H}_\infty$ control. In this work, we utilize this PIE framework, and associated Lyapunov techniques, to formulate the I2P response analysis problem as a solvable convex optimization and obtain provable bounds for the I2P-norm of linear PDEs. Moreover, by establishing strong duality between primal and dual formulations of the optimization problem, we develop a constructive method for I2P optimal state-feedback control of PDEs and demonstrate the effectiveness of the method on various examples.
This paper develops a Finsler-based LMI for robust $\mathcal{H}_\infty$ observer design with integral quadratic constraints (IQCs) and block-structured uncertainty. By introducing a slack variable that relaxes the coupling between the Lyapunov matrix, the observer gain, and the IQC multiplier, the formulation addresses two limitations of the standard block-diagonal approach: the LMI requirement $\mathrm{He}(PA) \prec 0$ (which fails for marginally stable dynamics), and a multiplier--Lyapunov trade-off that causes infeasibility for wide uncertainty ranges. For marginally stable dynamics, artificial damping in the design model balances certified versus actual performance. The framework is demonstrated on quaternion attitude estimation with angular velocity uncertainty and mass-spring-damper state estimation with uncertain physical parameters.
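The failure of the strict condition for marginally stable dynamics follows from a short standard argument, sketched here assuming the usual convention $\mathrm{He}(X) = X + X^\top$, a real matrix $A$, and $P = P^\top \succ 0$. Let $(\lambda, v)$ be any eigenpair of $A$ with $v \neq 0$:

```latex
\begin{align*}
v^{*}\,\mathrm{He}(PA)\,v
  &= v^{*} P A v + (A v)^{*} P v \\
  &= \lambda\, v^{*} P v + \bar{\lambda}\, v^{*} P v \\
  &= 2\,\mathrm{Re}(\lambda)\, v^{*} P v .
\end{align*}
% Marginal stability: some eigenvalue has Re(lambda) = 0, hence
% v^* He(PA) v = 0 for that v, which contradicts He(PA) < 0.
```

Since $v^{*} P v > 0$, the sign of the quadratic form is that of $\mathrm{Re}(\lambda)$, so $\mathrm{He}(PA) \prec 0$ can hold only if every eigenvalue of $A$ lies strictly in the open left half-plane.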
In this work, we study the interface of the Brazilian e-Voting Machine (BVM) in the context of electromagnetic side-channel threats commonly referred to as TEMPEST attacks. In a TEMPEST attack against video displays, an eavesdropper uses Software-Defined Radios (SDRs) to recover sensitive information by intercepting electromagnetic emanations generated during video signal transmission. We emulate the BVM using a VGA monitor by leveraging publicly available information disclosed by the electoral authority, including technical specifications, operational rules of the system, and the official BVM interface. Based on this setup, we investigate whether the BVM interface gives rise to a distinctive spectral signature observable through its unintended electromagnetic emissions. Our findings show that design characteristics relevant to a nationwide electoral process -- such as high image contrast, minimal on-screen information, and the prohibition of other electronic devices within the polling station -- result in a simple and highly distinctive spectral signature that can be observed even through a wall in our experiments. Although our experiments do not involve actual BVM hardware, the results raise concerns regarding the system's susceptibility to TEMPEST attacks and highlight the need for further research on protective countermeasures. In this context, our findings may support the design of automatic jammers capable of adaptively targeting compromising frequencies. To the best of our knowledge, this is the first study investigating TEMPEST attacks in the context of an electronic voting system officially adopted by a country.
This paper details two novel frameworks for developing autonomous, agentic AI in scientific workflows. Both systems leverage a hybrid Local Body, Remote Brain architecture via Google Colab, utilizing Python-based local orchestrators to invoke large language model (LLM) cloud backends. The first agent, DeepTS/DeepCollector, automates the large-scale curation, extraction, and deduplication of time-series datasets. The second, DeepScribe, is an autonomous presentation analyzer that converts visually dense, mathematically complex physics lectures into structured scientific reports. Through practical systems engineering, such as granular attribute extraction (Cellular RAG), remote data inspection, and distributed concurrency controls, we demonstrate how agentic AI can overcome the context and reasoning limitations of current state-of-the-art systems to rigorously support scientific workflows. Finally, we outline a generalization of DeepTS to support deep knowledge graphs and discuss the application of this conceptual approach to high-energy physics (DeepQCD).
We present a provably safe sampling-based motion planning algorithm for robotic systems affected by random disturbances of unknown distribution. We consider systems with linear or linearizable dynamics evolving in a workspace with arbitrarily shaped obstacles, subject to state and control constraints. Safety requirements are formulated as chance constraints. Our approach leverages data from trajectories of the system to learn a Wasserstein ambiguity tube, i.e., a sequence of ambiguity sets, which contains the trajectory of the system's state distribution with high confidence. This ambiguity tube is then used in a probabilistically complete algorithm to grow a sampling-based motion planning tree that respects the constraints of the problem. We show that learning several lower-dimensional ambiguity tubes instead of a single high-dimensional one effectively reduces the conservatism and boosts scalability. Additionally, we design an efficient bandit-based validity checker that markedly increases the empirical performance of our approach without sacrificing probabilistic completeness. Case studies show our algorithm finds valid plans in cluttered environments under strict safety thresholds, outperforming state-of-the-art methods.
Unified speech foundation models require a holistic tokenization space that is both learnable by language models and decodable into high-quality waveforms. Existing speech tokenizers, however, often fail to satisfy these requirements simultaneously, leading to increased architectural complexity and more involved training designs. We propose HoliTok, a continuous Holistic speech Tokenization model designed for unified generation-understanding modeling. HoliTok encodes 48~kHz speech into a compact 25~Hz sequence of 128-dimensional latents. It is trained with a progressive strategy that jointly preserves signal-level fidelity, incorporates semantic information, and maintains strong latent learnability. Based on this tokenization, we build a unified AR+DiT model for speech synthesis and recognition, where the same latent sequence supports both generation-specific and unified generation-understanding tasks. Experiments show that HoliTok achieves competitive reconstruction fidelity, improves generative learnability for high-quality and controllable synthesis, and, among the evaluated representations, is the only one that operates robustly in our unified generation-understanding architecture without additional optimization tricks. These results suggest that HoliTok serves as an effective speech tokenizer and a foundational representation interface for unified spoken language modeling. The code is available at: this https URL.