We present a flow tube reactor design for gas-phase kinetics studies near ambient temperature and pressure. Built entirely from standard tubing, the setup simplifies conventional flow tube configurations based on injector translation while allowing tighter adjustment of reaction time. The reactor spans residence times from sub-second to several minutes through two operating modes: (i) a variable-length mode, in which reaction time is controlled by the tube length, and (ii) a variable-flow mode, in which the second arm acts as an exhaust branch that decouples reactor pressure from inlet flows while allowing the reactor flow rate to be adjusted over a setup-dependent range. Key advantages include a narrow and well-characterized residence time distribution, rapid radial mixing in millimeter-scale tubing, low wall reactivity through the use of perfluoroalkoxy alkane (PFA), and operation of the reaction section at nearly uniform pressure independently of detector constraints. We characterize the residence time distribution and demonstrate reactor performance with the ozonolysis of 2,3-dimethyl-2-butene. Overall, the method provides a compact, low-cost, and versatile alternative to conventional movable-injector flow tubes, with potential applications in atmospheric chemistry, fundamental kinetics, and gas--liquid or gas--solid uptake studies.
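As a rough illustration of the variable-length mode, the mean residence time of an ideal plug flow is the tube volume divided by the volumetric flow rate. The sketch below assumes plug flow and a flow rate given at reactor temperature and pressure; the function name, units, and example dimensions are illustrative assumptions, not values from the abstract.

```python
import math


def residence_time_s(inner_diameter_mm, length_m, flow_cm3_per_min):
    """Mean residence time (s) of an ideal plug flow in a cylindrical tube.

    Assumes the volumetric flow rate is given at reactor conditions.
    """
    radius_cm = inner_diameter_mm / 10.0 / 2.0
    volume_cm3 = math.pi * radius_cm ** 2 * (length_m * 100.0)
    # volume / flow gives minutes; convert to seconds
    return 60.0 * volume_cm3 / flow_cm3_per_min


if __name__ == "__main__":
    # Hypothetical example: 4 mm inner diameter, 1 m long, 1000 cm^3/min
    print(round(residence_time_s(4.0, 1.0, 1000.0), 3))
```

Under these assumptions, doubling the tube length doubles the residence time, while doubling the flow rate halves it.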
The Intelligent Driver Model (IDM) is a cornerstone of Adaptive Cruise Control (ACC), valued for its interpretable parameters and effectiveness in car-following behavior modeling. However, its inherent conservatism leads to prolonged stabilization and reduced traffic efficiency, which have received limited attention. In this paper, we propose SEIDM (Safe and Efficient Intelligent Driver Model), an enhanced IDM extension designed to improve traffic flow efficiency without sacrificing safety. SEIDM introduces an adaptive safety factor to dynamically modulate the impact of the safe deceleration term in acceleration decisions. This allows vehicles to follow more assertively under safe conditions while behaving more cautiously in potential hazards. Extensive urban traffic simulations show that SEIDM achieves significantly shorter stabilization spacing and faster convergence to traffic flow equilibrium, outperforming the original IDM and its variants in traffic stability and efficiency.
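For context, the baseline IDM acceleration law that SEIDM extends can be sketched as follows. This shows only the standard IDM; the adaptive safety factor of SEIDM is not specified in the abstract and is not reproduced here. All parameter defaults are illustrative assumptions.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, s0=2.0,
                     a_max=1.0, b=1.5, delta=4.0):
    """Standard IDM acceleration (m/s^2).

    v   : speed of the following vehicle (m/s)
    gap : bumper-to-bumper distance to the leader (m)
    dv  : approach rate, v - v_leader (m/s)
    v0, T, s0, a_max, b, delta: desired speed, time headway, minimum gap,
        maximum acceleration, comfortable deceleration, acceleration exponent
        (illustrative values, not taken from the paper).
    """
    # Desired dynamic gap; the interaction part is clipped at zero
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


if __name__ == "__main__":
    print(idm_acceleration(v=20.0, gap=10.0, dv=5.0))   # strong braking
    print(idm_acceleration(v=0.0, gap=1e9, dv=0.0))     # free-road start
```

The squared term `(s_star / gap) ** 2` is the braking interaction term of the standard model; the abstract's description of a modulated safe deceleration term suggests this is the part an extension such as SEIDM would adjust.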
The growing share of Renewable Energy Sources (RES) in modern power systems increases both grid imbalances and frequency deviations, reinforcing the need for ancillary services such as Frequency Containment Reserve (FCR) and passive balancing. Battery Energy Storage Systems (BESS) are well-suited for these services, but prior research typically relies on uniform FCR bids that remain constant throughout the control period. Such static bids fail to fully exploit BESS flexibility, as they do not balance the trade-off between reserving energy for FCR delivery and using it for imbalance arbitrage, limiting the achievable value in value-stacking settings. To address this limitation, we propose a two-stage control framework for the European context that introduces non-uniform FCR bids. In the first stage, we derive a time-varying bid sequence using data-driven Monte Carlo (MC) optimization. In the second stage, a Deep Reinforcement Learning (DRL) agent leverages the residual flexibility for real-time imbalance trading while proactively managing the State of Energy (SoE) to ensure compliance with FCR requirements. The framework is presented as a proof of concept, highlighting the potential benefits of time-varying bidding strategies. By incorporating daily cycle budgets and time-varying reserve commitments, our approach achieves a 7.56% profit increase compared to uniform baselines. These results show that non-uniform bidding can unlock additional value by more effectively aligning reserve obligations with rapidly changing imbalance opportunities.
Radio Environment Maps (REMs) have the potential to serve as an important enabler for intelligent modeling and control in emerging AI-native 6G networks. Despite significant progress, most REM construction methods remain passive, relying on interpolation or static uncertainty models and lacking an explicit mechanism to reason about how future measurements will affect reconstruction quality under a limited measurement budget. In this paper, we formulate REM construction as a sequential decision-making problem and propose a world-model-inspired framework for active Received Signal Strength Indicator (RSSI) map reconstruction. By learning an internal representation of the radio environment and employing a dreaming mechanism to simulate the impact of candidate measurements, the proposed approach actively selects measurement locations under a limited budget. Experimental results on real indoor RSSI data demonstrate that the proposed method significantly outperforms Gaussian Process-based interpolation in the few-shot regime, achieving up to a fivefold reduction in Root Mean Square Error (RMSE) with the same number of measurements. These results highlight the potential of world models as a powerful paradigm for sample-efficient radio environment mapping and intelligent model-based sensing in 6G and beyond networks.
Machine learning deployments in real-world wireless communication tasks face significant generalization challenges due to location and environment-specific signal structure, high diversity in data across different deployments, and limited availability of real-world data. Current approaches for assessing data similarity between training and inference (deployment) distributions, as well as evaluating model transferability, suffer from high computational costs and inconsistent performance, leaving critical model deployment and model life cycle management decisions without a principled foundation. To address this, we introduce a dataset similarity framework built upon the feature space of a pretrained wireless foundation model. Our method, LWM-CDE (Contrastive learning of Dataset Embedding), fine-tunes the dataset embeddings of the foundation model using a combination of contrastive and geometry-shaping losses, creating a structured manifold where distance reliably indicates transferability. Extensive experiments on wireless benchmarks show that LWM-CDE achieves stronger correlation with empirical transfer performance than existing metrics while being more computationally efficient. The learned representation space supports more effective and data-efficient decision-making for tasks like source dataset selection, label-aware augmentation, and budgeted pretraining, demonstrating its broader utility across different wireless communication applications.
This paper addresses the problem of reaching consensus under input saturation and intermittent communication, which can hinder the convergence of the system. We propose a method that translates the consensus problem into an equivalent stability problem. Then, we compute bounded sets that enclose the initial conditions and the evolution of trajectories leading to local input-to-state stability for systems interconnected over directed intermittent topologies. Our contributions include sufficient conditions for stability and stabilization of multi-agent systems under intermittent interactions and saturating inputs, with the ability to evaluate disturbance tolerance and rejection based on the regions that enclose the system's trajectories. We define disturbance rejection in terms of the $\mathscr{L}_2$ gain, and formulate stability and controller design conditions as convex optimization problems. Our method enables the maximization of regions that ensure local input-to-state stability, and we provide numerical examples highlighting the trade-offs between the mean frequency of intermittent interactions, disturbance energy, and convergence region size.
This paper provides a comparative study of modern uncertainty quantification (UQ) methods. To greatly enhance real-time performance, both differential algebra (DA) and a directional differential algebra (DDA) approach are employed. This can enable fast UQ in the case of non-Gaussian statistics. Higher-order moments, namely skew and kurtosis, can be computed quickly by several means. This motivates their implementation in an analytic approximation of the confidence bounds for the so-called "banana-shaped" non-Gaussian distributions encountered often in nonlinear astrodynamics problems. This method improves greatly on a linear covariance approach, with only 5x its runtime in numerical tests, even before DA methods are employed. Test problems in this work include a restricted three-body cislunar example and an Earth-return aerocapture example.
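One of the simplest ways to obtain the higher-order moments mentioned above is directly from samples. The sketch below computes sample skewness and (non-excess) kurtosis from standardized central moments; it is a generic sample-based estimator, not the DA or DDA approach of the paper, and the function name is an assumption.

```python
def skew_kurtosis(samples):
    """Return (skewness, kurtosis) from population central moments.

    Kurtosis is the non-excess value (3.0 for a Gaussian).
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


if __name__ == "__main__":
    print(skew_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0]))  # symmetric sample
    print(skew_kurtosis([0.0, 0.0, 0.0, 10.0]))        # right-skewed sample
```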
Parkinson's disease (PD) is a highly heterogeneous disease, including in which motor symptoms dominate. Imaging biomarkers that support subtype stratification could also improve biological understanding and study design, and enable personalized treatment strategies. This study evaluates whether deep-learning based automatic brain segmentation, in addition to quantitative maps from 7 Tesla MRI, can highlight differences between Healthy Controls (HC), Postural Instability and Gait Difficulty (PIGD) and Tremor Dominant (TD), and subsequently be used for objective PD stratification. The performance of machine learning classifiers may be improved with feature selection. 21 HC and 24 people with PD (PwP) were included. The U-Net training was assessed with the Dice similarity coefficient (DSC). Two classification approaches using 5-fold cross-validation were defined across three tasks: (1) HC vs PwP; (2) PIGD vs TD; (3) multiclass, HC vs PIGD vs TD. Approach A used all extracted features. Approach B found the optimal subset of features for the classification tasks. The U-Net achieved a mean DSC of 0.86 for all ROIs during training. Approach A: Task 1 best accuracy of 0.69 and best AUC of 0.73. Task 2 accuracy 0.69, AUC 0.90. Task 3 accuracy 0.62, AUC 0.66. Approach B: Task 1 accuracy of 0.82 and AUC of 0.93. Task 2 accuracy 1.00, AUC 1.00. Task 3 accuracy 0.73, AUC 0.91. DL-based segmentation combined with qMRI feature selection improved classification relative to using all features, supporting the potential of interpretable, low-dimensional imaging signatures for PD diagnosis support and phenotype stratification. Larger, multi-site studies are warranted to assess generalizability and stability.
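The segmentation score reported above, the Dice similarity coefficient, measures overlap between a predicted and a reference mask. A minimal sketch on flattened binary masks follows; the function name and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumed): two empty masks count as a perfect match
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```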
Preserving stability is a central problem in data-driven model order reduction of dynamical systems. For linear systems whose dynamics depend on geometric or physical parameters, multivariate rational approximation algorithms such as the Parameterized Sanathanan-Koerner iteration and the pAAA algorithm construct parameterized reduced models from sampled transfer function data. In this setting, stability must be enforced robustly across the parameter domain. This paper introduces a necessary and sufficient criterion for characterizing the stability of parameterized models. Within a unified framework, the results apply to models with general rational as well as polynomial dependence on the parameters. Building on this criterion, we develop and demonstrate a rational approximation algorithm that includes robust stability constraints through convex optimization. Relative to the state of the art, the approach enforces stability without conservatism while allowing increased flexibility in the choice of model structure.
This study explores the use of Visible Light Communication (VLC) in Collective Perception (CP), a technology that enables vehicles and infrastructure to share sensor information to help reduce traffic accidents. Recent advances in Vehicle-to-Everything (V2X) communication have spurred growing research interest in CP. However, in regions such as the United States and Japan, only 30 MHz of radio spectrum is allocated for V2X, which is insufficient to effectively support CP. In this paper, we propose integrating VLC into V2X systems to enhance CP, complementing the existing 5.9 GHz band for V2X communications. VLC can coexist with wireless systems that use radio waves, providing an additional optical channel for data exchange. To the best of the authors' knowledge, this is the first study to investigate VLC for CP. We evaluate the feasibility of VLC-based CP through three experiments. First, we measured the application-level delay of a VLC-based CP system in a stationary indoor environment. Next, we evaluated its communication range in a stationary outdoor setting. Finally, to assess robustness under realistic conditions, we conducted driving experiments at vehicle speeds up to 90 km/h. The results demonstrate that VLC-based CP is feasible and could serve as a promising solution to spectrum scarcity in the 5.9 GHz band for future V2X communications.
This article introduces the "chart & chirp" method of one-shot, in situ VCO tuning curve estimation, learning, and predistortion, which provides an alternative to prevailing LMS-based background calibration loops. The proposed approach utilizes a cycle-counting FDC to estimate the VCO tuning curve and a linearly interpolating QDAC to actuate the predistortion. An LLSE-based learning process generates predistorted DAC control codes for any physically achievable chirp parameters. In simulation, 50 and 51 kHz of RMS FM error is achieved for 5 and 20 us chirps, respectively. The locations of nonlinearity-induced spurs in the IF spectrum are predicted via analysis of the DFT of FM error in generated chirps, with higher frequency DAC updates weakening these spurs and moving them out of the IF band of interest. Simulation in a coherent, monostatic radar model reveals that deterministic phase error from the chirp nonlinearity arises at a level above -70 dBc/Hz at 1 MHz offset, dominating random phase noise and setting the IF SNDR for R < 23 m and R = 92 m around 40 dB and 24 dB, respectively.
Many chemical engineering systems are governed by mechanisms that switch across operating regimes, making the data-driven discovery of regime-dependent governing equations essential for predictive modeling, optimization, and control. We propose symbolic decision trees for the data-driven discovery of regime-dependent governing equations. The method simultaneously learns interpretable splitting conditions to partition the input domain and local governing equations that describe each regime. To improve tractability, both the splitting conditions and governing equations are parametrized using basis functions, resulting in a mixed-integer optimization learning problem. We use the proposed approach to learn hybrid dynamical models and a constitutive equation for the zero-shear viscosity of polymer melts. Symbolic decision trees identify physically interpretable regimes and local governing equations while improving predictive accuracy relative to approaches that learn a single global model or use existing decision tree models. This framework provides an interpretable and generalizable route for discovering regime-dependent models in chemical engineering systems.
Low-power wireless-capable systems-on-chips (SoCs) are critical for researching many of our current environmental issues. The scale at which these devices are needed for many applications necessitates innovation in their design to reduce the various capital and labor costs involved with operating an extensive sensor network. This can be difficult for devices with novel wireless architectures, as many emerging architectures lack commercially available development platforms. This makes pre-silicon validation challenging, and the impact of a failed tapeout is unacceptable when the cost is of primary concern for these devices. In this work, we propose a digital twin ecosystem for Bluetooth Low-Energy (BLE) with physical-layer (PHY) control intended for novel device development and demonstrated through use with crystal-free single-chip sensor motes. We present this system operating with multiple RF front ends and digital baseband implementations, including a commercially available Software Defined Radio (SDR) with synthesized RTL and embedded firmware, along with an existing crystal-free SoC front end and FPGA digital baseband. These configurations are shown to be capable of communicating sensor data with commercially available BLE devices and achieving receiver sensitivities up to -82 dBm, exceeding the minimum BLE specification. This approach is extendable to other hardware and communication protocols and promises to enable inexpensive, reusable validation and verification tools for novel wireless devices.
The global push for electric vehicles (EVs) has sharply increased demand for critical minerals such as cobalt and lithium, creating a tension between rapid industrial growth and long-term sustainability. Extraction is concentrated in a few regions -- notably the Democratic Republic of Congo (DRC), Chile, and Argentina -- where it has produced serious socio-environmental harms, including ecosystem degradation, labour exploitation, and the displacement of Indigenous communities. In the DRC, cobalt mining is frequently linked to child labour and hazardous working conditions; in Chile, lithium extraction intensifies water scarcity and threatens local agriculture and biodiversity. Policy instruments such as the U.S. Inflation Reduction Act (IRA) seek to promote ethical sourcing, but an extraction-driven model continues to deepen global inequalities. This chapter examines the contested temporalities of the transition, in which the short-term economic incentives of extraction conflict with longer-term environmental and social goals. It argues for a place-based framework built on community-centred governance, sustainable mining practices, and circular-economy strategies, including recycling and material substitution, to align resource security with equity and ensure that the shift to EVs does not reproduce the injustices it aims to address.
Digital audio broadcasting plus (DAB+) is an attractive illuminator for passive radar because it provides persistent, high-power, and geographically widespread very high frequency (VHF) orthogonal frequency-division multiplexing (OFDM) signals. A channel state information (CSI) sensing approach can convert a single received DAB+ stream into a CSI sequence for radar sensing, avoiding the need for a separately received reference signal in conventional passive radars. However, CSI estimation in DAB+ is challenging due to the differentially encoded communication symbols across time. A wrong symbol transition estimate leads to a persistent multiplicative error in the sequential CSI sequence within a DAB+ frame. This paper formulates single-stream DAB+ passive radar as a posterior-probability-aware differential CSI tracking problem. The proposed method uses the previously tracked CSI as a channel prior, performs prediction-aided maximum a posteriori detection of the current symbol, converts posterior transition reliability into observation uncertainty, and applies linear minimum mean squared error fusion to obtain a stable tracked CSI. A reliability-informed CSI fusion strategy is also introduced to preserve weak target information. Theoretical analysis is provided, showing a guaranteed performance gain in symbol and CSI estimation. Simulation results show that the proposed method can reduce CSI estimation error by over 15~dB compared with prior art. It also improves median target-to-background ratio by more than 11~dB in random fading scenes. Experiments in Sydney, Australia demonstrate improved range-Doppler maps for commercial aircraft sensing.
We present a finite-time framework for identifying stable and unstable linear time-invariant (LTI) systems from a single closed-loop input-output trajectory. The method does not require knowledge of the stabilizing controller, an intermediate observer, or prior separation of the plant into stable and unstable components. The approach uses a non-causal finite impulse response (FIR) model obtained from a Laurent expansion of the transfer function. In this representation, stable dynamics are captured by causal Markov parameters, while unstable dynamics are captured by non-causal coefficients associated with reverse-time stable evolution. This avoids the growth of causal unstable Markov parameters. A key advantage is that the coefficients multiplying both the input and the process noise remain controlled by stable and reverse-time stable decay rates, rather than by growing forward-time unstable dynamics. To handle closed-loop data, we use the injected excitation as an instrumental variable, which removes the bias caused by correlation between the feedback input and the process noise. Under explicit instrument-strength and closed-loop concentration conditions, we derive a non-asymptotic error bound for the estimated Laurent/FIR Markov parameters with the usual $\mathcal{O}(N^{-1/2})$ statistical rate, up to logarithmic factors and truncation terms. The bound captures the effects of process noise, measurement noise, FIR horizons, closed-loop state moments, and controller-dependent instrument conditioning. Numerical experiments support the finite-time analysis by showing the predicted Markov-parameter convergence rate and illustrating how controller-dependent instrument conditioning affects the sample complexity of closed-loop identification.
In this paper, we consider a mixed ensemble containing a mixture of cesium-type and hydrogen maser-type atomic clocks. For the mixed ensemble, the conventional Kalman filtering algorithm has certain limitations due to divergence of the error covariance matrix. To overcome these limitations, we obtain a Kalman filtering algorithm based on observable canonical decomposition that does not have any diverging terms. We use the estimates from the transformed Kalman filter to propose a time scale generation algorithm called explicit ensemble mean synchronization algorithm for the mixed ensemble. In this algorithm, we synchronize the time deviation of each clock from the ideal clock behavior to the unobservable ensemble mean of the phases where the weighting can be decided by the user. By regulating the free-running dynamics associated with the unobservable state, through choosing an appropriate weight vector, the frequency stability of the generated time scale or the synchronized time shared by the clocks is optimized over shorter (resp. longer) intervals, as measured by Hadamard variance. An illustrative example is given to demonstrate the efficiency of our algorithm.
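The Hadamard variance used above as the stability measure is built from second differences of fractional-frequency averages, which makes it insensitive to linear frequency drift. The sketch below is the basic non-overlapping estimator at the sampling interval; it is a generic textbook form, not the paper's time scale algorithm, and the function name is an assumption.

```python
def hadamard_variance(y):
    """Non-overlapping Hadamard variance of fractional-frequency data y.

    y holds consecutive frequency averages taken over equal intervals.
    """
    m = len(y)
    if m < 3:
        raise ValueError("need at least three frequency samples")
    total = sum((y[i + 2] - 2.0 * y[i + 1] + y[i]) ** 2 for i in range(m - 2))
    return total / (6.0 * (m - 2))


if __name__ == "__main__":
    print(hadamard_variance([0.0, 1.0, 2.0, 3.0, 4.0]))  # pure linear drift
    print(hadamard_variance([0.0, 1.0, 0.0, 1.0, 0.0]))  # alternating samples
```

A pure linear drift gives zero because every second difference vanishes.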
This paper investigates satellite navigation and communication systems for both low-Earth-orbit (LEO) and medium-Earth-orbit (MEO) satellites and systematically outlines the fundamental principles of satellite navigation systems (SNS), satellite communication systems (SCS), and integrated navigation and communication (INAC) systems. By exploring the enhanced capabilities of satellite systems, the article emphasizes how INAC systems improve overall functionality by enabling efficient signal multiplexing and multiple access, positioning multi-functional satellites as promising alternatives to traditional architectures. Moreover, it introduces emerging frontiers for LEO-based SNS and MEO-based SCS through the integration of advanced sixth-generation (6G) wireless technologies, which cannot be realized through mere extensions of existing communication or navigation techniques. Motivated by these insights, the article further discusses various conceptual transitions required to unlock the full potential of INAC systems, with particular focus on channel capacity, positioning accuracy, and artificial intelligence-enabled waveform design.
Data streams in real-world industrial scenarios often contain transitional operating conditions that are not covered during offline training, leading to significant distribution shifts. To bridge the gap between static offline models and dynamic online data, a novel asymmetric adaptation-based fault diagnosis method is proposed in this paper. Specifically, in the offline stage, we employ domain generalization techniques to extract domain-invariant features from multiple stable conditions and construct robust normalized fault prototypes as reference anchors. Subsequently, during online inference, we design an online test-time adaptation method based on a periodic prototype re-projection mechanism to dynamically update prototype positions. Furthermore, we utilize the geometric distribution derived from anchors to guide the updates of classifiers and adopt an asymmetric learning rate strategy for the feature extractor and classifier. The proposed approach ensures rapid adaptation to new transitional conditions while preserving the discriminative power inherited from the offline domain generalization initialization. Experimental results demonstrate that this mechanism effectively leverages offline generalized knowledge to guide online inference, significantly improving robustness in non-stationary environments.
Urban heat islands (UHI) are formed due to complex interactions between various factors. UHI, its contributing factors, and their interactions vary over time and location. Accordingly, understanding the causal relation between UHI and its contributing factors is essential to minimizing its adverse effects on the environment and human health. Here, we propose a statistical method based on Hotelling's T-square test to analyze this association. The proposed test estimates the UHI trends across different urban districts and compares the UHI contributing factors between the districts with increasing and non-increasing UHI trends. This comparison, if significantly different, can be interpreted as evidence of a causal association between the factor and UHI. This research used the proposed test to analyze the UHI and its contributing factors across 22 municipal districts of Tehran between 2003 and 2021. We examined the time series of weather conditions (measured by precipitation, NDSI, and NDWI), vegetation cover (measured by NDVI and EVI), and urban density (measured by NDBI) as factors contributing to the UHI, which was measured through nighttime LST. The results showed that all districts in Tehran exhibited stable or increasing trends in LST, leading to UHI effects. The proposed test indicated that the temporal changes in NDWI and NDBI did not have a causal relationship with UHIs. Meanwhile, variations in other factors were identified as contributing to the intensification of UHIs.
This paper addresses the problem of providing runtime assurance for systems operating online under unknown and potentially time-varying data distributions. We propose Cost-Aware Adaptive Conformal Inference (ACI), a novel framework that incorporates constraint violation costs directly into the conformal adaptation mechanism. Our key insight is that uncertainty margins should adapt not only to the frequency of constraint violations but also to their severity. We formalize this through a cost-aware loss function that couples the miscoverage indicator with violation costs. Unlike existing methods that regulate a single controlled metric, our approach provides a dual statistical guarantee: simultaneously bounding the long-run average violation frequencies (reliability) and cumulative violation cost (harm). By weighting prediction failures according to their severity, the algorithm enables the controller to respond proportionally to violation severity, expanding prediction sets aggressively when necessary while maintaining efficiency during nominal operation. We integrate Cost-Aware ACI into a robust control synthesis framework, creating a closed-loop system that balances task performance with runtime risk control without requiring explicit model knowledge. Experiments validate its effectiveness for online risk-aware controller synthesis.
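To make the adaptation mechanism concrete, the sketch below shows the standard adaptive conformal inference update, in which the working miscoverage level moves toward the target whenever the observed loss deviates from it. The cost-weighted loss is a purely hypothetical stand-in, since the abstract does not give the paper's actual loss; the names `aci_step`, `cost_weighted_loss`, and `run_aci` are assumptions.

```python
def aci_step(alpha_t, target_alpha, gamma, loss):
    """One ACI update: alpha_{t+1} = alpha_t + gamma * (target_alpha - loss).

    In standard ACI, loss is the miscoverage indicator (0 or 1).
    """
    return alpha_t + gamma * (target_alpha - loss)


def cost_weighted_loss(miscovered, cost):
    """HYPOTHETICAL cost-aware loss: scale a miss by (1 + cost).

    This is an illustrative assumption, not the paper's definition.
    """
    return (1.0 + cost) if miscovered else 0.0


def run_aci(events, target_alpha=0.1, gamma=0.05):
    """Apply the update over (miscovered, cost) events; return the alpha path."""
    alpha = target_alpha
    path = []
    for miscovered, cost in events:
        alpha = aci_step(alpha, target_alpha, gamma,
                         cost_weighted_loss(miscovered, cost))
        path.append(alpha)
    return path


if __name__ == "__main__":
    # One covered step, then one costly violation
    print(run_aci([(False, 0.0), (True, 2.0)]))
```

A lower working level corresponds to wider prediction sets, so in this sketch a costly miss widens the sets more than a cheap one. Whether such a weighting preserves formal guarantees depends on the loss actually used in the paper.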
Open-source energy system models disaggregate zonal electricity demand to substations through Voronoi-based preprocessing pipelines that combine socioeconomic weighting with auxiliary spatial corrections. Whether the same auxiliary data helps or harms when the weighting component shifts from rule-based to learned has not been investigated. We fix Voronoi partitioning and cross two design axes on metered demand from 1,891 British primary substations: the demand-weighting method and the mechanism through which Nighttime Light (NTL) intensity and substation-proximity signals enter the allocation, giving 15 configurations. Mechanism-isolation experiments further test additive post-correction and random-noise controls to pinpoint the structural cause of any performance reversal. The same auxiliary data reduces RMSE by 41 % on the static base but increases it by 21 % on the GNN base under multiplicative post-correction (p < 0.001 for both); the best static pipeline outperforms the best GNN variant by 19 %. Post-correction on the GNN improves rank-order correlation (p < 0.001) yet worsens absolute error, so correlation-only evaluation masks the calibration penalty. The isolation experiments trace this reversal to the multiplicative correction form under demand conservation constraints, not to signal redundancy; switching to additive post-correction eliminates the antagonism entirely. A transfer check on 13 German primary substations confirms directional replication and shows amplified antagonism where the GNN baseline already explains over 95 % of demand variance. The NTL and proximity signals behind the 41 % static improvement are publicly available at no cost and should be adopted as default corrections in static pipelines; method evaluation should report RMSE and correlation jointly, as the two metrics diverge under post-correction on learned representations.
The deployment of learning-based models in safety-critical control systems demands mathematical guarantees that standard regression architectures cannot provide. This paper presents an integrated framework that bridges Neural Ordinary Differential Equations (Neural ODEs), measurement-induced geometric structures, and Koopman operator theory, with the explicit aim of producing data-driven models whose stability certificates are computable, not merely conjectured. Three complementary components are developed and analyzed. First, ControlSynth Neural ODEs enforce global convergence through tractable linear matrix inequalities (LMIs), enabling complex nonlinear dynamics to be captured without sacrificing boundedness guarantees. Second, the ICODE formulation incorporates extrinsic environmental inputs into the learned vector field, while measurement-induced bundle structures confine state trajectories to physically admissible manifolds. Third, a systematic ISS verification pipeline certifies the input-to-state stability of Koopman-identified models via a convex $L_2$-gain LMI, converting an otherwise intractable robustness question into a solvable semidefinite program. The certified model is embedded in an ICODE-MPPI controller, which uses continuous-time residual learning inside a stochastic sampling loop to deliver robust path tracking under parametric uncertainty and persistent disturbances. Numerical experiments on a vehicle path-tracking benchmark and a nonlinear mechanical oscillator demonstrate up to a 61\% reduction in tracking RMSE and a 54\% reduction in state estimation error relative to uncertified baselines, with near-zero LMI violation rates across all evaluated disturbance levels.
This study evaluates remote Photoplethysmography (rPPG) algorithms, Spatial Subspace Rotation (2SR), Chrominance-based method (CHROM), Plane-Orthogonal-to-Skin (POS), and Principal Component Analysis (PCA), applied to selected superpixel-based facial regions (with target counts of 10 and 20 regions) for monitoring in a driving simulator. Two novel peak enhancement approaches, based on the Lp norm and Fractional-Order Derivative (FOD), are introduced to enable robust Heart Rate Variability (HRV) estimation. A signal-to-noise ratio-based quality assessment of 20 s segments serves as a data cleaning mechanism to mitigate motion artifacts inherent to dynamic recording conditions. In a sample of 29 participants recorded during baseline and driving simulation conditions, Pulse Rate (PR) is calculated with clinically acceptable accuracy across configurations (validated against simultaneous Electrocardiography (ECG) recordings), achieving the lowest Mean Absolute Error (MAE) of 1.92 bpm (sd = 1.72) using 2SR with FOD and 20 superpixel regions. The best-case MAE reached 0.061 s for Standard Deviation of Normal-to-Normal intervals (SDNN) and 0.081 s for Root Mean Square of Successive Differences (RMSSD), with inter-beat interval detection yielding an F1 score of 0.93. Optimal parameters clustered around p = 6-7 for the Lp norm and fractional orders of 1.0-1.4. All rPPG-derived parameters reproduced the statistical structure of the reference ECG across conditions and configurations. Caution is advised when using FOD due to slow changes in the rPPG waveform. Overall, 2SR is recommended for PR and CHROM for HRV estimation, using the Lp norm with 20 superpixels, providing clear methodological guidance for rPPG monitoring in driving simulators.
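The PR and HRV metrics reported above follow directly from a sequence of inter-beat intervals. The sketch below uses the common definitions (SDNN as the sample standard deviation of intervals, RMSSD as the root mean square of successive differences); the function names and the example intervals are illustrative assumptions, not data from the study.

```python
import math


def pulse_rate_bpm(ibi_s):
    """Mean pulse rate (beats per minute) from inter-beat intervals in seconds."""
    return 60.0 * len(ibi_s) / sum(ibi_s)


def sdnn(ibi_s):
    """Sample standard deviation of the intervals (s)."""
    n = len(ibi_s)
    mean = sum(ibi_s) / n
    return math.sqrt(sum((x - mean) ** 2 for x in ibi_s) / (n - 1))


def rmssd(ibi_s):
    """Root mean square of successive interval differences (s)."""
    diffs = [b - a for a, b in zip(ibi_s, ibi_s[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    ibi = [0.8, 0.9, 0.8, 0.9]  # hypothetical intervals in seconds
    print(pulse_rate_bpm(ibi), sdnn(ibi), rmssd(ibi))
```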
Recent advances in zero-shot text-to-speech (TTS) have enabled accurate imitation of reference speech in terms of both speaking style and speaker timbre. However, achieving disentangled control over these aspects from separate references remains a challenging task. Several studies have proposed disentangled speech representations that decompose speech into interpretable attributes (e.g., timbre, prosody, and content), providing a promising foundation for TTS with attribute control from separate references. Yet, how to effectively integrate such representations into TTS systems to achieve independent and precise control remains underexplored. In this paper, we present FC-TTS, a zero-shot TTS framework that enables disentangled control of style and timbre by conditioning on two distinct reference utterances. Unlike existing systems that inherit limitations from those pre-trained disentangled representations, FC-TTS introduces key design strategies, including architectural choices, training framework, and auxiliary training objectives, which improve the reliability of attribute separation and dual-reference control. Experiments show that FC-TTS achieves high-fidelity synthesis and competitive zero-shot naturalness, while uniquely supporting consistent and independent manipulation of style and timbre. Audio samples are available at this https URL
Machine learning has emerged as a promising approach to path loss prediction, yet its effectiveness often degrades when measurement data are scarce. To address this limitation, we propose an ensemble-based machine learning framework that integrates real measurements with synthetic data generated using a lidar-based simulator. The simulator provides broad spatial coverage through static path loss values that capture terrain variations and physical obstacles in the propagation environment. A dynamically weighted ensemble then combines simulation results with measured data, balancing the contribution of both data sources and improving generalization across diverse environments. To further mitigate the effects of limited measurements, we incorporate the Synthetic Minority Over-sampling Technique (SMOTE), a data augmentation technique that synthesizes additional samples through interpolation between measurements while preserving their statistical properties. By leveraging simulation data, SMOTE, and engineered propagation features, the proposed framework captures geographical and physical variability, enabling adaptability across urban, suburban, residential, industrial, and rural environments. Experimental results demonstrate that the proposed method achieves up to a 50% reduction in mean absolute error (MAE), compared with models trained solely on real data, and up to a 25% improvement relative to models trained exclusively on synthetic data, particularly for cross-environment generalization. These findings highlight the effectiveness of combining simulation-based synthetic data with SMOTE to overcome data scarcity and enhance the model's generalization ability. Overall, the proposed framework provides a robust and practical solution for path loss prediction across diverse environments with limited measurement data, supporting cost-effective planning and optimization of wireless networks.
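The interpolation step that SMOTE relies on can be sketched as follows. This is a minimal illustration, not the framework's implementation: the feature layout (distance in metres, path loss in dB), the function names, and the neighbour count `k` are all assumptions made for the example.

```python
import math
import random


def nearest_neighbors(samples, index, k):
    """Indices of the k samples closest (Euclidean) to samples[index]."""
    dists = sorted(
        (math.dist(samples[index], s), j)
        for j, s in enumerate(samples)
        if j != index
    )
    return [j for _, j in dists[:k]]


def smote_sample(x, neighbor, rng):
    """Core SMOTE step: a random point on the segment from x to neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


def smote(samples, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        j = rng.choice(nearest_neighbors(samples, i, k))
        synthetic.append(smote_sample(samples[i], samples[j], rng))
    return synthetic


if __name__ == "__main__":
    # Hypothetical measurements: [distance in m, path loss in dB].
    measured = [[100.0, 80.0], [200.0, 95.0], [400.0, 110.0], [800.0, 125.0]]
    for point in smote(measured, n_new=3):
        print(point)
```

Because every synthetic point lies on a segment between two measured points, the augmented set stays inside the range of the original measurements, which is one reason the abstract pairs SMOTE with simulator data for coverage beyond that range.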
Dynamic models of power systems are critical for analyzing grid response to disturbances and blackouts, but the release of real-world dynamic models is hindered by privacy and cybersecurity concerns, as such models carry sensitive information about transmission, generation, and load parameters. We develop an algorithm for synthesizing dynamic grid models from real-world power grids balancing two objectives: the privacy of the source grid, quantitatively measured using the notion of differential privacy, and the fidelity of the synthesized model. The algorithm applies privacy-preserving noise to obfuscate the original grid parameters, but then optimizes the perturbed parameters to ensure that the resulting model dynamics are statistically consistent with those observed in the source grid. Application to the frequency dynamics of the IEEE 30-bus system reveals the inherent privacy-fidelity trade-off: stricter privacy requirements degrade modeling fidelity, yet optimization significantly improves the quality of the synthesized models.
This paper presents a novel passivity-based semi-autonomous attitude control framework, with a particular focus on attitude kinematics defined on the special orthogonal group $SO(3)$. While human-robot interaction facilitates the successful execution of complex tasks, ensuring stability of human-in-the-loop systems on the $SO(3)$ manifold remains a largely unsolved challenge. We first propose a new control architecture in which a multi-robot system preserves invariance of the average information fed back to the human operator through so-called stealthy control, and the human intervention is mediated through a virtual leader, which is coupled with the robots via a passivity-based attitude synchronization law. We then rigorously prove closed-loop stability of the proposed human-in-the-loop system under the assumption that the human behaves as a passive system. To support this analysis, simulation studies are conducted to identify the human operator as a dynamical system, and to examine passivity properties of the identified model.
Critical retained foreign objects (RFOs) on intraoperative chest radiographs are rare but high-risk events. Their scarcity limits robust automated detection model training and generalization. We introduce SurgRFO, a two-stage synthesis framework for generating realistic RFO-present intraoperative chest X-rays. In Stage 1, a Roentgen chest X-ray foundation model is fine-tuned on surgical-domain images to generate realistic RFO-free backgrounds that preserve anatomy, indwelling lines and tubes, and intraoperative imaging characteristics. In Stage 2, a lightweight generator trained on localized RFO patches from limited positive cases synthesizes diverse RFO instances, which are composited onto generated backgrounds using conditional Poisson fusion to improve photometric consistency. We evaluate SurgRFO through (i) a blinded clinician study assessing realism and clinical plausibility, and (ii) downstream detection experiments in which synthesized data are used to augment Faster R-CNN, YOLOv8, and RetinaNet. SurgRFO consistently improves sensitivity at low false-positive-per-image (FPPI) operating points on internal and external test sets. Clinician ratings indicate that the synthesized images achieve realism comparable to real intraoperative images. Ablation analyses further examine fusion strategies and synthesis scale. Ethical safeguards for synthetic surgical data are also discussed.
In dynamic acoustic environments with time-varying interferers, effective beamforming requires identifying stationary regions over time. The Capon beamformer, a whitened matched filter constrained to maintain unity gain in the desired direction, theoretically relies on the instantaneous ensemble covariance matrix. Practical implementations rely on the batch Capon (or Sample Matrix Inversion), which estimates the sample covariance matrix (SCM) by averaging over a block of snapshots. This practical approach implicitly assumes that the data within the batch window is stationary and can be coherently combined. In non-stationary settings, a batch approach that averages over fixed or excessively long windows fails, as moving interferers smear the SCM and degrade the beamformer's nulling capabilities. To address this, this paper introduces a temporally segmented distortionless response beamformer. Inspired by the segmented least squares method, which fits piecewise polynomials to data while penalizing excessive segmentation to prevent overfitting, the framework extends practical Capon beamforming by incorporating data-driven temporal segmentation. This formulation minimizes output power while dynamically adapting the SCM estimation windows to local stationarity, offering a principled approach to tracking time-varying interferers.
Advanced driver assistance systems (ADAS) play an important role in modern automotive intelligence, significantly enhancing vehicle safety and stability. The performance of ADAS critically relies on accurate and reliable vehicle state estimation, particularly from vehicle dynamic sensors. Among these signals, wheel load is a key variable for chassis control and safety-critical functions, yet it remains difficult to estimate robustly due to complex suspension geometry, nonlinear dynamics, and measurement noise. To address this issue, we propose DBPnet, a Bayesian physics-informed neural network (PINN) with a physics-aware embedding module inspired by damper characteristics. First, this paper presents a suspension linkage-level modeling (SLLM) approach that constructs a nonlinear instantaneous dynamic model by explicitly considering the complex geometric structure of the suspension. Building upon SLLM, Bayesian inference is integrated into the PINN to effectively cope with noise and uncertainty in the vehicle chassis system, thereby improving the model's robustness. Then, a physics-informed loss function is employed to ensure consistency with fundamental physical principles, while the damper characteristics-inspired embedding module extracts temporal variation features of input signals and incorporates them into each layer of the PINN, ensuring that physical observations guide the neural network without being constrained by fixed physical models. Extensive evaluations on high-fidelity simulations and real-world experiments demonstrate that our DBPnet consistently achieves lower RMSE and MaxError than baseline methods. These results highlight the potential of our DBPnet to advance wheel load estimation and contribute to the development of more reliable ADAS actuator functions.
Speech and audio systems operate in inherently non-stationary environments, yet continual learning (CL) research in this domain, especially in the foundation model era, remains fragmented and fails to account for the coupled, geometry-sensitive nature of acoustic representations. Modern speech foundation models operate over highly entangled, continuous representations that jointly encode linguistic, speaker, and paralinguistic factors within a shared latent space. CL is therefore fundamentally about preserving and evolving shared representation structure rather than retaining isolated task knowledge. In this work, we revisit CL for speech from a representation-centered perspective, and introduce a new taxonomy that organizes CL according to how underlying representation geometry evolves under non-stationary acoustic conditions. We further identify key mismatches between current CL assumptions and speech foundation model behavior, and finally outline a set of open challenges and future research directions.
Communication-aware trajectory generation for unmanned aerial vehicles (UAVs) operating in urban environments requires simultaneous consideration of vehicle dynamics, wireless communication quality, obstacle avoidance, and onboard energy limitations. In such missions, UAVs must navigate through obstacle-rich environments while ensuring reliable relay of mission-critical sensory information to ground infrastructure. This results in a highly nonlinear and nonconvex optimal control problem involving coupled communication and flight-dynamics constraints. This paper presents a communication-constrained energy-optimal trajectory generation framework for quadrotor UAVs operating in urban environments. The proposed formulation incorporates full rigid-body quadrotor dynamics, urban wireless communication models, cumulative data throughput constraints, and obstacle avoidance requirements within a unified free-final-time optimal control framework. Unlike conventional approaches based on simplified kinematic or point-mass models, the proposed framework generates dynamically feasible trajectories suitable for practical aerial platforms. The resulting nonconvex optimal control problem is solved iteratively using sequential convex programming (SCP). Numerical simulations for multiple urban mission scenarios demonstrate the ability of the proposed framework to generate energy-efficient and communication-aware trajectories while adapting mission duration according to data relay requirements. The proposed methodology provides a practical framework for autonomous UAV operations requiring reliable communication in dense urban environments.
Distributed vertical power delivery (DVPD) architectures employ multiple parallel voltage regulators (VRs) to meet the high-power and high current density demands of modern high performance computing (HPC) systems. While full parallel activation maximizes efficiency near peak load, medium to light load operation leads to efficiency degradation when all VRs remain active due to persistent switching and gate drive losses. This work proposes a load aware power system activation framework targeted at the medium to light load regime, in which the number of active VRs scales proportionally with instantaneous load power. A spatially informed selection strategy determines which VRs are activated from the available pool, aligning regulator placement with localized power demand. This locality aware activation minimizes lateral redistribution currents within the power plane and reduces conduction losses and voltage drops. Simulation results on a representative DVPD system demonstrate 2x to 3x switching loss reduction relative to conventional full-parallel light load operation, while sustaining an approximately 87% efficiency plateau across the 5% to 30% load range. Output ripple constraints are preserved, with inductor current ripple maintained within 6% and output voltage ripple within 2%, ensuring regulation integrity while improving overall conversion efficiency.
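A load-proportional, locality-aware activation rule of the kind described above might look like the sketch below. The rounding rule, the single "hotspot" coordinate, and the nearest-first selection are assumptions chosen for illustration; the abstract does not specify the actual selection algorithm.

```python
import math


def active_vr_count(load_fraction, total_vrs):
    """Number of VRs to enable, proportional to load, clamped to [1, total]."""
    return max(1, min(total_vrs, math.ceil(load_fraction * total_vrs)))


def select_vrs(vr_positions, hotspot, count):
    """Pick the `count` VRs closest to the point of highest power demand."""
    ranked = sorted(
        range(len(vr_positions)),
        key=lambda i: math.dist(vr_positions[i], hotspot),
    )
    return sorted(ranked[:count])


if __name__ == "__main__":
    # Hypothetical 4 x 4 grid of regulators under the die.
    grid = [(x, y) for y in range(4) for x in range(4)]
    n_active = active_vr_count(0.25, len(grid))
    print(n_active, select_vrs(grid, hotspot=(0, 0), count=n_active))
```

Choosing regulators near the demand keeps current paths short, which is the mechanism the abstract credits for lower lateral redistribution currents and conduction losses.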
Decarbonizing aviation remains challenging because energy-dense jet fuels dominate beyond short-range operations, while batteries impose severe range and payload penalties. Here we evaluate a new infrastructure pathway in which utility-scale solar farms equipped with solar phased arrays wirelessly beam microwave power to hybrid-electric aircraft during cruise. Integrating 143,152 U.S. flight trajectories, 5,712 solar farms and wireless power transfer models, we quantify the spatial, temporal, and operational potential of this concept at continental scale. We find that benefits are highly concentrated in solar-rich, traffic-dense states and are dominated by short- and medium-range flights, accounting for nearly all delivered energy and cost savings. Schedule optimization and higher cruise altitudes further increase value by improving alignment between aircraft demand and beaming availability. Market penetration analysis reveals non-linear scaling between solar farm and flight adoption. These results show that wireless power beaming is best understood as a corridor-specific strategy complementing other aviation decarbonization pathways.
Distributed vertical power delivery has emerged as a promising approach to meet aggressive current-density, efficiency, and transient response requirements in high-performance computing systems. Tight integration of voltage regulators within stacked substrates, however, increases the vulnerability of the power delivery system to short-circuit and open-circuit faults arising from elevated thermal and mechanical stresses. Such faults can propagate through the shared power delivery network, leading to rapid degradation of system-wide efficiency at worst-case rates of up to 0.5% per microsecond. Advanced fault-tolerant power management strategies are therefore required to ensure efficient power delivery. A real-time fault-detection and isolation methodology is proposed in this paper for vertical power delivery systems. The methodology is based on analytical inductor-current models that rely solely on signals available within the converter control circuitry, thereby eliminating additional sensing overhead. The proposed framework is designed and simulated in a SPICE environment, demonstrating sub-microsecond fault detection and effective dual-fuse isolation, maintaining uninterrupted power delivery with a system-wide efficiency degradation of less than 2%.
Fast-charging of lithium-ion batteries is essential for electric vehicle adoption, but aggressive charging can accelerate their degradation and create safety risks. This study investigates a control framework that coordinates charging current with active thermal management to minimise charging time, while respecting constraints on electrochemical degradation and thermal safety. A single particle model with electrolyte dynamics (SPMe), extended with a two-node thermal model, represents the battery dynamics and enables the prediction of the internal states used in the control strategy, including anode potential, core temperature, and cell voltage. Two multi-input multi-output control strategies are developed and compared: a classical approach using parallel proportional-integral-derivative (PID) controllers and an advanced model predictive control (MPC) with dual resolution prediction. Both controllers regulate the charging current and thermal resistance to minimise charging time while keeping within the limits of anode potential, core temperature, and cell voltage. The results demonstrate that coordinated thermal-electrochemical optimal control outperforms conventional approaches, achieving a 42.2% reduction in charging time compared to the manufacturer's charging recommendation, without increasing degradation. MPC reduces the charging time by 5.2% compared to PID control, but at a significant computational cost. This improvement demonstrates the untapped potential of integrated thermal management in fast-charging protocols.
Retinal imaging provides a non-invasive window into systemic microvascular health and has emerged as a potential biomarker for systemic diseases. However, whether retinal features encode biologically meaningful systemic signals that can be reliably interpreted using explainable artificial intelligence (XAI) remains unclear. An explainable multi-task deep learning framework was developed to investigate associations between retinal microvascular features and systemic abnormalities in Type 2 Diabetes Mellitus. A total of 11,011 fundus images from 2,719 individuals were analysed using a shared neural network with task-specific heads for glycaemic status, kidney abnormality, and multi-system involvement. Model interpretability was evaluated using Gradient-weighted Class Activation Mapping (Grad-CAM), anatomical masking, and vessel alignment analysis. The framework demonstrated task-dependent predictive performance, with the best discrimination observed for kidney abnormality (AUC up to 0.63), whereas glycaemic status prediction showed limited performance (AUC = 0.49-0.61). Explainability analyses consistently localized model attention to retinal vessels and peripapillary regions. Masking experiments showed that occlusion of vascular regions caused the greatest performance decline, indicating that retinal vessels were the primary predictive source. Different architectures exhibited heterogeneous attention patterns, suggesting multiple representational pathways for systemic signal encoding. This pilot study demonstrates that retinal microvascular features contain measurable signals associated with systemic abnormalities, particularly microvascular damage. By integrating multi-task learning with quantitative XAI validation, this framework advances retinal imaging toward interpretable digital biomarkers for systemic risk stratification in diabetes.
We propose a lattice-theoretic framework for modulo sampling of multidimensional bandlimited signals. Standard modulo analog-to-digital converters (ADCs) fold the signal component-wise into a square domain, reducing the recovery problem to independent one-dimensional problems. We extend the recovery guarantees to any lattice, requiring the same sampling rate as in the standard component-wise modulo setting. We also extend existing recovery algorithms to the general high-dimensional lattice setting. Selecting a lattice with a smaller normalized second moment reduces the reconstruction mean squared error (MSE) through two complementary mechanisms: it lowers the folded signal power, which reduces the absolute noise energy at a fixed signal-to-noise ratio (SNR), and it reduces the quantization error when a matched lattice quantizer is applied. Higher-dimensional lattices offer a smaller normalized second moment than the hypercube lattice, with gains that grow substantially with dimension. Instantiating the framework in two dimensions with the hexagonal lattice reduces the MSE relative to the square lattice at the same inradius by 16.7%. Furthermore, simulations on 8-dimensional signals using the E8 lattice achieve a 57% reduction in both additive and quantization noise. A topological interpretation connects each folding geometry to a surface whose genus reflects the lattice complexity, and reveals a natural hardware implementation via comparator circuits.
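The 16.7% figure for the hexagonal lattice is consistent with a simple second-moment calculation, sketched below. The assumption (not stated in the abstract) is that the folded signal is roughly uniform over the fundamental cell, so its per-dimension power equals the cell's per-dimension second moment. The function names are illustrative.

```python
from fractions import Fraction


def square_power_per_dim(r):
    """Per-dimension second moment of a square cell with inradius r.

    The cell has side 2r, and a uniform variable on [-r, r] has variance r^2/3.
    """
    return Fraction(1, 3) * r * r


def hexagon_power_per_dim(r):
    """Per-dimension second moment of a regular hexagonal cell with inradius r.

    For side length a, E[|x|^2] = 5 a^2 / 12 over the hexagon. With
    r = a * sqrt(3) / 2, i.e. a^2 = 4 r^2 / 3, this gives 5 r^2 / 9 in total,
    or 5 r^2 / 18 per dimension.
    """
    return Fraction(5, 18) * r * r


def relative_reduction(r=Fraction(1)):
    """Fractional drop in folded power when moving from square to hexagon."""
    return 1 - hexagon_power_per_dim(r) / square_power_per_dim(r)


if __name__ == "__main__":
    print(relative_reduction(), float(relative_reduction()))
```

The ratio is exactly 1/6, independent of the inradius, which matches the quoted reduction to three significant figures.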
Body area networks (BANs) require lightweight session key establishment, yet public key exchange imposes computation and energy costs that exceed the budgets of deeply constrained wearable nodes. This brief presents HHK, a hardware-oriented cross-location photoplethysmography (PPG) key generation architecture for BANs. The proposed datapath extracts inter-beat intervals (IBIs) from green-light PPG at multiple body sites, aligns beat timestamps across locations, applies Gray-coded equal-frequency quantization, and employs a rate-1/3 polar code fuzzy commitment (N=128, K=42) to reconcile residual timing mismatches. Post-implementation synthesis on a Xilinx XC7Z020 maps the complete datapath to 18,760 lookup tables and 20,971 flip-flops with no multipliers or embedded memories, giving 48 uJ per key generation event and 0.4 uW average power over a 120-second acquisition window. Validation across 16 participants from a real ambulatory dataset (216 hours; head, wrist, and ankle) yields 86.0-90.1% raw IBI bit agreement and 51.3-69.8% key agreement. To the best of our knowledge, HHK is the first synthesizable register-transfer level (RTL) key generation architecture for BANs validated across multiple body locations on ambulatory data.
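Gray-coded equal-frequency quantization of inter-beat intervals can be sketched in a few lines. This is an illustrative software model only, not the HHK datapath: the bin edges are derived from the same window being quantized, the bit width is 2, and all names are assumptions.

```python
def equal_frequency_edges(values, levels):
    """Bin edges that place roughly the same number of samples in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // levels] for i in range(1, levels)]


def gray_code(index):
    """Binary-reflected Gray code: adjacent indices differ in exactly one bit."""
    return index ^ (index >> 1)


def quantize_ibis(ibis, bits=2):
    """Map each inter-beat interval (seconds) to a Gray-coded bin label."""
    levels = 1 << bits
    edges = equal_frequency_edges(ibis, levels)
    out = []
    for value in ibis:
        level = sum(value >= edge for edge in edges)  # bin index 0..levels-1
        out.append(format(gray_code(level), f"0{bits}b"))
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical IBI window in seconds.
    window = [0.80, 0.82, 0.85, 0.90, 0.78, 0.95, 0.88, 0.83]
    print(quantize_ibis(window))
```

Gray coding means that an interval landing in a neighbouring bin at a second body site flips only one bit, which keeps the residual mismatch small enough for an error-correcting code such as the stated polar code (K/N = 42/128, roughly 1/3) to reconcile.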
This paper develops a unifying analytical framework for comparing deployment and duplexing paradigms in distributed cell-free massive multiple-input multiple-output (CF-mMIMO) integrated sensing and communication (ISAC) systems. The system comprises distributed access points (APs) serving multiple downlink and uplink users while simultaneously detecting radar targets. Four configurations are analysed, namely separated and shared AP deployment under half-duplex (HD) and full-duplex (FD) operation, each incorporating realistic impairments: residual self-interference (SI) from transmit-receive leakage, imperfect interference cancellation (IC) due to channel estimation errors, and clutter. Kullback-Leibler divergence (KLD) serves as a unified measure, enabling direct comparison of communication and radar performance on a common scale. A generalised likelihood ratio test (GLRT) framework is developed to produce closed-form expressions linking KLD to detection probability. Monte Carlo simulations verify these expressions and demonstrate that FD operation achieves substantial gains over HD, provided sufficient SI suppression and IC quality are maintained, while preserving strong radar detection. It is also shown that shared deployment enhances radar performance via a larger effective aperture but exhibits tighter communication-radar coupling than separated deployment. These results establish deployment guidelines and quantitative design thresholds for next-generation CF-mMIMO ISAC systems.
While highway automation is advancing rapidly, road operators still lack practical methods to assess the readiness of their infrastructures for supporting automated driving systems. This work proposes a quantitative Highway Readiness Index (HRI) that maps static Operational Design Domain (ODD) infrastructure conditions into measurable attributes and weights them through an expert survey to evaluate readiness across Society of Automotive Engineers (SAE) automation levels. A real corridor case study shows how HRI scores can be computed, interpreted, and used to identify infrastructure gaps that limit higher automation. Finally, we outline how these indicators can be integrated into a standardized Cooperative Intelligent Transport System (C-ITS) message, i.e., Infrastructure-to-Vehicle Information Message (IVIM), to communicate segment-level automation guidance to connected vehicles.
Automatic modulation classification (AMC) is an essential technique for noncooperative spectrum monitoring and intelligent wireless receivers. However, practical AMC models must identify modulation formats from short and noisy I/Q observations while maintaining low computational and storage overhead. Existing deep-learning approaches often improve recognition accuracy by expanding generic neural backbones, which increases deployment cost and weakens their suitability for resource-constrained receivers. To bridge the gap between recognition performance and model efficiency, this letter proposes a Complex Subband Phase-Motion Network, designated as CSPMNet, for lightweight AMC from raw I/Q samples. Specifically, learnable complex subband filters are introduced to adaptively extract frequency-selective baseband responses while preserving the algebraic coupling between in-phase and quadrature components. Then, an amplitude-preserving phase-motion module captures multi-lag temporal rotation dynamics within each subband, and a lightweight temporal classifier performs efficient sequence aggregation. Rigorous experimental evaluations on public RadioML benchmark datasets demonstrate that CSPMNet achieves highly competitive recognition accuracy while requiring substantially lower model complexity than many existing AMC models.
Thin-layer photobioreactors (TLRs) exhibit fast hydrodynamic and thermal dynamics, strong nonlinear photosynthetic responses and significant time-variability due to irradiance fluctuations and biomass growth. These characteristics challenge conventional model-based control strategies, whose tuning degrades under rapidly changing operating conditions. This work presents the experimental implementation of a model-free control approach, Extremum Seeking Control (ESC), for performance optimization in a semi-industrial thin-layer photobioreactor. Unlike previous studies in raceway ponds, the reduced hydraulic inertia of TLR systems enables the adaptation of this control strategy to accelerate convergence while preserving gradient estimation accuracy. The proposed approach is experimentally compared against classical on-off control and ESC configurations with and without feedforward compensation of solar irradiance. Beyond control performance metrics, biological indicators such as biomass concentration and productivity are evaluated to assess the impact on process efficiency. Results show that the proposed ESC strategy reduced cumulative CO$_2$ consumption by approximately 39 % and decreased the accumulated pH tracking error by more than 60 % compared with conventional on-off control, while biomass- and irradiance-normalised indicators confirmed a more efficient use of injected carbon. These results demonstrate that high-frequency ESC can improve regulation performance and carbon utilisation efficiency in fast photobioreactor systems, highlighting its suitability for thin-layer cultivation under outdoor conditions.
This work presents the experimental validation of a turbidostat strategy for biomass control in a semi-industrial outdoor raceway reactor. The proposed approach regulates biomass concentration by automatically triggering dilution when the online biomass estimate exceeds a predefined threshold. To ensure safe outdoor operation, dilution was restricted to daylight periods, avoiding biomass removal under low-radiation conditions. The strategy was implemented through an industrial control architecture using an optical monitoring system for online biomass estimation. Experiments were conducted over 14 consecutive days in an 80 m$^2$ (12000 L) raceway reactor. A second parallel reactor operated in chemostat mode, with a nominal dilution of 20 % of the total volume during operating days, provided contextual information under the same outdoor conditions. The analysis focuses on the ability of the sensor-based strategy to configure and maintain the desired biomass concentration, rather than on a direct reactor-to-reactor performance ranking. During the campaign, the biomass threshold in the turbidostat reactor was changed from 1.0 to 0.8 g L$^{-1}$, demonstrating the flexibility enabled by online biomass monitoring. Excluding initial adjustment and transition days, harvested areal productivity increased from 9.52 to 23.20 g m$^{-2}$ d$^{-1}$ after reducing the operating threshold. The overall biomass balance also showed higher net areal productivity in the turbidostat reactor, reaching 20.34 g m$^{-2}$ d$^{-1}$ compared with 11.16 g m$^{-2}$ d$^{-1}$ in the parallel chemostat reactor. These results demonstrate the feasibility of robust turbidostat-based biomass control in large-scale outdoor raceway photobioreactors.
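The trigger rule described above reduces to a threshold test gated by daylight. The sketch below is a minimal reading of that rule with hypothetical names; it omits the sensor filtering, actuator timing, and dilution volume that a real implementation would need, and it adds a helper for the mean culture depth implied by the stated volume and area.

```python
def should_dilute(biomass_g_per_l, threshold_g_per_l, is_daylight):
    """Trigger dilution only in daylight and only above the biomass threshold."""
    return is_daylight and biomass_g_per_l > threshold_g_per_l


def culture_depth_m(volume_l, area_m2):
    """Mean culture depth implied by reactor volume (litres) and surface area."""
    return (volume_l / 1000.0) / area_m2


if __name__ == "__main__":
    # Thresholds quoted in the abstract: 1.0 g/L, later lowered to 0.8 g/L.
    print(should_dilute(0.9, 1.0, is_daylight=True))
    print(should_dilute(0.9, 0.8, is_daylight=True))
    print(should_dilute(0.9, 0.8, is_daylight=False))
    print(culture_depth_m(12000, 80))
```

With a biomass estimate of 0.9 g/L, the rule is idle under the 1.0 g/L threshold and active under 0.8 g/L, which illustrates how lowering the setpoint increases harvesting frequency.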
The sub-terahertz frequency band offers extremely large bandwidth and enables ultra-high data rates for future wireless applications. However, severe propagation loss and blockage significantly limit coverage at these frequencies. Reconfigurable intelligent surfaces can dynamically shape EM wave propagation and provide a promising solution for coverage enhancement. Realizing such surfaces using standard printed circuit board technology is attractive due to its low cost and scalability, but it remains challenging around 100 GHz because of fabrication limits, limited switch availability, large switch size compared with the unit cell, switch parasitic effects, and high control complexity. In this work, we demonstrate a wideband PCB-based reconfigurable intelligent surface operating around 100 GHz. The design combines an orthogonal-polarization slot-coupled patch structure with subarray partitioning to mitigate switch-induced parasitic effects, reduce the required number of RF switches, and simplify the control architecture. The reconfigurability is achieved using AlGaAs SP3T bare-die switches integrated through optimized bond-wire interconnections. For proof of concept, a six-subarray structure with 4 by 4 elements per subarray is designed for different beamforming angles, and a 12 by 8 prototype is fabricated and experimentally characterized. The measured results show a gain enhancement of about 10 dB from 86 to 100 GHz and about 5 dB from 100 to 106 GHz, while maintaining a low power consumption of 0.165 W. These results validate the feasibility of practical wideband PCB-based reconfigurable intelligent surfaces for sub-terahertz wireless systems.
This paper presents a backstepping approach for the boundary control of first-order hyperbolic equations with spatially varying coefficients posed on domains of arbitrary dimension. The method is based on a change of variables induced by the characteristic flow of the time-invariant transport operator, transforming the original multidimensional system into a continuum of decoupled one-dimensional hyperbolic equations evolving along individual characteristic curves. A backstepping controller is then designed for each equation in the decomposition, and the resulting control laws are reassembled in the original coordinates to achieve finite-time stabilization of the full system. The framework relies on the existence of characteristic curves foliating the spatial domain, with uniformly bounded transit times (non-trapping).
This letter studies distributed stochastic optimization over a peer-to-peer network when agents can query only zeroth-order function values. We propose ZOOM-PB, a coordinate-sampling distributed zeroth-order method equipped with a fractional-power powerball map. Unlike existing distributed zeroth-order methods that mainly refine gradient estimation or introduce primal--dual tracking, the proposed mechanism acts as a nonlinear feedback gain on the estimated gradient: it amplifies weak signals in flat regions and attenuates large stochastic estimates without adding transmitted states. Under standard smoothness, oracle-variance, and network-connectivity assumptions, ZOOM-PB achieves the leading nonconvex stationarity rate $\mathcal{O}(\sqrt{p/(nT)})$, where $p$ is the decision dimension, $n$ is the number of agents, and $T$ is the iteration horizon. Under the Polyak--Łojasiewicz condition, it further attains the leading objective residual rate $\mathcal{O}(p/(nT))$. Thus the method preserves the known distributed ZO order while changing the finite-time behavior through a local nonlinear control gain. Simulations on black-box learning and sensor-driven UAV source seeking show faster empirical convergence in weak-signal regimes.
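As a rough illustration of the mechanism described in the ZOOM-PB abstract above, the following Python sketch pairs a coordinate-sampled two-point zeroth-order gradient estimate with a fractional-power map of the form sign(x)|x|^gamma. The function names, the exponent `gamma = 0.5`, the smoothing radius `delta`, the scaling of the estimator by the dimension, and the plain local step are all assumptions made for illustration; the abstract does not specify them, and the network averaging between agents is omitted.

```python
import math

def powerball(vector, gamma=0.5):
    """Elementwise sign(v) * |v|**gamma (assumed form of the fractional-power map).

    For 0 < gamma < 1 this enlarges entries with |v| < 1 and shrinks entries
    with |v| > 1, which is the amplify/attenuate behaviour the abstract describes.
    """
    return [math.copysign(abs(v) ** gamma, v) for v in vector]

def coordinate_zo_gradient(f, x, i, delta=1e-4):
    """Two-point finite-difference estimate of the gradient along coordinate i.

    Only function values are queried. The factor p = len(x) is an assumed
    scaling that compensates for sampling one coordinate out of p.
    """
    p = len(x)
    x_plus, x_minus = list(x), list(x)
    x_plus[i] += delta
    x_minus[i] -= delta
    estimate = [0.0] * p
    estimate[i] = p * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return estimate

def zo_powerball_step(f, x, i, step=0.1, gamma=0.5):
    """One local update: x <- x - step * powerball(zeroth-order gradient estimate)."""
    g = powerball(coordinate_zo_gradient(f, x, i), gamma)
    return [xj - step * gj for xj, gj in zip(x, g)]

def quadratic(x):
    """Toy objective used only for the demonstration below."""
    return sum(v * v for v in x)

mapped = powerball([0.01, -4.0, 0.0])          # small entry grows, large entry shrinks
updated = zo_powerball_step(quadratic, [1.0, 2.0, 3.0], i=1)
```

In this toy run the entry 0.01 is mapped to roughly 0.1 while -4.0 is mapped to -2.0, so weak estimates are amplified and large ones are damped before the step is taken.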
This paper presents an efficient implementation of the extended object Poisson multi-Bernoulli (PMB) filter under the zero-inflated Poisson (ZIP) object measurement model using particle belief propagation (BP). The ZIP measurement model separates a Bernoulli object detection event from the conditional Poisson generation of object measurements, enabling principled handling of empty measurement sets. Building upon the PMB mixture posterior, we present a factorized joint posterior over the set of objects with object detection variables and a dual representation of data association using both object-oriented and measurement-oriented association variables. Notably, this representation replaces the implicit high-order global hypothesis constraint with local consistency factors, yielding a factor graph amenable to BP. In addition, we present a particle-based implementation, in which the Poisson intensity for undetected objects is analytic, whereas the single-object densities of Bernoulli components for the detected objects are represented using particles. Simulation results demonstrate that the proposed method outperforms existing sampling-based implementations of the extended object PMB filter with the ZIP model in terms of both estimation accuracy and runtime.
In UAV-to-UAV communication, airborne UAVs need to detect the location and direction of ultra-high-speed millimeter-wave (mmWave) and Terahertz (THz) coverage areas, referred to as ultra-spots. This predictive capability allows UAVs to optimally adjust their flight paths, altitude, and velocity, thereby maximizing the utilization of ultra-spot services. A space-time synchronization technique employing multiple Wireless Two-way Interferometry devices (multi-Wi-Wi) is proposed in this paper to detect mmWave/THz ultra-spot locations during UAV operations. This paper proposes an algorithm that estimates the likelihood of nearby ultra-spots by considering the UAV flight route and ultra-spot direction, and by sharing location and pose information among UAVs in the network via a 920 MHz wireless communication link. For the first time, this work addresses the problem of optimizing UAV flight routes to maximize ultra-spot utilization. To address the inherent challenges of Wi-Wi, such as phase data unreliability, RSSI attenuation, or packet loss caused by obstructions from the UAV's own body, this study proposes the use of multiple Wi-Wi devices equipped with antennas mounted at different positions around the arms of the UAV to exploit spatial diversity. The proposed method's effectiveness is confirmed through experimental data derived from real-world UAV-to-UAV communication tests. An error of 37.16 cm was observed experimentally in ultra-spot location estimation, corresponding to a 186 ms error in the temporal prediction of ultra-spot entry from an in-flight UAV, demonstrating its effectiveness in addressing ultra-spot detection challenges in mmWave communication.
Low-dose computed tomography (LDCT) reconstruction faces a critical tradeoff between reconstruction quality and resource requirements. While recent deep learning methods achieve state-of-the-art performance, they typically rely on over 500,000 parameters trained on large-scale datasets exceeding 35,000 scans. This work investigates whether graph-based regularization can provide meaningful noise reduction under strict resource constraints. We propose Deep Graph Laplacian Regularization (Deep GLR), integrating quadratic graph regularization into a Proximal Forward-Backward Splitting optimization framework with three lightweight CNN modules. Evaluated on the LoDoPaB-CT benchmark, Deep GLR achieves 30.70 dB PSNR, representing a 6.33 dB improvement over filtered backprojection, while using only 91,848 parameters trained on 1000 samples (2.8\% of the standard training set). Compared to benchmark methods, this represents 5.8 times better parameter efficiency and 30 times better data efficiency per dB improvement. The learned graph bandwidth parameter ($\epsilon$=1.25) converges to an interpretable value, suggesting the method captures meaningful image priors rather than overfitting. While a 13 dB gap remains versus state-of-the-art methods, the results demonstrate that graph-based regularization provides a favorable efficiency-quality tradeoff for resource-constrained medical imaging scenarios.
The convergence of large language models (LLMs) with 6G networks is fostering a paradigm of autonomous multi-agent cooperation, which in turn is expected to substantially increase east-west traffic. Although latent-space interaction mechanisms can enable more efficient collaboration than symbolic natural-language (NL) exchanges, prior work often abstracts away the associated communication overhead under practical wireless constraints. In embodied multi-agent settings, heterogeneous interaction media incur disparate inference and transmission costs, thereby inducing an inherent end-to-end (E2E) latency trade-off. To address this, we propose a joint design that integrates communication-media selection with wireless resource allocation. Through analytical characterization and simulation-based evaluation, we show that neither token-based transmission nor key-value (KV) cache-based transmission is uniformly optimal across operating regimes, as performance depends critically on system parameters such as available computational resources and channel conditions. Accordingly, we formulate a joint optimization problem aimed at minimizing the E2E latency of multi-agent collaboration and develop a low-complexity joint media selection and resource allocation (JMSRA) algorithm. Numerical results further confirm that, by adaptively coordinating the interaction media and bandwidth allocation over heterogeneous links, the proposed scheme achieves markedly reduced E2E latency relative to conventional NL-only and KV-cache-only baselines, enabling efficient and robust multi-agent collaboration in future wireless networks.
We examine how aircraft seat configuration interacts with daily operation in Regional Air Mobility by applying a joint supply-demand optimization framework that simultaneously determines market share, fare, and flight schedule. The framework integrates a binary logit discrete choice model into a task assignment formulation, capturing passengers' mode choice between Regional Air Mobility and driving across spatiotemporal origin-destination pairs. We evaluate three U.S. college town corridors under 4-, 6-, and 8-seat configurations across cost scales from 0.4 to 1.0 and fleet sizes from 12 to 30 aircraft. Profitability and throughput serve as primary performance metrics, and we analyze pricing power, operating cost, and revenue to explain performance variation across markets. We find that larger aircraft configurations and fleet sizes do not improve profitability universally. Larger aircraft are preferred where economies of scale are favorable and demand is sufficient and directionally balanced. The best configuration in these case studies is the 4-seat in imbalanced markets and the 6-seat in balanced or dense markets.
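To make the binary logit mode choice in the Regional Air Mobility abstract above concrete, the sketch below computes the probability that a traveller chooses Regional Air Mobility over driving. The linear utility specification, the coefficient values `beta_cost` and `beta_time`, and the function name are hypothetical and chosen only for illustration; the abstract does not give the utility form or any parameter values.

```python
import math

def ram_share(fare, ram_time_h, drive_cost, drive_time_h,
              beta_cost=-0.02, beta_time=-1.0):
    """Binary logit probability of choosing Regional Air Mobility over driving.

    Utilities are assumed linear in monetary cost (USD) and travel time (hours);
    both coefficients are hypothetical placeholders, not values from the paper.
    """
    u_ram = beta_cost * fare + beta_time * ram_time_h
    u_drive = beta_cost * drive_cost + beta_time * drive_time_h
    # P(RAM) = exp(u_ram) / (exp(u_ram) + exp(u_drive)), written in a stable form.
    return 1.0 / (1.0 + math.exp(u_drive - u_ram))

# Identical cost and time for both modes: the traveller is indifferent.
indifferent = ram_share(fare=50.0, ram_time_h=1.0, drive_cost=50.0, drive_time_h=1.0)

# A faster but more expensive flight: the share falls as the fare rises.
share_low_fare = ram_share(fare=80.0, ram_time_h=1.0, drive_cost=40.0, drive_time_h=3.0)
share_high_fare = ram_share(fare=160.0, ram_time_h=1.0, drive_cost=40.0, drive_time_h=3.0)
```

Because the share depends on the fare through the utility difference, a joint optimization over fare and schedule has to account for demand shrinking as the price rises, which is the coupling the abstract refers to.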
This paper presents an innovative approach to enhancing machine-learning-based communication systems, specifically focusing on multiple-input multiple-output (MIMO) configurations using autoencoders. We optimize the transmitter, receiver, and channel simultaneously under conditions of noise and channel fading, aiming to minimize the bit error rate (BER). By incorporating the Rayleigh fading channel, a widely recognized model for wireless channel impairments, into the autoencoder framework, we directly train the communication system to handle real-world conditions. We introduce a novel optimization process tailored for deep learning-based MIMO communication, and thoroughly analyze the resulting BER performance across various signal-to-noise ratio (SNR) levels. Our simulation results reveal that the proposed end-to-end wireless communication system achieves significantly lower BER compared to conventional block-based processing methods, highlighting its potential for more efficient and reliable wireless communication.
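The Rayleigh fading channel mentioned in the abstract above can be sketched as a layer that multiplies each transmitted symbol by a complex Gaussian coefficient and adds noise. The version below is a single-antenna, flat-fading simplification written with the Python standard library only; the function names, the unit-energy symbol assumption, and the SNR convention are illustrative assumptions and do not reproduce the paper's MIMO autoencoder.

```python
import math
import random

def rayleigh_coefficient(rng):
    """Draw h ~ CN(0, 1): real and imaginary parts are N(0, 1/2), so E|h|^2 = 1."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)

def rayleigh_channel(symbols, snr_db, rng):
    """Apply y = h * x + n independently to each symbol.

    Assumes unit-energy symbols, so the noise variance is 10**(-snr_db / 10),
    split equally between the real and imaginary parts.
    Returns a list of (h, y) pairs; h is returned so a receiver could equalize.
    """
    noise_std = math.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)
    outputs = []
    for x in symbols:
        h = rayleigh_coefficient(rng)
        n = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        outputs.append((h, h * x + n))
    return outputs

rng = random.Random(0)
# Empirical check that the fading coefficients have unit average power.
average_power = sum(abs(rayleigh_coefficient(rng)) ** 2 for _ in range(20000)) / 20000
received = rayleigh_channel([1 + 0j, -1 + 0j, 1 + 0j], snr_db=10.0, rng=rng)
```

In an end-to-end learned system, a stochastic layer of this kind would sit between the transmitter and receiver networks so that training sees fading and noise directly.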
Passive multi-target tracking (MTT) aims to infer the kinematic states of multiple targets from noisy sensor data in which contributions from unknown target-emitted signals are superposed. Track-before-detect (TBD) methods improve robustness to noise by operating directly on raw sensor data without relying on a preceding detection stage. However, many existing TBD methods assume that each target's contribution to the sensor data is determined solely by its kinematic state. This assumption limits their applicability to passive MTT, where each target's contribution depends on both its kinematic state and the unknown emitted signal. We propose subspace TBD, a passive multi-target TBD method based on a likelihood derived from the complex Bingham distribution that does not require explicit modeling or estimation of the unknown emitted signals. In a particle filter (PF) framework, each multi-target hypothesis is mapped to a low-dimensional subspace spanned by the steering vectors corresponding to the hypothesized target states. The likelihood is then used to evaluate the alignment of the normalized multichannel sensor data with this subspace. Preliminary experiments with simulated acoustic measurements and a given target activity pattern show that the proposed method can track two moving targets emitting unknown signals at a signal-to-noise ratio (SNR) of -10dB, whereas a conventional TBD baseline yields substantially larger tracking errors.
While current emotional Text-to-Speech (TTS) models have successfully controlled verbal prosody, they often ignore non-verbal vocalizations (NVs), which are essential for authentic human emotion. Although some non-verbal datasets have recently emerged, they often lack high-quality, fine-grained annotations, which restricts a model's ability to precisely control NV generation. To address this limitation, we propose a novel approach for fine-grained non-verbal expression synthesis. We curate and reprocess female NV utterances from the EARS corpus, develop a new annotation scheme using tags to encode NV types, frequencies, and durations, and build an emotional TTS benchmark to demonstrate its effectiveness. Our evaluation shows that while our NV approach leads to minor trade-offs in perceived naturalness, it significantly improves expressiveness (eMOS 4.20) and emotional recognition accuracy (78.8%). Emotion-specific analysis further reveals that NV cues are highly effective for high-arousal emotions like happy (82.5%) and fear (82.7%), and almost perfectly convey sadness (98.3%).
Most neural vocoders are limited to one type: either GAN-based or diffusion-based. While state-of-the-art models like Vocos and WaveNeXt use powerful ConvNeXt-based generators, they have only been used in GAN frameworks and have limited performance in multi-speaker settings. Moreover, diffusion models, despite training faster than GANs, have slow CPU inference. In this paper, we introduce WaveNeXt 2, a unified ConvNeXt-based framework compatible with both GAN and diffusion vocoders. Its core innovation is residual denoising and sub-modeling, where each sub-model progressively refines the waveform. Experimental results on a multi-speaker dataset demonstrate the effectiveness of our approach: (1) GAN-WaveNeXt 2 is much faster than HiFi-GAN and WaveFit, and (2) Diff-WaveNeXt 2 also delivers much faster inference and competitive synthesis quality compared with FastDiff with 4 steps. Diff-WaveNeXt 2 is also training-efficient, training in only 32 hours, making it well suited to resource-constrained applications.
Mask-based blind speech separation (BSS) estimates source-wise time-frequency (TF) masks by clustering multichannel observations using spatial information. The directional statistical approach clusters normalized multichannel observations on the complex unit sphere, without explicitly extracting phase and level difference features based on the plane-wave or spherical-wave assumptions. However, prior studies have mostly compared a small number of separately defined directional statistical mixture models, whereas a broader distribution family would enable a more systematic study of how density profiles affect separation performance. We propose the complex spherical Student's t mixture model (cSTMM), a directional mixture model that connects the complex angular central Gaussian mixture model (cACGMM), complex Bingham mixture model (cBMM), and complex Watson mixture model (cWMM) through the degrees-of-freedom parameter $\nu$. We also derive a generalized minorization-maximization (MM) based procedure for parameter estimation. A no-restart evaluation on noise-free LibriSpeech mixtures reverberated with measured room impulse responses shows that a single development-selected value $\nu^\ast=1$ achieved higher test-set mean signal-to-distortion ratio improvements (SDRi) than the cACGMM-equivalent setting $\nu=M$ in all acoustic conditions, with an average condition-wise gain of 0.25 dB. The experiments also verify numerically that the proposed formulation recovers the cACGMM, cBMM, and cWMM cases.
Deep reinforcement learning (DRL) has long been a promising solution for sequential resource management in wireless networks. However, conventional DRL methods are fundamentally limited by their reliance on unimodal policy distributions, inefficient exploration in high-dimensional action spaces, and poor adaptability to dynamic and heterogeneous environments. Meanwhile, diffusion models (DMs), one of the most powerful families of generative AI, have demonstrated remarkable capabilities in modeling complex, multi-modal data distributions across diverse domains. The integration of DMs and DRL has opened a new and rapidly growing research direction, in which DM-enabled policies substantially enhance decision quality by capturing the complex, discontinuous, and multimodal action structures inherent in wireless resource management. In this paper, we present a comprehensive survey of DM-enabled DRL algorithms and their applications to various problems in wireless networks. In particular, we first provide the theoretical background of DMs and present different DM-enabled DRL algorithms. We then systematically review applications of DM-enabled DRL across computation offloading in mobile edge computing, UAV-assisted, vehicular, and AIGC-driven systems, as well as wireless resource allocation, physical-layer security, and robotics and UAV planning. We conclude the paper by highlighting future research directions.
Motivated by structural biology applications, we study the projected multi-reference alignment (MRA) model, in which an unknown signal is observed through noisy samples, each generated by applying a random cyclic shift followed by a fixed projection. The projection merges reflection-symmetric index pairs, thereby discarding orientation information. The goal is to recover the dihedral orbit of the signal. We prove that in the high-noise regime, the first three moments of the projected observations determine a generic dihedral orbit. The main mechanism is a reduction, at the moment level, from projected MRA to the reflection-invariant phase-coupling structure of dihedral MRA. In Fourier-cosine coordinates adapted to the projection, the first moment determines the mean component, the second moment determines the Fourier magnitudes, and selected third moments yield the cosine phase-coupling relations appearing in the dihedral bispectrum. These relations lead to a constructive recovery scheme from moments up to order three. We complement the population theory with finite-sample experiments comparing expectation--maximization (EM), direct moment optimization, and direct Fourier-cosine moment optimization. The results show that, in the high-noise regime, both EM and direct moment optimization are consistent with the predicted third-moment sample-complexity scaling $n \gtrsim \sigma^6$, where $n$ is the number of observations and $\sigma^2$ is the noise variance.
Integrated sensing and communications (ISAC) is a key use case for sixth-generation (6G) wireless systems, where parametric channel estimation (PCE) plays a central role in enabling sensing, localization, and channel equalization in high-mobility scenarios. However, PCE is typically more computationally demanding than conventional channel estimation, which motivates the development of lower-complexity solutions. In this letter, we propose a fast PCE algorithm for time-varying and frequency-selective (TVFS) channels based on canonical polyadic (CP) decomposition and tensor processing, combined with ESPRIT-based initialization, component refinement, and exact line-search alternating coordinate descent. Two variants are presented: one for fully digital and another for hybrid receiver architectures. Numerical results show that the proposed method clearly outperforms a related CP-based baseline while achieving estimation performance close to a multiple-start SAGE benchmark at a substantially lower computational cost, with about one order of magnitude shorter execution time.
In the past decade, numerous studies have applied deep neural networks (DNNs) to auditory attention decoding (AAD) from electroencephalogram (EEG) signals via stimulus reconstruction. However, the influence of dataset balance on the decoding performance of stimulus reconstruction-based AAD remains unexplored. In this study, three publicly available EEG-AAD datasets - KUL, DTU, and NJU cEEGrid - are used to construct both balanced and unbalanced experimental conditions. We hypothesize and demonstrate that stimulus reconstruction-based DNN decoders tend to produce overestimated decoding performance on unbalanced datasets. To address this issue, we propose a leave-one-paired-envelope-out (LOPEO) cross-validation protocol. Experimental results confirm that LOPEO effectively prevents inflated decoding accuracy on unbalanced datasets. While balanced datasets are generally preferred in experimental design, LOPEO provides a principled evaluation framework for unbalanced datasets that have already been published, filling an important gap in the field.
To date, most of the research on transport planning has focused on optimizing revenues or utilitarian metrics such as average travel times, which often ends up penalizing the worst-off for the sake of profit or efficiency. At the same time, most of the research in transport justice has focused on assessing injustices, without being able to prescribe operational solutions. This paper contributes to bridging this gap and presents optimization models for justice-informed operational planning of intermodal mobility systems that explicitly account for the budget and safety limitations of users, and for infrastructural capacity constraints. Specifically, we first focus on an intermodal Autonomous Mobility-on-Demand (AMoD) system -- where self-driving robotaxis provide on-demand mobility jointly with public transit and active modes -- and characterize its operations from a mesoscopic planning perspective via network flow models. Second, we leverage these models to optimize system operations through both utilitarian efficiency and justice-informed objectives. We showcase our framework in a real-world case-study for Manhattan, New York. Our results show that monetary budgets significantly limit the social justice potential of AMoD systems if they are to be deployed as transportation network companies. At the same time, granting free public transit can result in sufficiency levels very close to a completely free intermodal AMoD system, where justice-informed operations can be achieved without compromising standard efficiency metrics, ultimately highlighting the strong potential of social policies.
Ultra-low-bitrate speech coding is pivotal for bandwidth-constrained communication and deep compression, yet maintaining naturalness and speaker identity at such extreme bit budgets remains challenging due to pronounced information loss and quantization instability. To this end, we propose FMelCodec, an ultra-low-bitrate neural speech codec in the mel-spectrogram domain, cast as a three-stage coding-refinement-reconstruction (CRR) framework that can operate at as low as 250 bps. In the CRR framework, the front-end mel-spectrogram coding stage employs a highly aggressive 640x compression/decompression encoder-decoder structure with a single 1024-entry VQ codebook, coupled with an online clustering strategy that reassigns underused codewords to prevent codebook collapse and preserve codebook diversity. The subsequent conditional flow matching (CFM)-based mel-spectrogram refinement stage leverages a lightweight velocity-field estimator and CFM-based solver to refine the codec-degraded mel-spectrogram produced by the preceding decoder, and adopts a self-consistency training scheme that supports fewer iterative inference steps for the purpose of reducing computational overhead. Finally, the vocoding-driven waveform reconstruction stage employs a HiFi-GAN vocoder to faithfully reconstruct waveform from the refined mel-spectrogram. Experiments conducted on two datasets spanning two sampling rates show that, under ultra-low-bitrate constraints of 250 bps for 16 kHz and 750 bps for 48 kHz, both objective and subjective evaluations consistently demonstrate that FMelCodec achieves higher speech reconstruction quality and speaker similarity, while incurring lower computational and model complexity.
Quantum wireless sensing using Rydberg atomic receivers enables high-sensitivity signal acquisition for direction-of-arrival (DoA) estimation. However, it suffers from a fundamental limitation: only the magnitude of the received signal is observable. The recently proposed Quantum-MUSIC algorithm addresses this problem by recovering phase information through alternating minimization and subsequently applying the MUSIC algorithm for DoA estimation. However, the existing approach relies on an $\ell_2$-norm phase retrieval step, making it highly sensitive to outlier measurements produced by hardware faults, sensor saturation, or adversarial interference. In this letter, we propose a \emph{Robust Quantum-MUSIC} (RobQMUSIC) framework that replaces the $\ell_2$-norm with an $\ell_1$-norm formulation. The resulting weighted phase-retrieval problem is solved efficiently via an Iteratively Reweighted Least Squares (IRLS) scheme embedded within the alternating minimization loop, requiring no increase in structural complexity relative to the baseline algorithm. Simulation results demonstrate that RobQMUSIC achieves near-identical DoA estimation accuracy to Quantum-MUSIC under ideal conditions, while maintaining robust performance over a wide range of outlier contamination levels at which Quantum-MUSIC fails entirely.
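The reason an $\ell_1$-norm fit solved by IRLS resists outliers, as claimed in the RobQMUSIC abstract above, can be seen on a scalar toy problem. The sketch below is not the paper's weighted phase-retrieval solver; it only estimates a single constant from measurements containing one gross outlier, and the function name, the weight floor `eps`, and the iteration count are assumptions made for illustration.

```python
def irls_l1_location(samples, iterations=100, eps=1e-9):
    """Minimize sum_i |y_i - c| over a scalar c with iteratively reweighted least squares.

    Each pass solves a weighted least-squares problem whose weights
    1 / max(|residual|, eps) down-weight samples with large residuals.
    """
    estimate = sum(samples) / len(samples)  # start from the l2 (mean) solution
    for _ in range(iterations):
        weights = [1.0 / max(abs(y - estimate), eps) for y in samples]
        estimate = sum(w * y for w, y in zip(weights, samples)) / sum(weights)
    return estimate

# Four consistent measurements and one gross outlier (for example a saturated sensor).
measurements = [1.0, 1.1, 0.9, 1.05, 50.0]
l2_estimate = sum(measurements) / len(measurements)   # pulled toward the outlier
l1_estimate = irls_l1_location(measurements)          # settles at the median, 1.05
```

The least-squares estimate is dragged to about 10.8 by the single bad sample, whereas the reweighted iterations settle near the median of the data. The same reweighting idea, embedded in an alternating minimization loop, is what the abstract describes at a much larger scale.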
We investigate deterministic and nonblocking supervisory control of discrete event systems under cyber-attacks using the ALTER (Attack Language for Transition-basEd Replacement) model. While prior works consider supervisory control that achieves either the large (upper bound) language or small (lower bound) language separately, deterministic supervisory control achieves both large language and small language at the same time to ensure that the language generated by the supervised system is unique and deterministic. We introduce two new concepts of CA-D-controllability and CA-D-observability and prove that they are necessary and sufficient for the existence of a deterministic supervisor. For nonblocking supervisory control, the objective is to ensure that the supervised system can always reach marked states under any attack scenario. We prove that relative closure, CA-D-controllability, and CA-D-observability together are necessary and sufficient for the existence of a nonblocking supervisor. We further develop methods to verify CA-D-controllability and CA-D-observability. We also illustrate our results using a robotic system example.
This paper proposes a coordinated energy-mobility dispatch framework for grid support service provision in smart cities under time constraints. In particular, a scenario in which a distributed system operator requests a specified amount of energy within a given deadline is considered. A fleet of connected autonomous electric vehicles equipped with virtual battery partitioning is dynamically dispatched toward vehicle-to-grid stations. The routing problem is formulated as a periodically updated resource-constrained shortest path, accounting for time and energy constraints with congestion-dependent travel times derived from a dynamic traffic model. At the vehicle level, a model predictive control strategy regulates speed to satisfy mobility energy requirements while ensuring deadline compliance. The framework is validated through simulations on the urban network of Rapallo (Italy), demonstrating robustness against congestion-induced delays.
Controlling partial differential equations (PDEs) with learning-based policies remains fundamentally limited by fixed-dimensional representations: policies trained for a specific sensor, actuator, or agent configuration typically fail when the configuration changes. This limitation is particularly severe in multi-agent PDE control, where policies do not scale across population sizes without retraining. We address this challenge by introducing Cardinality Invariant Neural Operator Control (CINOC), reformulating PDE control as an operator learning problem that maps state fields to continuous control functions and trains them end-to-end through differentiable PDE solvers, yielding policies that naturally adapt to varying sensor and actuator configurations. Remarkably, CINOC policies trained on small swarms exhibit cardinality invariance, allowing for zero-shot transfer to significantly larger populations as well as robustness to partial agent failure. This scalability arises from agents sharing a common policy and coordinating through their physical environment, which produces an emergent self-normalization effect. To explain this phenomenon, we provide a theorem grounded in mean-field theory demonstrating that policy gradients computed from finite-agent systems converge to those of a continuous control limit. Empirically, we validate CINOC on tracking, stabilization, and density transport across linear, nonlinear, chaotic, and turbulent PDEs.
The aim of this Lecture Note is to introduce the Signal Processing (SP) community to a powerful yet still under-utilised tool: semiparametric statistics. In short, the semiparametric framework allows us to estimate or perform hypothesis testing on a finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter (i.e. a function), such as the density of the noise. Clearly, this framework is general enough to include almost every SP application. Remarkably, as the title suggests, drawing on George R. R. Martin's famous book series, the greatest advantage of semiparametric statistics over parametric and non-parametric ones lies in the fact that it is able to reconcile two seemingly dichotomous concepts: statistical efficiency and robustness. Here, robustness is understood in the sense of distribution-freeness, that is, the estimation performance must be robust with respect to the lack of knowledge of the functional form of the data-generating distribution. To explain exactly what this means, in this Lecture Note we will focus our attention on the symmetric location problem, a fundamental problem that can be found (in various forms) in countless areas of SP: source localization, time synchronization, array signal processing, and distributed sensor networks, just to name a few. Furthermore, it is important to note that the methodology we will develop for this specific problem can be extended to much more general semiparametric estimation problems, such as the estimation of the location vector and covariance matrix in elliptical data.
Pathological assessment guides lung cancer diagnosis, treatment selection, and prognostic evaluation, yet current CPath approaches rely on task-specific models for isolated objectives. Although pan-cancer foundation models offer versatility, they lack subspecialty-level depth and have not been evaluated across clinical workflows or prospectively validated in real-world settings. We introduce PulmoFoundation, a multi-center, prospectively validated, randomized controlled trial (RCT)-evaluated foundation model for comprehensive lung pathology assessment across pre-operative, intra-operative, and post-operative care. Built upon Virchow2 via subspecialty-specific pretraining using ~40,000 diagnostic H&E-stained whole-slide images (WSIs), PulmoFoundation was systematically evaluated on ~26,000 WSIs across 32 clinically relevant tasks. In addition to accurately predicting molecular markers and patient survival, our model achieves clinical-grade performance in core diagnostic tasks across biopsy, frozen section, and surgical resection slides. In a registered prospective study of 1,357 patients across 11 diagnostic tasks, our model achieved an average AUC of 92.3%. Using pre-specified triage thresholds, PulmoFoundation could reduce additional second-review burden for 68.8% of biopsies and 83.0% of frozen sections, and defer 44.5% of IHC stain orders, with PPVs of 1.0, 0.991, and 0.966. Beyond prospective validation, we conducted a crossover RCT with eight pathologists, in which AI assistance improved diagnostic accuracy across 4,928 case-reader pairs (91.7% w/ AI vs. 83.8% w/o AI). AI assistance also reduced median diagnostic time by 19.6%, increased diagnostic confidence by 8.7%, and improved inter-rater agreement from moderate (kappa = 0.56) to substantial (kappa = 0.76). Together, these evaluations support PulmoFoundation as a clinically validated decision-support system for lung pathology.
Recent video super-resolution (VSR) approaches use deep neural networks to enhance low-quality input videos and recover visual detail, with diffusion-based methods in particular showing promising results. In this paper, we investigate whether existing video quality models can be used to assess the performance of these diffusion-based VSR methods, by comparing model predictions with results from a subjective test. The study compares six upscaling methods (Lanczos, Rhea, SCST, DOVE, SeedVR2, Starlight Mini) applied to both compressed (AV1 and DCVC-RT) and uncompressed low-resolution videos, considering play-out on a UHD-1/4K screen. A range of full- and no-reference quality models are used to assess their applicability to this new type of quality degradation, focusing on within-sequence performance. The results highlight that CNN-based full-reference models, such as LPIPS, DISTS, and CVQA-FR, show significantly higher correlation coefficients than both the conventional full-reference models and the tested no-reference models. Most models overestimate the overly sharp results of SCST, with VMAF mainly failing due to spatial inconsistencies introduced by Starlight Mini. None of the tested video quality models reaches sufficient accuracy to replace complementary subjective testing. The reference, degraded, and upscaled videos, as well as the user ratings and model scores, are made available with the paper as open data.
Microwave inverse scattering imaging (MISI) is a crucial computational technique in microwave nondestructive evaluation and near-field microwave sensing systems. However, quantitative reconstruction of high-contrast targets remains a formidable challenge due to severe multiple scattering effects and the inherent ill-posedness of electromagnetic inverse problems. To overcome this fundamental bottleneck in computational microwave imaging, this paper proposes an alternating optimization framework based on cross-correlated physics-informed neural network (Alt-CC-PINN). This architecture deeply decouples the evolution of the microwave physical field from the neural-network-based dielectric parameter inference, replacing traditional joint optimization with a hybrid alternating engine. Specifically, the method employs an analytical Polak-Ribière conjugate gradient (PR-CG) algorithm driven by a cross-correlated loss to optimally update the contrast sources, and deploys batched zero-padded 2D-FFT to ensure high computational efficiency. Subsequently, a deep learning optimizer is utilized to update the continuous neural representation. Extensive validations based on simulated and measured data demonstrate that Alt-CC-PINN effectively overcomes the local minima problem in high-contrast and low-signal-to-noise-ratio (SNR) environments. It exhibits superior reconstruction fidelity and robustness under the frequency-hopping probing strategy, providing a powerful and reliable computational electromagnetic solver for practical microwave imaging systems.
This paper presents a robust controller for tracking curvature-constrained paths with vehicles/robots governed by uncertain Dubins dynamics. Although Dubins paths have been widely used in vehicular and robotic applications, robust and convergent tracking under model uncertainties remains understudied. To address this, we propose path-tracking controllers based on sliding mode control, formulated in the transverse coordinate frame, which guarantee invariance and convergence of both lateral and heading errors to zero in the presence of bounded disturbances. Simulation results show that the proposed method reliably tracks paths despite disturbances and significantly outperforms existing methods based on sliding mode controllers.
We evaluate the M2M4 and EVM methods for real-time SNR estimation in FSO communication systems subject to deep fading. Using an experimental setup with controlled deep fading, we show that the M2M4 estimator reliably tracks the SNR profile, making it suitable for triggering transceiver adaptation.
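As a rough illustration of the moment-based estimator named above, the following minimal sketch implements the textbook M2M4 formula. It assumes a constant-modulus signal in circular complex Gaussian noise, which the abstract does not state; the function names and the QPSK test signal are hypothetical, and no fading is simulated.

```python
import cmath
import math
import random


def m2m4_snr(samples):
    """Estimate the linear SNR from complex samples via 2nd and 4th moments.

    Assumes a constant-modulus signal (kurtosis 1) in circular complex
    Gaussian noise (kurtosis 2); other modulations need other constants.
    """
    n = len(samples)
    m2 = sum(abs(y) ** 2 for y in samples) / n
    m4 = sum(abs(y) ** 4 for y in samples) / n
    # Under the stated assumptions M2 = S + N and M4 = S^2 + 4*S*N + 2*N^2,
    # so the signal power is S = sqrt(2*M2^2 - M4).
    radicand = 2.0 * m2 * m2 - m4
    if radicand <= 0.0:
        return 0.0  # moments inconsistent with any positive signal power
    signal = math.sqrt(radicand)
    noise = m2 - signal
    if noise <= 0.0:
        return float("inf")
    return signal / noise


def simulate_cm_samples(num, snr_linear, rng):
    """Unit-power QPSK symbols plus complex Gaussian noise (illustrative only)."""
    sigma = math.sqrt(0.5 / snr_linear)  # per-component noise standard deviation
    out = []
    for _ in range(num):
        phase = math.pi / 4 + (math.pi / 2) * rng.randrange(4)
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        out.append(cmath.exp(1j * phase) + noise)
    return out
```

Because the estimator needs only sample moments, it can be evaluated over a sliding window of received samples without knowledge of the transmitted symbols.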
In this work, we study the contraction conditions of iterative algorithms for stationary and finite-horizon discrete-time regularized mean-field games (MFGs) with multiple populations, where each population only interacts with the state distributions of the other populations. Due to the high dimensionality caused by the interaction of different populations, contraction rates for these algorithms cannot, in general, be expressed in terms of radicals. By studying the dynamics of these iterative algorithms and assuming that the system components of each population's MFG are Lipschitz continuous, we present explicit (eventual) contraction conditions for each algorithm in any normed space, relying only on these Lipschitz parameters. As a consequence of these contraction conditions, we provide convergence rates of finite-horizon mean-field equilibria to infinite-horizon stationary (and non-stationary) mean-field equilibria (MFEs), under restrictions on a variational characterization of the dynamics of these iterative algorithms. In the single-population case, the restrictions we impose on this variational characterization to obtain these convergence results are less restrictive than previous results in the literature.
Autonomous agent systems fail not only due to incorrect decisions, but also due to executing decisions whose authority no longer holds at runtime. Prior work defined Reconstructive Authority (RAM) as a condition for valid execution: actions are permitted only if authority can be constructed from current state. This paper addresses how to enforce this condition in a running system. We introduce a runtime execution model in which authority is evaluated at action time and execution is conditioned on its constructibility. This extends the execution state space beyond admit/deny with a third state, halt, representing cases where authority is undefined due to incomplete or uncertain observability. We define a concrete execution protocol including dynamic dependency resolution, authority reconstruction, and explicit decision semantics. We further introduce a Recovery Loop that integrates drift detection (IML) with execution control (ACP), allowing the system to suspend execution, acquire missing information, and re-attempt authority reconstruction. We show that this model guarantees safety (no action is executed without constructible authority) and conditional liveness (execution resumes when authority-defining variables become observable). This work operationalizes reconstructive authority as a runtime enforcement mechanism, providing the execution semantics required to apply RAM in real systems.
BJT-based 2D temperature-sensor arrays are factory-calibrated to +/-0.1 degC, but post-deployment thermal and mechanical stresses drift their per-sensor gain-offset parameters by an order of magnitude, and in-lab recalibration is impractical. We present RASC (Region-Aware Self-Calibration), a five-stage algorithm that decomposes the global ill-posed problem into local cluster-level problems, runs robust alternating estimation (trimmed-mean field reconstruction + Huber IRLS) inside each cluster, and reconciles overlapping estimates by linear consensus on the cluster-overlap graph with provable exponential convergence. On 7,632 frames from a deployed 16x16 array exhibiting ~5x factory-spec non-uniformity, RASC cuts the locally-non-smooth fixed-pattern residual by 71+/-5% (10-fold CV), restoring +/-0.1 degC accuracy while perturbing the calibrated field by only 0.041 degC RMSE; reduction concentrates at the edges (78% vs 55% interior). In simulations on 8x8 to 32x32 arrays, RASC matches an oracle centralized EKF within 0.10 degC with ~4x lower bandwidth.
Optimization modeling serves as the pivotal bridge between natural-language problem descriptions and optimization solvers, and remains a cornerstone for bringing operations research (OR) into real-world decision making. Recent advances in large language models (LLMs) have driven significant progress in automatic optimization modeling. However, existing methods still lack explicit validation during the modeling process, allowing errors introduced in earlier stages to carry through the pipeline and ultimately reduce final modeling accuracy. To address this challenge, we introduce TriVAL, a tri-validation framework that performs explicit validation at three stages of automatic optimization modeling: semantic specification, mathematical formulation, and code generation. At each stage, TriVAL follows a construct-validate-revise loop that assesses the current result against stage-specific criteria and revises it when needed. This design identifies and corrects errors before they accumulate across stages, helping preserve faithfulness throughout the modeling process. To evaluate automatic optimization modeling on more challenging combinatorial problems, we further introduce NL4COP, a benchmark of 150 instances across 50 diverse problem types with more complex decision logic, more tightly coupled constraints, and more demanding modeling requirements than existing benchmarks. Experiments on NL4COP and established benchmarks show that TriVAL consistently outperforms state-of-the-art methods, with the largest gains on the most challenging problems.
This paper audits benchmark evaluation in clinical-interview depression detection through four complementary probes across DAIC/E-DAIC, CMDC, ANDROIDS, MODMA, and PDCH. First, we re-evaluate E-DAIC under strict subject-disjoint leave-one-subject-out cross-validation. A lightweight hybrid text-plus-LLM-score model reaches macro-F1 = 0.723 (to our knowledge, the highest reported under this protocol), providing a conservative out-of-fold reference point that does not depend on the privileged official holdout. Second, we test whether the E-DAIC official split supports fine-grained leaderboard rankings by sweeping 96 model configurations across modality bundles, pooling strategies, and learners. Development-side cross-validation and official-test rankings align only moderately: the best cross-validation configuration ranks twentieth on the official test, the official-test winner ranks forty-first by cross-validation, top-3 overlap is zero, and the apparent winner is rank-1 in only 32.3% of subject bootstraps. Third, we externally validate strong public CMDC and ANDROIDS baselines that achieve near-ceiling in-domain performance; zero-shot transfer to external corpora is substantially weaker. Finally, we stress-test E-DAIC text and audio models using paired symptom-dense versus symptom-light interview slices defined by an SRDS-based annotator. Text scores rise sharply on symptom-dense slices, whereas audio scores remain nearly flat; the text-minus-audio gap is positive across all five seeds.
Generative AI research increasingly confronts a shared problem: systems must sustain yet govern their own generative activity when uncertainty is high, evidence is missing, or context is insufficient. This position paper argues that metacognition should become the scientific framework for bounded and effective self-governance in generative AI, where output generation is properly evaluated together with the capacities through which generative systems navigate and regulate their own activity. We advance this position by showing that bounded and effective AI self-governance requires metacognitive alignment across computational, algorithmic, and ecological levels. At the computational level, metacognition specifies the meta-level functions a system is meant to serve, such as monitoring, evaluation, control, and adaptation. At the algorithmic level, these functions are realized through procedures such as elicitation, iteration, and modularization. At the ecological level, metacognitive signals become meaningful, actionable, and accountable within the interface, workflow, and accountability arrangements. Metacognition thus makes it possible to conceive generative AI as both capable and well-governed, rather than treating capability and governance as competing aims.
We present a brain-to-image system that decodes visual stimuli from EEG signals recorded during natural image viewing. Our system addresses two tasks: (1) EEG-to-image retrieval, which ranks the correct stimulus image among 200 candidates given an EEG segment, and (2) EEG-to-image reconstruction, which generates an image consistent with the perceived stimulus. For retrieval, we implement a multi-level blurring approach improved with biologically inspired EVNet features and trained with the InfoNCE loss. Evaluated over 10 random seeds for a single subject, the retrieval model achieves a mean final-epoch Top-1 accuracy of 86.30% and Top-5 accuracy of 98.55%. For reconstruction, we implement CognitionCapturerPro, which aligns EEG representations to multi-modal CLIP embeddings, including image, text, depth, and edge embeddings, and synthesizes images with SDXL-Turbo conditioned via IP-Adapter. Averaged over 10 seeds, the reconstruction model achieves a CLIP score of 0.903 using ViT-H-14, a CLIP score of 0.870 using ViT-L/14, and an SSIM of 0.409. These results demonstrate the feasibility of decoding rich visual representations from EEG signals using modern multi-modal alignment and generative modeling techniques.
Robots deployed in dynamic environments must contend with environment-driven changes that reshape computation at runtime: new tasks may appear, precedence relations can shift, and overall workload structure evolves, all of which degrade performance, especially when multi-task inference is required under tight resource and real-time budgets. We present RED, a real-time scheduling framework for multi-task deep neural network workloads on resource-constrained robotic platforms that adapts to Robotic Environmental Dynamics (RED) while preserving end-to-end timing guarantees under modeling assumptions. The core of RED is a deadline-aware scheduler that assigns intermediate sub-deadlines, allowing it to accommodate evolving computation graphs and asynchronous inference induced by unpredictable conditions. The framework also supports flexible deployment of MIMONet (multi-input multi-output neural networks), commonly used in multi-tasking robots to alleviate memory pressure through weight sharing. RED explicitly leverages this shared-parameter property via a workload refinement and graph-reconstruction procedure that aligns MIMONet structure with schedulability requirements, improving compatibility and efficiency. We implement RED on NVIDIA Jetson family platforms and on an Apple M-series MacBook and evaluate it on navigation-oriented workloads representative of real robotic scenarios. Experiments show consistent gains over existing methods in throughput, deadline satisfaction, robustness to interference, adaptability, and runtime overhead.
Frequency-modulated continuous-wave radar sensing often relies on labeled measurements that are costly, restricted, or difficult to collect at scale. This work evaluates physics-informed digital twins as controlled testbeds for early-stage quantum-classical radar learning. Two synthetic radar benchmarks are considered: unmanned aerial vehicle classification from range-Doppler maps and human fall detection from Doppler-time spectrograms. For both tasks, inputs are standardized, reduced using principal component analysis, and classified using either a radial basis function support vector classifier or a quantum support vector classifier. All quantum-kernel results are obtained using noiseless classical simulation; no quantum hardware is used, and no quantum-advantage claim is made. Across five random seeds, the quantum support vector classifier improves the UAV benchmark from four principal components onward, reaching an accuracy of 0.941 +/- 0.012 at eight components, compared with 0.880 +/- 0.029 for the classical baseline. On the fall-detection benchmark, both classifiers perform similarly, with a small quantum-kernel improvement at higher feature dimensions. A Gaussian-noise robustness study shows limited performance degradation across the tested noise levels, while preserving the UAV quantum-kernel gain. These results support digital twins as useful, controlled environments for radar-QML benchmarking prior to measured-data validation and hardware execution.
A critical challenge facing clinicians managing chronic disease interventions is sustaining long-run patient health given limited information and resources. Digital therapeutics (DTs) provide a cost-effective way to manage interventions at scale through repeated interactions (e.g., daily treatment recommendations), but patient success depends heavily on adherence. Behavioral psychology suggests that both treatment recommendations and past adherence affect future adherence, yet existing decision support frameworks for DTs model only recommendation effects or treat adherence as exogenous context, leaving a key gap in model and algorithm development. To address this gap, we present a DT decision support framework that captures both recommendation and adherence effects, allowing clinicians to better plan treatment recommendations. We model a patient's time-varying capacity for engagement with treatment using a linear dynamical system (LDS) that captures both effects and is endogenously connected to adherence behavior through a logit link. We establish finite-time identification guarantees for this model, extending LDS results to our setting. Next, we propose an optimism-based algorithm, UCB-BOLD, for online treatment selection and prove that it achieves sublinear regret. We evaluate UCB-BOLD against benchmarks via ablation studies on a synthetic patient cohort generated using micro-randomized trial data. The results indicate that DT decision support tools can incorporate dynamical models so that decision makers use DT data efficiently and improve patient health through effective resource allocation. While myopic or heuristic approaches suffice for some patient types, the benefits of explicitly planning around recommendation and adherence effects are significant for others; UCB-BOLD achieves 2-3x lower conditional value-at-risk regret than the next-best benchmark.
With the rapid proliferation of wireless and Internet of Things (IoT) devices, ensuring secure and reliable device identification has become a significant challenge. Traditional security techniques, such as IP or MAC address-based authentication, are susceptible to spoofing, whereas Radio Frequency Fingerprint Identification (RFFI) offers a more secure alternative by exploiting the unique hardware imperfections in devices' RF signals. In this paper, we propose a novel deep learning-based framework for RFFI that enhances both accuracy and reliability in challenging RF environments. The core of our approach is the Signal Inception Transformer (SinFormer), which leverages a specialized multi-scale self-attention mechanism to effectively capture both large-scale and fine-grained fingerprints in signals, significantly improving identification accuracy. To further enhance robustness and reliability, we introduce a two-stage training strategy that enables the model to learn general signal features and maintain performance under adverse conditions, such as low Signal-to-Noise Ratio (SNR) or channel variations. The effectiveness of the proposed method is validated using a real-world dataset. Experimental results show that the SinFormer framework consistently outperforms existing methods in accuracy and robustness across diverse and challenging scenarios.
The electric power supply for AI data centers is now the most significant bottleneck in the race toward Artificial General Intelligence, surpassing even the constraint of AI accelerator availability. To our knowledge, this paper is the first to describe the end-to-end power management process for a hyper-scale AI data center: from early power planning to accommodate next-generation accelerators 6--12 months before their general availability, to tuning power settings after large-scale deployment, and finally to dynamic, runtime power management for evolving workloads. We present detailed power measurements for a 150 MW data center hosting a cluster of 83K GB200 GPUs and share insights from building this state-of-the-art AI cluster. We hope this work encourages practitioners across the industry to share their own experiences as well.
Ring-like communication graphs appear in UAV formations, cyclic patrols, perimeter monitoring, and other multi-agent tasks in which agents exchange information mainly with neighboring vehicles along a closed route. When measurement and actuation noise are persistent, a useful augmentation should improve both the convergence rate of consensus and the steady-state disagreement level. This paper studies the addition of a single weighted chord to a connected weighted cycle. The central observation is that a chord is not just a generic rank-one edge update: it splits the cycle into two complementary resistance arcs, and this resistance split governs both the algebraic-connectivity gain and the Kirchhoff-index reduction. We first derive exact chord-induced effective-resistance and Kirchhoff-index update formulas, giving a closed-form coherence objective. We then prove that, under bounded conductances and small resistance discrepancy, near-antipodal resistance-balanced chords are near-optimal for algebraic-connectivity improvement; an i.i.d. bounded-conductance model yields the same conclusion with high probability. Finally, because the best convergence-rate chord and the best coherence chord need not coincide, we formulate the design as a finite Pareto problem and introduce RBAPS and AW-RBAPS, two resistance-balanced screening rules that retain only linear or near-linear candidate sets. Numerical experiments show that AW-RBAPS remains effective beyond the formal moderate-heterogeneity regime and approximates the exhaustive Pareto front with mean hypervolume ratio $0.9987$ while evaluating about $10.1\%$ of admissible chords.
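The resistance split described above can be made concrete with standard circuit and rank-one-update identities. The block below is a sketch under assumed notation, not the paper's own statements: the chord joins nodes $i$ and $j$ with conductance $w$, $r_1$ and $r_2$ denote the total resistances of the two complementary arcs, $L^{+}$ is the pseudoinverse of the cycle Laplacian, $n$ is the number of nodes, and $K_f$ is the Kirchhoff index before the chord is added.

```latex
\begin{align}
  R_{ij}  &= \frac{r_1 r_2}{r_1 + r_2}
           && \text{(two arcs in parallel)} \\
  R'_{ij} &= \frac{R_{ij}}{1 + w R_{ij}}
           && \text{(chord added in parallel)} \\
  K_f'    &= K_f - \frac{n\, w\, \lVert L^{+}(e_i - e_j) \rVert_2^{2}}{1 + w R_{ij}}
           && \text{(Sherman--Morrison update of } L^{+}\text{)}
\end{align}
```

For a fixed total cycle resistance $r_1 + r_2$, the endpoint resistance $R_{ij}$ is largest when $r_1 = r_2$, which is one way to see why resistance-balanced chords are natural candidates.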
The rapid growth of Electric Vehicle (EV) adoption challenges power distribution networks through peak load spikes, voltage instability, and transformer overloads from uncoordinated charging. While Model Predictive Control (MPC) and standard Reinforcement Learning (RL) methods have addressed these issues, existing approaches rarely treat real-time carbon intensity or fluctuating renewable energy (RE) availability as primary scheduling objectives, leaving substantial decarbonisation potential unrealised. This paper proposes an emission-aware RL strategy based on the Soft Actor Critic (SAC) algorithm, with a multi-objective reward that penalises carbon emissions, curtailed on-site renewables, and unmet user demand. The agent is trained within a unified benchmarking framework on the EV2Gym platform, incorporating behind-the-meter solar and wind profiles, time-varying EirGrid carbon intensity data, and realistic workplace EV behaviour across 25 Electric Vehicle Supply Equipment (EVSE) units. Nine control strategies, including heuristics, emission-aware MPC variants, and the proposed RL agent, are compared under five renewable penetration scenarios (0%-50%) over ten independent runs each. The RL agent achieves a carbon intensity as low as 23.96 grams of carbon dioxide per kilowatt-hour under 50% wind penetration, representing up to 87% emission reduction versus the uncontrolled baseline, and outperforms the external graph-based Power Distribution Network (PDN) benchmark. Transformer overload remains below 7 kWh across scenarios, against up to 1093 kWh for the As Fast As Possible (AFAP) heuristic, and renewable self-consumption reaches 52% under combined wind and solar supply. Embedding carbon intensity forecasts into the RL state and reward aligns charging with low-emission periods while preserving grid compliance and user satisfaction.
This paper presents reinforcement learning (RL) policies for dynamic quadrupedal locomotion in planetary exploration scenarios. Building on a task-optimized quadruped with a 5-bar leg design, we develop RL policies for walking, vertical jumping, forward jumping, and in-flight attitude control, explicitly tailored to the reduced gravity on Mars. These policies jointly enable such robots to overcome obstacles larger than themselves through coordinated jumping and precise in-flight reorientation for safe landings. We demonstrate Sim2Real transfer of the attitude control policy on the Olympus quadruped through single-axis reorientation tests, while all locomotion policies are validated in simulation. A complete Mars exploration mission scenario demonstrates coordinated policy deployment across challenging terrain. Experimental results show 90° attitude reorientation in 2.6 seconds, with simulations demonstrating 3.1 meter vertical jumps and 3.9 meter forward jumps under Martian gravity conditions. Supplementary video: this https URL
This letter introduces a physics-informed self-supervised framework for sonar image despeckling that reformulates despeckling as residual consistency in the homomorphic log domain. By constraining the log-ratio residual to obey multiplicative speckle statistics, the proposed method eliminates the need for clean supervision while preventing degenerate identity solutions. A variance-targeted statistical loss combined with edge-aware structural regularization and median-guided curriculum stabilization enables effective speckle suppression with preserved structural fidelity. This formulation along with a lightweight neural network achieves state-of-the-art performance across multiple real sonar datasets and demonstrates excellent cross-dataset robustness, while remaining suitable for real-time deployment.
This paper presents QCommE2E, an open-source simulation framework for end-to-end quantum communication systems, with explicit tutorial emphasis. The primary objective is to develop a comprehensive framework that includes transmitters, receivers, communication channels, performance metrics, and visualization tools, to facilitate the systematic design, configuration, and analysis of experimental simulations for novel quantum communication architectures. As the primary use case, we walk through the current quantum channel comparison, which maps textbook quantum-information channels and reduced optical-fiber/free-space surrogates into a single executable benchmark. We describe the common density-matrix interface, the matched modulation and detection chain, and the exact role of the channel classes Depolarizing Channel, Dephasing Channel, Erasure Channel, Bosonic Channel, Turbulence Channel, and PMD Channel. We also explain the current visualization layer, which projects received states onto constellation and Bloch representations for qualitative inspection. To keep the description faithful to the implementation, we provide a summary of the baseline execution, which uses a square 16-QAM embedding, a pretty-good-measurement detector constructed from the same reference-state codebook, and BER/SER metrics. Finally, we position the channel comparison as an entry point for broader future work, including equalization, quantum autoencoders, learning-based methods, and system-level algorithm integration.
Hyperspectral image restoration faces several challenges, including limited training data, strong sensor specificity, and high spectral dimensionality. These limitations hinder the learning of robust hyperspectral priors, motivating the reuse of priors learned from large-scale RGB data. In this work, we propose a minimally trained, lightweight adapter that repurposes frozen pretrained RGB denoisers for hyperspectral restoration through a projection mapping. The method denoises low-dimensional spectral projections and reconstructs the hyperspectral cube through constrained linear aggregation, while preserving plug-and-play compatibility and the stability properties of the underlying RGB denoiser. Experiments on denoising, deblurring, and super-resolution across multiple datasets demonstrate consistent improvements over hyperspectral-specific baselines, showing the strong transferability of large-scale RGB priors.
Vision Transformer (ViT) models, utilizing self-attention mechanisms, have demonstrated robust generalization capabilities across various vision tasks, including image classification. However, these models, typically pretrained on general public datasets, often lack the specialized domain knowledge necessary for medical imaging applications. In this study, we investigate the adaptation of ViT models, specifically for cardiac magnetic resonance (MR) images, using an in-house dataset. We found that pretrained ViT features do not effectively transfer to the cardiac MR domain. To overcome this limitation, we introduce an adaptation strategy that utilizes image-based self-supervised contrastive learning, demonstrating superior performance compared to traditional supervised training approaches. Moreover, our adapted ViT model exhibits strong generalization to external MR datasets such as BraTS and ADNI. Through ablation studies, we further investigate the impact of batch size and dataset scale on performance. Ultimately, our adapted model achieves classification AUC exceeding 0.75 across the four most common cardiac MR sequences.
We study stochastic density control between Gaussian-mixture endpoint distributions under Brownian prior dynamics. Since the direct Schrödinger bridge between Gaussian mixtures is generally not available in closed form, we introduce a lifted path-space construction in which each trajectory is augmented with a source--target component label. Consequently, the problem decomposes into Gaussian component-to-component Schrödinger bridges with explicit marginal, drift, and cost formulas, while the mixture-level assignment reduces to a finite-dimensional entropic coupling problem with a Sinkhorn scaling form. We then analyze the projection obtained by discarding or forgetting the label. By construction, the projected law satisfies the original Gaussian-mixture endpoint constraints, but its relative entropy generally differs from the lifted relative entropy by a nonnegative conditional label-information gap. This gap reveals a path-space obstruction: the lifted optimizer cannot, in general, be identified with the direct unlabeled Schrödinger bridge after projection. We also derive the posterior-averaged Markov drift associated with the projected marginal flow, prove a kinetic-energy upper bound, and identify a common path-potential condition under which the projection gap vanishes. Several numerical illustrations showing density and shape control are recorded for a self-contained exposition.
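To illustrate the Sinkhorn scaling form mentioned above for the mixture-level assignment, here is a minimal sketch of entropic coupling between two finite weight vectors. The cost matrix stands in for the component-to-component bridge costs and is a placeholder; the function name, `eps`, and the iteration count are assumptions made for this example.

```python
import math


def sinkhorn_coupling(a, b, cost, eps, iters=500):
    """Return the entropic coupling u_i * K_ij * v_j with marginals a and b.

    a, b  : source and target mixture weights (each sums to 1)
    cost  : cost[i][j] of pairing source component i with target component j
    eps   : entropic regularization strength (assumed > 0)
    """
    rows, cols = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(cols)] for i in range(rows)]
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(cols)] for i in range(rows)]


# Placeholder weights and costs for a 2-component source and 3-component target.
source_weights = [0.4, 0.6]
target_weights = [0.2, 0.5, 0.3]
pair_costs = [[0.0, 1.0, 4.0],
              [2.0, 0.5, 1.0]]
plan = sinkhorn_coupling(source_weights, target_weights, pair_costs, eps=1.0)
```

Each entry `plan[i][j]` is the mass assigned to the label pair (i, j); summing over rows or columns recovers the prescribed mixture weights up to numerical tolerance.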
Large audio and language models have recently demonstrated zero-shot reasoning capabilities across various domains. However, it remains unclear how the form of audio input, whether handcrafted acoustic features extracted from speech or the raw audio waveform itself, affects performance for Parkinson's disease (PD) detection across different languages. In this study, we systematically compare two input modalities for zero-shot PD detection: (i) handcrafted acoustic features extracted from speech recordings analyzed by a general-purpose LLM, and (ii) direct waveform input analyzed by audio-capable models. Experiments on PD speech datasets in four languages show that performance varies across input modalities, speech tasks, and languages. Handcrafted acoustic features provide more stable performance in a low-resource language (e.g., Bengali), whereas audio input yields dataset-dependent gains. These findings highlight the impact of input modality on zero-shot PD detection from speech.
Sampling-based model predictive control methods, such as Model Predictive Path Integral (MPPI), offer derivative-free optimization and robustness in complex robotic systems. However, standard MPPI relies on cost-based soft penalties that cannot guarantee hard-constraint satisfaction, severely limiting its applicability to highly constrained tasks such as closed-chain manipulation. To address this, we propose Manifold-Constrained MPPI (MC-MPPI), a real-time sampling-based control framework that enforces manifold-based equality constraints while preserving the computational advantages of MPPI. The key idea is to decouple the constrained optimal control problem into latent-space planning and execution-level correction. At the planning stage, a Variational Autoencoder (VAE) learns a low-dimensional latent representation of the constraint manifold, enabling MPPI to efficiently generate near-feasible candidate trajectories without per-sample modification. Since this reference enables accurate linearization of the equality constraints, an execution-level Quadratic Programming (QP) controller resolves the residual manifold mismatch in a single solve rather than through iterative projection. Experiments on a 14-DoF closed-chain dual-arm system in both simulation and real-world settings demonstrate that MC-MPPI operates stably at 100 Hz, reliably navigates dynamic environments while effectively maintaining hard equality constraints, and significantly outperforms baseline methods in tracking accuracy. Supplementary videos and implementation details are available at this https URL.
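For readers unfamiliar with the baseline that MC-MPPI builds on, the following minimal sketch shows the standard MPPI importance-weighted update for a scalar control sequence. It covers only vanilla MPPI; the latent-space VAE planning and the QP correction described above are not modeled, and all names and parameters are hypothetical.

```python
import math


def mppi_weights(costs, temperature):
    """Softmax-style weights exp(-(S - S_min) / lambda), normalized to sum to 1."""
    best = min(costs)  # subtracting the minimum keeps the exponentials stable
    unnormalized = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def mppi_update(nominal, perturbations, costs, temperature):
    """Shift the nominal control sequence by the cost-weighted perturbations.

    nominal       : list of scalar controls over the horizon
    perturbations : one sampled perturbation sequence per rollout
    costs         : total trajectory cost of each rollout
    """
    weights = mppi_weights(costs, temperature)
    return [
        nominal[t] + sum(w * eps[t] for w, eps in zip(weights, perturbations))
        for t in range(len(nominal))
    ]
```

Because constraint violations enter this update only through the rollout costs, the weighting by itself cannot guarantee hard-constraint satisfaction, which is the limitation the abstract highlights.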
Recent advances in learning-based model predictive control (MPC) have leveraged neural networks for online model learning, achieving strong performance when nonstationary system dynamics deviate from nominal models. However, existing approaches primarily address specific or relatively structured forms of dynamical variation, leaving more general, unknown, and unpredictable time-varying dynamics insufficiently handled. To tackle this challenge, we propose T2S-MPC, a framework that adaptively learns a residual dynamics model online and integrates it with the nominal model within the MPC framework to enable fast-evolving online planning. To make the model time-aware, we explicitly encode temporal information through a structured time embedding and employ a two-timescale update scheme, allowing the controller to capture nonstationary dynamics while balancing rapid adaptation with stable learning. We evaluate the proposed method on a 2D quadrotor across stabilization and trajectory tracking tasks under diverse time-varying disturbances, including linear drifting and periodic perturbations. Experimental results show that T2S-MPC consistently outperforms classical MPC, neural MPC, and ablated variants in control performance, while also demonstrating strong robustness across a wide range of disturbance conditions without additional tuning. The source code is publicly available at this https URL
Coordinating micro-robotic swarms in physiologically realistic, time-dependent fluid environments remains an unsolved challenge for biomedical and environmental applications. We present a hybrid Computational Fluid Dynamics - Multi-Objective Multi-Agent Reinforcement Learning framework that directly couples a high-fidelity incompressible Navier-Stokes solver with decentralized proximal policy optimization to learn physically consistent swarm control strategies in oscillatory flow. Sixteen magnetically actuated micro-robots navigate a pulsatile arterial waveform, simultaneously optimizing upstream progression, energy conservation, and motion smoothness, reconciled using PCGrad surgery. Without PCGrad, energy efficiency and smoothness rewards collapse to near zero within 10,000 training steps while progress exhibits persistent large-amplitude oscillations, confirming that gradient conflict resolution is a structural requirement rather than an optional refinement in this domain. The converged policy achieves a progress reward of 6.5-7.0, a sustained energy efficiency of 0.63-0.65, and near-maximum smoothness (0.97-0.99), representing improvements over brute-force baselines on the primary objective while both baselines yield negative energy efficiency throughout. Training reveals three emergent behavioral phases: a collective two-layer hydrodynamic throttling formation that suppresses peak channel velocities during forward flow, a cycle-synchronized ratchet mechanism that exploits flow reversals for upstream repositioning, and an individualized final approach as agents near the success boundary. These results establish that time-dependent fluid-agent interactions can be captured directly within multi-objective reinforcement learning loops, offering a physically grounded paradigm for micro-swarm control in biomedical navigation, environmental monitoring, and industrial microfluidics.
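The gradient-conflict resolution mentioned above can be illustrated with a minimal sketch of PCGrad-style projection. This is not the authors' implementation: the function names `dot` and `pcgrad` and the two-dimensional example gradients are hypothetical, and plain Python lists stand in for real policy gradients. When two objective gradients have a negative inner product, each is projected onto the normal plane of the other before the gradients are summed.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(grads, rng=None):
    """Combine per-objective gradients with PCGrad-style projection.

    grads: list of gradient vectors, one per objective (hypothetical inputs).
    Returns the sum of the projected gradients.
    """
    rng = rng or random.Random(0)
    projected = []
    for i, g in enumerate(grads):
        g = list(g)
        # Visit the other objectives in random order, as PCGrad does.
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            overlap = dot(g, grads[j])
            if overlap < 0.0:
                # Conflict: remove the component of g that opposes grads[j].
                scale = overlap / dot(grads[j], grads[j])
                g = [x - scale * y for x, y in zip(g, grads[j])]
        projected.append(g)
    return [sum(col) for col in zip(*projected)]


if __name__ == "__main__":
    # Two conflicting toy gradients (inner product is -1).
    print(pcgrad([[1.0, 0.0], [-1.0, 1.0]]))
```

In this toy case the combined update has a non-negative inner product with both original gradients, which is the property that keeps one objective from being driven down by another.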
We propose a semi-analytical amplitude phase shift keying (APSK) signaling framework for integrated sensing and communication (ISAC), focusing on i.i.d. uniform discrete input distributions for practicality and analytical tractability. First, we establish APSK design criteria in which communication performance is measured by the gap to capacity and linked to the minimum Euclidean distance, while sensing performance is characterized by the symbol-energy variance. Based on these criteria, we propose a family of APSK constellations whose key parameters follow explicit scaling laws. Then we prove that this design achieves a constant gap to capacity independent of the signal-to-noise ratio. Building upon this foundation, we further construct a parametric APSK family that bridges the communication-optimal and sensing-optimal designs, with the communication and sensing (C&S) tradeoff controlled by the number of rings and energy allocation among rings. Simulation results show that the proposed APSK achieves C&S performance very close to the Pareto boundary achieved with time-independent, circularly symmetric, and otherwise unconstrained continuous input distributions.
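The two design criteria named above, minimum Euclidean distance and symbol-energy variance, can be computed for any candidate constellation. The sketch below is only an illustration under stated assumptions: the ring sizes and radii are arbitrary examples, not the scaling laws proposed in the paper, and the helper names `apsk`, `energy_variance`, and `min_distance` are hypothetical.

```python
import cmath
import math


def apsk(points_per_ring, radii):
    """Build an APSK constellation normalized to unit average symbol energy.

    points_per_ring and radii are illustrative inputs; all rings start at phase 0.
    """
    symbols = []
    for n, r in zip(points_per_ring, radii):
        for k in range(n):
            symbols.append(r * cmath.exp(2j * math.pi * k / n))
    avg_energy = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    scale = 1.0 / math.sqrt(avg_energy)
    return [s * scale for s in symbols]


def energy_variance(symbols):
    """Variance of the symbol energy |s|^2 under a uniform input distribution."""
    energies = [abs(s) ** 2 for s in symbols]
    mean = sum(energies) / len(energies)
    return sum((e - mean) ** 2 for e in energies) / len(energies)


def min_distance(symbols):
    """Minimum Euclidean distance between distinct constellation points."""
    return min(abs(a - b) for i, a in enumerate(symbols) for b in symbols[i + 1:])


if __name__ == "__main__":
    one_ring = apsk([4], [1.0])            # a single ring is plain PSK
    two_rings = apsk([4, 4], [1.0, 2.0])   # arbitrary two-ring example
    print(energy_variance(one_ring), min_distance(one_ring))
    print(energy_variance(two_rings), min_distance(two_rings))
```

A single ring has zero symbol-energy variance, while adding rings raises the variance, which is the tension between the sensing and communication criteria that the number of rings and the energy allocation are said to control.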
This paper considers safe trajectory tracking for the Stefan problem with second-order moving boundary dynamics. The model is given by a parabolic Partial Differential Equation (PDE) defined on a time-varying domain whose moving boundary is governed by a second-order Ordinary Differential Equation (ODE) associated with the Neumann boundary condition. A feedforward control is designed by a series expansion approach to solve the inverse Stefan problem under a given reference trajectory of the moving boundary, and the convergence of the infinite series is proven. A trajectory tracking controller is derived based on energy shaping, which ensures that the model constraint remains satisfied in the closed-loop system. The closed-loop system is also shown to be globally exponentially stable with respect to the tracking error by performing a PDE backstepping transformation and Lyapunov analysis. Numerical simulation illustrates effective tracking performance of the proposed method under a sinusoidal reference trajectory. Code is released at this https URL.
In many industrial domains, the Functional Mock-up Interface (FMI) is used to exchange simulation models as Functional Mock-up Units (FMUs) across different partners using various modelling tools. This opens up the possibilities for simulation-based verification and validation using FMUs for ensuring reliable system behaviour. However, deriving effective test oracles for these simulation models remains challenging due to the absence of explicit expected outputs. This limits the applicability of conventional testing approaches, which require access to the internal workings of the systems. Metamorphic testing (MT) addresses this limitation by leveraging metamorphic relations (MRs), but extracting such relations from specifications remains largely a manual and error-prone process. To address this challenge, we propose an LLM-powered multi-agent workflow for specification-based metamorphic testing of FMU-based simulation models. The approach takes functional and interface specifications as input and orchestrates multiple agents to extract requirements and derive MRs. These MRs are expressed using Given-When-Then patterns to structure input conditions (Given), transformations (When), and expected output behaviours (Then). These relations are then used to generate metamorphic test cases, execute simulations, and evaluate output consistency across multiple sessions. We evaluate the approach on a Lube Oil Cooling system FMU, demonstrating its ability to automatically generate meaningful MRs and corresponding test cases. Preliminary results indicate that the proposed workflow can effectively support the systematic verification and validation of dynamic simulation models by reducing manual effort and improving test generation.
Ultrasound computed tomography is emerging as a promising safe and accessible modality for soft-tissue medical imaging, with full waveform inversion playing a key role in unlocking its full potential for high-resolution, quantitative reconstructions. Frequency domain full waveform inversion (FDFWI) for reconstructing spatial maps of acoustic properties in the musculoskeletal system is highly sensitive to the quality of low-frequency signals, making the final imaging outcome vulnerable to issues such as inappropriate initial models and strong scatterings related to bones. To address these challenges, we propose a hybrid full waveform inversion (HFWI) algorithm that incorporates a traveltime inversion algorithm based on the generalized Rytov approximation into the FDFWI framework. This hybrid strategy enhances early-stage inversion quality and substantially reduces sensitivity to the initial model, all while maintaining computational efficiency. Importantly, HFWI achieves results comparable to those obtained using well-constructed initial models, without incurring extra computational cost, thus enabling accurate imaging under realistic, bandwidth-limited conditions. In addition, we introduce a near real-time strategy to update first-arrival traveltimes based on forward-scattered phase variations without requiring extra wavefield simulations. Numerical simulations, as well as \textit{in vitro} and \textit{in vivo} experiments confirm the robustness and efficiency of the proposed approach. HFWI also shows promise to extend to more complex scenarios of musculoskeletal parametric reconstruction.
In this work, we study the interface of the Brazilian e-Voting Machine (BVM) in the context of electromagnetic side-channel threats commonly referred to as TEMPEST attacks. In a TEMPEST attack against video displays, an eavesdropper uses Software-Defined Radios (SDRs) to recover sensitive information by intercepting electromagnetic emanations generated during video signal transmission. We emulate the BVM using a VGA monitor by leveraging publicly available information disclosed by the electoral authority, including technical specifications, operational rules of the system, and the official BVM interface. Based on this setup, we investigate whether the BVM interface gives rise to a distinctive spectral signature observable through its unintended electromagnetic emissions. Our findings show that design characteristics relevant to a nationwide electoral process -- such as high image contrast, minimal on-screen information, and the prohibition of other electronic devices within the polling station -- result in a simple and highly distinctive spectral signature that can be observed even through a wall in our experiments. Although our experiments do not involve actual BVM hardware, the results raise concerns regarding the system's susceptibility to TEMPEST attacks and highlight the need for further research on protective countermeasures. In this context, our findings may support the design of automatic jammers capable of adaptively targeting compromising frequencies. To the best of our knowledge, this is the first study investigating TEMPEST attacks in the context of an electronic voting system officially adopted by a country.
Federated learning (FL) is an effective paradigm for enhancing the learning capability of edge devices while preserving data privacy. In geographically dispersed FL systems, such as sensor networks in remote areas, unmanned aerial vehicles (UAVs) can flexibly establish high-quality communication links to support parameter exchange. However, device heterogeneity and the limited battery capacity of UAVs pose significant challenges. Specifically, data heterogeneity slows convergence, while scheduling all devices for global collaboration incurs excessive communication and energy costs. To overcome these challenges, we adopt a strict separation between a globally shared backbone and permanently local personalization heads, thereby mitigating the impact of data heterogeneity. Furthermore, we propose a gradient-based scheduling strategy that jointly considers energy efficiency and learning performance. In each communication round, the backbone is updated only by the top-$\alpha$ devices ranked by gradient $\ell_{2}$-norm, ensuring that optimization focuses on the most informative updates. Simulation results demonstrate that the proposed scheme achieves higher learning accuracy than state-of-the-art approaches while significantly reducing UAV energy consumption.
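The scheduling rule described above can be sketched as follows. This is a minimal illustration, not the authors' scheme: it assumes that $\alpha$ is a fraction of the device population (the abstract does not say whether it is a fraction or a count), it ignores the energy-efficiency term, and the function name `select_top_alpha` and the device ids are hypothetical.

```python
import math


def select_top_alpha(gradients, alpha):
    """Return ids of the ceil(alpha * N) devices with the largest gradient L2 norm.

    gradients: dict mapping a device id to its backbone gradient (list of floats).
    alpha: assumed here to be a fraction in (0, 1].
    """
    norms = {dev: math.sqrt(sum(x * x for x in g)) for dev, g in gradients.items()}
    k = max(1, math.ceil(alpha * len(norms)))
    return sorted(norms, key=norms.get, reverse=True)[:k]


if __name__ == "__main__":
    grads = {"a": [3.0, 4.0], "b": [0.0, 1.0], "c": [6.0, 8.0], "d": [1.0, 1.0]}
    print(select_top_alpha(grads, 0.5))
```

Only the selected devices would then contribute to the shared backbone update, while every device keeps training its local personalization head.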
We present FusionCore, an open-source ROS 2 sensor fusion package that fuses IMU, wheel encoder odometry, GPS, and Visual SLAM pose into a single 100 Hz odometry stream using a 23-state Unscented Kalman Filter (UKF). The 23rd state is an online estimate of the wheel encoder's systematic yaw rate bias, identified through GPS heading cross-covariance and subtracted during GPS blackouts to reduce heading drift in coast mode. FusionCore also estimates gyroscope and accelerometer biases as explicit filter states, handles GPS natively in ECEF without a separate coordinate projection node, applies per-sensor Mahalanobis chi-squared outlier gating calibrated to measurement degrees of freedom, and adapts sensor noise covariance automatically from the innovation sequence. VSLAM pose fusion enables GPS-denied operation with any visual odometry or SLAM system, including automatic recovery from map reinitialization. We evaluate against robot_localization on twelve full-length sequences (55-92 min each) from the NCLT public dataset. FusionCore achieves lower Absolute Trajectory Error (ATE) on ten of twelve sequences, with improvements ranging from 1.2x to 22.2x on winning sequences. The robot_localization UKF diverges numerically on all twelve sequences. FusionCore is available at this https URL under the Apache 2.0 license.
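Mahalanobis chi-squared outlier gating of the kind mentioned above can be sketched for a two-dimensional measurement. This is not FusionCore's code: the 95% confidence level, the function names, and the restriction to a 2x2 innovation covariance are assumptions made for illustration. A measurement is accepted when its squared Mahalanobis distance is below the chi-squared quantile for the measurement's degrees of freedom.

```python
# 95% chi-squared quantiles by degrees of freedom (the 95% level is an assumption).
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def mahalanobis_sq_2d(innovation, cov):
    """Squared Mahalanobis distance y^T S^{-1} y for a 2-D innovation y."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    y0, y1 = innovation
    # z = S^{-1} y using the closed-form inverse of a 2x2 matrix.
    z0 = (d * y0 - b * y1) / det
    z1 = (-c * y0 + a * y1) / det
    return y0 * z0 + y1 * z1


def accept(innovation, cov):
    """Gate a 2-D measurement against the 2-degree-of-freedom threshold."""
    return mahalanobis_sq_2d(innovation, cov) <= CHI2_95[2]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(accept((1.0, 1.0), identity))   # small innovation
    print(accept((3.0, 0.0), identity))   # large innovation
```

Tying the threshold to the measurement dimension keeps the false-rejection rate comparable across sensors with different numbers of measured quantities.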
Understanding the statistics of level crossings in stochastic processes is crucial across many scientific disciplines. The traditional Kac-Rice formula gives the mean rate of level crossings and has found broad use. However, that mean rate captures only a coarse summary of the crossing process. It depends entirely on local properties of the stochastic process at a given instant and is therefore blind to the correlation structure of the process over time. To understand whether crossing events, such as neuronal spikes, tend to cluster in time, spread apart, or exhibit more complex temporal organization, one must go beyond the mean rate and study higher-order crossing statistics. Here we go beyond the mean by deriving the exact analytical formulae for the variance and Fano factor of arbitrary level crossings in smooth stationary Gaussian processes. Our exact solution reveals how the full temporal correlation structure dictates whether crossings cluster or become regular. In systems with oscillatory correlations, such as a stochastic damped harmonic oscillator, a recent crossing suppresses an immediate subsequent one, producing sub-Poissonian statistics. However, as damping increases and oscillations disappear, a large and slow excursion above the threshold can produce multiple closely spaced crossings, yielding super-Poissonian statistics. In purely relaxational, non-oscillatory systems, such as a mean-reverting process driven by Ornstein-Uhlenbeck noise, the competition between the timescales of the driving noise and system relaxation produces a richer landscape, including reentrant transitions between sub- and super-Poissonian statistics as the threshold level is varied. Taken together, the exact variance and Fano factor derived here complement the Kac-Rice mean rate, enabling more robust parameter estimation and model selection across any setting where Gaussian processes are used.
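For reference, the quantities discussed above can be written out in a standard form. The notation is an assumption made here and is not taken from the paper: $x(t)$ is a zero-mean, smooth stationary Gaussian process with autocovariance $C(\tau)$, $u$ is the level, and $N_u(T)$ counts all crossings of $u$ (up and down) in a window of length $T$.

```latex
% Kac-Rice mean crossing rate (up- and down-crossings combined)
\nu(u) = \frac{1}{\pi} \sqrt{\frac{-C''(0)}{C(0)}} \,
         \exp\!\left( -\frac{u^{2}}{2\,C(0)} \right),
\qquad
\mathbb{E}[N_u(T)] = \nu(u)\, T .

% Fano factor of the crossing count
F_u(T) = \frac{\operatorname{Var}[N_u(T)]}{\mathbb{E}[N_u(T)]} .
```

The mean rate depends only on $C(0)$ and $C''(0)$, that is, on local properties of the process, whereas the variance in the Fano factor depends on $C(\tau)$ at all lags. A Poisson process gives $F_u(T) = 1$, so $F_u(T) < 1$ indicates regular (sub-Poissonian) crossings and $F_u(T) > 1$ indicates clustered (super-Poissonian) crossings.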
Power flow feasibility assessment is computationally challenging for unbalanced three-phase distribution networks. This paper develops a vectorized semidefinite program (SDP) based on the bus injection model (BIM) and reformulates its dual as an exact-penalty problem, enabling us to develop a scalable three-cut proximal bundle method for feasibility assessment. The proposed bundle method is numerically over 400 times faster than MOSEK with less than 1/2000 of its memory; on the decomposed BIM-SDP, approximately 2 times faster with 75% less memory.
Heterogeneous Internet of Things (IoT) systems suffer from fragmentation across hardware architectures, networking stacks, and data serialization formats. Existing standards (such as MQTT, CoAP, and DDS) rely on address-bound, imperative routing models that require hardcoded configurations and leave no flexibility for runtime schema translation. This paper presents TIP (The Intent Protocol), a decentralized, declarative network protocol. Instead of addressing specific physical endpoints, nodes submit abstract intents specifying desired capabilities, schemas, and Quality of Service (QoS) constraints. The TIP Engine resolves matching nodes using a hybrid discovery mechanism combining local multicast DNS (mDNS) with Kademlia Distributed Hash Tables (DHT). Selection is optimized via a multi-criteria scoring algorithm incorporating network latency, historical reputation, and contract compliance. Mismatched data representations are reconciled on-the-fly inside isolated WebAssembly (WASM) sandboxes compiled dynamically from TOML specifications. Security is enforced through Ed25519 signatures, X25519 key exchanges, and ChaCha20-Poly1305 payload encryption. Evaluation of our reference implementation in Rust and C++ shows sub-millisecond translation overhead and robust resilience under industrial conditions.
Neural network (NN) dynamics models and control policies achieve strong performance in robotics, but providing sound guarantees under uncertainty remains difficult, especially for closed-loop NN systems. Existing reachability tools provide formal over-approximations, yet are often non-differentiable, overly conservative, or too slow for modern learning and online planning pipelines. To address this, we present a parallelizable, differentiable reachability framework in JAX for continuous- and discrete-time systems with analytical and NN-based dynamics and controllers. Our framework combines Taylor-model flowpipe construction with CROWN-style linear bound propagation through a unified representation that preserves affine dependencies while supporting GPU-batched computation and automatic differentiation. Building on this reachability primitive, we develop (i) a certified training method that encourages reachability-friendly dynamics models and controllers, and (ii) a reachability-aware sampling-based MPC scheme with gradient-based refinement. Experiments on non-prehensile manipulation and quadrotor tasks, including hardware and higher-dimensional evaluations (up to 72D), demonstrate practical online planning while maintaining certified reachable-set over-approximations under bounded uncertainty.
We study the restless contextual multi-play multi-armed bandit (MP-MAB) problem for channel allocation in the opportunistic spectrum access (OSA) scenario. Most existing MP-MAB methods are impractical for real-world OSA systems because they assume idealized conditions, incur a heavy computational cost, and, most importantly, ignore the impact of channel noise, which is directly related to the quality of service. In this study, we capture this impact by modeling channel noise as a perturbation of the arm's reward function in MP-MAB. As there is an implicit correlation between channel state information and channel noise, we take the former as a context for MP-MAB to represent the perturbation caused by the latter. We investigate two types of correlation between the context and the perturbation -- linear and nonlinear -- and derive an index policy for each. These policies learn the correlations through a linear model and a neural network, respectively, and use the estimated noise value to adjust the upper confidence bound. Numerical experiments demonstrate that the proposed policies can achieve lower regret and select sub-optimal arms in a more reasonable way.
Cascaded Automatic Speech Recognition -- Large Language Model (ASR-LLM) pipelines remain popular for industrial Spoken Dialogue Systems (SDS), primarily because their decoupled design ensures perceptual verifiability. However, cascaded systems suffer from error propagation, as transcription failures inevitably cascade to subsequent components, thereby degrading the final interaction quality. Although ASR confidence scores offer a simple filter for unreliable inputs, this approach is fundamentally limited because it typically fails to detect deletion errors or to distinguish between acoustic (inability to hear clearly) and linguistic (inability to understand) mismatches, both of which require targeted recovery strategies. In this paper, we propose a cause-aware error recovery paradigm that fundamentally rethinks robustness in SDS. Unlike traditional confidence filtering, we introduce a suite of small precision-focused detectors that exploit deep ASR latent representations to disentangle token-level errors into perception, comprehension, and deletion failures. This fine-grained diagnostic intelligence empowers the LLM to orchestrate targeted, multi-turn clarification strategies, effectively transforming ambiguous signals into seamless user interactions. Experimental results validate the precision of our approach, which more than doubles the recall on domain-shift errors (57.96% vs. 23.66%) compared to baselines. Crucially, this diagnostic precision yields up to a 30% reduction in WER and a 17% improvement on the downstream task across diverse accents, distortions, and domains.
The 3GPP V2X resource allocation framework defines two entity classes -- the base station and the vehicle UE -- and four modes across LTE and NR generations. We demonstrate that this binary taxonomy is structurally incomplete. Base station-led scheduling saturates at high-density traffic nodes, producing latency-tail failures that persist even when mean packet delivery ratios approach the service-class target. UE autonomy is categorically incapable of pre-emergence warning for occluded traffic participants and insufficient for large-scope cascading environmental hazards. We propose Mode 0, a new 3GPP V2X category whose defining entity is the Roadside Computing Unit (RCU) -- an infrastructure ensemble integrating elevated sensing (Seeing), sidelink communication (Speaking), and local computational evaluation (Thinking), owned by traffic management authorities. Mode 0 defines a subfamily spectrum from Mode 0a (all-passive UEs, the guaranteed minimum) through Mode 0c (all-active UEs, the optimal target). Convergent deployment evidence from Chinese national standards (DB11/T 2329.1-2024, T/ITS 0224.1-2025), China Unicom RS-MEC infrastructure, and European and US C-V2X programs confirms that both institutional sides are converging on the roadside traffic node without a coordination standard. A fifteen-run Multi-Agent Proximal Policy Optimization (MAPPO) simulation validates the architectural family: Mode 0a in shared-pool baseline sits at the analytical symmetric-Nash coordination floor; Mode 0c with demand separation achieves strict Pareto improvement for both traffic classes (M0 PDR 0.999, M1 PDR 0.998 at $\rho_{\rm pool} \leq 1$) and lifts the worst-TTI delivery ratio from near-zero to 0.601 -- the only configuration satisfying the latency safety requirement structurally. We call for a 3GPP study item on Mode 0 within the NR-V2X sidelink enhancement work programme.
We describe the winning system for Task 2 of the KSAA-2026 Shared Task on Arabic Speech Dictation with Automatic Diacritization. The task requires producing fully diacritized Arabic text from speech audio and undiacritized transcripts, with only 2,327 training samples available and no external data permitted. Our system fine-tunes CATT-Whisper, a character-level multimodal model combining a pretrained CATT text encoder with a frozen Whisper speech encoder. The key to our approach is training regularization: R-Drop consistency regularization, Optuna-optimized hyperparameters with high weight decay, and Focal Loss. At inference, we average 200 stochastic forward passes across four model checkpoints using Monte Carlo Dropout at the softmax probability level. The system achieves 23.26% WER on the primary leaderboard metric (with case endings, including no-diacritic positions), placing 1st among all participants.
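Averaging at the softmax probability level, as described above, means that each stochastic forward pass is converted to class probabilities first and the probabilities are then averaged, rather than averaging logits. The sketch below illustrates only that step and is not the authors' system: the logits are made-up numbers, the function names are hypothetical, and no model or dropout layer is involved.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_probabilities(passes):
    """Average class probabilities over stochastic forward passes.

    passes: list of logit vectors, one per pass (here, illustrative values that
    could come from several dropout samples and several checkpoints).
    """
    probs = [softmax(logits) for logits in passes]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


def predict(passes):
    """Pick the class with the highest averaged probability."""
    avg = average_probabilities(passes)
    return max(range(len(avg)), key=avg.__getitem__)


if __name__ == "__main__":
    passes = [[10.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(average_probabilities(passes), predict(passes))
```

Because every pass contributes a bounded probability vector, a single pass with extreme logits cannot dominate the ensemble the way it could if raw logits were averaged.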
Conventional power system optimization frameworks are becoming less reliable and efficient due to the stability issues brought by the ever-increasing penetration of inverter-interfaced renewables. To ensure system stability during system operation and to provide appropriate incentives in the future market-based stability maintenance framework, it is essential to develop a comprehensive set of power system stability constraints that can be incorporated into system optimization. In this paper, different system stability issues, including synchronization, voltage, and frequency stability, are investigated, and the corresponding stability conditions are analytically formulated as system operational constraints. A unified framework is further proposed to represent the stability constraints in a general form and enable effective reformulation of the impedance-based stability metrics. All the constraints are converted into linear or Second-Order-Cone (SOC) form, which can be readily implemented in any optimization-based application, such as system scheduling, planning, and market design, thus providing significant value for enhancing and studying multiple aspects of system stability.
Multiple operational constraints of power system stability are derived analytically and reformulated into Second-Order Cone (SOC) form through a unification method in Part I of this paper. The accuracy and conservativeness of the proposed methods are illustrated in this second part. The validity of the developed constraints is tested against dynamic simulations carried out on the modified IEEE 39-bus system. Furthermore, the developed power system stability constraints are applied to the optimal system scheduling model. The resulting stability-constrained system scheduling problem aims to achieve the most economic system operation while ensuring different types of stability in power systems with high Inverter-Based Resources (IBR) penetration. Moreover, based on the stability-constrained optimization model, a novel marginal unit pricing scheme is proposed to quantify the stability services of different units according to their economic value in maintaining system stability, thus providing rational incentives to stability service providers and insightful information for stability market development.
We investigate a framework for train-free MRI segmentation based on Topological Data Analysis. The pipeline proceeds in three steps, first identifying the whole object to segment via automatic thresholding, then detecting a distinctive subset whose topology is known in advance, and finally deducing the various components of the segmentation. A key ingredient is the extraction of approximate representative cycles from persistence diagrams, which provides an interpretable link between persistent features and anatomical components. To clarify the method's scope, we make the underlying topological and intensity assumptions explicit, quantify when they hold on real data, and analyze typical failure modes. We evaluate the approach on glioblastoma and on fetal cortical plate segmentation, with comparisons to unsupervised and deep-learning references. By operating without large annotated datasets, the method is well suited to scarce-data settings and provides an interpretable baseline and practical initialization for expert refinement or learning-based pipelines.
This study focuses on optimizing the design parameters of a Dual Active Bridge (DAB) converter for use in 350 kW DC fast chargers, emphasizing the balance between efficiency and cost. Addressing the observed gaps in existing high-power application research, it introduces an optimization framework to evaluate critical design parameters (number of converter modules, switching frequency, and transformer turns ratio) within a broad operational voltage range. The analysis identifies an optimal configuration that achieves over 95% efficiency at rated power across a wide output voltage range, comprising seven 50 kW DAB converters with a switching frequency of 30 kHz and a transformer turns ratio of 0.9.
We introduce data to predictive control, D2PC, a framework to facilitate the design of robust and predictive controllers from data. The proposed framework is designed for discrete-time stochastic linear systems with output measurements and provides a principled design of a predictive controller based on data. The framework builds on a parameter identification method based on the Expectation-Maximization algorithm, which incorporates pre-defined structural constraints. An asymptotic approximation is leveraged to quantify the uncertainty in the parameter estimates. As the main contributions, a robust control and predictive control design are proposed tailored to the uncertainty characterization resulting from the identification. In particular, a strategy to synthesize robust dynamic output-feedback controllers is presented. Furthermore, a predictive control scheme that guarantees recursive feasibility and satisfaction of chance constraints is developed. This framework marks a significant advancement in integrating data-driven models into robust and predictive control designs. We demonstrate the efficacy of D2PC through a numerical example involving a $10$-dimensional spring-mass-damper system.
In Integrated Sensing and Communication (ISAC) networks, distributed devices can cooperate to produce radio images of the surrounding environment by exploiting phase-coherent signal processing. However, existing imaging methods are not well-suited for composite moving targets with multiple independently moving extended parts. This is due to simplistic isotropic scattering models and the lack of methods to compensate for distinct Doppler shifts from each component, which leads to image defocusing. We propose MOSAIC, the first hierarchical imaging method for composite moving targets using distributed User Equipments (UEs) and a single ISAC Base Station (BS). MOSAIC generates high-resolution images of each target part and estimates its velocity vector. Coherent imaging is performed within selected clusters of UEs observing a locally isotropic scattering from each part, while cluster-specific images are combined non-coherently across wide angles to improve the reconstruction. To mitigate Doppler-induced defocusing, Doppler components are pre-compensated before coherent imaging, turning a limitation into an additional means of resolving multiple target parts. This also enables low-complexity velocity estimation by associating Doppler frequencies across UEs. Simulations show over 50% improvement in image quality compared to existing methods, in terms of Wasserstein distance, and dm/s-level velocity estimation accuracy.
Automation of sleep analysis, including both macrostructural (sleep stages) and microstructural (e.g., sleep spindles) elements, promises to enable large-scale sleep studies and to reduce variance due to inter-rater incongruencies. While individual steps, such as sleep staging and spindle detection, have been studied separately, the feasibility of automating multi-step sleep analysis remains unclear. In this case study, we evaluate whether a fully automated analysis using validated machine learning models for sleep staging (RobustSleepNet) and subsequent spindle detection (SUMOv2) can replicate findings from an expert-based study of bipolar disorder. The automated analysis qualitatively reproduced key findings from the expert-based study, including significant differences in fast spindle densities between bipolar patients and healthy controls, accomplishing in minutes what previously took months to complete manually. While the results of the automated analysis differed quantitatively from the expert-based study, possibly due to biases between expert raters or between raters and the models, the models individually performed at or above inter-rater agreement for both sleep staging and spindle detection. Our results demonstrate that fully automated approaches have the potential to facilitate large-scale sleep research. We are providing public access to the tools used in our automated analysis by sharing our code and introducing SomnoBot, a privacy-preserving sleep analysis platform.
Hyperspectral image (HSI) analysis plays a critical role in remote sensing, agriculture, and environmental monitoring. However, traditional methods often struggle to handle the high dimensionality, spectral redundancy, and noise inherent in HSI data, limiting their accuracy and scalability. Recently, diffusion models, including denoising diffusion probabilistic models and other generative frameworks based on stochastic differential equations, have shown strong potential in capturing complex spectral-spatial structures and generating high-fidelity HSI data. These models offer effective solutions for tasks such as noise suppression, data augmentation, classification, and anomaly detection. This review presents a systematic summary of recent advances in diffusion models for HSI processing. We categorize existing methods, highlight their strengths in handling high-dimensional data, and compare their performance with conventional approaches. Special attention is given to critical applications such as change detection and post-disaster anomaly identification. The review also discusses current limitations, such as computational cost and training stability, and outlines potential research directions. Our main contributions can be summarized as follows: we provide a systematic taxonomy of diffusion-based HSI methods, examine their applications across major remote sensing tasks, and offer perspectives on potential directions for future research. With these efforts, this review seeks to support the community in harnessing deep learning models to achieve more effective and efficient hyperspectral image analysis.
Physics sensing plays a central role in many scientific and engineering domains, which inherently involves two coupled tasks: reconstructing dense physical fields from sparse observations and optimizing scattered sensor placements to observe maximum information. While deep learning has made rapid advances in sparse-data reconstruction, existing methods generally omit optimization of sensor placements, leaving the mutual enhancement between reconstruction and placement on the shelf. To change this suboptimal practice, we propose PhySense, a synergistic two-stage framework that learns to jointly reconstruct physical fields and to optimize sensor placements, both aiming for accurate physics sensing. The first stage involves a flow-based generative model enhanced by cross-attention to adaptively fuse sparse observations. Leveraging the reconstruction feedback, the second stage performs sensor placement via projected gradient descent to satisfy spatial constraints. We further prove that the learning objectives of the two stages are consistent with classical variance-minimization principles, providing theoretical guarantees. Extensive experiments across three challenging benchmarks, especially a 3D geometry dataset, indicate PhySense achieves state-of-the-art physics sensing accuracy and discovers informative sensor placements previously unconsidered. Code is available at this repository: this https URL.
Autonomous driving simulators still lack high-fidelity radar, even though radar is critical for robust perception in adverse weather. A key obstacle is that raw radar point clouds are extremely sparse and stochastic, making them difficult to model; we argue that simulating the full range-azimuth-Doppler cube is a more principled target. Existing radar cube simulators either rely purely on neural generators, which are opaque and offer little control over sensor attributes, or on detailed electromagnetic pipelines, which are slow, require proprietary hardware specifications, and still struggle to capture real-world complexity. We introduce Ctrl-RS, a controllable radar cube simulation framework that combines the strengths of both worlds. First, we build an environment reflection tensor from diverse sensor sources (including LiDAR, monocular cameras, and existing radar). Second, we abstract radar physics into a compact set of waveform parameters that characterize the 3D point spread function, yielding an intuitive embedding of radar attributes such as range resolution, Doppler broadening, and azimuth beam shape. Third, we train a WARP-Net on a large mixed dataset that fuses real, analytically synthesized, and simulator-generated radar cubes to cover a wide distribution of radar attributes. Ctrl-RS supports viewpoint changes, actor removal, and attribute editing. Experiments on RADDet, Carrada, and nuScenes show that our simulated data can match or surpass real radar in 2D detection and semantic segmentation, and consistently boosts performance in 3D detection when combined with real data. The project is available at this https URL.
Medical image segmentation plays a crucial role in clinical diagnosis and treatment planning, where accurate boundary delineation is essential for precise lesion localization, organ identification, and quantitative assessment. In recent years, deep learning-based methods have significantly advanced segmentation accuracy. However, two major challenges remain. First, the performance of these methods heavily relies on large-scale annotated datasets, which are often difficult to obtain in medical scenarios due to privacy concerns and high annotation costs. Second, clinically challenging scenarios, such as low contrast in certain imaging modalities and blurry lesion boundaries caused by malignancy, still pose obstacles to precise segmentation. To address these challenges, we propose MedSAM-CA, an architecture-level fine-tuning approach that mitigates reliance on extensive manual annotations by adapting the pretrained foundation model, Medical Segment Anything (MedSAM). MedSAM-CA introduces two key components: the Convolutional Attention-Enhanced Boundary Refinement Network (CBR-Net) and the Attention-Enhanced Feature Fusion Block (Atte-FFB). CBR-Net operates in parallel with the MedSAM encoder to recover boundary information potentially overlooked by long-range attention mechanisms, leveraging hierarchical convolutional processing. Atte-FFB, embedded in the MedSAM decoder, fuses multi-level fine-grained features from skip connections in CBR-Net with global representations upsampled within the decoder to enhance boundary delineation accuracy. Experiments on publicly available datasets covering dermoscopy, CT, and MRI imaging modalities validate the effectiveness of MedSAM-CA. On the dermoscopy dataset, MedSAM-CA achieves 94.43% Dice with only 2% of the full training data, reaching 97.25% of full-data training performance, demonstrating strong effectiveness in low-resource clinical settings.
The rapid growth of distributed energy resources (DERs), including rooftop solar and energy storage, is transforming the grid edge, where distributed technologies and customer-side systems increasingly interact with the broader power grid. DER aggregators, entities that coordinate and optimize the actions of many small-scale DERs, play a key role in this transformation. This paper presents a hybrid Mean-Field Control (MFC) and Mean-Field Game (MFG) framework for integrating DER aggregators into wholesale electricity markets. Unlike traditional approaches that treat market prices as exogenous, our model captures the feedback between aggregators' strategies and locational marginal prices (LMPs) of electricity. The MFC component optimizes DER operations within each aggregator, while the MFG models strategic interactions among multiple aggregators. To account for various uncertainties, we incorporate reinforcement learning (RL), which allows aggregators to learn optimal bidding strategies in dynamic market conditions. We prove the existence and uniqueness of a mean-field equilibrium and validate the framework through a case study of the Oahu Island power system. Results show that our approach reduces price volatility and improves market efficiency, offering a scalable and decentralized solution for DER integration in wholesale markets.
While deep learning offers tremendous promise for scientific and medical imaging, any failures and hallucinations (predictions that do not coincide with reality) are hard to pinpoint and can have serious downstream consequences. Uncertainty estimation techniques, such as conformal prediction, can help by predicting statistically valid error bars for a model's prediction. However, popular conformal prediction methods were not designed for high-dimensional image-valued problems and do not take into account spatial correlations within an image during conformal calibration, resulting in larger-than-necessary uncertainty intervals. We propose a practical simultaneous quantile regression method that enables non-linear, spatially-adaptive scaling during conformal calibration. Our method, QUTCC, uses a U-Net architecture with a quantile embedding to learn a full conditional quantile distribution during training, and then leverages this non-linear, learned function for spatially-adaptive conformal calibration. At test time, our method can efficiently estimate uncertainty intervals with pixel-marginal coverage guarantees. In addition, QUTCC can predict pixel-wise conditional probability density estimates without any built-in distributional assumptions. We evaluate our method on several denoising problems, accelerated magnetic resonance imaging, and quantitative phase microscopy. Our method consistently produces tighter uncertainty intervals than prior conformal methods at the same coverage level, can predict plausible conditional distributions for different tasks, and in some cases, high-uncertainty regions can help us locate hallucinations in a model's prediction.
Most existing antenna array-based source localization methods rely on fixed-position arrays (FPAs) and strict assumptions about source field conditions (near-field or far-field), which limits their effectiveness in complex, dynamic real-world scenarios where high-precision localization is required. In contrast, this paper introduces a novel scalable fluid antenna system (SFAS) that can dynamically adjust its aperture configuration to optimize performance for different localization tasks. Within this framework, we develop a two-stage source localization strategy based on the exact spatial geometry (ESG) model: the first stage uses a compact aperture configuration for initial direction-of-arrival (DOA) estimation, while the second stage employs an expanded aperture for enhanced DOA and range estimation. The proposed approach eliminates the traditional need for signal separation or isolation to classify source types and enables a single SFAS array to achieve high localization accuracy without field-specific assumptions, model simplifications, or approximations, representing a new paradigm in array-based source localization. Extensive simulations demonstrate the superiority of the proposed method in terms of localization accuracy, computational efficiency, and robustness to different source types.
Distributed systems require fusing heterogeneous local probability distributions into a global summary over sparse and unreliable communication networks. Traditional consensus algorithms, which average distributions in Euclidean space, ignore their inherent geometric structure, leading to misleading results. Wasserstein barycenters offer a geometry-aware alternative by minimizing optimal transport costs, but their entropic approximations via the Sinkhorn algorithm typically require centralized coordination. This paper proposes a fully decentralized Sinkhorn algorithm that reformulates the centralized geometric mean as an arithmetic average in the log-domain, enabling approximation through local gossip protocols. Agents exchange log-messages with neighbors, interleaving consensus phases with local updates to mimic centralized iterations without a coordinator. To optimize bandwidth, we integrate event-triggered transmissions and b-bit quantization, providing tunable trade-offs between accuracy and communication while accommodating asynchrony and packet loss. Under mild assumptions, we prove convergence to a neighborhood of the centralized entropic barycenter, with bias linearly dependent on consensus tolerance, trigger threshold, and quantization error. Complexity scales near-linearly with network size. Simulations confirm near-centralized accuracy with significantly fewer messages, across various topologies and conditions.
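The log-domain reformulation described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it only shows that averaging log-values by gossip recovers the centralized geometric mean. The function name `gossip_log_average`, the synchronous update schedule, the Metropolis weights, and the 4-agent ring are all assumptions chosen for illustration; event triggering, quantization, and the Sinkhorn updates themselves are omitted.

```python
import numpy as np

def gossip_log_average(log_vals, neighbors, rounds=200):
    """Synchronous gossip averaging with Metropolis weights (hypothetical helper).

    log_vals  : array of shape (n_agents, d), each row is one agent's log-vector
    neighbors : dict mapping agent index -> list of neighbor indices (undirected graph)
    """
    x = np.array(log_vals, dtype=float)
    n = x.shape[0]
    deg = [len(neighbors[i]) for i in range(n)]
    for _ in range(rounds):
        new = x.copy()
        for i in range(n):
            for j in neighbors[i]:
                # Metropolis weight: symmetric, so the network average is preserved.
                w = 1.0 / (1 + max(deg[i], deg[j]))
                new[i] += w * (x[j] - x[i])
        x = new
    return x

# Illustrative data: 4 agents on a ring, each holding a positive 3-vector.
rng = np.random.default_rng(0)
vals = rng.uniform(0.5, 2.0, size=(4, 3))
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

# Each agent's local estimate of the geometric mean, obtained without a coordinator.
local = np.exp(gossip_log_average(np.log(vals), ring))
# Centralized geometric mean: exp of the arithmetic mean of the logs.
central = np.exp(np.log(vals).mean(axis=0))
```

On a connected graph every row of `local` approaches `central`; stopping the gossip early leaves a residual disagreement, which is the kind of consensus-tolerance bias the abstract refers to.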
The control of high-dimensional distributed parameter systems (DPS) remains a challenge when explicit coarse-grained equations are unavailable. Classical equation-free (EF) approaches rely on fine-scale simulators treated as black-box timesteppers. However, repeated simulations for steady-state computation, linearization, and control design are often computationally prohibitive, or the microscopic timestepper may not even be available, leaving us with data as the only resource. We propose a data-driven alternative that uses local neural operators, trained on spatiotemporal microscopic/mesoscopic data, to obtain efficient short-time solution operators. These surrogates are employed within Krylov subspace methods to compute coarse stable and unstable steady states, while also providing Jacobian information in a matrix-free manner. Krylov-Arnoldi iterations then approximate the dominant eigenspectrum, yielding reduced models that capture the open-loop slow dynamics without explicit Jacobian assembly. Both discrete-time Linear Quadratic Regulator (dLQR) and pole-placement (PP) controllers are based on this reduced system and lifted back to the full nonlinear dynamics, thereby closing the feedback loop. The framework is validated by stabilizing an unstable steady-state of the Liouville-Bratu PDE, demonstrating consistent performance between the learned surrogate and the true system, with quantified degradation under plant-model mismatch.
Recent advances in text-to-speech (TTS) technology have enabled systems to generate speech that is often indistinguishable from human speech, bringing benefits to accessibility, content creation, and human-computer interaction. However, current evaluation practices are increasingly inadequate for capturing the full range of capabilities, limitations, and societal impacts of modern TTS systems. This position paper introduces the concept of Responsible Evaluation and argues that it is essential and urgent for the next phase of TTS development, structured through three progressive levels: (1) ensuring the faithful and accurate reflection of a model's true capabilities and limitations, with more robust, discriminative, and comprehensive objective and subjective scoring methodologies; (2) enabling comparability, standardization, and transferability through standardized benchmarks, transparent reporting, and transferable evaluation metrics; and (3) assessing governance, fairness, and security concerns around data provenance, disparities, misuse, spoofing, and traceability. Through this concept, we critically examine current evaluation practices, identify systemic shortcomings, and propose actionable recommendations. We hope this concept will not only foster more reliable TTS technology but also guide its development toward ethically sound and societally beneficial applications.
High-altitude platform stations (HAPS) are emerging as key enablers in the evolution of 6G wireless networks, bridging terrestrial and non-terrestrial infrastructures. Operating in the stratosphere, HAPS can provide wide-area coverage and low-latency, energy-efficient broadband communications with flexible deployment options for diverse applications. This survey delivers a comprehensive overview of HAPS use cases, technologies, and integration strategies within the 6G ecosystem. The roles of HAPS in extending connectivity to underserved regions, supporting dynamic backhauling, enabling massive IoT, and delivering reliable low-latency communications for autonomous and immersive services are discussed. The paper reviews state-of-the-art architectures for terrestrial and non-terrestrial network integration and highlights recent field trials. Furthermore, key enabling technologies such as channel modeling, AI-driven resource allocation, interference control, mobility management, and energy-efficient communications are examined. The paper also outlines open research challenges. By addressing existing gaps in the literature, this survey positions HAPS as a foundational component of globally integrated, resilient, and sustainable 6G networks.
Underwater data infrastructures offer natural cooling and enhanced physical security compared to terrestrial facilities, but their storage systems remain susceptible to acoustic injection attacks, where sound-induced mechanical vibrations disrupt critical I/O operations and compromise data availability. This work presents a surveillance framework for localizing and tracking such close-range adversarial acoustic sources targeting offshore infrastructures, particularly underwater data centers (UDCs). We propose a scalable heterogeneous receiver configuration with one facility-mounted hydrophone and one mobile hydrophone carried by a surveillance robot. The resulting problem differs from conventional sound source localization (SSL) due to distributed facility scale, narrowband signaling with phase ambiguity, non-cooperative sources, and mobile receiver state uncertainty. To address these challenges, we formulate a Locus-Conditioned Maximum A-Posteriori (LC-MAP) scheme that generates acoustically informed priors, ensuring a physically plausible initial state for a joint time- and frequency-difference-of-arrival (TDOA-FDOA) filtering. We integrate this into an unscented Kalman filter (UKF) pipeline, along with a multipath-aware measurement model that compensates for surface and bed reflections, and an effective measurement covariance that accounts for mobile receiver uncertainty. Extensive Monte Carlo analyses, fixed-array baseline comparisons, Gazebo-based physics simulations, and field trials demonstrate reliable real-time localization and tracking. The framework achieves sub-meter localization accuracy and over 90% success rates in most scenarios, with convergence times nearly halved compared to baselines. Overall, this study establishes a geometry-aware, real-time approach for acoustic threat localization and advances autonomous surveillance capabilities of underwater infrastructure.
Accurate, real-time wireless signal prediction is essential for next-generation networks. However, existing vision-based frameworks often rely on computationally intensive models and are sensitive to environmental interference. To overcome these limitations, we propose a novel, physics-guided, lightweight framework that predicts the received signal strength indicator (RSSI) from camera images. By decomposing RSSI into its physically interpretable components, path loss and shadow fading, we significantly reduce the model's learning difficulty and improve interpretability. Our approach establishes a new state-of-the-art by demonstrating exceptional robustness to environmental interference, a critical flaw in prior work. Quantitatively, our model reduces the prediction root mean squared error (RMSE) by 50.3% under conventional conditions and still achieves an 11.5% lower RMSE than the previous benchmark's interference-eliminated results. This superior performance is achieved with a remarkably lightweight framework, utilizing a MobileNet-based model up to 19 times smaller than competing solutions. The combination of high accuracy, robustness to interference, and computational efficiency makes our framework highly suitable for real-time deployment on edge devices, paving the way for more intelligent and reliable wireless communication systems.
In this work, we consider the problem of executing multiple tasks encoded by value functions, each learned through Reinforcement Learning (RL), using an optimization-based framework. Prior work developed this framework but did not address when learned value functions can be concurrently executed. This work's main contributions consist of theorems which provide necessary and sufficient conditions to concurrently execute sets of learned tasks within subsets of the state space using the previously proposed min-norm controller. These theorems provide insight into when learned control tasks can be made concurrently executable, when they may already be so, and when concurrent execution is not possible under the proposed framework. We also extend the proposed framework to account for value functions trained with a discount factor, making it more compatible with standard RL practices.
Deep neural network (DNN)-based receivers offer a powerful alternative to classical model-based designs for wireless communication, especially in complex and nonlinear propagation environments. However, their adoption is challenged by the rapid variability of wireless channels, which makes pre-trained static DNN-based receivers ineffective, and by the latency and computational burden of online stochastic gradient descent (SGD)-based learning. In this work, we propose an online learning framework that enables rapid low-complexity adaptation of DNN-based receivers. Our approach is based on two main tenets. First, we cast online learning as Bayesian tracking in parameter space, enabling a single-step adaptation, which deviates from multi-epoch SGD. Second, we focus on modular DNN architectures that enable parallel, online, and localized variational Bayesian updates. Simulations with practical communication channels demonstrate that our proposed online learning framework can maintain a low error rate with markedly reduced update latency and increased robustness to channel dynamics compared to traditional gradient-descent-based methods.
We investigate the problem of maximizing the sum-rate performance of a beyond-diagonal reconfigurable intelligent surface (BD-RIS)-aided multi-user (MU)-multiple-input single-output (MISO) system using fractional programming (FP) techniques. More specifically, we leverage the Lagrangian Dual Transform (LDT) and Quadratic Transform (QT) to derive an equivalent objective function which is then solved iteratively via a manifold optimization framework. It is shown that these techniques reduce the complexity of the optimization problem for the scattering matrix solution, while also providing notable performance gains compared to state-of-the-art (SotA) methods under the same system conditions. Simulation results confirm the effectiveness of the proposed method in improving sum-rate performance.
The increasing integration of renewable energy sources into electrical grids necessitates a paradigm shift toward advanced control schemes that guarantee safe and stable operations with scalable properties. Accordingly, this paper investigates large-signal stability guarantees for cyber-physical DC microgrids employing a nonlinear distributed consensus-based control scheme to enable coordinated integration and management of distributed generation units within an expandable framework. The proposed control framework adopts nested control loops, inner (decentralized) and outer (distributed), specifically designed to simultaneously achieve uniform voltage containment within pre-specified limits and proportional current sharing in steady state. Our scalable stability result relies on singular perturbation theory and Lyapunov arguments to prove global exponential stability when imposing a sufficient time-scale separation at the border between the nested control loops, while relying on some practical parameter-setting schemes. The effectiveness and versatility of the proposed control strategy are then validated through time-domain simulations performed on a case-specific low-voltage DC microgrid and the modified IEEE 33-bus radial distribution system. Moreover, a small-signal stability analysis is conducted to derive practical guidelines that enhance the applicability of the method.
This paper studies the problem of distributed Riemannian optimization over a network of agents whose cost functions are geodesically smooth but possibly geodesically non-convex. Extending a well-known distributed optimization strategy called diffusion adaptation to Riemannian manifolds, we show that the resulting algorithm, the Riemannian diffusion adaptation, provably exhibits several desirable behaviors when minimizing a sum of geodesically smooth non-convex functions over manifolds of bounded curvature. More specifically, we establish that the algorithm can approximately achieve network agreement in the sense that the Fréchet variance of the iterates among the agents is small. Moreover, the algorithm is guaranteed to converge to a first-order stationary point for general geodesically non-convex cost functions. When the global cost function additionally satisfies the local Riemannian Polyak-Lojasiewicz (PL) condition, we also show that it converges linearly under a constant step size up to a steady-state error. Finally, we apply this algorithm to a decentralized robust principal component analysis (PCA) problem formulated on the Grassmann manifold and to low-rank matrix completion problems, and illustrate its convergence and performance through numerical simulations.
Flow-based text-to-image (T2I) models excel at prompt-driven image generation, but falter on Image Restoration (IR), often "drifting away" from being faithful to the measurement. Prior work mitigates this drift with data-specific flows or task-specific adapters that are computationally heavy and not scalable across tasks. This raises the question "Can't we efficiently manipulate the existing generative capabilities of a flow model?" To this end, we introduce FlowSteer (FS), an operator-aware conditioning scheme that injects measurement priors along the sampling path, coupling a frozen flow's implicit guidance with explicit measurement constraints. Across super-resolution, deblurring, denoising, and colorization, FS improves measurement consistency and identity preservation in a strictly zero-shot setting: no retrained models, no adapters. We show how the nature of flow models and their sensitivities to noise inform the design of such a scheduler. FlowSteer, although simple, achieves a higher fidelity of reconstructed images, while leveraging the rich generative priors of flow models. All data and code will be publicly available at this https URL.
Endmember extraction from hyperspectral images aims to identify the spectral signatures of materials present in a scene. Recent studies have shown that self-dictionary methods can achieve high extraction accuracy; however, their high computational cost limits their applicability to large-scale hyperspectral images. Although several approaches have been proposed to mitigate this issue, it remains a major challenge. Motivated by this situation, this paper pursues a data reduction approach. Assuming that a hyperspectral image follows the linear mixing model with the pure-pixel assumption, we develop a data reduction technique to remove pixels corresponding to mixtures of multiple endmember signatures. We analyze the theoretical properties of this reduction step and show that it preserves pixels that lie close to the endmembers. Building on this result, we propose a data-reduced self-dictionary method that integrates the data reduction with a self-dictionary method based on a linear programming formulation. Numerical experiments demonstrate that the proposed method can substantially reduce the computational time of the original self-dictionary method without sacrificing endmember extraction accuracy.
Aerocapture is particularly challenging for semi-analytical propagation because the dynamics are dominated by nonconservative forces whose magnitudes vary significantly throughout the trajectory. State transition tensors (STTs), higher-order Taylor series expansions of the solution flow, have been widely used as a computationally efficient semi-analytical propagation method for orbital scenarios, but have not previously been applied to aerocapture. However, computing higher-order STTs requires integrating exponentially many equations as the state dimension increases. Directional state transition tensors (DSTTs) mitigate this cost by projecting the state into a reduced-dimension basis. This work develops novel dynamics analysis techniques to identify effective bases for this reduction, including augmented higher-order Cauchy-Green tensors tailored to quantities of interest such as apoapsis radius. Results show that DSTTs constructed along these bases significantly reduce computational cost while maintaining accuracy in predicted apoapsis radius and terminal energy. In particular, certain of these DSTTs outperform traditional DSTTs in nonlinear perturbation propagation for key state subsets and quantities of interest. These results establish STTs and DSTTs as practical tools for aerocapture performance analysis to enable robust guidance and navigation.
An open multi-agent system (OMAS) features migrating agents which produce a flexible network that is naturally switching and size-varying. Meanwhile, agent migrations also make an OMAS prone to environmental adversities. In this work, we investigate the consensus tracking problem of OMASs suffering migration-induced adversities, including non-vanishing agent dynamics/state perturbations and repelling antagonistic interactions among agents, over an intermittently disconnected signed digraph. The OMAS is interpreted into a perturbed multi-mode multi-dimensional ($M^3D$) system in which unstable subsystems are created when repelling interactions dominate the cooperative ones in the network regardless of its connectivity. To handle the destabilizing effect brought by repelling interactions and non-vanishing perturbations, we extend the stability theory for $M^3D$ systems and apply it to the OMAS to show that ultimately bounded consensus tracking can be achieved if the network switching satisfies the piecewise average dwell time and activation time ratio conditions. Particularly, for vanishing perturbations, asymptotic tracking can be ensured under weaker switching conditions.
Accurate wireless localization underpins applications from autonomous systems to smart infrastructure. We study the mean-squared error (MSE) and conditional MSE (CMSE) of a practical fusion-based estimator in d-dimensional, stationary isotropic (translation- and rotation-invariant) random sensor networks, where a central processor combines received-signal-strength (RSS) and angle-of-arrival (AOA) measurements to infer a target's position. Our contributions are twofold. First, we establish an approximation theorem: when measurement noise is sufficiently large, the joint law of RSS and AOA observations under a broad class of stationary isotropic deployments is, in distribution, indistinguishable from that induced by a homogeneous Poisson point process (PPP). Second, leveraging this equivalence, we investigate a homogeneous PPP-based sensor network. We propose a fusion-based estimator in which a central processor aggregates RSS and AOA measurements from a set of spatially distributed sensors to infer the target position. For this PPP deployment within a finite observation region, we derive tractable analytical upper bounds for both the MSE and CMSE, establishing explicit scaling laws with respect to sensor density, observation radius, and noise variance. The approximation theorem then certifies these PPP-based bounds as reasonable proxies for non-Poisson deployments in noisy regimes. Overall, the results translate deployment and sensing parameters into achievable accuracy targets and provide robust, cost-aware guidance for the design of next-generation location-aware wireless networks.
Parallel imaging techniques reduce magnetic resonance imaging (MRI) scan time, but image quality degrades as the acceleration factor increases. In clinical practice, conservative acceleration factors are chosen because no mechanism exists to automatically assess the diagnostic quality of undersampled reconstructions. This work introduces a general framework for pixel-wise uncertainty quantification in parallel MRI reconstructions, enabling automatic identification of unreliable regions without access to any ground-truth reference image. Our method integrates conformal quantile regression with image reconstruction methods to estimate statistically rigorous pixel-wise uncertainty intervals. We trained and evaluated our model on Cartesian undersampled brain and knee data obtained from the fastMRI dataset using acceleration factors ranging from 2 to 10. An end-to-end Variational Network was used for image reconstruction. Quantitative experiments demonstrate strong agreement between predicted uncertainty maps and true reconstruction error. Using our method, the corresponding Pearson correlation coefficient was higher than 90% at acceleration levels at and above four-fold, whereas it dropped to less than 70% when the uncertainty was computed using a simpler heuristic notion (magnitude of the residual). Qualitative examples further show that the uncertainty maps based on quantile regression capture the magnitude and spatial distribution of reconstruction errors across acceleration factors, with regions of elevated uncertainty aligning with pathologies and artifacts. The proposed framework enables evaluation of reconstruction quality without access to fully-sampled ground-truth reference images. It represents a step toward adaptive MRI acquisition protocols that may be able to dynamically balance scan time and diagnostic reliability.
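The calibration step of conformal quantile regression mentioned above can be sketched as follows. This is the standard split-conformal recipe applied to synthetic scalar "pixels", not the authors' pipeline: the function name `cqr_offset`, the fixed raw bounds of -1 and 1, and the Gaussian data are assumptions made only to keep the example self-contained.

```python
import numpy as np

def cqr_offset(lo, hi, y, alpha=0.1):
    """Conformal offset for lower/upper quantile predictions (hypothetical helper).

    lo, hi : predicted lower/upper bounds on a held-out calibration set
    y      : true values on the same calibration set
    """
    # Conformity score: how far each true value falls outside its predicted interval
    # (negative when the value lies inside).
    scores = np.maximum(lo - y, y - hi).ravel()
    n = scores.size
    # Finite-sample corrected rank; clamped to n as a simplification
    # (the exact procedure would return an infinite offset in that case).
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(scores)[k - 1]

# Synthetic illustration: raw intervals [-1, 1] are too narrow for 90% coverage
# of standard-normal values, so calibration has to widen them.
rng = np.random.default_rng(0)
y_cal = rng.normal(size=5000)
q = cqr_offset(np.full(5000, -1.0), np.full(5000, 1.0), y_cal, alpha=0.1)

# Apply the same offset to fresh data and measure empirical coverage.
y_test = rng.normal(size=5000)
coverage = np.mean((y_test >= -1.0 - q) & (y_test <= 1.0 + q))
```

In an imaging setting, `lo` and `hi` would be per-pixel quantile maps produced alongside the reconstruction, and the calibrated interval width `hi - lo + 2 * q` would serve as the uncertainty map.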
Echocardiography is a cornerstone for managing heart failure (HF), with Left Ventricular Ejection Fraction (LVEF) being a critical metric for guiding therapy. However, manual LVEF assessment suffers from high inter-observer variability, while existing Deep Learning (DL) models are often computationally intensive and data-hungry "black boxes" that impede clinical trust and adoption. Here, we propose a backpropagation-free multi-task Green Learning (MTGL) framework that performs simultaneous Left Ventricle (LV) segmentation and LVEF classification. Our framework integrates an unsupervised VoxelHop encoder for hierarchical spatio-temporal feature extraction with a multi-level regression decoder and an XGBoost classifier. On the EchoNet-Dynamic dataset, our MTGL model achieves state-of-the-art classification and segmentation performance, attaining a classification accuracy of 94.3% and a Dice Similarity Coefficient (DSC) of 0.912, significantly outperforming several advanced 3D DL models. Crucially, our model achieves this with over an order of magnitude fewer parameters, demonstrating exceptional computational efficiency. This work demonstrates that the GL paradigm can deliver highly accurate, efficient, and interpretable solutions for complex medical image analysis, paving the way for more sustainable and trustworthy artificial intelligence in clinical practice.
Recent advances in reconstructing speech envelopes from Electroencephalogram (EEG) signals have enabled continuous auditory attention decoding (AAD) in multi-speaker environments. Most Deep Neural Network (DNN)-based envelope reconstruction models are trained to maximize the Pearson correlation coefficients (PCC) between the attended envelope and the reconstructed envelope (attended PCC). While the difference between the attended PCC and the unattended PCC plays an essential role in auditory attention decoding, existing methods often focus on maximizing the attended PCC. We therefore propose a contrastive PCC loss which represents the difference between the attended PCC and the unattended PCC. The proposed approach is evaluated on three public EEG AAD datasets using four DNN architectures. Across many settings, the proposed objective improves envelope separability and AAD accuracy, while also revealing dataset- and architecture-dependent failure cases.
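The contrastive objective described above can be sketched in a few lines. This is a minimal NumPy illustration, assuming one-dimensional envelope arrays of equal length; the function names and the small `eps` stabilizer are illustrative choices, and a training setup would use the same expression in an automatic-differentiation framework.

```python
import numpy as np

def pcc(a, b, eps=1e-8):
    """Pearson correlation coefficient between two 1-D signals."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    # eps guards against division by zero for constant signals.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def contrastive_pcc_loss(recon, attended, unattended):
    """Negative gap between attended and unattended PCC.

    Minimizing this loss pushes the reconstruction toward the attended
    envelope and away from the unattended one.
    """
    return -(pcc(recon, attended) - pcc(recon, unattended))
```

At decoding time, the same two correlations can be compared directly: the speaker whose envelope correlates more strongly with the reconstruction is taken as the attended one.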
Decentralized air traffic management requires coordination among self-interested stakeholders operating under shared safety and capacity constraints, a setting that conventional centralized or implicitly cooperative models do not adequately capture. We develop a unified perspective on noncooperative coordination, in which system-level outcomes emerge by designing incentives and assigning signals that reshape individual optimality rather than imposing cooperation or enforcement. We advance this framework along three directions: scalable equilibrium engineering via reduced-rank and uncertainty-aware correlated equilibria, decentralized mechanism design for equilibrium selection without enforcement, and structured noncooperative dynamics with convergence guarantees. Beyond these technical contributions, we discuss core design principles that govern incentive-compatible coordination in decentralized systems. Together, these results establish a foundation for scalable, robust coordination in safety-critical air traffic systems.
This paper investigates covert multi-hop communication in wireless networks where an adversary employs a cyclostationary (cycle) detector to reveal hidden transmissions. The covert route employs direct sequence spread spectrum (DSSS) signaling to achieve either maximum end-to-end covertness or minimum end-to-end latency under quality-of-service (QoS) and link budget constraints. Optimal bandwidth, transmit power, and spreading gain for each hop jointly satisfy reliability and either rate or covertness requirements. We show the equivalence between the covertness-based and the detection SNR gain-based widest-path formulations, which enables efficient route computation. Numerical simulations in a realistic 3D environment illustrate that (i) end-to-end latency increases exponentially with the covertness requirement, (ii) end-to-end latency grows super-linearly with the packet size M, and (iii) cycle and energy detectors impose different latency behavior as a function of the message length and the covertness requirement. The proposed framework provides important insights into resource allocation and routing design for covert networks against advanced detection adversaries.
This paper investigates the application of Index Modulation (IM) to Modulation on Conjugate-Reciprocal Zeros (MOCZ) to enhance spectral efficiency (SE) in short packet communications. The proposed IM-MOCZ scheme splits an $N$-bit message into two streams: $N-K$ bits select one of $2^{N-K}$ uniquely designed codebooks, while the remaining $K$ bits are transmitted with conventional binary-MOCZ (BMOCZ) using the selected codebook. At the receiver, Root Finding Minimum Distance (RFMD) or Direct Zero-Testing (DiZeT) detectors evaluate all candidate codebooks and compute penalty metrics, with a majority vote rule selecting the most confident codebook and recovering the transmitted message. The proposed IM-MOCZ can provide higher SE gains than conventional BMOCZ at the cost of increasing the computational complexity, with simulations demonstrating improved bit error rate (BER) and block error rate (BLER) performance for larger $K$ relative to $N$, when compared to conventional BMOCZ.
Integrated Energy Systems (IES) are systems of interconnected electricity, gas, heating, and cooling networks, where the carriers interact and depend on one another. Beyond these core vectors, IES may also incorporate additional infrastructures, such as hydrogen, transportation and water networks, whenever sector coupling or cross-vector exchanges are relevant. Although modern cities already function as multi-energy systems, these networks are still planned and operated in isolation, which leads to inefficiencies and unused flexibility. As distributed energy resources (DERs) grow, local coupling among electricity, heating, and gas networks becomes stronger, so coordinated operation across carriers and infrastructures is essential. IES can improve efficiency, flexibility, and renewable integration, yet operation is challenging because of complex interdependencies, non-convex behaviors, and multi-scale dynamics of the energy networks. A key point that the literature often overlooks is the explicit role of network constraints and topology, which shape feasible operating regions, affect scalability, and determine how uncertainty and formal guarantees can be addressed. This review provides a first comprehensive analysis of network-aware modeling, optimization, and control methods for IES. We identify methodological limitations related to tractability, feasibility guarantees, and scalability. Building on these insights, we outline research directions that include distributed optimization with theoretical guarantees and control approaches informed by operational data. The review offers a foundation for scalable, network-aware operational frameworks for future low-carbon energy systems.
This paper proposes a subspace fusion sensing algorithm for cooperative integrated sensing and communication. First, we stack the received signals from access points (APs) into a third-order tensor and construct the equivalent virtual antenna (EVA) array via tensor unfolding. Then, a data association-free subspace-based fusion sensing algorithm is developed utilizing the EVA arrays from distributed APs. A derivation of the Cramér-Rao lower bound (CRLB) is also presented. Finally, simulation results validate the effectiveness of the proposed algorithm compared to traditional techniques.
Spectrum sensing and the generation of 3D Radio Environment Maps (REMs) are essential for enabling spectrum sharing within cognitive radio networks. While Uncrewed Aerial Vehicles (UAVs) offer high-mobility 3D sensing, REM accuracy is challenged by dynamic flight behaviors, where fluctuations in UAV speed and direction introduce measurement inconsistencies. Furthermore, the airframe itself impacts the onboard antenna's radiation characteristics. In this paper, using real-world data, we systematically analyze how REM reconstruction accuracy is shaped by three key pillars: physical sensing parameters like altitude and bandwidth, environmental shadowing, and distortions caused by the UAV airframe. First, we benchmark diverse spatial prediction models, including simple Kriging (SK), ordinary Kriging (OK), trans-Gaussian Kriging, and Gaussian process regression (GPR). We demonstrate that while SK and its trans-Gaussian variant are highly accurate at extreme sample sparsity, OK improves as sample size increases, and GPR serves as the most stable overall baseline. Building on this, we propose a novel matrix completion (MC)-assisted GPR framework that enhances REM reconstruction in the presence of non-uniform spatial smoothness. The method operates by decomposing the REM into two distinct layers: a global smooth component and a highly varying local component. Our analysis based on real-world measurements reveals three key findings: 1) REM accuracy and shadowing variance follow a distinct tri-phasic trend as the UAV altitude increases; 2) REM accuracy significantly improves with increased spectrum bandwidth; and 3) antenna pattern calibration from in-field measurements significantly enhances REM accuracy by accounting for the effect of the UAV airframe.
Descriptor systems arise naturally in real-world applications governed by algebraic constraints, such as power networks, robotics and chemical processes. When a descriptor model contains a nontrivial nilpotent block, the discrete-time input--output map may be improper: the current output depends on future inputs and, in the stochastic case, on future noise terms. This letter proposes a data-driven predictive control framework for stochastic descriptor systems that handles these non-causal dependencies without explicitly identifying system matrices. The key idea is to split the fast subsystem into noise-driven and input-driven parts, and then combine the former with the slow subsystem so that a Kalman filter can be defined that recasts the stochastic descriptor system in an innovation-driven form. Based on this, a new behavioral system representation is derived, which inspires a data-driven innovation-based multi-step output predictor and a practical Inno-DeePC algorithm that enables data-driven predictive control design without known system matrices while implicitly handling algebraic constraints. Numerical experiments on a DC microgrid demonstrate the effectiveness of the proposed approach.
This study proposes a real-time control framework for cascaded hydropower systems that incorporates decision-dependent uncertainty (DDU) to capture the coupling of streamflow uncertainties across the reservoirs. The framework jointly models exogenous forecast errors and endogenous uncertainty propagation, explicitly characterizing the dependence between upstream releases and downstream inflow variability through a heteroskedastic variance model conditioned on past errors, variance, and control actions. We formulate a joint chance-constrained optimization problem to ensure reliable system operation under uncertainty and develop a tractable supporting hyperplane algorithm that enables explicit and adaptive risk allocation under DDU. We establish the convergence of the proposed method and show its risk allocation behavior under steady-state conditions. A randomized case study based on Columbia River data demonstrates that incorporating DDU reduces the constraint violations by up to 7.0\% and increases total generation by up to 0.5\% relative to decision-independent uncertainty (DIU). Sensitivity analyses of the dry-season streamflow conditions further highlight the value of adaptive risk allocation for resilient and risk-aware hydropower operations.
The decarbonisation of heavy-duty railway networks requires maximising the capacity of existing electrical infrastructure. Integrating heavy freight alongside fast passenger services exposes the hard physical limits of conventional alternating current traction networks, causing severe localised power quality degradation, phase unbalance, and low-voltage behaviour that triggers protective substation tripping. Because upgrading physical hardware is highly capital-intensive, software-based Energy Management Strategies (EMS) offer a potentially viable alternative for preventing these power capacity challenges. This systematic review synthesises the literature on algorithmic energy management for grid-constrained multi-train AC railway networks, classifying the reviewed studies along three axes: algorithm family, operational scope, and constraint coupling. The review documents three consistent findings across the included studies. First, single-train trajectory optimisation, however mathematically refined, cannot represent the coupled electrical interactions that increasingly define network capacity on mixed-traffic networks. Second, while multi-train Train-Track-Power (TTP) simulations correctly capture these interactions, the algorithm families currently used to solve them face well-documented trade-offs between computational tractability and constraint flexibility. Third, the literature increasingly identifies a gap between mathematically optimal speed profiles and operationally executable ones, particularly for networks operated by human drivers rather than Automatic Train Operation systems. The review delineates where current methods succeed, where they fail, and which directions the literature has identified as open.
System identification remains an intriguing challenge for lithium-ion batteries, as many models are nonlinear, exhibit multi-physics coupling, and involve a large number of parameters. In this paper, we address this challenge using the ensemble Kalman inversion (EnKI) method for battery system identification. EnKI performs maximum a posteriori parameter estimation through successive local Gaussian approximations, enabling an iterative and incremental search for unknown parameters. The search combines Monte Carlo sampling with Kalman-type updates to evolve an ensemble of samples, thereby offering empirical stability and the ability to handle strongly nonlinear models. We validate the proposed approach on two equivalent circuit models with coupled electro-thermal dynamics, through both simulation and experiments. The results demonstrate that the proposed approach achieves accurate parameter estimation with rapid iterative convergence, and it shows strong potential for application to other battery models.
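To make the ensemble update concrete, the sketch below implements one iteration of a textbook ensemble Kalman inversion scheme with perturbed observations. It is an illustration under stated assumptions rather than the paper's exact algorithm: `forward` is a hypothetical stand-in for the battery model mapping parameters to predicted measurements, and `noise_cov` is an assumed measurement-noise covariance.

```python
import numpy as np

def enki_step(thetas, forward, y, noise_cov, rng):
    """One ensemble Kalman inversion update.

    thetas: (J, p) array, ensemble of parameter samples.
    forward: callable mapping a (p,) parameter vector to an (m,) output.
    y: (m,) measured data.
    noise_cov: (m, m) measurement-noise covariance.
    rng: numpy.random.Generator used to perturb the observations.
    """
    n_ens = thetas.shape[0]
    outputs = np.array([forward(th) for th in thetas])       # (J, m)
    d_theta = thetas - thetas.mean(axis=0)
    d_out = outputs - outputs.mean(axis=0)
    # Empirical cross-covariance and output covariance of the ensemble.
    c_to = d_theta.T @ d_out / (n_ens - 1)                   # (p, m)
    c_oo = d_out.T @ d_out / (n_ens - 1)                     # (m, m)
    # Each member is compared against its own noisy copy of the data.
    y_pert = y + rng.multivariate_normal(np.zeros(y.size), noise_cov, size=n_ens)
    residual = y_pert - outputs                              # (J, m)
    # Kalman-type update, solving a linear system instead of inverting.
    gain_times_res = c_to @ np.linalg.solve(c_oo + noise_cov, residual.T)
    return thetas + gain_times_res.T
```

Repeating this step evolves the ensemble toward parameters that explain the data; no derivatives of the model are required, only forward evaluations.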
We present principles of algebraic diversity (AD), a group-theoretic approach to signal processing exploiting signal symmetry to extract more information per observation, complementing classical methods that use temporal and spatial diversity. The transformations under which a signal's statistics are invariant form a matched group; this group determines the natural transform for analysis, and averaging an estimator over the group action reduces variance without requiring additional snapshots. The viewpoint is broadened in five directions beyond the single-observation measurement of a companion paper. Rank promotion admits AD on scalar data streams and identifies the law of large numbers as the trivial-group case of a $(G, L)$ continuum combining sample-count with group-orbit averaging. An eigentensor hierarchy handles signals with nested symmetry. A blind group-matching methodology identifies the matched group from data via a polynomial-time generalized eigenvalue problem on the unitary Lie algebra, placing the DFT, DCT, and Karhunen--Loève transforms as distinguished points on a transform manifold. A cost-symmetry matching principle then extends AD from measurement to blind and adaptive signal processing generally; blind equalization is given as a detailed example, with the Constant Modulus Algorithm's residual phase ambiguity predicted analytically and matched within two degrees on 3GPP TDL multipath channels, and other blind problems in signal processing are mapped into the framework. Four theorems formalize a structural capacity $\kappa$, the Rényi-2 analog of Shannon and von Neumann's Rényi-1 entropies, quantifying how a signal's information is organized rather than how much information it contains. AD complements prior algebraic approaches including invariant estimation, minimax robust estimation, algebraic signal processing, and compressed sensing.
Rapid growth of large loads led by data centers is straining grid capacity. These loads increasingly accept curtailment risk through non-firm interconnection agreements to gain faster grid access, expanding the pool of consumers subject to mandatory disconnection during supply shortfalls. Yet, blunt rules assign curtailment without reference to the wide variation in the value consumers place on avoiding curtailment, often captured by the value of lost load (VOLL). This paper introduces the network-constrained Curtailment Credit Market (CCM), a mechanism in which agents submit bids that determine bilateral credit flows, subject to transmission network constraints. We prove that the bilateral credit flow representation can reach every curtailment allocation available to an omniscient central planner (feasible-set equivalence). Under truthful bidding, the CCM achieves the planner's total value of served load. The CCM clearing problem is a linear program. When embedded in a strategic bidding model, where an upper-level agent anticipates the CCM clearing outcome, the resulting bilevel problem admits an exact single-level mixed-integer linear program (MILP), solved in 0.009 to 0.034 seconds on the reported test systems. Numerical experiments on three test systems validate the mechanism at increasing scale and complexity. A 3-bus illustrative network isolates the core trading logic, the IEEE 24-bus reliability test system provides a standard benchmark, and a reduced New York (NY) grid captures coordination across NY load zones. Our simulations show that the CCM increases the total value of served load by 1.41 to 1.83 times relative to pro-rata curtailment. On the three test systems examined here, no participant is worse off under incentive-compatible benchmark payments than under the administrative baseline.
Body composition assessment (BCA) provides detailed information about the distribution of different tissue types in the body, enabling more precise characterization of individuals than BMI or weight alone. Consistent and frequent BCA would be valuable for personalized medicine, but the gold standard methods for BCA, such as CT and MRI, are only practical for opportunistic monitoring of patients with clinical indications for imaging and are not suitable for routine use in the general population. Here, we consider an imaging modality which is not currently used in medical applications: millimeter wave (mmWave) radar. Commonly used in security settings, mmWave scans enable fast, non-intrusive, and privacy-preserving reconstruction of full body shape without the need to remove clothing. To demonstrate the feasibility of fast and convenient BCA from mmWave scans, we present a method for BCA value regression using a multi-task learning strategy that leverages synthetic mmWave-like point clouds derived from clinical imaging and parametric human models. We evaluate the model on a pilot cohort of real mmWave scans with bioimpedance-derived body fat measurements, supporting the feasibility of estimating visceral adipose tissue (VAT) and body fat percentage (BFP) from mmWave data acquired through clothing in a standing posture. We find that the model can predict VAT and BFP with a mean absolute error of 1.0 L and 3.2%, respectively, demonstrating the potential of mmWave scanning for routine BCA in a wide range of settings.
RADAR Challenge 2026 is an APSIPA Grand Challenge on Robust Audio Deepfake Recognition under Media Transformations, designed to simulate realistic media conditions in real-world audio distribution pipelines, including compression, resampling, noise, and reverberation. It consists of two phases: an English development phase with labeled data for analysis and paper writing, and a multilingual evaluation phase containing more than 100,000 utterances in English, Singapore English, Mandarin Chinese, Taiwanese Mandarin, Japanese, and Vietnamese. Systems are evaluated using equal error rate (EER) for binary real/fake classification. This paper describes the challenge task, the construction of the data set, the evaluation protocol, and the overall results. During the challenge, 33 teams submitted to the development phase and 22 teams submitted to the final evaluation phase. The reported results highlight the remaining challenges of robust audio deepfake detection under multilingual and media-transformed conditions.
Long-form audio understanding poses significant challenges for large audio language models (LALMs) due to the extreme length of audio sequences and the need to reason over heterogeneous acoustic cues distributed over time, such as speech content, speaker identity, emotion, and sound events. To address these challenges, we propose \textbf{PlanRAG-Audio}, a planning-based retrieval-augmented generation framework for scalable long-form audio understanding. Rather than having LALMs process entire recordings directly, PlanRAG-Audio explicitly plans which modalities and temporal spans are required for a given query, and retrieves only query-relevant information from a structured text and audio database. This retrieval planning enables effective reasoning over complex, cross-domain audio queries while substantially reducing the input length passed to the large language models. Experiments across a wide range of speech and audio retrieval tasks demonstrate that PlanRAG-Audio improves reasoning accuracy and stabilizes performance as audio duration increases by decoupling inference cost from raw audio length.
In this paper, we study a cell-free multiple-input multiple-output network equipped with integrated sensing and communication (ISAC) access points (APs). The distributed APs are used to jointly serve the communication needs of user equipments (UEs) while sensing a target, assumed to be an eavesdropper (Eve). To increase the system's robustness against Eve, we develop an ISAC waveform model that includes artificial noise (AN) aimed at degrading Eve's channel quality. The central processing unit receives the observations from each AP and calculates the optimal precoding and AN covariance matrices by solving a semi-definite relaxation of a constrained Cramér-Rao bound (CRB) minimization problem. Simulation results highlight an underlying trade-off between sensing and communication performance: in particular, the UEs' signal-to-interference-plus-noise ratio and Eve's maximum signal-to-noise ratio are directly proportional to the CRB. Furthermore, the optimal AN covariance matrix is rank-1 and has a peak in Eve's direction, leading to a surprising inverse proportionality between the UE-Eve distance and the optimal CRB magnitude.
Feedback optimization enables autonomous optimality seeking of a dynamical system through its closed-loop interconnection with iterative optimization algorithms. Among various iteration structures, model-based approaches require the input-output sensitivity matrix of the system to construct gradients, whereas model-free approaches eliminate this need by estimating gradients from real-time objective evaluations. These approaches offer complementary benefits in sample efficiency and accuracy against model mismatch, i.e., sensitivity errors. To achieve balanced closed-loop performance, we propose a gray-box feedback optimization controller, featuring systematic incorporation of approximate sensitivities into model-free updates via a tunable convex combination. We provide unified performance characterizations covering different approaches. We elucidate how cumulative sensitivity errors (model-based) and variances due to stochastic exploration (model-free) shape the closed-loop behavior and induce a trade-off between iteration and dimensional dependence. The proposed controller retains sample efficiency and provable (local) optimality for nonconvex problems despite inaccurate sensitivities. We further develop and characterize a running gray-box controller that handles constrained time-varying problems with changing objectives and steady-state input-output maps.
Electricity Consumption Profiles (ECPs) are crucial for operating and planning power distribution systems, especially with the increasing number of low-carbon technologies such as solar panels and electric vehicles. Traditional ECP modeling methods typically assume the availability of sufficient ECP data. However, in practice, the accessibility of ECP data is limited due to privacy issues or the absence of metering devices. Few-shot learning (FSL) has emerged as a promising solution for ECP modeling in data-scarce scenarios. Nevertheless, standard FSL methods, such as those used for images, are unsuitable for ECP modeling for two reasons. (1) These methods usually assume several source domains with sufficient data and several target domains, whereas in ECP modeling there may be thousands of source domains (e.g., households with a moderate amount of data) and thousands of target domains (e.g., households whose ECPs need to be modeled). (2) Standard FSL methods usually involve cumbersome knowledge transfer mechanisms, such as pre-training and fine-tuning. To address these limitations, this paper proposes a novel FSL framework that integrates Transformers with Gaussian Mixture Models (GMMs) for ECP modeling. The proposed approach is fine-tuning-free, computationally efficient, and robust even with extremely limited data. Results show that our method can accurately restore the complex ECP distribution with a minimal amount of ECP data (e.g., only 1.6% of the complete domain dataset) and outperforms state-of-the-art time series modeling methods in the context of ECP modeling.
It is often claimed, and correctly so, that linear methods cannot achieve global stability results for attitude control, and conversely that nonlinear control is essential in order to achieve (almost) globally stable tracking of general attitude trajectories. On account of this definitive result, and also because of the existence of powerful nonlinear control techniques, there has been relatively little work analyzing the limits and performance of linear attitude control. The purpose of this paper is to characterize the stability achievable for one class of linear attitude control problems, namely those leading to a constant quaternion difference. We analytically derive a critical error angle below which linearized dynamics lead to natural marginal stability for such a system, and above which the system is unstable. The dynamics are then used to derive a locally stable linear attitude controller whose performance is validated using simulations.
To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, where an on-device small language model (SLM) generates draft tokens that are validated and corrected by a remote large language model (LLM). However, the original HLM suffers from substantial communication overhead, as the LLM requires the SLM to upload the full vocabulary distribution for each token. Moreover, both communication and computation resources are wasted when the LLM validates tokens that are highly likely to be accepted. To overcome these limitations, we propose communication-efficient and uncertainty-aware HLM (CU-HLM). In CU-HLM, the SLM transmits truncated vocabulary distributions only when its output uncertainty is high. We validate the feasibility of this opportunistic transmission by discovering a strong correlation between SLM's uncertainty and LLM's rejection probability. Furthermore, we theoretically derive optimal uncertainty thresholds and optimal vocabulary truncation strategies. Simulation results show that, compared to standard HLM, CU-HLM achieves up to 206$\times$ higher token throughput by skipping 74.8% transmissions with 97.4% vocabulary compression, while maintaining 97.4% accuracy.
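The opportunistic transmission rule described above can be sketched as follows. This is a minimal illustration, not the paper's derivation: the uncertainty measure (one minus the top token probability), the threshold, and the top-k truncation size are all assumptions chosen for simplicity, whereas the abstract states that optimal thresholds and truncation strategies are derived theoretically.

```python
import numpy as np

def truncate_distribution(probs, k):
    """Keep the k most probable tokens and renormalize their mass."""
    probs = np.asarray(probs, dtype=float)
    top_ids = np.argsort(probs)[::-1][:k]   # token ids, most probable first
    kept = probs[top_ids]
    return top_ids, kept / kept.sum()

def maybe_transmit(probs, threshold, k):
    """Decide whether the SLM uploads a truncated distribution.

    Returns None when the draft token is confident enough to skip
    verification, otherwise (token_ids, truncated_probs) to send.
    """
    # Hypothetical uncertainty measure: 1 - max probability.
    uncertainty = 1.0 - float(np.max(probs))
    if uncertainty <= threshold:
        return None
    return truncate_distribution(probs, k)
```

With this rule, confident draft tokens incur no uplink traffic at all, and uncertain ones send only `k` (id, probability) pairs rather than the full vocabulary distribution.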
Is basic visual understanding really solved in state-of-the-art VLMs? We present VisualOverload, a slightly different visual question answering (VQA) benchmark comprising 2,720 question-answer pairs, with privately held ground-truth responses. Unlike prior VQA datasets that typically focus on near-global image understanding, VisualOverload challenges models to perform simple, knowledge-free vision tasks in densely populated (or, overloaded) scenes. Our dataset consists of high-resolution scans of public-domain paintings that are populated with multiple figures, actions, and unfolding subplots set against elaborately detailed backdrops. We manually annotated these images with questions across six task categories to probe for a thorough understanding of the scene. We hypothesize that current benchmarks overestimate the performance of VLMs, and that encoding and reasoning over details is still a challenging task for them, especially when they are confronted with densely populated scenes. Indeed, we observe that even the best model (o3) out of 37 tested models only achieves 19.6% accuracy on our hardest test split and overall 69.5% accuracy on all questions. Beyond a thorough evaluation, we complement our benchmark with an error analysis that reveals multiple failure modes, including a lack of counting skills, failure in OCR, and striking logical inconsistencies under complex tasks. Altogether, VisualOverload exposes a critical gap in current vision models and offers a crucial resource for the community to develop better models.
This work demonstrates that the Lyapunov method can effectively identify the growth rate of a linear time-periodic system describing cold fresh water on top of hot salty water with a periodically time-varying background shear flow. We employ a time-dependent weighting matrix to construct a Lyapunov function candidate, and the resulting linear matrix inequalities are discretized in time using the forward Euler method. As the number of temporal discretization points increases, the growth rates predicted by the Lyapunov method and by Floquet theory converge to the value obtained from numerical simulations. Additionally, the Lyapunov method is used to analyze the most dangerous disturbance, and we compare the computational resource usage of the Lyapunov method, numerical simulations, and Floquet theory.
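For reference, the Floquet growth rate used as a comparison above can be computed from the monodromy matrix of the periodic system. The sketch below is a generic illustration for a system dx/dt = A(t) x with period T; the forward Euler time stepping, the function name, and the callable `A_of_t` are assumptions made for brevity and are not taken from the paper.

```python
import numpy as np

def floquet_growth_rate(A_of_t, period, n_steps):
    """Estimate the dominant growth rate of dx/dt = A(t) x, A periodic.

    A_of_t: callable returning the (n, n) system matrix at time t.
    period: period T of A(t).
    n_steps: number of forward Euler steps over one period.
    """
    dt = period / n_steps
    n = np.asarray(A_of_t(0.0)).shape[0]
    identity = np.eye(n)
    monodromy = np.eye(n)
    for k in range(n_steps):
        # Forward Euler propagation of the state-transition matrix.
        monodromy = (identity + dt * np.asarray(A_of_t(k * dt))) @ monodromy
    multipliers = np.linalg.eigvals(monodromy)
    # Largest Floquet multiplier magnitude gives the growth rate.
    return float(np.log(np.max(np.abs(multipliers))) / period)
```

Increasing `n_steps` reduces the first-order time-discretization error, which mirrors the convergence with the number of temporal discretization points noted in the abstract.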
Music performance is a distinctly human activity, intrinsically linked to the performer's ability to convey, evoke, or express emotion. Machines cannot perform music in the human sense; they can produce, reproduce, execute, or synthesize music, but they lack the capacity for affective or emotional experience. As such, music performance is an ideal candidate through which to explore aspects of collaboration between humans and machines. In this paper, we introduce the witheFlow system, designed to enhance real-time music performance by automatically modulating audio effects based on features extracted from both biosignals and the audio itself. The system, currently in a proof-of-concept phase, is designed to be lightweight, able to run locally on a laptop, and is open-source given the availability of a compatible Digital Audio Workstation and sensors.
Real-time speech-to-speech (S2S) models excel at generating natural, low-latency conversational responses but often lack deep knowledge and semantic understanding. Conversely, cascaded systems combining automatic speech recognition, a text-based Large Language Model (LLM), and text-to-speech synthesis offer superior knowledge representation at the cost of high latency, which disrupts the flow of natural interaction. This paper introduces a novel hybrid architecture that bridges the gap between these two paradigms. Our framework processes user speech through an S2S transformer for immediate responsiveness while concurrently relaying the query to a powerful back-end LLM. The LLM's text-based response is then injected in real time to guide the S2S model's speech generation, effectively infusing its output with rich knowledge without the full latency penalty of a cascaded system. We evaluated our method using a speech-synthesized variant of the MT-Bench benchmark that consists of multi-turn question-answering sessions. The results demonstrate that our system substantially outperforms a baseline S2S model in response correctness, approaching that of a cascaded system, while maintaining a latency on par with the baseline.
Diffusion Posterior Sampling (DPS) provides a principled Bayesian approach to inverse problems by sampling from $p(x_0 \mid y)$. While posterior sampling is valuable for capturing uncertainty and multi-modality, many classical and practical inverse problem settings ultimately prioritize accurate point estimation -- most notably the MAP estimator, which has long served as a standard reconstruction objective in imaging and scientific applications. We introduce Local MAP Sampling (LMAPS), a new inference framework that iteratively solves local MAP subproblems along the diffusion trajectory. This perspective clarifies their connection to global MAP and DPS, offering a unified probabilistic interpretation for optimization-based methods. Building on this foundation, we develop practical algorithms with a covariance approximation motivated by a Gaussian prior assumption, and a reformulated objective for stability and interpretability. Across a broad set of image restoration and scientific tasks, LMAPS achieves state-of-the-art performance.
Graphons, as limits of graph sequences, provide an operator-theoretic framework for analyzing the asymptotic behavior of graph neural operators. Spectral convergence of sampled graphs to graphons induces convergence of the corresponding neural operators, enabling transferability analyses of graph neural networks (GNNs). This paper develops a unified spectral framework that brings together convergence results under different assumptions on the underlying graphon, including no regularity, global Lipschitz continuity, and piecewise-Lipschitz continuity. The framework places these results in a common operator setting, enabling direct comparison of their assumptions, convergence rates, and tradeoffs. We further illustrate the empirical tightness of these rates on synthetic and real-world graphs.
Validation gating is a fundamental component of classical Kalman-based tracking systems. Only measurements whose normalized innovation squared (NIS) falls below a prescribed threshold are considered for state update. While this procedure is statistically motivated by the chi-square distribution, it implicitly replaces the unconditional innovation process with a conditionally observed one, restricted to the validation event. This paper shows that innovation statistics computed after gating converge to gate-conditioned rather than nominal quantities. Under classical linear-Gaussian assumptions, we derive exact expressions for the first- and second-order moments of the innovation conditioned on ellipsoidal gating, and show that gating induces a deterministic, dimension-dependent contraction of the innovation covariance. The analysis is extended to nearest-neighbor (NN) association, which is shown to act as an additional statistical selection operator. We prove that selecting the minimum-norm innovation among multiple in-gate measurements introduces an unavoidable energy contraction, implying that nominal innovation statistics cannot be preserved under nontrivial gating and association. Closed-form results in the two-dimensional case quantify the combined effects and illustrate their practical significance.
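The covariance contraction under gating can be illustrated numerically. The sketch below assumes a zero-mean Gaussian innovation in two dimensions and uses the standard truncated-Gaussian identity Cov[nu | NIS <= gamma] = c S with c = F_{d+2}(gamma) / F_d(gamma), where F_k is the chi-square CDF with k degrees of freedom. This identity is textbook material and may differ in form from the expressions derived in the paper; the function names are illustrative only, and the NN-association effect is not modeled.

```python
import math
import random


def gated_covariance_factor_2d(gamma):
    """Contraction factor c such that Cov[nu | NIS <= gamma] = c * S (d = 2)."""
    p_gate = 1.0 - math.exp(-gamma / 2.0)                      # chi-square CDF, 2 dof
    p_shift = 1.0 - math.exp(-gamma / 2.0) * (1.0 + gamma / 2.0)  # chi-square CDF, 4 dof
    return p_shift / p_gate


def monte_carlo_factor_2d(gamma, n_samples=200_000, seed=0):
    """Empirical contraction factor from whitened innovations (S = I)."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(n_samples):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        nis = x * x + y * y
        if nis <= gamma:          # measurement passes the validation gate
            total += nis
            kept += 1
    # Mean NIS of accepted samples equals the trace of the gated covariance;
    # dividing by the dimension gives the isotropic scaling factor.
    return total / (2.0 * kept)


if __name__ == "__main__":
    gamma = 9.21  # approximately the 99% gate for two degrees of freedom
    print(gated_covariance_factor_2d(gamma), monte_carlo_factor_2d(gamma))
```

Even at a gate that accepts about 99% of true measurements, the factor is roughly 0.95, so a consistency check that compares the post-gating sample covariance against the nominal S would be biased low.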
Safe Reinforcement Learning (RL) algorithms are typically evaluated under fixed training conditions. We investigate whether training-time safety guarantees transfer to deployment under distribution shift, using diabetes management as a safety-critical testbed. We benchmark safe RL algorithms on a unified clinical simulator and reveal a safety generalization gap: policies satisfying constraints during training frequently violate safety requirements on unseen patients. We demonstrate that test-time shielding, which filters unsafe actions using learned dynamics models, effectively restores safety across algorithms and patient populations. Across eight safe RL algorithms, three diabetes types, and three age groups, shielding achieves Time-in-Range gains of 13-14% for strong baselines such as PPO-Lag and CPO while reducing clinical risk index and glucose variability. Our simulator and benchmark provide a platform for studying safety under distribution shift in safety-critical control domains. Code is available at this https URL and this https URL.
Modern voice cloning, also known as zero-shot text-to-speech (TTS), can synthesize speech that closely matches a target speaker from only seconds of reference audio, enabling applications such as personalized speech interfaces and dubbing. In practice, these systems often face noisy reference audio, imperfect text prompts, multilingual and long-form generation, post-processing, and adversarial perturbations, all of which can weaken robustness. Despite rapid progress in codec-token language models and diffusion-based TTS, robustness under realistic deployment shifts remains underexplored. This paper introduces RVCBench, a comprehensive dataset and benchmark for evaluating robustness in voice cloning. RVCBench provides task-aligned tests covering controlled text-audio pairing, multilingual and long-form scenarios, expressive prompts, post-processing conditions, and passive or proactive audio perturbations. Across 18 robustness evaluations, 225 speakers, and 14,370 utterances, RVCBench supports unified evaluation of input sensitivity, generation stability, output resilience, perturbation robustness, speaker similarity, and deepfake detectability. We evaluate 18 representative open-source voice cloning models and reveal systematic vulnerabilities in content consistency, speaker similarity, long-form stability, post-processing resilience, adversarial robustness, and detector-facing separability. We release the code and dataset to support reproducible evaluation and future research on robust voice cloning, speech synthesis, and audio generation. Code: this https URL. Dataset: this https URL.
Non-conservative uncertainty bounds are essential for making reliable predictions about latent functions from noisy data, and are thus a key enabler for safe learning-based control. In this domain, kernel methods such as Gaussian process regression are established techniques, thanks to their inherent uncertainty quantification mechanism. Still, existing bounds either impose strong assumptions on the underlying noise distribution, are conservative, do not directly apply in the multi-output case, or are difficult to integrate into downstream tasks. This paper addresses these limitations by presenting a tight, deterministic bound for multi-output functions in Reproducing Kernel Hilbert Spaces (RKHSs) subject to bounded noise. It is obtained through an unconstrained, duality-based formulation, which shares the same structure as classic Gaussian process confidence bounds, and can thus be straightforwardly integrated into downstream optimization pipelines. We show that the proposed bound generalizes existing results and illustrate its application using an example inspired by quadrotor dynamics learning.
Certifying power flow solvability is important for reliable power system operations under volatile operating conditions, but solving power flow equations repeatedly can be costly and may encounter convergence issues. In this paper, we develop an explicit cycle-based solvability condition for the lossless real power flow equations on meshed networks. We decompose every feasible nodal balance solution into a particular flow plus a cycle flow correction vector. The power flow problem is then reduced to enforcing edge-wise feasibility and cycle consistency. We show that the cycle consistency function is strongly monotone, and is the gradient of a strongly convex energy function. By exploiting these properties, we derive an explicit condition on the existence and uniqueness of power flow solution with bounded angle difference. The resulting condition is invariant under the choice of cycle basis and can be verified through simple algebraic computations. Numerical results on standard test systems show that the proposed condition is significantly less conservative than existing sufficient conditions and closely approximates true loading limits.
Fundamental limits on the performance of feedback controllers are essential for benchmarking algorithms, guiding sensor selection, and certifying task feasibility -- yet few general-purpose tools exist for computing them. Existing information-theoretic approaches overestimate the information a sensor must provide by evaluating it against the uncontrolled system, producing bounds that degrade precisely when feedback is most valuable. We derive a lower bound on the minimum expected cost of any causal feedback controller under partial observations by applying the Gibbs variational principle to the joint path measure over states and observations. The bound applies to nonlinear, nonholonomic, and hybrid dynamics with unbounded costs and admits a self-consistent refinement: any good controller concentrates the state, which limits the information the sensor can extract, which tightens the bound. The resulting fixed-point equation has a unique solution computable by bisection, and we provide conditions under which the free energy minimization is provably convex, yielding a certifiably correct numerical bound. On a scalar LQG problem the self-consistent bound captures over 80% of the known optimal cost at moderate sensor noise, and on a nonlinear Dubins car tracking problem it remains informative across all noise levels where a bound using the uncontrolled state distribution is vacuous.
The Sterile Insect Technique (SIT) against insect pests and insect vectors consists of releasing males that have been previously sterilized in order to reduce or eliminate a specific wild population. We study this complex control question via model-free control, ultra-local models, and intelligent proportional controllers that have already proven their effectiveness in various fields. They permit addressing, perhaps for the first time, the essential sampling question. Computer simulations are displayed and discussed.
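As a rough illustration of the control scheme named above, the sketch below implements a sampled intelligent proportional (iP) controller built on the ultra-local model dy/dt = F + alpha * u, where F is re-estimated at every sample from the last measurement and the last input. The plant, the gains, the sampling period, and the function name are all hypothetical choices for illustration; in particular, the toy first-order plant is not an SIT population model.

```python
def simulate_ip_control(setpoint=1.0, alpha=1.0, k_p=2.0, dt=0.01, n_steps=500):
    """Track a constant setpoint with an iP controller on a toy plant.

    Ultra-local model: dy/dt = F + alpha * u, with F lumping all unknown
    dynamics. The controller never uses the plant equations directly.
    """
    def plant_rate(y, u):
        # Toy plant (assumption for this sketch), unknown to the controller.
        return -0.5 * y + u

    y, y_prev, u_prev = 0.0, 0.0, 0.0
    history = []
    for _ in range(n_steps):
        # Estimate F from a finite-difference derivative and the previous input.
        y_dot_est = (y - y_prev) / dt
        f_est = y_dot_est - alpha * u_prev
        error = y - setpoint
        # iP law; the reference derivative is zero for a constant setpoint.
        u = -(f_est + k_p * error) / alpha
        y_prev, u_prev = y, u
        y = y + dt * plant_rate(y, u)   # forward Euler step of the plant
        history.append(y)
    return history


if __name__ == "__main__":
    trajectory = simulate_ip_control()
    print(trajectory[-1])
```

Changing `dt` in this sketch gives a feel for why the sampling period matters: the estimate of F always lags by one sample, so coarser sampling degrades the cancellation of the unknown dynamics.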
Balancing the societal costs of non-pharmaceutical interventions with epidemic suppression requires adaptive feedback control. Rather than relying on state-dependent operational caps, we formulate an infinite-horizon optimal control problem for a networked SIQR model that strictly enforces suppression via a hard spectral constraint on the transmission dynamics. We derive a safety-critical Model Predictive Control (MPC) approximation that embeds this spectral certificate stage-wise, yielding a tunable exponential decay rate. Furthermore, we construct a terminal set ensuring recursive feasibility and a feasible continuation that decays globally, proving positive invariance directly via the physical depletion of susceptibles rather than standard quadratic Lyapunov functions. To handle prediction uncertainty, we develop a robust counterpart that replaces nominal constraints by upper-envelope versions, recovering recursive feasibility and finite-horizon realized decay. We conclude by validating our approaches using simulation studies that leverage public data from counties in the state of Massachusetts.
Existing Indic ASR benchmarks often use scripted, clean speech and leaderboard-driven evaluation that encourages dataset-specific overfitting. In addition, strict single-reference WER penalizes natural spelling variation in Indian languages, including non-standardized spellings of code-mixed English-origin words. To address these limitations, we introduce Voice of India, a closed-source benchmark built from unscripted telephonic conversations covering 15 major Indian languages across 139 regional clusters. The dataset contains 306,230 utterances, totaling 536 hours of speech from 36,691 speakers, with transcripts accounting for spelling variations. We also analyze performance geographically at the district level, revealing disparities. Finally, we provide detailed analysis across factors such as audio quality, speaking rate, gender, and device type, highlighting where current ASR systems struggle and offering insights for improving real-world Indic ASR systems.
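The penalty that strict single-reference WER places on spelling variation can be seen in a small example. The sketch below assumes one plausible scoring scheme, in which each reference position carries a set of acceptable spellings and a hypothesis word counts as correct if it matches any of them. The abstract does not specify how the benchmark scores variants, so the function `variant_aware_wer` and the romanized example words are hypothetical.

```python
def variant_aware_wer(reference, hypothesis):
    """Word error rate where each reference slot is a set of accepted spellings.

    reference:  list of sets of strings, one set per reference word
    hypothesis: list of strings produced by the ASR system
    """
    # Standard Levenshtein dynamic program over words, one row at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, variants in enumerate(reference, start=1):
        curr = [i]
        for j, word in enumerate(hypothesis, start=1):
            substitution = 0 if word in variants else 1
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + substitution)) # match or substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)


if __name__ == "__main__":
    hypothesis = "mera mobail recharge karo".split()
    strict = [{"mera"}, {"mobile"}, {"recharge"}, {"karo"}]
    tolerant = [{"mera"}, {"mobile", "mobail"}, {"recharge", "richarj"}, {"karo"}]
    print(variant_aware_wer(strict, hypothesis))    # one of four words mismatched
    print(variant_aware_wer(tolerant, hypothesis))  # variant spelling accepted
```

With a single reference spelling the hypothesis is charged one substitution out of four words, while the variant-aware reference scores it as error-free.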
We present a dual-barrier control barrier function (CBF) safety filter for real-time, safety-critical velocity control of holonomic robots operating in incrementally built occupancy grid maps. As a robot explores an unknown environment, unmapped regions introduce irreducible uncertainty, since obstacle geometry beyond the explored frontier is unknown, making entry into such regions a source of collision risk, especially with front-facing sensors. To address this, we enforce two constraints: avoidance of mapped obstacles and restriction from unexplored regions. Both constraints are derived analytically from the occupancy grid's signed distance field, yielding a closed-form safety filter that requires only a small linear system solve per cycle. On resource-constrained platforms such as the Raspberry Pi, where SLAM and planning already consume significant compute, the low overhead of the proposed filter preserves resources. An adaptive gain schedule relaxes the frontier constraint in information-rich regions and tightens it in well-mapped areas, improving exploration efficiency while maintaining safety. The filter operates in velocity space as a minimally invasive correction and composes with arbitrary nominal controllers, including learning-based methods. Hardware flight experiments on a PX4-controlled quadrotor demonstrate zero collisions across multiple indoor runs.
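One way to picture a two-constraint velocity filter with a closed-form solution is the sketch below. It assumes a planar velocity command and two barrier values with their gradients (for instance, read from a signed distance field), and enforces grad_h . u + gain * h >= 0 for each barrier by enumerating which constraints are active; the case with both active reduces to a 2x2 linear solve. This is a generic construction under those assumptions, not the paper's filter: the function name, the shared `gain`, and the example numbers are hypothetical, and the adaptive gain schedule is omitted.

```python
def dot(a, b):
    """Dot product of two planar vectors."""
    return a[0] * b[0] + a[1] * b[1]


def filter_velocity(u_nom, barriers, gain=1.0, eps=1e-12):
    """Closest velocity to u_nom satisfying grad_h . u + gain * h >= 0 twice.

    barriers: two (h, grad_h) pairs, e.g. obstacle and frontier barriers.
    Gradients are assumed nonzero (unit norm for a signed distance field).
    """
    (h1, g1), (h2, g2) = barriers
    c1, c2 = gain * h1, gain * h2
    r1 = dot(g1, u_nom) + c1   # constraint residuals at the nominal command
    r2 = dot(g2, u_nom) + c2
    if r1 >= 0.0 and r2 >= 0.0:
        return (u_nom[0], u_nom[1])   # nominal command is already safe

    # Exactly one active constraint: project onto its boundary and check the other.
    for g, r, g_other, c_other in ((g1, r1, g2, c2), (g2, r2, g1, c1)):
        if r < 0.0:
            lam = -r / dot(g, g)
            u = (u_nom[0] + lam * g[0], u_nom[1] + lam * g[1])
            if dot(g_other, u) + c_other >= -eps:
                return u

    # Both active: solve the 2x2 system (G G^T) lam = -r for the multipliers.
    a, b, d = dot(g1, g1), dot(g1, g2), dot(g2, g2)
    det = a * d - b * b
    if abs(det) < eps:
        raise ValueError("barrier gradients are parallel; constraints conflict")
    lam1 = (-r1 * d + r2 * b) / det
    lam2 = (-r2 * a + r1 * b) / det
    return (u_nom[0] + lam1 * g1[0] + lam2 * g2[0],
            u_nom[1] + lam1 * g1[1] + lam2 * g2[1])


if __name__ == "__main__":
    # Hypothetical setup: obstacle 0.5 m ahead in +x, frontier 0.2 m away in +y.
    barriers = [(0.5, (-1.0, 0.0)), (0.2, (0.0, -1.0))]
    print(filter_velocity((1.0, 1.0), barriers))
```

Because the correction is only applied when a residual is negative, a command that already respects both barriers passes through unchanged, which is the sense in which such a filter is minimally invasive.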
Grey failures in the computing continuum produce ambiguous, overlapping symptoms that existing approaches fail to diagnose reliably, either because they lack causal awareness or because they act under high epistemic uncertainty, risking destructive interventions. This paper presents an uncertainty-aware resilience micro-agent for causal observability (AURORA), a lightweight framework for diagnosing and mitigating grey failures in edge-tier environments. The framework employs parallel micro-agents that integrate the free-energy principle, causal do-calculus, and localized causal state-graphs to support counterfactual root-cause analysis within each fault's Markov blanket. Restricting inference to causally relevant variables reduces computational overhead while preserving diagnostic fidelity. AURORA further introduces a dual-gated execution mechanism that authorizes remediation only when causal confidence is high and predicted epistemic uncertainty is bounded; otherwise, it abstains from local intervention and escalates the diagnostic payload to the fog tier. Our experiments demonstrate that AURORA outperforms baselines, achieving a 0% destructive action rate while maintaining 62.0% repair accuracy and a 3 ms mean time to repair.
We investigate Counterfactual Video Foley Generation, which aims to adopt a sound-source identity that contradicts the visual evidence while remaining temporally synchronized to a silent video. Existing Video&Text-to-Audio (VT2A) models struggle with this, often remaining anchored to the visually implied sound source when video and text contents disagree. We present ConterFlow, an inference-time dual-phase sampling scheme for pretrained flow-matching VT2A models. Phase 1 builds a video-derived temporal structure while suppressing the visually implied source; Phase 2 drops video conditioning to focus entirely on shaping audio timbre toward the target prompt. ConterFlow substantially improves counterfactual Video Foley generation compared to naive negative prompting and state-of-the-art baselines. To evaluate replacement quality, we propose a metric leveraging a text-audio co-embedding space to measure both target-prompt evidence and residual visually implied source leakage. Video demonstrations and code are available at this https URL
Near-field beamfocusing enabled by extremely large-aperture arrays (ELAA) is a promising 6G technique for massive connectivity and high spectrum efficiency. While beamfocusing concentrates energy at an intended user, the radiated field outside the focal point exhibits a structured leakage that varies with the focal-point coordinates. This paper shows that this leakage enables a new form of passive user localization in which distributed far-field sensors measuring only received power can infer the user's location by exploiting this location-dependent power signature. Using the induced noncentral chi-square statistics, we derive a Bayesian Cramér-Rao lower bound (BCRLB) that establishes the fundamental limits of this inference problem. We then evaluate a model-based grid-search estimator and an attention-based permutation-invariant deep learning regressor (DeepSet). Results under both line-of-sight (LoS) and multipath propagation confirm that reliable location inference is feasible, with accuracy improving as more sensors and snapshots are used.