Acoustic imaging is a core methodology in acoustics, enabling spatial analysis of sound sources and acoustic scenes. However, the limited number of sensors in practical systems motivates approaches that enhance spatial resolution without increasing hardware complexity. In this paper, we focus on virtually upsampling a tetrahedral 4-microphone array to a spherical 32-microphone array by estimating the channel covariance matrices with deep learning. Five neural network architectures for covariance upsampling are investigated on the real-world STARSS23 dataset. These models estimate a 32-microphone time-frequency covariance matrix from a 4-microphone input covariance representation. The proposed architectures are based on 2D convolutional layers to capture the underlying spatial-spectral structure of covariance matrices, and they are further enhanced with frequency dynamic convolution to model frequency-dependent properties. The architectures are evaluated in terms of root mean square error (RMSE) and through delay-and-sum beamforming acoustic imaging. Quantitative results show that all models outperform a random-guess baseline, which yields an RMSE of 0.548, with the best-performing architecture achieving an RMSE of 0.432. We qualitatively analyze the performance of the proposed models through beamforming heatmaps derived from the 4-channel input covariance, the 32-channel ground truth, and the predicted 32-channel covariance matrices. These results demonstrate that covariance upsampling significantly enhances the effective performance of the 4-channel microphone array, producing sound maps that closely resemble those obtained with the 32-channel array.
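Delay-and-sum imaging can be computed directly from a spatial covariance matrix. This is why upsampled covariances can be fed straight into the imaging step. The sketch below evaluates the standard DAS power map $P(\mathbf{u}) = \mathbf{a}(\mathbf{u})^H \mathbf{R}\,\mathbf{a}(\mathbf{u}) / M^2$ over a direction grid. It rests on several assumptions that do not come from the paper:

- a plane-wave steering convention $e^{+jk\,\mathbf{p}_m\cdot\mathbf{u}}$;
- a hypothetical tetrahedral array of radius 4.2 cm;
- a synthetic single-source covariance.

It also includes the entry-wise covariance RMSE used for evaluation.

```python
import numpy as np

def das_power_map(R, mic_pos, directions, freq, c=343.0):
    """Delay-and-sum power P(u) = a(u)^H R a(u) / M^2 for each unit direction u (rows)."""
    k = 2 * np.pi * freq / c                        # acoustic wavenumber
    A = np.exp(1j * k * mic_pos @ directions.T)     # M x K steering matrix (assumed sign convention)
    M = mic_pos.shape[0]
    return np.real(np.sum(A.conj() * (R @ A), axis=0)) / M ** 2

def covariance_rmse(R_hat, R_true):
    """Root mean square error over all (complex) covariance entries."""
    return np.sqrt(np.mean(np.abs(R_hat - R_true) ** 2))

# Hypothetical tetrahedral 4-mic array and a 1-degree azimuth grid in the horizontal plane.
r = 0.042
mic_pos = r / np.sqrt(3) * np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
az = np.deg2rad(np.arange(360))
directions = np.stack([np.cos(az), np.sin(az), np.zeros_like(az)], axis=1)

# Synthetic covariance: one plane wave arriving from 60 degrees plus white noise.
freq = 2000.0
a0 = np.exp(1j * 2 * np.pi * freq / 343.0 * mic_pos @ directions[60])
R = np.outer(a0, a0.conj()) + 0.01 * np.eye(4)

power = das_power_map(R, mic_pos, directions, freq)
print(int(np.argmax(power)))   # 60
```

With a learned upsampler, the same function would be applied to the predicted 32-channel covariance together with the 32-microphone positions.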
Most existing audio classification methods assume that each query (testing) sample belongs to one of the classes of the support (training) samples; because they cannot reject samples of unseen classes, they misclassify them as seen classes. In this study, we propose a method for Few-shot Open-set Audio Classification (FOAC) that recognizes query samples of seen classes after updating the model with a few support samples, while rejecting query samples from unseen classes. We design a model consisting of an encoder and a classifier. The encoder is the backbone of a ResNet and extracts embeddings. The classifier consists of prototype generators for few-shot classes and open-set classes. Prototypes of few-shot classes are obtained by fusing the class-discriminative information of support and query embeddings and by assigning larger weighting coefficients to representative parts of the support embeddings. A single prototype is generated for open-set classes using the proposed prototype generator. The encoder is trained in a supervised manner with abundant samples of base classes, and the prototypes of base classes are then generated under the supervision of a joint loss. The classifier is trained in a meta-training manner using a few samples of few-shot classes. Three public datasets (LS-100, NSynth-100, and FSC-89) are used to assess the performance of our method. Experiments show that our method outperforms prior methods in AUROC and accuracy, and the improvement is statistically significant over most prior methods. Our method also has lower computational complexity than most prior methods. The code is at this https URL.
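The decision rule behind prototype-based open-set classification can be sketched as a nearest-prototype search that includes one extra open-set prototype. If that prototype wins, the query is rejected. This is a simplified sketch, not the authors' generator, and it differs in three ways:

- Support embeddings are weighted by a softmax over their cosine similarity to the class mean. This is an assumed stand-in for "larger weights on representative parts".
- Query-embedding fusion is omitted.
- The open-set prototype is supplied by hand and is hypothetical.

```python
import numpy as np

def weighted_prototype(support, temperature=0.1):
    """Softmax-weighted mean of support embeddings (n, d); embeddings closer to the
    class mean (cosine similarity) receive larger weights."""
    mean = support.mean(axis=0)
    sims = support @ mean / (np.linalg.norm(support, axis=1) * np.linalg.norm(mean) + 1e-12)
    w = np.exp((sims - sims.max()) / temperature)
    w /= w.sum()
    return w @ support

def classify_open_set(query, class_protos, open_proto):
    """Return the index of the nearest class prototype, or -1 if the open-set
    prototype is nearest (query rejected as an unseen class)."""
    protos = np.vstack([class_protos, open_proto[None, :]])
    d = np.linalg.norm(protos - query[None, :], axis=1)
    idx = int(np.argmin(d))
    return -1 if idx == len(class_protos) else idx

rng = np.random.default_rng(0)
support_a = np.array([5.0, 0.0]) + 0.1 * rng.standard_normal((5, 2))
support_b = np.array([0.0, 5.0]) + 0.1 * rng.standard_normal((5, 2))
protos = np.vstack([weighted_prototype(support_a), weighted_prototype(support_b)])
open_proto = np.array([-5.0, -5.0])                                    # hypothetical open-set prototype
print(classify_open_set(np.array([4.8, 0.2]), protos, open_proto))     # 0
print(classify_open_set(np.array([-4.0, -4.5]), protos, open_proto))   # -1
```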
This paper proposes a full-duplex fluid antenna near-field system (FD-FANS) with a multi-sector antenna array that jointly exploits resource allocation, antenna mobility, and group-based transmitting (TX) and receiving (RX) partitioning. A spherical-wave uplink-downlink channel model is established that accounts for residual self-interference (SI), wireless energy transfer (WET), and geometric constraints on antenna motion. Within the FD-FANS framework, an efficient protocol is devised to enable simultaneous downlink energy transmission (DET) and uplink data transmission (UDT) at the base station (BS). Furthermore, for both perfect and imperfect SI cancellation (SIC), we formulate a weighted sum rate (WSR) maximization problem over time and power allocation, antenna positions, and binary group selection, subject to practical average and peak power limits, per-antenna box constraints, minimum spacing, and a half-TX/half-RX balance. To tackle the resulting non-convex mixed-integer design problem, we develop an efficient alternating optimization (AO) framework based on majorization-minimization successive convex approximation (MM-SCA). The proposed algorithm monotonically improves the objective and converges to a stationary solution of a continuous relaxation. Simulation results demonstrate that the proposed scheme achieves consistent performance gains over several benchmark designs, including half-duplex FANS (HD-FANS), an FD fixed-position antenna near-field system (FD-FPANS), non-grouped FD-FANS, and far-field counterparts, in terms of average sum rate (ASR), energy efficiency (EE), and user fairness, while exhibiting robustness to residual SI and channel uncertainty.
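A spherical-wave (near-field) channel uses the exact distance from each antenna to the user. A far-field model instead linearizes the distance, which gives a planar wavefront. The sketch below uses a standard free-space line-of-sight expression to show when the two models disagree. Assumptions not taken from the paper: a 64-element half-wavelength linear array, a 3.5 GHz carrier, a broadside user, and no SI or multi-sector geometry.

```python
import numpy as np

def near_field_channel(ant_pos, user_pos, wavelength):
    """Spherical-wave LoS channel h_n = lambda/(4 pi d_n) * exp(-j 2 pi d_n / lambda),
    using the exact antenna-to-user distance d_n."""
    d = np.linalg.norm(ant_pos - user_pos[None, :], axis=1)
    return wavelength / (4 * np.pi * d) * np.exp(-1j * 2 * np.pi * d / wavelength)

def far_field_channel(ant_pos, user_pos, wavelength):
    """Planar-wave approximation: common amplitude, phase linear in antenna position."""
    r = np.linalg.norm(user_pos)
    u = user_pos / r
    d = r - ant_pos @ u                      # first-order expansion of the distance
    return wavelength / (4 * np.pi * r) * np.exp(-1j * 2 * np.pi * d / wavelength)

def corr(h1, h2):
    """Normalized correlation |h1^H h2| / (||h1|| ||h2||)."""
    return np.abs(np.vdot(h1, h2)) / (np.linalg.norm(h1) * np.linalg.norm(h2))

wavelength = 3e8 / 3.5e9                     # hypothetical 3.5 GHz carrier
n = np.arange(64) - 31.5
ant_pos = np.stack([n * wavelength / 2, np.zeros(64), np.zeros(64)], axis=1)

results = {}
for r in (3.0, 300.0):
    user = np.array([0.0, r, 0.0])           # broadside user at distance r
    results[r] = corr(near_field_channel(ant_pos, user, wavelength),
                      far_field_channel(ant_pos, user, wavelength))
print(results)
```

At 3 m, inside the array's near field, the planar model decorrelates strongly from the exact channel. At 300 m the two nearly coincide, which is why near-field designs are compared against far-field counterparts.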
This paper develops a data-driven reachability framework for linear systems whose disturbances are modeled by probabilistic zonotopes (PZs), combining bounded deterministic and Gaussian stochastic components. In contrast to methods that require a precisely known disturbance model (either purely deterministic or purely stochastic), we assume only a conservative prior PZ and refine it from data. The framework separates two uncertainty sources: realized disturbances, which act along the collected trajectory and govern the size of the data-consistent model set, and aleatory disturbances, which enter as future additive uncertainty during reachable-set propagation; both shape the reachable sets, but through different mechanisms. Refinement exploits prior system knowledge together with trajectory-consistency constraints induced by the data, which impose affine couplings between deterministic and Gaussian latent variables. We accordingly develop a constrained-PZ calculus that absorbs the stochastic part of these constraints into an equivalent representation, removes infeasible latent directions, and reduces stochastic covariance, together with identification-aware fusion rules for combining heterogeneous constrained-PZ descriptions. The refined realized-disturbance proxies then serve as scenarios in a linear program that learns the smallest translated and scaled copy of the prior disturbance set that contains all proxy confidence sets while remaining nested in the prior. The resulting deterministic, high-probability reachable sets carry formal containment guarantees with substantially reduced conservatism, and numerical examples confirm that the pipeline tightens both the data-consistent model set and the propagated reachable sets.
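Reachable-set propagation for $x_{k+1} = A x_k + w_k$ uses two operations. A linear map transforms the center and the generators, and a Minkowski sum concatenates generators. Under a PZ split, the Gaussian covariances propagate as $A\Sigma A^\top + \Sigma_w$. The sketch below keeps a deterministic zonotope part and a Gaussian part separately. It is a simplified stand-in for a full probabilistic-zonotope calculus and has these limitations:

- It omits constraints and order reduction.
- It omits the data-driven refinement.
- The interval hull is only an approximate per-axis box of $\kappa$ standard deviations, not a joint probability guarantee.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ProbZonotope:
    """Simplified PZ: {c + G b : ||b||_inf <= 1} (+) N(0, Sigma)."""
    c: np.ndarray
    G: np.ndarray
    Sigma: np.ndarray

    def linear_map(self, A):
        return ProbZonotope(A @ self.c, A @ self.G, A @ self.Sigma @ A.T)

    def __add__(self, other):
        # Minkowski sum; Gaussian parts are assumed independent, so covariances add.
        return ProbZonotope(self.c + other.c, np.hstack([self.G, other.G]),
                            self.Sigma + other.Sigma)

    def interval_hull(self, kappa=3.0):
        """Per-axis box: deterministic radius plus kappa standard deviations."""
        rad = np.abs(self.G).sum(axis=1) + kappa * np.sqrt(np.diag(self.Sigma))
        return self.c - rad, self.c + rad

def propagate(X0, A, W, steps):
    """Reachable sets of x_{k+1} = A x_k + w_k with w_k in the PZ W."""
    sets = [X0]
    for _ in range(steps):
        sets.append(sets[-1].linear_map(A) + W)
    return sets

A = np.array([[0.9, 0.1], [0.0, 0.8]])
X0 = ProbZonotope(np.array([1.0, 0.0]), 0.1 * np.eye(2), np.zeros((2, 2)))
W = ProbZonotope(np.zeros(2), 0.05 * np.eye(2), 0.01 * np.eye(2))   # hypothetical prior PZ
sets = propagate(X0, A, W, steps=3)
lo, hi = sets[-1].interval_hull()
print(lo, hi)
```

The generator count grows by the number of disturbance generators at every step, which is why practical PZ tools add order reduction.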
Accurate assessment of eating behavior is essential for understanding and managing conditions such as eating disorders, obesity, and diabetes. Wearable-based food intake detection has shown considerable promise; however, most existing approaches are trained and evaluated using internal validation on a single dataset with fixed sensor orientation and known wearing hand, which limits their generalizability to real-world settings. Furthermore, many existing approaches rely on both accelerometer (acc) and gyroscope (gyro) signals to achieve strong performance. However, gyro measurements may be unavailable in some real-world deployments due to battery constraints, and performance often degrades when only acc data are used. We propose a generalizable framework for orientation-invariant eating episode detection, with an acc2gyro module to improve performance in acc-only settings. The framework is trained using fine-grained wrist-worn datasets and externally validated across three heterogeneous datasets: the Clemson All-Day (CAD) and Capture-24 datasets, as well as Physio-ED, a dataset collected from individuals with eating disorders. Across external evaluations, the proposed framework demonstrates robust performance despite substantial variations in sensor modality, wearing hand, participant population, and annotation protocols. Specifically, the framework achieved F1-scores of 0.751, 0.592, and 0.793 on CAD, Capture-24, and Physio-ED, respectively, with CAD performance exceeding recent state-of-the-art methods evaluated using internal validation only. This study provides the first external validation of eating episode detection in an eating disorder population. Additionally, the acc2gyro module improves performance in acc-only settings. These findings demonstrate the potential of orientation-invariant wearable sensing for scalable and clinically applicable assessment of eating behavior.
Channel knowledge maps (CKMs) enable environment-aware wireless systems by providing location-specific channel knowledge, but long-term environmental variations, such as construction, traffic redistribution, and foliage changes, require periodic map refresh. In practice, channel measurements are often sparse and irregular, while environmental knowledge may be limited to coarse layout or topology descriptors. This paper studies CKM reconstruction from sparse measurements. We show that reconstruction pipelines that apply local aggregation or spectral operators directly to a zero-filled pilot grid can entangle the sampling mask with the channel field, allowing structural priors to act on mask-induced distortions before the measurements define a supported radio field. To address this issue, we propose Anchor-CKM, a measurement-first, knowledge-aided reconstruction framework. Anchor-CKM first uses support-aware partial convolutions to construct a pilot-supported representation, and then performs layout-conditioned dual-path Fourier refinement followed by coordinate-based heteroscedastic prediction of the CKM mean and per-location predictive variance. Experiments on transmitter-disjoint DeepMIMO scenarios cover missing ratios from 0.3 to 0.95, including stringent 5% to 10% pilot-coverage settings. In explicit-layout outdoor scenarios, Anchor-CKM reduces received-power root-mean-square error (RMSE) by 0.79 to 1.33 dB relative to the strongest reproduced baseline, while ablations identify pilot-support stabilization as the largest contributor and layout conditioning as beneficial for line-of-sight/non-line-of-sight (LOS/NLOS) boundary fidelity.
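The mask-entanglement problem and the support-aware fix can be seen in a single-channel partial convolution in the style of Liu et al. (2018). Only observed cells contribute to each window, and the response is rescaled by window size divided by observed count. The output mask marks every location whose window contained at least one pilot. This is a NumPy sketch of that generic layer, not Anchor-CKM itself; the field, the pilot density, and the averaging kernel are hypothetical.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution with 'same' padding.
    Unobserved cells (mask == 0) are excluded; each response is rescaled by
    (kernel size / number of observed cells in the window)."""
    kh, kw = weight.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x * mask, ((ph, ph), (pw, pw)))     # zero-filled observations
    mp = np.pad(mask, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    new_mask = np.zeros_like(mask)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            n_obs = mp[i:i + kh, j:j + kw].sum()
            if n_obs > 0:
                window = xp[i:i + kh, j:j + kw]
                out[i, j] = (weight * window).sum() * (kh * kw / n_obs) + bias
                new_mask[i, j] = 1.0                 # location is now supported
    return out, new_mask

rng = np.random.default_rng(0)
x = np.full((16, 16), 3.0)                           # constant field (hypothetical)
mask = (rng.random((16, 16)) < 0.1).astype(float)    # ~10% pilot coverage
weight = np.ones((3, 3)) / 9.0                       # averaging kernel
out, new_mask = partial_conv2d(x, mask, weight)
print(np.unique(np.round(out[new_mask == 1], 6)))    # [3.]
```

An ordinary convolution on the zero-filled grid would instead return 3 times the fraction of observed cells in each window. The output would then depend on the sampling mask rather than on the field, which is the distortion the abstract describes.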
Networked battery systems arise in industrial automation, distributed energy applications, and multi-agent systems, where terminals consume energy locally and recharge only when connected to a source. Resource constraints often limit the number of simultaneous connections, requiring networks to be dynamically reconfigured to maintain system functionality. Managing such networks in dynamic environments is challenging, particularly when low-energy terminals must be prioritized for timely replenishment. This paper presents a battery-aware topology optimization algorithm that extends the GeoSteiner framework with a tailored Mixed-Integer Linear Program (MILP) formulation for Full Steiner Tree (FST) aggregation. The formulation minimizes network length while prioritizing low-battery terminals through a weighted objective subject to a global budget constraint, enabling partial network formation under realistic resource limits. An overlap-correction term is introduced to prevent double-counting when selected trees share terminals. To capture the network reconfiguration cost between time steps, a graph-distance metric penalizes frequent topology changes, yielding a 72.2% reduction in reconfiguration cost compared with a baseline without the penalty. Simulations on a 20-terminal network show that battery levels are effectively managed: the lowest battery level improves from 2.7% to 68.6% over 30 iterations while topology stability and budget utilization (92%) are maintained. The framework offers a principled approach to designing energy-aware, adaptive connectivity in power-limited multi-agent systems.
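The abstract does not specify which graph distance is used. One simple choice is the number of links added or removed between consecutive topologies, which is the size of the edge-set symmetric difference. The sketch below uses that assumed metric and a hypothetical penalized cost (network length plus $\beta$ times the distance). It omits the battery weighting and the MILP.

```python
def topology_distance(edges_prev, edges_new):
    """Links added or removed between two topologies (edge-set symmetric difference).
    Edges are unordered terminal pairs, so (1, 2) and (2, 1) denote the same link."""
    prev = {frozenset(e) for e in edges_prev}
    new = {frozenset(e) for e in edges_new}
    return len(prev ^ new)

def penalized_cost(network_length, edges_prev, edges_new, beta=1.0):
    """Hypothetical per-step objective: length plus beta times the reconfiguration distance."""
    return network_length + beta * topology_distance(edges_prev, edges_new)

print(topology_distance([(1, 2), (2, 3)], [(2, 1), (3, 4)]))   # 2
```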
We consider the problem of synthesizing robust feedback controllers for discrete-time linear systems that ensure the satisfaction of context-dependent linear temporal logic specifications in the presence of additive bounded disturbances. Building on existing results that reduce context-triggered temporal logic synthesis to the realization of context-dependent reach-avoid-stay (cRAS) objectives, we focus on the corresponding low-level control synthesis problem. We first employ certificate-based conditions for the almost-sure satisfaction of RAS specifications. Based on these conditions, we propose a switching control architecture that combines robust model predictive control (MPC) with a local invariant controller. We show that the resulting MPC value function serves as a reachability certificate, while avoidance is enforced through robust constraints and the stay requirement through the local controller. To obtain computationally tractable formulations of the resulting robust optimization problems, we use convex duality to reformulate the robust constraints as equivalent deterministic optimization problems, yielding convex quadratic and second-order cone programs for relevant geometric settings. The proposed framework is demonstrated on a robot navigation problem with context-triggered logical switches in both static and moving environments. The results show significantly larger feasible sets than Lyapunov-based approaches, while naturally accommodating dynamic environments and online task reconfiguration.
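Duality turns a robust half-space constraint into a deterministic one. The worst case of $a^\top w$ over the disturbance set is its support function, a dual norm of $a$. For a box $|w_i| \le \varepsilon_i$ this is $\sum_i \varepsilon_i |a_i|$. For a Euclidean ball $\|w\|_2 \le \varepsilon$ it is $\varepsilon \|a\|_2$. That norm term makes the constraint a second-order cone constraint when $a$ depends on decision variables. The sketch below covers only these two textbook sets, and the numbers are hypothetical.

```python
import numpy as np

def tighten_box(a, b, eps):
    """a^T(x + w) <= b for all |w_i| <= eps_i  <=>  a^T x <= b - sum_i eps_i |a_i|."""
    return b - np.sum(np.asarray(eps) * np.abs(a))

def tighten_ball(a, b, eps):
    """a^T(x + w) <= b for all ||w||_2 <= eps  <=>  a^T x + eps * ||a||_2 <= b."""
    return b - eps * np.linalg.norm(a)

a, b = np.array([1.0, -2.0]), 1.0
b_box = tighten_box(a, b, [0.1, 0.1])     # 0.7
x = a * b_box / np.dot(a, a)              # a point on the tightened boundary
w_worst = 0.1 * np.sign(a)                # maximizer of a^T w over the box
print(b_box, a @ (x + w_worst))           # the worst case lands exactly on b
```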
Mode shape recognition is a fundamental task in automotive noise, vibration, and harshness (NVH) development, yet it still depends on manual visual inspection by experienced engineers. Existing approaches based on engineering heuristics, the Modal Assurance Criterion (MAC), or geometry-dependent AI representations often exhibit limited robustness across different vehicle architectures, finite element (FE) meshes, and experimental measurement layouts, restricting their industrial applicability. This paper presents a Canonical Engineering Graph Representation and region-aware graph learning framework for robust and explainable 3D mode shape recognition. Rather than learning directly from vehicle-specific FE meshes, heterogeneous FE models and experimental measurements are transformed into a common graph whose nodes represent semantically meaningful structural regions connected through engineering-informed relationships. Geometry-independent regional descriptors are combined with graph attention learning and region-aware pooling to capture structural interactions while preserving engineering semantics and enabling physically interpretable predictions. The resulting representation decouples engineering knowledge from numerical discretization, allowing transfer across different vehicle programs without requiring identical mesh topology or sensor configurations. The proposed framework is validated using FE and experimental datasets from four vehicle programs under severe label scarcity. Results demonstrate high classification accuracy, cross-vehicle transferability, and physically meaningful explanations that directly relate predictions to the engineering-defined structural regions used in NVH analysis. Beyond mode shape recognition, the proposed Canonical Engineering Graph Representation provides a reusable engineering abstraction for trustworthy and transferable AI across heterogeneous simulation and experimental workflows.
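The MAC baseline mentioned above compares two mode-shape vectors via $\mathrm{MAC} = |\phi_a^H \phi_b|^2 / ((\phi_a^H \phi_a)(\phi_b^H \phi_b))$. The sketch implements that standard formula. Both vectors must be sampled at the same degrees of freedom. That requirement is one way to see why MAC-based matching struggles across different FE meshes and sensor layouts.

```python
import numpy as np

def mac(phi_a, phi_b):
    """Modal Assurance Criterion between two mode-shape vectors on the same DOFs."""
    num = np.abs(np.vdot(phi_a, phi_b)) ** 2            # np.vdot conjugates its first argument
    den = np.vdot(phi_a, phi_a).real * np.vdot(phi_b, phi_b).real
    return num / den

def mac_matrix(Phi_a, Phi_b):
    """Pairwise MAC between the mode-shape columns of Phi_a and Phi_b."""
    return np.array([[mac(Phi_a[:, i], Phi_b[:, j]) for j in range(Phi_b.shape[1])]
                     for i in range(Phi_a.shape[1])])

phi = np.array([1.0, 2.0, 3.0])
print(mac(phi, -2 * phi))                               # 1.0: MAC ignores scaling and sign
```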
Modern automatic speech recognition (ASR) systems excel at transcribing lexical content but often omit nonverbal vocalizations (NVs), such as laughter, breaths, coughs, and cries, that carry conversational and affective information. Modeling NVs in ASR is challenging because NV annotations are sparse and highly long-tailed, with frequent categories such as breaths and laughter dominating rarer events such as cries and coughs. We study three data-centric strategies for improving low-resource NV recognition: (1) a two-stage curriculum that first maps all NV events to a generic token and then fine-tunes on target categories; (2) inter-token transfer from high-resource events, such as laughter and breath, to rare events, such as crying; and (3) voice-conversion augmentation with class balancing. Experiments show that shared acoustic structure across vocal events can be exploited to improve rare-category detection while preserving lexical ASR quality.
Simultaneous acoustic information and power transfer (SAIPT) plays a crucial role in enabling self-sustainable and maintenance-free Internet of Underwater Things (IoUT) networks. This paper studies a multicarrier underwater SAIPT system that jointly considers the frequency-dependent characteristics of acoustic transducers and the nonlinear behavior of rectifier circuits. For acoustic power transfer (APT), the waveform vector is first optimized using the successive convex approximation (SCA) method under average and peak transmit power constraints. Then, in the SAIPT scenario, the power splitting factor and the waveform vectors are jointly optimized through an SCA-based alternating optimization (AO) framework, subject to transmit power and achievable rate constraints. Simulation results demonstrate that accounting for the transducer's frequency response, rectifier nonlinearity, and the high peak-to-average power ratio (PAPR) of multicarrier waveforms leads to a significant improvement in acoustic energy transfer efficiency. The results also show that the harvested DC output can be further enhanced by properly choosing system parameters, such as the number of subcarriers and the subcarrier spacing.
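PAPR matters for rectifier nonlinearity because a widely used truncated Taylor diode model makes the DC output grow with both $\mathbb{E}[y^2]$ and $\mathbb{E}[y^4]$. At equal average power, a peakier waveform raises the fourth-order term. The sketch compares an in-phase 8-tone multisine with a single tone of the same average power. For $N$ in-phase tones the PAPR is $2N$. The subcarrier spacing and tone indices are hypothetical, and the paper's transducer response and optimized waveforms are not modeled.

```python
import numpy as np

def multisine(n_sub, f0, df, phases, t):
    """Sum of n_sub unit-amplitude cosines at f0 + n*df with the given phases."""
    f = f0 + df * np.arange(n_sub)
    return np.cos(2 * np.pi * f[:, None] * t[None, :] + phases[:, None]).sum(axis=0)

def papr(x):
    """Peak-to-average power ratio (linear scale)."""
    return np.max(x ** 2) / np.mean(x ** 2)

df, n_sub = 100.0, 8                              # hypothetical spacing (Hz) and subcarrier count
t = np.arange(1024) / (1024 * df)                 # one full period 1/df, uniformly sampled
x_multi = multisine(n_sub, 10 * df, df, np.zeros(n_sub), t)
x_tone = np.sqrt(n_sub) * np.cos(2 * np.pi * 10 * df * t)   # same average power (n_sub / 2)

print(papr(x_multi), papr(x_tone))                # 16.0 (= 2N) vs 2.0
print(np.mean(x_multi ** 4) > np.mean(x_tone ** 4))   # True: larger 4th-order term
```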
Acoustic-to-Articulatory Inversion (AAI) estimates vocal tract articulator movements from speech, benefiting tasks such as ASR, speech synthesis, and speaker verification. While deep learning-based methods (CNNs, RNNs, Transformers) have advanced AAI, recent studies show that Self-Supervised Learning (SSL) features further improve performance, particularly in low-resource settings. However, SSL feature extractors introduce inference latency and computational overhead. To address this, we propose a novel pretraining method that leverages three target representations (Phoneme Labels, Articulatory Feature Labels, and Critical-articulator Labels), eliminating the need for an SSL extractor during inference. We evaluate our approach against both baseline and SSL-based models across various data conditions. Results demonstrate that our method consistently improves AAI performance, particularly in low-resource scenarios, while significantly reducing inference costs without sacrificing accuracy.
This paper proposes a novel unified control framework for achieving hybrid grid-forming (GFM) and grid-following (GFL) inverter operation by integrating dispatchable virtual oscillator control with reference-following synchronization. The proposed inverter control method supports multiple operating modes within a unified structure, including voltage- and frequency-following (PQ mode), voltage-forming and frequency-following (PV mode), voltage-following and frequency-forming (Qf mode), voltage- and frequency-forming (Vf mode), and a hybrid mode with mixed GFM and GFL behaviors. In particular, the proposed method achieves smooth pre-synchronization and enables seamless transitions across a spectrum of inverter operating modes by tuning a small set of continuous control parameters, rather than relying on discrete controller switching. This framework provides a flexible and physically interpretable approach for adapting inverter dynamics to varying grid conditions and operational requirements. The small-signal stability and input-output frequency-domain characteristics are further analyzed under different control parameter settings. The effectiveness and robustness of the proposed unified control method are demonstrated through extensive electromagnetic transient (EMT) simulations and hardware-in-the-loop (HIL) experiments.
Small-signal instabilities, such as unforced sub-synchronous oscillations (SSOs), are increasingly observed in grids dominated by inverter-based resources (IBRs). Decentralized stability certificates offer a scalable means to avoid instability onset. However, they are typically derived under restrictive network-state assumptions, such as small angle differences or negligible voltage drops, and therefore cannot capture how departures from these conditions affect system stability. In this paper, we develop a network model and a decentralized analysis framework that explicitly characterize how reactive power mismatches, line loading, and inverter control parameters jointly determine small-signal stability. We show that larger steady-state reactive power mismatches and heavier line loading lead to more stringent conditions on admissible inverter droop gains. These results make decentralized stability certificates explicitly network-state dependent, showing how network stress shrinks the set of stabilizing local controller parameters.
Although the electromagnetic transient (EMT) framework can capture subsynchronous oscillations (SSOs), it faces scalability issues for large-scale systems. Thus motivated, we propose a generalized dynamic phasor (DP) framework to analyze SSOs in multi-machine systems with inverter-based resources (IBRs) and large loads such as artificial intelligence data centers (AI DCs) under balanced and unbalanced conditions. The grid-following (GFL) and grid-forming (GFM) IBRs are modeled in their respective $dq$-frame DPs. In contrast, the detailed models of multi-mass turbine-driven synchronous generators (SGs), together with the dynamic transmission network and load models, are represented in $pnz$-frame DPs. The linearizability and time invariance of the framework enable eigendecomposition, a powerful tool for root-cause analysis of SSO modes and the design of damping controllers. In addition, the DP modeling approach enables faster simulation of large-scale systems. The generalized framework is validated against EMTDC/PSCAD simulations using the IEEE first benchmark model for subsynchronous resonance and the modified IEEE 4-machine system. Several use cases are presented on the modified IEEE 68-bus system with two GFL IBRs to show the applicability of the framework. First, time- and frequency-domain analyses of the IBR-induced SSO mode are presented. Then, two solutions are proposed to damp the poorly damped SSO mode: (a) a decentralized controller designed using particle swarm optimization, and (b) replacing the control of one GFL IBR with GFM control. Finally, the impact of the AI DC load on the system's primary frequency response and on the multi-mass turbines of the SGs is studied.
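Eigendecomposition of a linearized state matrix $A$ supports root-cause analysis through three quantities:

- each mode's frequency $|\operatorname{Im}\lambda|/2\pi$;
- its damping ratio $\zeta = -\operatorname{Re}\lambda / |\lambda|$;
- participation factors $p_{ki} = \Phi_{ki}\Psi_{ik}$, where $\Psi = \Phi^{-1}$, which indicate the states driving each mode.

The sketch applies these standard definitions to a hypothetical 4-state system with a lightly damped 15 Hz mode and a well-damped 1 Hz mode. The 5 to 50 Hz "subsynchronous" window is also an assumption. It is not the DP model from the paper.

```python
import numpy as np

def modal_analysis(A):
    """Eigenvalues, frequencies (Hz), damping ratios, and participation factors
    P[k, i] = Phi[k, i] * Psi[i, k] with right eigenvectors Phi and Psi = inv(Phi)."""
    lam, Phi = np.linalg.eig(A)
    Psi = np.linalg.inv(Phi)
    freq_hz = np.abs(lam.imag) / (2 * np.pi)
    zeta = -lam.real / np.abs(lam)
    P = Phi * Psi.T
    return lam, freq_hz, zeta, P

s1, w1 = -0.2, 2 * np.pi * 15.0      # hypothetical lightly damped 15 Hz mode
s2, w2 = -2.0, 2 * np.pi * 1.0       # hypothetical well damped 1 Hz mode
A = np.zeros((4, 4))
A[:2, :2] = [[s1, w1], [-w1, s1]]
A[2:, 2:] = [[s2, w2], [-w2, s2]]

lam, freq_hz, zeta, P = modal_analysis(A)
sso = np.where((freq_hz > 5) & (freq_hz < 50))[0]   # assumed subsynchronous band
print(freq_hz[sso], zeta[sso])                      # ~15 Hz, damping ratio ~0.002
print(np.abs(P[:, sso]).round(3))                   # only states 0 and 1 participate
```

Each column of `P` sums to one, so the magnitudes can be read as relative state contributions to a mode. That is the usual basis for targeting a damping controller.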
This study provides a theoretical expansion of the recent Data Relativistic Uncertainty (DRU) framework by formalizing a physics-to-AI paradigm for image enhancement. By modeling images as probabilistic wave functions rather than deterministic states, the paradigm explicitly incorporates wave-particle duality to illustrate how DRU leverages the intrinsic physical uncertainty of light, an aspect that has lacked sufficient theoretical discussion. Consequently, this paradigm provides a rigorous Explainable AI (XAI) approach that improves the interpretability of how DRU mitigates illumination bias and maintains robustness against data noise.
Despite substantial technological progress, fully autonomous berthing and unberthing remain a significant challenge. One of the primary obstacles is the complex, nonlinear nature of low-speed ship dynamics, which are difficult to model and control and often require equally complex maneuvering models and control systems. This study proposes a simplified approach to bridge this gap by modeling the ship dynamics as a time-invariant, continuous-time linear state-space system. The model parameters are estimated through system identification using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) applied to full-scale maneuvering data. Validation results show strong agreement between the model output and the empirical data. This outcome demonstrates the potential of simplified models to capture the maneuvering motion of a ship at low speeds.
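Identifying a continuous-time linear model with an evolution strategy comes down to three steps. First, simulate the candidate model on the measured inputs. Second, score the simulation error. Third, let the optimizer search the parameter space without gradients. The sketch below fits a hypothetical scalar model $\dot x = a x + b u$ to noise-free synthetic data. It makes two substitutions that do not come from the paper:

- a simple (1+1)-ES with the 1/5th success rule stands in for CMA-ES;
- the model is discretized with forward Euler.

```python
import numpy as np

def simulate(a, b, u, dt, x0=0.0):
    """Forward-Euler simulation of the scalar linear model dx/dt = a*x + b*u."""
    x = np.empty(len(u) + 1)
    x[0] = x0
    for k, uk in enumerate(u):
        x[k + 1] = x[k] + dt * (a * x[k] + b * uk)
    return x

def identify_es(u, x_meas, dt, iters=4000, sigma=0.5, seed=0):
    """Fit (a, b) with a (1+1)-ES and the 1/5th success rule (stand-in for CMA-ES)."""
    rng = np.random.default_rng(seed)

    def loss(th):
        with np.errstate(over="ignore", invalid="ignore"):
            return np.mean((simulate(th[0], th[1], u, dt) - x_meas) ** 2)

    theta = np.zeros(2)
    best = loss(theta)
    for _ in range(iters):
        cand = theta + sigma * rng.standard_normal(2)
        cand_loss = loss(cand)
        if cand_loss < best:          # NaN/inf losses from unstable candidates are rejected
            theta, best = cand, cand_loss
            sigma *= 1.5              # success: widen the search
        else:
            sigma *= 1.5 ** -0.25     # failure: shrink (balances at a 1/5 success rate)
    return theta, best

rng_data = np.random.default_rng(1)
dt = 0.1
u = rng_data.uniform(-1.0, 1.0, 200)
x_meas = simulate(-0.5, 1.0, u, dt)   # synthetic "measurements" from a = -0.5, b = 1.0
theta, best = identify_es(u, x_meas, dt)
print(theta)                          # close to [-0.5, 1.0]
```

CMA-ES additionally adapts a full covariance matrix for the mutations. That adaptation is what makes it practical for the larger, correlated parameter sets of a multi-state ship model.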
This paper investigates the joint uplink scheduling and power control problem in a coordinated multicell wireless network in which each single-antenna base station serves at most one single-antenna user at a time. The resulting weighted sum-rate (WSR) maximization problem is a mixed discrete-continuous, nonconvex optimization problem that is notoriously difficult to solve directly. Classical fractional programming (FP) methods tackle this problem by applying the Lagrangian dual transform (LDT) followed by the quadratic transform (QT), yielding tractable closed-form updates for scheduling and power control, with the LDT playing a crucial role in handling discrete variables. In this paper, we revisit the LDT from a minorization-maximization (MM) perspective and observe that its induced surrogate is somewhat conservative due to the reciprocal-coordinate construction. Motivated by this observation, we propose a novel reciprocal-inversion transform (RIT) that constructs a tighter first-order Taylor expansion lower bound for the logarithmic rate function. The proposed RIT remains fully compatible with the QT, leading to a surrogate-enhanced FP (SEFP) algorithm for joint uplink scheduling and power control. The proposed SEFP algorithm retains the desirable per-cell separability of the classical FP framework and admits closed-form updates for the auxiliary variables, scheduling decisions, and transmit powers. Simulation results demonstrate that the SEFP algorithm consistently outperforms the classical FP method and other baselines for different network utilities.
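Two identities underpin classical FP, and both can be checked numerically.

- The LDT rewrites $w\log(1+\gamma)$ as $\max_{g} \; w\log(1+g) - w g + w(1+g)\frac{\gamma}{1+\gamma}$. The maximum is attained at $g = \gamma$, and this step moves the SINR $\gamma$ out of the logarithm.
- The QT rewrites a ratio $A/B$ with $B>0$ as $\max_y \; 2y\sqrt{A} - y^2 B$. The maximum is attained at $y = \sqrt{A}/B$.

The sketch verifies both with hypothetical numbers. The proposed RIT is not reproduced here.

```python
import numpy as np

def ldt_surrogate(g, sinr, w=1.0):
    """Lagrangian dual transform of w*log(1 + sinr) for auxiliary variable g;
    its maximum over g equals w*log(1 + sinr), attained at g = sinr."""
    return w * np.log1p(g) - w * g + w * (1 + g) * sinr / (1 + sinr)

def qt_surrogate(y, A, B):
    """Quadratic transform of A/B (A >= 0, B > 0); maximized at y = sqrt(A)/B with value A/B."""
    return 2 * y * np.sqrt(A) - y ** 2 * B

g = np.linspace(0.01, 10.0, 2000)
print(g[np.argmax(ldt_surrogate(g, 3.0))])          # ~3.0, i.e. g* = SINR
print(ldt_surrogate(3.0, 3.0), np.log(4.0))         # equal: tight at the optimum
print(qt_surrogate(np.sqrt(2.0) / 5.0, 2.0, 5.0))   # 0.4 = A/B
```

Because each surrogate is tight at its closed-form optimum, alternating between the auxiliary variables and the primal variables never decreases the WSR. The abstract's claim is that the LDT's surrogate can be tightened further.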
Radio frequency (RF) maps provide a compact representation of multipath propagation characteristics and are fundamental to channel modeling, coverage analysis, and environment-aware wireless optimization. This paper proposes a unified RF map construction framework based on a physics-informed neural network (PINN) and a graph neural network (GNN), supporting both cross-scene generation and in-scene completion with 2D and 2.5D environmental representations. The PINN embeds electromagnetic propagation constraints to establish a physically consistent mapping from receiver locations to multipath parameters, including path gain, time of arrival, and angles, while the GNN enforces spatial consistency by modeling correlations among neighboring receivers. To comprehensively evaluate multipath reconstruction quality, we propose a peak-weighted dynamic time warping metric that jointly accounts for amplitude errors and peak delay misalignment in channel impulse responses. Extensive experiments demonstrate that the proposed method consistently outperforms image-based, diffusion-based, and interpolation baselines across both map-level and multipath-level metrics, achieving robust generalization and high-fidelity RF map construction under sparse observations.
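The limitation that motivates a peak-aware metric is easy to demonstrate. Classic dynamic time warping (DTW) can warp away a pure delay shift entirely. The sketch implements plain DTW with an absolute-difference local cost on hypothetical power-delay profiles. The paper's peak weighting is not reproduced.

```python
import numpy as np

def dtw(x, y):
    """Classic dynamic time warping distance with |x_i - y_j| local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

x = np.zeros(20); x[5] = 1.0      # single path arriving at delay bin 5
y = np.zeros(20); y[7] = 1.0      # same path, delayed to bin 7
print(dtw(x, y), np.linalg.norm(x - y))   # 0.0 vs ~1.414
print(dtw(x, 0.5 * x))                    # 0.5: amplitude errors are still counted
```

Plain DTW reports zero error for the delayed peak. This is one reason a metric for channel impulse responses would add an explicit peak-delay term alongside the amplitude cost.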
The Koopman operator has gained considerable attention due to its ability to provide a global linear representation of highly complex dynamical systems. The operator describes nonlinear dynamics linearly through the lens of real- or complex-valued observable functions. Recently proposed data-driven techniques, such as extended dynamic mode decomposition (EDMD), its kernelized variant, and machine-learning methods, can generate finite-dimensional approximations accompanied by finite-data error bounds. In this tutorial paper, we provide a concise introduction to Koopman operator theory and its use in systems and control. A particular focus is placed on data-driven surrogate models, their extension to systems with inputs, and controller design using Koopman operator theory. Moreover, we demonstrate the key techniques, i.e., EDMD and Koopman MPC. To this end, we provide simulation studies, including source code on GitHub, so that interested readers can explore the Koopman operator in systems and control step by step.
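EDMD computes a least-squares matrix $K$ such that $\Psi(Y) \approx K\,\Psi(X)$. Here $X$ and $Y$ hold snapshot pairs $x_k$ and $x_{k+1}$ as columns, and $\Psi$ is a chosen dictionary of observables. The solution is $K = \Psi(Y)\,\Psi(X)^{+}$. The sketch uses a textbook system whose dictionary $[x_1, x_2, x_1^2]$ spans a Koopman-invariant subspace, so EDMD recovers the exact matrix. The system and coefficients are illustrative. Note also that some references use the transposed convention for $K$.

```python
import numpy as np

def edmd(X, Y, dictionary):
    """Least-squares Koopman matrix K with dictionary(Y) ~= K @ dictionary(X)."""
    return dictionary(Y) @ np.linalg.pinv(dictionary(X))

# Nonlinear map with a finite-dimensional invariant subspace:
# x1+ = a x1,   x2+ = b x2 + c x1^2
a, b, c = 0.9, 0.5, 0.3

def step(X):
    return np.vstack([a * X[0], b * X[1] + c * X[0] ** 2])

def dictionary(X):
    return np.vstack([X[0], X[1], X[0] ** 2])

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2, 50))      # 50 random initial states (columns)
K = edmd(X, step(X), dictionary)
K_true = np.array([[a, 0.0, 0.0], [0.0, b, c], [0.0, 0.0, a ** 2]])
print(np.allclose(K, K_true))                 # True
```

With a dictionary that is not invariant, the same formula returns only the best linear approximation on the data. This is where the finite-data error bounds mentioned above come in.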
Packet loss concealment (PLC) reconstructs audio packets that are missing at the receiver, usually with a trained model whose parameters remain fixed at deployment time. This treats the PLC model as static, even though each call or recording exposes signal-specific information through the packets that did arrive. We present TTT-PLC, a self-supervised test-time tuning framework that adapts existing PLC models using only those received packets. The method creates supervision by synthetically masking portions of the available signal, training the model to conceal them with its native PLC objective, and then using the adapted model to reconstruct the true packet losses. No clean reference signal, external adaptation data, or architectural modification is required. We study TTT-PLC in two deployment settings. In the non-causal setting, the received file is available before reconstruction, allowing repeated self-supervised adaptation passes and providing a per-file adaptation ceiling. In the causal setting, audio is streamed without revising emitted samples; adaptation is performed only on completed past blocks, and updated parameters affect only future audio. We instantiate the framework on two public PLC backbones: FRN, a recurrent full-band speech PLC model, and PARCnet, a hybrid autoregressive-neural model for networked music. Across these settings, the results show that pretrained PLC systems need not be treated as fixed at inference time: the still-observed portions of a lossy signal can provide an effective training signal for improving concealment on that same signal.
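The self-supervision step can be sketched as building one training example from a lossy signal:

1. Drop a random subset of the *received* packets as synthetic gaps.
2. Zero-fill both the synthetic and the true gaps in the model input.
3. Compute the loss only on the synthetic gaps, whose ground truth is known.

True losses therefore never enter the loss. The packet length, drop rate, and function name are hypothetical, and the PLC model and its objective are not shown.

```python
import numpy as np

def make_ttt_example(signal, received, packet_len, extra_drop=0.2, rng=None):
    """Return (model_input, target, loss_mask) for one test-time tuning step.

    received: boolean array with one entry per packet (False = truly lost)."""
    rng = np.random.default_rng() if rng is None else rng
    synthetic = received & (rng.random(received.shape) < extra_drop)   # extra gaps among received packets
    keep = np.repeat(received & ~synthetic, packet_len)
    model_input = np.where(keep, signal, 0.0)          # zero-fill true and synthetic gaps
    loss_mask = np.repeat(synthetic, packet_len)       # supervise only where ground truth exists
    return model_input, signal, loss_mask

rng = np.random.default_rng(0)
packet_len, n_packets = 160, 50
received = rng.random(n_packets) > 0.1                 # ~10% true packet loss
lost = ~np.repeat(received, packet_len)
signal = np.where(lost, 0.0, rng.standard_normal(n_packets * packet_len))
inp, tgt, lm = make_ttt_example(signal, received, packet_len, rng=rng)
print(lm.sum() // packet_len, "synthetic gaps")
```

In the causal setting, the same construction would be applied only to completed past blocks before updating the parameters used for future audio.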
Discrete tokens obtained from neural audio codecs (NACs) have been used as compact representations in audio generation and understanding models. In such token-based systems, token temporal resolution (TTR), defined as the time interval between adjacent token frames, is important because it controls the trade-off between representing rapid acoustic events and reducing token-sequence length. However, most NACs are trained at a single TTR and require separate training for each TTR. This paper proposes a mechanism that enables a single NAC to operate at multiple TTRs using sampling-frequency-independent convolutional layers. The mechanism regards TTR as the sampling period of the token sequence and generates TTR-dependent convolutional kernels from a shared parameter set, while adjusting the kernel size and stride for each TTR. We incorporate the mechanism into Descript Audio Codec, leaving the quantizer unchanged. Experiments on environmental sound reconstruction show that the proposed model outperforms a single-model baseline that switches TTR-specific layers for each TTR.
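Sampling-frequency-independent convolution treats a layer as a continuous-time filter $h(t)$ with shared parameters. A discrete kernel is then obtained by sampling $h(t)$ at the relevant sampling period, here the TTR. A fixed time support therefore yields fewer taps at coarser TTRs, and the stride follows from the ratio of periods. The sketch below is built on hypothetical choices that do not come from the paper:

- $h(t)$ is parameterized as a sum of Gaussian-windowed cosines;
- the time support is 80 ms;
- the TTRs are 10 ms and 20 ms;
- each sample is scaled by the period to approximate the continuous convolution integral.

```python
import numpy as np

def continuous_kernel(t, amps, freqs, widths):
    """Hypothetical shared continuous-time impulse response:
    h(t) = sum_k a_k * exp(-t^2 / (2 s_k^2)) * cos(2 pi f_k t)."""
    t = np.asarray(t, dtype=float)[:, None]
    return np.sum(amps * np.exp(-t ** 2 / (2 * widths ** 2)) * np.cos(2 * np.pi * freqs * t), axis=1)

def kernel_for_ttr(ttr, support, amps, freqs, widths):
    """Sample h(t) at multiples of the TTR over a fixed time support (odd tap count)."""
    half = int(round(support / (2 * ttr)))
    t = np.arange(-half, half + 1) * ttr
    return ttr * continuous_kernel(t, amps, freqs, widths)   # period scaling ~ Riemann sum

def stride_for_ttr(ttr, input_period):
    """Stride that maps a sequence with period input_period to the target TTR."""
    return int(round(ttr / input_period))

amps, freqs, widths = np.array([1.0, 0.5]), np.array([3.0, 8.0]), np.array([0.02, 0.015])
k10 = kernel_for_ttr(0.010, 0.08, amps, freqs, widths)   # 10 ms TTR
k20 = kernel_for_ttr(0.020, 0.08, amps, freqs, widths)   # 20 ms TTR, same parameters
print(len(k10), len(k20), stride_for_ttr(0.020, 0.010))  # 9 5 2
```

The filter frequencies (3 Hz and 8 Hz) are kept below the 25 Hz Nyquist limit of the coarsest TTR so that sampling does not alias the shared response.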
Wake-up radio (WUR) is a technology designed to improve the energy efficiency of Internet of Things (IoT) networks and extend device battery life. While most studies focus on WUR performance with single-antenna base stations, this paper investigates multiple-input multiple-output (MIMO) technology to improve device energy saving and extend the coverage of wake-up signals. By leveraging MIMO beamforming, the transmitted energy can be spatially focused toward the intended IoT devices, with high beamforming gain and minimal inter-device interference. We develop a preliminary analytical framework based on stochastic geometry to evaluate the wake-up success probability of WUR-MIMO in multi-cell cellular IoT networks when the number of antennas equals $2 \times (\text{number of devices}) - 1$. Monte Carlo simulations show that, relative to a single-antenna WUR baseline, MIMO beamforming with this antenna configuration significantly enhances wake-up reliability and eliminates more than 50% of false activations in all settings, thereby prolonging the lifetime of IoT devices.
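Stochastic-geometry success probabilities are typically checked against Monte Carlo simulation. The sketch below does this for a single-antenna, Rayleigh-fading baseline: interferers form a Poisson point process (PPP) of density $\lambda$, the path-loss exponent is $\alpha = 4$, and success means SIR $> \theta$. The known closed form in this case is $\exp(-\lambda \pi r^2 \sqrt{\theta}\,\pi/2)$. The example does not model beamforming, multiple devices, or false activations. The density, link distance, threshold, and simulation disk radius are hypothetical.

```python
import numpy as np

def mc_success_prob(lam, r, theta, alpha=4.0, R=50.0, trials=10000, seed=0):
    """Monte Carlo P(SIR > theta): receiver at the origin, its transmitter at distance r,
    PPP interferers of density lam in a disk of radius R, Rayleigh fading on all links."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        n = rng.poisson(lam * np.pi * R ** 2)
        d = R * np.sqrt(rng.random(n))                   # uniform positions in the disk
        interference = np.sum(rng.exponential(size=n) * d ** (-alpha))
        signal = rng.exponential() * r ** (-alpha)
        hits += signal > theta * interference
    return hits / trials

def closed_form_alpha4(lam, r, theta):
    """Rayleigh/PPP success probability for alpha = 4."""
    return np.exp(-lam * np.pi * r ** 2 * np.sqrt(theta) * np.pi / 2)

p_mc = mc_success_prob(lam=0.05, r=1.0, theta=1.0)
p_cf = closed_form_alpha4(lam=0.05, r=1.0, theta=1.0)
print(p_mc, p_cf)   # both approximately 0.78
```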
Brain-Computer Interfaces (BCIs) have revolutionized neuroscience applications, from motor rehabilitation to neuroergonomics. Traditional implantable BCIs with invasive microelectrode arrays pose challenges, notably the need for wired connections and inherent implantation risks. This paper introduces a battery-free wireless BCI system comprising an implant and its external supporting system. Our design centers on a dual-function antenna system. First, an inductive coupling mechanism enables wireless power transfer, sufficiently powering the implant's Application-Specific Integrated Circuit (ASIC) for stimulation and readout without an implant battery. Second, a backscatter antenna in the implant provides battery-free, high-data-rate wireless connectivity (up to 32 Mbps). This system not only improves the BCI experience by eliminating wires but also retains data fidelity and energy efficiency, promising a safer, more efficient interface for tasks such as robotic arm control.
We propose a lightweight multi-path alignment network (LMPAN) for on-device joint acoustic echo cancellation (AEC) and noise suppression (NS) in full-duplex spoken dialogue systems. To address hardware-induced distortions and dynamic acoustic conditions, we introduce three core innovations: (1) a multi-path alignment stage correcting temporal and energy mismatches across reference, linear AEC (LAEC) output, and microphone signals; (2) an attention-based mechanism that dynamically integrates enhanced LAEC and microphone features under varying acoustic scenarios; (3) a post-filtering module with a dynamic target generation strategy for downstream tasks (ASR, VAD). Furthermore, we adopt a two-stage training framework leveraging self-supervised learning representations to enhance perceptual quality. Experiments show that LMPAN, with only 480K parameters and 126 MACs, achieves performance comparable to the state-of-the-art lightweight model DeepVQE-S, while ensuring real-time inference capability.
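The multi-path alignment stage corrects temporal and energy mismatches between signals. A classical, non-learned stand-in for that correction is shown below: it estimates the bulk delay between the far-end reference and the microphone by cross-correlation, then fits a least-squares gain. This is an illustrative assumption, not LMPAN's learned alignment, and the function names are hypothetical.

```python
import random


def estimate_delay(ref, mic, max_lag):
    """Lag (samples, 0..max_lag) maximizing the normalized cross-correlation |<ref[n], mic[n+lag]>|."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(ref), len(mic) - lag)
        score = abs(sum(ref[n] * mic[n + lag] for n in range(overlap))) / overlap
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(ref, mic, max_lag):
    """Delay-compensate ref, then scale it by the least-squares gain that best matches mic."""
    lag = estimate_delay(ref, mic, max_lag)
    shifted = [0.0] * lag + ref[: len(mic) - lag]
    num = sum(s * m for s, m in zip(shifted, mic))
    den = sum(s * s for s in shifted) or 1.0
    gain = num / den
    return lag, gain, [gain * s for s in shifted]


# Hypothetical echo path: 7-sample delay, 0.5 attenuation, small sensor noise.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mic = [0.0] * 7 + [0.5 * r for r in ref[:-7]]
mic = [m + rng.gauss(0.0, 0.01) for m in mic]
lag, gain, aligned_ref = align(ref, mic, max_lag=32)
```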
The loading margin to voltage collapse -- the distance in parameter space to the closest saddle-node bifurcation -- is a standard proximity index for voltage stability. This paper develops its transient-stability counterpart: a margin M that measures the time to the synchronism boundary rather than a distance, and that unifies two limits usually treated separately. The critical clearing time (CCT) is the fast, fixed-parameter limit; the slow drift of the operating point toward a static loadability limit is the other. M is defined as the first-passage time of the joint state-parameter motion to the survival boundary. We prove and verify that M equals the CCT exactly on the one-machine-infinite-bus reduction (deviation <= 0.01% across loadings on a published benchmark), establishing a certified single-machine pillar. Under operating-point drift, M yields an operational lead time before faults become unclearable; we take the 28 April 2025 Iberian blackout timeline as an illustrative time scale for the drift rate. On the New England 39-bus system, an independent benchmark, the single-machine-equivalent reduction reproduces the CCT within 1.8-6.0% (conservatively), and a critical slowing-down signature flags proximity to the boundary. For the multimachine case we characterize the limits explicitly: the transfer-conductance work is tightly boundable, while the controlling unstable equilibrium is the binding obstruction to a certified margin.
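On the one-machine-infinite-bus reduction, the CCT can be read as a first-passage time: it is the time at which the faulted trajectory reaches the critical clearing angle. The sketch below computes the classical equal-area CCT in closed form and checks it against a numerically integrated first passage. It assumes a bolted fault with zero electrical power during the fault, a post-fault network identical to the pre-fault one, and a classical model without damping. It is not the paper's margin M.

```python
import math


def critical_clearing_angle(delta0):
    """Equal-area critical clearing angle (rad) for P_e = 0 during the fault and P_max*sin(delta) after it."""
    return math.acos((math.pi - 2 * delta0) * math.sin(delta0) - math.cos(delta0))


def cct_closed_form(H, f0, p_mech, p_max):
    """(2H/omega_s) * delta'' = P_m during the fault  =>  delta = delta0 + omega_s*P_m*t^2/(4H)."""
    delta0 = math.asin(p_mech / p_max)
    omega_s = 2 * math.pi * f0
    d_cr = critical_clearing_angle(delta0)
    return math.sqrt(4 * H * (d_cr - delta0) / (omega_s * p_mech))


def cct_first_passage(H, f0, p_mech, p_max, dt=1e-5, t_max=5.0):
    """Integrate the faulted swing equation and return the first time delta reaches delta_cr."""
    delta0 = math.asin(p_mech / p_max)
    d_cr = critical_clearing_angle(delta0)
    accel = 2 * math.pi * f0 * p_mech / (2 * H)  # constant while P_e = 0
    delta, speed, t = delta0, 0.0, 0.0
    while t < t_max:
        speed += accel * dt  # semi-implicit Euler step
        delta += speed * dt
        t += dt
        if delta >= d_cr:
            return t
    return math.inf
```

For example, with hypothetical per-unit values H = 5 s, f0 = 60 Hz, P_m = 0.8, and P_max = 2.0, both functions give a CCT of roughly 0.28 s.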
This paper studies the robust stabilization of $2 \times 2$ linear hyperbolic partial differential equations (PDEs) with Markov-jumping parameters and boundary input delay. The main challenge arises from the simultaneous presence of stochastic parameter variations and input delay, which complicates both the stability analysis and controller design. To address this issue, a nominal delay-compensating backstepping controller is first designed for a fixed nominal system. Applying the nominal transformation to the stochastic system yields a target system with additional perturbation terms induced by parameter mismatch. A mode-independent Lyapunov functional is then constructed to establish a pathwise exponential estimate, which directly implies mean-square exponential stability under an explicit small-mismatch condition. The proposed analysis provides a direct robustness certificate for nominal delay compensation without using mode-dependent Lyapunov functionals. Finally, we present simulation results and discuss how the conservative small-mismatch condition should be interpreted for the numerical example.
High-altitude platform stations (HAPS) are envisioned as a key component of future wireless networks, enabling ultra-wide coverage and providing direct connectivity to users through cylindrical massive multiple-input multiple-output (mMIMO) arrays. Exploiting the channel degrees of freedom requires accurate modeling and characterization of three-dimensional (3D) channels, including their spatial correlation functions (SCFs). However, existing spatial correlation models are primarily developed for planar or linear antenna arrays and cannot be directly applied to the cylindrical geometries commonly adopted on HAPS. To address this limitation, this paper derives an exact closed-form expression for the SCF of 3D MIMO channels with antenna elements arranged in a cylindrical array. The proposed formulation is based on the spherical harmonic expansion (SHE) of plane waves and accommodates arbitrary antenna radiation patterns and angular distributions through the Fourier series (FS) coefficients of the power azimuth and zenith spectra. The derived SCF is validated through Monte Carlo simulations under standard-compliant settings.
While Large Multimodal Models excel at comprehension, high-throughput inference engines lack native support for multimodal generation. The gap is especially severe for Speech Language Models, where generating multi-layered audio tokens via decoupled AR+NAR or synchronous Multi-Token Prediction (MTP) with delay-pattern interleaving conflicts with standard single-stream decoding loops. We present a vLLM-based inference pipeline for unified speech understanding and generation. We extend autoregressive decoding to natively execute delay-pattern de-interleaving and coordinated multi-stream sampling, integrating an on-GPU acoustic decoder for end-to-end waveform synthesis. Crucially, we challenge the common assumption that Classifier-Free Guidance (CFG) halves throughput. By co-scheduling paired conditional and unconditional requests within a continuous batch, our CFG implementation sustains 80% of non-CFG throughput despite the dual-request and logit-merging overheads. We open-source our framework.
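The delay pattern offsets codebook k by k steps, so one decoding step emits one token per codebook. De-interleaving undoes that shift. Standard classifier-free guidance then merges the paired conditional and unconditional logits as u + s(c - u). The sketch below shows these three operations with lists of token ids. `PAD`, the function names, and the guidance scale are hypothetical, and no vLLM scheduling is modeled.

```python
from typing import List

PAD = -1  # hypothetical padding id for positions outside a codebook's valid range


def interleave(codes: List[List[int]]) -> List[List[int]]:
    """codes: K codebook streams of length T. Codebook k is delayed by k steps.

    Returns T + K - 1 frames, each holding one token per codebook (PAD where undefined).
    """
    K, T = len(codes), len(codes[0])
    return [[codes[k][s - k] if 0 <= s - k < T else PAD for k in range(K)]
            for s in range(T + K - 1)]


def deinterleave(frames: List[List[int]], K: int) -> List[List[int]]:
    """Undo the delay pattern: token t of codebook k sits in frame t + k."""
    T = len(frames) - K + 1
    return [[frames[t + k][k] for t in range(T)] for k in range(K)]


def cfg_merge(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance on logits: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


frames = interleave([[1, 2, 3], [4, 5, 6]])
# frames == [[1, PAD], [2, 4], [3, 5], [PAD, 6]]
```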
Terahertz (THz) transmission technologies hold significant potential for enabling ultra-broadband, short-range communication in next-generation networks. Despite their vast bandwidth, THz signals suffer from limited transmission range, so a feasible deployment scenario is to operate THz links within clustered heterogeneous networks (HetNets) to enhance coverage. This paper investigates THz communication in clustered HetNets, leveraging stochastic geometry for performance analysis. Specifically, we consider a two-tier network of macro base stations (MBSs) and small base stations (SBSs). The MBS tier is modeled as a Poisson Point Process (PPP), while both the SBS tier and the users are modeled as a Poisson Cluster Process (PCP) to capture user clustering and network hotspots. We derive analytical expressions for the user association probabilities, the Laplace transform of the interference, and the coverage probability. The derived coverage probability is validated through Monte Carlo simulation. The numerical results show that coverage in THz PCP HetNets is higher than in THz PPP HetNets. In addition, a moderate spatial spread of SBSs is beneficial for coverage.
Penile measurement is clinically relevant across male reproductive and urogenital health, including conditions such as micropenis, congenital and endocrine disorders, and sexual or urinary dysfunction. However, quantitative assessment of penile size has relied mainly on external length or circumference measurements, which are difficult to standardize, sensitive to measurement conditions, and unable to capture the internal portion of the penis. MRI enables volumetric assessment of the whole penis in vivo, but automated segmentation has not previously been established at population scale. Automated whole-organ volumetry would enable high-throughput phenotyping for multi-omics and clinical studies of male reproductive disease. Here, we present a deep learning framework for whole-penis segmentation in multi-channel DIXON MRI. Using a newly curated expert-annotated training dataset ($n = 145$ subjects; $13,050$ annotated slices) and a double-annotated independent test benchmark ($n = 24$ subjects; $2,160$ double-annotated slices), we optimized a 3D nnU-Net architecture. The model achieved a 5-fold cross-validation Dice score of $0.90$ and performed at observer-level accuracy on the independent test set (Dice: $0.92$; Hausdorff distance: $3.58$). We deployed the model in $34,412$ UK Biobank participants, enabling automated quantification of total penile tissue, including both external and internal components. Longitudinal evaluation in 2,282 men demonstrated high inter-session reproducibility ($r = 0.87$). This framework establishes a reproducible and population-scalable method for MRI-based assessment of penile anatomy and provides an open technical resource for future studies in urological imaging and male reproductive health. The trained model weights will be publicly released.
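The Dice score and Hausdorff distance used to report segmentation accuracy can be computed on sets of foreground voxel coordinates. The brute-force sketch below shows both. The voxel `spacing` argument is an assumption: the abstract does not state units. The paper may also report a percentile Hausdorff variant rather than the maximum shown here.

```python
import math


def dice(a: set, b: set) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for sets of foreground voxel coordinates."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a: set, b: set, spacing=(1.0, 1.0, 1.0)) -> float:
    """Symmetric (maximum) Hausdorff distance between two voxel sets, in physical units."""
    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))

    def directed(x, y):
        return max(min(dist(p, q) for q in y) for p in x)

    return max(directed(a, b), directed(b, a))


pred = {(0, 0, 0), (1, 0, 0)}
truth = {(1, 0, 0), (2, 0, 0)}
```

With these toy masks, Dice is 0.5 and the Hausdorff distance is 1 voxel. For full 3D volumes, a distance-transform implementation would replace the quadratic search.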
This paper presents a reachability-aware guidance architecture for autonomous approach to a tumbling, uncooperative target under a rotating line-of-sight (LOS) docking corridor. The LOS admissible set rotates with the target body frame, producing time-varying polyhedral constraints in the chaser's relative coordinates. A safe-start region is constructed via two conservative criteria: (i) directional per-constraint erosion, the margin consumed by rotation-induced drift before thrust can arrest it, and (ii) a synchronization range bound $r < 2a_{\max}/\omega_t^2$ ensuring the chaser can cancel the apparent rotational velocity without overshooting the hold point. Closed-loop guidance uses a receding-horizon MPC controller with Clohessy-Wiltshire-Hill (CWH) prediction dynamics and explicit LOS corridor constraints in the quadratic program. Truth propagation uses the exact discrete CWH state-transition matrix with sub-stepping, and no reference blending or state projection is applied, so feasibility claims reflect the true physical dynamics. A three-regime tracking law manages the transition from long-range inertial approach to body-frame co-rotation and synchronized hold. The analytical safe-start region is benchmarked against four standard reachability engines (backward and forward polytopic reachable sets, Hamilton-Jacobi level sets, and closed-loop Monte Carlo): the closed-form criteria are 250x faster than Hamilton-Jacobi reachability while predicting closed-loop feasibility with precision 0.80 and recall 0.91 on a 500-case sweep. The residual 6% false-positive rate and the IoU gap against Hamilton-Jacobi quantify a structural property: the synchronization set (reach and co-rotate) is a strict subset of the positional reachable set, and the gap widens with tumble rate. The analytical bound is thus a sound inner certificate for onboard go/no-go decisions where Hamilton-Jacobi reachability is prohibitively expensive.
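The exact CWH state-transition matrix and the synchronization range bound are both closed-form, so they can be written out directly. The sketch below uses the standard CWH matrix with x radial, y along-track, z cross-track, and mean motion n. Sub-stepping composes the matrix over equal intervals, which is where a controller could apply piecewise-constant thrust. No thrust is applied here, and the function names are hypothetical.

```python
import math


def cwh_stm(n, t):
    """6x6 CWH state-transition matrix for state [x, y, z, vx, vy, vz]."""
    c, s = math.cos(n * t), math.sin(n * t)
    return [
        [4 - 3 * c, 0, 0, s / n, 2 * (1 - c) / n, 0],
        [6 * (s - n * t), 1, 0, -2 * (1 - c) / n, (4 * s - 3 * n * t) / n, 0],
        [0, 0, c, 0, 0, s / n],
        [3 * n * s, 0, 0, c, 2 * s, 0],
        [-6 * n * (1 - c), 0, 0, -2 * s, 4 * c - 3, 0],
        [0, 0, -n * s, 0, 0, c],
    ]


def propagate(state, n, dt, substeps=1):
    """Exact unforced propagation over dt, split into equal sub-steps (thrust would be inserted between them)."""
    phi = cwh_stm(n, dt / substeps)
    for _ in range(substeps):
        state = [sum(phi[i][j] * state[j] for j in range(6)) for i in range(6)]
    return state


def within_sync_range(r, a_max, omega_t):
    """Synchronization range bound from the abstract: r < 2 * a_max / omega_t**2."""
    return r < 2 * a_max / omega_t ** 2


n = 0.0011  # hypothetical LEO mean motion, rad/s
x0 = [100.0, -200.0, 10.0, 0.1, -0.05, 0.02]  # m and m/s
```

Because the matrix is exact, propagating 600 s in one step or in 12 sub-steps gives the same state up to rounding. With a_max = 0.01 m/s² and ω_t = 0.05 rad/s, the bound admits ranges below 8 m.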
This paper presents an intelligent control framework for trajectory tracking of robotic manipulators using radial basis function (RBF) neural networks for online disturbance estimation. The proposed control structure combines model-based nonlinear control with an adaptive neural approximator that compensates for parametric uncertainties, friction, and unmodeled dynamics. A Lyapunov-based adaptation law with projection guarantees boundedness of the closed-loop signals and convergence of the tracking error to a compact region. The primary objective of this work is to investigate how the choice of activation function within the RBF network influences transient behavior, steady-state accuracy, and control smoothness. The controller is implemented on a robotic manipulator. Experimental results demonstrate that although stability is preserved for all kernels, activation function selection significantly affects adaptation dynamics and practical tracking performance. These findings indicate that the activation function should be treated as a structural design parameter in intelligent control.
Monitoring the radio-frequency (RF) spectrum from space imposes demanding requirements on satellite platforms in terms of communication bandwidth and computational resources, which are needed to downlink, store, and process high-throughput I/Q samples. This paper analyzes in depth quasi-direct geolocation (QDG) as a technique that enables satellites of opportunity in low Earth orbit (LEO) to sense the spectrum in the bands of global navigation satellite systems (GNSS). QDG is a passive RF geolocation technique consisting of an ensemble of signal processing algorithms that compress the I/Q samples and process the compressed data through fast delay-Doppler shift matching and interferometry in a quantized time-frequency domain. These algorithms speed up the exhaustive search for multiple RF sources in the position domain. The efficiency gain addresses the bottleneck that prevents the use of satellites with limited downlink capacity and on-board computational power. Such satellites are usually constrained in size, weight and power (SWaP) and represent most of the spacecraft in LEO. The ability to exploit such assets for near-real-time geolocation of terrestrial GNSS jammers is instrumental to the performance of a multi-constellation GNSS RFI monitoring system. The present work describes the mathematical framework and precision bounds, introduces single- and multi-antenna use cases, combines different compression methods, and evaluates the geolocation accuracy with real data. The I/Q samples were collected by a repurposed GNSS reflectometry (GNSS-R) satellite, OPS-SAT PRETTY, in a dedicated test session during Jammertest 2025. The experimental results demonstrate the capability to geolocate GNSS jammers at different signal-to-noise ratios (SNRs) with extremely high compression ratios.
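Delay-Doppler shift matching builds on the cross-ambiguity function: the observation is correlated against delayed, frequency-shifted copies of a reference, and the grid cell with the largest magnitude is kept. The sketch below shows the uncompressed, brute-force version of that search on synthetic data. QDG's compression, quantized time-frequency domain, and interferometry are not modeled, and all signal parameters are hypothetical.

```python
import cmath
import math
import random


def cross_ambiguity(ref, obs, fs, delays, dopplers):
    """Brute-force cross-ambiguity search over a (delay, Doppler) grid.

    For each candidate delay d (samples) and Doppler shift fd (Hz), correlate the observation with
    the reference delayed by d and shifted by fd. Return (d, fd, |peak|) of the best cell.
    """
    best = (None, None, -1.0)
    for d in delays:
        for fd in dopplers:
            acc = 0j
            for k in range(len(obs)):
                if 0 <= k - d < len(ref):
                    replica = ref[k - d] * cmath.exp(2j * math.pi * fd * k / fs)
                    acc += obs[k] * replica.conjugate()
            if abs(acc) > best[2]:
                best = (d, fd, abs(acc))
    return best


# Synthetic emitter (hypothetical): QPSK-like symbols seen with a 12-sample delay and 40 Hz Doppler.
rng = random.Random(1)
fs = 1000.0
ref = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(300)]
obs = [(ref[k - 12] * cmath.exp(2j * math.pi * 40.0 * k / fs) if k >= 12 else 0j)
       + complex(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for k in range(300)]
delay, doppler, peak = cross_ambiguity(ref, obs, fs, range(30), [10.0 * i for i in range(-10, 11)])
```

The cost grows as (delays × Dopplers × samples), which is the exhaustive search that the abstract's compression and fast matching are meant to accelerate.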
This paper studies safe optimal output agreement for nonlinear multi-agent systems with output safety constraints. Existing safe feedback optimization methods often implement gradient-flow dynamics directly through the plant input, which may require high-order control barrier functions (HOCBFs). The resulting derivative-chain design is tuning-sensitive and can introduce additional equilibrium conditions that alter the steady-state optimal solution. We propose a reference-governed two-layer architecture that separates lower-layer output regulation from upper-layer distributed optimization. The upper layer filters the reference gradient flow through first-order control barrier function constraints, which are easier to tune and preserve the steady-state optimality structure of the original agreement problem. The lower layer uses an internal-model-based output regulator with a reference-dependent Lyapunov function, from which dynamic safety margins (DSMs) are constructed to certify transient output safety. We prove forward invariance, optimal-solution preservation under DSM-compatibility conditions, and convergence via a Lyapunov small-gain argument. Simulations validate safe convergence, show advantages over HOCBF-based feedback optimization, and demonstrate adaptive tangential objective shaping for escaping spurious equilibria induced by nonconvex obstacles.
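With a single first-order CBF constraint, the upper-layer filter reduces to a closed-form minimum-norm correction of the nominal gradient flow. The sketch below applies that correction to a hypothetical objective f(r) = ½‖r - r*‖² and safe set h(r) = 1 - r₀ ≥ 0. The lower-layer output regulator and the dynamic safety margins are not modeled.

```python
import math


def cbf_filter(u_nom, grad_h, h, alpha):
    """Closed-form solution of  min ||u - u_nom||^2  s.t.  grad_h . u >= -alpha * h."""
    slack = sum(g * u for g, u in zip(grad_h, u_nom)) + alpha * h
    if slack >= 0.0:
        return list(u_nom)  # nominal gradient step is already safe
    norm2 = sum(g * g for g in grad_h)
    return [u - slack * g / norm2 for u, g in zip(u_nom, grad_h)]


def run_reference_flow(r0, r_star, steps=2000, dt=0.01, alpha=1.0):
    """Euler-integrate the filtered reference flow r' = u, with u_nom = -grad f(r).

    Returns the final reference and the smallest value of h seen along the way.
    """
    r, h_min = list(r0), math.inf
    for _ in range(steps):
        u_nom = [-(ri - si) for ri, si in zip(r, r_star)]
        h = 1.0 - r[0]
        u = cbf_filter(u_nom, [-1.0, 0.0], h, alpha)
        r = [ri + dt * ui for ri, ui in zip(r, u)]
        h_min = min(h_min, 1.0 - r[0])
    return r, h_min


r_final, h_min = run_reference_flow([-1.0, 1.0], [2.0, 0.0])
```

The unconstrained minimizer (2, 0) is unsafe, so the filtered flow settles at the constrained optimum (1, 0) without ever leaving the safe set.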
Mutual information (MI)-inspired feature learning techniques can generate low-dimensional embeddings that retain nonlinear dependence structures, but direct MI estimation suffers from noisy probability distribution estimates in the low-data regime. The H-Score objective, computed from second-order statistics, provides a practical proxy metric for training feature extraction networks. We prove that H-Score is invariant to invertible transformations in the unrestricted functional setting, but becomes sensitive to input basis rotations under constrained approximation classes. Consequently, we study unitary preconditioning for H-Score networks and show that selecting an appropriate basis rotation reduces finite-width truncation error by concentrating predictive dependence into fewer dominant modes. We identify the fast Fourier transform (FFT) as an effective data-independent, low-cost preconditioner for approximately stationary processes, where spectral structure induces concentration of the cross-covariance singular value spectrum. We introduce training-free metrics based on spectral entropy and cumulative dependence energy to quantify basis suitability and predict downstream inference gains prior to network training. Experiments across eight multivariate datasets demonstrate that FFT preconditioning is particularly useful in resource-constrained regimes, achieving up to 50% normalized mean squared error (NMSE) reduction, while the proposed metrics correlate with observed performance gains and correctly identify cases where spectral preconditioning is detrimental.
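The H-score is built only from first- and second-order statistics of the learned features. The sketch below assumes the form commonly used in the H-score literature: H(f, g) = E[f(X)ᵀg(Y)] - E[f(X)]ᵀE[g(Y)] - ½ tr(Cov(f) Cov(g)), estimated with population (1/n) moments. The paper's exact normalization may differ. The FFT preconditioning step, which would rotate the inputs before the feature networks, is not shown.

```python
def h_score(F, G):
    """Empirical H-score for paired feature samples.

    F, G: lists of n feature vectors of equal dimension k (rows are samples).
    """
    n, k = len(F), len(F[0])
    mf = [sum(f[i] for f in F) / n for i in range(k)]
    mg = [sum(g[i] for g in G) / n for i in range(k)]
    # E[f^T g] - E[f]^T E[g]
    cross = sum(sum(fi * gi for fi, gi in zip(f, g)) for f, g in zip(F, G)) / n
    cross -= sum(a * b for a, b in zip(mf, mg))

    def cov(X, m):
        return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in X) / n for j in range(k)]
                for i in range(k)]

    cf, cg = cov(F, mf), cov(G, mg)
    trace = sum(cf[i][j] * cg[j][i] for i in range(k) for j in range(k))  # tr(Cf @ Cg)
    return cross - 0.5 * trace


F = [[1.0], [-1.0], [1.0], [-1.0]]
```

For this toy scalar feature, pairing F with itself gives H = 1 - 0.5 = 0.5, and pairing it with its negation gives -1.5. Positively dependent features are rewarded, and the trace term penalizes scaling the features up without bound.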
Current Few-shot Class-incremental Audio Classification (FCAC) methods assume that samples of base and incremental classes are in the same domain (following the same distribution). However, there is generally a domain shift between these two types of samples. In this paper, we explore the problem of Cross Domain FCAC, where samples of base and incremental classes have a domain shift. We propose a strategy of adversarial contrastive training that enables the model to effectively classify samples of different classes from unseen domains. The model consists of an encoder and a classifier. The encoder is trained in the base session but frozen in incremental sessions, whereas the classifier is trained in all sessions. Experiments are conducted on six pairs of cross-domain datasets. Results show that our method exceeds state-of-the-art methods in average accuracy. The code is at this https URL.
Robust speech understanding in real-world acoustic environments remains a fundamental challenge for intelligent auditory systems such as robot audition, hearing aids, teleconferencing systems, smart speakers, and voice-controlled assistants. These systems must operate under background noise, reverberation, competing speakers, and dynamic acoustic conditions. Spatial speech perception addresses this challenge by exploiting microphone-array information to localize, enhance, and interpret target speech in complex acoustic scenes. This paper surveys spatial speech perception systems with emphasis on the roles of sound source localization (SSL), directional speech enhancement (DSE), and automatic speech recognition (ASR), both individually and within integrated processing pipelines. We review classical signal-processing approaches and recent learning-based methods for microphone-array localization, beamforming, neural enhancement, speech separation, and modern recognition architectures. Beyond component-level analysis, we discuss robustness to noise and reverberation, multi-speaker operation, real-time constraints, and computational efficiency. We also examine representative applications in robot audition, hearing assistance, smart speakers, and teleconferencing, and identify open challenges and future directions toward robust, low-latency, and perception-aware speech systems for complex acoustic environments.
This paper investigates geometry-reconfigurable transmission for multiuser communication systems enabled by a rotatable antenna array. In contrast to conventional fixed arrays, the proposed architecture jointly exploits array pose adjustment and element-level boresight steering, thereby reshaping both the array-induced phase responses and the direction-dependent channel gains. We formulate a weighted sum-rate maximization problem that jointly optimizes the transmit beamformers, array pose, and element boresights under practical visibility and steering constraints. To reveal the underlying design principles, we first provide a geometric interpretation via zero-forcing analysis, showing that the rate gains stem from both channel-strength enhancement and spatial-separability improvement. Specifically, array-pose rotation improves inter-user channel orthogonality even with isotropic elements, whereas directional elements introduce a tradeoff between phase-based spatial separation and boresight-dependent gain alignment. Motivated by these insights, we develop an efficient optimization framework that jointly coordinates transmit beamforming, array-pose adaptation, and element-boresight steering to exploit the geometry-induced phase-and-gain channel-shaping capability. Simulation results demonstrate that the proposed joint design outperforms fixed-array, pose-only, and boresight-only benchmarks, with larger gains achieved under more directive element patterns and tighter boresight-steering constraints.
Stochastic resources such as wind farms, electric vehicle aggregators, and demand-side assets are increasingly participating as reserve providers in ancillary service markets. To manage delivery uncertainty, system operators impose minimum reliability thresholds on such providers. Energinet, the Danish transmission system operator (TSO), has pioneered this approach through the P90 requirement, which requires stochastic providers to make accepted reserve capacity bids available with at least 90% probability. Yet this threshold is set by regulatory convention, not optimization: no existing framework treats it as a design variable or characterizes the cost-reliability trade-off it governs. This paper closes that gap. We develop a bilevel optimization framework in which the TSO in the upper level sets the reliability threshold endogenously while providers in the lower levels respond through reliability-constrained bidding, with chance constraints reformulated analytically using a Weibull tail distribution. Applied to the Nordic frequency containment reserve for disturbances (FCR-D) market, the cost-optimal threshold lies below P90 in the studied cases, with cost reductions of up to 14.5% relative to the fixed standard. Dynamic hourly thresholds yield a further reduction of up to 2.4%, suggesting that efficiency gains may increase in larger and more diverse reserve markets.
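A Weibull model makes the provider-level chance constraint explicit. Suppose available capacity X ~ Weibull(shape k, scale λ). Then P(X ≥ b) = exp(-(b/λ)^k) ≥ p holds exactly when b ≤ λ(-ln p)^(1/k). The sketch below applies that reformulation. It assumes that availability itself is Weibull-distributed, whereas the paper may apply the Weibull model only to the tail. The parameter values are hypothetical.

```python
import math


def survival(b, scale, shape):
    """P(X >= b) for X ~ Weibull(shape, scale)."""
    return math.exp(-((b / scale) ** shape))


def max_reliable_bid(scale, shape, reliability):
    """Largest bid b satisfying P(X >= b) >= reliability, i.e. b = scale * (-ln p)**(1/shape)."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie in (0, 1)")
    return scale * (-math.log(reliability)) ** (1.0 / shape)


b_p90 = max_reliable_bid(scale=10.0, shape=2.0, reliability=0.90)  # about 3.25 MW
b_p85 = max_reliable_bid(scale=10.0, shape=2.0, reliability=0.85)  # a looser threshold admits a larger bid
```

Lowering the threshold from P90 to P85 enlarges the admissible bid. This is the cost-reliability lever the bilevel model tunes.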
The displacement of synchronous generation by inverter-based resources is accelerating power system frequency dynamics beyond the response capability of conventional automatic generation control. This paper presents Autonomous Grid Generation Control with Decision Transformers, a framework coupling an offline-trained Decision Transformer with a two-stage symbolic safety stack for secondary frequency control. The Decision Transformer learns a conditional dispatch policy from offline supervisory control and data acquisition records via sequence modeling, eliminating online exploration risks. A Constraint Verification Unit provides sub-ten-millisecond algebraic screening using real-time power transfer distribution factors, while an aggregate digital twin performs swing-equation-based dynamic stability certification. Validated on the Northeast Power Coordinating Council 140-bus system under low-inertia conditions, the proposed controller reduces the area control error integral by over 99% relative to tuned automatic generation control, maintains a 59.4 Hz frequency nadir, and achieves inference latency of approximately 10 ms, well within real-time constraints. Comparative evaluation against a linear quadratic regulator baseline and structural analysis against conservative Q-learning demonstrate the advantages of the sequence-modeling formulation. Small-signal eigenvalue analysis characterizes the dominant 1.87 Hz electromechanical mode and confirms that the safety stack maintains stable operation across operating points. By falling back to tuned automatic generation control whenever proposals are rejected, the safety stack bounds worst-case performance to industry-standard levels in simulation.
This paper investigates a robust transmission design for a multi-user rate-splitting multiple access (RSMA)-based simultaneous wireless information and power transfer (SWIPT) system empowered by movable antennas (MAs) and a reconfigurable intelligent surface (RIS) under channel state information (CSI) uncertainty and residual hardware impairments (HIs). The effective channels in MA-enabled systems depend on antenna positions, so CSI uncertainty affects not only active and passive beamforming but also antenna position optimization. Furthermore, residual HIs distort the effective SINRs, creating additional coupling among beamforming, RIS reflection control, common-rate allocation, power-splitting ratio optimization, and antenna position optimization. Consequently, the joint impact of CSI uncertainty and HIs leads to a highly coupled and challenging resource allocation problem. To address this challenge, we propose a robust resource allocation framework that jointly optimizes common-rate allocation, transmit beamforming, RIS reflection coefficients, power-splitting ratios, and MA positions to maximize the achievable sum-rate while satisfying practical system constraints. To obtain an efficient solution, the original problem is decomposed into active beamforming, RIS reflection design, power-splitting ratio optimization, and MA position optimization subproblems, where tractable convex surrogate functions are constructed to handle the non-convex objective and constraints. Simulation results verify the effectiveness of the proposed framework and demonstrate substantial improvements in achievable sum-rate, robustness against CSI uncertainty and hardware impairments, and convergence performance compared with benchmark schemes.
Accelerated magnetic resonance imaging reduces acquisition time, but reconstruction from undersampled k-space can blur diagnostically relevant structures or introduce failures that are not captured by global image metrics. We propose SA-RDM-DC, a Self-Auditing Residual generative Drifting Model with Data Consistency for accelerated knee MRI. The method adapts the newly proposed generative drifting paradigm to accelerated MRI by training a physics-conditioned drift field from the zero-filled reconstruction toward the fully sampled residual correction. It predicts image- and missing-k-space residual corrections, enforces data consistency with acquired k-space, uses frequency-aware and residual drifting supervision to recover fine detail, and produces dense error maps and slice-level risk scores in the same inference pass. We evaluate SA-RDM-DC on multi-coil fastMRI knee data at acceleration factors of 4, 8, and 12, with fastMRI+ pathology annotations for region-level and classifier-based task preservation, and on SKM-TEA for zero-shot and fine-tuned protocol-shift evaluation. Compared with zero-filled reconstruction, UNet-image-SENSE, DC-UNet, Score-Diffusion, ELF-Diff, SENSE-VarNet, and MoDL baselines, SA-RDM-DC achieves the highest SSIM across fastMRI acceleration factors while retaining subsecond per-slice inference and avoiding the long sampling time of iterative diffusion baselines. In pathology-aware analysis, SA-RDM-DC preserves lesion-region structural fidelity and reduces meniscus prediction instability. Its self-auditing scores strongly identify high-error reconstructions on fastMRI and partially transfer as a selective-review signal under SKM-TEA protocol shift. These results support reconstruction evaluation that jointly considers image fidelity, pathology preservation, runtime, and case-specific reliability.
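Data consistency with acquired k-space can be enforced by replacing the network's predicted k-space values with the measured ones wherever a sample was acquired. The sketch below shows this hard-replacement form on a 1-D single-coil toy signal, using a naive DFT. The paper's multi-coil setting, its coil sensitivities, and any soft weighting it may use are not modeled, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def data_consistency(image_pred, kspace_acq, mask):
    """Hard DC: keep acquired k-space where mask is True, the prediction elsewhere."""
    k_pred = dft(image_pred)
    k_dc = [acq if m else pred for pred, acq, m in zip(k_pred, kspace_acq, mask)]
    return idft(k_dc)


truth = [0.0, 1.0, 3.0, 2.0, 0.5, -1.0, 0.0, 0.25]
mask = [k % 2 == 0 for k in range(8)]  # 2x undersampling: even k-space lines acquired
k_acq = [val if m else 0j for val, m in zip(dft(truth), mask)]
# The error (-1)^t lives only in k-space line 4, which was acquired, so DC removes it entirely.
pred = [v + 0.3 * (-1) ** t for t, v in enumerate(truth)]
out = data_consistency(pred, k_acq, mask)
```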
This paper studies source seeking for a torque-controlled nonholonomic vehicle with a laterally displaced scalar sensor. The vehicle has constant forward speed, while its yaw motion is controlled by torque input with unknown inertia and damping. The objective is to steer the vehicle to a source-centered circular motion so that the lateral sensor approaches the unknown source, without using position, heading, source-location, gradient, or source-value information. The proposed torque law combines a fast oscillatory component, which generates averaged steering through symmetric-product approximation, with a slowly tuned bias component, which selects the desired orbit. Two bias-tuning designs are developed. The first is an output-feedback design using only the scalar measurement; it applies a Lie-bracket extremum-seeking update and yields local practical stability. The second is a velocity-assisted design using forward-speed and yaw-rate measurements; it tunes the bias through the yaw-rate tracking error and yields a globally asymptotically stable averaged system, implying semi-global practical stability of the original system. Simulations illustrate the proposed designs.
In this letter, we present a strategy for docking autonomous vehicles in three-dimensional space. Docking is a safety-critical task that requires expert piloting skills. Vehicles with autonomous docking capabilities are highly desirable in various applications, such as marine vehicle docking, aerial vehicle docking, spacecraft docking, and landing. To dock autonomously with the docking station, the vehicle must align itself to a specific desired orientation relative to the docking station and also reduce its speed as it approaches. The vehicle must reach near-zero speed to dock successfully and safely without colliding with the docking station. Drawing on ideas from the guidance literature, we present a finite-time sliding-mode-based strategy to achieve this. The range and line-of-sight kinematic relations describing the motion of the vehicle with respect to the stationary docking station are used to steer the vehicle to the desired orientation for docking. This docking strategy is validated in MATLAB\textsuperscript{\textregistered} simulations for various initial locations and orientations of both the vehicle and the docking station.
Recent advances in speech synthesis have shifted from phoneme representations to direct grapheme modeling. While phonemes address the one-to-many mapping between text and acoustics, they rely on grapheme-to-phoneme (G2P) systems that fail to capture speaker-specific acoustic variation. Prior work demonstrates that grapheme-based models outperform phoneme-based systems at scale, but not in low-resource settings. In this paper, we propose SPARCLE, a speaker-aware grapheme representation model that enriches characters with their precise acoustic realizations. SPARCLE is trained with a contrastive objective to align graphemes with corresponding Wav2Vec2 acoustic representations while conditioned on speaker identity. The resulting model serves as a replacement for G2P systems in downstream text-to-speech (TTS) tasks. We demonstrate that SPARCLE improves generation quality, halving word error rates in extreme low-resource settings compared to standard grapheme-based models.
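A common form of contrastive alignment objective is InfoNCE. Under InfoNCE, each grapheme embedding should score its own acoustic embedding higher than every other acoustic embedding in the batch. The sketch below is a one-directional InfoNCE loss over cosine similarities. It assumes this is representative of the paper's objective. The temperature is hypothetical, and the speaker conditioning and Wav2Vec2 features are not modeled.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def info_nce(text_emb, audio_emb, temperature=0.1):
    """Mean over i of -log softmax_j(cos(text_i, audio_j) / T)[i]; row i's positive is audio_i."""
    loss = 0.0
    for i, t in enumerate(text_emb):
        logits = [cosine(t, a) / temperature for a in audio_emb]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]
    return loss / len(text_emb)


graphemes = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
```

The loss is near zero when each grapheme matches its own acoustic frame and about 10 when the pairing is swapped.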
Accompanying a group of humans is an essential aspect of developing human-like social cognition in robots. However, human groups typically do not follow fixed formations, which poses significant challenges for robots in maintaining natural companionship behaviors. In this paper, we propose an adaptive group-accompaniment method for social robots based on Vision-Language Models (VLMs), leveraging their semantic reasoning capabilities to infer companion positions, maintain social distances, and understand group dynamics. The members of the group are first detected, and a perceptual module generates visual representations of the interaction group space as input to the VLM, which is then combined with a Model Predictive Path Integral (MPPI) controller to ensure stability and safety. Experimental evaluations across five scenarios show that the proposed method enables robots to accompany the group effectively, demonstrating a 15\% improvement in success rate and a 25\% reduction in collision rate compared to baseline approaches. Additionally, a user study indicates that the generated companionship behaviors are perceived as natural and socially appropriate.
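The MPPI controller named above samples perturbed control sequences, weights each rollout by exp(-cost/λ), and averages the perturbations. The sketch below applies MPPI to a 1-D double integrator that is driven toward a goal position. The dynamics, cost weights, and horizon are hypothetical stand-ins. The VLM-inferred companion position and the social-distance constraints are not modeled.

```python
import math
import random


def rollout_cost(x, v, controls, target, dt):
    """Cost of one double-integrator rollout (hypothetical weights)."""
    cost = 0.0
    for u in controls:
        v += u * dt
        x += v * dt
        cost += (x - target) ** 2 + 0.1 * v ** 2 + 0.01 * u ** 2
    return cost


def mppi_update(x, v, u_nom, target, rng, n_samples=128, sigma=1.0, lam=1.0, dt=0.1):
    """One MPPI iteration: perturb u_nom, weight rollouts by exp(-cost/lam), average the noise."""
    noises, costs = [], []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, sigma) for _ in u_nom]
        noises.append(eps)
        costs.append(rollout_cost(x, v, [u + e for u, e in zip(u_nom, eps)], target, dt))
    c_min = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
            for t, u in enumerate(u_nom)]


def run(target=2.0, steps=100, horizon=20, dt=0.1, seed=0):
    rng = random.Random(seed)
    x, v = 0.0, 0.0
    u_nom = [0.0] * horizon
    for _ in range(steps):
        u_nom = mppi_update(x, v, u_nom, target, rng, dt=dt)
        v += u_nom[0] * dt           # apply only the first control (receding horizon)
        x += v * dt
        u_nom = u_nom[1:] + [0.0]    # warm-start the next iteration
    return x, v


x_final, v_final = run()
```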
Cyber-physical systems (CPSs) are increasingly deployed across many aspects of daily life and can be compromised through memory corruption vulnerabilities, allowing attackers to hijack the control flow and take over the system. Existing techniques mostly focus on detecting such attacks and respond by terminating or halting execution upon detection, which is unacceptable for CPSs performing safety-critical tasks, since interrupted tasks can have catastrophic consequences. Other techniques replace compromised CPS components with simplified defaults that degrade system behavior, or reboot the system upon attack detection. We propose Chameleon, a novel framework for automatically recovering CPSs from memory corruption attacks using machine learning (ML)-based surrogates trained at compartment granularity; these surrogates closely replicate the behavior of their original compartments but do not share their memory corruption vulnerabilities. Upon attack detection, Chameleon replaces the compromised compartment with its trained surrogate. We implemented Chameleon using the LLVM compiler and evaluated its efficiency and effectiveness on seven different robotic vehicles (RVs), both simulated and real. We found that Chameleon generates surrogates that closely approximate the original compartments (average R$^2$ = 0.96) and, unlike prior approaches, successfully recovers the system from real-world memory corruption attacks, allowing the vehicles to complete their tasks while incurring low performance and memory overhead.
Vision-Language-Action (VLA) models have demonstrated promising generalization capabilities across robotic manipulation tasks, yet their real-world deployment remains limited by the lack of effective safety measures. Specifically, existing safety measures only prevent collisions caused by the robot's next action. In this paper, we propose a neuro-symbolic safety guidance mechanism for flow-matching-based VLAs that enables predictive collision avoidance. Flow-matching-based VLAs determine their next actions by predicting a trajectory (a sequence of actions) through an iterative neural flow matching process. Our method formulates safety enforcement as a minimum-norm constrained optimization problem that corrects safety violations in the noisy intermediate trajectory predictions during denoising. By analyzing predicted trajectories and applying corrections during iterative denoising, our approach anticipates collisions before they become unavoidable. This interleaving of symbolic constraint satisfaction with neural trajectory generation enables predictive collision avoidance rather than reactive intervention. On the SafeLIBERO benchmark, our method achieves 82.8% collision avoidance and 81.6% task success, improvements of 6.3% and 19.8%, respectively, over single-step methods, with the largest gains on long-horizon tasks where compounding distribution shift is most pronounced. Video demonstrations of our approach are included on our project page at this https URL.
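The minimum-norm correction has a closed form in the simplest case. As a sketch under the assumption that each waypoint faces a single linear half-space constraint (here a hypothetical table plane), the smallest Euclidean change is a projection onto that half-space. The paper's general constraint set and its exact placement inside the denoising loop are not reproduced here.

```python
def min_norm_correction(point, normal, offset):
    """Smallest Euclidean change to `point` such that dot(normal, point) >= offset.

    For a single linear constraint this is the closed-form projection onto
    the half-space; points that already satisfy it are returned unchanged.
    """
    violation = offset - sum(n * p for n, p in zip(normal, point))
    if violation <= 0.0:
        return list(point)
    scale = violation / sum(n * n for n in normal)
    return [p + scale * n for p, n in zip(point, normal)]

def correct_trajectory(trajectory, normal, offset):
    """Apply the correction to every waypoint of an intermediate prediction."""
    return [min_norm_correction(p, normal, offset) for p in trajectory]

# Hypothetical constraint: keep the end effector at least 5 cm above a table (z >= 0.05).
noisy_traj = [[0.0, 0.0, 0.20], [0.1, 0.0, 0.00], [0.2, 0.0, -0.10]]
print(correct_trajectory(noisy_traj, normal=[0.0, 0.0, 1.0], offset=0.05))
```

In an interleaved scheme, this correction would run after each denoising step on the whole predicted trajectory, so a collision several steps ahead is already removed before the first action is executed.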
At the heart of human visual perception lies the ability to maintain a continuous and coherent understanding of the external world. By integrating observations with accumulated experience, the human visual system continuously adapts to variations in both the target and its surrounding environment while preserving robust visual continuity as scene dynamics evolve. Human vision thus integrates prior knowledge, spatial geometry, and semantic context to understand complex scenes and their changes. Visual object tracking, a core problem in computer vision, aims to bring machine perception closer to this human ability, and these capabilities are central to the task of Generic Object Tracking (GOT). In this task, a visual tracker is initialized only with the bounding box of an arbitrarily specified target in the first frame and must continuously localize the target in subsequent frames of a dynamic visual stream. However, future events, observations, and real-world variations are inherently unpredictable, so the model's generalization and online adaptation capabilities remain bottlenecks. Tracking reliability can deteriorate when the target undergoes severe deformation, is affected by complex distractors, encounters significant environmental changes, or belongs to a category unseen during training. This dissertation aims to narrow the gap between machine visual tracking systems and human visual perception by proposing a series of methods that systematically enhance the target discrimination, robust adaptation, and geometric reasoning capabilities of tracking models.
Small multirotor aircraft are increasingly tasked with operations in the atmospheric boundary layer, where turbulent winds comparable to the vehicle's airspeed degrade trajectory tracking and can defeat conventional feedback control. This work presents a two-stage learning pipeline that first estimates the local wind from onboard kinematics and dynamics and then exploits that estimate inside a reinforcement learning (RL) flight controller. The wind estimator, an attention-augmented gated recurrent network trained on thousands of simulated flights through von Karman turbulence with power-law shear and veer, recovers the horizontal wind vector with a per-flight root-mean-square error of 0.40 m/s and a direction error of 3.2 degrees on unseen wind regimes, an accuracy near the floor imposed by unresolved turbulence, and generalizes to vertical ascent profiles with a skill score of 0.861 over a constant-wind reference. A proximal policy optimization controller receiving the frozen estimator's output reduces horizontal trajectory tracking error by 48% relative to a wind-blind proportional-derivative baseline across mean winds of 4 m/s to 12 m/s, outperforming the baseline on 100% of evaluation episodes. A three-way ablation decomposes this improvement into a kinematic component, available without wind information, and a wind-perception component; the perception share rises with wind speed, from small in light winds to roughly half of the total benefit in strong winds, consistent with the quadratic scaling of aerodynamic drag. The controller degrades gracefully on out-of-distribution winds of 13 m/s to 15 m/s, where the baseline fails catastrophically.
Speech-LLM integration has shown promising results by leveraging extensive textual pretraining, yet its specific benefits for automatic speech recognition (ASR) remain unclear. We observe that as the amount of supervised ASR training data increases, the contribution of LLM priors becomes less evident, and that simple speech-text joint training under-utilizes textual knowledge. We therefore propose Joint Speech-Text Interleaved Pretraining (JSTIP), an ASR-oriented pretraining strategy that constructs word-level and segment-level interleaved speech-text sequences within aligned pairs for speech-LLM architectures that accept continuous inputs. Experiments on 38k hours of ASR data show consistent improvements in entity accuracy over ASR-only and joint speech-text training baselines. When using domain transcription text, JSTIP achieves entity recognition performance on par with that obtained from synthetic speech-text pairs, simplifying domain adaptation. Benefiting from textual pretraining and domain text data, JSTIP is competitive with open-source ASR and Speech-LLM systems in medical entity recognition. Zero-shot speech question answering results further suggest that interleaving reduces the speech-text modality gap and preserves the LLM's generative prior, which likely explains the entity improvements on the ASR task.
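The two interleaving granularities can be pictured with a short sketch. It assumes word-aligned pairs and uses hypothetical placeholder strings for the speech segments (in JSTIP these would be continuous features); the sampling rules (a per-word coin flip, one random contiguous span) are illustrative assumptions, not the paper's exact recipe.

```python
import random

def interleave_word_level(words, speech, p_speech=0.5, seed=0):
    """Emit each aligned position as either its text word or its speech segment."""
    if len(words) != len(speech):
        raise ValueError("words and speech segments must be aligned one-to-one")
    rng = random.Random(seed)
    return [("speech", s) if rng.random() < p_speech else ("text", w)
            for w, s in zip(words, speech)]

def interleave_segment_level(words, speech, seed=0):
    """Replace one random contiguous span of words with its speech segments."""
    if len(words) != len(speech):
        raise ValueError("words and speech segments must be aligned one-to-one")
    rng = random.Random(seed)
    i = rng.randrange(len(words))
    j = rng.randrange(i + 1, len(words) + 1)  # span [i, j) is non-empty
    return ([("text", w) for w in words[:i]]
            + [("speech", s) for s in speech[i:j]]
            + [("text", w) for w in words[j:]])

words = ["patient", "reports", "chest", "pain"]
speech = ["<seg0>", "<seg1>", "<seg2>", "<seg3>"]  # stand-ins for continuous features
print(interleave_word_level(words, speech))
print(interleave_segment_level(words, speech))
```

Both variants keep the sequence in the original word order, so the LLM sees text and speech for the same content in context rather than as separate modalities.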
We study how to predict the downstream closed-loop performance of a learned latent world model from validation-time diagnostics alone. Choosing the right checkpoint from a world-model training run is difficult: validation loss and multi-step prediction RMSE keep improving long after closed-loop performance has collapsed. We present a suite of structural validation-time diagnostics drawn from optimal-control theory and apply them to Gymnasium's LunarLander v3, which features shaped rewards. We train an RSSM [5, 4] world model on it and treat the per-checkpoint CEM-MPC return as the oracle for closed-loop quality. By evaluating 40 metrics against this oracle, we find that the strongest single predictor is the Reward Observability Fraction (ROF), which measures the reward predictor's dependence on the observable subspace. We combine ROF with three structural regularizers into a single-number offline checkpoint selection score, the Composite Reward Observability Fraction (CROF). The CROF-selected world model trains a model-based A2C policy that beats a fairly evaluated model-free A2C baseline by ~24.5 return points while using ~65x fewer real-environment interactions, and the same world model also drives a strong zero-shot CEM-MPC policy. Code and data: this https URL.
Vision-radar fusion is central to robust autonomous driving, combining dense visual semantics with precise range and velocity measurements from radar. However, real-world fusion quality is fundamentally challenged by dynamically varying input quality, stemming from occlusion, adverse weather, and channel noise. To address this, we re-frame the problem from static data fusion to channel-aware semantic reasoning and propose a Large Language Model-centric Semantic-layer Channel-aware Integrated Perception (LM-SCIP) framework. It places a Large Language Model (LLM) as a central reasoning core to fuse a local visual stream with a quality-varying external radar stream used to cover perception-blind spots. Concretely, LM-SCIP couples a hierarchical radar-vision encoder with a Channel-Adaptive Semantic Module (CASM) that maps link indicators into a "Channel Prompt" to dynamically gate external radar features. A parameter-efficient, LoRA-tuned LLM, in conjunction with a heterogeneous Mixture-of-Experts (H-MoE), then arbitrates between local visual cues and the channel-conditioned radar context. Finally, a decoupled multi-task decoder outputs localization, trajectory forecasting, and image reconstruction. Experiments on nuScenes and VIRAT validate our approach. On nuScenes, under a controlled toggle of radar input, LM-SCIP reduces localization RMSE by 40.0% versus a vision-only baseline. On VIRAT, the model attains a 0.214 m localization RMSE and 0.179 m minFDE (k=1). These results reveal that the proposed LM-SCIP enables a robust vision-dominant fallback at low SNR and synergistic fusion at high SNR.
Channel State Information (CSI) has become a widely used wireless channel sensing modality for applications such as indoor localization, activity recognition, and respiration monitoring. Because collecting labeled data under every target condition is impractical, training CSI-based models often relies on simulated data produced by adding noise or perturbations to recorded channel estimates, most commonly additive white Gaussian noise (AWGN). This practice assumes that the receiver chain between the antenna and the channel estimator is linear and gain-invariant. We test this assumption empirically using RF jamming as a controlled perturbation on 6 commodity receivers across 2 indoor environments. The assumption does not hold. Automatic gain control compresses the channel estimate multiplicatively before digitization, producing amplitude distributions that no additive noise variance can reproduce. To close the resulting fidelity gap, we propose M_QTC, a measurement-calibrated model that learns the per-subcarrier distribution transformation through quantile mapping, temporal filtering, and copula-based cross-subcarrier reordering. M_QTC reduces amplitude error 8-fold and closes 89% of the aggregate fidelity gap across four complementary dimensions. The improvement transfers directly to downstream tasks: 5 classifiers from different families trained on M_QTC-simulated data recover 93% of real-data jamming detection performance, while AWGN-trained classifiers remain near chance level.
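Of M_QTC's three stages, quantile mapping is the easiest to show in isolation. The sketch below maps simulated amplitudes for one subcarrier through their empirical CDF onto the quantiles of measured amplitudes. The empirical-CDF convention and linear interpolation are assumptions, and the temporal filtering and copula reordering stages are omitted.

```python
import bisect

def quantile_map(values, source_ref, target_ref):
    """Map values through the empirical source CDF onto the target quantiles.

    source_ref: simulated (e.g. AWGN-perturbed) amplitudes for one subcarrier.
    target_ref: amplitudes measured on real hardware for the same subcarrier.
    """
    src = sorted(source_ref)
    tgt = sorted(target_ref)
    out = []
    for v in values:
        q = bisect.bisect_right(src, v) / len(src)  # empirical CDF of v, in [0, 1]
        pos = q * (len(tgt) - 1)                     # fractional index into target
        lo = int(pos)
        hi = min(lo + 1, len(tgt) - 1)
        out.append(tgt[lo] + (pos - lo) * (tgt[hi] - tgt[lo]))
    return out

print(quantile_map([0, 2, 4], source_ref=[1, 2, 3, 4], target_ref=[10, 20, 30, 40]))
# [10.0, 25.0, 40.0]
```

Because the mapping is monotone and learned from measurements, it can reproduce the multiplicative, gain-compressed amplitude distributions that no single additive noise variance can match.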
Motivated by the challenge of stabilizing a general unknown linear dynamical system (LDS) from observations, we study the natural prerequisite of online prediction. Our goal is to achieve sublinear regret with a memory footprint that adapts to the intrinsic complexity of the dynamics rather than the full hidden-state dimension. We focus on the practically central regime of systems with low instability complexity, that is, few eigenvalues outside the real stable interval that do not decay rapidly, together with non-semisimple modes, potentially embedded in an otherwise stable real spectrum of much higher dimension; we write $k$ for this count. This regime is the primary setting in which stabilization is plausible: we show that many systems with high instability complexity cannot be stabilized without exponentially large controls. Thus, prediction is meaningful for stabilization precisely when the instability complexity is small. Within this regime, we introduce a unified online algorithm that handles every LDS (including non-diagonalizable systems with complex or exploding modes) with a learnable parameter count of $\widetilde{O}(k)$. Finally, we prove a lower bound showing that $k$ is a valid complexity measure: any filter-based predictor needs at least $k$ filters. Experiments corroborate our theory: on a high-dimensional system, our predictor sharply outperforms prior methods at an equal parameter budget.
We study timestep allocation for score-based diffusion sampling, where a learned reverse-time dynamics is discretized on a finite grid. Uniform and hand-crafted schedules are standard choices, but they rely on fixed prescriptions and can therefore be suboptimal. To address this limitation, we propose Adaptive Reparameterized Time (ART), a continuous-time control formulation that learns a time change by treating the speed of the sampling clock as the control, so that a uniform grid on the learned clock induces adaptive timesteps in the original diffusion time. Based on a leading-order Euler error surrogate, ART provides a principled objective for allocating timesteps along the sampling trajectory. To solve this deterministic control problem, we introduce ART-RL, an auxiliary randomized formulation with Gaussian policies that turns schedule learning into a continuous-time reinforcement learning problem. We prove that the randomized ART-RL formulation is equivalent to ART at the optimizer level, in the sense that its optimal Gaussian policy recovers the optimal ART time-warping rate through its mean. We further establish policy evaluation and policy improvement characterizations and derive trajectory-based moment identities that yield implementable actor-critic updates for learning the schedule. Across experiments ranging from controlled low-dimensional settings to image generation, ART-RL can be plugged into existing diffusion samplers by changing only the timestep grid, consistently improving sample quality over strong baseline schedules at matched budgets while leaving the rest of the sampling pipeline unchanged. The learned schedules also exhibit broad generalization, transferring without retraining across sampling budgets, datasets, solvers, pipelines, and representation spaces.
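The way a uniform grid on the clock becomes adaptive timesteps can be sketched directly. Here a fixed, hypothetical `rate` function stands in for the learned time-warping rate (in ART-RL, the mean of the Gaussian policy), and the normalization that pins the endpoints to `t_max` and `t_min` is an assumption of this sketch.

```python
def warped_timesteps(rate, num_steps, t_max=1.0, t_min=1e-3, resolution=1000):
    """Map a uniform grid on the sampling clock u in [0, 1] to diffusion times.

    rate(u) > 0 is the clock speed. Elapsed diffusion time is taken proportional
    to the integral of rate, so t(0) = t_max and t(1) = t_min, and a uniform grid
    in u gives non-uniform steps in t wherever rate varies.
    """
    # cumulative integral of rate on a fine grid (trapezoidal rule)
    us = [i / resolution for i in range(resolution + 1)]
    cum = [0.0]
    for a, b in zip(us, us[1:]):
        cum.append(cum[-1] + 0.5 * (rate(a) + rate(b)) * (b - a))
    total = cum[-1]
    times = []
    for k in range(num_steps + 1):
        pos = k / num_steps * resolution
        i = min(int(pos), resolution - 1)
        frac = pos - i
        elapsed = (cum[i] + frac * (cum[i + 1] - cum[i])) / total
        times.append(t_max - (t_max - t_min) * elapsed)
    return times

# Hypothetical clock speed: fast at high noise, slowing down near the data.
print(warped_timesteps(lambda u: 3.0 - 2.0 * u, num_steps=8))
```

A constant rate recovers the uniform schedule, so the sampler itself needs no change: only the list of timesteps it is handed differs.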
Alzheimer's disease (AD) is a brain disorder that develops slowly and mainly affects memory, thinking, language, and daily activities. It is one of the most common causes of dementia and creates many difficulties for patients as well as their families. In the early stage, the symptoms are often mild and may resemble normal ageing. For this reason, many people are diagnosed late, when the disease has already progressed. At present, there is no complete cure for AD. Still, early detection can help doctors manage the condition better and take suitable steps at the right time. In this study, a machine learning model is proposed to detect the early stages of Alzheimer's disease using clinical details, neuropsychological test scores, and neuroimaging-related measures. The data used in this work are collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI). As the dataset has missing values, iterative imputation is applied to fill them. The dataset also has class imbalance, which is handled using Borderline SVM-SMOTE. After that, feature selection is carried out using wrapper-based and embedded methods so that only important features are used for training. The selected features are divided into training and testing sets, and feature scaling is applied. A stacking ensemble model is developed using Logistic Regression, Extra Trees, Bagging KNN, and LightGBM as base classifiers. In addition, an artificial neural network is trained on the same dataset. The performance of these models is compared using precision, recall, F1-score, and AUC-ROC. This study aims to find the best classifier and to identify important biomarkers that may help in the early diagnosis of Alzheimer's disease.
Instruction tuning for speech language models (SLMs) is substantially more challenging than for text-based large language models (LLMs), as it requires learning a new modality and a wide range of speech-specific instructions in addition to those supported by text LLMs. Existing SLM training approaches largely replicate the text LLM training paradigm by synthesizing large-scale speech pre-training and instruction-tuning datasets. However, this strategy is difficult to scale, since speech sequences are significantly longer than text sequences. In this paper, we propose SpeechCombine, an instruction-following speech language model trained without any instruction tuning, using only a single round of speech pre-training on 30k hours of data. Starting from a text LLM base model, we perform continued pre-training on speech utterances to obtain a speech-adapted model, and then directly combine its weights with the weight difference between the instruction-tuned and base versions of the text LLM. Our results show that this simple combination strategy not only preserves the knowledge and capabilities of the original text LLM, but also effectively transfers them to the speech domain. These findings suggest a new direction for SLM training that avoids reliance on massive speech data.
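The combination step described above amounts to adding the text LLM's instruction-tuning delta to the speech-adapted weights. A minimal sketch on plain Python lists follows. The `scale` knob and the rule that speech-only tensors (for example a hypothetical audio projection) are kept unchanged are assumptions; the description above corresponds to `scale=1.0`.

```python
def combine_weights(speech_adapted, text_base, text_instruct, scale=1.0):
    """Per tensor: speech_adapted + scale * (text_instruct - text_base).

    Parameters are dicts of name -> flat list of floats. Tensors that exist only
    in the speech-adapted model (e.g. new audio layers) are kept as-is.
    """
    merged = {}
    for name, w_speech in speech_adapted.items():
        if name not in text_base or name not in text_instruct:
            merged[name] = list(w_speech)
            continue
        if not (len(w_speech) == len(text_base[name]) == len(text_instruct[name])):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [ws + scale * (wi - wb) for ws, wi, wb
                        in zip(w_speech, text_instruct[name], text_base[name])]
    return merged

base = {"layer.w": [1.0, 2.0]}
instruct = {"layer.w": [1.5, 1.0]}
speech = {"layer.w": [1.2, 2.4], "audio_proj.w": [0.3]}
print(combine_weights(speech, base, instruct))
```

With real checkpoints the same arithmetic would be applied tensor by tensor with a deep learning framework; no gradient steps are involved, which is what removes the need for speech instruction data.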
For clinical deployment, it is essential that automated diagnostic systems remain reliable when confronted with previously unseen cases, yet deep models routinely misclassify out-of-distribution (OOD) inputs with high confidence, underscoring the need for more robust OOD detection methods. Although substantial effort has been devoted to improving model robustness, most of the existing literature assumes balanced datasets, evaluates OOD detection on coarse or non-clinical OOD sources, or lacks comprehensive assessment across diverse OOD scenarios. To address these gaps, we propose a novel methodology trained on diverse and imbalanced medical datasets and evaluated across a clinically reflective OOD spectrum. Our framework comprises three key components: (1) a Nonlinear von Mises-Fisher (NvMF) classifier capable of learning non-linear decision boundaries, with a theoretical proof of its asymptotic connection to cosine classifiers; (2) a multi-expert framework in which margin-aware NvMF classifiers specialise in different regions of the label distribution to better handle imbalance; and (3) an outlier expert trained explicitly to distinguish inlier from outlier data, thereby strengthening OOD detection. Evaluation on the RFMiD, ISIC2019, and NCTCRC datasets demonstrates consistent improvements over state-of-the-art methods, with mean FPR95 reductions of 8.45%, 13.02%, and 36.90%, respectively. These gains are further supported by comprehensive ablations that validate the contribution of each component. The framework thus enables reliable identification of unfamiliar cases for deferral to clinicians, supporting safer AI-assisted diagnosis in real-world workflows. Our code is available at this https URL.
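FPR95, the metric reported above, is the false-positive rate on OOD samples at the threshold that keeps 95% of in-distribution samples. A minimal sketch, assuming higher scores mean "more in-distribution" and using a simple tie convention:

```python
def fpr_at_tpr(id_scores, ood_scores, tpr_percent=95):
    """False-positive rate on OOD samples at the threshold that keeps
    tpr_percent% of in-distribution samples (higher score = more in-distribution)."""
    ranked = sorted(id_scores, reverse=True)
    # smallest k with k / n >= tpr_percent / 100, computed in integers
    k = max(1, (tpr_percent * len(ranked) + 99) // 100)
    threshold = ranked[k - 1]
    false_pos = sum(1 for s in ood_scores if s >= threshold)
    return false_pos / len(ood_scores)

print(fpr_at_tpr([10, 11, 12], [1, 2]))                        # 0.0
print(fpr_at_tpr(list(range(1, 101)), list(range(1, 101))))   # 0.95
```

A perfectly separating detector scores 0.0, while a detector whose OOD scores match the in-distribution ones scores 0.95; lower is better.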
Narration is central to the audiobook listening experience, shaping how listeners engage with and understand the content. This work explores how narration qualities shape an audiobook's appeal, noting that their effects can vary by genre, title, and audience. We extract vocal and acoustic features (e.g., tone, pace, loudness) from LibriVox using pre-trained audio models and analyse their relationship with consumption data (specifically, view-rate) and their interplay with genre and title. Despite limited consumption data, we find that acoustic information alone has a robust association with appeal, even after accounting for title effects. We further validate these findings using more nuanced proprietary engagement metrics. To our knowledge, this is the first systematic computational study linking narration qualities, genre, title, and audiobook consumption, highlighting the potential of data-driven insights to improve audiobook personalisation and narrator casting.
This paper presents QuadRocket, a quadrotor-based rocket prototype that provides a low-cost, low-risk platform for validating advanced thrust-vector control strategies for launch-vehicle-type systems. The prototype consists of a cylindrical main body mounted on top of a quadrotor through a universal joint, forming a flying inverted pendulum with non-negligible inertia. For control design, the coupled system is modeled as a single axisymmetric rigid body actuated by a vectored force applied along its longitudinal axis. A reduced-attitude representation on the two-sphere is adopted to explicitly exploit the vehicle's axial symmetry and to decouple yaw from the thrust-vector direction. On this model, we derive an adaptive backstepping controller that achieves almost global trajectory tracking in the presence of unknown constant disturbances, while a control-point transformation mitigates non-minimum-phase behavior. The quadrotor is then treated as a thrust-vector actuator, and a dynamic-surface-based attitude controller is designed to track the desired thrust vector, accounting for actuation dynamics and avoiding explicit differentiation of virtual control signals. The complete architecture is evaluated in simulation and validated experimentally in an indoor motion-capture arena. Results demonstrate accurate trajectory tracking and effective disturbance compensation, and confirm the suitability of QuadRocket as a versatile testbed for thrust-vector-controlled robotic vehicles.
Fetal brain tissue segmentation from magnetic resonance imaging (MRI) is crucial for studying neurodevelopment, but remains challenging due to data heterogeneity and limited annotations. Domain randomization (DR) has recently emerged as a promising strategy for single-source domain generalization by synthesizing training images with randomized artifacts, contrast, and resolution. In this work, we investigate how to maximize the out-of-domain (OOD) generalization of DR-based methods. We evaluate several synthetic data generation strategies for DR, with a particular focus on our recently proposed framework, FetalSynthSeg. We show that simple Gaussian mixture-based intensity modeling outperforms more complex physics-based simulations, and that intensity clustering (subdividing tissue classes based on intensity) improves OOD robustness. Evaluated on 348 fetal subjects from four sites spanning 0.55-3 T and both T1w and T2w contrasts, FetalSynthSeg reaches state-of-the-art performance on several FeTA 2024 testing datasets (80-85 Dice score) and, for the first time, offers robust fetal brain segmentation on contrasts other than T2w (80 Dice on the dHCP-T1w dataset). Compared with state-of-the-art methods such as BOUNTI, nnU-Net ensemble, and the FeTA 2024 winner, FetalSynthSeg delivers comparable or superior accuracy while maintaining strong robustness across domain shifts. Our code, model weights, and a Docker image ready for easy inference are available at this https URL.
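Gaussian mixture-based intensity modeling with intensity clustering can be sketched as follows: each (tissue label, intensity cluster) subclass receives its own randomly drawn Gaussian, so every new seed yields a new contrast for the same anatomy. The 1-D voxel list, subclass names, and parameter ranges are hypothetical, and the artifact, bias-field, and resolution randomization that a full DR pipeline adds are omitted.

```python
import random

def synth_intensities(subclass_map, seed=0, mean_range=(0.0, 1.0), std_range=(0.01, 0.1)):
    """Sample a synthetic image from a per-voxel subclass map.

    Each subclass (tissue label, intensity cluster) gets its own Gaussian with a
    randomly drawn mean and standard deviation, so every call with a new seed
    produces a new random contrast for the same anatomy.
    """
    rng = random.Random(seed)
    params = {sc: (rng.uniform(*mean_range), rng.uniform(*std_range))
              for sc in sorted(set(subclass_map))}
    return [rng.gauss(*params[sc]) for sc in subclass_map]

# Hypothetical 1-D "image": the cortex label is split into two intensity clusters.
voxels = [("csf", 0), ("cortex", 0), ("cortex", 1), ("wm", 0), ("cortex", 0)]
print(synth_intensities(voxels, seed=1))
```

Splitting one tissue label into several clusters lets the generator produce intensity variation within a tissue, while the segmentation target remains the original label.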
As global life expectancy increases, so does the burden of chronic diseases, yet individuals exhibit considerable variability in the rate at which they age. Identifying biomarkers that distinguish fast from slow ageing is crucial for understanding the biology of ageing, enabling early disease detection, and improving prevention strategies. Using contrastive deep learning, we show that skin biopsy images alone are sufficient to determine an individual's age. We then use visual features in histopathology slides of the skin biopsies to construct a novel biomarker of ageing. By linking with comprehensive health registers in Denmark, we demonstrate that visual features in histopathology slides of skin biopsies predict mortality and the prevalence of chronic age-related diseases. Our work highlights how routinely collected health data can provide additional value when used together with deep learning, by creating a new biomarker for ageing that can be used to predict mortality over time.
In this work, we address the challenge of approximating unknown system dynamics and cost functions through a Koopman-based Inverse Optimal Control (IOC) framework. Using optimal trajectories, a modified Extended Dynamic Mode Decomposition with control (EDMDc) constructs a bilinear control system in lifted coordinates. Pontryagin's Maximum Principle (PMP) conditions are then derived, revealing structural similarities to the inverse Linear Quadratic Regulator (LQR) problem. This allows tractable cost recovery without resorting to nonlinear IOC formulations. The bilinear representation also inherits the analytical advantages of linear systems. Simulation and robotic experiments validate the approach, showing accurate estimation of both dynamics and costs, and illustrating its potential for general control and modeling applications.
Authorization systems increasingly rely on processing radio frequency (RF) waveforms at receivers to fingerprint (i.e., determine the identity of) the corresponding transmitter. Federated learning (FL) has emerged as a popular paradigm for RF fingerprinting in networks with multiple access points (APs), as it enables effective deep learning-based device identification without requiring the centralization of the RF signals collected and stored locally at each AP. Yet, FL algorithms that operate merely on in-phase and quadrature (I/Q) time samples converge slowly, resulting in excessive training rounds and long training times. In this work, we propose FLAME: an FL approach for multi-modal RF fingerprinting. Our framework simultaneously represents received RF waveforms in multiple complementary modalities beyond I/Q samples in an effort to reduce training times. We theoretically demonstrate the feasibility and efficiency of our methodology and derive a convergence bound showing lower loss, and thus higher accuracy, at the same training round compared with single-modal FL-based RF fingerprinting. Extensive empirical evaluations validate our theoretical results and demonstrate the superiority of FLAME over multiple baselines.
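The abstract does not list which modalities FLAME uses. As an illustration under that caveat, the sketch below derives three common views of one complex baseband capture: raw I/Q pairs, amplitude and phase, and a DFT magnitude spectrum. The modality choice and the naive DFT are assumptions for readability.

```python
import cmath
import math

def rf_modalities(iq):
    """Derive several complementary views of one complex baseband capture."""
    n = len(iq)
    # naive O(n^2) DFT magnitude; an FFT would be used in practice
    spectrum = [abs(sum(iq[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
                for k in range(n)]
    return {
        "iq": [(s.real, s.imag) for s in iq],                # raw I/Q samples
        "amp_phase": [(abs(s), cmath.phase(s)) for s in iq],  # polar form
        "spectrum": spectrum,                                 # frequency-domain magnitude
    }

capture = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]
views = rf_modalities(capture)
print(views["spectrum"])
```

Each view could feed its own encoder branch in a federated model, so every client would contribute gradients from several representations of the same local signals.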
Model-free and reinforcement learning-based adaptive filtering methods are gaining traction for denoising in dynamic, non-stationary environments such as wireless signal channels, biomedical monitoring, and sensor networks. Traditional filters such as LMS, RLS, Wiener, and Kalman are often limited by assumptions of stationarity, the need for exact noise statistics, or fragile parameter tuning. This paper proposes an adaptive filtering framework using Proximal Policy Optimization (PPO), guided by a composite reward that balances SNR improvement, MSE reduction, and residual smoothness. We frame adaptive filtering as a Markov decision process and train a PPO agent to adjust filter coefficients directly in response to changing noise. Experiments on synthetic non-stationary signals with diverse noise types show that the PPO agent generalizes beyond its training distribution. We further evaluate the approach on real ECG recordings from the MIT-BIH Noise Stress Test Database corrupted by baseline wander, electrode motion, and muscle artifacts. The learned PPO policy achieves real-time inference and slightly outperforms strong classical baselines on ECG denoising. These results demonstrate the viability of policy-gradient reinforcement learning as a computationally efficient and flexible tool for adaptive filtering in nonlinear, time-varying dynamical systems.
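A composite reward of this kind could be written as below. The weights, the dB form of the SNR term, and the reading of "residual" as the estimation error (filtered minus clean, available for synthetic training signals) are all assumptions, since the abstract does not give the exact formula.

```python
import math

def snr_db(clean, estimate):
    """SNR of `estimate` relative to the clean reference, in dB."""
    err = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    power = sum(c * c for c in clean)
    return 10.0 * math.log10(power / max(err, 1e-12))

def composite_reward(clean, noisy, filtered, w_snr=1.0, w_mse=1.0, w_smooth=0.1):
    """Weighted SNR gain + MSE reduction - roughness of the residual error."""
    n = len(clean)

    def mse(est):
        return sum((c - e) ** 2 for c, e in zip(clean, est)) / n

    snr_gain = snr_db(clean, filtered) - snr_db(clean, noisy)
    mse_drop = mse(noisy) - mse(filtered)
    residual = [f - c for f, c in zip(filtered, clean)]
    roughness = sum((b - a) ** 2 for a, b in zip(residual, residual[1:])) / (n - 1)
    return w_snr * snr_gain + w_mse * mse_drop - w_smooth * roughness

clean = [math.sin(0.3 * t) for t in range(50)]
noisy = [c + (0.2 if t % 2 else -0.2) for t, c in enumerate(clean)]
print(composite_reward(clean, noisy, clean) > composite_reward(clean, noisy, noisy))  # True
```

An agent that simply passes the noisy input through earns a slightly negative reward here, so the policy is pushed toward actually removing noise rather than doing nothing.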
We propose data-driven decentralized control algorithms for stabilizing interconnected discrete-time linear time-invariant systems. We first derive a data-driven condition to synthesize a local controller that ensures the dissipativity of each local subsystem. Then, we propose data-driven decentralized stability conditions for the global system based on the dissipativity of each local system. Since both conditions take the form of linear matrix inequalities grounded in dissipativity theory, they combine into a unified pipeline, resulting in a data-driven decentralized control algorithm. As a special case, we also consider systems interconnected through diffusive coupling and propose a control algorithm for stabilizing them. We validate the effectiveness and scalability of the proposed control algorithms through numerical examples on microgrids.
Accurate channel estimation is essential for both high-rate communication and high-precision sensing in 6G wireless systems. However, calibration mismatches in phased-array antennas operated under real-world conditions are a major source of performance limitation. To address this issue, we propose to integrate antenna element self-calibration into a variational sparse Bayesian learning (VSBL) algorithm for parametric channel estimation. We model antenna gain and phase deviations as latent variables and derive explicit update equations to jointly infer these calibration parameters and the channel parameters: the number of multipath components (MPCs), their complex amplitudes, delays, and angles of arrival (AoA), as well as the noise variance. We assess the algorithm's performance in terms of the optimal subpattern assignment (OSPA) metric, demonstrating consistent improvements over conventional VSBL without calibration. Furthermore, we show that integrating the estimation of the calibration parameters into the VSBL algorithm actually increases convergence speed, since a missing or wrong calibration results in the additional estimation of spurious components.
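The OSPA metric jointly penalizes localization error and a wrong number of components, which is why spurious MPCs hurt the score. A minimal brute-force sketch of the standard definition follows; it is only practical for small sets (the Hungarian algorithm is used in practice), and the cutoff `c`, order `p`, and the assumption that delay and AoA are pre-scaled to comparable units are illustrative choices.

```python
import itertools
import math

def ospa(X, Y, c=1.0, p=2):
    """OSPA distance between two finite point sets, via brute-force assignment.

    c is the cutoff that also penalizes cardinality mismatch; p is the order.
    """
    if not X and not Y:
        return 0.0
    if len(X) > len(Y):
        X, Y = Y, X  # ensure m = |X| <= n = |Y|
    m, n = len(X), len(Y)
    # best assignment of the m points in X to distinct points in Y
    best = min(
        sum(min(c, math.dist(x, Y[j])) ** p for x, j in zip(X, perm))
        for perm in itertools.permutations(range(n), m)
    )
    # unassigned points in Y each cost the full cutoff c
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)

# Hypothetical MPCs as (delay, AoA) pairs, already scaled to comparable units.
truth = [(1.0, 0.2), (2.5, -0.4)]
estimate = [(1.1, 0.2), (2.5, -0.4), (4.0, 0.9)]  # one spurious component
print(ospa(estimate, truth, c=1.0))
```

The spurious third component contributes the full cutoff penalty, so an estimator that avoids such components, as calibration-aware VSBL is reported to do, directly lowers the OSPA value.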
Advances in miniaturised implantable neural electronics have paved the way for therapeutic brain-computer interfaces with clinical potential for movement disorders, epilepsy, and broader neurological applications. This paper presents a mixed-signal analogue front end (AFE) designed to simultaneously record both extracellular action potentials (EAPs) and local field potentials (LFPs). The feedforward path integrates a low-noise amplifier (LNA) and a successive-approximation-register (SAR) analogue-to-digital converter (ADC), while the feedback path employs a fixed-point infinite-impulse-response (IIR) Chebyshev Type II low-pass filter to suppress sub-mHz components via bulk-voltage control of the LNA input differential pair using two R-2R pseudo-resistor digital-to-analogue converters (DACs). The proposed AFE offers a low-power (LP) mode and an offset-cancellation high-performance (HP) mode. It achieves 40.55 dB gain and supports neural recording from 0.1 Hz to 5.705 / 9.66 kHz (LP / HP), with typical input-referred noise of 3.9 / 6.615 µVrms in the LFP band and 11.42 / 11.11 µVrms in the EAP band (LP / HP). Its typical power per channel is 5.44 µW (LP) and 11.35 µW (HP), and it occupies 0.198 mm².
Autonomous driving requires reliable collision avoidance in dynamic environments. Nonlinear Model Predictive Controllers (NMPCs) are suitable for this task but struggle in time-critical scenarios that require high control frequencies. To meet this demand, the optimization problems are often simplified via linearization, shortened horizons, or fewer temporal nodes, each compromising accuracy or reliability. This work presents the first general convex obstacle avoidance formulation, enabled by a novel approach to integrating logic. This facilitates the incorporation of obstacle avoidance into convex MPC schemes, yielding a convex optimization framework with substantially improved computational efficiency relative to conventional nonconvex methods. A key property of the formulation is that obstacle avoidance remains effective even when obstacles lie outside the prediction horizon, allowing shorter horizons for real-time deployment. In scenarios where nonconvex formulations are unavoidable, the proposed method meets or exceeds the performance of representative nonconvex alternatives. The method is evaluated in autonomous vehicle applications, where the system dynamics are highly nonlinear.
Background: Smartphone-based dermatology requires inter-device colorimetric reliability that holds across calibration regimes, yet quantitative multi-device benchmarks remain scarce. Materials and Methods: We analyzed matched facial images of 965 Korean subjects captured with a digital single-lens reflex (DSLR) camera, a consumer tablet, and a consumer smartphone, and evaluated two calibration methods against the DSLR reference: standard global linear Color Correction Matrix (CCM) normalization and region-specific CCM trained per anatomical region, both applied in Commission Internationale de l'Éclairage L*a*b* (CIELAB) space. Results: Linear CCM reduced inter-device color differences by 61-74% and placed both the Melanin Index (MI; intraclass correlation coefficient [ICC] = 0.80) and the Individual Typology Angle (ITA; ICC = 0.78) in the good reliability band. Region-specific CCM raised both indices into the excellent reliability band (MI ICC = 0.95, ITA ICC = 0.93), and anatomical region exceeded source device as the largest pre-calibration variance contributor (analysis-of-variance $\eta^2 = 0.18$ versus 0.12). Conclusion: Consumer-device skin colorimetry achieves clinically useful inter-device reliability with standard calibration, and region-aware calibration offers the largest remaining improvement.
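A minimal sketch of the two calibration variants and the ITA index, assuming NumPy, a plain 3×3 linear CCM fitted by ordinary least squares on matched CIELAB samples, and hypothetical region labels; the paper's actual fitting procedure, sampling of skin patches, and any affine offset term are not reproduced here. ITA uses its standard definition, arctan((L* − 50)/b*) in degrees, and the color difference shown is CIE76 ΔE*ab (Euclidean distance in CIELAB), which may not be the exact metric the authors report.

```python
import numpy as np


def fit_ccm(src_lab: np.ndarray, ref_lab: np.ndarray) -> np.ndarray:
    """Least-squares 3x3 CCM M such that src_lab @ M approximates ref_lab.

    Rows are matched CIELAB samples (consumer device vs. DSLR reference)."""
    M, *_ = np.linalg.lstsq(src_lab, ref_lab, rcond=None)
    return M


def fit_region_ccms(src_lab, ref_lab, regions) -> dict:
    """Fit one CCM per (hypothetical) anatomical region label."""
    regions = np.asarray(regions)
    return {r: fit_ccm(src_lab[regions == r], ref_lab[regions == r])
            for r in np.unique(regions)}


def apply_region_ccms(src_lab, regions, ccms: dict) -> np.ndarray:
    """Correct each sample with the CCM fitted for its own region."""
    regions = np.asarray(regions)
    out = np.empty(src_lab.shape, dtype=float)
    for r, M in ccms.items():
        mask = regions == r
        out[mask] = src_lab[mask] @ M
    return out


def delta_e76(lab1, lab2) -> np.ndarray:
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return np.linalg.norm(np.asarray(lab1) - np.asarray(lab2), axis=-1)


def ita_degrees(lab) -> np.ndarray:
    """Individual Typology Angle: arctan((L* - 50) / b*) expressed in degrees."""
    lab = np.asarray(lab, dtype=float)
    return np.degrees(np.arctan((lab[..., 0] - 50.0) / lab[..., 2]))
```

A global CCM is simply `fit_region_ccms` with a single label for every sample; the region-specific variant spends more parameters to absorb region-dependent differences.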
In recent years, text-to-audio generation has achieved remarkable progress, offering sound creators powerful tools to turn textual inspiration into vivid audio. However, existing models predominantly operate directly in the acoustic latent space of a Variational Autoencoder (VAE), which often leads to suboptimal alignment between generated audio and textual descriptions. In this paper, we introduce SemanticAudio, a novel framework that performs both audio generation and editing directly in a high-level semantic space. We define this semantic space as a compact representation that captures the global identity and temporal sequence of sound events, as distinct from fine-grained acoustic details. SemanticAudio employs a two-stage Flow Matching architecture: the Semantic Planner first generates these compact semantic features to sketch the global semantic layout, and the Acoustic Synthesizer then produces high-fidelity acoustic latents conditioned on this semantic plan. Leveraging this decoupled design, we further introduce a training-free, text-guided editing mechanism that enables precise attribute-level modifications of general audio without retraining. This is achieved by steering the semantic generation trajectory with the difference between the velocity fields derived from the source and target text prompts. Extensive experiments demonstrate that SemanticAudio surpasses existing mainstream approaches in semantic alignment. A demo is available at this https URL.
In this paper, we propose a projection-free power-limiting droop control for grid-connected power electronics and an associated constrained flow problem. In contrast to projection-based power-limiting droop control, the proposed projection-free control yields networked dynamics that are semi-globally exponentially stable with respect to the set of optimizers of the constrained flow problem. Under a change to edge coordinates, the overall networked dynamics arising from projection-free power-limiting droop control coincide with the projection-free primal-dual dynamics associated with an augmented Lagrangian of the constrained flow problem. Leveraging this result, we (i) provide a bound on the convergence rate of the projection-free networked dynamics, (ii) propose a method for tuning the controller parameters to improve this bound, and (iii) analyze the relationship between the bound and the connectivity of the network. Finally, the analytical results are illustrated using an electromagnetic transient (EMT) simulation.
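To illustrate what projection-free primal-dual dynamics of an augmented Lagrangian look like, here is a minimal scalar sketch, not the paper's networked droop model: one device with a hypothetical preferred setpoint of 2 (cost f(x) = (x − 2)²/2) and a power limit g(x) = x − 1 ≤ 0. It assumes a Rockafellar-type augmented Lagrangian L_ρ(x, λ) = f(x) + (max(0, λ + ρg(x))² − λ²)/(2ρ), whose gradient flow never needs to project λ onto λ ≥ 0, and integrates it with forward Euler; ρ, the step size, and the iteration count are assumptions chosen for this toy problem.

```python
def aug_lagrangian_pd(grad_f, g, grad_g, x0, lam0, rho=1.0, step=0.05, iters=4000):
    """Forward-Euler integration of the projection-free primal-dual flow for
    min f(x) s.t. g(x) <= 0, based on the augmented Lagrangian
        L_rho(x, lam) = f(x) + (max(0, lam + rho*g(x))**2 - lam**2) / (2*rho):
        x'   = -dL/dx   = -grad_f(x) - max(0, lam + rho*g(x)) * grad_g(x)
        lam' = +dL/dlam = (max(0, lam + rho*g(x)) - lam) / rho
    No projection onto lam >= 0 is required."""
    x, lam = float(x0), float(lam0)
    for _ in range(iters):
        mult = max(0.0, lam + rho * g(x))  # smoothed multiplier estimate
        dx = -grad_f(x) - mult * grad_g(x)
        dlam = (mult - lam) / rho
        x, lam = x + step * dx, lam + step * dlam
    return x, lam


if __name__ == "__main__":
    # Toy device: preferred output 2.0, limited to at most 1.0 (hypothetical numbers).
    # The KKT point is x = 1 (limit active) with multiplier lam = 1.
    x, lam = aug_lagrangian_pd(lambda x: x - 2.0, lambda x: x - 1.0, lambda x: 1.0, 0.0, 0.0)
    print(round(x, 6), round(lam, 6))
```

When the limit is not binding (for example g(x) = x − 3), the multiplier term stays at zero and x simply settles at the unconstrained optimum.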
We analyze signal recovery when samples are taken concomitantly from a signal and its Fourier transform. This two-sided sampling framework extends classical one-sided reconstruction and is particularly useful when measurements in either domain alone are insufficient because of sensing, storage, or bandwidth constraints. We formulate the resulting recovery problem in finite-dimensional spaces and reproducing kernel Hilbert spaces, and illustrate the infinite-dimensional setting in a Fourier-symmetric Sobolev space. Numerical experiments with sinc- and Hermite-based schemes indicate that, under a fixed sampling budget, two-sided sampling often yields better-conditioned systems than one-sided approaches. A simplified spectrum-monitoring example further demonstrates improved reconstruction when limited time samples are supplemented with frequency-domain information.
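The finite-dimensional version of two-sided sampling can be sketched with NumPy: stack the rows of the identity that correspond to the chosen time samples on the rows of the unitary DFT matrix that correspond to the chosen frequency samples, solve the resulting linear system by least squares, and inspect its condition number. The sample index sets are hypothetical, and this does not reproduce the paper's sinc- or Hermite-based schemes or its RKHS setting. For N = 8, the even-indexed time samples together with frequency bins 0 to 3 give a full-rank system, because a signal supported on the odd indices whose first four DFT bins vanish must be zero.

```python
import numpy as np


def two_sided_operator(n, time_idx, freq_idx):
    """Rows of the identity (time samples) stacked on rows of the unitary DFT (frequency samples)."""
    F = np.fft.fft(np.eye(n), norm="ortho")  # unitary DFT matrix; symmetric, so row k is bin k
    return np.vstack([np.eye(n)[list(time_idx)], F[list(freq_idx)]])


def recover(x, time_idx, freq_idx):
    """Sample x in both domains, then reconstruct it by least squares."""
    A = two_sided_operator(len(x), time_idx, freq_idx)
    y = A @ x                                # noiseless two-sided measurements
    x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x_hat, np.linalg.cond(A)


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal(8)
    x_hat, kappa = recover(x, time_idx=[0, 2, 4, 6], freq_idx=[0, 1, 2, 3])
    print(np.allclose(x_hat, x), kappa)
```

Comparing `kappa` across different index sets with the same total number of samples is the finite-dimensional analogue of the conditioning comparison described above.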
Transcranial ultrasound stimulation (TUS) offers non-invasive deep-brain neuromodulation with high spatial precision, but reliably generating complex multi-target acoustic fields through the skull remains challenging. Here, we introduce a physics-aware hologram technique that directly generates fabrication-ready holographic implementations while preserving consistency between numerical field synthesis and physical acoustic realization. The method enables single-, dual-, and tri-focal transcranial stimulation patterns and was validated through in silico simulations, ex vivo skull measurements, and in vivo experiments. Compared with representative state-of-the-art methods, the proposed approach improved focal reconstruction and energy confinement at intended targets while reducing off-target acoustic leakage. In a neuropathic-pain mouse model, simultaneous bilateral stimulation of thalamic nuclei reduced c-Fos expression and showed preliminary improvements in pain-related behavioral responses. These findings support the use of fabrication-consistent holographic design for spatially localized and reproducible multi-target transcranial neuromodulation.
Realistic modeling of scattering from curved metallic bodies, such as vehicles and roadside structures, is essential for cellular and vehicular channel modeling as well as for radar applications. A practical approach is to approximate curved surfaces with planar facets and apply ray tracing with diffraction methods; however, accuracy depends critically on both the geometric discretization and the diffraction modeling. This work investigates ray-tracing-based modeling of near-field scattering from curved bodies in both the backscattering and the forward (shadow) region. In the ray-tracing tool, diffraction is modeled according to the Uniform Theory of Diffraction (UTD), extended with vertex diffraction and double-bounce interactions, including a heuristic combination of edge and vertex diffraction. A discretization strategy linking facet size to local curvature and wavelength is proposed to balance geometric fidelity, diffraction modeling, and efficiency. Validation is first performed against analytical solutions and full-wave simulations for canonical geometries (sphere and circular cylinder). The practical applicability of the approach is then demonstrated for a realistic vehicle by comparison with bistatic measurements in the backscattering region and with full-wave simulation in the shadow region. The results show that no universal discretization strategy exists: fine meshes benefit backscattering prediction, while coarser discretizations can provide more efficient and accurate shadow-region prediction. The proposed extended diffraction framework offers a computationally efficient tool for vehicular propagation and integrated sensing and communication (ISAC) channel modeling.
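One simple way to tie facet size to local curvature and wavelength is a sagitta (chord-height) tolerance: choose the widest planar facet whose maximum deviation from a circular arc of radius R stays below a fraction of the wavelength. The sketch below is an assumed rule of thumb, not the paper's discretization criterion, and the λ/16 default is hypothetical; given that coarser meshes can pay off in the shadow region, the tolerance would in practice depend on which region is being predicted.

```python
import math


def sagitta(radius: float, chord: float) -> float:
    """Maximum gap between a chord of the given length and its circular arc."""
    return radius - math.sqrt(radius ** 2 - (chord / 2) ** 2)


def max_facet_chord(radius: float, wavelength: float, sag_fraction: float = 1 / 16) -> float:
    """Widest planar facet (chord) on a surface with local curvature radius `radius`
    whose sagitta stays within sag_fraction * wavelength (assumed tolerance)."""
    tol = sag_fraction * wavelength
    if tol >= radius:
        return 2 * radius  # even a full-diameter chord meets the tolerance
    # Invert s = R - sqrt(R^2 - (c/2)^2) for the chord length c.
    return 2 * math.sqrt(2 * radius * tol - tol ** 2)


if __name__ == "__main__":
    wavelength = 299_792_458 / 28e9  # free-space wavelength at 28 GHz, in metres
    for R in (0.05, 0.5, 5.0):       # illustrative curvature radii in metres
        print(R, max_facet_chord(R, wavelength))
```

Flat regions (R → ∞) impose no curvature limit, so another criterion, such as a cap tied to diffraction modeling or computational budget, would have to bound facet size there.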
Large, spatially flexible electricity consumers such as data centers can reallocate demand across locations, influencing dispatch and prices in wholesale electricity markets. While flexible load is often assumed to improve system efficiency, this intuition typically relies on price-taking behavior. We study price-anticipatory spatial load shifting by modeling a large flexible consumer as a Stackelberg leader interacting with DC optimal power flow (DC-OPF) based market clearing. We show that decentralized, cost-minimizing load shifting need not align with system operating cost minimization, and that misalignment arises at boundaries between DC-OPF operating regimes, where small changes in load can induce discrete changes in marginal generators or congestion patterns. We evaluate strategic load shifting on the 73-bus RTS-GMLC test system; the results show reduced system operating cost in most hours, but misalignment in a subset of cases driven by redispatch at merit-order discontinuities. These outcomes are primarily redistributive relative to a price-taking benchmark: they reduce generator profits while lowering electricity procurement costs for both flexible and inflexible consumers, even in cases where total system operating cost increases.
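Merit-order discontinuities can be seen even in a single-bus economic dispatch without network constraints: the clearing price is the marginal cost of the last generator needed to serve load, so it jumps when load crosses a capacity boundary. This sketch ignores the DC-OPF network, congestion, and the Stackelberg leader's optimization, and the generator costs and capacities are illustrative values, not RTS-GMLC data.

```python
def merit_order_price(load, generators):
    """Single-bus economic dispatch: fill load in order of marginal cost.

    generators: iterable of (marginal_cost, capacity) pairs.
    Returns (clearing_price, dispatch); the price is the marginal cost of the
    last (marginal) unit needed to serve the load."""
    remaining = load
    dispatch = []
    for cost, cap in sorted(generators):
        p = min(cap, max(remaining, 0.0))
        dispatch.append((cost, p))
        remaining -= p
        if remaining <= 0:
            return cost, dispatch
    raise ValueError("load exceeds total generating capacity")


if __name__ == "__main__":
    gens = [(30.0, 50.0), (20.0, 100.0), (45.0, 80.0)]  # illustrative ($/MWh, MW)
    # A 2 MW change in load around 100 MW moves the price from 20.0 to 30.0.
    print(merit_order_price(99.0, gens)[0], merit_order_price(101.0, gens)[0])  # → 20.0 30.0
```

In a DC-OPF setting, line limits add further boundaries of this kind, where a small load shift changes which lines congest and therefore the nodal prices.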
Quasi-bimodal objects, such as text, road signs, and barcodes, play a basic yet vital role in daily visual communication. By reducing these objects to clear silhouettes, binarization uses a minimal language to convey essential visual cues for maximum downstream efficiency, especially for tasks that require simple geometric or topological reasoning rather than heavy appearance modeling. The catch is that frame-based imaging often struggles on mobile platforms such as drones, self-driving cars, and underwater vehicles, where rapid motion causes severe motion blur and harsh lighting washes out scene details. To overcome these physical limits, neuromorphic vision via event cameras, featuring microsecond time resolution and high dynamic range, offers a natural solution. Building on this event-driven paradigm, we propose a simple yet effective dual-modal approach that harnesses the synergy between frames and events for training-free, real-time, high-frame-rate binarization on CPU-only devices. Extensive evaluations show that it achieves competitive performance against leading techniques in reducing blur artifacts and delivers substantial improvements under challenging illumination at a lower computational cost. Moreover, its asynchronous nature bypasses the long-standing event-scarcity issues that break traditional time-binning reconstruction at fixed time slots, maintaining clear target shapes even at extreme kilohertz frame rates. Its binary results further serve as reliable representations for a range of downstream tasks. This work paves the way toward lightweight perception and interaction in embodied intelligence on resource-constrained edge platforms.
Recent speech-aware large language models (Speech-LLMs) rely on pre-trained speech encoders to convert audio into semantically and acoustically rich representations that the LLM can consume. In this work, we instead ask: can an LLM learn to read Mel spectrograms directly, without a dedicated speech encoder? We propose Mel-LLM, an encoder-free Speech-LLM that feeds lightly pre-processed Mel-spectrogram patches directly into the LLM through a linear projection, allowing the LLM to learn speech-text alignment purely through its own parameters. We focus on speech understanding tasks, including automatic speech recognition (ASR), spoken QA, and audio understanding. For ASR, we evaluate on the OpenASR Leaderboard public sets and in production-level scaling experiments, demonstrating that the encoder-free solution achieves competitive performance with only limited degradation compared with encoder-initialized counterparts. We find that when data is limited, initialization from a multimodal checkpoint (Phi-4-MM) is crucial for maintaining performance. We also present ablation studies suggesting which LLM layers are most involved in speech adaptation. Beyond ASR, we extend Mel-LLM to general speech/audio understanding tasks, revealing an acoustic-semantic trade-off: directly exposing the LLM to Mel-spectrogram input improves paralinguistic and non-ASR acoustic tasks, while knowledge-intensive spoken QA still lags behind encoder-anchored systems. We additionally include a text-to-speech (TTS) proof of concept with a next-token VAE decoder, showing that direct Mel generation is possible but still trails stronger latent-diffusion generation.
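The encoder-free input path can be sketched in three steps: light pre-processing of a Mel spectrogram, grouping consecutive frames into non-overlapping patches, and a single linear projection to the LLM's embedding width. In this NumPy sketch the log compression with per-utterance standardization, the 4-frame patch size, and the randomly initialized `LinearProjector` are all assumptions; in Mel-LLM the projection would be trained together with the LLM, and the paper's exact pre-processing and patch size are not specified here.

```python
import numpy as np


def preprocess(mel_power: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Assumed light pre-processing: log compression, then per-utterance,
    per-Mel-bin standardization. Input shape: (frames, n_mels)."""
    logmel = np.log(mel_power + eps)
    return (logmel - logmel.mean(axis=0)) / (logmel.std(axis=0) + eps)


def mel_patches(mel: np.ndarray, patch_frames: int = 4) -> np.ndarray:
    """Concatenate each run of `patch_frames` consecutive frames into one vector:
    (frames, n_mels) -> (frames // patch_frames, patch_frames * n_mels).
    Trailing frames that do not fill a whole patch are dropped."""
    n_frames, n_mels = mel.shape
    usable = (n_frames // patch_frames) * patch_frames
    return mel[:usable].reshape(usable // patch_frames, patch_frames * n_mels)


class LinearProjector:
    """Hypothetical linear map from patch vectors to the LLM embedding width.
    Randomly initialized here; it would be learned jointly with the LLM."""

    def __init__(self, in_dim: int, d_model: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, d_model))
        self.b = np.zeros(d_model)

    def __call__(self, patches: np.ndarray) -> np.ndarray:
        return patches @ self.W + self.b  # one input embedding ("token") per patch
```

With 80 Mel bins and 4-frame patches, each embedding covers four frames, so the LLM sees a sequence four times shorter than the frame sequence.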
A key challenge of speaker de-identification is balancing privacy and utility. Many utility variables, such as the speaker's cognitive health status, are correlated with privacy variables such as speaker identity. This violates the independence assumption of disentanglement-based approaches, causing leakage of private information and loss of information useful for downstream tasks. To tackle this challenge, we propose DDPO-VC, a general framework for speaker de-identification through reinforcement learning-based post-training of diffusion models. Learning from reward signals that combine knowledge from privacy-focused and utility-focused teachers, our method outperforms various strong de-identification methods in both privacy preservation and cognitive utility on two commonly used dementia speech benchmarks. Our code and demo are available at this https URL and this https URL.
This paper introduces a low-complexity technique named quasi-direct geolocation (QDG) that performs passive radio-frequency (RF) geolocation of emitters directly in the position domain, akin to direct geolocation (DG) and direct position determination (DPD). The proposed technique drastically reduces the complexity of DG/DPD and is experimentally demonstrated by geolocating a terrestrial jammer at Jammertest 2025 from a repurposed satellite in low Earth orbit (LEO), OPS-SAT PRETTY. The goal of QDG is to enable satellites to contribute to a multi-constellation system for RF interference (RFI) monitoring as opportunistic spectrum sensors in global navigation satellite system (GNSS) bands, even when they are constrained in size, weight, and power (SWaP). Such satellites can serve as data collectors and/or edge computers. In the former case, QDG can compress large volumes of I/Q samples into minimal signal information that can be relayed to the ground for post-processing via low-capacity downlinks. In the latter case, QDG can compute the geolocation of RFI sources in orbit on low-power on-board computers (OBCs). The price of these capabilities is lower sensitivity and accuracy than DG/DPD, plus limitations on the types of signal sources that can be geolocated, which nonetheless include the most common GNSS jammers.
Non-invasive prediction of glioma molecular status from routine magnetic resonance imaging (MRI) has shown promising performance, but model generalization remains challenging given the small scale of matched imaging-genomic datasets. Foundation models may address this bottleneck, but a comprehensive benchmark is needed to establish the impact of diverse architectures, pre-training domains, and objectives. Using isocitrate dehydrogenase (IDH) mutation prediction from FLAIR and post-contrast T1 MRIs as the use case, we compared four image-based foundation models, BrainIAC, MRI-CORE, BiomedCLIP, and BrainDINO, against radiomics-based TabPFN and logistic regression baselines. Prediction performance and calibration were assessed across four public adult glioma cohorts and an external post-treatment cohort. Within-cohort, TabPFN matched or outperformed all visual encoders, achieving 0.92 (0.03) AUROC and 0.74 (0.17) AUPRC (mean (SD) across all datasets). Among visual encoders, BiomedCLIP performed best (0.85 (0.08) AUROC), with BrainDINO competitive (0.82 (0.09) AUROC), while MRI-specific encoders (BrainIAC, MRI-CORE) consistently underperformed. Cross-cohort transfer showed moderate AUROC degradation but stronger AUPRC sensitivity to prevalence shifts. On the external cohort, BiomedCLIP achieved the highest AUROC (0.74 (0.07)), whereas TabPFN provided superior calibration (Expected Calibration Error 0.07 (0.01)). These results indicate that representation modality and evaluation context critically influence foundation-model performance in MRI-based molecular prediction. Tabular foundation models on radiomic features provide a strong, well-calibrated baseline, while image foundation models may offer complementary value under clinically distinct distribution shifts. Code is available at this https URL.
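Expected Calibration Error, the calibration measure quoted for the external cohort, is commonly computed by binning predictions by confidence and averaging the gap between accuracy and mean confidence, weighted by bin size. A minimal NumPy sketch for a binary task such as IDH-mutant versus wild-type, assuming 10 equal-width bins over the confidence of the predicted label; the binning scheme used in the paper is not specified here, and ECE values depend on that choice.

```python
import numpy as np


def expected_calibration_error(p_pos, labels, n_bins: int = 10) -> float:
    """Binary ECE with equal-width bins over the confidence of the predicted label.

    p_pos: predicted probability of the positive class (e.g. IDH-mutant).
    labels: ground-truth labels in {0, 1}."""
    p_pos = np.asarray(p_pos, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred = (p_pos >= 0.5).astype(int)
    conf = np.where(pred == 1, p_pos, 1.0 - p_pos)  # confidence of the predicted label
    correct = (pred == labels).astype(float)
    # Bin index in [0, n_bins - 1]; a confidence of exactly 1.0 goes to the last bin.
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in np.unique(bins):
        in_bin = bins == b
        # weight = fraction of samples in the bin; gap = |accuracy - mean confidence|
        ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)
```

Because ECE ignores ranking, a model can have the best AUROC and still be worse calibrated than another, which is the pattern reported for BiomedCLIP and TabPFN on the external cohort.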
This paper investigates dynamic scheduling for flexible manufacturing systems (FMSs) subject to dynamic events, such as new order arrivals, temporary order cancellations, and machine failures. Traditional methods often face significant challenges in achieving real-time responsiveness under such conditions. To address this issue, the scheduling problem is formulated as a Markov decision process (MDP) with timed Petri nets, where the future evolution of the system depends exclusively on the current marking and the subsequently executed transitions, independent of historical trajectories. The state space and action space of the MDP are constructed using the notion of basis reachability graph (a compact state space representation) of Petri nets to alleviate the state explosion problem, thereby accelerating model training convergence. Meanwhile, a hierarchical dense reward function is constructed by integrating stepwise guidance with terminal evaluation. Then, a multi-agent proximal policy optimization algorithm is employed for model training under the centralized training and decentralized execution paradigm to improve scheduling efficiency. Numerical experiments are conducted involving typical dynamic events, and the results demonstrate that the proposed method can effectively handle dynamic events and achieve superior scheduling performance compared with conventional approaches.
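The Markov property described above follows from the Petri-net firing rule: the next marking depends only on the current marking M and the fired transition t, M' = M − Pre[:, t] + Post[:, t], where t is enabled when M ≥ Pre[:, t] componentwise. A minimal NumPy sketch of an untimed place/transition net, in which the enabled-transition mask could act as an RL action mask; the one-machine net is hypothetical, and neither timing nor the basis reachability graph construction is modeled.

```python
import numpy as np


class PetriNet:
    """Place/transition net given by Pre and Post incidence matrices (places x transitions)."""

    def __init__(self, pre, post):
        self.pre = np.asarray(pre, dtype=int)
        self.post = np.asarray(post, dtype=int)

    def action_mask(self, marking) -> np.ndarray:
        """Transition t is enabled iff marking >= Pre[:, t] componentwise."""
        m = np.asarray(marking, dtype=int)
        return np.all(m[:, None] >= self.pre, axis=0)

    def enabled(self, marking) -> list:
        return np.flatnonzero(self.action_mask(marking)).tolist()

    def fire(self, marking, t: int) -> np.ndarray:
        """Firing rule M' = M - Pre[:, t] + Post[:, t]; depends only on (M, t)."""
        if not self.action_mask(marking)[t]:
            raise ValueError(f"transition {t} is not enabled")
        return np.asarray(marking, dtype=int) - self.pre[:, t] + self.post[:, t]


# Hypothetical one-machine cell. Places: 0 job waiting, 1 machine idle,
# 2 job in process, 3 job done. Transitions: 0 start job, 1 finish job.
PRE = [[1, 0], [1, 0], [0, 1], [0, 0]]
POST = [[0, 0], [0, 1], [1, 0], [0, 1]]
```

Starting from marking [2, 1, 0, 0], only "start" is enabled; firing it yields [1, 0, 1, 0], after which only "finish" is enabled.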
Multimodal large language models (MLLMs) have emerged as a promising approach for improving the accuracy, transferability, and explainability of automatic dementia classification (ADC) systems from voice recordings. Yet it remains unclear whether their reasoning capabilities are beneficial for ADC, and how such capabilities should be leveraged. In this paper, we conduct a careful evaluation of reasoning MLLMs for ADC and show that naive strategies, such as relying on text-based rationales, can lead to hallucinated and inconsistent rationales for diagnosis and yield inferior ADC performance compared with LLM-free baselines. To overcome this limitation, we propose Dementia Thinker with Nonlinear Adaptor and Reinforcement Learning (DeTAiL), an adaptor-based framework that exploits the internal representations of reasoning MLLMs for improved dementia classification. Across two dementia datasets with distinct test formats and label granularities, DeTAiL consistently outperforms strong baselines and methods that rely on text-based rationales. Code and demo will be released upon acceptance.
Mental health problems such as stress, anxiety, and depression affect millions of people worldwide. These conditions are usually assessed using questionnaires, which rely on how people describe their own feelings. In this study, we explore whether a wearable device can help measure mental health using physical signals from the body. The device records small changes in blood flow and tissue activity from the fingertip. We collected data from 132 adults across 19 countries and compared these signals with mental health questionnaire results. We found that patterns in blood flow and tissue activity are linked to stress-related symptoms. This approach may help develop new tools for simple, non-invasive mental health monitoring in everyday life. Code and datasets are publicly available: this https URL
Text-to-speech (TTS) for Modern Hebrew is challenged by the language's orthographic complexity, with existing solutions ignoring underspecified phonetic features such as stress. We present a framework for more phonetically accurate Hebrew TTS with four contributions: (1) Phonikud, an open-source Hebrew grapheme-to-phoneme (G2P) system that outputs fully-specified International Phonetic Alphabet (IPA) transcriptions, designed by augmenting a base diacritizer. (2) The ILSpeech corpus of paired Hebrew audio, text, and expert IPA annotations. (3) A benchmark for the previously unmeasured task of Hebrew G2P conversion. (4) Hebrew audio-to-IPA models capturing previously disregarded phonetic details for automatic TTS evaluation. Our results show that Phonikud more accurately predicts Hebrew phonemes than prior methods, and that small, local TTS models with phonetic input from Phonikud approach large proprietary systems. We release our code, data, and models at this https URL.
Cardiovascular diseases (CVDs) are the leading cause of death worldwide, accounting for approximately 17.9 million deaths each year. Early detection is critical, creating a demand for accurate and inexpensive pre-screening methods. Deep learning has recently been applied to classify abnormal heart sounds indicative of CVDs using synchronised phonocardiogram (PCG) and electrocardiogram (ECG) signals, as well as multichannel PCG (mPCG). However, state-of-the-art architectures remain underutilised due to the limited availability of synchronised and multichannel datasets. Augmented datasets and pre-trained models provide a pathway to overcome these limitations, enabling transformer-based architectures to be trained effectively. This work combines traditional signal processing with denoising diffusion models, WaveGrad and DiffWave, to create an augmented dataset for fine-tuning a Wav2Vec 2.0-based classifier on multimodal and multichannel heart sound datasets. The approach achieves state-of-the-art performance. On the Computing in Cardiology (CinC) 2016 dataset of single-channel PCG, accuracy, unweighted average recall (UAR), sensitivity, specificity and Matthews correlation coefficient (MCC) reach 92.48%, 93.05%, 93.63%, 92.48%, 94.93% and 0.8283, respectively. Using the synchronised PCG and ECG signals of the training-a dataset from CinC, 93.14%, 92.21%, 94.35%, 90.10%, 95.12% and 0.8380 are achieved for accuracy, UAR, sensitivity, specificity and MCC, respectively. Using a wearable vest dataset consisting of mPCG data, the model achieves 77.13% accuracy, 74.25% UAR, 86.47% sensitivity, 62.04% specificity, and 0.5082 MCC. These results demonstrate the effectiveness of transformer-based models for CVD detection when supported by augmented datasets, highlighting their potential to advance multimodal and multichannel heart sound classification.
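For reference, the reported metrics can be computed from a binary confusion matrix as sketched below. In the two-class case UAR is the mean of sensitivity and specificity, which is consistent with the wearable-vest figures ((86.47% + 62.04%)/2 ≈ 74.25%). A minimal sketch, assuming the abnormal class is treated as positive; the counts used for checking are hypothetical.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Screening metrics from a binary confusion matrix (abnormal = positive)."""
    sensitivity = tp / (tp + fn)  # recall on abnormal recordings
    specificity = tn / (tn + fp)  # recall on normal recordings
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    uar = (sensitivity + specificity) / 2  # unweighted average recall over both classes
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0  # Matthews correlation coefficient
    return {"accuracy": accuracy, "uar": uar, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}
```

Unlike accuracy, UAR and MCC are not inflated by class imbalance, which matters when normal recordings outnumber abnormal ones.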
Earth's gravity fundamentally shapes human behaviour. The brain encodes this force as an internal model of gravity, enabling the prediction and interpretation of gravitational effects during perception and action. Understanding how this model adapts to altered gravity is critical for predicting human performance in spaceflight. We present a computational framework for modelling neurophysiological adaptation across diverse gravitational environments. The framework has two components trained on open-access data from altered-gravity studies, particularly parabolic flights. The first component (CorticalG) employs a lightweight multilayer perceptron to predict gravity-dependent changes in EEG frequency bands, estimating cortical state under different gravitational loads. The second component (PhysioG) uses independent Gaussian process models to capture broader physiological responses, including heart rate variability, electrodermal activity, and motor control. To complement the quantitative modelling, we simulated subjective experience across gravitational environments using the large language model (LLM) Claude 3.5 Sonnet. The physiological outputs were used to prompt the model to generate narratives describing alertness, bodily awareness, and cognitive state in zero gravity, the partial gravity of the Moon and Mars, and hypergravity. This framework provides a novel approach for investigating human adaptation to spaceflight, offering a predictive tool to assess performance and resilience and to support the design of future space exploration missions.
To study power system resilience with real data, it is necessary to group individual power outages recorded by utilities into events in which outages cluster and overlap due to extreme weather. We show how to automatically group utility outage data into resilience events based on their time and location. Each outage is represented as a cylinder in three-dimensional space, with a disk centered at the outage location in the geographic plane and a vertical extent corresponding to a limited outage duration, so that two outages overlap in time and space if their cylinders intersect. The grouping algorithm can be implemented as a graph whose nodes are the outages and whose edges represent the overlaps of outages in time and space, so that events are the connected components of the graph. Extending time-based grouping to both time and location is particularly useful when extracting events from outage data collected across a wide area, as it prevents unrelated outages from being incorrectly merged into anomalous events solely due to temporal overlap. We propose a metric to tune the parameters of the grouping algorithm to minimize anomalous events. The grouping of outages into events works with both detailed utility outage data and web-scraped EAGLE-I outage data. Results are validated against NOAA storm event records and DOE-417 reports. The automatically extracted events from utility data closely match documented major weather events.
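The grouping rule can be sketched directly: two outages are linked when their cylinders intersect, meaning their disks overlap (centre distance at most twice the radius, assuming equal radii) and their duration-capped time intervals overlap; events are then the connected components of the resulting graph, found here with union-find. This assumes projected planar coordinates in kilometres and a single radius and duration cap; the class, function, and parameter names are hypothetical, and the pairwise loop is O(n²), so a spatial or temporal index would be needed for large outage datasets.

```python
import math
from dataclasses import dataclass


@dataclass
class Outage:
    x_km: float        # projected easting (assumed planar coordinates)
    y_km: float        # projected northing
    start_h: float     # outage start time in hours
    duration_h: float  # recorded outage duration in hours


def cylinders_intersect(a, b, radius_km, max_duration_h):
    """Equal-radius disks overlap and duration-capped time intervals overlap."""
    if math.hypot(a.x_km - b.x_km, a.y_km - b.y_km) > 2 * radius_km:
        return False
    a_end = a.start_h + min(a.duration_h, max_duration_h)
    b_end = b.start_h + min(b.duration_h, max_duration_h)
    return a.start_h <= b_end and b.start_h <= a_end


def group_events(outages, radius_km, max_duration_h):
    """Events = connected components of the overlap graph (union-find)."""
    parent = list(range(len(outages)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(outages)):
        for j in range(i + 1, len(outages)):
            if cylinders_intersect(outages[i], outages[j], radius_km, max_duration_h):
                parent[find(i)] = find(j)
    events = {}
    for i in range(len(outages)):
        events.setdefault(find(i), []).append(i)
    return sorted(events.values())
```

Capping the duration keeps a single very long outage from acting as a bridge that merges otherwise unrelated outages into one anomalous event.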
General-purpose object detectors face fundamental structural limitations when applied to ship detection in satellite imagery, where the ship scale distribution is concentrated at small sizes and high aspect ratios. In conventional You Only Look Once architectures, the deepest feature pyramid level (stride 32) compresses narrow vessels into sub-pixel representations, causing severe spatial feature dilution and compromising accurate ship boundary regression. We propose Less is More YOLO, a streamlined detector built upon the extra-large variant of YOLOv9, to address these domain-specific structural conflicts. From a statistical analysis of ship scale distributions across four major benchmarks (SODA-A, DOTA-v1.5, FAIR1M-v2.0, and ShipRSImageNet), we introduce a Pyramid Level Shift Strategy that shifts the detection head from strides 8, 16, and 32 to strides 4, 8, and 16. This shift satisfies a spatial representability condition derived from the Nyquist-Shannon principle for the narrowest targets, while eliminating the computational redundancy of the deepest pyramid level. To further stabilize training on high-resolution satellite inputs, we incorporate a group-normalized composite-backbone projection module, mitigating gradient instability in memory-constrained micro-batch regimes. Validated on these four datasets, our detector attains an mAP50:95 of 0.600 with only 21.16 million parameters, a 64.1% reduction from the extra-large YOLOv9 baseline (58.99 million). Despite this compact size, our model surpasses state-of-the-art detectors up to three times larger, validating that a well-targeted pyramid level shift achieves a "Less is More" balance between accuracy and efficiency. The code is available at this https URL.
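The spatial representability argument reduces to simple arithmetic: a target of width w pixels spans w/s cells on a feature map of stride s, and a Nyquist-style reading asks for at least two cells across the narrowest target. The threshold of 2 and the example widths are assumptions for illustration; the paper derives its own condition from the ship scale statistics of the four benchmarks.

```python
def cells_across(target_px: float, stride: int) -> float:
    """Feature-map cells spanned by a target of width target_px at a given stride."""
    return target_px / stride


def admissible_strides(min_width_px: float, strides=(4, 8, 16, 32), min_cells: float = 2.0) -> list:
    """Strides whose feature maps still place at least `min_cells` samples across
    the narrowest target (assumed Nyquist-style threshold of two samples)."""
    return [s for s in strides if cells_across(min_width_px, s) >= min_cells]


if __name__ == "__main__":
    print(admissible_strides(40))  # → [4, 8, 16]
    print(admissible_strides(10))  # → [4]
```

Under this reading, a 40-pixel-wide vessel already falls below two cells at stride 32, which is the kind of case the pyramid level shift is meant to avoid.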
Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in generalizing across diverse robotic manipulation tasks. However, deploying these models in unstructured environments remains challenging due to the critical need for simultaneous task compliance and safety assurance, particularly in preventing potential collisions during physical interactions. In this work, we introduce a Vision-Language-Safe Action (VLSA) architecture, named AEGIS, which contains a plug-and-play safety constraint (SC) layer formulated via control barrier functions. AEGIS integrates directly with existing VLA models to improve safety with theoretical guarantees, while maintaining their original instruction-following performance. To evaluate the efficacy of our architecture, we construct a comprehensive safety-critical benchmark SafeLIBERO, spanning distinct manipulation scenarios characterized by varying degrees of spatial complexity and obstacle intervention. Extensive experiments demonstrate the superiority of our method over state-of-the-art baselines. Notably, AEGIS achieves over 50% improvement in obstacle avoidance rate while substantially increasing the task success rate by nearly 10%. All benchmark datasets, code, and supplementary materials are publicly available at this https URL.
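A control-barrier-function safety filter minimally modifies a nominal command so that a barrier h stays nonnegative. For a single-integrator point (ẋ = u) and one spherical obstacle with h(x) = ‖x − c‖² − r², the quadratic program min ‖u − u_nom‖² subject to ∇h(x)ᵀu + αh(x) ≥ 0 has the closed-form solution below. This is a textbook CBF-QP sketch under those simplifying assumptions, not AEGIS's SC layer, which must handle the robot's actual dynamics and all obstacles (typically via a numerical QP solver).

```python
import numpy as np


def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Closed-form CBF-QP for a single integrator x' = u and one spherical obstacle.

    Barrier h(x) = ||x - center||^2 - radius^2 (safe set: h >= 0).
    Solves min ||u - u_nom||^2  s.t.  grad_h(x) . u + alpha * h(x) >= 0."""
    x = np.asarray(x, dtype=float)
    u_nom = np.asarray(u_nom, dtype=float)
    center = np.asarray(center, dtype=float)
    d = x - center
    h = float(d @ d) - radius ** 2
    grad_h = 2.0 * d  # zero only at the obstacle centre, which is unsafe anyway
    residual = float(grad_h @ u_nom) + alpha * h
    if residual >= 0.0:
        return u_nom  # nominal command already satisfies the constraint
    # Project u_nom onto the half-space {u : grad_h . u >= -alpha * h}.
    return u_nom - residual / float(grad_h @ grad_h) * grad_h
```

Because the filter leaves safe commands untouched, it preserves the nominal policy's behaviour away from obstacles, which is the property a plug-and-play safety layer relies on.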
Variational Quantum Algorithms (VQAs) are a class of hybrid quantum-classical algorithms that leverage classical optimization tools to find the optimal parameters of a parameterized quantum circuit. One relevant application of VQAs is the Variational Quantum Eigensolver (VQE), which aims to steer the output of the quantum circuit to the ground state of a given Hamiltonian. Recent works have provided global convergence guarantees for VQEs under suitable local surjectivity and smoothness hypotheses, but little has been done to characterize the convergence of these algorithms when the underlying quantum circuit is affected by noise. In this work, we derive an upper bound on the error in the optimal parameters of a VQE under different coherent and incoherent noise processes. We then show robust convergence guarantees of the algorithm to the perturbed optimal parameters. Our work provides novel theoretical insight into the behavior of VQAs subject to noise. We complement our results with numerical simulations implemented in PennyLane.
We address the problem of reactive motion planning for quadrotors operating in unknown environments with dynamic obstacles. Our approach leverages a 4-dimensional spatio-temporal planner, integrated with vision-based Safe Flight Corridor (SFC) generation and trajectory optimization. Unlike prior methods that rely on map fusion, our framework is mapless, enabling collision avoidance directly from perception while reducing computational overhead. Dynamic obstacles are detected and tracked using a vision-based object segmentation and tracking pipeline, allowing robust classification of static versus dynamic elements in the scene. To further enhance robustness, we introduce a backup planning module that reactively avoids dynamic obstacles when no direct path to the goal is available, mitigating the risk of collisions during deadlock situations. We validate our method extensively in both simulation and real-world hardware experiments, and benchmark it against state-of-the-art approaches, showing significant advantages for reactive UAV navigation in dynamic, unknown environments.
In Formula 1, race strategies are adapted according to evolving race conditions and competitors' actions. This paper proposes a reinforcement learning approach for multi-agent race strategy optimization. Agents learn to balance energy management, tire degradation, aerodynamic interaction, and pit-stop decisions. Building on a pre-trained single-agent policy, we introduce an interaction module that accounts for the behavior of competitors. The combination of the interaction module and a self-play training scheme generates competitive policies, and agents are ranked based on their relative performance. Results show that the agents adapt pit timing, tire selection, and energy allocation in response to opponents, achieving robust and consistent race performance. Because the framework relies only on information available during real races, it can support race strategists' decisions before and during races.
Accurate inter-vehicle distance estimation is a cornerstone of Advanced Driver Assistance Systems (ADAS) and autonomous driving. While LiDAR and radar provide high precision, their high cost prohibits widespread adoption in mass-market vehicles. Monocular camera-based estimation offers a low-cost alternative but suffers from fundamental scale ambiguity. Recent deep learning methods for monocular depth achieve impressive results yet require expensive supervised training, suffer from domain shift, and produce predictions that are difficult to certify for safety-critical deployment. This paper presents a framework that exploits the standardized typography of United States license plates as passive fiducial markers for metric ranging, resolving scale ambiguity through explicit geometric priors without any training data or active illumination. First, a four-method parallel plate detector achieves robust plate reading across the full automotive lighting range. Second, a three-stage state identification engine fusing optical character recognition text matching, multi-design color scoring, and a lightweight neural network classifier provides robust identification across all ambient conditions. Third, hybrid depth fusion with inverse-variance weighting and online scale alignment, combined with a one-dimensional constant-velocity Kalman filter, delivers smoothed distance, relative velocity, and time-to-collision for collision warning. Baseline validation on a controlled static dataset reproduces a 2.3% coefficient of variation in character height measurements and a 36% reduction in distance-estimate variance compared with plate-width methods from prior work.
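Three of the geometric steps can be sketched with NumPy: pinhole ranging from a known character height (Z = f·H/h), inverse-variance fusion of several range estimates, and a one-dimensional constant-velocity Kalman filter that yields smoothed distance, range rate, and time-to-collision. The focal length, character height, noise parameters, and class name are hypothetical inputs; the sketch assumes a roughly fronto-parallel plate and omits the paper's plate detector, state identification, and online scale alignment.

```python
import numpy as np


def pinhole_distance(focal_px: float, char_height_m: float, char_height_px: float) -> float:
    """Range from a known physical character height via the pinhole model Z = f * H / h."""
    return focal_px * char_height_m / char_height_px


def inverse_variance_fuse(estimates, variances):
    """Fuse independent estimates; returns (fused value, fused variance)."""
    w = 1.0 / np.asarray(variances, dtype=float)
    z = np.asarray(estimates, dtype=float)
    return float(np.sum(w * z) / np.sum(w)), float(1.0 / np.sum(w))


class ConstantVelocityKF:
    """1-D Kalman filter with state [distance (m), range rate (m/s)]."""

    def __init__(self, d0: float, var_d0: float = 1.0, var_v0: float = 10.0, accel_var: float = 0.5):
        self.x = np.array([d0, 0.0])
        self.P = np.diag([var_d0, var_v0])
        self.accel_var = accel_var  # assumed white-acceleration process noise

    def step(self, z: float, meas_var: float, dt: float) -> np.ndarray:
        F = np.array([[1.0, dt], [0.0, 1.0]])
        G = np.array([[0.5 * dt ** 2], [dt]])
        Q = self.accel_var * (G @ G.T)  # discrete white-acceleration model
        H = np.array([[1.0, 0.0]])
        # Predict.
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q
        # Update with the (fused) distance measurement z.
        S = (H @ self.P @ H.T).item() + meas_var
        K = (self.P @ H.T) / S  # Kalman gain, shape (2, 1)
        self.x = self.x + K[:, 0] * (z - (H @ self.x).item())
        self.P = (np.eye(2) - K @ H) @ self.P
        return self.x

    def time_to_collision(self) -> float:
        """Distance divided by closing speed; infinite if the gap is not shrinking."""
        d, v = self.x
        return float(d / -v) if v < 0 else float("inf")
```

A typical loop would fuse the per-frame range estimates with `inverse_variance_fuse`, pass the fused value and its variance to `step`, and raise a warning when `time_to_collision()` drops below a chosen threshold.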
Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes in the design of network entities, interfaces, and procedures. The adoption of agentic AI in next-generation networks is expected to enhance network intelligence and autonomy through agents capable of planning, reasoning, and acting, while also opening up new business opportunities. Under this vision, existing network functions are expected to evolve into AI-enabled agents and tools that deliver both connectivity and beyond-connectivity services. As an initial step toward this vision, this paper presents an agentic-AI-based tool interface design and an experimental prototype for the mobile core network, built on two foundational protocols: the Model Context Protocol (MCP) and the Agent2Agent (A2A) protocol. MCP is used for the interface between agents and network tools, and A2A is used for message exchange between AI agents. Using this prototype, we analyze packet-level message flows among the agents, tools, and network functions, and break down the latency of end-to-end operations, from the moment the prompt is submitted until the input task is completed. This work demonstrates how an AI agent-based core network combined with network-specific tools can execute intent-based tasks in next-generation mobile systems.
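MCP carries tool invocations as JSON-RPC 2.0 messages, and `tools/call` is the method MCP defines for invoking a tool. The sketch below builds such a request and adds a small helper for the kind of per-stage latency breakdown described above. The tool name `get_ue_context`, its argument, the stage labels, and the helper functions are hypothetical illustrations, not the paper's prototype. A2A messages are not shown.

```python
import itertools
import json

_request_ids = itertools.count(1)

def mcp_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 request that invokes an MCP tool via 'tools/call'."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

def latency_breakdown(events):
    """Given ordered (label, timestamp_s) events, return per-stage and end-to-end latency in ms."""
    stages = [
        (f"{a} -> {b}", (t1 - t0) * 1000.0)
        for (a, t0), (b, t1) in zip(events, events[1:])
    ]
    total_ms = (events[-1][1] - events[0][1]) * 1000.0
    return stages, total_ms

# Hypothetical tool exposed by a core-network function
request = mcp_tool_call("get_ue_context", {"supi": "imsi-001010000000001"})
print(json.dumps(request))
```

Timestamping each hop, for example when the prompt is issued, when the agent emits the A2A message, when the MCP request leaves, and when the task completes, allows `latency_breakdown` to attribute end-to-end delay to individual stages.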
This paper studies the continuous-time dynamics generated by control-theoretic Lagrangian methods for equality-constrained optimization. In particular, we consider dynamics induced by proportional-integral and feedback linearization controllers, which have recently been proposed as alternatives to primal-dual gradient methods. Unlike existing global convergence results for these dynamics, which rely on strong convexity of the objective function or on boundedness assumptions, we exploit the geometric structure induced by the constraints. Specifically, we show global exponential convergence for non-convex problems that satisfy a suitable convexity property when restricted to the constraint manifold.
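One common form of the proportional-integral (PI) Lagrangian dynamics treats the constraint violation h(x) as the error signal and sets λ = k_p h(x) + z, with z' = k_i h(x) and x' = -∇f(x) - J_h(x)ᵀλ. This form is an assumption here, and the paper's exact dynamics may differ. The sketch integrates these dynamics with forward Euler on a toy problem. The objective f(x) = x₁² - x₂² is non-convex, but on the constraint manifold {x₂ = 0} it reduces to x₁², which is strongly convex. For this toy problem, the linearized x₂ dynamics are stable only when k_p > 2 and k_i > 0.

```python
import numpy as np

def pi_lagrangian_flow(grad_f, h, jac_h, x0, kp, ki, dt=1e-2, steps=3000):
    """Forward-Euler simulation of PI-controlled Lagrangian dynamics:
    lambda = kp * h(x) + z,  z' = ki * h(x),  x' = -grad f(x) - J_h(x)^T lambda."""
    x = np.asarray(x0, dtype=float)
    z = np.zeros(len(np.atleast_1d(h(x))))  # integral state, one per constraint
    for _ in range(steps):
        hx = np.atleast_1d(h(x))
        lam = kp * hx + z
        x = x + dt * (-grad_f(x) - jac_h(x).T @ lam)
        z = z + dt * ki * hx
    return x, kp * np.atleast_1d(h(x)) + z

# Toy problem: non-convex f, but convex on the manifold {x2 = 0}
grad_f = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
h = lambda x: np.array([x[1]])
jac_h = lambda x: np.array([[0.0, 1.0]])

x_star, lam_star = pi_lagrangian_flow(grad_f, h, jac_h, [1.0, 1.0], kp=4.0, ki=1.0)
print(x_star, lam_star)  # approaches x = (0, 0), lambda = 0
```

A plain primal-dual gradient flow on this problem (k_p = 0) would leave the x₂ direction unstable. This illustrates why the proportional term matters when the objective is non-convex off the manifold.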
We study the problem of determining a matrix whose $k$th multiplicative compound, with $k > 1$, is a prescribed matrix $M$. The cardinality of the set of matrices whose $k$th multiplicative compound equals $M$ is characterized in terms of $\operatorname{rank}(M)$. On the one hand, if $\operatorname{rank}(M)\le 1$, we show that infinitely many such matrices exist, and we give a complete characterization of them. On the other hand, if $\operatorname{rank}(M)>1$, there exists a unique matrix, up to an overall sign, whose compound is $M$. An algorithm for finding a matrix whose compound equals $M$ is detailed, and its time complexity is analyzed.
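For reference, the $k$th multiplicative compound of $A$ is the matrix of all $k \times k$ minors of $A$, with row and column index sets ordered lexicographically. The sketch below computes it directly and checks several standard properties: $C_k(AB) = C_k(A)\,C_k(B)$ (Cauchy–Binet), $C_k(-A) = (-1)^k C_k(A)$, and $\operatorname{rank} C_k(A) = \binom{\operatorname{rank} A}{k}$. The last identity suggests why $\operatorname{rank}(M) \le 1$ is a natural dividing line: every matrix of rank less than $k$ has a zero $k$th compound. This is a brute-force illustration, not the paper's inversion algorithm.

```python
from itertools import combinations

import numpy as np

def multiplicative_compound(A, k):
    """k-th multiplicative compound: all k x k minors of A, with row and
    column index sets in lexicographic order."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    rows = list(combinations(range(m), k))
    cols = list(combinations(range(n), k))
    C = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            C[i, j] = np.linalg.det(A[np.ix_(r, c)])
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C2 = multiplicative_compound(A, 2)
# For even k, A and -A share the same compound: the sign ambiguity in the uniqueness result
print(np.allclose(C2, multiplicative_compound(-A, 2)))
```

The double loop costs $\binom{m}{k}\binom{n}{k}$ determinant evaluations, so this direct construction is practical only for small matrices.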
Generative models have shown promising results for speech enhancement (SE), but they often rely on multi-step inference, limiting low-latency deployment. We propose SB-RF, a one-step generative framework that integrates Rectified Flow (RF) with Schrödinger Bridge (SB) theory. During training, SB-RF samples intermediate states from an SB time marginal and trains a conditional velocity field with the RF velocity-matching objective. At inference, SB-RF starts from the noisy observation and applies a single Euler update. Experiments show that SB-RF achieves competitive performance among generative methods on the VoiceBank-DEMAND benchmark. To further assess performance beyond this standard setting, we evaluate SB-RF on a simulated low signal-to-noise ratio test set using an expanded training dataset. Under these conditions, SB-RF outperforms the compared baselines, supporting its potential for real-world applications.
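The training and inference steps described above can be illustrated with a minimal numpy sketch. Several simplifications are assumptions that do not come from the abstract. The SB time marginal is replaced by a plain Brownian bridge with clean speech at t = 0 and the noisy observation at t = 1. The velocity-matching target is the straight-line velocity y - x₀. `v_model` stands in for the trained network. The "oracle" field at the end only checks that the single Euler step inverts the straight path; it is not a trainable model.

```python
import numpy as np

def sample_bridge_state(x_clean, y_noisy, t, sigma, rng):
    """Sample x_t from a Brownian-bridge marginal between clean (t=0) and noisy (t=1) endpoints."""
    mean = (1.0 - t) * x_clean + t * y_noisy
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(np.shape(x_clean))

def rf_velocity_loss(v_model, x_clean, y_noisy, sigma, rng):
    """Velocity matching: regress v(x_t, t, y) onto the straight-line velocity y - x_clean."""
    t = rng.uniform(0.0, 1.0)
    x_t = sample_bridge_state(x_clean, y_noisy, t, sigma, rng)
    target = y_noisy - x_clean
    return float(np.mean((v_model(x_t, t, y_noisy) - target) ** 2))

def one_step_enhance(v_model, y_noisy):
    """Single Euler step from t=1 (noisy observation) to t=0 with step size 1."""
    return y_noisy - 1.0 * v_model(y_noisy, 1.0, y_noisy)

rng = np.random.default_rng(0)
clean = rng.standard_normal(16)
noisy = clean + 0.3 * rng.standard_normal(16)
oracle = lambda x_t, t, y: y - clean  # ideal velocity field, for checking only
enhanced = one_step_enhance(oracle, noisy)
```

A real system would minimize `rf_velocity_loss` over a neural `v_model` conditioned on the noisy spectrogram. The closer the learned field is to constant along each path, the less a single Euler step loses compared with multi-step sampling.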
Executable evaluation -- checking the consequences of an agent's actions with a program rather than grading its prose -- has become a prominent way to assess tool-using AI agents in software settings. Electric power engineering has so far lacked an analogous benchmark: language-model use is still dominated by retrieval and text question answering, while agents acting on power-system artifacts remain mostly academic prototypes. We introduce the Power Systems Agent Benchmark, an executable benchmark for power-engineering agents. An agent receives a structured task and returns a structured solution; a deterministic evaluator recomputes the engineering quantities, checks operational constraints, and returns a feasibility flag, a normalized score, and explicit violations. The benchmark contains 41 task families across eight areas of power engineering, from power flow and protection to stability, microgrids, reliability, power quality, and forecasting. Each task is grounded in a citable source, standard, or documented engineering formulation. To resist contamination, held-out cases are synthesized on demand by per-family generators from private seeds: the construction is inspectable, but the instances remain private. In a reference evaluation with three command-line agents, the strongest agents score near the compact tier's ceiling, a smaller open model trails, and public and held-out performance are broadly consistent; a separate grid on the public split, run with OpenCode and Aider, probes harness effects. The reference evaluation doubles as quality control: unanimous failures flag candidate task or evaluator defects, and it exposed a latent evaluator bug missed by self-consistency checks. The evaluators are compact deterministic surrogates, but the task contract allows their internals to be upgraded to simulator-backed checks without changing how tasks are posed or solved.
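To make the task contract concrete, here is a hypothetical evaluator for a simple economic-dispatch task. It recomputes quantities from the agent's answer, checks generator limits and power balance, and returns a feasibility flag, a normalized score, and explicit violations. The task schema, field names, linear cost model, and scoring rule (optimal cost divided by agent cost) are all illustrative assumptions, not the benchmark's actual tasks or evaluators.

```python
from dataclasses import dataclass, field

@dataclass
class EvalResult:
    feasible: bool
    score: float
    violations: list = field(default_factory=list)

def merit_order_cost(gens, demand):
    """Optimal cost for linear-cost dispatch with every unit online at >= pmin.
    Assumes sum(pmin) <= demand <= sum(pmax)."""
    dispatch = [g["pmin"] for g in gens]
    remaining = demand - sum(dispatch)
    for i in sorted(range(len(gens)), key=lambda i: gens[i]["cost"]):
        add = min(remaining, gens[i]["pmax"] - gens[i]["pmin"])
        dispatch[i] += add
        remaining -= add
    return sum(g["cost"] * p for g, p in zip(gens, dispatch))

def evaluate_dispatch(task, solution, tol=1e-6):
    """Deterministically check an agent's dispatch and score it against the optimum."""
    gens, demand = task["generators"], task["demand"]
    p = solution.get("dispatch", [])
    if len(p) != len(gens):
        return EvalResult(False, 0.0, [f"expected {len(gens)} set-points, got {len(p)}"])
    violations = []
    for i, (pg, g) in enumerate(zip(p, gens)):
        if pg < g["pmin"] - tol or pg > g["pmax"] + tol:
            violations.append(f"gen {i}: {pg} MW outside [{g['pmin']}, {g['pmax']}] MW")
    mismatch = sum(p) - demand
    if abs(mismatch) > tol:
        violations.append(f"power balance mismatch {mismatch:+.3f} MW")
    if violations:
        return EvalResult(False, 0.0, violations)
    cost = sum(g["cost"] * pg for g, pg in zip(gens, p))
    best = merit_order_cost(gens, demand)
    return EvalResult(True, 1.0 if cost <= best else best / cost, [])

task = {
    "generators": [
        {"pmin": 10.0, "pmax": 100.0, "cost": 20.0},
        {"pmin": 0.0, "pmax": 50.0, "cost": 10.0},
    ],
    "demand": 80.0,
}
print(evaluate_dispatch(task, {"dispatch": [70.0, 10.0]}))
```

The agent never sees the evaluator's internals, only the task and the returned result. This is what lets a compact surrogate like `merit_order_cost` later be swapped for a simulator-backed check without changing the interface.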
We introduce a semi-parametric framework for nonlinear system identification, which decouples discrepancy functions from physics-based components. Orthogonal Gaussian process regression balances sparse parameter selection (the white box) with discrepancy learning (the black box) to produce interpretable models from incomplete physics.
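The white-box/black-box split can be sketched in a minimal in-sample form, under several assumptions. The physics is represented as a linear-in-parameters library `Phi` (the white box). Sequentially thresholded least squares (STLSQ, borrowed from SINDy) stands in for the paper's sparse parameter selection. The discrepancy (the black box) is modeled by an RBF-kernel GP whose covariance is projected onto the orthogonal complement of span(Phi) at the training inputs, a simplified stand-in for orthogonal GP regression. Because the projected kernel cannot represent anything the physics terms can, the GP does not absorb effects the white-box coefficients should explain.

```python
import numpy as np

def stlsq(Phi, y, threshold, iters=10):
    """Sequentially thresholded least squares: zero small coefficients, refit the rest."""
    theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(theta) < threshold
        theta[small] = 0.0
        big = ~small
        if big.any():
            theta[big] = np.linalg.lstsq(Phi[:, big], y, rcond=None)[0]
    return theta

def rbf_kernel(x1, x2, ell=0.5, var=1.0):
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def orthogonal_gp_fit(x, y, Phi, threshold=0.1, noise=1e-2, ell=0.5):
    """White box: sparse theta. Black box: GP discrepancy with its kernel projected
    onto the orthogonal complement of span(Phi) at the training inputs."""
    theta = stlsq(Phi, y, threshold)
    P = Phi @ np.linalg.pinv(Phi)  # orthogonal projector onto span(Phi)
    I_perp = np.eye(len(x)) - P
    K_perp = I_perp @ rbf_kernel(x, x, ell) @ I_perp
    resid = y - Phi @ theta
    # GP posterior mean of the discrepancy at the training inputs
    delta = K_perp @ np.linalg.solve(K_perp + noise * np.eye(len(x)), resid)
    return theta, delta

# Candidate physics terms: 1, x, x^2, x^3 (a hypothetical library)
x = np.linspace(-1.0, 1.0, 50)
Phi = np.column_stack([np.ones_like(x), x, x**2, x**3])
d = (np.eye(50) - Phi @ np.linalg.pinv(Phi)) @ (0.3 * np.sin(8.0 * x))  # discrepancy orthogonal to span(Phi)
y = 3.0 * x**2 + d
theta, delta = orthogonal_gp_fit(x, y, Phi)
```

Predicting at new inputs would also require projecting the cross-covariance, which this in-sample sketch omits.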