New articles on Electrical Engineering and Systems Science


[1] 2606.17117

Sensing-Native Over-the-Air Federated Learning

Over-the-air federated learning (FL) leverages the superposition property of multiple-access channels to enable communication-efficient distributed model training. Existing integrated sensing, communication, and computation (ISCC)-enabled over-the-air FL systems typically require dedicated resources for the sensing module, inevitably compromising FL performance due to resource competition. In this paper, we propose a sensing-native over-the-air FL framework that explores built-in distributed wireless sensing capability with zero overhead per model aggregation. Specifically, the high-dimensional local gradient signals possessing favorable autocorrelation property are concurrently leveraged for target distance estimation, while the gradient statistics already required for over-the-air FL serve as a ready-made gateway to deliver locally-sensed results to the edge server for cooperative localization. To combat inter-device interference, channel fading, and communication noise, we put forth a robust trilateration-based target positioning method building upon an efficient matched-filtering-based distance estimation. Then, by explicitly characterizing the impact of imperfect model aggregation and noisy gradient-statistics transmission on the sensing-native over-the-air FL convergence, we develop a statistics-aware communication-learning co-design approach. We first derive the closed-form optimal power budgets allocated to local gradients and their statistics, based on which an efficient successive convex approximation method is proposed for receiver beamforming optimization. Simulation results show that the proposed framework simultaneously achieves superior learning and sensing performance compared to representative baselines.
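
As a rough illustration of the trilateration step mentioned in this abstract, the following sketch shows the textbook linearized least-squares form in two dimensions. It is not the authors' robust method: the function name `trilaterate_2d`, the anchor layout, and the noise-free ranges are all assumptions made for the example.

```python
import math


def trilaterate_2d(anchors, distances):
    """Linearized least-squares trilateration in 2-D (generic textbook form).

    anchors: list of (x, y) positions of the sensing devices (assumed known).
    distances: range estimates from each device to the target.
    Subtracting the first range equation from the others cancels the
    quadratic terms and leaves a linear system A p = b in the position p.
    """
    (x0, y0), d0 = anchors[0], distances[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        rows.append((2.0 * (xi - x0), 2.0 * (yi - y0)))
        rhs.append(d0**2 - di**2 + xi**2 + yi**2 - x0**2 - y0**2)
    # Normal equations (A^T A) p = A^T b, solved in closed form for 2 unknowns.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * v for r, v in zip(rows, rhs))
    b2 = sum(r[1] * v for r, v in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not identifiable")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Hypothetical layout: four devices at the corners of a 10 m square.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
target = (3.0, 4.0)
ranges = [math.hypot(target[0] - ax, target[1] - ay) for ax, ay in anchors]
estimate = trilaterate_2d(anchors, ranges)
```

With noisy ranges the same normal equations return a least-squares compromise; a robust variant would additionally down-weight outlying range estimates.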


[2] 2606.17198

Precoding Sequence Design for MIMO Sensing with Scatterers Based on Prior Information

The presence of interfering scatterers fundamentally changes the design principle for MIMO sensing. Unlike the target-only case, where MIMO sensing sequence design reduces to optimizing the transmit sample covariance, this paper shows that scatterer-induced signal-dependent interference makes the Bayesian Fisher information depend on the full temporal precoding sequence. Consequently, for the MIMO sensing problem with scatterers using the Bayesian Cramér-Rao lower bound (BCRLB) as the objective, the entire sensing sequence must be designed explicitly, instead of just the precoding matrix. This paper considers such a precoding sequence design problem under hardware constraints, for MIMO sensing that estimates the azimuth angles of multiple targets based on prior information about both the targets and the scatterers. We formulate a worst-case BCRLB minimization across multiple target angles, yielding a max-min fractional program under constant-modulus or constant-norm hardware constraints. We further develop a constant-norm linear transform that converts the ratio objectives into linear forms, leading to an iterative algorithm with closed-form precoder updates. The framework extends to joint precoder-combiner design and multi-stage sensing with adaptive prior refinement. Numerical results demonstrate the effectiveness and the efficiency of the proposed algorithm, revealing sweeping-like beampatterns that illuminate target angular regions while suppressing interference from the scatterers.


[3] 2606.17202

Gauge Freedom Optimization for Truncation Error Reduction in Inertial Navigation

Numerical integration plays a central role in inertial navigation systems, where sensor measurements are propagated through time to obtain orientation, velocity, and position states. The accuracy of this propagation depends on the numerical integrator type, order and step-size. Prior work showed that for second-order systems with known forcing functions, the gauge freedom in the variation of parameters technique can be exploited to reduce truncation error without modifying the integrator. However, this approach requires analytical knowledge of the forcing function, limiting its applicability in real-world systems. To address this limitation we propose the u-space methodology, a novel state mapping that generalizes the gauge freedom to systems with unknown forcing functions. The optimal gauge is derived in closed form for second-order systems and in both closed and empirical form for first-order systems. The proposed approach was evaluated through Monte Carlo simulations across four forcing functions, five sensor grades, and four Adams-Bashforth orders, as well as on a real-world inertial navigation dataset. Results show consistent error reduction across all tested conditions, with the largest gains observed in the full inertial mechanization pipeline, making the approach applicable to high-grade inertial systems, where truncation error constitutes a larger share of the error budget, and to aided low-cost systems with high-rate updates, where propagation spans only short inter-update intervals.
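
To make the dependence of truncation error on integrator order and step size concrete, here is a minimal sketch of the standard two-step Adams-Bashforth scheme. It does not implement the gauge-freedom or u-space method of the abstract; the test equation dx/dt = -x, the Heun bootstrap step, and all names are assumptions chosen for illustration.

```python
import math


def adams_bashforth2(f, t0, x0, h, n_steps):
    """Integrate dx/dt = f(t, x) with the two-step Adams-Bashforth method.

    The multistep formula needs two past derivative values, so the first
    step is bootstrapped with a Heun (second-order Runge-Kutta) step.
    """
    t, x = t0, x0
    f_prev = f(t, x)
    # Bootstrap step (Heun's method).
    k2 = f(t + h, x + h * f_prev)
    x = x + 0.5 * h * (f_prev + k2)
    t = t + h
    for _ in range(n_steps - 1):
        f_curr = f(t, x)
        # AB2 update: x_{n+1} = x_n + h * (3/2 f_n - 1/2 f_{n-1}).
        x = x + h * (1.5 * f_curr - 0.5 * f_prev)
        t = t + h
        f_prev = f_curr
    return x


def decay(t, x):
    """Test problem with the known solution x(t) = exp(-t)."""
    return -x


exact = math.exp(-1.0)
err_coarse = abs(adams_bashforth2(decay, 0.0, 1.0, 0.1, 10) - exact)
err_fine = abs(adams_bashforth2(decay, 0.0, 1.0, 0.05, 20) - exact)
```

Halving the step size shrinks the error at the cost of twice as many derivative evaluations, which is the trade-off the abstract aims to improve without changing the integrator.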


[4] 2606.17217

A Stateful Stochastic Allocation Mechanism with Fairness Guarantees for Networked Electricity Systems

This paper develops and analyses the Fair Play Automatic Market Maker (FP-AMM), a programmable electricity allocation mechanism in which scarcity allocation is treated as a controlled, stateful, and auditable cyber-physical process. Existing mechanisms such as locational marginal pricing are memoryless and cannot account for historical service outcomes, preventing guarantees of equitable treatment across market intervals. The FP-AMM employs a two-stage stochastic clearing rule comprising service-priority sampling and inverse-fairness weighting, coupled with a DC-OPF feasibility set and bounded shortage memory updated through a saturated integrator. Four main results are established. First, the shortage-memory state is invariant in $[0,1]^N$ and the update map is a contraction with rate $1-\beta$. Second, the intra-interval clearing operator converges linearly to a unique fixed point with contraction factor $q\in(0,1)$. Third, under the Fair Play priority rule, the per-node delivery ratio converges almost surely to the contracted target $F^\star$, with a finite-time $O(1/\sqrt{T})$ bound obtained via Lyapunov analysis of the deficit recursion. Fourth, event-triggered execution guarantees practical ultimate boundedness of the allocation tracking error and quantifies the computation-fidelity trade-off. The mechanism is validated on the IEEE 14-, 57-, and 118-bus systems over $T=5000$ market intervals. Fairness convergence to $F^\star$ is achieved on all benchmarks, peak weak-bus fairness error is reduced by 54% on the IEEE-57 network and by up to 55% relative to an equal-weight baseline during scarcity periods, and DC feasibility is maintained throughout.
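
The first result (invariance in $[0,1]^N$ and contraction with rate $1-\beta$) can be illustrated with a leaky, clipped update. The exact update rule is not given in the abstract, so the form below, the name `update_shortage_memory`, and the sample values are assumptions, not the FP-AMM definition.

```python
def update_shortage_memory(memory, shortfall, beta):
    """One step of an assumed saturated-integrator memory update.

    memory: per-node shortage memory, each entry in [0, 1].
    shortfall: per-node normalized shortage in the current interval.
    beta: update gain in (0, 1]; old memory is discounted by (1 - beta).
    Each entry is clipped to [0, 1], so the state never leaves the unit box.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    return [
        min(1.0, max(0.0, (1.0 - beta) * m + beta * s))
        for m, s in zip(memory, shortfall)
    ]


beta = 0.2
shortfall = [0.3, 0.9, 0.0]
state_a = [0.0, 1.0, 0.5]
state_b = [1.0, 0.0, 0.25]
next_a = update_shortage_memory(state_a, shortfall, beta)
next_b = update_shortage_memory(state_b, shortfall, beta)
# Largest per-node gap between the two trajectories, before and after.
gap_before = max(abs(a - b) for a, b in zip(state_a, state_b))
gap_after = max(abs(a - b) for a, b in zip(next_a, next_b))
```

Under this assumed form, two memory states driven by the same shortfall move closer by a factor of at most $1-\beta$ per step, because clipping to $[0,1]$ never increases the distance between them.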


[5] 2606.17219

A Unified Analytical Nullspace-Based Least-Squares Design of the Farrow Structure

Farrow structures based on linear-phase FIR subfilters provide an efficient realization of variable fractional-delay (VFD) filters with reduced implementation complexity. While the all-linear-phase configuration admits a decoupled least-squares (LS) formulation with an analytical solution, this decoupling fails when branches of mixed types, linear-phase and general FIR, are required, as occurs when a group-delay constraint is imposed. This letter presents a unified LS design for Farrow structures via a nullspace parameterization of the per-branch symmetry constraints, yielding an analytical solution that accommodates arbitrary per-branch types. Numerical results demonstrate that the proposed framework satisfies group-delay constraints that the all-linear-phase approach cannot meet, while substantially reducing the number of free parameters relative to the unconstrained general FIR baseline.


[6] 2606.17247

Large-scale Tunable Liquid Lens-assisted VLC Systems under Random Receiver Orientation

This paper investigates the performance of tunable liquid lens (TLL)-assisted receivers in large-scale visible light communication (VLC) systems under random receiver orientation. A simple electrowetting-based TLL architecture is proposed, capable of dynamically steering the incident optical signal toward the photodiode receiver by adjusting the orientation of the liquid interface. The proposed architecture enhances the desired signal reception while mitigating interference from neighboring access points (APs). The spatial distribution of APs is modeled using a Matérn hard-core point process, whereas receiver orientation is characterized by uniformly distributed azimuth angles and Gaussian-distributed polar angles. Furthermore, a tractable mathematical optical channel model is developed to capture the combined effects of AP/receiver locations, receiver orientation, and lens adjustment angles on the VLC channel gain. Based on this framework, three lens orientation strategies, namely best signal reception (BSR), closest LED selection, and vertical upward lens orientation, are proposed to improve system performance under dynamic receiver conditions. Using stochastic geometry tools, exact and approximate analytical expressions for the outage probability are derived for each scheme. Numerical results verify the accuracy of the developed analysis and demonstrate that the proposed TLL-assisted receiver architecture significantly improves the robustness of VLC systems under severe receiver orientation fluctuations and dense AP deployments. In particular, the BSR scheme reduces the outage probability by $57.1\%$ compared with conventional fixed-lens receivers at an AP height of $3.5$ m and AP density of $0.2~\text{m}^{-2}$. The presented analytical framework and numerical results provide useful design insights for the deployment of future TLL-assisted VLC networks.


[7] 2606.17254

Synergizing Zero-Shot Cross-Lingual Alzheimer Detection with Language-Invariant Multimodal Bi-Geometric Adversarial Learning

In this work, we study zero-shot cross-lingual speech-based Alzheimer's disease detection (SADD). We hypothesize that learning language-invariant multimodal representations by fusing multilingual speech and text pretrained models is essential for reliable transfer to unseen languages, as the two modalities capture complementary acoustic and linguistic markers of cognitive impairment while adversarial learning suppresses language-specific confounds. Empirical results in zero-shot cross-lingual evaluation substantiate the hypothesis, showing that multimodal fusion consistently outperforms unimodal baselines. To this end, we propose ORBIT, a novel framework that combines cross-attentive fusion, multi-tap language adversaries, and complementary spherical-hyperbolic geometric learning with consensus clustering. Across settings, ORBIT achieves the strongest performance compared to unimodal models and simple concatenation-based fusion baselines.


[8] 2606.17258

Single frequency filtering based multi-speaker direction of arrival estimation from stereo recordings

Robust direction-of-arrival (DoA) estimation from noisy and reverberant microphone signals remains challenging. Conventional estimators such as generalized cross-correlation (GCC) and its variants operate in the short-time Fourier transform (STFT) domain, where spectral features primarily reflect vocal-tract characteristics. Recent single frequency filtering (SFF)-based estimators instead use a time-frequency representation that provides high spectral resolution of harmonics along with high temporal resolution of excitation-source events, such as epoch-like impulses. Since excitation-source features have been shown to be more robust to noise and reverberation than spectral features, this work proposes an improved SFF-based DoA estimator that correlates the envelopes of SFF outputs across microphone channels using PHAT-weighted GCC. We further provide a comprehensive evaluation of SFF-based and state-of-the-art GCC-based estimators using publicly available real-room recordings under challenging reverberant, multi-speaker, and noise-corrupted conditions. Experimental results show that the proposed method and an existing SFF-based estimator achieve detection and accuracy performance that is superior or comparable to the best GCC-based estimator across all test cases. We also demonstrate that using speech-dominant bins improves GCC-PHAT robustness, motivating future incorporation of such weighting strategies into SFF-based DoA estimation.
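
For readers unfamiliar with PHAT-weighted GCC, the sketch below estimates the delay between two channels by whitening the cross-spectrum and picking the peak of its inverse transform. It operates on raw samples with a circular delay, whereas the abstract applies the correlation to envelopes of SFF outputs; the naive DFT, the function names, and the synthetic signal are assumptions made to keep the example self-contained.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(ref, sig, max_lag):
    """Return the lag (in samples) by which `sig` trails `ref`."""
    n = len(ref)
    ref_f, sig_f = dft(ref), dft(sig)
    cross = [s * r.conjugate() for s, r in zip(sig_f, ref_f)]
    # PHAT weighting: discard magnitudes and keep only the phase.
    whitened = [c / (abs(c) + 1e-12) for c in cross]
    corr = [v.real for v in dft(whitened, inverse=True)]
    # Negative lags wrap around to the end of the circular correlation.
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: corr[lag % n])


rng = random.Random(0)
frame_len, true_delay = 64, 5
ref_channel = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
# Second channel: the same signal, circularly delayed by `true_delay` samples.
delayed_channel = [ref_channel[(i - true_delay) % frame_len] for i in range(frame_len)]
estimated_delay = gcc_phat_delay(ref_channel, delayed_channel, max_lag=10)
```

For a stereo pair, the estimated delay maps to an angle through the microphone spacing and the speed of sound; that conversion is omitted here.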


[9] 2606.17259

Intelligibility of Speech in Noise: Investigating Contribution of Magnitude and Phase Spectra

It is well known that the intelligibility of speech decreases in the presence of ambient noise. However, studies show that not all sounds are affected equally and that vowels are more robust to noise than consonants. In this study, the intelligibility of various consonants is assessed and analyzed in stationary white noise and non-stationary babble noise conditions. Specifically, this study investigates the individual contributions of the magnitude and phase spectra of a given speech signal to human recognition of consonants in noisy conditions. To this end, three experiments are carried out. In experiment 1, the clean signal, a signal reconstructed from only magnitude spectrum information (magnitude-only signal), and a signal reconstructed from only phase spectrum information (phase-only signal) are assessed for intelligibility. In experiment 2, noise is added to clean speech; phase-only and magnitude-only signals are reconstructed from the noisy speech, and intelligibility tests are performed for all three signals. In experiment 3, noise is added directly to the magnitude-only and phase-only signals reconstructed from clean speech, and their intelligibility is assessed. The results show that the magnitude spectrum contributes more to intelligibility than the phase spectrum in clean conditions, while information from the phase spectrum is more robust in noisy conditions. It is also observed that, among consonants, nasals are more susceptible to noise, whereas fricatives and approximants are comparatively more robust.


[10] 2606.17263

Direction of arrival estimation from distant microphone data using single frequency filtering

For distant-microphone data, broadband (BB) methods for direction-of-arrival (DoA) estimation are more suitable than narrowband (NB) methods. Because their optimization function is aggregated across all frequency bands, BB estimators are robust to spatial aliasing, a known problem in processing distant-microphone data. In NB methods, DoA estimation is performed using local information in each frequency band, and hence the estimation is affected by spatial aliasing. However, unlike BB methods, NB methods exploit frequency sparsity to estimate the DoAs of multiple speakers in a single time frame. In this article, a method to improve the robustness of an NB DoA estimator to spatial aliasing is developed. The proposed method is based on cross-correlation of speech-present time-frequency regions obtained by single frequency filtering (SFF) of the microphone signals. The SFF spectrum is chosen because SFF components have regions of high signal-to-noise ratio both in time and frequency and because speech and non-speech discrimination is robust to degradations in the SFF domain. The proposed NB estimator is compared to four state-of-the-art estimators (one NB and three BB) using detection and accuracy metrics on simulated and real-world data in different reverberation and noise conditions. The results show that in all the environments, the SFF-based NB approach outperforms the state-of-the-art NB approach. Furthermore, the performance of the SFF-based approach is better than that of some of the BB estimators.


[11] 2606.17280

Optimal Powered Descent Guidance with Pyramid-Shaped Approach-Angle Constraints

In this paper, a novel optimal soft-landing guidance law with inequality approach-angle path constraints is analytically derived. The proposed guidance law prevents ground collision and enables approach-angle control by constraining the optimal trajectory to remain within a convex inverted pyramid originating at the landing point. A 3D point-mass linear kinematic model in a constant gravitational field is employed, together with a quadratic control-effort cost and terminal constraints on position and velocity. Analytical open-loop and closed-loop solutions, together with the optimal final time, are derived using Pontryagin's Minimum Principle and the optimality conditions at the transitions between unconstrained and constrained arcs. It is additionally shown that the optimal final time decreases when the path constraints become active. The resulting guidance law is continuous, piecewise linear in time, and nonlinear in the states in closed-loop. When a constraint becomes active, the controller cancels the gravitational component normal to the constraint, causing the trajectory to evolve along the constraint surface. The proposed guidance law is evaluated in simulations under various initial conditions, demonstrating accurate landing performance and consistent satisfaction of the path constraints.


[12] 2606.17292

Robust Direct Data-Driven Hamiltonian for Safe Set Computation under Measurement Noise and Disturbances

Safe set computation is a fundamental challenge in safety-critical control systems, especially in direct data-driven settings where safety analysis is performed directly from noise-affected measurements, without explicit modeling. A recently proposed method, Data-Driven Hamiltonian (DDH), enables reachability analysis directly from measurements, without relying on prior knowledge of the underlying system dynamics. This paper extends the DDH framework to a robust setting that accounts for measurement noise, exogenous disturbances, and sampling-induced state-velocity estimation error. A Robust Data-Driven Hamiltonian (R-DDH) is derived from noisy measurements and shown to yield a certified lower bound on the exact Hamiltonian. This results in a provable under-approximation of the value function and an inner approximation of the associated safe set. The gap between the data-driven and exact Hamiltonians is quantified, and it is shown to converge to zero with more data in a noise-free setting with additive disturbances. The effectiveness of the approach is shown through two case studies: a constrained double integrator and an aircraft taxiing system with a nonlinear closed-loop controller operating under perceptual uncertainty.


[13] 2606.17295

Phenotyping TPF via Self-Supervised Learning: A Label-Agnostic Framework with Expert Validation

The full potential of artificial intelligence in tibial plateau fracture characterisation remains unrealised, constrained by a fundamental dependency on labelled datasets whose consistency cannot be guaranteed: conventional classification schemes such as Schatzker and AO/OTA suffer from inter-observer variability, causing supervised models to learn human disagreement rather than stable fracture morphology. We design, implement, and validate a label-agnostic framework that eliminates this constraint by learning fracture representations directly from imaging data without observer-assigned labels. A RadImageNet-pretrained ResNet-50 encoder is fine-tuned on 154 cleaned knee radiographs using the SimCLR contrastive objective, preceded by a data cleaning protocol and followed by UMAP dimensionality reduction and k-means clustering to discover four imaging-derived phenotypes. Phenotype validity is assessed through a blinded expert review protocol administered to two independent clinicians. The four phenotypes demonstrate robust stability (bootstrap ARI = 0.319 +/- 0.041), strong internal cohesion (silhouette = 0.511), and coherence ratings of 3-5/5 from both reviewers under blinded conditions; one phenotype was unanimously identified as exhibiting comminution -- a high-complexity feature isolated without any supervisory signal. Inter-partition comparison against Schatzker labels yields ARI = 0.013, confirming orthogonality to conventional classification boundaries. Notably, expert reviewers anchored to established classification vocabularies perceived imaging-derived groups as heterogeneous precisely where Schatzker alignment was lowest, suggesting that Schatzker-trained perception and label-agnostic embedding geometry measure orthogonal dimensions. These findings establish label-agnostic SSL phenotyping as a reproducible and clinically interpretable complement to conventional classification.


[14] 2606.17306

Robust Beamforming Design for Secure Uplink NOMA-ISAC

Integrated sensing and communication is an important technology for sixth-generation (6G) mobile networks, enabling the joint use of communication and radar sensing within a unified system. While offering significant benefits in terms of spectral efficiency, ISAC introduces new security challenges. In particular, the joint use of resources for sensing and communication can increase vulnerability to eavesdropping and information leakage. In this paper, we study an uplink Non-Orthogonal Multiple Access (NOMA) system where the base station (BS) simultaneously receives user data and senses a potential eavesdropper (Eve) with uncertain location. To enhance the physical-layer security, a robust sensing signal is designed to both sense and jam Eve. We formulate a joint optimization problem that aims to maximize the users' sum rate and the BS sensing performance while maintaining security against Eve. Since the resulting optimization problem is non-convex, we develop an iterative alternating optimization (AO) algorithm that decomposes it into two tractable subproblems. In the first subproblem, the receive combining vectors are optimized in closed form using generalized eigenvalue decomposition. In the second subproblem, the transmit beamforming matrices and sensing power are jointly optimized via semidefinite relaxation (SDR) and successive convex approximation (SCA). Simulation results demonstrate the effectiveness of our solution in terms of fast convergence and resource allocation.


[15] 2606.17311

Pilot-Aided MIMO Channel Identification and Linear Deconvolution in Correlated Gaussian Noise

This paper presents a pilot-aided study of multiple-input multiple-output (MIMO) channel identification and linear deconvolution under spatially correlated Gaussian noise. A real-valued $4\times4$ baseband model is analyzed for both memoryless and finite-impulse-response channels. The noise process is generated from a Toeplitz covariance matrix, the channel is estimated from pilot symbols through maximum-likelihood/least-squares formulations, and the empirical mean-square error is compared with the Cramér-Rao bound. The estimated channel is then used for data-symbol recovery through maximum-likelihood zero-forcing and linear minimum-mean-square-error deconvolution. The results show that sufficiently long and well-conditioned pilot blocks allow the channel estimator to approach the theoretical lower bound, whereas short training intervals cause rank and conditioning limitations, especially for the four-tap model. The deconvolution experiments further show that MMSE regularization provides a more stable inverse than unregularized zero forcing at low signal-to-noise ratios and for inaccurate channel estimates.


[16] 2606.17314

Line Outage Impact Factor (LOIF): A New Sensitivity Factor for Enhanced Transmission Observability

Transmission failures can lead to cascading failures and system blackout affecting millions of customers if not handled in time, and choosing the best locations to monitor the condition of the transmission system is crucial for power system reliability. In this paper, we propose a new sensitivity factor, the line outage impact factor (LOIF), which is especially useful for power system monitoring and can reveal the impacts of a transmission outage on the power flow of other lines more effectively than existing sensitivity factors, such as the line outage distribution factors (LODF). In this study, we apply the LOIF in transmission line outage detection in three test systems and compare it with LODF using a number of observed transmission line (OTL) selection methods based on these two sensitivity factors. Then we apply a machine learning algorithm to detect the outages of other lines by monitoring the selected OTLs, and the detection accuracy is evaluated using the F1-score. The results show that, in general, with the same number of OTLs, detection using the OTLs selected using LOIF achieved higher F1-scores. The pattern was especially consistent in large-scale systems, showing its potential in real-world applications.


[17] 2606.17325

Backscatter Assisted Indoor NLOS Positioning

Passive backscatter devices (BDs) can enable indoor non-line-of-sight (NLOS) positioning by serving as virtual anchors whose Doppler-separated signatures are observable in standard channel estimates. This paper studies continuous user-equipment (UE) tracking in corridor environments using a noncoherent power-domain formulation that avoids BD phase synchronization and remains robust to residual carrier offsets and strong multipath. The BD-dependent measurements are modeled by a log-distance law with unknown BD-specific offsets, which allows passive asynchronous devices to be used as anchors without transmit-power calibration. Based on this model, we develop a corridor-constrained maximum a posteriori (MAP) tracker with motion regularization and Huber-robust estimation. In ray-tracing-inspired simulations, the method achieves median positioning errors of 0.23–0.27 m with 90th-percentile errors below 0.45 m. In office-corridor measurements with four passive BDs at 866 MHz, it attains an aggregated median error of 0.505 m and outperforms a simple weighted-average baseline. The results show that passive asynchronous BDs can provide practical sub-meter indoor NLOS tracking while remaining compatible with existing channel-estimation pipelines and energy-autonomous BD deployments.
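
The two modeling ingredients named in this abstract, a log-distance power law and a Huber loss, can be sketched as follows. This is a simplified single-snapshot grid search along a one-dimensional corridor with the BD offsets treated as known; the paper instead estimates the offsets and adds motion regularization. All names, positions, and parameter values are assumptions.

```python
import math


def predicted_power_db(offset_db, slope, distance_m):
    """Log-distance law: received power falls by 10 * slope dB per decade."""
    return offset_db - 10.0 * slope * math.log10(distance_m)


def huber(residual, delta):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def robust_cost(position_m, bd_positions_m, measured_db, offsets_db, slope, delta):
    """Sum of Huber losses for a candidate position along the corridor axis."""
    total = 0.0
    for bd, meas, off in zip(bd_positions_m, measured_db, offsets_db):
        dist = max(abs(position_m - bd), 0.1)  # floor avoids log10(0)
        total += huber(meas - predicted_power_db(off, slope, dist), delta)
    return total


# Hypothetical corridor: four BDs along a 20 m axis, offsets assumed known.
bd_positions = [0.0, 5.0, 12.0, 20.0]
offsets = [-40.0, -42.0, -38.0, -41.0]
slope, delta, true_position = 2.0, 1.0, 7.3
measured = [
    predicted_power_db(off, slope, abs(true_position - bd))
    for bd, off in zip(bd_positions, offsets)
]
grid = [0.1 * i for i in range(201)]
best_position = min(
    grid,
    key=lambda p: robust_cost(p, bd_positions, measured, offsets, slope, delta),
)
```

The linear tails of the Huber loss limit how far a single multipath-corrupted power reading can pull the estimate, which is the usual reason for preferring it over a purely quadratic cost.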


[18] 2606.17332

Self-Calibrated Indoor Tracking from Backscatter Fiducials under NLOS Transmitter Illumination

This paper studies indoor tracking from wall-mounted backscatter fiducials in corridor segments outside direct transmitter illumination. In the measured setup, the transmitter-to-fiducial links are NLOS, whereas the fiducial-to-receiver links along the corridor are largely LOS. The main challenge is that the effective fiducial response is deployment-dependent, so a fixed calibrated link budget is not reliable. We therefore use a grid-based penalized-likelihood tracker that profiles the receiver path, a fitted log-distance slope parameter, and fiducial-specific offsets directly from received powers. The resulting paths can then be reused as surrogate calibration coordinates for residual-map correction, while the same correction with measured calibration coordinates is reported only as a reference. On a short four-fiducial corridor segment, the profiled dual-band tracker gives a 0.52 m median error without measured calibration coordinates, and surrogate residual correction improves this to 0.46 m. With measured calibration coordinates, the same correction and a RADAR-style fingerprint reference both reach 0.31 m. The main remaining limitation is therefore the quality of the surrogate calibration paths rather than the structured observation model itself.


[19] 2606.17333

Communication Modeling of Long-Distance Abscisic Acid Signaling in Plant Vascular Systems

Abscisic acid (ABA) is a central plant hormone for coordinating responses to drought, salinity, cold stress, pathogen attack, wounding, and developmental aging. This paper reviews the biological stimuli that increase ABA biosynthesis, the main production sites and pathways, and the long-distance movement of ABA through plant vascular tissues. It then discusses experimental quantification approaches, including gas-liquid chromatography with electron-capture detection and high-performance liquid chromatography with ultraviolet detection. Finally, the paper presents a molecular-communication-inspired model of ABA transport in which root-side ABA release is represented as a transmitter, the xylem pathway as a bounded channel, and soybean tissue as a receiver. MATLAB Brownian-motion simulations are used to evaluate the effects of released molecule quantity and receiver radius on the detected ABA signal. The results show that higher release quantities produce smoother and stronger reception trends, while larger receivers increase molecule-capture probability.


[20] 2606.17337

From Signals to Patterns: Non-Invasive Tuberculosis Detection from Cough Audio using Bandit Weighted Hyperbolic Prototypes

In this study, we focus on cough-based tuberculosis screening (CBTS) and hypothesize that fusing speech/audio foundation representations with spectral descriptors will yield stronger screening performance. We expect this fusion to reveal complementary strengths: spectral features preserve fine-grained short-time acoustic detail in cough signals, while foundation embeddings capture higher-level temporal and event-level patterns learned from large-scale pretraining. To this end, we propose COBALT, a novel fusion framework based on codebook-aligned hyperbolic prototypes and bandit-style reliability weighting to integrate heterogeneous representations effectively. Using the CODA TB DREAM Challenge benchmark, COBALT consistently outperforms individual representations and a concatenation baseline, achieving the best overall performance when fusing MFCC with PaSST, thereby establishing a new state of the art on the benchmark.


[21] 2606.17347

Classifying Transient Regimes in Dynamic Systems through Properties of Spatial Curves and Stochastic Processes: A Data-Driven Approach

This article proposes a novel methodology for the classification of transient and stationary regimes in dynamic systems. Several sensor-based solutions for regime classification in the literature require the setting of several parameters, or are not suitable for scenarios involving multivariate systems that may contain periodic signals. The proposed method introduces a spatial curve representation of the considered system based on its sample mathematical moments. Then, by connecting concepts of stability theory, geometrical properties of spatial curves, and stationary stochastic processes, two regime classifiers are designed using the arc length and the curvatures of the proposed curve. Both classifiers are capable of describing and detecting transient regimes, considering behaviors such as multivariate asymptotic stability, marginal stability, and cyclostationarity. Furthermore, a quantitative comparison of the performance and computational resources of the proposed classifiers against existing classifiers in the literature illustrates that the proposed regime classifier based on the arc length outperforms other techniques in classifying transient regimes for simulated linear, non-linear, and discontinuous multivariate systems under the specified studied conditions.


[22] 2606.17382

Automated Estimation of Equivalent Circuit Model from Impedances with Long Short-Term Memory

Electrochemical Impedance Spectroscopy (EIS) is a widely used, non-destructive technique for characterizing electrochemical systems, and its analysis typically relies on fitting the measured spectra to an Equivalent Circuit Model (ECM). Selecting an appropriate ECM, however, remains a major bottleneck: knowledge-based selection requires expert judgment and is difficult to reproduce, while existing automated approaches either choose from a fixed set of candidate circuits or, in the case of Gene Expression Programming, require repeated equivalent-circuit fitting and a predetermined circuit scale. Here, we propose a machine learning method that estimates an ECM directly from an impedance spectrum by representing the circuit as a serialized string of symbols and generating this string with a Long Short-Term Memory (LSTM) network coupled to a convolutional feature extractor. Because the LSTM inherently handles variable-length sequences, the method produces the circuit topology directly, without any fitting during estimation or any prior assumption about the number of elements. A fourth-root transformation of the impedance is introduced to emphasize the mid-frequency features essential for distinguishing circuits, and an adaptive beam search yields multiple ranked candidates. Evaluated on 100,000 synthetic datasets generated from 119 circuit topologies with 1% added noise on impedances, the method identified the correct topology as the most probable ECM in 77.8% of cases and among the top five candidates in 98.8% of cases, with an average estimation time of 17.8 milliseconds per dataset, several orders of magnitude faster than reported fitting-based approaches. These results indicate that direct topology generation with a neural network is a promising route toward fully automated, expert-independent ECM estimation.


[23] 2606.17404

ELSA: Acoustic Event-Level Semantic Alignment for Fine-Grained Reference-Free Text-to-Audio Evaluation

Text-to-audio (TTA) generation, synthesizing audio from natural language, has been widely studied for its ability to capture precise user intent. To effectively advance TTA models, it is essential to reliably evaluate generated audio without relying on costly human subjective ratings, motivating the development of automatic evaluation metrics that correlate well with human judgments. While recent CLAP-based metrics provide practical reference-free solutions, their coarse-grained text-audio similarity matching often correlates poorly with human ratings. To address this, we propose ELSA, a reference-free evaluation metric for fine-grained text-audio alignment. ELSA decomposes generated audio guided by distinct acoustic events derived from the text query and assesses event-level alignment. Experiments across four TTA benchmarks show that ELSA reveals a higher correlation with human subjective ratings than prior metrics, highlighting its effectiveness for reliable TTA evaluation.


[24] 2606.17420

Feynman Kac Reweighted Schrödinger Bridge Matching for Surface-Based Tau PET Harmonization

Tau PET imaging is central to tracking Alzheimer's disease progression, but systematic differences between scanners, protocols, and radiotracers across sites introduce nonbiological variability that inflates biomarker variance, reduces sensitivity to disease effects, and can bias downstream clinical assessments. Harmonization methods aim to remove these site-induced shifts while preserving biologically meaningful signal, yet existing approaches struggle when source and target cohorts differ in subgroup composition, risking conflation of site effects with biological variation such as tau-positivity status. We propose the Feynman Kac Reweighted Schrödinger Bridge Matching (FKRSBM) model to address this problem. Rather than routing data through a Gaussian noise prior as in diffusion-based methods, FKRSBM learns a direct stochastic transport process between source and target distributions via entropy-regularized optimal transport. To enforce biologically consistent transport, FKRSBM incorporates a subgroup-aware endpoint proposal derived from a Feynman Kac reweighting of the reference bridge measure, implemented entirely through stratified importance sampling at the data level and requiring no changes to the underlying bridge-matching solver or network architecture. For surface-based neuroimaging, FKRSBM employs a spherical convolutional backbone operating on cortical meshes to perform vertex-level harmonization. We evaluate the method on tau PET SUVR maps, harmonizing PI-2620 data from the HABS-HD cohort into the AV-1451 domain of ADNI. Compared against ComBat, CycleGAN, a diffusion-based method (DF), and unregularized Diffusion Schrödinger Bridge Matching (DSBM), FKRSBM achieves superior distributional alignment, reduced tau-positivity sign mismatch, stronger APOE subgroup alignment, and improved downstream disease classification performance.


[25] 2606.17439

Two-Stage IQ Imbalance Estimation and Compensation for AFDM Systems

Affine frequency division multiplexing (AFDM) is an emerging chirp-based multicarrier waveform with strong diversity in doubly selective channels, but practical systems suffer from transmitter and receiver IQ imbalance, causing image interference and performance degradation. This paper proposes a two-stage IQ imbalance estimation and compensation method for AFDM systems. First, a preamble-assisted iterative algorithm estimates the time-invariant IQ imbalance parameters by exploiting their slowly time-varying nature. Then, a joint channel estimation and data detection scheme combines basis expansion model (BEM)-based channel estimation with an improved LMMSE detector for interference suppression. Simulations show rapid convergence and near-ideal BER performance.


[26] 2606.17479

A Miniaturized Dynamic Array for Antenna-Level Physical Layer Security

A compact dynamic omnidirectional array is proposed for antenna-level physical-layer security through directional modulation. Unlike conventional directional-modulation transmitters based on phased-array beam synthesis or multiple RF chains, the proposed architecture uses a single RF input and a switching-controlled four-element printed meander-line monopole array operating at 5.05 GHz. The state-dependent excitation introduces controllable magnitude and phase perturbations in the radiated field, producing angle-dependent constellation distortion and bit error rate behavior. Reliable information recovery is confined to a narrow broadside region in the E-plane, whereas the H-plane remains quasi-static and omnidirectional, providing a full 360-degree information-recoverable region. The antenna is implemented on a single-layer Rogers RO4350B substrate with a compact footprint of 0.57 x 1.11 lambda_0^2. A four-path switching network based on commercial RF components is used for experimental validation. Communication measurements using 16-QAM at 5.05 GHz demonstrate BER-defined E-plane information beamwidths of 30 to 36 degrees for calibrated switching modes under a BER <= 10^-3 criterion, while no bit errors are observed in the measured H-plane and the SNR remains above approximately 33 dB. Feed-phase offsets are also used to steer the BER-defined information-recoverable sector, demonstrating information-beam steering with the same antenna-level switching mechanism. These results show that compact antenna-level directional modulation can provide angularly selective information recovery in one principal plane while preserving omnidirectional coverage in the orthogonal plane.


[27] 2606.17484

Exploiting RIS Optimization Limits for Multi-User Beamforming and Signal Suppression

This paper presents a unified framework for exploiting the boundaries of reconfigurable intelligent surfaces (RIS) joint optimization in multi-user wireless systems, where a single RIS accommodates diverse [...]. We first propose an adaptive gradient-scaling mechanism that accelerates the convergence of the underlying optimization algorithm while maintaining stable performance across varying channel and system parameters. The proposed mechanism enables the solver to reach a reasonably good solution rapidly without requiring manual tuning of step sizes or algorithmic hyperparameters when system inputs change. We then propose a low-complexity beamformer recovery method tailored for single-user scenarios, which circumvents the full matrix decomposition required by traditional approaches, thereby significantly reducing computational overhead. Building on these foundations, we develop an element allocation strategy that enables user-specific prioritization through assignment of RIS subsets. This is further extended by a modular add-drop mechanism that supports partial-panel optimization in general multi-user settings. The framework is evaluated across three representative scenarios: (i) signal amplification for all users, (ii) signal suppression for all users, and (iii) selective amplification and suppression. To characterize performance limits, we derive power trade-off boundaries using scalarized joint optimization, which closely align with Monte Carlo simulations. Our unified joint optimization method consistently yields solutions near these boundaries, confirming its near-optimality. Extensive simulations under realistic channel models demonstrate that the proposed approach outperforms conventional semidefinite relaxation techniques, offering a scalable and effective RIS control strategy for cooperative and competitive multi-user environments.


[28] 2606.17504

Two-Stage Fine-Tuning of ResNet50 for High-Sensitivity Melanoma Detection on Dermoscopic Images

Melanoma is the most dangerous form of skin cancer with five-year survival rates exceeding 99% when detected early but falling sharply once the disease spreads. This paper proposes and evaluates a two-stage fine-tuning approach for ResNet50 applied to binary melanoma classification on dermoscopic images. The core challenges addressed are class imbalance and suboptimal transfer learning from single-stage fine-tuning. After stratified train/validation/test splitting, random oversampling was applied exclusively to the training set to achieve a 1:1 class balance. Stage 1 trained only the classification head with the ResNet50 base frozen, while Stage 2 fine-tuned all layers jointly at a low learning rate of 1e-5 to prevent catastrophic forgetting of learned visual features. On an independent test set of 3,826 images, the model achieved an AUC-ROC of 0.9559, accuracy of 88.34%, sensitivity of 87.56%, specificity of 89.13%, and F1-score of 88.29%. An ablation study confirms the two-stage protocol significantly outperforms single-stage fine-tuning, with sensitivity gains of over 4%. Grad-CAM visualizations demonstrate correct lesion localization. A fully deployable Streamlit detection application is provided alongside all training code.


[29] 2606.17509

Data-Driven Stabilizing Controller Design for Linear Infinite Networks

We propose a direct data-driven method for controller synthesis of infinite networks composed of unknown linear time-invariant subsystems. Using a single set of noise-corrupted input-state trajectories collected from each subsystem, and provided that certain linear matrix inequalities hold, each subsystem is rendered exponentially input-to-state stable (eISS) by locally constructing an eISS control Lyapunov function together with an exponentially input-to-state stabilizing feedback controller. We then compose these local components under a compositional small-gain condition in infinite-dimensional spaces to obtain a global control Lyapunov function and an associated stabilizing controller, ensuring uniform global exponential stability of the infinite network. The approach is validated on a physical case study with unknown dynamics.


[30] 2606.17537

Non-Autoregressive Minimum Bayes' Risk Decoding for Fast Speech Recognition

Non-autoregressive (NAR) decoding generates output tokens in parallel, making speech recognition faster than autoregressive decoding, which generates them sequentially from left to right. However, the recognition performance is degraded because NAR decoding cannot resolve uncertainty by conditioning on previously generated tokens. To address this issue, we propose a novel NAR decoding framework based on minimum Bayes' risk (MBR) decoding, termed NAR-MBR decoding, that maximizes the expected utility calculated from samples drawn from the output probability of an NAR model rather than maximizing the output probability. Notably, by leveraging the nature of NAR models, multiple samples are obtained efficiently with a single forward computation. Our experiments on LibriSpeech, Switchboard, AMI, and a web presentation corpus demonstrated that our NAR-MBR decoding outperformed previous NAR decoding and ran faster than AR decoding.
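
The selection step of sample-based MBR decoding can be illustrated with a minimal sketch. It is not the authors' implementation: the utility (negative token-level edit distance), the function names `edit_distance` and `mbr_select`, and the toy samples are all assumptions made here for illustration, and the samples stand in for hypotheses that would be drawn from an NAR model's output distribution.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance between two hypotheses."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mbr_select(samples: List[List[str]]) -> List[str]:
    """Return the sample with the highest Monte Carlo expected utility.

    Utility is assumed to be the negative edit distance to the other
    samples, which act as pseudo-references.
    """
    def expected_utility(candidate: List[str]) -> float:
        return sum(-edit_distance(candidate, other) for other in samples) / len(samples)

    return max(samples, key=expected_utility)


# Toy samples standing in for draws from a model's output distribution.
samples = [
    ["the", "cat", "sat"],
    ["the", "cat", "sat"],
    ["the", "cat", "sad"],
    ["a", "cat", "sat"],
]
best = mbr_select(samples)
print(" ".join(best))
```

The hypothesis closest on average to all the others is chosen, rather than the single most probable one; any other utility function could be substituted for the edit distance.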


[31] 2606.17543

Deep Learning-Empowered Movable-Antenna Position Optimization with Partial CSI

Movable antennas (MAs) are a promising technology to improve wireless data rates by dynamically adjusting their positions to avoid deep fading. However, finding the optimal MA positions requires full channel state information (CSI) for all possible locations within the movement region, creating massive channel estimation overhead. This paper proposes a deep neural network (DNN)-based learning framework to predict the optimal positions of multiple transmit MAs in a multi-user multiple-input single-output (MISO) system, entirely bypassing explicit channel estimation. First, we analyze a single-user MISO case, revealing a complex, highly nonlinear mapping between the optimal MA positions and the channel power gains from a specific subset of locations in the transmit region to the user. Because this mapping cannot be mathematically characterized for practical channel models, we train a DNN via supervised learning to capture it. The pre-trained DNN can then determine optimized MA positions in real-time relying only on partial power measurements from the transmit region. Extending this to multi-user scenarios is challenging due to complex rate expressions and the lack of globally optimal position solutions to use as training labels. To overcome this, we develop an unsupervised training framework that directly maximizes the multi-user sum-rate. This framework utilizes an attention-based architecture to extract latent features from the partial channel measurements and effectively manage inter-user interference. Simulation results show that our proposed approach achieves near-optimal performance in single-user systems and surpasses conventional CSI-based alternating optimization algorithms in multi-user environments.


[32] 2606.17565

Stability Analysis in Large-scale Centralized Bidirectional Inverter-based Stations Connected to Bulk Power Systems through AC and DC Connections

Massive controlled DC resources (CDCRs), such as battery energy storage systems, are connected to AC power systems through bidirectional inverters for power balance requirements. This study investigates converter-driven stability (CDS) issues in the sub-synchronous frequency range caused by large-scale bidirectional inverter-based stations (IBSs). The impacts of the AC and DC connections of IBSs on subsynchronous oscillations (SSOs) are compared by examining three factors: the number of CDCRs, power flow direction, and control parameters of the inverters. For AC connections, IBSs may induce instability as the number of CDCRs increases, regardless of the power flow direction. To maintain stability, the maximum power amplitude of the IBS is calculated. It is found that switching to DC connections can reduce these instability risks if the DC line resistance is much less than the AC line reactance. Moreover, the method of tuning control parameters is demonstrated to be more effective in improving power-related critical stability under DC connections. Therefore, the DC-IBS is preferred for high-voltage transmission. Finally, the conclusions are validated in power systems connected with both AC- and DC-IBSs under various network topologies and system scales.


[33] 2606.17568

Instability Caused by Integration of IBRs under Strong Grid Connections -- A Practical Case Study on Large-scale Energy Storage Systems

It is well known that inverter-based resources (IBRs) can lead to converter-driven stability issues under weak grid connections. However, as the number of IBRs increases, instabilities can also occur even under strong grid connections. A practical case is presented to demonstrate this conclusion, using large-scale energy storage systems (ESSs) as an example. In this study, the ESSs induce oscillations with a frequency of 150 Hz in the d-q coordinates while providing both capacitive and inductive reactive power support (achieved by ESS functional control loops) to the connected power system. Theoretical analysis reveals that under strong grid connections, the dynamic interactions among power conversion systems (PCSs) of ESSs can be superimposed and intensified as the ESS scale extends, which reduces oscillation damping and leads to system instability. This indicates that ESS functional control loops also carry potential instability risks when providing support to power systems, which should be carefully examined. Finally, major impact factors are identified to mitigate the oscillations, and the conclusions are validated based on the SIMULINK platform. This paper provides valuable practical insights into system instabilities even under strong grid conditions, emphasizing the importance of functional control design and careful planning of the scale for IBR-dominated systems.


[34] 2606.17570

Fine-UNETR for PSMA PET/CT Lesion Segmentation: Automated Tumor Quantification and Overall Survival Stratification in Prostate Cancer

Introduction: To develop and evaluate Fine-UNETR, a Vision Transformer-based architecture for automated segmentation of PSMA-avid lesions on whole-body PET/CT, and to assess clinical utility of AI-derived tumor burden biomarkers for overall survival stratification in radioligand therapy. Methods: In this retrospective study, 373 PSMA PET/CT scans (mean age, 71 ± 8 years) from patients with prostate cancer were analyzed. Fine-UNETR, a modified UNETR with 8x8x8 voxel patch embedding and axial sliding window training, was trained on 299 scans and validated on 74 scans. Overall survival stratification was assessed in an independent cohort of 67 pre-radioligand therapy patients using Kaplan-Meier analysis and log-rank testing. External validation was performed on 192 cases from the AutoPET IV PSMA PET/CT dataset. Results: Fine-UNETR achieved a Dice similarity coefficient (DSC) of 66.63%, sensitivity of 70.27%, precision of 67.77%, and a lesion detection rate of 79.53% (96.05% for lesions with SUVmax >= 5). On the external validation dataset, the model achieved a DSC of 44.11% and a lesion detection rate of 87.18%, indicating that lesion detection performance was preserved despite reduced voxel-level overlap. AI-derived biomarkers showed excellent agreement with ground truth (total tumor volume: r=0.984; total lesion uptake: r=0.989; lesion count: r=0.960). In the clinical cohort, total tumor volume (p=0.0019), SUVmax (p=0.014), and SUVmean (p=0.016) significantly stratified overall survival. Conclusion: Fine-UNETR enables accurate automated whole-body PSMA lesion segmentation and tumor burden quantification. Performance on an external dataset demonstrates robustness despite evidence of domain shift. AI-derived biomarkers significantly stratified overall survival in a pre-radioligand therapy cohort, supporting the clinical utility of automated PSMA PET/CT quantification for prognostication.


[35] 2606.17575

Dynamic Analysis of Centralized Energy Storage Systems -- A Comparison between Grid-following and Grid-forming Controls

This study investigates the small-signal stability of centralized energy storage systems (CESSs) using grid-following (GFL) and grid-forming (GFM) controls, particularly focusing on bidirectional power flow and multiple energy storage systems (ESSs). To address the issue of complex dynamics in CESSs when comprehensive GFL and GFM control loops are considered, high-order dynamics are simplified using the virtual damping method by focusing on the dominant oscillation mode. Damping analysis verifies that CESSs using a single-type control (either GFL or GFM) have dynamic superimposition characteristics. Specifically, as the number of ESSs increases, the damping of GFM-CESSs improves but that of GFL-CESSs decreases. The damping sensitivity shows that the damping of GFM-CESSs is more sensitive to bidirectional power flow and all control loops, whereas that of GFL-CESSs is more sensitive to the d-axis control loop. Consequently, GFM-CESSs are preferred for large-scale integration but are limited in scenarios with significant power reversal. If GFL and GFM controls are hybridized in CESSs, the ratio of GFM-CESSs should be constrained to avoid instability from modal resonance between GFL-CESSs and GFM-CESSs. This highlights that implementing GFM-CESSs necessitates considering scenario limitations rather than pursuing maximal integration under hybrid integration conditions. The conclusions are validated through modal analysis and time-domain simulations.


[36] 2606.17594

Low-Thrust Orbital Differential Games with Speed Constraint Enforcement Using Cost Weighting

This paper considers the problem of a low-thrust spacecraft pursuit-evasion differential game with an arbitrary terminal relative speed constraint. It addresses the terminal phase of the engagement for two relatively close spacecraft near a circular orbit. The problem is formulated as a linear-quadratic zero-sum differential game, with soft constraints on the terminal relative position and velocity, and running costs on the players' control efforts. An analytical, closed-loop, minimum-fuel-consumption optimal guidance law is derived for each player, forming a saddle-point solution. It is proven that any terminal speed can be achieved by properly choosing the weighting parameters of the cost function. To verify the optimality of the solution, a conjugate point analysis is performed when the cost function velocity weighting matrix is either positive or negative definite. The negative-definite case arises at high terminal speeds and is seldom seen in the literature. The performance of the derived guidance law is evaluated in simulations for different target maneuvers and compared to a state-of-the-art optimal-control-based guidance law. The simulations show that the derived guidance law satisfies the constraints and offers a substantial advantage over the optimal-control-based guidance law when the target is optimally evading.


[37] 2606.17641

Toward Quantum-Enhanced ISAC: Active-RIS-Aided Integrated Sensing and Communication with Rydberg Atomic Receivers

In this paper, we investigate an active-RIS (ARIS)-aided integrated sensing and communication (ISAC) system with Rydberg Atomic REceiver (RARE). Leveraging the magnitude-only and real-domain observation structure of RARE, we first derive a unified ISAC model, along with a closed-form Cramér-Rao bound (CRB) for direction-of-arrival (DoA) estimation. Based on this formulation, we propose a joint design of the base station (BS) beamforming and ARIS reflection coefficients to minimize the CRB under RARE-specific signal-to-interference-noise-ratio (SINR) and ARIS power constraints. To tackle the resulting highly non-convex problem, we develop an alternating optimization (AO) framework that combines semidefinite relaxation (SDR) for beamforming and a majorization-minimization (MM)-based approach for ARIS design. Numerical results demonstrate that the proposed RARE-aware framework significantly outperforms conventional RF-based designs and achieves performance close to the radar-only benchmark, highlighting the potential of RARE for quantum-enhanced ISAC with ARIS.


[38] 2606.17662

An Analysis of the Effectiveness of Synthetic Speech Data for ASR Fine-tuning in Selected Indic Languages

Synthetic data has the potential to be a valuable resource for training machine learning models, particularly Automatic Speech Recognition (ASR) Systems; however, its effectiveness requires systematic evaluation. In this study, we investigate the impact of incorporating synthetic speech data alongside real-world recordings for three Indic languages: Hindi, Kannada, and Telugu. We analyze the performance gains achieved by augmenting synthetic data with real data and independently examine how ASR performance varies with the sources of scripts used to generate synthetic speech. In addition, we evaluate the effect of synthetic speech generated using different speech synthesis models. Finally, we study the impact of voice cloning in synthetic speech generation on ASR performance, including how performance varies with the number of distinct cloned voices used during data generation.


[39] 2606.17699

Joint Synchronization and Radar Parameter Estimation for Distributed OFDM-ISAC Systems

We propose a novel approach to the synchronization paradigm in distributed ISAC (DISAC) systems in doubly-dispersive (DD) channel environments via a joint synchronization and radar parameter estimation framework. The proposed method exploits the structure of the system model, which can be linearized in order to apply a bivariate Gaussian belief propagation (GaBP) algorithm that jointly estimates the time offset (TO) and carrier frequency offset (CFO) of each base station (BS), as well as the delay and Doppler parameters of the DD channel in conventional orthogonal frequency division multiplexing (OFDM) systems. Simulation results demonstrate the effectiveness of the proposed algorithm, showing that the radar parameter estimates (i.e., range and velocity) and synchronization parameter estimates (i.e., TO and CFO) approach the Cramér-Rao lower bound (CRLB) even in low-to-moderate signal-to-noise ratio (SNR) regimes.


[40] 2606.17718

BASIIS: Bistatic Angular Sampling and Interpolation for ISAC Setups

Integrated Sensing and Communications (ISAC) is a defining feature of 6G, extending cellular networks with radar-like sensing at limited additional overhead. In bistatic deployments, sensing requires coordinating the transmitter (TX) and receiver (RX) arrays to scan the Cartesian product of angle of departure and arrival, resulting in a four-dimensional sampling problem in the angular domain. This work establishes a complete angular sampling framework for bistatic ISAC, extending the DFT-based optimal-sampling methodology to the full azimuth and elevation domains of both arrays. We show that the bistatic geometry couples the TX and RX elevation angles, and represent this coupling through the ortho-baseline coarray, a virtual array that captures the joint elevation aperture of the array pair. From the coarray we derive a minimal sampling and interpolation scheme that is near-lossless and realizable with any beamforming architecture. Monte Carlo simulations confirm that the proposed minimal acquisition essentially matches the detection accuracy of dense oversampled imaging while acquiring 3 to 5 times fewer TX-RX direction pairs. This enables bistatic operation with drastically reduced radio resource overhead in ISAC systems.


[41] 2606.17737

Deep CSI Feedback for FDD Massive MIMO Systems: A Curvelet Learning Approach

Downlink channel state information (CSI) feedback plays a key role in frequency division duplex (FDD) massive multiple-input multiple-output (mMIMO) systems. The growth of antennas in ultra-massive MIMO increases the difficulty and overhead of CSI feedback, which poses significant challenges for conventional downlink CSI feedback mechanisms. To address the limitations of existing CSI feedback approaches, this paper proposes a novel curvelet learning based framework termed SwinCANet, comprising a frequency-domain information processing module and a denoising module. The frequency-domain information processing module employs curvelet transform to decompose CSI into low-frequency and high-frequency components. Subsequently, Swin Transformer and channel-wise attention block are utilized for extracting the low-frequency and high-frequency representations, respectively, thereby enhancing reconstruction quality. Notably, an additional Swin Transformer facilitates the fusion of multi-scale frequency components, enhancing capabilities across different angular resolutions and spatial directions. Furthermore, we develop a variant (De-SwinCANet), which employs a Sigmoid threshold function to effectively suppress noise coefficients, thereby mitigating various channel impairments and nonlinear distortions. Numerical simulation results demonstrate that the proposed methodology achieves superior performance compared to existing benchmarks while maintaining robust performance under challenging propagation conditions.


[42] 2606.17741

A Wearable Multimodal Ultrasound+Inertial System for Real-Time Virtual Reality Interaction

A-mode ultrasound (US) is a promising sensing modality for Virtual Reality (VR) interaction, as it enables the mapping of muscular activity into control commands while retaining the benefits of wearable sensing. However, existing approaches still face limitations in terms of wearability and interaction complexity, often relying on external hardware such as cameras. In this work, we propose a fully wearable multimodal interface for real-time VR interaction, based on concurrent US and inertial (accelerometry) sensing from the forearm and upper arm. The system is built on the WULPUS platform and integrates an end-to-end software framework for real-time acquisition, visualization, and communication with a Unity-based VR environment. A multimodal learning pipeline is introduced for concurrent hand pose and forearm position estimation in 2D space. The interface is evaluated through offline and online experiments with five subjects, during the execution of three functional tasks: cylinder grasping (gross motor) and relocation, marble pinching (fine motor) and relocation, and liquid pouring. For offline experiments, we collect 5 acquisition sessions across multiple days, achieving an average inter-session accuracy across subjects of 80 ± 6% for hand pose estimation and 77 ± 7% for forearm position estimation. Online validation with minimal fine-tuning (5 min) demonstrates success rates of 92.0 ± 16.0%, 88.0 ± 9.8%, and 96.0 ± 8.0% for the three tasks, respectively. With a power consumption of only 19.9 mW, our system enables more than 2.5 days of continuous use on a small 350 mAh LiPo battery without the need for recharge, enabling truly wearable, multimodal, and functionally meaningful VR interaction.
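
The battery-life figure quoted above can be sanity-checked with a short back-of-the-envelope calculation. The sketch below assumes a nominal LiPo cell voltage of 3.7 V and ignores regulator losses and self-discharge; neither the voltage nor those simplifications come from the abstract.

```python
# Values stated in the abstract.
BATTERY_CAPACITY_MAH = 350.0
POWER_CONSUMPTION_MW = 19.9

# Assumption (not from the abstract): nominal single-cell LiPo voltage.
NOMINAL_VOLTAGE_V = 3.7

# Stored energy in mWh = capacity (mAh) x voltage (V).
energy_mwh = BATTERY_CAPACITY_MAH * NOMINAL_VOLTAGE_V

# Ideal runtime, ignoring conversion losses and self-discharge.
runtime_hours = energy_mwh / POWER_CONSUMPTION_MW
runtime_days = runtime_hours / 24.0

print(f"energy: {energy_mwh:.0f} mWh")
print(f"runtime: {runtime_hours:.1f} h ({runtime_days:.2f} days)")
```

Under these assumptions the ideal runtime comes out at roughly 2.7 days, which is consistent with the stated figure of more than 2.5 days.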


[43] 2606.17801

Joint Direction-of-Arrival and Range Estimation for Millimeter-Wave Uniform Linear Array Radar

An FFT-based direction-of-arrival (DOA) and range-estimation framework for a monostatic uniform linear array (ULA) operating at 77 GHz is presented. A narrowband sinusoidal waveform is used to derive the spatial phase model, determine an aliasing-free inter-element spacing, and select the aperture required to obtain a boresight angular resolution of 2 degrees. The resulting design uses an element spacing of 0.97 mm and 58 antenna elements, corresponding to an aperture length of 56.42 mm. Numerical results show accurate angular estimation for a single target at 30 degrees and for multiple simultaneous targets. The analysis is further extended to two-dimensional localization by replacing the narrowband waveform with a 1 GHz sinc-modulated signal, which provides an approximate range resolution of 0.15 m. Additional simulations quantify the effects of additive complex Gaussian noise, increased antenna spacing, and target decorrelation on the DOA response.
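
Two of the quoted design numbers can be reproduced with textbook relations. The sketch below assumes that the aliasing-free spacing for a monostatic (two-way propagation) array is a quarter wavelength, and that range resolution follows the usual c/(2B) rule; the abstract states the resulting values but not these formulas, so both relations are assumptions here.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

# Values stated in the abstract.
CARRIER_FREQUENCY_HZ = 77e9
BANDWIDTH_HZ = 1e9

wavelength_m = SPEED_OF_LIGHT_M_S / CARRIER_FREQUENCY_HZ

# Assumption: two-way propagation doubles the spatial phase slope,
# so the unambiguous spacing is lambda/4 rather than lambda/2.
spacing_mm = wavelength_m / 4.0 * 1e3

# Standard radar range resolution for a signal of bandwidth B.
range_resolution_m = SPEED_OF_LIGHT_M_S / (2.0 * BANDWIDTH_HZ)

print(f"wavelength: {wavelength_m * 1e3:.3f} mm")
print(f"element spacing: {spacing_mm:.2f} mm")
print(f"range resolution: {range_resolution_m:.3f} m")
```

Both results round to the values given in the abstract (0.97 mm and 0.15 m). The aperture length and element count depend on the authors' resolution criterion and are not reconstructed here.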


[44] 2606.17806

PhASE-Flow: Phonetic-Conditioned Acoustic Flow Matching in SSL Representation Domain for Speech Enhancement

Flow matching (FM) enables high-fidelity generation, while self-supervised learning (SSL) speech models provide hierarchical representations spanning acoustic and phonetic levels. However, existing FM-based speech enhancement (SE) methods operate primarily in the spectral domain, treating SSL features only as external conditions rather than modeling directly in the SSL latent space. To fully exploit the structural richness of SSL representations, we propose PhASE-Flow, an FM-based SE framework that operates entirely in the SSL space. It models the conditional distribution of clean acoustic representations given phonetic ones, reconstructing the waveform via a neural vocoder. Experiments show that PhASE-Flow outperforms state-of-the-art baselines in perceptual quality and intelligibility. Notably, it achieves competitive performance with only four sampling steps, enabling highly efficient inference. Audio demos are available at this https URL.


[45] 2606.17869

Perceptually-Weighted Video Quality Metric for Asymmetric Encoded Sports Videos

Objective video quality metrics commonly assume uniform spatial attention, an assumption that conflicts with the selective nature of human visual perception, particularly in sports videos. Here, allocating more bits for salient regions through semantic encoding can lead to significant bitrate savings. We present a Perceptually-Weighted Video Quality Metric (PW-VQM), a full-reference metric that accounts for the unequal perceptual importance of spatial regions and therefore targets quality evaluation for asymmetrically encoded content. SSIM maps computed in a multiscale wavelet domain are weighted by differentiating between foreground and background regions. Perceptually salient foreground regions are identified by combining open-vocabulary object detection with optical flow analysis, and are assigned higher weight during quality aggregation. Evaluated on sports video content, PW-VQM achieves a Spearman Rank Order Correlation Coefficient of 0.9511, outperforming established metrics including SSIM, VMAF, FUNQUE, and LPIPS. An ablation study confirms the individual contributions of the components of the perceptual weighting.


[46] 2606.17873

Model-Free Control for Multi-Time Scale Dynamics of Grid-Connected Power Converters

Controller synthesis in power electronics-based systems depends predominantly on the mathematical model of the system, which is a limitation when the actual system is complex and the mathematical model cannot capture all its dynamics. Model-free control addresses this limitation by using an ad-hoc simple model which is compensated by high-rate evaluation of dynamics in terms of their derivatives. However, application of the model-free control strategy to power electronics-based multi-time scale dynamical systems is challenging because of the derivative action needed to implement such control. Grid-connected power converters are examples of such systems, yet experimental validation has not been adequately addressed in the literature. This letter presents a validation of such control down to the hardware implementation level. An intelligent proportional-integral (iPI) controller is synthesized and validated on a 16 kW experimental test bench. The results demonstrate the benefits of the approach for the control of grid-connected power converters, including their participation in secondary voltage control.


[47] 2606.17876

Feedforward and Iterative Phase Noise Compensation for Channels with Chromatic Dispersion

Equalization-enhanced phase noise is avoided by applying phase noise compensation (PNC) before chromatic dispersion compensation. Feedforward and iterative PNC algorithms based on expectation propagation are proposed. Both achieve information rates close to channels without phase noise for 100 GBaud 64-QAM and 10,000 km of fiber.


[48] 2606.17879

A 399uW 114.3 dB DR Companding Readout ASIC for MEMS Microphones Employing a Multirate Time-Domain ADC

Improvements in the dynamic range and sensitivity of digital MEMS microphones are essential in applications like advanced noise canceling and voice recognition. A cost-effective solution to achieve these goals is the companding ADC architecture. Companding ADCs split the dynamic range into several segments with different quantization noise levels, relaxing power constraints. A common problem of companding microphones is the audible artifacts generated when the input signal crosses the boundaries between different amplitude segments. In this paper, we present a companding ADC architecture that mitigates the boundary artifacts by leveraging the instantaneous and high-resolution time-domain representation of the input signal in a VCO-based ADC. The use of a multi-rate frequency-to-digital converter decouples quantization noise from the VCO frequency while keeping standard audio sampling rates. Co-optimization of the driver and oscillator circuits enables our VCO-ADC to reach >112 dBc of peak SFDR without a feedback DAC, keeping a giga-ohm input impedance compatible with a capacitive MEMS. We show measurements of a 0.13 $\mu$m ASIC implementing a complete readout circuit for a digital MEMS microphone. This includes two analog channels and the digital signal processing and calibration blocks required to deliver a standard single-bit PDM output. This ADC reaches a dynamic range of 114.3 dB with a power budget under 400 $\mu$W, a Schreier FoM_{SNDR} of 171.0 dB and a FoM_{DR} of 191.3 dB.


[49] 2606.17893

Condition-Wise Sinkhorn Drifting for One-Shot Learned Channel Simulation

Learned communication systems may evaluate stochastic channel surrogates millions of times inside differentiable training loops, making diffusion-style reverse sampling expensive. This paper proposes condition-wise Sinkhorn drifting, a one-shot channel surrogate that preserves the transmitted symbol and transports only the conditional output laws \(p(y\mid x)\). We formulate a conditional Sinkhorn objective over repeated outputs at the same transmitted symbol and train the generator with finite-sample barycentric velocities followed by detached particle regression. Experiments on additive white Gaussian noise (AWGN), Rayleigh fading, solid-state power amplifier (SSPA) nonlinearity, and a compact tapped-delay-line (TDL) channel compare direct drifting, joint Sinkhorn drifting, condition-wise Sinkhorn drifting, conditional denoising diffusion probabilistic modeling (DDPM), denoising diffusion implicit modeling (DDIM), and Wasserstein generative adversarial network (WGAN) references. Within the evaluated one-shot drifting-family variants, condition-wise Sinkhorn is strongest under conditional diagnostics and symbolic-coding checks, while diffusion remains strongest on the hardest downstream symbol-error-rate (SER) curves. The resulting operating point is a condition-preserving one-shot simulator for settings where repeated channel calls make diffusion-style sampling too costly.


[50] 2606.17900

Time-Slotted Multi-Cluster UAV AirComp with Energy-Awareness: A Pointer Network-Assisted Soft Actor-Critic Learning Framework

Over-the-air computation (AirComp) has emerged as a promising approach for massive data aggregation, yet it is challenged by channel variations, task distributions, and the inherent energy limitations of the computation nodes. In this paper, we propose an unmanned aerial vehicle (UAV)-assisted AirComp system to serve multi-cluster computation tasks over time, where the spatial and time diversity facilitated by UAV mobility is exploited for efficient and accurate data computation. Specifically, we aim to minimize the AirComp aggregation error and the energy consumption by jointly optimizing the transceiver beamforming, normalizing factors, sensor scheduling, and UAV trajectory. To solve the formulated problem, we decompose it into two layers, where the inner layer addresses the optimization-based AirComp transceiver design and the outer layer focuses on the deep reinforcement learning (DRL)-based scheduling and trajectory design. In particular, a pointer network actor-critic learning method is developed to tackle the binary scheduling problem, and a soft actor-critic DRL algorithm is employed to determine the UAV trajectory. Simulation results validate the convergence of the proposed hierarchical learning framework and demonstrate its significant performance gains in terms of AirComp aggregation error and energy consumption as compared with baseline schemes.


[51] 2606.17903

Constellation Design for Nonlinear Unified SWIPT Receiver Channels with Memory

Unified receivers (URs) have emerged as a promising architecture for simultaneous wireless information and power transfer (SWIPT), since a common rectifying front-end enables information decoding (ID) and energy harvesting (EH) from the same rectified output. However, rectification is nonlinear due to the diode, while the capacitor introduces memory across symbols, making constellation design over the channel challenging. In this paper, we study constellation design for nonlinear UR-SWIPT channels in both memoryless and memory regimes. First, we propose a tractable unified rectification model that captures both (i) the nonlinear steady-state mapping and (ii) the asymmetric capacitor charging/discharging dynamics under transient operation. To isolate the impact of rectification with memory on ID, we study the information-based design. In this setting, we develop a state-adaptive policy with an algorithmic constellation design that accounts for the rectifier state and shapes the constellation in the observation domain. By approximating the rectifier state distribution, we derive a closed-form average symbol error rate (SER) expression and characterize the rate-reliability (R-R) tradeoff. We then seek constellations that minimize the SER under average transmit power and EH constraints. We address the resulting energy-constrained setting in the memoryless regime using an autoencoder-based framework that embeds the nonlinear rectification model as a differentiable channel block. Numerical results validate the proposed models, demonstrate the impact of memory on the R-R tradeoff, and show how learned constellations adapt to EH requirements in the rate-energy tradeoff.


[52] 2606.17913

Reducing Building Heat Demand Through Intelligent Control: A Comparative Simulation Study

Space heating remains the dominant energy consumer in buildings. While structural retrofitting can substantially reduce demand, it is often costly and time-intensive. As an alternative, this study investigates the potential of intelligent heating control strategies to reduce heat consumption with lower investment and faster implementation. Previous studies have shown that replacing conventional heating-curve-based controllers with model predictive controllers (MPCs) can reduce heating energy demand. Whereas most studies compare MPC to conventional control, this work evaluates two MPC strategies with different control objectives and quantifies their impact on indoor temperature tracking and heating demand. A virtual residential building model was developed in Python based on ISO 52016-1 to generate synthetic measurement data. A simplified resistance-capacitance (RC) model was parametrised using this dataset and used as the internal model for two MPC strategies implemented in MATLAB. The strategies differ only in their optimisation objective: one minimises quadratic heating power, while the other prioritises indoor temperature tracking for thermal comfort. Simulations over six days show that both strategies satisfy comfort and system constraints, but differ in energy use and temperature variation. The comfort-oriented controller achieves lower total heat consumption than the controller minimising heating power, which is attributed to the penalisation of high heating rates in the quadratic objective function. The results demonstrate the importance of objective function formulation in MPC design and show that high comfort levels can be maintained while achieving lower heating demand without structural modifications to the building envelope.
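
As a rough illustration of the kind of resistance-capacitance model that can serve as an internal MPC model, the sketch below uses a single-zone first-order (1R1C) model with a forward-Euler step. The model order, the function names, and all parameter values are assumptions for illustration; the abstract does not specify the structure or parameters of the RC model used in the study.

```python
def rc_step(T_in, T_out, P_heat, R, C, dt):
    """One forward-Euler step of a 1R1C thermal model (assumed structure).

    C * dT_in/dt = (T_out - T_in) / R + P_heat
    T_in, T_out in degC, P_heat in W, R in K/W, C in J/K, dt in s.
    """
    dT_dt = ((T_out - T_in) / R + P_heat) / C
    return T_in + dt * dT_dt


def steady_state_power(T_set, T_out, R):
    """Heating power that holds T_set constant in the 1R1C model."""
    return (T_set - T_out) / R


# Hypothetical parameters: R = 0.005 K/W, C = 1e7 J/K, 15-minute step.
R, C, dt = 0.005, 1.0e7, 900.0
P_hold = steady_state_power(21.0, 1.0, R)       # power needed to hold 21 degC
T_held = rc_step(21.0, 1.0, P_hold, R, C, dt)   # heating balances the losses
T_free = rc_step(21.0, 1.0, 0.0, R, C, dt)      # heating switched off
```

In this toy setting the zone stays at its setpoint when heating equals the envelope losses and cools by a fraction of a degree per step otherwise; an MPC would optimise `P_heat` over a horizon of such steps under its chosen objective.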


[53] 2606.17914

Three-phase model of unbalanced distribution networks with DERs

Classical DistFlow equations for steady-state distribution network analysis fail to capture the inherent imbalances of three-phase systems arising from asymmetrical lines, loads, and distributed energy resources (DERs). This paper extends the classical power flow (PF) equations into a rigorous, non-approximated three-phase formulation, termed Dist3Flow. The proposed branch flow model (BFM) utilizes the real and imaginary components of nodal voltages and the active and reactive power flows as state variables. Lines are modelled by nonlinear forward and backward equations, while loads and DERs are represented via ZIP models and P-Q control, respectively. By incorporating specific boundary conditions at the terminal nodes, the formulation generalizes PF analysis to both radial and closed-ring topologies. The solution is obtained by using a backward/forward sweep (BFS) algorithm. The approach is validated against OpenDSS across various configurations, considering open-ring and closed-ring topologies with and without DERs.


[54] 2606.17942

On the Optimum Energy-per-bit Launch Power in Coherent Hollow-core Fibre Transmission Systems

We investigate the optimum energy per bit in hollow-core-fibre transmission systems. We show that a 1000 km C-band link can achieve a 41.5% reduction in total power consumption when operating at the minimum energy-per-bit launch power with only 2.2% throughput penalty.


[55] 2606.18019

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity from speech samples collected during standardized history-taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for Support Vector Regression, using human and pause-enriched transcripts. Results show that LLMs effectively predict depression severity in zero-shot settings (best MAE of 0.60), while dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), reducing errors by up to 35% over zero-shot baselines. Pause-enriched transcripts achieve performance competitive with human transcriptions, demonstrating the viability of fully automatic screening pipelines for differential neuropsychiatric assessment.


[56] 2606.18054

AI-based Cognitive-linguistic Features for Dementia Assessment in Picture Description

Picture descriptions provide valuable insights into several clinical constructs related to cognitive-linguistic abilities. However, operationalizing these constructs into quantitative measures remains challenging, limiting interpretability and clinical utility. We introduced seven constructs tailored to the Cookie Theft picture description task and prompted large language models (LLMs) to evaluate them, generating severity scores and example-based explanations. Among the examined LLMs, Claude 3.5 Sonnet performed the best, producing severity scores that significantly distinguish cognitively impaired individuals from healthy controls. The model achieves a high accuracy of 85% on the ADReSS dataset. Expert evaluation of Claude's scores and explanations yields a 3.99/5 average agreement. The findings demonstrate the potential of LLMs to operationalize clinical constructs and generate interpretable evaluations, offering a promising approach for accessible cognitive screening tools.


[57] 2606.18058

Multiscale reconstruction of protein conformations from cryo-EM images

We present a novel multiscale algorithm for directly recovering the atomic model structure of a protein from single-particle cryo-EM data. Our algorithm is able to estimate protein structures to state-of-the-art accuracy for high-noise and low-contrast data. It is also robust to misspecifications in the TEM image formation model. These desirable properties are primarily due to the use of an explicit representation of the protein backbone in terms of bonds, torsion angles and bond angles, which supplies rich prior information to the structure recovery process. We apply our method on three protein cryo-EM datasets, generated using an electron microscope digital twin, and show that using a multiscale approach yields an improvement of the root-mean-square deviation (RMSD) and template modelling (TM) scores with respect to the ground truth. Furthermore, there is evidence that larger-scale structures are being prioritised with the multiscale algorithm, which reduces the possibility of convergence to bad local minima.


[58] 2606.18072

One-Step Token-to-Waveform Generation with MeanFlow in Latent Space

Neural audio codecs are central to modern LLM-based Text-to-Speech (TTS) and multimodal systems. As low-bitrate semantic codecs gain prominence, the Token-to-Waveform (Token2Wav) decoder becomes a bottleneck determining both perceptual quality and system efficiency. Conventional multi-step flow-matching decoders offer superior quality but suffer from high inference latency due to iterative sampling, creating a severe quality-speed trade-off. In this paper, we propose a novel Token2Wav architecture that overcomes this limitation by applying MeanFlow in a highly compressed latent space. By modeling the average velocity rather than the instantaneous velocity field, MeanFlow enables true one-step generation. Operating in the latent domain mitigates the memory and stability issues of waveform-level flows, yielding up to a 17$\times$ improvement in Real-Time Factor (RTF) compared to multi-step baselines with negligible quality degradation. Furthermore, we introduce refinement strategies that mitigate latent mismatch, including decoder-only fine-tuning with the MeanFlow generator frozen and end-to-end joint fine-tuning, improving fidelity without increasing inference-time cost. Code and demo are publicly available.


[59] 2606.18085

A Generic Multi-dimensional Symbol Construction for Digital Over-the-Air Computation and Practical Aspects

In this paper, we propose a general-purpose multi-dimensional symbol construction for computing an arbitrary symmetric function with digital over-the-air computation (OAC) and discuss the practical aspects of coherent aggregation. For our first contribution, we discuss the categorical representation of a symmetric function. By using this representation and leveraging the sufficiency of the histogram to evaluate a symmetric function, i.e., inspired by type-based multiple access (TBMA), we introduce a general approach to design a single set of OAC symbols to compute any digital function. For our second contribution, we use a comprehensive platform based on low-cost nodes that maintain synchronization in time, frequency, phase, and amplitude via a trigger mechanism, enabling coherent OAC experiments without Global Positioning System (GPS) or cable-based synchronization. Using measurements from the platform, we characterize the phase and amplitude statistics of the composite channel to derive a realistic impairment model for coherent OAC. Through a comprehensive analysis, we demonstrate the effectiveness of the proposed scheme under impairments captured by the proposed model.
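
The statement that a histogram suffices to evaluate a symmetric function can be illustrated with a small sketch. It assumes an idealized noiseless channel in which each node transmits a one-hot vector over a finite alphabet and the channel simply adds the vectors; the function names and the example alphabet are hypothetical and do not represent the symbol construction proposed in the paper.

```python
def superpose(values, alphabet):
    """Idealized over-the-air sum of one-hot symbols: yields the histogram."""
    hist = [0] * len(alphabet)
    for v in values:
        hist[alphabet.index(v)] += 1  # each node adds energy in one bin
    return hist


def max_from_hist(hist, alphabet):
    """Largest alphabet value reported by at least one node."""
    return max(a for a, n in zip(alphabet, hist) if n > 0)


def mean_from_hist(hist, alphabet):
    """Arithmetic mean of the node values, recovered from bin counts."""
    return sum(a * n for a, n in zip(alphabet, hist)) / sum(hist)


def mode_from_hist(hist, alphabet):
    """Most frequent value (first one wins on ties)."""
    return alphabet[hist.index(max(hist))]


alphabet = [0, 1, 2, 3]
values = [3, 1, 1, 0, 2, 1]               # one value per node
hist = superpose(values, alphabet)
hist_permuted = superpose(list(reversed(values)), alphabet)
```

Because the histogram is unchanged by any reordering of the nodes, every function computed from it is symmetric, and several different functions (maximum, mean, mode) can be read off the same aggregate.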


[60] 2606.18109

Verifiable computations for dynamic encrypted control

Encrypted control can preserve the privacy of data and parameters while the necessary computations are outsourced to a cloud server. However, ensuring the integrity of the values received from the cloud, i.e., that they have not been altered, requires strong assumptions or verification algorithms. Previous methods require computationally expensive cryptographic protocols or are only applicable to static computations. In this paper, we present a novel type of verification algorithm for linear dynamic encrypted control. We utilize system-theoretic input-output properties of the controller for artificial challenge signals, which are processed in the cloud in parallel with the requested control input, to check the correctness of the results at the plant. As a result, the scheme adds almost no computational load, reveals wrong computations with high probability, and rules out replay attacks.


[61] 2606.18126

Decentralized Decision-Making for Finite-State Systems over Finite Alphabets is Undecidable

This paper investigates decentralized decision-making for finite-state transition systems, i.e., discrete-event systems, under finite communication alphabet constraints. We consider a general decentralized observation framework in which a plant is observed by multiple local agents that transmit symbolic messages over a finite alphabet to a memoryless fusion center. The fusion center then produces a binary decision according to a prescribed fusion rule. We study the fundamental question of whether there exist local decision maps that enable exact reconstruction of a given regular specification language from decentralized observations. Contrary to classical results that rely on specific monotone fusion rules such as conjunction and disjunction, we show that the problem becomes undecidable even under a severely restricted information architecture: binary local decision alphabets and a fixed exclusive-or (XOR) fusion rule. The proof is based on a reduction from the Thue word problem, a classical undecidable problem in rewriting systems. We further show that decentralized supervisory control, decentralized fault diagnosis, and decentralized fault prognosis are also undecidable under finite communication alphabets. Our results reveal that existing decidability results fundamentally rely on structural properties of fusion rules, in particular their monotone order-preserving nature. In contrast, non-monotone fusion rules such as XOR break this structure, leading to undecidability even in highly restricted settings.


[62] 2606.18134

Grounding Spoken LLMs in Multi-Speaker Audio via Diarization Conditioning

We propose diarization-conditioned spoken language models (SLMs), a strategy for extending SLMs to far-field multi-talker audio. Rather than adapting the decoder via Serialized Output Training, which risks catastrophic forgetting, we condition the acoustic encoder on diarization masks to extract target-speaker representations, keeping the decoder frozen. We instantiate this as Dixtral, integrating a Diarization Conditioned Whisper (DiCoW) encoder into the Voxtral SLM. On AMI, NOTSOFAR-1, LibriSpeechMix, and Mixer6, Dixtral outperforms Gemini 3.0 Flash, VibeVoice, and Voxtral Mini Transcribe V2 on speaker-attributed transcription by 29.0%, 19.8%, and 16.0% absolute cpWER respectively. On a novel long-form multi-speaker QA benchmark, zero-shot Dixtral matches Gemini on far-field content understanding, and when fine-tuned surpasses both Gemini and Voxtral operating on close-talk across all tasks.


[63] 2606.18150

Spatial and Temporal Generalization of CSI-based Neural Positioning

Channel state information (CSI)-based neural positioning learns a mapping from CSI measurements to user equipment (UE) positions using neural networks. However, most existing performance evaluations utilize randomly partitioned train/test CSI-dataset splits, which fail to reflect the generalization requirements of practical deployments and present optimistic results. In this paper, we study the spatial and temporal generalization of neural positioning with standard-compliant Wi-Fi and 5G NR systems for three real-world CSI datasets acquired in indoor and outdoor environments. We assess generalization with two different architectures, a conventional multilayer perceptron (MLP) and a novel transformer architecture, to unseen spatial regions, unseen UE trajectories, and CSI measurement campaigns separated by one week. Our experiments show that both architectures generalize well in space and time, and the proposed transformer consistently outperforms the MLP in positioning accuracy while requiring fewer model parameters.


[64] 2606.18151

Channel Charting for Position and Orientation

Channel charting (CC) in real-world coordinates is a recently proposed self-supervised machine learning method that maps high-dimensional channel state information (CSI) to user equipment (UE) position. In this paper, we extend CC to also estimate UE orientation, which can further assist tasks such as beamfinding, precoding, and beam- and cell-assignment. To this end, we propose a novel orientation triplet loss that accounts for angle periodicity and an alignment loss that embeds estimated orientations in real-world coordinates in a self-supervised fashion. Using real-world CSI measurements from a standard-compliant 5G NR system, we demonstrate that the proposed method achieves position and orientation estimation accuracy close to that of supervised approaches trained with ground-truth labels.
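
To make the role of angle periodicity concrete, the sketch below shows a generic periodic angular distance and a standard margin-based triplet loss built on it. This is an assumed, textbook-style formulation for illustration only; the function names and the margin value are hypothetical, and the abstract does not give the exact form of the proposed orientation triplet loss.

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two angles (radians), in [0, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def orientation_triplet_loss(anchor, positive, negative, margin=0.1):
    """Generic triplet loss on periodic angular distances (assumed form)."""
    d_pos = angle_diff(anchor, positive)
    d_neg = angle_diff(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# 350 degrees and 10 degrees are only 20 degrees apart on the circle.
wrapped = angle_diff(math.radians(350.0), math.radians(10.0))
loss = orientation_triplet_loss(
    math.radians(350.0), math.radians(10.0), math.radians(180.0)
)
```

A plain absolute difference would treat 350 and 10 degrees as 340 degrees apart, which is why a loss on orientations needs a distance that wraps around the circle.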


[65] 2606.18196

Receiver-Aware Analysis and Verification of the Spectral Separation Coefficient Under Interference-Induced Degradation

Interference poses a significant challenge to satellite-based positioning systems, making it essential to accurately quantify the effects of specific interference types on receiver performance and the resulting reliability of position computation. In current practice, interference effects are often quantified using receiver-independent metrics, with receiver-specific front-end characteristics either idealized or only implicitly considered. In this paper, we address this limitation by explicitly incorporating receiver-specific front-end characteristics into the computation of interference effects and validating the resulting receiver-dependent analysis experimentally. To this end, we record a real-world open-field dataset comprising 210 distinct interference scenarios and compute the receiver-dependent spectral separation coefficient (SSC) and interference impact for a specific receiver module. Furthermore, we verify the computation using a controlled dataset generated with a radio frequency constellation simulator (RFCS), employing the same receiver module and replaying similar interference classes. The comparison of results obtained in both environments demonstrates the robustness of the interference impact computation.


[66] 2606.17061

A Fixed Representation Probe Reveals Morphology-Space Organization in Non-Gaussian Elastic Transients

Elastic systems driven by intermittent energy release generate non-Gaussian transients across domains such as brittle fracture, seismicity, rotating machinery and interferometric instrumentation. These signals often contain bursts, ringdowns, ridges and clustered energy packets, but it remains unclear whether such motifs define a measurable morphology comparable across physical systems. Here we use a frozen convolutional encoder trained on transient-rich interferometric noise as a fixed probe of non-Gaussian elastic morphology. The encoder is not fine-tuned, retrained or recalibrated on any target domain. Signals are mapped to a common time-frequency representation and compared through latent geometry and perturbation response rather than task-specific classification. In granite acoustic-emission experiments, L2-normalized embeddings define trajectories on a latent hypersphere. The cumulative angular path provides a derivative-free observable of morphological reorganization. This geometry distinguishes two fracture organizations: a more distributed damage evolution and a more localized rupture regime. The localized regime accumulates a larger angular path and degrades more strongly under phase randomization and temporal-order perturbation, consistent with a more phase-sensitive and sequence-dependent rupture morphology. Synthetic controls and seismic morphology-destruction experiments indicate that the response is not explained by marginal spectral energy alone, while random-weight attribution controls show that visual localization is insufficient without quantitative perturbation tests. These results support frozen transient-rich representations as fixed measurement probes for comparing non-Gaussian elastic morphology across heterogeneous physical systems.
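
The cumulative angular path on a latent hypersphere can be sketched as the sum of great-circle angles between consecutive L2-normalized embeddings. The function names and the toy 2-D trajectory are assumptions for illustration; the paper's encoder, embedding dimension, and any windowing are not given in the abstract, and the sketch assumes no embedding is the zero vector.

```python
import math


def l2_normalize(v):
    """Project a nonzero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cumulative_angular_path(embeddings):
    """Sum of angles (radians) between consecutive normalized embeddings."""
    unit = [l2_normalize(e) for e in embeddings]
    total = 0.0
    for u, v in zip(unit, unit[1:]):
        dot = sum(a * b for a, b in zip(u, v))
        # Clamp to guard against rounding slightly outside [-1, 1].
        total += math.acos(max(-1.0, min(1.0, dot)))
    return total


# Toy trajectory: two quarter turns on the unit circle, norms ignored.
trajectory = [[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0]]
path = cumulative_angular_path(trajectory)
```

No finite-difference derivative of the signal is needed; only angles between successive embeddings are accumulated, so a trajectory that reorganizes more strongly accumulates a longer path.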


[67] 2606.17066

Inversion of Electrochemical Immittance Spectra based on the Mellin Transform

In this work, we show that the Fredholm integral equations underlying the distribution of relaxation times (DRT), the distribution of capacitive times (DCT), and related frameworks share a common mathematical structure, namely that of a Mellin convolution. This comes from the fact that all standard immittance (impedance or admittance) kernels depend on the product $\omega\tau$ rather than on $\omega$ and $\tau$ independently. Exploiting this structure, we derive an exact algebraic inversion formula in Mellin space that converts the deconvolution problem into a closed-form relation between the Mellin transform of the measured immittance and that of the unknown distribution function. The framework is validated analytically on a set of examples including the constant phase element (CPE), the Davidson-Cole (DC) model, and the finite-length Warburg model with blocking boundary conditions. It is also validated numerically using the fast Mellin transform via the fast Fourier transform algorithm for both the CPE and the DC model, including their DRT and DCT recovery under clean and noisy conditions. The approach unifies the impedance- and admittance-based inversions under a single spectral framework, and provides a new approach for the characterization of electrochemical systems from immittance data.
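
The Mellin-convolution structure can be sketched in generic notation. The symbols $K$ (kernel), $g$ (unknown distribution), and $Z$ (measured immittance), as well as the transform convention below, are assumptions for illustration and need not match the paper's definitions; in particular, distributions defined with respect to $\ln\tau$ would shift the argument of the transform.

```latex
\begin{align}
  Z(\omega) &= \int_0^\infty K(\omega\tau)\, g(\tau)\, \mathrm{d}\tau, \\
  \mathcal{M}[f](s) &= \int_0^\infty f(x)\, x^{s-1}\, \mathrm{d}x, \\
  \mathcal{M}[Z](s)
    &= \int_0^\infty g(\tau)\, \tau^{-s}\, \mathrm{d}\tau
       \int_0^\infty K(u)\, u^{s-1}\, \mathrm{d}u
     = \mathcal{M}[K](s)\, \mathcal{M}[g](1-s), \\
  \mathcal{M}[g](1-s) &= \frac{\mathcal{M}[Z](s)}{\mathcal{M}[K](s)}.
\end{align}
```

The third line follows from the substitution $u = \omega\tau$, which gives $\omega^{s-1}\,\mathrm{d}\omega = u^{s-1}\tau^{-s}\,\mathrm{d}u$. The relation holds on the strip of $s$ where both integrals converge and $\mathcal{M}[K](s) \neq 0$.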


[68] 2606.17093

Diagnosing and Repairing Shape-Prior Shortcuts in Long-Range Single-Shot Fringe Projection Profilometry

Learning-based single-shot fringe projection profilometry (FPP) has been studied mostly at close range. The long-range regime (standoff beyond 1 m) remains largely unaddressed: inverse-square intensity falloff lowers fringe signal-to-noise ratio and degrades physical ground truth, the single-shot problem is ill-posed because fringe-order information is absent from one image, and these architectures have not been studied mechanistically. We present a diagnose-repair-verify study using mechanistic interpretability (MI) and conformal uncertainty quantification (UQ) as convergent diagnostics: they agree on one physical failure locus, driving and verifying an architectural repair. On a photorealistic synthetic benchmark (15,600 fringe images, 50 objects at 1.5-2.1 m), a best UNet baseline reaches 14.54 mm object mean absolute error (MAE). Three probes (linear probing, Grad-CAM, flat-plane out-of-distribution test) converge: the baseline solves the task via object-boundary shape priors rather than fringe-phase decoding. We repair this with PhiCalNet, which outputs wrapped phase rather than depth and applies a fixed differentiable calibration layer mapping phase to depth, removing the shape-prior solution from the hypothesis space architecturally rather than by a loss penalty. A physics-informed loss that enforces the same physics as a soft penalty on a depth-regressing network yields no measurable gain, isolating the architecture as the operative factor. PhiCalNet reduces object MAE 3.3x to 4.46 mm; the residual is carried by 0.103% of pixels at the +/-pi wrap discontinuity. Pixel-wise conformal UQ confirms the diagnosis: rejecting the top 5% of object pixels by snapshot disagreement cuts PhiCalNet RMSE by 64% (20.6->7.4 mm) versus 3.5% for the baseline. MI and UQ converge on the same failure locus.


[69] 2606.17185

Finsler Geometry, Graph Neural Networks, and You

Graph neural network architectures based on the graph Laplacian approximate the Laplace-Beltrami operator, thus limiting their application to isotropic operators. As a nonlinear alternative to the Laplace-Beltrami operator, we consider estimates of the Finsler Laplacian on point clouds sampled from a manifold. We prove that these discrete estimates converge to the true operator on the manifold as the number of point samples grows. Moreover, we show that this operator can be expressed as a graph neural network layer, which we use to define a family of Finslerian graph neural networks constrained to express Finsler geometry. We show that Finslerian graph neural networks recover the geometry underlying nonlinear diffusion equations in practice.


[70] 2606.17208

On the Strong Duality in Continuous-time and Discrete-time Linear Quadratic Regulators

This paper revisits the strong duality in the linear quadratic regulator (LQR) for continuous-time and discrete-time systems, and explores its interconnection with typical assumptions and the uniqueness of primal-dual solutions. Using a linear operator $\Psi$, we formulate a common nonconvex LQR problem that captures both time domains. We then derive its Lagrange dual problem and establish the strong duality via a rank-constrained tight semidefinite program (SDP) relaxation. Further, we show that the primal-dual optimal solutions to the SDP relaxation, after dropping the rank constraint, recover the classical algebraic Riccati equations and optimal feedback gains in a constructive manner. The dual derivation and strong duality analysis rely on mild standard assumptions and exploit the properties of the linear operator and its adjoint, revealing a structural symmetry between the two time domains.


[71] 2606.17226

220-GBd optical coherent waveform generation using temporal unitary transforms

We use temporal unitary transforms to generate 16-QAM up to 220 GBd using only 50-GHz electrical bandwidth. The technique is theoretically lossless and can generate arbitrary optical waveforms beyond the bandwidth of the constituent modulators.


[72] 2606.17241

Beyond Benchmarks: Continuous Edge Inference for Fine-Grained Roadside Perception

Continuous AI inference on resource-constrained edge hardware introduces deployment effects that are largely invisible to conventional benchmark evaluation, including temporal instability in streaming video, thermal throttling under sustained load, and workload-dependent performance variability. We present Edge-TSR, a deployment-oriented continuous edge inference system for sustained roadside perception on the NVIDIA Jetson Orin Nano. Edge-TSR integrates detection, tracking, fine-grained classification, and a lightweight track-aware temporal stabilization mechanism that improves streaming inference consistency with negligible computational overhead. Our central finding is that benchmark-centric evaluation systematically overstates deployed edge inference performance. Across three state-of-the-art baselines, we observe consistent 20-30% relative degradation when transitioning from static-image evaluation to real-world streaming deployment. Edge-TSR addresses this gap through temporal inference stabilization, recovering up to 10.16% classification accuracy over per-frame inference baselines while maintaining sustained real-time performance under continuous operation. We evaluate the complete system under diverse real-world deployment conditions, jointly characterizing inference quality, latency, throughput, and thermal behavior during long-duration operation. A 55-minute vehicular deployment over a 26 km route demonstrates sustained operation at 16.18 FPS within safe thermal limits on a single embedded device without cloud offload. Our findings show that deployment-aware evaluation and temporal inference stabilization are necessary components of continuously operating edge AI systems intended for real-world sensing deployments. We release a sample annotated streaming video evaluation dataset and full system implementation to support reproducible deployment-centric evaluation.


[73] 2606.17249

From Compression to Deployment: Real-Time and Energy-Efficient FastGRNN on Ultra-Constrained Microcontrollers

The dominant trajectory of modern machine learning has been to scale up: larger models, larger accelerators, larger memory budgets. Yet a multi-year global semiconductor supply constraint and the growing energy and carbon cost of always-online inference expose the fragility of this trajectory and motivate the opposite direction: refactoring AI and ML algorithms to fit the small, ubiquitous microcontrollers already in mass production in wearables, sensors, and edge appliances. We present an end-to-end open-source reproduction of FastGRNN, a compact gated recurrent cell, deployed on two bare-metal targets: the 8-bit Arduino (ATmega328P) and the 16-bit MSP430 (no hardware multiplier; 16 KB Flash; 512 B SRAM). Our compression pipeline combines low-rank weight factorization, iterative hard-thresholding sparsity, and per-tensor Q15 post-training quantization with explicit activation calibration. The deployed model occupies 566 bytes of weights and achieves macro F1 = 0.918 (seed 0; five-seed Q15 mean 0.853 ± 0.107) on the HAPT test set. It matches a PyTorch reference at 100% prediction agreement across 3,399 test windows (MCU seed 0; 99.91-100% C-equivalent across five seeds). Both platforms sustain real-time 50 Hz streaming inference (9.21 ms per sample on Arduino; 13 ms on MSP430), where a 256-entry sigmoid/tanh look-up table delivers a 30.5x speedup on the multiplier-less MSP430. Four contributions extend the original FastGRNN paper: (i) cross-platform bit-equivalent deterministic inference; (ii) characterization of recurrent warm-up latency (median 74 samples, 1.48 s; worst-case 125 samples, 2.50 s over 100 test windows); (iii) a deployable look-up-table recipe for multiplier-less embedded targets; and (iv) hardware energy characterization showing 17.7 mW active inference power, <0.09 mW idle power, and 96.7% energy reduction with the LUT.
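
As a rough illustration of two ingredients named in the abstract, Q15 fixed-point quantization and a 256-entry sigmoid look-up table, a minimal Python sketch follows. It is not the authors' implementation: the input range `[-8, 8)`, the nearest-lower-entry lookup, and all function names are assumptions made for illustration, and per-tensor scaling and activation calibration are left out.

```python
import math

Q15_SCALE = 1 << 15        # Q15 stores x as round(x * 2**15)
Q15_MAX = Q15_SCALE - 1    # 32767
Q15_MIN = -Q15_SCALE       # -32768

def to_q15(x: float) -> int:
    """Quantize a real value to a saturated 16-bit Q15 integer."""
    return max(Q15_MIN, min(Q15_MAX, round(x * Q15_SCALE)))

def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a real value."""
    return q / Q15_SCALE

# Hypothetical table layout: 256 entries covering inputs in [-8, 8).
LUT_SIZE = 256
LUT_RANGE = 8.0
SIGMOID_LUT = [
    to_q15(1.0 / (1.0 + math.exp(-(-LUT_RANGE + i * 2 * LUT_RANGE / LUT_SIZE))))
    for i in range(LUT_SIZE)
]

def sigmoid_lut(x: float) -> int:
    """Approximate sigmoid(x) in Q15 by table lookup, without interpolation."""
    idx = int((x + LUT_RANGE) * LUT_SIZE / (2 * LUT_RANGE))
    idx = max(0, min(LUT_SIZE - 1, idx))  # saturate outside the table range
    return SIGMOID_LUT[idx]
```

On a target without a hardware multiplier, the point of such a table is that each activation costs an index computation and one memory read instead of an exponential.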


[74] 2606.17266

SkillChain-Gym: A Benchmark for Reskilling-Aware Production-Inventory Control under Disruptions

Production planning increasingly has to treat workforce capability as a decision variable: certifications lapse when skills are not maintained, new products require skills the current workforce does not hold, and reskilling competes for the same worker hours needed for production. Existing operations benchmarks usually treat labor as exogenous, while workforce-planning models with skills and learning are rarely released as reusable testbeds. We introduce SkillChain-Gym, a benchmark specification for reskilling-aware production-inventory control: a single-site environment with stylized worker skill-state dynamics, hard threshold certification, forgetting, and capacity-consuming training actions constrained by the same per-worker time budget as production. The benchmark includes seed-controlled disruption scenarios, three feasibility modes with projection diagnostics, deterministic replay, and metrics covering operations, resilience, capability growth, and training-access distribution. We evaluate production-only, reactive adaptive, water-filling adaptive, and static-insurance policies with budget variants over 60-shift horizons with paired statistical tests. The results are regime-dependent rather than a ranking. Training-capable policies dominate the production-only baseline, and maintenance training is necessary under forgetting even without disruptions. Among training-capable classes, adaptive training helps when bottlenecks are visible in the forecast, while a lean static cross-training plan, a deliberately favorable comparator whose structure encodes relevant skill contingencies, acts as strong insurance under surprise shocks and absenteeism. Capacity slack and the forgetting rate govern the boundary between these regimes. No policy class dominates across regimes, motivating forecast-driven controllers that decide when to buy skill insurance and when to react.


[75] 2606.17269

Skill-Constrained Model Predictive Control for Resilient Manufacturing Supply Chains

In skill-constrained production-inventory systems, the qualified human capacity available tomorrow depends on training decisions made today: production requires certified workers, certifications decay unless maintained, and training consumes the same scarce worker hours that production needs now. We study a closed-loop skill-constrained model predictive controller that, at every shift, solves a finite-horizon mixed-integer program over production, inventory, backlog, and training, with binary predicted certification, hard production eligibility, and an interpretable terminal value that prices certified-capacity gaps at the horizon boundary; only the first-period action is applied before replanning. On synthetic, seed-controlled SkillChain-Gym scenarios - announced and surprise new-skill shocks, demand shocks, absenteeism, forecast- and availability-quality modes, capacity-boundary and training-rate sweeps, and negative controls - we evaluate the controller against production-only and maintenance-only ablations, static cross-training insurance plans, and a strong reactive heuristic, under an ex-ante locked configuration and paired statistics. The result is regime dependence, not superiority: no policy class dominates. Predictive control helps when skill or labor bottlenecks are forecastable early enough for training to complete; lean static insurance remains hard to beat under surprise shocks, near the demand-capacity boundary, and wherever pre-shock slack makes insurance cheap. Attribution ablations separate certification maintenance, re-acquisition of lapsed certifications, and greenfield skill acquisition. Forecastability, not adaptivity per se, decides when predictive control pays.


[76] 2606.17281

Are you speaking my languages? On spoken language adherence in multimodal LLMs

While Large Language Model (LLM) based Automatic Speech Recognition (ASR) enables seamless multilingual use, models often misidentify the output language, compromising transcription fidelity and downstream application quality. To preserve flexibility and code-switching capabilities, we propose a soft prompting approach that hints at potential spoken languages without strictly constraining the output. We formally define this challenge as a lack of language adherence, introduce a novel metric to quantify violations, and evaluate three mitigation strategies: (1) zero-shot prompting for robust guidance under uncertainty, (2) supervised fine-tuning (SFT) to improve prompt adherence, and (3) Chain-of-Thought (CoT) reasoning to enforce adherence during decoding. We present a comparative analysis of these methods across multiple languages, evaluating their effectiveness in reducing language violations while maintaining overall ASR performance. Finally, we discuss trade-offs to guide strategy selection under various compute constraints.


[77] 2606.17377

Performance-Driven Environment Abstraction with Multi-Timescale Learning

We study performance-driven environment abstraction for decision-making in large Markov decision processes. Rather than preserving geometric or topological structure, we seek abstractions that directly optimize decision quality. We model abstraction as a controlled approximation obtained by aggregating the state space and enforcing a shared action distribution within each aggregated state. For a fixed partition, we establish a performance guarantee that separates value-function approximation error from the loss introduced by action sharing. Guided by this analysis, we develop a multi-timescale reinforcement learning framework that jointly adapts the policy and a tree-structured environment abstraction. The resulting algorithm refines and coarsens regions of the state space based on Q-value discrepancies, balancing performance against abstraction size and complexity. Empirical results demonstrate substantial state compression, improved sample efficiency, and faster replanning compared to actor-critic baselines.


[78] 2606.17379

MeiBRD: Meta-Learning Intraoperative Biomechanical Residual Deformation

Accurate intraoperative liver registration is challenging because soft-tissue deformation is substantial while intraoperative measurements are sparse. Biomechanical models regularize this ill-posedness with prior knowledge but exhibit persistent prediction bias due to simplifying assumptions, while data-driven learning solutions struggle with data efficiency, generalization, and physical plausibility. We propose a hybrid registration framework that adapts a biomechanical prior using sparse intraoperative correspondences. Rather than learning a full deformation field, we learn a residual deformation function that corrects linear biomechanical predictions, modeled as a graph neural diffusion function with geometry-aware attention over the 3D liver mesh. To enable long-range information transfer of sparse observations, we treat sparse intraoperative measurements as context samples for which input-output pairs of the residual deformation function are fully observed, casting the problem as learning to learn this residual function from intraoperative context samples with feedforward meta-learners. Experiments on a deformable liver phantom dataset demonstrate improved registration accuracy and generalization compared to rigid, biomechanical, and data-driven baselines, particularly for out-of-distribution geometries and deformations.


[79] 2606.17388

Agent Utilities over Generalized Voronoi Regions and their Gradients

In this paper, we generalize the concept of Voronoi regions, define agent utility as the integral of a utility density over the corresponding Voronoi region, derive gradients of the utility, and illustrate the approach in a two-team example from soccer. The generalization of Voronoi regions is in the form of so-called Cost-Induced Voronoi (CIV) regions, where the agent state space may differ from the space being partitioned. One example of such regions is when the cost is given by the optimal solution of an LQR control problem. Then the agent states include position as well as velocity, while the partitioned space only includes positions. The agent utility is defined by integrating some utility density over the CIV region of the agent. This utility density might be the probability density of some beneficial event, such as receiving a pass in soccer. The utility is then the overall probability of receiving a pass and the gradient represents a way to improve that probability. We show how this utility gradient can be computed using the Reynolds Transport Theorem from fluid mechanics, and that this approach achieves similar accuracy while reducing computation time by about an order of magnitude compared to a baseline finite-difference approximation.


[80] 2606.17465

Perron--Frobenius Operator Matching for Generative Modeling

We introduce Perron--Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator, subsuming flow, diffusion, and jump models. We prove that among Bregman divergences, only Kullback--Leibler divergence preserves equality between density-level and sample-conditioned objectives, yielding a practical loss equivalent to Koopman path matching. We further develop Nesterov-accelerated training and sampling that stabilize discretization and accelerate convergence. On Gaussian mixtures and two-moons, PFOM achieves faster KL/$W_2$/MMD decrease and improved wall-clock efficiency with empirical validation. PFOM unifies operator-theoretic identification with modern generative modeling and opens paths to adaptive dictionaries and high-dimensional applications.


[81] 2606.17471

ReRAM-aware Model Finetuning addressing I-V Non-linearity and Retention Errors

Traditional CPU, GPU, and NPU architectures are increasingly limited by the von Neumann bottleneck. While In-Memory Computing (IMC) using ReRAM crossbar arrays offers a high-density, energy-efficient alternative, its practical deployment is constrained by device non-idealities. Existing hardware-aware training frameworks often require training from scratch, which is computationally prohibitive for modern large-scale models. In this work, we propose a finetuning-based hardware-aware training algorithm that enables robust DNN deployment on ReRAM with minimal training overhead. Our approach mitigates I-V non-linearity by applying a range-shrunk sinh transformation and incorporates retention errors directly into a regularization loss during the finetuning process. We evaluate our framework across models and tasks such as image classification and question-answering (QA). Experimental results demonstrate that our method achieves accuracy similar to that of the base model on large-scale models such as ResNet18 and DeiT-Tiny. On ImageNet, the technique incurs less than 2% accuracy degradation for the MobileNetV3 family. Further, applying the technique to the SQuAD v2 dataset results in only a 1-point degradation in F1 score.


[82] 2606.17510

OmniDroneX: An LLM-Assisted Holistic Drone-as-a-Service Ecosystem

Despite rapid advances in UAV technologies, current deployments remain limited due to several gaps in UAV systems research. To address these challenges, we propose OmniDroneX, a unified Drone-as-a-Service ecosystem, in which drones are transitioned from fixed function platforms into dynamically composable entities that can be integrated with external infrastructures to offer omni-capabilities. OmniDroneX bridges low-level physical primitives with high-level mission intent through a unified vendor-agnostic interface (libUAV) and a formal physical-service abstraction model (PT-SOA). A core innovation is the diverse application of large language models (LLMs) across multiple layers of the OmniDroneX architecture. LLMs are used to assist in identifying and formalizing primitive device functions and abstract service definitions, supporting automated service composition and workflow generation, and enabling interactive, natural-language mission specification and refinement. OmniDroneX also incorporates important categories of composition techniques that are essential in dynamic UAV systems, including physical layer composition for drone capability augmentation, as well as spatiotemporal, functional, collaborative, exception-aware, and QoS-based service compositions. Collectively, these features allow OmniDroneX to serve as a foundation for scalable, resilient, and self-evolving UAV ecosystems operating in complex and dynamic environments.


[83] 2606.17562

Anywhere, Any-Stymie: Remote Activation of Trojan Malware on LiDAR with Modulated Signals

LiDAR sensors are widely deployed in autonomous systems for 3D perception and safety-critical decision-making. We identify a previously unexplored attack surface in which dormant malware embedded in the LiDAR sensing pipeline remains inactive during normal operation and can be externally triggered after deployment, without requiring access to sensor hardware or networking at attack time. To operationalize this threat, we design malware capable of low-level point-cloud manipulation and embed it into LiDAR firmware. This malware was developed in a closed research test environment with vendor technical support, rather than by exploiting an inherent production supply-chain vulnerability. To selectively trigger attack activation, we design and implement an optical trigger that remotely activates the malware by delivering a modulated signal into the sensing environment. Once triggered, the malware performs real-time point cloud manipulation, and we demonstrate false object injection and real object suppression on static and mobile victim platforms. Our evaluation first establishes attack feasibility, including static operation at 300 ft and recorded drive-by runs reaching 35 mph. We then illustrate quantitatively that injected person-like artifacts can remain semantically detectable by a state-of-the-art 3D object detector. Finally, we demonstrate multiple modes of safety-critical impact on a deployed tactical autonomous vehicle. Together, these results highlight the need for stronger integrity guarantees throughout the LiDAR sensor development and deployment pipeline.


[84] 2606.17572

When Dynamics Models Read the Wrong Time Steps: Label-Free Event Credit Re-Anchoring for Robust Global Readouts

Learned dynamics models often answer global physical questions, such as fault severity or impact stiffness, by pooling a per-step feature sequence into one readout vector. This sequence-to-global interface creates an under-studied temporal credit problem: with only trajectory-level supervision, a model can predict accurately in training conditions while reading from abundant smooth correlates rather than the brief physical events that determine the target. We call this failure temporal credit dilution. It is not exposed by the training loss and is not removed by standard physics-informed residuals, because the error lies in where the global readout assigns functional credit. We introduce Credit-in-Event, an interface-level probe for measuring how much pooled credit lands on event steps, and prove in closed form that a pooled linear reader routes credit to a spurious background channel as the event fraction shrinks. We then propose CREST, a training-free and label-free readout that estimates a transient event core from learned features and re-anchors the pooled representation through event-versus-rest contrast. Across simulated gear and impact systems, recurrent and attention encoders, and public bearing vibration data, CREST reduces out-of-distribution error while restoring event credit. Ablations show that stable-step selection and receptive-field shrinking fail, confirming that the gain comes from event-core credit re-anchoring rather than a generic locality or stability prior.


[85] 2606.17743

Information-Theoretic Meta Dynamic Programming for Signalling and Control of POMDPs

In this paper, we study the information-theoretic characterization of simultaneous signalling and control over channels modeled by partially observable Markov decision processes (POMDPs). The problem is formulated as an optimization over randomized control strategies that maximize the directed information from actions to observations, subject to an average-cost constraint. We derive a novel dynamic programming framework in which the state is defined on the space of conditional probability distributions, leading to a high-level "meta" dynamic program. Specifically, we show that two coupled information states, namely, the posterior distribution of the system state and a distribution over such posteriors, satisfy Markov recursions and provide sufficient statistics for optimal control. This structure enables the decomposition of optimal strategies into separated randomized policies that depend only on these information states. Our results establish necessary and sufficient conditions for optimality and unify classical stochastic control and information-theoretic formulations. In particular, we show that in the absence of signalling, the proposed framework reduces to the standard dynamic programming equations for POMDPs. The developed approach provides a principled foundation for analyzing and designing control systems with intrinsic information constraints.


[86] 2606.17835

Perceptual compensation for tonal context in self-supervised speech models

This study examines the extent to which the wav2vec2.0 architecture exhibits evidence of compensation for phonological context. We conducted a pseudo-replication of a perceptual compensation experiment on Mandarin Chinese tones, and compared the embedding similarities and probing classifier outputs between a purely self-supervised pre-trained model and a model fine-tuned for Mandarin ASR. No evidence of compensation was found in the embedding similarities of the purely pre-trained model. Probing classifiers showed some evidence of compensation in addition to the expected layer-wise improvements in categorization, but failed to replicate human performance on isolated test syllables. Our findings contrast with previous reports of sensitivity to phonological structure emerging through pre-training alone, and suggest that supervised objectives may be necessary to encourage the abstraction of at least some types of phonological regularities.


[87] 2606.17932

Dual Line Coherent Detection

We experimentally demonstrate dual-line coherent detection using an optical frequency comb local oscillator, enabling large frequency offset tolerance with minimal additional signal processing. The proposed method achieves 200 GHz offset tolerance for 400 Gbit/s signals with low penalty, supporting uncooled, low-cost coherent transceivers.


[88] 2606.18122

Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines

Embedded machine learning moves inference from cloud services to resource-constrained devices that must acquire data, preprocess signals, run a model, and act within tight limits on memory, energy, and latency. This paper presents a systems-oriented synthesis of an embedded machine-learning workflow for microcontroller-class platforms. The emphasis is placed on engineering decisions that are often hidden in generic machine-learning introductions: sampling and buffering, feature extraction as dimensionality reduction, validation under class imbalance, model/runtime co-design, and streaming deployment. Two representative signal families are used throughout the paper. The first is inertial motion recognition, where a two-second, three-axis accelerometer window is transformed from raw samples into root-mean-square and spectral features before classification. The second is keyword spotting, where audio is sampled, anti-aliased, transformed into mel-frequency cepstral coefficients, and processed by a compact one-dimensional convolutional network. The paper concludes with practical design rules for robust on-device inference, including data curation, quantization, thresholding, scheduling, and field monitoring.
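
The inertial feature-extraction step described above (raw three-axis samples reduced to root-mean-square and spectral features) can be sketched in plain Python as follows. This is a minimal illustration rather than the paper's pipeline: the choice of the dominant frequency as the single spectral feature, the direct O(N^2) DFT, and the function names are assumptions; a microcontroller deployment would normally use a fixed-point FFT.

```python
import math

def rms(samples):
    """Root-mean-square of one axis of the window."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2, computed from the direct definition."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mags.append(math.hypot(re, im) / n)
    return mags

def window_features(window, sample_rate_hz):
    """window: list of (x, y, z) samples. Returns [rms, dominant_freq_hz] per axis."""
    features = []
    for axis in zip(*window):
        mags = dft_magnitudes(axis)
        peak_bin = max(range(1, len(mags)), key=mags.__getitem__)  # skip the DC bin
        features += [rms(axis), peak_bin * sample_rate_hz / len(axis)]
    return features
```

With a hypothetical 50 Hz sampling rate, a two-second window holds 100 three-axis samples (300 raw values) and is reduced here to six features, which is the dimensionality reduction the abstract points to.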


[89] 2606.18218

Finite-Time Queue Peak Laws in Stochastic Networks: Logarithmic Scaling After Geometric Thresholds

We study finite-horizon queue peaks in generalized switches, a standard stochastic-network model in which many queues share constrained service resources. Arrivals may be dependent, time-varying, and adapted to the past; the standing load condition is uniform interior slack, meaning the conditional mean arrival vector stays in a fixed contraction of the capacity region. We show that this slack reshapes the finite-time peak law for drift-minimizing scheduling policies such as MaxWeight. The square-root envelope that is sharp without slack persists only up to a geometry-dependent threshold; beyond that threshold, the running maximum grows only logarithmically with the horizon, both with high probability and in expectation. The mechanism is self-normalization: in the current queue direction, the projected fluctuation scale is normalized by the stabilizing drift scale. This removes capacity geometry from the logarithmic coefficient, while geometry remains in the threshold. Matching lower bounds show that both the logarithmic term and a geometric threshold are unavoidable. When finite-time state-space collapse is available, the threshold can be sharpened using local bottleneck geometry. For generalized input-queued switches, we obtain finite-time peak bounds with tight logarithmic coefficients. Simulations illustrate the two-phase envelope, local geometric refinements, and variance-sensitive improvements predicted by the theory.


[90] 2606.18223

Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents

With sophisticated cyber-attacks becoming increasingly prevalent, modern networks require intelligent autonomous cyber-defense agents trained via Reinforcement Learning (RL). These agents employ neurosymbolic approaches such as behavior trees with learning-enabled components (LECs) to learn, reason, adapt, and implement security rules while maintaining critical operations. However, these autonomous networks are partially observable systems, i.e., the cyber-attacker's (red agent's) actions are not observable, making it difficult for the defender to predict red actions, learn red policies, or assess the attacker's intrusion levels. To address this, we propose a Policy Learning Technique using imitation learning to learn policies for partially observable RL agents with discrete states and discrete actions. We apply this technique in an autonomous cyber environment to predict the red agent's actions from network observations and defender actions. Integrated with a neurosymbolic cyber-defense agent, our method effectively handles different red policies and achieves high prediction accuracy across diverse simulated scenarios.


[91] 2208.03023

AID: Open-source Anechoic Interferer Dataset

A dataset of anechoic recordings of various sound sources encountered in domestic environments is presented. The dataset is intended to be a resource of non-stationary, environmental noise signals that, when convolved with acoustic impulse responses, can be used to simulate complex acoustic scenes. Additionally, a Python library is provided to generate random mixtures of the recordings in the dataset, which can be used as non-stationary interference signals.


[92] 2401.14814

Joint Background-Anomaly-Noise Decomposition for Robust Hyperspectral Anomaly Detection via Constrained Convex Optimization

We propose a novel hyperspectral (HS) anomaly detection method that is robust to various types of noise. Most existing HS anomaly detection methods are designed without explicit consideration of noise or are based on the assumption of Gaussian noise. However, in real-world situations, observed HS images are often degraded by various types of noise, such as sparse noise and stripe noise, due to sensor failure or calibration errors, significantly affecting the detection performance. To address this problem, this article establishes a robust HS anomaly detection method with a mechanism that can properly remove mixed noise while separating background and anomaly parts. Specifically, we newly formulate a constrained convex optimization problem to decompose background and anomaly parts, and three types of noise from a given HS image. Then, we develop an efficient algorithm based on a preconditioned variant of a primal-dual splitting method to solve this problem. Experimental results using seven real HS datasets demonstrate that the proposed method achieves detection accuracy comparable to state-of-the-art methods on original images and exhibits significantly higher robustness in scenarios where various types of mixed noise are added.


[93] 2411.06842

Evaluating Synthetic Data Generation for Domain Generalization in Fetal Brain MRI Segmentation

Fetal brain tissue segmentation from magnetic resonance imaging (MRI) is crucial for studying neurodevelopment, but remains challenging due to data heterogeneity and limited annotations. Domain randomization (DR) has recently emerged as a promising strategy for single-source domain generalization by synthesizing training images with randomized artifacts, contrast, and resolution. In this work, we investigate how to maximize the out-of-domain (OOD) generalization of DR-based methods. We evaluate several synthetic data generation strategies for DR, with a particular focus on our recently proposed framework, FetalSynthSeg. We show that simple Gaussian mixture-based intensity modeling outperforms more complex physics-based simulations, and that intensity clustering (subdividing tissue classes based on intensity) improves OOD robustness. Evaluated on 348 fetal subjects from four sites spanning 0.55-3T and both T1w and T2w contrasts, FetalSynthSeg reaches state-of-the-art performance on several FeTA 2024 testing datasets (80-85 Dice score) and, for the first time, offers robust segmentation on modalities other than T2w for fetal brain segmentation (80 Dice on dHCP-T1w dataset). Compared with state-of-the-art methods such as BOUNTI, nnU-Net ensemble, and the FeTA 2024 winner, FetalSynthSeg delivers comparable or superior accuracy while maintaining strong robustness across domain shifts. Our code, model weights, and Docker image ready for easy inference are available at this https URL.


[94] 2412.08895

Fully Bayesian Wideband Direction-of-Arrival Estimation and Detection via RJMCMC

Consider an array receiving unknown wideband signals from an unknown number of sources $k$. Wideband signals can occupy arbitrarily wide bandwidths, rendering demodulation-based approaches inapplicable, a common situation in settings involving acoustic signals. Here, we aim to determine $k$ given $N$ noisy array-valued measurements, a task known as the "detection problem," for which Bayesian model comparison is a common approach. To render Bayesian inference tractable, it is typically necessary to marginalize the source signals. Unfortunately, for wideband signals, naive marginalization has an unaffordable time complexity of $\mathcal{O}(N^3 k^3)$. As a result, fully Bayesian signal detection has yet to be demonstrated in wideband settings. In this work, we propose a wideband signal model that allows for computationally tractable marginalization of the source signals. We begin from the canonical model of linear time-invariant (LTI) signal propagation, which is then augmented into a circular convolution, all without loss of generality. This allows for efficient computation in the frequency domain, where the resulting linear system admits a decomposition into a sparse matrix we refer to as a stripe matrix decomposition. Exploiting this sparsity pattern reduces the time complexity of computing the marginal likelihood to $\mathcal{O}(N k^3)$. These computational improvements enable efficient posterior inference via reversible-jump Markov chain Monte Carlo (RJMCMC). In this work, we use the non-reversible extension of RJMCMC (NRJMCMC), which often achieves lower autocorrelation and faster convergence than RJMCMC. Detection of the latent source signals can then be performed in a fully Bayesian manner using samples drawn by NRJMCMC. We evaluate our procedure by comparing it against generalized likelihood ratio testing (GLRT) and information criteria.


[95] 2501.06756

Generative AI Enabled Robust Sensor Placement in Cyber-Physical Power Systems: A Graph Diffusion Approach

With advancements in physical power systems and network technologies, integrated Cyber-Physical Power Systems (CPPS) have significantly enhanced system monitoring and control efficiency and reliability. This integration, however, introduces complex challenges in designing coherent CPPS, particularly as few studies concurrently address the deployment of physical layers and communication connections in the cyber layer. This paper addresses these challenges by proposing a framework for robust sensor placement to optimize anomaly detection in the physical layer and enhance communication resilience in the cyber layer. We model the CPPS as an interdependent network via a graph, allowing for simultaneous consideration of both layers. Then, we adopt the Log-normal Shadowing Path Loss (LNSPL) model to ensure reliable data transmission. Additionally, we leverage the Fiedler value to measure graph resilience against line failures and three anomaly detectors to fortify system safety. However, the optimization problem is NP-hard. Therefore, we introduce the Experience Feedback Graph Diffusion (EFGD) algorithm, which utilizes a diffusion process to generate optimal sensor placement strategies. This algorithm incorporates cross-entropy gradient and experience feedback mechanisms to expedite convergence and generate higher reward strategies. Extensive simulations demonstrate that the EFGD algorithm enhances model convergence by 18.9% over existing graph diffusion methods and improves average reward by 22.90% compared to Denoising Diffusion Policy Optimization (DDPO) and 19.57% compared to Graph Diffusion Policy Optimization (GDPO), thereby significantly bolstering the robustness and reliability of CPPS operations.
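
For readers unfamiliar with the LNSPL model mentioned above, a minimal sketch of its textbook form follows: path loss in dB grows with the logarithm of distance, plus a zero-mean Gaussian shadowing term. The default reference loss, path-loss exponent, and shadowing deviation are placeholder assumptions, not values from the paper.

```python
import math
import random

def lnspl_path_loss_db(d, d0=1.0, pl_d0_db=40.0, n=3.0, sigma_db=4.0, rng=None):
    """Log-normal shadowing path loss in dB at distance d.

    PL(d) = PL(d0) + 10 * n * log10(d / d0) + X, with X ~ N(0, sigma_db^2).
    All default parameter values are illustrative assumptions.
    If rng is None the shadowing term is omitted (mean path loss only).
    """
    if d <= 0 or d0 <= 0:
        raise ValueError("distances must be positive")
    shadowing = rng.gauss(0.0, sigma_db) if rng is not None else 0.0
    return pl_d0_db + 10.0 * n * math.log10(d / d0) + shadowing

if __name__ == "__main__":
    print(lnspl_path_loss_db(10.0))                         # mean path loss only
    print(lnspl_path_loss_db(10.0, rng=random.Random(0)))   # one shadowed draw
```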


[96] 2509.09837

Remote Tracking with State-Dependent Sensing in Pull-Based Systems: A POMDP Framework

We consider real-time remote tracking of a Markov source observed by multiple heterogeneous sensors with state-dependent sensing accuracy, motivated by distributed camera networks with overlapping coverage and spatial blind spots. Upon commands from a remote sink, sensors transmit their observations over error-prone channels. We aim to minimize the long-term average of a weighted sum of goal-aware distortion and transmission costs. The problem is formulated as a partially observable Markov decision process (POMDP) and cast into an equivalent belief-MDP. To address the intractability of the infinite and continuous belief space, we develop a truncation-based method that yields a finite-state MDP which can be solved via standard methods such as relative value iteration. We further use a discounted reformulation to derive a theoretical lower bound for the optimal average cost, which is tightened via the incremental pruning algorithm (IPA) and also induces a comparison policy. Numerical results demonstrate that the performance of the proposed policy improves with the truncation depth at the expense of computational effort, and also outperforms low-complexity baselines across a wide range of system parameters. The results also reveal a switching-type structure of the truncation-based policy over the belief simplex and quantify the impact of key system parameters, highlighting the importance of accounting for state-dependent sensing.
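
The belief-MDP reformulation rests on a Bayesian belief update. A minimal sketch of the generic update is given below: predict through the Markov transition matrix, then reweight by a state-dependent observation likelihood. The two-state numbers are made up for illustration, and the paper's model (multiple sensors, channel errors, truncation) is richer than this.

```python
def belief_update(belief, transition, obs_likelihood):
    """One generic POMDP belief update.

    belief[s]          : current probability of state s
    transition[s][s2]  : P(next state = s2 | state = s)
    obs_likelihood[s2] : P(received observation | next state = s2),
                         which may differ per state (state-dependent sensing)
    """
    n = len(belief)
    # Prediction step: push the belief through the Markov chain.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction step: weight by how likely the observation is in each state.
    unnormalized = [predicted[s2] * obs_likelihood[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the belief")
    return [x / total for x in unnormalized]

if __name__ == "__main__":
    b = belief_update([0.5, 0.5],
                      [[0.9, 0.1], [0.2, 0.8]],  # illustrative transition matrix
                      [0.8, 0.3])                # illustrative likelihoods
    print(b)
```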


[97] 2601.00564

Fractional Programming for Kullback-Leibler Divergence in Hypothesis Testing

Maximizing the Kullback-Leibler divergence (KLD) is a fundamental problem in waveform design for active sensing and hypothesis testing, as it directly relates to the error exponent of detection probability. However, the associated optimization problem is highly nonconvex due to the intricate coupling of log-determinant and matrix trace terms. Existing solutions often suffer from high computational complexity, typically requiring matrix inversion at every iteration. In this paper, we propose a computationally efficient optimization framework based on fractional programming (FP). Our key idea is to reformulate the KLD maximization problem into a sequence of tractable quadratic subproblems using matrix FP. To further reduce complexity, we introduce a nonhomogeneous relaxation technique that replaces the costly linear system solver with a simple closed-form update, thereby reducing the per-iteration complexity to quadratic order. To compensate for the convergence speed trade-off caused by relaxation, we employ an acceleration method called STEM by interpreting the iterative scheme as a fixed-point mapping. The resulting algorithm achieves significantly faster convergence rates with low per-iteration cost. Numerical results demonstrate that our approach reduces the total runtime by orders of magnitude compared to a state-of-the-art benchmark. Finally, we apply the proposed framework to a multiple random access scenario and a joint integrated sensing and communication scenario, validating the efficacy of our framework in such applications.


[98] 2602.08924

Automating the Wildfire Detection and Scheduling Pipeline with Maneuverable Earth Observation Satellites

Wildfires are becoming increasingly frequent, with potentially devastating consequences, including loss of life, infrastructure destruction, and severe environmental damage. Low-Earth-orbit satellites equipped with onboard sensors can capture critical information about active wildfires and enable near-real-time detection through machine learning algorithms applied to the acquired data. We propose a framework that automates the complete wildfire detection and satellite scheduling pipeline, entitled the WildFire-applicable Intelligent and Responsive Ensemble for Detection and Scheduling (WildFIRE-DS). This paper develops an algorithm to realize the vision of the WildFIRE-DS as a proof of concept, integrating three key components: wildfire detection in satellite imagery, statistical updating that incorporates data from repeated flyovers, and multisatellite scheduling optimization. The algorithm enables wildfire detection using convolutional neural networks with sensor fusion techniques, incorporates subsequent flyover information via Bayesian statistics, and schedules a constellation of satellites using the state-of-the-art Reconfigurable Earth Observation Satellite Scheduling Problem. Simulated experiments conducted using real-world wildfire locations and the orbits of operational Earth observation satellites demonstrate that this autonomous detection and scheduling approach effectively enhances wildfire monitoring capabilities.


[99] 2603.03965

Adaptive Modular Geometric Control of Robotic Manipulators

This paper develops an adaptive modular geometric control framework for robotic manipulators with uncertain inertial parameters. The manipulator is decomposed into rigid-body and joint modules, where each rigid-body module is represented by Euler-Poincaré-type spatial dynamics on the Lie algebra se(3), and configuration errors are defined intrinsically through logarithmic maps on SE(3). The joint modules impose local screw constraints that relate adjacent body twists, accelerations, and transmitted wrenches, yielding a recursive propagation structure for the interconnected multibody system. Within this formulation, local geometric control laws are constructed at the module level, while the interconnection among modules is characterized by power-conjugate twist-wrench pairs induced by the natural duality pairing between the Lie algebra se(3) and its dual space se(3)^*. For the nominal case, exponential tracking stability of the interconnected system is established using local configuration energy functions on SE(3) and the power-preserving structure of the modular interconnection. To address inertial parametric uncertainty, a geometric adaptation law is introduced on the manifold of symmetric positive-definite matrices, ensuring physically consistent parameter estimates while retaining compatibility with the Lie-algebraic control formulation. Under the adaptive controller, semi-global uniform ultimate boundedness of the closed-loop tracking and parameter estimation errors is proven. Numerical simulations on a redundant high-inertia robotic manipulator demonstrate accurate pose tracking, smooth transient behavior, orientation regulation, and robustness under inertial perturbations. Comparative studies with state-of-the-art methods further illustrate the effectiveness of the proposed framework for complex robotic manipulation tasks.


[100] 2603.04438

CogGen: Cognitive-Load-Inspired Fully Unsupervised Deep Generative Modeling for Compressively Sampled MRI Reconstruction

Fully unsupervised deep generative modeling (FU-DGM) offers significant potential for compressively sampled magnetic resonance imaging (CS-MRI) reconstruction. Representative FU-DGM formulations, such as deep image prior (DIP) and implicit neural representation (INR), employ architectural bias to induce a low-dimensional manifold in the image space that aligns with the forward observation. However, as the underlying inverse system is highly ill-posed, prolonged iterative fitting in FU-DGM typically leads to poor efficiency and noise amplification. In this paper, guided by the cognitive principle of easy-to-hard learning, we propose CogGen, an FU-DGM framework that reformulates CS-MRI reconstruction as a staged inversion problem. Specifically, CogGen implements a self-paced curriculum learning (SPCL)-driven progressive scheduling strategy through an MRI-aware dual-threshold weighting criterion, which adaptively regulates k-space measurement participation. The data-consistency residual thresholding evaluates the fitting reliability of the current generator, while the k-space radius thresholding controls stage-wise measurement exposure, thereby avoiding uniform fitting throughout optimization. Theoretically, our analysis shows that, when early stages favor easy-to-fit measurements, CogGen yields a reduced local sufficient-iteration bound and a smaller cumulative noise-amplification bound, explaining the improved convergence behavior and reconstruction fidelity of CogGen within a finite iteration budget. Numerical experiments demonstrate that both CogGen instantiations, CogGen-DIP and CogGen-INR, achieve superior performance over prevailing CS-MRI reconstruction techniques, including unsupervised and supervised pipelines.


[101] 2603.19697

Plug-and-Steer: Decoupling Separation and Selection in Audio-Visual Target Speaker Extraction

The goal of this paper is to provide a new perspective on audio-visual target speaker extraction (AV-TSE) by decoupling separation and target selection. Conventional AV-TSE systems typically integrate audio and visual features deeply to re-learn the entire separation process, which can act as a fidelity ceiling due to the noisy nature of in-the-wild audio-visual datasets. To address this, we propose Plug-and-Steer, which assigns high-fidelity separation to a frozen audio-only backbone and limits the role of the visual modality strictly to target selection. We introduce the Latent Steering Matrix (LSM), a minimalist linear transformation that re-routes latent features within the backbone to anchor the target speaker to a designated channel. Experiments across four representative architectures show that our method effectively preserves the acoustic priors of diverse backbones, achieving perceptual quality comparable to that of the original backbones. Audio samples are available at: this https URL


[102] 2604.17027

Trapping Regions for Quadratic Systems with Generalized Lossless Nonlinearities

We consider a class of quadratic systems, primarily motivated by incompressible fluid flows, where the nonlinearities are generalized lossless: they do not produce or dissipate energy, as measured by a generalized quadratic metric. Our goal is to compute trapping regions, which are forward invariant sets that certify ultimate boundedness. The key contribution is a novel parameterization of the generalized lossless condition that enables optimization of trapping regions for a broader class of quadratic systems. We also formulate the conditions for ellipsoidal trapping regions, whereas spherical regions have been the focus of prior works. We provide three numerical examples, which demonstrate the improvements offered by the proposed approach relative to existing methods.


[103] 2606.11766

Fast Speech Foundation Model Distillation Using Interleaved Stacking

Distilling a large speech foundation model (SFM) into an efficient student model has been successfully applied to low-resource environments. Although distillation reduces inference latency, it requires additional training of the student model. However, the training efficiency of SFM distillation remains underexplored. In this work, we explore training acceleration of SFM distillation to speed up model deployment. We examine the potential of stacking, in which the model depth is progressively increased through training until the target model depth is reached. While existing stacking methods improve training speed, they suffer from performance degradation. To handle this limitation, we propose interleaved stacking, a novel stacking method that consistently preserves layer position throughout the stacking process. This property is particularly critical in SFMs, in which each layer encodes distinct layer-specific knowledge. We validate the effectiveness of the proposed method on SUPERB.


[104] 2606.12327

From the Linear Quadratic Regulator (LQR) to the (Deterministic) Kalman Filter in Two Easy Steps

This note is a tutorial on the deterministic version of the Kalman filter (state estimator), which is formulated as finding the state trajectory consistent with the system's equations with the minimal amount of $L^2$ process and measurement uncertainty. As stated, this is an input signal design problem with linear dynamics and an objective that is affine-quadratic in the state and inputs. The first step is to convert this problem to one with a purely quadratic objective by embedding in a larger system using "homogeneous coordinates". This converts the problem to a purely quadratic (i.e., an LQR) problem, but with non-standard initial or final state constraints. This latter problem can then be solved using a version of the matrix Differential Riccati Equation (DRE) for the larger LQR problem. The second step is a partitioning of this larger problem, which then yields the optimal dynamic observer and the DRE of the traditional Kalman filter. For comparison, the solution of the traditional LQ-tracking (Servomechanism) problem is also treated using a similar construction.


[105] 2606.12899

LGVSC: A Large-Model-Driven Generative Video Semantic Communication Framework

Driven by the massive video transmission requirements in the Internet of Everything, semantic communication holds great promise for striking a balance between transmission efficiency and quality. This paper introduces a large-model-driven generative video semantic communication (LGVSC) framework, enabling efficient video semantic transmission under extremely low bandwidth conditions. First, by decoupling the encoder and decoder as well as exposing explicit intermediate semantic representations, LGVSC maintains interpretability, avoiding the black-box behavior commonly observed in end-to-end systems. Next, we introduce a new metric, i.e., the probability-based semantic similarity score (PSSS), which quantifies semantic similarity for complex modalities within a continuous range, allowing for more precise evaluation of semantic content. Building on PSSS, we propose a semantic-guided keyframe extraction module driven by a multimodal large model. This module can enhance fine-grained semantic consistency during keyframe selection at the transmitter, optimizing transmission bandwidth without compromising semantic fidelity. Additionally, we design a generative large-model-driven dynamic semantic-adaptive decoder at the receiver, which can adapt to videos of arbitrary lengths. Simulation results demonstrate that LGVSC significantly outperforms traditional schemes, achieving a channel bandwidth ratio on the order of $10^{-4}$ to $10^{-3}$, while maintaining strong zero-shot generalization across downstream tasks.


[106] 2606.13919

GMN4AD: Graph Matching Network for Alzheimer's Disease Diagnosis with Test-Time Domain Adaptation using Multi-centered Structure Magnetic Resonance Imaging

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that affects millions of older adults, with prevalence expected to rise significantly in the coming years. Early diagnosis, particularly during the mild cognitive impairment (MCI) stage, is critical for timely intervention. Structural Magnetic Resonance Imaging (sMRI) has emerged as a key modality for detecting AD-related brain changes, but traditional graph-based approaches often struggle with modality and inter-site heterogeneity, limiting diagnostic performance. In this paper, we propose Graph Matching Network for Alzheimer's Disease Diagnosis (GMN4AD), designed to model interactions between heterogeneous brain graphs derived from neuroimaging data. Unlike conventional methods that treat each brain graph independently, GMN4AD leverages graph matching to capture cross-graph relationships, enhancing diagnostic precision. Furthermore, we introduce a test-time domain adaptation strategy that combines contrastive learning to mitigate domain shifts during inference. Extensive experiments on three public AD datasets demonstrate that GMN4AD achieves superior performance compared to state-of-the-art methods, offering a robust and generalizable solution for AD diagnosis.


[107] 2606.15973

An auscultation location specific study on the relationship between expiratory-to-inspiratory acoustic patterns and spirometric airflow limitation across age and gender in asthmatic patients

Asthma causes expiratory airflow limitation and is clinically assessed using spirometry, which provides the FEV1/FVC ratio representing the proportion of air exhaled in the first second relative to total forced vital capacity. Prior studies suggest that respiratory sounds recorded at posterior sites (Left Lower, Left Upper, Right Upper, Right Lower) reflect regional airflow patterns. In this study, we investigate the relationship between the expiratory-to-inspiratory (E/I) spectral power ratio and FEV1/FVC in 141 participants aged 20-60 years using Spearman correlation across frequency subbands. The 100-200 Hz and 200-400 Hz bands showed significant correlations. Overall, lower posterior sites showed stronger associations; younger adults showed stronger correlations at the Left Lower site, whereas older adults showed stronger correlations at the Left Upper site. Gender-stratified analysis showed stronger Left Lower correlations in males and stronger Left Upper correlations in females.
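
The statistical tool named above, Spearman correlation, is the Pearson correlation of rank-transformed data. A dependency-free sketch follows; the sample values in the usage section are synthetic placeholders, not data from the study, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    # Synthetic placeholder values only (not measurements from the paper).
    ei_ratio = [0.8, 1.1, 0.9, 1.4, 1.2]
    fev1_fvc = [0.82, 0.70, 0.78, 0.61, 0.66]
    print(spearman(ei_ratio, fev1_fvc))
```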


[108] 2404.02687

Dynamic Resource Allocation with Karma: An Experimental Study

We perform a behavioral experiment of karma, a class of mechanisms for repeated resource allocation with attractive fairness and efficiency properties, in theory. Individuals in these mechanisms bid non-tradable credits that flow from resource consumers to yielders, like karma. Human subjects recruited on Amazon MTurk are repeatedly and randomly paired to bid karma according to time-varying and stochastic individual preferences or urgency to acquire resources. Treatments varied in the dynamic urgency process (frequent moderate urgency versus sporadic high urgency) and the richness of the bidding scheme (binary versus full range). Results are benchmarked against random allocation, and karma achieves an (almost) Pareto improvement over random, despite the MTurk subjects deviating significantly from the theoretically optimal Nash bidding policy. Maximum improvement is attained by subjects that deviate from Nash by up to one karma bid unit on average, and positive improvement is attained with average deviations of up to 3-4 bid units. These findings hold across all treatments, among which no significant differences are found, with the exception of the sporadic high urgency process with binary bidding treatment being (weakly) favorable over others. These results offer behaviorally robust lower bounds for the expected performance of karma in human populations. They also provide guidance for future testing and implementation of karma mechanisms in the real world.


[109] 2406.07435

Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

Image restoration networks usually comprise an encoder and a decoder, responsible for aggregating image content from noisy, distorted data and restoring clean, undistorted images, respectively. Data aggregation as well as high-resolution image generation both usually come at the risk of involving aliases, i.e., standard architectures put their ability to reconstruct the model input in jeopardy to reach high PSNR values on validation data. The price to be paid is low model robustness. In this work, we show that simply providing alias-free paths in state-of-the-art reconstruction transformers supports improved model robustness at low costs on the restoration performance. We do so by proposing BOA-Restormer, a transformer-based image restoration model that executes downsampling and upsampling operations partly in the frequency domain to ensure alias-free paths along the entire model while potentially preserving all relevant high-frequency information.


[110] 2505.19937

ALAS: An Automatic Latent Alignment Score for Audio Language Models

Large Language Models (LLMs) are extended into Speech-LLMs, and the quality of the audio-text alignment they learn affects most downstream Spoken Language Understanding (SLU) behavior. Yet despite a growing number of fusion strategies, there is no standard way to measure how well a Speech-LLM internally binds audio frames to text tokens. We introduce ALAS (Automatic Latent Alignment Score), a model- and task-agnostic metric that probes the LLM's per-layer hidden states, scoring the cross-modal cosine similarity between audio and text representations against a Whisper-derived reference. ALAS needs only a frozen forward pass and an off-the-shelf ASR reference, with no training or fitted classifier, and is calibrated to an interpretable uniform baseline comparable across tasks. Applying ALAS to four open-source Speech-LLMs (AF3, Qwen2-Audio, Qwen-Omni, SALMONN) across emotion recognition (IEMOCAP), open-ended SQA (LibriSQA), and multi-choice audio understanding (MMAU-speech), we find that the depth and strength of alignment reflect each model's audio-encoder design and the acoustic-versus-semantic demands of the task, and that ALAS tracks but does not duplicate task accuracy, exposing models that score well without genuinely grounding in the audio. We release ALAS as an open-source library so that practitioners can probe their own Speech-LLMs or try it on new tasks.


[111] 2506.13127

Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement

In this paper, we propose an intra-set and inter-set recursive fusion framework with time-frequency calibrated knowledge distillation (I$^2$SRF-TFCKD) for SE. Different from previous distillation strategies for SE, the proposed framework fully exploits the time-frequency differential information of speech while facilitating both local information focusing and global knowledge circulation. Firstly, we construct a collaborative distillation paradigm for intra-set and inter-set correlations. Within a correlated set, multi-layer teacher-student features are pairwise matched for calibrated distillation. Subsequently, we generate representative features from each correlated set through recursive fusion to form the fused feature set that enables inter-set knowledge interaction. Secondly, we propose a multi-layer interactive distillation based on dual-stream time-frequency cross-calibration, which calculates the teacher-student similarity calibration weights in the time and frequency domains respectively and performs cross-weighting, thus enabling refined allocation of distillation contributions across different layers according to speech characteristics. The proposed distillation strategy is applied to the dual-path dilated convolutional recurrent network (DPDCRN) that ranked first in the SE track of the L3DAS23 challenge. To evaluate the effectiveness of I$^2$SRF-TFCKD, we conduct experiments on both single-channel and multi-channel SE datasets. Objective evaluations demonstrate that the proposed KD strategy consistently and effectively improves the performance of the low-complexity student model and outperforms other distillation schemes.


[112] 2509.15626

LibriTTS-VI: A Public Corpus and Novel Methods for Efficient Voice Impression Control

Numerical voice impression (VI) control (e.g., scaling brightness) enables fine-grained control in text-to-speech (TTS). However, it faces two challenges: no public corpus and impression leakage, where reference audio biases synthesized voice away from the target VI. To address the first challenge, we introduce LibriTTS-VI, the first public VI corpus built on LibriTTS-R. For the second, we hypothesize a single reference causes leakage by entangling speaker identity and VI. To mitigate this, we propose 1) disentangled training with two utterances from the same speaker for speaker and VI conditioning, and 2) a reference-free method controlling the impression solely via target VI. Experimentally, our best method improves controllability: 11-dimensional VI mean squared error drops from 0.61 to 0.41 objectively and 1.15 to 0.92 subjectively. A comparison with a prompt-based TTS reveals imprecise numerical control and entanglement between VI and text semantics, which our methods overcome.


[113] 2509.26633

OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.


[114] 2601.06862

Learning QoE from Packet-Level Measurements in Encrypted Video Conferencing Traffic

The quality of the user experience has become one of the most important aspects in today's world, as it directly influences individuals' willingness to continue using or abandon a product or service. In this context, video conferencing applications (VCAs), which experienced widespread adoption following the COVID-19 pandemic, must deliver excellent performance to remain competitive in an increasingly crowded market. Although content providers (CPs) such as Zoom, WhatsApp, Telegram, and Google Meet can assess conversation quality by comparing transmitted and received data, the widespread use of end-to-end encryption in VCAs makes quality-of-experience (QoE) evaluation by internet service providers (ISPs) far more challenging. Since ISPs do not have access to the encrypted content, they must rely on passive measurements of unencrypted traffic characteristics on the data path. In this work, we present a simple yet effective QoE prediction framework based on an almost stock convolutional neural network (CNN) architecture that uses only the packet sizes extracted from the communication between two participants in a video conferencing (VC) call to predict two QoE metrics: BRISQUE and MOS. The proposed framework is simple, easy to implement, and does not require high-end computational resources, yet it provides superior prediction performance, as shown in our experiments on two custom datasets collected from WhatsApp and Zoom, which achieve substantial improvements over previous models for the QoE prediction task.


[115] 2602.15537

ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling

Pure speech language models aim to learn language directly from raw audio without textual resources. A key challenge is that discrete tokens from self-supervised speech encoders result in excessively long sequences, motivating recent work on syllable-like units. However, methods like Sylber and SyllableLM rely on intricate multi-stage training pipelines. We propose ZeroSyl, a simple training-free method to extract syllable boundaries and embeddings directly from a frozen WavLM model. Using L2 norms of features in WavLM's intermediate layers, ZeroSyl achieves competitive syllable segmentation performance. The resulting segments are mean-pooled, discretized using K-means, and used to train a language model. ZeroSyl outperforms prior syllabic tokenizers across lexical, syntactic, and narrative benchmarks. Scaling experiments show that while finer-grained units are beneficial for lexical tasks, our discovered syllabic units exhibit better scaling behavior for syntactic modeling.


[116] 2602.22277

X-REFINE: XAI-based RElevance input-Filtering and archItecture fiNe-tuning for channel Estimation

AI-native architectures are vital for 6G wireless communications. The black-box nature and high complexity of deep learning models employed in critical applications, such as channel estimation, limit their practical deployment. While perturbation-based eXplainable Artificial Intelligence (XAI) solutions offer input filtering, they often neglect internal structural optimization. We propose X-REFINE, an XAI-based framework for joint input-filtering and architecture fine-tuning. By utilizing a decomposition-based, sign-stabilized LRP epsilon rule, X-REFINE backpropagates predictions to derive high-resolution relevance scores for both subcarriers and hidden neurons. This enables a reliable optimization that identifies the most reliable model components. Simulation results demonstrate that X-REFINE achieves a superior performance-complexity-interpretability trade-off compared to the external perturbation-based XAI frameworks, significantly reducing computational complexity while maintaining robust bit error rate (BER) performance.


[117] 2604.06531

A Generalized Sinkhorn Algorithm for Mean-Field Schrödinger Bridge

The mean-field Schrödinger bridge (MFSB) problem concerns designing a minimum-effort controller that guides a diffusion process with nonlocal interaction to reach a given distribution from another by a fixed deadline. Unlike the standard Schrödinger bridge, the dynamical constraint for MFSB is the mean-field limit of a population of interacting agents with controls. It serves as a natural model for large-scale multi-agent systems. The MFSB is computationally challenging because the nonlocal interaction makes the problem nonconvex. We propose a generalization of the Hopf-Cole transform for MFSB and, building on it, design a Sinkhorn-type recursive algorithm to solve the associated system of integro-PDEs. Under mild assumptions on the interaction potential, we discuss convergence guarantees for the proposed algorithm. We present numerical examples with repulsive and attractive interactions to illustrate the theoretical contributions.
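
For context, the classical Sinkhorn iteration that the proposed algorithm generalizes can be sketched on a discrete problem: alternately rescale the rows and columns of a positive kernel until the coupling matches both prescribed marginals. This is the standard iteration without any mean-field interaction, not the paper's generalized scheme; the kernel and marginals below are illustrative.

```python
def sinkhorn(K, mu, nu, iters=500):
    """Classical Sinkhorn scaling.

    K  : positive kernel matrix (list of rows)
    mu : target row marginal, nu : target column marginal (both sum to 1)
    Returns the coupling P = diag(u) K diag(v).
    """
    n, m = len(mu), len(nu)
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows so that P has row sums mu.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that P has column sums nu.
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

if __name__ == "__main__":
    K = [[1.0, 0.5], [0.5, 1.0]]           # illustrative positive kernel
    P = sinkhorn(K, [0.3, 0.7], [0.6, 0.4])
    print([sum(row) for row in P])          # row sums, close to mu
    print([sum(col) for col in zip(*P)])    # column sums, close to nu
```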


[118] 2605.14610

Parametrically Adaptive Transition Polynomial: a Signed-Parity Continuous-alpha Extension of Kunchenko Stochastic Polynomials

Kunchenko's method of polynomial maximization provides a semiparametric apparatus for parameter estimation under non-Gaussian errors, but its classical power basis relies on finite higher-order integer moments. This paper introduces the Parametrically Adaptive Transition Polynomial (PATP), a signed-parity fractional-power family controlled by a continuous parameter alpha in [0,1]. The quadratic exponent map p_i(alpha) connects the fractal regime p_i(0)=1/i, the degenerate linear point p_i(1/2)=1, and the signed-parity integer-power regime p_i(1)=i. For the degree-S=2 case we derive a closed-form variance-reduction coefficient g_2(alpha) in terms of signed and absolute fractional moments, identify the singular behavior at alpha=1/2, and state the moment and regularity conditions under which the formula is meaningful. The construction should be read as a Form-B PATP analogue within Kunchenko's generalized apparatus, not as an exact recovery of the canonical even-power PMM basis at alpha=1. Numerical illustrations on canonical distributions are used to examine the finite-sample behavior of the signed-parity estimator and to mark the boundary of applicability for extremely heavy-tailed cases such as Cauchy.
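
If "quadratic exponent map" is read as a quadratic polynomial in alpha (an assumption; the abstract does not give the formula), the three stated values p_i(0)=1/i, p_i(1/2)=1, and p_i(1)=i determine it uniquely by Lagrange interpolation. The sketch below evaluates that interpolant with exact rational arithmetic; the function name is hypothetical and the paper's actual map may be written differently.

```python
from fractions import Fraction

def exponent_map(i, alpha):
    """Quadratic in alpha through (0, 1/i), (1/2, 1), (1, i).

    Assumption: the map is a quadratic polynomial in alpha, so it equals
    the Lagrange interpolant of the three anchor values from the abstract.
    """
    a = Fraction(alpha)
    i = Fraction(i)
    half = Fraction(1, 2)
    return ((2 / i) * (a - half) * (a - 1)   # equals 1/i at alpha = 0
            - 4 * a * (a - 1)                # equals 1 at alpha = 1/2
            + 2 * i * a * (a - half))        # equals i at alpha = 1

if __name__ == "__main__":
    for alpha in (0, Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), 1):
        print(alpha, exponent_map(3, alpha))
```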