New articles on Electrical Engineering and Systems Science


[1] 2606.09893

Tractogram foundation model

Diffusion MRI (dMRI) tractography is the only noninvasive approach for mapping white-matter pathways in the living human brain. It represents each brain as a tractogram: a large, unordered set of three-dimensional streamlines that includes information about both local streamline geometry and whole-brain anatomical organization. This structure makes tractograms a natural but challenging target for representation learning. Existing methods treat streamline classification and subject-level prediction as separate problems: streamline classifiers focus on geometric patterns, whereas subject-level prediction often depends on hand-crafted features. As a result, current methods do not learn reusable representations that connect streamline anatomy with whole-brain inter-subject variation. Here we introduce TractFM, a tractogram foundation model that learns reusable representations directly from whole-brain streamline sets. TractFM combines a local streamline encoder with a permutation-equivariant tractogram encoder, allowing all streamlines from a subject to be contextualized jointly in a single forward pass. Pretraining on dense anatomical tract parcellation, i.e., assigning anatomical labels to individual streamlines, yields two complementary representations: contextualized streamline-level embeddings for tract parcellation and compact subject-level descriptors for downstream prediction of subject phenotypes. Across three tractography algorithms and five dMRI datasets, TractFM transfers to both streamline-level and subject-level tasks. Its frozen representations achieve accurate tract parcellation and predict age and sex across independent datasets. These results show that whole-brain geometric context, learned once, can generalize across tractography pipelines, datasets, and prediction tasks.


[2] 2606.09953

Deep Slice Interpolation for Reducing Through-Plane Anisotropy and Noise in Head CT

Head computed tomography (CT) typically uses sub-millimeter in-plane resolution but 2-5 mm through-plane spacing, creating substantial anisotropy that degrades multiplanar reconstructions, volumetric measurements such as hematoma volume estimation, and downstream algorithms that assume near-isotropic voxels. We present a deep learning system that synthesizes intermediate CT slices from pairs of neighboring axial slices, halving the effective through-plane spacing. The system improves three-dimensional visualization while simultaneously producing inherently denoised outputs, yielding two complementary benefits from a single inference pass. To build a reliable system, we systematically evaluate pixel-wise losses, namely mean squared error (MSE) and mean absolute error (L1); structural-similarity losses, namely the structural similarity index (SSIM) and its multi-scale variant (MS-SSIM); and hybrid combinations. On a held-out test set, all converged models outperform classical interpolation baselines and pretrained video frame interpolation methods (RIFE, FILM) on all structural measures, with MS-SSIM+L1 offering the strongest balanced profile. We also document training instability in SSIM-family losses and identify partial remedies: the standard numerical fixes eliminate the dominant failure mode but leave residual divergence at smaller batch sizes. All results are reported with patient-level bootstrap confidence intervals and paired statistical tests. As an illustration, we apply the system to an out-of-distribution head CT series from Hospital Universitario Virgen del Rocío: the model synthesizes intermediate slices and exhibits on the real slices the implicit-denoising signature predicted by our theoretical analysis, supporting in a single external case that interpolation quality and implicit denoising are not confined to the training distribution.


[3] 2606.10010

DeRA-MOS: Optimizing Text-to-Music Evaluation via Decoupled Listwise Ranking and Modality Alignment

Evaluating text-to-music (TTM) systems remains expensive because music impression (MI) and text alignment (TA) scores rely on human mean opinion scores (MOS). Most automatic MOS estimators are trained with point-wise regression or distributional classification. These objectives do not directly optimize rank-based metrics and provide weak geometric constraints for cross-modal coherence. To address these gaps, we propose DeRA-MOS, a decoupled optimization framework for TTM evaluation. For MI, we introduce a batch-aware listwise ranking loss that models relative order within each mini-batch and better aligns with evaluation based on Spearman's rank correlation coefficient (SRCC). For TA, we introduce a score-anchored modality alignment loss that maps human scores to target audio-text similarity and regularizes the latent space before fusion. Experiments on MusicEval demonstrate that, by mitigating the point-wise training mismatch and modality drift, our decoupled framework yields substantial improvements in both MI and TA ranking metrics, establishing a robust paradigm for large-scale TTM evaluation.
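
As a point of reference for the rank-based evaluation mentioned above, the following is a minimal sketch of how SRCC can be computed between predicted and human scores. It is not the DeRA-MOS loss itself; the function names `average_ranks`, `pearson`, and `srcc` are illustrative choices, and ties are handled by average ranks.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def srcc(predicted, human):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(human))
```

Because SRCC depends only on the ordering of the scores, a point-wise regression objective can reduce squared error without improving it, which is the mismatch the abstract refers to.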


[4] 2606.10045

A constrained symbolic regression approach for Lyapunov function discovery

In this paper, we consider the data-driven discovery of Lyapunov functions for autonomous dynamical systems. We represent the Lyapunov function as an expression tree of fixed depth and formulate the Lyapunov discovery task as a constrained self-supervised symbolic regression problem. The constraints model the output of the Lyapunov function for a given input as well as the Lyapunov stability conditions. This modeling approach makes no a priori assumptions about the functional form of the Lyapunov function, is inherently interpretable since the function is obtained in a symbolic form, and, in principle, can be applied to any continuous dynamical system. We also develop a tailored branch-and-bound-and-check solution approach to efficiently solve the resulting learning task. Applications to several case studies show the ability of the proposed approach to discover Lyapunov functions.


[5] 2606.10048

Human Walking Sensing and Pose Estimation in the 6 GHz Band Using Amplitude and Phase CSI

This paper investigates human pose estimation from Orthogonal Frequency-Division Multiplexing (OFDM) signals in an indoor multistatic wireless network operating in the 6 GHz band. We design and validate a processing pipeline that exploits both the amplitude and phase of the Channel State Information (CSI) from multiple radio links to estimate the human body pose. Four deep learning architectures from the literature, namely DT-Pose, MetaFi++, HPE-Li, and VST-Pose, are adapted to the OFDM CSI structure and extended to jointly exploit the amplitude and phase information. The models estimate the pose of a human walking within the network coverage area. Performance evaluation is conducted on an open-access dataset using standard pose-estimation metrics such as Procrustes-aligned Mean Per-Joint Position Error (PA-MPJPE) and Bone Length Loss (BLL). Results indicate that reliable human pose reconstruction can be achieved from 6 GHz OFDM CSI measurements, with DT-Pose providing the best overall accuracy. On average, amplitude-only CSI yields performance comparable to joint amplitude-phase processing, whereas phase information is more beneficial as a complementary feature than as a standalone input.


[6] 2606.10157

An Algebraic State Observer for a Self-Sensing Active Magnetic Bearing System

The problem of designing a globally stable observer for a self-sensing active magnetic bearing system assuming only measurements of currents and voltages is addressed in this paper. Towards this end, we first design a radically different, high-performance state observer, which is obtained by invoking novel techniques. Indeed, our objective is to obtain an algebraic relation between the unmeasurable part of the state and filtered versions of the system's inputs and outputs, which holds for all times. Then, using this algebraic observer, we propose a robust asymptotic version of the observer. Simulation results that illustrate the performance of the observer are also presented.


[7] 2606.10164

Curved Beam Enabled Wireless Communications: Modeling, Analysis and Optimization

In this paper, the problem of using curved beams to improve wireless communication performance in the presence of a blockage is studied. In particular, a transmitter equipped with a continuous aperture array can generate curved beams to serve multiple receivers by allowing signals to propagate along both straight and curved paths. To optimize the weighted sum-rate, a curved beam model is developed for controlling the beam steering, beam focusing, and beam curving functions, along with a segmented channel model to characterize practical channels induced by the blockage. Based on the introduced curved beam model, an optimization problem is posed with the goal of maximizing the weighted sum-rate of all users under a transmit power budget and physical constraints of curved beams. To solve this problem, the continuous aperture is first converted into finite summations via a discrete sampling of the continuous coordinate. Then, the performance gap between the ideal continuous aperture design and its practical discrete aperture approximation is analyzed. Based on the above discrete approximation, an iterative algorithm is developed to optimize curved beam control parameters. In particular, the original problem is reformulated into a tractable form via fractional programming (FP). The transformed problem is then solved by an enhanced block coordinate ascent (BCA) method, which determines a surrogate-construction point by leveraging the local descent from previous iterations, thereby accelerating convergence. A proximal regularization term is also incorporated into the surrogate function to control the update magnitude and suppress aggressive updates, thereby improving update stability. Finally, the beam amplitudes are computed based on the effective channel gains. Simulation results show that the proposed method can improve the weighted sum-rate compared to using only straight beams.


[8] 2606.10190

Optimal Illumination via Joint Movement and Phase Optimization for Movable Antenna-RIS Configuration

Reconfigurable intelligent surfaces (RIS) enable programmable control of wireless propagation but remain vulnerable to persistent deep fades in static deployments. This paper introduces a Movable Antenna-enhanced RIS (MA-RIS) architecture where antenna elements physically reposition to sample independent spatial channels, enabling mobility-induced diversity. We model antenna motion using a Stochastic Differential Equation (SDE) framework capturing controlled drift and environmental diffusion. Itô calculus-based analysis characterizes steady-state antenna distributions, spatial decorrelation, and outage probability, revealing fundamental trade-offs between control strength and mobility randomness. To maximize long-term SNR while accounting for control overhead, we propose an overhead-aware two-timescale framework separating slow antenna trajectory control from fast phase adaptation. The stochastic optimal control problem is solved via predictive approximation of the Hamilton-Jacobi-Bellman (HJB) formulation, enabling real-time implementation. Simulations validate theoretical predictions: the two-timescale strategy achieves up to 36 dB steady-state SNR with remarkable stability, outperforming position-only control by up to 15 dB and uncontrolled baselines by over 30 dB. Despite experiencing a lower SNR than Active RIS, the proposed approach delivers up to 16 times higher energy efficiency (EE) across varying system scales, establishing a new paradigm of mobility-enabled channel adaptation for resilient wireless systems.
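
To make the trade-off between control strength and mobility randomness concrete, here is a minimal Euler-Maruyama sketch of a one-dimensional position with controlled drift and diffusion. The specific SDE, dX = -gain (X - target) dt + sigma dW (an Ornstein-Uhlenbeck process), is an assumption chosen for illustration; the abstract does not state the paper's actual model, and `simulate_ou` and its parameters are hypothetical names.

```python
import math
import random


def simulate_ou(x0, target, gain, sigma, dt, steps, rng):
    """Euler-Maruyama path of dX = -gain * (X - target) dt + sigma dW."""
    x = x0
    path = [x]
    for _ in range(steps):
        # Brownian increment over one step: zero mean, variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + (-gain * (x - target)) * dt + sigma * dw
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(1)
    path = simulate_ou(x0=0.0, target=0.0, gain=2.0, sigma=1.0,
                       dt=0.01, steps=100_000, rng=rng)
    tail = path[1000:]  # discard the initial transient
    variance = sum(v * v for v in tail) / len(tail)
    print("empirical variance:", variance)
    print("continuous-time value sigma^2 / (2 * gain):", 1.0 / (2 * 2.0))
```

For this assumed model the stationary variance is sigma^2 / (2 gain), so a stronger control gain concentrates the position around the target while a larger diffusion coefficient spreads it out.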


[9] 2606.10201

Game-Theoretic Area Coverage Control with Cooperative-Adversarial Multi-Agent Systems

We formulate a multi-agent area coverage control problem as a two-player zero-sum game between two agent groups with conflicting goals. Conventional coverage control allocates resources based on an environmental risk density field. In contrast, we generalize this metric by allowing a second group of adversarial agents to generate the spatial risk field. Coupled agent dynamics are linked through the area coverage metric, which functions as the game reward. This framework induces coupled gradient-descent-ascent controllers for the groups. Analysis of a low-dimensional case reveals a Hopf bifurcation dictated by the ratio of the groups' control gains. In the regime dominated by adversarial agents, the system is driven into a periodic chase-evasion cycle. In the regime dominated by ordinary agents, the system converges to a fixed configuration. Numerical simulations validate these theoretical insights. Finally, we characterize the Nash equilibrium conditions. Under this equilibrium, ordinary agents converge to a generalized centroidal Voronoi tessellation, whereas adversarial agents settle at their corresponding equilibrium centroids.


[10] 2606.10231

LLM can Read Spectrogram: Encoder-free Speech-Language Modeling

Recent speech-aware large language models (Speech-LLMs) rely on a pre-trained speech encoder to convert audio into semantically rich representations consumable by an LLM. In this work, we instead ask: can an LLM learn to read Mel spectrograms directly without a dedicated speech encoder? We propose Mel-LLM, an encoder-free Speech-LLM that feeds lightly pre-processed Mel spectrogram patches directly into the LLM through a linear projection, allowing the LLM to learn speech-text alignment purely through its own parameters. We conduct extensive experiments on both automatic speech recognition (ASR) and text-to-speech (TTS) tasks. For ASR, we evaluate on the OpenASR leaderboard public sets and production-level scaling experiments, demonstrating that the encoder-free solution achieves competitive performance with only limited degradation compared to encoder-initialized counterparts. We find that when data is limited, initialization from a multimodal checkpoint (Phi-4-MM) is crucial for maintaining performance. We also present ablation studies revealing which LLM layers are less relevant to speech encoding. For TTS, we show preliminary results with a next-token VAE approach. While TTS performance is not yet optimal, these results establish the feasibility of a fully unified encoder-free architecture for autoregressive speech-text modeling.


[11] 2606.10233

ANCHOR: Autoregressive Non-intrusive Chunk-Ordered Refinement for Joint Multi-Resolution Speech Quality Modeling

While speech quality is typically assessed on complete utterances, streaming and generative systems require incremental estimation from partial audio. Existing predictors assume full context, degrading on prefix-constrained inputs. Extending ARECHO, we propose ANCHOR, reformulating incremental assessment as a multi-resolution autoregressive task. It models chunk- and utterance-level quality within a single decoder using dual-resolution tokens and a resolution-aware hierarchy for coarse-to-fine refinement. Experiments show substantial robustness under partial input, including a 48% PLCMOS error reduction on 2-second prefixes. Convergence analysis reveals a 4-6 s effective perceptual context horizon. A stress test further isolates structured extrapolation biases under localized corruption. Results demonstrate that hierarchical supervision improves incremental prediction and elucidates how perceptual quality accumulates over time.


[12] 2606.10240

Laplace-Mixture Dipole Inversion for Quantitative Susceptibility Mapping

Purpose: To develop an automatic dipole inversion method for quantitative susceptibility mapping (QSM) that preserves fine anatomical structures without the need for manual regularization-parameter tuning. Theory: The original approximate message passing with parameter estimation (AMP-PE) framework models image gradients with a single Laplace prior, which does not fully capture the heavy-tailed gradient distribution of brain susceptibility maps. This prior mismatch can lead to over-regularization and blocky reconstructions. We address this limitation by modeling the gradients with a two-component Laplace mixture prior. Methods: We propose a Laplace-Mixture Dipole Inversion (LAMDI) method by incorporating a two-component Laplace mixture prior into the AMP-PE framework with automatic parameter estimation. LAMDI was evaluated on a public in vivo dataset. Its performance was compared with FANSI, MEDI, and AMP-PE with a single-Laplace prior (AMP-PE-L1) under both standard default and reference-tuned settings. Results: On a public multi-orientation QSM dataset, LAMDI achieved NRMSE and SSIM comparable to AMP-PE-L1 while substantially reducing HFEN, suggesting improved preservation of high-frequency anatomical detail. Under reference-based tuning, FANSI and MEDI achieved the best performance for some metrics, but LAMDI remained competitive without requiring reference maps or manual regularization tuning. Conclusion: LAMDI provides an effective and automatic parameter-estimation alternative for QSM dipole inversion by combining competitive reconstruction accuracy with improved preservation of fine anatomical detail.
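
The difference between a single Laplace prior and a two-component Laplace mixture can be illustrated with a short sketch. The parameterization below (a mixing weight and two scale parameters, both components zero-mean) is an assumption for illustration; the abstract does not give LAMDI's exact prior, and the function names are hypothetical.

```python
import math


def laplace_pdf(g, scale):
    """Zero-mean Laplace density with scale parameter `scale`."""
    return math.exp(-abs(g) / scale) / (2.0 * scale)


def laplace_mixture_pdf(g, weight, scale_narrow, scale_wide):
    """Two-component zero-mean Laplace mixture density."""
    return (weight * laplace_pdf(g, scale_narrow)
            + (1.0 - weight) * laplace_pdf(g, scale_wide))


def mixture_tail_probability(t, weight, scale_narrow, scale_wide):
    """P(|g| > t) under the mixture, for t >= 0 (closed form for Laplace tails)."""
    return (weight * math.exp(-t / scale_narrow)
            + (1.0 - weight) * math.exp(-t / scale_wide))


if __name__ == "__main__":
    t = 5.0
    single = math.exp(-t / 0.1)  # tail of a single narrow Laplace
    mixed = mixture_tail_probability(t, weight=0.7, scale_narrow=0.1, scale_wide=2.0)
    print("single-Laplace tail:", single)
    print("mixture tail:", mixed)
```

The narrow component concentrates mass near zero gradient while the wide component keeps appreciable probability on large gradients, which is the kind of heavy-tailed behavior the Theory paragraph says a single Laplace prior does not fully capture.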


[13] 2606.10255

POPSICLE: Benchmark Datasets for Segmentation and Localization in CryoET

Cryo-electron tomography (cryoET) has emerged as a powerful tool in structural and cellular biology by enabling direct visualization of macromolecular structures within intact cells, thereby linking molecular architecture to cellular organization in a native context. Realizing the full potential of cryoET, however, increasingly depends on advances in computational analysis, particularly machine learning (ML), to interpret its complex and information-rich data. Despite rapid progress, ML development for cryoET remains bottlenecked by the lack of standardized, well-annotated benchmarks. Existing evaluations are typically small, task-specific, and assembled in isolation, limiting robust comparisons across methods. Here, we present POPSICLE, a benchmark suite for cryoET segmentation and macromolecular localization built from the CryoET Data Portal, an open, ML-ready repository of tomographic data, metadata, and annotations. POPSICLE spans eukaryotic and prokaryotic systems, both purified and fully in situ samples, and dense voxel-wise segmentation as well as sparse localization tasks. Built on a living data resource, it can expand as new datasets and annotations become available. Baseline experiments reveal substantial variation in model rankings across tasks, underscoring the need for benchmarks tailored to the unique characteristics of cryoET rather than evaluation practices adapted from adjacent biomedical imaging domains. POPSICLE thus provides an open and extensible foundation for reproducible ML evaluation in cryoET.


[14] 2606.10280

Overlapped Wavelet Diffusion for Low-Light Image Enhancement

In this study, we propose an overlapped wavelet diffusion framework for Low-Light Image Enhancement (LLIE), which incorporates two complementary components to achieve blocking artifact-free and detail-preserving enhancement. Although recent diffusion-based LLIE methods have demonstrated remarkable performance compared with traditional approaches, DiffLL still suffers from blocking artifacts caused by the Haar Wavelet Transform (WT) and blurred edges or over-smoothed textures due to the limitations of its High-Frequency Restoration Module (HFRM). To overcome these issues, we introduce an Overlapped WT (OWT) that incorporates correlations across neighboring regions, thereby structurally preventing blocking artifacts. Furthermore, we integrate a low-frequency-guided High-Frequency Enhance Block (HFEBlock) to strengthen detail recovery, yielding sharper edges and more reliable textures. Extensive experiments on the LOLv1 and LOLv2-real datasets demonstrate that our framework, termed OWDiff, consistently outperforms existing LLIE methods both qualitatively and quantitatively, achieving superior visual quality while maintaining computational efficiency. OWDiff effectively addresses the structural limitations of the Haar WT and the HFRM, achieving an average PSNR gain of 0.58 dB, along with a 1.64% relative improvement in SSIM and a 5.9% relative reduction in LPIPS, compared to DiffLL across both the LOLv1 and LOLv2-real datasets.
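
The block structure of the standard Haar transform that the abstract refers to can be seen in a one-dimensional sketch. This is the ordinary orthonormal Haar transform, not the paper's OWT, and the function names are illustrative.

```python
import math


def haar_1d(signal):
    """One level of the orthonormal Haar transform on disjoint sample pairs."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def inverse_haar_1d(approx, detail):
    """Invert haar_1d: each (approx, detail) pair rebuilds exactly two samples."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


if __name__ == "__main__":
    signal = [4.0, 2.0, 6.0, 8.0]
    approx, detail = haar_1d(signal)
    approx[0] += 1.0  # perturb one low-frequency coefficient
    # Only the first two reconstructed samples change.
    print(inverse_haar_1d(approx, detail))
```

Each coefficient depends on one disjoint pair of samples, so an error in a restored coefficient stays confined to that pair with no smoothing across the boundary. In two dimensions the same property applies to 2x2 blocks, which is consistent with the blocking artifacts the abstract attributes to the Haar WT.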


[15] 2606.10301

Fundamentals of NOMA in Low-Earth Orbit Coordinated Multi-Satellite Networks

Coordinated multi-satellite (CoMS) transmission and non-orthogonal multiple access (NOMA) are envisioned to jointly enhance coverage, capacity, and spectrum efficiency for satellite networks. Their integration into a unified CoMS-NOMA framework will allow more efficient, reliable, and energy-efficient multi-user access. This paper investigates the downlink performance of CoMS-NOMA networks from a system-level perspective, in which multiple satellites cooperatively serve multiple users via NOMA. Leveraging tools from stochastic geometry, related angles and distances in CoMS-NOMA are first derived as intermediate results. Then, we obtain the combined signal power distributions and analyze coverage and spectrum performance under both inter- and intra-satellite interference, accounting for potential imperfect successive interference cancellation (SIC). The analytical model is validated across a range of system parameters, including the number of satellites, service region angle, error-propagation factor, and power allocation coefficients. Numerical results indicate that increasing the number of cooperative satellites does not always improve coverage and spectrum efficiency. Additionally, while a higher main-lobe gain improves coverage, a near-perfect SIC provides only slightly greater benefits than a reasonably good SIC. With properly selected power allocation coefficients, CoMS-NOMA achieves up to a 270% improvement in coverage and a 56% gain in sum spectral efficiency, compared with conventional orthogonal and single-satellite schemes, indicating potential for green, energy-efficient satellite networking.


[16] 2606.10308

On Time-Delay Compensators for Delayed-Output Systems

This paper advances the practical utility of functional observer theory by addressing sensing latency in linear time-delay systems. We address the estimation of the functional $z(t)=Fx(t)$ in cases where the measurement delay $h$ is independent of the internal state delay $\tau$, with a specific focus on the condition $0 < h < \tau$. To compensate for sensing lags, we propose a functional observer structure characterized by multiple internal delays and an augmented architecture. Algebraic existence conditions are established alongside a constructive synthesis procedure. By incorporating an additional delayed measurement vector, we demonstrate that this approach significantly expands the design space and is applicable to a wider class of systems with larger state and output delays.


[17] 2606.10317

SSL-GMMVC: Interpretable Voice Conversion via Locally Linear GMM Transforms in Self-Supervised Representation Space

We introduce SSL-GMMVC, an interpretable voice conversion method in self-supervised speech space. The method models paired source-target features with a Gaussian mixture model and performs conversion as a posterior-weighted sum of affine transforms. This yields locally linear transformations that adapt to heterogeneous feature-space structure while remaining analytically tractable. Through objective and subjective evaluations, we show that SSL-GMMVC improves speaker similarity with comparable intelligibility and naturalness, and that even a constrained covariance variant surpasses a deep learning baseline as the number of mixture components increases. Further analyses link component selection to phonetic structure and reveal interpretable scaling and rotation in the learned transforms. These findings highlight SSL-GMMVC as an effective, analyzable framework for voice conversion.
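
A posterior-weighted sum of affine transforms can be sketched in the scalar case as follows. The form used here is the classical joint-density GMM regression, in which each component contributes mean_y + (cov_yx / var_x) (x - mean_x); whether SSL-GMMVC uses exactly this form is an assumption, and all function and parameter names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, weights, means_x, vars_x):
    """Posterior probability of each mixture component given the source feature x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means_x, vars_x)]
    total = sum(joint)  # zero if x is far from every component (underflow)
    return [j / total for j in joint]


def convert(x, weights, means_x, vars_x, means_y, covs_yx):
    """Map a source feature to the target space by blending per-component affine maps."""
    post = posteriors(x, weights, means_x, vars_x)
    y = 0.0
    for p, mx, vx, my, cyx in zip(post, means_x, vars_x, means_y, covs_yx):
        slope = cyx / vx  # local linear regression coefficient
        y += p * (my + slope * (x - mx))
    return y


if __name__ == "__main__":
    # Two components with different local slopes.
    print(convert(0.5, [0.5, 0.5], [0.0, 1.0], [1.0, 1.0], [3.0, -3.0], [0.5, -0.5]))
```

Within the region dominated by one component the mapping is a single affine function, and the posteriors interpolate smoothly between components. This makes the overall transform locally linear while each piece remains available for inspection.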


[18] 2606.10426

Dynamic Optimization of Virtual Inertia and Damping in Converter-Based Power Systems

The transition towards a sustainable power system is enabled by the replacement of conventional synchronous generators with converter-interfaced renewable energy sources. However, the resulting loss of rotational inertia and governor damping causes significant frequency deviations and can therefore cause instability. The focus of this paper is the optimal allocation of virtual inertia and damping in the power system activated by established converter control schemes. To this end, we propose a novel dynamic optimization algorithm that considers performance metrics for system stability, cost-efficiency, and resilience. In addition, our algorithm considers the magnitudes and locations of disturbances in the power system for the optimal allocation. Finally, we validate our approach on a three-area system and also compare our results with an $\mathcal{H}_2$ system-norm-based allocation approach.


[19] 2606.10454

Entropy-Aware Domain-Routed Mixture-of-Experts Speech-LLM Framework: A Case Study of Multi-Domain Child-Adult ASR

While Speech Large Language Models (Speech-LLMs) have achieved strong performance on adult Automatic Speech Recognition (ASR), their effectiveness on child speech remains under-explored, and single models often struggle to handle diverse adult and child age groups simultaneously. This paper proposes a Mixture-of-Experts (MoE) Speech-LLM for unified ASR across adult and child speech spanning diverse environments and age groups. The framework employs a Classifier-based Domain Router (C-DR) with a coarse-to-fine strategy and integrates both a Mixture-of-Projectors (MoP) and a Mixture-of-LoRAs (MoL) to model domain-specific variations. To address routing uncertainty near domain boundaries, an Entropy-Aware Routing (EAR) mechanism is introduced to dynamically incorporate a shared expert. Experiments on public child corpora demonstrate consistent improvements over baselines while preserving adult ASR performance. To our knowledge, this is the first work leveraging Speech-LLMs for unified, multi-domain ASR encompassing both children and adults.
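
One way to picture entropy-aware routing is the sketch below. The normalized Shannon entropy of the router's domain posterior measures uncertainty, and a shared expert is blended in only when that uncertainty is high. The threshold rule, the blending formula, and the names `normalized_entropy` and `route` are assumptions for illustration, not the paper's EAR mechanism.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a distribution divided by log(K), so it lies in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def route(probs, expert_outputs, shared_output, threshold=0.5):
    """Use the top-1 domain expert; blend in the shared expert when the router is unsure."""
    top = max(range(len(probs)), key=lambda i: probs[i])
    uncertainty = normalized_entropy(probs)
    # Hypothetical rule: below the threshold, the shared expert is not used at all.
    gate = uncertainty if uncertainty > threshold else 0.0
    return [(1.0 - gate) * e + gate * s
            for e, s in zip(expert_outputs[top], shared_output)]


if __name__ == "__main__":
    experts = [[1.0, 1.0], [2.0, 2.0]]
    shared = [10.0, 10.0]
    print(route([0.95, 0.05], experts, shared))  # confident: domain expert only
    print(route([0.55, 0.45], experts, shared))  # near a boundary: mostly shared
```

A confident router leaves the domain-specific path untouched, while inputs near a domain boundary lean on the shared expert, so a wrong top-1 decision costs less.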


[20] 2606.10464

GC-LoRA: Gated Convolutional LoRA for Parameter-Efficient Acoustic Adaptation

Transformer-based Speech Foundation Models excel in most Automatic Speech Recognition tasks but often suffer performance degradation when applied to domains with mismatched acoustic characteristics. While Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), adjust global attention, they lack the local context modeling crucial for capturing domain-specific variations. We propose GC-LoRA, a novel adapter architecture that injects Conformer-style local convolutional processing into pretrained Transformer encoders. By integrating a lightweight adapter into the encoder attention output projections, our method efficiently captures local acoustic dependencies without disrupting pretrained global representations. Experiments across diverse datasets (acoustically degraded, bandlimited, dialectal, child) demonstrate the efficacy of our approach, achieving Word Error Rate (WER) reductions of up to 10.9% compared to baselines while adding minimal trainable parameters.


[21] 2606.10511

Simplified Temporal Convolutional-Based Channel Estimation for a WiFi Vehicular Communication Channel

Channel estimation in vehicular communication is a crucial element in the advancement of intelligent transportation systems. However, the use of pilot signals in the IEEE 802.11p standard is insufficient for accurate channel estimation in high-mobility scenarios. Data pilot-aided (DPA) estimation helps address this, but suffers from demapping errors. We propose a simplified Temporal Convolutional Network-based estimator (DPA-TCN) trained on a mixed signal-to-noise ratio dataset to improve estimation performance and reduce computational complexity. Our DPA-TCN estimator achieves a bit error rate comparable to a state-of-the-art long short-term memory network with DPA and temporal averaging (LSTM-DPA-TA) while reducing the complexity of the model by approximately 65%.


[22] 2606.10539

Backstepping Control of Multidimensional Coupled First-Order Hyperbolic PDEs with Collinear Velocities

This paper addresses the backstepping boundary stabilization of coupled multidimensional first-order hyperbolic systems. We consider systems whose transport velocity fields are collinear, meaning that each velocity field is a scalar multiple of a common base velocity field. Building upon a recent framework developed for scalar multidimensional first-order hyperbolic equations, we introduce a change of variables, based on characteristic curves defined entirely in the spatial domain, that converts the original multidimensional system into a continuum of coupled one-dimensional first-order hyperbolic systems. By designing a backstepping controller for each system in the continuum representation, and assuming that the transit times of the characteristic curves are uniformly bounded, we achieve finite-time stabilization of the multidimensional system.


[23] 2606.10540

Complex VAE with Heavy-Tailed Likelihood for Radar Target Detection in Sea Clutter

To address the heavy-tailed, spike-prone nature of sea clutter and the scarcity of labeled target data, an unsupervised complex-valued variational autoencoder (VAE) for maritime radar target detection is proposed. In implementation, each complex baseband slow-time sequence is represented by its in-phase and quadrature components, and the model learns their joint reconstruction from clutter-only data. A Student-\(t\) negative log-likelihood is adopted to capture heavy-tailed reconstruction errors while reducing sensitivity to outliers during clutter learning. In addition, a time-domain amplitude error constraint is introduced to penalize slow-time magnitude mismatch in the reconstruction. At inference, reconstruction deviation is used as the detection statistic, and the decision threshold is set via an empirical quantile estimated from a clutter-only validation set to enforce a constant false-alarm rate (CFAR). Experiments on measured sea-clutter data show that detection performance is consistently improved over MF, AMF, and a real-valued \(\beta\)-VAE under CFAR constraints.
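
The thresholding step described above, an empirical quantile of the detection statistic on clutter-only validation data, can be sketched as follows. The function name `cfar_threshold` and the strict "greater than" decision rule are illustrative assumptions. The reconstruction-deviation scores themselves would come from the trained model, which is not shown.

```python
import math


def cfar_threshold(clutter_scores, pfa):
    """Threshold whose empirical exceedance rate on clutter-only scores is at most pfa."""
    if not clutter_scores:
        raise ValueError("need at least one clutter-only score")
    if not 0.0 < pfa < 1.0:
        raise ValueError("pfa must lie strictly between 0 and 1")
    ordered = sorted(clutter_scores, reverse=True)
    # Number of validation samples allowed to exceed the threshold.
    allowed = min(math.floor(pfa * len(ordered)), len(ordered) - 1)
    return ordered[allowed]


def detect(score, threshold):
    """Declare a target when the detection statistic exceeds the threshold."""
    return score > threshold


if __name__ == "__main__":
    validation = [float(v) for v in range(1, 101)]  # stand-in clutter-only scores
    thr = cfar_threshold(validation, pfa=0.1)
    false_alarms = sum(detect(s, thr) for s in validation)
    print(thr, false_alarms / len(validation))
```

Because the threshold is set from the clutter-only distribution alone, the false-alarm rate is controlled without assuming a parametric clutter model, provided that the validation clutter is representative of the test conditions.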


[24] 2606.10547

Unsupervised Deep Learning for Limited-Angle STEM-EDX Tomography -- Application to 3D Chemical Analysis of Phase-Change Memory Devices

Energy Dispersive X-ray (EDX) tomography in Scanning Transmission Electron Microscopy (STEM) enables 3D compositional and elemental mapping at the nanoscale, but its use is limited by restricted tilt ranges and low-dose conditions required to avoid beam damage. Limited-angle acquisition introduces missing-wedge artefacts such as elongation and anisotropic resolution, while noisy low-dose data further degrade reconstruction quality and quantitative reliability. Here, we introduce an unsupervised deep learning framework based on Deep Image Prior with total variation regularization (DIP-TV) for limited-angle STEM-EDX tomography. We extend it to a multi-channel formulation (DIPm-TV) that jointly reconstructs multiple elemental maps by exploiting spatial correlations. Using a synthetic 3-channel phantom, we show that the method compensates for severe missing-wedge artefacts corresponding to approximately $100^\circ$ of missing angular range under moderate noise, outperforming simultaneous iterative reconstruction technique and compressed sensing approaches. We apply the method to 3D chemical analysis of Ge-Sb-Te (GST) memory devices in virgin (as-fabricated) and SET (crystalline) operational states. Samples were prepared as cross-sectional focused ion beam lamellae and acquired under a limited-angle tilt range from $-40^\circ$ to $+40^\circ$ with $5^\circ$ steps and a dose of $2.0\times10^5$ $e^-/\text{\AA}^2$. The multi-channel approach enables voxel-by-voxel elemental reconstruction using only EDX signals without external structural priors such as high-angle annular dark-field imaging. The reconstructed volumes show near-isotropic spatial resolution and reveal compositional heterogeneities associated with device operation. This approach enables 3D chemical characterization in experimentally accessible sample geometries where conventional methods fail due to severe angular limitations.


[25] 2606.10589

Transient Stability of Offshore Energy Hubs

Offshore energy hubs (OEHs) use grid-forming modular multilevel converters (MMCs) to enable large-scale offshore wind integration and multi-terminal HVDC operation. In HVDC-connected offshore wind farms and OEHs, the offshore grid-forming HVDC converters absorb active power from an offshore AC grid supplied by the wind farms and convert it to DC power for transmission to the onshore grid. Converter current limiting under different fault types in this setting is an understudied topic in the literature, which mostly focuses on power-injecting converters. This paper proposes a unified current-limiting strategy that combines a variable virtual impedance (VVI), based on a smooth threshold function, with a novel virtual-power (VP) mechanism derived from the power dissipated in the virtual resistance. The VVI ensures current limitation during fault-induced overcurrents while preserving voltage-source behavior, whereas the VP mechanism adds a compensating power term into the synchronization loop, enabling automatic power redistribution among converters. P-delta analysis further shows that a more resistive VVI can improve the transient stability of power-absorbing converters, while the proposed VP mechanism further enlarges the stability margin. EMT simulations validate that the combined VVI-VP strategy limits fault currents, maintains synchronism during severe faults, and achieves coordinated post-fault power sharing in fully converter-based OEHs.


[26] 2606.10600

Toward Proactive RF Charging Scheduling: Generative AI for Decision Support

Radio frequency wireless power transfer (RF-WPT) is an enabling technology for supporting uninterrupted communications in future Internet of Things systems by reducing the need for battery replacement and mitigating battery-waste-related issues. For large-scale RF-WPT deployment, one of the main challenges is the scheduler-level resource allocation. Specifically, the transmitter must decide how much energy to deliver, when, and to whom, under limited charging resources, incomplete receiver-side information, and uncertain near-future charging conditions. This article positions generative artificial intelligence (GenAI) as a promising tool for this setting because it can foresee multiple plausible charging scenarios conditioned on coarse operational context and receiver-side information. We propose GenAI to act as an uncertainty-aware support layer for the RF-WPT scheduler rather than as a standalone forecasting or decision-making tool. To this end, we first revisit the main challenges of RF-WPT scheduling, and discuss how major GenAI families can support uncertainty-aware charging decisions by generating scenario-based inputs for downstream tasks. We then present a warehouse-style case study showing that preserving uncertainty through the sampling capability of generative models can improve robust charging decisions compared with deterministic prediction and simple non-learning baselines, especially under risk-sensitive objectives. Finally, we identify key open challenges and present some directions for future research.


[27] 2606.10713

++nnU-Net: Scaling nnU-Net with Prefix-Based Data Augmentation

The nnU-Net has demonstrated continuous success in medical segmentation tasks, which heavily rely on the availability and diversity of annotated biomedical data. However, assembling medical imaging cohorts remains challenging due to numerous factors such as privacy regulations and annotation costs. As a result, data augmentation plays a crucial role in increasing data availability while maintaining anatomical feasibility. Hence, we propose the ++nnU-Net, a novel data augmentation module based on image registration that operates before preprocessing and training take place. Our framework was evaluated across five different 2D datasets. In this workflow, image data go through a two-stage registration process, generating new warped images. The transformations are then applied to the respective segmentations. In addition, the pipeline computes available disk space, generates supplementary binary synthetic masks, and generates checkpoints. We demonstrate that the ++nnU-Net outperforms the nnU-Net baseline, yielding improvements in Dice Similarity Coefficient scores. In the most prominent cases, we observe performance gains of approximately 22\%. These findings highlight the effectiveness of registration-based data augmentation, particularly for 2D medical imaging datasets, and suggest that the ++nnU-Net provides a practical and scalable approach for enhancing segmentation performance in data-limited settings. The source code for the ++nnU-Net is available at: this https URL


[28] 2606.10738

Spatial-Omni: Spatial Audio Understanding Integration in Multimodal LLMs via FOA Encoding

Recent multimodal large language models mainly process audio as monaural signals, thereby discarding the spatial cues contained in spatial audio for sound localization, spatial relation reasoning, and spatial scene understanding. We propose Spatial-Omni, a lightweight method that implements SO-Encoder to inject First-Order Ambisonics (FOA) spatial audio into existing Omni LLMs as an independent modality, without modifying their original audio encoders. SO-Encoder provides spatial tokens with limited additional context cost and improves spatial audio understanding through efficient staged training. To support training and evaluation, we construct SO-Dataset, SO-QA, and SO-Bench from open-source data, real recordings, and simulations, containing 400K FOA spatial audio clips and 2.1M spatial question answering pairs. SO-Bench covers 16 spatial audio understanding subtasks, including basic detection and location estimation, spatial relation understanding, and complex spatial reasoning. Experiments show that Spatial-Omni outperforms existing open-source Large Audio-Language Models (LALMs) and Omni LLMs on spatial audio understanding tasks while retaining a reasonable level of general audio understanding. Code and data are available at this https URL.


[29] 2606.10758

Anchoring the Unknown: Open-Set Model Attribution via Proxy-Anchor Learning

The proliferation of text-to-speech (TTS) systems capable of generating realistic synthetic speech poses growing challenges for audio forensics. While binary deepfake detection has received considerable attention, source tracing (i.e., identifying which TTS system produced a given audio sample) remains underexplored, particularly in open-set scenarios where unknown systems may be encountered. We propose a metric learning framework based on the Proxy-Anchor loss function that operates on Wav2Vec2-BERT embeddings to learn a discriminative embedding space for TTS source attribution and out-of-distribution (OOD) detection of unseen systems. We evaluate it on the MLAAD v9 dataset spanning 140 TTS systems across 51 languages, and introduce an architecture merging strategy that groups TTS system versions into unified classes, reducing inter-class confusion. Our system achieves 99.76% accuracy on 110 in-distribution classes and a False Positive Rate (FPR@95) as low as 2.04% for OOD detection. For a fair comparison against the current state of the art, we also evaluate it on the official MLAAD v5 dataset splits, where it nearly doubles the OOD accuracy. These results demonstrate that Proxy-Anchor metric learning, combined with architecture-aware class design and post-hoc OOD scoring, provides an effective framework for forensic TTS source tracing in both closed-set and open-set settings.
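
As an illustration of the FPR@95 metric quoted above, the following is a minimal sketch of how a false positive rate at 95% true positive rate is commonly computed from OOD scores. The function name `fpr_at_tpr`, the score convention (higher means more in-distribution), and the toy scores are assumptions for illustration; the abstract does not specify the scoring function used by the authors.

```python
import math


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """False positive rate at the threshold that accepts `tpr` of ID samples.

    Assumed convention: a higher score means "more in-distribution".
    """
    ranked = sorted(id_scores, reverse=True)
    # Number of in-distribution samples that must be accepted
    # (small tolerance guards against floating-point round-off).
    k = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    # OOD samples at or above the threshold are wrongly accepted as known.
    false_pos = sum(1 for s in ood_scores if s >= threshold)
    return false_pos / len(ood_scores)


# Toy scores (hypothetical): 20 known-system samples, 5 unseen-system samples.
known = [float(i) for i in range(1, 21)]
unseen = [0.0, 1.0, 2.0, 3.0, 25.0]
rate = fpr_at_tpr(known, unseen)
```

A lower value means fewer samples from unseen systems are mistaken for known ones while 95% of known-system samples are still accepted.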


[30] 2606.10781

Recovering the Zipfian Distribution in Unsupervised Term Discovery

Unsupervised term discovery involves segmenting unlabelled speech into word- or syllable-like units and clustering these into a lexicon of candidate types. True lexicons follow a Zipfian distribution, yet the dominant centre-based clustering approach -- K-means -- produces a more uniform distribution due to an inductive bias toward spherical clusters. In this paper we revisit graph-based clustering as a bottom-up alternative, where segment embeddings are connected by pairwise similarity and partitioned using the Leiden algorithm. We show that graph clustering substantially outperforms centre-based approaches (K-means, GMM, BIRCH) in both word- and syllable-level lexicon discovery across three languages, producing more Zipf-like distributions. Another bottom-up approach, agglomerative clustering with average linkage, also performs well, although it is computationally less efficient and allows for less control over the resulting distribution. Our work calls into question the dominance of centre-based clustering for term discovery, and promotes graph clustering as an attractive alternative.
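
One simple way to check how Zipf-like a discovered lexicon is, sketched here under assumptions, is to fit a line to the rank-frequency curve in log-log space: a slope near -1 indicates a Zipfian distribution, and a slope near 0 indicates near-uniform cluster sizes. The function name `zipf_slope` and the toy cluster sizes are hypothetical; the abstract does not state which measure of Zipf-likeness the authors use.

```python
import math


def zipf_slope(cluster_sizes):
    """Least-squares slope of log(size) against log(rank)."""
    sizes = sorted((s for s in cluster_sizes if s > 0), reverse=True)
    if len(sizes) < 2:
        raise ValueError("need at least two non-empty clusters")
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(size) for size in sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy lexicons (hypothetical): sizes proportional to 1/rank vs. equal sizes.
zipfian = [1000.0 / rank for rank in range(1, 51)]
uniform = [20.0] * 50
slope_zipfian = zipf_slope(zipfian)   # close to -1
slope_uniform = zipf_slope(uniform)   # close to 0
```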


[31] 2606.10838

Towards Deep Contextual Reasoning from Broad Descriptions for ASR with Speech-LLM via Metadata-Driven Reasoning Chains

Speech recognition often fails on rare, domain-specific terms and context-related named entities. Existing contextualization techniques typically bias decoding with keywords or phrase lists, which does not scale well or exploit deeper knowledge. We propose a training method that teaches a speech-LLM to use broad descriptions (e.g. from videos) as weak semantic priors to perform contextual reasoning grounded in the audio. We build 400 hours of reasoning-augmented speech data by pairing erroneous hypotheses with video metadata and LLM-generated reasoning explanations that justify context-driven corrections. We finetune the speech-LLM to perform chain-of-thought reasoning: generate an initial transcript, then reason over the context, and finally return a corrected transcript. On held-out YouTube-derived test sets, our approach reduces errors, with specific improvements on rare words and named entities, and lays groundwork for deeper contextual reasoning in speech recognition.


[32] 2606.10853

Speech Encoder Fusion for LLM-based Automatic Speech Recognition

Speech-aware large language models (LLMs) can incorporate speech through pre-trained acoustic encoders that project speech features into the LLM embedding space. While the choice of the speech encoder critically influences performance, different encoders often exhibit complementary strengths, motivating their combination. In this work, we investigate whether fusing multiple pre-trained speech encoders can enhance speech-aware LLMs for automatic speech recognition (ASR). We explore several fusion strategies beyond simple feature concatenation, including learned combinations and Transformer-based fusion architectures, and evaluate them across mono- and multilingual ASR settings as well as diarized speech recognition. Our results indicate that carefully fusing multiple parallel speech encoders improves downstream performance in all scenarios with limited computational overhead.


[33] 2606.10864

Phoneme-First Prediction for LLM-Based Speech Recognition

Recent research has explored integrating Large Language Models (LLMs) with speech encoders to create speech-augmented LLMs capable of contextualized speech recognition. The main challenge lies in aligning the semantic embeddings of LLMs with the acoustic representations of speech encoders. We propose a novel approach that teaches the LLM to first predict phonemes from the speech features before generating the final transcript. By integrating a phoneme prediction step directly into the LLM, the model develops a fine-grained knowledge of pronunciation, reducing acoustic confusion and improving transcription accuracy and explainability. Our method is cheap and simple, as phoneme targets can be automatically derived from existing transcripts. Through comprehensive experiments, we show that intermediate phoneme prediction can improve speech recognition, particularly in low-resource settings, and yields outputs that are acoustically more faithful to the speech.


[34] 2606.10869

Information Bottleneck Meets Quantization: Finite Rate Analysis and Optimal Designs

The Information Bottleneck (IB) is a well-established framework that seeks a compact latent representation of a data source by trading rate and representation size for information accuracy with respect to another target variable. The Gaussian IB (GIB) is its simple closed-form solution when the target is jointly Gaussian with the source. In many practical problems, however, the latent representation has to be stored or represented with a finite number of bits, whereas the optimal (G)IB solution is not finite-rate. First, this manuscript theoretically analyzes the effect of scalar and vector quantization of the GIB latent representation, and its impact on the (dis)informativeness with respect to the target data. Then, task-oriented quantization designs are proposed by (jointly) reformulating the GIB optimization problem under a finite-rate constraint on the latent representation. Simulation results on MMSE regression problems confirm the effectiveness of the proposed quantization designs, which show significant gains with respect to more heuristic, or separate, quantization designs of the standard GIB latent representation. Finally, the paper extends the task-oriented philosophy to non-Gaussian settings by properly modifying the cost function used in variational auto-encoders (VAEs) of IB-inspired vector quantizers.
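
For orientation, the trade-off described above is usually written as the standard IB Lagrangian. The symbols below are assumptions, since the abstract introduces no notation: $X$ is the source, $Y$ the target, $T$ the latent representation, and $\beta$ the trade-off weight.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y),
\qquad \beta > 0, \qquad Y \leftrightarrow X \leftrightarrow T .
```

Here $I(X;T)$ plays the role of the rate and $I(T;Y)$ measures how informative the latent is about the target. The finite-rate designs in the abstract additionally constrain $T$ to a finite number of bits, which this generic form does not capture.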


[35] 2606.10883

Temperature-Aware Heat Pump Modeling for Large-Scale Energy System Optimization

Heat pumps are expected to dominate the heating sector, substantially increasing peak electricity demand. At the same time, building thermal inertia enables operational strategies, providing temporal flexibility in heat pump operation and short-term demand response. However, this dynamic behavior is not yet represented in large-scale energy system optimization models. To address this gap, we present an innovative formulation of building thermal inertia. The resulting temperature variable is integrated into a novel conic temperature-aware heat pump efficiency formulation, enabling a more precise emulation of smart control strategies. In a case study of the European energy system, we show that the approach captures operational heating flexibility while remaining computationally efficient. The results indicate substantial untapped flexibility potential, enabling up to a 22% reduction in heating-related electricity costs. This potential can be realized through a suitable energy market design that incentivizes coordinated heat pump control, individually or via aggregators.


[36] 2606.10893

Low-Dose 3D Bonding Mapping Through "Soft" Core-Loss EELS Tomography and Unsupervised Deep Learning

Resolving the 3D chemical configuration of beam-sensitive nanomaterials at high spatial resolution remains a persistent frontier in scanning transmission electron microscopy (STEM). The main limitation lies in the trade-off between high electron dose required for analytical signals and the large number of projections needed for tomographic reconstruction. Here, we achieve dose-efficient 3D bonding mapping of FeO/Fe$_3$O$_4$ core-shell nanocubes with high resolution via electron energy loss spectroscopy (EELS). Our approach relies on two developments. First, a standardless "soft" core-loss EELS methodology exploiting Fe-M$_{2,3}$ edges provides ${\sim}50\times$ higher dose efficiency than conventional Fe-L$_{2,3}$ edges, using the latter only as a source of FeO and Fe$_3$O$_4$ standards. Second, we introduce multi-channel deep image prior with total variation regularization (DIPm-TV), an unsupervised method for spectroscopic tomography that jointly reconstructs multiple channels by exploiting spatial correlations under sparse-view and low-dose conditions. Using simulated datasets, high-quality reconstructions are obtained from as few as nine projections over $-70^\circ$ to $+70^\circ$, without HAADF-STEM signal or symmetry constraints. Applied to FeO/Fe$_3$O$_4$ nanocubes, Fe-M$_{2,3}$ EELS maps show improved SNR and spatial resolution, revealing a thin outer FeO shell surrounding the magnetite shell. DIPm-TV yields ${\sim}1$ nm isotropic resolution oxidation-state volumes preserving cubic morphology, recovering the outer FeO shell, and revealing a small internal void, features not accessible with conventional reconstruction methods. This work establishes a pathway for low-dose 2D and 3D analytical mapping of beam-sensitive materials using shallow core-loss edges, enabling orders-of-magnitude dose reduction while maintaining spectral fidelity and reliable 3D information.


[37] 2606.10900

Personalized Deep Learning for Short-Term Forecasting of Impending Atrial Fibrillation from Continuous Wearable ECG Signals

Background and Objective: Continuous wearable electrocardiogram (ECG) monitoring is increasingly used for ambulatory arrhythmia surveillance, yet forecasting impending atrial fibrillation (AF) is challenged by inter-patient ECG variability. This study investigated whether personalizing a global model via fine-tuning on an individual's ECG signals improves short-term forecasting of impending AF. Methods: A global model trained on the ICENTIA11K dataset was compared against personalized models fine-tuned across three cohorts: ICENTIA11K, IRIDIA-AF, and MobiCARE. Following preprocessing, models processed 60-second ECG segments for a five-minute forecast horizon. We evaluated the impact of adaptation data volume and analyzed ECG features, such as heart rate and RMSSD. Results: Personalized models significantly outperformed the global model, achieving AUROCs of 0.711 vs. 0.614 in ICENTIA11K and 0.686 vs. 0.585 in MobiCARE. Personalization benefits increased with the amount of patient-specific fine-tuning data. While the global model's accuracy rose as AF onset approached, personalized models in the two external cohorts exhibited distinct temporal dynamics, which may indicate the capture of patient-specific characteristics less dependent on proximity to the AF event. Pre-AF episodes showed elevated heart rates and RMSSD. Feature attributions highlighted clinically relevant precursors, including frequent premature atrial complexes (PACs) and short supraventricular tachycardias (SVTs). Conclusions: Adapting deep learning models with patient-specific wearable ECG data significantly enhances short-term forecasting of impending AF. This personalized framework supports timely preventive interventions and improved AF management in ambulatory monitoring environments.
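
The two ECG features named above have standard definitions, sketched here for a sequence of RR intervals in milliseconds. The function names and the toy RR series are illustrative assumptions, and the abstract does not describe how the authors extract RR intervals from the raw ECG.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_heart_rate(rr_ms):
    """Mean heart rate in beats per minute from RR intervals in ms."""
    return 60000.0 / (sum(rr_ms) / len(rr_ms))


# Toy RR series (hypothetical), in milliseconds.
rr = [800.0, 810.0, 790.0, 800.0]
variability = rmssd(rr)          # sqrt((10^2 + 20^2 + 10^2) / 3) = sqrt(200)
rate_bpm = mean_heart_rate(rr)   # 60000 / 800 = 75 bpm
```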


[38] 2606.10923

Robust Current Regulation of MMC-based MTDC Power Systems based on Lyapunov Inequality

Multi-terminal DC (MTDC) transmission systems based on modular multilevel converters (MMCs) are a key component of the envisioned future energy sector, where sustainability and efficiency are increasingly prioritized. To ensure their reliable operation, MMC currents must be regulated safely and rapidly under a wide range of uncertain operating conditions. Consequently, the design of current controllers faces a fundamental challenge: achieving fast transient response while maintaining robustness against uncertainties. This paper addresses this challenge by proposing a linear matrix inequality (LMI)-based design framework that leverages Lyapunov stability conditions to synthesize a less conservative static state-feedback controller. The proposed design method explicitly accounts for system constraints, including input saturation and overcurrent limits. The effectiveness of the proposed method is assessed on the CIGRE MT-HVDC benchmark, simulated in RTDS, and compared with existing methods.


[39] 2606.10972

Optimizing 2D Input Representations and Sub-phase Fusion Strategies for Differential Diagnosis of Asthma and COPD Using CNN- and GRU-Based Networks

This study aims to explore the performance of the VAR model in comparison with mel-frequency cepstral coefficient (MFCC) matrices and log-mel spectrograms using deep learning. In pulmonary sound classification, spectrogram-based representations suffer from inconsistent temporal dimensions due to varying respiratory cycle durations. Along with traditional trimming/zero-padding, adaptive-length windowing was presented to fix their temporal dimensions. Their spectral and temporal dimensions were optimized by testing a range of parameters. Different convolutional neural network (CNN) architectures were employed to extract features from the two-dimensional representations obtained over the sub-phases. The extracted sub-phase features were then fused using various strategies including direct concatenation, gated recurrent unit (GRU) network and GRU with attention mechanism. Model performances were assessed through respiratory cycle-based evaluation and subject-based evaluation comprising multiple respiratory cycles. Several data augmentation techniques were also studied to cope with limitations in data size. The best cycle-based F1-score (0.877) was obtained using the MFCC matrices with thirteen coefficients and 64-point time resolution per sub-phase representation followed by direct feature concatenation, and the best subject-based F1-score (0.855) was obtained using the MFCC matrices with thirteen coefficients and 256-point time resolution per full-cycle representation, both obtained by adaptive-length windowing. Augmentation degraded the performance of models overall, yet mixup augmentation was the best among the methods tested. MFCC outperformed log-mel spectrogram and VAR model in differentiation of asthma and COPD. Sophisticated fusion strategies did not improve the diagnosis. Augmentation did not contribute, demonstrating the significance of authentic data in pulmonary sound studies.


[40] 2606.10997

A Companion App for an Autonomous Family Vehicle: Identification of Values for an Autonomous Mobility System

In this paper, we present a companion app for an autonomous vehicle aimed at user groups who would normally require an accompanying person to drive them. The paper presents two aspects of the companion app: first, the ability of a trusted person to track the ride of the person in need of support, and second, placing the vehicle settings for persons in need of support in the hands of a trusted person. In addition, this article describes the requirements and the values addressed, and discusses the safety-relevant aspects of such a companion app. We also discuss and identify the values that influence passengers and trusted persons using the companion app. Overall, a companion app can provide new perspectives and opportunities for people in need of support, allowing them to take advantage of the features offered by autonomous vehicles. It enables trusted individuals to configure the vehicle according to the passenger's needs. Such an app can also be a mechanism to involve trusted persons in the options given by the vehicle and give them the possibility to adapt the vehicle to the needs of the person in need of support.


[41] 2606.11020

Multi-Channel Soil Moisture Measurement: High Accuracy and Low Crosstalk Through Optical-Semiconductor Based Differential Sensing

Soil moisture measurement plays a key role in irrigation and environmental management. Yet it remains unreliable due to heterogeneous soils, limited sensing volumes, temperature drift, and parasitic inter-channel coupling. This work presents a compact multi-depth capacitive probe that extends a parallel-plate geometry from previous work with differential activation to suppress stray capacitances and improve accuracy. An equivalent-circuit model quantifies parasitic effects, and optically coupled transistor bridges isolate each sensing layer. Raw capacitance is converted to volumetric water content and plant-available water using established calibration models. Laboratory results show a fourfold reduction in temperature sensitivity, strong confinement of the sensing volume, and improved repeatability in heterogeneous soils. Field validation against reference sensors demonstrates high accuracy and precision comparable to widely used instruments, enabling a practical and scalable solution for agricultural and urban soil-moisture monitoring.


[42] 2606.11049

Free Parametrization of L_2-Bounded Structured State-Space Controllers for Nonlinear Control with Stability Guarantees

Designing stabilizing control policies for nonlinear systems while optimizing complex objectives remains a formidable challenge. Neural networks (NNs), despite their expressive power, can be highly sensitive to small input perturbations and can easily destabilize the closed-loop system. Existing approaches often impose explicit constraints on the controller's parameters to ensure stability, but this typically leads to additional computational overhead. To address this issue, we leverage recently proposed structured state-space models (SSMs) to parametrize discrete-time control policies for nonlinear systems. Our key contribution is a new free parametrization of linear time-invariant (LTI) systems with a prescribed L2 gain. We use this result to construct the L2-Recurrent Unit (L2RU), an SSM layer that enforces the desired L2 bound by design. The resulting architecture can be used to guarantee closed-loop stability via the small-gain theorem or the so-called performance-boosting framework, independently of the controller's optimization parameters, thereby enabling fully unconstrained optimization of general nonlinear objectives. Furthermore, the structure induced by the proposed parametrization enables the efficient processing of long input sequences, as it is highly parallelizable through algorithms such as parallel scan. We demonstrate the effectiveness of this approach on a formation-control task for mobile robots, where the L2RU-based controller ensures collision and obstacle avoidance while maintaining stability and performance.
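
The parallel-scan remark can be illustrated with a scalar linear recurrence $x_{k+1} = a_k x_k + b_k$, where $b_k$ stands in for the input term. This is a minimal sketch under assumptions: the names `combine` and `scan`, the scalar setting, and the toy coefficients are illustrative and do not reproduce the L2RU layer. Each step is an affine map, affine maps compose associatively, and so all prefixes can be computed in a logarithmic number of passes.

```python
def combine(first, second):
    """Compose two affine maps x -> a*x + b, applying `first` then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def scan(elems):
    """Inclusive Hillis-Steele scan; each pass is independent across i,
    so it could run in parallel on suitable hardware."""
    out = list(elems)
    step = 1
    while step < len(out):
        out = [
            out[i] if i < step else combine(out[i - step], out[i])
            for i in range(len(out))
        ]
        step *= 2
    return out


def sequential_states(a, b, x0):
    """Reference: unroll the recurrence one step at a time."""
    states, x = [], x0
    for a_k, b_k in zip(a, b):
        x = a_k * x + b_k
        states.append(x)
    return states


# Toy coefficients (hypothetical); integers keep the comparison exact.
a = [2, 3, 1, 2, 5]
b = [1, 0, 4, 1, 2]
x0 = 1
prefixes = scan(list(zip(a, b)))
scanned_states = [A * x0 + B for A, B in prefixes]
```

With matrices in place of scalars the same operator applies, with products taken in the same order.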


[43] 2606.11091

QUIET: Quantifying Underutilized Influential Edges for Targeted Synchronization

Network control theory can be used to model intrinsic and extrinsic strategies to steer neural dynamics. Standard approaches are node-centric, structural, and focused on achieving desired instantaneous states. Here, we develop an edge-centric approach which incorporates both structure and function to achieve extended patterns of neural dynamics characterized by desired synchronization states. Our method, Quantifying Underutilized Influential Edges for Targeted Synchronization (QUIET), is an edge-centric framework that integrates structural controllability of individual white matter connections and mutual information between pairwise functional timeseries to identify energy-efficient synchronization pathways. QUIET identifies quiet highways, edges that are structurally influential but functionally underutilized, to optimize regional synchronization. We validated QUIET across 75 synthetic configurations, where QUIET-ranked edge sets significantly outperformed random selection in 93% of cases (p<0.01). The framework, tested on Human Connectome Project participants, revealed that the control energy required for synchronization of the salience network correlates with fluid intelligence. QUIET, applied to healthy adults undergoing dexmedetomidine-induced unresponsiveness, showed that the frontoparietal and default-mode networks exhibited the largest control energy required for synchronization in both awake and sedated states. QUIET is released as stand-alone software for studying theoretically defined synchronization pathways, which in turn could inform testable hypotheses in perturbative studies.


[44] 2606.11107

Multimodal Brain Tumour Classification Using Feature Fusion

Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone, failing to replicate clinicians' multimodal reasoning. We explore a two-branch multimodal network combining raw MRI scans with 91 extracted radiomic features (intensity, texture, shape, and boundary descriptors) to classify brain tumors into glioma, meningioma, pituitary, and no-tumor. A pre-trained CNN backbone encodes the image stream, whereas a dedicated MLP encodes the radiomic stream. Both streams are fused via concatenation, gating, or bidirectional cross-modal attention. Across nine experimental runs on a balanced 7,200-image dataset, all multimodal configurations outperform unimodal baselines, with gated fusion achieving the best accuracy of 96.13%.


[45] 2606.11125

DMT: Demographic Conditioning, Morphology-Enhanced Transformer for Cuffless Blood Pressure Estimation from PPG Signals

Blood pressure (BP) is a key marker for cardiovascular risk assessment and therapeutic decision-making, and Photoplethysmography (PPG) enables low-cost, wearable-friendly cuffless BP estimation. However, even with recent progress, many PPG-based models are trained with BP regression alone and may rely on amplitude-dominated shortcuts. In addition, demographic covariates that systematically modulate vascular compliance are often incorporated only via late fusion, limiting subject-specific representation learning. We propose a Transformer-based network for cuffless BP estimation from PPG signals, leveraging self-attention to capture long-range dependencies across multiple cardiac cycles. To account for subject-specific vascular differences, the model is conditioned on demographics via FiLM-style feature modulation applied through the attention and feed-forward sublayers of Transformer blocks. In addition, we add an auxiliary morphology head to guide the model to attend to BP-relevant waveform morphology associated with arterial stiffness and wave reflection. Under calibration-based evaluation protocols on the large-scale PulseDB dataset, the proposed method achieves an MAE of 4.56 mmHg for systolic BP and 2.62 mmHg for diastolic BP, reducing errors by 47% and 50% compared with prior demographic-enhanced PPG baselines. The resulting lightweight, single-sensor model supports scalable and clinically grounded cuffless BP estimation in calibration-enabled deployment settings.
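
FiLM-style modulation, in its generic form, scales and shifts each feature channel with parameters predicted from a conditioning vector: $y = \gamma(c) \odot x + \beta(c)$. The sketch below shows that generic operation in plain Python. All names, the linear maps producing $\gamma$ and $\beta$, and the toy numbers are assumptions; the abstract does not specify where or how the authors compute the modulation inside each sublayer.

```python
def linear(weights, bias, vec):
    """Dense layer: one output per weight row."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def film(tokens, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Apply per-channel scale and shift, predicted from `condition`,
    to every token in the sequence."""
    gamma = linear(w_gamma, b_gamma, condition)
    beta = linear(w_beta, b_beta, condition)
    return [
        [g * x + b for x, g, b in zip(token, gamma, beta)]
        for token in tokens
    ]


# Toy example (hypothetical): two tokens with two channels each,
# conditioned on a two-dimensional demographic vector.
tokens = [[1.0, 2.0], [3.0, 4.0]]
demographics = [1.0, 0.0]
modulated = film(
    tokens,
    demographics,
    w_gamma=[[2.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 1.0],  # gamma = [2, 1]
    w_beta=[[1.0, 0.0], [1.0, 0.0]], b_beta=[0.0, 0.0],    # beta  = [1, 1]
)
```

Because the scale and shift depend only on the conditioning vector, the same subject-specific modulation is applied at every time step of the PPG sequence.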


[46] 2606.11135

Pre-Fault Voltage Discrimination and Time-Domain Protection for Distribution Networks with Inverter-Based Resources

The increasing proliferation of inverter-based resources (IBRs) in distribution networks is presenting a major challenge for phasor-based overcurrent protection. This challenge stems from IBRs' lack of short-circuit current sourcing capacity. As a result, traditional overcurrent protection functions (e.g., ANSI 51) are inadequate in such scenarios, and warrant alternative approaches. Time-domain protection, for example, shows promise in overcoming this challenge. In this paper we propose a pre-fault voltage discrimination (PVD) strategy whose role is to detect faults and discriminate normal switching and transformer inrush disturbances from actual faults. The use of PVD allows for the design of a simple, yet effective fault detection algorithm by using time-domain protection principles for distribution networks containing IBRs. The introduction of PVD provides for faster fault detection without reducing security and dependability. Offline simulation experiments and controller hardware-in-the-loop real-time simulation validate the effectiveness of the proposed algorithm against various fault and normal switching events.


[47] 2606.09870

Safecloud: A Distributed, Encrypted Storage Cloud for Streaming

We present Safecloud, a distributed, encrypted, self-pricing storage and streaming network whose storage and routing nodes never see plaintext and never hold keys. Each file is split into chunks, encrypted on the owner's device, and distributed across Drops (browser tabs storing ciphertext in IndexedDB) and Jets (federated routing servers). Only the owner, or an authorised grantee, can decrypt. We make five contributions: (1) A one-root key hierarchy: every key derives deterministically from a single root via HKDF, and owner and range-scoped grantee derive identical chunk keys (derivation agreement); a subtree key derives its range and nothing else (delegation containment). (2) Convergent content addressing: identical content yields identical ciphertext and identifiers, enabling deduplication without plaintext exposure, with identifiers binding authenticated ciphertext so a keyless Drop verifies integrity (blind verifiability). (3) Three parallel trees over one navigation path (Merkle for integrity, key-derivation for confidentiality, access for authorisation), with sound Merkle-verified retrieval. (4) The key tree doubles as a streaming index: a player derives each segment key in O(1), seeking by derivation, while parallel tracks (video, audio, captions) are independent subtrees unlockable per-track and per-segment, a combination we believe no prior encrypted-storage network offers. (5) Jets and Drops earn Safebux verifiably, kept honest by a one-signature proof-of-storage challenge under chilling-effect Proof-of-Corruption, a zero-sum economy that is significantly cheaper than Filecoin's proof-of-replication sealing (which is slow and provides no confidentiality). We give the architecture, cryptographic construction, a threat model, and an open-source reference implementation, stating precisely what is implemented versus designed.
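
The derivation-agreement property in contribution (1) can be sketched with HKDF (RFC 5869) built from the Python standard library. This is an illustration under assumptions, not the Safecloud construction: the path labels (`file-42`, `video`, `seg-7`), the use of HKDF's `info` field to carry the label, and the helper names are all hypothetical.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF extract-then-expand with HMAC-SHA256 (RFC 5869)."""
    if not salt:
        salt = bytes(hashlib.sha256().digest_size)  # RFC default: zero-filled salt
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def child_key(parent: bytes, label: str) -> bytes:
    """Derive one tree level below `parent` (label scheme is hypothetical)."""
    return hkdf_sha256(parent, label.encode("utf-8"))


def derive_path(root: bytes, path) -> bytes:
    """Walk the key tree from the root along `path`."""
    key = root
    for label in path:
        key = child_key(key, label)
    return key


root = bytes(range(32))  # placeholder root key, for illustration only

# Owner derives a segment key from the root.
owner_key = derive_path(root, ["file-42", "video", "seg-7"])

# A grantee holding only the "video" subtree key needs one more derivation.
video_subtree = derive_path(root, ["file-42", "video"])
grantee_key = child_key(video_subtree, "seg-7")
```

In this sketch, `owner_key` and `grantee_key` are equal by construction, while the `video` subtree key gives no shortcut to keys under a sibling label such as `audio`, since HKDF output cannot be inverted to recover the parent.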


[48] 2606.10085

Structured Adaptive Tensor Prediction for Streaming Data

Matrix-valued time series arise in a wide range of applications, such as spatio-temporal data from medical imaging and geophysics. Existing methods are mainly designed for static settings and lack adaptability to streaming and time-varying environments. Adaptive filtering techniques have also been largely limited to data with scalar or vector values, leaving adaptive forecasting for matrix-valued time series inadequately understood. To bridge these gaps, we develop an adaptive tensor regression framework that includes Matrix-on-Matrix (MoM) and Tensor-on-Matrix (ToM) formulations for streaming matrix-valued prediction. The two formulations differ in whether to directly model matrix-valued outputs or to exploit temporal structure via higher-order tensor representations. For the proposed tensor regression framework, we develop stochastic gradient descent (SGD) algorithms for online learning. We show that stacking multiple responses across time into higher-order tensors improves performance; in particular, the ToM achieves lower steady-state error and stronger denoising capability than MoM, motivating our focus on the ToM model. We further characterize the tracking behavior of SGD under time-varying dynamics. From a statistical perspective, we establish fixed-time recovery guarantees for ToM under general low-dimensional structures, including sparsity, low-rankness, and their joint sparse-low-rank models.


[49] 2606.10111

Nonlinear Estimator: Dual Bayesian Affine Estimators for Parameter Learning

This paper presents a nonlinear parameter estimator for Wiener-type state-space models obtained as a fixed-point architecture that couples two affine minimum mean-squared error (MMSE) estimators: one for the unknown parameters and one for latent variables. The architecture retains the functional structure of the optimal affine MMSE parameter estimator while incorporating Dynamic Basis Statistics (DBS) estimates that summarize nonlinear basis-function evaluations. Two DBS construction strategies are developed, leading to two nonlinear estimator frameworks. The dual basis-parameter estimator combines an affine basis estimator with the affine parameter estimator, whereas the dual state-parameter estimator first computes affine state estimates and their covariances, then maps these state-estimate statistics through a Gaussian DBS operator to obtain DBS estimates. Both dual estimators admit fixed-point characterizations that alternate between estimating each component using the updated prior of the other, obtained from that component's plug-in estimate statistics from the previous iteration. The efficacy of the proposed methods is examined via extensive Monte Carlo experiments, showing that the dual basis-parameter estimator attains parameter mean-squared errors comparable to those of the purely affine parameter estimator, while the dual state-parameter estimator achieves the lowest parameter mean-squared error, outperforming both the dual basis-parameter and purely affine parameter estimators, as well as sequential Monte Carlo variants of classical Particle Gibbs and Expectation-Maximization schemes.


[50] 2606.10192

Submodular Optimization with Applications to Decision and Control

Submodular set functions, characterized by the diminishing-returns property, provide a unifying combinatorial framework for many subset-selection problems in decision and control. Although exact maximization is NP-hard in general, the structural properties of submodular functions enable simple greedy algorithms that achieve constant-factor approximation guarantees for monotone objectives, with randomized greedy-based variants extending such guarantees to the non-monotone case. This survey reviews the theory, algorithms, and applications of submodular optimization with a focus on systems and control. We cover the structural properties of submodular functions, including curvature and the submodularity ratio, the constraint families that arise in practice (matroids, knapsack, and $p$-systems), and the main approximation algorithms for monotone and non-monotone submodular maximization, with up-to-date approximation ratios and hardness results. We then survey applications across sensor scheduling, multi-agent coordination, robust submodular optimization, leader-follower systems, distributed submodular optimization, game theory, system theory, resource allocation, social networks, and informative path planning. The survey emphasizes practically implementable greedy-based algorithms and instance-dependent refinements via curvature and the submodularity ratio. We close with observations on canonical control-theoretic objectives: certain functionals are submodular (the log-determinant and rank of the controllability Gramian, and the log-determinant of the Kalman filter information matrix), whereas closely related objectives fail to be sub- or supermodular (the steady-state Kalman filter error covariance, and the average control energy obtained from the inverse Gramian). We also highlight the cross-cutting open directions that follow.
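
The greedy scheme mentioned above can be sketched for the simplest case, a monotone submodular objective under a cardinality constraint, where the classical guarantee is a factor of 1 - 1/e. The coverage objective, the sensor names, and the function names below are illustrative assumptions, not material from the survey.

```python
from typing import Callable, Dict, List, Set


def make_coverage(regions: Dict[str, Set[int]]) -> Callable[[List[str]], int]:
    """Build a coverage function: f(S) = number of points covered by sensors in S.

    Coverage functions are monotone and submodular (diminishing returns).
    """
    def f(selected: List[str]) -> int:
        covered: Set[int] = set()
        for name in selected:
            covered |= regions[name]
        return len(covered)
    return f


def greedy_max(ground: List[str], f: Callable[[List[str]], int], k: int) -> List[str]:
    """Pick up to k elements, each time adding the one with the largest marginal gain."""
    chosen: List[str] = []
    for _ in range(k):
        base = f(chosen)
        best, best_gain = None, 0
        for e in ground:
            if e in chosen:
                continue
            gain = f(chosen + [e]) - base
            if gain > best_gain:  # strict: ties keep the earlier element
                best, best_gain = e, gain
        if best is None:  # no remaining element adds value
            break
        chosen.append(best)
    return chosen


# Hypothetical sensor-placement instance: each sensor observes a set of points.
sensors = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
f = make_coverage(sensors)
picked = greedy_max(sorted(sensors), f, k=2)
print(picked, f(picked))  # → ['c', 'a'] 7
```

The loop evaluates the objective O(nk) times; lazy evaluation of marginal gains is a common refinement that exploits the same diminishing-returns property.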


[51] 2606.10344

Koopman Modeling and Stabilization of Discrete-Time Nonlinear Control Systems: Bilinearity on a Reproducing Kernel Hilbert Space

Despite the popularity of Koopman modeling for nonlinear systems, in the presence of input variables, the evident nonexistence of a fully linear time-invariant model even in infinite dimensions makes Koopman-based control largely an open problem to date. Focusing on discrete-time systems, which avoids the need for operator semigroup and infinitesimal generator notions, this paper proves that nonlinear systems satisfying appropriate smoothness and regularity conditions can be expressed exactly as bilinear dynamics when the state variables and input variables are separately lifted into their reproducing kernel Hilbert spaces (RKHSs). To account for the knowledge of an equilibrium point at the origin, the RKHS is defined by a linear-radial product kernel, and hence the functions belonging to this RKHS are spanned by the products of component functions and Sobolev functions. The stabilization problem, namely the determination of a feedback law that causes a Lyapunov function (expressed as a kernel sum-of-squares form) to decrease, is then posed as an infinite-dimensional optimization problem over state-dependent conditional probability measures over the input space, solved via a discretization scheme.


[52] 2606.10410

A Comprehensive Inference-Time Augmentation Framework in Physiological Signals: Application to PPG-Based AF Detection

Objective: Accurate classification of physiological signals in real-world deployments is challenged by sensor noise, motion artifacts, and distribution shifts between training and deployment data. Inference-time augmentation (ITA), which applies augmentations during inference rather than retraining, offers a simple, model-agnostic mechanism to improve robustness. However, ITA application to physiological signals has remained narrow in scope, relying on limited augmentation methods with fixed, unoptimized parameters. This work proposes a unified ITA framework to address that gap. Approach: The framework incorporates 13 augmentation methods spanning time-domain, amplitude-domain, frequency-domain, and artifact-injection transformations, with hyperparameters optimized via Bayesian optimization. We evaluate on atrial fibrillation (AF) detection from 30-second PPG signals using GPT-PPG and ResNet across five datasets comprising more than 400 patients and ${\sim}$9,800 hours of recording. Main results: Standard ITA consistently improved AUROC (up to 8.5% for GPT-PPG and 0.7% for ResNet) and AUPRC (up to 10.6% for GPT-PPG and 0.8% for ResNet). Selective ITA further reduced average FPR by up to 4.4% (GPT-PPG) and 1.3% (ResNet) on non-AF datasets. Significance: These findings establish ITA as a practical, model-agnostic approach for improving PPG-based AF classification reliability in deployment settings where retraining is not feasible, with broader applicability to physiological signal analysis.
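
The basic ITA mechanism (augment the input at inference, score every view, aggregate) can be sketched as follows. This is a minimal illustration under stated assumptions: the three augmentations, their parameters, mean aggregation, and the `toy_model` scorer are hypothetical stand-ins and do not reproduce the paper's 13 methods, its Bayesian hyperparameter search, or its selective ITA variant.

```python
import math
import random
from typing import Callable, List

Signal = List[float]
Augment = Callable[[Signal], Signal]


def scale(factor: float) -> Augment:
    """Amplitude-domain augmentation: multiply every sample by a constant."""
    return lambda x: [factor * v for v in x]


def circular_shift(k: int) -> Augment:
    """Time-domain augmentation: rotate the signal by k samples."""
    def aug(x: Signal) -> Signal:
        r = k % len(x)
        return x[-r:] + x[:-r] if r else list(x)
    return aug


def add_noise(sigma: float, seed: int) -> Augment:
    """Artifact-style augmentation: add seeded Gaussian noise (reproducible)."""
    def aug(x: Signal) -> Signal:
        rng = random.Random(seed)
        return [v + rng.gauss(0.0, sigma) for v in x]
    return aug


def ita_predict(model: Callable[[Signal], float], x: Signal,
                augmentations: List[Augment]) -> float:
    """Average the model's probability over the original and augmented views."""
    views = [list(x)] + [aug(x) for aug in augmentations]
    probs = [model(v) for v in views]
    return sum(probs) / len(probs)


def toy_model(x: Signal) -> float:
    """Hypothetical classifier: logistic score of mean successive difference."""
    irregularity = sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)
    return 1.0 / (1.0 + math.exp(-(irregularity - 0.5)))


signal = [math.sin(0.3 * t) for t in range(100)]
augs = [scale(1.1), circular_shift(5), add_noise(0.05, seed=0)]
print(ita_predict(toy_model, signal, augs))
```

The wrapper never touches the model's weights, which is what makes the approach model-agnostic: any callable that maps a signal to a probability can be passed in.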


[53] 2606.10439

Enhancing Multilingual LLM-based ASR with Mixture of Experts and Dynamic Downsampling

The rapid progress of large language models (LLMs) has opened up a new frontier for automatic speech recognition (ASR), making their effective integration a critical and challenging research direction. To this end, this work proposes a projector-based LLM-ASR framework targeting the key challenges of multilingual generalization and modality alignment. Our approach incorporates a Mixture of Experts (MoE) architecture to improve cross-lingual adaptability, and a Continuous Integrate-and-Fire (CIF) mechanism for dynamic downsampling and modality alignment. Experimental results show that the combination of these components yields substantial performance improvements, surpassing strong baseline models. The proposed method represents a step toward building more accurate, robust, and generalizable LLM-based ASR systems.


[54] 2606.10544

From Stacks to Circuits: A Regenerative Socio-Technical Roadmap for AI Infrastructure within Planetary Boundaries

Current scaling trajectories for Generative AI, typified by linear supply-side "stacks," prioritize performance density while externalizing significant thermodynamic and material costs. As the "Twin Transition" of green and digital transformation accelerates, the industry faces technology gaps - including Scope 3 emissions and e-waste recycling - that impede sustainable scaling and lead to social tensions. This study proposes a Regenerative Socio-Technical roadmap that repurposes the Sustainable Production and Consumption system map to reframe artificial intelligence infrastructure as a system-of-systems governed ultimately by planetary limits. By integrating the Institute of Electrical and Electronics Engineers International Roadmap for Devices and Systems (IEEE IRDS) sustainability considerations for semiconductor facilities, the study proposes a metabolic circuit framework that centers "Values and Needs" within production and consumption relationship loops. This study identifies critical gaps in current Nvidia-centric roadmaps and proposes a competing reference architecture. It demonstrates how a spontaneous order of resource parsimony and planetary accountability can provide an actionable pathway for regulatory compliance and industrial resilience in the digital circular economy.


[55] 2606.10565

A Lightweight Dual-Factor Acoustic Authentication System via Cascaded GMM-DTW Architecture for Edge Computing

This paper presents a lightweight, cascaded GMM-DTW dual-factor voice lock system for resource-constrained edge environments. By utilizing a shared MFCC feature space, the framework implements a sequential defense mechanism combining GMM speaker screening and DTW passphrase verification. To counter presentation threats without extra hardware, a dynamic joint absolute-relative margin constraint is integrated into the GMM classification space, limiting the physical imposter and high-fidelity replay attack False Acceptance Rates (FAR) to 2.73% and 6.67%, respectively, with a legitimate False Rejection Rate (FRR) of 16.67%. Due to Sakoe-Chiba window optimization, the global end-to-end processing latency under temporal stress is rigidly bounded at 9.82ms on a single-core CPU, comprising 1.51ms for feature extraction, 0.54ms for GMM scoring, and 7.77ms for worst-case DTW matching. These empirical benchmarks demonstrate the viability of white-box acoustic cascades for secure, deterministic real-time deployment on low-power edge nodes.
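
The role of the Sakoe-Chiba window in bounding DTW cost can be seen in a minimal sketch. It assumes scalar sequences and an absolute-difference local cost for brevity, whereas the system described above matches MFCC feature vectors; the function name and window handling are illustrative, not the authors' implementation.

```python
import math
from typing import Sequence


def dtw_banded(a: Sequence[float], b: Sequence[float], window: int) -> float:
    """DTW distance restricted to a Sakoe-Chiba band of half-width `window`.

    Only cells with |i - j| <= window are filled, so the work is
    O(n * window) instead of O(n * m), which bounds worst-case latency.
    """
    n, m = len(a), len(b)
    # The band must be at least as wide as the length difference,
    # otherwise the end cell (n, m) is unreachable.
    w = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]


# A time-stretched copy aligns at zero cost within a band of width 1.
print(dtw_banded([1, 2, 3, 4], [1, 1, 2, 3, 4], window=1))  # → 0.0
```

A fixed band width is what makes the matching time predictable: the number of filled cells depends only on the sequence lengths and the window, not on the audio content.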


[56] 2606.10579

LieIPM: Lie Group Interior Point Method for Direct Trajectory Optimization of Rigid Bodies

Designing dynamically feasible trajectories for rigid bodies is a fundamental problem in robotics. While direct methods are widely used, the existing constrained optimizers typically operate in Euclidean space and ignore the manifold structure of rigid body motions. This mismatch may introduce singularities or lead to poorly conditioned optimization problems. To bridge this gap, we develop a structure-aware framework for constrained trajectory optimization directly on matrix Lie groups. Our approach is based on the second-order rigid body models utilizing Lie group structures, which enables efficient Newton-type updates while preserving the underlying geometry. Building on this model, we propose a line-search Lie Group Interior Point Method (LieIPM) to handle constraints on the manifolds. We instantiate the framework for rigid body motion planning using Lie group variational integrators and derive closed-form intrinsic derivatives that exploit group symmetries. The LieIPM preserves the topology of rotation motions by construction and avoids singularities. Numerical results demonstrate superior robustness and faster convergence compared to general-purpose solvers and structure-exploiting optimal control methods.


[57] 2606.10581

ParaBridge: Bridging Paralinguistic Perception and Dialogue Behavior in Speech Language Models

Speech carries more information than just words: a child's voice, a fearful tone, or a noisy background should all lead a sufficiently competent spoken-dialogue assistant to different replies. Current Speech Language Models (SLMs) can recognize such paralinguistic cues but often ignore them in open-ended dialogue. We observe that a simple paralinguistic instruction scaffold at the inference stage narrows this perception-behavior gap, suggesting that the relevant cues are already latent in the model. Such scaffolds, however, remain brittle under multi-turn context and competing instructions. Therefore, we propose ParaBridge, an on-policy self-distillation method that turns a brittle inference-time scaffold into stable model behavior. During training, the scaffold serves only as a temporary privileged view; the scaffold-free model rolls out its own response, while the scaffolded view supplies dense, full-vocabulary next-token targets along its trajectory. This supervision teaches when non-lexical cues should affect the reply without the need for curated dialogues, human labels, or external reward models. On Qwen3-Omni-thinking, ParaBridge raises scaffold-free VoxSafeBench SAR from $14.6\%$ to $40.3\%$ and improves EchoMind average rating from $3.27$ to $3.92$. It also preserves general ability, with MMAU-Pro, VoiceBench, and GPQA all within $0.4$ points of the original model. Beyond the training distribution, ParaBridge generalizes to unseen paralinguistic cues, transfers from safety-oriented training to empathy-oriented dialogue, and works on a different SLM backbone.


[58] 2606.10596

Embedding Hybrid Systems into Continuous Latent Vector Fields

This work proves that an $n$-dimensional hybrid system can be embedded into an $m$-dimensional Euclidean space equipped with a continuous vector field on its embedded image whenever $m>2n$. This result suggests that an intrinsically discontinuous hybrid system generically admits a continuous extrinsic representation that is well-posed for differentiable optimization. Building on this existence theorem, we show that a latent Neural ODE with consistency loss in both the latent and state space can accurately recover the flow of hybrid systems. Extensive experiments suggest the proposed method outperforms the existing method in learning hybrid systems with varying geometries from only time series data.


[59] 2606.10675

Multilingual Word-Level Forced Alignment with Self-Supervised Representations and Learned Dynamic Programming

We present a method for accurate multilingual word-level forced alignment, consisting of an alignment encoder and a learned alignment decoder. The encoder integrates two representations: one from the Massively Multilingual Speech (MMS) model and another from a self-supervised phoneme boundary detector (UnSupSeg). It learns to fuse them and to estimate word-boundary probabilities over long temporal contexts. The alignment decoder is a learned dynamic programming procedure that combines encoder outputs with segmental features over the MMS and UnSupSeg representations to infer final word boundaries. Trained iteratively on TIMIT and Buckeye, the proposed approach outperforms Montreal Forced Aligner (MFA) and MMS-based alignment on both datasets. On unseen languages (Dutch, German, and Hebrew), the proposed model performs consistently better than or on par with existing alignment approaches, indicating its potential to scale to the 1100+ languages supported by MMS without further training.


[60] 2606.10705

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

Reinforcement learning promises to optimize sequential decisions in large-scale systems. Semiconductor manufacturing systems are stochastic and highly constrained environments where heterogeneous wafers traverse hundreds of processing steps across extensive equipment networks. These characteristics yield complex, high-dimensional decision problems with delayed feedback and long-horizon requirements, complicating production planning and control. We propose a deep reinforcement learning framework for multi-objective policy optimization at this scale. Specifically, we formulate control as a centralized-agent problem, where a core policy coordinates system-wide decisions, while system evolution is represented as an interconnected temporal process driven by discrete events. Accordingly, we develop a tailored event-driven temporal-difference formulation that remains general and can be integrated with various policy optimization methods under relevant training settings. We investigate several core model-free algorithms incorporated into this framework and evaluate their effectiveness using high-fidelity simulations of diverse, industry-real operating scenarios. Across extensive validation experiments, agents trained in both offline and online settings show significant and consistent gains in throughput and utilization. We further evaluate performance and generalization across training phases, clarifying the relative strengths of alternative reinforcement learning formulations and algorithms. Overall, the results support the scalability, generality, and transferability of the proposed framework for controlling event-driven complex adaptive systems.


[61] 2606.10810

Direct Data-Driven Approximate Optimal Control of Nonlinear Input-Affine Systems

In this paper, we combine a data-driven system representation with a framework to systematically construct (approximate) solutions to nonlinear optimal control problems. By immersing the unknown dynamics into an extended state space, solutions are characterised via purely data-dependent algebraic conditions. This allows us to design dynamic state-feedback controllers with local stability and performance guarantees for unknown nonlinear, input-affine systems directly using data, without explicitly identifying the dynamics.


[62] 2606.10841

Gradient based Bilevel for Inverse Optimal Control, a Riemannian approach

Inverse Optimal Control (IOC) aims to recover the cost function that explains observed trajectories as solutions of an optimal control problem. Classical IOC formulations rely on bilevel optimization, which repeatedly solves a nested optimal control problem and quickly becomes computationally prohibitive for realistic systems. Recent projection-based approaches offer a promising alternative but suffer from numerical instability when solved with gradient-based methods due to violations of standard constraint qualifications. In this paper, we show that these difficulties stem from the geometric structure of the IOC feasible set. We demonstrate that the set of trajectories satisfying the optimality conditions naturally forms a manifold and reformulate IOC as an optimization problem on this manifold. Based on this insight, we propose a Riemannian Inverse Optimal Control (RIOC) method that projects observed trajectories onto the manifold of optimal solutions while preserving feasibility by construction. Experiments on real human arm trajectories show that the proposed method achieves comparable or better reconstruction accuracy than classical bilevel IOC while reducing computation time by about a factor of four. These results highlight the potential of geometric optimization methods to improve the scalability and reliability of IOC for robotics and human motion analysis.


[63] 2606.10941

Spectral Koopman Approach for Reconstructing State-space Geometry of Cislunar Restricted 3-Body Problem

In this work, we propose a novel approach, based on the path integral formulation of the Koopman spectrum, to discover the phase-space geometry of the planar Cislunar Restricted 3-Body Problem (CR3BP). In contrast to existing techniques, which rely on trajectory-based and usually local analysis, we leverage the Koopman operator framework, which generates a global linear representation of the system, to reconstruct the global phase-space geometry of the CR3BP. In particular, we compute the principal eigenfunctions of the Koopman operator via the path integral approach and show how the zero level curves of these eigenfunctions encode the phase-space characteristics of the planar CR3BP.


[64] 2606.10971

Resilient Navigation for Autonomous Farm Robots by Leveraging Jerk-Augmented Models with IMU-Only Disturbance Rejection

Precise state estimation for navigation of autonomous agricultural robots is often compromised by sensor outages (GNSS/LiDAR/Visual) and high-frequency vibrations inherent in off-road environments. This paper proposes a robust navigation algorithm based on a jerk-augmented Extended Kalman Filter (EKF) integrated with a Multiple Tuning Factor (MTF) adaptation method. Unlike standard EKF approaches that assume constant measurement noise, our method dynamically adjusts the measurement covariance matrix in real-time, allowing the system to cope with sudden disturbances and sensor outliers. We evaluate the algorithm using real-world data from a Salin247 autonomous robot. Results demonstrate that jerk-augmentation combined with MTF adaptation significantly reduces 3D position Root Mean Square Error (RMSE) compared to baseline EKF models, providing superior dead-reckoning capabilities.


[65] 2606.10975

Learning Doubly Sparse Explicitly Conditioned Transforms

Finding convenient spaces in which certain hypotheses regarding an assumed sparse structure of natural signals hold true has become a desirable result in recent research, its implications being reflected in areas such as data compression, noise reduction and feature extraction. While the extensively used analytical transforms, such as DFT or DCT, already provide efficient algorithms and robust sparse representations, they assume a fixed prior about the data, failing to accurately capture the specific structure of more restrictive classes of signals. To address this, the concept of a data-adaptive, learnt transform has been introduced in the literature, allowing for the reduction of a residual term in the transform domain. More recent studies have shown that the condition number serves as a good metric in this context, where the desired outcome alternates between a generalizing tendency and one that achieves minimal approximation error. Motivated by these considerations, we introduce the learning of a structured, explicitly conditioned transform formulated as the product of a fixed canonical matrix and a refining data-adaptive sparse component. This approach seeks to preserve the advantages of fast and stable analytical transforms, while introducing controllable adaptivity to the data. No references that concern this specific formulation have been identified so far, indicating its novelty. The proposed algorithm is motivated within the framework of inexact proximal methods, leveraging a newly derived closed-form projection operator. Empirical observations demonstrate state-of-the-art results on the doubly sparse transform learning problem and comparable performance with its dense variant at significantly lower computational costs and sometimes faster convergence and better avoidance of bad local minima.


[66] 2606.10986

Multi-UAV Active Sensing with Information Gain-based Planning and Belief Fusion

Unmanned aerial vehicles (UAVs) are increasingly used for active sensing and information gathering in spatially distributed environments. Their performance, however, is constrained by limited flight time, sensing uncertainty, and the trade-off between spatial coverage and observation accuracy. This paper presents a real-world validation of a multi-UAV active sensing framework for probabilistic binary terrain mapping, with precision agriculture used as the application case. The environment is represented as a probabilistic belief map, where spatial dependencies are modeled through a factor-graph formulation. UAV decision making is guided by Information Gain based Informative Path Planning (IGbIPP), and the approach is compared with Random Walk and Sweep coverage path planning baselines using both synthetic terrains and real UAV-derived agricultural imagery. The study also evaluates spatial correlation weights and several probabilistic belief-fusion rules for multi-UAV information sharing. Results show that IGbIPP reduces entropy and mapping error more effectively than the baselines, while a wider field of view improves real-world coverage and map accuracy. The results further show that simple equal or biased spatial weights can be more robust than adaptive weights, and that Bayesian, log-odds, and Dempster-Shafer fusion achieve the best cooperative mapping performance. These findings highlight the importance of uncertainty-driven planning, sensing geometry, spatial modeling, and probabilistic fusion for real-world UAV-based active sensing.
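
Log-odds fusion of per-cell binary beliefs, one of the fusion rules named above, can be sketched as follows. The sketch assumes each UAV reports a posterior for the same cell, that the reports are conditionally independent given the true cell state, and that all UAVs share a common prior; the function names are hypothetical and the paper's exact fusion rules may differ.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Log-odds back to probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_log_odds(beliefs: Sequence[float], prior: float = 0.5) -> float:
    """Fuse independent posteriors for one map cell.

    Each belief contributes its evidence (log-odds minus the prior's log-odds);
    the shared prior is counted once.
    """
    total = logit(prior) + sum(logit(p) - logit(prior) for p in beliefs)
    return sigmoid(total)


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a binary belief; 0 when the cell is certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Two UAVs each believe the cell is occupied with probability 0.8.
fused = fuse_log_odds([0.8, 0.8])
print(round(fused, 4), binary_entropy(fused) < binary_entropy(0.8))  # → 0.9412 True
```

Agreeing observations sharpen the belief and lower its entropy, while conflicting ones such as 0.7 and 0.3 cancel back to the prior, which is the behaviour an information-gain planner relies on when scoring candidate viewpoints.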


[67] 2606.11017

Data-Driven Runway and Taxiway Exits Prediction of Landing Aircraft: A Case Study at Hartsfield-Jackson Atlanta International Airport

Airport surface operations increasingly constrain performance at high-throughput hubs. This study examines arrival taxi-in decisions at Hartsfield-Jackson Atlanta International Airport (KATL) and proposes a two-stage, data-driven decision aid that mirrors controller workflow. Stage I predicts the runway exit selected by an arriving aircraft. Stage II predicts whether, given that exit, the aircraft will cross the active departure runway at a designated point or use the end-around taxiway. Models are trained using ASDE-X surface trajectories, aircraft characteristics, ramp destinations, short-horizon traffic rates, and weather across multiple look-back windows. We benchmark nine classifiers, including Random Forest, XGBoost, LightGBM, and CatBoost, and evaluate accuracy, macro-F1, precision-recall behavior, confusion matrices, Brier score, and Expected Calibration Error. Across east and west flows, XGBoost and LightGBM outperform Random Forest. Stage I achieves 0.86-0.89 accuracy with macro-F1 scores of 0.40-0.50, while Stage II achieves 0.70-0.74 accuracy with macro-F1 scores of 0.28-0.55. Feature-importance analysis shows that approach speed is the main driver of exit choice. Departure rate, crossing rate, ramp destination, and, for west flow, the selected exit are the strongest predictors of crossing versus end-around routing. Minority classes remain harder to predict because of feature-space overlap, as shown by t-SNE and UMAP analyses. The proposed framework supports controller situational awareness through calibrated, explainable predictions while preserving human responsibility for final routing decisions.
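
Two of the calibration metrics listed above, the Brier score and Expected Calibration Error, can be computed as in the sketch below. It is a simplified illustration: the Brier score is shown in its binary form and the ECE uses ten equal-width confidence bins over top-label confidence, both of which are assumptions, since the study's multi-class setup and binning choices are not given here.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Binary Brier score: mean squared gap between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[int],
                               n_bins: int = 10) -> float:
    """ECE: bin-size-weighted average of |accuracy - mean confidence| per bin.

    `confidences` holds the probability of the predicted class and
    `correct` holds 1 if that prediction was right, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Hypothetical predictions: two confident hits, one moderate hit, one moderate miss.
conf = [0.9, 0.9, 0.6, 0.6]
hit = [1, 1, 1, 0]
print(round(expected_calibration_error(conf, hit), 3))  # → 0.1
```

Accuracy and macro-F1 measure which class is chosen, whereas these two metrics measure whether the reported probabilities can be taken at face value, which matters when a controller is expected to weigh the prediction rather than follow it.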


[68] 2606.11050

LLM-Mediated Demand Response Coordination in Smart Microgrids

Effective demand response in smart microgrids requires prosumers to cooperate voluntarily under strategic self-interest, a coordination problem structurally equivalent to a repeated Prisoner's Dilemma on a social network. This paper presents a multi-agent simulation in which a Large Language Model (LLM) Influence Compiler issues structured demand-response directives to a population of heterogeneous prosumer agents, each governed by a hybrid decision architecture combining game-theoretic base probability (derived from payoff history, neighbour imitation, and exploitation memory) with LLM narrative evaluation of incoming coordination signals. The hybrid architecture resolves a key methodological challenge: LLMs aligned via Reinforcement Learning from Human Feedback (RLHF) exhibit strong cooperation bias when used as direct decision-makers, producing flat dynamics regardless of grid conditions. By separating strategic reasoning from grounded narrative evaluation, the model generates realistic prosumer behaviour across six personality archetypes, with baseline cooperation near 50% and clear differentiation under influence. Compiled structured directives achieve 33.3% demand-curtailment cooperation versus 27.0% for unstructured messaging and 28.0% for a no-intervention baseline ($\Delta_\mathrm{comp} = +0.063$), with the advantage preserved across both grounded and idealized agent substrates ($\Delta = +0.083$) and across all resistance levels ($R = 0.1$ to $0.7$). Hub-targeted dissemination via high-centrality network nodes outperforms peripheral or random targeting, confirming that grid topology provides mechanistic amplification independent of message content. These results suggest that structured LLM compilation, grounded agent reasoning, and network-aware targeting are complementary design principles for scalable, interpretable demand-response coordination in smart-city energy systems.


[69] 2606.11167

Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models

Full-duplex spoken dialogue models can listen and speak simultaneously, making them a promising architecture for natural conversation. However, current models are trained solely with supervised learning through token-level likelihood maximization, which does not directly optimize interaction-level behaviors, causing interactivity issues such as excessive silence and ill-timed turn-taking. Recent work has applied reinforcement learning (RL) to improve interactivity, but existing methods address only a limited set of interactive behaviors in their rewards. In this work, we propose a post-training alignment method that comprehensively improves the interactivity of full-duplex spoken dialogue models through RL. We address the four canonical axes of interactivity: pause handling, turn-taking, backchanneling, and user interruption. For each axis, we extract short audio segments from human conversation corpora and optimize the model with axis-specific reward functions. An extra LLM-based reward for response quality prevents semantic degradation. We apply our method to two open-source models, Moshi and PersonaPlex, demonstrating consistent improvements in interactivity on both offline evaluation with pre-recorded audio and real-time multi-turn dialogue evaluation.


[70] 2501.01481

Unleashing Correlation and Continuity for Hyperspectral Reconstruction from RGB Images

Reconstructing Hyperspectral Images (HSI) from RGB images can yield high spatial resolution HSI at a lower cost, demonstrating significant application potential. This paper reveals that local correlation and global continuity of the spectral characteristics are crucial for HSI reconstruction tasks. Therefore, we fully explore these inter-spectral relationships and propose a Correlation and Continuity Network (CCNet) for HSI reconstruction from RGB images. For the correlation of local spectrum, we introduce the Group-wise Spectral Correlation Modeling (GrSCM) module, which efficiently establishes spectral band similarity within a localized range. For the continuity of global spectrum, we design the Neighborhood-wise Spectral Continuity Modeling (NeSCM) module, which employs memory units to recursively model the progressive variation characteristics at the global level. In order to explore the inherent complementarity of these two modules, we design the Patch-wise Adaptive Fusion (PAF) module to efficiently integrate global continuity features into the spectral features in a patch-wise adaptive manner. These innovations enhance the quality of reconstructed HSI. We perform comprehensive comparison and ablation experiments on the mainstream datasets NTIRE2022 and NTIRE2020 for the spectral reconstruction task. Compared to the current advanced spectral reconstruction algorithms, our designed algorithm achieves State-Of-The-Art (SOTA) performance.


[71] 2507.19137

Assessment of Personality Dimensions Across Situations in Dyadic Role-Play Scenarios

Prior research indicates that users prefer assistive technologies whose personalities align with their own. This has sparked interest in automatic personality perception (APP), which aims to predict an individual's perceived personality traits. Previous studies in APP have treated personalities as static traits, independent of context. However, perceived personalities can vary by context and situation as shown in psychological research. In this study, we investigate the relationship between conversational speech and perceived personality for participants engaged in two work situations (a neutral interview and a stressful client interaction). Our key findings are: 1) perceived personalities differ significantly across interactions, 2) loudness, sound level, and spectral flux features are indicative of perceived extraversion, agreeableness, conscientiousness, and openness in neutral interactions, while neuroticism correlates with these features in stressful contexts, 3) handcrafted acoustic features and non-verbal features outperform speaker embeddings in inference of perceived personality, and 4) stressful interactions are more predictive of neuroticism, aligning with existing psychological research.


[72] 2507.22017

Cyst-X: A Multi-Center MRI Benchmark and Federated Learning Framework for Malignancy-Risk Stratification of Pancreatic Cystic Neoplasm

Pancreatic cancer is projected to be the second-deadliest cancer by 2030, making early detection critical. Intraductal papillary mucinous neoplasms (IPMNs), key cancer precursors, present a clinical dilemma, as current guidelines struggle to stratify malignancy risk, leading to unnecessary surgeries or missed diagnoses. Here, we introduce Cyst-X, a multi-center MRI benchmark and a federated learning framework for IPMN malignancy-risk stratification. The dataset comprises 1,461 abdominal MRI scans from 764 patients at seven international centers, with three-tier malignancy labels anchored in histopathology or three-year imaging follow-up and expert pancreas segmentations. The pipeline couples the PanSegNet pancreas segmenter with a 3D DenseNet-121 classifier and a parallel radiomics predictor. On internal cross-validation, the deep learning classifier reached a mean area under the receiver operating characteristic curve (AUC) of 0.85 (95% confidence interval 0.84-0.86) on T2-weighted MRI for high-risk versus low- or no-risk discrimination, with the average precision rising from a prevalence baseline of 0.23 to 0.64. This performance was preserved (AUC 0.85, FedProx) when training was distributed across institutions without exchange of raw patient images. Benchmarked against three blinded radiologists on a 629-case reader subset evaluated under imaging-only conditions, the classifier matched or exceeded sensitivity at comparable specificity. To accelerate research in early pancreatic cancer detection, we publicly release the Cyst-X dataset, segmentation masks, and trained models as the first large-scale, multi-center MRI resource for pancreatic cystic neoplasm analysis.


[73] 2508.03248

Federated Learning Enhanced by Feature Reconstruction for Semantic Communication Module Updates of Agents

Recent advancements in semantic communication have primarily focused on image transmission, where neural network-based joint source-channel coding modules play a central role. However, such systems often experience semantic communication errors due to mismatched knowledge bases between agents and performance degradation from outdated models, necessitating regular model updates. To address these challenges in vector quantization (VQ)-based image semantic communication systems, we propose FedSFR, a novel federated learning framework that incorporates semantic feature reconstruction (FR). FedSFR introduces an FR step at the parameter server and allows a subset of clients to transmit compact feature vectors in lieu of sending full local model updates, thereby improving training stability and communication efficiency. To enable effective FR learning, we design a loss function tailored for VQ-based image semantic communication and demonstrate its validity as a surrogate for image reconstruction error. We further establish a rigorous convergence analysis of FedSFR. Experimental results on two benchmark datasets validate the superiority of FedSFR over existing baselines, especially in capacity-constrained settings, confirming both its effectiveness and robustness.


[74] 2509.22153

Towards Paradigm-General Suicide Risk Detection via Speech LLM

Suicide risk among adolescents remains a critical public health concern, and speech provides a non-invasive and scalable approach for its detection. Speech-based suicide risk assessment commonly relies on carefully designed speech elicitation paradigms (e.g., verbal fluency, reading, or question answering) to probe cognitive and affective states. Existing approaches, however, typically focus on a single paradigm at a time. This paper, for the first time, investigates cross-paradigm approaches that unify diverse speech elicitation paradigms within a single model. Specifically, we use a speech LLM as backbone with a mixture of DoRA experts (MoDE) to capture complementary cues across assessments dynamically, tested on 1,223 participants across ten speech elicitation paradigms. Results show that MoDE outperforms both paradigm-specific and conventional joint-learning models. Moreover, it can generalise to unseen paradigms and provide better confidence calibration.


[75] 2509.25854

Delay-Doppler Domain Channel Measurements and Modeling in High-Speed Railways

As next-generation wireless communication systems need to be able to operate in high-frequency bands and high-mobility scenarios, delay-Doppler (DD) domain multicarrier (DDMC) modulation schemes, such as orthogonal time frequency space (OTFS), demonstrate superior reliability over orthogonal frequency division multiplexing (OFDM). Accurate DD domain channel modeling is essential for DDMC system design. However, since traditional channel modeling approaches are mainly confined to the time, frequency, and space domains, the principles of DD domain channel modeling remain poorly studied. To address this issue, we propose a systematic DD domain channel measurement and modeling methodology for high-speed railway (HSR) scenarios. First, we design a DD domain channel measurement method based on the long-term evolution for railway (LTE-R) system. Second, for DD domain channel modeling, we investigate the quasi-stationary interval, statistical power modeling of multipath components, and, in particular, the quasi-invariant intervals of DD domain channel fading coefficients. Third, via LTE-R measurements at 371 km/h, taking the quasi-stationary interval as the decision criterion, we establish DD domain channel models under different channel time-varying conditions in HSR scenarios. Fourth, the accuracy of the proposed DD domain channel models is validated via bit error rate comparison of OTFS transmission. In addition, simulation verifies that in HSR scenarios, the quasi-invariant interval of the DD domain channel fading coefficient is on the order of milliseconds (ms), which is much smaller than the quasi-stationary interval length, which is on the order of 100 ms. This study could provide theoretical guidance for DD domain modeling in high-mobility environments, supporting future DDMC and integrated sensing and communication designs for 6G and beyond.


[76] 2511.19706

Selective Disk Bispectrum: A Complete and Rotation Invariant Image Descriptor

Rotation invariance is a fundamental requirement across many computer vision tasks. Historically, this inductive bias has been encoded through hand-crafted rotation-invariant representations. These are compact, interpretable, and fast to compute, but they come at the cost of descriptive power. More recently, architectures have encoded this inductive bias through learned representations. These are highly descriptive and achieve strong empirical performance, at the cost of efficiency and interpretability. In this work, we propose an alternative at the intersection of both paradigms. We introduce the selective disk bispectrum (SDB), a complex-valued rotation-invariant vector that preserves all information about the image except its orientation. Our key theoretical contributions are the selective disk bispectrum, its inversion, its (reduced) spatial and computational complexities (compared to the full disk bispectrum), and its expectation and variance under noise. Furthermore, we propose a numerical SDB approximation and provide theoretical guarantees for its accuracy and rotation invariance. Empirically, we validate SDB's invariance and robustness to noise on classification tasks. We test our reconstruction algorithm on multi-reference alignment of rotated images.


[77] 2512.24683

Waste-to-Energy-Coupled AI Data Centers: Cooling Efficiency and Grid Resilience

AI data-center expansion is increasingly constrained by the coupled availability of deliverable electricity and heat-rejection (cooling) capacity. We propose and evaluate an integrated Waste-to-Energy-AI Data Center configuration that treats cooling as a first-class energy service rather than an unavoidable electricity burden. The coupled system is modeled as an input-output 'black box' with transparent boundaries and a standalone benchmark in which mechanical chilling is powered by grid electricity. The central mechanism is energy-grade matching: low-grade WtE thermal output drives absorption cooling to deliver chilled service, thereby displacing baseline cooling electricity. We show that thermoeconomic superiority is governed by three first-order determinants, (i) cooling coverage of IT heat load, (ii) parasitic electricity for transport and auxiliaries, and (iii) distance-driven delivery decay, yielding a break-even corridor beyond which net benefits vanish. Comparative statics characterize sensitivity to IT utilization, feedstock quality (waste LHV and throughput), climate parameterization, and corridor distance. We translate these accounting gains into decision language through a computable prototype for Levelized Cost of Computing (LCOC) and an ESG valuation channel grounded in measurable mechanisms, without re-deriving full lifecycle inventories. The framework provides siting-ready feasibility conditions for WtE-AIDC coupling in urban AI corridors under grid stress.


[78] 2601.05395

Data-Based Analysis of Relative Degree and Zero Dynamics in Linear Systems

Data-driven control offers a powerful alternative to traditional model-based methods, particularly when accurate system models are unavailable or prohibitively complex. While existing data-driven control methods primarily aim to construct controllers directly from measured data, our approach uses the available data to assess fundamental system-theoretic properties. This allows the informed selection of suitable control strategies without explicit model identification. We provide data-based conditions characterizing the (vector) relative degree and the stability of the zero dynamics, which are critical for ensuring proper performance of modern controllers. Our results cover both single- and multi-input/output settings of discrete-time linear systems. We further show how a continuous-time system can be reconstructed from three sampling discretizations obtained via Zero-order Hold at suitable sampling times, thus allowing the extension of the results to the combined data collected from these discretizations. All results can be applied directly to observed data sets using the proposed algorithms.


[79] 2602.02893

Gridless Full-Space DOA Estimation for STAR-RIS-Assisted Wireless Systems

Simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) enable full-space ($0^\circ$--$360^\circ$) signal coverage, making them a compelling platform for integrated sensing and communication in next-generation wireless networks. In this paper, we investigate gridless direction-of-arrival (DOA) estimation across the full spatial domain in STAR-RIS-assisted systems operating with a single RF sensing chain. We show that the coupled reflection-transmission mechanism of STAR-RIS induces a multichannel finite-rate-of-innovation (FRI) structure in the received signal, which enables casting DOA estimation as a structured low-rank recovery problem without angular grid discretization. Building on this observation, we develop a proximal gradient descent algorithm with alternating projections onto a block-Hankel matrix set, enabling robust angle retrieval from limited measurements. Two practically relevant STAR-RIS configurations are addressed: element-wise uniform and nonuniform energy-splitting designs, each handled through a dedicated lifting strategy that preserves the underlying algebraic structure. A Ziv-Zakai bound is derived for the coupled full-space sensing model as a performance benchmark across the full SNR range. Numerical results show that the proposed methods consistently outperform grid-based baselines, achieving sub-degree accuracy within $\pm 60^\circ$ of boresight at comparable or lower computational cost.


[80] 2603.09421

Distributionally robust two-stage model predictive control: adaptive constraint tightening with stability guarantee

This paper proposes a two-stage distributionally robust model predictive control (TSDR-MPC) scheme for stochastic disturbances with unknown time-varying means and covariances. By defining a Wasserstein ambiguity set on the disturbance-to-constraint space, constraint violation penalties are formulated as a second-stage problem, enabling adaptive tightening. A finitely convergent cutting-plane algorithm is developed for real-time implementation. The framework naturally degrades to deterministic MPC as uncertainty vanishes, without pre-specified tightening parameters. Theoretical guarantees include feasibility, finite-time termination, and an asymptotic average cost bound. Numerical simulations validate its adaptability and robustness.


[81] 2604.04217

Towards 6G Single-Anchor Vehicle Localization Exploiting Radio-Reflective Road Markings in Tunnel Environments

Accurate vehicular localization remains a key challenge for cooperative intelligent transport systems (C-ITS), especially in areas without global navigation satellite system (GNSS) coverage, such as road tunnels. This paper proposes a novel vehicle positioning method with a single anchor equipped with multiple antennas, exploiting near-field (NF) propagation and passive radio-reflective structures deployed along the GNSS-denied tunnel. The method assumes a wideband vehicle-to-everything (V2X) communication between the vehicle and the anchor, in line with the ongoing standardization of cellular V2X beyond 5G. We first derive the validity condition that allows us to approximate the multipath channel with a single reflector point, defining a geometry validity bound on the number of antennas that can be employed. Building on this result, we propose JAVELIN, a 6G-compatible single-anchor localization framework that leverages tensor-based NF parameter estimation, adaptive NF/far-field (FF) processing, and recursive Bayesian tracking to enable sub-meter positioning without multi-anchor synchronization. The method integrates angle, delay difference, and curvature measurements into a variable-dimension extended Kalman filter with gated nearest-neighbor association, enabling operation without prior environmental knowledge. Radio-reflective road markings are further introduced to enhance geometric diversity. Simulation results in realistic tunnel scenarios demonstrate accurate and robust localization under different conditions, outperforming state-of-the-art single-anchor approaches and benefiting from passive reflector deployment.


[82] 2605.07529

Stochastic Differential Dynamic Programming for Trajectory Optimization under Partial Observability

Designing spacecraft trajectories remains challenging in the presence of stochastic effects such as maneuver execution errors and observation uncertainties. Although covariance control and belief-space planning provide useful tools for designing robust control policies and information-aware trajectories under uncertainty, practical methods remain limited for partially observable trajectory optimization problems in which trajectory design, orbit determination, and correction maneuver planning are tightly coupled. This paper presents a stochastic differential dynamic programming algorithm for such coupled problems. The proposed method optimizes the nominal control sequence and feedback gains subject to a belief-state transition model and general mission constraints, explicitly accounting for the dependence of covariance propagation on the nominal trajectory without relying on the separation principle. Numerical examples demonstrate that the proposed algorithm produces navigation-aware and uncertainty-robust solutions across a range of dynamical systems, observation models, and uncertainty levels.


[83] 2605.25217

Backstepping Control of First-Order Hyperbolic Equations in Arbitrary Dimensions with Non-Trapping Characteristics

This paper presents a backstepping approach for the boundary control of first-order hyperbolic equations with spatially varying coefficients posed on domains of arbitrary dimension. The method is based on a change of variables induced by the characteristic flow of the time-invariant transport operator, transforming the original multidimensional system into a continuum of decoupled one-dimensional hyperbolic equations evolving along individual characteristic curves. A backstepping controller is then designed for each equation in the decomposition, and the resulting control laws are reassembled in the original coordinates to achieve finite-time stabilization of the full system. The framework relies on the existence of characteristic curves foliating the spatial domain, with uniformly bounded transit times (non-trapping).


[84] 2606.03942

Stability Analysis for Autoregressive Sampling Sets

Motivated by recent developments in stochastic modeling of clock jitter in Analog-to-Digital Converters (ADCs) as autoregressive processes of order one (AR(1)), we study the density and stability properties of AR(1)-jittered sampling sets for Paley-Wiener signals. We show that, despite having the correct asymptotic density both on average and almost surely, such sets almost surely fail to be stable sampling sets. We complement this negative result with a finite-dimensional analysis, showing that the corresponding jittered sinc matrices are nonetheless well-conditioned with high probability.
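
The finite-dimensional setting can be illustrated with a small numerical sketch. The abstract does not define the jittered sinc matrix, so everything below is an assumption made for illustration: the signal bandwidth is normalized so that the integers form the nominal sampling grid, the jitter starts at zero and follows e[k] = rho * e[k-1] + w[k] with Gaussian w[k], and the matrix entry is taken to be sinc(t_m - k) with t_m = m + e[m]. The helper names `ar1_jitter` and `jittered_sinc_matrix` are hypothetical.

```python
import numpy as np

def ar1_jitter(n, rho, sigma, rng):
    """Return n samples of e[k] = rho * e[k-1] + w[k], w[k] ~ N(0, sigma^2).

    Assumption: the process starts at e[0] = 0.
    """
    e = np.zeros(n)
    w = rng.normal(0.0, sigma, size=n)
    for k in range(1, n):
        e[k] = rho * e[k - 1] + w[k]
    return e

def jittered_sinc_matrix(jitter):
    """Assumed form: A[m, k] = sinc(t_m - k) with t_m = m + jitter[m].

    np.sinc is the normalized sinc, sin(pi x) / (pi x).
    """
    n = len(jitter)
    t = np.arange(n) + jitter
    return np.sinc(t[:, None] - np.arange(n)[None, :])

rng = np.random.default_rng(0)
A = jittered_sinc_matrix(ar1_jitter(64, rho=0.9, sigma=0.02, rng=rng))
print("condition number:", np.linalg.cond(A))
```

With zero jitter the matrix reduces to the identity, so any growth in the printed condition number comes from the jitter alone. Sweeping `rho` and `sigma` over several random seeds gives a rough empirical feel for the "well-conditioned with high probability" statement, although it proves nothing about the infinite sampling set.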


[85] 2606.09141

FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Recent progress in speech dialogue systems requires Text-to-Speech (TTS) models to be faster and more responsive. Modern speech dialogue systems impose two primary requirements on TTS models: low latency and support for streaming inputs and outputs. However, most existing single-codebook LLM-based TTS methods rely on multi-stage pipelines that lack native streaming capabilities. These systems typically suffer from high end-to-end latency due to slow autoregressive prediction and multi-step flow matching. To address these limitations, we propose FlashTTS, an open-source and low-latency streaming TTS framework. FlashTTS introduces a lagged multi-track architecture that natively processes streaming text and speech inputs, thereby eliminating the need for sentence-level buffering. To accelerate acoustic generation, we integrate parallel Multi-Token Prediction (MTP) with an X-pred mean flow matching decoder. This configuration achieves high-fidelity token-to-mel generation in exactly two function evaluations (2-NFE). By jointly optimizing input processing and decoding efficiency, FlashTTS offers a practical foundation for real-time speech dialogue systems. Experiments show that FlashTTS substantially reduces First-Packet Latency to 325ms compared to robust streaming baselines, all while preserving strong zero-shot voice cloning and cross-lingual intelligibility. Speech samples are available. The model code and checkpoints will be released as open source.


[86] 2606.09677

MeCo: One-Step MeanFlow-based Corrector for Multi-Channel Speech Separation

While discriminative models for multi-channel speech separation excel in reference-based metrics, they often exhibit suboptimal human listening quality. To address this, we propose a novel MeanFlow-based one-step generative corrector (MeCo). MeCo learns a conditional average velocity field to map discriminative estimates directly onto the clean speech manifold in a single step. To maximize one-step generation performance, we introduce Data-Space Optimization (DSO). DSO integrates an $\mathbf{x}_r$-loss, which penalizes prediction errors on longer displacement intervals to serve as a generative objective for human listening quality, with an Endpoint SI-SDR loss that directly optimizes terminal signal fidelity. Experiments demonstrate that MeCo achieves state-of-the-art (SOTA) performance with minimal computational overhead, simultaneously achieving superior signal fidelity and human listening quality in both in-domain and out-of-domain scenarios.
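
For readers unfamiliar with the fidelity term, the following is a minimal NumPy sketch of the standard scale-invariant SDR and of a loss formed as its negative. It is not the authors' implementation: the abstract does not say how the Endpoint SI-SDR loss is computed or combined with the $\mathbf{x}_r$-loss, and the function names `si_sdr` and `endpoint_si_sdr_loss`, the mean removal, and the `eps` stabilizer are assumptions.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to find the optimal scaling.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def endpoint_si_sdr_loss(estimate, reference):
    """Hypothetical loss: negative SI-SDR of the one-step output (the endpoint)."""
    return -si_sdr(estimate, reference)

rng = np.random.default_rng(0)
clean = rng.standard_normal(1000)
noise = rng.standard_normal(1000)
print("mild distortion loss:  ", endpoint_si_sdr_loss(clean + 0.1 * noise, clean))
print("strong distortion loss:", endpoint_si_sdr_loss(clean + 1.0 * noise, clean))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why this kind of term targets waveform fidelity rather than output level.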


[87] 2203.03018

RAPTOR: Rapid Aerial Pickup and Transport of Objects by Robots

Rapid aerial grasping through robots can lead to many applications that utilize fast and dynamic picking and placing of objects. Rigid grippers traditionally used in aerial manipulators require high precision and specific object geometries for successful grasping. We propose RAPTOR, a quadcopter platform combined with a custom Fin Ray gripper to enable more flexible grasping of objects with different geometries, leveraging the properties of soft materials to increase the contact surface between the gripper and the objects. To reduce the communication latency, we present a new lightweight middleware solution based on Fast DDS (Data Distribution Service) as an alternative to ROS (Robot Operating System). We show that RAPTOR achieves an average of 83% grasping efficacy in a real-world setting for four different object geometries while moving at an average velocity of 1 m/s during grasping. In a high-velocity setting, RAPTOR supports up to four times the payload compared to previous works. Our results highlight the potential of aerial drones in automated warehouses and other manipulation applications where speed, swiftness, and robustness are essential while operating in hard-to-reach places.


[88] 2412.11449

Whisper-GPT -- Continuous Discrete Hybrid Representation Language Models For Speech And Music

We propose WHISPER-GPT: a generative large language model (LLM) for speech and music that allows us to work with continuous audio representations and discrete tokens simultaneously as part of a single architecture. There has been a huge surge in generative audio, speech, and music models that utilize discrete audio tokens derived from neural compression algorithms, e.g., ENCODEC. However, one of the major drawbacks of this approach is handling the context length. It blows up for high-fidelity generative architectures if one has to account for all the audio content at various frequencies for next-token prediction. By combining a continuous audio representation such as the spectrogram with discrete acoustic tokens, we retain the best of both worlds: all the information needed from the audio at a specific time instance is held in a single token, while the LLM still predicts the future token, allowing for sampling and the other benefits a discrete space provides. We show how our architecture improves the perplexity and negative log-likelihood scores for next-token prediction compared to a token-based LLM for speech and music.


[89] 2504.17080

Geometric Formulation of Unified Force-Impedance Control on SE(3) for Robotic Manipulators

In this paper, we present an impedance control framework on the SE(3) manifold, which enables force tracking while guaranteeing passivity. Building upon the unified force-impedance control (UFIC) and our previous work on geometric impedance control (GIC), we develop the geometric unified force impedance control (GUFIC) to account for the SE(3) manifold structure in the controller formulation using a differential geometric perspective. As in the case of the UFIC, the GUFIC utilizes energy tank augmentation for both force-tracking and impedance control to guarantee the manipulator's passivity relative to external forces. This ensures that the end effector maintains safe contact interaction with uncertain environments and tracks a desired interaction force. Moreover, we resolve a non-causal implementation problem in the UFIC formulation by introducing velocity and force fields. Due to its formulation on SE(3), the proposed GUFIC inherits the desirable SE(3) invariance and equivariance properties of the GIC, which helps increase sample efficiency in machine learning applications where a learning algorithm is incorporated into the control law. The proposed control law is validated in a simulation environment under scenarios requiring tracking an SE(3) trajectory, incorporating both position and orientation, while exerting a force on a surface. The codes are available at this https URL.


[90] 2508.07048

Whisfusion: Parallel ASR Decoding with Masked Diffusion

Autoregressive (AR) encoder-decoder models dominate high-quality multilingual ASR, but their left-to-right decoders make inference latency scale with transcript length. As a natural alternative, CTC-style non-autoregressive (NAR) systems avoid this bottleneck, but their conditional independence assumption sacrifices transcript-level generative modeling. Masked diffusion language models (e.g., LLaDA, MDLM) offer a competitive NAR text-generation approach. We ask whether such models can bring NAR ASR into the accuracy regime of strong AR ASR systems while removing the left-to-right bottleneck. We propose Whisfusion, which trains a dedicated masked diffusion decoder from scratch on top of frozen Whisper-large-v3 audio embeddings, denoising masked transcripts in just a few steps. We train on ~68k hours of 11-language speech with high-mask specialization to align training with the fully masked starting point of inference, and decode via Parallel Diffusion Decoding. Whisfusion surpasses Whisper-large-v3 on group-average accuracy across English, European, and CJK benchmarks, while running 4-5x faster, additionally surpassing Whisper-turbo in both accuracy and throughput. It reaches accuracy competitive with Canary and Qwen3-ASR while running 3-7x faster. These results establish masked diffusion as a Pareto-competitive non-autoregressive paradigm for high-throughput multilingual transcription. Code and model weights are available at this https URL.


[91] 2508.18540

Real-time 3D Visualization of Radiance Fields on Light Field Displays

Radiance fields, including their recent efficient forms such as 3D Gaussian Splatting and Sparse Voxels, have revolutionized photorealistic 3D scene visualization by enabling high-fidelity reconstruction of complex environments, making them a natural match for light field displays. However, integrating these technologies presents significant computational challenges, as light field displays require many high-resolution renderings from slightly shifted viewpoints, while radiance fields rely on computationally intensive volume rendering, which struggles to reach real-time speeds even with efficient scene representations. In this paper, we propose a unified and efficient framework for real-time radiance field rendering on light field displays. Rather than re-rendering each view independently, our method converts the input radiance field into shared intermediate sweeping planes that can be efficiently composited into dense light-field views in a single pass. Our method prioritizes shared, non-directional plane caching for real-time performance, trading fine view-dependent color effects for a modest increase in intermediate memory usage. Our framework generalizes across different scene representations without retraining and avoids repeated computation across views. We further demonstrate a real-time interactive application on a Looking Glass display, achieving 200+ FPS at 512p across 45 rendered views and enabling seamless, immersive 3D interactive viewing experiences. On standard benchmarks, our method achieves up to 22x speedup compared to independently rendering each view, while largely preserving image quality.


[92] 2509.06188

Ignore Drift, Embrace Simplicity: Constrained Nonlinear Control through Driftless Approximation

We present a novel technique to drive a nonlinear system to reach a target state under input constraints. The proposed controller consists only of piecewise constant inputs, generated from a simple linear driftless approximation to the original nonlinear system. First, we construct this approximation using only the effect of the control input at the initial state. Next, we partition the time horizon into successively shorter intervals and show that optimal controllers for the linear driftless system result in a bounded error from a specified target state in the nonlinear system. We also derive conditions under which the input constraint is guaranteed to be satisfied. On applying the optimal control inputs, we show that the error monotonically converges to zero as the intervals become successively shorter, thus achieving arbitrary closeness to the target state with time. Using simulation examples on classical nonlinear systems, we illustrate how the presented technique is used to reach a target state while still satisfying input constraints. In particular, we show that our method completes the task even when assumptions of the underlying theory are violated.


[93] 2512.08280

Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making

Offline decision-making via diffusion models often produces trajectories that are misaligned with system dynamics, limiting their reliability for control. We propose Model Predictive Diffuser (MPDiffuser), a compositional diffusion framework that combines a diffusion planner with a dynamics diffusion model to generate task-aligned and dynamically plausible trajectories. MPDiffuser interleaves planner and dynamics updates during sampling, progressively correcting feasibility while preserving task intent. A lightweight ranking module then selects trajectories that best satisfy task objectives. The compositional design improves sample efficiency and adaptability by enabling the dynamics model to leverage diverse and previously unseen data independently of the planner. Empirically, we demonstrate consistent improvements over prior diffusion-based methods on unconstrained (D4RL) and constrained (DSRL) benchmarks, and validate practicality through deployment on a real quadrupedal robot.


[94] 2603.07238

Scaling Self-Supervised Speech Models Uncovers Deep Linguistic Relationships: Evidence from the Pacific Cluster

Similarities between language representations derived from Self-Supervised Speech Models (S3Ms) have been observed to primarily reflect geographic proximity or surface typological similarities driven by recent expansion or contact, potentially missing deeper genealogical signals. We investigate how scaling an S3M-based language identification system from 126 to 4,017 languages reshapes this topology, and find a non-linear effect: phylogenetic recovery stays flat up to the 1K scale, but the 4K model undergoes a qualitative shift, resolving both clear lineages and long-term linguistic contact. Most strikingly, a robust Pacific macro-cluster emerges, grouping genealogically unrelated Papuan, Oceanic, and Australian languages, and we trace its driver to a concentrated encoding that captures shared acoustic signatures such as global energy dynamics. These results suggest that massive S3Ms internalize multiple layers of language history, offering a promising perspective for computational phylogenetics and the study of language contact.


[95] 2603.11482

AnimeScore: A Preference-Based Dataset and Framework for Evaluating Anime-Like Speech Style

Evaluating 'anime-like' voices currently relies on costly subjective judgments, yet no standardized objective metric exists. A key challenge is that anime-likeness, unlike naturalness, lacks a shared absolute scale, making conventional Mean Opinion Score (MOS) protocols unreliable. To address this gap, we propose AnimeScore, a preference-based framework for automatic anime-likeness evaluation via pairwise ranking. We collect 15,000 pairwise judgments from 187 evaluators with free-form descriptions, and acoustic analysis reveals that perceived anime-likeness is driven by controlled resonance shaping, prosodic continuity, and deliberate articulation rather than simple heuristics such as high pitch. We show that handcrafted acoustic features reach a 69.3% AUC ceiling, while SSL-based ranking models achieve up to 90.8% AUC, providing a practical metric that can also serve as a reward signal for preference-based optimization of generative speech models.
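As an illustration of what preference-based evaluation via pairwise ranking can look like, the sketch below scores each clip in a pair and measures how often the preferred clip receives the higher score. It is a minimal sketch under stated assumptions, not the AnimeScore implementation: the logistic (Bradley-Terry style) loss, the function names, and the toy scores are all hypothetical.

```python
import math


def pairwise_logistic_loss(score_preferred: float, score_other: float) -> float:
    """Negative log-sigmoid of the score margin (assumed Bradley-Terry style loss)."""
    diff = score_preferred - score_other
    # For very negative margins the loss is approximately -diff; avoid overflow in exp.
    if diff < -30.0:
        return -diff
    return math.log1p(math.exp(-diff))


def pairwise_accuracy(pairs: list[tuple[float, float]]) -> float:
    """Fraction of pairs where the preferred clip scores higher; ties count as half."""
    total = 0.0
    for score_preferred, score_other in pairs:
        if score_preferred > score_other:
            total += 1.0
        elif score_preferred == score_other:
            total += 0.5
    return total / len(pairs)


# Hypothetical model scores: (score of the clip judged more anime-like, score of the other clip)
pairs = [(2.0, 0.5), (1.2, 1.5), (0.3, -0.7), (0.9, 0.9)]

mean_loss = sum(pairwise_logistic_loss(sp, so) for sp, so in pairs) / len(pairs)
acc = pairwise_accuracy(pairs)
print(f"mean pairwise loss = {mean_loss:.4f}, pairwise accuracy = {acc:.3f}")
```

Because only score differences enter the loss, a model trained this way needs no shared absolute scale, which is the property the abstract contrasts with MOS protocols.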


[96] 2605.17111

Symmetry-Aware Convex Shrinkage for High-Dimensional Covariance Estimation

We develop a class of data-adaptive shrinkage estimators for high-dimensional covariance estimation in which the shrinkage target is a Reynolds projection of the sample covariance under a finite symmetry group selected from a candidate library by held-out predictive performance. The class generalizes the convex shrinkage estimator of Ledoit and Wolf by replacing the scalar-identity target with a structured target derived from a symmetry group when one is available, and generalizes the group-symmetric maximum-likelihood estimator of Shah and Chandrasekaran by combining structural targeting with adaptive convex shrinkage and by selecting the group from data rather than treating it as prespecified. A two-tier procedure performs the group selection: a universal per-candidate evaluation based on held-out negative log-likelihood, optionally preceded by a domain-specific step that constructs the candidate library from structural priors. We establish a finite-sample regret bound for the held-out calibration of the convex combination weight, an oracle inequality for the data-driven group selection, and a quantitative sufficient-match condition under which the proposed estimator dominates Ledoit-Wolf shrinkage in Frobenius mean-squared error. The procedure is illustrated on six real-data problems spanning finance (S&P 500 daily returns), climate (NOAA OISST sea-surface temperature anomalies), genomics (TCGA-BRCA gene expression), radio signal processing (RadioML 2018.A), astronomical imaging (Galaxy10 DECaLS), and natural image patches (CIFAR-10 with a CIFAR-10.1 distribution-shift companion). An empirical comparison is also made against the Bayesian permutation-symmetry estimator of Chojecki and colleagues. Outside the few-shot regime, where structural priors carry the most information per observation, Ledoit-Wolf shrinkage remains the appropriate baseline.
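The construction described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the symmetry group is taken to be a group of coordinate permutations (here the cyclic shifts), the held-out score is a zero-mean Gaussian negative log-likelihood, the weight is chosen on a fixed grid, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def reynolds_projection(S, perms):
    """Average P S P^T over a finite permutation group (each element given as an index array)."""
    acc = np.zeros_like(S, dtype=float)
    for p in perms:
        # Indexing rows and columns by p is the same as conjugating S by a permutation matrix.
        acc += S[np.ix_(p, p)]
    return acc / len(perms)


def convex_shrinkage(S, target, alpha):
    """Convex combination of the sample covariance and the structured target."""
    return (1.0 - alpha) * S + alpha * target


def gaussian_nll(Sigma, X):
    """Per-sample zero-mean Gaussian negative log-likelihood, additive constants dropped."""
    _, logdet = np.linalg.slogdet(Sigma)
    S_test = X.T @ X / X.shape[0]
    return 0.5 * (logdet + np.trace(np.linalg.solve(Sigma, S_test)))


def cyclic_group(n):
    """All n cyclic shifts of the coordinates (assumed candidate group for this sketch)."""
    base = np.arange(n)
    return [np.roll(base, k) for k in range(n)]


rng = np.random.default_rng(0)
X_train = rng.standard_normal((20, 4))  # synthetic data: 20 observations, 4 variables
X_val = rng.standard_normal((20, 4))    # held-out split used to calibrate the weight

S = np.cov(X_train, rowvar=False)
T = reynolds_projection(S, cyclic_group(4))

# Pick the convex weight that minimizes held-out negative log-likelihood on a grid.
alphas = np.linspace(0.0, 1.0, 11)
best_alpha = min(alphas, key=lambda a: gaussian_nll(convex_shrinkage(S, T, a), X_val))
Sigma_hat = convex_shrinkage(S, T, best_alpha)
print("selected alpha:", best_alpha)
```

Selecting among several candidate groups would repeat the same held-out evaluation once per group and keep the best-scoring one; that loop is omitted here for brevity.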


[97] 2605.29996

A Lumped RC Equivalent Circuit of Head Tissues for Dispersive Neuro-Electromagnetic Modeling

Accurate modeling of electric potential and current distribution in head tissues is crucial for the design and evaluation of neuro-sensing and neuro-stimulation systems operating in the sub-megahertz frequency range. Numerical methods are widely employed in electromagnetic simulations; however, their computational cost can limit their applicability to rapid prototyping, real-time simulations, and circuit-level integration. In this work, we introduce a lumped RC equivalent circuit model that reproduces the electrical behavior of a canonical three-layer spherical head geometry over a frequency range up to 50 kHz. The model accounts for frequency-dependent tissue conductivity and permittivity to capture dispersive effects, employing complex conductivity in the electro-quasi-static (EQS) regime. The circuit topology uses a minimal set of impedance elements in order to represent the essential mechanisms of electric signal propagation. Validation was performed using a dipolar brain source configuration for scalp voltage peak estimation, showing close agreement with semi-analytical solutions across different skull thicknesses and dipole eccentricities. In addition, the impact of tissue dispersion and capacitive branches on the model predictions was quantitatively assessed, showing their contribution to the overall fidelity of the proposed approach.
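To make the link between complex conductivity and an RC branch concrete, the sketch below treats a single homogeneous tissue slab and checks that its field-based impedance equals that of a resistor in parallel with a capacitor. It is a minimal sketch under stated assumptions, not the paper's circuit: the slab geometry stands in for the spherical layers, and the conductivity, relative permittivity, area, and thickness values are hypothetical placeholders. In a dispersive model, the conductivity and permittivity would be re-evaluated at each frequency.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def slab_admittance(sigma, eps_r, freq_hz, area_m2, thickness_m):
    """Admittance of a uniform slab using complex conductivity sigma + j*omega*eps0*eps_r."""
    omega = 2.0 * math.pi * freq_hz
    sigma_complex = complex(sigma, omega * EPS0 * eps_r)
    return sigma_complex * area_m2 / thickness_m


def parallel_rc_impedance(resistance, capacitance, freq_hz):
    """Impedance of a resistor in parallel with a capacitor."""
    omega = 2.0 * math.pi * freq_hz
    return resistance / (1.0 + 1j * omega * resistance * capacitance)


# Hypothetical tissue parameters at the evaluation frequency (illustrative values only)
sigma = 0.02        # conductivity in S/m
eps_r = 1.0e3       # relative permittivity
area = 1.0e-4       # cross-section in m^2
thickness = 5.0e-3  # slab thickness in m
freq = 50.0e3       # upper end of the frequency range mentioned in the abstract

# Lumped elements implied by the same slab
R = thickness / (sigma * area)
C = EPS0 * eps_r * area / thickness

Z_field = 1.0 / slab_admittance(sigma, eps_r, freq, area, thickness)
Z_circuit = parallel_rc_impedance(R, C, freq)
print(f"R = {R:.1f} ohm, C = {C:.3e} F")
print(f"|Z_field| = {abs(Z_field):.2f} ohm, |Z_circuit| = {abs(Z_circuit):.2f} ohm")
```

The real part of the complex conductivity maps to the resistive branch and the imaginary part to the capacitive branch, which is why dropping the capacitive branches changes the predicted impedance as frequency rises.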


[98] 2606.03803

LiveBand: Live Accompaniment Generation in the Audio Domain

We present LiveBand, a real-time system that generates high-fidelity music accompaniments to live audio input, respecting strict causal constraints. Our method trains a causal transformer generator in the continuous latent space of a pre-trained causal audio autoencoder, using adversarial sequence-level supervision from a discriminator. At each timestep, the generator receives only the causally available mix context and Gaussian noise, and predicts accompaniment latents without access to future mix frames or ground-truth target latents. Training is performed in a single parallel forward pass under causal masking, while streaming inference proceeds autoregressively with a rolling attention state. The model's training and inference computations are matched by design, eliminating teacher forcing and the associated exposure bias. On a multi-instrument music accompaniment benchmark, LiveBand improves over prior work on objective measures of audio quality, beat alignment, and mix adherence, while enabling real-time streaming generation without lookahead into the future on consumer hardware.