New articles on Electrical Engineering and Systems Science


[1] 2603.11091

Metaheuristic algorithm parameters selection for building an optimal hierarchical structure of a control system: a case study

Metaheuristic algorithms are currently widely used to solve a variety of optimization problems across various industries. This article discusses the application of a metaheuristic algorithm to optimize the hierarchical architecture of an industrial distributed control system. The success of the algorithm depends largely on the choice of starting conditions and algorithm parameters. We examine the impact of parameter selection on the convergence of a modified ant colony algorithm and provide recommendations for tuning the algorithm to achieve optimal results for a specific industrial problem. The findings presented in this article can also be applied to other combinatorial optimization problems.


[2] 2603.11205

Can LLMs Help Localize Fake Words in Partially Fake Speech?

Large language models (LLMs), trained on large-scale text, have recently attracted significant attention for their strong performance across many tasks. Motivated by this, we investigate whether a text-trained LLM can help localize fake words in partially fake speech, where only specific words within an utterance are edited. We build a speech LLM to perform fake word localization via next token prediction. Experiments and analyses on AV-Deepfake1M and PartialEdit indicate that the model frequently leverages editing-style patterns learned from the training data, particularly the word-level polarity substitutions found in these two databases, as cues for localizing fake words. Although such patterns provide useful information in an in-domain scenario, how to avoid over-reliance on them and improve generalization to unseen editing styles remains an open question.


[3] 2603.11241

Cough activity detection for automatic tuberculosis screening

The automatic identification of cough segments in audio through the determination of start and end points is pivotal to building scalable screening tools in health technologies for pulmonary related diseases. We propose the application of two current pre-trained architectures to the task of cough activity detection. A dataset of recordings containing cough from patients symptomatic for tuberculosis (TB) who self-present at community-level care centres in South Africa and Uganda is employed. When automatic start and end points are determined using XLS-R, an average precision of 0.96 and an area under the receiver-operating characteristic of 0.99 are achieved for the test set. We show that best average precision is achieved by utilising only the first three layers of the network, which has the dual benefits of reduced computational and memory requirements, pivotal for smartphone-based applications. This XLS-R configuration is shown to outperform an audio spectrogram transformer (AST) as well as a logistic regression baseline by 9% and 27% absolute in test set average precision respectively. Furthermore, a downstream TB classification model trained using the coughs automatically isolated by XLS-R comfortably outperforms a model trained on the coughs isolated by AST, and is only narrowly outperformed by a classifier trained on the ground truth coughs. We conclude that the application of large pre-trained transformer models is an effective approach to identifying cough end-points and that the integration of such a model into a screening tool is feasible.


[4] 2603.11243

Self-Speculative Decoding for LLM-based ASR with CTC Encoder Drafts

We propose self-speculative decoding for speech-aware LLMs by using the CTC encoder as a draft model to accelerate auto-regressive (AR) inference and improve ASR accuracy. Our three-step procedure works as follows: (1) if the frame entropies of the CTC output distributions are below a threshold, the greedy CTC hypothesis is accepted as final; (2) otherwise, the CTC hypothesis is verified in a single LLM forward pass using a relaxed acceptance criterion based on token likelihoods; (3) if verification fails, AR decoding resumes from the accepted CTC prefix. Experiments on nine corpora and five languages show that this approach can simultaneously accelerate decoding and reduce WER. On the HuggingFace Open ASR benchmark with a 1B parameter LLM and 440M parameter CTC encoder, we achieve a record 5.58% WER and improve the inverse real time factor by a factor of 4.4 with only a 12% relative WER increase over AR search. Code and model weights are publicly available under a permissive license.
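The three-step procedure in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the callbacks `token_logprobs` and `ar_continue`, both threshold values, the "every frame below the threshold" gate, and the per-token log-likelihood acceptance rule are all assumptions made here, since the abstract does not specify them.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one CTC posterior over the vocabulary; index 0 is the blank


def frame_entropy(probs: Frame) -> float:
    """Shannon entropy (in nats) of a single CTC frame distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ctc_greedy(frames: Sequence[Frame], blank: int = 0) -> List[int]:
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks."""
    out: List[int] = []
    prev = blank
    for probs in frames:
        tok = max(range(len(probs)), key=probs.__getitem__)
        if tok != blank and tok != prev:
            out.append(tok)
        prev = tok
    return out


def self_speculative_decode(
    frames: Sequence[Frame],
    token_logprobs: Callable[[List[int]], List[float]],  # hypothetical: one LLM pass over the draft
    ar_continue: Callable[[List[int]], List[int]],       # hypothetical: AR decoding from a prefix
    entropy_thr: float = 0.5,                            # assumed value
    min_logprob: float = math.log(0.1),                  # assumed relaxed acceptance level
) -> List[int]:
    draft = ctc_greedy(frames)

    # Step 1: if every frame is confident, accept the CTC hypothesis as final.
    if max((frame_entropy(f) for f in frames), default=0.0) < entropy_thr:
        return draft

    # Step 2: verify the draft in a single forward pass; a token is accepted
    # when its log-likelihood clears the (assumed) relaxed threshold.
    accepted = 0
    for score in token_logprobs(draft):
        if score < min_logprob:
            break
        accepted += 1
    if accepted == len(draft):
        return draft

    # Step 3: verification failed, so resume AR decoding from the accepted prefix.
    return ar_continue(draft[:accepted])
```

In this sketch the expensive autoregressive path is only reached when both the entropy gate and the single-pass verification fail, which is where the speed-up would come from.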


[5] 2603.11264

Multi-Robot Multitask Gaussian Process Estimation and Coverage

Coverage control is essential for the optimal deployment of agents to monitor or cover areas with sensory demands. While traditional coverage involves single-task robots, increasing autonomy now enables multitask operations. This paper introduces a novel multitask coverage problem and addresses it for both the cases of known and unknown sensory demands. For known demands, we design a federated multitask coverage algorithm and establish its convergence properties. For unknown demands, we employ a multitask Gaussian Process (GP) framework to learn sensory demand functions and integrate it with the multitask coverage algorithm to develop an adaptive algorithm. We introduce a novel notion of multitask coverage regret that compares the performance of the adaptive algorithm against an oracle with prior knowledge of the demand functions. We establish that our algorithm achieves sublinear cumulative regret, and numerically illustrate its performance.


[6] 2603.11265

Conduction-Diffusion in N-Dimensional settings as irreversible port-Hamiltonian systems

This work extends previous 1D irreversible port-Hamiltonian system (IPHS) formulations to boundary-controlled ND distributed parameter systems describing conduction-diffusion fluid phenomena. Within a unified and thermodynamically consistent framework, we show that conduction and diffusion can be represented through a single coherent structure that preserves global energy balance and ensures a correct characterization of entropy production. The resulting formulation provides a foundation for the systematic modeling and control of complex multi-physical processes governed by coupled transport mechanisms in N dimensions. In the longer term, this framework opens the door to structure-preserving numerical schemes capable of enforcing thermodynamic principles directly at the discretized level.


[7] 2603.11280

Performance Bounds and Robust Filtering for LEO Inter-Satellite Synchronization under Cross-Epoch Doppler Coupling

Low Earth orbit (LEO) inter-satellite links (ISLs) must achieve joint synchronization and ranging under severe hardware impairments, namely oscillator phase noise, clock drift, and measurement outliers, exacerbated by rapid relative dynamics exceeding 7 km/s. In coherent Doppler processing, the frequency observable depends on the difference between consecutive carrier phase states, creating a cross-epoch coupling structure that fundamentally affects estimation-theoretic performance limits. This paper makes three contributions. First, we prove analytically that this cross-epoch Doppler coupling is necessary to avoid unbounded carrier phase uncertainty: without it, phase variance grows linearly without bound. Second, we derive a posterior Cramér-Rao bound (PCRB) via the Tichavský recursion that explicitly incorporates the resulting 10$\times$10 block information structure. Third, we propose a hybrid robust filtering framework combining hard gating for impulsive cycle-slip outliers with Huber M-estimation for heavy-tail contamination, using TASD-aware innovation covariance to account for cross-epoch uncertainty in residual normalization. Monte Carlo simulations at Ka-band confirm that the PCRB accurately lower-bounds estimator performance under nominal conditions, while the hybrid method reduces 95th-percentile phase error by 27-93% compared to standard extended Kalman filtering across different outlier regimes.


[8] 2603.11294

EquivAnIA: A Spectral Method for Rotation-Equivariant Anisotropic Image Analysis

Anisotropic image analysis is ubiquitous in medical and scientific imaging, and while the literature on the subject is extensive, the robustness of many methods to numerical rotations remains to be studied. Indeed, the principal directions and angular profile of a rotated image are often expected to rotate accordingly. In this work, we propose a new spectral method for the anisotropic analysis of images (EquivAnIA) using two established directional filters, namely cake wavelets and ridge filters. We show that it is robust to numerical rotations through extensive experiments on synthetic and real-world images containing geometric structures or textures, and we also apply it successfully to an angular image registration task. The code is publicly available.


[9] 2603.11344

Hybrid eTFCE-GRF: Exact Cluster-Size Retrieval with Analytical p-Values for Voxel-Based Morphometry

Threshold-free cluster enhancement (TFCE) integrates cluster extent across thresholds to improve voxel-wise neuroimaging inference, but permutation testing makes it prohibitively slow for large datasets. Probabilistic TFCE (pTFCE) uses analytical Gaussian random field (GRF) p-values but discretises the threshold grid. Exact TFCE (eTFCE) eliminates discretisation via a union-find data structure but still requires permutations. We combine eTFCE's union-find for exact cluster-size retrieval with pTFCE's analytical GRF inference. The union-find builds the cluster hierarchy in one pass over sorted voxels and enables exact size queries at any threshold; GRF theory then converts these sizes to analytical p-values without permutations. Validation on synthetic phantoms (64^3, 80 subjects): FWER controlled at nominal level (0/200 null rejections, 95% CI [0.0%, 1.9%]); power matches baseline pTFCE (Dice >= 0.999); smoothness error below 1%; concordance r > 0.99. On UK Biobank (N=500) and IXI (N=563), significance maps form strict subsets of reference R pTFCE, which supports conservative error control. Implemented in pytfce (pip install pytfce): baseline completes whole-brain VBM in ~5s (75x faster than R pTFCE), hybrid in ~85s (4.6x faster) with exact cluster sizes; both >1000x faster than permutation TFCE.
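The one-pass union-find idea described above can be illustrated with a small sketch. It is not the pytfce API: the function name `cluster_sizes` is hypothetical, the image is reduced to 1D with nearest-neighbour connectivity, and only the cluster size at each voxel's own height is recorded, whereas the abstract describes queries at any threshold on 3D volumes.

```python
from typing import List


def cluster_sizes(values: List[float]) -> List[int]:
    """For each voxel i of a 1D image, return the size of the connected
    cluster containing i in the set {v >= values[i]}."""
    n = len(values)
    parent = list(range(n))
    size = [1] * n
    active = [False] * n

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra          # attach the smaller tree to the larger one
        size[ra] += size[rb]

    # Visit voxels from the highest value to the lowest (a single sorted pass).
    order = sorted(range(n), key=lambda i: -values[i])
    result = [0] * n
    k = 0
    while k < n:
        # Activate every voxel tied at this height before reading any size.
        j = k
        while j < n and values[order[j]] == values[order[k]]:
            i = order[j]
            active[i] = True
            for nb in (i - 1, i + 1):
                if 0 <= nb < n and active[nb]:
                    union(i, nb)
            j += 1
        for i in order[k:j]:
            result[i] = size[find(i)]
        k = j
    return result
```

Because clusters only ever merge as the threshold is lowered, each size read here is exact for that height and no threshold grid is needed.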


[10] 2603.11349

Contractivity of Multi-Stage Runge-Kutta Dynamics

Many control, optimization, and learning algorithms rely on discretizations of continuous-time contracting systems, where preservation of contractivity under numerical integration is key for stability, robustness, and reliable fixed-point computation. In this paper, we establish conditions under which multi-stage Runge-Kutta methods preserve strong contractivity when discretizing infinitesimally contractive continuous-time systems. For explicit Runge-Kutta methods, preservation conditions are derived by bounding Lipschitz constants of the associated composite stage mappings, leading to coefficient-dependent criteria. For implicit methods, the algebraic structure of the stage equations enables explicit conditions on the Runge-Kutta coefficients that guarantee preservation of strong contractivity. In the implicit case, these results extend classical guarantees, typically limited to weak contractivity in the Euclidean metric, to strong contractivity with respect to the $\ell_1$-, $\ell_2$-, and $\ell_\infty$-norms. In addition, we study well-definedness of implicit methods through an auxiliary continuous-time system associated with the stage equations. We show that strong infinitesimal contractivity of this auxiliary system is sufficient to guarantee unique solvability of the stage equations. This analysis generalizes standard well-definedness conditions and provides a dynamic implementation approach that avoids direct solution of the implicit algebraic equations.
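As a simple worked illustration of the kind of coefficient-dependent criterion discussed above, consider the one-stage explicit method (forward Euler) in the $\ell_2$-norm. This is a textbook special case under assumed constants $c$ and $L$, not one of the paper's results.

```latex
% Assumptions (illustrative): f is strongly infinitesimally contracting with rate c > 0
% and Lipschitz with constant L in the Euclidean norm:
%   <x - y, f(x) - f(y)> <= -c ||x - y||_2^2,   ||f(x) - f(y)||_2 <= L ||x - y||_2.
\begin{align*}
\Phi_h(x) &= x + h f(x), \\
\|\Phi_h(x) - \Phi_h(y)\|_2^2
  &= \|x - y\|_2^2 + 2h\,\langle x - y,\, f(x) - f(y)\rangle + h^2 \|f(x) - f(y)\|_2^2 \\
  &\le \left(1 - 2hc + h^2 L^2\right) \|x - y\|_2^2 .
\end{align*}
% Hence \Phi_h is a strict contraction whenever 0 < h < 2c / L^2.
```

The step-size bound $h < 2c/L^2$ shows how contractivity of the continuous-time system carries over to the discretization only under a condition linking the step size to the system constants.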


[11] 2603.11362

RHOSI: Efficient Anti-Jamming Resource Allocation with Holographic Surfaces in UAV-enabled ISAC

This paper investigates the susceptibility of Integrated Sensing and Communication (ISAC) systems to hostile jamming, focusing on an aerial Reconfigurable Holographic Surface (RHS)-aided unmanned aerial vehicle (UAV). The proposed framework, termed RHOSI, enhances ISAC's resilience by dynamically shaping the wireless propagation environment. Specifically, RHOSI introduces a strategy to improve jamming resistance by jointly optimizing transmit beamforming at the hybrid base station, RHS phase shift configuration, and UAV spatial deployment, while ensuring the required echo signal-to-interference-plus-noise ratios for reliable sensing. The resulting non-linear optimization problem features highly coupled variables, which are decomposed into sub-problems and solved using an alternating optimization (AO) approach. Simulation results confirm the practicality and effectiveness of RHOSI in significantly improving the throughput and robustness of ISAC under adversarial jamming.


[12] 2603.11502

ISAC-Enabled Multi-UAV Collaborative Target Sensing for Low-Altitude Economy

Integrated sensing and communication (ISAC) has attracted growing research interest as a means to facilitate the large-scale development of the low-altitude economy (LAE). However, the high dynamics of low-altitude targets may overwhelm fixed ISAC systems, particularly at the edge of their coverage or in blind zones. Owing to its high flexibility, unmanned aerial vehicle (UAV)-assisted ISAC can provide more design freedom to enhance communication and sensing abilities. In this paper, we propose an ISAC-enabled multi-UAV dynamic collaborative target sensing scheme, where UAVs can dynamically adjust their flight and resource allocation for cooperative sensing of a mobile target by communicating with the terrestrial cellular network using ISAC signals. To achieve precise sensing of the dynamic target, the posterior Cramér-Rao bound (PCRB) for the target state is derived. Subsequently, the PCRB minimization problem is formulated by jointly optimizing the UAV-BS association, the UAVs' trajectories, and the bandwidth allocation, subject to the communication requirements for the UAVs. However, the problem is challenging since it involves a non-convex and implicit objective function with coupled optimization variables. For a fast implementation of sensing and tracking, we propose a low-complexity iterative algorithm that can efficiently obtain a sub-optimal solution to the problem. Specifically, the UAV-BS association is first determined by the communication-optimal solution. Then the UAVs' trajectories and bandwidth allocation are alternately optimized based on the descent direction search algorithm. Finally, numerical results are provided to validate the superiority of our proposed designs as compared to various benchmarks.


[13] 2603.11516

Standard Condition Number-Based Detection for MIMO ISAC Systems under Noise Uncertainty

This paper presents a unified analytical and optimization framework for Standard Condition Number (SCN)-based detection in MIMO Integrated Sensing and Communication (ISAC) systems operating under noise uncertainty. Conventional detectors such as the Likelihood Ratio Test (LRT) and Energy Detector (ED) suffer from false-alarm inflation when interference or jamming alters the noise covariance. To overcome this limitation, the SCN detector, defined as the ratio of the largest to the smallest eigenvalue of the sample covariance matrix, is analytically characterized for the first time in an ISAC setting. Closed-form expressions for the false-alarm and detection probabilities are derived using random matrix theory for a two-antenna sensing receiver and generalized to arbitrary MIMO dimensions. The analysis proves that the SCN maintains a constant false alarm rate (CFAR) property and remains resilient to covariance mismatch, providing theoretical justification for its robustness in dynamic environments. Leveraging these results, a tractable ISAC power-allocation problem is formulated to minimize total detection error subject to communication rate and power constraints, yielding an interpretable sequential solution. Numerical evaluations verify the theory and demonstrate that the proposed SCN detector consistently outperforms LRT and eigenvalue-based benchmarks, particularly under strong interference and jamming typical of modern multiuser networks.
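The SCN statistic itself is easy to compute for the two-antenna case mentioned above. The sketch below uses the closed-form eigenvalues of a 2x2 Hermitian matrix; the function names and the detection threshold are hypothetical, and the paper's closed-form threshold design is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def sample_covariance_2x2(
    y1: Sequence[complex], y2: Sequence[complex]
) -> Tuple[float, complex, float]:
    """Entries (a, b, d) of the sample covariance [[a, b], [conj(b), d]]
    built from the snapshots of two receive antennas."""
    n = len(y1)
    a = sum(abs(v) ** 2 for v in y1) / n
    d = sum(abs(v) ** 2 for v in y2) / n
    b = sum(u * v.conjugate() for u, v in zip(y1, y2)) / n
    return a, b, d


def scn_2x2(a: float, b: complex, d: float) -> float:
    """Standard condition number: largest over smallest eigenvalue."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    lam_max, lam_min = mean + radius, mean - radius
    return math.inf if lam_min <= 0.0 else lam_max / lam_min


def scn_detect(y1: Sequence[complex], y2: Sequence[complex], threshold: float) -> bool:
    """Declare a target when the SCN exceeds a (hypothetical) threshold."""
    return scn_2x2(*sample_covariance_2x2(y1, y2)) > threshold
```

Scaling every sample by a common factor multiplies both eigenvalues by the same amount and leaves the ratio unchanged, which gives some intuition for why the statistic does not depend on the unknown noise power.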


[14] 2603.11548

Exploiting Skyrmions in Free-Space Optical Communication

In this paper, we propose a novel free-space optical (FSO) communication system utilizing optical skyrmions. We introduce a scheme referred to as skyrmion number modulation (SkM), which employs index modulation by encoding information onto the skyrmion number, a topological invariant preserved during free-space propagation. This topological nature offers the potential for inherent robustness against atmospheric turbulence-induced wavefront distortions, which limit the performance of conventional FSO systems. More specifically, we demonstrate that the fluctuation of the received skyrmion number is mitigated by a proposed intensity-based masking technique. Finally, our performance analysis based on a discrete memoryless channel framework confirms that the proposed system exhibits near-ideal robustness under weak turbulence and supports high-order modulation in moderate regimes.


[15] 2603.11582

Multi-Agent Reinforcement Learning for UAV-Based Chemical Plume Source Localization

Undocumented orphaned wells pose significant health and environmental risks to nearby communities by releasing toxic gases and contaminating water sources, with methane emissions being a primary concern. Traditional survey methods such as magnetometry often fail to detect older wells effectively. In contrast, aerial in-situ sensing using unmanned aerial vehicles (UAVs) offers a promising alternative for methane emission detection and source localization. This study presents a robust and efficient framework based on a multi-agent deep reinforcement learning (MARL) algorithm for the chemical plume source localization (CPSL) problem. The proposed approach leverages virtual anchor nodes to coordinate UAV navigation, enabling collaborative sensing of gas concentrations and wind velocities through onboard and shared measurements. Source identification is achieved by analyzing the historical trajectory of anchor node placements within the plume. Comparative evaluations against the fluxotaxis method demonstrate that the MARL framework achieves superior performance in both localization accuracy and operational efficiency.


[16] 2603.11639

Learnable Template Matching Approach for Micro-Deformation Monitoring based on Integrated Sensing and Communication Platform

Existing integrated sensing and communication (ISAC) platforms fail to fully utilize the shared spectrum and aperture resources for sensing, resulting in poor sensing performance. Specifically, weak target sensing on the ISAC platform, such as micro-deformation monitoring (mDM), suffers from inaccurate measurements due to poor sensing quality. In this paper, we propose an AI-assisted approach to alleviate the effect of poor sensing quality in the ISAC system by effectively removing the clutter. We begin by modeling the environment clutter as a combination of deterministic and stochastic signals to represent urban coverage scenarios around the base station (BS). A clutter suppression optimization problem is formulated to extract the micro-deformation displacement (mDD) from the original ISAC signals. We then propose a learnable template-matching (LTM) approach to mitigate the influence of clutter, thereby enhancing sensing quality. In particular, the electromagnetic (EM) signal feature of the mDD is embedded into the network to strengthen the mDM signal, and clutter filters are incorporated to suppress environmental clutter. Numerical results illustrate the superiority of our proposed approach in terms of convergence speed and accuracy in mDD prediction. When deployed on BS measurements, the LTM trained only on simulated data exhibits impressive performance in environment clutter separation and mDD estimation.


[17] 2603.11666

Machine Learning-Based Analysis of Critical Process Parameters Influencing Product Quality Defects: A Real-World Case Study in Manufacturing

Quality control is an essential operation in manufacturing, ensuring products meet the necessary standards of quality, safety, and reliability. Traditional methods, such as visual inspections, measurements, and statistical techniques, help meet these standards but are often time-consuming, costly, and reactive. With the advent of AI/ML, manufacturers can shift from reactive to proactive approaches in quality control. This study applies ML-based models for predictive quality control in a real-world manufacturing setting. The case company produces castings for powertrain components in heavy vehicles, where poor control of core-making process parameters leads to costly defects. ML models were developed by analyzing data from two core-making machines, their processes, and maintenance logs to identify parameters associated with casting defects, enabling the prediction and prevention of potential defects before they occur. The results demonstrated good accuracy rates, helping quality and production teams identify and eliminate defective cores and thereby improving product quality and production efficiency.


[18] 2603.11669

SEMamba++: A General Speech Restoration Framework Leveraging Global, Local, and Periodic Spectral Patterns

General speech restoration demands techniques that can interpret complex speech structures under various distortions. While State-Space Models like SEMamba have advanced the state-of-the-art in speech denoising, they are not inherently optimized for critical speech characteristics, such as spectral periodicity or multi-resolution frequency analysis. In this work, we introduce an architecture tailored to incorporate speech-specific features as inductive biases. In particular, we propose Frequency GLP, a frequency feature extraction block that effectively and efficiently leverages the properties of frequency bins. Then, we design a multi-resolution parallel time-frequency dual-processing block to capture diverse spectral patterns, and a learnable mapping to further enhance model performance. With all our ideas combined, the proposed SEMamba++ achieves the best performance among multiple baseline models while remaining computationally efficient.


[19] 2603.11678

RAF: Relativistic Adversarial Feedback For Universal Speech Synthesis

We propose Relativistic Adversarial Feedback (RAF), a novel training objective for GAN vocoders that improves in-domain fidelity and generalization to unseen scenarios. Although modern GAN vocoders employ advanced architectures, their training objectives often fail to promote generalizable representations. RAF addresses this problem by leveraging speech self-supervised learning models to assist discriminators in evaluating sample quality, encouraging the generator to learn richer representations. Furthermore, we utilize relativistic pairing for real and fake waveforms to improve the modeling of the training data distribution. Experiments across multiple datasets show consistent gains in both objective and subjective metrics on GAN-based vocoders. Importantly, the RAF-trained BigVGAN-base outperforms the LSGAN-trained BigVGAN in perceptual quality using only 12% of the parameters. Comparative studies further confirm the effectiveness of RAF as a training framework for GAN vocoders.


[20] 2603.11715

Affect Decoding in Phonated and Silent Speech Production from Surface EMG

The expression of affect is integral to spoken communication, yet its link to underlying articulatory execution remains unclear. Measures of articulatory muscle activity such as EMG could reveal how speech production is modulated by emotion alongside acoustic speech analyses. We investigate affect decoding from facial and neck surface electromyography (sEMG) during phonated and silent speech production. For this purpose, we introduce a dataset comprising 2,780 utterances from 12 participants across 3 tasks, on which we evaluate both intra- and inter-subject decoding using a range of features and model embeddings. Our results reveal that EMG representations reliably discriminate frustration with up to 0.845 AUC, and generalize well across articulation modes. Our ablation study further demonstrates that affective signatures are embedded in facial motor activity and persist in the absence of phonation, highlighting the potential of EMG sensing for affect-aware silent speech interfaces.


[21] 2603.11716

Rotatable Antenna Enabled Covert Communication

Unlike conventional fixed-antenna architectures, rotatable antennas (RAs) have shown great potential in enhancing wireless communication performance by exploiting additional spatial degrees of freedom (DoFs) in a cost-effective manner. In this letter, we propose a novel RA-enabled covert communication system, where an RA array-based transmitter (Alice) sends covert information to a legitimate user (Bob) in the presence of multiple wardens (Willies). To maximize the covert rate, we optimize the transmit beamforming vector and the rotational angles of individual RAs, subject to constraints on covertness, transmit power, and antenna rotational range. To address the formulated non-convex problem, we decompose it into two subproblems and propose an efficient alternating optimization (AO) algorithm to solve them iteratively, where the second-order cone programming (SOCP) method and the successive convex approximation (SCA) approach are applied to the respective subproblems. Simulation results demonstrate that the proposed RA-enabled covert communication system provides significantly better covertness performance than other benchmark schemes.


[22] 2603.11738

Dimensional Scaling Laws for Continuous Fluid Antenna Systems

Consider the signal-to-noise ratio (SNR) of a continuous fluid antenna system (CFAS) operating over a Rayleigh fading channel. In this paper, we extend traditional system assumptions and consider spatially coherent isotropic correlation, continuous positioning of the antenna rather than discrete, and the use of multi-dimensional space (1D, 2D and 3D). By focusing on the upper tail of the received SNR distribution (the high SNR probability (HSP)), we are able to derive asymptotically exact closed-form formulas for the HSP. Finally, these results lead to scaling laws which describe the increase in the HSP as we employ more dimensions and the optimal CFAS dimensions.


[23] 2603.11739

BER Analysis and Optimization for Continuous RIS-Enabled NOMA

This letter investigates a novel uplink (UL) system that integrates power-domain non-orthogonal multiple access (PD-NOMA) with a continuous reconfigurable intelligent surface (CRIS). We analyze the effective CRIS-assisted channels under spatially correlated fading to accurately approximate the characteristic function of the cascaded channel. This allows the derivation of an expression for the bit error rate (BER), a key performance metric for UL PD-NOMA. We further utilize the derived BER expressions to introduce a joint optimization framework that minimizes the average BER via UL power allocation and dynamic RIS partitioning among the users. The analytical results are validated by simulations, and show that the proposed optimization scheme eliminates the BER floors that are associated with UL NOMA. The results also confirm the superiority of the optimized CRIS-NOMA scheme over conventional orthogonal multiple access (OMA) and non-optimized UL NOMA schemes.


[24] 2603.11740

On the Distribution of Matched Filtering with Continuous Aperture Arrays

Continuous aperture arrays (CAPAs) provide a theoretical upper bound on the performance of densely packed antenna arrays, but their analysis is limited by the lack of closed-form signal-to-noise ratio (SNR) distributions under realistic fading conditions. This paper derives accurate analytical expressions for the matched-filter SNR distribution of one-dimensional CAPAs in correlated Rayleigh environments under both the sinc and Jakes correlation models using the Karhunen-Loève expansion. By applying a truncated hypoexponential model, we obtain accurate approximations for the probability density function and cumulative distribution function of the SNR that closely match simulations, including the outage probability region where precise characterization is critical. Compared to a standard gamma approximation, our approach provides significantly improved accuracy in this regime. Additionally, the CAPA system considered is shown to outperform discrete antenna arrays. The derived expressions enable tractable and accurate evaluation of CAPAs under practical channel models.
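A truncated hypoexponential model of this kind can be sketched numerically. The sketch assumes that, after truncating the expansion to K terms, the SNR is a sum of independent exponential variables with means equal to the retained eigenvalues, that those eigenvalues are distinct, and that the average SNR is normalized into them. The function names and the eigenvalues in the usage note are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence


def hypoexp_cdf(x: float, rates: Sequence[float]) -> float:
    """CDF of a sum of independent exponentials with distinct rates:
    F(x) = 1 - sum_i w_i * exp(-rate_i * x),
    w_i = prod_{j != i} rate_j / (rate_j - rate_i)."""
    if x <= 0.0:
        return 0.0
    total = 0.0
    for i, li in enumerate(rates):
        weight = 1.0
        for j, lj in enumerate(rates):
            if j != i:
                weight *= lj / (lj - li)
        total += weight * math.exp(-li * x)
    return 1.0 - total


def snr_outage(threshold: float, eigenvalues: Sequence[float]) -> float:
    """Outage probability P(SNR <= threshold) when each retained term
    contributes an exponential variable with mean equal to its eigenvalue."""
    return hypoexp_cdf(threshold, [1.0 / mu for mu in eigenvalues])
```

For example, `snr_outage(0.1, [0.6, 0.3, 0.1])` evaluates the low-SNR tail for three assumed eigenvalues. The weights alternate in sign, so closely spaced eigenvalues can cause numerical cancellation in this direct formula.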


[25] 2603.11841

ReDimNet2: Scaling Speaker Verification via Time-Pooled Dimension Reshaping

We present ReDimNet2, an improved neural network architecture for extracting utterance-level speaker representations that builds upon the ReDimNet dimension-reshaping framework. The key modification in ReDimNet2 is the introduction of pooling over the time dimension within the 1D processing pathway. This operation preserves the nature of the 1D feature space, since 1D features remain a reshaped version of 2D features regardless of temporal resolution, while enabling significantly more aggressive scaling of the channel dimension without proportional compute increase. We introduce a family of seven model configurations (B0-B6) ranging from 1.1M to 12.3M parameters and 0.33 to 13 GMACS. Experimental results on VoxCeleb1 benchmarks demonstrate that ReDimNet2 improves the Pareto front of computational cost versus accuracy at every scale point compared to ReDimNet, achieving 0.287% EER on Vox1-O with 12.3M parameters and 13 GMACS.


[26] 2603.11845

Acoustic-to-Articulatory Inversion of Clean Speech Using an MRI-Trained Model

Articulatory acoustic inversion reconstructs vocal tract shapes from speech. Real-time magnetic resonance imaging (rt-MRI) allows simultaneous acquisition of both the acoustic speech signal and articulatory information. Besides the complexity of rt-MRI acquisition, the recorded audio is heavily corrupted by scanner noise and requires denoising to be usable. For practical use, it must be possible to invert speech recorded without MRI noise. In this study, we investigate the use of speech recorded in a clean acoustic environment as an alternative to denoised MRI speech. To this end, we compare two recordings of identical sentences from the same speaker, aligned using phonetic segmentation. A model trained on denoised MRI speech is evaluated on both denoised MRI and clean speech. We also assess a model trained and tested only on clean speech. Results show that clean speech supports articulatory inversion effectively, achieving an RMSE of 1.56 mm, close to MRI-based performance.


[27] 2603.11847

Reconstruction of the Vocal Tract from Speech via Phonetic Representations Using MRI Data

Articulatory acoustic inversion aims to reconstruct the complete geometry of the vocal tract from the speech signal. In this paper, we present a comparative study of several levels of phonetic segmentation accuracy, together with a comparison to the baseline introduced in our previous work, which is based on Mel-Frequency Cepstral Coefficients (MFCCs). All the approaches considered are based on a denoised speech signal and aim to investigate the impact of incorporating phonetic information through three successive levels: an uncorrected automatic transcription, a temporally aligned phonetic segmentation, and an expert manual correction following alignment. The models are trained to predict articulatory contours extracted from vocal tract MRI images using an automatic contour tracking method. The results show that, among the models relying on phonetic representations, manual correction after alignment yields the best performance, approaching that of the baseline.


[28] 2603.11850

Deep Learning-based Assessment of the Relation Between the Third Molar and Mandibular Canal on Panoramic Radiographs using Local, Centralized, and Federated Learning

Impaction of the mandibular third molar in proximity to the mandibular canal increases the risk of inferior alveolar nerve injury. Panoramic radiography is routinely used to assess this relationship. Automated classification of molar-canal overlap could support clinical triage and reduce unnecessary CBCT referrals, while federated learning (FL) enables multi-center collaboration without sharing patient data. We compared Local Learning (LL), FL, and Centralized Learning (CL) for binary overlap/no-overlap classification on cropped panoramic radiographs partitioned across eight independent labelers. A pretrained ResNet-34 was trained under each paradigm and evaluated using per-client metrics with locally optimized thresholds and pooled test performance with a global threshold. Performance was assessed using area under the receiver operating characteristic curve (AUC) and threshold-based metrics, alongside training dynamics, Grad-CAM visualizations, and server-side aggregate monitoring signals. On the test set, CL achieved the highest performance (AUC = 0.831; accuracy = 0.782), FL showed intermediate performance (AUC = 0.757; accuracy = 0.703), and LL generalized poorly across clients (AUC range = 0.619-0.734; mean = 0.672). Training curves suggested overfitting, particularly in LL models, and Grad-CAM indicated more anatomically focused attention in CL and FL. Overall, centralized training provided the strongest performance, while FL offers a privacy-preserving alternative that outperforms LL.
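
As a rough illustration of how a federated server can combine client models without seeing patient data, the sketch below implements sample-size-weighted parameter averaging (the FedAvg rule). The abstract does not state which aggregation rule was used, so FedAvg is an assumption here, and the function name `fedavg` and the toy numbers are hypothetical.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its sample count.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   list of training-set sizes, one per client.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one non-empty weight vector per client size")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


# Toy example (invented numbers): two clients with 1 and 3 samples.
global_weights = fedavg([[1.0, 0.0], [3.0, 2.0]], [1, 3])
```

Only parameter vectors and sample counts cross the client boundary in this scheme, which is what makes it a candidate for multi-center settings.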


[29] 2603.11877

Silent Speech Interfaces in the Era of Large Language Models: A Comprehensive Taxonomy and Systematic Review

Human-computer interaction has traditionally relied on the acoustic channel, a dependency that introduces systemic vulnerabilities to environmental noise, privacy constraints, and physiological speech impairments. Silent Speech Interfaces (SSIs) emerge as a transformative paradigm that bypasses the acoustic stage by decoding linguistic intent directly from the neuro-muscular-articulatory continuum. This review provides a high-level synthesis of the SSI landscape, transitioning from traditional transducer-centric analysis to a holistic intent-to-execution taxonomy. We systematically evaluate sensing modalities across four critical physiological interception points: neural oscillations, neuromuscular activation, articulatory kinematics (ultrasound/magnetometry), and pervasive active probing via acoustic or radio-frequency sensing. Critically, we analyze the current paradigm shift from heuristic signal processing to Latent Semantic Alignment. In this new era, Large Language Models (LLMs) and deep generative architectures serve as high-level linguistic priors to resolve the "informational sparsity" and non-stationarity of biosignals. By mapping fragmented physiological gestures into structured semantic latent spaces, modern SSI frameworks have, for the first time, approached the Word Error Rate usability threshold required for real-world deployment. We further examine the transition of SSIs from bulky laboratory instrumentation to "invisible interfaces" integrated into commodity-grade wearables, such as earables and smart glasses. Finally, we outline a strategic roadmap addressing the "user-dependency paradox" through self-supervised foundation models and define the ethical boundaries of "neuro-security" to protect cognitive liberty in an increasingly interfaced world.


[30] 2603.11886

Beyond the Limits of Rigid Arrays: Flexible Intelligent Metasurfaces for Next-Generation Wireless Networks

Following recent advances in flexible electronics and programmable metasurfaces, flexible intelligent metasurfaces (FIMs) have emerged as a promising enabling technology for next-generation wireless networks. A FIM is a morphable electromagnetic surface capable of dynamically adjusting its physical geometry to influence the radiation and propagation of electromagnetic waves. Unlike conventional rigid arrays, FIMs introduce an additional spatial degree of design freedom enabled by mechanical flexibility, which can enhance beamforming, spatial focusing, and adaptation to dynamic wireless environments. This added capability enables wireless systems to shape the propagation environment not only through electromagnetic tuning but also through controllable geometric reconfiguration. This article explores the potential of FIMs for next-generation wireless networks. We first introduce the main hardware architectures of FIMs and explain how they can be integrated into wireless communication systems. We then present representative application scenarios, highlighting the advantages of FIMs for future wireless networks and comparing them with other emerging flexible wireless technologies. To illustrate their potential impact, we present case studies comparing FIM-enabled architectures with conventional rigid-array systems, demonstrating the performance gains enabled by surface flexibility for both communication and sensing applications. Finally, we discuss key opportunities, practical challenges, and open research directions that must be addressed to fully realize the potential of FIM technology in future wireless communication systems.


[31] 2603.11905

Risk-Based Dynamic Thermal Rating in Distribution Transformers via Probabilistic Forecasting

Low voltage (LV) distribution transformers face accelerating demand growth while replacement lead times and costs continue to rise, making improved utilisation of existing assets essential. Static and conservative protection devices (PDs) in distribution transformers are inflexible and limit the available headroom of the transformer. This paper presents a probabilistic framework for dynamically forecasting optimal thermal protection settings. The proposed approach directly predicts the day-ahead scale factor which maximises the dynamic thermal rating of the transformer from historical load, temperature, and metadata using clustered quantile regression models trained on 644 UK LV transformers. Probabilistic forecasting quantifies overheating risk directly through the prediction percentile, enabling risk-informed operational decisions. Results show a 10-12% additional capacity gain compared to static settings, with hotspot temperature risk matching the selected percentile, including under realistic temperature forecast errors. These results demonstrate a practical approach for distribution network operators to take advantage of PDs with adaptive settings to maximise capacity and manage risk on operational time scales.
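
Quantile regression models are commonly trained by minimising the pinball (quantile) loss, which is what ties a chosen percentile to a risk level. The sketch below shows that loss only; it is not the paper's clustered model, and the function name `pinball_loss` and the sample values are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q, with 0 < q < 1.

    Under-prediction (y > prediction) is charged q per unit of error,
    over-prediction is charged (1 - q) per unit of error.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


# At q = 0.9, under-predicting by one unit costs nine times more
# than over-predicting by one unit.
under = pinball_loss([1.0], [0.0], 0.9)
over = pinball_loss([0.0], [1.0], 0.9)
```

The asymmetry means a model trained at a given quantile level errs on the side that the operator has chosen to treat as lower risk.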


[32] 2603.11918

Indirect and Direct Multiuser Hybrid Beamforming for Far-Field and Near-Field Communications: A Deep Learning Approach

Hybrid beamforming for extremely large-scale multiple-input multiple-output (XL-MIMO) systems is challenging in the near field because the channel depends jointly on angle and distance, and the multiuser interference (MUI) is strong. Existing deep learning methods typically follow either a decoupled design that optimizes analog beamforming without explicitly accounting for MUI, or an end-to-end (E2E) joint analog-digital optimization that can be unstable under nonconvex constant-modulus (CM) constraints, pronounced analog-digital coupling, and the gradient pattern of the sum-rate loss. To address both issues, we develop a complex-valued E2E framework based on a variant minimum mean square error (variant-MMSE) criterion, where the digital precoder is eliminated in closed form via Karush-Kuhn-Tucker (KKT) conditions so that analog learning is trained with a stable objective. The network employs a grouped complex-convolution sensing front-end for uplink (UL) measurements, a shared complex multi-layer perceptron (MLP) for per-user feature extraction, and a merged constant-modulus head to output the analog precoder. In the indirect mode, the network designs hybrid beamformers from estimated channel state information (CSI). In the direct mode, where explicit CSI is unavailable, the network learns the sensing operator and the analog mapping from short pilots, after which additional pilots estimate the equivalent channel and enable a KKT closed-form digital precoder. Simulations show that the indirect mode approaches the performance of iterative variant-MMSE optimization with a complexity reduction proportional to the antenna number. In the direct mode, the proposed method improves spectral efficiency over sparse-recovery pipelines and recent deep learning baselines under the same pilot budget.


[33] 2603.11943

Emergency-Aware and Frequency-Constrained HVDC Planning for A Multi-Area Asynchronously Interconnected Grid

High-voltage direct current (HVDC) technology has played a crucial role in long-distance transmission of renewable power generation. However, the integration of large-capacity HVDC lines introduces significant frequency security challenges during HVDC fault emergencies. This paper proposes an emergency-aware and frequency-constrained HVDC planning method to optimize the capacity of inter-area HVDC tie-lines in a multi-area asynchronously interconnected grid. First, a coordinated emergency frequency control scheme is proposed to allocate the emergency control resources during HVDC faults. Then, an enhanced system frequency response model integrating event-driven emergency frequency control is developed and a weighted oblique decision tree approach is employed to extract frequency nadir security constraints. The proposed planning model considers all potential HVDC fault emergencies while treating candidate HVDC capacities as decision variables. Simulation results demonstrate superior performance in balancing economic efficiency with frequency security requirements, providing a practical solution for inter-area HVDC planning.


[34] 2603.11959

Near-Field Multiuser Beam Training for XL-MIMO: An End-to-End Interference-Aware Approach with Pilot Limitations

Near-field propagation in extremely large-scale MIMO (XL-MIMO) enlarges the beam training (BT) search space by introducing an additional range dimension, which makes conventional codebook-based beam sweeping prohibitively expensive under limited pilot resources, especially for multiuser sub-connected hybrid architectures. This letter proposes a deep-learning-based interference-aware multiuser BT framework (DL-IABT) that directly predicts analog beam indices from a small number of uplink sensing measurements. By exploiting a subarray-level approximation, a far-field codebook is adopted to represent each subarray response with negligible mismatch. To enable end-to-end (E2E) learning, we derive a variant-MSE surrogate loss by eliminating the digital precoder through a closed-form MMSE solution from KKT conditions, which implicitly accounts for multiuser interference (MUI). The proposed network integrates a complex-valued sensing front-end, a shared complex-valued encoder, a Transformer-based multiuser predictor, and a scalable Gumbel-Softmax beam selection head. Simulation results show that DL-IABT achieves near-optimal sum-rate performance while providing markedly higher effective throughput under pilot overhead constraints.
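
A Gumbel-Softmax head lets a network choose a discrete codebook index while remaining differentiable during training. The following is a minimal sketch of the sampling step alone, in plain Python rather than a deep-learning framework; the function name `gumbel_softmax`, the temperature value, and the example logits are hypothetical and not taken from the letter.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over len(logits) categories.

    Each logit is perturbed with Gumbel(0, 1) noise and passed through a
    temperature-scaled softmax; a smaller tau gives a sharper distribution.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)           # guard against log(0)
        gumbel = -math.log(-math.log(u))       # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)                          # subtract max for stability
    exps = [math.exp(v - peak) for v in noisy]
    norm = sum(exps)
    return [e / norm for e in exps]


# Example: relaxed selection among three candidate beams.
probs = gumbel_softmax([0.0, 0.0, 50.0], tau=0.5, rng=random.Random(0))
beam_index = max(range(len(probs)), key=probs.__getitem__)
```

At inference time the argmax of the returned vector serves as the selected beam index, while training can use the soft probabilities directly.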


[35] 2603.11978

Robust Parametric Microgrid Dispatch Under Endogenous Uncertainty of Operation- and Temperature-Dependent Battery Degradation

Batteries play a critical role in microgrid energy management by ensuring power balance, enhancing renewable utilization, and reducing operational costs. However, battery degradation poses a significant challenge, particularly under extreme temperatures. This paper investigates the optimal trade-off between battery degradation and operational costs in microgrid dispatch to find a robust cost-effective strategy from a full life-cycle perspective. A key challenge arises from the endogenous uncertainty (or decision-dependent uncertainty, DDU) of battery degradation: dispatch decisions influence the probability distribution of battery degradation, while in turn degradation changes the battery operation model and thus affects dispatch. In this paper, we first develop an XGBoost-based probabilistic degradation model trained on experimental data across varying temperature conditions. We then formulate a parametric model predictive control (MPC) framework for microgrid dispatch, where the weight parameters of the battery degradation penalty terms are tuned through long-term simulation of degradation and dispatch interactions. Case studies validate the effectiveness of the proposed approach.


[36] 2603.12014

Array Geometry-Centric Axial Sidelobe Interference Analysis for Near-Field Multi-User MIMO

With the deployment of large antenna arrays at high-frequency bands, future wireless communication systems are likely to operate in the radiative near-field (NF). Unlike far-field beam steering, NF beams can be focused on a spatial region with finite depth, enabling user multiplexing in both range and angle. In NF multiuser multiple-input multiple-output (MU-MIMO) systems, achievable rates are limited by interference arising from sidelobes in both the axial (range) and lateral (angle) dimensions. This work investigates how axial sidelobes (ASLs) vary with array geometry. Closed-form array gain expressions are derived to characterize ASLs for uniform planar arrays. Analytical results show that the uniform square array (USA) yields the lowest ASLs, followed by the uniform concentric circular array (UCCA), uniform linear array (ULA), and uniform circular array (UCA). Specifically, the USA achieves a peak sidelobe level (PSLL) of -17.6 dB versus -7.9 dB for the UCA. Numerical simulations confirm that the USA provides superior sidelobe suppression and the highest sum-rate performance.


[37] 2603.12027

A Joint JSCC-Resource Allocation Framework for QoS-Aware Semantic Communication in LEO Satellite-based EO Missions

In Earth observation (EO) missions with Low Earth orbit (LEO) satellites, high-resolution image acquisition generates a massive data volume that poses a significant challenge for transmission under the limited satellite power budget, while LEO movement introduces dynamic systems. To enable efficient image transmission, this paper employs semantic communication (SemCom) with joint source-channel coding (JSCC), which focuses on transmitting meaningful information to reduce power consumption. Under a quality-of-service (QoS) requirement defined by image reconstruction quality, this work aims to minimize the total transmit power by jointly optimizing the JSCC encoder-decoder parameters and resource allocation. However, the implicit relationship among JSCC parameters, link quality, and image quality, coupled with the presence of mixed integer-continuous variables, makes the problem difficult to solve directly. To address this, a curve-fitting model is proposed to approximate the JSCC compression-SNR-quality relationship. Then, the joint compression ratio-resource allocation (JCRRA) algorithm is proposed to address the underlying problem. Numerical results demonstrate that the proposed method achieves substantial power savings compared to both greedy algorithms and conventional transmission paradigms.


[38] 2603.12046

Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition

Audio-Visual Speech Recognition (AVSR) leverages both acoustic and visual information for robust recognition under noise. However, how models balance these modalities remains unclear. We present Dr. SHAP-AV, a framework using Shapley values to analyze modality contributions in AVSR. Through experiments on six models across two benchmarks and varying SNR levels, we introduce three analyses: Global SHAP for overall modality balance, Generative SHAP for contribution dynamics during decoding, and Temporal Alignment SHAP for input-output correspondence. Our findings reveal that models shift toward visual reliance under noise yet maintain high audio contributions even under severe degradation. Modality balance evolves during generation, temporal alignment holds under noise, and SNR is the dominant factor driving modality weighting. These findings expose a persistent audio bias, motivating ad-hoc modality-weighting mechanisms and Shapley-based attribution as a standard AVSR diagnostic.
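
For two modalities, exact Shapley values can be computed by averaging each modality's marginal contribution over both orderings. The sketch below does this for an arbitrary small player set; it illustrates the attribution principle only and is not the Dr. SHAP-AV pipeline. The function name `shapley_values` and the toy value function (with invented scores) are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating all player orderings.

    players: list of hashable player names.
    value:   function mapping a frozenset of players to a float score.
    """
    phi = {p: 0.0 for p in players}
    n_orders = 0
    for order in permutations(players):
        n_orders += 1
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            phi[p] += value(grown) - value(coalition)  # marginal contribution
            coalition = grown
    return {p: total / n_orders for p, total in phi.items()}


# Toy recognition scores for each subset of available modalities (invented).
scores = {
    frozenset(): 0.0,
    frozenset({"audio"}): 0.6,
    frozenset({"video"}): 0.2,
    frozenset({"audio", "video"}): 1.0,
}
phi = shapley_values(["audio", "video"], scores.__getitem__)
# phi is approximately {"audio": 0.7, "video": 0.3}
```

The attributions always sum to the gap between the full-input score and the empty-input score, which is what allows them to be read as shares of the total.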


[39] 2603.12098

Maximum-Entropy Random Walks on Hypergraphs

Random walks are fundamental tools for analyzing complex networked systems, including social networks, biological systems, and communication infrastructures. While classical random walks focus on pairwise interactions, many real-world systems exhibit higher-order interactions naturally modeled by hypergraphs. Existing random walk models on hypergraphs often focus on undirected structures or do not incorporate entropy-based inference, limiting their ability to capture directional flows, uncertainty, or information diffusion in complex systems. In this article, we develop a maximum-entropy random walk framework on directed hypergraphs with two interaction mechanisms: broadcasting, where a pivot node activates multiple receiver nodes, and merging, where multiple pivot nodes jointly influence a receiver node. We infer a transition kernel via a Kullback-Leibler divergence projection onto constraints enforcing stochasticity and stationarity. The resulting optimality conditions yield a multiplicative scaling form, implemented using Sinkhorn-Schrödinger-type iterations with tensor contractions. We further analyze ergodicity, including projected linear kernels for broadcasting and tensor spectral criteria for polynomial dynamics in merging. The effectiveness of our framework is demonstrated with both synthetic and real-world examples.
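
The multiplicative scaling form mentioned above has a well-known matrix analogue: classical Sinkhorn iteration, which rescales the rows and columns of a positive matrix until prescribed marginals are met. The sketch below shows only that matrix case, with row and column sums standing in for the paper's stochasticity and stationarity constraints; it does not reproduce the tensor contractions, and the function name `sinkhorn` and the example matrix are hypothetical.

```python
def sinkhorn(kernel, row_sums, col_sums, iters=500):
    """Scale a strictly positive matrix to match target row/column sums.

    Returns P with P[i][j] = u[i] * kernel[i][j] * v[j], where the scaling
    vectors u and v are updated alternately. The targets must have equal
    totals for both sets of constraints to be satisfiable.
    """
    n, m = len(kernel), len(kernel[0])
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Row update: make each row of diag(u) K diag(v) hit its target.
        u = [row_sums[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        # Column update: same for each column.
        v = [col_sums[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Example: scale a 2x2 positive matrix to be doubly stochastic.
P = sinkhorn([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], [1.0, 1.0])
```

Among all matrices with the required marginals, the fixed point of this iteration is the one closest to the input kernel in Kullback-Leibler divergence, which is the sense in which the scaling form arises from a KL projection.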


[40] 2603.12172

Simultaneous Multi-Modal Covert Communications: Analysis and Optimization

This paper investigates the problem of covert communications in a heterogeneous wireless network where multiple communication modalities are used simultaneously. In this setup, a legitimate transmitter sends confidential data to its receiver by selecting multiple modalities with the goal of maximizing communication covertness against a passive adversary (Willie) while satisfying a transmission rate requirement. We analyze two distinct scenarios for a given observation time by Willie. The two scenarios are: (i) Willie knows the modalities selected by the friendly transmitter, and (ii) Willie is unaware of the selected modalities. We first derive the optimal detector for Willie that minimizes the detection error probability (DEP) in both cases. For the first scenario, we derive an exact expression for the DEP and provide a computationally efficient approximation. For the second scenario, we introduce the DEP expressions in the low-signal-to-noise ratio (SNR) regime at Willie. Building on this analysis, we propose a novel low-complexity modality set selection technique designed to maximize the DEP subject to a rate constraint. Numerical simulations validate the derived analytical expressions and demonstrate that the proposed modality set selection technique achieves near-optimal performance, outperforming benchmark schemes.


[41] 2603.12187

Integrated Online Monitoring and Adaption of Process Model Predictive Controllers

This paper addresses the design of an event-triggered, data-based, and performance-oriented adaption method for model predictive control (MPC). The performance of such a strategy strongly depends on the accuracy of the prediction model, which may require online adaption to prevent performance degradation under changing operating conditions. Unlike existing methods that continuously update model and control parameters from data, potentially leading to catastrophic forgetting and unnecessary control modifications, we propose a novel approach based on statistical monitoring of closed-loop performance indicators. This framework enables the detection of performance degradation, and, when required, controller adaption is performed via reinforcement learning and identification techniques. The proposed strategy is validated on a high-fidelity simulation of a district heating system benchmark.


[42] 2603.12202

Technology configurations for decarbonizing residential heat supply through district heating and implications for the electricity network

District heating networks (DHNs) have significant potential to decarbonize residential heating and accelerate the energy transition. However, designing carbon-neutral DHNs requires balancing several objectives, including economic costs, social acceptance, long-term uncertainties, and grid-integration challenges from electrification. By combining modeling-to-generate-alternatives with power flow simulation techniques, we develop a decision-support method for designing carbon-neutral DHNs that are cost-effective, socially acceptable, robust to future risks, and impose minimal impacts on the electricity grid. Applying our method to a Dutch case, we find substantial diversity in how carbon-neutral DHNs can be designed. The flexibility in technology choice, sizing, and location enables accommodating different real-world needs and achieving high electrification levels without increasing grid loading. For instance, intelligently located heat pumps and thermal storage can limit grid stress even when renewable baseload heat sources and green-fuel boilers are scarce. Using our method, planners can explore diverse carbon-neutral DHN designs and identify the design that best balances stakeholders' preferences.


[43] 2603.12220

Conformalized Data-Driven Reachability Analysis with PAC Guarantees

Data-driven reachability analysis computes over-approximations of reachable sets directly from noisy data. Existing deterministic methods require either known noise bounds or system-specific structural parameters such as Lipschitz constants. We propose Conformalized Data-Driven Reachability (CDDR), a framework that provides Probably Approximately Correct (PAC) coverage guarantees through the Learn Then Test (LTT) calibration procedure, requiring only that calibration trajectories be independently and identically distributed. CDDR is developed for three settings: linear time-invariant (LTI) systems with unknown process noise distributions, LTI systems with bounded measurement noise, and general nonlinear systems including non-Lipschitz dynamics. Experiments on a 5-dimensional LTI system under Gaussian and heavy-tailed Student-t noise and on a 2-dimensional non-Lipschitz system with fractional damping demonstrate that CDDR achieves valid coverage where deterministic methods do not provide formal guarantees. Under anisotropic noise, a normalized score function reduces the reachable set volume while preserving the PAC guarantee.
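
To give a feel for calibration-based set inflation, the sketch below computes a standard split-conformal radius from calibration scores. This is a simpler relative of the method described: split conformal prediction yields marginal coverage under exchangeability, whereas the Learn Then Test procedure named in the abstract targets PAC-style guarantees and is not reproduced here. The function name `conformal_radius` and the scores are hypothetical.

```python
import math


def conformal_radius(scores, alpha):
    """Split-conformal threshold for miscoverage level alpha.

    scores: nonconformity scores from held-out calibration trajectories,
            e.g. the distance of each observed state from a nominal set.
    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when the calibration set is too small for the requested level.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


# Nine calibration scores (invented); inflate the nominal set by this radius.
radius = conformal_radius([3.0, 1.0, 2.0, 5.0, 4.0, 9.0, 7.0, 8.0, 6.0], 0.5)
```

The infinite return value makes explicit that a small calibration set cannot certify a very high coverage level.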


[44] 2603.11055

Wide-Area GNSS Spoofing and Jamming Detection Using AIS-Derived Spatiotemporal Integrity Monitoring

Global Navigation Satellite System (GNSS) spoofing and jamming threaten maritime navigation by corrupting positions from Automatic Identification System (AIS) transponders. Crucially, raw AIS messages contain communication-layer defects (duplicated MMSIs, timestamp errors, stale retransmissions, and multi-station rebroadcast delays) that can mimic spoofing or jamming. Thus, AIS positions are unreliable without pre-filtering. We propose a three-stage AIS-based framework that (1) uses rule-based diagnostics to discard communication faults, (2) applies an interacting multiple model filter and transmission-interval analysis to extract kinematic-consistency and continuity anomalies, and (3) applies spatiotemporal DBSCAN to group anomalies by multi-vessel coherence and temporal persistence and classify them as sensor faults, spoofing, or jamming. Tested on approximately 966 million AIS messages from Korean coastal waters, the framework detected 17 spoofing and 343 jamming clusters and reduced false alarms by 98.6% relative to naive clustering. These results show that, after rigorous pre-filtering, AIS data can enable wide-area GNSS interference detection without dedicated sensors.


[45] 2603.11074

DRAFTO: Decoupled Reduced-space and Adaptive Feasibility-repair Trajectory Optimization for Robotic Manipulators

This paper introduces a new algorithm for trajectory optimization, Decoupled Reduced-space and Adaptive Feasibility-repair Trajectory Optimization (DRAFTO). It first constructs a constrained objective that accounts for smoothness, safety, joint limits, and task requirements. Then, it optimizes the coefficients, which are the coordinates of a set of basis functions for trajectory parameterization. To reduce the number of repeated constrained optimizations while handling joint-limit feasibility, the optimization is decoupled into a reduced-space Gauss-Newton (GN) descent for the main iterations and constrained quadratic programming for initialization and terminal feasibility repair. The two-phase acceptance rule with a non-monotone policy is applied to the GN model, which uses a hinge-squared penalty for inequality constraints, to ensure globalizability. The results of our benchmark tests against optimization-based planners, such as CHOMP, TrajOpt, GPMP2, and FACTO, and sampling-based planners, such as RRT-Connect, RRT*, and PRM, validate the high efficiency and reliability across diverse scenarios and tasks. The experiment involving grabbing an object from a drawer further demonstrates the potential for implementation in complex manipulation tasks. The supplemental video is available at this https URL.


[46] 2603.11077

TATIC: Task-Aware Temporal Learning for Human Intent Inference from Physical Corrections in Human-Robot Collaboration

In human-robot collaboration (HRC), robots must adapt online to dynamic task constraints and evolving human intent. While physical corrections provide a natural, low-latency channel for operators to convey motion-level adjustments, extracting task-level semantic intent from such brief interactions remains challenging. Existing foundation-model-based approaches primarily rely on vision and language inputs and lack mechanisms to interpret physical feedback. Meanwhile, traditional physical human-robot interaction (pHRI) methods leverage physical corrections for trajectory guidance but struggle to infer task-level semantics. To bridge this gap, we propose TATIC, a unified framework that utilizes torque-based contact force estimation and a task-aware Temporal Convolutional Network (TCN) to jointly infer discrete task-level intent and estimate continuous motion-level parameters from brief physical corrections. Task-aligned feature canonicalization ensures robust generalization across diverse layouts, while an intent-driven adaptation scheme translates inferred human intent into robot motion adaptations. Experiments achieve a 0.904 Macro-F1 score in intent recognition and demonstrate successful hardware validation in collaborative disassembly (see experimental video at this https URL).


[47] 2603.11089

V2A-DPO: Omni-Preference Optimization for Video-to-Audio Generation

This paper introduces V2A-DPO, a novel Direct Preference Optimization (DPO) framework tailored for flow-based video-to-audio generation (V2A) models, incorporating key adaptations to effectively align generated audio with human preferences. Our approach incorporates three core innovations: (1) AudioScore, a comprehensive human preference-aligned scoring system for assessing semantic consistency, temporal alignment, and perceptual quality of synthesized audio; (2) an automated AudioScore-driven pipeline for generating large-scale preference pair data for DPO optimization; (3) a curriculum learning-empowered DPO optimization strategy specifically tailored for flow-based generative models. Experiments on the benchmark VGGSound dataset demonstrate that human-preference aligned Frieren and MMAudio using V2A-DPO outperform their counterparts optimized using Denoising Diffusion Policy Optimization (DDPO) as well as pre-trained baselines. Furthermore, our DPO-optimized MMAudio achieves state-of-the-art performance across multiple metrics, surpassing published V2A models.


[48] 2603.11095

Multimodal Self-Attention Network with Temporal Alignment for Audio-Visual Emotion Recognition

Audio-visual emotion recognition (AVER) methods typically fuse utterance-level features, and even frame-level attention models seldom address the frame-rate mismatch across modalities. In this paper, we propose a Transformer-based framework focusing on the temporal alignment of multimodal features. Our design employs a multimodal self-attention encoder that simultaneously captures intra- and inter-modal dependencies within a shared feature space. To address heterogeneous sampling rates, we incorporate Temporally-aligned Rotary Position Embeddings (TaRoPE), which implicitly synchronize audio and video tokens. Furthermore, we introduce a Cross-Temporal Matching (CTM) loss that enforces consistency among temporally proximate pairs, guiding the encoder toward better alignment. Experiments on CREMA-D and RAVDESS datasets demonstrate consistent improvements over recent baselines, suggesting that explicitly addressing frame-rate mismatch helps preserve temporal cues and enhances cross-modal fusion.


[49] 2603.11230

Monitoring and Prediction of Mood in Elderly People during Daily Life Activities

We present an intelligent wearable system to monitor and predict mood states of elderly people during their daily life activities. Our system is composed of a wristband to record different physiological activities together with a mobile app for ecological momentary assessment (EMA). Machine learning is used to train a classifier to automatically predict different mood states based on the smart band only. Our approach shows promising results on mood accuracy and provides results comparable with the state of the art in the specific detection of happiness and activeness.


[50] 2603.11260

Irreversible Port-Hamiltonian Formulations for 1-Dimensional fluid systems

The Irreversible Port-Hamiltonian Systems (IPHS) framework is extended to the modelling of non-isentropic fluids with viscous dissipation in the Eulerian description. Building on earlier IPHS formulations for diffusion-driven and non-convective distributed systems, it is shown that convective transport can be consistently encompassed by the framework by modifying the underlying differential operators. After revisiting the constitutive relations of non-isentropic fluids in both Eulerian and Lagrangian coordinates, it is demonstrated how these systems fit within an extended IPHS formulation. Furthermore, an extended parametrisation of the boundary port variables, which ensures that the first and second laws of thermodynamics are fulfilled, allows a general class of boundary-controlled IPHS to be defined.


[51] 2603.11328

Distributed Kalman-Consensus Filtering with Adaptive Uncertainty Weighting for Multi-Object Tracking in Mobile Robot Networks

This paper presents an implementation and evaluation of a Distributed Kalman-Consensus Filter (DKCF) for Multi-Object Tracking (MOT) in mobile robot networks operating under partial observability and heterogeneous localization uncertainty. A key challenge in such systems is the fusion of information from agents with differing localization quality, where frame misalignment can lead to inconsistent estimates, track duplication, and ghost tracks. To address this issue, we build upon the MOTLEE framework and retain its frame-alignment methodology, which uses consistently tracked dynamic objects as transient landmarks to improve relative pose estimates between robots. On top of this framework, we propose an uncertainty-aware adaptive consensus weighting mechanism that dynamically adjusts the influence of neighbor information based on the covariance of the transmitted estimates, thereby reducing the impact of unreliable data during distributed fusion. Local tracking is performed using a Kalman Filter (KF) with a Constant Velocity Model (CVM) and Global Nearest Neighbor (GNN) data association. Simulation results demonstrate that adaptive weighting effectively protects local estimates from inconsistent data, yielding a MOTA improvement of 0.09 for agents suffering from localization drift, although system performance remains constrained by communication latency.
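
One simple way to let covariance control a neighbor's influence is to weight each neighbor's estimate by the inverse trace of its covariance. The sketch below shows such a consensus step; the abstract does not give the actual weighting rule, so the inverse-trace choice, the gain `epsilon`, the function name `consensus_update`, and all numbers are assumptions made for illustration.

```python
def trace(matrix):
    """Sum of the diagonal entries of a square matrix given as nested lists."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def consensus_update(x_local, neighbors, epsilon=0.5):
    """Pull a local estimate toward neighbor estimates, trusting confident ones.

    x_local:   local state estimate as a list of floats.
    neighbors: list of (x_j, P_j) pairs, where P_j is the covariance of x_j.
    epsilon:   consensus gain (assumed value, not taken from the paper).
    """
    if not neighbors:
        return list(x_local)
    inverse_traces = [1.0 / trace(P) for _, P in neighbors]
    total = sum(inverse_traces)
    weights = [w / total for w in inverse_traces]  # normalised, sum to 1
    updated = []
    for k, x_k in enumerate(x_local):
        pull = sum(w * (x_j[k] - x_k) for w, (x_j, _) in zip(weights, neighbors))
        updated.append(x_k + epsilon * pull)
    return updated


# A confident neighbor near the local estimate and an uncertain, distant one.
identity = [[1.0, 0.0], [0.0, 1.0]]
inflated = [[100.0, 0.0], [0.0, 100.0]]
fused = consensus_update([0.0, 0.0],
                         [([1.0, 1.0], identity), ([10.0, 10.0], inflated)])
```

With these numbers the uncertain neighbor receives roughly 1% of the weight, so the fused estimate stays close to the value suggested by the confident neighbor.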


[52] 2603.11360

Fair-Gate: Fairness-Aware Interpretable Risk Gating for Sex-Fair Voice Biometrics

Voice biometric systems can exhibit sex-related performance gaps even when overall verification accuracy is strong. We attribute these gaps to two practical mechanisms: (i) demographic shortcut learning, where speaker classification training exploits spurious correlations between sex and speaker identity, and (ii) feature entanglement, where sex-linked acoustic variation overlaps with identity cues and cannot be removed without degrading speaker discrimination. We propose Fair-Gate, a fairness-aware and interpretable risk-gating framework that addresses both mechanisms in a single pipeline. Fair-Gate applies risk extrapolation to reduce variation in speaker-classification risk across proxy sex groups, and introduces a local complementary gate that routes intermediate features into an identity branch and a sex branch. The gate provides interpretability by producing an explicit routing mask that can be inspected to understand which features are allocated to identity versus sex-related pathways. Experiments on VoxCeleb1 show that Fair-Gate improves the utility-fairness trade-off, yielding more sex-fair ASV performance under challenging evaluation conditions.


[53] 2603.11378

Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data

We investigate continued pretraining (CPT) for adapting wav2vec2-bert-2.0 to Swahili automatic speech recognition (ASR). Our approach combines unlabeled audio with limited labeled data through pseudo-labeled CPT followed by supervised finetuning. With 20,000 labeled samples, we achieve 3.24% WER on Common Voice Swahili, an 82% relative improvement over the baseline. This result surpasses the best previously reported academic system (8.3% WER from XLS-R) by a relative margin of 61%. We provide concrete data requirements and a replicable methodology applicable to other low-resource languages.
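
The relative figures quoted above follow from simple arithmetic, sketched below. The helper name `relative_wer_reduction` is hypothetical. The baseline WER is not stated in the abstract, so `implied_baseline_wer` is only what the rounded 82% figure would imply.

```python
def relative_wer_reduction(reference_wer, new_wer):
    """Relative WER reduction of new_wer with respect to reference_wer."""
    return (reference_wer - new_wer) / reference_wer


# Comparison against the previously reported XLS-R system (8.3% WER).
reduction_vs_xlsr = relative_wer_reduction(8.3, 3.24)
print(round(reduction_vs_xlsr, 2))  # 0.61

# Baseline WER implied by an 82% relative improvement (an inference, not a reported number).
implied_baseline_wer = 3.24 / (1.0 - 0.82)
print(round(implied_baseline_wer, 1))  # 18.0
```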


[54] 2603.11390

SliceFed: Federated Constrained Multi-Agent DRL for Dynamic Spectrum Slicing in 6G

Dynamic spectrum slicing is a critical enabler for 6G Radio Access Networks (RANs), allowing the coexistence of heterogeneous services. However, optimizing resource allocation in dense, interference-limited deployments remains challenging due to non-stationary channel dynamics, strict Quality-of-Service (QoS) requirements, and the need for data privacy. In this paper, we propose SliceFed, a novel Federated Constrained Multi-Agent Deep Reinforcement Learning (F-MADRL) framework. SliceFed formulates the slicing problem as a Constrained Markov Decision Process (CMDP) where autonomous gNB agents maximize spectral efficiency while explicitly satisfying inter-cell interference budgets and hard ultra-reliable low-latency communication (URLLC) latency deadlines. We employ a Lagrangian primal-dual approach integrated with Proximal Policy Optimization (PPO) to enforce constraints, while Federated Averaging enables collaborative learning without exchanging raw local data. Extensive simulations in a dense multi-cell environment demonstrate that SliceFed converges to a stable, safety-aware policy. Unlike heuristic and unconstrained baselines, SliceFed achieves nearly 100% satisfaction of 1 ms URLLC latency deadlines and exhibits superior robustness to traffic load variations, verifying its potential for reliable and scalable 6G spectrum management.
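
A generic Lagrangian primal-dual scheme for a CMDP can be sketched as follows. This is the textbook form, not necessarily SliceFed's exact update: the policy is trained on a penalized reward, and each multiplier is raised when its constraint cost exceeds the budget. The function names and `step_size` are illustrative assumptions.

```python
def lagrangian_reward(reward, costs, multipliers):
    """Penalized reward fed to the policy optimizer (for example PPO)."""
    return reward - sum(lam * cost for lam, cost in zip(multipliers, costs))


def dual_ascent_step(multipliers, average_costs, budgets, step_size=0.05):
    """Projected gradient ascent on the multipliers; they stay non-negative."""
    return [
        max(0.0, lam + step_size * (cost - budget))
        for lam, cost, budget in zip(multipliers, average_costs, budgets)
    ]
```

A violated constraint (average cost above its budget) increases the corresponding multiplier, which makes further violations more expensive for the policy in the next round of updates.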


[55] 2603.11473

Slack More, Predict Better: Proximal Relaxation for Probabilistic Latent Variable Model-based Soft Sensors

Nonlinear Probabilistic Latent Variable Models (NPLVMs) are a cornerstone of soft sensor modeling due to their capacity for uncertainty delineation. However, conventional NPLVMs are trained using amortized variational inference, where neural networks parameterize the variational posterior. While facilitating model implementation, this parameterization converts the distributional optimization problem within an infinite-dimensional function space to parameter optimization within a finite-dimensional parameter space, which introduces an approximation error gap, thereby degrading soft sensor modeling accuracy. To alleviate this issue, we introduce KProxNPLVM, a novel NPLVM that pivots to relaxing the objective itself and improving the NPLVM's performance. Specifically, we first prove the approximation error induced by the conventional approach. Based on this, we design the Wasserstein distance as the proximal operator to relax the learning objective, yielding a new variational inference strategy derived from solving this relaxed optimization problem. Based on this foundation, we provide a rigorous derivation of KProxNPLVM's optimization implementation, prove that our algorithm converges and can ultimately sidestep the approximation error, and combine these components into the proposed KProxNPLVM. Finally, extensive experiments on synthetic and real-world industrial datasets are conducted to demonstrate the efficacy of the proposed KProxNPLVM.


[56] 2603.11482

AnimeScore: A Preference-Based Dataset and Framework for Evaluating Anime-Like Speech Style

Evaluating 'anime-like' voices currently relies on costly subjective judgments, yet no standardized objective metric exists. A key challenge is that anime-likeness, unlike naturalness, lacks a shared absolute scale, making conventional Mean Opinion Score (MOS) protocols unreliable. To address this gap, we propose AnimeScore, a preference-based framework for automatic anime-likeness evaluation via pairwise ranking. We collect 15,000 pairwise judgments from 187 evaluators with free-form descriptions, and acoustic analysis reveals that perceived anime-likeness is driven by controlled resonance shaping, prosodic continuity, and deliberate articulation rather than simple heuristics such as high pitch. We show that handcrafted acoustic features reach a 69.3% AUC ceiling, while SSL-based ranking models achieve up to 90.8% AUC, providing a practical metric that can also serve as a reward signal for preference-based optimization of generative speech models.
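
Pairwise preference learning is often formulated with a Bradley-Terry style model, sketched below. The abstract does not state which ranking loss AnimeScore uses, so the logistic form and the function names here are illustrative assumptions.

```python
import math


def preference_probability(score_a, score_b):
    """Modeled probability that sample A is judged more anime-like than sample B."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pairwise_ranking_loss(score_preferred, score_other):
    """Negative log-likelihood of one observed pairwise judgment."""
    return -math.log(preference_probability(score_preferred, score_other))


def pairwise_accuracy(scored_pairs):
    """Fraction of (preferred_score, other_score) pairs ranked in the judged order."""
    correct = sum(1 for preferred, other in scored_pairs if preferred > other)
    return correct / len(scored_pairs)
```

Because only score differences enter the model, no shared absolute scale is needed, which is the property the abstract identifies as missing from MOS protocols for this attribute.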


[57] 2603.11547

Forward and Backward Reachability Analysis of Closed-loop Recurrent Neural Networks via Hybrid Zonotopes

Recurrent neural networks (RNNs) are widely employed to model complex dynamical systems due to their hidden-state structure, which inherently captures temporal dependencies. This work presents a hybrid zonotope-based approach for computing exact forward and backward reachable sets of closed-loop RNN systems with ReLU activation functions. The method formulates state-pair sets to compute reachable sets as hybrid zonotopes without requiring unrolling. To improve scalability, a tunable relaxation scheme is proposed that ranks unstable ReLU units across all layers using a triangle-area score and selectively applies convex relaxations within a fixed binary limit in the hybrid zonotopes. This scheme enables an explicit tradeoff between computational complexity and approximation accuracy, with exact reachability as a special case. In addition, a sufficient condition is derived to certify the safety of closed-loop RNN systems. Numerical examples demonstrate the effectiveness of the proposed approach.
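
The triangle-area ranking can be sketched under a stated assumption. For an unstable ReLU with pre-activation bounds l < 0 < u, the standard triangle relaxation has vertices (l, 0), (0, 0), and (u, u), so its area is -l·u/2. The sketch assumes that the units with the largest areas keep their exact binary encoding, up to the binary limit, and the rest are relaxed; the abstract does not spell out the selection policy, and the names below are hypothetical.

```python
def triangle_area(lower, upper):
    """Area of the triangle relaxation of ReLU on [lower, upper]; zero for stable units."""
    if lower >= 0.0 or upper <= 0.0:
        return 0.0
    return -lower * upper / 2.0


def split_units(bounds, binary_limit):
    """Return (exact_indices, relaxed_indices) for the unstable units in `bounds`.

    Assumption: the units whose relaxation would be loosest stay exact.
    """
    scored = sorted(
        ((triangle_area(l, u), index) for index, (l, u) in enumerate(bounds) if l < 0.0 < u),
        reverse=True,
    )
    exact = [index for _, index in scored[:binary_limit]]
    relaxed = [index for _, index in scored[binary_limit:]]
    return exact, relaxed
```

Setting `binary_limit` to at least the number of unstable units leaves nothing relaxed, which corresponds to the exact-reachability special case mentioned in the abstract.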


[58] 2603.11723

Exploiting Parallelism in a QPALM-based Solver for Optimal Control

We discuss the opportunities for parallelization in the recently proposed QPALM-OCP algorithm, a solver tailored to quadratic programs arising in optimal control. A significant part of the computational work can be carried out independently for the different stages in the optimal control problem. We exploit this specific structure to apply parallelization and vectorization techniques in an optimized C++ implementation of the method. Results for optimal control benchmark problems and comparisons to the original QPALM method are provided.


[59] 2603.11876

On the Possible Detectability of Image-in-Image Steganography

This paper investigates the detectability of popular image-in-image steganography schemes [1, 2, 3, 4, 5]. In this paradigm, the payload is usually an image of the same size as the Cover image, leading to very high embedding rates. We first show that the embedding yields a mixing process that is easily identifiable by independent component analysis. We then propose a simple, interpretable steganalysis method based on the first four moments of the independent components estimated from the wavelet decomposition of the images, which are used to distinguish between the distributions of Cover and Stego components. Experimental results demonstrate the efficiency of the proposed method, with eight-dimensional input vectors attaining up to 84.6% accuracy. This vulnerability analysis is supported by two other facts: the use of keyless extraction networks and the high detectability w.r.t. classical steganalysis methods, such as the SRM combined with support vector machines, which attains over 99% accuracy.


[60] 2603.11947

Resurfacing Paralinguistic Awareness in Large Audio Language Models

Large Audio Language Models (LALMs) have expanded interaction with humans to the speech modality, which introduces great interactive potential because paralinguistic cues implicitly indicate the user context. However, building on the current content-centred paradigm, LALMs usually neglect such paralinguistic cues and respond solely based on query content. In this work, to resurface paralinguistic awareness in LALMs, we introduce five diverse layer-wise analyses to jointly identify paralinguistic layers and semantic understanding layers. Based on these insights, we propose a paralinguistic-enhanced fine-tuning (PE-FT) protocol to equip LALMs with paralinguistic-aware capabilities, including (1) selective-layer fine-tuning, and (2) an auxiliary dual-level classification head. Our experiments demonstrate that the PE-FT protocol efficiently and effectively resurfaces paralinguistic awareness, even surpassing the performance of the all-layer fine-tuning strategy.


[61] 2603.11982

Approximate Reduced Lindblad Dynamics via Algebraic and Adiabatic Methods

We present an algebraic framework for approximate model reduction of Markovian open quantum dynamics that guarantees complete positivity and trace preservation by construction. First, we show that projecting a Lindblad generator on its center manifold -- the space spanned by eigenoperators with purely imaginary eigenvalues -- yields an asymptotically exact reduced quantum dynamical semigroup whose dynamics is unitary, with exponentially decaying transient error controlled by the generator's spectral gap. Second, for analytic perturbations of a Lindblad generator with a tractable center manifold, we propose a perturbative reduction that keeps the reduced space fixed at the unperturbed center manifold. The resulting generator is shown to remain a valid Lindbladian for arbitrary perturbation strengths, and explicit finite-time error bounds, which quantify leakage from the unperturbed center sector, are provided. We further clarify the connection to adiabatic elimination methods, both by showing how the algebraic reduction can be directly related to a first-order adiabatic elimination and by providing sufficient conditions under which the latter method can be applied while preserving complete positivity. We showcase the usefulness of our techniques in dissipative many-body quantum systems exhibiting non-stationary long-time dynamics.


[62] 2603.12059

Flight through Narrow Gaps with Morphing-Wing Drones

The size of a narrow gap traversable by a fixed-wing drone is limited by its wingspan. Inspired by birds, here, we enable the traversal of a gap of sub-wingspan width and height using a morphing-wing drone capable of temporarily sweeping in its wings mid-flight. This maneuver poses control challenges due to sudden lift loss during gap-passage at low flight speeds and the need for precisely timed wing-sweep actuation ahead of the gap. To address these challenges, we first develop an aerodynamic model for general wing-sweep morphing drone flight including low flight speeds and post-stall angles of attack. We integrate longitudinal drone dynamics into an optimal reference trajectory generation and Nonlinear Model Predictive Control framework with runtime adaptive costs and constraints. Validated on a 130 g wing-sweep-morphing drone, our method achieves an average altitude error of 5 cm during narrow-gap passage at forward speeds between 5 and 7 m/s, whilst enforcing fully swept wings near the gap across variable threshold distances. Trajectory analysis shows that the drone can compensate for lift loss during gap-passage by accelerating and pitching upwards ahead of the gap to an extent that differs between reference trajectory optimization objectives. We show that our strategy also allows for accurate gap passage on hardware whilst maintaining a constant forward flight speed reference and near-constant altitude.


[63] 2603.12069

Numerical benchmark for damage identification in Structural Health Monitoring

The availability of a dataset for validation and verification purposes of novel data-driven strategies and/or hybrid physics-data approaches is currently one of the most pressing challenges in the engineering field. Data ownership, security, access and metadata handiness are currently hindering advances across many fields, particularly in Structural Health Monitoring (SHM) applications. This paper presents a simulated SHM dataset, comprised of dynamic and static measurements (i.e., acceleration and displacement), and includes the conceptual framework designed to generate it. The simulated measurements were generated to incorporate the effects of Environmental and Operational Variations (EOVs), different types of damage, measurement noise and sensor faults and malfunctions, in order to account for scenarios that may occur during real acquisitions. A fixed-fixed steel beam structure was chosen as reference for the numerical benchmark. The simulated monitoring was operated under the assumptions of a Single Degree of Freedom (SDOF) for generating acceleration records and of the Euler-Bernoulli beam for the simulated displacement measurements. The generation process involved the use of parallel computation, which is detailed within the provided open-source code. The generated data is also available open-source, thus ensuring reproducibility, repeatability and accessibility for further research. The comprehensive description of data types, formats, and collection methodologies makes this dataset a valuable resource for researchers aiming to develop or refine SHM techniques, fostering advancements in the field through accessible, high-quality synthetic data.


[64] 2603.12075

Decentralized Cooperative Localization for Multi-Robot Systems with Asynchronous Sensor Fusion

Decentralized cooperative localization (DCL) is a promising approach for nonholonomic mobile robots operating in GPS-denied environments with limited communication infrastructure. This paper presents a DCL framework in which each robot performs localization locally using an Extended Kalman Filter, while sharing measurement information during update stages only when communication links are available and companion robots are successfully detected by LiDAR. The framework preserves cross-correlation consistency among robot state estimates while handling asynchronous sensor data with heterogeneous sampling rates and accommodating accelerations during dynamic maneuvers. Unlike methods that require pre-aligned coordinate systems, the proposed approach allows robots to initialize with arbitrary reference-frame orientations and achieves automatic alignment through transformation matrices in both the prediction and update stages. To improve robustness in feature-sparse environments, we introduce a dual-landmark evaluation framework that exploits both static environmental features and mobile robots as dynamic landmarks. The proposed framework enables reliable detection and feature extraction during sharp turns, while prediction accuracy is improved through information sharing from mutual observations. Experimental results in both Gazebo simulation and real-world basement environments show that DCL outperforms centralized cooperative localization (CCL), achieving a 34% reduction in RMSE, while the dual-landmark variant yields an improvement of 56%. These results demonstrate the applicability of DCL to challenging domains such as enclosed spaces, underwater environments, and feature-sparse terrains where conventional localization methods are ineffective.


[65] 2603.12083

Towards Universal Computational Aberration Correction in Photographic Cameras: A Comprehensive Benchmark Analysis

Prevalent Computational Aberration Correction (CAC) methods are typically tailored to specific optical systems, leading to poor generalization and labor-intensive re-training for new lenses. Developing CAC paradigms capable of generalizing across diverse photographic lenses offers a promising solution to these challenges. However, efforts to achieve such cross-lens universality within consumer photography are still in their early stages due to the lack of a comprehensive benchmark that encompasses a sufficiently wide range of optical aberrations. Furthermore, it remains unclear which specific factors influence existing CAC methods and how these factors affect their performance. In this paper, we present comprehensive experiments and evaluations involving 24 image restoration and CAC algorithms, utilizing our newly proposed UniCAC, a large-scale benchmark for photographic cameras constructed via automatic optical design. The Optical Degradation Evaluator (ODE) is introduced as a novel framework to objectively assess the difficulty of CAC tasks, offering credible quantification of optical aberrations and enabling reliable evaluation. Drawing on our comparative analysis, we identify three key factors -- prior utilization, network architecture, and training strategy -- that most significantly influence CAC performance, and further investigate their respective effects. We believe that our benchmark, dataset, and observations contribute foundational insights to related areas and lay the groundwork for future investigations. Benchmarks, codes, and Zemax files will be available at this https URL.


[66] 2603.12144

O3N: Omnidirectional Open-Vocabulary Occupancy Prediction

Understanding and reconstructing the 3D world through omnidirectional perception is an inevitable trend in the development of autonomous agents and embodied intelligence. However, existing 3D occupancy prediction methods are constrained by limited perspective inputs and predefined training distribution, making them difficult to apply to embodied agents that require comprehensive and safe perception of scenes in open world exploration. To address this, we present O3N, the first purely visual, end-to-end Omnidirectional Open-vocabulary Occupancy predictioN framework. O3N embeds omnidirectional voxels in a polar-spiral topology via the Polar-spiral Mamba (PsM) module, enabling continuous spatial representation and long-range context modeling across 360°. The Occupancy Cost Aggregation (OCA) module introduces a principled mechanism for unifying geometric and semantic supervision within the voxel space, ensuring consistency between the reconstructed geometry and the underlying semantic structure. Moreover, Natural Modality Alignment (NMA) establishes a gradient-free alignment pathway that harmonizes visual features, voxel embeddings, and text semantics, forming a consistent "pixel-voxel-text" representation triad. Extensive experiments on multiple models demonstrate that our method not only achieves state-of-the-art performance on QuadOcc and Human360Occ benchmarks but also exhibits remarkable cross-scene generalization and semantic scalability, paving the way toward universal 3D world modeling. The source code will be made publicly available at this https URL.


[67] 2303.16278

Hybrid RIS-Assisted MIMO Dual-Function Radar-Communication System

Dual-function radar-communication (DFRC) technology is emerging in next-generation wireless systems. Reconfigurable intelligent surface (RIS) arrays have been suggested as a crucial sensor component of the DFRC. In this paper, we propose a hybrid RIS (HRIS)-assisted multiple-input multiple-output (MIMO) DFRC system, where the HRIS is capable of reflecting communication signals to mobile users and receiving the scattering signal reflected from the radar target simultaneously. Under such a scenario, we are interested in characterizing the fundamental trade-off between radar sensing and communication. Specifically, we study the joint design of the beamforming vectors at the base station (BS) and the parameter configuration of the HRIS so as to maximize the signal-to-interference-and-noise ratio (SINR) of the radar while guaranteeing a communication SINR requirement. To solve the formulated non-convex beamforming design problem, we propose an efficient alternating optimization approach. In particular, for fixed beams at the BS, we use a fast grid search-assisted auto gradient descent (FGS-AGD) algorithm to seek the best HRIS configuration; then, a closed-form BS beamforming solution is obtained using semidefinite relaxation. Numerical results indicate that compared with benchmark schemes, the proposed approach is capable of improving the radar performance and communication quality significantly and simultaneously.


[68] 2406.04762

Holographic Intelligence Surface Assisted Integrated Sensing and Communication

Traditional discrete-array-based systems fail to exploit interactions between closely spaced antennas, resulting in inadequate utilization of the aperture resource. In this paper, we propose a holographic intelligence surface (HIS) assisted integrated sensing and communication (HISAC) system, wherein both the transmitter and receiver are fabricated using a continuous-aperture array. A continuous-discrete transformation of the HIS pattern based on the Fourier transform is proposed, converting the continuous pattern design into a discrete beamforming design. We formulate a joint transmit-receive beamforming optimization problem for the HISAC system, aiming to balance the performance of multi-target sensing while fulfilling the performance requirement of multi-user communication. To solve the non-convex problem with coupled variables, an alternating optimization-based algorithm is proposed to optimize the HISAC transmit-receive beamforming in an alternate manner. Specifically, the transmit beamforming design is solved by decoupling into a series of feasibility-checking sub-problems while the receive beamforming is determined by the Rayleigh quotient-based method. Simulation results demonstrate the superiority of the proposed HISAC system over traditional discrete-array-based ISAC systems, achieving significantly higher sensing performance while guaranteeing predetermined communication performance.


[69] 2412.07212

Distributed Koopman Learning using Partial Trajectories for Control

This paper proposes a distributed data-driven framework for dynamics learning, termed distributed deep Koopman learning using partial trajectories (DDKL-PT). In this framework, each agent in a multi-agent system is assigned a partial trajectory offline and locally approximates the unknown dynamics using a deep neural network within the Koopman operator framework. By exchanging local estimated dynamics rather than training data, agents achieve consensus on a global dynamics model without sharing their private training trajectories. Simulation studies on a surface vehicle demonstrate that DDKL-PT achieves consensus on the learned dynamics, and each agent attains reasonably small approximation errors on the testing dataset. Furthermore, a model predictive control scheme is developed by integrating the learned Koopman dynamics with known kinematic relations. Results on a reference-tracking task indicate that the distributedly learned dynamics are sufficiently accurate for model-based optimal control.


[70] 2412.09019

Operator Learning for Robust Stabilization of Linear Markov-Jumping Hyperbolic PDEs

This paper addresses the problem of robust stabilization for linear hyperbolic Partial Differential Equations (PDEs) with Markov-jumping parameter uncertainty. We consider a 2 x 2 heterogeneous hyperbolic PDE and propose a control law using operator learning and the backstepping method. Specifically, the backstepping kernels used to construct the control law are approximated with neural operators (NO) in order to improve computational efficiency. The key challenge lies in deriving the stability conditions with respect to the Markov-jumping parameter uncertainty and NO approximation errors. The mean-square exponential stability of the stochastic system is achieved through Lyapunov analysis, indicating that the system can be stabilized if the random parameters are sufficiently close to the nominal parameters on average, and NO approximation errors are small enough. The theoretical results are applied to freeway traffic control under stochastic upstream demands and then validated through numerical simulations.


[71] 2502.03863

Compact Nested Hexagonal Metamaterial Sensor for High-Sensitivity Permittivity Characterization Across S and X-Band Frequencies

This article presents a Compact Nested Hexagonal Metamaterial Sensor designed for microwave sensing to characterize material permittivity in S and X-band applications. The proposed sensor attains compact dimensions of merely 30 mm x 30 mm x 0.79 mm. This innovative design technique employs a distinctive and compact architecture with elevated electromagnetic (EM) field strength, enhancing the precision of the sensing mechanism in the microwave frequency spectrum. The design geometry and dimensions attain resonance frequencies of 3.98 GHz and 11.57 GHz, with notch depths of -13.16129 dB and -10.23024 dB, respectively. The design evolution, metamaterial properties, equivalent circuit model, and electric (E) field are delineated to elucidate the stopband features at the resonant frequency. The suggested sensor attains a very high sensitivity of 9.55% in transmission mode (S21) for a permittivity range of 1 to 6. The reflection and transmission properties of the proposed CRR-based sensor are validated by simulations using the mathematical equations of the design. Furthermore, the sensor's performance is corroborated using several dielectric materials (Rogers RO4350B, Rogers RT5880 and FR-4). The computed outcomes demonstrate alignment with the simulated results. The compact, low-profile sensor design and its excellent sensitivity for characterizing material permittivity render the suggested sensor appropriate for permittivity sensing applications.


[72] 2505.17932

Geometric SSM: LTI State Space Models for Selective Tasks

A key claim in recent work on Selective State Space Models is that selectivity, the ability to focus on relevant information while filtering irrelevant inputs, requires breaking the Linear Time-Invariant (LTI) property through time-varying dynamics. We challenge this claim by demonstrating that LTI systems can achieve selectivity when designed using principles from geometric control. We introduce the Geometric SSM, in which different input patterns excite distinct invariant subspaces of the dynamics. Unlike Mamba's memoryless selection mechanism, our approach employs a dynamic residual generator that maintains temporal memory, enabling recognition of multi-token patterns without time-varying system matrices. The Geometric SSM achieves near-perfect performance on a novel extended induction head task where Mamba fails, while preserving efficient FFT-based training. Our results demonstrate that geometric control theory can inform the design of novel selective sequence models that combine theoretical rigor with practical efficiency.
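
The efficiency argument for LTI models rests on the fact that a time-invariant recurrence equals a causal convolution with a fixed kernel, which can then be evaluated with an FFT. The sketch below checks that equivalence for a generic LTI state space model with a diagonal state matrix. It does not implement the Geometric SSM itself; the diagonal structure and all names are illustrative assumptions, and the convolution is evaluated directly rather than via FFT to keep the example dependency-free.

```python
def ssm_kernel(a_diag, b, c, length):
    """Impulse response K[k] = sum_i c_i * a_i**k * b_i of a diagonal LTI SSM."""
    return [
        sum(ci * (ai ** k) * bi for ai, bi, ci in zip(a_diag, b, c))
        for k in range(length)
    ]


def ssm_recurrence(a_diag, b, c, inputs):
    """Run x_t = A x_{t-1} + B u_t, y_t = C x_t from a zero initial state."""
    state = [0.0] * len(a_diag)
    outputs = []
    for u in inputs:
        state = [ai * si + bi * u for ai, si, bi in zip(a_diag, state, b)]
        outputs.append(sum(ci * si for ci, si in zip(c, state)))
    return outputs


def causal_convolution(kernel, inputs):
    """y_t = sum_{k=0..t} K[k] * u_{t-k}; an FFT would compute the same values faster."""
    return [
        sum(kernel[k] * inputs[t - k] for k in range(t + 1))
        for t in range(len(inputs))
    ]
```

Time-varying system matrices break this equivalence, because the kernel then depends on the input position. This is why keeping the model LTI preserves the convolutional training path.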


[73] 2506.02841

Enhancing Sample Efficiency in Multi-Agent RL with Uncertainty Quantification and Selective Exploration

Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet, MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty in exploring over a large joint action space and the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning, incorporating several key contributions. The main component in our scheme is a selective exploration method that leverages ensemble kurtosis. We extend the global decomposed critic with a diversity-regularized ensemble of individual critics and utilize its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency, we train the centralized critic with a novel truncated variation of the TD($\lambda$) algorithm, enabling efficient off-policy learning with reduced variance. On the actor side, our suggested algorithm adapts the mixed samples approach to MARL, mixing on-policy and off-policy loss functions for training the actors. This approach balances stability and efficiency and outperforms purely off-policy learning. The evaluation shows our method outperforms state-of-the-art baselines on standard MARL benchmarks, including a variety of SMAC II maps.
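
The ensemble statistic mentioned above can be computed as follows. The sketch uses the population definition of excess kurtosis (fourth central moment over squared variance, minus 3) applied to the value estimates of an ensemble of critics for one state-action pair. How the statistic is turned into an exploration signal is not specified in the abstract, so only the statistic itself is shown, and the function name is hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis of a list of ensemble value estimates."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((v - mean) ** 2 for v in values) / n
    fourth_moment = sum((v - mean) ** 4 for v in values) / n
    if second_moment == 0.0:
        # All critics agree exactly; report zero rather than dividing by zero.
        return 0.0
    return fourth_moment / (second_moment ** 2) - 3.0
```

An ensemble in which one critic disagrees strongly with the others produces a heavier tail, and therefore a larger excess kurtosis, than an ensemble split evenly between two values.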


[74] 2509.13505

Identifying Network Structure of Nonlinear Dynamical Systems: Contraction and Kuramoto Oscillators

In this work, we study the identifiability of network structures (i.e., topologies) for networked nonlinear systems when partial measurements of the nodal dynamics are taken. We explore scenarios where different candidate structures can yield similar measurements, thus limiting identifiability. To do so, we apply the contraction theory framework to facilitate comparisons between different networks. We show that semicontraction in the observable space is a sufficient condition for two systems to become indistinguishable from one another based on partial measurements. We apply this framework to study networks of Kuramoto oscillators, and discuss scenarios in which different network structures (both connected and disconnected) become indistinguishable.
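
For readers unfamiliar with the model, a networked Kuramoto system can be simulated with a few lines of forward-Euler integration. The sketch assumes the common form dθ_i/dt = ω_i + Σ_j a_ij sin(θ_j − θ_i), with coupling strengths absorbed into the adjacency weights. The function names, step size, and integration scheme are illustrative choices, not taken from the paper.

```python
import math


def kuramoto_step(phases, natural_freqs, adjacency, dt):
    """One forward-Euler step of the networked Kuramoto model."""
    n = len(phases)
    derivatives = []
    for i in range(n):
        coupling = sum(
            adjacency[i][j] * math.sin(phases[j] - phases[i]) for j in range(n)
        )
        derivatives.append(natural_freqs[i] + coupling)
    return [theta + dt * d for theta, d in zip(phases, derivatives)]


def simulate(phases, natural_freqs, adjacency, dt=0.01, steps=2000):
    """Integrate the network for `steps` Euler steps and return the final phases."""
    for _ in range(steps):
        phases = kuramoto_step(phases, natural_freqs, adjacency, dt)
    return phases
```

Running `simulate` with two candidate adjacency matrices and comparing only the phases of the measured nodes gives a simple numerical check of whether the two structures can be told apart from partial measurements.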


[75] 2509.14065

Identifying Network Structure of Linear Dynamical Systems: Observability and Edge Misclassification

This work studies the limitations of uniquely identifying the structure (i.e., topology) of a networked linear system from partial measurements of its nodal dynamics. In general, many networks can be consistent with these measurements; this is a consideration often neglected by standard network inference methods. We show that these networks are related through the nullspace of the observability matrix for the true network. We establish relevant metrics to investigate this space, including an analytic characterization of the most structurally dissimilar network that can be inferred, as well as the possibility of mis-inferring the presence or absence of edges. In simulations, we find that when observing over 6% of nodes in random network models (e.g., Erdős-Rényi and Watts-Strogatz), approximately 99% of edges are correctly classified. Extending this discussion, we construct a family of networks that keep measurements $\epsilon$-close to each other, and connect the identifiability of these networks to the spectral properties of an augmented observability Gramian.


[76] 2510.14045

Multi-Period Sparse Optimization for Proactive Grid Blackout Diagnosis

Existing or planned power grids need to evaluate survivability under extreme events, like a number of peak load overloading conditions, which could possibly cause system collapses (i.e., blackouts). For realistic extreme events that are correlated or share similar patterns, it is reasonable to expect that the dominant vulnerability or failure sources behind them share the same locations but with different severity. Early warning diagnosis that proactively identifies the key vulnerabilities responsible for a number of system collapses of interest can significantly enhance resilience. This paper proposes a multi-period sparse optimization method, enabling the discovery of persistent failure sources across a sequence of collapsed systems with increasing system stress, such as rising demand or worsening contingencies. This work defines persistency and efficiently integrates persistency constraints to capture the "hidden" evolving vulnerabilities. Circuit-theory based power flow formulations and circuit-inspired optimization heuristics are used to facilitate the scalability of the method. Experiments on benchmark systems show that the method reliably tracks persistent vulnerability locations under increasing load stress, and solves with scalability to large systems (on average taking around 200 s per scenario on 2000+ bus systems).


[77] 2511.17895

Radiative-Structured Neural Operator for Continuous Spectral Super-Resolution

Spectral super-resolution (SSR) aims to reconstruct hyperspectral images (HSIs) from multispectral observations, with broad applications in computer vision and remote sensing. Deep learning-based methods have been widely used, but they often treat spectra as discrete vectors learned from data, rather than continuous curves constrained by physical principles, leading to unrealistic predictions and limited applicability. To address this challenge, we propose the Radiative-Structured Neural Operator (RSNO), which learns a continuous mapping for spectral super-resolution while enforcing physical consistency under the radiative prior. The proposed RSNO consists of three stages: upsampling, reconstruction, and refinement. In the upsampling stage, we leverage prior information to expand the input multispectral image, producing a physically plausible hyperspectral estimate. Subsequently, we adopt a neural operator backbone in the reconstruction stage to learn a continuous mapping across the spectral domain. Finally, the refinement stage imposes a hard constraint on the output HSI to eliminate color distortion. The upsampling and refinement stages are implemented via the proposed angular-consistent projection (ACP), which is derived from a non-convex optimization problem. Moreover, we theoretically demonstrate the optimality of ACP by null-space decomposition. Various experiments validate the effectiveness of the proposed approach in both discrete and continuous spectral super-resolution.


[78] 2601.01773

Joint Sparsity and Beamforming Design for RDARS-Aided Systems

Reconfigurable distributed antennas and reflecting surface (RDARS) has emerged as a promising architecture for communication and sensing performance enhancement. In particular, a new selection gain can be achieved by leveraging the dynamic working mode selection between connection and reflection modes, whereas low-complexity element configuration remains an open issue. In this paper, we consider an RDARS-assisted communication system, where the connected elements are formed as a uniform sparse array for simplified mode configuration while achieving an enlarged physical array aperture. The sum rate maximization problem is then formulated by jointly optimizing the active and passive beamforming matrices and the sparsity of the connected element array. For the special cases of a single user equipment (UE) and two UEs, the optimal sparsity designs are derived in closed form. Then, for an arbitrary number of UEs, a weighted minimum mean-square error-based alternating optimization (AO) algorithm is proposed to tackle the non-convex optimization problem. Numerical results demonstrate the importance of optimizing the sparsity and the effectiveness of low-complexity sparsity optimization.


[79] 2601.13799

Linear viscoelastic rheological FrBD models

In [1], a new modeling paradigm for developing rate-and-state-dependent, control-oriented friction models was introduced. The framework, termed Friction with Bristle Dynamics (FrBD), combines nonlinear analytical expressions for the friction coefficient with constitutive equations for bristle-like elements. Within the FrBD framework, this letter introduces two novel formulations based on the two most general linear viscoelastic models for solids: the Generalized Maxwell (GM) and Generalized Kelvin-Voigt (GKV) elements. Both are analyzed in terms of boundedness and passivity, revealing that these properties are satisfied for any physically meaningful parametrization. An application of passivity for control design is also illustrated, considering an example from robotics. The findings of this letter systematically integrate rate-and-state dynamic friction models with linear viscoelasticity.


[80] 2602.18899

[b]=[d]-[t]+[p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic

Self-supervised speech models (S3Ms) are known to encode rich phonetic information, yet how this information is structured remains underexplored. We conduct a comprehensive study across 96 languages to analyze the underlying structure of S3M representations, with particular attention to phonological vectors. We first show that there exist linear directions within the model's representation space that correspond to phonological features. We further demonstrate that the scale of these phonological vectors correlates with the degree of acoustic realization of their corresponding phonological features in a continuous manner. For example, the difference between [d] and [t] yields a voicing vector: adding this vector to [p] produces [b], while scaling it results in a continuum of voicing. Together, these findings indicate that S3Ms encode speech using phonologically interpretable and compositional vectors, demonstrating phonological vector arithmetic. All code and interactive demos are available at this https URL.
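The arithmetic in the title can be sketched with toy vectors. The three-dimensional embeddings below (axes loosely standing for voicing, labial place, and alveolar place) are hand-made assumptions for illustration only; they are not taken from any S3M, and the helper names are hypothetical.

```python
import math

# Hand-made toy embeddings: [voicing, labial, alveolar]. Not real model outputs.
phones = {
    "p": [0.0, 1.0, 0.0],
    "b": [1.0, 1.0, 0.0],
    "t": [0.0, 0.0, 1.0],
    "d": [1.0, 0.0, 1.0],
}

def nearest(vec):
    """Return the phone whose toy embedding is closest in Euclidean distance."""
    return min(phones, key=lambda name: math.dist(phones[name], vec))

def add_scaled(base, direction, scale):
    """Compute base + scale * direction element-wise."""
    return [b + scale * d for b, d in zip(base, direction)]

# Voicing direction: [d] - [t].
voicing = [d - t for d, t in zip(phones["d"], phones["t"])]

# [d] - [t] + [p] lands on [b] in this toy space.
full_step = nearest(add_scaled(phones["p"], voicing, 1.0))

# Scaling the direction moves along a continuum between [p] and [b].
small_step = nearest(add_scaled(phones["p"], voicing, 0.25))
large_step = nearest(add_scaled(phones["p"], voicing, 0.75))
```

With real representations one would presumably average frame-level features per phone before taking differences, but that step is not described in the abstract and is left out here.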


[81] 2603.03708

Scalable and Convergent Generalized Power Iteration Precoding for Massive MIMO Systems

In massive multiple-input multiple-output (MIMO) systems, achieving high spectral efficiency (SE) often requires advanced precoding algorithms whose complexity scales rapidly with the number of antennas, limiting practical deployment. In this paper, we develop a scalable and computationally efficient generalized power iteration precoding (GPIP) framework for massive MIMO systems under both perfect and imperfect channel state information at the transmitter (CSIT). By exploiting the low-dimensional subspace property of optimal precoders, we reformulate the high-dimensional beamforming problem into a lower-dimensional weight optimization that scales with the number of users rather than antennas. We further extend this framework to the imperfect CSIT scenario by showing that stationary solutions reside in a combined subspace spanned by the estimated channel and error covariance matrices, enabling a robust design via low-rank approximation. To reduce computational cost, we leverage the Sherman-Morrison formula to simplify matrix inversions. Moreover, interpreting the GPIP update as a projected preconditioned gradient ascent method, we establish convergence guarantees and develop a stable and monotonic algorithm using a backtracking line search. Numerical results demonstrate that the proposed methods achieve the highest SE performance compared to state-of-the-art linear precoders with significantly reduced complexity and convergence, highlighting their suitability for large-scale MIMO systems.
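For readers unfamiliar with the Sherman-Morrison formula mentioned above, it updates a known inverse after a rank-one change: (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u). The sketch below is a generic illustration of that identity with made-up matrices; how the GPIP framework applies it is not specified in the abstract, so none of this should be read as the authors' implementation.

```python
from fractions import Fraction

def mat_vec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} from A^{-1} without a fresh matrix inversion."""
    Au = mat_vec(A_inv, u)                            # A^{-1} u
    vA = mat_vec([list(c) for c in zip(*A_inv)], v)   # entries of v^T A^{-1}
    denom = 1 + sum(vi * a for vi, a in zip(v, Au))   # 1 + v^T A^{-1} u
    if denom == 0:
        raise ValueError("A + u v^T is singular; the update is undefined")
    n = len(u)
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]

# Illustrative values: A = 2I, so A^{-1} = I/2.
half = Fraction(1, 2)
A_inv = [[half, 0], [0, half]]
u = [1, 0]
v = [1, 1]

updated_inv = sherman_morrison(A_inv, u, v)

# Sanity check: (A + u v^T) times the updated inverse should be the identity.
A_plus = [[2 + u[i] * v[j] if i == j else u[i] * v[j] for j in range(2)]
          for i in range(2)]
identity_check = [[sum(A_plus[i][k] * updated_inv[k][j] for k in range(2))
                   for j in range(2)] for i in range(2)]
```

The update costs on the order of n^2 operations rather than the n^3 of a full inversion, which is the usual reason for using it inside an iterative algorithm.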


[82] 2312.16807

Efficient Interference Graph Estimation via Concurrent Flooding

Traditional wisdom for network management allocates network resources separately for the measurement and data transmission tasks. Heavy measurement tasks may take up resources for data transmission and significantly reduce network performance. It is therefore challenging for interference graphs, deemed as incurring heavy measurement overhead, to be used in practice in wireless networks. To address this challenge in wireless sensor networks, we propose to use power as a new dimension for interference graph estimation (IGE) and integrate IGE with concurrent flooding such that IGE can be done simultaneously with flooding using the same frequency-time resources. With controlled and real-world experiments, we show that it is feasible to efficiently achieve IGE via concurrent flooding on commercial off-the-shelf (COTS) devices by controlling the transmit powers of nodes. We believe that efficient IGE would be a key enabler for the practical use of existing scheduling algorithms that assume known interference graphs.


[83] 2402.01703

Community-Informed AI Models for Police Accountability

Face-to-face interactions between police officers and the public affect both individual well-being and democratic legitimacy. Many government-public interactions are captured on video, including interactions between police officers and drivers captured on body-worn cameras (BWCs). New advances in AI technology enable these interactions to be analyzed at scale, opening promising avenues for improving government transparency and accountability. However, for AI to serve democratic governance effectively, models must be designed to include the preferences and perspectives of the governed. This article proposes a community-informed approach to developing multi-perspective AI tools for government accountability. We illustrate our approach by describing the research project through which the approach was inductively developed: an effort to build AI tools to analyze BWC footage of traffic stops conducted by the Los Angeles Police Department. We focus on the role of social scientists as members of multidisciplinary teams responsible for integrating the perspectives of diverse stakeholders into the development of AI tools in the domain of police -- and government -- accountability.


[84] 2410.08706

Goal-Oriented Status Updating for Real-time Remote Inference over Networks with Two-Way Delay

We study a setting where an intelligent model (e.g., a pre-trained neural network) infers the real-time value of a target signal using data samples transmitted from a remote source. The transmission scheduler decides (i) the freshness of packets, (ii) their length (i.e., the number of samples they contain), and (iii) when they should be transmitted. The freshness is quantified using the Age of Information (AoI), and the inference quality for a given packet length is a general function of AoI. Previous works assumed i.i.d. transmission delays with immediate feedback or were restricted to the case where inference performance degrades as the input data ages. Our formulation, in addition to capturing non-monotone age dependence, also covers Markovian delay on both forward and feedback links. We model this as an infinite-horizon average-cost Semi-Markov Decision Process. We obtain a closed-form solution that decides on (i) and (iii) for any constant packet length. The solution for when to transmit is an index-based threshold policy, where the index function is expressed in terms of the delay state and AoI at the receiver. In contrast, the freshness of the selected packet is a function of only the delay state. We then separately optimize the value of the constant packet length. Moreover, we also develop an index-based threshold policy for the time-variable packet length case, which allows a complexity reduction. In simulation results, we observe that our goal-oriented scheduler reduces inference error to one-sixth of that obtained with age-based scheduling of unit-length packets.


[85] 2501.06620

Functional Approximation Methods for Differentially Private Distribution Estimation

The cumulative distribution function (CDF) is fundamental for characterizing random variables, making it essential in applications that require privacy-preserving data analysis. This paper introduces a novel framework for constructing differentially private CDFs inspired by functional analysis and the functional mechanism. We develop two variants: a polynomial projection method, which projects the empirical CDF into a polynomial space, and a sparse approximation method via matching pursuit, which projects it into arbitrary function spaces constructed from dictionaries. In both cases, the empirical CDF is approximated within the chosen space, and the corresponding coefficients are privatized to guarantee differential privacy. Compared with existing approaches such as histogram queries and adaptive quantiles, our methods achieve comparable or superior performance. Our methods are particularly well-suited to decentralized settings and scenarios where CDFs must be efficiently updated with newly collected or streaming data. In addition, we investigate the influence of parameters such as dictionary size and systematically evaluate different dictionary constructions, including Legendre polynomials, B-splines, and distribution-based functions. Overall, our contributions advance the development of practical and reliable methods for privacy-preserving CDF estimation.


[86] 2501.15177

Audio-Language Models for Audio-Centric Tasks: A Systematic Survey

Audio-Language Models (ALMs), trained on paired audio-text data, are designed to process, understand, and reason about audio-centric multimodal content. Unlike traditional supervised approaches that use predefined labels, ALMs leverage natural language supervision to better handle complex real-world audio scenes with multiple overlapping events. While ALMs demonstrate impressive zero-shot and task generalization capabilities, there is still a notable lack of systematic surveys that comprehensively organize and analyze developments. In this paper, we present the first systematic review of ALMs with three main contributions: (1) comprehensive coverage of ALM works across speech, music, and sound from a general audio perspective; (2) a unified taxonomy of ALM foundations, including model architectures and training objectives; (3) establishment of a research landscape capturing mutual promotion and constraints among different research aspects, aiding in summarizing evaluations, limitations, concerns and promising directions. Our review helps researchers understand the development of existing technologies and future trends, while also providing valuable references for implementation in practical applications.


[87] 2501.16619

SHIELD: A Host-Independent Framework for Ransomware Detection using Deep Filesystem Features

Ransomware's escalating sophistication necessitates tamper-resistant, off-host detection solutions that capture deep disk activity beyond the reach of a compromised operating system. Existing detection systems use host/kernel signals or rely on coarse block-I/O statistics, which are easy to evade and miss filesystem semantics. The filesystem layer itself remains underexplored as a source of robust indicators for storage-controller-level defense. To address this, we present SHIELD: a Secure Host-Independent Extensible Metric Logging Framework for Tamper-Proof Detection and Real-Time Mitigation of Ransomware Threats. SHIELD parses and logs filesystem-level features that cannot be evaded or obfuscated to expose deep disk activity for real-time ML-based detection and mitigation. We evaluate the efficacy of these metrics through experiments with both binary (benign vs. malicious behavior) and multiclass (ransomware strain identification) classifiers. In evaluations across diverse ransomware families, the best binary classifier achieves 97.29% accuracy in identifying malicious disk behavior. A hardware-only feature set that excludes all transport-layer metrics retains 95.97% accuracy, confirming feasibility for FPGA/ASIC deployment within the storage controller datapath. In a proof-of-concept closed-loop deployment, SHIELD halts disk operations within tens of disk actions, limiting targeted files affected to <0.4% for zero-shot strains at small action-windows, while maintaining low false-positive rates (<3.6%) on unseen benign applications. Results demonstrate that filesystem-aware, off-host telemetry enables accurate, resilient ransomware detection, including intermittent/partial encryption, and is practical for embedded integration in storage controllers or alongside other defense mechanisms.


[88] 2505.16211

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

The rapid development and widespread adoption of Audio Large Language Models (ALLMs) demand rigorous evaluation of their trustworthiness. However, existing evaluation frameworks are primarily designed for text and fail to capture vulnerabilities introduced by the acoustic properties of audio. We find that significant trustworthiness risks in ALLMs arise from non-semantic acoustic cues, such as timbre, accent, and background noise, which can be exploited to manipulate model behavior. To address this gap, we propose AudioTrust, the first large-scale and systematic framework for evaluating ALLM trustworthiness under audio-specific risks. AudioTrust covers six key dimensions: fairness, hallucination, safety, privacy, robustness, and authentication. It includes 26 sub-tasks and a curated dataset of more than 4,420 audio samples collected from real-world scenarios, including daily conversations, emergency calls, and voice assistant interactions, and is specifically designed to probe trustworthiness across multiple dimensions. Our comprehensive evaluation spans 18 experimental settings and uses human-validated automated pipelines to enable objective and scalable assessment of model outputs. Experimental results on 14 state-of-the-art open-source and closed-source ALLMs reveal important limitations and failure boundaries under diverse high-risk audio scenarios, providing critical insights for the secure and trustworthy deployment of future audio models. Our platform and benchmark are publicly available at this https URL.


[89] 2509.15423

Online Slip Detection and Friction Coefficient Estimation for Autonomous Racing

Accurate knowledge of the tire-road friction coefficient (TRFC) is essential for vehicle safety, stability, and performance, especially in autonomous racing, where vehicles often operate at the friction limit. However, TRFC cannot be directly measured with standard sensors, and existing estimation methods either depend on vehicle or tire models with uncertain parameters or require large training datasets. In this paper, we present a lightweight approach for online slip detection and TRFC estimation. Our approach relies solely on IMU and LiDAR measurements and the control actions, without special dynamical or tire models, parameter identification, or training data. Slip events are detected in real time by comparing commanded and measured motions, and the TRFC is then estimated directly from observed accelerations under no-slip conditions. Experiments with a 1:10-scale autonomous racing car across different friction levels demonstrate that the proposed approach achieves accurate and consistent slip detections and friction coefficients, with results closely matching ground-truth measurements. These findings highlight the potential of our simple, deployable, and computationally efficient approach for real-time slip monitoring and friction coefficient estimation in autonomous driving.


[90] 2509.26489

Contrastive Diffusion Guidance for Spatial Inverse Problems

We consider a class of inverse problems characterized by forward operators that are partially specified, non-smooth, and non-differentiable. Although generative inverse solvers have made significant progress, we find that these forward operators introduce a distinct set of challenges. As a concrete instance, we consider the problem of reconstructing spatial layouts, such as floorplans, from human movement trajectories, where the underlying path-generation process is inherently non-differentiable and only partially known. In such problems, direct likelihood-based guidance becomes unstable, since the underlying path-planning process does not provide reliable gradients. We break away from existing diffusion-based posterior samplers and reformulate likelihood-based guidance in a smoother embedding space. This embedding space is learned using a contrastive objective to bring compatible trajectory-floorplan pairs close together while pushing mismatched pairs apart. We show that this surrogate likelihood score in the embedding space provides a valid approximation to the true likelihood score, making it possible to steer the denoising process towards the posterior. Across extensive experiments, our model CoGuide produces more consistent reconstructions and is more robust than existing inverse solvers and guided diffusion. Beyond spatial mapping, we show that our method can be applied more broadly, suggesting a route toward solving generalized blind inverse problems using diffusion models.


[91] 2510.13031

Towards xApp Conflict Evaluation with Explainable Machine Learning and Causal Inference in O-RAN

The Open Radio Access Network (O-RAN) architecture enables a flexible, vendor-neutral deployment of 5G networks by disaggregating base station components and supporting third-party xApps for near real-time RAN control. However, the concurrent operation of multiple xApps can lead to conflicting control actions, which may cause network performance degradation. In this work, we propose a framework for xApp conflict management that combines explainable machine learning and causal inference to evaluate the causal relationships between RAN Control Parameters (RCPs) and Key Performance Indicators (KPIs). We use model explainability tools such as SHAP to identify RCPs that jointly affect the same KPI, signaling potential conflicts, and represent these interactions as a causal Directed Acyclic Graph (DAG). We then estimate the causal impact of each of these RCPs on their associated KPIs using metrics such as Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE). This approach offers network operators guided insights into identifying conflicts and quantifying their impacts, enabling more informed and effective conflict resolution strategies across diverse xApp deployments.
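To make the ATE and CATE metrics concrete, the sketch below computes naive difference-in-means estimates on a tiny made-up table. The binary "RCP changed" treatment, the load-level grouping, the KPI values, and all names are hypothetical; difference in means is only a valid causal estimate when treatment assignment is unconfounded, and the abstract does not say which estimator the authors use.

```python
from statistics import mean

# Hypothetical records: (rcp_changed, cell_load, kpi_value). Not real O-RAN data.
records = [
    (1, "low", 10), (1, "low", 12), (0, "low", 8), (0, "low", 6),
    (1, "high", 20), (1, "high", 22), (0, "high", 19), (0, "high", 21),
]

def diff_in_means(rows):
    """Mean KPI of treated rows minus mean KPI of untreated rows."""
    treated = [kpi for t, _, kpi in rows if t == 1]
    control = [kpi for t, _, kpi in rows if t == 0]
    return mean(treated) - mean(control)

# Average Treatment Effect over the whole population (naive estimate).
ate = diff_in_means(records)

# Conditional ATE: the same estimate within each cell-load group.
cate = {
    group: diff_in_means([r for r in records if r[1] == group])
    for group in ("low", "high")
}
```

In this toy table the effect is larger under low load than under high load, which is the kind of heterogeneity a CATE is meant to expose.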


[92] 2510.25389

CNNs in the Air via Reconfigurable Intelligent Surfaces

This paper introduces AirCNN, a novel paradigm for implementing convolutional neural networks (CNNs) via over-the-air (OTA) analog computation. By leveraging multiple reconfigurable intelligent surfaces (RISs) and transceiver designs, we engineer the ambient wireless propagation environment to emulate the operations of a CNN layer. To comprehensively evaluate AirCNN, we consider two types of CNNs, namely classic two-dimensional (2D) convolution (Conv2d) and light-weight convolution, i.e., depthwise separable convolution (ConvSD). For Conv2d realization via OTA computation, we propose and analyze two RIS-aided transmission architectures: multiple-input multiple-output (MIMO) and multiple-input single-output (MISO), balancing transmission overhead and emulation performance. We jointly optimize all parameters, including the transmitter precoder, receiver combiner, and RIS phase shifts, under practical constraints such as transmit power budget and unit-modulus phase shift requirements. We further extend the framework to ConvSD, which requires distinct transmission strategies for depthwise and pointwise convolutions. Simulation results demonstrate that the proposed AirCNN architectures can achieve satisfactory classification performance. Notably, Conv2d MISO consistently outperforms Conv2d MIMO across various settings, while for ConvSD, MISO is superior only under poor channel conditions. Moreover, employing multiple RISs significantly enhances performance compared to a single RIS, especially in line-of-sight (LoS)-dominated wireless environments.


[93] 2511.00783

When Semantics Connect the Swarm: LLM-Driven Fuzzy Control for Cooperative Multi-Robot Underwater Coverage

Underwater multi-robot cooperative coverage remains challenging due to partial observability, limited communication, environmental uncertainty, and the lack of access to global localization. To address these issues, this paper presents a semantics-guided fuzzy control framework that couples Large Language Models (LLMs) with interpretable control and lightweight coordination. Raw multimodal observations are compressed by the LLM into compact, human-interpretable semantic tokens that summarize obstacles, unexplored regions, and Objects Of Interest (OOIs) under uncertain perception. A fuzzy inference system with pre-defined membership functions then maps these tokens into smooth and stable steering and gait commands, enabling reliable navigation without relying on global positioning. Then, we further coordinate multiple robots by introducing semantic communication that shares intent and local context in linguistic form, enabling agreement on who explores where while avoiding redundant revisits. Extensive simulations in unknown reef-like environments show that, under limited sensing and communication, the proposed framework achieves robust OOI-oriented navigation and cooperative coverage with improved efficiency and adaptability, narrowing the gap between semantic cognition and distributed underwater control in GPS-denied, map-free conditions.


[94] 2601.20900

Text-only adaptation in LLM-based ASR through text denoising

Adapting large language model (LLM)-based automatic speech recognition (ASR) systems to new domains using text-only data is a significant yet underexplored challenge. Standard fine-tuning of the LLM on the target domain text often disrupts the critical alignment between the speech and text modality learned by the projector, degrading performance. We introduce a novel text-only adaptation method that frames this process as a text denoising task. Our approach trains the LLM to recover clean transcripts from noisy inputs. This process effectively adapts the model to a target domain while preserving cross-modal alignment. Our solution is lightweight, requiring no architectural changes or additional parameters. Extensive evaluation on two datasets demonstrates up to 22.1% relative improvement, outperforming recent state-of-the-art text-only adaptation methods.


[95] 2602.23312

Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction

Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies: prompt engineering and fine-tuning, studied under zero-shot and one-shot interaction modes, compared with an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming baseline and prompt-engineered approaches. However, results also indicate a performance degradation in one-shot modes, where increased context length challenges the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability on the edge.


[96] 2603.09600

A Variational Latent Equilibrium for Learning in Neuronal Circuits

Brains remain unrivaled in their ability to recognize and generate complex spatiotemporal patterns. While AI is able to reproduce some of these capabilities, deep learning algorithms remain largely at odds with our current understanding of brain circuitry and dynamics. This is prominently the case for backpropagation through time (BPTT), the go-to algorithm for learning complex temporal dependencies. In this work we propose a general formalism to approximate BPTT in a controlled, biologically plausible manner. Our approach builds on, unifies and extends several previous approaches to local, time-continuous, phase-free spatiotemporal credit assignment based on principles of energy conservation and extremal action. Our starting point is a prospective energy function of neuronal states, from which we calculate real-time error dynamics for time-continuous neuronal networks. In the general case, this provides a simple and straightforward derivation of the adjoint method result for neuronal networks, the time-continuous equivalent to BPTT. With a few modifications, we can turn this into a fully local (in space and time) set of equations for neuron and synapse dynamics. Our theory provides a rigorous framework for spatiotemporal deep learning in the brain, while simultaneously suggesting a blueprint for physical circuits capable of carrying out these computations. These results reframe and extend the recently proposed Generalized Latent Equilibrium (GLE) model.


[97] 2603.10065

The Epistemic Support-Point Filter: Jaynesian Maximum Entropy Meets Popperian Falsification

This paper proves that the Epistemic Support-Point Filter (ESPF) is the unique optimal recursive estimator within the class of epistemically admissible evidence-only filters. Where Bayesian filters minimize mean squared error and are driven toward an assumed truth, the ESPF minimizes maximum entropy and surfaces what has not been proven impossible -- a fundamentally different epistemic commitment with fundamentally different failure modes. Two results locate this theorem within the broader landscape of estimation theory. The first is a unification: the ESPF's optimality criterion is the log-geometric mean of the alpha-cut volume family in the Holder mean hierarchy. The Popperian minimax bound and the Kalman MMSE criterion occupy the p=+inf and p=0 positions on the same curve. Possibility and probability are not competing frameworks: they are the same ignorance functional evaluated under different alpha-cut geometries. The Kalman filter is the Gaussian specialization of the ESPF's optimality criterion, not a separate invention. The second result is a diagnostic: numerical validation over a 2-day, 877-step Smolyak Level-3 orbital tracking run shows that possibilistic stress manifests through necessity saturation and surprisal escalation rather than MVEE sign change -- a direct consequence of the Holder ordering, not an empirical observation. Three lemmas establish the result: the Possibilistic Entropy Lemma decomposes the ignorance functional; the Possibilistic Cramer-Rao Bound limits entropy reduction per measurement; the Evidence-Optimality Lemma proves minimum-q selection is the unique minimizer and that any rule incorporating prior possibility risks race-to-bottom bias.


[98] 2603.10711

Parallel-in-Time Nonlinear Optimal Control via GPU-native Sequential Convex Programming

Real-time trajectory optimization for nonlinear constrained autonomous systems is critical and typically performed by CPU-based sequential solvers. Specifically, reliance on global sparse linear algebra or the serial nature of dynamic programming algorithms restricts the utilization of massively parallel computing architectures like GPUs. To bridge this gap, we introduce a fully GPU-native trajectory optimization framework that combines sequential convex programming with a consensus-based alternating direction method of multipliers. By applying a temporal splitting strategy, our algorithm decouples the optimization horizon into independent, per-node subproblems that execute massively in parallel. The entire process runs fully on the GPU, eliminating costly memory transfers and large-scale sparse factorizations. This architecture naturally scales to multi-trajectory optimization. We validate the solver on a quadrotor agile flight task and a Mars powered descent problem using an on-board edge computing platform. Benchmarks reveal a sustained 4x throughput speedup and a 51% reduction in energy consumption over a heavily optimized 12-core CPU baseline. Crucially, the framework saturates the hardware, maintaining over 96% active GPU utilization to achieve planning rates exceeding 100 Hz. Furthermore, we demonstrate the solver's extensibility to robust Model Predictive Control by jointly optimizing dynamically coupled scenarios under stochastic disturbances, enabling scalable and safe autonomy.