New articles on Electrical Engineering and Systems Science


[1] 2606.24910

End-to-End Voice Intent Recognition for Spontaneous Human-Drone Interaction with Naive Users

Voice control offers an intuitive alternative to manual drone piloting, yet most existing systems rely on rigid command vocabularies that fail to handle the spontaneous, disfluent speech of naive users. This paper addresses this gap by proposing an End-to-End Spoken Language Understanding architecture for real-time human-drone interaction in French. Our model combines a frozen Self-Supervised Learning acoustic encoder with a lightweight LSTM-based classification head, augmented by a cross-modal knowledge distillation objective that aligns acoustic representations with semantic embeddings from a text teacher, without requiring transcription at inference time. We evaluate our approach on VoiceStick, a novel French corpus of spontaneous speech collected during real teleoperation sessions with 29 non-expert dyads. On simple voice commands, our best configuration achieves 93% accuracy at 7 ms inference latency, outperforming cascade baselines (79% accuracy, 202 ms) with a 29x speedup. On the full spontaneous speech test set, our architecture reaches 82% accuracy, with cross-modal distillation consistently improving robustness across all configurations. These results demonstrate that End-to-End architectures are not only feasible but preferable for spontaneous voice-guided UAV teleoperation, combining semantic robustness, low latency, and calibrated confidence.
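A minimal sketch of how an intent-classification loss and a cross-modal distillation term might be combined during training. It assumes the text teacher's embeddings are precomputed from transcripts (so transcripts are needed only at training time) and that alignment is measured by cosine distance; the cosine form, the weight `lam`, and all function names are illustrative assumptions, since the abstract does not give the exact objective.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy of intent logits (batch x classes) against integer labels."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def cosine_distill_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between projected acoustic and text-teacher embeddings."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))


def training_loss(logits, labels, student_proj, teacher_emb, lam: float = 0.5) -> float:
    """Hypothetical combined objective: classification + lam * distillation.
    At inference only the acoustic path (logits) is used, so no transcript is required."""
    return cross_entropy(logits, labels) + lam * cosine_distill_loss(student_proj, teacher_emb)
```

The distillation term only shapes the acoustic representation during training; dropping it at inference is what keeps the latency close to that of the classifier alone.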


[2] 2606.24929

Reconstruction of chaotic systems in invariant jet space

Takens' theorem is the gold standard for attractor reconstruction from time series, but it guarantees only topological equivalence and does not preserve metric or group properties such as symmetries. We show that switching from delay-coordinate space to jet space (signal and its derivatives) allows one to exactly preserve the symmetry group of the original system. This statement is rigorously justified by a theorem on the isomorphism of Lie algebras under jet prolongation. Numerical experiments on the Lorenz and Rössler systems confirm that jet-space reconstruction preserves geometry and symmetries, whereas Takens embedding distorts them. As quantitative metrics we use a variational elastic energy functional and the correlation dimension. It is shown that jet-space reconstruction not only outperforms Takens embedding but in some cases yields more accurate estimates of invariants than projections of the original system. The proposed approach provides a coordinate-invariant criterion for the classification of strange attractors and can serve as a basis for detecting hidden attractors.
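To make the two coordinate systems concrete, here is a minimal NumPy sketch that builds Takens delay coordinates and jet coordinates (the signal and its successive derivatives) from a uniformly sampled scalar series. Derivatives are estimated with `np.gradient` finite differences, a choice made for brevity; noisy data would need a smoothing differentiator, and the paper's own derivative estimator is not described in the abstract.

```python
import numpy as np


def delay_embedding(x: np.ndarray, dim: int, tau: int) -> np.ndarray:
    """Takens delay coordinates: rows are (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau: i * tau + n] for i in range(dim)], axis=1)


def jet_embedding(x: np.ndarray, dt: float, order: int) -> np.ndarray:
    """Jet coordinates: rows are (x, x', ..., x^(order)), via repeated finite differences."""
    cols = [x]
    d = x
    for _ in range(order):
        d = np.gradient(d, dt)  # central differences inside, one-sided at the ends
        cols.append(d)
    return np.stack(cols, axis=1)
```

For x(t) = sin t, the first-order jet (x, x') traces the unit circle, whereas a delay embedding (sin t, sin(t + τ)) traces an ellipse whose shape depends on the chosen lag, a small illustration of how delay coordinates can alter metric geometry.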


[3] 2606.24944

A Leakage-Aware Comparative Benchmark of Machine Learning, Deep Learning, and Transformer Models for Reliable Leukemia Detection

Studies of automated classification of acute lymphoblastic leukemia (ALL) from peripheral blood smear images have often reported near-perfect performance on the C-NMC 2019 dataset. We show that such results can be inflated by patient-level data leakage caused by random image-level partitioning, where cells from the same subject may appear in both training and test folds. We establish a leakage-aware benchmark under a strict subject-disjoint protocol, comparing LightGBM, RBF-SVM, EfficientNet-B0, EfficientNet-B1, and ViT-Tiny. Models are developed using three subject-disjoint folds from 73 subjects and evaluated on an external preliminary-phase test set of 1,867 images from 28 unseen subjects with zero patient overlap. Beyond discrimination, we assess calibration using expected calibration error, Brier score, and temperature scaling. Under honest evaluation, EfficientNet-B1 achieves the best performance, with AUROC 0.913, sensitivity 0.87, specificity 0.80, and calibrated ECE 0.024. Frozen-feature classifiers and ViT-Tiny show high sensitivity but poor specificity, indicating a tendency to over-predict the malignant class. A random-versus-subject-disjoint ablation shows that random splitting inflates AUROC by about 0.04 even in the conservative frozen-feature setting. These findings caution against image-level evaluation on C-NMC 2019 and provide a reproducible, calibration-aware benchmark for future work.
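The calibration measures named above can be sketched in a few lines of NumPy. This assumes binary outputs expressed as P(ALL), the positive-class (reliability-diagram) form of ECE with 10 equal-width bins, and temperature chosen by grid search on a held-out split; the benchmark may use the top-label ECE variant, a different bin count, or a gradient-based fit of T.

```python
import numpy as np


def brier_score(probs, labels) -> float:
    """Mean squared difference between predicted P(ALL) and the 0/1 label."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    return float(np.mean((probs - labels) ** 2))


def expected_calibration_error(probs, labels, n_bins: int = 10) -> float:
    """Binary ECE: bin-weighted |mean predicted P(positive) - observed positive rate|."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)  # equal-width bins
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            ece += in_bin.mean() * abs(probs[in_bin].mean() - labels[in_bin].mean())
    return float(ece)


def temperature_scale(logits, T: float) -> np.ndarray:
    """Sigmoid of logits divided by T (T > 1 softens overconfident outputs)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, float) / T))


def fit_temperature(logits, labels, grid=None) -> float:
    """Choose T on a held-out split by minimising binary negative log-likelihood."""
    labels = np.asarray(labels, float)
    grid = np.linspace(0.05, 5.0, 100) if grid is None else grid

    def nll(T):
        p = np.clip(temperature_scale(logits, T), 1e-12, 1 - 1e-12)
        return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

    return float(min(grid, key=nll))
```

Temperature scaling leaves the ranking of scores, and therefore AUROC, unchanged; it only moves probabilities, which is why ECE and Brier score are reported alongside discrimination metrics.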


[4] 2606.24991

Solving Markov Decision Processes with Future Information via MPC

Model Predictive Control (MPC) is widely used in industrial and robotic systems for enforcing constraints and embedding domain knowledge through finite-horizon optimization-based planning. However, despite these strengths, an MPC scheme typically does not yield optimal policies for sequential decision-making problems formulated as Markov Decision Processes (MDPs). Recent combinations of MPC with Reinforcement Learning (RL) alleviate this issue by treating MPC as a parameterized model of the optimal policy of an MDP and adjusting its parameters using data. While these approaches typically consider classical MDPs, many real-world problems involve future information available at decision time, such as forecasts, prices, or reference trajectories, which must be included in the MDP state for optimal decision-making. Current MPC-RL approaches do not directly account for this augmented-state structure, raising the question of how to incorporate future information into MPC to obtain an optimal policy. This work establishes the structural requirements under which a parameterized MPC can exactly represent the optimal value functions and policy of an MDP with future information. We further demonstrate that such a parameterized MPC can serve as a structured function approximator, with its parameters learned using RL. The approach is illustrated on a point-mass racing task with future reference information.


[5] 2606.25095

Toward Next-Generation AI Data Centers: Power Delivery Architecture Shifts, Emerging Technologies, and Challenges

The rapid growth of AI workloads is driving unprecedented increases in data center power demand, current transients, and thermal stress, exposing fundamental limitations in traditional 48 V rack architectures, low-voltage AC distribution, and line-frequency transformer interfaces. This paper reviews the three stages of architectural shifts required to support next-generation AI data centers and identifies three enabling technological building blocks: high-voltage conversion-ratio DC/DC converters, facility-level low-voltage DC distribution, and medium-voltage solid-state transformers. The advantages, technical challenges, and potential solutions associated with each building block are reviewed. Finally, future research directions and open challenges are discussed.


[6] 2606.25116

BCoughBench: Benchmarking Respiratory Acoustic Foundation Models Under Body-Coupled Wearable Sensor Conditions

Respiratory acoustic foundation models (FMs) are benchmarked exclusively on smartphone recordings, yet clinical deployment increasingly targets body-coupled (BC) wearables whose sensors attenuate high-frequency content through tissue and bone, leaving FM reliability uncharacterised. We introduce BCoughBench, evaluating five FMs (OPERA-CT/CE/GT, HeAR, M2D+Resp) on nine classification tasks (AUROC, sensitivity at 95% specificity, Expected Calibration Error) and three age regression tasks (MAE vs. a mean-predictor baseline) across five EBEN-simulated BC sensor conditions on five labeled cough datasets. Mean AUROC declines from 0.785 (smartphone) to 0.689-0.723, degrading most under temple vibration pickup (Δ = -0.096) and least under the soft in-ear condition (Δ = -0.062). No FM meets the clinical sensitivity threshold (Se@Sp95 ≥ 0.20) on most disease tasks under any BC sensor. Sex classification on the CIDRZ cohort collapses (AUROC 0.954 to 0.596-0.628, Δ = -0.341) while COVID detection is nearly unaffected (Δ = -0.004). Age regression is robust, improving under the forehead accelerometer on CoughVID (MAE 9.61 to 8.97 yr); HeAR leads on regression and demographic tasks, M2D+Resp on disease and characteristic tasks. BCoughBench provides a reproducible framework for FM evaluation under wearable conditions.
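Se@Sp95 is the sensitivity obtained at the decision threshold where specificity first reaches 95%. A minimal sketch follows, using one common convention (threshold placed at a negative-class score, positives called when strictly above it) and a mean predictor fit on training labels for the MAE baseline; BCoughBench may handle ties, interpolation, or the baseline mean differently.

```python
import numpy as np


def sensitivity_at_specificity(scores, labels, target_spec: float = 0.95) -> float:
    """Sensitivity at the lowest negative-score threshold whose specificity reaches target_spec.
    A sample is called positive when its score is strictly above the threshold."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    neg = np.sort(scores[labels == 0])
    k = int(np.ceil(target_spec * len(neg))) - 1  # index of the threshold negative
    threshold = neg[k]                            # >= target_spec of negatives score <= it
    return float(np.mean(scores[labels == 1] > threshold))


def mae_vs_mean_baseline(y_true, y_pred, y_train):
    """Return (model MAE, MAE of always predicting the training-set mean)."""
    y_true = np.asarray(y_true, float)
    model_mae = float(np.mean(np.abs(y_true - np.asarray(y_pred, float))))
    baseline_mae = float(np.mean(np.abs(y_true - np.mean(y_train))))
    return model_mae, baseline_mae
```

A regression model is only informative when its MAE falls below the baseline MAE, which is why the abstract reports age MAE relative to a mean predictor.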


[7] 2606.25128

Benchmarking the Alignment of Data-Quality Metrics, Human Judgment and Land-Cover Segmentation Performance for Earth Observation

The volume and quality of datasets are crucial for deep learning model training, yet they are often constrained by availability and data acquisition costs. Synthetic data augmentation can extend existing datasets with realistic images, and the quality of these images is generally assessed through fidelity metrics such as FID, KID, IS, LPIPS and SSIM that measure structural or distributional similarity. However, such metrics, including the widely used FID, focus on visual fidelity without reflecting downstream utility, and can diverge from human perception under perturbations that are imperceptible to human observers. In this work, we systematically evaluate Earth observation datasets alongside synthetic counterparts generated by deep generative models, comparing automatic metrics against human perception and downstream tasks. Our results reveal a stark misalignment: semantics-preserving perturbations such as rotation drastically alter metric scores while leaving human recognition unaffected, and synthetic samples that score poorly on automatic metrics achieve comparable or higher perceived realism, and can improve downstream performance when combined with real data. By benchmarking semantic segmentation models trained on mixed real-synthetic datasets, we demonstrate that quality metrics rooted in ImageNet-pretrained feature spaces are unreliable indicators for geospatial data. Our findings underscore that automatic quality evaluation of synthetic datasets should be grounded in downstream task performance and human evaluation.


[8] 2606.25139

Buildrix: An Open Platform for Sharing and Benchmarking Agentic AI Skills in Building Engineering

Agentic AI offers significant potential to automate complex building-engineering workflows. However, most existing applications remain isolated proof-of-concept demonstrations and lack reusable domain capabilities, human-verified evaluation cases, and standardized benchmarking infrastructure. This study presents Buildrix, an open, community-driven platform for developing, sharing, executing, and evaluating agentic AI skills for building engineering. Buildrix integrates three components: a Python command-line package for developing, validating, publishing, installing, and managing skills and test cases; a web-based Hub for organizing open challenges, reusable skills, test cases, reviews, and benchmark results; and a local agent harness that supports skill discovery, external toolchain provisioning, progressive context loading, and multi-step workflow execution. Buildrix skills are organized as standardized, self-contained packages containing task instructions, executable scripts, dependencies, and supporting resources. Quantitative test cases can be verified by domain experts and promoted to golden test cases for reproducible benchmark evaluation. Buildrix provides an open foundation for reusable capability development, transparent evaluation, and community-driven advancement of agentic AI in building engineering.


[9] 2606.25181

Phoneme-Level Mispronunciation Screening in Polish-Speaking Children with an Explainable Assistant

Early identification of speech sound errors in children is often limited by access to specialists, motivating lightweight screening tools that can operate outside the clinic. We present a screening pipeline for Polish-speaking children focused on sibilant substitutions, coupling a wav2vec2-based CTC token recognizer with alignment-based error typing and a template-grounded caregiver assistant for screening, not diagnosis. On a held-out test set of 10 unseen children comprising 559 utterances, the recognizer achieves 88.7 percent exact sequence match. As a conservative screening proxy, we flag a mismatch when the system emits substitution-evidence bracketed tokens at the target segment, yielding 72.9 percent precision, 61.4 percent recall, F1 = 0.67, and a 2.7 percent false-alarm rate on target-correct items. We describe the assistant's safety boundaries and outline a clinician-in-the-loop validation plan for future deployment.
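The reported F1 can be checked directly, since F1 is the harmonic mean of precision and recall. The sketch below also shows how all four screening figures follow from confusion counts, with true negatives and false positives counted on target-correct items as the false-alarm definition above implies; the function names are illustrative.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1 and false-alarm rate from a 2x2 confusion table.
    fp and tn are counted on target-correct items (the false-alarm denominator)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_alarm = fp / (fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "false_alarm": false_alarm}


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_from_pr(0.729, 0.614), 2))  # 0.67, consistent with the reported F1
```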


[10] 2606.25226

Dimension expansion for simulation-efficient nanophotonic neural networks

Inverse design of nanophotonic structures is challenging due to the large design space, nonlinear structure-response relationships, and the high computational cost of iterative electromagnetic simulations. Existing deep-learning approaches typically rely on large precomputed datasets or libraries of optimized structures, which limits scalability to continuous and complex inverse-design tasks. We introduce a Dimension Expansion Network (DEN), a fully unsupervised, simulation-efficient framework for nanophotonic inverse design. DEN addresses the mismatch between low-dimensional design objectives and high-dimensional nanophotonic structures by transforming compact target parameters into structured, high-dimensional conditioning representations before inverse design. This improves target expressivity and conditioning quality for structure generation. The model is trained end-to-end using differentiable electromagnetic simulations, removing the need for any pre-generated dataset. We validate DEN on free-form metalens and asymmetric Y-splitter design problems. For metalens design, DEN achieves focal intensities comparable to adjoint-based optimization while reducing simulation cost by approximately 50% and generalizing across tens to thousands of focal targets within a shared focal region. For Y-splitter design, DEN accurately produces arbitrary power-splitting ratios using only 21 training targets and demonstrates robust broadband performance. Ablation studies and representation analyses show that dimension expansion enhances sensitivity to target variations, increases structural diversity, and reduces mode-collapse-like behavior. Overall, DEN provides a scalable conditioning strategy for inverse design with low-dimensional objectives, enabling efficient photonic design across large continuous target spaces.


[11] 2606.25254

Dual Agreement Consistency Learning for Semi-Supervised Fetal Ultrasound Segmentation

Maternal-fetal ultrasound (US) is the primary imaging modality for monitoring fetal development, yet accurate automated segmentation remains challenging due to the scarcity of pixel-level annotations. To address this issue, we propose DACL, a semi-supervised framework for robust fetal US image segmentation. DACL jointly trains a deployment-oriented lightweight convolutional network (1.47 M parameters) and a Transformer-based network, leveraging labeled data for supervised learning and unlabeled data via cross pseudo supervision (CPS). To enhance prediction stability, we introduce a dual-agreement consistency loss that couples pixel-wise probabilistic divergence with entropy-guided confidence alignment. Unlike conventional CPS methods that enforce agreement only at the prediction level, DACL explicitly regularizes both distributional alignment and uncertainty, thereby suppressing unreliable pseudo-labels and enabling stable cross-architecture pseudo-label learning under extreme annotation scarcity. Furthermore, an interpolation-based consistency strategy using mixup is applied to unlabeled samples to enhance robustness. With only 5% of the data labeled, DACL improves Dice by up to 2.77% and reduces HD95 by up to 14.69 mm compared with the strongest recent semi-supervised methods, with significant improvements in boundary accuracy on both fetal head and abdomen datasets. These results demonstrate the effectiveness of agreement-based consistency learning for annotation-efficient fetal US segmentation. Our code is on GitHub.
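The abstract does not give DACL's formulas, so the following is only one plausible reading, written as a hypothetical NumPy sketch: the distributional-agreement part is taken to be a symmetric KL divergence between the two networks' per-pixel class distributions, and the confidence-agreement part the absolute gap between their per-pixel entropies. The weight `beta`, the function names, and the exact terms are assumptions; the CPS cross-supervision itself (each network's argmax pseudo-labels supervising the other) is not shown.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Per-pixel softmax over the last (class) axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy(p: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-pixel Shannon entropy of a class distribution."""
    return -np.sum(p * np.log(p + eps), axis=-1)


def dual_agreement_loss(logits_cnn, logits_vit, beta: float = 1.0, eps: float = 1e-8) -> float:
    """Hypothetical dual-agreement term on unlabeled pixels, inputs shaped (H, W, C):
    symmetric KL between predictions (distributional agreement)
    + beta * |entropy gap| (confidence agreement)."""
    pa, pb = softmax(logits_cnn), softmax(logits_vit)
    log_pa, log_pb = np.log(pa + eps), np.log(pb + eps)
    sym_kl = 0.5 * (np.sum(pa * (log_pa - log_pb), axis=-1)
                    + np.sum(pb * (log_pb - log_pa), axis=-1))
    conf_gap = np.abs(entropy(pa, eps) - entropy(pb, eps))
    return float(np.mean(sym_kl + beta * conf_gap))


def mixup(x_a, x_b, lam: float):
    """Convex combination of two unlabeled images (or of their predictions)."""
    return lam * np.asarray(x_a, float) + (1.0 - lam) * np.asarray(x_b, float)
```

The loss is zero only when both networks agree on the class distribution at every pixel, so, unlike hard pseudo-label agreement, it also penalizes cases where the argmax matches but one network is far less certain.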


[12] 2606.25301

Active Learning for Optimal Experimental Design in Machine Learning-Based Building Energy System Identification

Machine learning (ML) techniques have been commonly adopted to identify the dynamics of building energy systems (BESs), owing to their flexibility relative to first-principles, physics-based modeling approaches. Beyond the choice of ML architecture, the quality of the training data plays an essential role in the resulting model performance. Optimal experimental design (OED), realized in this work through active learning (AL), determines which experiments to conduct in order to collect informative data, rather than relying on standard approaches such as uniformly random sampling. This paper presents a systematic comparison of OED via AL for building energy system identification, with a particular focus on HVAC thermal dynamics. We investigate fourteen AL techniques across two ML model classes, namely a deterministic feedforward neural network and a stochastic Gaussian process, and classify these techniques into four categories: data space, uncertainty, information gain, and model change. To examine the AL algorithms under realistic conditions, we implement and evaluate them on the high-fidelity building simulator BOPTEST. The results, reported as the root mean square error across multiple test scenarios with varying initial dataset sizes and control input constraints, show that AL-based models generally outperform models trained via passive learning (PL) with uniformly random control inputs, achieving error reductions of up to 54%, although the magnitude and consistency of this improvement vary across acquisition functions and operating regimes.
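As an example of the "uncertainty" category, here is a minimal sketch of uncertainty sampling with a Gaussian process: among candidate experiments, pick the one where the GP's posterior variance is largest. The RBF kernel, fixed hyperparameters, and candidate feature vectors (for example, a control input together with the current state) are assumptions for illustration; running the chosen experiment in BOPTEST, appending the result, and refitting are not shown.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between row-wise inputs a (n x d) and b (m x d)."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def gp_posterior_variance(x_train, x_cand, length_scale=1.0, variance=1.0, noise=1e-3):
    """Posterior variance of a zero-mean GP's latent function at candidate inputs.
    With fixed hyperparameters it depends only on the inputs, not on observed outputs."""
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_cand, length_scale, variance)  # (n_train, n_cand)
    L = np.linalg.cholesky(K)
    v = np.linalg.solve(L, K_s)  # so that sum(v**2) = diag(K_s^T K^-1 K_s)
    return variance - np.sum(v**2, axis=0)


def select_next_experiment(x_train, x_cand, **kernel_kw) -> int:
    """Uncertainty sampling: index of the candidate with the largest posterior variance."""
    return int(np.argmax(gp_posterior_variance(x_train, x_cand, **kernel_kw)))
```

Passive learning would instead draw the next control input uniformly at random; the acquisition step above is what steers data collection toward poorly covered regions of the input space.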


[13] 2606.25310

Theoretical Analysis of Diffusion Models for Radio Map Estimation with Ultra-low Sampling Rates

Radio maps, which characterize the spatial distribution of radio frequency metrics such as received signal strength, are essential for a wide range of wireless applications. Radio map estimation is the problem of constructing a radio map from sparse sensor measurements at multiple locations. This problem is particularly challenging at ultra-low sampling rates, where the available sensor measurements are far fewer than the number of locations in the high-resolution radio map to be estimated. Recently, diffusion models have been increasingly adopted for this problem, yet their theoretical performance remains unexamined. This paper bridges this gap by formulating radio map estimation as a non-linear matrix completion problem. Based on this formulation, we first derive a theoretical lower bound on the minimum estimation error achievable by diffusion models, which is fundamentally governed by the discrepancy between the deployment distribution and the true underlying radio propagation law. We then extend this bound to incorporate the effect of sampling sparsity, capturing the additional error introduced by ultra-low sampling rates. Furthermore, we establish a critical sampling rate threshold necessary for diffusion models to achieve performance convergence. Finally, because the derived error bounds depend on information that is difficult to obtain in practice, we propose empirical approximations that are readily computable from observable data. Extensive simulations based on real-world traces demonstrate that these empirical formulas tightly approximate the theoretical error bounds, validating their effectiveness for practical deployment.


[14] 2606.25371

Conformal Recovery-Deadline Certificates for Runtime Assurance of Adapting Controllers

Runtime assurance (RTA) protects a safety-critical system by switching from an advanced controller to a verified safe controller when a monitored condition is violated. The standard latching rule, which trips on the first breach of the safe set and then coasts, is correct for a diverging controller but pathological for a capable online-adapting one. Such a controller is unsafe by design during a bounded recovery transient. It must excite the plant to identify the fault before it can correct it, so a latching shield trips on that transient and suppresses a controller that would have recovered. We introduce the conformal recovery-deadline certificate, a split-conformal, distribution-free, finite-sample upper bound on the adapting controller's recovery time that licenses delayed fallback with a coverage guarantee, backstopped by a verified monitor at a hard critical limit. The certified deadline discriminates capable from incapable controllers, keeping the recoverer autonomous while catching the diverger. The construction separates autonomy, governed by statistical coverage, from safety, governed by the verified backstop, as an instance of reliability-asymmetric design. We prove marginal coverage, a weighted extension that restores coverage under a known fault-distribution shift, and group-conditional Mondrian coverage. We demonstrate all three on two unrelated Simplex testbeds: a 6-DOF spacecraft attitude controller and a torque-controlled inverted pendulum. Both show the same suppression pathology and the same cure, making the certificate a domain-general mechanism rather than a single-system trick.
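The basic (unweighted, marginal) split-conformal construction can be sketched as follows, assuming recovery times are collected from calibration fault-injection runs that are exchangeable with the deployment fault. The weighted and Mondrian variants are not shown, and the switching rule, its argument names, and the scalar `deviation` signal are hypothetical.

```python
import math

import numpy as np


def conformal_deadline(cal_recovery_times, alpha: float) -> float:
    """Split-conformal upper bound D with P(T_new <= D) >= 1 - alpha,
    assuming calibration recovery times and the new one are exchangeable."""
    scores = np.sort(np.asarray(cal_recovery_times, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf                   # too few calibration runs for this alpha
    return float(scores[k - 1])


def should_fall_back(t_since_breach: float, recovered: bool, deviation: float,
                     deadline: float, hard_limit: float) -> bool:
    """Hypothetical switching rule: the verified backstop always wins; otherwise the
    adapting controller keeps authority until the certified deadline expires unrecovered."""
    if deviation >= hard_limit:
        return True
    return (not recovered) and t_since_breach > deadline
```

The rank `ceil((n + 1)(1 - alpha))` is what yields the finite-sample guarantee; with too few calibration runs the bound becomes infinite, which in this rule means only the verified backstop can trigger a fallback.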


[15] 2606.25403

CrossAccent-TTS: Cross-Lingual Accent-Intensity Controllable Text-to-Speech via Disentangled Speaker and Accent Representations

Accent conversion and controllability remain fundamental challenges in cross-lingual text-to-speech (TTS), particularly for low-resource and phonetically diverse Indic languages. While recent large language model (LLM)-based TTS systems exhibit strong cross-lingual generalization, they provide limited explicit control over accent characteristics and intensity. In this paper, we propose CrossAccent-TTS, a framework that enables both accent control and conversion while preserving speaker identity. Specifically, we introduce an Accent Intensity Controller (AIC) that injects weighted language embeddings into the accent subspace, allowing smooth interpolation between accents and fine-grained modulation of accent strength at inference time. Experiments on the Indic Multilingual and L2-arctic datasets show that CrossAccent-TTS achieves precise control of accent intensity, outperforming strong baselines in accent similarity and controllability while maintaining speaker similarity and naturalness.
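The interpolation idea behind weighted language embeddings can be illustrated with a small hypothetical sketch: the accent vector is a normalized weighted mix of per-language embeddings, and an intensity in [0, 1] moves it from a source accent to a target accent while the speaker embedding is left untouched. Concatenating the speaker and accent vectors, the language keys, and the function names are assumptions; the abstract does not specify how the AIC injects the mix into the accent subspace.

```python
import numpy as np


def accent_embedding(lang_emb: dict, weights: dict) -> np.ndarray:
    """Weighted mix of language embeddings; weights are normalised to sum to 1."""
    total = float(sum(weights.values()))
    return sum((w / total) * np.asarray(lang_emb[k], float) for k, w in weights.items())


def condition_vector(speaker_emb, lang_emb, source, target, intensity: float) -> np.ndarray:
    """Keep the speaker embedding fixed and move only the accent part from `source`
    (intensity 0) to `target` (intensity 1)."""
    accent = accent_embedding(lang_emb, {source: 1.0 - intensity, target: intensity})
    return np.concatenate([np.asarray(speaker_emb, float), accent])
```

Keeping speaker and accent in separate parts of the conditioning vector is what lets the intensity knob change accent without, in principle, changing who appears to be speaking.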


[16] 2606.25424

Adaptive Oscillatory Inductive Bias for Modeling Sharp Prosodic Dynamics in Diffusion-Based TTS

Diffusion-based text-to-speech (TTS) models have achieved significant improvements in speech quality. However, modeling sharp prosodic transitions and rapid pitch variations in expressive speech remains challenging. Existing diffusion-based TTS decoders commonly use periodic nonlinearities such as the Snake activation function to capture harmonic structures, but this activation provides limited adaptability when modeling abrupt amplitude and frequency variations. In this paper, we investigate the role of oscillatory inductive bias in diffusion-based TTS decoders and introduce an adaptive oscillatory nonlinearity that enables controllable periodic modulation while maintaining signal stability through a linear bypass component. We refer to the resulting TTS system as OscillaTTS. Experiments on LJSpeech and the Emotional Speech Dataset show consistent improvements across objective and subjective evaluations, indicating improved modeling of expressive prosodic dynamics.
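For reference, the Snake activation as commonly defined is x + sin²(αx)/α, where the identity term already acts as a linear bypass. The second function is a hypothetical adaptive variant (a learnable gain on the oscillatory term) included only to show how periodic modulation can be scaled while the bypass keeps the output close to the input; it is not the OscillaTTS nonlinearity, whose form the abstract does not give.

```python
import numpy as np


def snake(x, alpha: float = 1.0):
    """Snake activation: x + sin^2(alpha * x) / alpha.
    The identity term x is a linear bypass; the sin^2 term adds a periodic component."""
    x = np.asarray(x, float)
    return x + np.sin(alpha * x) ** 2 / alpha


def adaptive_oscillation(x, alpha: float = 1.0, gain: float = 1.0):
    """Hypothetical adaptive variant: a learnable gain scales the oscillatory term while
    the linear bypass is kept, so gain = 0 reduces to the identity."""
    x = np.asarray(x, float)
    return x + gain * np.sin(alpha * x) ** 2 / alpha
```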


[17] 2606.25436

Evaluating Japanese Dialect Robustness Across Speech and Text-based Large Language Models

Dialogue systems based on large language models (LLMs) have advanced significantly in recent years. However, dialectal variation remains a major challenge, particularly for systems that process spoken input. LLM-based speech language models (SLMs), which integrate LLMs with speech processing components, show promise for spoken language tasks, yet their ability to comprehend dialects has not been sufficiently studied. Moreover, it remains unclear how the dialectal understanding of the base LLM affects SLM performance. This study investigates the dialectal robustness of both LLMs and SLMs using Japanese dialects as a test case. We define robustness as the ratio of performance on dialectal versus standard inputs, enabling fair comparisons. Our experiments show that SLM robustness correlates with that of the corresponding text-based LLMs. Furthermore, training with dialectal data and fine-tuning the speech encoder each improve robustness in SLMs.
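The robustness definition and the reported correlation can be expressed directly. This sketch assumes a higher-is-better metric such as accuracy (for an error rate the ratio would need to be inverted) and uses Pearson correlation between paired SLM and LLM robustness values; the study may use a different correlation measure.

```python
import numpy as np


def robustness(dialect_score: float, standard_score: float) -> float:
    """Ratio of performance on dialectal vs. standard inputs (1.0 = no degradation).
    Assumes a higher-is-better metric such as accuracy."""
    return dialect_score / standard_score


def pearson(x, y) -> float:
    """Pearson correlation, e.g. between SLM robustness and text-LLM robustness."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])
```

Normalizing by standard-language performance is what makes models with different absolute accuracy comparable: a model scoring 0.6 on dialects and 0.8 on standard input has robustness 0.75.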


[18] 2606.25444

Does Translation-Enhanced Speech Encoder Pre-training Affect Speech LLMs?

Connecting a pre-trained speech encoder to a Large Language Model (LLM) is the standard architecture for building Speech LLMs. However, a structural misalignment exists between the encoder and the LLM. Unlike encoders based on automatic speech recognition, which often produce representations in separate language-specific spaces, LLMs operate within a unified language-agnostic space. A mechanism is required to align the encoder's language-specific representations with the LLM's shared space. We argue that speech translation provides a principled way to achieve this. Unlike monolingual transcription, translation requires the model to bridge different languages and learn language-agnostic representations. We experimentally evaluate the impact of incorporating translation objectives into speech encoder pre-training. Our results demonstrate that translation-enhanced pre-training improves cross-modal integration and leads to superior performance across downstream Speech LLM tasks.


[19] 2606.25452

Control Barrier Function only Formation Tracking in Multi-Agent Systems

This paper presents a real-time control framework for formation tracking of heterogeneous multi-agent systems with nonlinear dynamics. The proposed method casts formation tracking as a quadratic program with a single Control Barrier Function (CBF)-like constraint. Relying on relative information from neighboring agents, the controller is designed to operate without manual parameter tuning or a separate nominal formation controller. The leader-follower framework is validated through simulations of moving formations.
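A minimal sketch of what a single CBF-like formation constraint inside a minimum-norm QP can look like, under simplifying assumptions that are ours, not the paper's: one follower with single-integrator dynamics ṗ = u, a leader whose position and velocity are known, a desired offset d, and the hypothetical barrier h = ε² - ‖p - p_L - d‖². With one linear constraint, the QP min ‖u‖² subject to ḣ ≥ -κh has a closed-form solution, so no solver and no nominal controller appear.

```python
import numpy as np


def cbf_formation_input(p, p_leader, v_leader, offset, eps=0.1, kappa=2.0):
    """Minimum-norm u for  min ||u||^2  s.t.  dh/dt >= -kappa * h,
    with h = eps^2 - ||e||^2, e = p - p_leader - offset, and p' = u."""
    e = p - p_leader - offset
    h = eps**2 - e @ e
    a = -2.0 * e                     # dh/dt = a @ (u - v_leader)
    b = -kappa * h + a @ v_leader    # constraint rewritten as a @ u >= b
    if b <= 0.0:
        return np.zeros_like(p)      # u = 0 already satisfies the constraint
    return (b / (a @ a)) * a         # closest point of the half-space to the origin
```

Because h starts negative when the follower is far from its slot, the constraint forces h to rise at least exponentially toward zero, so the formation error converges toward the ε-ball; ε and κ are illustrative values.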


[20] 2606.25460

Fully Differentiable Neural Forced Alignment via Soft Dynamic Programming

Recent advances in sequence modeling have significantly improved ASR systems, bringing them close to human-level recognition accuracy and enhancing robustness across diverse acoustic conditions and languages. In contrast, Forced Alignment has not experienced comparable progress, and traditional HMM-GMM frameworks remain widely adopted and highly competitive. To address this gap, we propose an end-to-end, fully differentiable neural architecture specifically designed for phoneme alignment. The model consists of an encoder that processes the input signal and a decoder that produces alignment decisions. The encoder is structured into two complementary branches: one dedicated to phoneme identity verification and the other to phoneme boundary detection. The decoder is implemented as a trainable module based on differentiable soft dynamic programming. The entire system is optimized end-to-end using a novel contrastive loss that encourages clear separation between steady-state phoneme regions and transition boundaries. The proposed approach outperforms the current state of the art in phoneme alignment on hand-annotated English benchmarks, achieves strong word-level alignment results, and generalizes to unseen languages.
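Soft dynamic programming replaces the hard `min` of a Viterbi-style recursion with a smooth soft-min, which makes the total alignment cost differentiable with respect to the frame-phoneme cost matrix. The sketch below is a generic soft-DTW-style recursion over monotonic alignments (each frame either stays on the current phoneme or advances to the next), not necessarily the paper's decoder; `gamma` is the smoothing temperature and `gamma = 0` recovers the hard DP.

```python
import numpy as np


def softmin(values, gamma: float) -> float:
    """Smooth minimum -gamma * log(sum(exp(-v / gamma))); gamma = 0 gives the hard min."""
    v = np.asarray(values, float)
    m = v.min()
    if gamma == 0.0 or np.isinf(m):
        return float(m)
    return float(m - gamma * np.log(np.sum(np.exp(-(v - m) / gamma))))


def soft_alignment_cost(cost: np.ndarray, gamma: float = 0.1) -> float:
    """Soft-DP over monotonic alignments of T frames to N phonemes (each phoneme >= 1 frame).
    cost[t, n] is the local cost of assigning frame t to phoneme n."""
    T, N = cost.shape
    D = np.full((T, N), np.inf)
    D[0, 0] = cost[0, 0]
    for t in range(1, T):
        for n in range(N):
            # stay on phoneme n, or advance from phoneme n - 1
            prev = [D[t - 1, n]] if n == 0 else [D[t - 1, n], D[t - 1, n - 1]]
            D[t, n] = cost[t, n] + softmin(prev, gamma)
    return float(D[T - 1, N - 1])
```

In an autodiff framework, the gradient of this soft cost with respect to `cost` is, for soft-DTW-type recursions, an expected (soft) alignment matrix, which is what allows alignment decisions to be trained end-to-end; this NumPy version only computes the forward value.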


[21] 2606.25463

Blasto-Net: An Explainable Multi-Task Learning for Blastocyst Segmentation, Grading, and Implantation Prediction

This study introduces Blasto-Net, a multi-task deep learning model for comprehensive blastocyst analysis. The proposed model performs three tasks simultaneously in a single forward pass: segmentation of the zona pellucida (ZP), trophectoderm (TE), and inner cell mass (ICM) compartments, morphological grading, and implantation outcome prediction. Accurate blastocyst analysis in in vitro fertilization (IVF) is challenging because the compartments often have similar textures but very different structures. To address these challenges, Blasto-Net employs an EfficientNet-B3 encoder with a UNet-style decoder enhanced by the Convolutional Block Attention Module (CBAM) and a novel Edge-Aware Attention Module (EAAM) to effectively capture both semantic and boundary information. To handle distinct compartment topologies, the network employs specialized segmentation heads and a composite region- and boundary-based loss. Additionally, Grad-CAM++ visualizations are used to verify the anatomical consistency of the model's predictions. Evaluated on a public HMC blastocyst dataset, Blasto-Net achieves Dice scores of 94.93%, 91.60%, and 88.82% for ICM, ZP, and TE, respectively, alongside an implantation F1-score of 80.0%. These results demonstrate that Blasto-Net offers an accurate, interpretable, and efficient solution for automated blastocyst assessment, with strong potential to support clinical decision-making in IVF.
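The per-compartment Dice scores reported above follow the standard overlap definition, 2|P ∩ G| / (|P| + |G|). A minimal sketch, assuming one binary mask per compartment; the dictionary keys and the small `eps` guard against empty masks are illustrative choices.

```python
import numpy as np


def dice(pred, target, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks (eps avoids 0/0 on empty masks)."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))


def per_compartment_dice(pred_masks: dict, gt_masks: dict) -> dict:
    """Dice for each compartment, e.g. keys "ZP", "TE", "ICM" mapping to binary masks."""
    return {name: dice(pred_masks[name], gt_masks[name]) for name in gt_masks}
```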


[22] 2606.25571

One Terahertz Full-Field Digital Back-Propagation over 3000 km

We implement full-field digital back-propagation (DBP) with a 1-THz receiver using 20 synchronous frequency-adjacent coherent receivers with digital stitching and a frequency-comb local oscillator. Relative to electronic dispersion compensation, per-channel DBP and full-field DBP achieve throughput gains of 2.2% and 5.4%, respectively.


[23] 2606.25572

MCRB and MSE Analysis for Parameter Estimation in AFDM-ISAC Systems

Affine frequency division multiplexing (AFDM) is a promising waveform for integrated sensing and communication (ISAC). In AFDM systems, the complex gains, delays, and Doppler shifts are commonly estimated from the AFDM symbols carrying pilots and data simultaneously. In practice, however, the unknown data symbols and data-pilot coupling interference may render the estimator mismatched to the true signal model. In this paper, we systematically characterize the parameter-estimation performance of AFDM-ISAC systems under practical model misspecification. The main contributions are threefold. First, we extend the Cramér-Rao bound (CRB) to a general observation model that treats the data symbols as unknown, which generalizes existing AFDM CRB analyses and serves as the matched benchmark for the subsequent analysis. Second, we identify two practical sources of misspecification, namely a covariance mismatch caused by insufficient pilot-data isolation and a combined covariance-and-mean mismatch caused by sequential single-target estimation, and derive the corresponding misspecified CRB (MCRB). Third, we characterize the pseudo-true parameters under different levels of prior knowledge, analyze the resulting estimation bias, and establish a lower bound (LB) on the mean square error (MSE). Simulation results validate the derived bounds and show that, under model misspecification, the CRB is overly optimistic while the MCRB and LB faithfully characterize the achievable accuracy. The comparison further reveals how these bounds vary with the pilot length and pilot power, providing useful guidance for pilot configuration.
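For orientation, the generic misspecified bounds from the estimation literature take the following form, stated for a real parameter vector θ with true value θ̄, assumed density f_θ, and true density p; the AFDM-specific matrices derived in the paper are not reproduced in the abstract. The pseudo-true parameter θ_0 minimizes the Kullback-Leibler divergence from the true to the assumed model, the MCRB applies to estimators unbiased with respect to θ_0, and the MSE lower bound adds the outer product of the bias θ̄ - θ_0:

```latex
\theta_0 = \arg\min_{\theta} D_{\mathrm{KL}}\!\left(p \,\|\, f_{\theta}\right), \qquad
\mathbf{A}(\theta_0) = \mathbb{E}_p\!\left[\nabla_{\theta}\nabla_{\theta}^{\mathsf T}\ln f_{\theta}(\mathbf{y})\right]_{\theta=\theta_0}, \qquad
\mathbf{B}(\theta_0) = \mathbb{E}_p\!\left[\nabla_{\theta}\ln f_{\theta}(\mathbf{y})\,\nabla_{\theta}^{\mathsf T}\ln f_{\theta}(\mathbf{y})\right]_{\theta=\theta_0}
\\[4pt]
\mathrm{MCRB}(\theta_0) = \mathbf{A}^{-1}(\theta_0)\,\mathbf{B}(\theta_0)\,\mathbf{A}^{-1}(\theta_0), \qquad
\mathbb{E}_p\!\left[(\hat{\theta}-\bar{\theta})(\hat{\theta}-\bar{\theta})^{\mathsf T}\right]
\succeq \mathrm{MCRB}(\theta_0) + (\bar{\theta}-\theta_0)(\bar{\theta}-\theta_0)^{\mathsf T}
```

When the assumed model is correct, θ_0 = θ̄ and B = -A equals the Fisher information, so both expressions collapse to the ordinary CRB; under misspecification the CRB can sit below the achievable MSE, which is the optimism the abstract reports.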


[24] 2606.25579

Cross-Attention Multimodal Learning for Predicting Response to Neoadjuvant Imatinib in Gastrointestinal Stromal Tumors: A Multicenter Retrospective Study

Background: Response to neoadjuvant imatinib in gastrointestinal stromal tumors (GISTs) is highly variable and cannot be reliably predicted using current clinical or molecular markers. This study developed and evaluated an explainable multimodal deep learning framework integrating computed tomography (CT) imaging and clinical variables to predict treatment response. Methods: Patients from four tertiary centers (2000-2023) were retrospectively included in independent pretraining (n=935) and prediction (n=213) cohorts. A cross-attention framework integrating clinical variables and tumor-centered CT imaging was developed to predict response to neoadjuvant imatinib. Two training strategies were evaluated: (1) self-supervised pretraining with low-rank adaptation and (2) training from scratch. Hyperparameters were optimized using SMAC3. Performance was assessed through internal cross-validation and external testing. Ablation analyses and attention-based explanations were used to quantify modality contributions. Results: Among 213 patients (54.5% responders), responders had larger tumors (112 vs. 89 mm, P=0.026), higher mitotic index (3 vs. 0, P<0.001), and more frequent KIT mutations (69.0% vs. 56.7%, P=0.019). Cross-attention models achieved the highest internal performance (AUC up to 0.99) but lower external performance (AUC 0.60-0.63). Clinical-only performance was moderate (AUC 0.66), whereas imaging-only models showed limited generalizability (AUC 0.56-0.66). Explainability analyses identified significant differences in feature importance between responders and non-responders, including CD117, BRAF, PDGFRA, age, sex, disease status, and comorbidities (FDR-adjusted P≤0.036). Conclusion: The cross-attention framework shows potential for improving imatinib response prediction in GIST while providing interpretable insights into multimodal determinants of treatment response.
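The abstract reports FDR-adjusted P values without naming the procedure; the Benjamini-Hochberg step-up adjustment is the most common choice and is sketched here as an assumption. Each sorted p-value is scaled by m / rank, then a running minimum from the largest p-value downward keeps the adjusted values monotone.

```python
import numpy as np


def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)            # p_(i) * m / i
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]     # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted
```

A feature is then declared significant at FDR level q when its adjusted value is at most q, which is how a threshold such as "adjusted P ≤ 0.036" should be read.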


[25] 2606.25599

Reference-Free Heterogeneous Multi-Agent Reinforcement Learning for Grid-Friendly Tie-Line Power Shaping in Industrial Microgrids

Tie-line power (TLP) shaping is a key requirement for the grid-friendly operation of industrial microgrids (IMGs). This paper studies the coordination of multi-timescale heterogeneous adjustable resources in a steel IMG to shape a grid-friendly TLP trajectory considering multiple objectives. A sequential heterogeneous-agent coordination (SHAC) framework is proposed, where process loads, hydrogen storage, and battery storage are modeled as functionally heterogeneous agents with cross-role observations, asynchronous decision intervals, role-specific rewards and critics. This design captures the heterogeneous temporal effects of different resources on the TLP trajectory and alleviates ambiguous credit assignment and weak inter-agent coordination. To ensure feasible real-time execution, process-knowledge-based action masking and feasibility projection are embedded into policy execution, and a role-aware multi-timescale actor-critic training scheme is developed for agents with different action structures and decision intervals. Numerical studies using real renewable generation and electricity market data show that SHAC effectively eliminates the dependence on predefined reference trajectories and enables adaptive 1-min online decision-making, achieving zero production failures with an average computational time of only 0.4 ms per step. Compared with the original operation, SHAC reduces the total grid purchase cost, contract-demand exceedance time, and cumulative ramp excess by 91.27%, 98.64%, and 96.91%, respectively. These results demonstrate that the proposed framework improves the economic efficiency and grid friendliness of industrial microgrid operation while satisfying strict process-safety constraints and real-time computational requirements.


[26] 2606.25639

Optimization-Based Velocity-Integral Sliding-Window Coarse Alignment: Attitude Error Analysis and Validation

The optimization-based alignment (OBA) approach transforms the strapdown inertial navigation system (SINS) coarse alignment into a constant initial attitude estimation problem, serving as a prevalent technique for global navigation satellite system (GNSS)-aided in-motion alignment. While existing studies focus on improving accuracy by refining attitude determination algorithms or constructing robust observation vectors, a rigorous analytical mapping to evaluate the resulting attitude errors from raw sensor and aiding-velocity uncertainties has yet to be established for fixed-length sliding-window velocity-integral OBA. To address this issue, this paper proposes a first-order attitude error propagation model for GNSS-aided sliding-window velocity-integral OBA. Specifically, a sliding-window observation model and its discrete implementation are formulated, through which gyroscope errors, accelerometer errors, GNSS velocity noise, and lever-arm effects are analytically propagated to non-normalized observation vectors. Subsequently, Davenport's q method is used to establish the mapping from these vector perturbations to attitude misalignment. By decoupling systematic errors and stochastic noise, the deterministic attitude offsets and the attitude error covariances are respectively derived. Monte Carlo simulations demonstrate that the analytical model accurately captures the deterministic attitude offsets and precisely characterizes the statistical spread, yielding standard-deviation ratios between 0.929 and 1.060 with empirical coverage above 99.4%. Vehicle field tests further confirm its practical applicability, showing that the predicted covariance envelopes reliably bound the actual initial-attitude errors, with steady-state RMSEs strictly below 0.00495 deg. These results validate the proposed model for coarse-alignment attitude error assessment.
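Davenport's q method, which the abstract uses to map vector perturbations to attitude misalignment, reduces Wahba's problem to a 4×4 symmetric eigenproblem. A minimal NumPy sketch, assuming a scalar-last quaternion convention and generic weighted vector pairs; the function names are illustrative, and the paper's sliding-window velocity-integral vectors are not constructed here:

```python
import numpy as np


def quat_to_dcm(q):
    """Attitude matrix A(q) for q = [qx, qy, qz, qw]; A maps reference-frame vectors to body-frame vectors."""
    qv, qw = q[:3], q[3]
    cross = np.array([[0.0, -qv[2], qv[1]],
                      [qv[2], 0.0, -qv[0]],
                      [-qv[1], qv[0], 0.0]])
    return (qw**2 - qv @ qv) * np.eye(3) + 2.0 * np.outer(qv, qv) - 2.0 * qw * cross


def davenport_q(body_vecs, ref_vecs, weights=None):
    """Davenport's q method: the optimal quaternion is the eigenvector of the 4x4
    K matrix associated with its largest eigenvalue."""
    b = np.asarray(body_vecs, dtype=float)
    r = np.asarray(ref_vecs, dtype=float)
    w = np.ones(len(b)) if weights is None else np.asarray(weights, dtype=float)
    # Attitude profile matrix B = sum_i w_i b_i r_i^T
    B = np.einsum("i,ij,ik->jk", w, b, r)
    sigma = np.trace(B)
    z = np.array([B[1, 2] - B[2, 1], B[2, 0] - B[0, 2], B[0, 1] - B[1, 0]])
    K = np.empty((4, 4))
    K[:3, :3] = B + B.T - sigma * np.eye(3)
    K[:3, 3] = z
    K[3, :3] = z
    K[3, 3] = sigma
    eigvals, eigvecs = np.linalg.eigh(K)   # K is symmetric
    return eigvecs[:, np.argmax(eigvals)]
```

In an OBA setting, the body-frame and reference-frame vectors would come from the velocity integrals over the window; perturbing them and re-running the solver is one way to sanity-check an analytical first-order error model numerically. At least two non-collinear vector pairs are needed.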


[27] 2606.25672

Joint Residual Reweighting for Classifier Free Guidance in Flow-Matching Zero-Shot TTS

Classifier-free guidance (CFG) is widely used in flow-matching-based zero-shot text-to-speech (TTS), where generation is typically controlled by two conditions: the target text and a prompt speech signal. Standard CFG strengthens these conditions jointly, while recent branch-selective guidance methods attempt to enhance text or speaker conditioning separately, often leading to a trade-off between text correctness and speaker similarity. In this paper, we revisit CFG under independently masked text and speech-prompt conditions, and decompose the guidance field into text, speaker, and joint residuals. We show that conventional speaker-selective guidance entangles the speaker residual with the joint residual, which may disturb text-related generation. Based on this observation, we propose joint residual reweighting, which independently controls the speaker and joint residuals within the standard CFG framework. Experiments on F5-TTS and CosyVoice2 show that the proposed method improves speaker similarity while maintaining competitive text correctness, demonstrating the usefulness of the joint residual for balancing speaker fidelity and text accuracy in zero-shot TTS.
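With independently masked conditions, the four branch outputs of the flow-matching model split exactly into text, speaker, and joint residuals, and standard CFG is the special case of equal weights on all three. The sketch below treats the branch outputs as generic arrays; the per-residual weighting rule is a hypothetical illustration, not the paper's exact formula. It also makes the entanglement concrete: the speaker-selective difference `v_ts - v_tu` equals `r_spk + r_joint`.

```python
import numpy as np


def decompose_residuals(v_uu, v_tu, v_us, v_ts):
    """Split four CFG branch outputs into text, speaker and joint residuals.
    v_uu: text and prompt both masked; v_tu: text only; v_us: speech prompt only; v_ts: both."""
    r_text = v_tu - v_uu
    r_spk = v_us - v_uu
    r_joint = v_ts - v_tu - v_us + v_uu   # interaction term not explained by either condition alone
    return r_text, r_spk, r_joint


def reweighted_guidance(v_uu, v_tu, v_us, v_ts, w_text, w_spk, w_joint):
    """Guided velocity with a separate weight per residual (hypothetical weighting rule)."""
    r_text, r_spk, r_joint = decompose_residuals(v_uu, v_tu, v_us, v_ts)
    return v_uu + w_text * r_text + w_spk * r_spk + w_joint * r_joint
```

Setting `w_text = w_spk = w_joint = w` recovers `v_uu + w * (v_ts - v_uu)`, so the reweighting stays inside the standard CFG family while exposing the joint term as its own knob.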


[28] 2606.25708

Empirical characterization of the Translational acoustic-RF communication channel

Translational acoustic-radio frequency (TARF) communication paves the way for translating information from an underwater acoustic signal to an over-the-air (OTA) electromagnetic receiver through the medium interface. Studying and characterizing the channel is essential for establishing a reliable communication link. Although channel modeling has been extensively studied for OTA and underwater channels, the amplitude characteristics of the TARF cross-medium channel have not, to date, been compared with well-known distributions. In this work, we first refine the signal model to incorporate the effects of wavefront-water surface interactions. With the help of numerical and graphical methods, we then attempt to characterize the cross-medium channel with empirical data using existing models developed for OTA and underwater channels. We further evaluate channel linearity and time invariance empirically. Observations from multiple experiments are detailed, together with discussions that support better channel characterization for developing reliable and consistent cross-medium TARF communication in challenging scenarios.


[29] 2606.25714

CSI-CLIP++: A Scalable Channel Foundation Model for Wireless Communication via CIR-CSI Consistency

Self-supervised learning can exploit large-scale unlabeled channel data to improve the transferability of wireless AI models. Existing channel foundation models are often built on single-domain representations or reconstruction-oriented objectives, which may not explicitly capture the physical correspondence between frequency- and delay-domain channel views. This paper proposes CSI-CLIP++, a scalable channel foundation model for MIMO wireless channels. CSI-CLIP++ treats frequency-domain channel state information (CSI) and delay-domain channel impulse response (CIR) as paired views of the same propagation process and learns transferable representations through CSI-CIR contrastive alignment. The pretrained CSI encoder is adapted to channel identification, beam prediction, and positioning, representing PHY, RAN, and ISAC applications. Experiments on large-scale DeepMIMO scenarios show consistent gains over supervised baselines across environments, carrier frequencies, and data scales. CSI-CLIP++ improves beam prediction Top-1 accuracy by up to 19.31 percentage points and achieves competitive positioning performance, including cross-simulator transfer on a Sionna RT dataset. Backbone scaling results further show that the proposed objective remains effective across encoder architectures and benefits from larger model capacity.
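The CSI and CIR views are related by a discrete Fourier transform across subcarriers, and CLIP-style alignment is typically trained with a symmetric InfoNCE loss over a batch of paired embeddings. A minimal sketch, assuming subcarriers lie on the last axis and that the two encoders (not shown, hypothetical) have already produced real-valued embeddings; the temperature is an assumed default, not a value from the paper:

```python
import numpy as np


def csi_to_cir(csi):
    """Delay-domain CIR from frequency-domain CSI via an inverse DFT over subcarriers (last axis)."""
    return np.fft.ifft(csi, axis=-1)


def clip_loss(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE loss; row i of z_a is the positive pair of row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature      # cosine similarities of all pairs in the batch

    def cross_entropy_on_diagonal(lg):
        lg = lg - lg.max(axis=1, keepdims=True)                     # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # CSI->CIR direction plus CIR->CSI direction
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))
```

Other samples in the batch act as negatives, so the loss is low only when each CSI embedding is closer to its own CIR embedding than to every other CIR in the batch.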


[30] 2606.25725

Deep Learning-Assisted Multicast Subgrouping in Massive MIMO

Efficient content delivery in massive multiple-input multiple-output (mMIMO) multicasting is fundamentally limited by pilot overhead and the need to serve heterogeneous users with a common transmission rate. Conventional approaches either suffer from pilot contamination or are constrained by the worst-user effect, motivating the need for adaptive subgrouping strategies. In this paper, we propose a deep learning-assisted multicast subgrouping framework that infers the number of multicast subgroups directly from users' spatial channel statistics. A snapshot-specific principal component analysis (PCA) is applied to user covariance matrices to obtain a compact representation, which is processed by a sequential long short-term memory (LSTM) encoder capable of handling variable-size user sets. The model predicts the number of subgroups and groups the users according to their statistical similarity. To further improve system performance, we introduce a transfer learning (TL) extension where a pretrained LSTM encoder is reused, and a lightweight dense head is fine-tuned to estimate the sum spectral efficiency (SE) as a function of the subgroup configuration. This enables selecting near-optimal subgrouping solutions without exhaustive search. Simulation results demonstrate that the proposed approach consistently outperforms benchmark methods, including unicast transmission, conventional multicast, random subgrouping, and density-based clustering. The TL-enhanced model achieves up to 85% of the maximum achievable spectral efficiency while maintaining robust performance across diverse spatial user distributions and under imperfect covariance information.


[31] 2606.25744

Networked Control System Under Controller-Actuator Channel Jamming

Wireless channels in networked control systems are vulnerable to intentional interference, such as jamming attacks. This paper investigates jamming attacks on the wireless controller-actuator channel of a control system that can tolerate receiving control inputs from the controller only occasionally. We start with the worst-case scenario for the jammer, in which the controller knows its channel state. We develop an adaptive jamming strategy in which the jammer, observing the success or failure of each controller transmission, forms beliefs about its own channel state and the controller-actuator channel state. Using these beliefs, it optimizes its actions under a limited jamming budget. To counter this, we develop an event-triggered defense scheme for the controller in two settings: with and without knowledge of its channel state. Simulation results show that optimal adaptive jamming attacks can significantly degrade control performance even with a limited budget, while the defense scheme, even without channel state knowledge, can effectively reduce this impact.


[32] 2606.25864

Sequential and Generative Models for Vehicular Distributed MIMO Channel Prediction

Vehicular communication is a key 6G use case requiring reliable, high-capacity connectivity under fast mobility and highly time-varying propagation conditions. However, large-scale vehicular channel estimation is costly and limited, which degrades the system-level performance of vehicular communications; realistic channel prediction models are therefore needed. This paper proposes a vehicular channel prediction framework based on real urban channels measured in a dedicated campaign with the MaMIMOSA channel sounder. The framework enables the training and systematic benchmarking of sequential and generative models, including LSTM, TCN, a CNN-enhanced Transformer, and ChannelGPT, for both single-step and multi-horizon vehicular channel state information (CSI) prediction, so that prediction robustness can be assessed across forecasting horizons. The goal is to predict channel evolution accurately while preserving spatiotemporal dynamics and non-stationarity. In addition, a system-level evaluation framework is introduced to assess the impact of channel prediction on the performance of vehicular distributed MIMO communications: the spectral efficiency (SE) obtained with predicted channels is compared against that obtained with true CSI. Results show that ChannelGPT achieves over 94% normalized mean squared error (NMSE) reduction compared to LSTM and significant improvements over the other baselines, while reducing FLOPs by 28% and inference latency by 39% relative to the CNN + Transformer. Moreover, ChannelGPT-predicted channels yield SE distributions nearly indistinguishable from those obtained with real measurements, demonstrating its effectiveness for reliable performance evaluation in high-mobility 6G vehicular networks.
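The two headline metrics can be computed as follows: NMSE between measured and predicted CSI, and a per-snapshot SE that can be collected into distributions for predicted versus true channels. This sketch assumes a single-link log-det SE with equal power allocation and reports NMSE reduction on the linear scale; the paper's distributed-MIMO SE model and exact reduction definition may differ:

```python
import numpy as np


def nmse_db(h_true, h_pred):
    """Normalized MSE ||H - H_hat||^2 / ||H||^2, in dB."""
    err = np.sum(np.abs(h_true - h_pred) ** 2)
    return 10.0 * np.log10(err / np.sum(np.abs(h_true) ** 2))


def nmse_reduction_percent(nmse_new_lin, nmse_ref_lin):
    """Relative NMSE reduction (linear scale) of a model versus a reference model."""
    return 100.0 * (1.0 - nmse_new_lin / nmse_ref_lin)


def spectral_efficiency(h, snr_linear):
    """SE in bit/s/Hz of an Nr x Nt channel with equal power per transmit antenna:
    log2 det(I + (snr / Nt) H H^H)."""
    nr, nt = h.shape
    gram = np.eye(nr) + (snr_linear / nt) * h @ h.conj().T
    _, logdet = np.linalg.slogdet(gram)
    return logdet / np.log(2.0)
```

Evaluating `spectral_efficiency` on every measured snapshot and on the corresponding predicted snapshot gives two empirical SE distributions that can then be compared, for example through their CDFs.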


[33] 2606.25888

Robustness and Leadership in Markov-switching Consensus Networks

We investigate how time-varying interactions, modeled via a Markov switching graph (MSG), impact the robustness of noisy multi-agent dynamics in both continuous- and discrete-time settings. Our focus is on the steady-state performance of consensus and leader-follower tracking dynamics subject to stochastic noise. Using the framework of Markov jump linear systems (MJLS), we derive expressions for the steady-state covariance of each agent's deviation from consensus and tracking error, respectively, and use them to quantify individual and group performance as a function of the interaction graphs and the switching dynamics. We extend established notions of robustness, certainty indices, and joint centrality from static graphs to the MSG setting. To gain analytical insight, we specialize our results to systems switching between two topologies and characterize how switching influences performance. Numerical simulations further illustrate how switching topologies affect system robustness in both coordination tasks.
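For the discrete-time consensus case, the steady-state covariance of the deviation from consensus follows from the coupled second-moment recursion of an MJLS. A sketch assuming undirected graphs (so the averaging projector commutes with every Laplacian), dynamics x_{k+1} = (I - eps L_theta) x_k + w_k with white noise of covariance noise_var * I, a switching chain started in its stationary distribution, and mean-square stability; the names are illustrative, and the continuous-time and leader-follower cases are not covered:

```python
import numpy as np


def laplacian(n, edges):
    """Laplacian of an undirected graph with unit edge weights."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L


def msg_steady_covariance(laplacians, P, eps, noise_var=1.0, iters=5000):
    """Steady-state covariance of the deviation from consensus for
    x_{k+1} = (I - eps * L_theta_k) x_k + w_k, theta_k a Markov chain with transition matrix P."""
    n, m = laplacians[0].shape[0], len(laplacians)
    proj = np.eye(n) - np.ones((n, n)) / n        # removes the consensus (average) component
    W = noise_var * proj                          # covariance of the projected noise
    A = [np.eye(n) - eps * L for L in laplacians]
    # Stationary distribution of the switching chain (left eigenvector of P for eigenvalue 1)
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()
    # Q[j] = E[y y^T 1{theta = j}]; coupled second-moment recursion of the MJLS
    Q = [np.zeros((n, n)) for _ in range(m)]
    for _ in range(iters):
        Q = [sum(P[i, j] * (A[i] @ Q[i] @ A[i].T + pi[i] * W) for i in range(m))
             for j in range(m)]
    return sum(Q)
```

The diagonal of the returned matrix gives each agent's steady-state deviation variance, and its trace is a natural group-level measure. If both modes use the same graph, the result reduces to the solution of the single-graph Lyapunov equation.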


[34] 2606.25924

Improving Richardson–Lucy Deconvolution with Diffusion Priors for Fluorescence Microscopy

Richardson–Lucy (RL) deconvolution improves fluorescence microscopy images by recovering details lost to diffraction. It estimates the original fluorescence signal that most likely produced the measured photon counts under a Poisson imaging model. Although RL incorporates a physical model of fluorescence image formation and can improve contrast, deconvolution remains fundamentally ill-posed, and the measurements alone provide limited evidence for reliably reconstructing fine biological structure. Without additional structural guidance, RL can amplify noise and exhibit unstable convergence in low-photon regimes. Regularizers such as total variation (TV) reduce this instability but often introduce oversmoothing. Here, we investigate learned generative priors as a form of structural guidance for RL by integrating a score-based diffusion prior into a decoupled inverse-problem framework for fluorescence microscopy deconvolution. The diffusion prior is used during the RL optimization iterations, while RL retains Poisson data consistency. We validate the framework across diverse biological samples and cellular morphologies. The results show reduced RL noise amplification with improved preservation of weak filamentous and punctate structures under low photon counts.
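The RL update alternates a forward blur, a ratio against the measured counts, and a back-projection with the flipped PSF. A 1-D NumPy sketch of the classical iteration, assuming an odd-length PSF; `prior_step` is a hypothetical hook marking where a learned prior such as the paper's diffusion model could act, and the paper's actual decoupled scheme is not reproduced:

```python
import numpy as np


def richardson_lucy(y, psf, n_iter=100, prior_step=None, eps=1e-12):
    """Classical 1-D Richardson-Lucy deconvolution for y ~ Poisson(psf * x).
    prior_step is a hypothetical hook where a denoiser or learned prior could act between RL updates."""
    psf = psf / psf.sum()          # odd-length kernel assumed so 'same' convolution stays centered
    psf_adj = psf[::-1]            # adjoint of convolution = correlation (flipped kernel)
    x = np.full(y.shape, max(y.mean(), eps))
    for _ in range(n_iter):
        blurred = np.convolve(x, psf, mode="same")
        ratio = y / np.maximum(blurred, eps)
        x = x * np.convolve(ratio, psf_adj, mode="same")   # multiplicative update keeps x >= 0
        if prior_step is not None:
            x = prior_step(x)
    return x
```

Because the update is multiplicative, nonnegativity is preserved automatically; the same property is why, on noisy low-count data, unregularized RL keeps sharpening noise as iterations increase, which is the failure mode the abstract targets.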


[35] 2606.25959

SE-AGCNet: An End-to-End Framework for Joint Speech Enhancement and Loudness Control in Meeting Scenarios

Conventional audio pipelines typically treat speech enhancement (SE) and automatic gain control (AGC) as separate modules, which often limits overall performance. For instance, applying AGC before SE may inadvertently amplify background noise, while placing SE first tends to over-suppress low-volume speech. To address these limitations, we propose SE-AGCNet, an end-to-end framework that jointly optimizes SE and AGC. Tailored for meeting scenarios with significant volume variations, SE-AGCNet leverages the synergy between the two tasks: SE preserves quiet speech, which in turn allows the AGC component to adjust volume effectively. Furthermore, we propose a specialized data simulation pipeline, SE-AGC-DataGen, and incorporate standardized loudness evaluation metrics: integrated loudness (LUFS), short-term loudness (St LUFS), and loudness range (LRA). Experiments show that SE-AGCNet consistently achieves target loudness while improving speech quality and ASR accuracy over competitive baselines.


[36] 2606.26020

A Simple Numerical Method for Non-Gaussian Signal Ensembles in Nonlinear Power Amplifiers

Beam tracking in vehicular communication systems is inherently challenging due to high mobility and the use of narrow millimeter-wave (mmWave) beams. These challenges are further exacerbated by power amplifier (PA) nonlinearities, which introduce distortion-induced beam pattern deviations, array-gain loss, and non-Gaussian signal distortions. Motivated by the need for analytical tools capable of characterizing such effects, this paper extends the Rice characteristic-function (ch. f.) method for the stochastic analysis of signals and noise in memoryless nonlinear systems. The proposed approach represents the nonlinearity using a Fourier series rather than a Fourier transform, turning the evaluation of output correlation functions from computationally intensive double or triple improper integrals into tractable summations. The resulting framework preserves the generality of the original method, supporting one or more sinusoidal signals and noise processes that are not restricted to Gaussian distributions. A new fundamental ch. f.-based formulation is derived in terms of Fourier-series coefficients and a discrete parameterization of the generalized characteristic function. Numerical results are presented for a nonlinear GaN HEMT transconductance characteristic driven by a sinusoidal signal and Gaussian noise, demonstrating the applicability of the proposed method. The framework provides a computationally efficient tool for analyzing nonlinear RF front-end impairments and their impact on future wireless and vehicular communication systems.
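For a nonlinearity written as a Fourier series g(x) = sum_k c_k exp(j k w x), the output autocorrelation reduces to a double sum over coefficient pairs: with u = k w and v = l w, a random-phase sinusoid of amplitude A contributes J0(A sqrt(u^2 + v^2 + 2uv cos(phi))), and independent Gaussian noise contributes exp(-sigma^2 (u^2 + v^2 + 2 rho u v) / 2). The sketch below implements this textbook one-sinusoid, Gaussian-noise case; it assumes g has already been expanded over a period that covers the input range, and it does not reproduce the paper's generalized, non-Gaussian parameterization. J0 is computed by a periodic trapezoid rule to keep the block NumPy-only.

```python
import numpy as np


def bessel_j0(z, n_grid=256):
    """J0(z) = (1/2pi) * integral of cos(z cos t) over one period (periodic trapezoid rule)."""
    t = 2.0 * np.pi * np.arange(n_grid) / n_grid
    return np.mean(np.cos(np.multiply.outer(z, np.cos(t))), axis=-1)


def output_autocorr(c, omega, amp, sigma, phi=0.0, rho=1.0):
    """E[g(x1) g(x2)] for g(x) = sum_k c[k+K] exp(1j*k*omega*x), k = -K..K, with inputs
    x = amp*cos(theta + .) + n, theta uniform and n zero-mean Gaussian (variance sigma**2).
    phi is the sinusoid phase shift and rho the noise correlation coefficient at the lag of interest."""
    K = (len(c) - 1) // 2
    u = omega * np.arange(-K, K + 1)
    U, V = np.meshgrid(u, u, indexing="ij")
    radius = np.sqrt(np.maximum(U**2 + V**2 + 2.0 * U * V * np.cos(phi), 0.0))
    sinusoid_cf = bessel_j0(amp * radius)                                   # random-phase sinusoid
    noise_cf = np.exp(-0.5 * sigma**2 * (U**2 + V**2 + 2.0 * rho * U * V))  # Gaussian noise
    return np.real(np.sum(np.outer(c, c) * sinusoid_cf * noise_cf))
```

At zero lag (phi = 0, rho = 1) this returns the output power E[g(x)^2]; sweeping phi and rho over the lag gives the full correlation function as a finite sum instead of an improper integral.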


[37] 2506.03211

Channel-adaptive Cross-modal Generative Semantic Communication for Point Cloud Transmission

With the rapid development of autonomous driving and extended reality, efficient transmission of point clouds (PCs) has become increasingly important. In this context, we propose a novel channel-adaptive cross-modal generative semantic communication (SemCom) scheme for PC transmission, called GenSeC-PC. GenSeC-PC employs a semantic encoder that fuses images and point clouds, where images serve as non-transmitted side information. Meanwhile, the decoder is built upon the backbone of PointDif. Such a cross-modal design not only ensures high compression efficiency but also delivers superior reconstruction performance compared to PointDif. Moreover, to ensure robust transmission and reduce system complexity, we design a streamlined and asymmetric channel-adaptive joint semantic-channel coding architecture, where only the encoder needs feedback of the average signal-to-noise ratio (SNR) and available bandwidth. In addition, rectified denoising diffusion implicit models are employed to accelerate the decoding process to the millisecond level, enabling real-time PC communication. Unlike existing methods, GenSeC-PC leverages generative priors to ensure reliable reconstruction even from noisy or incomplete source PCs. More importantly, it supports fully analog transmission, improving compression efficiency by eliminating the need for error-free side information transmission common in prior SemCom approaches. Simulation results confirm the effectiveness of cross-modal semantic extraction and dual-metric guided fine-tuning, highlighting the framework's robustness across diverse conditions, including low SNR, bandwidth limitations, varying numbers of 2D images, and previously unseen objects.


[38] 2606.24039

TurboMPC: Fast, Scalable, and Differentiable Model Predictive Control on the GPU

Robotics increasingly relies on GPUs for parallel simulation, large-scale learning, and neural-network inference. For model predictive control (MPC) to scale with this paradigm, solvers must run efficiently on this hardware while remaining fast, differentiable, and compatible with the expressive MPC formulations used in robotics. We present TurboMPC, a differentiable MPC solver that runs entirely on the GPU and supports state and control inequality constraints, implicit integrators, cross-time-coupled costs, and slack variables. TurboMPC combines sequential quadratic programming (SQP), an alternating direction method of multipliers (ADMM) inner solver, implicit differentiation, and a co-designed JAX-CUDA implementation for efficiency and ease of use. In simulation, we validate TurboMPC on constrained planning, humanoid imitation learning, and reinforcement learning with a neural-network cost function, achieving up to 15× and 58× speedups over state-of-the-art CPU and GPU differentiable solvers, respectively. We deploy TurboMPC on a full-scale car for minimum-time racing and find that batched, GPU-accelerated tuning of MPC parameters via Bayesian optimization yields significantly faster driving than a hand-tuned baseline. TurboMPC also scales to planning horizons of over 8000 knot points while maintaining control of the vehicle. We open-source TurboMPC at: this https URL
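An ADMM inner solver for a QP subproblem alternates a linear solve, a projection onto the constraint bounds, and a dual update. A minimal NumPy sketch of a generic inequality-constrained QP solver, assuming P is positive definite; it omits the step-size adaptation, relaxation, and factorization caching a practical solver would use, and it is not TurboMPC's API or GPU implementation:

```python
import numpy as np


def admm_qp(P, q, A, l, u, rho=1.0, iters=2000):
    """Solve min 0.5 x^T P x + q^T x  s.t.  l <= A x <= u  with the ADMM splitting z = A x."""
    n, m = P.shape[0], A.shape[0]
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    # The x-update matrix never changes, so a real solver would factorize it once and reuse it.
    M = P + rho * A.T @ A
    for _ in range(iters):
        x = np.linalg.solve(M, -q + A.T @ (rho * z - y))   # minimize augmented Lagrangian in x
        z = np.clip(A @ x + y / rho, l, u)                  # project onto the bounds
        y = y + rho * (A @ x - z)                           # dual (multiplier) update
    return x
```

Each iteration uses only matrix-vector products, one fixed linear system, and an elementwise clip, which is part of why ADMM maps well onto batched GPU execution. One-sided constraints are expressed with `-np.inf` or `np.inf` bounds.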


[39] 2606.24632

Parallel Dynamic Programming for Conic Linear Quadratic Control

Linear Quadratic (LQ) control problems are at the heart of linear control theory and Model Predictive Control (MPC). While performant, standard approaches to solving such problems are inherently serial, which limits real-time scalability despite the parallel computing power of modern multi-core CPUs. To address this challenge, and motivated by "divide and conquer" strategies, we present a parallel-in-time approach that solves computationally demanding conic optimal control problems using the alternating direction method of multipliers (ADMM). In particular, we formulate the inner primal update of ADMM as an LQ problem and split the reformulated problem along the time horizon. This enables us to derive a variant of the Riccati recursion using dynamic programming to solve each subproblem in parallel. Numerical benchmarks on two real-world applications demonstrate as much as a 5x speedup compared to existing related approaches on multi-core CPU hardware.
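The serial bottleneck is visible in the standard backward Riccati recursion for finite-horizon LQ control, where each P_k depends on P_{k+1}. A sketch of that serial recursion, which is the baseline and not the paper's parallel-in-time variant, assuming time-invariant matrices and stage cost x^T Q x + u^T R u:

```python
import numpy as np


def riccati_backward(A, B, Q, R, Qf, N):
    """Finite-horizon discrete-time LQR via the backward Riccati recursion.
    Returns gains K_0..K_{N-1} (u_k = -K_k x_k) and cost-to-go matrices P_0..P_N."""
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = Qf
    for k in range(N - 1, -1, -1):          # strictly sequential: step k needs P[k + 1]
        S = R + B.T @ P[k + 1] @ B
        K[k] = np.linalg.solve(S, B.T @ P[k + 1] @ A)
        P[k] = Q + A.T @ P[k + 1] @ (A - B @ K[k])
    return K, P
```

The optimal cost from an initial state x0 is x0^T P_0 x0; splitting the horizon into segments whose recursions can run concurrently is what removes this chain of dependencies.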


[40] 2606.24912

Velocity Prediction in Automatic Guitar Transcription

Automatic Music Transcription (AMT) models have achieved a high level of success in polyphonic transcription of various instruments. Velocity, typically a measure of note intensity, is less commonly predicted in these models due to the absence of velocity labels in available datasets and lack of a proper definition for instruments other than piano. We present a methodology and model for velocity prediction in Automatic Guitar Transcription (AGT) which uses virtual instruments to generate synthetic training data with velocity labels. We first pretrain a model on this synthetic data. These weights are then transferred to a different model and trained on real guitar audio, allowing the model to retain the working velocity prediction while also achieving high performance and generalisability from the real training data. The velocity prediction is shown to outperform a baseline model which does not use the pretrained velocity weights, when evaluated on synthetic data. In addition, using the pretrained velocity weights offers a small improvement in note transcription, though the magnitude of this improvement is limited and not always significant depending on the testing data. Overall the model achieves results comparable to the state of the art in guitar transcription, while also successfully predicting velocity.


[41] 2606.24947

Supervised Reinforcement Learning for the Coordination of Distributed Energy Resources

The increasing integration of distributed energy resources (DERs) is crucial for power system decarbonization, yet unlocking DERs' flexibility is challenged by their inherent uncertainties and modelling complexity. As traditional optimization methods struggle with such uncertainty and complexity of DERs, reinforcement learning (RL) has emerged as a promising alternative for DER management. However, standard RL methods suffer from sample inefficiency and sub-optimality when trained from scratch. Inspired by the training paradigms in large language models, this paper proposes a Supervised Reinforcement Learning (SRL) framework for learning DER coordination policies. This framework first pre-trains a policy on demonstration data in a supervised-learning fashion, which is then further fine-tuned using RL. Furthermore, we propose a two-step fine-tuning process: offline fine-tuning for enhancing policy performance and online fine-tuning for adapting it to the real-world dynamics. Experiments demonstrate that RL implementations based on the proposed framework significantly outperform all benchmarks, achieving high cost efficiency even under low-quality demonstration data.


[42] 2606.25098

Power-Flexible AI Data Centers: A New Paradigm for Grid-Responsive Compute

The rapid expansion of artificial intelligence (AI) infrastructure is driving unprecedented growth in electricity demand from data centers. Traditional power-system planning treats large computing facilities as inflexible peak loads, leading to costly infrastructure upgrades and long delays in grid interconnection. Recent work has shown that AI clusters can reduce electricity consumption during peak demand through software-based workload orchestration. This article explores how modern GPU-based AI data centers can operate as grid-interactive assets that respond dynamically to power system conditions. We describe an architecture integrating grid signals, workload scheduling, and power telemetry for fine-grained cluster power control. Experimental results from a real-world deployment on a 130 kW GPU cluster demonstrate multiple forms of flexibility, including rapid load reduction, sustained curtailment, and carbon-aware operation while preserving service levels for priority jobs. We further demonstrate performance-aware load shifting across geographically distributed clusters, enabling workloads to migrate toward regions with lower grid stress. Together, these capabilities transform AI infrastructure from static electricity consumers into flexible resources that support grid reliability, accelerate interconnection, and improve computing sustainability.


[43] 2606.25101

Wideband Near-Field Channel Estimation Under Hybrid Compression: Cross-Subcarrier KL Covariance Fitting With OFDM Fresnel Model

We consider wideband channel estimation for extremely large-scale multiple-input multiple-output (XL-MIMO) arrays under hybrid analog-digital compression, in which a uniform linear array (ULA) is observed through far fewer radio-frequency (RF) chains than antennas. At a carrier frequency of 28 GHz with bandwidths reaching several hundred MHz, the standard narrowband polar-domain channel model fails: the near-field Fresnel curvature becomes subcarrier-dependent, and the compressed observation destroys the per-subcarrier spatial covariance structure that narrowband methods exploit. We propose the Wideband Cross-subcarrier Kullback–Leibler (WB-CL-KL) estimator, which jointly estimates angle and range directly from the compressed sample covariance, without full-array reconstruction, by fitting a structured Fresnel covariance model across orthogonal frequency-division multiplexing (OFDM) subcarriers via a cross-subcarrier Kullback–Leibler (KL) divergence criterion. We also derive the wideband compressed-domain Cramér–Rao bound (CRB), the performance lower bound for this hybrid architecture, from the Slepian–Bangs formula, and decompose its gain over the narrowband bound into a data-diversity component of +27.093 dB and a geometric-diversity component of +0.701 dB, totalling +27.793 dB at B = 400 MHz (Propositions 1 and 2). In the single-path line-of-sight regime, WB-CL-KL attains a range root-mean-square error of 19.8 mm against a 19.9 mm bound at a signal-to-noise ratio (SNR) of 10 dB, a ratio of 0.996. Under the 3GPP Urban Micro (UMi) path-loss and shadow-fading SNR distribution, it achieves a bound ratio of 0.959 at the median deployment SNR of 9.6 dB, indicating near-CRB operation at the representative deployment point, where the compressed-domain bound is evaluated at the scene-median geometry.
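Covariance fitting with a KL criterion compares the sample covariance on each subcarrier with a parametric model and sums the divergences across subcarriers. A simplified sketch, assuming a single line-of-sight path with known power and noise variance, a Fresnel-approximated ULA steering vector, a hypothetical random combining matrix `W`, and a coarse grid search in place of the paper's WB-CL-KL optimizer:

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0


def fresnel_steering(n_ant, d, theta, r, f):
    """Near-field ULA steering vector under the Fresnel approximation (element index centered on 0)."""
    delta = np.arange(n_ant) - (n_ant - 1) / 2
    path_diff = -delta * d * np.sin(theta) + (delta * d) ** 2 * np.cos(theta) ** 2 / (2 * r)
    return np.exp(-2j * np.pi * f * path_diff / SPEED_OF_LIGHT)


def model_cov(W, a, power, noise_var):
    """Compressed covariance W^H (power * a a^H + noise_var * I) W seen by the RF chains."""
    Wa = W.conj().T @ a
    return power * np.outer(Wa, Wa.conj()) + noise_var * (W.conj().T @ W)


def kl_gauss(R_hat, R_model):
    """KL divergence D(CN(0, R_hat) || CN(0, R_model)) = tr(R^-1 R_hat) - log det(R^-1 R_hat) - M."""
    M = np.linalg.solve(R_model, R_hat)
    _, logabsdet = np.linalg.slogdet(M)
    return np.real(np.trace(M)) - logabsdet - R_hat.shape[0]


def grid_fit(R_hats, W, freqs, n_ant, d, thetas, ranges, power, noise_var):
    """Grid search over (theta, r) minimizing the KL criterion summed over subcarriers."""
    best, best_cost = None, np.inf
    for it, th in enumerate(thetas):
        for ir, r in enumerate(ranges):
            cost = sum(kl_gauss(Rh, model_cov(W, fresnel_steering(n_ant, d, th, r, f), power, noise_var))
                       for Rh, f in zip(R_hats, freqs))
            if cost < best_cost:
                best, best_cost = (it, ir), cost
    return best, best_cost
```

Because the steering vector is evaluated at each subcarrier frequency, the range-dependent curvature term changes across the band, so summing the per-subcarrier divergences pools that frequency diversity into a single criterion.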


[44] 2606.25112

A Framework for Directed Hypergraph Signal Processing via tensor t-SVD

We introduce Directed Hypergraph Signal Processing (DHGSP), a unified framework that extends graph signal processing to accommodate both higher-order (polyadic) and asymmetric (directional) relationships simultaneously. Using the tensor singular value decomposition (t-SVD) within the t-product algebra, we define a novel adjacency tensor for directed hypergraphs, a topologically faithful shift operator, and a lossless Directed Hypergraph Fourier Transform (t-DHGFT). Experiments on real traffic networks demonstrate that DHGSP outperforms matrix-based (graph and digraph) and undirected tensor-based (hypergraph) baselines in denoising tasks.
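Constructions based on the t-SVD rest on the t-product, which becomes facewise matrix multiplication after an FFT along the third mode. A minimal sketch of that primitive for real tensors; the paper's adjacency tensor, shift operator, and t-DHGFT are not reproduced:

```python
import numpy as np


def t_product(A, B):
    """t-product of real third-order tensors A (n1 x n2 x n3) and B (n2 x l x n3):
    FFT along the third mode, multiply matching frontal slices, inverse FFT."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)   # one matrix product per Fourier slice
    return np.real(np.fft.ifft(C_hat, axis=2))
```

This equals multiplying the block-circulant matrix built from A's frontal slices by the stacked slices of B. The t-SVD is obtained the same way, by taking a matrix SVD of each Fourier-domain slice.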


[45] 2606.25174

An iterative energy-based multimodal transformer for joint retrieval of wheat soil moisture, leaf area index, and plant height from Sentinel-1 and Sentinel-2 time series

Field-scale retrieval of surface soil moisture (SM), leaf area index (LAI), and plant height (PH) is essential for precision agriculture, yet it remains an ill-posed inverse problem. Concurrent variations in soil moisture and canopy density generate substantial ambiguities in radar backscatter and spectral responses, which reduce the effectiveness of traditional feedforward regression models in heterogeneous smallholder cropping systems. This study presents the Iterative Energy-Based Transformer (iEBT) for the joint retrieval of coupled soil-canopy states from Sentinel-1 C-band SAR and Sentinel-2 multispectral time series. Instead of direct regression, iEBT embeds multimodal predictors within a shared sequence, produces an initial state estimate, and iteratively updates the target [SM, LAI, PH] vector through normalized gradient descent to minimize a learned scalar compatibility energy function. Using 700 quality-controlled field measurements from Varanasi, India, iEBT achieved the highest learned-model performance on the random test split, with a four-seed mean R^2 of 0.854 ± 0.012 (R_SM^2 = 0.841, R_LAI^2 = 0.905, R_PH^2 = 0.821). WCM and PROSAIL were retained as physically interpretable SAR and optical reference models for comparison. Modality ablations confirmed that Sentinel-1 drives SM retrieval and Sentinel-2 dominates LAI retrieval, whereas PH relies on combined structural-phenological signatures. Crucially, the model's terminal energy functions as an uncalibrated post-retrieval quality diagnostic: screening out the 10% highest-energy samples markedly reduced target-level root-mean-square errors. While leave-one-campaign-out validation highlights persistent cross-season domain-shift challenges due to localized management variations, compatibility-guided multimodal fusion offers a structured self-diagnostic path toward reliable biophysical parameter estimation.
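The iterative refinement can be pictured as normalized gradient descent on a scalar energy over the [SM, LAI, PH] vector. A sketch assuming a hypothetical quadratic energy in place of iEBT's learned one, and an assumed geometrically decaying step size; normalizing the gradient makes each step length independent of the energy's scale:

```python
import numpy as np


def refine_state(y0, energy_grad, step0=0.5, decay=0.99, n_steps=1000):
    """Refine a state estimate by normalized gradient descent on a scalar energy:
    y <- y - step_t * grad / ||grad||, with step_t = step0 * decay**t."""
    y = np.asarray(y0, dtype=float).copy()
    step = step0
    for _ in range(n_steps):
        g = energy_grad(y)
        norm = np.linalg.norm(g)
        if norm < 1e-12:          # already at a stationary point of the energy
            break
        y -= step * g / norm
        step *= decay
    return y
```

In iEBT the gradient would come from differentiating the learned compatibility energy with respect to the state vector; the energy value left after the final step is what the abstract uses as a quality diagnostic.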


[46] 2606.25281

GaN Power Devices and Converter Architectures for AI Data Centers: Efficiency, Reliability, and Deployment Pathways

The growth of artificial-intelligence workloads is increasing the electrical and thermal demands on data-center power-delivery systems, making conversion efficiency, power density, and reliability critical design priorities. This review examines how gallium-nitride (GaN) power devices can be matched to specific stages of the grid-to-load conversion chain, including power-factor correction, isolated DC/DC conversion, 48-V intermediate-bus conversion, and point-of-load regulation. Si, SiC, and GaN are compared using converter-relevant metrics, and lateral, vertical, and specialized GaN architectures are evaluated in terms of voltage scalability, switching behavior, reverse conduction, thermal pathways, gate control, and technology maturity. The analysis shows that GaN provides a stage-dependent rather than universal advantage. Commercial lateral GaN HEMTs are particularly effective in high-frequency, low-to-mid-voltage stages, while specialized and hybrid devices support bidirectional operation, normally-off control, extreme conversion ratios, and integration. Vertical GaN remains an emerging option for higher-voltage and higher-power conversion. A quantitative framework links cascaded converter efficiency to electrical-loss reduction, cooling demand, annual facility energy use, and operational carbon emissions. Broad deployment further requires low-parasitic packaging, disciplined gate-drive and EMI co-design, mission-profile reliability qualification, scalable manufacturing, and supply-chain resilience. GaN is therefore best treated as a stage-specific system lever whose value depends on coordinated device, topology, package, and thermal co-design.
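The link from cascaded efficiency to facility impact is simple arithmetic: stage efficiencies multiply, and conversion losses scale as P_load * (1/eta - 1). A sketch assuming a constant IT load; the cooling ratio and grid carbon intensity are hypothetical placeholders, not values from the review:

```python
def cascaded_efficiency(stage_effs):
    """End-to-end efficiency of converter stages in series: product of the stage efficiencies."""
    eta = 1.0
    for e in stage_effs:
        eta *= e
    return eta


def facility_impact(p_load_kw, stage_effs, cooling_kw_per_kw_loss=0.3,
                    kg_co2_per_kwh=0.4, hours=8760.0):
    """Conversion loss, cooling overhead, annual energy, and emissions for a constant IT load.
    cooling_kw_per_kw_loss and kg_co2_per_kwh are hypothetical placeholder values."""
    eta = cascaded_efficiency(stage_effs)
    loss_kw = p_load_kw * (1.0 / eta - 1.0)          # P_in - P_load
    cooling_kw = cooling_kw_per_kw_loss * loss_kw    # extra power to remove the loss heat
    annual_kwh = (p_load_kw + loss_kw + cooling_kw) * hours
    return {"efficiency": eta, "loss_kw": loss_kw, "cooling_kw": cooling_kw,
            "annual_mwh": annual_kwh / 1000.0,
            "annual_t_co2": annual_kwh * kg_co2_per_kwh / 1000.0}
```

Comparing two stage-efficiency lists (for example, a baseline chain and one with a GaN-based stage) gives the loss, energy, and emission deltas directly; because the stages multiply, a gain in any single stage raises the end-to-end efficiency.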


[47] 2606.25333

Deterministic Non-Smooth Safety via Dual-Algebraic Control Barrier Functions

This paper presents a dual-algebraic framework for control barrier functions (CBFs) that guarantees deterministic execution using exclusively elementary arithmetic. We develop this deterministic approach to solve a fundamental bottleneck in safety-critical control: pointwise minima naturally compose intersecting safe sets, but generate non-smooth boundaries where standard Lie derivatives fail. Existing mathematical workarounds inject approximation bias, probabilistic non-determinism, or combinatorial execution delays that strictly impede hard real-time hardware certification. By embedding the system state and vector field into the dual-number ring, our method extracts both the composite barrier value and its exact directional derivative in a single evaluation. The standard floating-point minimum deterministically isolates a single vertex of the Clarke generalized gradient for the quadratic-program solver. We prove this selected vertex constitutes a valid Clarke subgradient and the resulting simultaneous-enforcement safety filter guarantees forward invariance. The arithmetic overhead remains a fixed constant factor, strictly independent of state dimension or constraint count. We extend this framework to arbitrary $\min$/$\max$ Boolean compositions and systems of higher relative degree, validating the computational scaling on three physical examples.
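The core idea of carrying a barrier value and its directional derivative through a min-composition in one pass can be sketched with a tiny dual-number class. This is an illustrative toy, not the paper's implementation: the two barrier functions are hypothetical, and `dmin` simply keeps the branch with the smaller value (ties go to the first argument), which mirrors the deterministic selection of a single active branch only in a simplified way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Dual number re + eps * e with e**2 = 0; eps carries a directional derivative."""
    re: float
    eps: float

    @staticmethod
    def lift(v):
        return v if isinstance(v, Dual) else Dual(float(v), 0.0)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.re + o.re, self.eps + o.eps)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.re - o.re, self.eps - o.eps)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.re * o.re, self.re * o.eps + self.eps * o.re)

    __rmul__ = __mul__


def dmin(a, b):
    """Min on dual numbers: keep the branch with the smaller value (ties -> first)."""
    return a if a.re <= b.re else b


def barrier_and_lie_derivative(x, f):
    """Evaluate h(x) = min(h1, h2) and its derivative along f(x) in a single pass."""
    # Seed each coordinate with x_i + eps * f_i so eps-parts give the derivative along f.
    xd = [Dual(float(xi), float(fi)) for xi, fi in zip(x, f)]
    h1 = 1.0 - xd[0] * xd[0] - xd[1] * xd[1]  # hypothetical unit-disk constraint
    h2 = xd[0] + 0.5                           # hypothetical half-plane x0 >= -0.5
    h = dmin(h1, h2)
    return h.re, h.eps


value, lie_derivative = barrier_and_lie_derivative([0.2, 0.3], [1.0, -2.0])
```

Each arithmetic operation on a dual number costs a small constant multiple of the plain operation, so the overhead does not grow with the number of composed constraints.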


[48] 2606.25351

Inner and Outer Bounds on the Secrecy Capacity of Degraded Broadcast Channels with RMSI and Transmitter CSI

This paper studies the secrecy capacity of a class of degraded broadcast channels in the presence of an external eavesdropper, where a transmitter aims to deliver two independent confidential messages to two legitimate receivers. The transmitter is assumed to have non-causal access to the channel state information (CSI), and each legitimate receiver possesses prior knowledge of the other receiver's message, referred to as receiver message side information (RMSI). We consider two distinct scenarios: complementary RMSI, where each receiver knows only the other's message, and non-complementary RMSI, where the side information does not perfectly align. For both scenarios, we derive novel inner bounds on the achievable secrecy rate region and present tight outer bounds, establishing the secrecy capacity region for the considered degraded channel settings. Unlike prior works, which primarily address general broadcast settings without secrecy constraints or omit key interactions between RMSI and CSI, our results provide a complete characterization of the secure communication limits under these conditions. Moreover, we extend our analysis to the Gaussian degraded broadcast channel, highlighting the pivotal role of CSI in enhancing secure transmission performance. Our findings demonstrate that the combination of RMSI and CSI can be strategically leveraged to expand the secrecy capacity region, thus offering new insights into secure multiuser communication system design.


[49] 2606.25420

OAMP-Aided Joint Channel Estimation and Data Detection for ODDM Systems

In this work, to address the challenge of joint channel estimation and data detection (JED) for orthogonal delay-Doppler (DD) division multiplexing (ODDM) in doubly selective channels, we propose an orthogonal approximate message passing (OAMP)-aided JED (OAMP-JED) receiver. We first formulate a bilinear cross-domain JED model, which can be linearized into separate channel estimation and data detection subproblems. The proposed OAMP-JED receiver alternately executes two OAMP modules for these subproblems, effectively coupled through a variational noise term to account for model uncertainty. Leveraging OAMP's error orthogonality, we derive closed-form scalar-variance updates to enable efficient and principled soft information exchange between the modules, thereby mitigating error propagation during JED. Simulation results show that, for both uncoded and coded ODDM, OAMP-JED achieves a lower bit error rate (BER) than benchmark schemes. Moreover, its BER performance closely approaches that of OAMP with perfect CSI.


[50] 2606.25454

Delta-Position Estimation-Based IMU Odometry: A Comparison of MLP and Kolmogorov-Arnold Networks

In this study, the learning-based inertial odometry problem is investigated using raw IMU measurements from the EuRoC MAV benchmark dataset. Instead of regressing absolute position (a formulation that may lead to large constant errors), the models are trained to estimate the incremental displacement (Δp) over a fixed 50 ms sliding window, and the full trajectory is reconstructed through numerical integration. A standard Multi-Layer Perceptron (MLP) is compared with a Kolmogorov-Arnold Network (KAN) equipped with learnable B-spline activations. Although the KAN has 6.9 times fewer parameters than the MLP (8,444 versus 57,859), it achieves 44% lower final cumulative drift on the test trajectory (9.61 m versus 17.23 m). In addition, the KAN exhibits more stable long-term error accumulation, with lower P_50 and P_90 cumulative drift values. These findings indicate that learnable B-spline-based activations have the potential to reduce error accumulation in inertial odometry.
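Reconstructing a trajectory from per-window displacement estimates amounts to a cumulative sum, which is also why small per-window errors accumulate into drift. The sketch below assumes non-overlapping windows whose displacements are already expressed in a common world frame; the arrays are synthetic, and the paper's exact integration scheme may differ.

```python
import numpy as np


def integrate_displacements(p0, delta_p):
    """Reconstruct positions from an initial position and per-window displacements."""
    p0 = np.asarray(p0, dtype=float)
    delta_p = np.asarray(delta_p, dtype=float)                # shape (T, 3)
    return np.vstack([p0, p0 + np.cumsum(delta_p, axis=0)])  # shape (T + 1, 3)


def final_drift(estimated, reference):
    """Euclidean distance between the final estimated and reference positions."""
    return float(np.linalg.norm(np.asarray(estimated)[-1] - np.asarray(reference)[-1]))


# Synthetic example: a stationary reference and an estimate with a 1 cm bias per window.
true_dp = np.zeros((200, 3))
est_dp = true_dp + np.array([0.01, 0.0, 0.0])
ref = integrate_displacements([0.0, 0.0, 0.0], true_dp)
est = integrate_displacements([0.0, 0.0, 0.0], est_dp)
```

A constant bias of 1 cm per window grows into 2 m of drift after 200 windows (10 s at 50 ms per window), so per-window accuracy dominates long-horizon drift.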


[51] 2606.25620

1000 Rallies: An Event-Camera Dataset and Real-Time Learned Ball-State Estimation for Robotic Table Tennis

Robotic table tennis has emerged as a compelling benchmark for real-time robotic perception due to its fast ball dynamics and stringent timing requirements. Accurate, high-frequency, and low-latency ball state estimation is critical for reliable trajectory prediction and timely control. Traditional frame-based cameras face an inherent trade-off: low frame rates leave temporal blind spots that miss fast-moving objects, while high frame rates raise data and computational cost. Event cameras instead offer microsecond temporal resolution and, under sufficient illumination, remain largely free of motion blur even at high ball speeds. However, the community lacks large-scale datasets to develop and benchmark event-based perception in realistic sports scenarios. We address this gap by introducing the first large-scale event-camera dataset for table tennis, comprising over 1000 rallies from a diverse group of players ranging from amateurs to elite-level athletes. Each recording captures the event stream alongside 14 synchronized high-speed frame-based cameras at 200 FPS, which we use to produce 1 kHz pseudo ground-truth labels for ball position, velocity, and spin. Building on this dataset, we train a convolutional neural network, robust to background player motion, that jointly estimates the ball's position and velocity in the image plane from events. Treating the predicted velocity as an additional measurement in the Kalman filter reduces bounce-point prediction error by 36% relative to a position-only baseline. Finally, we close the perception-action loop by integrating the event-based system with a Stäubli robotic arm, enabling the first real-time human-robot table tennis rallies driven by event-based perception.
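One way to see why a velocity measurement helps is to compare a single Kalman update with and without it. The sketch below assumes a 2-D constant-velocity model in the image plane and hypothetical noise levels; it is not the paper's filter.

```python
import numpy as np


def cv_model(dt, q=1.0):
    """2-D constant-velocity model; the state is [px, py, vx, vy] in the image plane."""
    I2 = np.eye(2)
    Z2 = np.zeros((2, 2))
    F = np.block([[I2, dt * I2], [Z2, I2]])
    # Discrete white-noise-acceleration process noise with intensity q.
    G = np.vstack([0.5 * dt**2 * I2, dt * I2])
    Q = q * (G @ G.T)
    return F, Q


def kf_step(x, P, z, F, Q, H, R):
    """One predict/update cycle of a linear Kalman filter."""
    x = F @ x
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


# Measurement models: position only, or position plus a network-predicted velocity.
H_POS = np.hstack([np.eye(2), np.zeros((2, 2))])
H_POS_VEL = np.eye(4)
R_POS = 4.0 * np.eye(2)                      # hypothetical position noise, px^2
R_POS_VEL = np.diag([4.0, 4.0, 25.0, 25.0])  # hypothetical velocity noise, (px/s)^2

F, Q = cv_model(dt=1e-3, q=100.0)
x0, P0 = np.zeros(4), 10.0 * np.eye(4)
_, P_pos = kf_step(x0, P0, np.zeros(2), F, Q, H_POS, R_POS)
_, P_pos_vel = kf_step(x0, P0, np.zeros(4), F, Q, H_POS_VEL, R_POS_VEL)
```

In information form, the extra measurement adds a positive-definite term to the velocity information, so the posterior velocity covariance can only shrink; that tighter velocity estimate is what feeds a better trajectory extrapolation.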


[52] 2606.25680

Power-Budgeted Underwater Vehicle Control via Constrained Reinforcement Learning

Underwater vehicles operate from a fixed onboard energy budget that propulsion rapidly depletes, so a controller that completes its task while drawing less thruster power directly extends mission range and endurance. Reinforcement learning yields capable model-free controllers for station-keeping and trajectory tracking, but optimizing task accuracy alone drives the policy toward oscillatory, energy-wasting actuation. The established remedy subtracts an energy penalty from the reward, yet this sets the task-power trade-off through a single weight with no physical units: a target power level cannot be specified, the weight must be re-tuned for every vehicle and task, and a mismatched weight can even raise power. This paper instead formulates energy-efficient underwater control as a constrained Markov decision process in which average thruster power is subject to an explicit budget, solved with a PPO-Lagrangian algorithm. The power level is set by declaring a budget in physical units, and a single dual variable is updated online to meet it for each vehicle and task, without manual weight search. Across three vehicles and four tasks in the MarineGym simulator, the energy-constrained policy draws the least power in all twelve settings, reducing it by 14-65% (up to 64.9%) relative to a task-only baseline and drawing less power than an energy-reward baseline in every setting, while remaining the smoothest in ten settings and preserving task accuracy except in one deliberately power-limited regime. Imposing energy as an explicit constraint thus offers a tuning-free route to energy-efficient underwater control that needs no per-vehicle, per-task weight search.
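The budget mechanism can be illustrated with the projected dual-ascent step commonly used in PPO-Lagrangian methods: the multiplier grows while measured average power exceeds the budget and shrinks, without going negative, otherwise. The learning rate and the toy power response below are hypothetical, and the paper's exact update schedule may differ.

```python
def update_multiplier(lam, avg_power_w, budget_w, lr=0.01):
    """Projected dual ascent: raise lam when over budget, lower it (not below 0) when under."""
    return max(0.0, lam + lr * (avg_power_w - budget_w))


def penalized_reward(task_reward, power_w, lam):
    """Per-step reward seen by the policy update for the current multiplier."""
    return task_reward - lam * power_w


# Toy loop: a hypothetical policy whose average power falls as lam grows.
lam = 0.0
for _ in range(200):
    avg_power = 150.0 / (1.0 + lam)  # hypothetical response, watts
    lam = update_multiplier(lam, avg_power, budget_w=100.0, lr=0.005)
```

In this toy, the multiplier settles where the power response meets the 100 W budget, illustrating how a budget in physical units replaces a hand-tuned penalty weight.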


[53] 2606.25941

Explainable Control Framework (XCF) based on Fuzzy Model-Agnostic Explanation and LLM Agent-Supported Interface

Increasing demand for precise and reliable control in complex scenarios has led to increasingly sophisticated controllers, including data-driven approaches that employ closed-box models and mathematically rigorous yet complex designs. This complexity highlights the need for explainable control that can provide human-understandable insights into controller behavior. In this paper, an explainable control framework (XCF), along with supporting algorithms and a user interface, is proposed to explain how controllers determine their control actions and what their underlying working mechanisms are. The novel contributions of this work are threefold. First, the XCF is designed to provide model-agnostic explanations for controllers in closed-loop systems and can optionally refine local explanations using system response dynamics. Second, a novel explanation method, hierarchical fuzzy model-agnostic explanation for control systems (HFMAE-C), is proposed based on the designed framework. HFMAE-C employs a fuzzy logic system to approximate the controller's behavior and the system dynamics, providing sample-, local-, domain-, and universe-level explanations via IF-THEN rules that reveal the controller's decision logic, together with salience values that quantify the contribution of system states to control actions. Third, a large language model agent-supported user interface is developed to automatically analyze user requirements, select appropriate algorithms, interpret the generated explanations in a natural-language report, and provide interactive consultation. Case studies on an inverted pendulum system and Turtlebot obstacle avoidance demonstrate the effectiveness of the proposed method through simulated user experiments and quantitative comparisons with mainstream explainable control approaches.


[54] 2606.25956

Pulmonary Embolism Risk Stratification from CTPA and Medical Records: Vascular Graphs Are Not All You Need

Risk stratification for pulmonary embolism (PE) is critical for clinical decision-making. Stratification guidelines are based on patient medical records, parameters measured from computed tomography pulmonary angiography (CTPA), and blood tests. However, blood tests are often missing in routine practice. This work studies whether state-of-the-art models can accurately perform risk stratification from medical records and CTPA-derived biomarkers alone. We benchmark different approaches for combining medical records and cardiac biomarkers with rich pulmonary vascular information: we add vascular biomarkers to tabular models and apply graph neural networks (GNNs) to the intrinsic graph representation of the vascular tree. We use a private dataset (n=353) with uniquely complete data for PE risk stratification. Our results show that, among global features, medical records and cardiac biomarkers are the most significant predictors, while vascular biomarkers do not further improve stratification. More surprisingly, even GNNs on vascular graphs fail to outperform a strong tabular baseline on global features. We consider hypotheses, concerning both the models and the data, that could explain this suboptimal performance. Our investigation suggests that, counter-intuitively, vascular graphs might hold no discriminative information for PE risk stratification. Code is available from this https URL.


[55] 2606.26048

Deep Reinforcement Learning-Enhanced Event-Triggered Data-Driven Predictive Control for a 3D Cable-Driven Soft Robotic Arm

Soft robots are challenging to control due to their nonlinear and time-varying dynamics. Data-enabled predictive control (DeePC) offers a model-free alternative by directly leveraging measured input-output trajectories to construct a predictive controller. However, its receding-horizon formulation requires solving a constrained optimization problem at every sampling instant, which can be computationally demanding for real-time deployment on resource-limited robotic platforms. To address this limitation, we propose an adaptive reinforcement-learning-based event-triggered DeePC (RL-ET-DeePC) framework for soft robotic control. A model-free RL policy is trained to decide when to invoke the DeePC optimizer based on the current system state representation, thereby reducing unnecessary optimization calls while preserving closed-loop performance. Simulation results show that RL-ET-DeePC reduces optimization frequency by up to 66% compared to periodic DeePC while maintaining comparable tracking accuracy. Hardware experiments on a three-dimensional cable-driven soft robotic arm demonstrate zero-shot transfer, achieving a 34% reduction in optimization frequency with tracking accuracy comparable to periodic DeePC and more consistent performance than a static threshold-based event-triggered baseline.
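DeePC builds its predictor from block-Hankel matrices of recorded input-output data. A minimal helper for that construction is sketched below; the split into past and future row blocks, the regularized optimization, and the RL trigger policy are omitted, and the function name `block_hankel` is hypothetical.

```python
import numpy as np


def block_hankel(w, L):
    """Block-Hankel matrix with L block rows built from a signal w of shape (T, m) or (T,)."""
    w = np.asarray(w, dtype=float)
    if w.ndim == 1:
        w = w[:, None]          # treat a scalar signal as m = 1
    T, m = w.shape
    cols = T - L + 1
    if cols < 1:
        raise ValueError("signal too short for the requested depth L")
    H = np.empty((L * m, cols))
    for j in range(cols):
        H[:, j] = w[j:j + L].reshape(-1)  # stack samples j..j+L-1 into one column
    return H


# Scalar and two-channel examples.
H_scalar = block_hankel(np.arange(6), 3)
H_multi = block_hankel(np.arange(10).reshape(5, 2), 2)
```

By Willems' fundamental lemma, the input Hankel matrix of depth L + n (with n the state dimension) should have full row rank for the recorded data to span all trajectories of length L, which can be checked with `np.linalg.matrix_rank`.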


[56] 2606.26083

Real-Time Voice AI Hears but Does Not Listen

Speech conveys information through both words and vocal delivery. We evaluate four leading production realtime voice systems (OpenAI's GPT Realtime 2, Google's Gemini 3.1 Flash Live, and Alibaba's Qwen3.5 Omni Plus and Omni Flash) on tasks where both the words and the delivery patterns convey meaningful information. Across three consequential scenarios, all four systems act on the words rather than the voice. They end calls with crying callers who insist nothing is wrong, approve wire transfers authorized in frightened voices, and enroll callers whose agreement is clearly sarcastic. Surprisingly, this is often not a failure of perception. When asked directly, three of the four systems reliably identify the distress, fear, or sarcasm they later ignore when making decisions. We observe a similar pattern when these realtime voice systems estimate accent and age, as their responses frequently follow the biases of the words rather than the acoustic properties of the speaker. We term this disconnect between perception and action the emotional intelligence gap of voice AI. Prompting systems to explicitly attend to vocal delivery improves performance only partially and inconsistently. Our findings show that current realtime voice AI systems often behave as if speech had been reduced to a transcript, suggesting that they should be used with caution in settings where the tone and emotion of delivery convey important information.


[57] 2305.00384

Dynamic and Robust Sensor Selection Strategies for Wireless Positioning with TOA/RSS Measurement

Emerging wireless applications demand increasingly accurate positioning from sensor measurements. In this paper, we develop sensor selection strategies for 3D wireless positioning based on time of arrival (TOA) and received signal strength (RSS) measurements to handle two distinct scenarios: (i) a known approximate target location, for which we perform dynamic sensor selection to minimize the positioning error; and (ii) an unknown approximate target location, for which the worst-case positioning error is minimized via robust sensor selection. We derive expressions for the Cramér-Rao lower bound (CRLB) as a performance metric to quantify the positioning accuracy achieved by the selected sensors. For dynamic sensor selection, two greedy selection strategies are proposed, each exploiting properties revealed by the derived CRLB expressions. These selection strategies are shown to strike an efficient balance between computational complexity and performance suboptimality. For robust sensor selection, we show that the conventional convex relaxation approach leads to instability, and then develop three algorithms based on (i) iterative convex optimization (ICO), (ii) difference of convex functions programming (DCP), and (iii) discrete monotonic optimization (DMO). Each of these strategies exhibits a different tradeoff between computational complexity and optimality guarantee. Simulation results show that the proposed sensor selection strategies provide significant improvements in accuracy and/or complexity compared to existing sensor selection methods.
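To make CRLB-driven selection concrete, the sketch below computes the TOA-only Fisher information for a 3-D position, J = (1/σ²) Σ u_i u_iᵀ with unit line-of-sight vectors u_i, and greedily adds the sensor that most reduces tr(J⁻¹). It assumes i.i.d. Gaussian range errors, ignores RSS, and adds a small ridge `reg` so the bound exists before the chosen directions span all three dimensions; it is a generic greedy baseline, not either of the paper's two strategies.

```python
import numpy as np


def toa_fim(target, sensors, sigma=1.0):
    """Fisher information for 3-D position from TOA ranges with i.i.d. Gaussian noise."""
    diffs = target[None, :] - sensors                          # (K, 3)
    u = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)   # unit line-of-sight vectors
    return (u.T @ u) / sigma**2                                # sum of outer products u_i u_i^T


def crlb_trace(target, sensors, sigma=1.0, reg=1e-6):
    """Trace of the (ridge-regularized) CRLB, a lower bound on position MSE."""
    J = toa_fim(target, sensors, sigma) + reg * np.eye(3)
    return float(np.trace(np.linalg.inv(J)))


def greedy_select(target, candidates, k, sigma=1.0):
    """Greedily add the candidate sensor that most reduces the CRLB trace until k are chosen."""
    chosen = []
    remaining = list(range(len(candidates)))
    for _ in range(k):
        best = min(remaining,
                   key=lambda i: crlb_trace(target, candidates[chosen + [i]], sigma))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical layout: three sensors on the x-axis, one on y, one on z.
cands = np.array([[10.0, 0, 0], [11.0, 0, 0], [0, 10.0, 0], [0, 0, 10.0], [-10.0, 0, 0]])
sel = greedy_select(np.zeros(3), cands, 3)
```

In this layout the greedy rule avoids collinear sensors and picks one per axis, since geometric diversity of the line-of-sight vectors is what makes the Fisher information well conditioned.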


[58] 2312.11061

Linear Lyapunov Functions for Nonlinear Compartmental Systems

This technical note examines exponential stability of the null solution to a large class of compartmental systems governed by ordinary differential equations. Sufficient conditions under which these systems admit a linear Lyapunov function are provided. The coefficients of the Lyapunov functions and the exponential decay rate they yield are obtained from an eigenvalue problem. For a special case of the system class considered, we derive an equivalence between attractivity of the null solution and the existence of a linear Lyapunov function.
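In the linear special case ẋ = Ax with A Metzler and irreducible, a linear Lyapunov function can be read off an eigenvalue problem: the Perron-Frobenius (rightmost, real) eigenvalue λ of A has a positive left eigenvector c, so V(x) = cᵀx is positive on the nonnegative orthant and satisfies V̇ = λV. The sketch below computes c and the decay rate -λ with NumPy; it illustrates the idea only and does not cover the nonlinear systems treated in the note.

```python
import numpy as np


def linear_lyapunov(A):
    """Return (c, rate) with c > 0 and c^T A = -rate * c^T for a Metzler, irreducible A.

    V(x) = c @ x is then positive on the nonnegative orthant (except at 0)
    and satisfies dV/dt = -rate * V along x' = A x.
    """
    A = np.asarray(A, dtype=float)
    eigvals, vecs = np.linalg.eig(A.T)   # eigenvectors of A^T are left eigenvectors of A
    idx = int(np.argmax(eigvals.real))   # Perron-Frobenius eigenvalue (real for Metzler A)
    lam = float(eigvals[idx].real)
    c = np.real(vecs[:, idx])
    c = c / c.sum()                      # fix sign and scale so that c > 0
    return c, -lam


# Two-compartment example with nonnegative transfer rates and net outflow.
A = np.array([[-2.0, 1.0],
              [1.0, -3.0]])
c, rate = linear_lyapunov(A)
```

A positive `rate` certifies exponential stability of the null solution with that decay rate.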


[59] 2410.00208

A Data-Driven Approach To Preserve Safety and Reference Tracking for Constrained Cyber-Physical Systems Under Network Attacks

This paper proposes a worst-case data-driven control architecture capable of ensuring the safety of constrained Cyber-Physical Systems under cyber-attacks while minimizing, whenever possible, the degradation in tracking performance. To this end, a data-driven robust anomaly detector is designed to detect cyber-attack occurrences. Moreover, an add-on tracking supervisor module enables safe open-loop tracking control when measurements are unreliable. On the plant side, a safety verification module and a local emergency controller are designed to manage severe attack scenarios that cannot be handled on the controller side. These two modules rely on worst-case data-driven reachability and controllability arguments to detect potentially unsafe scenarios and to replace, only when strictly needed, the tracking controller with emergency actions whose objective is to steer the plant's state trajectory into a predefined admissible and safe robust control invariant region until an attack-free scenario is restored. The effectiveness of the proposed solution is demonstrated through a simulation example.


[60] 2501.15373

Safe Learning Control with Optimality and Stability Guarantees

Merely pursuing performance may adversely affect safety, while a conservative policy for safe exploration degrades performance. How to guarantee both safety and performance in learning-based control is an interesting yet challenging problem. This paper aims to enhance system performance with a safety guarantee by solving reinforcement learning (RL)-based optimal control problems for nonlinear systems subject to high-relative-degree state constraints and unknown time-varying disturbances/actuator faults. A new type of control barrier function (CBF), termed the high-order reciprocal-based control barrier function, is proposed to handle high-relative-degree constraints; it extends the design of CBFs to enforce robust safety without knowledge of the disturbance bound. The concept of gradient similarity is proposed to quantify the relationship between safety and performance. Finally, gradient manipulation and adaptive mechanisms are introduced into the model-based safe RL framework to enhance performance with a safety guarantee. Two simulation examples illustrate the efficacy of the proposed algorithms.


[61] 2507.07920

ArteryX: A Reliable End-to-End Toolbox for Standardized Intracranial Artery Feature Extraction from 3D TOF-MRA

Cerebrovascular research relies heavily on quantitative analysis of intracranial arteries from time-of-flight magnetic resonance angiography, yet existing processing pipelines remain limited by inconsistent artery labeling and a high manual-correction burden. We present ArteryX, a feature-extraction toolbox that standardizes artery classification across proximal and distal vascular territories. It integrates segmentation handling, isotropic processing, vessel-fused graph construction, and constrained landmark-based classification within a unified, reproducible workflow with artery-specific feature reporting. The toolbox extracts morphological, topological, and complexity features, including total length, mean radius, volume, surface area, branch count, tortuosity, and fractal dimension, for standardized artery segments. Testing and validation were performed on three complementary datasets: (1) TopBrain Challenge benchmarking with annotated arteries, (2) synthetic known-reference validation, and (3) an exploratory in-vivo cohort with cerebral small vessel disease (CSVD). In the TopBrain analyses, ArteryX with supervised nnU-Net segmentation showed minimal bias, while iCafe showed the highest bias and wide limits of agreement. ArteryX consistently demonstrated robust downstream quantification across segmentation sources (unsupervised and supervised). Agreement analyses showed minimal bias for radius and good sensitivity of extent-dependent metrics on noisier segmentations, compared with the state-of-the-art iCafe toolbox. Furthermore, a stage-wise human-in-the-loop protocol required less intervention time than iCafe. In the in-vivo cohort (48 CSVD+, 20 CSVD-), ArteryX-derived distal and territory-level features showed group-level differences that were not evident with iCafe. To facilitate adoption and reproducibility, ArteryX provides versioned builds, tutorials, and documentation.
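Tortuosity has several definitions in the vessel-analysis literature; one common choice, the distance metric, divides a centerline's arc length by its end-to-end chord length. The sketch below assumes an ordered polyline of centerline points and is not necessarily the definition ArteryX uses.

```python
import numpy as np


def path_length(points):
    """Arc length of an ordered polyline of 3-D centerline points."""
    pts = np.asarray(points, dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))


def tortuosity_distance_metric(points):
    """Arc length divided by end-to-end (chord) distance; 1.0 for a straight segment."""
    pts = np.asarray(points, dtype=float)
    chord = float(np.linalg.norm(pts[-1] - pts[0]))
    if chord == 0.0:
        raise ValueError("closed or degenerate segment: chord length is zero")
    return path_length(pts) / chord


straight = tortuosity_distance_metric([[0, 0, 0], [1, 0, 0], [2, 0, 0]])
bent = tortuosity_distance_metric([[0, 0, 0], [3, 0, 0], [3, 4, 0]])
```

The L-shaped example has an arc length of 7 and a chord of 5, giving a tortuosity of 1.4.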


[62] 2507.13637

Towards channel foundation models (CFMs): Motivations, methodologies and opportunities

Artificial intelligence (AI) has emerged as a pivotal enabler for next-generation wireless communication systems. However, conventional AI-based models encounter several limitations, such as heavy reliance on labeled data, limited generalization capability, and task-specific design. To address these challenges, this paper introduces, for the first time, the concept of channel foundation models (CFMs), a novel and unified framework designed to tackle a wide range of channel-related tasks through a pretrained, universal channel feature extractor. By leveraging advanced AI architectures and self-supervised learning techniques, CFMs can effectively exploit large-scale unlabeled data without the need for extensive manual annotation. We further analyze the evolution of AI methodologies, from supervised learning and multi-task learning to self-supervised learning, emphasizing the distinct advantages of the latter in facilitating the development of CFMs. Additionally, we provide a comprehensive review of existing studies on self-supervised learning in this domain, categorizing them into generative, discriminative, and combined paradigms. Given that research on CFMs is still at an early stage, we identify several promising future research directions, focusing on model architecture innovation and the construction of high-quality, diverse channel datasets.


[63] 2509.19092

Data-Free Knowledge Distillation for LiDAR-Aided Beam Tracking in MmWave Systems

We propose a data-free knowledge distillation (DF-KD) framework for LiDAR-aided mmWave beam tracking, where the objective is to predict the optimal current and future beams from a sequence of past LiDAR measurements. Specifically, we propose a knowledge inversion approach in which a generator synthesizes LiDAR-like sequences from random noise, using a metadata loss that aligns the teacher's internal feature statistics on synthetic data with those on real data, without access to raw LiDAR samples. The student model is then trained exclusively on the synthetic data using either the Kullback-Leibler (KL) divergence loss or a proposed mean squared error (MSE) loss between the teacher's and student's raw output logits. Simulation results on the DeepSense dataset demonstrate the effectiveness of the proposed approach. In particular, the proposed convolutional neural network-gated recurrent unit (CNN-GRU) teacher architecture yields better DF-KD student performance than GRU-only alternatives, and the MSE loss achieves performance comparable to the standard KD loss while requiring fewer hyperparameters.
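The two student objectives can be contrasted with a short NumPy sketch: a temperature-scaled KL divergence (Hinton-style, with the usual T² factor) and a plain MSE between raw logits. The temperature value is an illustrative assumption, and each beam index plays the role of a class.

```python
import numpy as np


def softmax(z, T=1.0):
    """Row-wise softmax of logits z at temperature T."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_kl_loss(teacher_logits, student_logits, T=4.0):
    """Temperature-scaled KL(teacher || student), averaged over the batch and scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl) * T**2)


def logit_mse_loss(teacher_logits, student_logits):
    """Mean squared error between raw teacher and student logits (no temperature)."""
    diff = np.asarray(teacher_logits, dtype=float) - np.asarray(student_logits, dtype=float)
    return float(np.mean(diff**2))


t = np.array([[1.0, 2.0, 3.0]])  # hypothetical teacher logits over three beams
```

The MSE form has no temperature to tune and also penalizes a constant offset between teacher and student logits, which the softmax-based KL loss ignores.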


[64] 2510.12315

Systematic Constructions of Complementary Sets and Hadamard Matrices from Circulant Operator

A Hadamard matrix $H$ of order $n$ is a square matrix with entries $\pm 1$ satisfying $HH^T = nI_n$, where $I_n$ is the identity matrix of order $n$. A circulant Hadamard matrix is a Hadamard matrix whose rows are cyclic shifts of one another. This work establishes a unified algebraic framework, built on cyclic operators, that treats arbitrary Hadamard matrices as flexible seeds to systematically generate Golay complementary sets (GCS), cross Z-complementary sets (CZCS), complete complementary codes (CCC), and optimal cross-Z complementary sequence sets (CZCSS) through algebraic transformations. First, circulant Hadamard matrices of order 4 are applied recursively to construct binary GCS and binary CZCS of arbitrary lengths, the latter achieving a maximum ZCZ ratio of 2/3. This framework is then generalized to show that, by employing binary or complex Hadamard matrices of any order, binary or non-binary CZCSs of arbitrary lengths can be constructed with a ZCZ ratio of 1/2. Furthermore, to provide flexible user capacity, an alternative construction of binary GCS of all lengths and Hadamard matrices of order $2^{a+1} 10^b 26^c$ ($a, b, c \geq 0$) is proposed using circulant matrices and Golay complementary pairs (GCP). These constructions are further extended to form binary CCC with parameters $(2N, 2N, 2N)$, where $N=2^a 10^b 26^c$, and $(4n, 4n, 4n)$ for $n \geq 1$. Additionally, optimal binary $(8n, 8n, 8n, 4n)$-CZCSS and their complex versions with parameters $(2m, 2m, 2m, m)$ are proposed for $n, m \geq 1$. These results provide the first generalized framework for constructing optimal CZCSS from arbitrary Hadamard seeds. Finally, a theoretical relation between Hadamard matrices and GCSs is established, and fundamental properties of circulant matrices over aperiodic correlation functions are presented.
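The definitions above can be checked numerically. The sketch below builds a circulant matrix from its first row, tests the Hadamard condition HHᵀ = nI, and tests the Golay-pair condition that the aperiodic autocorrelations of two sequences sum to zero at every nonzero shift. The order-4 matrix generated by (-1, 1, 1, 1) and the length-4 pair used in the checks are classical textbook examples, not the paper's constructions.

```python
import numpy as np


def circulant(first_row):
    """Matrix whose i-th row is the first row cyclically shifted right by i."""
    r = np.asarray(first_row)
    return np.array([np.roll(r, i) for i in range(len(r))])


def is_hadamard(H):
    """Check that all entries are +/-1 and that H H^T = n I."""
    H = np.asarray(H)
    n = H.shape[0]
    return bool(np.all(np.abs(H) == 1) and np.array_equal(H @ H.T, n * np.eye(n, dtype=H.dtype)))


def aperiodic_acf(a):
    """Aperiodic autocorrelation of a sequence at lags 0..N-1."""
    a = np.asarray(a)
    n = len(a)
    return np.array([np.dot(a[: n - k], a[k:]) for k in range(n)])


def is_golay_pair(a, b):
    """A Golay complementary pair has autocorrelation sum zero at every nonzero lag."""
    s = aperiodic_acf(a) + aperiodic_acf(b)
    return bool(np.all(s[1:] == 0))


H4 = circulant([-1, 1, 1, 1])
```

Checks like these are a quick way to validate any generated matrix or sequence set before relying on its correlation properties.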


[65] 2510.15759

On the Impact of Electromagnetic Interference and Inter-RIS Reflections in Indoor Factory Local 6G Networks

The Sixth Generation (6G) radio technology is expected to include local 6G networks as a special use case, extending the capabilities of 'generic' 6G networks toward more demanding performance requirements. Reconfigurable intelligent surfaces (RISs) offer a novel paradigm for next-generation wireless communications, especially in the context of local 6G networks, enabling advanced signal propagation control through intelligent phase-shift configurations. However, in practical deployments, their performance can be adversely affected by electromagnetic interference (EMI) from external sources and by inter-RIS reflections (IRR) caused by signal reflections between multiple colocated RIS units. This paper presents a comprehensive analysis of the joint impact of EMI and IRR in a multi-RIS multi-cell system deployed within an indoor factory environment. A detailed evaluation study is first carried out to investigate their impact on system performance. System-level simulations demonstrate that the joint impact of EMI and IRR degrades system performance more significantly than their individual effects, particularly as RIS dimensions and transmit power increase. To address these adverse effects, an alternating optimization algorithm based on the Riemannian conjugate gradient method is then proposed. The novel algorithm optimizes the phase shifts of the RIS elements while accounting for the spatial correlation among their associated channels, and is found to provide gains of up to several orders of magnitude in system sum rate and outage probability.


[66] 2510.27217

Joint Visible Light and Backscatter Communications for Proximity-Based Indoor Asset Tracking Enabled by Energy-Neutral Devices

In next-generation wireless systems, providing location-based mobile computing services for energy-neutral devices has become a crucial objective for sustainable Internet of Things (IoT). Visible light positioning (VLP) has gained considerable research attention as a complement to radio frequency (RF) solutions, since it can leverage ubiquitous lighting infrastructure. However, conventional VLP receivers often rely on photodetectors or cameras that are power-hungry, complex, and expensive. To address this challenge, we propose a hybrid indoor asset tracking system that integrates visible light communication (VLC) and backscatter communication (BC) within a simultaneous lightwave information and power transfer (SLIPT) framework. We design a low-complexity, energy-neutral IoT node, namely a backscatter device (BD), which harvests energy from light-emitting diode (LED) access points and then modulates and reflects ambient RF carriers to indicate its location within particular VLC cells. We present a multi-cell VLC deployment with a frequency division multiplexing (FDM) method that mitigates interference among LED access points by assigning them distinct frequency pairs based on a four-color map scheduling principle. We develop a lightweight particle filter (PF) tracking algorithm at an edge RF reader that fuses proximity reports with the received backscatter signal strength to track the BD. Experimental results show that this approach achieves positioning errors of 0.318 m at the 50th percentile and 0.634 m at the 90th percentile, while avoiding complex photodetectors and active RF synthesizing components at the energy-neutral IoT node. By demonstrating robust performance over multiple indoor trajectories, the proposed solution enables scalable, cost-effective, and energy-neutral indoor tracking for pervasive and edge-assisted IoT applications.


[67] 2511.07802

Deep-Learning-based Frequency-Domain Watermarking for Energy System Time Series Data Asset Protection

With the rapid development of artificial intelligence technologies, data has come to be regarded as a valuable asset. In this paper, we introduce deep-learning-based frequency-domain watermarking to protect energy system time series data assets and to ensure data authenticity when the data are shared or traded across communities. First, the concept and the desired watermarking characteristics are introduced. Second, a deep-learning watermarking model with specially designed loss functions and network structure is proposed to embed watermarks into the original dataset. Third, a frequency-domain data preprocessing method is proposed to eliminate the frequency bias of neural networks when learning time series datasets, thereby improving model performance. Last, a comprehensive evaluation framework is designed to measure watermarking invisibility, restorability, robustness, secrecy, false-positive detection, generalization, and capacity. Case studies based on practical load and photovoltaic time series datasets demonstrate the effectiveness of the proposed method.


[68] 2512.21314

A Lyapunov-Based Small-Gain Theorem for Fixed-Time ISS: Theory, Optimization, and Games

We develop a Lyapunov-based small-gain theorem for establishing fixed-time input-to-state stability (FxT-ISS) guarantees in interconnected nonlinear dynamical systems. The proposed framework considers interconnections in which each subsystem admits an FxT-ISS Lyapunov function, providing robustness with respect to external inputs. We show that, under an appropriate nonlinear small-gain condition, the overall interconnected system inherits the FxT-ISS property. In this sense, the proposed result complements existing Lyapunov-based small-gain theorems for asymptotic and finite-time stability, and enables a systematic analysis of interconnection structures exhibiting fixed-time stability. To illustrate the applicability of the theory, we study feedback-based optimization problems with time-varying cost functions, and Nash-equilibrium seeking for noncooperative games with nonlinear dynamical plants in the loop. For both problems, we present a class of non-smooth gradient- or pseudogradient-based controllers that use real-time feedback and achieve fixed-time convergence without requiring time-scale separation. Numerical examples are provided to validate the theoretical findings.
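The fixed-time property is easiest to see in the scalar case. For f(x) = (x - x*)^2 / 2, the non-smooth gradient flow x' = -c1 sgn(e)|e|^a - c2 sgn(e)|e|^b with e = x - x* and 0 < a < 1 < b reaches x* within T <= 1/(c1(1 - a)) + 1/(c2(b - 1)) from any initial condition: the |e|^b term brings |e| down to 1 within 1/(c2(b - 1)), and the |e|^a term finishes within 1/(c1(1 - a)). This textbook flow is shown for intuition only; it is not the paper's controller, and the gains are hypothetical.

```python
def fixed_time_flow(x0, x_star=0.0, c1=1.0, c2=1.0, a=0.5, b=2.0,
                    dt=1e-4, tol=1e-3, t_max=10.0):
    """Forward-Euler simulation of the scalar fixed-time gradient flow.
    Returns the first time |x - x_star| < tol (or t_max if never reached)."""
    x, t = float(x0), 0.0
    while t < t_max:
        e = x - x_star  # also the gradient of f
        if abs(e) < tol:
            return t
        s = 1.0 if e > 0 else -1.0
        x -= dt * s * (c1 * abs(e) ** a + c2 * abs(e) ** b)
        t += dt
    return t

# Settling-time bound for the default gains: 1/(1 * 0.5) + 1/(1 * 1) = 3.
bound = 1.0 / (1.0 * (1 - 0.5)) + 1.0 / (1.0 * (2.0 - 1))
times = [fixed_time_flow(x0) for x0 in (1.0, 10.0, 1000.0)]
print(times, bound)
```

Increasing the initial condition by three orders of magnitude changes the settling time only slightly, and every run stays below the same bound.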


[69] 2512.24755

Asymmetry-Aware Routing for Industrial Multimodal Monitoring: A Diagnostic Framework

Multimodal fusion is the default approach for combining heterogeneous sensor streams in industrial monitoring, yet no systematic method exists for determining when fusion degrades rather than improves detection performance. We present an Asymmetry-Aware Routing Framework, a three-step diagnostic procedure (unimodal performance gap, gate weight attribution, modality corruption testing) with formal decision criteria, which routes multimodal systems toward the appropriate fusion strategy before deployment. We validate the framework on three datasets spanning two routing outcomes: (1) the OHT/AGV industrial dataset (thermal + sensors, 13,121 samples), where the framework correctly identifies severe asymmetry (gap ratio 3.1$\times$) and recommends CASCADE; (2) a chain conveyor fault detection scenario (audio + vibration), where moderate asymmetry leads to a FUSE recommendation with positive fusion benefit; and (3) the CWRU bearing dataset, providing controlled validation in both directions. Threshold sensitivity analysis across all three datasets shows that the framework's recommendations are robust to threshold perturbation, with correct routing maintained over a wide parameter plateau. Comparison against a simpler diagnostic (the gap ratio alone) reveals that Step 1 alone is ambiguous for moderate-asymmetry cases, demonstrating the necessity of the full protocol for reliable routing decisions.
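The three-step procedure can be sketched as a small decision function. Everything numeric here is hypothetical: the gap-ratio definition (ratio of unimodal error rates), the thresholds, and the two-of-three voting rule are illustrative assumptions, not the paper's formal criteria. The second call shows how Step 1 can flag a case that the combined rule still routes to fusion.

```python
# Hypothetical thresholds; the paper's criteria and values are not reproduced here.
GAP_SEVERE = 2.0        # Step 1: weak-modality error / strong-modality error
GATE_DOMINANT = 0.8     # Step 2: share of gate weight on the strong modality
CORRUPTION_SAFE = 0.02  # Step 3: max tolerated fused-score drop when the weak modality is corrupted

def route(err_strong, err_weak, gate_share_strong, fused_drop_weak_corrupted):
    """Return 'CASCADE' or 'FUSE' from three diagnostic signals (illustrative rule)."""
    gap_ratio = err_weak / err_strong
    flags = [
        gap_ratio >= GAP_SEVERE,                      # Step 1: severe unimodal gap
        gate_share_strong >= GATE_DOMINANT,           # Step 2: gate largely ignores the weak stream
        fused_drop_weak_corrupted > CORRUPTION_SAFE,  # Step 3: weak stream can still hurt fusion
    ]
    # Illustrative majority rule: cascade when at least two steps flag asymmetry.
    return "CASCADE" if sum(flags) >= 2 else "FUSE"

print(route(0.05, 0.155, 0.90, 0.05))  # all three steps flag asymmetry
print(route(0.10, 0.21, 0.55, 0.01))   # Step 1 alone flags it; the full rule says FUSE
```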


[70] 2602.20359

StochasticBarrier.jl: A Toolbox for Stochastic Barrier Function Synthesis

We present StochasticBarrier.jl, an open-source Julia-based toolbox for generating Stochastic Barrier Functions (SBFs) for safety verification of discrete-time stochastic systems with additive Gaussian noise. StochasticBarrier.jl certifies linear, polynomial, and piecewise affine (PWA) systems. The latter enables verification for a wide range of system dynamics, including general nonlinear systems. The toolbox implements a Sum-of-Squares (SOS) optimization approach, as well as methods based on piecewise constant (PWC) functions. For SOS-based SBFs, StochasticBarrier.jl leverages semidefinite programming solvers, while for PWC SBFs, it offers three engines: two using linear programming (LP) and one based on gradient descent (GD). Benchmarking StochasticBarrier.jl against the state of the art shows that the tool outperforms existing tools in computation time, safety probability bounds, and scalability across over 30 case studies. Compared to its closest competitor, StochasticBarrier.jl is up to four orders of magnitude faster, achieves significant safety probability improvements, and supports higher-dimensional systems.
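The guarantee behind a stochastic barrier function can be illustrated with the classical c-martingale bound: if B >= 0, B >= 1 on the unsafe set, B <= eta on the initial set, and E[B(x_{k+1}) | x_k] <= B(x_k) + c, then the probability of reaching the unsafe set within N steps is at most eta + cN. The sketch below evaluates this bound in closed form for a hypothetical scalar system x_{k+1} = a x_k + w_k with Gaussian w_k and the hand-picked barrier B(x) = x^2 / r^2. It is plain Python for illustration and neither uses nor mirrors the toolbox, which synthesizes B by SOS or PWC optimization.

```python
import numpy as np

def quadratic_sbf_bound(a, sigma, r, x0_max, horizon):
    """Safety bound for x+ = a x + w, w ~ N(0, sigma^2), unsafe set |x| >= r,
    initial set |x| <= x0_max, with barrier B(x) = x^2 / r^2."""
    if abs(a) > 1:
        raise ValueError("this closed-form check assumes |a| <= 1")
    eta = x0_max ** 2 / r ** 2  # sup of B over the initial set
    # E[B(a x + w)] - B(x) = ((a^2 - 1) x^2 + sigma^2) / r^2 <= sigma^2 / r^2
    c = sigma ** 2 / r ** 2
    return min(1.0, eta + c * horizon)  # P(unsafe within N steps) <= eta + c N

bound = quadratic_sbf_bound(a=0.9, sigma=0.1, r=2.0, x0_max=0.5, horizon=10)

# Monte Carlo sanity check from the worst initial state x0 = 0.5.
rng = np.random.default_rng(0)
x = np.full(100_000, 0.5)
hit = np.zeros(x.size, dtype=bool)
for _ in range(10):
    x = 0.9 * x + rng.normal(0.0, 0.1, x.size)
    hit |= np.abs(x) >= 2.0
print(bound, hit.mean())
```

The certified bound (0.0875 here) is conservative relative to the empirical estimate; tightening it is exactly what barrier synthesis optimizes.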


[71] 2603.10371

Speech Codec Probing from Semantic and Phonetic Perspectives

Speech tokenizers are essential for connecting speech to large language models (LLMs) in multimodal systems. They are expected to preserve both semantic and acoustic information for downstream understanding and generation tasks. However, emerging evidence suggests that the term "semantic" in speech processing does not align with linguistic lexical semantics, leading to a mismatch between the speech and text modalities. In this paper, we systematically analyze the information encoded by several widely used speech tokenizers, evaluating their lexical-semantic and phonetic content through three tasks. Our results show that current tokenizers primarily capture phonetic rather than lexical-semantic structure, and we derive practical implications for the design of next-generation speech tokenization methods. Code is publicly available at this https URL.


[72] 2603.25645

Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos

Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic annotations required to evaluate modern Multimodal Large Language Models (MLLMs). To address this critical gap, we introduce Colon-Bench, generated via a novel multi-stage agentic workflow. Our pipeline seamlessly integrates temporal proposals, bounding-box tracking, AI-driven visual confirmation, and human-in-the-loop review to scalably annotate full-procedure videos. The resulting verified benchmark is unprecedented in scope, encompassing 528 videos, 14 distinct lesion categories (including polyps, ulcers, and bleeding), over 300,000 bounding boxes, 213,000 segmentation masks, and 133,000 words of clinical descriptions. We utilize Colon-Bench to rigorously evaluate state-of-the-art MLLMs across lesion classification, Open-Vocabulary Video Object Segmentation (OV-VOS), and video Visual Question Answering (VQA). The MLLM results demonstrate surprisingly high localization performance in medical domains compared to SAM-3. Finally, we analyze common VQA errors from MLLMs to introduce a novel "colon-skill" prompting strategy, improving zero-shot MLLM performance by up to 9.7% across most MLLMs. The dataset and the code are available at this https URL.


[73] 2603.28011

Learning Certified Neural Network Controllers Using Contraction and Interval Analysis

We present a novel framework that jointly trains a neural network controller and a neural Riemannian metric with rigorous closed-loop contraction guarantees using formal bound propagation. Directly bounding the symmetric Riemannian contraction linear matrix inequality causes unnecessary overconservativeness due to poor dependency management. Instead, we analyze an asymmetric matrix function $G$, where $2^n$ GPU-parallelized corner checks of its interval hull verify that an entire interval subset $X$ is a contraction region in a single shot. This eliminates the sample complexity problems encountered with previous Lipschitz-based guarantees. Additionally, for control-affine systems under a Killing field assumption, our method produces an explicit tracking controller capable of exponentially stabilizing any dynamically feasible trajectory using just two forward inferences of the learned policy. Using JAX and immrax for linear bound propagation, we apply this approach to a full 10-state quadrotor model. In $<$10 minutes of post-JIT training, we simultaneously learn a control policy $\pi$, a neural contraction metric $\Theta$, and a verified 10-dimensional contraction region $X$.
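One classical way to certify a whole interval of symmetric matrices with finitely many checks is a vertex test in the style of Rohn: if every corner matrix C + D_z R D_z, with z in {-1, +1}^n and D_z = diag(z), is negative definite, then so is every symmetric matrix whose entries lie within R of C. The sketch below implements this sufficient test; whether the paper's 2^n checks on its asymmetric G take exactly this form is an assumption. For an asymmetric interval matrix one would apply it to the symmetric part, since x^T G x = x^T ((G + G^T)/2) x.

```python
import itertools
import numpy as np

def interval_neg_def(center, radius):
    """Sufficient test that every symmetric M with |M - center| <= radius (entrywise)
    is negative definite. For z = sign(x):
    x^T M x <= x^T center x + |x|^T radius |x| = x^T (center + D_z radius D_z) x."""
    center = np.asarray(center, dtype=float)
    radius = np.asarray(radius, dtype=float)
    n = center.shape[0]
    for z in itertools.product((-1.0, 1.0), repeat=n):  # 2^n corner checks
        D = np.diag(z)
        if np.linalg.eigvalsh(center + D @ radius @ D).max() >= 0.0:
            return False
    return True

print(interval_neg_def(-np.eye(3), 0.1 * np.ones((3, 3))))  # small uncertainty
print(interval_neg_def(-np.eye(3), 0.5 * np.ones((3, 3))))  # too much uncertainty
```

The corner loop is embarrassingly parallel, which is what makes a GPU-batched version attractive.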


[74] 2604.00287

From Net Load Modifiers to Firm Capacity: The Role of Distributed Energy Resources in Resource Adequacy

Distributed energy resources (DERs) such as rooftop solar, batteries, demand response, and electric vehicles can contribute to power system reliability, yet their performance is difficult to translate into firm resource adequacy (RA) capacity across jurisdictions. Existing analyses often locate this difficulty within individual technical requirements, such as metering, accreditation, or dispatch performance, but give less attention to how constraints at one stage carry over to the next. This review traces the RA participation pathway through five stages: load forecasting, registration and classification, metering and verification, capacity accreditation, and performance obligations. We synthesize literature, tariffs, market manuals, and regulatory documents from California, PJM, ISO-NE, Great Britain, and Ireland, spanning U.S. capacity markets and European capacity remuneration mechanisms. Across these frameworks, similar barriers recur despite different procurement models and regulatory structures, indicating that participation is constrained by cross-stage design, not jurisdiction-specific rules alone. We identify three cross-stage couplings through which capacity value is lost between stages: mismatches between resource classification and operational obligations, weak links between verification evidence and accreditation, and temporal misalignment between planning forecasts and scarcity-hour performance. The central finding is that compliance architecture, not DER technology alone, is often the binding constraint on translating DER capability into firm RA contributions. This points to reforms that codify cross-stage information handoffs, tie accreditation to auditable verification evidence, and refresh capacity values as deployment changes system conditions. Rather than adjusting individual stages in isolation, RA reform should redesign the participation pathway end-to-end.


[75] 2605.10078

Scalable Design of Attack-Resilient Controllers for Positive Systems

This paper proposes a framework for secure and resilient controller design for positive systems against cyber-attacks. In particular, we consider a network-controlled system where an adversary injects false data into the actuator channels to increase the control cost (performance measure) while penalizing the attack effort and subject to state-dependent constraints. Using a minimax formulation, we analyze the worst-case performance loss caused by such adversaries, which is given by the solution of a difference equation, or of an algebraic equation when the time horizon is infinite. We show that, among all possibly nonlinear attack policies, the optimal attack policy is linear. Despite the lack of explicit stealthiness constraints, we also show that when the measured output has an unstable zero which is not an unstable zero of the performance measure, the attacks can induce unbounded performance degradation. The proposed framework is also extended to systems with model uncertainty. Numerical examples illustrate the results and demonstrate how tools from positive systems and linear regulator theory can be used to mitigate cyber-attacks with low computational effort.


[76] 2606.07803

Stable but Unsafe: Agent-Driven Cyber-Physical Systems Under Gain Manipulation Attacks

AI agents are increasingly being connected to Cyber-Physical Systems (CPS) to generate or modify control-relevant parameters at runtime, including feedback gains, cost weights, and reference signals. These updates create a parameter channel: a pathway between the agent and the controller that is structurally distinct from classical sensor and actuator channels. Among the parameters carried by this channel, feedback gains are especially high-leverage: under linear state feedback, a single gain matrix determines closed-loop eigenvalue placement for the entire system. Consequently, malicious gain updates can reshape the closed-loop dynamics without producing the signal-level inconsistencies targeted by residual-based monitors. We formalize this attack surface through a three-axis attacker model and a taxonomy of Gain Manipulation Attacks (GMA). Two impact classes are identified: stability-margin erosion under sustained gain drift and transient amplification under one-shot gain replacement. We demonstrate that an attacker can drive the system past its safe physical operating limits while maintaining mathematical stability, proving that stability verification alone is insufficient to bound the physical impact. Using Bauer--Fike eigenvalue bounds and the Kreiss matrix theorem, we derive exact stealthiness conditions and worst-case impact certificates for each class. Finally, we propose preliminary detection directions and validate our framework through a vehicle lateral dynamics case study.
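A minimal numerical illustration of "stable but unsafe": a manipulated gain keeps every closed-loop eigenvalue inside the unit circle, yet the non-normal closed-loop matrix amplifies states by more than an order of magnitude before decaying. The plant (A = B = I) and both gains are hypothetical and chosen only for transparency; the paper's vehicle model and certificates are not reproduced. The Kreiss matrix theorem bounds this peak from both sides in terms of a resolvent-based constant; here the peak is simply computed.

```python
import numpy as np

A = np.eye(2)  # hypothetical discrete-time open-loop dynamics
B = np.eye(2)  # hypothetical full actuation, for illustration only
K_nominal = 0.5 * np.eye(2)
K_attacked = np.array([[0.1, -10.0], [0.0, 0.1]])  # manipulated gain

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

def peak_transient(M, steps=100):
    """max over k = 0..steps of ||M^k||_2: worst-case state amplification."""
    P, peak = np.eye(M.shape[0]), 1.0
    for _ in range(steps):
        P = P @ M
        peak = max(peak, np.linalg.norm(P, 2))
    return peak

for name, K in (("nominal", K_nominal), ("attacked", K_attacked)):
    A_cl = A - B @ K
    print(name, spectral_radius(A_cl), peak_transient(A_cl))
```

Both closed loops pass an eigenvalue-based stability check, but only the attacked one can push a unit initial state far beyond a physical limit during the transient.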


[77] 2606.12223

Characterization of Speech Imagery in Scalp EEG and Comparison with Motor Imagery

Speech imagery is attractive as a brain-computer interface paradigm for communication because it is endogenous and intrinsically linguistic. Yet despite growing interest, its dominant scalp-EEG spatiotemporal characteristics remain poorly characterized. Here, we asked how speech imagery appears in scalp EEG and compared it against finger motor imagery. Using a within-subject dataset containing speech imagery, finger motor imagery, and no-task trials recorded under the same trial structure, we analyzed band-power dynamics across channels and time. Finger motor imagery showed the expected contralateral mu/alpha and low-beta desynchronization over sensorimotor areas, whereas speech imagery showed a weaker, more distributed alpha-dominant increase. After normalization to each condition's own post-trial interval, the speech-related alpha increase changed only modestly after cue onset, indicating that much of the speech-versus-no-task difference was already present during the instruction period. A classifier discriminating imagery from no-task reached mean balanced accuracies of 0.563 $\pm$ 0.072 for speech imagery and 0.718 $\pm$ 0.127 for motor imagery, with a stronger alpha/beta dependence for motor imagery than for speech imagery. Together, these results provide a clearer group-level characterization of speech imagery in scalp EEG and indicate that its dominant spatiotemporal pattern differs from that of finger motor imagery and is more consistent with substantial non-articulatory task-related contributions than with a clear articulatory-motor analogue.
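The normalization step can be made concrete with a standard event-related (de)synchronization measure: band power in a task window expressed as a percent change relative to a reference window, which here stands in for each condition's post-trial interval. The band edges, window length, and periodogram estimator are assumptions; the paper's exact pipeline may differ.

```python
import numpy as np

def band_power(x, fs, lo, hi):
    """Periodogram power of x summed over frequencies in [lo, hi) Hz."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].sum()

def relative_change(task, reference, fs, band=(8.0, 13.0)):
    """Percent band-power change versus the reference window
    (negative = desynchronization, positive = synchronization)."""
    p_task = band_power(task, fs, *band)
    p_ref = band_power(reference, fs, *band)
    return 100.0 * (p_task - p_ref) / p_ref

fs = 250.0
t = np.arange(250) / fs                       # 1 s windows (synthetic)
reference = np.sin(2 * np.pi * 10 * t)        # alpha-band reference activity
task = 2.0 * np.sin(2 * np.pi * 10 * t)       # doubled amplitude: 4x power
change = relative_change(task, reference, fs)
print(change)
```

Because each condition is normalized to its own reference, a difference that already exists during the instruction period is removed, which is why the speech-related change after cue onset can look modest.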


[78] 2606.14223

Toward Deeper Environmental Understanding: Event-Level Sensing for Intelligent 6G ISAC

The intelligent evolution of mission-critical networks, such as the Internet of vehicles (IoV) and the low-altitude economy (LAE), requires sixth-generation (6G) networks to move beyond discrete physical parameter estimation toward deeper environmental understanding. However, existing integrated sensing and communications (ISAC) studies mainly focus on target-level sensing, which provides fragmented snapshots of the physical world and lacks the behavioral semantic capability to interpret intent. This limitation hinders the intelligent evolution of such networks and prevents 6G from acquiring the essential sensing foundation to evolve into an "intelligent service engine". To bridge this gap, ISAC must advance toward event-level sensing, which models continuous-time states to enable persistent recognition and prediction of target intent and behavioral semantics. This article presents a comprehensive overview of event-level sensing in 6G ISAC networks. We first introduce its fundamental concepts, sensing types, and representative scenarios. We then review key enabling techniques across waveform design, target state estimation and tracking, and event recognition. Furthermore, focusing on IoV and LAE scenarios, we discuss representative applications of ISAC event-level sensing and the intelligent enhancement of downstream operational functions enabled by event-level information. Finally, we highlight future research trends and potential directions to further advance ISAC event-level sensing toward intelligent and proactive 6G networks.


[79] 2606.17699

Joint Synchronization and Radar Parameter Estimation for OFDM-based DISAC Systems

We propose a novel approach to the synchronization paradigm in distributed ISAC (DISAC) systems in doubly-dispersive (DD) channel environments via a joint synchronization and radar parameter estimation framework. The proposed method exploits the structure of the system model, which can be linearized in order to apply a bivariate Gaussian belief propagation (GaBP) algorithm that jointly estimates the time offset (TO) and carrier frequency offset (CFO) of each base station (BS), as well as the delay and Doppler parameters of the DD channel in conventional orthogonal frequency division multiplexing (OFDM) systems. Simulation results demonstrate the effectiveness of the proposed algorithm, showing that the radar parameter estimates (i.e., range and velocity) and synchronization parameter estimates (i.e., TO and CFO) approach the Cramér-Rao lower bound (CRLB) in moderate-to-high signal-to-noise ratio (SNR) regimes.
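The paper's GaBP is bivariate and runs over its linearized TO/CFO and delay/Doppler model, which is not reproduced here. As a minimal sketch of the message-passing mechanics only, the scalar GaBP below solves a generic linear Gaussian system Ax = b (equivalently, computes posterior means for precision matrix A). It is exact on trees, and on loopy graphs its means converge to the exact solution when A is, for example, diagonally dominant.

```python
import numpy as np

def gabp(A, b, iters=200):
    """Scalar Gaussian belief propagation for A x = b (A symmetric).
    Messages are kept in information form: P[i, j] (precision) and
    h[i, j] (precision * mean) sent from node i to node j."""
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    n = len(b)
    P, h = np.zeros((n, n)), np.zeros((n, n))
    for _ in range(iters):
        P_tot = np.diag(A) + P.sum(axis=0)  # node beliefs: local term + incoming
        h_tot = b + h.sum(axis=0)
        # Cavity terms: drop the message j sent to i before i replies to j.
        P_excl = P_tot[:, None] - P.T
        h_excl = h_tot[:, None] - h.T
        P_new = -A ** 2 / P_excl
        h_new = -A * h_excl / P_excl
        np.fill_diagonal(P_new, 0.0)
        np.fill_diagonal(h_new, 0.0)
        P, h = P_new, h_new
    return (b + h.sum(axis=0)) / (np.diag(A) + P.sum(axis=0))

A = np.array([[5.0, 1.0, 0.5, 0.0],
              [1.0, 6.0, 1.0, 0.5],
              [0.5, 1.0, 5.0, 1.0],
              [0.0, 0.5, 1.0, 4.0]])  # symmetric, diagonally dominant, loopy graph
b = np.array([1.0, 2.0, 3.0, 4.0])
x_gabp = gabp(A, b)
print(np.max(np.abs(x_gabp - np.linalg.solve(A, b))))
```

Each node only uses its own row of A and messages from its neighbours, which is what makes the approach attractive for distributed base stations.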


[80] 2606.19157

IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages

AudioLLMs enable speech recognition conditioned on textual prompts such as domain descriptions or entity lists. However, it remains unclear whether these models genuinely utilise such context or rely on parametric knowledge learned during pretraining. Existing benchmarks cannot answer this question because they evaluate transcription under fixed prompting conditions and rarely include explicit contextual inputs. We introduce IndicContextEval, a 56-hour multilingual benchmark of natural speech from 555 speakers across 8 Indian languages and 23 professional domains. We design a 7-level prompting framework that progressively introduces contextual signals, including metadata, natural-language descriptions, entity lists in English and native script, and adversarial prompts with incorrect entities. Evaluating five models reveals substantial differences in context utilisation behaviour, highlighting the need for explicit evaluation of contextual grounding in AudioLLMs.


[81] 2606.23711

Optical Ground Stations for Space Communications: Systems Engineering, Availability, and Service Economics Through 2030

Optical ground stations (OGSs) are becoming networked infrastructure for high-rate space-to-Earth communications, but their adoption is governed by service availability and utilization as much as by optical line rate. This paper develops a systems-engineering and service-economics assessment of the OGS sector as of June 2026. The analysis combines public flight demonstrations and operational records with scalar link-budget, availability, and cost-normalization models. Public benchmarks span 25 Mbps from interplanetary range, 260 Mbps-class lunar links, 1.2 Gbps-class ISS relay, 1.8 Gbps operational GEO relay, 120 Gbps-class direct-to-ground demonstrations in China, and 200 Gbps from LEO in NASA's TBIRD mission. The resulting conclusion is that the bottleneck has shifted from peak line rate to repeatable service under weather, acquisition, scheduling, and operations constraints. Under one explicit planning normalization (a 10 Gbps near-Earth station, annualized cost of \$2 million/year, scheduled pre-weather optical contact time of 0.5 h/day, and weather-inclusive combined efficiency $\eta=0.7$), the fixed-cost component is of order $(3-4)\times10^{3}$ USD/TB. This number is a sensitivity anchor, not a tariff forecast; the controlling variables are duty factor, effective weather diversity, shared-network loading, and service-level allocation. The public industrial evidence is best interpreted as a stratified value chain, not as a single vendor ranking. The defensible 2030 baseline is hybrid optical+radio-frequency (RF): optical for throughput, relay, and spectrum relief; RF for continuity, contingency, and assured command paths.
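The fixed-cost anchor follows from simple arithmetic on the stated normalization, assuming decimal terabytes (10^12 bytes) and 365 days per year: 10 Gbps for 0.5 h/day at 0.7 efficiency delivers about 1.575 TB/day, or about 575 TB/year, so USD 2 million per year works out to roughly 3.5 x 10^3 USD/TB, inside the quoted range.

```python
def cost_per_tb(rate_gbps=10.0, annual_cost_usd=2e6, contact_h_per_day=0.5,
                efficiency=0.7, days_per_year=365):
    """Fixed-cost component in USD per decimal terabyte (1e12 bytes)."""
    bytes_per_day = rate_gbps * 1e9 / 8 * contact_h_per_day * 3600 * efficiency
    tb_per_year = bytes_per_day * days_per_year / 1e12
    return annual_cost_usd / tb_per_year

print(round(cost_per_tb()))  # 3479
```

Because the cost scales inversely with contact time and efficiency, doubling the daily contact time halves the figure, which is why duty factor is listed as a controlling variable.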


[82] 2606.24476

WiWorld-RealData: A Real-World Multi-Modal Dataset for 6G Wireless World Models

Wireless world models aim to represent, predict, and reason about wireless propagation by jointly understanding physical environments and channel responses. Realizing such models in sixth-generation (6G) digital twin channels requires datasets that capture measured wireless responses and environment states under real-world propagation conditions. This paper presents WiWorld-RealData, a real-world outdoor multi-band channel and multi-modal sensing dataset collected along campus mobile routes. WiWorld-RealData provides measured channel impulse responses (CIRs) at 3.7 GHz and 6.775 GHz, together with multi-view images, panoramic images, light detection and ranging (LiDAR) point clouds, millimeter-wave (mmWave) radar records, and global navigation satellite system (GNSS) trajectories. Through unified file organization and metadata manifests, the dataset establishes sample-level correspondences among channel responses, environment observations, timestamps, route information, antenna configurations, and quality flags. The overall measurement campaign has produced 10 TB-level multi-modal field data. The current public release provides one representative dual-band route at 3.7 GHz and 6.775 GHz with complete channel-environment alignment, while the acquisition framework supports extension to more frequency bands and scenarios. A case study on environment-assisted path-loss prediction achieves a mean absolute error (MAE) of 2.02 dB and a root mean squared error (RMSE) of 2.69 dB, indicating that the aligned environment observations contain predictive information for channel variations. The dataset is available at this https URL, and a ScienceDB mirror will be provided upon release.
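As a distance-only reference point for path-loss results like these, the sketch below fits the classical log-distance model PL(d) = PL0 + 10 n log10(d/d0) by least squares and reports MAE and RMSE. The synthetic data, d0 = 1 m, and 3 dB shadowing are assumptions; the paper's environment-assisted predictor uses the aligned images, LiDAR, and radar observations, which this baseline ignores.

```python
import numpy as np

def fit_log_distance(d, pl, d0=1.0):
    """Least-squares fit of PL(d) = PL0 + 10 n log10(d / d0); returns (PL0, n)."""
    X = np.column_stack([np.ones_like(d), 10.0 * np.log10(d / d0)])
    (pl0, n_exp), *_ = np.linalg.lstsq(X, pl, rcond=None)
    return pl0, n_exp

def mae_rmse(pred, meas):
    err = pred - meas
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))

rng = np.random.default_rng(1)
d = rng.uniform(10.0, 200.0, 500)                               # synthetic distances (m)
pl = 40.0 + 25.0 * np.log10(d) + rng.normal(0.0, 3.0, d.size)   # synthetic: n = 2.5, 3 dB shadowing
pl0, n_exp = fit_log_distance(d, pl)
mae, rmse = mae_rmse(pl0 + 10.0 * n_exp * np.log10(d), pl)
print(f"PL0={pl0:.1f} dB, n={n_exp:.2f}, MAE={mae:.2f} dB, RMSE={rmse:.2f} dB")
```

Any residual error left by such a distance-only fit is shadowing that environment observations could, in principle, help explain.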


[83] 2312.00206

SparseGS: Sparse View Synthesis using 3D Gaussian Splatting

3D Gaussian Splatting (3DGS) has recently enabled real-time rendering of unbounded 3D scenes for novel view synthesis. However, this technique requires dense training views to accurately reconstruct 3D geometry. A limited number of input views will significantly degrade reconstruction quality, resulting in artifacts such as "floaters" and "background collapse" at unseen viewpoints. In this work, we introduce SparseGS, an efficient training pipeline designed to address the limitations of 3DGS in scenarios with sparse training views. SparseGS incorporates depth priors, novel depth rendering techniques, and a pruning heuristic to mitigate floater artifacts, alongside an Unseen Viewpoint Regularization module to alleviate background collapses. Our extensive evaluations on the Mip-NeRF360, LLFF, and DTU datasets demonstrate that SparseGS achieves high-quality reconstruction in both unbounded and forward-facing scenarios, with as few as 12 and 3 input images, respectively, while maintaining fast training and real-time rendering capabilities.


[84] 2404.04355

Gray-Box Nonlinear Feedback Optimization

Feedback optimization enables autonomous optimality seeking of a dynamical system through its closed-loop interconnection with iterative optimization algorithms. Among various iteration structures, model-based approaches require the input-output sensitivity matrix of the system to construct gradients, whereas model-free approaches eliminate this need by estimating gradients from real-time objective evaluations. These approaches offer complementary benefits in sample efficiency and accuracy against model mismatch, i.e., sensitivity errors. To achieve balanced closed-loop performance, we propose a gray-box feedback optimization controller, featuring systematic incorporation of approximate sensitivities into model-free updates via a tunable convex combination. We provide unified performance characterizations covering different approaches. We elucidate how cumulative sensitivity errors (model-based) and variances due to stochastic exploration (model-free) shape the closed-loop behavior and induce a trade-off between iteration and dimensional dependence. The proposed controller retains sample efficiency and provable (local) optimality for nonconvex problems despite inaccurate sensitivities. We further develop and characterize a running gray-box controller that handles constrained time-varying problems with changing objectives and steady-state input-output maps.
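A minimal sketch of the gray-box idea on a toy static plant: each update takes a convex combination, weighted by lam, of a model-based gradient built from an approximate sensitivity H_hat and a model-free two-point estimate that uses cost evaluations only. The plant, cost, step sizes, and the specific two-point estimator are assumptions chosen for illustration, not the paper's controller. In this example, the biased H_hat makes pure model-based updates settle at a shifted point, and mixing in model-free information roughly halves that bias at the price of exploration noise.

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.array([[1.0, 0.5], [0.2, 1.0]])      # true steady-state sensitivity (unknown to controller)
H_hat = np.array([[1.2, 0.5], [0.2, 0.8]])  # approximate model (hypothetical error)
r, eps = np.array([1.0, 2.0]), 0.1

def plant(u):
    return H @ u  # measured steady-state output y = h(u), linear here

def cost(u):
    y = plant(u)
    return 0.5 * np.sum((y - r) ** 2) + 0.5 * eps * np.sum(u ** 2)

def gray_box_step(u, lam, eta=0.05, delta=1e-3):
    y = plant(u)
    g_model = eps * u + H_hat.T @ (y - r)   # model-based: approximate sensitivity, measured y
    v = rng.standard_normal(u.shape)        # model-free: random-direction two-point estimate
    g_free = (cost(u + delta * v) - cost(u - delta * v)) / (2 * delta) * v
    return u - eta * (lam * g_model + (1 - lam) * g_free)

u_star = np.linalg.solve(H.T @ H + eps * np.eye(2), H.T @ r)  # true optimum
u = np.zeros(2)
for _ in range(3000):
    u = gray_box_step(u, lam=0.5)
print(u, u_star)
```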


[85] 2408.01273

Certified Robust Invariant Polytope Training in Neural Controlled ODEs

We propose a framework for training neural network controllers with certified robust forward invariant polytopes. First, we parameterize a family of lifted control systems in a higher dimensional space, where the original neural controlled system evolves on an invariant subspace of each lifted system. We use interval analysis and neural network verifiers to further construct a family of lifted embedding systems, carefully capturing the knowledge of this invariant subspace. If the vector field of any lifted embedding system satisfies a sign constraint at a single point, then a certain convex polytope of the original system is robustly forward invariant. Treating the neural network controller and the lifted system parameters as variables, we propose an algorithm to train controllers with certified forward invariant polytopes in the closed-loop control system. Through two examples, we demonstrate how the simplicity of the sign constraint allows our approach to scale with system dimension to over $50$ states, and outperform state-of-the-art Lyapunov-based sampling approaches in runtime.
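For the special case of a box (an axis-aligned polytope) and linear dynamics x' = Ax, the "sign constraint at a single point" can be sketched with the classical interval embedding: split A into its Metzler part and its remaining negative off-diagonal part, and evaluate the embedding vector field once at the box corners. This is only a sufficient condition and omits the paper's lifting, neural controller, and verifier; the function name is hypothetical.

```python
import numpy as np

def box_invariant_linear(A, lo, hi):
    """Sufficient check that the box [lo, hi] is forward invariant for x' = A x.
    On the face x_i = lo_i, x_i' >= (A_m lo + A_n hi)_i; on x_i = hi_i,
    x_i' <= (A_m hi + A_n lo)_i. Both signs are checked at the single point (lo, hi)."""
    A = np.asarray(A, dtype=float)
    diag = np.diag(np.diag(A))
    off = A - diag
    A_m = diag + np.maximum(off, 0.0)  # Metzler part
    A_n = np.minimum(off, 0.0)         # negative off-diagonal part
    d_lo = A_m @ lo + A_n @ hi
    d_hi = A_m @ hi + A_n @ lo
    return bool(np.all(d_lo >= 0) and np.all(d_hi <= 0))

lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
print(box_invariant_linear(np.array([[-2.0, 1.0], [0.5, -3.0]]), lo, hi))
print(box_invariant_linear(np.array([[-1.0, -2.0], [2.0, -1.0]]), lo, hi))
```

The second system is stable but rotates states out of the box through its corners, so the check correctly refuses to certify it.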


[86] 2409.19071

Analog fast Fourier transforms for scalable and efficient signal processing

Edge devices are being deployed at increasing volumes to sense and act on information from the physical world. The discrete Fourier transform (DFT) is often necessary to make this sensed data suitable for further processing -- such as by artificial intelligence (AI) algorithms -- and for transmission over communication networks. Analog in-memory computing has been shown to be a fast, energy-efficient, and scalable solution for processing edge AI workloads, but not for Fourier transforms. This is because of the existence of the fast Fourier transform (FFT) algorithm, which enormously reduces the complexity of the DFT but has so far belonged only to digital processors. Here, we show that the FFT can be mapped to analog in-memory computing systems, enabling them to efficiently scale to arbitrarily large Fourier transforms without requiring large sizes or large numbers of non-volatile memory arrays. We experimentally demonstrate analog FFTs on 1D audio and 2D image signals, performing analog computations on up to 524K charge-trapping memory devices simultaneously, where each device has precisely tunable, low-conductance analog states. The scalability of both the new analog FFT approach and the charge-trapping memory device is leveraged to compute a 65,536-point analog DFT, a scale that is otherwise inaccessible by analog systems and which is $>$500$\times$ larger than any previous analog DFT demonstration. Analog FFT cores can provide higher energy efficiency and performance per area than specialized digital FFT processors at all FFT sizes, while also functioning as efficient matrix multiplication engines for AI workloads.
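The key step, mapping a large DFT onto small fixed DFT matrices, can be sketched with the Cooley-Tukey "four-step" factorization: for N = n1 * n2, the N-point DFT reduces to n1-point DFTs, an elementwise twiddle multiplication, and n2-point DFTs. Only the two small DFT matrices would then need to be programmed into memory arrays, each used as a matrix-vector multiply. Whether the paper uses exactly this factorization is an assumption; the NumPy matrix products below stand in for analog crossbar operations.

```python
import numpy as np

def dft_matrix(n):
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def four_step_dft(x, n1, n2):
    """N = n1*n2-point DFT from small n1- and n2-point DFT matrix products."""
    N = n1 * n2
    X = np.asarray(x, dtype=complex).reshape(n1, n2)  # X[a, b] = x[n2*a + b]
    A = dft_matrix(n1) @ X                            # n1-point DFTs down each column
    k1, m2 = np.meshgrid(np.arange(n1), np.arange(n2), indexing="ij")
    A *= np.exp(-2j * np.pi * k1 * m2 / N)            # twiddle factors W_N^(k1*m2)
    B = A @ dft_matrix(n2)                            # n2-point DFTs along each row
    return B.flatten(order="F")                       # output index k1 + n1*k2

rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
print(np.allclose(four_step_dft(x, 8, 8), np.fft.fft(x)))
```

Applying the same factorization recursively keeps every stored matrix small, which is how array size can stay bounded as N grows.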


[87] 2411.15490

Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation

Acute ischemic stroke (AIS) requires time-critical decision-making, where inaccurate interpretation of neuroimaging findings can lead to irreversible disability. Diffusion-weighted imaging (DWI) and apparent diffusion coefficient (ADC) maps from magnetic resonance imaging (MRI) are central to detecting acute infarction, yet generating factually reliable radiology reports directly from 3D MRI remains challenging due to the difficulty of learning robust cross-modal alignments between volumetric images and clinical text. We propose paired image-domain retrieval and text-domain augmentation (PIRTA), a retrieval-augmented generation framework that improves report factuality by avoiding explicit image-text alignment. PIRTA retrieves clinically similar 3D DWI/ADC volumes using a pretrained 3D vision encoder and leverages their paired clinician-authored reports to ground large language model (LLM)-based report generation. Experiments on multi-institutional in-house data, a held-out external privacy-preserving cohort, and the public ISLES benchmark demonstrate that PIRTA achieves strong image-domain retrieval performance and consistently improves ischemic-territory accuracy, a clinically grounded surrogate for report factuality, compared to direct image-to-text baselines. These results indicate that retrieval-grounded generation provides a scalable and reliable paradigm for producing factually consistent radiology reports from complex 3D brain MRI. Source code is available at this https URL.
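The retrieval step can be sketched as nearest-neighbour search over image embeddings, returning the paired clinician-authored reports to ground generation. The toy 3-D embeddings stand in for outputs of a pretrained 3D vision encoder, and the function names and prompt wording are hypothetical, not PIRTA's actual interface.

```python
import numpy as np

def retrieve_reports(query_emb, db_embs, db_reports, k=2):
    """Return the k reports whose paired image embeddings are most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    D = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = D @ q
    top = np.argsort(-sims)[:k]
    return [db_reports[i] for i in top], sims[top]

def build_prompt(retrieved):
    """Ground the LLM on retrieved reports (hypothetical wording)."""
    refs = "\n\n".join(f"Reference report {i + 1}:\n{r}" for i, r in enumerate(retrieved))
    return refs + "\n\nWrite a report for the new study, stating only supported findings."

db = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]])  # toy embeddings
reports = ["A", "B", "C"]
top_reports, top_sims = retrieve_reports(np.array([1.0, 0.05, 0.0]), db, reports)
prompt = build_prompt(top_reports)
print(top_reports)
```

Because the text side only ever sees retrieved reports, no explicit image-text alignment has to be learned.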


[88] 2505.03719

Accelerated Decentralized Constraint-Coupled Optimization: A Dual$^2$ Approach

In this paper, we focus on a class of decentralized constraint-coupled optimization problems: $\min_{x_i \in \mathbb{R}^{d_i}, i \in \mathcal{I}; y \in \mathbb{R}^p}$ $\sum_{i=1}^n\left(f_i(x_i) + g_i(x_i)\right) + h(y) \ \text{s.t.} \ \sum_{i=1}^{n}A_ix_i = y$, over an undirected and connected network of $n$ agents. Here, $f_i$, $g_i$, and $A_i$ represent private information of agent $i \in \mathcal{I} = \{1, \cdots, n\}$, while $h$ is known to all agents. Building on a novel dual$^2$ approach, we develop two accelerated algorithms to solve this problem: the inexact Dual$^2$ Accelerated (iD2A) gradient method and the Multi-consensus inexact Dual$^2$ Accelerated (MiD2A) gradient method. We demonstrate that both iD2A and MiD2A can guarantee asymptotic convergence under a milder condition on $h$ compared to existing algorithms. Furthermore, under additional assumptions, we establish linear convergence rates and derive significantly lower communication and computational complexity bounds than those of existing algorithms. Several numerical experiments validate our theoretical analysis and demonstrate the practical superiority of the proposed algorithms.


[89] 2510.04584

Robustness assessment of large audio language models in multiple-choice evaluation

Recent advances in large audio language models (LALMs) have primarily been assessed using a multiple-choice question answering (MCQA) framework. However, subtle changes, such as shifting the order of choices, lead to substantially different results. Existing MCQA frameworks do not account for this variability and report a single accuracy number per benchmark or category. We examine the MCQA evaluation framework in depth and conduct a systematic study spanning three benchmarks (MMAU, MMAR and MMSU) and four models: Audio Flamingo 2, Audio Flamingo 3, Qwen2.5-Omni-7B-Instruct, and Kimi-Audio-7B-Instruct. Our findings indicate that models are sensitive not only to the ordering of choices, but also to the paraphrasing of the question and the choices. Finally, we propose a simpler evaluation protocol and metric that account for subtle variations and provide a more detailed evaluation report of LALMs within the MCQA framework.
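One way to report order sensitivity instead of a single accuracy is to count a question as solved only if the model answers correctly under every permutation of its choices. The metric below is an illustrative assumption, not necessarily the protocol the paper proposes, and `answer_fn` stands for a hypothetical wrapper around an LALM call. The degenerate position-biased model scores perfectly in the standard order but zero under the permutation-consistent metric.

```python
from itertools import permutations

def consistent_accuracy(items, answer_fn):
    """Fraction of (question, choices, correct) items answered correctly under
    every ordering of the choices."""
    solved = 0
    for question, choices, correct in items:
        if all(answer_fn(question, list(order)) == correct
               for order in permutations(choices)):
            solved += 1
    return solved / len(items)

def first_option_model(question, choices):
    """Degenerate stand-in with pure position bias."""
    return choices[0]

items = [("Which instrument is playing?", ["piano", "violin", "drums"], "piano")]
standard_order = sum(first_option_model(q, c) == a for q, c, a in items) / len(items)
print(standard_order, consistent_accuracy(items, first_option_model))  # 1.0 0.0
```

With k choices this costs k! model calls per question, so in practice one would sample orderings or use cyclic rotations.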


[90] 2510.22022

Control of neural field equations with step-function inputs

Wilson-Cowan and Amari-type models capture nonlinear neural population dynamics, providing a fundamental framework for modeling how sensory and other exogenous inputs shape activity in neural tissue. We study the controllability properties of Amari-type neural fields subject to piecewise-constant and constant-in-time inputs. The model describes the time evolution of the polarization of neural tissue within a spatial continuum, with synaptic interactions represented by a convolution kernel. We study the synthesis of piecewise-constant and constant-in-time inputs to achieve two-point boundary-type control objectives, namely, steering neural activity from an initial state to a prescribed target state. This approach is particularly relevant for predicting the emergence of paradoxical neural representations, such as discordant visual illusions that occur in response to overt sensory stimuli. We first present a control synthesis based on the Banach fixed-point theorem, which yields an iterative construction of a constant-in-time input under minimal regularity assumptions on the kernel and transfer function; however, it exhibits practical limitations, even in the linear case. To overcome these challenges, we then develop a generic synthesis framework based on the flow of neural dynamics drift, enabling explicit piecewise-constant and constant-in-time inputs. Extensive numerical results in one and two spatial dimensions confirm the effectiveness of the proposed syntheses and demonstrate their superior performance compared to inputs derived from naive linearization at the initial or target states when these states are not equilibria of the drift dynamics. By providing a mathematically rigorous framework for controlling Amari-type neural fields, this work advances our understanding of nonlinear neural population control with potential applications in computational neuroscience, psychophysics, and neurostimulation.
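A minimal discretization of the Amari-type model, u_t = -u + (w * f(u)) + I on a periodic 1D domain, makes the objects concrete: the convolution with kernel w is evaluated by FFT, f is a sigmoid firing-rate function, and I is a constant-in-time input. The kernel shape, sigmoid parameters, and grid are hypothetical. The only input constructed here is the naive one that makes the target an equilibrium of the drift; starting from the target the state then stays put, but nothing guarantees reaching the target from another initial state in finite time, which is the two-point problem the paper's syntheses address.

```python
import numpy as np

N, L = 256, 20.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = L / N

def kernel(z, a=2.0, s1=1.0, b=1.0, s2=2.0):
    """Hypothetical Mexican-hat connectivity: local excitation, broader inhibition."""
    return a * np.exp(-z ** 2 / (2 * s1 ** 2)) - b * np.exp(-z ** 2 / (2 * s2 ** 2))

W_hat = np.fft.fft(np.fft.ifftshift(kernel(x))) * dx  # periodic convolution via FFT

def rate(u, beta=5.0, theta=0.5):
    """Sigmoid transfer function f."""
    return 1.0 / (1.0 + np.exp(-beta * (u - theta)))

def drift(u):
    """Amari drift without input: -u + (w * f(u))(x)."""
    return -u + np.real(np.fft.ifft(W_hat * np.fft.fft(rate(u))))

def simulate(u0, I, T=5.0, dt=0.01):
    """Forward-Euler integration with a constant-in-time input I(x)."""
    u = u0.copy()
    for _ in range(int(T / dt)):
        u = u + dt * (drift(u) + I)
    return u

u_target = np.exp(-x ** 2)   # hypothetical target activity bump
I_eq = -drift(u_target)      # naive constant input: target becomes an equilibrium
u_end = simulate(u_target, I_eq)
print(np.max(np.abs(u_end - u_target)))
```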


[91] 2512.05337

Symmetric Linear Dynamical Systems are Learnable from Few Observations

We consider the problem of learning the parameters of an $N$-dimensional stochastic linear dynamical system under both full and partial observations from a single trajectory of length $T$. We introduce and analyze a new estimator that achieves a small maximum element-wise error in recovering symmetric dynamics matrices using only $T=\mathcal{O}(\log N)$ observations, irrespective of whether the matrix is sparse or dense. The estimator is based on the method of moments and does not rely on problem-specific regularization, which is especially important for applications such as structure discovery.
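For orientation, here is the textbook moment-matching (Yule-Walker-style) estimator for fully observed dynamics $x_{t+1} = A x_t + w_t$. It equates the lag-1 moment $\mathbb{E}[x_{t+1} x_t^\top] = A\,\mathbb{E}[x_t x_t^\top]$ and then projects onto symmetric matrices. This is a baseline under stated assumptions (full observations, Gaussian noise, $T \gg N$ so the empirical covariance is invertible), not the paper's estimator, whose point is precisely to work with $T=\mathcal{O}(\log N)$. The function names are hypothetical.

```python
import numpy as np

def simulate_lds(A, T, noise_std=1.0, seed=0):
    """Simulate x_{t+1} = A x_t + w_t with i.i.d. Gaussian noise, starting at x_0 = 0."""
    rng = np.random.default_rng(seed)
    X = np.zeros((T, A.shape[0]))
    for t in range(T - 1):
        X[t + 1] = A @ X[t] + noise_std * rng.standard_normal(A.shape[0])
    return X

def moment_estimate(X):
    """Match E[x_{t+1} x_t^T] = A E[x_t x_t^T], solve A_hat = C1 C0^{-1},
    then project onto symmetric matrices."""
    X0, X1 = X[:-1], X[1:]
    C0 = X0.T @ X0 / len(X0)                 # lag-0 empirical second moment
    C1 = X1.T @ X0 / len(X0)                 # lag-1 empirical cross moment
    A_hat = np.linalg.solve(C0, C1.T).T      # C1 @ inv(C0), using C0 = C0^T
    return 0.5 * (A_hat + A_hat.T)

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
S = 0.5 * (M + M.T)
A = 0.8 * S / np.max(np.abs(np.linalg.eigvalsh(S)))  # symmetric, spectral radius 0.8
X = simulate_lds(A, T=20000)
A_hat = moment_estimate(X)
max_err = np.max(np.abs(A_hat - A))
```

The need to invert an $N \times N$ empirical covariance is what forces $T$ to grow at least linearly in $N$ for this baseline, which illustrates why an $\mathcal{O}(\log N)$ sample requirement is notable.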


[92] 2603.22225

Adapting Self-Supervised Speech Representations for Cross-lingual Dysarthria Detection in Parkinson's Disease

The limited availability of dysarthric speech data makes cross-lingual detection an important but challenging problem. A key difficulty is that speech representations often encode language-dependent structure that can confound dysarthria detection. We propose a representation-level language shift (LS) that aligns source-language self-supervised speech representations with the target-language distribution using centroid-based vector adaptation estimated from healthy-control speech. We evaluate the approach on oral DDK recordings from Parkinson's disease speech datasets in Czech, German, and Spanish under both cross-lingual and multilingual settings. LS substantially improves sensitivity and F1 in cross-lingual settings, while yielding smaller but consistent gains in multilingual settings. Representation analysis further shows that LS reduces language identity in the embedding space, supporting the interpretation that LS removes language-dependent structure.
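A centroid-based language shift can be sketched as a pure translation in embedding space. Source-language embeddings are moved by the difference between the target- and source-language healthy-control (HC) centroids, so that the source HC centroid lands on the target HC centroid. The abstract does not specify the pooling, the encoder layer, or whether any rescaling is applied, so the utterance-level vectors, the function name `language_shift`, and the synthetic data below are assumptions for illustration.

```python
import numpy as np

def language_shift(src_emb, src_hc_emb, tgt_hc_emb):
    """Translate source-language embeddings by the difference between the
    target- and source-language healthy-control centroids (assumed form)."""
    delta = tgt_hc_emb.mean(axis=0) - src_hc_emb.mean(axis=0)
    return src_emb + delta

rng = np.random.default_rng(0)
dim = 16                                           # hypothetical embedding size
src_hc = rng.normal(0.0, 1.0, (50, dim)) + 2.0     # stand-ins for source-language HC embeddings
tgt_hc = rng.normal(0.0, 1.0, (40, dim)) - 1.0     # stand-ins for target-language HC embeddings
src_all = rng.normal(0.0, 1.0, (80, dim)) + 2.0    # stand-ins for all source recordings (HC and PD)
shifted = language_shift(src_all, src_hc, tgt_hc)
```

Because the shift is estimated only from healthy controls, any offset that separates dysarthric from healthy speech within the source language is preserved, while the language-level offset is removed.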


[93] 2604.00748

Optimal Sampling and Actuation for Real-Time Monitoring of Markov Sources

This paper studies efficient data management and timely information dissemination for real-time monitoring of an N-state Markov process, with the objective of enabling accurate state estimation and reliable actuation decisions. We analyze the real-time reconstruction error and the Age of Incorrect Information (AoII), and derive closed-form expressions for their time-averaged values under several sampling and transmission policies. We then formulate and solve constrained optimization problems to minimize the time-averaged reconstruction error and the average AoII under a time-averaged sampling frequency constraint. The resulting optimal sampling and transmission policies are compared to identify the conditions under which each policy is most effective. We further show that directly using the reconstructed state for actuation can degrade system performance, especially when the receiver is uncertain about the state estimate or when actuation is costly. These findings reveal that accurate state estimation alone does not necessarily lead to effective actuation, highlighting the importance of incorporating uncertainty into the decision-making process. To address this issue, we introduce a cost function, termed the Cost of Actions under Uncertainty (CoAU), which characterizes correct and incorrect actuation decisions under receiver-side uncertainty. We propose a randomized actuation policy and derive a closed-form expression for the probability of a correct actuation decision, defined as the event in which the CoAU equals zero. Finally, we formulate an optimization problem to find the optimal randomized actuation policy that maximizes this probability. The results show that the resulting policy substantially reduces incorrect actuator actions.
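The two performance measures can be illustrated by simulation. The sketch assumes a discrete-time $N$-state Markov source, a randomized sampling policy that samples in each slot with probability `p_sample`, zero-delay error-free delivery, and a receiver that holds the last received sample. Under these assumptions, the reconstruction error is the fraction of slots with a wrong estimate, and the AoII counts consecutive slots since the estimate was last correct (linear penalty). These policy and channel assumptions and the name `simulate_monitoring` are illustrative; the paper derives closed-form expressions for its own policies, which this sketch does not reproduce.

```python
import numpy as np

def simulate_monitoring(P, p_sample, T, seed=0):
    """Simulate an N-state Markov source with transition matrix P, randomized sampling
    with probability p_sample per slot, and zero-delay delivery (assumed).
    Returns (time-averaged reconstruction error, time-averaged AoII)."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    x, x_hat = 0, 0          # source state and receiver estimate, both start in state 0
    aoii = 0                 # slots since the estimate was last correct
    err_sum, aoii_sum = 0, 0
    for _ in range(T):
        x = rng.choice(n, p=P[x])       # source transition
        if rng.random() < p_sample:     # sample and deliver immediately
            x_hat = x
        if x != x_hat:
            aoii += 1
            err_sum += 1
        else:
            aoii = 0
        aoii_sum += aoii
    return err_sum / T, aoii_sum / T

P = np.array([[0.9, 0.05, 0.05],
              [0.05, 0.9, 0.05],
              [0.05, 0.05, 0.9]])
err_half, aoii_half = simulate_monitoring(P, p_sample=0.5, T=20000)
err_full, aoii_full = simulate_monitoring(P, p_sample=1.0, T=2000)
```

The AoII is never smaller than the error indicator, since every incorrect slot contributes at least 1, so the time-averaged AoII weights long runs of stale information more heavily than the reconstruction error does.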


[94] 2606.05367

Task-Vector Arithmetic for Emotional Expressivity Control in Language-Model-Based Text-to-Speech

We investigate whether task-vector arithmetic, which has been successful for cross-speaker emotional intensity control in modular text-to-speech (TTS), transfers to large-scale TTS systems built on language-model backbones with in-context learning (LM-TTS). Through a systematic elimination study on Qwen3-TTS-12Hz-1.7B over four progressively narrower operands (model weights via LoRA fine-tuning, continuous codec embeddings, discrete codec tokens, and the speaker embedding, or x-vector, produced by an ECAPA-TDNN encoder jointly trained with the synthesis backbone), we localize the dominant carrier of emotional prosody to the x-vector. Building on this finding, we propose a training-free method based on centroid arithmetic in x-vector space: an emotion direction $\tau = \mathbb{E}_i[x(s_i,\text{emo})] -\mathbb{E}_i[x(s_i,\text{neutral})]$ applied to an unseen target speaker as $x_{\text{new}} = x(\text{target},\text{neutral}) + \alpha\cdot\tau$. Using ESD (English) as the $\tau$ source and emoUERJ (Brazilian Portuguese) as a cross-lingual ground-truth target, we observe average gains of $+0.29$ in emotion2vec cosine similarity over the ICL baseline on English held-out speakers and $+0.09$ on Brazilian Portuguese held-out speakers, while largely preserving identity (WavLM SECS $\gtrsim 0.88$ for the multi-speaker $\tau$ variant) and intelligibility (WER $\approx 0$ in PT-BR). These results offer initial evidence that the dominant carrier of emotional prosody in this class of models can be localized, by elimination, to the co-trained speaker embedding, and that training-free centroid arithmetic in this space remains effective even under cross-lingual transfer.
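The two formulas above translate directly into array operations. In the sketch below, random arrays stand in for ECAPA-TDNN x-vectors, the dimension of 192 is an arbitrary choice, and how $x_{\text{new}}$ is fed back into the TTS model is not shown or assumed. The function names are hypothetical.

```python
import numpy as np

def emotion_direction(emo_xvecs, neutral_xvecs):
    """tau = E_i[x(s_i, emo)] - E_i[x(s_i, neutral)]; rows are x-vectors of the
    source speakers s_i in the emotional and neutral conditions."""
    return np.mean(emo_xvecs, axis=0) - np.mean(neutral_xvecs, axis=0)

def steer_xvector(target_neutral, tau, alpha):
    """x_new = x(target, neutral) + alpha * tau."""
    return target_neutral + alpha * tau

rng = np.random.default_rng(0)
dim = 192                                   # arbitrary size for this sketch
neutral = rng.standard_normal((10, dim))    # stand-ins for x(s_i, neutral)
emo = neutral + 0.3                         # stand-ins for x(s_i, emo)
tau = emotion_direction(emo, neutral)
target = rng.standard_normal(dim)           # stand-in for x(target, neutral)
x_new = steer_xvector(target, tau, alpha=1.5)
```

Setting $\alpha = 0$ recovers the neutral target embedding, and $\alpha$ scales the emotional intensity along a single direction estimated without any training.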


[95] 2606.15834

AIChilles: Automatically Uncovering Hidden Weaknesses in AI-Evolved Systems

The computer systems community has recently seen growing interest in AI-driven system evolution, where AI agents iteratively rewrite systems. Frameworks such as AdaEvolve and Engram report 12-60% score improvements over human-designed algorithms. While these results are promising, there are practical concerns that these AI-evolved programs may perform worse on unseen workloads and exhibit scalability regressions. Given the speed and scale of AI-generated code, we need automated mechanisms to identify such hidden weaknesses in AI-evolved systems programs. To this end, we develop AIChilles, which takes as input a baseline program $P$ and an AI-evolved program $P'$ and searches for valid workloads where $P'$ regresses relative to $P$ in correctness, runtime, memory usage, or output quality. To tackle the diversity of system applications, weakness types, and potential bugs, AIChilles combines deterministic workload-parameter extraction, agent-based constraint inference, differential oracles, and code-frequency coverage to discover diverse failures. Across five system applications and 30 AI-evolved programs, AIChilles finds 49 distinct hidden weaknesses. We also show that explicitly including AIChilles in the AI-driven development lifecycle can mitigate several of these weaknesses.
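The differential-oracle component can be illustrated in isolation. Run the baseline $P$ and the evolved $P'$ on the same workloads and flag crashes, output mismatches, and slowdowns. In the sketch, random list generation stands in for AIChilles' workload search (parameter extraction, constraint inference, and coverage guidance are not modeled). `differential_check`, `baseline_sort`, and the deliberately buggy `evolved_sort` are hypothetical names, and runtime findings on tiny inputs are filtered by an assumed minimum duration because wall-clock timings are noisy.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Finding:
    workload: list
    kind: str        # "crash", "correctness", or "runtime"
    detail: str

def differential_check(P: Callable[[list], list], P_prime: Callable[[list], list],
                       workloads: Iterable[list], slowdown: float = 2.0,
                       min_seconds: float = 1e-3) -> List[Finding]:
    """Run baseline P and evolved P' on each workload; report crashes, output
    mismatches, and slowdowns above `slowdown` (ignoring runs under min_seconds)."""
    findings = []
    for w in workloads:
        start = time.perf_counter()
        expected = P(list(w))
        t_base = time.perf_counter() - start
        start = time.perf_counter()
        try:
            actual = P_prime(list(w))
        except Exception as exc:
            findings.append(Finding(list(w), "crash", repr(exc)))
            continue
        t_new = time.perf_counter() - start
        if actual != expected:
            findings.append(Finding(list(w), "correctness", f"{actual!r} != {expected!r}"))
        elif t_new > min_seconds and t_new > slowdown * t_base:
            findings.append(Finding(list(w), "runtime", f"{t_new:.4f}s vs {t_base:.4f}s"))
    return findings

def baseline_sort(xs):
    return sorted(xs)

def evolved_sort(xs):
    # Hypothetical "optimized" variant that silently drops duplicates.
    return sorted(set(xs))

rng = random.Random(0)
workloads = [[rng.randint(0, 9) for _ in range(rng.randint(0, 8))] for _ in range(200)]
findings = differential_check(baseline_sort, evolved_sort, workloads)
bugs = [f for f in findings if f.kind == "correctness"]
```

Here the oracle flags exactly the workloads containing duplicate values, a weakness that a benchmark of distinct keys would never expose, which is the kind of hidden regression the abstract describes.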