New articles on Electrical Engineering and Systems Science


[1] 2605.28901

Identifiability of Low Frequency Li-ion Battery Parameters in Time Domain

This paper investigates the identification of observable low-frequency (LF) parameters of battery cell equivalent circuit models (ECMs) using time-domain voltage and current measurements sampled at low frequency by built-in battery management systems (BMS) during operation. Accurate estimation of such parameters is challenging due to the limited measurement resolution available in practical settings. To address this, a modeling and identification framework is proposed in which the fractional constant phase element (CPE), commonly used to model LF diffusion phenomena of battery cells, is approximated in the time domain using a high-order RC network with a recursive definition. The parameter estimation problem is formulated as a constrained, non-convex least-squares problem in a discretized state-space representation. To improve robustness, parameter initialization strategies, bounds, and a procedure for selecting the number of RC branches are rigorously derived. The method is evaluated in a numerical study based on a power system application where the battery under study provides primary frequency control to the grid. Under noise levels representative of typical BMS measurements, the proposed approach achieves accurate LF parameter estimation (including the CPE) from time-domain measurements, with average errors below 1%.
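As a rough illustration of the discretized state-space form mentioned in the abstract, the sketch below simulates an ECM made of a series resistance plus several parallel RC branches, using the exact zero-order-hold update for each branch. The function name `simulate_rc_ecm`, the geometric spacing of the time constants, and all numeric values are assumptions chosen for illustration; the paper's recursive CPE definition is not reproduced here.

```python
import numpy as np

def simulate_rc_ecm(current, dt, r0, r_branch, tau_branch):
    """Overpotential of an ECM with series resistance r0 and parallel RC branches.

    Each branch state follows the exact zero-order-hold update
    v[k+1] = a * v[k] + r * (1 - a) * i[k], with a = exp(-dt / tau).
    """
    r_branch = np.asarray(r_branch, dtype=float)
    tau_branch = np.asarray(tau_branch, dtype=float)
    a = np.exp(-dt / tau_branch)
    v = np.zeros_like(r_branch)              # branch voltages start at rest
    out = np.empty(len(current))
    for k, i_k in enumerate(current):
        out[k] = r0 * i_k + v.sum()          # output equation
        v = a * v + r_branch * (1.0 - a) * i_k   # state update
    return out

# Hypothetical parameters: 4 branches with geometrically spaced time constants.
tau = 10.0 * 3.0 ** np.arange(4)             # 10 s, 30 s, 90 s, 270 s
r = np.full(4, 1e-3)                         # 1 mOhm per branch
i_step = np.ones(20000)                      # 1 A step sampled at 1 Hz
v_over = simulate_rc_ecm(i_step, dt=1.0, r0=2e-3, r_branch=r, tau_branch=tau)
```

Under a constant current, the simulated overpotential starts at `r0 * i` and settles at `(r0 + sum(r_branch)) * i`, which gives a quick sanity check on any fitted parameter set.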


[2] 2605.28992

FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder

Media compression standards have reached a plateau in terms of the rate-distortion-complexity trade-off, limiting the ability to offload expensive AI perception to the cloud in applications like robotics, wearables, and remote sensing. DNN-based codecs improve compression efficiency, but at a cost: they cannot easily adapt to large changes in available bitrate, and real-time encoding requires expensive, power-hungry GPUs that prohibit use on low-cost or resource-constrained platforms. To address these limitations, we propose a novel autoencoding framework (FRAPPE) that uses the Full input to predict the Residual output via a Projection Pursuit Encoder. FRAPPE's encoding objective naturally sorts latent channels by importance, allowing zero-overhead variable-rate coding. Unlike RNN-based learned codecs, whose encoder consumes the previous reconstruction's residual, or RVQ-style codecs, whose codebooks must be applied sequentially, FRAPPE's analysis path is an embarrassingly parallel DAG of independent input projections. Using FRAPPE, we build a variable-rate RGB image codec (FRAPPE-Image), and evaluate its rate-distortion-complexity trade-off against standard image codecs. At high compression ratios (approx. 0.1 bpp) FRAPPE-Image provides higher perceptual quality than AVIF with 47 times faster encoding, making it capable of real-time 1080p, 30fps CPU-only encoding. Our code and pre-trained models are available: this https URL .


[3] 2605.29003

Tensorized Radiative Heat Transfer for a Scalable and Calibrated Building Energy Simulator

Accurate building energy simulation is essential for developing advanced control strategies that enable demand flexibility and grid responsiveness. The Smart Buildings Control Suite (sbsim) offers a lightweight, scalable, and data-calibrated simulation environment based on a tensorized finite difference model. Previous work extended sbsim to include interior long-wave radiative heat exchange between indoor surfaces. However, a complete thermal model must also account for exterior radiative processes, including long-wave radiation exchange with the sky and surroundings, as well as short-wave solar radiation incident on building surfaces. This paper presents a comprehensive radiative heat transfer implementation for sbsim that integrates both interior and exterior radiation mechanisms. Our primary contribution is the development and integration of a fully tensorized exterior radiation module that captures sky and ground long-wave exchange as well as solar heat gains through opaque and transparent surfaces. By formulating these processes as tensor operations compatible with the existing framework, we preserve the computational efficiency necessary for reinforcement learning applications. We validate our implementation against established simulation tools and demonstrate improved prediction accuracy for surface temperatures and building thermal loads. This enhancement significantly increases the physical fidelity of sbsim, enabling more realistic training environments for building energy optimization and control.


[4] 2605.29013

Local Observability and Moving Horizon Estimation-based Training of Feedforward Neural Networks

In this paper, we propose a moving horizon estimation (MHE)-based training method for feedforward neural networks (FNNs) with rectified linear unit (ReLU) activation functions to determine their ideal weights from a control-theoretic perspective. This allows for a rigorous theoretical analysis of the trained network. First, we reformulate the FNN as a dynamical system with the weights as states. Then, we investigate the local observability of such a system. For two-layer FNNs with fixed output weights, we derive a sufficient condition under which the observability rank condition holds, ensuring a locally observable state. We also show that multi-layer FNNs in general fail to satisfy the observability rank condition. Based on this analysis, we develop a persistently exciting (PE) input design method, which renders a state distinguishable from its neighbors. The resulting local observability provides convergence guarantees for the proposed MHE-based training, where only the projection of the state onto the observable subspace is updated using a fixed-length window of input-output data. The effectiveness of the approach is illustrated via numerical examples.


[5] 2605.29053

Grid Capacity Expansion under Data Centers and Electrified Manufacturing Large Loads

In this paper, we consider the expansion of power grids under emerging large loads from data centers and electrified manufacturing. We develop a multi-period grid capacity expansion model to determine optimal investment profiles for power generation, storage, and transmission capacity while accounting for hourly power dispatch, such that electricity demand is satisfied and the total planning and operation cost is minimized. We also propose a new modeling approach regarding the spatial distribution of demand from large loads. The model is used to analyze the expansion of a synthetic grid that follows key characteristics of the ERCOT system over a seven-year planning horizon, under loads from data centers and electrified oil refining, which account for 17.5% and 4.7% of total annual electricity demand by the end of the planning horizon. The optimal investment policy leads to an 83.6% increase in generation capacity and exploits the short construction times of solar and storage as well as the operational flexibility of thermal generators. Finally, sensitivity analysis reveals that the construction time of grid assets substantially impacts investment timing, generation technology mix, and transmission capacity expansion. The proposed modeling framework is general and can be extended to other grid systems, enabling the exploration of diverse demand scenarios, policy assumptions, and regional characteristics.


[6] 2605.29063

Accelerating HEVC Intra Partitioning via a CNN-Hierarchical Attention Transformer Hybrid

The recursive quad-tree partitioning in High Efficiency Video Coding (HEVC) incurs considerable computational overhead, with exhaustive rate-distortion optimization for CTU partition prediction consuming the dominant share of encoding time. Although partition prediction through deep learning has emerged as a viable encoding accelerator, an architectural dichotomy remains largely unaddressed: CNNs are computationally efficient but spatially myopic due to their localized effective receptive fields, failing to capture long-range semantic relationships and repetitive textures; conversely, transformer-based architectures are better at capturing global context but incur prohibitive CPU latency, a critical liability in encoder deployment, which is predominantly CPU-bound. This paper introduces Hybrid Fast Vision Transformer (HFViT), a hybrid architecture designed to accelerate HEVC intra-mode partition prediction. HFViT fuses a reparameterized depthwise-separable convolutional backbone with a Hierarchical Attention Transformer (HAT) mechanism, leveraging a carrier token scheme to enable efficient global information propagation at sub-quadratic complexity. Post-training structural fusion collapses batch normalization into preceding layers to further reduce latency. Comprehensive evaluation demonstrates the efficacy of HFViT in accelerating HEVC intra-encoding across resolutions. On standard JCT-VC test sequences, HFViT reduces the average VMAF BD-rate penalty by 2.4, 2.6, and 7.9 percentage points on Classes A, B, and E, respectively, compared to the competing ETH-CNN baseline, while maintaining CPU inference latency within 8% of the CNN baseline and surpassing it on GPU by 40%, establishing practical viability for real-time encoder integration.
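The batch-normalization fusion mentioned in the abstract can be illustrated with the standard inference-time folding identity. The sketch below applies it to a plain linear layer with NumPy; for a convolution the same per-output-channel scaling applies to the kernel. The function name `fold_batchnorm` and the random test values are assumptions for illustration, not taken from the HFViT code.

```python
import numpy as np

def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (W x + b - mean) / sqrt(var + eps) + beta into one affine layer."""
    scale = gamma / np.sqrt(var + eps)       # one factor per output channel
    return weight * scale[:, None], (bias - mean) * scale + beta

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)

x = rng.normal(size=3)
# Reference: linear layer followed by batch normalization in inference mode.
reference = gamma * (W @ x + b - mean) / np.sqrt(var + 1e-5) + beta
# Fused: a single linear layer with rescaled weights and shifted bias.
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
fused = W_f @ x + b_f
```

Because the folded layer is algebraically identical to the original pair, the fusion removes one elementwise pass per layer at inference time without changing the output.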


[7] 2605.29085

Dimming Space-Time Code (DSTC) for Visible Light Communication with Semi-Blind Detection

Visible light communication (VLC) provides a unified framework for wireless data transmission and illumination, but its practical deployment requires transmission schemes that jointly satisfy communication and lighting constraints. In color-shift keying (CSK) systems, dimming remains a challenging and underexplored problem because the average optical power must be controlled without altering the perceived chromaticity. This paper proposes a dimming space-time code (DSTC) for CSK-based VLC systems, where a structured dimming matrix introduces controlled temporal power variations while satisfying physical feasibility, color preservation, and identifiability conditions. Two receiver architectures are developed: a pilot-assisted zero-forcing (ZF) receiver and a tensor-based semi-blind PARAFAC receiver that jointly estimates the channel and transmitted symbols using only one training time slot. Simulation results show that the proposed DSTC provides diversity gains and substantial BER reductions with respect to conventional CSK, while the tensor-based receiver improves spectral efficiency by reducing training overhead, with particular benefits in large-scale MIMO configurations.


[8] 2605.29104

Energy-Optimal Thermal Management of Heat-Pump Battery Electric Vehicles

This paper presents an energy-optimal hybrid control framework for thermal management of heat-pump battery electric vehicles (BEVs). The controller coordinates the compressor, coolant pumps, and cabin blower across the coupled refrigerant, coolant, and air loops, while enforcing cabin comfort and component temperature constraints. The framework combines a rule-based supervisory layer, which handles discrete system configuration, with a continuous nonlinear model predictive control (NMPC) optimizer that minimizes thermal energy consumption over a finite prediction horizon. A control-oriented model is developed to capture the dominant dynamics of the cabin, refrigerant loop, reconfigurable coolant circuits, and key thermal masses including the battery, motor, and inverter. The model is validated against a high-fidelity reference, achieving a mean absolute temperature prediction error below 1.8 °C for key thermal states including the battery, motor, and cabin air temperature, while reducing simulation time by approximately 85%. The terminal cost is computed by linearizing the system about a quasi-steady operating point and solving the discrete-time algebraic Riccati equation, ensuring well-conditioned optimization across varying operating conditions. The proposed framework is evaluated against the built-in rule-based controller of the MathWorks Simscape "Electric Vehicle Thermal Management with Heat Pump" model under cold-climate extended driving conditions, demonstrating consistent reductions of 20-28% in thermal energy consumption across all tested scenarios. The complete implementation, developed using the open-source CasADi framework, is made openly available at this https URL (GitHub repository) to support reproducibility and further development.
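The terminal-cost step described in the abstract (linearize, then solve the discrete-time algebraic Riccati equation) can be sketched as follows. The matrices `A`, `B`, `Q`, and `R` below are a hypothetical double-integrator example, not the vehicle model, and the fixed-point iteration in `solve_dare_iterative` is a simple stand-in chosen to keep the example NumPy-only; it is not claimed to be the solver used in the paper.

```python
import numpy as np

def solve_dare_iterative(A, B, Q, R, iters=5000, tol=1e-12):
    """Fixed-point (value) iteration on the discrete-time algebraic Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)   # LQR feedback gain for current P
        P_next = Q + A.T @ P @ (A - B @ K)          # Riccati recursion step
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

# Hypothetical linearization about an operating point (double integrator, dt = 0.1 s).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_dare_iterative(A, B, Q, R)

# Residual of the Riccati equation; should be close to zero at the solution.
residual = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A) - P

# Terminal cost x^T P x for a deviation x from the operating point.
x = np.array([1.0, 0.0])
terminal_cost = float(x @ P @ x)
```

Where SciPy is available, `scipy.linalg.solve_discrete_are(A, B, Q, R)` computes the same matrix directly and is usually the more robust choice.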


[9] 2605.29163

BCER Agent: Reliable Long-Horizon MRI Workflow Execution via Compilation, Artifact Binding, and Bounded Local Recovery

Many recent medical VLM and agent studies are benchmarked on 2D images or comparatively short tool-calling exchanges, whereas real MRI analysis typically demands long, interdependent pipelines that operate on 3D/4D volumetric data. Under these conditions, reactive tool-calling agents are prone to cascading breakdowns triggered by faulty intermediate references, mismatched tool arguments, and limited control over cross-step dependencies. To address this, we introduce BCER (Brain-Cerebellum-Extremity-Reflector), a controller architecture aimed at dependable long-horizon MRI workflow execution. BCER decouples high-level planning from execution and provides bounded local recovery. We assess BCER on a multi-organ MRI benchmark covering brain, prostate, and cardiac tasks with both short- and long-chain workflows, using matched task contracts across controller variants and several backbone models. Relative to reactive baselines, BCER yields consistent improvements in end-to-end execution, with the most pronounced gains observed on long-chain workflows. BCER additionally enables auditability by maintaining explicit links between final outputs and intermediate artifacts and measurements. Code and benchmark are released at this https URL.


[10] 2605.29164

Low-Complexity Tensor-Based Monostatic Sensing for IRS-Assisted Communication Systems

This paper proposes a tensor-based parameter estimation algorithm for sensing in an intelligent reflecting surface-assisted system. We present a higher-order singular value decomposition-based solution that exploits the tensor structure of the received echo signal to jointly estimate the target's delay, Doppler, and angular information. Our tensor-based solution can estimate the parameters individually at low complexity, benefiting from parallel computation. Complexity analysis is carried out in comparison with a baseline scheme that does not exploit the intrinsic multilinear structure of the sensed signal. Simulation results show that our proposed tensor-based method can achieve the same performance as the reference method while drastically reducing the computational complexity.
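For readers unfamiliar with the higher-order singular value decomposition (HOSVD) the abstract builds on, a minimal NumPy sketch is given below. It shows only the generic decomposition, in which one SVD is computed per tensor mode; the function names and the random test tensor are assumptions for illustration, and the paper's signal model and parameter extraction steps are not reproduced.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: one SVD per mode, then project to obtain the core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])             # leading left singular vectors
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.conj().T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from its core and factor matrices."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T

# Hypothetical test tensor with multilinear rank (2, 2, 2).
rng = np.random.default_rng(0)
core0 = rng.normal(size=(2, 2, 2))
factors0 = [rng.normal(size=(n, 2)) for n in (4, 5, 6)]
T = reconstruct(core0, factors0)

core, factors = hosvd(T, ranks=(2, 2, 2))
T_hat = reconstruct(core, factors)
```

The per-mode SVDs do not depend on each other, so they can be computed in parallel, which is consistent with the low-complexity claim in the abstract.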


[11] 2605.29171

PARAFAC-Based Time-Varying Channel Estimation for IRS-Aided Communications

This paper proposes a tensor-based parametric channel estimation technique for IRS-assisted communication systems with time-varying channel parameters. We exploit the multidimensional structure of the received signal by developing a third-order PARAFAC tensor model that is solved using the iterative alternating least squares (ALS) algorithm. Our simulation results show that, by capitalizing on the intrinsic tensor structure of the received signal, the proposed approach achieves a lower NMSE of the concatenated channel than competing solutions without increasing the computational complexity of the channel estimation.
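The ALS procedure for a third-order PARAFAC model can be sketched generically as below: each factor matrix is updated in turn by a least-squares fit against the corresponding tensor unfolding while the other two are held fixed. This is a textbook version under assumed names (`khatri_rao`, `cp_als`) and a synthetic noise-free tensor; it does not include the IRS signal model or any structure specific to the paper.

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Kronecker product of X (I x R) and Y (J x R), giving (I*J x R)."""
    return (X[:, None, :] * Y[None, :, :]).reshape(-1, X.shape[1])

def cp_als(T, rank, iters=200, seed=0):
    """Fit T[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.normal(size=(n, rank)) for n in (I, J, K))
    T0 = T.reshape(I, J * K)                      # mode-1 unfolding
    T1 = np.moveaxis(T, 1, 0).reshape(J, I * K)   # mode-2 unfolding
    T2 = np.moveaxis(T, 2, 0).reshape(K, I * J)   # mode-3 unfolding
    errors = []
    for _ in range(iters):
        # Each update is the least-squares solution with the other factors fixed.
        A = T0 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = T1 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = T2 @ np.linalg.pinv(khatri_rao(A, B)).T
        fit = np.einsum('ir,jr,kr->ijk', A, B, C)
        errors.append(np.linalg.norm(T - fit) / np.linalg.norm(T))
    return A, B, C, errors

# Hypothetical noise-free rank-2 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.normal(size=(6, 2)), rng.normal(size=(5, 2)), rng.normal(size=(4, 2))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

A, B, C, errors = cp_als(T, rank=2)
```

Because every update solves its subproblem exactly, the relative fit error recorded in `errors` does not increase from one sweep to the next, although ALS offers no general guarantee of reaching the global optimum.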


[12] 2605.29191

Distributed Non-Uniform Scaling Control of Multi-Agent Formation with Dynamic Agent Joining

Non-uniform scaling control of formation enables multi-agent systems to adjust their shape by scaling with different ratios along different coordinate axes, offering enhanced flexibility in complex environments. However, like most existing formation maneuver strategies, it typically assumes a fixed set of agents, limiting its applicability in scenarios requiring dynamic team expansion. This paper introduces a distributed control framework that enables a formation to incorporate new agents during non-uniform scaling maneuvers in arbitrary dimensions while preserving the spectral properties of the graph Laplacian. Simulation examples validate the effectiveness of the theoretical results.


[13] 2605.29209

The WER Trap: Shattering the Illusion of Unified Tokens in Speech Language Models

The pursuit of a "unified" discrete token for both speech understanding and generation has led the Speech Language Model (SLM) community to heavily rely on Word Error Rate (WER) -- the core metric for Whisper-style tokenizers -- as the definitive proxy for representation quality. This fosters the assumption that low-WER tokens inherently preserve the information necessary for intelligible acoustic synthesis. We argue this is fundamentally deceptive. While high-frequency tokens succeed in generation tasks due to implicit information leakage, isolating pure semantic information at ultra-low frame rates strips away the fine-grained articulation and micro-dynamics essential for ODE-based generation. Empirically validating this requires extreme compression without sacrificing WER -- a methodological bottleneck, as standard fixed-stride downsampling arbitrarily truncates phonetic boundaries. To overcome this, we develop a dynamic compression tokenizer that intelligently aligns representations with semantic boundaries, achieving ultra-low frame rates with exceptionally low WER. Using these isolated "pure" semantic tokens, we expose the WER trap: when conditioning generative models -- even with oracle duration alignments -- the reconstructed speech suffers from severe articulation blur and is rendered acoustically unintelligible. Our findings demonstrate that semantic categorization rewarded by low WER is inherently orthogonal to the continuous phonetic trajectories required for synthesis, shattering the illusion of the unified token and advocating for explicitly decoupled speech representations.


[14] 2605.29227

Channel Estimation for Flexible Intelligent Metasurface Aided MIMO Communications

Flexible Intelligent Metasurfaces (FIMs) enable wireless systems to adapt their three-dimensional geometry through morphing, thereby providing new spatial degrees of freedom. However, continuous deformation complicates the accurate acquisition of Channel State Information (CSI). This work proposes a multidimensional framework for MIMO systems with active FIM arrays at both the transmitter and receiver. A split single-time-scale training protocol sequentially introduces spatial variation by morphing the receiver, then the transmitter. The resulting signal model is formulated as a PARAFAC decomposition, and an alternating least squares (ALS) algorithm is employed to estimate steering matrices and path gains. Our numerical results show that the proposed channel estimation method yields accurate CSI recovery for different system setups.


[15] 2605.29332

MIMO-OTFS-Based Semantic Communication for High-Mobility Scenarios

In high-mobility scenarios with time-frequency doubly-selective channels, existing semantic communication systems suffer significant performance degradation. To address this issue, we propose a semantic communication framework that synergistically integrates multiple-input multiple-output orthogonal time frequency space (MIMO-OTFS) with semantic-aware sub-channel allocation. First, an entropy module is employed to evaluate the importance of different semantic features, and the Kendall correlation coefficient is used to quantify the alignment between semantic importance and sub-channel conditions. Subsequently, joint optimization of the encoder and decoder is achieved through a comprehensive loss function that balances image classification accuracy, reconstruction quality, and sub-channel matching degree. Experimental results confirm the superior reconstruction quality of our proposed framework compared to conventional semantic communication systems based on orthogonal frequency division multiplexing in high-mobility channel environments.


[16] 2605.29364

Iterative Reduced-Rank MMSE Estimation of Sparse Range Profiles from Non-Contiguous Radar Transmission Spectra

Ongoing demand for radio spectrum by commercial wireless services has steadily increased pressure on the frequency bands traditionally reserved for radar. This paper addresses the joint problem of designing non-contiguous radar transmission spectra and estimating the range profile from the resulting reduced measurement set. Transmission spectra are constructed using a Marginal Fisher Information (MFI) criterion that removes blocks of frequencies contributing least to estimation accuracy. To process the underdetermined signals acquired from the resulting sparse measurement vector, an iterative Reduced-Rank Minimum Mean-Square Error (RRMMSE) estimator is proposed. The estimator starts with a single-target hypothesis and grows the active target subspace one range bin at a time, updating the a priori target covariance matrix in each iteration using both the largest estimated reflection coefficient and its posterior error variance. This avoids inversion of the full $M \times M$ covariance matrix that would be required by a one-step MMSE and concentrates the rank of the estimator on the support of significant scatterers. The Bayesian Cramér-Rao Lower Bound (CRLB) on the per-bin reflection coefficient is derived for the non-contiguous spectrum measurement model, and the computational complexity of the proposed estimator is shown to scale as $\mathcal{O}(G^2 M K^2)$, where $G$ is the number of detectable scatterers, $M$ is the number of range bins, and $K$ is the number of preserved spectral samples. Simulations using $50\%$ and $75\%$ spectrally occupied MFI-designed spectra confirm that the algorithm recovers sparse range profiles with Mean-Square Error (MSE) close to the fully filled baseline when the number of significant scatterers is not larger than the rank of the sparse sensing matrix.


[17] 2605.29385

Closed-Loop Identification of Periodically Time-Varying Systems via Cyclic Reformulation

This paper studies closed-loop identification of linear periodically time-varying (LPTV) plants, with emphasis on open-loop unstable plants for which open-loop experiments are not practically available. The central contribution is an exact algebraic plant-extraction theorem for cycled closed-loop realizations: for square strictly proper plants and a controller path satisfying an invertibility condition, the cycled plant transfer matrix is recovered from a shared state-space realization of the stable closed-loop maps from the external reference to the plant output and to the control input, without state augmentation, and without requiring the recovered plant realization to be stable. Thus, the stability requirement for data generation is shifted from the open-loop plant to the internally stable closed-loop system. Building on this result, a closed-loop identification algorithm is constructed that takes the reference, output, and input signals as data, applies standard subspace identification to the cycled signals, performs the algebraic plant extraction, and recovers the LPTV plant state-space parameters via a coordinate transformation; the conditioning of the inverse controller path governs the reliability of the extraction step. Numerical examples demonstrate the recovery of stable and open-loop unstable SISO LPTV plants and validate a MIMO case through coordinate-invariant Markov-parameter comparisons.


[18] 2605.29409

Decoupled Thrust-Axis Attitude Control Using Quaternions for Chandrayaan-3 Lunar Landing Mission

The Chandrayaan-3 mission achieved a historic milestone with its successful soft landing near the lunar south pole, highlighting the critical role of the navigation, guidance, and control (NGC) system. Navigation provided vehicle state estimates relative to the Moon center, while a polynomial-based guidance scheme computed the required acceleration profile to meet terminal landing conditions. This acceleration demand was translated into total thrust magnitude and attitude commands. Attitude command generation involved aligning the thrust axis with the required acceleration vector and constraining rotation about the thrust axis, typically governed by mission-specific requirements. Although quaternion-based control laws are preferred for their singularity-free representation, they inherently couple all three rotational axes. This coupling can lead to undesirable interactions between guidance and control, especially during large rotations about the thrust axis, due to the quaternion shortest-path property. This paper proposes a novel quaternion-based decoupling method that enables independent thrust-axis control, mitigating guidance-control interaction and ensuring proper attitude command generation for lander attitude control.
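One generic way to separate rotation about a chosen axis from the rest of a quaternion is the swing-twist decomposition, sketched below. It is shown only to illustrate what decoupling a thrust-axis rotation can look like; the abstract does not state that the paper uses this decomposition, and the body z-axis as thrust axis, the `[w, x, y, z]` storage order, and the sample quaternion are all assumptions.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    pw, pv = p[0], p[1:]
    qw, qv = q[0], q[1:]
    return np.concatenate(([pw * qw - pv @ qv], pw * qv + qw * pv + np.cross(pv, qv)))

def quat_conj(q):
    """Conjugate (inverse for unit quaternions)."""
    return np.concatenate(([q[0]], -q[1:]))

def swing_twist(q, axis):
    """Split unit quaternion q into swing * twist, with twist a rotation about `axis`."""
    axis = axis / np.linalg.norm(axis)
    proj = (q[1:] @ axis) * axis              # vector part projected onto the axis
    twist = np.concatenate(([q[0]], proj))
    norm = np.linalg.norm(twist)
    if norm < 1e-12:                          # 180-degree swing: twist is undefined
        twist = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        twist = twist / norm
    swing = quat_mul(q, quat_conj(twist))     # remaining rotation, axis orthogonal to `axis`
    return swing, twist

# Hypothetical attitude error and thrust axis (body z-axis assumed).
q = np.array([0.7, 0.1, -0.5, 0.4])
q = q / np.linalg.norm(q)
thrust_axis = np.array([0.0, 0.0, 1.0])

swing, twist = swing_twist(q, thrust_axis)
```

In this split, `swing` carries the pointing error of the thrust axis and `twist` carries the roll about it, so the two can be commanded or limited separately before being recombined as `quat_mul(swing, twist)`.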


[19] 2605.29412

Real-Time Retargeting Using Controllability Boundary for Chandrayaan-3 Lunar Landing

This paper presents the real-time retargeting guidance policy developed for the Chandrayaan-3 lunar landing mission. The baseline guidance generates approximate fuel-optimal descent trajectories, while a high-level policy enables safe retargeting to alternate sites when the nominal site becomes infeasible. The retargeting strategy leverages a convex representation of the controllability boundary, allowing rapid feasibility checks and real-time target updates. To the best of the authors' knowledge, this represents the first application of a data-driven retargeting framework in an operational lunar landing mission. Pre-flight simulations and Chandrayaan-3 flight results validate the effectiveness of the proposed approach.


[20] 2605.29415

Constructing efficient channels for ideal observers using the conjugate gradient method

Task-based assessment of image quality (IQ) is critically important for the design and optimization of medical imaging systems. Ideal observers, including the Bayesian Ideal Observer (IO) and the ideal linear observer, i.e., the Hotelling observer (HO), provide objective figures of merit (FOMs) that quantify system performance on signal detection tasks. However, the application of ideal observers to high-dimensional image data is often computationally intractable. Channel mechanisms provide an effective framework for dimensionality reduction that can facilitate the computation of ideal observers. This work presents a conjugate gradient (CG)-based method to construct efficient channels for approximating the IO and HO performance.


[21] 2605.29449

Column-Wise Analog Processing for Hybrid Precoding in Millimeter Wave Downlink Multi-User Massive MIMO Systems

In millimeter wave (mmWave) massive MIMO systems, existing alternating minimization (AltMin) based hybrid digital-analog precoding algorithms can achieve spectral efficiency (SE) close to that of fully-digital precoding. However, these AltMin algorithms require several hundred iterations to optimize the analog precoder, which increases complexity. This paper focuses on reducing the complexity of analog precoding and proposes a column-wise analog precoding (CWAP) algorithm. The main idea is to seek a closed-form solution for the analog precoder, so that it can be computed in one step rather than iteratively. Specifically, by assuming that a perfect digital precoder is deployed at the base station (BS) to eliminate interference among users, we simplify the expression of the achievable SE for each user. Subsequently, the simplified SE is converted to the sum of a series of sub-rates, each of which is related to the corresponding column of the analog precoder. The optimization problem of maximizing the SE is then transformed into a series of sub-problems of maximizing each sub-rate. Upon solving each sub-problem, a closed-form solution for each column of the analog precoder is obtained directly without iterations. Simulation results demonstrate that (a) when the number of RF chains equals the number of data streams, the proposed scheme achieves approximately the same sum-rate as the AltMin algorithms; (b) when the number of RF chains is larger than the number of data streams, the proposed scheme achieves a higher sum-rate than the AltMin algorithms; (c) the proposed scheme has lower complexity than the AltMin algorithms (almost one order of magnitude reduction in some cases).


[22] 2605.29481

Hybrid Digital and Analog Airy Beamforming for Near-Field Multi-User Communications

The demand for high data rates in 6G networks has driven the transition toward higher frequencies and larger antenna apertures, giving rise to near-field communications. In the near-field region, spherical waves enable beam focusing to enhance the received power. However, high-frequency focused beams are highly susceptible to ubiquitous obstacles due to their rectilinear trajectories. Particularly in multi-user communications with hybrid precoding, focused beamforming suffers from impaired spectral efficiency under potential multi-user link blockages. In this paper, we propose an Airy beamforming enabled multi-user transmission scheme. The near-field Airy wavefront with a bending trajectory is first developed to cope with the obstructed channels, possessing the dual capability of bypassing obstacles and concentrating energy. Moreover, a low-complexity Airy beamforming enabled multi-user communication scheme is designed. Specifically, Airy beams capable of circumventing obstacles and aligning with users are first obtained through hierarchical Airy beam training. Then, the selected Airy beams are leveraged to configure the analog beamformer to achieve multi-user obstacle-avoiding access without full channel state information acquisition. Finally, the digital beamformer is utilized to further mitigate inter-user interference. In simulations, the beam patterns demonstrate that the proposed Airy beamforming successfully circumvents blockages and aligns with multiple users. Across typical mmWave to THz bands, the proposed scheme outperforms conventional focused beamforming in terms of spectral efficiency.


[23] 2605.29527

Robustness Enhancement of Consensus Networks: the Optimal Memory Depth

Understanding what governs collective robustness and how it can be enhanced remains a central pursuit in network science. This paper investigates the robustness of multi-agent consensus networks, quantified by the $H_2$ performance metric, and delves into the enhancing effect of agents' local memory on it. Inspired by the hierarchical temporal structure of memory observed in neuroscience, we focus on the role of memory depth, which reflects the temporal features of memory from recent to remote. Building on linear extrapolation, we propose a consensus protocol with single-step memory and tunable memory depth, derive the necessary and sufficient condition for achieving consensus, and show that the protocol exhibits an inheritable consensus property across memory depths. Furthermore, analytical expressions for the $H_2$ performance metric, which depend on the memory factor, memory depth, coupling gain, and Laplacian spectrum, are established. Under balanced usage of real-time and memory information, we demonstrate that memory at any accessible depth enhances $H_2$ performance, and the optimal memory depth occurs at either the most recent or the most remote memory, contingent upon certain parameter regions. Further detailed discussions are provided to clarify the broader implications of our findings.


[24] 2605.29613

Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding

While LLM-based Automatic Speech Recognition (ASR) achieves high accuracy, its speed is limited by sequential autoregressive decoding. Diffusion Language Models (DLMs) offer a parallel alternative, yet their decoding strategies remain under-explored in ASR contexts. This paper analyzes three decoding schemes for DLM-based ASR: fixed-number, static confidence threshold, and dynamic confidence threshold. We propose measuring round-wise accuracy using Negative Log-Likelihood-based uncertainty as a proxy for decoding progress. Our results show that both threshold-based strategies significantly outperform fixed-number schemes in accuracy and speed. We attribute this to a property unique to ASR: most tokens reach high confidence early, allowing reliable ones to be harvested aggressively while leaving only difficult tokens for later rounds. Notably, the static-threshold strategy matches the accuracy of autoregressive decoding while offering superior efficiency.
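The difference between a fixed-number schedule and a static confidence threshold can be illustrated with a toy round counter. The sketch below does not run a diffusion language model: `toy_confidence` is a hypothetical stand-in in which most positions are confident from the start and the rest gain confidence as more context is decoded, and the rule of always accepting at least one token per round is an assumption added to guarantee termination.

```python
def threshold_decode(confidence_fn, length, threshold=0.9, max_rounds=50):
    """Count decoding rounds when every masked token whose confidence meets the
    threshold is accepted, with at least the most confident one accepted per round."""
    decoded = [False] * length
    rounds = 0
    while not all(decoded) and rounds < max_rounds:
        conf = confidence_fn(decoded)
        masked = [i for i in range(length) if not decoded[i]]
        accept = [i for i in masked if conf[i] >= threshold]
        if not accept:                        # fallback so that decoding always progresses
            accept = [max(masked, key=lambda i: conf[i])]
        for i in accept:
            decoded[i] = True
        rounds += 1
    return rounds

def fixed_number_decode(length, per_round):
    """Fixed-number schedule: exactly `per_round` tokens are unmasked each round."""
    return -(-length // per_round)            # ceiling division

# Hypothetical per-position base confidence: eight easy tokens, two hard ones.
BASE = [0.95] * 8 + [0.6, 0.3]

def toy_confidence(decoded):
    """Confidence grows with the fraction of positions already decoded (assumption)."""
    frac = sum(decoded) / len(decoded)
    return [min(1.0, b + 0.5 * frac) for b in BASE]

rounds_threshold = threshold_decode(toy_confidence, length=10)   # 3 rounds
rounds_fixed = fixed_number_decode(length=10, per_round=2)       # 5 rounds
```

In this toy setting the threshold rule accepts the eight easy tokens in the first round and spends the remaining rounds on the two hard ones, which mirrors the behavior the abstract attributes to ASR. The sketch measures round counts only and says nothing about accuracy.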


[25] 2605.29753

A unified deep learning framework for contrast-phase-specific virtual monochromatic imaging

Dual-energy CT (DECT) enables virtual monochromatic imaging (VMI) and improved contrast resolution, but its clinical adoption is limited by hardware complexity and cost. In this work, we propose a unified deep learning framework that synthesizes contrast-phase-specific virtual monochromatic 50 keV images from single-energy CT (SECT) data by leveraging contrast phase information as a prior. The model is trained using DECT-derived 70 keV and 50 keV image pairs across four contrast phases -- Angio, Arterial, Portal, and Delayed -- using a novel prior conditioning architecture that integrates contrast phase priors into the energy transformation process. We demonstrate that the proposed unified model achieves contrast enhancement and generalizes well across contrast phases. Additionally, we show that the model can generate 50 keV-like images from SECT inputs, preserving contrast phase-specific dynamics.


[26] 2605.29777

Multi-Snapshot Deep Denoising for Channel Estimation in OTFS Modulated Systems

A deep denoising based channel estimation framework is proposed for orthogonal time frequency space (OTFS) modulated systems, wherein channel state information (CSI) recovery is formulated as an image restoration problem. A salient attribute of the approach is the exploitation of structural invariance in the delay Doppler (DD) domain channel over a geometric coherence time, allowing multiple OTFS frames captured during this period to serve as noisy snapshots of the approximately identical channel. These snapshots jointly enhance the effectiveness of the proposed lightweight denoiser based on nonlinear activation free network (NAFNet). The method exhibits low computational complexity, operates reliably even at low pilot signal-to-noise ratio (PSNR), and can accommodate both fractional delay and fractional Doppler effects. Simulation results demonstrate significant performance gains over the existing methods.


[27] 2605.29808

Absorption and Phase-Contrast Microtomography Using Direct X-ray Detection With COTS CMOS Sensors

This work presents a high-resolution X-ray microtomography system that uses commercial off-the-shelf (COTS) CMOS image sensors as direct detectors, relying on the sensor's intrinsic resolution to achieve tomographic reconstructions without optical components. The system employs a microfocus X-ray source in cone-beam geometry, enabling both absorption-contrast and propagation-based phase-contrast imaging. A dynamic flat-field correction algorithm mitigates radiation-induced degradation during long acquisitions, helping to overcome limitations of consumer-grade hardware. The setup provides voxel sizes from 3.9 micron to this http URL. Phase contrast visualizes soft tissue boundaries that would be undetectable by conventional radiography. Compared to synchrotron or nanofocus systems, our solution is simpler, lower-cost, and avoids complex optics or slow scans. COTS CMOS sensors appear as a viable alternative for laboratory-scale high-resolution microtomography.


[28] 2605.29818

Teleoperation Operational Design Domain based on Minimal Risk Maneuver Capability

This article discusses the concept of an Operational Design Domain (ODD) designed specifically for teleoperated road vehicles. For this purpose, the ODD concept designed for automated driving is adapted for teleoperation. As teleoperation becomes more common in regular traffic, the question arises under which operating conditions such vehicles are able and allowed to drive. Currently, these conditions are selected primarily based on network performance. From a safety perspective, it is difficult to base such a selection on a reliable connection because it is almost impossible to guarantee sufficient reliability. With this in mind, a concept is proposed for basing the ODD for a teleoperation system on the capability of the teleoperated vehicle to perform a minimal risk maneuver using a dedicated system designed solely for this purpose. This concept is then demonstrated on an example use case.


[29] 2605.29841

Distributed Nonlinear Model Predictive Control for District Heating Networks

This paper presents a distributed nonlinear model predictive control scheme for district heating networks that uses the alternating direction method of multipliers. Exploiting a graph-based modeling of the thermal dynamics, our controller optimizes the mass flow absorption of buildings in a distributed cooperative scheme that mediates between the superior performance of the centralized control and the privacy preservation of the decentralized schemes. A benchmark three-building network simulation is used to compare the performance of the proposed solution with a decentralized model predictive control scheme.


[30] 2605.29849

BuilDyn: Excitation-Driven Data Generation for Building Thermal Dynamics Modeling and Control

Machine learning (ML) is increasingly used for data-driven modeling of buildings to enable downstream tasks such as fault detection and diagnosis, and energy-efficient control. While recent work improves generalization across building characteristics, weather, and occupancy, generalization also depends on sufficient exploration of the control-driven system state space. Existing real-world datasets and simulation environments predominantly reflect stationary operation under fixed control policies, resulting in limited excitation and reduced robustness to unseen operating conditions. This paper introduces BuilDyn, a package based on BuilDa that enables customizable excitation strategies for control-oriented data generation. BuilDyn further supports sampling from representative building distributions and provides a Python interface for easy integration into machine learning pipelines. We demonstrate the benefits of BuilDyn by comparing the performance of data-driven ML models trained on non-excited and excited data for one building. With BuilDyn, we hope to advance scalable control-oriented modeling and support future directions such as transfer learning and building-specific foundation models.


[31] 2605.29859

MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables

Recent speech language models rely on encoders that are optimized separately from autoregressive models. Since these encoders are unaware of the downstream objectives, the extracted representations may not be optimal for downstream tasks. To address this limitation, we introduce a discrete latent variable model on mel spectrograms that jointly optimizes the encoder and the speech language model. Joint optimization not only brings improvements over codec-based and other mel-spectrogram-based baselines on zero-shot Text-to-Speech (TTS) and Speech-to-Text (STT) tasks, but also effectively alleviates common issues in autoregressive mel-spectrogram modeling, such as prolonged silence generation and word omissions.


[32] 2605.29862

Mitigating Stethoscope-Induced Shortcuts in Respiratory Sound Classification under Federated Domain Generalization with Causality-Inspired Interventions

AI-driven respiratory sound classification (RSC) is promising for automated pulmonary disease detection, yet multi-site deployment is hindered by inter-stethoscope variability. We introduce a federated domain generalization (FedDG) formulation for RSC under stethoscope-induced device shifts, where clients use heterogeneous devices and the model is evaluated on unseen devices. Our empirical analysis shows that stethoscope-induced style and disease-specific content are tightly entangled, making deterministic style removal unreliable. In response, we propose a causality-inspired multimodal FedDG framework that combines: (i) a causality-inspired device style intervention network that performs content-preserving style perturbations, (ii) counterfactual text augmentation that neutralizes metadata shortcuts, and (iii) gradient alignment that facilitates device-invariant representations across clients. Built on a multimodal language-audio pretraining model, it outperforms conventional data augmentation and federated learning baselines in leave-one-device-out validation on ICBHI and SPRSound datasets. Code will be released upon publication.


[33] 2605.29950

Frequency-Modulated and Single-Tone Excitation to Reveal Vibro-Acoustic Nonlinearities in Loosened Bolted Joints

Preload loss in bolted joints results in alterations of the stiffness, damping, and nonlinearity of the structure, but existing monitoring techniques for rail-vehicle systems are often not capable of combining controlled shaker tests and sensing of nonlinear features. This paper proposes a method for detecting bolt loosening using a vibro-acoustic technique, where the structure is subjected to controlled shaker tests to sense the nonlinear features. A triaxial accelerometer was attached to the demonstrator, a microphone was placed in close proximity, and one of the bolts was tested under 0%, 20%, 40%, and 80% preload conditions. Single-tone and frequency-modulated (FM) signals close to the main natural frequency of 130 Hz, which was identified using sine sweep and narrow-band excitation, were applied to the demonstrator. When the structure was subjected to 130 Hz single-tone excitation, the loose state of the bolt exhibited several additional high-frequency spectral peaks. FM excitation between 125 and 135 Hz further distinguished between the states. Harmonic band power ratios, normalized to the carrier, distinguished between the loose state and the 80% preload state, where the difference between the loose and 80% preload states was 17.5 dB for l = 2 and 36.5 dB for l = 6.
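A harmonic band power ratio normalized to the carrier can be sketched as follows. This is one plausible reading of the metric, not the paper's exact definition: it assumes that `l` indexes the harmonic at `l * f0`, that band power is the summed periodogram within a fixed half-bandwidth `half_bw` around each center frequency, and that a Hann window is applied. The function name and parameters are hypothetical.

```python
import numpy as np


def harmonic_band_power_ratio_db(x, fs, f0, l, half_bw=5.0):
    """Power near the l-th harmonic relative to power near the carrier f0, in dB.

    Assumptions: Hann-windowed periodogram, and a band of +/- half_bw Hz
    around each center frequency.
    """
    x = np.asarray(x, dtype=float)
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    def band_power(fc):
        sel = np.abs(freqs - fc) <= half_bw
        return spec[sel].sum()

    return 10.0 * np.log10(band_power(l * f0) / band_power(f0))
```

For example, a synthetic signal with a 130 Hz carrier and a second harmonic at one tenth of the carrier amplitude gives a ratio close to -20 dB for `l = 2`. A loosened joint that generates stronger harmonics would raise this ratio.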


[34] 2605.30183

Fault-Ride-Through Coordination Strategy for Offshore AC Islands with Multi-Infeed HVDC Interconnections

Large-scale offshore Wind Farms (WFs) are considered key assets towards realizing a sustainable power system. These systems are often configured as offshore AC islands and their integration largely depends on the High-Voltage-Direct-Current (HVDC) technology. This topology, while it enables cost-effective transmission over large offshore distances, may lead to operational challenges. Specifically, the operation of offshore AC islands during faults and the fulfillment of grid code requirements are identified as major challenges for their large-scale deployment. To address this pressing issue, a comprehensive coordination control strategy for the different participating converters in multi-infeed AC offshore islands during Fault Ride Through (FRT) operation is presented in this work. The proposed strategy introduces advanced control functions in the FRT schemes of both the HVDC and WF converters, such as zero active and reactive power injection during faults, as well as post-fault active power droop control coordination to tackle power imbalances. The proposed FRT coordination strategy is validated through extensive simulations in PSCAD/EMTDC as well as Power Hardware-in-the-Loop (PHIL) experiments, considering both AC and DC faults.


[35] 2605.30222

Optimization of Predictive Maintenance Schedules under Uncertainty: A Scenario-Based Theoretical Framework

This paper proposes a scenario-based framework for predictive maintenance scheduling under uncertainty in a finite planning horizon. The considered setting involves multiple assets for which maintenance decisions are informed by three heterogeneous sources of information: calendar-based overhaul intervals, usage-based limits driven by uncertain future operating cycles, and condition-monitoring outputs represented through remaining useful life (RUL) estimates with uncertainty. While these elements have been studied extensively in the maintenance literature, they are often treated separately or only partially integrated. In contrast, the proposed formulation evaluates complete maintenance schedules under simulated future scenarios and compares them using expected-cost and tail-risk criteria. The contribution is primarily conceptual and methodological: we define a unified finite-horizon decision framework that combines calendar-, usage-, and prognostics-based information within a common scheduling problem. A small synthetic computational example is used as a proof of concept. The results show that integrated scenario-based policies can substantially outperform simpler single-trigger rules, while the difference between risk-neutral and risk-aware integrated policies remains modest under the present calibration.
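Comparing schedules by expected cost and by a tail-risk criterion can be sketched as below. The abstract does not name the tail-risk measure, so conditional value-at-risk (CVaR) is used here purely as an assumed example. The function names and the synthetic cost samples are hypothetical.

```python
import numpy as np


def expected_cost(costs):
    """Risk-neutral criterion: mean cost over the simulated scenarios."""
    return float(np.mean(np.asarray(costs, dtype=float)))


def cvar(costs, alpha=0.9):
    """Assumed tail-risk criterion: mean of the worst (1 - alpha) share of scenarios."""
    costs = np.sort(np.asarray(costs, dtype=float))
    k = max(1, int(np.ceil((1 - alpha) * len(costs))))
    return float(costs[-k:].mean())


# Hypothetical scenario costs for two candidate maintenance schedules.
rng = np.random.default_rng(0)
schedule_costs = {
    "calendar_only": rng.gamma(shape=2.0, scale=60.0, size=1000),
    "integrated": rng.gamma(shape=2.0, scale=45.0, size=1000),
}
for name, costs in schedule_costs.items():
    print(name, round(expected_cost(costs), 1), round(cvar(costs, 0.95), 1))
```

A risk-neutral planner would pick the schedule with the lowest mean, while a risk-aware planner would weight the CVaR term as well. The two can agree, as the abstract reports for its calibration.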


[36] 2605.28980

Manifold-based Algorithms for the Hadamard Decomposition

Given a matrix $X$, and two ranks $r_1$ and $r_2$, the Hadamard decomposition (HD) looks for two low-rank matrices, $X_1$ of rank $r_1$ and $X_2$ of rank $r_2$, both of the same size as $X$, such that $X\approx X_1\circ X_2$, where $\circ$ is the Hadamard (element-wise) product. In most cases, HD is more expressive than standard low-rank approximations such as the truncated singular value decomposition (TSVD), as it can represent higher-rank matrices with the same number of parameters; this is because the rank of $X_1 \circ X_2$ is generically equal to $r_1 r_2$. In this paper, we first present some theoretical insights for HD, in particular a useful reformulation $X\approx WH^\top$ where $W$ and $H$ have $r_1 r_2$ columns and belong to certain manifolds. These allow us to develop three new algorithms for computing HD. The first one uses the representation $X\approx X_1\circ X_2$ and relies on the Manopt toolbox. The other two rely on the reformulation $X\approx WH^\top$: one is a block projected gradient method, and the other is a manifold-based gradient descent algorithm that does not require projection onto the feasible set. The last two algorithms are particularly effective for handling large sparse data. We also propose new initializations that allow us to improve the accuracy of the HD. We compare our algorithms and initialization strategies with the TSVD and with the state of the art. Numerical results show that the new methods are efficient and competitive on both synthetic and real data.
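The rank claim and the $X\approx WH^\top$ reformulation can be checked numerically with a standard identity: if $X_1 = AB^\top$ and $X_2 = CD^\top$, then $X_1\circ X_2 = WH^\top$, where the rows of $W$ and $H$ are Kronecker products of the corresponding rows of the factors. The sketch below uses this generic construction with hypothetical names `A`, `B`, `C`, `D`; it may differ from the exact parametrization and manifolds used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r1, r2 = 8, 7, 2, 3

# Low-rank factors: X1 = A @ B.T has rank r1, X2 = C @ D.T has rank r2.
A, B = rng.standard_normal((m, r1)), rng.standard_normal((n, r1))
C, D = rng.standard_normal((m, r2)), rng.standard_normal((n, r2))
X1, X2 = A @ B.T, C @ D.T


def rowwise_kron(P, Q):
    """Row-wise Kronecker product: row i of the result is kron(P[i], Q[i])."""
    return np.einsum("ij,ik->ijk", P, Q).reshape(P.shape[0], -1)


W = rowwise_kron(A, C)  # shape (m, r1 * r2)
H = rowwise_kron(B, D)  # shape (n, r1 * r2)

hadamard = X1 * X2                         # element-wise product
rank = np.linalg.matrix_rank(hadamard)     # generically r1 * r2
max_gap = np.abs(hadamard - W @ H.T).max() # should be at round-off level
```

The factors hold $(m+n)(r_1+r_2)$ numbers, whereas a plain rank-$r_1 r_2$ factorization of the same matrix would need $(m+n)\,r_1 r_2$, which is the sense in which HD reaches a higher rank with the same parameter count.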


[37] 2605.29138

Multi-Resolution End-to-End Deep Neural Network for Optimizing Latency-Accuracy Tradeoff in Autonomous Driving

Latency-accuracy tradeoffs are fundamental in real-time applications of deep neural networks (DNNs) for cyber-physical systems. In autonomous driving, in particular, safety depends on both prediction quality and the end-to-end delay from sensing to actuation. We observe that (1) when latency is accounted for, the latency-optimal network configuration varies with scene context and compute availability; and (2) a single fixed-resolution model becomes suboptimal as conditions change. We present a multi-resolution, end-to-end deep neural network for the CARLA urban driving challenge using monocular camera input. Our approach employs a convolutional neural network (CNN) that supports multiple input resolutions through per-resolution batch normalization, enabling runtime selection of an ideal input scale under a latency budget, as well as resolution retargeting, which allows multi-resolution training without access to the original training dataset. We implement and evaluate our multi-resolution end-to-end CNN in CARLA to explore the latency-safety frontier. Results show consistent improvements in per-route safety metrics - lane invasions, red-light infractions, and collisions - relative to fixed-resolution baselines.


[38] 2605.29144

Learning and Adaptation in Wire Arc Additive Manufacturing Bead Geometry Control

Robotic Wire Arc Additive Manufacturing (WAAM) is governed by complex and nonlinear process dynamics coupling the thermal field to the build geometry. The process may be regarded as a multi-input/multi-output dynamical system with welding torch speed and wire feed rate as inputs and weld bead deposition height and width as outputs. In this paper, we use the input/output data to learn a data-driven model and use it for weld planning and control. We show that a simple recurrent neural network architecture and one-step-ahead predictive control can improve the process performance in terms of height and width consistency. To account for the changing thermal conditions during the printing process, we update the learning model using prediction error from the previous layer. This adaptation step further improves the prediction accuracy and controller performance. Experiments on a robotic WAAM testbed with integrated line-scanner feedback show significant improvements in height and width consistency compared to constant input and static model baselines. The proposed learning and adaptation framework provides a practical pathway toward robust, data-driven regulation of additive manufacturing processes.


[39] 2605.29231

$\alpha$-stability of Differentially Flat Systems with Application to Newton-Raphson Tracking Control for Vehicle Dynamics

This paper studies the $\alpha$-stability property of differentially flat nonlinear dynamical systems. The results build on the recently introduced notion of $\alpha$-stability, which is particularly well suited to characterizing the ability of a system to track dynamic output reference signals. We consider systems controlled using the Newton-Raphson tracking controller, which yields closed-form control policies and is therefore computationally efficient; it has been shown to be effective in controlling a large variety of mobile robots, including autonomous vehicles. The main results of the paper are sufficient conditions for the $\alpha$-stability of differentially flat systems and for the equivalence between the proposed control algorithm and the Newton-Raphson tracking controller applied directly to the nonlinear dynamics. We demonstrate the behavior of the proposed controller applied to the kinematic unicycle and dynamic bicycle models.


[40] 2605.29455

Uni-RCM: Unified Reference-guided Cross-modal Mapping for Multi-Class Anomaly Detection

Multi-modal industrial anomaly detection typically relies on separate models for each product category, fundamentally limiting practical scalability. When shifting to a unified paradigm that handles diverse classes simultaneously, detection accuracy often degrades due to inter-class interference and feature manifold confusion. To overcome these challenges, we propose a Unified Reference-guided Cross-modal Mapping framework, named Uni-RCM. At its core is a reference guide block that dynamically filters out category-specific noise by introducing a learnable reference feature, which captures the commonalities across different modalities. In addition, an offline residual quantizer is proposed to characterize the normal distribution with multiple cascaded codebooks. Extensive evaluations on the MVTec-3D AD dataset demonstrate state-of-the-art performance in the challenging multi-class setting, in terms of both image-level detection and pixel-level localization.


[41] 2605.29489

Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging

Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an \emph{expert access-set} problem: given a merge operator and a checkpoint family in a shared weight coordinate system, choose which expert delta blocks to access under an explicit I/O budget. MergePipe indexes parameter blocks, builds deterministic access plans, and executes the induced budgeted merge with replayable manifests. The plan is budget-sound by construction and recovers the full-read merge at full budget; for fixed-coefficient additive operators, the omitted-update error is bounded by the norm of omitted deltas. Across Qwen and Llama merging workloads, MergePipe reduces expert-read I/O by up to an order of magnitude and achieves up to $11\times$ speedups. Representative budget sweeps show $O(10^{-3})$ parameter deviation from full-read merges and no monotonic degradation on downstream benchmarks.
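The error bound for fixed-coefficient additive operators follows from the triangle inequality, and a toy version can be checked directly. The sketch below is not MergePipe: the selection rule (keep the blocks with the largest coefficient-weighted norm) and the names `budgeted_merge`, `deltas`, `coeffs`, and `budget` are assumptions made only for illustration.

```python
import numpy as np


def budgeted_merge(base, deltas, coeffs, budget):
    """Additive merge that reads at most `budget` expert deltas (toy sketch).

    Returns the merged weights and an upper bound on the distance to the
    full-read merge: the sum of |coeff| * ||delta|| over the omitted deltas.
    """
    weights = [abs(c) * np.linalg.norm(d) for c, d in zip(coeffs, deltas)]
    # Assumed policy: read the deltas with the largest weighted norm first.
    order = np.argsort(weights)[::-1]
    keep = set(order[:budget].tolist())
    merged = np.array(base, dtype=float)
    for i in keep:
        merged += coeffs[i] * deltas[i]
    bound = sum(w for i, w in enumerate(weights) if i not in keep)
    return merged, bound
```

At full budget nothing is omitted, so the bound is zero and the result equals the full-read merge, which mirrors the "budget-sound by construction" property stated in the abstract.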


[42] 2605.29628

COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings

Contrastive Language-Audio Pretraining (CLAP) models are widely used for audio understanding and support modality-agnostic condition swapping in many zero-shot applications. However, their performance is heavily affected by the modality gap between audio and text embeddings. Existing explanations mainly attribute this gap to the cone effect, treating it as a shift between mean embeddings, yet correcting the mean alone yields only limited improvements. Alternative hypotheses, such as information imbalance and dimensionality collapse, have also been proposed, but they remain insufficiently verified and have not been thoroughly studied in the audio domain. Meanwhile, several works attempt to decompose multimodal contrastive embeddings into interpretable concepts, but none explicitly analyze the modality gap from the perspective of concept decomposition. In this work, we introduce COMET (Concept space Organization and Modality gap Explanation with PLS-SVD Transformation), a novel partial least squares singular value decomposition (PLS-SVD) framework for CLAP that unveils a broader perspective of the modality gap. Our framework reveals that only a small, interpretable subset of axes, which captures shared concepts, contributes substantially to similarity computation, and that the mean component only partially represents the modality gap. Building on this insight, we propose a simple spectral truncation method that mitigates the modality gap in a training-free manner. The method enables zero-shot audio captioning with condition swapping to approach fully supervised performance, without requiring large auxiliary memory banks or expensive computation. At the same time, it achieves substantial embedding dimensionality reduction while preserving strong performance on retrieval and audio captioning tasks.


[43] 2605.29677

Embodied Virtual Reality Feedback Reshapes Neural Representations to Support Continuous Three-Dimensional Motor Imagery Decoding

Continuous brain-computer interfaces (BCIs) that decode motion trajectories from imagined movement offer intuitive motor control, yet how feedback modality and longitudinal training shape neural representations and decoding performance remains poorly understood. We present the first systematic investigation of embodied virtual reality (VR) feedback during real-time 3D virtual limb control driven by motor imagery, across ten longitudinal sessions in ten participants. Performance was evaluated using three strategies: actual online performance (Fixed Decoder Generalisation, FDG), periodic retraining (Sequential Adaptive Training, SAT), and within-session upper-bound estimation (Within-Session Reconstruction, WSR). A CNN-LSTM decoder achieved within-session imagined movement correlations of r = 0.762 under VR and r = 0.672 under screen feedback. VR significantly outperformed screen feedback across all strategies and movement dimensions (improvements of 8.9-13.0%, all p <= 0.002, d = 1.42-2.05). This advantage persisted under fixed decoders without retraining, demonstrating that embodied VR feedback elicits inherently more decodable and generalisable neural representations. Linear mixed-effects modelling confirmed robust main effects of feedback modality and movement axis with no interaction. Neurophysiologically, VR produced stronger sensorimotor-parietal desynchronisation and enhanced motor-frontal functional connectivity, with pervasive anterior insula engagement across all frequency bands and increased superior parietal lobule coupling, paralleling patterns associated with real movement execution. These findings establish embodied spatial feedback as a key design principle for next-generation continuous BCIs targeting intuitive motor control and neurorehabilitation.


[44] 2605.29679

A Unified Two-Stage Generative Diffusion Framework for Channel Estimation and Port Selection in Multiuser MIMO-FAS

Fluid antenna systems (FAS) have emerged as a promising technology for next-generation wireless systems. However, practical multiuser multiple-input multiple-output FAS (MIMO-FAS) faces two inherently coupled challenges: acquiring accurate high-dimensional channel state information (CSI) from limited RF chains and solving the combinatorial port selection problem, where the effectiveness of the latter highly depends on the result of the former. In this paper, we propose a unified two-stage diffusion framework that formulates the joint task as a maximum-a-posteriori (MAP) inference problem and decomposes it into two sequential sampling stages through a plug-in approximation. For Stage I, a continuous flow-based diffusion model serves as a powerful implicit prior for 2D FAS channels, and a parallel guided generation scheme realizes approximate posterior sampling, enabling accurate multiuser channel recovery even under severely low sub-sampling ratios. For Stage II, a discrete diffusion model is trained to approximate the conditional port selection distribution by combining supervised learning on heuristic labels with reinforcement fine-tuning, effectively overcoming the local optima of conventional heuristic algorithms. Extensive simulations demonstrate that the proposed framework simultaneously achieves exceptional channel estimation accuracy and globally optimized port selection, substantially improving the minimum achievable rate.


[45] 2605.29798

Low-Magnification SEM May Suffice: Interpretable Deep Learning for Multi-Scale Fracture-Cause Classification in Zirconia-Toughened Alumina

Reliable identification of fracture origins in alumina matrix composite hip and knee implants is critical for quality assurance and patient safety, yet current fractographic workflows are time-consuming, partly subjective, and reliant on high-magnification scanning electron microscopy (SEM). We present an interpretable vision-transformer (ViT) workflow for automated classification of fracture causes in an alumina matrix composite (BIOLOX delta, CeramTec GmbH) widely used in total joint replacements. A dataset of 8,493 SEM images (50x-10,000x) was curated from five years of in-production burst and proof tests and annotated into three defect categories defined along the manufacturing chain: green body, hard machining, and material defects. Under severe class imbalance, the fine-tuned ViT reached an accuracy of 0.907 and a macro-F1 of 0.888 in stratified five-fold cross-validation, with a two-stage perceptual-hash/SSIM leakage audit confirming negligible specimen overlap. Notably, performance at low magnification (50x) was comparable to that at high magnification (1k-10kx), indicating that macro-scale features - mirror geometry and hackle line fields - already encode sufficient diagnostic signal. Grad-CAM attributions consistently localised on canonical fractographic cues (mirrors, hackles, pores, machining marks), aligning with established fractographic criteria. Together, these results position interpretable ViTs as a complementary tool for ceramic-implant quality assurance, enabling low-magnification pre-screening and reducing reliance on time-intensive high-magnification inspection.


[46] 2605.29813

Tackling Interference in HAPS Networks via Angular-Aware Clustering and RSMA

High Altitude Platform Stations (HAPS) have emerged as a promising enabler for next-generation wireless networks, offering ubiquitous connectivity to ground users. Operating either in standalone mode or in integration with terrestrial networks, HAPS can significantly enhance both coverage and capacity due to their strategic placement in the stratosphere. However, interference management in HAPS-empowered networks requires special attention due to the unique propagation characteristics of HAPS links. In particular, the strong line-of-sight (LoS) conditions between HAPS and ground users result in limited channel variability, thereby intensifying inter-user interference. In this work, we consider a single HAPS serving multiple ground users through multiple beams over a limited number of orthogonal resource blocks (RBs). To address the resulting interference, we propose a novel angular-aware user clustering and interference-aware RB allocation framework that strategically clusters users, designs beams to serve each cluster, and allocates RBs to users across clusters. To further mitigate intra-RB interference, a rate-splitting multiple access (RSMA) scheme is incorporated. Simulation results demonstrate that the proposed clustering and RSMA-based approach significantly outperforms baseline schemes in terms of achievable per-user spectral efficiency.


[47] 2605.29824

On the Effect of Pulse Shaping Filters in Zak-OTFS Waveform for Radar Sensing

In radar sensing, the self-ambiguity function of the probing waveform plays a crucial role in the resolvability and detection of multiple targets. In the recent Zak-OTFS based radar literature, a Gaussian pulse shaping filter has been considered, and it has been shown to offer better range/velocity estimation performance compared to the traditional chirp waveform in scenes with multiple targets. While the self-ambiguity function with the Gaussian filter has very low sidelobes, its main lobe is wide, which compromises resolvability and performance. Motivated by this, we seek filters with better ambiguity characteristics. Specifically, we explore two other known filters, namely, sinc and Gaussian-sinc (GS) filters, and demonstrate that these filters offer better performance compared to the Gaussian filter under different scenarios and receiver processing. Towards demonstrating this, we derive closed-form expressions for the self-ambiguity functions of the Zak-OTFS waveform with sinc and GS filters. The ambiguity functions of sinc and GS filtered waveforms have narrow main lobes, resulting in better resolvability in scenes with densely populated targets for the basic peak-detection based receiver. The ambiguity function of the Gaussian filtered waveform has very low sidelobes, resulting in better performance in sparsely populated scenes. When a receiver with inter-target interference mitigation is used, the sinc and GS filters perform better in both dense and sparsely populated scenes compared to the Gaussian filter.


[48] 2605.29931

It's All About Speed: AI's Impact on Workflow in Music Production

In this paper, we present the results of an ethnographic study into the impact of AI and automated tools on music production workflow. Focusing specifically on professional participants who identified as recording engineers, mixers, and producers, we discuss their usage of common AI and automated software, as well as their sentiments on the proliferation of these tools. We discuss tensions that may be created between users and automated tools in key areas such as the need for speed and efficiency, controllability, and maintaining creative agency, and how these tensions may be alleviated through tool design.


[49] 2605.29942

Reconfigurable Multistate MRAM Synapses with Vortex STNO based Neurons for Scalable In-Memory Convolutional Neural Networks

Magnetic tunnel junction (MTJ)-based magnetic random-access memory (MRAM) is a promising platform for neuromorphic and in-memory computing owing to its non-volatility, high endurance, fast switching dynamics and CMOS compatibility. However, conventional spin-transfer torque and spin-orbit torque MRAM implementations for neural networks often suffer from high critical switching currents, large latency, thermal instability and significant read-write overheads. Here, we demonstrate a unified multistate MRAM-spin-torque nano-oscillator (STNO) architecture that integrates synapses and neurons on a single chip for convolutional neural network (CNN) applications. The system employs 1x8 multistate MRAM arrays as programmable synapses coupled with a vortex-based STNO neuron, enabling both individual and collective programming through fieldline-driven write channels. Multiple configurable resistance states are achieved by tuning internal and external magnetic fields together with bias currents, allowing quantized positive and negative synaptic weights for configurable kernel and pooling operations. The proposed architecture is evaluated through simulation on MNIST, SVHN, CIFAR-10, Google Speech Commands (GSC) and RadioML datasets, achieving accuracies of 99.76%, 87.93%, 78.14%, 87.96% and 56.46%, respectively. Based on fabricated device dimensions, the complete architecture occupies ~6171.2 $\mu$m$^2$ with an average energy consumption of 200.08 pJ per training and inference cycle for MNIST, highlighting its potential for scalable low-power neuromorphic computing.


[50] 2605.29948

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

Unified speech foundation models require a holistic tokenization space that is both learnable by language models and decodable into high-quality waveforms. Existing speech tokenizers, however, often fail to satisfy these requirements simultaneously, leading to increased architectural complexity and more involved training designs. We propose HoliTok, a continuous Holistic speech Tokenization model designed for unified generation-understanding modeling. HoliTok encodes 48 kHz speech into a compact 25 Hz sequence of 128-dimensional latents. It is trained with a progressive strategy that jointly preserves signal-level fidelity, incorporates semantic information, and maintains strong latent learnability. Based on this tokenization, we build a unified AR+DiT model for speech synthesis and recognition, where the same latent sequence supports both generation-specific and unified generation-understanding tasks. Experiments show that HoliTok achieves competitive reconstruction fidelity, improves generative learnability for high-quality and controllable synthesis, and, among the evaluated representations, is the only one that operates robustly in our unified generation-understanding architecture without additional optimization tricks. These results suggest that HoliTok serves as an effective speech tokenizer and a foundational representation interface for unified spoken language modeling. The code is available at: this https URL.


[51] 2605.29975

A Fully Convolutional Approach to Denoising Structural Dynamics Data from X-Ray Photon Correlation Spectroscopy

We present a fully convolutional denoising autoencoder (FC-DAE) for denoising two-time intensity-intensity correlation functions ($C_2$) in X-ray photon correlation spectroscopy (XPCS). Unlike conventional denoising autoencoders that are typically restricted to fixed input sizes, the FC-DAE accepts inputs of arbitrary dimensions while preserving correlation structures across diverse dynamical regimes. The model is trained using experimentally derived $C_2$ data collected at NSLS-II beamlines, with data augmentation applied to expand the diversity of the dataset and reduce overfitting. The FC-DAE successfully recovers intricate dynamical features in low signal-to-noise conditions while maintaining structural fidelity. To assess reconstruction reliability, we employ quantitative metrics to evaluate structural fidelity and identify potential model-induced bias. Our results demonstrate that the FC-DAE provides robust denoising performance with high computational efficiency, enabling recovery of XPCS dynamics under photon-limited and low-dose measurement conditions.


[52] 2605.29995

Low-Overhead Receiver Design for Data-Dependent Superimposed Training via Deep Learning

Superimposed pilot (SIP) transmission improves spectral efficiency by eliminating the dedicated pilot overhead required in orthogonal pilot (OP)-based schemes. However, SIP suffers from severe pilot-data coupling, which leads to a critical performance-complexity bottleneck at the receiver. To address this issue, this paper proposes a low-overhead transmission framework that revitalizes data-dependent superimposed training (DDST) with enhanced interference mitigation strategies. First, for quasi-static block-fading channels, an enhanced DDST receiver is developed to achieve non-iterative pilot-data decoupling by exploiting data-dependent algebraic structures. Second, to overcome the sensitivity of conventional DDST to channel variations and symbol misidentification in fast time-varying environments, a mix transmission scheme is developed. By strategically applying DDST to a subset of resource elements, the proposed scheme combines the interference-free transmission property of OP with the zero-pilot-overhead advantage of SIP, thereby improving demapping reliability and interference suppression. Furthermore, under the proposed mix scheme, a Vision Transformer-based neural receiver is designed to capture the orthogonal structure between pilots and perturbation-bearing data, as well as the underlying channel correlations, thereby relaxing the stringent quasi-static assumption required for interference disentanglement. Simulation results demonstrate that the proposed framework achieves significant performance gains in the low-to-medium SNR regime under time-varying channels while providing superior computational efficiency compared with state-of-the-art SIP receivers.


[53] 2605.29996

A Lumped RC Equivalent Circuit Model of Head Tissues in sub-MHz Frequency Regimes

Accurate modeling of electric potential and current distribution in head tissues is crucial for the design and evaluation of neuro-sensing and neuro-stimulation systems operating in the sub-megahertz frequency range. Numerical methods are widely employed in electromagnetic simulations; however, their computational cost can limit their applicability to rapid prototyping, real-time simulations, and circuit-level integration. In this work, we introduce a lumped RC equivalent circuit model that reproduces the electrical behavior of a canonical three-layer spherical head geometry over a frequency range up to 50 kHz. The model accounts for frequency-dependent tissue conductivity and permittivity to capture dispersive effects, employing complex conductivity in the electro-quasi-static (EQS) regime. The circuit topology uses a minimal set of impedance elements in order to represent the essential mechanisms of electric signal propagation. Validation was performed using a dipolar brain source configuration for scalp voltage peak estimation, showing close agreement with semi-analytical solutions across different skull thicknesses and dipole eccentricities. In addition, the impact of tissue dispersion and displacement current inclusion on the model accuracy was quantitatively assessed, highlighting their contribution to the overall fidelity of the proposed approach.


[54] 2605.30095

The generalized method of moments is (almost) statistically efficient in low-SNR Gaussian latent-variable models

We study estimation in the low signal-to-noise ratio (SNR) regime for a broad class of Gaussian latent-variable models, including Gaussian mixtures and orbit recovery problems. We show that, in this regime, the generalized method-of-moments (GMoM) matches the first-order asymptotic efficiency of maximum likelihood. In particular, if the moment features are chosen up to the minimal local order required for identification and are weighted optimally, then the resulting GMoM estimator has the same leading asymptotic covariance as the maximum-likelihood estimator. Our analysis shows that, in low SNR, this equivalence is governed by a layered local geometry: different directions become informative at different moment orders, partitioning the space into layers with distinct SNR scalings. We prove that the observed Fisher information and the GMoM information operator admit matching layerwise expansions across these layers. As a consequence, in the low-SNR regime, GMoM provides a statistically efficient alternative to maximum likelihood, while preserving the computational advantages of moment-based estimation.


[55] 2605.30127

REACT: A Conditioning Framework for User-Adaptive sEMG Hand Pose Estimation

Surface electromyography (sEMG) enables continuous hand pose estimation on wearable devices, but models trained on multi-user corpora degrade on unseen individuals due to inter-user variability in anatomy and electrode placement. We propose REACT, a lightweight conditioning framework that personalizes a frozen pretrained EMG-to-pose backbone at inference time using only a handful of calibration recordings. REACT learns a compact user embedding from calibration data and applies Feature-wise Linear Modulation (FiLM) to adapt the shared encoder's feature space, requiring no gradient updates at deployment. On the large-scale EMG2POSE benchmark, REACT improves over the state-of-the-art baseline across all three generalization splits in both regression and tracking modes, reducing angular error by up to 3.9% with minimal parameter overhead and under 45 seconds of per-user calibration.
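As an illustration of the Feature-wise Linear Modulation (FiLM) step named in the abstract, the following is a minimal sketch in plain Python. It shows only the generic FiLM operation, y_c = gamma_c * x_c + beta_c, with the scale and shift produced from a user embedding. The function names, the linear heads `w_gamma` and `w_beta`, and the "1 + ..." parameterization of gamma are assumptions made for this example and are not taken from REACT.

```python
from typing import List, Sequence, Tuple


def film_params_from_embedding(
    embedding: Sequence[float],
    w_gamma: Sequence[Sequence[float]],
    w_beta: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Map a user embedding to per-channel (gamma, beta).

    Hypothetical linear heads: one weight row per feature channel.
    gamma is parameterized as 1 + W_gamma @ e so that a zero embedding
    leaves the frozen backbone's features unchanged.
    """
    gamma = [1.0 + sum(w * e for w, e in zip(row, embedding)) for row in w_gamma]
    beta = [sum(w * e for w, e in zip(row, embedding)) for row in w_beta]
    return gamma, beta


def film_modulate(
    features: Sequence[float],
    gamma: Sequence[float],
    beta: Sequence[float],
) -> List[float]:
    """Apply FiLM channel-wise: y_c = gamma_c * x_c + beta_c."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma, and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]
```

Because the modulation is a forward computation on the embedding, adapting to a new user in this sketch needs no gradient step, which is consistent with the deployment property the abstract describes.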


[56] 2605.30172

A Lumped-Element Electrical Model of the Human Head for Brain-Oriented Applications

In this work, we present a compact surrogate circuit for electro-quasi-static (EQS) head modeling. A three-shell geometry (brain, skull, scalp) is considered, and each layer is modeled through radial and tangential pathways, implemented as RC branches. Frequency-dependent tissue conductivity and permittivity are mapped into dispersive resistive and capacitive elements. The model is validated against a semi-analytical spherical-harmonics reference solution over multiple geometrical configurations and operating frequencies, demonstrating good agreement. Neglecting dispersion and capacitive pathways can lead to an overestimation of scalp potentials over the considered frequency range, highlighting the need for dispersive RC circuit modeling.
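To make the mapping from tissue properties to circuit elements concrete, here is a minimal sketch of one lumped branch under the electro-quasi-static assumption. It treats a tissue segment as a uniform slab of cross-section A and length L with complex conductivity sigma + j*omega*eps0*eps_r, which is equivalent to a resistor in parallel with a capacitor. The slab geometry, the function name, and the numeric values in the usage note are assumptions for illustration; the paper's actual radial and tangential branch definitions are not given in the abstract.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def branch_impedance(
    sigma: float, eps_r: float, area_m2: float, length_m: float, freq_hz: float
) -> complex:
    """Impedance of a uniform tissue slab in the EQS regime.

    Admittance Y = (sigma + j*omega*eps0*eps_r) * A / L, i.e. a conductance
    G = sigma*A/L in parallel with a capacitance C = eps0*eps_r*A/L.
    For dispersive tissue, pass the sigma and eps_r valid at freq_hz.
    """
    omega = 2.0 * math.pi * freq_hz
    complex_sigma = complex(sigma, omega * EPS0 * eps_r)
    return length_m / (complex_sigma * area_m2)
```

For example, `branch_impedance(0.33, 1.0e5, 1e-4, 1e-2, 50e3)` returns a complex value whose magnitude is smaller than the purely resistive result at 0 Hz, because the capacitive path adds admittance as frequency rises.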


[57] 2605.30269

Boosting Image Quality Assessment Performance: Unsupervised Score Fusion by Deep Maximum a Posteriori Estimation

Over the past decades, numerous Image Quality Assessment (IQA) models have emerged, aiming to predict the perceptual quality of images. However, individual models are often biased toward certain types of image content or distortions, depending on the design principle and process. An intuitive idea is to harness the strengths and mitigate the weaknesses of each IQA model, by fusing the scores of multiple models into a stronger one. Here we make one of the first attempts to seek an optimal solution for the idea and propose a general framework for unsupervised IQA score fusion using deep Maximum a Posteriori (MAP) estimation. The proposed model conducts fine-grained uncertainty estimation at the score level to increase the accuracy and reduce the uncertainty in fused predictions. Comprehensive experiments demonstrate the superiority of the proposed model over individual IQA models and other fusion methods. It also exhibits an interesting capability of rejecting "bad" models in the fusion process.


[58] 2605.30339

Benchmarking Single-Factor Physical Video-to-Audio Generation

Generative video-to-audio (V2A) models produce highly plausible soundtracks, but it remains unclear whether they capture the underlying physical processes. Existing evaluations emphasize perceptual realism and overlook physical correctness under controlled interventions. In this paper, we introduce FlatSounds, a benchmark that audits the physical reasoning of V2A models through: 1) controlled counterfactual pairs in which a single physical factor is varied, and 2) single-video pattern tests that probe internal consistency and directional trends. These settings test whether the generated audio correctly reflects specific physical properties and timings. Our evaluation of state-of-the-art models reveals a consistent trade-off: models rely more on text captions than the visual stream to infer physics and semantics. Captions generally improve physical and semantic accuracy, but paradoxically degrade temporal alignment. Our results highlight the need to move beyond audio quality toward learning physical processes directly from pixels. Finally, we find that our physics-based metrics correlate strongly with human preference tests on our own data. Project webpage: this https URL


[59] 2503.24085

Unraveling tensor structures in correct-by-design controller synthesis

Formal safety guarantees on the synthesis of controllers for stochastic systems can be obtained using correct-by-design approaches. These approaches often use abstractions as finite-state Markov Decision Processes. As the state space of these MDPs grows, the curse of dimensionality makes the computational and memory cost of the probabilistic guarantees, quantified with dynamic programming, scale exponentially. In this work, we leverage decoupled dynamics and unravel, via dynamic programming operations, a tree structure in the Canonical Polyadic Decomposition (CPD) of the value functions. For discrete-time stochastic systems with syntactically co-safe linear temporal logic (scLTL) specifications, we provide provable probabilistic safety guarantees and significantly alleviate the computational burden. We provide an initial validation of the theoretical results on several typical case studies and showcase that the uncovered tree structure enables efficient reductions in the computational burden.


[60] 2506.00706

Björck Sequences: Extension to Arbitrary Lengths, Correlation Analysis, and Applications to Wireless Systems

In this paper, we propose a sequence construction framework that extends prime-length Björck sequences, a class of Constant Amplitude Zero Autocorrelation (CAZAC) sequences, to arbitrary lengths using Goldbach's conjecture for even and odd integers. The framework is generic and applies to any CAZAC family defined for prime lengths and supports extensions to both cyclically shifted sequences and sequences with different root indices. We analytically characterize the resulting correlation behavior and show that the construction preserves orthogonality among cyclic shifts while maintaining favorable zero-lag cross-correlation across different root-index sequences. We further investigate Björck sequences as candidates for reference signals in next-generation wireless systems. Using the proposed framework, we extend Björck sequences to arbitrary lengths and evaluate their time- and frequency-offset estimation performance in terrestrial networks (TNs) and non-terrestrial networks (NTNs). Results show performance comparable to Zadoff-Chu (ZC) sequences in low-Doppler TN environments and improved robustness in high-Doppler NTN scenarios due to superior ambiguity-function properties. We also identify an inherent Doppler-dependent behavior that can cause sequence misidentification under large Doppler shifts. To address this, we propose two mitigation strategies: (i) leveraging coarse Doppler estimates prior to detection, and (ii) selecting appropriately spaced subsets of orthogonal sequences. Ambiguity function-based analysis demonstrates the effectiveness of these approaches in improving estimation reliability. Overall, this work enables practical arbitrary-length CAZAC sequence design and establishes Björck sequences as a strong alternative for reference signal design in high-Doppler environments.


[61] 2506.08028

Sensor Fusion for Track Geometry Monitoring: Integrating On-Board Condition Monitoring and Degradation Models via Kalman Filtering

Track geometry monitoring is essential for maintaining the safety and efficiency of railway operations. While Track Recording Cars (TRCs) provide accurate measurements of track geometry indicators, their limited availability and high operational costs restrict frequent monitoring across large rail networks. Recent advancements in on-board sensor systems installed on in-service trains offer a cost-effective alternative by enabling high-frequency, albeit less accurate, data collection. This study proposes a method to enhance the reliability of track geometry predictions by integrating low-accuracy sensor vibration signals with degradation models through a Kalman filter framework. An experimental campaign using a low-cost sensor system mounted on a TRC evaluates the proposed approach. The results demonstrate that incorporating frequent sensor data significantly reduces prediction uncertainty, even when the data is noisy. The study also investigates how the frequency of data recording influences the size of the credible prediction interval, providing guidance on the optimal deployment of on-board sensors for effective track monitoring and maintenance planning.
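The fusion idea described above can be sketched with a scalar Kalman filter: a degradation model predicts how a track geometry indicator drifts between inspections, and each noisy on-board measurement pulls the estimate back and shrinks its variance. The linear per-step `growth` term, the noise variances, and the function name are assumptions for this example; the study's actual degradation model and sensor mapping are not specified in the abstract.

```python
from typing import Optional, Tuple


def kalman_step(
    x: float,
    p: float,
    z: Optional[float],
    growth: float,
    q: float,
    r: float,
) -> Tuple[float, float]:
    """One predict/update cycle for a scalar track geometry indicator.

    x, p   : current estimate and its variance
    z      : noisy on-board measurement, or None if no train passed
    growth : assumed degradation increment per time step
    q, r   : process and measurement noise variances
    """
    # Predict: the degradation model advances the state; uncertainty grows.
    x_pred = x + growth
    p_pred = p + q
    if z is None:
        return x_pred, p_pred
    # Update: blend prediction and measurement by their relative confidence.
    k = p_pred / (p_pred + r)
    return x_pred + k * (z - x_pred), (1.0 - k) * p_pred
```

Calling the function with `z=None` for steps without data makes the variance accumulate by `q` each time, which gives a simple way to see how recording frequency affects the width of the prediction interval.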


[62] 2508.12001

FNH-TTS: Mixture-of-Experts Duration Modeling for Robust Neural Speech Synthesis

Current non-autoregressive (NAR) text-to-speech (TTS) systems still struggle to model diverse and speaker-dependent duration variation. We further observe that richer duration variation can increase the synthesis difficulty of existing HiFi-GAN-based vocoders, leading to spectral artifacts and unstable time-frequency structures. To address these issues, we propose FNH-TTS, a VITS-based end-to-end TTS system with Mixture-of-Experts duration modeling and robust vocoder-side synthesis. Specifically, we introduce a Mixture-of-Experts Duration Predictor (MoE-DP) to capture diverse phoneme duration patterns and speaker-dependent speaking-rate characteristics. To convert richer duration variation into stable waveform generation, we further integrate a VOCOS-style vocoder with Collaborative Multi-Band and Sub-Band Discriminators. Experiments on LJSpeech, VCTK, and LibriTTS show that FNH-TTS achieves improved synthesis quality, duration-category accuracy, vocoder reconstruction quality, and inference efficiency. Further analysis shows that MoE-DP is the main source of improved duration modeling, while stronger vocoder-side components are necessary for robust synthesis under richer duration variation.


[63] 2508.15151

Zero-shot CT Super-Resolution using Diffusion-based 2D Projection Priors and Signed 3D Gaussians

Computed tomography (CT) is important in clinical diagnosis, but acquiring high-resolution (HR) CT is constrained by radiation exposure risks. While deep learning-based super-resolution (SR) methods have shown promise for reconstructing HR CT from low-resolution (LR) inputs, supervised approaches require paired datasets that are often unavailable. Zero-shot methods address this limitation by operating on single LR inputs; however, they frequently fail to recover fine structural details due to limited LR information within individual volumes. To overcome these limitations, we propose a novel zero-shot 3D CT SR framework that integrates diffusion-based upsampled 2D projection priors into the 3D reconstruction process. Specifically, our framework consists of two stages: (1) LR CT projection SR, training a diffusion model on abundant X-ray data to upsample LR projections, thereby enhancing the scarce information inherent in the LR inputs. (2) 3D CT volume reconstruction, using 3D Gaussian splatting with our novel Negative Alpha Blending (NAB-GS), which models positive and negative Gaussian densities to learn signed residuals between diffusion-generated HR and upsampled LR projections. Our framework demonstrates superior quantitative and qualitative performance on two public datasets, and expert evaluations present the framework's clinical potential at 4x.


[64] 2509.00608

Realization of Precise Perforating Using Dynamic Threshold and Physical Plausibility Algorithm for Self-Locating Perforating in Oil and Gas Wells

Accurate depth measurement is critical for targeting designated perforation intervals to maximize hydrocarbon recovery. While next-generation automated wireless perforating techniques reduce reliance on costly surface infrastructure and personnel, they lack the continuous depth correlation provided by conventional wireline cables. Consequently, correlating real-time casing collar locator (CCL) signals with a pre-recorded casing tally is essential for automatic depth determination. However, implementing this measurement remains challenging: downhole instruments must process CCL signals in real time to identify collar signatures from complex interference, a task severely restricted by the limited computational resources and power budget of high-temperature downhole electronics. To address these constraints, this work proposes the Dynamic Threshold and Physical Plausibility Depth Measurement and Perforation Control (DTPPMP) system. This integrated solution enables in situ depth calibration by correlating CCL signals with the casing tally using lightweight algorithms for dynamic-threshold-based collar recognition and physical plausibility verification. Field tests demonstrate a collar recognition F1-score of 98.6% at a throughput of 1000 Sa/s. Notably, the algorithm requires only 1.5 $\mu$s per sample, confirming its computational efficiency and suitability for deployment on resource-constrained, high-temperature downhole platforms.


[65] 2509.13745

Theoretical Validation of the Latent Optimally Partitioned-$\ell_2/\ell_1$ Penalty with Application to Angular Power Spectrum Estimation

This paper demonstrates that, in both theory and practice, the latent optimally partitioned (LOP)-$\ell_2/\ell_1$ penalty is effective for exploiting block-sparsity without knowledge of the concrete block structure. More precisely, we first present a novel theoretical result showing that the optimized block partition in the LOP-$\ell_2/\ell_1$ penalty satisfies a condition required for accurate recovery of block-sparse signals. Motivated by this result, we present a new application of the LOP-$\ell_2/\ell_1$ penalty to estimation of angular power spectrum, which is block-sparse with unknown block partition, in MIMO communication systems. Numerical simulations show that the proposed use of block-sparsity with the LOP-$\ell_2/\ell_1$ penalty significantly improves the estimation accuracy of the angular power spectrum.


[66] 2509.19318

Scensory: Real-Time Robotic Olfactory Perception for Joint Identification and Source Localization

While robotic perception has advanced rapidly in vision and touch, enabling robots to reason about indoor fungal contamination from weak, diffusion-dominated chemical signals remains an open challenge. We introduce Scensory, a learning-based robotic olfaction framework that simultaneously identifies fungal species and localizes their source from short time series measured by affordable, cross-sensitive VOC sensor arrays. Temporal VOC dynamics encode both chemical and spatial signatures, which we decode through neural networks trained on robot-automated data collection with spatial supervision. Across five fungal species, Scensory achieves up to 89.85% species accuracy and 87.31% source localization accuracy under ambient conditions with 3-7 s sensor inputs. These results demonstrate real-time, spatially grounded perception from diffusion-dominated chemical signals, enabling scalable and low-cost source localization for robotic indoor environmental monitoring.


[67] 2510.21378

Optimized Power Control for Multi-User Integrated Sensing and Edge AI

This work investigates an integrated sensing and edge artificial intelligence (ISEA) system, where multiple devices first transmit probing signals for target sensing and then offload locally extracted features to the access point (AP) via analog over-the-air computation (AirComp) for collaborative inference. To characterize the relationship between AirComp error and inference performance, two proxies are established: the computation-optimal proxy, which minimizes the aggregation distortion, and the decision-optimal proxy, which maximizes the inter-class separability. Optimal transceiver designs in terms of closed-form power allocation are derived for both time-division multiplexing (TDM) and frequency-division multiplexing (FDM) settings, revealing threshold-based and dual-decomposition structures, respectively. Experimental results validate the theoretical findings.


[68] 2510.27663

Bayesian model selection and misspecification testing in imaging inverse problems only from noisy and partial measurements

Modern imaging techniques heavily rely on Bayesian statistical models to address difficult image reconstruction and restoration tasks. This paper addresses the objective evaluation of such models in settings where ground truth is unavailable, with a focus on model selection and misspecification diagnosis. Existing unsupervised model evaluation methods are often unsuitable for computational imaging due to their high computational cost and incompatibility with modern image priors defined implicitly via machine learning models. We herein propose a general methodology for unsupervised model selection and misspecification detection in Bayesian imaging sciences, based on a novel combination of Bayesian cross-validation and data fission, a randomized measurement splitting technique. The approach is compatible with any Bayesian imaging sampler, including diffusion and plug-and-play samplers. We demonstrate the methodology through experiments involving various scoring rules and types of model misspecification, where we achieve excellent selection and detection accuracy with a low computational cost.


[69] 2511.16627

TFCDiff: Robust ECG Denoising via Time-Frequency Complementary Diffusion

Ambulatory electrocardiogram (ECG) readings are prone to mixed noise from physical activities, including baseline wander (BW), muscle artifact (MA), and electrode motion artifact (EM). Developing a method to remove such complex noise and reconstruct high-fidelity signals is clinically valuable for diagnostic accuracy. However, denoising of multi-beat ECG segments remains understudied and poses technical challenges. To address this, we propose Time-Frequency Complementary Diffusion (TFCDiff), a novel approach that operates in the Discrete Cosine Transform (DCT) domain and uses the DCT coefficients of noisy signals as conditioning input. To refine waveform details, we incorporate Temporal Feature Enhancement Mechanism (TFEM) to reinforce temporal representations and preserve key physiological information. Comparative experiments on a synthesized dataset demonstrate that TFCDiff achieves state-of-the-art performance across five evaluation metrics. Furthermore, TFCDiff shows superior generalization on the unseen SimEMG Database, outperforming all benchmark models. Notably, TFCDiff processes raw 10-second sequences and maintains robustness under flexible random mixed noise (fRMN), enabling plug-and-play deployment in wearable ECG monitors for high-motion scenarios. Source code is available at this https URL.


[70] 2512.00309

Distributed Integrated Sensing and Edge AI Exploiting Prior Information

This paper investigates a distributed ISEA system under a Bayesian framework, focusing on incorporating task-relevant priors to maximize inference performance. At the sensing level, an RWB estimator with a GM prior is designed. By weighting class-conditional posterior means with responsibilities, RWB effectively denoises features and outperforms ML at low SNR. At the communication level, two theoretical proxies are introduced: the computation-optimal and decision-optimal proxies. Optimal transceiver designs in terms of closed-form power allocation are derived for both TDM and FDM settings, revealing threshold-based and dual-decomposition structures. Results show that the discriminant-aware allocation yields additional inference gains.


[71] 2601.00502

MIMO-AFDM Outperforms MIMO-OFDM in the Face of Hardware Impairments

The impact of both multiplicative and additive hardware impairments (HWIs) on multiple-input multiple-output affine frequency division multiplexing (MIMO-AFDM) systems is investigated. For small-scale MIMO-AFDM systems, a tight bit error rate (BER) upper bound associated with the maximum likelihood (ML) detector is derived. By contrast, for large-scale systems, a closed-form BER approximation associated with the linear minimum mean squared error (LMMSE) detector is presented, including realistic imperfect channel estimation scenarios. Our first key observation is that the full diversity order of a hardware-impaired AFDM system remains unaffected, which is a unique advantage. Furthermore, our analysis shows that 1) the derived BER results accurately predict the simulated ML performance at moderate-to-high signal-to-noise ratios (SNRs), while the theoretical BER curve of the LMMSE detector closely matches the Monte-Carlo simulated one; 2) MIMO-AFDM is more resilient to multiplicative distortions, such as phase noise and carrier frequency offset, than its orthogonal frequency division multiplexing (OFDM) counterparts, which is attributed to its inherent chirp signal characteristics; and 3) MIMO-AFDM consistently achieves superior BER performance compared to conventional MIMO-OFDM systems under the same additive HWI conditions, as well as at different velocity values. The latter is because MIMO-AFDM is also resilient to the additional inter-carrier interference (ICI) imposed by the nonlinear distortions of additive HWIs. In a nutshell, compared to OFDM, AFDM demonstrates stronger ICI resilience and achieves the full attainable diversity gain even under HWIs, thanks to its intrinsic chirp signalling structure as well as to the beneficial spreading effect of the discrete affine Fourier transform.


[72] 2601.14233

Burst Aware Forecasting of User Traffic Demand in LEO Satellite Networks

In Low Earth Orbit (LEO) satellite networks, Beam Hopping (BH) technology enables the efficient utilization of limited radio resources by adapting to varying user demands and link conditions. Effective BH planning requires prior knowledge of upcoming traffic at the time of scheduling, making forecasting an important sub-task. Forecasting becomes particularly critical under heavy load conditions where an unexpected demand burst combined with link degradation may cause buffer overflows and packet loss. To address this challenge, we propose a burst-aware forecasting solution. This challenge may arise in a wide range of wireless networks; therefore, the proposed solution is broadly applicable to settings characterized by bursty traffic patterns where accurate demand forecasting is essential. Our approach introduces three key enhancements to a transformer architecture: (i) a distance-from-the-last-burst embedding to capture burst proximity, (ii) two additional linear layers in the decoder to forecast both upcoming bursts and their relative impact, and (iii) use of an asymmetric cost function during model training to better capture burst dynamics. Empirical evaluations in an Earth-fixed cell under a high-traffic demand scenario demonstrate that the proposed model reduces prediction error by up to 94% at a one-step horizon and maintains the ability to accurately capture bursts even near the end of longer prediction horizons, as measured by the Mean Square Error (MSE) metric.
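One common way to realize an asymmetric cost function is to weight squared errors differently depending on their sign, so that under-predicting demand (the case that risks buffer overflow during a burst) costs more than over-predicting it. The sketch below illustrates that general idea only. The specific weighting scheme, the default weights, and the function name are assumptions for this example, since the abstract does not state the form of the cost used in the paper.

```python
from typing import Sequence


def asymmetric_squared_error(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    under_weight: float = 4.0,
    over_weight: float = 1.0,
) -> float:
    """Mean squared error with sign-dependent weights.

    Errors where the forecast falls below the true demand are scaled by
    under_weight; errors where it exceeds the demand by over_weight.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = t - p  # positive when demand was under-predicted
        weight = under_weight if err > 0 else over_weight
        total += weight * err * err
    return total / len(y_true)
```

With equal weights the function reduces to ordinary MSE, so the ratio `under_weight / over_weight` is the single knob that controls how strongly missed bursts are penalized in this sketch.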


[73] 2603.14644

LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol

Publicly available full-field digital mammography (FFDM) datasets remain limited in size, clinical annotations, and vendor diversity, hindering the development of robust models. We introduce LUMINA, a curated, multi-vendor FFDM dataset that explicitly encodes acquisition energy and vendor metadata to capture clinically relevant appearance variations often overlooked in existing benchmarks. This dataset contains 1824 images from 468 patients (960 benign, 864 malignant), with pathology-confirmed labels, BI-RADS assessments, and breast-density annotations. LUMINA spans six acquisition systems and includes both high- and low-energy imaging styles, enabling systematic analysis of vendor- and energy-induced domain shifts. To address these variations, we propose a foreground-only pixel-space alignment method ("energy harmonization") that maps images to a low-energy reference while preserving lesion morphology. We benchmark CNN and transformer models on three clinically relevant tasks: diagnosis (benign vs. malignant), BI-RADS classification, and density estimation. Two-view models consistently outperform single-view models. EfficientNet-B0 achieves an AUC of 93.54% for diagnosis, while Swin-T achieves the best macro-AUC of 89.43% for density prediction. Harmonization improves performance across architectures and produces more localized Grad-CAM responses. Overall, LUMINA provides (1) a vendor-diverse benchmark and (2) a model-agnostic harmonization framework for reliable and deployable mammography AI.


[74] 2604.02813

Multi-Band Patch Antenna Array for Out-of-Band Aided Millimeter Wave Communication

Future wireless communication systems will integrate both sub-6 GHz and millimeter wave (mmWave) frequency bands within multi-antenna architectures to meet the increasing demand for high data rates. In such multi-band systems, reliable information obtained from the sub-6 GHz band can be exploited to support communication at mmWave frequencies. To ensure that both systems experience similar multi-path propagation effects, the sub-6 GHz and mmWave antenna arrays have to be co-located and precisely aligned. However, such a configuration may adversely alter the radiation characteristics of the arrays, potentially degrading their performance. In this paper, we investigate the impact of positioning a mmWave antenna structure in front of a sub-6 GHz antenna structure. Through both simulations and measurements, we evaluate how the presence of the mmWave structure affects the radiation pattern of the sub-6 GHz one. The results demonstrate that the influence of the mmWave structure on the sub-6 GHz performance is minor, indicating that co-located configurations are feasible with negligible degradation.


[75] 2604.09136

Frequency Quality Metrics based on Second-Order Derivative and Autocorrelation

This industry-oriented paper originates from the observation that current frequency quality metrics utilized by transmission system operators (TSOs) fail to fully capture the dynamic behavior of the grid frequency. Motivated by this gap, the paper proposes novel frequency quality metrics based on second-order dynamics and stochastic autocorrelation. Using real-world data with 0.1 s and 1 s resolution from the Irish, Great Britain and Nordic systems and running dynamic stochastic simulations, the paper shows that the proposed metrics bring new and counterintuitive insights into how good or poor the frequency quality of power grids is, beyond what current well-known metrics reveal. In particular, the paper shows that a power system may show good frequency quality using standard metrics and poor frequency quality using the proposed metrics. Overall, the paper contributes to improving the understanding of frequency quality.
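
The abstract does not define the proposed metrics, so the following is only a sketch of the two generic building blocks the title names: a finite-difference estimate of the second derivative of a sampled frequency trace, and its sample autocorrelation. The function names and the choice of a central difference and a biased estimator are assumptions for illustration.

```python
def second_derivative(freq, dt):
    """Central finite-difference estimate of d^2 f / dt^2 at interior samples."""
    return [
        (freq[i + 1] - 2.0 * freq[i] + freq[i - 1]) / (dt * dt)
        for i in range(1, len(freq) - 1)
    ]


def autocorrelation(x, lag):
    """Sample autocorrelation of x at a non-negative lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var
```

For a trace sampled at 0.1 s, `dt` would be `0.1`; a trace that looks tightly bounded around nominal can still show large second differences, which is the kind of contrast with standard metrics the abstract describes.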


[76] 2604.17176

Intent-aligned Autonomous Spacecraft Guidance via Reasoning Models

Future spacecraft operations require autonomy that can interpret high-level mission intent while preserving safety. However, existing trajectory optimization still relies heavily on expert-crafted formulations and does not support intent-conditioned decision-making. This paper proposes an intent-aligned spacecraft guidance framework that links high-level reasoning and safe trajectory optimization through explicit intermediate abstractions, based on behavior sequences and waypoint constraints. A foundation model first predicts an intent-aligned behavior plan, a waypoint generation model then converts it into waypoint constraints, and the safe trajectory is computed via optimization. This decomposition enables scalable supervision without sacrificing safety. Numerical experiments in close-proximity operation scenarios demonstrate that the proposed pipeline achieves over 90% SCP convergence and yields a $1.5\times$ higher rate of generating trajectories that satisfy the top intent-prioritized performance criteria than heuristic decision-making. These results support the use of intermediate behavior abstraction as a practical interface between foundation-model reasoning and safety-critical onboard spacecraft autonomy.


[77] 2604.23354

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the Explainable AI (XAI) domain. This paper proposes to study an XAI topic: uncovering the unknown organisation in the representations, particularly those a speaker recognition network learns from utterances, for recognising speaker identity. Past studies have employed algorithms (e.g. K-means) to analyse how network representations can be naturally organised into independent clusters in different ways, i.e., to analyse flat clustering phenomena within the space defined by these representations, referred to as the network representation space. In contrast, this work applies two algorithms, Single-Linkage Clustering (SLINK) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), to analyse how representations form hierarchical clusters in different ways, i.e., to analyse hierarchical clustering phenomena within the network representation space. To further understand these hierarchical clustering phenomena, we propose a new algorithm termed Hierarchical Cluster-Class Matching (HCCM). HCCM provides a semantic interpretation for the hierarchical clusters produced by SLINK and HDBSCAN by matching them to predefined semantic classes. Through this process, some clusters are interpreted as individual semantic classes (e.g. male), whereas others are interpreted as conjunctions of individual semantic classes (e.g. female and Ireland). In addition, we develop a new metric, the Liebig score, to quantify how well a cluster matches a semantic class, which helps identify the factor that most strongly limits each match.


[78] 2605.00898

A Deep Learning Model for Battery State Prediction towards Intelligent Energy Management

Accurate forecasting of battery health indicators, including remaining capacity and lifetime, is of paramount importance for ensuring the reliability, safety, and operational efficiency of applications such as electric vehicles and large-scale energy storage infrastructures. The forecasting results can be used to build an advanced monitoring mechanism that continuously checks battery health status and assists in the efficient real-time management of numerous applications. This research investigates the development and implementation of a Deep Learning (DL) model for the prediction of the future state and performance of industrial electrochemical energy storage systems. To address this challenge, we propose a dedicated computational framework that integrates advanced neural network architectures with large-scale training datasets, enabling precise modeling of battery degradation dynamics and operational trends. The proposed approach provides a decision support mechanism for the optimal management of batteries, facilitating both predictive maintenance and the efficient allocation of energy resources. Our findings highlight the potential of DL-based predictive modeling to significantly contribute to the advancement of sustainable and intelligent energy management systems.


[79] 2605.01395

Quasi-Static Control of Discrete Cosserat Rod

In this paper, we design feedback control laws for soft robots modelled using the Cosserat rod, which is spatially discretised using the Piecewise Constant Strain (PCS) approach. The PCS approach transforms the nonlinear PDEs describing the Cosserat rod into a system of nonlinear ODEs. This simplification results in a soft-robot model that is similar to that of serial rigid-link manipulators. We design feedback control laws for the quasi-static PCS model by using the external wrenches as control input. The control laws are designed based on state-feedback linearisation in strain and task spaces. An extensive set of numerical results demonstrates the performance of the control laws for end-effector trajectory tracking and shape control of soft robots.


[80] 2605.05105

Minimizing the Expected Cost of Synchronization in Lossless Power Networks

The reliable operation of large-scale electric power networks is increasingly challenging, particularly with the integration of stochastic renewable generation. In this work, we address the problem of minimizing network transients by optimally modifying the underlying network. We formulate the problem in terms of graph Laplacian matrices and show that, under certain assumptions, the problem is convex. We derive a linear matrix inequality whose feasibility guarantees the existence and uniqueness of phase cohesive steady-state angles; this condition can be directly incorporated as a convex constraint in the optimization framework. We also provide several geometric interpretations of the optimization problem. The proposed method is validated on the IEEE 30-bus test system, where results demonstrate that our approach effectively identifies critical links in the network. Dynamic simulations show a significant reduction in network transients and overall improvements across several performance metrics. We explore the sparsity-optimality trade-off using a reweighted $\ell_1$ heuristic.


[81] 2605.05154

CTseg: A Tool for Brain CT Segmentation, Spatial Normalisation, and Volumetrics

This paper presents and validates CTseg, a freely available software tool for brain CT segmentation, spatial normalisation, and volumetrics. CTseg builds on the Multi-Brain generative modelling framework, providing a CT-specific pipeline that produces tissue maps, deformation fields, and brain volume estimates in the same format as SPM's unified segmentation, thereby extending SPM's established analysis chain from MRI to CT. CTseg is designed for routine hospital CT scans without requiring preprocessing or resampling in deployment. Although CTseg has been adopted in clinical research spanning, among other things, stroke, dementia, and brain morphometry, a systematic validation against an independent reference standard has been lacking. Using paired MR/CT head scans, we evaluate CTseg across four dimensions: segmentation accuracy against an MRI-derived silver standard; spatial normalisation consistency through group-average sharpness and voxelwise coefficient of variation; brain volume agreement via intraclass correlation and Bland-Altman analysis; and downstream sex classification performance from normalised tissue maps. As a baseline, we apply SPM's MRI-based unified segmentation directly to the CT images. CTseg significantly outperformed this baseline for segmentation and normalisation, showed stronger TBV agreement, and achieved comparable TIV agreement. CTseg is freely available at this https URL, and all experiment code is included in the repository for full reproducibility.
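
For readers unfamiliar with Bland-Altman analysis, the sketch below shows the standard computation of the bias (mean difference) and the 95% limits of agreement (bias plus or minus 1.96 sample standard deviations of the differences) for paired measurements. It is a generic illustration, not code from the CTseg repository; the function name `bland_altman` and the idea of passing CT-derived and MRI-derived volumes as the two lists are assumptions.

```python
import statistics


def bland_altman(a, b):
    """Return (bias, lower_limit, upper_limit) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A bias near zero with narrow limits indicates that the two methods can be used interchangeably for volumetrics; a nonzero bias indicates a systematic offset between them.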


[82] 2605.26255

Prospective evaluation of multimodal respiratory failure prediction: Do chest X-rays improve performance beyond EHR signals?

Early prediction of respiratory failure is critical for timely clinical intervention in intensive care units. Existing electronic health record (EHR)-based models can continuously monitor physiologic deterioration, but they may not fully capture pulmonary pathophysiology reflected in chest radiographs (CXRs). In this study, we ask whether CXR information improves prospective prediction of invasive mechanical ventilation beyond EHR signals alone. We develop a gated multimodal framework that integrates structured EHR time-series data with CXR foundation-model representations. The gating module adaptively controls the contribution of imaging features based on patient-specific clinical context, allowing the model to selectively rely on imaging information when it is informative. We prospectively evaluate the framework for predicting invasive mechanical ventilation within 24 hours in ICU patients and compare it with an established EHR-only model (Ventio), physician predictions obtained at matched clinical time points, and alternative multimodal variants. The gated multimodal models achieved higher discrimination than the EHR-only baseline, with AUROC values of 0.860 and 0.858 using REMEDIS and MedInsight CXR representations, respectively, compared with 0.752 for Ventio. Relative to physician predictions, the multimodal framework substantially improved sensitivity while maintaining favorable specificity. Compared with the EHR-only model, multimodal integration increased specificity and positive predictive value, suggesting that CXR information can refine risk estimation in selected patients. These findings support adaptive multimodal fusion as a practical strategy for incorporating imaging into prospective respiratory failure prediction.


[83] 2605.28569

Actor-Identifier-Critic Reinforcement Learning for Adaptive Model-Free Optimal Control of Nonlinear Systems with Stochastic Packet Dropouts

Packet dropouts in control systems pose a critical challenge, as they can significantly compromise system performance and stability. In these conditions, classical controllers often struggle to deliver effective control, as they rely on accurate system models, which may not always be available. This paper proposes a novel Actor-Identifier-Critic (AIC) controller to address model-free tracking control of nonlinear systems in the presence of packet dropouts in both the controller-to-actuator and sensor-to-controller channels. Using an identifier to learn the system dynamics, the proposed controller is able to handle packet dropouts in the communication link and facilitate gradient propagation from the critic to the actor within a model-free control framework. The performance of the proposed method is demonstrated on two nonlinear SIMO and MIMO systems and a case study on power system stability subject to stochastic packet dropouts.


[84] 2306.10356

MATNet: Multi-Level Fusion Transformer-Based Model for Day-Ahead PV Generation Forecasting

Accurate forecasting of renewable generation is crucial to facilitate the integration of Renewable Energy Sources into the power system. Focusing on photovoltaic (PV) units, forecasting methods can be divided into two main categories: physics-based and data-based strategies, with Artificial Intelligence (AI)-based models providing state-of-the-art performance. However, while these AI-based models can capture complex patterns and relationships in the data, they ignore the underlying physical prior knowledge of the phenomenon. Therefore, in this paper, we propose MATNet, a novel transformer-based multimodal architecture for multi-step day-ahead PV power generation forecasting. The model is fed with historical PV data and historical and forecast weather data through a multi-level joint fusion approach, employing a soft-attention mechanism at multiple fusion stages. We evaluate the effectiveness of MATNet on the Ausgrid benchmark dataset, where it significantly outperforms various baseline models, achieving an RMSE of 0.0445, corresponding to a relative improvement of approximately 65% compared to the best-performing baseline method. The analysis is further enriched by a comprehensive set of ablation studies, a sensitivity analysis on missing data, which highlights MATNet's resilience to input degradation, a cross-site zero-shot generalization evaluation on five external PV datasets, demonstrating MATNet's robustness under significant domain shifts, and an assessment of the model's computational complexity, confirming its favorable balance between predictive accuracy and computational efficiency. These results highlight MATNet's potential as a reliable and efficient solution to facilitate the integration of PV energy into the power grid. The code is available at this https URL.


[85] 2401.08197

Matrix Completion with Hypergraphs: Sharp Thresholds and Efficient Algorithms

This paper considers the problem of completing a rating matrix based on sub-sampled matrix entries as well as observed social graphs and hypergraphs. We show that there exists a sharp threshold on the sample probability for the task of exactly completing the rating matrix -- the task is achievable when the sample probability is above the threshold, and is impossible otherwise -- demonstrating a phase transition phenomenon. The threshold can be expressed as a function of the "quality" of hypergraphs, enabling us to quantify the amount of reduction in sample probability due to the exploitation of hypergraphs. This also highlights the usefulness of hypergraphs in the matrix completion problem. En route to discovering the sharp threshold, we develop a computationally efficient matrix completion algorithm that effectively exploits the observed graphs and hypergraphs. Theoretical analyses show that our algorithm succeeds with high probability as long as the sample probability exceeds the aforementioned threshold, and this theoretical result is further validated by synthetic experiments. Moreover, our experiments on a real social network dataset (with both graphs and hypergraphs) show that our algorithm outperforms other state-of-the-art matrix completion algorithms.


[86] 2502.20838

Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data

Passive acoustic monitoring (PAM) systems generate continuous recordings spanning months, yet automated bioacoustic analysis of whale calls requires two separate annotation efforts: binary presence labels for classification and precise temporal boundaries for localization. A binary label for a multi-minute recording can be assigned in seconds, but timestamping every call within it requires hours of expert effort. Providing both is infeasible at operational scale. We present DSMIL-LocNet, a weakly supervised multiple instance learning (MIL) framework that performs both classification and temporal localization using only recording-level presence/absence labels. Our dual-stream architecture integrates spectral and temporal features to process recordings of 2--30 minutes without the temporal compression that degrades existing CNN methods on long inputs. On the AcousticTrends BlueFinLibrary, DSMIL-LocNet achieves F1 scores of 0.88--0.91 on recordings of 300--1800 s, where fully supervised CNN baselines degrade to 0.19--0.64. It also provides temporal localization that these baselines cannot produce without frame-level annotation. Code: this https URL


[87] 2504.12512

Practical Insights on Grasp Strategies for Mobile Manipulation in the Wild

Mobile manipulation robots are continuously advancing, with their grasping capabilities rapidly progressing. However, there are still significant gaps preventing state-of-the-art mobile manipulators from widespread real-world deployments, including their ability to reliably grasp items in unstructured environments. To help bridge this gap, we developed SHOPPER, a mobile manipulation robot platform designed to push the boundaries of reliable and generalizable grasp strategies. We develop these grasp strategies and deploy them in a real-world grocery store -- an exceptionally challenging setting chosen for its vast diversity of manipulable items, fixtures, and layouts. In this work, we present our detailed approach to designing general grasp strategies towards picking any item in a real grocery store. Additionally, we provide an in-depth analysis of our latest real-world field test, discussing key findings related to fundamental failure modes over hundreds of distinct pick attempts. Through our detailed analysis, we aim to offer valuable practical insights and identify key grasping challenges, which can guide the robotics community towards pressing open problems in the field.


[88] 2505.10975

Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio

Monaural multi-speaker automatic speech recognition (ASR) remains challenging due to data scarcity and the intrinsic difficulty of recognizing and attributing words to individual speakers, particularly in overlapping speech. Recent advances have driven the shift from cascade systems to end-to-end (E2E) architectures, which reduce error propagation and better exploit the synergy between speech content and speaker identity. Despite rapid progress in E2E multi-speaker ASR, the field lacks a comprehensive review of recent developments. This survey provides a systematic taxonomy of E2E neural approaches for multi-speaker ASR, highlighting recent advances and comparative analysis. Specifically, we analyze: (1) architectural paradigms (SIMO vs. SISO) for pre-segmented audio, including their distinct characteristics and trade-offs; (2) recent architectural and algorithmic improvements based on these two paradigms; (3) extensions to long-form speech, including segmentation strategy and speaker-consistent hypothesis stitching. Further, we (4) evaluate and compare methods across standard benchmarks. We conclude with a discussion of open challenges and future research directions towards building robust and scalable multi-speaker ASR.


[89] 2507.23270

Simulation-based planning of Motion Sequences for Automated Procedure Optimization in Multi-Robot Assembly Cells

Reconfigurable multi-robot cells offer a promising approach to meet fluctuating assembly demands. However, the recurrent planning of their configurations introduces new challenges, particularly in generating optimized, coordinated multi-robot motion sequences that minimize the assembly duration. This work presents a simulation-based method for generating such optimized sequences. The approach separates assembly steps into task-related core operations and connecting traverse operations. While core operations are constrained and predetermined, traverse operations offer substantial optimization potential. Scheduling the core operations is formulated as an optimization problem, requiring feasible traverse operations to be integrated using a decomposition-based motion planning strategy. Several solution techniques are explored, including a sampling heuristic, tree-based search and gradient-free optimization. For motion planning, a decomposition method is proposed that identifies specific areas in the schedule, which can be solved independently with modified centralized path planning algorithms. The proposed method generates efficient and collision-free multi-robot assembly procedures that outperform a baseline relying on decentralized, robot-individual motion planning. Its effectiveness is demonstrated through simulation experiments.


[90] 2508.12176

Scalable RF Simulation in Generative 4D Worlds

Radio Frequency (RF) sensing has emerged as a powerful, privacy-preserving alternative to vision-based methods for various perception tasks. However, building high-quality RF datasets in dynamic and diverse environments remains a major challenge. To address this, we introduce WaveVerse, a prompt-based, scalable framework that simulates realistic RF signals from generated indoor scenes with human motions guided by spatial paths, enabling diverse and feasible behaviors without manual trajectory design. WaveVerse features a language-guided 4D world generator and a physics-based signal simulator that enables realistic simulation of RF signals in diverse environments. It employs a phase-coherent ray tracer that preserves both spatial and temporal phase consistency. The simulated signals show high fidelity on phase-sensitive benchmarks, and closely align with both real-world collected measurements and simulations from a proprietary electromagnetic solver. When used for data augmentation, WaveVerse consistently improves performance in downstream tasks like RF imaging and human activity recognition, with gains that grow with the amount of simulated data and surpass existing methods. Code and additional materials are available on the webpage.


[91] 2509.15629

An Extensive Analysis of the Singing Voice Conversion Challenge 2025 Evaluation Results

We present a thorough analysis of the findings of the latest iteration of the Singing Voice Conversion Challenge, a scientific event aiming to compare and understand different voice conversion systems in a controlled environment. Compared to previous iterations, which focused solely on converting the singer identity, this year we also focused on converting the singing style of the singer. To create a controlled environment and thorough evaluations, we developed a new challenge database, introduced two tasks, open-sourced baselines, and conducted large-scale crowd-sourced listening tests and objective evaluations. The challenge ran for two months, and in total we evaluated 33 different systems. The results of the large-scale crowd-sourced listening test showed that top systems had comparable singer identity scores to ground truth samples. However, modeling the singing style and consequently achieving high naturalness still remains a challenge in this task, primarily due to the difficulty in modeling dynamic information in breathy, glissando, and vibrato singing styles. Further analyses of the challenge also discuss the limitations of both the traditional similarity test and the dynamic preference test in evaluating singing style similarity. Moreover, calculating Spearman's rank correlation coefficient shows that dependent objective metrics such as chroma-alignment and non-match metrics such as speaker embeddings are the most correlated with subjective scores, but are still not at a level where they could be considered a true replacement for subjective scores.
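
To make the last point concrete, Spearman's rank correlation between an objective metric and subjective scores is the Pearson correlation of their ranks, with tied values sharing an average rank. The sketch below is a generic stdlib implementation, not the challenge's evaluation code; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)
```

Because only ranks matter, a metric can score a high rho by ordering systems correctly even if its absolute values are poorly calibrated against listener ratings.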


[92] 2510.15340

Singularity-free dynamical invariants-based quantum control

State preparation is a cornerstone of quantum technologies, underpinning applications in computation, communication, and sensing. Its importance becomes even more pronounced in non-Markovian open quantum systems, where environmental memory and model uncertainties pose significant challenges to achieving high-fidelity control. Invariant-based inverse engineering provides a principled framework for synthesizing analytic control fields, yet existing parameterizations often lead to experimentally infeasible, singular pulses and are limited to simplified noise models such as those of Lindblad form. Here, we introduce a generalized invariant-based protocol for finite-dimensional state preparation under arbitrary noise conditions. We transform the finite-dimensional control problem into the equivalent problem for a single qubit by restricting the dynamics to a designed SU(2) subspace. The control protocol then proceeds in two stages: first, we construct a family of bounded pulses that achieve perfect state preparation in a closed system; second, we identify the optimal member of this family that minimizes the effect of noise. The framework accommodates both (i) characterized noise, enabling noise-aware control synthesis, and (ii) uncharacterized noise, where a noise-agnostic variant preserves robustness without requiring a master-equation description. Numerical simulations demonstrate high-fidelity state preparation across diverse targets while producing smooth, hardware-feasible control fields. This singularity-free framework extends invariant-based control to realistic open-system regimes, providing a versatile route toward robust quantum state engineering on NISQ hardware and other platforms exhibiting non-Markovian dynamics.


[93] 2601.10912

Graph Neural Network Reveals the Cortical Morphology of Local Brain Aging in Normal Cognition and Alzheimer's Disease

Estimating brain age (BA) from T1-weighted magnetic resonance images (MRIs) provides a powerful framework for quantifying anatomical brain aging. Whereas global BA (GBA) summarizes overall brain health, local BA (LBA) provides cortically specific patterns of aging at the subject level. Although previous studies have examined anatomical contributors to GBA, to our knowledge, no framework has been established to estimate LBA using cortical morphology. To address this gap, we introduce a graph neural network (GNN) that uses morphometric features -- cortical thickness, surface area, curvature, gray/white matter intensity ratio (GWR), sulcal depth -- to estimate LBA across the cortical surface at high spatial resolution (mean inter-vertex distance = 1.37 mm). Trained on cortical surface meshes extracted from the MRIs of cognitively normal (CN) adults (N = 14,423), our model achieves lower mean absolute error (MAE) than the existing state-of-the-art while identifying more biologically plausible patterns of aging in Alzheimer's disease (AD) on the ADNI dataset. Association cortices emerge as primary sites of morphometric aging in CNs, whereas mild cognitive impairment is characterized by widespread aging that is pronounced in the parahippocampal gyrus. AD subjects demonstrate significant aging across the entire cortex, particularly within medial temporal regions and associated cortical networks. Feature ablation highlights curvature and GWR as preferentially sensitive to AD pathology. Regional LBA gaps are significantly associated with neuropsychological measures of AD-related cognitive impairment, linking cortical aging patterns to clinical outcomes. These results demonstrate that GNN-based modeling of cortical morphometry enables biologically interpretable mapping of local brain aging with greater interpretability than prior work.


[94] 2601.12699

Bandit Algorithms for Deep Brain Stimulation

Deep Brain Stimulation (DBS) is an effective treatment for Parkinson's disease, but conventional fixed-parameter stimulation can reduce battery life and cause side effects while failing to adapt to changing neural dynamics. Recent reinforcement learning approaches improve adaptability, yet most rely on deep neural networks that require offline training and are computationally too expensive for implantable hardware. This paper presents a resource-conscious adaptive DBS framework based on a Time- and Threshold-Triggered Pruned Multi-Armed Bandit (T3P MAB) algorithm. The proposed method jointly tunes stimulation frequency and amplitude, avoids prior training, and remains transparent enough to support clinician-guided adjustment. Using a computational basal ganglia-thalamic model, we show that T3P converges faster than competing MAB methods and outperforms deep-RL baselines in suppressing pathological beta-band activity while reducing stimulation power. We implemented it on different microcontrollers and report detailed energy measurements, showing convergence in under two minutes and suitability for resource-constrained implantable systems. These results support lightweight bandit-based control as a practical path toward personalized, energy-efficient DBS.


[95] 2602.00324

Dual Quaternion SE(3) Synchronization with Recovery Guarantees

Synchronization over the special Euclidean group SE(3) aims to recover absolute poses from noisy pairwise relative transformations and is a core primitive in robotics and 3D vision. Standard approaches often require multi-step heuristic procedures to recover valid poses, which are difficult to analyze and typically lack theoretical guarantees. This paper adopts a dual quaternion representation and formulates SE(3) synchronization directly over unit dual quaternions. A two-stage algorithm is developed: a spectral initializer computed via the power method on a Hermitian dual quaternion measurement matrix, followed by a dual quaternion generalized power method (DQGPM) that enforces feasibility through per-iteration projection. Estimation error bounds are established for the spectral estimators, and DQGPM is shown to admit a finite-iteration error bound and to achieve linear error contraction up to an explicit noise-dependent threshold. Experiments on synthetic benchmarks and real-world multi-scan point-set registration demonstrate that the proposed pipeline improves both accuracy and efficiency over representative matrix-based methods.
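
The spectral initializer relies on the power method. As a simplified illustration only, the sketch below runs the classical power method on an ordinary real symmetric matrix; it does not implement dual quaternion arithmetic or the DQGPM projection step, and the function names are hypothetical.

```python
import math


def mat_vec(matrix, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]


def power_method(matrix, iterations=200):
    """Estimate the dominant eigenpair of a real symmetric matrix."""
    n = len(matrix)
    # Uniform start vector; it must not be orthogonal to the dominant eigenvector.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("iterate collapsed to zero; choose another start vector")
        v = [c / norm for c in w]  # renormalize to keep the iterate on the unit sphere
    # For a unit vector v, the Rayleigh quotient v^T A v estimates the eigenvalue.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v
```

The renormalization in each iteration is the simplest case of the multiply-then-map-back-to-the-feasible-set pattern; in the paper's setting the feasible set is the unit dual quaternions rather than the unit sphere.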


[96] 2602.12304

OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model

Existing mainstream video customization methods focus on generating identity-consistent videos based on given reference images and textual prompts. Benefiting from the rapid advancement of joint audio-video generation, this paper proposes a more compelling new task: sync audio-video customization, which aims to synchronously customize both video identity and audio timbre. Specifically, given a reference image $I^{r}$ and a reference audio $A^{r}$, this novel task requires generating videos that maintain the identity of the reference image while imitating the timbre of the reference audio, with spoken content freely specifiable through user-provided textual prompts. To this end, we propose OmniCustom, a powerful DiT-based audio-video customization framework that can synthesize a video following reference image identity, audio timbre, and text prompts all at once in a zero-shot manner. Our framework is built on three key contributions. First, identity and audio timbre control are achieved through separate reference identity and audio LoRA modules that operate through self-attention layers within the base audio-video generation model. Second, we introduce a contrastive learning objective alongside the standard flow matching objective. It uses predicted flows conditioned on reference inputs as positive examples and those without reference conditions as negative examples, thereby enhancing the model's ability to preserve identity and timbre. Third, we train OmniCustom on our constructed large-scale, high-quality audio-visual human dataset. Extensive experiments demonstrate that OmniCustom outperforms existing methods in generating audio-video content with consistent identity and timbre fidelity. Project page: this https URL.


[97] 2603.00357

SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs

In large-scale LLM pre-training systems with 100k+ GPUs, failures become the norm rather than the exception, and restart costs can dominate wall-clock training time. However, existing fault-tolerance mechanisms are largely unprepared for this restart-dominant regime. To address this challenge, we propose SPARe (Stacked Parallelism with Adaptive Reordering), a fault-tolerance framework that masks node failures during gradient synchronization by stacking redundant data shards across parallelism groups and adaptively reordering execution. SPARe achieves availability comparable to traditional replication while maintaining a near-constant computation overhead of only 2-3x, even under high redundancy where traditional replication would incur linearly growing overhead. We derive closed-form expressions for endurable failure count and computation overhead, validate them via SimGrid-based discrete-event simulation, and jointly optimize redundancy and checkpointing to minimize time-to-train. At extreme scale with up to 600k GPUs, SPARe reduces time-to-train by 40-50% compared to traditional replication.


[98] 2603.06951

Space-Control: Process-Level Isolation for Sharing CXL-based Disaggregated Memory

Memory disaggregation via CXL enables multi-host resource sharing. However, existing CXL sharing mechanisms enforce coarse-grained, host-level permissions only, leaving isolation to the operating system. Today, virtual memory enables process-level isolation on a host and CXL enables host-level isolation. This creates a critical security gap: the absence of process-level memory isolation in shared disaggregated memory. We present Space-Control, an architectural abstraction that introduces a cross-host identity primitive to enforce confidentiality and integrity. We decouple authorization from the untrusted OS using a hardware-rooted validation engine (SPACE) to establish immutable process identity and a Permission Checker at the memory egress point for fine-grained permission validation. Our design supports 127 concurrent processes across 255 hosts with only 1.56% storage overhead. Cycle-level evaluation using gem5 + SST shows that Space-Control incurs a minimal 3.3% performance penalty with a modest 16 KiB cache, providing a practical and scalable foundation for secure, process-level memory disaggregation.


[99] 2605.20560

Reconfigurable Coupler Antenna for Wireless Networks

The reconfigurable coupler antenna (RCA), also called the flexible coupler antenna (FCA), is a new technique that aims to improve the performance of wireless communication networks by reconfiguring the positions and rotations of low-cost couplers around fixed-position active antennas to harness mutual coupling. Specifically, different couplers can independently adjust their positions and/or rotations at the transceiver to reshape the induced currents on the couplers for radiation, thereby collaboratively achieving mechanical beamforming for directional signal enhancement or nulling. The position and/or rotation reconfiguration of passive couplers provides a new and cost-effective means of enhancing wireless communication performance, while significantly reducing the antenna and radio-frequency (RF) chain costs of conventional active arrays. The compact and low form-factor structure of the RCA makes it particularly appealing for devices with stringent size, weight, and power (SWAP) constraints. In this article, we provide an overview of RCA to reveal its promising capabilities in wireless networks, including its system modeling, practical implementation, and competitive advantages over existing techniques. We present a variety of RCA-enabled performance enhancements in terms of mechanical beamforming gain, path-loss reduction, fading mitigation, spatial multiplexing gain, interference suppression, and geometric gain. Furthermore, we elaborate on the design challenges of RCA as well as promising solutions, and discuss the key applications of RCA in wireless networks. Finally, numerical results are presented to verify the substantial capacity gains enabled by RCA-aided transmission in wireless networks.


[100] 2605.27755

A Vertical Look at UAV Connectivity in the Wild: Cellular vs. Starlink, 3D Characterization, and Performance Prediction

In this paper, we present an open-source measurement platform designed to characterize the performance of commercial cellular (Verizon, a major US provider) and LEO satellite (Starlink) networks through real-world flight tests in rural environments. We implement a comprehensive multi-layer measurement approach spanning physical layer signal metrics, multi-cell network topology, and end-to-end (E2E) application performance. Through an extensive flight campaign with more than $10$ flight tests and $4.5$+ hours of flight time, resulting in more than $18$K samples, we present the first detailed, open-source dataset analyzing dual cellular and Starlink performance for low-altitude UAV operations. Our cellular-Starlink comparative results, which are collected \emph{simultaneously at the same location}, demonstrate significant performance differences between the two technologies: the LEO satellite link achieves superior latency performance with $95\%$ of Round-Trip Time (RTT) measurements below $50$ ms compared to $80\%$ under $150$ ms for cellular, and exceptional downlink capacity with $95\%$ exceeding $25$ Mbps versus only $5$ Mbps for cellular. Our analysis of cellular network performance demonstrates that while higher altitudes (e.g., $330+$ m above sea level) improve signal power by $15$-$20$ dB via line-of-sight (LOS) propagation, they cause a $3$-$4\times$ increase in handover rates, which is due to excessive multi-cell visibility rather than signal degradation. Furthermore, we observe asymmetric impacts of handovers on RTT performance: $53.5\%$ of handovers improve RTT, but the worst-case degradation ($275$ ms) is $2\times$ larger than the best-case improvement ($137$ ms).