New articles on Electrical Engineering and Systems Science


[1] 2606.28431

A Zero-Shot Deep Image Prior Framework for Denoising and Deconvolution in Fluorescence Microscopy

Fluorescence microscopy images are degraded by noise and diffraction-induced blur, which compromise structural fidelity and limit quantitative analysis. Supervised deep learning methods achieve impressive restoration performance but require large-scale paired datasets that are difficult to obtain in practice. To address this issue, we propose SDIP, a zero-shot deep image prior (DIP) framework that sequentially performs denoising and deconvolution without external training data. An aSeqDIP-based module first suppresses noise while preserving fine structures through sequential autoencoding regularization. In the deconvolution stage, a wavelet-based background correction step is incorporated before the proposed RLG-DIP module performs artifact-reduced deconvolution. RLG-DIP uses the Richardson-Lucy deconvolution result as a physically consistent guidance prior, integrating the imaging model with the implicit prior of DIP to stabilize the ill-posed deconvolution process. Experiments on the BioSR dataset across multiple cellular structures demonstrate that SDIP improves both signal-to-noise ratio and resolution, achieving superior visual quality and improved quantitative performance on most evaluated structures. The proposed framework may also provide useful insights for designing physically guided DIP methods for other inverse problems.
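
The Richardson-Lucy step that RLG-DIP takes as its guidance prior can be illustrated in isolation. The following is a minimal 1-D sketch of the classical multiplicative Richardson-Lucy update only, not the SDIP or RLG-DIP implementation; the 5-tap kernel, the circular boundary handling, the toy two-emitter signal, and all function names are assumptions made for illustration.

```python
def circ_conv(signal, kernel):
    """Circular convolution of `signal` with an odd-length, centred `kernel`."""
    n, half = len(signal), len(kernel) // 2
    return [
        sum(kernel[j] * signal[(i - (j - half)) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def richardson_lucy(observed, kernel, n_iter=100):
    """Classical Richardson-Lucy iteration for a known, normalized blur kernel."""
    flipped = kernel[::-1]  # convolving with the flipped kernel applies the adjoint blur
    # Flat, strictly positive initial estimate carrying the observed total flux.
    estimate = [sum(observed) / len(observed)] * len(observed)
    for _ in range(n_iter):
        blurred = circ_conv(estimate, kernel)
        # Ratio of data to the current re-blurred estimate (guarded against /0).
        ratio = [o / max(b, 1e-12) for o, b in zip(observed, blurred)]
        correction = circ_conv(ratio, flipped)
        # Multiplicative update keeps the estimate non-negative.
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Hypothetical toy scene: two point emitters on a weak uniform background.
psf = [0.1, 0.2, 0.4, 0.2, 0.1]  # assumed kernel, sums to 1
truth = [0.1] * 32
truth[12], truth[18] = 5.0, 3.0
observed = circ_conv(truth, psf)  # noise-free blur, for illustration only
restored = richardson_lucy(observed, psf)
```

In this noise-free toy case the update sharpens the blurred peaks while preserving non-negativity and total flux, which is the physically consistent behaviour the abstract relies on when using the Richardson-Lucy result as guidance.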


[2] 2606.28448

Measured-Subspace Consistency: A Plug-and-Play Operator for Diffusion Posterior Sampling in Accelerated MRI Reconstruction

Diffusion posterior samplers for accelerated MRI can reconstruct accurately yet still disagree on the acquired k-space across samples, placing posterior variability on coefficients the scanner has already measured. We identify this measured-subspace leakage as a physical-admissibility failure. Under a hard-constraint model it violates the measurement constraint and inflates the reported uncertainty with disagreement about coefficients the scanner has already determined. To quantify this leakage, we introduce complementary measured- and unmeasured-subspace k-space dispersion metrics (MSD/USD). We then present Measured-Subspace Consistency (MSC), a training-free terminal correction that wraps any compatible image-space posterior sampler with a standard multi-coil consistency lock. The ideal lock follows classical range/null-space data consistency. Our contribution is to repurpose it as a black-box posterior audit and correction rather than a new reconstructor or learned sampler. Theoretically, we prove that the ideal transform confines pairwise sample differences to the MRI null space and bound the residual cross-subspace coupling left by practical sensitivity-weighted implementations. Across six base samplers and two MRI anatomies, including out-of-distribution transfer where a knee prior reconstructs brain, MSC substantially reduces measured-subspace dispersion for Soft samplers (a median 16.5x reduction for DPS across five brain contrasts, up to ~29x), while preserving unmeasured-subspace diversity and acting as a near-identity map for Consistent ones. Furthermore, MSC maintains or modestly improves PSNR/SSIM, with no retraining, retuning, or significant computational overhead.
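
The ideal range/null-space lock described above can be sketched in a simplified setting. The example below assumes a single-coil, 1-D Cartesian acquisition with a naive DFT, whereas the abstract describes a multi-coil lock; the dispersion function is a hypothetical stand-in for the MSD/USD metrics, whose exact definitions are not given in the abstract.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (fine for tiny illustrative signals)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of `dft`."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def consistency_lock(sample, measured_kspace, mask):
    """Overwrite measured k-space coefficients of a sample with the acquired data."""
    k = dft(sample)
    locked = [y if m else k_val for k_val, y, m in zip(k, measured_kspace, mask)]
    return idft(locked)


def subspace_dispersion(sample_a, sample_b, mask, measured=True):
    """Squared k-space disagreement between two samples on one subspace (assumed metric)."""
    ka, kb = dft(sample_a), dft(sample_b)
    return sum(abs(a - b) ** 2 for a, b, m in zip(ka, kb, mask) if m == measured)


# Hypothetical toy data: 8-point signal, three acquired k-space coefficients.
truth = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 1.5, 1.0]
mask = [True, True, False, False, False, False, False, True]
measured_kspace = [k if m else 0j for k, m in zip(dft(truth), mask)]

# Two "posterior samples" that disagree everywhere, including on measured coefficients.
sample_a = [t + d for t, d in zip(truth, [0.3, -0.2, 0.1, 0.0, -0.4, 0.2, 0.1, -0.1])]
sample_b = [t + d for t, d in zip(truth, [-0.1, 0.25, -0.3, 0.2, 0.1, -0.2, 0.05, 0.3])]

locked_a = consistency_lock(sample_a, measured_kspace, mask)
locked_b = consistency_lock(sample_b, measured_kspace, mask)
```

After the lock, the two samples agree on every measured coefficient while their unmeasured-subspace disagreement is unchanged, so pairwise differences lie entirely in the null space of the sampling operator.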


[3] 2606.28453

DeVAR: Low-Dose CT Denoising via Visual Autoregressive Modeling

Computed tomography (CT) plays a crucial role in medical diagnosis, but minimizing radiation exposure while maintaining image quality remains a critical challenge. Low-dose CT (LDCT) protocols reduce radiation risks but inevitably suffer from severe noise and artifacts that compromise diagnostic accuracy. While existing deep learning methods have achieved promising results, there remains a continuous quest for generative paradigms that intrinsically capture global-to-local structural dependencies to better preserve fine anatomical details. To this end, we propose DeVAR, a novel generative framework that applies visual autoregressive modeling (VAR) to LDCT denoising for the first time. Conditioned on global context provided by LDCT prefix tokens, DeVAR progressively generates discrete token maps of the target normal-dose CT (NDCT) via next-scale prediction. Because quantization inherently discards high-frequency information, we introduce a residual refiner to capture subtle anatomical structures beyond the capacity of a discrete codebook. Finally, empowered by a dual-representation hybrid training strategy, our hybrid NDCT decoder seamlessly integrates continuous and discrete latents to reconstruct high-fidelity, detail-preserved images. Extensive experiments on two public datasets demonstrate that DeVAR consistently achieves superior qualitative and quantitative performance compared to state-of-the-art LDCT denoising methods.


[4] 2606.28454

From Focusing to Con-Focusing: Optimal Power Transfer in Line-of-Sight Near-Field MIMO

Beamfocusing is the established near-field strategy for a large array serving a single-antenna user. We consider the single-user line-of-sight MIMO link, free of multipath, in which the user, too, carries an extended aperture, and show that the focusing prescription inverts: beyond a modest Fresnel number, focusing on the user is outperformed by far-field steering. Under fully analog, unit-modulus beamforming, we derive closed-form power gains for focusing (each aperture phase-matched to the other's center) and for steering (a planar phase ramp) in the Fresnel regime, and prove that their comparison is governed by two dimensionless quantities: the link Fresnel number, the product of the two aperture lengths normalized by wavelength and link distance, and the aperture ratio, irrespective of how many elements discretize the apertures. For equal apertures the two gains cross exactly once, at the universal value 1.947; beyond it, focusing loses ten dB per decade of Fresnel number, and the advantage celebrated in the MISO literature survives only as the receive aperture vanishes. We then derive the strategy that is order-optimal at every Fresnel number, con-focusing: both apertures aim at the common point from which they subtend equal angles. It attains the rank-one eigenbound in leading constant, needs no channel knowledge, degenerates to plain steering for equal apertures, and is acquirable within one beam-refinement round with no geometry exchange between the terminals.
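
As a rough numerical illustration of the link Fresnel number defined above, the sketch below evaluates the product of the two aperture lengths divided by wavelength times link distance and compares it with the equal-aperture crossover quoted in the abstract. The carrier frequency, aperture sizes, and distance are hypothetical, and any additional normalization constant used in the paper's own definition is not known from the abstract.

```python
SPEED_OF_LIGHT = 299_792_458.0      # m/s
EQUAL_APERTURE_CROSSOVER = 1.947    # crossover value quoted in the abstract


def link_fresnel_number(tx_aperture_m, rx_aperture_m, wavelength_m, distance_m):
    """Product of the aperture lengths normalized by wavelength and link distance."""
    return tx_aperture_m * rx_aperture_m / (wavelength_m * distance_m)


# Hypothetical link: 28 GHz carrier, two 0.5 m apertures, 10 m apart.
wavelength = SPEED_OF_LIGHT / 28e9
fresnel = link_fresnel_number(0.5, 0.5, wavelength, 10.0)

# For equal apertures, the abstract states that focusing loses to steering beyond the crossover.
beyond_crossover = fresnel > EQUAL_APERTURE_CROSSOVER
```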


[5] 2606.28474

Anatomy-Grounded Synthetic Coronary Angiography for Geometry-Informed Multi-View Matching

Accurate correspondence matching across multiple angiographic views is the prerequisite for 3D coronary reconstruction and interventional guidance. However, the development of robust deep learning models for this task has been stifled by a fundamental data bottleneck. Obtaining ground truth for matching tasks in angiography pairs is prohibitively expensive and hard to scale. To overcome this barrier, we introduce a physically-grounded data generation framework that synthesizes high-fidelity Digitally Reconstructed Radiographs (DRRs) from 3D Coronary CT Angiography (CCTA) volumes. Our framework generates dense, highly accurate 3D-to-2D projection labels by simulating realistic C-arm acquisition geometry on patient anatomy at zero human cost. Leveraging this dense supervision, we propose a Geometry-Informed Matching Module (GIMM) that integrates global features and anatomical structure into correspondence learning. Unlike real angiography, where assessment relies on subjective human annotation, our dataset provides 2D correspondence labels with paired images, allowing human-free evaluation. We comprehensively evaluate our method on the proposed CT-derived DRR dataset and demonstrate improvements over other matching baseline models.


[6] 2606.28507

Rapid and robust parameter estimation for electrochemical battery models via BOLT: A batch-optimized local-to-global technique

Accurate and efficient parameter estimation is essential for applying electrochemical battery models in simulation, state estimation, control, and repeated model updating. However, conventional optimization methods, such as particle swarm optimization (PSO) and genetic algorithms (GA), often require many model evaluations and show considerable run-to-run variability, limiting their use in time-sensitive calibration scenarios. This study proposes a Batch-Optimized Local-to-Global Technique (BOLT) for rapid and robust parameter estimation of electrochemical battery models. BOLT combines diversified candidate initialization, batch-parallel trust-region reflective (TRF) local refinement, JIT-accelerated model evaluation, and multi-condition consistency screening within a unified calibration workflow. Comparative experiments based on a grouped single-particle model and measured data from a commercial 18650 NMC lithium-ion cell show that BOLT achieves a favorable trade-off among voltage-response accuracy, computational efficiency, and repeated-run stability. BOLT(32) achieves an average mean absolute error of \(12.4 \pm 0.1\) mV over five operating conditions, requiring only \(20636 \pm 3081\) model calls and \(8.97 \pm 1.20\) s per run. Synthetic-data validation with a known parameter vector in the grouped SPM formulation further shows that BOLT recovers the reference parameter vector under model-consistent conditions and remains robust under 1--3 mV voltage-noise perturbations, with the mean parameter absolute relative error below \(0.6\%\). These results indicate that BOLT provides a practical calibration framework for BMS parameter updating, control-oriented battery digital twins, and second-life battery screening.


[7] 2606.28513

HDDPM: Heteroscedastic Denoising Diffusion Probabilistic Model for Quantitative Low-Count Brain PET Recovery

Positron emission tomography (PET) seeks to balance diagnostic quality with radiation dose. Low-count PET noise is non-Gaussian, non-stationary, and spatially dependent. It scales directly with local activity and is shaped by iterative reconstruction and physical corrections. Standard denoising diffusion probabilistic models (DDPMs) ignore these PET properties. Their forward process adds isotropic, homoscedastic Gaussian noise to the target. Such an approach fails to capture the realistic physical degradation generated by the imaging system. To address the above limitations, this study introduces a heteroscedastic residual diffusion model (HDDPM) for low-count brain PET recovery in which the forward corruption is itself intensity-aware. We designed a fixed, Poisson-based variance module to generate voxel-wise noise maps. These maps naturally place stronger noise perturbation on low-activity regions than on high-activity ones, while the network predicts the low-to-standard-count residual under explicit dose-fraction conditioning. We evaluated our proposed model (HDDPM) alongside generative frameworks across three different scanners, using both internal and external datasets at various simulated dose levels (1% to 50%). HDDPM and isotropic DDPM showed comparable overall image quality, but HDDPM stood out in the lowest-dose (1%) external scans. It is highly reliable and significantly reduces measurement errors in both high- and low-activity regions, compared to the standard model. These results support that heteroscedastic noising with the proposed HDDPM is feasible, and it provides a physically motivated inductive bias for quantitative low-count PET recovery by reflecting the activity-dependent noise structure of PET.


[8] 2606.28588

Resilient Control Lyapunov Function-based Quadratic Program for Quadrotors Under Cyberattacks

Ensuring the operational safety of quadrotors under partial actuator failures, lumped external disturbances, and malicious cyberattacks is a critical challenge due to the system's underactuated and highly nonlinear nature. Building on the existing result of a fault-tolerant control approach for a quadrotor experiencing a complete loss of two opposing rotors \cite{chen2024quadrotor}, this letter further addresses the additional challenge of malicious cyberattacks, which could be unknown and unbounded. While the baseline control law, rooted in proportional-derivative (PD) feedback and observer-based decoupling, effectively handles mismatched disturbances, it remains vulnerable to maliciously injected cyberattacks on the pseudo-control channels. To address this, a Resilient Control Lyapunov Function-based Quadratic Program (RCLF-QP) is developed, where a resilient compensation term with real-time online adaptation is designed in the conventional CLF to compensate for the maliciously injected unknown and unbounded attacks. Compared with the PD feedback control, the proposed QP-based constrained optimization control provides a systematic and extensible framework that allows new control objectives and constraints to be seamlessly integrated without altering the underlying stability guarantees. The overall proposed controller integrates a model-based extended state observer with the proposed RCLF-QP mechanism to mitigate both lumped disturbances caused by aerodynamics and strong wind, and adversarial cyberattacks injected by malicious adversaries. Simulations in a high-fidelity environment demonstrate that the proposed RCLF-QP control architecture prevents trajectory divergence and system instability in scenarios where the baseline controller fails to maintain quadrotor stability under malicious attacks.


[9] 2606.28602

Bayesian-Optimized Multi-Source Domain Adaptation for Post-Earthquake Damage Assessment

Efficient and intelligent post-earthquake structural damage assessment is critical for rapid disaster response. Although data-driven approaches have shown promise in this domain, traditional supervised learning relies on large labeled datasets that are impractical to obtain for earthquake-damaged structures. To overcome this limitation, we propose a Bayesian-optimized multi-source domain adaptation framework for predicting post-earthquake structural damage on a target building without the need for any damage labels. The framework comprises three key steps. First, it extracts features from multiple source domains and the target domain and feeds them into a classifier and a domain discriminator. The classifier ensures the features remain damage-sensitive, while the discriminator promotes their invariance across domains. Second, the framework assigns a weighting factor to each source domain to balance their contributions during training. Finally, Bayesian optimization is employed to optimize these source domain weights, aiming to maximize prediction accuracy on the target domain. This framework offers a robust solution for structural damage assessment when labeled data are scarce, significantly enhancing post-earthquake damage assessment capabilities.


[10] 2606.28605

Hybrid AI-Physics Framework for Post-Earthquake Structural Damage Diagnosis with Sparse Sensing

Rapid and reliable post-earthquake damage assessment is critical for public safety, re-occupancy decisions, and effective emergency response. This paper presents a physics-informed, unsupervised learning framework that enables structural damage diagnosis in sparsely instrumented buildings following seismic events. The approach fuses real sensor data with physics-based simulations to create a hybrid spatiotemporal input grid, extending observability to regions without sensors. A Spatiotemporal Composite Autoencoder Network (SCAN) processes this hybrid input, learning the structure's undamaged behavior from pre-event or ambient-condition data alone. SCAN integrates convolutional layers for extracting localized spatial features and LSTM layers for modeling temporal dynamics, enabling it to recognize deviations from normal behavior caused by damage. Post-event sensor data are analyzed through the trained model, and anomalies are flagged based on elevated reconstruction and prediction errors. These error patterns are spatially mapped to localize potential damage, even in uninstrumented areas. By embedding low-fidelity physical estimates directly into the input representation, the framework enhances detection sensitivity while requiring no labeled damage examples. This hybrid AI-physics approach offers a scalable, interpretable, and data-efficient solution for real-time post-earthquake structural diagnostics, providing critical decision support for emergency managers and accelerating safe and targeted recovery efforts.


[11] 2606.28627

Reachability Guarantees for Cart-Pole Swing-Up and Stabilization

The cart-pole swing-up is a canonical benchmark for nonlinear control of underactuated systems, yet an end-to-end guarantee linking the global swing-up maneuver to the local stabilizer is seldom formalized. We present a reachability analysis of a switched energy-based/LQR controller that certifies convergence to the upright equilibrium from a compact set of initial conditions. The swing-up law is derived from an energy-error Lyapunov function; canceling the autonomous conservative term yields a strictly sign-definite Lyapunov derivative, and convergence follows from LaSalle's invariance principle. We also propose an augmented Lyapunov function to regulate the steady-state cart velocity to zero, for which we establish almost-global convergence. For the controller handoff, a switching region is designed to lie strictly within the LQR region of attraction, formally certifying the swing-up-to-stabilization transition. Numerical simulations corroborate the theoretical analysis.


[12] 2606.28628

Envisage: Diffusion-Based Rhinoplasty Goal Visualization with Mask-Decomposed Evaluation

Localized generative editing needs localized evaluation: full-image identity metrics are structurally confounded under hard-composited edits. We present Envisage, a FLUX.1-Fill inpainting reference pipeline for rhinoplasty goal visualization from a single frontal photograph. The pipeline combines 8 rhinoplasty clinical presets (the released framework also includes 8 blepharoplasty and 8 rhytidectomy presets), MediaPipe masks, and hard-mask compositing. The composite preserves outside-mask pixels by construction, so full-face identity scores are dominated by copied pixels rather than by the diffusion backbone. Because full-face identity metrics cannot grade localized edits, we introduce SurgicalScore, a mask-decomposed 0-1 protocol scoring edit direction, edit magnitude, masked LPIPS, realism, and outside-mask preservation; SS_raw assigns 0.919 [0.918, 0.920] to a perfect-predictor control, anchoring the ceiling. On N=211, the paired ArcFace gain (output-to-GT minus input-to-GT) is negative for all methods (Envisage -0.048 smallest, vs. ICEdit -0.139, Kontext -0.242, InstructPix2Pix -0.294; p < 1e-4), with external validation on a 457-pair ASPS/PCA corpus showing a larger negative gap. With SurgicalScore, Envisage achieves the highest score (0.599 [0.579, 0.619]) and leads on both metrics, but the all-negative ArcFace gap shows that full-face identity is poorly aligned with localized surgical accuracy under hard compositing. A 5-seed GT-oracle (an upper bound, not a deployable result) reduces the residual ArcFace gap by 73% (-0.054 to -0.015), with positive output-to-GT gain on 33.9% of cases, indicating candidate-space headroom for a learned ranker. For localized edits, progress should be measured with edit-region fidelity rather than full-face identity metrics. We release Envisage, SurgicalScore, preset definitions, and matched split manifests.


[13] 2606.28642

Model-Free Budgeted Attack Scheduling for Cyber-Physical Systems

This letter studies the budgeted scheduling of stealthy false data-injection (FDI) attacks against state estimators in cyber-physical systems. Existing event-based attack schedulers require full knowledge of the plant model and assume the residual distribution is exactly Gaussian -- assumptions that fail for real-world CPS sensor streams whose residuals are heavy-tailed and whose dynamics are unknown to the adversary. We propose a model-free attack scheduler that replaces the parametric Gaussian threshold with the empirical quantile of a learned sequence autoencoder residual, calibrated from measurements alone without any plant matrices. We prove that the realized attack rate converges almost surely to the target budget under stationary ergodic residuals. Experiments on two synthetic systems and a real heavy-duty truck dataset show that the proposed scheduler tracks the budget to within 1-2% while also preserving the residual magnitude, guaranteeing stealthiness against any residual-based detector. In comparison, the model-based baseline -- granted the true plant and innovation covariance -- mis-realizes the budget by up to 8.96% under heavy-tailed residual distributions, causing the attacker to achieve only 1.37x system degradation when 1.84x is intended.
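
The idea of calibrating a threshold from an empirical quantile, rather than from a Gaussian assumption, can be sketched as follows. This is an illustrative toy and not the letter's scheduler: the heavy-tailed residual generator stands in for the learned autoencoder residual, and the rule "trigger when the residual exceeds the threshold" is an assumed trigger direction that the abstract does not specify.

```python
import random


def empirical_quantile(samples, q):
    """Order-statistic estimate of the q-th quantile, 0 <= q <= 1."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[index]


def heavy_tailed_residual(rng):
    """Hypothetical residual magnitude: absolute value of a Student-t draw (3 dof)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return abs(z) / (chi2 / 3.0) ** 0.5


rng = random.Random(0)
budget = 0.2  # target fraction of time steps on which the scheduler triggers

# Calibrate the threshold from measurements alone; no plant matrices are involved.
calibration = [heavy_tailed_residual(rng) for _ in range(5000)]
threshold = empirical_quantile(calibration, 1.0 - budget)

# Apply the fixed threshold to a fresh stream and measure the realized rate.
stream = [heavy_tailed_residual(rng) for _ in range(5000)]
trigger_flags = [r > threshold for r in stream]
realized_rate = sum(trigger_flags) / len(trigger_flags)
```

Because the threshold is an order statistic of the residuals themselves, the realized trigger rate tracks the budget regardless of the tail shape, provided the calibration and deployment residuals share the same stationary distribution.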


[14] 2606.28684

A Neuroimaging Simulation Framework for Developing and Evaluating Causal AI

Causally linking disease-related factors to image-derived biomarkers provides a powerful pathway to understanding disease mechanisms. Despite growing interest in applying causal artificial intelligence (AI) approaches for this task, these methods still need to be adapted for complex medical images, and especially, neuroimaging. However, the lack of ground-truth data presents a barrier to development. To bridge this gap, we developed and tested a method for generating synthetic neuroimages, which adhere to a user-specified causal structure describing the non-image to image variable relationships, permitting the creation of ground-truth neuroimaging datasets. In the simulated T1-weighted magnetic resonance images, anatomical variability is modeled by sampling from a subspace estimated from real data and deforming a template image to create unique simulated subjects. Causal relationships are encoded via precise volumetric changes of any region-of-interest without unwanted global artifacts. We achieved relative volume errors of 0.3-2.66% for the targeted regions-of-interest and demonstrate their statistically significant causal relationships, while maintaining mean absolute errors for non-target brain regions between 0.034-0.397ml. An initial evaluation of causal discovery methods exposes their limited ability to suppress spurious connections, highlighting the need for image-appropriate methods. Our framework is the first to enable the generation of realistic synthetic 3D neuroimages with explicit causal control that can serve as the missing ground-truth data necessary for the objective benchmarking and development of causal AI methods.


[15] 2606.28705

Negative Resistance Caused by Intra-Loop Coupling in Virtual-Admittance-Based Grid-Forming Control

This paper addresses the harmonic instability problem of the virtual-admittance (VA)-based grid-forming control. It is revealed that the intra-loop coupling among the VA control, the inner-loop current control, and the voltage feedforward control results in an \(s^2\)-term in the equivalent output impedance of the inverter, which induces a negative-resistance property in the harmonic range. It is worth highlighting that this negative resistance is independent of the control delay. Consequently, this harmonic instability mechanism is fundamentally different from the extensively investigated cases in the literature, which are induced by the digital control delay of inverters. Then, a simple passivity-oriented damping control is proposed to mitigate the negative resistance arising from the intra-loop coupling. The method fully retains the well-established current controller and voltage feedforward, and does not require grid impedance information. Finally, experimental tests verify the theoretical findings and the effectiveness of the damping method.


[16] 2606.28728

Improving Large-Scale Weakly Supervised ASR by Filtering and Selection

Leveraging large-scale weakly supervised datasets is crucial to train robust end-to-end automatic speech recognition (ASR) models. However, such datasets often contain noisy labels and lack domain specificity, limiting their effectiveness. To address these issues and make better use of weakly supervised datasets, we propose a novel training approach incorporating data filtering and selection. Our approach consists of three steps: pretraining on the entire dataset, continued pretraining on a filtered subset based on character error rate (CER), and fine-tuning on a small number of acoustically similar samples to the target domain, selected from the filtered subset. In experiments with a 90,000-hour weakly supervised Japanese dataset, the proposed filtering and selection methods synergistically reduced CER by up to 6.4% and 4.0%, respectively, even though these steps reused training samples already used in the first pretraining step.
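
The CER-based filtering step can be illustrated with a small sketch. It assumes that CER is computed between each weak label and a hypothesis decoded by the first-stage model, and that samples at or below a fixed threshold are kept; the threshold value, the function names, and the toy ASCII strings are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two character sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(
                min(
                    previous[j] + 1,                           # deletion
                    current[j - 1] + 1,                        # insertion
                    previous[j - 1] + (ref_char != hyp_char),  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance normalized by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def filter_by_cer(pairs, max_cer):
    """Keep (weak_label, model_hypothesis) pairs whose CER is at most `max_cer`."""
    return [(label, hyp) for label, hyp in pairs if cer(label, hyp) <= max_cer]


# Hypothetical (weak label, first-stage model hypothesis) pairs.
pairs = [
    ("hello", "hello"),      # label and hypothesis agree
    ("thanks", "thanka"),    # one substitution
    ("morning", "evening"),  # likely a mislabeled sample
]
kept = filter_by_cer(pairs, max_cer=0.2)
```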


[17] 2606.28732

CTC-Seeded Token Edit Refinement for Non-Autoregressive Speech Recognition

Non-autoregressive automatic speech recognition (ASR) enables parallel decoding, but many refinement-based methods begin from random, fully masked, or fixed-length token sequences, requiring multiple iterations to reconstruct the complete transcript. We instead formulate ASR decoding as a variable-length edit refinement of a greedy connectionist temporal classification (CTC) hypothesis. An acoustic-conditioned Edit Flow decoder operates directly on the collapsed CTC hypothesis, predicting insertion, deletion, and substitution operations in parallel. The Edit Flow decoder is jointly trained with a CTC model using a continuous-time discrete diffusion loss. During inference, we find that just two edit steps yield substantial Word Error Rate (WER) reductions, and classifier-free guidance (CFG) further enhances recognition quality by focusing the model on audio features. We also constrain edit proposals using CTC confidence to improve accuracy. Finally, ablation studies validate our design choices, while decoder pretraining and pretrained encoder integration yield significant additional performance gains.


[18] 2606.28753

BLUE: A Stale-Pixel Optical-Flow Compositor for Entropy-Efficient Surveillance Video Encoding

Continuous-recording surveillance systems face a storage problem that codec tuning alone cannot fully solve: even at aggressive CRF settings, a static-camera scene spends most of its bits re-encoding a background that has not changed. We present BLUE, a pre-encode compositor that exploits this structure by maintaining a persistent seed frame of the background and substituting background pixels with seed pixels before the encoder runs. The encoder then emits near-free SKIP macroblocks for the frozen background, while live pixels in foreground regions are carried unchanged at full quality. We evaluate BLUE on all 308 annotated short subclips from the VIRAT Ground Surveillance Release 2.0 dataset using a six-point CRF sweep with both x264 and x265. At CRF 28, BLUE reduces file size by a mean of 34.6% (x264) / 39.4% (x265) on 95.8% / 99.4% of clips respectively. Foreground-region PSNR, computed only over VIRAT object-annotation bounding boxes, is preserved or improved on 60.7% of clips (+0.36 dB mean, +5.48 dB maximum). Full-frame perceptual quality (VMAF) drops by a median of 6.75-8.59 points; we quantify and disclose this trade-off explicitly. A lightweight deployment gate measuring the compositor's own VMAF on a 2-second prefix identifies the 40% of clips where even full-frame quality degradation is near-imperceptible (Delta VMAF <= -2.9), enabling a selective-activation strategy that retains both the storage benefit and acceptable perceptual fidelity.


[19] 2606.28801

Cross-channel Specific Emitter Identification and Verification via Signal Envelope

Specific emitter identification (SEI) determines which known emitter a received signal originates from, while specific emitter verification (SEV) determines whether the received signal genuinely comes from its claimed emitter. In this paper, we consider the effect of wireless fading channels on SEI and SEV. When the Rician $K$-factor varies, the resulting distribution shift induced by the channel degrades both identification and verification performance. To address this issue, we first theoretically prove that the coefficient of variation of the signal envelope is strictly monotonic with respect to the Rician $K$-factor. Motivated by this property, we propose an envelope-guided adaptive feature modulation (EAFM) identifier for SEI and an EAFM with Mahalanobis distance metric learning (EAFM-MD) verifier for SEV. Specifically, the proposed EAFM identifier adopts a dual-branch neural network to extract device-oriented features from the IQ-domain input and channel-conditioning features from the normalized signal envelope, and adaptively modulates the former via feature-wise linear modulation. Then, we extend the EAFM identifier to an EAFM-MD verifier. The device-fingerprint library is constructed by storing the feature centroid and covariance for each enrolled device, along with the within-device Mahalanobis distances of training signals. For verification, the Mahalanobis distance between the extracted test features and each stored centroid is computed using the stored covariance matrix, and the minimum distance is compared to the corresponding device threshold to make a decision. Finally, numerical results show that the proposed EAFM identifier improves cross-channel identification performance, while the proposed EAFM-MD verifier achieves superior detection performance against unknown spoofing attacks.
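
The monotonic relationship between the envelope's coefficient of variation and the Rician $K$-factor can be checked numerically. The Monte Carlo sketch below assumes a unit-power Rician channel coefficient with a real line-of-sight component; the sample count, seed, and function name are arbitrary illustrative choices, and the simulation does not replace the paper's proof.

```python
import math
import random


def envelope_cv(k_factor, n_samples=20000, seed=0):
    """Monte Carlo coefficient of variation of a unit-power Rician envelope."""
    rng = random.Random(seed)
    los = math.sqrt(k_factor / (k_factor + 1.0))    # line-of-sight amplitude
    sigma = math.sqrt(1.0 / (2.0 * (k_factor + 1.0)))  # per-dimension scatter std
    envelope = [
        math.hypot(los + sigma * rng.gauss(0.0, 1.0), sigma * rng.gauss(0.0, 1.0))
        for _ in range(n_samples)
    ]
    mean = sum(envelope) / n_samples
    variance = sum((e - mean) ** 2 for e in envelope) / n_samples
    return math.sqrt(variance) / mean


# K = 0 is Rayleigh fading; larger K means a stronger line-of-sight component.
cv_by_k = {k: envelope_cv(k) for k in (0.0, 1.0, 10.0)}
```

In this simulation the coefficient of variation shrinks as $K$ grows, so the envelope statistic carries information about the channel condition, which is what motivates feeding the normalized envelope to the channel-conditioning branch.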


[20] 2606.28807

A Survey of Physical-layer Authentication Enhanced by Emerging Spatial Domain Technologies

This article surveys spatial-domain-enhanced Physical-layer Authentication (PLA), with Dual-polarized Antennas (DPA), Massive Multiple-Input Multiple-Output (MIMO), and Reconfigurable Intelligent Surfaces (RIS) as the primary focus. With the rapid growth of wireless deployments, authentication mechanisms face stringent requirements for high security, low overhead, and low latency. PLA offers lightweight identity verification by exploiting physical-layer characteristics. However, the effectiveness of PLA critically depends on how physical observations are constructed and validated under wireless channels. Unlike existing surveys that mainly organize PLA by authentication modality, feature source, and evaluation metrics, this work emphasizes the connection between spatial-domain enhancement mechanisms, the resulting feature representation, and the authentication procedure. We review how DPA, Massive MIMO, and RIS reshape PLA feature representation, and we summarize newly introduced security threats along with representative defense strategies. Case studies further illustrate the practical impact, such as representative detection-probability trends across Signal-to-Noise Ratio regimes and quantitative comparisons among representative schemes. Finally, we outline promising future opportunities enabled by Dynamic Metasurface Antennas, Extra-large MIMO, and spatial configuration with artificial intelligence.


[21] 2606.28817

Trust-Calibrated Certified Repair for Physics-Constrained Decisions under Localized Model Misspecification

Feasibility-restoration layers turn learned, market-based, or optimizer-generated decisions into actions satisfying hard constraints in systems such as power grids. Yet a repair is only as trustworthy as its constraint model: line parameters, sensitivities, ratings, and topology can be locally wrong, so a decision certified feasible under the nominal model may violate the deployed system. We identify this false safety as a dominant failure mode of model-trusting repair and propose Trust-Calibrated Certified Repair (TCR). TCR treats repair as trust calibration and answers four questions in one pipeline: where the physical model is wrong, discovered from measurements with false-discovery control; how much each constraint should be trusted, set by test-gated shrinkage and uncertainty-proportional security margins; what least-cost intervention restores feasibility, computed by a certified repair program; and why the cost was paid, attributed to genuine congestion versus avoidable model error through dual prices. On a physically grounded dynamic-line-rating benchmark whose true ratings follow IEEE 738 under real weather, TCR reaches 98% true-network feasibility, within two points of a clairvoyant oracle, at lower-than-naive cost and with perfect localization. Model-trusting repair, robust margins, and chance-constrained tightening leave substantial feasibility or cost gaps. The same method transfers unchanged to transmission redispatch over PGLib-OPF networks and distribution voltage regulation on the IEEE 33-bus feeder. Across all three task families, TCR gives the strongest deployable feasibility-cost frontier under localized physical-model misspecification. Calibrating trust in the constraint model is the missing ingredient for reliable AI-assisted engineering decisions.


[22] 2606.28834

Q-DASC: State-of-the-Art Safe Quantum Control for HVAC under Local Model Misspecification

Variational quantum reinforcement learning offers a compact policy class for building-energy control, but it inherits a deployment weakness shared by learned controllers: when the thermal model is locally wrong, a policy that appears safe on the model can violate occupant comfort in the real building. Guarantees that depend on noisy quantum read-out are also insufficient for safety-critical control. We address this gap with Q-DASC, Discrepancy-Attributed Safe Quantum Control. Q-DASC wraps a variational-quantum-circuit (VQC) policy with a certified classical safety layer that discovers misspecified operating regimes with false-discovery-rate control, repairs their local thermal gains with shrinkage, projects the proposed quantum schedule onto the repaired comfort-feasible set, and attributes residual violations to policy error, model error, or physical limits. Because the final certificate is produced by classical projection, comfort feasibility is invariant to finite-shot and depolarizing read-out noise. On real BOPTEST building emulators across three buildings, two localized misspecifications, and three seeds, Q-DASC reduces average comfort violation from 26.0% for the raw VQC controller and 55.3% for a model-trusting scheduler to 0.02%, matching a clairvoyant oracle, and remains at 0.24% under NISQ read-out noise. A repair-aware VQC variant reaches 0.00% violation and reduces projection intervention, while the default Q-DASC keeps lower energy and stronger observational-data behavior. The same wrapper transfers to EnergyPlus heating and cooling benchmarks and to real hospital air-handling-unit data. These results establish a safety-efficiency frontier for deploying quantum policies in physics-constrained control.


[23] 2606.28837

A Comprehensive Design Framework for Vertical Power Delivery in High-Performance Computing

Power delivery -- including high-to-low voltage conversion, complex power distribution across heterogeneously integrated chiplets, and efficient interconnect allocation -- remains a critical bottleneck in high-performance computing (HPC) systems. Existing vertical power delivery (VPD) solutions are estimated to achieve less than 70% system-wide end-to-end power delivery efficiency, defined from platform input power to delivered on-chip load power, with substantial energy lost as heat before reaching on-chip point-of-loads (POLs). In the absence of systematic design methodologies, evaluating power quality, exploring architectural alternatives, and optimizing performance rely on computationally prohibitive simulations, resulting in suboptimal designs. This paper introduces an end-to-end scalable power delivery framework for HPC systems, including distributed VPD (DVPD) architecture, DVPD design optimization methodology, and analytical models. The framework leverages substrate-embedded GaN power switches together with arrays of unit inductors and capacitors tailored for HPC applications. Multi-stage power conversion schemes (48V-to-1V, 48V-to-24V-to-1V, and 48V-to-12V-to-1V) are explored, with system-wide voltage drops and power losses evaluated under steady-state conditions. Design specifications for passive and active devices are formulated to meet next-generation efficiency targets. For the 48V-to-1V case, the proposed DVPD approach achieves 84% system-wide efficiency while occupying 54% of the area beneath the load system, with efficiency increasing to 87.6% at 75% area utilization across a 1-50 kW load range. Furthermore, steady-state voltage drops peak at 2.7% and transient drops at 9% (without decoupling capacitors), demonstrating the viability of DVPD for future wafer-scale HPC platforms.


[24] 2606.28847

Physics Equivariance for Robust Generalization in Wireless Foundation Model

Wireless foundation models (WFMs) have recently emerged as a promising paradigm for learning multiple channel state information (CSI) acquisition tasks. However, unlike natural language tokens governed by statistical co-occurrence, wireless channels are generated by electromagnetic propagation laws, and current WFM training is constrained by limited data scale, narrow distribution coverage dominated by simulations, and a pronounced sim-to-real gap. As a result, simply scaling model parameters and CSI samples does not necessarily yield robust and generalizable models. In this paper, we advocate enabling physics equivariance as a principled and explainable inductive bias for WFMs. Specifically, we focus on a universal propagation property for electromagnetic waves, termed wave equivariance: when the input CSI is modulated along time-frequency-space dimensions, the output channel response should exhibit the corresponding transformation. Empirical studies show that the vanilla-WFM fails to reliably acquire such equivariance even with a large number of model parameters and training samples. To address this, we design the physics-intrinsic WFM (phys-WFM) with wave equivariance, which explicitly aligns model behaviors with an interpretable wave propagation structure. Results demonstrate that the proposed design effectively captures wave equivariance and substantially improves robustness and generalization to unseen environments under distribution shift, offering a physics-grounded and testable route toward explainable wireless foundation models.


[25] 2606.28884

GigaSpeechBench: A Real-World Multilingual Speech-to-Text Benchmark

While modern ASR systems achieve low error rates on high-resource benchmarks, such performance often overestimates real-world robustness. Existing evaluations address challenges in isolation, lacking a unified benchmark for domain terminology, age variation, dialects, accents, and low-resource languages, particularly across the Middle East and Southeast Asia, representing over one billion under-evaluated speakers. To address this gap, we introduce GigaSpeechBench, a comprehensive multilingual and multidimensional in-the-wild ASR & AST benchmark comprising 680 hours of human-annotated speech. It features five modules: (1) 12 low-resource Middle Eastern and Southeast Asian languages, plus challenging Japanese and Korean; (2) 6 Chinese dialects; (3) 6 English accents; (4) dense terminology across 12 vertical domains for Chinese and English; and (5) older adult and child speech. We further provide human-annotated Chinese and English translations for 11 languages to support AST evaluation. Extensive evaluations of leading foundation models and commercial APIs reveal significant performance degradation in these challenging settings, exposing critical evaluation blind spots.


[26] 2606.28885

Unified Generalization for Frequency-Domain Channel Extrapolation Across Near-Field and Far-Field Scenarios

As antenna arrays grow, near-field effects become non-negligible in large-scale MIMO, making accurate low-overhead channel acquisition crucial in both far-field and near-field regimes. Deep-learning-based frequency-domain channel extrapolation can reduce pilot overhead, but existing extrapolators generalize poorly to unseen distances and environments, especially across near-field and far-field channels. We propose a physically interpretable framework to unify generalization across both regimes. Our key insight is that angular profiles are regime-dependent, while delay profiles share a sparsity structure that can be aligned. Based on this, we develop a physics-guided disentanglement and alignment pipeline with multi-cluster decoupling, angle-delay feature disentanglement, and delay-domain alignment, enabling the model to learn distribution-stable delay features while reusing heterogeneous angular features. We further design a unified near/far-field DL extrapolator (UNiFi-DLE) and detail its dataset preparation, training, and inference. Simulations and sim-to-real experiments show that UNiFi-DLE generalizes robustly to unseen near-field and far-field scenarios and consistently outperforms state-of-the-art methods.


[27] 2606.28896

A Task-Driven and Quality-Assured Agent Framework for SAR Data Generation

Synthetic aperture radar (SAR) data augmentation is important for improving the generalization of data-driven SAR interpretation models, yet practical augmentation workflows are often hindered by heterogeneous dataset formats, task-dependent metadata requirements, diverse generation methods, and weak validation of generated samples. This paper presents the SAR Augmentation and Generation Agent (SAGA), a schema-grounded and benefit-aware agent framework for task-oriented SAR data generation and augmentation. Given a natural-language request and heterogeneous SAR inputs, SAGA extracts observable dataset facts, validates executable dataset schemas, selects feasible augmentation strategies through validator-constrained planning, and compiles the selected strategy into an auditable augmentation workflow. Generated data are further assessed by quality, distribution, SAR-artifact, duplicate, leakage, and optional downstream-task evaluators to support evidence-qualified augmentation claims. By separating semantic proposal from deterministic validation and execution, SAGA improves the reliability and reproducibility of SAR augmentation decisions. Experiments on controlled agentic benchmarks and downstream SAR interpretation tasks show that SAGA improves schema grounding, skill planning, invalid-sample rejection, and downstream augmentation utility compared with rule-based, LLM-only, ReAct-style, and fixed-augmentation baselines.


[28] 2606.28921

PASS-Assisted RSMA under Imperfect SIC: Joint Antenna Activation and Resource Allocation

The performance of rate-splitting multiple access (RSMA) can be severely affected by imperfect successive interference cancellation (SIC) in practical wireless systems. This paper investigates a downlink pinching antenna system (PASS)-assisted RSMA network under imperfect SIC, where residual common-stream interference is explicitly incorporated into private-stream decoding. To improve user fairness, a max-min rate optimization problem is formulated through the joint design of antenna activation, common-rate allocation, and power allocation. The resulting mixed-integer non-convex problem is addressed using a two-stage framework that combines greedy channel-aware antenna activation with successive convex approximation (SCA)-based resource allocation. Numerical results demonstrate the effectiveness of the proposed framework in improving fairness under imperfect SIC.


[29] 2606.28922

Cross-Sensor SAR Data Generation Using Diffusion Models and Feature Migration

Different synthetic aperture radar (SAR) sensors vary significantly in resolution, polarization modes, and frequency bands, making it difficult to directly apply existing models to newly launched SAR satellites. These new systems require large amounts of labeled data for model retraining, but collecting sufficient data in a short time is often infeasible. To address this contradiction, this paper proposes a data generation and transfer framework, integrating a stable diffusion model with attention distillation, that leverages historical SAR data to synthesize training data tailored to the unique characteristics of new SAR systems. Specifically, we fine-tune the low-rank adaptation (LoRA) modules within the multimodal diffusion transformer (MM-DiT) architecture to enable class-controllable SAR image generation guided by textual prompts. To ensure that the generated images reflect the statistical properties and imaging characteristics of the target SAR system, we further introduce an attention distillation mechanism that transfers sensor-specific features, such as spatial texture, speckle distribution, and structural patterns, from real target-domain data to the generative model. Extensive experiments on multi-class aircraft target datasets from two real spaceborne SAR systems demonstrate the effectiveness of the proposed approach in alleviating data scarcity and supporting cross-sensor remote sensing applications.


[30] 2606.28924

Communication-Centric RIS-Assisted ISAC: Signal Modeling and BER Analysis

We propose and analyze a communication-centric reconfigurable intelligent surface (RIS)-assisted integrated sensing and communication (ISAC) system, in which a monostatic radar simultaneously senses a moving target and serves a user equipment (UE) over Nakagami-m fading. We design a dual-function phase-modulated continuous-wave (PMCW) waveform that embeds the data stream directly into the radar pulse train: each pulse carries one full maximum-length sequence whose polarity is flipped by a binary phase-shift keying data symbol, so that the same emission preserves the sharp range autocorrelation required for sensing while conveying one bit per pulse to the UE. We further propose a communication-centric RIS phase configuration that co-phases each element onto the direct radar-to-UE path, yielding a coherent superposition at the UE and a received-power gain that scales with the square of the number of elements. We show that from the radar's perspective, however, the same surface behaves as an uncontrolled scatterer, since the resulting reflection paths are mis-phased and do not benefit from array combining. We derive a closed-form approximation for the average UE bit error rate based on a moment-matched Gamma approximation, and we show that the same waveform still forms a usable range-Doppler map for sensing. Monte-Carlo simulations corroborate the analytical results.
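
The quadratic power scaling claimed for the co-phased RIS configuration can be illustrated with a minimal sketch. It assumes unit-amplitude, fading-free paths (the abstract itself considers Nakagami-m fading), and the names `received_power` and `cophase` are hypothetical helpers, not taken from the paper.

```python
import cmath
import math
import random


def received_power(direct, cascaded, phases):
    """Power of the direct path plus the RIS-reflected paths after per-element phase shifts."""
    total = direct + sum(h * cmath.exp(1j * p) for h, p in zip(cascaded, phases))
    return abs(total) ** 2


def cophase(direct, cascaded):
    """Phase shifts that rotate every reflected path onto the phase of the direct path."""
    ref = cmath.phase(direct)
    return [ref - cmath.phase(h) for h in cascaded]


random.seed(0)
n = 64  # number of RIS elements (illustrative value)
direct = cmath.rect(1.0, 0.3)  # assumed unit-amplitude direct radar-to-UE path
# Assumed unit-amplitude cascaded paths with arbitrary phases.
cascaded = [cmath.rect(1.0, random.uniform(-math.pi, math.pi)) for _ in range(n)]

aligned = received_power(direct, cascaded, cophase(direct, cascaded))
unaligned = received_power(direct, cascaded, [0.0] * n)  # surface left unconfigured

print(f"co-phased power:    {aligned:.1f}")
print(f"unconfigured power: {unaligned:.1f}")
```

Under these assumptions the co-phased amplitudes add coherently to n + 1, so the power is (n + 1)^2, whereas mis-phased reflections add incoherently. That is the same contrast the abstract draws between the UE's view and the radar's view of the surface.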


[31] 2606.28948

Estimating Available Traction Power in Multi-Train AC Railway Networks from a Distance-Dependent Power Envelope

Decarbonisation is raising the electrical load on mainline alternating-current railway feeders that were not designed for sustained, simultaneous high-power demand. When several trains accelerate together on a shared feeder, the contact-line voltage can fall far enough to trigger rolling-stock current limitation or feeder protection, eroding capacity and reliability. Preventing this in real time requires a quantity conventional operation does not expose: a localised, continuously updated estimate of the traction power available to each train given the live network state. A railway power-flow model, with trains represented under a voltage-dependent automatic current-limitation characteristic, shows that the minimum network voltage is governed by the product of power and distance rather than by power alone, yielding a distance-dependent single-train power envelope. This envelope does not add up when several trains share a feeder, so a conservative pairwise screen is generalised to a solver-free multi-train estimate: a calibrated shared-path voltage model returning the minimum section voltage and the per-train available power for any number of trains. Calibration uses two short offline solver runs, one fixing the self-impedance and one the inter-train coupling through a separation-dependent factor. Its current-limitation behaviour follows EN 50388-1, and on matched multi-train cases the estimate tracks the full power flow to within about nine per cent on average across two-, three-, and four-train cases, improving as more trains share the feeder, while its online cost scales with the number of trains rather than the network size.


[32] 2606.29000

Two-Dimensional Method-of-Moments Analysis of TMz and TEz Scattering from PEC Cylinders

This paper presents a two-dimensional method-of-moments (MoM) solver for electromagnetic scattering from infinitely long perfectly electrically conducting (PEC) cylinders. Both TMz and TEz polarizations are considered. Starting from the scalar Helmholtz equation, the electric field integral equation (EFIE) is derived for TMz scattering and the magnetic field integral equation (MFIE) is derived for TEz scattering. The induced surface current on the PEC boundary is expanded using pulse basis functions, and the boundary integral equations are discretized using point matching at the segment centers. Circular cylinders with radii $R = {\lambda}$ and $R = 2{\lambda}$ are used as validation cases because analytical series solutions are available. The MoM-computed surface currents, total near fields, scattered near fields, and field-error distributions are compared against the analytical solutions. After validation, the same solver is applied to a square PEC cylinder, for which no simple closed-form analytical solution is used. The results show strong agreement between the MoM and analytical circular-cylinder solutions and demonstrate the geometry-dependent scattering behavior of the square cylinder.


[33] 2606.29008

Macro--Micro Decision-Making in 6G Networks: An Agent-Based Framework for the Resource-Fungibility Landscape

A defining feature of 6G networks is that performance depends not only on the quantity of available resources (e.g., spectrum, antennas, cache memory, compute, and fronthaul bandwidth) but also on their fungibility, i.e., the ability of one resource to substitute for another under changing conditions. We argue that the fungibility landscape of a distributed 6G system is governed by two coupled decision scales: micro decisions made locally by agents and macro outcomes that emerge at the network level. Existing distributed-optimization approaches largely conflate these scales. To address this gap, we develop an agent-based-modeling (ABM) framework that separates macro and micro decisions through three operator-controllable macro choices, three micro hyperparameters, and three structural metrics. We establish six key results: (i) a two-timescale decomposition theorem, (ii) a structural-metric basis theorem, (iii) a macro--micro design rule with closed-form factorization of the emergent breakdown threshold, (iv) a fungibility--resilience monotonicity proposition, (v) a connectivity--substitutability duality theorem, and (vi) a multi-application generalization proposition. Numerical results visualize the macro fungibility landscape and the micro decision-sensitivity region for a representative 6G deployment.


[34] 2606.29011

PACR: Parameter-Optimized AC Power Flow Restoration for AC Feasible DCOPF Dispatch

The DC optimal power flow is widely used in power system operations because of its computational efficiency and scalability. However, DC dispatches are not guaranteed to satisfy the nonlinear AC power-flow equations or associated operational limits. This paper develops a parameterized, differentiable AC power-flow restoration method for mapping DC dispatches to AC-consistent operating points. The method incorporates distributed slack for active-power balancing and PV/PQ switching for reactive-power regulation, both implemented using smooth differentiable surrogates with tunable parameters, including slack participation factors, voltage setpoints, and regulation steepness. These parameters are trained offline by differentiating through the AC restoration equations using the implicit function theorem. Once trained, the optimized parameters are fixed and used directly during AC power-flow recovery from DC dispatches. The approach is evaluated on IEEE, ACTIVSg, and PEGASE test systems using setpoints computed by standard DC optimal power flow. Results show that the optimized restoration method improves AC feasibility recovery across various systems relative to conventional single-slack AC power-flow recovery. On the 9,241-bus case, the optimized method improves cost difference by 80% relative to the conventional recovery baseline and improves solving time relative to ACOPF by 75%.


[35] 2606.29072

A Sliding Mode Lateral Velocity Observer

This work attempts a lateral velocity estimation scheme whose stability can be derived analytically rather than demonstrated empirically through cut-and-try. The resulting adaptive sliding mode observer shows robust performance across a wide variety of maneuvers and environments, including the more challenging slow J-turn on a low-mu (low-friction) surface.


[36] 2606.29074

Physics-Informed Uncertainty-Aware Beamforming for HAPS Massive MIMO under Imperfect CSI

High-altitude platform station (HAPS) massive multiple-input multiple-output (MIMO) systems are expected to support wide-area, low-latency, and energy-efficient connectivity in future non-terrestrial networks. However, Doppler-induced channel aging, finite-rate feedback quantization, packet loss, and estimation noise impair transmitter-side channel state information (CSI), making robust downlink beamforming challenging. In HAPS channels, these impairments are strongly structured by elevation-dependent Rician propagation and line-of-sight (LoS)-dominant geometry, whereas conventional robust beamforming methods often rely on generic uncertainty models and computationally intensive optimization. This paper develops a physics-informed uncertainty-aware beamforming framework for HAPS massive MIMO systems under imperfect CSI. First, a geometry-aware channel and feedback-impairment model is developed, where CSI errors due to aging, quantization, packet loss, and noise are represented through tangent-space ellipsoidal uncertainty sets. Second, a physics-informed variational autoencoder (VAE) exploits the LoS-dominant steering manifold to enhance channel direction information and propagate learned uncertainty through unit-sphere projection. Third, the learned uncertainty representation is embedded into a robust energy-efficiency maximization formulation with probabilistic QoS awareness. To enable scalable online operation, the resulting beamforming policy is approximated using a multi-agent deterministic policy gradient framework with centralized training, decentralized execution, and differentiable power projection. Simulation results show that the proposed framework improves energy efficiency, SINR robustness, outage reliability, convergence behavior, and online runtime compared with imperfect-CSI, SDR-based, and no-VAE baselines.


[37] 2606.29080

Detection-Control Games under Hidden Modes: Resilience-Induced Blindness Phenomenon

This paper studies resilient control for cyber-physical systems operating under hidden degraded or compromised modes. We formulate hidden-mode detection and belief-dependent control as a game between two decision makers with different objectives: the detector seeks informative belief updates, while the controller seeks regulation performance. This objective mismatch shows why the usual separation intuition between detector design and controller design may fail, leading to a performance-reversal phenomenon induced by the resilience of the controller. For a two-mode linear Gaussian system, we theoretically characterize this phenomenon by linking the resilience margin to the log-likelihood evidence. The analysis shows that a well-performing controller with a large resilience margin can suppress mode-dependent information and slow belief adaptation, which in turn degrades the control performance. The resilience-induced blindness phenomenon and its mitigation are illustrated in numerical simulations.


[38] 2606.29081

Divergence-based Safety Measure for Large Language Models via Rational Inattention

This paper proposes a divergence-based safety measure for large language models (LLMs) under embedding-input attacks. The proposed measure quantifies the worst-case Kullback--Leibler divergence between the clean and attacked LLMs' output distributions, subject to a stealthiness constraint. This constraint is constructed by leveraging the equivalence between transformer attention used in LLMs and rational inattention modeling human decision-making. We analyze the proposed divergence-based safety measure by investigating perfectly undetectable attacks and deriving its upper bound through a Bregman-divergence argument. The proposed safety measure is applied to two pretrained causal language models, GPT-2 and GPT-Neo-125M, to show nontrivial output-distribution shifts, illustrating that the measure can distinguish model-level safety profiles.
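
The quantity at the core of this measure, a Kullback--Leibler divergence between clean and attacked output distributions, can be sketched as follows. This is only an illustration under stated assumptions: the logits are made-up values, the direction KL(clean || attacked) is assumed, and the worst-case search under the stealthiness constraint described in the abstract is not implemented.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical next-token logits over a three-token vocabulary.
clean_logits = [2.0, 1.0, 0.1]
attacked_logits = [1.0, 2.0, 0.1]  # assumed effect of an embedding-input perturbation

p_clean = softmax(clean_logits)
p_attacked = softmax(attacked_logits)
kl = kl_divergence(p_clean, p_attacked)

print(f"KL(clean || attacked) = {kl:.4f} nats")
```

A safety measure in the spirit of the abstract would take the maximum of this divergence over all admissible perturbations; here a single perturbation is evaluated for clarity.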


[39] 2606.29085

Complete virtual unwrapping and reading of a rolled Herculaneum papyrus

The carbonized papyri from Herculaneum preserve the only large-scale library to survive from classical antiquity, but many unopened rolls remain unread because physical opening risks irreversible damage. X-ray computed microtomography ($\mu$CT) and virtual unwrapping offer a non-invasive route to their texts, yet previous work on sealed Herculaneum scrolls has recovered only localized readings or limited surface regions. Here, using high-resolution phase-contrast $\mu$CT acquired on the BM18 beamline at the European Synchrotron Radiation Facility (ESRF), together with improved computational unrolling and machine learning, we achieve the complete virtual unwrapping and reading of PHerc. 1667 under explicit coverage and papyrological-review criteria. This makes PHerc. 1667 the first Herculaneum papyrus to be fully digitally unrolled and read for extended scholarly study without physical opening. In PHerc. Paris 4, the optimized scan protocol makes ink directly visible in the tomographic volume, allowing three-dimensional ink segmentation and independent validation of surface-conditioned ink recovery. In PHerc. 139, we recover title and author-attribution evidence identifying the scroll as Philodemus, On Gods, Book 8. These results move virtual unwrapping of the Herculaneum scrolls beyond isolated demonstrations towards a scalable framework for systematic recovery of the still-unopened library.


[40] 2606.29117

An Integrated Two-Stage Deep-Learning Tool for Rapid Post-Hurricane Damage Identification and Repair Scheduling

Post-hurricane damage assessment and repair scheduling can require computationally intensive simulation and optimization. This paper presents an integrated two-stage deep-learning tool for rapid damaged-line identification and repair-schedule computation. An available offline synthetic dataset for the IEEE 9500-node test feeder contains 1,700 hurricane scenarios with exposure features, grid metadata, fragility parameters, OpenDSS outputs, damaged-line labels, and Adaptive Large Neighborhood Search reference schedules. Stage 1 benchmarks MLP, ResMLP, and GraphSAGE, while Stage 2 compares MLP, DeepSets, and Set Transformer. The selected ResMLP-Set Transformer pipeline propagates Stage 1 errors into Stage 2 and achieves a damaged-job F1-score of 0.920, pairwise order agreement of 0.854, and start- and end-time mean absolute errors of 4.349 min and 4.486 min, respectively. The tool provides rapid initial repair-log decision support for new hurricane cases.


[41] 2606.29166

A Self-Supervised Learning Framework for Video Encoding Complexity Clustering

Adaptive video streaming is a widely used technique for delivering video content over the internet. One of the key challenges is determining the optimal encoding settings for each video, which can vary significantly based on its content and characteristics. In this paper, we propose Compression Echo Contrastive Learning (CECL), a novel self-supervised learning framework for clustering videos based on their encoding complexity. Our method leverages the response of a video to compression - the Compression Echo - as a supervisory signal, allowing the model to capture underlying encoding characteristics during pretraining. We conduct extensive experiments to demonstrate the effectiveness of our learned representations for the downstream task of clustering videos by their encoding complexity. Our results show that CECL improves upon existing state-of-the-art visual encoders and delivers strong bitrate and quality savings against the fixed bitrate ladder.


[42] 2606.29179

Performance Analysis of Hardware-Accelerated 10-Bit 4:2:2 Encoding with Split-Frame Encoding for High-Fidelity V-PCC Streaming

Video-based Point Cloud Compression (V-PCC) encodes volumetric data by projecting 3D geometry and texture onto 2D video frames. To prevent spatial distortion and color bleeding during 3D reconstruction, this process requires 10-bit color depth and 4:2:2 chroma subsampling, rather than the standard 8-bit 4:2:0 format. Additionally, capturing high-density dynamic point clouds requires demanding encoding parameters, such as 8K resolution at framerates up to 120 fps. Historically, the lack of 4:2:2 chroma support in older GPU hardware encoders restricted real-time V-PCC to custom Application-Specific Integrated Circuits (ASICs). However, the recent introduction of NVIDIA's Blackwell GPU architecture, featuring on-chip hardware encoders with 10-bit 4:2:2 support, presents an opportunity to shift this workload to general-purpose hardware. This paper investigates the feasibility of such an approach. Using a commercially available Blackwell GPU equipped with four parallel on-die hardware encoders as a testbed, we evaluate the throughput, rate-distortion (RD) performance, and power consumption of 8K 10-bit 4:2:2 HEVC across various Split-Frame Encoding (SFE) configurations. Our results demonstrate that 4-way SFE achieves an encoding throughput of 122 fps, successfully meeting the strict real-time constraints of high-density V-PCC. Although the inability to exploit spatial redundancies across slice boundaries results in a BD-Rate penalty of up to 5%, the measured throughput and power efficiency establish standard, commercial off-the-shelf GPUs as a highly viable baseline for real-time volumetric video streaming.


[43] 2606.29244

A Holistic Link Budget Analysis for mmWave and THz Communications in Non-Terrestrial Networks

The non-terrestrial network (NTN) architecture has gained significant interest from academia owing to its versatility and its ability to provide worldwide service. To achieve extremely high data rates in NTNs, as intended in the sixth-generation (6G) communication systems, millimeter wave (mmWave) and terahertz (THz) frequencies can be considered, enabling substantial bandwidth and data transmission capacity, which makes them highly suitable for NTN applications. However, these high-frequency signals suffer from significant propagation challenges, including atmospheric attenuation, pointing errors, and various environmental effects. Therefore, a comprehensive link budget analysis is essential to accurately assess the feasibility of mmWave/THz-based NTN systems. Existing studies in the literature often fail to fully capture certain frequency-, altitude-, and direction-dependent effects observed in mmWave/THz transmission or possible communication scenarios within the NTN architecture. In particular, while most prior works primarily focus on free-space loss or atmospheric attenuation, this study adopts a much more comprehensive approach. In this work, a detailed link budget analysis is conducted for mmWave/THz NTNs, considering free-space loss, atmospheric absorption, weather-induced effects, ionospheric disturbances, polarization mismatches, feeder losses, antenna and circuitry constraints, fading, pointing errors, and non-white noise characteristics. The results reveal that the multi-layer structure of the NTN architecture can help reduce the excessive losses to a level that can be tolerated by high-gain directional antennas/arrays, providing multi-gigabit links and making mmWave/THz NTNs feasible for 6G communication systems.


[44] 2606.29271

Robust Extended Kalman Filter for Land Navigation Using Massive Array of MEMS IMUs

We propose a robust Extended Kalman Filter (EKF) architecture for land navigation using an array of hundreds of low-cost micro-electromechanical systems (MEMS) inertial sensors. The main challenges in this setting are bursty sensor-specific bias errors, bias drift, and the need to aggregate many inertial measurements without increasing the computational burden of the navigation filter. To address these challenges, we introduce Robust Inertial Sensor Array Fusion (RISAF), a pre-filtering framework that combines dynamic percentile gating with real-time bias tracking before the EKF prediction step. The proposed aggregation suppresses anomalous sensor readings and compensates for individual sensor drift while preserving the vehicle-level kinematic signal. Because the resulting fused inertial measurements are passed to a standard EKF, the navigation filter retains a minimal state vector and supports real-time execution. We evaluate RISAF through extensive simulations and real-world field tests in GNSS-denied environments, with the data provided as supplementary material. Compared with a baseline that averages the sensor readings, RISAF achieves substantially improved azimuth accuracy and reduced drift accumulation. These results demonstrate that robust fusion of large MEMS inertial arrays can bridge a substantial part of the gap between cost-effective hardware and tactical-grade inertial navigation performance.
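
The idea of gating an array of readings by percentile and tracking per-sensor bias before the EKF can be sketched as below. This is a simplified illustration, not the RISAF algorithm itself: the fixed quantile thresholds, the exponential bias update, and the names `percentile_gate` and `fuse_step` are all assumptions.

```python
import statistics


def percentile_gate(readings, lower=0.1, upper=0.9):
    """Average only the readings lying between two empirical quantiles of the array."""
    ordered = sorted(readings)
    n = len(ordered)
    lo = ordered[int(lower * (n - 1))]
    hi = ordered[int(upper * (n - 1))]
    kept = [r for r in readings if lo <= r <= hi]
    return statistics.fmean(kept)


def fuse_step(readings, biases, alpha=0.01):
    """One fusion step: remove tracked biases, gate, then nudge each bias estimate."""
    corrected = [r - b for r, b in zip(readings, biases)]
    fused = percentile_gate(corrected)
    # Assumed bias tracker: exponential update toward each sensor's residual.
    new_biases = [b + alpha * (c - fused) for b, c in zip(biases, corrected)]
    return fused, new_biases


# Ten hypothetical gyro readings of the same axis; the last sensor has a burst error.
readings = [1.0] * 9 + [100.0]
biases = [0.0] * len(readings)

fused, biases_next = fuse_step(readings, biases)
print(f"plain mean: {statistics.fmean(readings):.2f}, gated mean: {fused:.2f}")
```

Only the single fused value would be passed to the EKF prediction step, which is why the filter state does not grow with the number of sensors.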


[45] 2606.29345

Neural Augmentation of MIMO-OFDM Receivers for Universal LLR Reconstruction

The growing demands for higher throughput and cost-efficient wireless communications drive the need for receivers that are both simple to deploy and robust to hardware impairments and nonlinear environments. While classical model-based receivers and recently proposed deep neural network (DNN) architectures provide complementary benefits, they either rely on simplified linear Gaussian assumptions, require considerable computational resources, or are tailored for a given setting and modulation. In this work, we propose a compact and modular DNN augmentation that universally refines the soft outputs of existing receivers (model-based or data-driven), addressing two distinct operating regimes: structurally incomplete soft information arising from reduced-complexity detectors, and degraded soft outputs caused by hardware impairments and synchronization errors. A key property of the proposed framework is its task-agnostic nature: operating without any knowledge of the specific source of unreliability, it produces well-calibrated log-likelihood ratios (LLRs) suitable for channel decoding. Our design leverages an element-wise scaled convolutional neural network tailored to perform learned interference cancellation across users and neighboring subcarriers, combined with a training algorithm that encourages accurate LLRs for soft channel decoding. Numerical results demonstrate that the proposed augmentation consistently improves diverse receiver algorithms in challenging channel conditions while incurring minimal overhead.


[46] 2606.29351

Fair Allocation of Operating Envelopes for Distribution Networks Considering Voltage Unbalance

Operating envelopes (OEs) are increasingly used to allocate limits to distributed energy resources (DERs) while maintaining secure distribution network operation. In unbalanced low-voltage feeders, OE calculation based only on voltage magnitude and thermal constraints can yield overly optimistic limits because power quality constraints such as voltage unbalance are neglected. This paper proposes a three-phase unbalanced AC optimal power flow framework for computing coupled P--Q OEs with explicit voltage unbalance factor (VUF) constraints. In addition, two fairness mechanisms for allocating the available P--Q flexibility across multiple PV units are embedded and compared: (i) network-weighted proportional fairness and (ii) lexicographic max--min fairness. Case studies on unbalanced test feeders illustrate how VUF constraints reshape the P--Q feasible region and the impact of power quality-constrained operation. The comparison highlights the trade-off between the efficiency, equity, and practicality of fairness allocation methods.


[47] 2606.29412

Privacy-Aware State Estimation: From Coarse to Precise Privacy Protection

This paper addresses the problem of achieving both coarse and precise privacy in state estimation. Coarse privacy forces the eavesdropper's total mean-square error (MSE) to infinity, but errors along certain confidential directions may remain bounded. This motivates precise privacy, which additionally drives the MSE along any prescribed direction to infinity. For coarse privacy, an analytical transformation is established, preserving the user's optimality and driving the eavesdropper's total MSE to infinity at a polynomial-exponential rate. A stochastic intermittent encryption scheme is further developed, and an explicit lower bound on the encryption probability is derived to guarantee divergence. For precise privacy, by analyzing the behavior of the Riccati equation on the unobservable subspace, we prove that the eavesdropper's directional MSE becomes unbounded if and only if the direction's unstable component lies outside the observable subspace. Finally, a systematic method is proposed to exclude target vectors from the observable subspace, forcing the directional MSE to infinity.


[48] 2606.29450

VeRe-Flow: Guiding Flow Matching toward Clean Speech via Velocity Contrastive Regularization and Representation Alignment for Noise-Robust Bandwidth Expansion

Noise-robust bandwidth expansion aims to reconstruct high-fidelity wideband speech from noisy low-resolution inputs. While flow matching has shown strong performance in speech generation, accurately recovering clean speech from noisy inputs remains challenging due to the ambiguity of velocity estimation under noise. In this work, we propose VeRe-Flow, a clean-guided flow matching framework that introduces multi-level clean supervision to guide the generative process toward clean speech. At the velocity level, we introduce velocity contrastive regularization, which attracts the predicted velocity toward the clean trajectory while repelling it from noisy trajectories. At the representation level, we incorporate representation alignment that aligns intermediate features with clean self-supervised learning representations. The results demonstrate that the proposed method achieves the lowest LSD and highest DNSMOS OVRL among all baselines, and the highest MOS among generative baselines.


[49] 2606.29480

DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection

Variable frame rate (VFR) coding has recently emerged in neural speech codecs, allocating fewer frames to redundant regions and more frames to rapidly changing speech. VFR must transmit side information about retained time steps, but prior gains are either not rigorously addressed or often minor once these overhead bits are included in total bitrate. We present Dynamic Token Masking (DTM)-Codec, a neural speech codec that demonstrates clear gains over fixed-frame-rate baselines under a strict matched-total-bitrate protocol. DTM keeps selected encoder tokens, fills masked positions with a learned <MASK> embedding, and transmits a binary keep-mask for position-aware decoding. We further introduce Path Length Equalization (PLE), a linear-time boundary selector for VFR coding that yields well-spread adaptive segments with negligible overhead. Across operating points, DTM-Codec broadly improves reconstruction quality and intelligibility over fixed-frame-rate baselines.
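
The keep-mask mechanism and the matched-total-bitrate accounting can be illustrated with a small sketch. It assumes that dropped positions are refilled with the mask embedding on the receiver side and that the keep-mask is sent uncompressed at one bit per frame; neither detail is confirmed by the abstract, and every function name and number here is hypothetical.

```python
import numpy as np


def mask_tokens(tokens, keep_mask):
    """Sender side: retain only the selected frames, plus the binary mask."""
    return tokens[keep_mask], keep_mask


def restore_sequence(kept_tokens, keep_mask, mask_embedding):
    """Receiver side: rebuild a full-length sequence.

    Dropped positions are filled with the (learned) mask embedding so the
    decoder sees every time step and knows which ones were transmitted.
    """
    full = np.tile(mask_embedding, (keep_mask.shape[0], 1))
    full[keep_mask] = kept_tokens
    return full


def total_bitrate_bps(frame_rate_hz, keep_ratio, bits_per_token):
    """Token bits plus side information (assumed 1 mask bit per frame)."""
    token_bits = frame_rate_hz * keep_ratio * bits_per_token
    mask_bits = frame_rate_hz * 1.0
    return token_bits + mask_bits
```

Under these assumptions, comparing against a fixed-frame-rate codec at the same total bitrate means the mask bits must be paid for by dropping tokens, which is the overhead the abstract says earlier work often left out.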


[50] 2606.29536

High-Probability ISS Tubes for Continuous-Time State Estimation

This paper studies a probabilistic interpretation of input-to-state stability (ISS) bounds for estimation-error dynamics in continuous-time systems. We show that, if the aggregated disturbance satisfies a probabilistic envelope in an essential-supremum sense, then deterministic ISS bounds immediately induce high-probability error tubes. To make this interpretation constructive, we also provide explicit sufficient conditions based on quadratic Lyapunov inequalities and specialize them to positive and cooperative systems. The approach is illustrated on a positive compartment model with aggregated measurements, where ISS tubes are compared with nominal uncertainty bands produced by a Kalman--Bucy filter and by Gaussian and robust moving-horizon estimators. The examples show that ISS tubes provide a conservative but computationally light uncertainty baseline, while robust MHE is less sensitive to outlier contamination than Gaussian-based


[51] 2606.29627

A Two-Stage Reflection and Reprompting Framework for LLM-Based Solution of Petri Net Reachability Problems in Industrial Applications

Manufacturing systems exhibit strong concurrency, synchronization, and contention for shared reusable resources, which makes fast and reliable scheduling and verification challenging. Petri nets provide a rigorous formalism for modeling such discrete-event manufacturing systems, but reachability analysis and solving remain difficult for conventional graph search or optimization-based solvers, particularly under state-space explosion and evolving production requirements. Recently, large language models (LLMs) have shown promise as flexible planners that can generate candidate action sequences from textual specifications. However, direct use of LLMs for Petri net reachability remains unreliable. This paper proposes an LLM-based solving framework augmented with a two-stage reflection and reprompting mechanism. The combined effects of reflection and re-clarification improve the accuracy of feasible sequence generation. The proposed method is evaluated on an industrial case modeled as a Petri net. Under a fixed Petri net structure, the proposed strategy is assessed on six solvable reachability configurations. The results demonstrate improved reliability and stability in solving Petri net reachability problems. The proposed framework is further evaluated across multiple LLMs, which indicates that the framework is not tied to any specific model.


[52] 2606.29632

VIB-AVSR: Variational Information Bottleneck for Noise-Robust LLM-Based Audio-Visual Speech Recognition

Audio-Visual Speech Recognition takes two input modalities, acoustic and visual streams, where visual information from lip movements aids recognition when audio is noisy. Recently, LLM-based AVSR models have emerged as a promising paradigm by connecting pre-trained audio-visual encoders to an LLM, achieving strong results in clean conditions. However, these models are predominantly optimized for clean acoustic conditions, with limited attention to making the LLM backbone robust to noise. No explicit mechanism is employed to produce stable representations under corrupted audio, leading to performance degradation in noisy environments. To address this, we propose VIB-AVSR, which integrates Variational Information Bottleneck layers at targeted positions within the LLM backbone to regularize representations. VIB-AVSR reduces degradation under noisy conditions across multiple SNR levels and noise types, without requiring architectural modifications or additional training data.
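
For readers unfamiliar with the Variational Information Bottleneck, the sketch below shows its two usual ingredients in the common Gaussian form: a reparameterized sample and the closed-form KL divergence to a standard normal prior. It assumes a diagonal Gaussian posterior and an N(0, I) prior, which the abstract does not specify, and the function names are hypothetical.

```python
import numpy as np


def vib_sample(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def vib_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0, axis=-1)
```

In a typical VIB setup the KL term is added to the task loss with a small weight, which penalizes representations that carry more information about the input than the task needs.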


[53] 2606.29640

Fast Wireless Foundation Models with Early-Exits

While wireless foundation models (FMs) are demonstrating strong potential to enable AI-Native 6G networks, their high computational cost remains a critical barrier to deployment. This cost stems from the rigid, full-depth execution of the FM backbone for every task, a process we show is not only inefficient but can also degrade performance on unseen out-of-distribution (OOD) tasks. In this paper, we propose a novel early-exit FM framework that attaches lightweight, per-task heads at the most appropriate exit stage of a frozen wireless FM encoder, enabling variable-depth inference tailored to each task's preferred representation depth. Our results demonstrate that these intermediate-layer features not only speed up inference significantly (up to 93% fewer FLOPs), but also provide more transferable representations that exceed the full encoder accuracy on unseen tasks. We further demonstrate that a simple fixed-exit strategy per task is more effective than traditional early-exiting policies that route different samples to different exits based on their perceived difficulty levels.


[54] 2606.29737

Effective Depth in Joint Source-Channel Coding: An Implicit Equilibrium Analysis

A fundamental design question in deep joint source-channel coding (Deep JSCC) remains insufficiently explored: given a channel signal-to-noise ratio (SNR), what effective computation depth is required for semantic reconstruction? Existing Deep JSCC systems typically employ fixed-depth neural architectures selected through empirical hyperparameter tuning, which may lead to unnecessary computation under favorable channel conditions and insufficient refinement under severe channel noise. This paper proposes \emph{Implicit-JSCC}, an implicit equilibrium framework in which semantic encoding and decoding are formulated as fixed-point equilibrium processes. The effective encoder and decoder depths are determined by residual-based solver convergence rather than manually predefined layer numbers, while parameter sharing across equilibrium iterations enables depth-independent parameter complexity. To analyze the resulting effective-depth behavior, we develop a Gaussian-process-inspired kernel evolution framework that models equilibrium iterations as an effective-depth propagation process. Since channel noise is injected between the encoder and decoder, the analysis tracks channel-induced representation perturbations across receiver-side equilibrium iterations and derives a theory-guided depth--SNR relationship. After offline calibration of the system-specific parameters, the resulting model characterizes the required receiver-side refinement depth under different SNRs. Extensive experiments show that Implicit-JSCC achieves competitive reconstruction performance while enabling residual-based adaptive inference and controllable computation--quality tradeoffs. The depth--SNR model further provides a characterization of the SNR-dependent refinement depth required to reach a prescribed perturbation tolerance.


[55] 2606.29862

Active Learning for Channel Knowledge Map Construction via Bayesian Inference Diffusion Models

Channel knowledge maps (CKMs) are regarded as key enablers of environment-aware communications in future wireless networks, as they provide location-specific channel information by establishing an explicit connection between wireless devices and the physical propagation environment. As a representative CKM, the channel gain map (CGM) characterizes the spatial distributions of large-scale fading to support wireless environment awareness and network optimization. Existing CGM construction methods generally lack a well-defined sampling-point acquisition strategy, which may result in a limited number of sampling points being allocated to spatially redundant or highly predictable regions, thereby degrading CGM reconstruction performance in complex propagation environments. In this paper, we propose an active-learning-based diffusion framework for efficient CGM construction. By combining Bayesian inference with the diffusion model, the proposed method estimates epistemic uncertainty without retraining the model. Two uncertainty quantification algorithms are further developed along the reverse diffusion process to generate element-wise epistemic uncertainty maps. Furthermore, an uncertainty-aware sampling strategy is designed to determine new observation locations by jointly considering epistemic uncertainty and spatial distribution uniformity. Experimental results on both static and dynamic CGM datasets demonstrate that the proposed method achieves better reconstruction performance than baseline methods. These results indicate that the proposed method can effectively improve the utilization efficiency of limited sampling points and enhance the accuracy of CGM construction in complex wireless propagation environments.


[56] 2606.29901

Semi-Supervised Sound Event Detection with Conditional Mixup and Embedding-Level Contrastive Loss

Sound event detection (SED) is a core module for acoustic environmental analysis, yet its performance is often limited by scarce labeled data. Recent systems leverage large pretrained audio foundation models, but effective fine-tuning remains challenging because labeled data are limited while unlabeled data are abundant. A previous work, ATST-SED, addressed this problem with a pseudo-label based semi-supervised fine-tuning framework. In this work, we further improve the framework by adopting an embedding-level self-supervised contrastive loss inspired by ATST-Frame pretraining. This contrastive objective better exploits unlabeled data during fine-tuning. One challenge is that mixup serves different roles in the two objectives: pseudo-label learning uses composition mixup, while contrastive learning treats mixup as a perturbation. To resolve this mismatch, we propose conditional mixup, which combines composition mixup and perturbation mixup in one semi-supervised framework and defines the corresponding embedding-level contrastive losses. The resulting model achieves 0.645 PSDS1 and 0.822 PSDS2 on the DESED validation set, establishing a new state of the art.


[57] 2606.29926

Multi-Sensor Integrated Sensing and Communication for Critical Infrastructure Protection

Integrated Sensing and Communications (ISAC) will become a service in future mobile communication networks. It enables the detection and recognition of passive objects and environments using radar-like sensing. One promising first application is the protection of critical infrastructure (CI), for example by monitoring the lower airspace above sensitive sites or facilities to prevent unauthorized drone overflights. Our proposal is based on the concept of a distributed multi-sensor (MS)-ISAC. We assume deploying three or more additional passive sniffing sensors near the protected site (PS) of a CI. The sniffers are connected via Downlink (DL) / Uplink (UL) to the distant illumination base station (BS). Multistatic range-Doppler estimation, including synchronization, is performed according to the Cooperative Passive Coherent Location (CPCL) principle. The multistatic architecture has several advantages over the often considered quasi-monostatic architecture where one sniffer is located close to the base station. We discuss the advantages and disadvantages of both approaches and compare their performance for the considered use case in terms of coverage and geometric dilution of precision (GDoP).


[58] 2606.29949

Data-Efficient Multimodal Alignment for Histopathology-based Molecular Prediction

H&E-stained whole-slide images offer cohort-scale availability and rich spatial context but lack molecular specificity, whereas bulk RNA-seq provides transcriptome-wide resolution at high cost with limited archival availability. We show that training a lightweight alignment module atop frozen histopathology and RNA-Seq foundation models enables open-vocabulary molecular prompting -- querying H&E slides with gene-set signatures to predict pathway activity without sequencing or end-to-end retraining. Using contrastive learning on a multi-cancer cohort (N=1,720), we achieve a 25-fold improvement in retrieval over baseline methods. Systematic analysis reveals a graduated predictability spectrum: morphologically grounded programs (cell-cycle programs, immune-related) are most reliably predicted (R^2>0.5), while predicting pathways with no morphological footprint remains challenging as expected. We validate clinical utility on the POSEIDON clinical trial: H&E-predicted squamous cell carcinoma scores recapitulate NSCLC subtype identity and predicted IFN-gamma mirror PD-L1 tumor-cell expression groups. Furthermore, genesets describing immune activation and fibrosis predict known tumor microenvironment archetypes from histology alone. We further validate generalization of our approach across unseen cohorts and demonstrate data-efficient domain adaptation, establishing a slide-native framework for molecular analysis on H&E images.


[59] 2606.29977

A multi-architecture study of specificity refinement and false-positive mechanism analysis in prostate MRI

Objectives: To characterize residual false positives in prostate MRI detection, and to evaluate a lightweight post-hoc refinement head for case-level specificity. Materials and Methods: This retrospective study used PI-CAI (5-fold cross-validation) and Prostate158 (n=158; external). A context-aware evidence head and an 89,216-parameter refinement head were trained on a frozen detection backbone; the evidence head was also trained on four further backbones (bare nnU-Net, bare U-Net, bare Mamba, MIGF-Mamba). For each false-positive region, T2-weighted, apparent-diffusion-coefficient, and high-b-value contrast ratios versus peri-lesional rings were compared against ground-truth lesions and contralateral benign regions. Results: False positives were closer to true cancers than to benign tissue in evidence and raw T2-weighted and apparent-diffusion-coefficient contrast, reproducing 35/35 across five architectures (Cohen's d 1.10; FP/benign evidence ratio 2.38x) and 105/105 across modality-perturbation scenarios. On PI-CAI fold-0, refinement raised case-level specificity from 0.469 to 0.549 (+17.2%) at preserved sensitivity (0.943); 5-fold cross-validation showed fold-conditional behavior (9/15 observations positive; range -22% to +28%). On Prostate158, both models saturated (McNemar pooled p=0.69), while the false-positive contrast-matching finding replicated. Conclusion: Residual false positives are contrast-matched to cancer (sharing raw imaging features rather than histologically confirmed mimicry), reproducing across five architectures -- a data-level imaging property, not model-specific artifacts; post-hoc refinement adds practical specificity in-domain but is fold-conditional.


[60] 2606.29994

Quantifying Realizable Flexibility Limits in Fast and Ultra-Fast EV Charging Using Real-World Data

The rapid growth of electric vehicles (EVs) is increasing the need to accurately quantify their flexibility as a resource for power system operation. However, most existing approaches rely on simplified or power-controllable models that overlook the intrinsic constraints of fast and ultra-fast DC charging. In practice, flexibility is fundamentally shaped by battery management system (BMS) behavior, connection time availability, and battery-protection limits. This paper introduces a trajectory-aware data-driven framework to quantify EV charging flexibility as an energy-bounded and time-constrained process. Based on 252 real charging sessions, 141 representative Power-SoC profiles are reconstructed to capture real-world charging dynamics. Unidirectional flexibility is defined through bounds on the maximum shiftable charging energy, while bidirectional flexibility is quantified as the bounds of the maximum extractable discharge energy under feasibility constraints. Results show that flexibility depends on charging state and connection time. Charging beyond 80% SoC increases duration with limited gains, while higher charger power saturates due to BMS limits. Charging time in the 20%-80% range drops by over 60%, and mean power increases by up to 40%. The maximum extractable bidirectional energy can exceed twice its value depending on the point at which flexibility is activated. These results highlight that EV flexibility is not a controllable resource, but a bounded and time-dependent capability. As such, the proposed framework provides actionable limits that can be directly used by system operators and aggregators for scheduling, peak shaving, and short-duration flexibility services.


[61] 2606.30031

Joint Outage Detection and Compensation for Self-Healing 5G RAN via Deep Reinforcement Learning

Self-healing radio access network (RAN) requires autonomous detection and compensation of base station (BS) failures. This letter proposes an end-to-end framework combining three-class cell outage detection (COD), distinguishing normal, failed, and collaterally degraded cells, with a deep Q-Network (DQN) based deep reinforcement learning (DRL) agent that jointly controls power and antenna tilt for cell outage compensation (COC). Evaluation results show that the proposed DQN agent achieves 99.1% coverage and 54% full-recovery rate, an 11$\times$ improvement over the best heuristic, while consuming less compensation energy than heuristic baselines and learning, without explicit geometric input, to prefer tilt-only compensation for centre-cell outage.


[62] 2606.30110

LEO-NA Walker Constellation Design with Bi-objective Optimisation Approaches

Low Earth Orbit (LEO) constellation design for navigation augmentation (NA) has attracted increasing attention in navigation satellite system studies, yet balancing navigation performance and deployment cost remains a fundamental challenge. To address this issue, this paper proposes a bi-objective optimization framework for LEO Walker constellation design. The problem is formulated as a bi-objective optimization model with constellation cost and positioning accuracy as objectives. PDOP tail risk and satellite visibility are incorporated into the objective formulation to better characterize navigation performance. The Pareto-optimal solution set is obtained using the Non-dominated Sorting Genetic Algorithm II (NSGA-II). Simulation results show that, under the same satellite deployment cost, the proposed LEO-NA Walker constellation improves the average number of visible satellites by 42.5% and 24.4%, and reduces the mean PDOP by 18.9% and 10.5% compared with representative Polar and optimized-LFC constellations, respectively, thereby enhancing service continuity and resource utilization efficiency. These results provide useful guidance for the design and deployment of LEO-NA constellations.
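
To make the notion of a Pareto-optimal set concrete, the sketch below extracts the non-dominated designs from a list of (cost, PDOP) pairs, both to be minimized. It illustrates only the dominance test that non-dominated sorting is built on, not NSGA-II itself or the paper's objective formulation; the function name and the candidate values in the usage note are hypothetical.

```python
def non_dominated(points):
    """Return indices of points that no other point dominates.

    A point q dominates p when q is no worse in every objective and
    strictly better in at least one (all objectives are minimized).
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk <= pk for qk, pk in zip(q, p))
            and any(qk < pk for qk, pk in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front
```

With made-up candidates `[(10, 3.0), (12, 2.5), (11, 3.5), (15, 2.5), (9, 4.0)]`, the third design is dominated by the first and the fourth by the second, so the remaining three form the trade-off curve between cost and accuracy.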


[63] 2606.30114

Evaluation of Head-Related Transfer Functions Across Five Levels of Individualisation in Virtual Reality

Head-related transfer functions (HRTFs) underpin spatial hearing in virtual and augmented reality systems. Whilst individual HRTFs capture listener-specific morphology, their practical limitations have led to widespread use of generic HRTFs and growing interest in synthetic approaches. Yet their relative perceptual impact is rarely compared within a single study. In this study, we analysed data from 19 listeners who completed two virtual reality sound localisation experiments with complementary subsets of interleaved HRTF conditions, enabling within-subject comparison of five conditions: individually measured, KEMAR, randomly selected non-individual measured, high-resolution scan-based synthetic and photogrammetry-based synthetic HRTFs. Test-retest stability of the individually measured baseline across sessions supported pooling across experiments and attributing differences to perceptual rather than session effects. Across HRTF conditions, lateral localisation metrics were largely insensitive to HRTF type, whereas polar-domain metrics and confusion rates showed strong HRTF dependence. Random HRTFs outperformed KEMAR on several polar metrics. High-resolution synthetic HRTFs matched individual measured performance, whilst photogrammetry-based synthetic HRTFs, alongside KEMAR, showed the greatest degradation. These findings clarify practical choices for non-individual baselines and highlight the importance of mesh resolution when using numerical synthesis for elevation-dependent localisation tasks.


[64] 2606.30163

End-to-End Abstraction-Based Control with LLM-Enhanced NL-to-LTL Translation

Abstraction-Based Controller Design (ABCD) offers a principled framework for the safe control of complex Cyber-Physical Systems (CPSs), but interfacing real-world requirements with its formal synthesis machinery remains a major bottleneck: such requirements are most naturally expressed in Natural Language (NL), whereas ABCD requires formal specifications such as Linear Temporal Logic (LTL). Large Language Models (LLMs) offer a promising way to bridge this gap by translating NL requirements into formal specifications. This paper makes three contributions. First, we formalize an LLM-enhanced pipeline for ABCD, in which NL requirements are translated into LTL and used within a formal synthesis workflow. Second, we implement this pipeline in the Dionysos toolbox and introduce a benchmark for evaluating NL-to-LTL translation under both logical diversity and linguistic variation. Third, through experiments with state-of-the-art LLMs, we show that translation accuracy degrades systematically as the target specifications become more complex, across several measures including Abstract Syntax Tree (AST) size, temporal depth, and Büchi automaton size, while also accounting for the length of the NL input. These results reveal a scaling law that links LLM success rate to the intrinsic complexity of the underlying LTL formula. Together, these contributions provide both an evaluation framework and a practical integration pathway for making ABCD more accessible while preserving the rigor of formal methods.


[65] 2606.30397

Model Predictive Current Control with Harmonic Correction for Single-Phase AC-DC EV Charging

The increasing integration of Electric Vehicles (EVs) has imposed a growing harmonic challenge on the power grid. For AC/DC Power Factor Correction (PFC) in single-phase On-Board Chargers (OBCs), Model Predictive Current Control (MPCC) improves the current quality by predicting and tracking the inductor current. However, finite control set MPCC selects switching states, resulting in discrete control actions and a limited optimisation space. Moreover, the MPCC cost function based on instantaneous current tracking error has limited capability to compensate for low-order harmonic disturbances induced by dead time, control delay, and model parameter mismatch. This paper proposes a duty cycle predictive MPCC incorporating a real-time harmonic estimation reference. The proposed method dynamically estimates the low-order harmonic components of the input current and corrects the MPCC reference current, enabling continuous duty cycle control and targeted suppression of dominant low-order harmonics. Simulation results on a single-phase OBC demonstrate that the proposed duty cycle predictive MPCC reduces the steady-state current THD_i from 11.47% to 6.10% compared with the switching state predictive MPCC. With the harmonic reference, the THD_i is further reduced to 2.85%.


[66] 2606.30565

GPU-Accelerated Real-Time Software Defined Radio-Based Orthogonal Time Frequency Space Network-Coded Cooperation System: Hardware Implementation

While Orthogonal Time Frequency Space (OTFS) modulation offers robust reliability for 6G vehicular networks, standalone links suffer from blockages, and existing Software-Defined Radio (SDR) testbeds are bottlenecked by complex Delay-Doppler (DD) equalizers. This paper presents a real-time Decode-and-Forward Network-Coded Cooperation OTFS (OTFS-NCC) prototype implemented on consumer hosts and USRP B210 SDRs. Operating over a five-node TDD (Time-Division Duplexing) topology, our framework improves spectral efficiency by 33% over conventional relaying while mitigating error propagation via an enhanced Gaussian Approximate Message Passing Algorithm (GA-MPA). To support a 2 MHz baseband rate, we devise a hardware-algorithm decoupled GPU (Graphics Processing Unit) architecture using 1D memory mapping and transcendental function clipping, compressing the simulated Real-Time Factor (RTF) from 4.37 to 0.89. RF-conducted Hardware-in-the-Loop validation under 3GPP EVA70 (Extended Vehicular A model) fading confirms sustained zero-packet-drop real-time demodulation over 60-second test runs across a large 112-by-64 DD grid.


[67] 2606.30580

MeloDISinger: Melody-Aware & Duration-Preserving Singing Voice Editing with Audio Infilling

Text-based singing voice editing (SVE) aims to revise sung lyrics while preserving the original melody, total duration, and non-edited regions. In this paper, we propose MeloDISinger, a flow-matching-based SVE model for melody-aware and duration-preserving editing. Its core module, MeloDRP, predicts fixed-budget duration ratios, enabling explicit span-wise duration control. For melody-aware duration allocation, MeloDRP fuses phonetic cues with pseudo-MIDI melodic context through cross-attention, while temporal-overlap supervision encourages soft phoneme--note correspondences. We further use a flow-matching mel decoder for audio infilling to synthesize edited regions while preserving surrounding context. In addition, we introduce a duration-aware edited-lyric generation pipeline using WhisperX and an LLM to construct feasible evaluation scenarios. Experiments demonstrate state-of-the-art performance in both objective and subjective evaluations.


[68] 2606.28398

Semantic-Aware Generative Image Transmission for Resource-Constrained Visual IoT Systems

Resource-constrained visual Internet of Things (IoT) systems, such as edge cameras, unmanned sensing platforms, industrial inspection nodes, and remote monitoring sensors, often need to transmit task-relevant visual evidence over low-rate wireless links to an edge/cloud service. Existing image communication methods usually compress or transmit complete global representations, leaving limited room to exploit receiver-side generative restoration. This paper proposes a semantic-aware generative image transmission framework for edge-assisted visual IoT. The image captured by an IoT visual sensor is encoded into a discrete token grid by a VQ encoder. At the IoT transmitter or nearby gateway, token recoverability, estimated from prediction entropy and local structure complexity, is fused with semantic importance obtained from instance segmentation and category-aware scoring. A spatial dispersal sampler then selects the tokens to be transmitted under a bitrate budget. The transmitter sends only the quantization indices of kept tokens and a binary mask map, while the edge/cloud receiver recovers masked tokens through MaskGIT with Halton sequence scheduling. Experiments on Kodak and VisDrone scenes under AWGN and Rayleigh channels show that the proposed method provides a flexible bitrate-quality tradeoff for narrowband visual IoT links. At 0.074 bpp, it uses 44.6% of the transmitted bits of the 0.167-bpp DeepJSCC/WITT reference while achieving 29.9 dB PSNR. A pseudo-GT downstream detection study on Kodak further shows that semantic-aware masking preserves task-relevant objects better than random masking at both 30% and 50% mask ratios.


[69] 2606.28441

Learning to Distributedly Estimate under Partially Known Dynamics: A Covariance-Agnostic Neural Kalman Consensus Filter

Online latent state estimation constitutes a fundamental challenge within the artificial intelligence field, serving as a foundational tool for diverse applications, including sequential decision making, anomaly and change-point detection. In this paper, a novel online distributed sensing framework, where agents collaborate and exchange information to perform latent state estimation, is presented. The proposed estimator combines available partial domain knowledge with the representation capabilities of deep neural networks. In particular, the designed sensing framework incorporates prior estimates, optimized consensus weights, and Kalman-like recursive updates to perform decentralized inference, without relying on knowledge of noise statistics. Extensive experiments on linear, chaotic (Lorenz), and practical wireless tracking environments reveal that the proposed Covariance-Agnostic Neural Kalman Consensus Filter (CA-NKCF) outperforms traditional distributed Kalman and particle filters as well as purely model-free deep neural networks, exhibiting robustness even when the underlying motion and observation models are misspecified. It is also demonstrated that CA-NKCF's performance advantage remains stable across varying noise levels, random communication topologies, latent state dimensions, and observation clutter densities induced by scattering objects in wireless systems.


[70] 2606.28555

Robotic Arm-Based Spectral Sensing for Strawberry Positioning and Non-Destructive Sweetness Measurement

Accurate assessment of sweetness is essential for quality control in agriculture, yet conventional methods rely on destructive sampling and are difficult to scale. This thesis presents a robotic arm-based spectral sensing system for strawberry detection, localization, approach, and non-destructive sweetness estimation. The system integrates perception, calibration, and robotic control in a closed-loop pipeline. A YOLOv11s detector is adopted for real-time strawberry detection, while RGB-ToF calibration and mask-to-depth alignment are used to obtain geometrically consistent target localization. A custom eye-in-hand hand-eye calibration workflow is developed to estimate the rigid transform between gripper_link and cam_front, enabling reliable transformation of fruit targets into the robot base frame. Based on these estimates, the robot executes a waypoint-based search and an incremental closed-loop approach strategy to position the sensor at optimal working distance for sweetness sensing. Experimental results show strong end-to-end performance (88.10% success over 42 trials), with robust detection (95.24%) and successful approach execution once a target is detected (100% conditional success). Hand-eye calibration comparisons indicate that although Andreff yields the smallest translation norm in single-run results, the Park method provides better cross-sample consistency and therefore more stable downstream robot behavior. The residual failures are concentrated in the sensing stage, especially valid-region extraction for sweetness estimation under difficult depth/reflectance conditions. Overall, this work demonstrates the feasibility of integrating RGB-ToF perception, robotic manipulation, and non-destructive sensing for practical strawberry quality assessment, and provides a scalable baseline for future integration of learning-based policies such as Vision-Language-Action models.


[71] 2606.28600

Neuromorphic Energy-Aware Learning for Adaptive Deep Brain Stimulation

Neuromorphic and edge computing research has focused on reducing the inference cost of neural network controllers, yet in physical closed-loop systems the actuator can rival or exceed an efficient controller in energy. An efficient controller is therefore necessary but not sufficient, because the actuator becomes the cost worth reducing once inference no longer dominates it. Here, we introduce energy-aware learning, an approach that incorporates actuator energy directly into the reinforcement learning reward, and demonstrate it in closed-loop deep brain stimulation (DBS) for Parkinson's disease. A deep spiking Q-network, trained in a biophysical cortico-basal ganglia-thalamic circuit model, learns to suppress pathological alpha-beta oscillations by 45.2% while reducing stimulation charge by 80.0% relative to continuous DBS. Sparsity-constrained knowledge distillation compresses the policy onto the SynSense XyloAudio 3 neuromorphic processor at 0.52 mW inference power, yielding 28.1x lower energy per inference than an equivalent artificial neural network on conventional edge hardware. By co-optimizing stimulation energy and inference efficiency, the framework addresses both major power demands in implantable neuromodulation.


[72] 2606.28778

Brownian Bridge Diffusion-Based Joint Channel Estimation and Data Detection for Jamming-Resilient Receivers

In next-generation wireless networks, the growing density of devices and limited spectrum resources pose severe jamming challenges to fragile legitimate communication links in the wireless electromagnetic environment. Crucially, when jamming overlaps with pilot and data symbols in both time and frequency domains, it inflicts a severe bottleneck on receiver-side joint estimation and detection. Existing schemes often lack an effective framework to combat such jamming contamination, thereby failing to guarantee reliable transmission. To address this issue, we propose a Brownian bridge diffusion-based joint channel estimation and data detection framework (BBD-JCED) for jamming-resilient receivers. Specifically, the proposed framework comprises two core modules: the first extracts jamming features in the short-time Fourier transform (STFT) domain and suppresses jamming samples, thereby improving the signal-to-jamming-plus-noise ratio (SJNR) of the received signal; the second introduces a Brownian bridge diffusion (BBD) process to model the evolution of the suppressed signal and the encoded bits in the presence of channel estimation errors, thereby enabling enhanced joint channel estimation and data detection. To alleviate the computational burden of the BBD process in the second module, we further derive a fast ordinary differential equation (ODE) solver that enables its low-complexity iterative evolution. Finally, we design a multi-module training algorithm to improve the data recovery capability of the proposed framework. Simulation results demonstrate that the proposed framework achieves superior bit recovery performance compared with baseline schemes while maintaining a lower number of model parameters and competitive computational complexity.


[73] 2606.28824

Exit-and-Join Dynamics and Equilibrium in Continuum Cooperative Games

This paper develops a continuum theory of exit-and-join coalition dynamics in nonatomic cooperative games. We extend the Aumann-Shapley value and the Aumann-Drèze value to coalition structures in which each coalition is treated as a restricted nonatomic game, yielding a marginal-contribution-based payoff density that governs incentives for agents to remain in, exit, or join coalitions. We derive deterministic mean-field dynamics from decentralized switching rules and show that payoff-difference switching recovers replicator dynamics as a special case. We characterize exit-and-join equilibrium by the absence of profitable positive-mass deviations and prove its equivalence with stationarity of the induced mass dynamics under incentive-compatible and strictly payoff-responsive switching rates. For mass-based cooperative games, we construct a Lyapunov function and establish global convergence under strict concavity. We further show that the equilibrium is equivalent to a Wardrop equilibrium of an induced nonatomic population game and admits a variational inequality formulation. The framework is extended to incorporate switching costs and endogenous coalition acceptance rules, leading to constrained equilibria characterized by quasi-variational inequalities. The proposed theory unifies cooperative value allocation, noncooperative coalition mobility, mean-field dynamics, evolutionary game theory, and population games within a common framework for analyzing coalition formation and adaptation in large-scale multi-agent systems.
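
The replicator dynamics mentioned above can be illustrated with a minimal sketch. It assumes two coalitions and a purely illustrative payoff density that falls linearly with coalition mass. The function names, the payoff form, and the constants are assumptions made for this example and are not taken from the paper.

```python
def replicator_step(masses, payoffs, dt):
    """One forward-Euler step of dm_k/dt = m_k * (u_k - mean payoff)."""
    total = sum(masses)
    mean_payoff = sum(m * u for m, u in zip(masses, payoffs)) / total
    return [m + dt * m * (u - mean_payoff) for m, u in zip(masses, payoffs)]


def payoffs(masses, base=(2.0, 1.5), crowding=(1.0, 1.0)):
    """Hypothetical payoff density per coalition: u_k = base_k - crowding_k * m_k."""
    return [a - b * m for a, b, m in zip(base, crowding, masses)]


def simulate(masses, dt=0.05, steps=4000):
    """Iterate the Euler step; the total mass is conserved by construction."""
    for _ in range(steps):
        masses = replicator_step(masses, payoffs(masses), dt)
    return masses


if __name__ == "__main__":
    final = simulate([0.5, 0.5])
    print(final, payoffs(final))
```

In this toy setting the masses settle where both coalitions offer the same payoff density, so no positive mass of agents gains by switching. This matches the equilibrium notion described in the abstract only in spirit.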


[74] 2606.28842

Channel Capacity under the Subtractive Dithered Quantization Model

We study the capacity of an additive white Gaussian noise (AWGN) channel followed by a subtractive dithered uniform quantizer. Under the Schuchman conditions and with negligible overload probability, the system admits an additive-noise representation in which the effective noise is the sum of Gaussian and uniform components. Capacity bounds are derived for this model when inputs are subject to an average-power constraint as well as a peak-amplitude constraint, where the latter accounts for the limited quantizer dynamic range. Specifically, a computable lower bound is obtained based on the entropy power inequality (EPI), using the maximum-entropy input under the above constraints. Tighter numerical lower bounds are derived using discrete input constellations with finite mass points. Finally, an upper bound is obtained by exploiting the fact that Gaussian distributions maximize entropy under a variance constraint. Numerical results show that, for a K-level quantizer, discrete constellations with K mass points already achieve near-optimal rates among the tested families. Moreover, our upper bound is close to the lower bounds in the moderate-SNR regime; it thus represents a good and simple capacity approximation in this regime.
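
The additive-noise representation and the two entropy arguments named above can be written out as a short sketch. The symbols $X$, $Z$, $U$, $N$, $\sigma^2$, $\Delta$, and $P$ are notation assumed here for illustration, and the paper's exact bounds may differ in form.

```latex
% Assumed notation: Gaussian noise Z, quantizer step \Delta, average power P.
\begin{align*}
Y &= X + N, \qquad N = Z + U, \qquad
Z \sim \mathcal{N}(0,\sigma^2), \quad
U \sim \mathrm{Unif}\!\left[-\tfrac{\Delta}{2}, \tfrac{\Delta}{2}\right], \\
I(X;Y) &= h(Y) - h(N), \\
\text{(EPI)} \quad h(Y) &\ge \tfrac{1}{2}\log_2\!\left(2^{2h(X)} + 2^{2h(N)}\right)
\;\Longrightarrow\;
I(X;Y) \ge \tfrac{1}{2}\log_2\!\left(1 + 2^{\,2\,(h(X) - h(N))}\right), \\
\text{(max entropy)} \quad h(Y) &\le \tfrac{1}{2}\log_2\!\left(2\pi e\left(P + \sigma^2 + \tfrac{\Delta^2}{12}\right)\right)
\;\Longrightarrow\;
C \le \tfrac{1}{2}\log_2\!\left(2\pi e\left(P + \sigma^2 + \tfrac{\Delta^2}{12}\right)\right) - h(N).
\end{align*}
```

Here $X$, $Z$, and $U$ are taken to be mutually independent, and $\Delta^2/12$ is the variance of the uniform component. The lower bound is evaluated at an input distribution that satisfies both the power and the peak-amplitude constraints.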


[75] 2606.28879

Analysis of Adam Algorithms for Stochastic Dynamic Systems

The adaptive moment estimation algorithm, known as Adam, is widely used in modern machine learning, owing to its low per-iteration complexity and strong empirical performance. Despite its prevalent use, the theoretical foundation of Adam remains largely unexplored for time-varying and nonstationary systems. In fact, the existing theoretical analyses of Adam-type algorithms are primarily concerned with time-invariant model parameters and explicitly or implicitly rely on independent and identically distributed (i.i.d.) data assumptions, under which the learning task can be formulated as minimizing a fixed expected objective with a static minimizer. However, such assumptions are often violated in time-varying and nonstationary systems, thereby calling for a theoretical investigation beyond the conventional yet idealized i.i.d. setting. The main objective of this paper is to solve this challenging problem by establishing a general theory of Adam for time-varying and nonstationary stochastic systems. We will introduce some new techniques for analyzing the products of nonstationary and dependent random matrices induced by Adam's coupled first- and second-moment recursions, and will construct a new stochastic Lyapunov function that blends these two moment dynamics. Under a stochastic excitation condition that allows nonstationary and dependent data, we will derive both parameter tracking and output prediction error bounds explicitly, quantifying the effects of stepsize, first- and second-momentum parameters, gradient noise and parameter drift. These bounds not only provide guarantees for Adam performance, but also provide guidelines for hyperparameter selection. Experiments on both synthetic and real-world data validate our theory and design guidelines.


[76] 2606.28988

Underwater Source Detection and Classification for Signal-based Surveillance: Audio Dataset Curation and Cross-Domain Evaluation

Machine learning for underwater acoustics is constrained by the scarcity of publicly available labeled datasets. In contrast to air-acoustic domains, where large benchmarks enable rapid model development, underwater datasets are typically small and limited in acoustic diversity, restricting robust model training and cross-domain generalization. To help address this gap, we introduce a curated underwater audio dataset derived from an open-source maritime sound archive. The dataset contains over one thousand labeled audio segments across eight biologically and mechanically relevant acoustic classes, providing an additional resource for training models in data-limited underwater environments. Additionally, we establish a lightweight Convolutional Neural Network (CNN) baseline and propose a margin-enhanced loss with feature alignment to mitigate class confusion arising from data imbalance, acoustic similarity, and cross-domain mismatch. While the baseline achieves 96.35% in-domain accuracy, evaluation on ShipsEar reveals substantial domain shift; the proposed feature alignment improves zero-shot ship detection by 42.60%, demonstrating stronger robustness under distribution mismatch. We further release a transparent curation pipeline and reproducible benchmark to support future research on imbalance mitigation, domain adaptation, and data-efficient underwater acoustic classification.


[77] 2606.28991

Learning from Acquisition: Metadata-driven Multimodal Pre-training for Cardiac MRI

Cardiac magnetic resonance imaging (CMR) routinely records structured acquisition metadata, yet most CMR foundation models rely primarily on image-only pre-training and leave this naturally available source of weak semantic supervision largely underexplored. We propose MetaCLIP-CMR, a metadata-driven framework based on Contrastive Language-Image Pre-training (CLIP), which converts imaging modality, anatomical view, scanner vendor, field strength, and scanner model into textual supervision for CMR representation learning. The pretrained image encoder is evaluated on imaging modality classification, cine view classification, and cardiac segmentation. MetaCLIP-CMR achieves 86.8% modality accuracy and 86.5% cine view accuracy, clearly outperforming ImageNet and masked reconstruction initialisations. For downstream cardiac segmentation, MetaCLIP-CMR consistently obtains the highest Dice score across the evaluated ACDC and M&Ms cine short-axis (SAX) settings under both full-data and 20% fine-tuning regimes. Compared with recent image-focused large-scale CMR pre-training models, MetaCLIP-CMR achieves comparable ACDC segmentation performance, while requiring less than 1% of their pre-training image scale. These results suggest that metadata learning offers a natural and easy-to-use strategy for transforming routinely recorded acquisition information into effective supervision for foundation-level CMR representation learning, highlighting the promise of metadata-driven multimodal pre-training.


[78] 2606.29071

An Optimal Contact-Mechanically Consistent and Flow-Separation Adapted Modeling of Vocal Fold Dynamics

Single mass-spring-damper models of vocal folds have been effective in simulating vocal fold vibrations without added complexity. However, single-degree-of-freedom models cannot sustain oscillation in the presence of structural damping unless source-tract interaction is considered. Moreover, existing lumped models struggle to accurately simulate vocal fold closure during phonation. This study aims to develop a reliable and simplified single-degree-of-freedom model of phonation that can simulate sustained oscillation in a damped system without incorporating a vocal tract model. Additionally, the proposed model maintains vocal fold closure in a manner consistent with the physics of phonation, addressing a longstanding challenge in existing lumped models. High-speed videoendoscopy (HSV) data from four normophonic subjects producing sustained vowel /i/ were used to extract glottal area waveforms (GAWs) via deep learning-based image segmentation for particle swarm optimization of the model parameters. An additional resistance force was incorporated to compensate for flow separation and generate the force imbalance required for sustained oscillation. An external structural force was also added during closure to sustain the closed phase. The 4th-order Runge-Kutta method was used to solve the governing equations with enhanced numerical stability and accuracy. The model parameters were optimized for individual subjects, resulting in normalized errors below 3% between experimental and simulated GAWs. The proposed model accurately reproduced subject-specific vocal fold vibrations and vocal fold closure in agreement with experimental data. Overall, the proposed model provides a computationally efficient framework for simulating sustained phonation without requiring complex source-tract coupling while capturing the key biomechanical and aerodynamic mechanisms of phonation.
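
As a rough illustration of the numerical scheme named above, the sketch below applies a classical 4th-order Runge-Kutta step to a generic single mass-spring-damper, m x'' + c x' + k x = F. The function names, the placeholder force term, and the parameter values are assumptions for this example. The paper's flow-separation resistance and closure forces are not reproduced here.

```python
import math


def rk4_step(f, t, y, h):
    """Advance the state y by one classical RK4 step of size h for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def make_oscillator(mass, damping, stiffness, force=lambda t, x, v: 0.0):
    """Right-hand side for the state [x, v]; `force` is a hypothetical driving term."""
    def rhs(t, y):
        x, v = y
        return [v, (force(t, x, v) - damping * v - stiffness * x) / mass]
    return rhs


def simulate(rhs, y0, h, steps):
    """Integrate from t = 0 with a fixed step and return the final state."""
    t, y = 0.0, list(y0)
    for _ in range(steps):
        y = rk4_step(rhs, t, y, h)
        t += h
    return y


if __name__ == "__main__":
    # Undamped check case: x(t) = cos(t) for m = k = 1, x(0) = 1, v(0) = 0.
    x, v = simulate(make_oscillator(1.0, 0.0, 1.0), [1.0, 0.0], 0.01, 100)
    print(x, math.cos(1.0))
```

With zero forcing and positive damping the oscillation decays. This is why a model of this kind needs an additional force imbalance to sustain vibration.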


[79] 2606.29098

Connectivity Estimation using Stochastic Graph Heat Modelling

A growing number of techniques leverage the spatial structures that underlie many real-world datasets. Despite these advances, the complementary task of estimating spatial structures and understanding their role within these techniques has often been overlooked. In neurophysiological data analysis specifically, numerous methods exist to estimate brain connectivity, but most are not explicitly model-based, dynamic, multivariate, or directed. To address these limitations, we previously introduced noise-driven heat modelling on graphs for neurophysiological connectivity estimation. In this study, we extend this framework by relaxing earlier noise assumptions and adding regularisation to improve robustness. We also develop a simulation procedure to characterise and evaluate our technique in a controlled setting. Finally, we demonstrate that the technique is able to capture meaningful spatial structure across two experiments, each using two real-world datasets. The explicit model formulation of our connectivity estimator has the potential to improve the interpretability of graph-based techniques across a wide range of applications. The code implementing our method is available at this https URL.


[80] 2606.29162

Spatially Localized Image Degradation Embeddings for Image Quality Assessment

Self-supervised learning (SSL) currently drives state-of-the-art performance in no-reference image quality assessment (NR-IQA). However, standard SSL pipelines uniformly apply synthetic distortions across the entire image field, which can limit their sensitivity to spatially localized and co-occurring degradations encountered in real-world content. In this work, we empirically expose this representational blind spot across existing state-of-the-art encoders, demonstrating their reduced sensitivity to spatially bounded image degradations. To bridge this gap, we introduce Spatially Localized Image Degradation Embeddings for Image Quality Assessment (SLIDE-IQA). SLIDE-IQA employs a dual-branch Vision Transformer framework that injects spatially bounded degradations into a contrastive pretraining objective. To handle the spatial complexity of these degradations, we introduce a Threshold-Bounded Exclusion Mechanism, a representational design choice that resolves structural conflicts arising from spatially localized distortions to ensure the latent space respects both degradation type and spatial scale. Finally, we show that SLIDE-IQA's synthetic-only pretraining significantly improves sensitivity to localized distortions, while achieving competitive performance on NR-IQA benchmarks against existing SSL NR-IQA models.


[81] 2606.29269

Proportional-Fair Joint User Grouping and Power Allocation for Uplink NOMA-ISAC

This letter addresses long-term fairness in uplink non-orthogonal multiple access integrated sensing and communication (NOMA-ISAC) systems. Existing resource allocation schemes that maximize instantaneous sum rate often favor strong users, leaving historically underserved users with poor long-term throughput. We propose PF-JUGPA, a proportional-fair scheduling based joint user grouping and power allocation method. PF-JUGPA first pre-selects users via a PF metric combining instantaneous rate proxies and historical averages, then performs fairness-aware grouping and power allocation by maximizing a weighted sum rate whose weights are inversely proportional to historical service rates. Simulation results show that PF-JUGPA significantly improves the Jain fairness index and weak-user average rates with only a modest sum-rate loss compared to sum-rate-oriented and round-robin baselines. The findings confirm that embedding long-term service history into both scheduling and resource allocation yields an effective throughput-fairness-sensing tradeoff in uplink NOMA-ISAC.
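
The scheduling ingredients described above can be sketched in a few lines. The ratio form of the PF metric, the moving-average update, and all function names are assumptions chosen for illustration, since the letter's exact metric and grouping rule are not given in the abstract. Only the Jain fairness index follows its standard definition.

```python
def pf_metric(inst_rate, avg_rate, eps=1e-9):
    """Classic proportional-fair ratio: instantaneous rate over historical average."""
    return inst_rate / (avg_rate + eps)


def preselect(inst_rates, avg_rates, k):
    """Return the indices of the k users with the largest PF metric, in index order."""
    ranked = sorted(range(len(inst_rates)),
                    key=lambda i: pf_metric(inst_rates[i], avg_rates[i]),
                    reverse=True)
    return sorted(ranked[:k])


def fairness_weights(avg_rates, eps=1e-9):
    """Weights inversely proportional to historical service rates."""
    return [1.0 / (r + eps) for r in avg_rates]


def update_average(avg_rate, served_rate, window):
    """Exponential moving average of the served rate (window is a hypothetical parameter)."""
    return (1.0 - 1.0 / window) * avg_rate + served_rate / window


def jain_index(rates):
    """Jain fairness index (sum x)^2 / (n * sum x^2); 1.0 means perfectly equal rates."""
    total = sum(rates)
    squares = sum(r * r for r in rates)
    return total * total / (len(rates) * squares) if squares > 0 else 0.0


if __name__ == "__main__":
    print(preselect([10.0, 2.0, 5.0], [10.0, 1.0, 1.0], 2))
    print(jain_index([1.0, 1.0, 1.0, 1.0]), jain_index([4.0, 0.0, 0.0, 0.0]))
```

A user with a low historical average receives both a larger PF metric and a larger weight. In a sketch like this, that is how long-term service history enters both the scheduling and the allocation stage.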


[82] 2606.29339

Two kinds of robustness are not the same: disentangling fault tolerance and low-SNR robustness in multi-domain event detection on real data

Reliable event detection underpins induced-seismicity monitoring for Carbon dioxide Capture and Storage (CCS) and geothermal operations, distributed acoustic sensing (DAS), and industrial condition monitoring. In each setting a detector must stay reliable both when sensors fail and when the signal is buried in noise. These two failure modes are routinely conflated, and architectural complexity is often credited with robustness it may not deserve. We assemble a unified binary event-detection benchmark from three physically distinct real sources -- Hi-net seismic waveforms, Utah FORGE 2024 borehole DAS, and MAFAULDA industrial vibration -- each mapped to a common 8-channel, 256-sample representation, and evaluate a fault-tolerant detector (CEPHALON) trained with per-sample sensor-dropout against standard detectors (a 1D convolutional network, a temporal convolutional network, and a compact Transformer) trained with an identical recipe. On clean data every model is near-perfect (AUC ~ 0.99). Under progressive sensor loss, simple models with sensor-dropout are already robust and CEPHALON holds no advantage. Under additive noise, however, CEPHALON degrades far more gracefully: at -2.5 dB its overall AUC is 0.939 versus 0.532-0.572 for the convolutional baselines. Same-architecture ablations isolate the cause: disabling internal redundancy at inference reduces the low-SNR advantage only modestly, whereas removing sensor-dropout training collapses it (0.899 to 0.603 at -5 dB). The training recipe is therefore the dominant cause and parallel redundancy only secondary. We release a complete, numbered, reproducible pipeline so that every figure can be regenerated.
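
Per-sample sensor-dropout, which the ablation identifies as the dominant factor, can be sketched as a simple augmentation on the 8-channel, 256-sample representation. Zeroing the dropped channels, keeping at least one channel alive, and the function name are assumptions for this example. The abstract does not specify how the authors' recipe models a failed sensor.

```python
import random


def sensor_dropout(sample, drop_prob, rng, keep_at_least=1):
    """Zero out whole channels of one sample independently with probability drop_prob.

    `sample` is a list of channels, each a list of floats. Returns the augmented
    copy and the per-channel dropped flags. At least `keep_at_least` channels
    are always left intact (an assumption of this sketch).
    """
    n_channels = len(sample)
    dropped = [rng.random() < drop_prob for _ in range(n_channels)]
    if n_channels - sum(dropped) < keep_at_least:
        # Restore randomly chosen channels so the sample is never empty.
        for ch in rng.sample(range(n_channels), keep_at_least):
            dropped[ch] = False
    augmented = [[0.0] * len(row) if is_dropped else list(row)
                 for row, is_dropped in zip(sample, dropped)]
    return augmented, dropped


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [[1.0] * 256 for _ in range(8)]
    augmented, dropped = sensor_dropout(sample, 0.5, rng)
    print(dropped)
```

Drawing a fresh mask for every training sample means the detector sees a different failure pattern each time, rather than a single fixed set of missing sensors.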


[83] 2606.29433

Dynamical System Characterization of Heterogeneous Walker Satellite Networks: An Orbit-Aware Stochastic Geometry Perspective

Heterogeneous and in particular multi-altitude low Earth orbit (LEO) satellite constellations exhibit complex spatial and temporal structures, which require new modeling tools for their performance analysis. In this paper, we develop an orbit-aware stochastic geometry framework modeling today's LEO satellites on various orbits and various altitudes. In particular, we characterize such a system as the superposition of multiple Walker point processes and formulate it as a dynamical system determined by an initial condition and the rotation speeds of satellites and Earth. We show that when the speeds are rationally commensurable, the proposed satellite system is periodic. Then, we show that the system is ergodic when the speeds are rationally independent, establishing a theoretical link between time averages of the system and the expectation of it under the invariant measure. We derive the nearest-satellite distance distribution of a typical receiver at a given latitude and analyze the signal to interference-plus-noise ratio (SINR) coverage probability of the typical receiver. We then derive the ergodic throughput of the downlink communication to the typical receiver. Overall, the proposed framework offers a rigorous and tractable tool for analyzing downlink performance in Walker-type heterogeneous LEO satellite networks.


[84] 2606.29534

Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs

Popular ASR test sets adopt inconsistent conventions for numbers, disfluencies, entities, and casing, while standard normalizers erase the format distinctions users care about. Current benchmarks therefore cannot measure whether a model follows user preferences for output style. We introduce PreferenceASR, a test set evaluating ASR systems on their ability to follow natural-language preference instructions across four categories: normalization, entities, disfluencies, and case. Built from seven open-source corpora via a two-stage LLM-assisted pipeline with human verification, it is evaluated with a preference-aware normalizer that selectively skips steps matching the active instruction. Benchmarking four models shows rankings shift across preference types, exposing quality differences traditional evaluation obscures. We publicly release the dataset.


[85] 2606.29578

SoftBinary Coding: A New Information-Theoretic Neural Compression Paradigm

Neural compression is currently dominated by Nonlinear Transform Coding (NTC), which maps data to real-valued latents via continuous transforms. Despite its success, NTC suffers from train-test mismatch due to non-differentiable quantization, a "smoothness bias" inherent in continuous transforms that precludes optimality for certain sources, and a loss of "shaping gain" due to the complexity of including high-dimensional vector quantization. We propose SoftBinary Coding (SBC), an end-to-end learning paradigm that bypasses these limitations by using a stochastic binary latent space. In the spirit of vector quantization, SBC employs discrete representations and compresses them through a novel fast binary channel simulation scheme, for which we provide a proof of rate optimality. Experimental gains on information-theoretic sources provide both theoretical and practical closure to NTC's limitations, establishing discrete binary structures as a viable path toward reaching optimal rate-distortion bounds. Surprisingly, SBC also achieves state-of-the-art performance on vector quantization of i.i.d. sources, exceeding Trellis Coded Quantization of the Gaussian source.


[86] 2606.29609

Cooperative RSU Sleep Scheduling for Green V2I Corridors

As vehicle-to-infrastructure (V2I) deployments scale, roadside units (RSUs) that consume 10-25 W continuously yet serve negligible traffic during off-peak hours represent a growing source of energy waste. Sleep scheduling can exploit the pronounced diurnal variation in urban traffic, but the WAVE service restoration overhead of up to 100 ms nearly exhausts the 3GPP TS 22.185 latency budget, making independent sleep decisions risky. This paper proposes a cooperative framework in which upstream RSUs share traffic detection signals with downstream neighbors via infrastructure-to-infrastructure links, enabling predictive wake-up that exploits spatial correlation between adjacent intersections. The framework is formulated as a constrained Markov decision process and decomposed into per-RSU subproblems solvable by value iteration. Four algorithms of increasing sophistication are evaluated on real hourly traffic data from four consecutive signalized intersections in Kuwait City, comprising a total of 762,050 vehicles over five days. The cooperative algorithm reduces corridor energy consumption by 59.5% relative to always-on operation while maintaining 99% latency compliance, and provides 7.7 percentage points of additional savings over independent per-RSU optimization at downstream RSUs with spatial correlation ρ >= 0.97. Extrapolated to a 200-RSU urban deployment, the cooperative approach yields an estimated 5.25 tonnes of CO2 reduction per year.


[87] 2606.29673

Privacy-Preserving Decentralized Cooperative Localization with Range-Only Measurements: A Convex Optimization Based Approach

Cooperative localization using range-based measurements is critical for multi-robot systems operating in GPS-denied and unstructured environments. However, traditional cooperative approaches require sharing explicit spatial coordinates across the network, presenting a severe security vulnerability in privacy-sensitive missions. While recent literature has explored privacy-preserving alternatives, these methods typically rely on accuracy-degrading noise injection or computationally prohibitive cryptographic protocols. To overcome these limitations, we propose a novel, natively privacy-preserving Decentralized Cooperative Localization (DCL) framework based on convex optimization. Discarding probabilistic noise models, we assume strictly bounded measurement noise and formulate the localization problem via Semi-Definite Programming (SDP) to compute a Maximum-Volume Inscribed Ellipsoid (MVE). Our approach introduces novel intersection-plane constraints derived from landmark measurements to significantly tighten individual spatial bounds. To incorporate inter-robot range measurements securely, we uniquely decompose coupling constraints into localized Linear Matrix Inequalities (LMIs). Agents achieve fleet-wide spatial consensus by iteratively exchanging only abstract dual variables, completely avoiding the transmission of explicit primal position estimates. Extensive 3D Monte Carlo simulations demonstrate that our DCL framework outperforms existing SDP-based localization methods in accuracy, while guaranteeing operational privacy and maintaining highly scalable, parallelizable computation.


[88] 2606.29677

Lateral String Stability for Vehicle Platoons

Connected and automated vehicle (CAV) platooning promises gains in energy efficiency and traffic throughput and, most critically, in safety. These safety benefits hinge on string stability, which determines how disturbances propagate along a platoon. While longitudinal string stability is well studied, lateral string stability, which governs the propagation of path-tracking errors that can lead to unsafe deviations from the intended path, remains underexplored. Its importance is increasing as autonomous vehicles rely more heavily on onboard sensing and map-free navigation, where sensor occlusion and dense formations amplify safety risks. This paper presents a new framework for lateral string stability that directly addresses safety-critical path-relative tracking errors and enables consistent comparison across vehicles following the same road geometry. Central to this framework is an arc-length (Eulerian) viewpoint, a departure from traditional analyses, that clarifies how tracking errors at a given point on the path propagate from one vehicle to the next. A formal definition of lateral string stability is introduced along with two control strategies: an onboard-sensing-only controller and a novel learn-from-predecessor approach utilizing vehicle-to-vehicle (V2V) communication. We show that onboard sensing alone cannot guarantee attenuation of path-tracking errors, imposing a fundamental safety limitation, whereas V2V communication enables true error attenuation.


[89] 2606.29681

Sample-Efficient Learning of Probabilistic Causes for Reachability in Markov Decision Processes with Probabilistic Guarantees

Probabilistic model checking for Markov decision processes (MDPs) provides quantitative guarantees, but often offers limited insight into why undesired outcomes occur. Probability-raising (PR) causality addresses this by identifying states whose visitation increases the probability of reaching designated states. Existing PR-cause identification methods, however, use MDP modifications not well-suited for learning: the gap between conditional and unconditional reachability probabilities can be hard to detect from transition samples, and construction requires reachability probabilities of the MDP, which are unavailable when transition probabilities are unknown. We study unknown MDPs and propose a learning approach with probabilistic guarantees for PR-cause identification. Our key ingredient is a restart-based MDP modification that reduces PR-cause checking to two conditional reachability queries without using reachability values of the original MDP. We prove correctness, establish sample-complexity bounds, and develop an anytime learning-and-checking algorithm based on two-sided value iteration that progressively classifies states as causal, non-causal, or undecided. Experiments on two benchmarks demonstrate reliable and fast identification of PR causes.


[90] 2606.29897

Child-Centric Voice Anonymization in Single and Multi-Speaker Speech via Domain-Adapted SSL Models

Voice anonymization aims to protect speaker identity while preserving linguistic content and speech usability. However, most anonymization systems are developed on adult speech, leading to degraded performance when applied to child speech. This paper investigates child-centric anonymization by adapting a self-supervised learning (SSL) based anonymization pipeline to the child speech domain. The system is adapted using child speech from the MyST corpus and evaluated under both single-speaker and two-speaker mixture conditions. Experimental results show that child-domain adaptation improves intelligibility and perceptual quality while maintaining strong privacy protection. Extending the approach to multi-speaker speech further demonstrates that combining target speaker extraction with child-adapted anonymization provides privacy protection while preserving conversational structure. These findings highlight the importance of child-specific adaptation for practical speech anonymization systems.


[91] 2606.29995

Design and Realization of Broadband Magnonic Spectrometers With Local Electrical Outputs

Microscopic radio-frequency (RF) devices based on propagating spin waves (SWs) are promising for compact, energy-efficient RF signal processing, but their implementation is impeded by fabrication complexity and the lack of efficient electrical readout. In this work, we demonstrate a SW-based Rowland circle spectrometer with electrical input and local electrical output transducers. The device is realized using a scalable fabrication process based on sputter deposition and wet-chemical etching of Yttrium-Iron-Garnet (YIG), forming concave grating structures with micrometer-scale features. The device functionality is confirmed by combined electrical and magneto-optical measurements, which show that the deflection of SW wavefronts at different input frequencies closely follows the analytically predicted behavior. The linear excitation of SWs via two input tones further confirms the spectrometer operation for simultaneously propagating waves. Beyond the single-device demonstration, we propose a concept for scalable architectures comprising multiple Rowland circles with tunable operating points. When combined with broadband parallel electrical readout, this approach enables control over bandwidth and spectral resolution, which are relevant to spectral occupancy detection in wireless communication systems.


[92] 2606.30097

CylindTrack: Depth-Aware Cylindrical Motion Modeling for Panoramic Multi-Object Tracking

Multi-Object Tracking (MOT) is a core capability for embodied perception, and panoramic cameras are attractive for embodied systems because their 360° field of view reduces blind spots and keeps surrounding targets observable for longer durations. However, panoramic MOT is not a straightforward extension of perspective MOT. In equirectangular panoramic videos, the horizontal image domain is periodic rather than Euclidean, which breaks planar motion assumptions and makes IoU-based association unreliable near the 0°/360° seam. Meanwhile, large-FoV scenes often contain more objects, stronger scale variation, and more frequent interactions, making online association particularly sensitive to unstable frame-wise depth cues. To address these issues, we propose CylindTrack, a depth-aware cylindrical tracking-by-detection framework for panoramic MOT. CylindTrack first introduces Depth-Temporal Trajectory Modeling (DTM), which promotes instance depth from an isolated frame-wise cue to a temporally filtered trajectory-level state. To improve the reliability of depth observations, we further develop Spherical Spatio-Temporal Consistency Learning (SSTC), which combines a Temporal Mixer and Spherical Geometry-aware Attention to enhance temporal coherence and panoramic geometric alignment in depth-aware representations. Finally, we design a Topology-Aware Cylindrical Motion Model (TCMM) that lifts horizontal motion into a continuous angular state space and performs seam-consistent motion prediction and association in the periodic panoramic domain. By jointly modeling trajectory-level depth consistency and panoramic topology, CylindTrack improves identity preservation and trajectory continuity in challenging panoramic scenes. The source code will be released at this https URL.
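
The seam problem described above can be illustrated with a minimal sketch. It is not the TCMM module itself, only the generic idea of treating the horizontal coordinate as an angle and wrapping differences, so that two detections on either side of the 0°/360° seam are recognized as close. The function names and the constant-velocity model are assumptions made for illustration.

```python
def wrap_angle_deg(delta):
    """Map an angular difference in degrees to the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def circular_distance_deg(a_deg, b_deg):
    """Shortest angular distance between two azimuths on the panorama."""
    return abs(wrap_angle_deg(a_deg - b_deg))


def predict_azimuth_deg(theta_deg, omega_deg_per_frame, dt=1.0):
    """Constant-velocity prediction on the circle, wrapped into [0, 360)."""
    return (theta_deg + omega_deg_per_frame * dt) % 360.0


if __name__ == "__main__":
    # A planar difference treats 359 deg and 1 deg as 358 deg apart;
    # the wrapped distance shows they are only 2 deg apart.
    print(circular_distance_deg(359.0, 1.0))
    # A track at 358 deg moving +5 deg per frame crosses the seam cleanly.
    print(predict_azimuth_deg(358.0, 5.0))
```

Keeping the state as an angle and wrapping only when differences are taken avoids the discontinuity that a pixel-column coordinate would introduce at the image border.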


[93] 2606.30100

Binary Signal Recovery in Undersampling: Iterative SDP with Majority Voting and Successive Interference Cancellation

Binary compressive sensing (BCS) seeks to recover a $k$-sparse binary vector of length $n$ from $m$ linear measurements. Classical CS guarantees break down for $m < k$ and convex/greedy BCS algorithms with random Gaussian sensing matrices perform poorly. We introduce ISDP-MVSIC, which combines randomized semidefinite programming (SDP) sampling, majority voting (MV) and successive interference cancellation (SIC) across $L \ll n$ stages, wrapped in a residual-cost driven retry loop. The method exposes a tunable complexity--performance trade-off: for $n=100, 144$, raising the worst-case complexity $\mathcal{C}_{max}$ from $7.9 \times 10^9$ to $2.0 \times 10^{10}$ enables empirical exact recovery over $m/k \in [0.4,5.0]$ as the sparsity ratio $s=k/n$ decreases from $0.5$ to $0.1$, by practically targeting the undersampled regime.


[94] 2606.30179

HiRes: A Hierarchical Cascaded Method for Resistor Value Identification

Accurate identification of resistor values from unconstrained images remains a challenging computer vision task due to variations in lighting, orientation, scale, and background complexity. This paper presents HiRes, a hierarchical cascaded pipeline for end-to-end resistor value identification directly from full-frame images. The approach combines object detection (YOLOv8n), semantic segmentation (UNet++ with EfficientNet-B2), and structured geometric decoding via projection along the resistor axis. To improve robustness, we incorporate geometric filtering, gap-preserving band separation, and validation against the E24 resistor series. Experiments across diverse real-world images show that HiRes achieves a detection mAP50 of 0.9906, a segmentation mIoU of 0.8444, and an end-to-end identification accuracy of 85.8% (95% CI: 78.0-91.9%), outperforming the publicly available classical baseline, CVResist, which fails to generalize beyond controlled conditions. In addition, our architecture outperforms state-of-the-art MLLMs on our challenging test set, offering a lower-cost, high-efficiency, and interpretable alternative. These results demonstrate the effectiveness of integrating learned visual representations with structured reasoning for robust resistor interpretation. Code and dataset are available at this https URL.
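
The E24 validation step mentioned above can be sketched as follows. This is a hypothetical illustration, not the HiRes implementation: it assumes the bands have already been classified into color names, handles only two digit bands plus a multiplier in the range black to white, and ignores the tolerance band and gold/silver multipliers.

```python
# Two-significant-digit values of the standard E24 series.
E24 = {10, 11, 12, 13, 15, 16, 18, 20, 22, 24, 27, 30,
       33, 36, 39, 43, 47, 51, 56, 62, 68, 75, 82, 91}

# Standard resistor color code for digit and multiplier bands.
DIGITS = {"black": 0, "brown": 1, "red": 2, "orange": 3, "yellow": 4,
          "green": 5, "blue": 6, "violet": 7, "grey": 8, "white": 9}


def decode_bands(bands):
    """Decode [digit, digit, multiplier] colors into (ohms, is_e24)."""
    d1, d2, exponent = (DIGITS[color] for color in bands)
    significand = 10 * d1 + d2
    return significand * 10 ** exponent, significand in E24


def plausible_readings(bands):
    """Try both reading directions and keep only E24-valid values."""
    candidates = [decode_bands(bands), decode_bands(bands[::-1])]
    return [ohms for ohms, valid in candidates if valid]


if __name__ == "__main__":
    print(decode_bands(["green", "blue", "red"]))        # 56 x 10^2 ohms
    print(plausible_readings(["red", "blue", "green"]))  # only one direction is E24
```

A series check of this kind can reject a reading direction or a misclassified band, although some band sequences remain valid in both directions and need other cues to disambiguate.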


[95] 2606.30196

Forewarned is Forearmed: When Non-Sequential Embedding Turns Into an Anomaly Detector

This paper offers an in-depth analysis of non-sequential multimodal sentence-level embeddings, with a particular focus on the SONAR model. We demonstrate that certain embedding dimensions are sensitive to perturbations and can serve as indicators of decoding anomalies. By leveraging the consistency between successive encoding and decoding, we successfully build an accurate detector. Additionally, we explore modifying specific dimensions of interest to attempt to correct them. This work underscores the importance of understanding and analyzing the embeddings themselves to enhance the reliability of multimodal representations.


[96] 2606.30322

Hybrid Active-Online Learning Framework for Label-Efficient Concept Drift Adaptation in Optical Network Failure Detection

We propose a hybrid active-online learning framework for label-efficient concept drift adaptation in optical network failure detection. Using margin-based selective labeling, our method achieves near-ceiling accuracy and AUC scores while querying only 3.4% of streaming samples, with negligible latency overhead compared to static inference.
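
Margin-based selective labeling can be sketched in a few lines. The rule below is the generic margin criterion from active learning (query a label when the top two class probabilities are close); the threshold value and function names are assumptions for illustration and are not taken from the paper.

```python
def margin(probs):
    """Difference between the two largest class probabilities."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def should_query(probs, threshold=0.2):
    """Request a label only when the classifier is uncertain."""
    return margin(probs) < threshold


def select_for_labeling(stream, threshold=0.2):
    """Return indices of streaming samples whose labels would be queried."""
    return [i for i, probs in enumerate(stream) if should_query(probs, threshold)]


if __name__ == "__main__":
    # Hypothetical predicted probabilities for (normal, failure).
    stream = [[0.97, 0.03], [0.55, 0.45], [0.10, 0.90], [0.48, 0.52]]
    print(select_for_labeling(stream))  # only the ambiguous samples
```

Only the queried samples would then feed the online update, which is how a small labeling budget can still follow drift near the decision boundary.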


[97] 2606.30356

OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL

We propose Online Latent prediction with Invariant Views and rEconstruction (OLIVE), a self-supervised speech representation learning framework that jointly optimizes analysis and synthesis objectives. OLIVE combines view-augmented masked latent prediction with waveform reconstruction under a unified objective. Reconstruction constrains early encoder features to retain signal-level information, while masked latent prediction shapes later contextual representations toward invariance for robust downstream performance. We show that these objectives enable representations that support a broad range of tasks. In particular, OLIVE improves results on generation and speaker tasks, maintains competitive performance on recognition and semantic tasks, and improves waveform reconstruction.


[98] 2606.30476

PS-MOT: Cultivating Instance Awareness from Point Seeds for Multi-Object Tracking

We introduce Point-supervised Multi-Object Tracking (PS-MOT) as a cost-effective alternative to traditional bounding box supervision, shifting the focus from spatial fitting to topological center-driven representation. However, PS-MOT faces challenges, e.g., spatial ambiguity and identity drift due to the lack of explicit geometric structure and scale constraints. To address these, we propose PS-Track, a hierarchical pipeline transitioning from points to instances across data, model, and loss levels. At the data level, we introduce Temporal-Feedback Prompting (TFP) to evolve points into temporally consistent pseudo-labels using negative spatial cues and motion priors. At the model level, we design the Point-Excited Wavelet Attention (PEWA) module, which leverages semantic correlations to activate high-frequency components, ``hallucinating'' object boundaries. At the loss level, Uncertainty-Guided Gaussian Learning (UGL) models pseudo-labels as probabilistic distributions, dynamically calibrating supervision intensity. Experiments on DanceTrack, EmboTrack, SportsMOT, and JRDB demonstrate that PS-Track provides a feasible and effective point-supervised alternative across diverse tracking scenarios, establishing a new state-of-the-art for point-supervised tracking. The source code is available at this https URL.


[99] 2606.30581

Realtime Wind Estimation using Low Cost Quadrotor Uncrewed Aerial Vehicles

In environmental monitoring as well as emergency response applications such as wildfires, wind velocity measurement is essential. Quadrotor UAVs have become popular platforms for wind velocity estimation due to their maneuverability, compact size, and cost-effectiveness. Numerous studies use the Extended Kalman Filter (EKF) to estimate the wind velocity based on the quadrotor dynamic model. However, most of them use hovering quadrotors only for wind estimation, while others use a near-linear trajectory to estimate near-constant velocities. Furthermore, EKF performance is constrained by its reliance on linearized approximations of the nonlinear quadrotor dynamics around current states, limiting accuracy in highly nonlinear scenarios, including windy conditions. This study proposes the use of an Unscented Kalman Filter (UKF), a nonlinear estimator, to provide accurate wind estimations while maintaining the trajectory of the quadrotor UAV. The quadrotor is modeled on the Special Euclidean group SE(3) and the approach is evaluated through numerical simulations using a geometric controller to maintain quadrotor flight paths. The results indicate that as the nonlinearity of the simulation increases, the UKF consistently outperforms the EKF. This demonstrates the potential of the UKF as a reliable estimator for highly nonlinear scenarios, capable of maintaining the trajectory with minimal deviation while providing accurate wind velocity estimations.
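
The difference between linearization and the unscented transform can be seen in a scalar toy problem. The sketch below is not the paper's SE(3) filter; it propagates a Gaussian through f(x) = x^2 using the basic three-point unscented transform (with the assumed scaling kappa = 2) and compares the result to the first-order estimate f(mean) that an EKF-style propagation would use. The exact mean of x^2 for a Gaussian is mean^2 + variance.

```python
import math


def unscented_mean_1d(f, mean, var, kappa=2.0):
    """Mean of f(x) for scalar x ~ N(mean, var) via the unscented transform."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    sigma_points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    return sum(w * f(x) for w, x in zip(weights, sigma_points))


def linearized_mean(f, mean):
    """First-order (EKF-style) propagated mean: evaluate f at the mean."""
    return f(mean)


def square(x):
    return x * x


if __name__ == "__main__":
    mean, var = 1.0, 0.25
    print("exact     :", mean ** 2 + var)
    print("unscented :", unscented_mean_1d(square, mean, var))
    print("linearized:", linearized_mean(square, mean))
```

For this quadratic map the unscented estimate matches the exact mean while the linearized one misses the variance term, which mirrors the abstract's point that the gap grows with nonlinearity.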


[100] 2606.30595

Wireless Backdoor Attack and Defense for Semantic Communications over Multiple Access Channel

Semantic communication (SemCom) aims to preserve semantic meaning and task-oriented information beyond conventional message recovery over wireless channels. The adoption of SemCom in shared-access wireless networks introduces new vulnerabilities for multi-user semantic inference. This paper considers a SemCom system for two transmitters communicating with a common receiver over a multiple access channel. Each transmitter maps source information into latent semantic representations, while the receiver jointly reconstructs and classifies the semantic information for both transmitters. A selective over-the-air backdoor (Trojan) attack is presented in which an adversary transmits a low-power trigger waveform over the air and injects it into the shared received signal during training. By transmitting the trigger again during testing, this stealthy, low-power attack selectively manipulates the semantic inference for one transmitter while minimally affecting the inference of the other transmitter. To mitigate this vulnerability, a trigger-aware defense mechanism is developed to preserve correct semantic labels under trigger-contaminated wireless observations. The results demonstrate both the vulnerability of shared-access SemCom systems to selective over-the-air backdoor attacks and the effectiveness of trigger-aware robust training for semantic protection.


[101] 2606.30623

When and Which Sensor to Observe? Timely Tracking of a Joint Markov Source

We investigate the problem of remote estimation (at a monitor) of a discrete-time joint Markov process with individual components which can be observed with dedicated sensors. At a given time slot, the monitor has the option of staying idle or sending a pull request to one of the sensors to obtain a partial state value, while the sensors are assumed to have heterogeneous sampling costs. Our goal is to develop a monitor pull policy, i.e., determining when and towards which sensor to send a pull request, in order to minimize a weighted sum of average age of incorrect information (AoII), or in short age, and sampling costs. As the communication model, we assume an erasure channel with a fixed one-slot delay from each sensor to the monitor. In this setting, the monitor does not perfectly know either the state of the process or the age, at any given time. We first obtain a sufficient statistic, namely belief, representing the joint distribution of the age and the current state of the observed process, by using the history of all pull requests and observations. Then, we formulate the optimization problem as a continuous state-space Markov decision process (MDP), namely belief-MDP, for the solution of which we propose two model predictive control (MPC) methods, namely MPC without terminal costs (MPC-WTC), and reinforcement learning MPC (RL-MPC). The effectiveness of the proposed methods is validated by numerical examples.
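
As a small illustration of the age metric, the sketch below computes a linear age of incorrect information trace under one common discrete-time convention: the age resets to zero whenever the monitor's estimate matches the source and grows by one per slot otherwise. The sequences are made-up examples, and the paper's belief-based formulation (where the monitor does not observe the age directly) is not modeled here.

```python
def aoii_trace(source, estimate):
    """Per-slot age of incorrect information for two aligned sequences."""
    age, trace = 0, []
    for x, x_hat in zip(source, estimate):
        age = 0 if x == x_hat else age + 1
        trace.append(age)
    return trace


def average_cost(source, estimate, pulls, pull_cost=1.0, weight=1.0):
    """Weighted sum of average age and average sampling cost."""
    trace = aoii_trace(source, estimate)
    slots = len(trace)
    return weight * sum(trace) / slots + pull_cost * sum(pulls) / slots


if __name__ == "__main__":
    source = [0, 1, 1, 0, 0]    # hypothetical source states
    estimate = [0, 0, 0, 0, 1]  # hypothetical monitor estimates
    pulls = [0, 0, 1, 0, 0]     # 1 where a pull request was sent
    print(aoii_trace(source, estimate))
    print(average_cost(source, estimate, pulls))
```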


[102] 2606.30645

VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes

Perception-based humanoid loco-manipulation requires connecting egocentric observations and task instructions to whole-body motion. Learning this mapping requires synchronized egocentric images, language commands, and robot-compatible kinematic trajectories, yet no existing data source provides this complete tuple at scale. We address this bottleneck by generating vision-language-kinematics (VLK) supervision synthetically in reconstructed scenes. Our pipeline leverages 3D Gaussian Splatting to reconstruct metric-scale indoor environments, synthesizes navigation and object-interaction trajectories using privileged scene information, and renders paired egocentric observations after the fact. We produce 48,000 paired trajectories with no human intervention and train a VLK policy that predicts short-horizon whole-body kinematic trajectories. A whole-body tracker converts these predictions into actions on the physical humanoid. We evaluate on the physical Unitree G1 performing navigation and single-object transport, demonstrating that synthesized interactions in reconstructed scenes provide effective supervision for sim-to-real perception-based humanoid loco-manipulation. Project Website: this https URL


[103] 2405.16356

A Prudent Framework for Understanding Risk-Awareness in Demand Response

We show that risk-aware behaviors in demand response originate from superquadratic state-dependent cost functions and price uncertainty with skewed distributions. We obtain such results through developing a novel theoretical demand response framework that combines non-anticipatory multi-stage decision-making with superquadratic cost functions. We introduce the concept of prudent demand, defined by a positive third-order derivative of the cost function, which is the first principle for risk-averse behavior despite a risk-neutral objective. Our analysis establishes that future price uncertainty affects immediate consumption decisions, and the extent of this response scales proportionally with the skewness of the price distribution. We visualize our theoretical findings through numerical simulations and illustrate their practical implications using a real-world case study.


[104] 2501.08210

Inner-Loop-Free Total-Variation-Constrained Full-Waveform Inversion

This paper proposes a computationally efficient algorithm to address the Full-Waveform Inversion (FWI) problem with a Total Variation (TV) constraint, designed to accurately reconstruct subsurface properties from seismic signals. FWI, as an ill-posed inverse problem, requires effective regularizations or constraints to ensure accurate and stable solutions. Among these, the TV constraint is widely known as a powerful prior for modeling the piecewise smooth structure of subsurface properties. However, solving the optimization problem is challenging because of the nonlinear observation process combined with the non-smoothness of the TV constraint. Conventional methods rely on inner loops and/or approximations, which lead to high computational cost and/or inappropriate solutions. To address these limitations, we develop a novel algorithm based on a primal-dual splitting method, achieving computational efficiency by eliminating inner loops and ensuring high accuracy by avoiding approximations. We also demonstrate the effectiveness of the proposed method through experiments using the SEG/EAGE Salt Models. The source code will be available at this https URL.


[105] 2504.05184

MSA-UNet3+: Multi-Scale Attention UNet3+ with New Supervised Prototypical Contrastive Loss for Coronary DSA Image Segmentation

Accurate segmentation of coronary Digital Subtraction Angiography (DSA) images is essential for diagnosing and treating coronary artery disease (CAD). Despite advances in deep learning, challenges such as high intra-class variance and class imbalance limit precise vessel delineation. Existing approaches for coronary DSA segmentation cannot effectively address these issues. Furthermore, existing segmentation network encoders do not directly generate semantic embeddings, which could enable the decoder to reconstruct segmentation masks more effectively. We propose a Supervised Prototypical Contrastive Loss (SPCL) that combines supervised and prototypical contrastive learning to enhance coronary DSA image segmentation. The supervised contrastive loss enforces semantic embeddings in the encoder, improving feature differentiation. The prototypical contrastive loss enables the model to focus on the foreground class while alleviating high intra-class variance and class imbalance by concentrating only on hard-to-classify background samples. We implement the proposed SPCL within MSA-UNet3+, a Multi-Scale Attention-Enhanced UNet3+ architecture. The architecture integrates a Multi-Scale Attention Encoder (M-encoder), a Multi-Scale Dilated Bottleneck (MSD-Bottleneck) for multi-scale feature extraction, and a Contextual Attention Fusion Module (CAFM) to preserve fine-grained details while improving contextual understanding. Experiments on a private coronary DSA dataset demonstrate that MSA-UNet3+ outperforms state-of-the-art methods, achieving the highest Dice coefficient and F1-score while significantly reducing ASD and ACD. The framework provides precise vessel segmentation for accurate identification of coronary stenosis and supports informed diagnostic and therapeutic decisions. The code will be released at this https URL.


[106] 2506.03925

SVD-Based Graph Fractional Fourier Transform on Directed Graphs and Its Application

Real-world signals frequently reside on directed Cartesian product graphs, including digital images, sensor networks, and meteorological temperature records. Designing a transform method suitable for processing such multi-dimensional graph signals within the fractional Fourier transform domain remains a critical challenge in graph signal processing (GSP). This paper proposes two novel graph fractional Fourier transforms (GFRFTs) for multi-dimensional signals defined on such directed product graphs and comprehensively investigates their denoising capabilities. Our contributions are fourfold: (1) We propose two distinct two-dimensional GFRFTs based on singular value decompositions of some fractional Laplacian matrices; (2) We generalize these transforms to multi-dimensional graph fractional Fourier transforms (MGFRFTs), establishing a powerful fractional domain analysis framework for multi-dimensional GSP; (3) We investigate the signal reconstruction capability of our proposed GFRFTs, as well as their computational complexity; and (4) We validate the practical utility of our approach through denoising experiments on real-world meteorological temperature datasets.


[107] 2506.08319

Differentiable Physics-Informed Adaptive Koopman Control for Stable Flight under Unknown Disturbances

Uncertainties and disturbances in robotic systems, such as aerodynamic forces, are fundamentally outcomes of physical interactions with the environment, manifesting as learnable spatiotemporal sequences rather than random noise. However, achieving high-precision control for robotic systems operating in unstructured environments is often hindered by complex unmodeled dynamics and external disturbances. While learning-based methods offer powerful approximation capabilities, they typically suffer from heavy reliance on offline training and lack theoretical guarantees. Conversely, traditional robust control strategies are predominantly reactive, limited to instantaneous estimation without the foresight to anticipate future disturbance trends. To bridge this gap, this paper proposes a differentiable data-enabled Koopman control framework termed DEKC. Unlike black-box approaches, DEKC adopts a hybrid modeling strategy that retains the nominal physics model while employing a deep neural network to parameterize the lifting function of the Koopman operator for unknown residual dynamics. Crucially, the framework formulates disturbances as a dynamical system, learning their temporal evolution in a global linear space. This enables the prediction of future disturbance trajectories, which are explicitly integrated into the controller for preemptive compensation. Furthermore, an online backward gradient update mechanism is introduced to ensure real-time adaptation to time-varying uncertainties. Numerical simulations on a tethered space robot demonstrate the efficacy of the proposed DEKC in mitigating highly coupled uncertainties. Complementing these results, real-world experiments on a quadrotor substantiate its superiority in tracking agile trajectories under uncertainties induced by aerodynamics and a suspended payload.


[108] 2506.12997

MORIC: CSI Delay-Doppler Decomposition for Robust Wi-Fi-based Human Activity Recognition

The newly established IEEE 802.11bf Task Group aims to amend the WLAN standard to support advanced sensing applications such as human activity recognition (HAR). Although studies have demonstrated the potential of sub-7 GHz Wi-Fi Channel State Information (CSI) for HAR, existing methods often degrade substantially under realistic variations across users, environments, and sensing configurations. This work addresses the poor generalization of Wi-Fi-based HAR by extracting motion-centered representations that reduce dependence on static, environment-specific, and non-activity-related CSI magnitude and phase patterns. CSI signals are transformed into the delay-profile space and decomposed into multiple Doppler velocity projections, which are modeled as observations of a moving point's velocity from different unknown directions, analogous to virtual cameras observing the same motion with varying degrees of clarity. This yields a richer activity representation than either a single aggregated Doppler estimate or the spurious, environment-dependent CSI patterns used in prior works. Since these projections are unordered and may recur due to random multipath propagation, we introduce MORIC, a novel order- and repetition-invariant time-series classification model for robust Wi-Fi-based HAR. Experimental results on the collected dataset show that the proposed method outperforms state-of-the-art approaches in cross-user hand motion recognition, especially for challenging gestures. Incorporating only a few calibration samples further improves accuracy, demonstrating MORIC's adaptability and highlighting the potential of the proposed methodology for practical Wi-Fi sensing in real-world scenarios.


[109] 2507.06764

Fast Equivariant Imaging: Accelerating Unsupervised Learning and Model Adaptation via Inexact Splitting

In this work, we propose Fast Equivariant Imaging (FEI), a novel unsupervised learning framework to rapidly and efficiently train deep imaging networks without ground-truth data. FEI reformulates the EI objective through an inexact variable-splitting scheme, decoupling network training from an auxiliary restoration step implemented with a plug-and-play denoiser. This novel unsupervised scheme shows superior efficiency and performance compared to the standard Equivariant Imaging paradigm. In particular, our FEI schemes achieve an order-of-magnitude (10x) acceleration over standard EI on training U-Net for X-ray CT reconstruction and image inpainting, with improved generalization performance. Beyond offline training, the proposed scheme also enables efficient test-time adaptation of a pretrained model to individual samples, to secure further performance improvements. Extensive experiments show that the proposed approach provides a noticeable efficiency and performance gain over existing unsupervised methods and model adaptation techniques.


[110] 2509.07201

Design of Input-Output Observers for a Population of Systems with Bounded Frequency-Domain Variation using $DK$-iteration

This paper proposes a linear input-output observer design methodology for a population of systems in which each observer uses knowledge of the linear time-invariant dynamics of the particular device. Observers are typically composed of a known model of the system and a correction mechanism to produce an estimate of the state. The proposed design procedure characterizes the variation within the population in the frequency domain and synthesizes a single robust correction filter. The correction filter is compatible with all system models that satisfy the variation characterization such that a given level of estimation performance is guaranteed. This is accomplished by posing a robust performance problem using the observer error dynamics and solving it using $DK$-iteration. The design procedure is experimentally demonstrated on a flexible joint robotic manipulator with varied joint stiffnesses. It is shown that the proposed method that uses a single correction filter achieves comparable estimation performance to a method that uses a correction gain tailored toward each joint stiffness configuration.


[111] 2509.15001

BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings

Child-centered daylong recordings are essential for studying early language development, but existing speech models trained on clean adult data perform poorly due to acoustic and linguistic differences. We introduce BabyHuBERT, a self-supervised speech model trained on 13,000 hours of multilingual child-centered recordings from 40+ languages. Evaluated on voice type classification, the task of identifying who produces speech and when in child-centered recordings (key child, other children, male, and female adults), BabyHuBERT-VTC achieves F1-scores from 55.0% to 76.1% across six corpora, consistently outperforming W2V2-LL4300 and HuBERT (pretrained on English daylongs and clean adult speech, respectively). Notable gains include 14.0 and 18.3 absolute F1 points over HuBERT on Vanuatu and Solomon Islands, demonstrating effectiveness on underrepresented languages. We share code and models to support researchers working with child-centered recordings across diverse linguistic contexts.


[112] 2510.22790

Robust Safety Filter Synthesis for Quaternion Attitude Dynamics via LMI-Based Ellipsoidal Invariant Sets

We present a safety filter to guarantee constraint satisfaction on the rotation angle in the presence of disturbances. An LMI-based framework simultaneously synthesizes a maximal ellipsoidal robust controlled invariant (RCI) set and its associated state-feedback backup control law by solving a single convex semidefinite program, subject to state and input constraints. To extend this framework to nonlinear quaternion attitude dynamics, we derive exact closed-form sector bounds on the quaternion kinematic nonlinearity and analytically embed them into the LMI via the S-procedure. A smooth mixing law intervenes only as the state approaches the RCI boundary, preserving nominal performance during safe operation. This work is motivated by hierarchical aerial control architectures, where outer-loop commands can generate attitude references that drive the inner-loop attitude state unstable, a cascade failure mode that endangers the entire system. Quadrotor simulations with hierarchical controller structures under bounded disturbances confirm constraint satisfaction across three scenarios specifically designed to stress-test the cascade failure mode: set-point tracking with small initial errors, set-point tracking with large initial position errors that saturate the outer loop, and high-frequency circular trajectory following that persistently excites the inner-loop attitude dynamics.


[113] 2511.08370

Power Hardware-in-the-loop Interfacing via $\mathcal{H}_\infty$ Model Matching

This paper presents an $\mathcal{H}_\infty$ model matching control-based approach to the problem of power hardware-in-the-loop (PHIL) interfacing. The objective is to interconnect a grid simulation and a physical device via an interface in a way that is stable and accurate. Conventional approaches include the ideal transformer method (ITM) and its impedance-based variants, which trade accuracy for stability, as well as some $\mathcal{H}_\infty$ control-based approaches, which do not make use of all the available information in their optimization for accuracy. Designing for transparency, as opposed to accuracy as existing approaches do, would achieve both accuracy and stability, while making use of all the dynamical information present in the idealized interconnection of the grid and device. The approach proposed in this paper employs model matching to formulate the PHIL problem as an $\mathcal{H}_\infty$ control problem using transparency as the explicit frequency-domain control objective. The approach is experimentally validated in a real-time resistive-load PHIL setup, and is found to achieve accuracy levels that are comparable or superior to those of an ITM-based interface.


[114] 2511.12308

ISAC with Affine Frequency Division Multiplexing: An FMCW-Based Signal Processing Perspective

This paper investigates the sensing potential of affine frequency division multiplexing (AFDM) in high-mobility integrated sensing and communication (ISAC) from the perspective of radar waveforms. We introduce an innovative parameter selection criterion that establishes a precise mathematical equivalence between AFDM subcarriers and Nyquist-sampled frequency-modulated continuous-wave (FMCW). This connection not only provides a clear physical insight into AFDM's sensing mechanism but also enables a direct mapping from the DAFT index to delay-Doppler (DD) parameters of wireless channels. Building on this, we develop a novel input-output model in a DD-parameterized DAFT (DD-DAFT) domain for AFDM, which explicitly reveals the inherent DD coupling effect arising from the chirp-channel interaction. Subsequently, we design two matched-filtering sensing algorithms. The first is performed in the time-frequency domain with low complexity, while the second is operated in the DD-DAFT domain to precisely resolve the DD coupling. Simulations show that our algorithms achieve effective pilot-free sensing and demonstrate a fundamental trade-off between sensing performance, communication overhead, and computational complexity. The proposed AFDM outperforms classical AFDM and other variants in most scenarios.


[115] 2511.18267

Laboratory and field testing of a residential heat pump retrofit for a DC solar nanogrid

Residential buildings are increasingly integrating large devices that run natively on direct current (DC), such as solar photovoltaics, electric vehicles, stationary batteries, and DC motors that drive heat pumps and other major appliances. Today, these natively-DC devices typically connect within buildings through alternating current (AC) distribution systems, entailing significant energy losses due to conversions between AC and DC. This paper investigates the alternative of connecting DC devices through DC distribution. Specifically, this paper shows through laboratory and field experiments that an off-the-shelf residential heat pump designed for conventional AC systems can be powered directly on DC with few hardware modifications and little change in performance. Supporting simulations of a DC nanogrid including historical heat pump and rest-of-house load measurements, a solar photovoltaic array, and a stationary battery suggest that connecting these devices through DC distribution could decrease annual electricity bills by 12.5% with an after-market AC-to-DC heat pump retrofit and by 16.7% with a heat pump designed to run on DC. The associated savings in gross nanogrid energy are 8% and 9.2%, respectively.


[116] 2511.21274

Multiport Analytical Pixel Electromagnetic Simulator (MAPES) for AI-assisted RFIC and Microwave Circuit Design

This paper proposes a novel analytical framework, denoted the Multiport Analytical Pixel Electromagnetic Simulator (MAPES). MAPES enables efficient and accurate prediction of the electromagnetic (EM) performance of arbitrary pixel-based microwave (MW) and RFIC structures. Unlike the Internal Multiport Method (IMPM), which optimizes only connecting elements within a fixed, gap-separated pixel skeleton, MAPES operates directly on the all-pixel presence/absence formulation used in recent MW/RFIC design. This is enabled by diagonal virtual pixels, an occupancy-to-load mapping, and a multi-layer/via port-level formulation that have no counterpart in IMPM. By introducing virtual pixels and diagonal virtual pixels and inserting virtual ports at critical positions, MAPES captures all horizontal, vertical, and diagonal electromagnetic couplings within a single multiport impedance matrix. Only a small set of full-wave simulations (typically about 1% of the datasets required by AI-assisted EM emulators) is needed to construct this matrix. Subsequently, any arbitrary pixel configuration can be evaluated analytically using a closed-form multiport relation without additional full-wave calculations. The proposed approach eliminates data-driven overfitting and ensures accurate results across all design variations. Using MAPES, comprehensive examples for single- and double-layer PCBs and CMOS processes (180 nm and 65 nm) confirm that high prediction accuracy with 600-2000$\times$ speed improvement is achieved compared to CST simulations. Owing to its efficiency, scalability, and reliability, MAPES provides a practical and versatile tool for AI-assisted MW circuit and RFIC design across diverse fabrication technologies.


[117] 2512.08444

Learned iterative networks: An operator learning perspective

Learned image reconstruction has become a pillar in computational imaging and inverse problems. Among the most successful approaches are learned iterative networks, which are formulated by unrolling classical iterative optimisation algorithms for solving variational problems. While the underlying algorithm is usually formulated in the functional analytic setting, learned approaches are often viewed as purely discrete. In this survey we present a unified operator view for learned iterative networks. Specifically, we formulate a learned reconstruction operator, which defines how to compute, and, separately, the learning problem, which defines what to compute. In this setting we present common approaches and show that many of them are closely related at their core. We review linear as well as non-linear inverse problems in this framework and conclude with a short numerical study.


[118] 2512.14175

KalMRACO: Unifying Kalman Filtering and Model Reference Adaptive Control for Robust Control and Estimation

A common assumption when applying the Kalman filter is a priori knowledge of the system parameters. These parameters are not necessarily known, and this may limit the real-world applicability of the Kalman filter. The well-established Model Reference Adaptive Controller (MRAC) utilizes a known reference model and ensures that the input-output behavior of a potentially unknown system converges to that of the reference model. We present KalMRACO, a unification of Kalman filtering and MRAC leveraging the reference model of MRAC as the Kalman filter system model, thus eliminating, to a large degree, the need for knowledge of the underlying system parameters in the application of the Kalman filter. We also introduce the concept of blending estimated states and measurements in the feedback law to ensure stability during the initial transient. KalMRACO is validated through simulations and lab trials on an underwater vehicle. Results show superior tracking of the reference model state, observer state convergence, and noise mitigation properties.


[119] 2601.03527

Linear computation of XPM and BER in Long-Haul Optical Systems

Cross-Phase Modulation (XPM), a critical nonlinear effect in long-haul optical communication systems utilizing Wavelength Division Multiplexing (WDM), is significantly influenced by intensity fluctuations (IFs) originating from the transmitted signal and altered by chromatic dispersion. A linear model is employed to characterize the growth of intensity fluctuations along the transmission path, demonstrating that these fluctuations are sufficient to predict the spectral characteristics of XPM on an adjacent channel. A direct correlation between frequency-domain IF growth and XPM-induced phase distortions is established and analyzed. Furthermore, the impact of XPM on the bit error ratio (BER) is shown to be analytically predictable. These analytical predictions align closely with results obtained from full nonlinear simulations. Results reveal that the evolution of IFs, especially at lower frequencies, has a pronounced effect on the XPM phase fluctuation spectra and overall phase variance. Validation through simulation confirms the model's accuracy in predicting XPM-induced phase fluctuation spectra and variance under various system configurations. These findings highlight the necessity of accounting for frequency-domain IF evolution during signal propagation in order to accurately model XPM-induced impairments, offering valuable guidance for the optimization and design of advanced optical communication systems.


[120] 2602.21567

Diagnosis-Driven Co-planning of Network Reinforcement and BESS for Distribution Grid with High Penetration of Electric Vehicles

While the rapid proliferation of electric vehicles (EVs) accelerates net-zero goals, uncoordinated charging activities impose severe operational challenges on distribution grids, including exacerbated peak loads, thermal overloading, and voltage violations. To overcome the computational intractability of jointly optimizing grid infrastructure reinforcements and Battery Energy Storage System (BESS) installations, this paper proposes a novel three-stage Diagnosis-Driven Co-Planning (DDCP) framework. The methodology integrates a Violation Detection and Quantification (VDQ) model to systematically identify system breaches, and a Violation Mitigation-Based Planning (VMBP) model for optimal BESS allocation. Specifically, Stage I of the DDCP framework diagnoses critical bottleneck lines that render standalone BESS solutions infeasible; Stage II executes targeted physical upgrades exclusively on these bottlenecks; and Stage III finalizes the optimal BESS deployment on the updated network topology. Furthermore, this study quantifies the EV hosting capacity thresholds before and after BESS integration across varying EV adoption rates and base voltages. Finally, a comprehensive comparative analysis evaluates four mitigation approaches: the VDQ-driven cable upgrade (VCU) model, the VMBP model, system-wide voltage uprating, and the proposed DDCP framework. The results demonstrate that the DDCP framework not only resolves the complex joint-optimization hurdle but also achieves superior techno-economic performance in addressing high-EV-penetration challenges.


[121] 2602.23171

Align-Consistency: Improving Non-autoregressive and Semi-supervised ASR with Consistency Regularization

Consistency regularization (CR) improves the robustness and accuracy of Connectionist Temporal Classification (CTC) by ensuring predictions remain stable across input perturbations. In this work, we propose Align-Consistency, an extension of CR designed for Align-Refine -- a non-autoregressive (non-AR) model that performs iterative refinement of frame-level hypotheses. This method leverages the speed of parallel inference while significantly boosting recognition performance. The effectiveness of Align-Consistency is demonstrated in two settings. First, in the fully supervised setting, our results indicate that applying CR to both the base CTC model and the subsequent refinement steps is critical, and the accuracy improvements from non-AR decoding and CR are mutually additive. Second, for semi-supervised ASR, we employ fast non-AR decoding to generate online pseudo-labels on unlabeled data, which are used to further refine the supervised model and lead to substantial gains.


[122] 2603.05693

Longitudinal Lesion Inpainting in Brain MRI via 3D Region Aware Diffusion

Accurate longitudinal analysis of brain MRI is often hindered by evolving lesions, which bias automated neuroimaging pipelines. While deep generative models have shown promise in inpainting these lesions, most existing methods operate cross-sectionally or lack 3D anatomical continuity. We present a novel pseudo-3D longitudinal inpainting framework based on Denoising Diffusion Probabilistic Models (DDPM). Our approach utilizes multi-channel conditioning to incorporate longitudinal context from distinct visits (t_1, t_2) and extends Region-Aware Diffusion (RAD) to the medical domain, focusing the generative process on pathological regions without altering surrounding healthy tissue. We evaluated our model against state-of-the-art baselines on longitudinal brain MRI from 93 patients. Our model significantly outperforms the leading baseline (FastSurfer-LIT) in terms of perceptual fidelity, reducing the Learned Perceptual Image Patch Similarity (LPIPS) distance from 0.07 to 0.03 while effectively eliminating inter-slice discontinuities. Furthermore, our model demonstrates high longitudinal stability with a Temporal Fidelity Index of 1.024, closely approaching the ideal value of 1.0 and substantially narrowing the gap compared to LIT's TFI of 1.22. Notably, the RAD mechanism provides a substantial gain in efficiency; our framework achieves an average processing time of 2.53 min per volume, representing approximately 10x speedup over the 24.30 min required by LIT. By leveraging longitudinal priors and region-specific denoising, our framework provides a highly reliable and efficient preprocessing step for the study of progressive neurodegenerative diseases. A derivative dataset consisting of 93 pre-processed scans used for testing will be available upon request after acceptance. Code will be released upon acceptance.


[123] 2603.25645

Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos

Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic annotations required to evaluate modern Multimodal Large Language Models (MLLMs). To address this critical gap, we introduce Colon-Bench, generated via a novel multi-stage agentic workflow. Our pipeline seamlessly integrates temporal proposals, bounding-box tracking, AI-driven visual confirmation, and human-in-the-loop review to scalably annotate full-procedure videos. The resulting verified benchmark is unprecedented in scope, encompassing 528 videos, 14 distinct lesion categories (including polyps, ulcers, and bleeding), over 300,000 bounding boxes, 213,000 segmentation masks, and 133,000 words of clinical descriptions. We utilize Colon-Bench to rigorously evaluate state-of-the-art MLLMs across lesion classification, Open-Vocabulary Video Object Segmentation (OV-VOS), and video Visual Question Answering (VQA). The MLLM results demonstrate surprisingly high localization performance in medical domains compared to SAM-3. Finally, we analyze common VQA errors from MLLMs to introduce a novel "colon-skill" prompting strategy, improving zero-shot MLLM performance by up to 9.7% across most MLLMs. The dataset and the code are available at this https URL.


[124] 2603.29792

Where to Put Safety? Control Barrier Function Placement in Networked Control Systems

Control barrier functions (CBFs) are widely used to enforce safety in autonomous systems, yet their placement within networked control architectures remains largely unexplored. In this work, we investigate where to enforce safety in a networked control system in which a remote model predictive controller (MPC) communicates with the plant over a delayed network. We compare two safety strategies: i) a local myopic CBF filter applied at the plant and ii) predictive CBF constraints embedded in the remote MPC. For both architectures, we derive state-dependent disturbance tolerance bounds and show that safety placement induces a fundamental trade-off: local CBFs provide higher disturbance tolerance due to access to fresh state measurements, whereas MPC-CBF enables improved performance through anticipatory behavior, but yields stricter admissible disturbance levels. Motivated by this insight, we propose a combined architecture that integrates predictive and local safety mechanisms. The theoretical findings are illustrated in simulations on a planar three-degree-of-freedom robot performing a collision-avoidance task.
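To make the "local myopic CBF filter" idea concrete, here is a minimal sketch on a textbook single integrator, not the networked MPC setup or the robot model of the paper. The dynamics dx/dt = u, the barrier h(x) = x_max - x, the gain alpha, and all function names are assumptions chosen for illustration. For this scalar case the usual CBF quadratic program has a closed-form solution, which is what the code uses.

```python
def cbf_filter(x, u_nom, x_max=1.0, alpha=2.0):
    """Clip a nominal input so that h(x) = x_max - x satisfies dh/dt >= -alpha * h.

    For the single integrator dx/dt = u we have dh/dt = -u, so the CBF
    condition reduces to u <= alpha * (x_max - x). The minimally invasive
    safe input is the nominal input clipped to that bound.
    """
    return min(u_nom, alpha * (x_max - x))


def simulate(x0=0.0, u_nom=1.5, dt=0.01, steps=500):
    """Forward-Euler rollout of the filtered single integrator (hypothetical values)."""
    x, trajectory = x0, [x0]
    for _ in range(steps):
        x += dt * cbf_filter(x, u_nom)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate()
    print(f"final state: {traj[-1]:.4f}, max state: {max(traj):.4f}")
```

The filter leaves the nominal command untouched far from the boundary and only intervenes as x approaches x_max. With forward Euler, the discrete rollout stays inside the safe set as long as alpha * dt <= 1.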


[125] 2604.12156

Impact of Position Uncertainty on the Secrecy Performance of Pinching Antenna Systems

This paper investigates the secrecy performance of pinching-antenna systems (PAS) under practical pinching-position activation uncertainty. By dynamically selecting the radiation point along a dielectric waveguide, PAS enables low-cost spatial reconfigurability and enhanced secure transmission. Unlike existing studies that assume ideal activation control, we account for spatial inaccuracies caused by hardware limitations and environmental perturbations, which induce statistical dependence between the legitimate and eavesdropping channels. To characterize this uncertainty-induced dependence, PAS-specific marginal signal-to-noise ratio (SNR) distributions are derived, and a Gaussian copula is employed as a tractable representation of the resulting joint SNR distribution, enabling the derivation of approximate expressions for the secrecy outage probability. Simulation results validate the theoretical findings and demonstrate that PAS retains robust secrecy performance compared with conventional fixed-antenna systems, even in the presence of activation uncertainty.
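The Gaussian-copula construction can be sketched with a small Monte Carlo estimate of the secrecy outage probability. This is an illustration under stated assumptions, not the paper's derivation: the exponential marginals stand in for the PAS-specific SNR distributions (which the abstract does not give), and the correlation rho, the mean SNRs, the target rate, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_snr_pair(rho, mean_b, mean_e, rng):
    """Draw (SNR_B, SNR_E) with exponential marginals joined by a Gaussian copula."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Probability integral transform: correlated normals -> correlated uniforms.
    # The clamp guards against log(0) in the inverse CDF below.
    u1 = min(_STD_NORMAL.cdf(z1), 1.0 - 1e-12)
    u2 = min(_STD_NORMAL.cdf(z2), 1.0 - 1e-12)
    # Inverse CDF of the assumed exponential marginals.
    return -mean_b * math.log1p(-u1), -mean_e * math.log1p(-u2)


def secrecy_outage(rho, rate=1.0, mean_b=10.0, mean_e=2.0, n=50_000, seed=0):
    """Monte Carlo estimate of P(secrecy capacity < rate), in bits per channel use."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(n):
        snr_b, snr_e = sample_snr_pair(rho, mean_b, mean_e, rng)
        capacity = max(0.0, math.log2(1.0 + snr_b) - math.log2(1.0 + snr_e))
        outages += capacity < rate
    return outages / n


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(f"rho={rho:.1f}  outage={secrecy_outage(rho):.4f}")
```

For rho = 0 and exponential marginals the outage has the closed form 1 - exp(-(2^R - 1)/mean_b) / (1 + 2^R * mean_e / mean_b), which gives a quick sanity check on the sampler before other marginals are substituted.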


[126] 2604.23354

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the Explainable AI (XAI) domain. This paper proposes to study an XAI topic: analysing, visualising and understanding the unknown organisation of network representations, particularly those a speaker recognition network learns from utterances, for recognising speaker identity. Past studies have employed algorithms (e.g. K-means) to analyse the different ways in which network representations can be naturally grouped into clusters, i.e. to analyse different flat clustering phenomena within the space defined by those representations. In contrast, this work applies two algorithms -- Single-Linkage Clustering (SLINK) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) -- to analyse the different ways in which representations from the speaker recognition network can form clusters with hierarchical relationships, i.e. to analyse different hierarchical clustering phenomena within the representation space of the speaker recognition network. Furthermore, an algorithm called Hierarchical Cluster-Class Matching (HCCM) is designed to semantically interpret one of the above hierarchical clustering phenomena analysed using SLINK. Given the clusters representing this phenomenon, HCCM identifies which ones best match individual semantic classes related to gender and nationality (e.g. male, female, Ireland, UK) and AND-logic conjunctions of these classes (e.g. female and Ireland). The Liebig score metric is also proposed within HCCM to quantify the matching quality of each cluster-class pair and diagnose the factor that limits each match.


[127] 2605.05522

Tumor-aware augmentation with task-guided attention analysis improves rectal cancer segmentation from magnetic resonance images

Although self-supervised pretraining is expected to learn broadly transferable representations, its effectiveness across imaging modalities substantially different from the pretraining domain, and on complex tumor-segmentation tasks, remains understudied. Evaluating CT-pretrained transformers on MRI rectal cancer segmentation, we identified two interacting failure modes in CT-to-MRI transfer: (a) inefficient token usage caused by zero-padding to match pretrained input dimensions, and (b) ineffective feature adaptation. We investigated these vulnerabilities using two primary CT-pretrained hierarchical shifted-window transformer backbones, SMIT and Swin UNETR, together with VoCo as a large-scale-pretrained supporting benchmark; these models differ in pretraining objectives and datasets. Mechanistic analysis leveraged an attention dilution index (ADI), an entropy-based metric quantifying attention diverted toward uninformative padding tokens, and centered kernel alignment (CKA) to measure feature reuse during MRI adaptation. ADI increased with zero-padding, while high feature reuse did not necessarily translate to improved downstream accuracy. To mitigate these issues, we introduced two interventions: a tumor-aware augmentation strategy to expand tumor appearance heterogeneity coverage, and an anisotropic cropping strategy to restore token efficiency. Fine-tuning with these strategies on identical rectal MRI datasets yielded detection rates of 91.1% (225/247) and 88.7% (219/247) for the primary SMIT and Swin UNETR backbones, with the supporting VoCo benchmark reaching 90.3% (223/247), demonstrating significantly improved robustness under CT-to-MRI transfer. This study is among the first to examine when pretrained transformers fail to transfer across imaging modalities and demonstrates how targeted mitigation strategies can systematically overcome cross-modality transfer limitations.


[128] 2605.08157

Clinical Feasibility of Smartphone-based EEG in Kenya

Purpose: Access to electroencephalography (EEG) remains limited across low- and middle-income countries (LMICs) due to cost, infrastructure requirements, and a shortage of trained staff. This study evaluated the feasibility and clinical utility of a smartphone-based EEG system in a real-world setting. Methods: We conducted a multicenter observational study (November 2023 to April 2026) across 29 clinical sites in Kenya. A smartphone-based 27-lead EEG system enabled trained healthcare workers to acquire standardized recordings with remote expert interpretation. Results: 3,036 EEG sessions were performed. Male patients constituted 57.8% of the cohort, with representation across pediatric and adult populations. The most common referral indication was seizures or convulsions (68.5%). Overall, 2,915 (96%) recordings were interpretable, while 121 (4%) were uninterpretable, primarily due to high electrode impedance and insufficient recording duration. Uninterpretable recordings were significantly shorter than interpretable recordings (mean 18.5 vs. 33.8 minutes; median 15.1 vs. 31.6 minutes; p < 0.0001). Mean turnaround time for interpretation was 107 minutes. Among interpretable recordings, 917 (30.2%) were abnormal, including 701 (76.4%) with epileptiform abnormalities, 215 (23.4%) with non-epileptiform findings, and 1 (0.1%) indeterminate finding. Epileptiform abnormalities were highest in children aged 4-9 years (33.1%) and less frequent in adults (14-21%). Non-epileptiform abnormalities were more common in patients aged 60+ years (19.2%) compared to younger age groups (3-9%). Conclusion: Large-scale, point-of-care EEG acquisition by non-specialist operators in a resource-limited setting is feasible. Expansion of smartphone-based EEG systems may improve equitable access to neurological diagnosis and care in LMICs.


[129] 2605.11589

Unification of Signal Transform Theory

We unify the discrete Fourier transform (DFT), discrete cosine transform (DCT), Walsh-Hadamard, Haar wavelet, Karhunen-Loève transform (KLT), and several others along with their continuous counterparts (Fourier transform, Fourier series, spherical harmonics, fractional Fourier transform) under one representation-theoretic principle: each is the eigenbasis of every covariance invariant under a specific finite or compact group, with columns constructed from the irreducible matrix elements of the group via the Peter-Weyl theorem. The unification rests on the Algebraic Diversity (AD) framework, which identifies the matched group of a covariance as the foundational object of second-order signal processing. The data-dependent KLT emerges as the trivial-matched-group limit; classical transforms emerge as the cyclic, dihedral, elementary Abelian, iterated wreath, and hybrid wreath cases, with composition rules for direct, wreath, and semidirect products. We also mark the boundary of the construction: the structured points that correspond to no group are the eigenstructures of non-Schurian association schemes, lying just outside the matched-group catalog. A polynomial-time algorithm, the DAD-CAD relaxation cast as a double-commutator generalized eigenvalue problem, discovers the matched group of any empirical covariance without expert judgment, with noise-aware variants via the commutativity residual $\delta$ and algebraic coloring index $\alpha$. The fractional Fourier transform is treated as the metaplectic $SO(2)$ case, and a structural principle relates matched group size inversely to transform resolution. Modern applications (massive-MIMO, graph neural networks, transformer attention, 3D vision, brain connectivity, single-cell genomics, quantum informatics) are sketched with their matched groups.
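The cyclic case of this principle is the classical fact that the DFT basis diagonalizes every circulant (shift-invariant) covariance. The sketch below checks this numerically for one hypothetical 6-point covariance. The matrix entries and function names are assumptions for illustration, and the code does not implement the AD framework or the DAD-CAD algorithm.

```python
import cmath


def circulant(first_row):
    """Build the circulant matrix whose (i, j) entry is first_row[(j - i) mod n]."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def dft_vector(n, k):
    """k-th unit-norm DFT basis vector, i.e. a character of the cyclic group Z_n."""
    return [cmath.exp(2j * cmath.pi * k * m / n) / n ** 0.5 for m in range(n)]


def eigen_residual(matrix, vector):
    """Return (Rayleigh quotient lam, max |C v - lam v|) for a unit-norm candidate v."""
    n = len(vector)
    cv = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    lam = sum(v.conjugate() * w for v, w in zip(vector, cv))
    return lam, max(abs(w - lam * v) for v, w in zip(vector, cv))


if __name__ == "__main__":
    # Symmetric first row, so the matrix is a valid shift-invariant covariance.
    cov = circulant([4.0, 2.0, 1.0, 0.5, 1.0, 2.0])
    for k in range(6):
        lam, res = eigen_residual(cov, dft_vector(6, k))
        print(f"k={k}  eigenvalue={lam.real:.4f}  residual={res:.2e}")
```

Each residual is at floating-point level, so every DFT vector is an eigenvector regardless of the particular entries in the first row. Only the eigenvalues depend on the data.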


[130] 2605.18566

HJ-Gauss: A Monte-Carlo HJ Reachability Scheme

Backward reachable tubes (BRTs), computed via grid-based levelset methods for viscous Hamilton-Jacobi (HJ) PDEs, provide principled safety certificates for learned controllers and planning algorithms in control and learning-enabled systems. However, classical grid-based HJ solvers require an $O(M^n)$ memory footprint for $M$ grid points in each of the $n$ state dimensions. This renders them impractical for high-dimensional systems. We address this bottleneck with a local PDE linearization that enables a frozen-coefficient sampling scheme for the viscous HJ PDE: a generalized Cole-Hopf-type transformation reduces the nonlinear HJ equation to a sequence of linear heat equations, which admit Gaussian heat-kernel representations via the Feynman-Kac formula. The value function and its spatial gradient are then recovered via roll-outs of Monte Carlo expectations on Gaussian densities, yielding a storage-free and grid-free algorithm that scales as $N\cdot n$ for $N$ samples. This decoupling of memory from dimensionality enables reachability analysis on large-scale problems: safety analysis of the emergent behavior of European starlings (\textit{Sturnus vulgaris}), validated on the motion of $100{,}000$ simulated starlings -- modeled as 4D aerial Dubins vehicles. We prove a finite-sample concentration bound with $O(N^{-1/2})$ error and conditional linear convergence rates, and establish robustness properties for the introduced scheme. Numerical validation on pursuit-evasion games against the grid-based levelset method demonstrates relative $L^2_{\text{rel}}$ errors of $0.03 - 0.20$, with $14-26$ second wall-clock times per 2D slice on a CPU, with further validation on $n=45$-dimensional multi-agent 2D rocket games. Our numerical results demonstrate real scalability of HJ reachability safety verification on large-scale multi-agent systems.
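The mechanism can be illustrated on the classical special case that the scheme generalizes. For $u_t + \tfrac{1}{2}|\nabla u|^2 = \nu \Delta u$, the Cole-Hopf substitution $\phi = \exp(-u/(2\nu))$ gives the heat equation $\phi_t = \nu \Delta \phi$, so $u(x,t) = -2\nu \log \mathbb{E}[\phi_0(x + \sqrt{2\nu t}\, Z)]$ with $Z$ standard normal. The sketch below is this textbook forward-in-time case only, not the frozen-coefficient BRT algorithm of the paper. The quadratic initial condition, the parameter values, and the function names are assumptions, chosen because a closed-form solution is available for comparison.

```python
import math
import random


def u0(x):
    """Hypothetical initial value function: u0(x) = |x|^2 / 2."""
    return 0.5 * sum(xi * xi for xi in x)


def cole_hopf_mc(x, t, nu, n_samples=20_000, seed=0):
    """Grid-free Monte Carlo estimate of u(x, t) via the heat-kernel expectation.

    Only a running sum is stored, so memory does not grow with the
    state dimension; work is proportional to n_samples * len(x).
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * nu * t)  # std of the heat kernel for phi_t = nu * Laplacian(phi)
    total = 0.0
    for _ in range(n_samples):
        y = [xi + scale * rng.gauss(0.0, 1.0) for xi in x]
        total += math.exp(-u0(y) / (2.0 * nu))
    return -2.0 * nu * math.log(total / n_samples)


def exact(x, t, nu):
    """Closed-form solution for the quadratic initial condition above."""
    return u0(x) / (1.0 + t) + nu * len(x) * math.log(1.0 + t)


if __name__ == "__main__":
    point, t, nu = [1.0, 0.5], 0.5, 0.5
    print(f"monte carlo: {cole_hopf_mc(point, t, nu):.4f}")
    print(f"closed form: {exact(point, t, nu):.4f}")
```

The closed form follows from substituting $u = a(t)|x|^2/2 + b(t)$ into the PDE, which gives $a = 1/(1+t)$ and $b = \nu n \log(1+t)$.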


[131] 2605.26627

Breaking the Epistemic Trap: Active Perception Under Compound Uncertainty

Deploying reinforcement learning in safety critical domains, from autonomous vehicles to medical decision support, is constrained by failures arising when systems encounter unfamiliar conditions. We argue that the fundamental bottleneck is not individual challenges like changing dynamics or incomplete observations, but their synergistic interaction, which we term the Epistemic Trap: agents cannot estimate their state without knowing system dynamics, nor learn dynamics without accurate state information. Proof-of-concept experiments in simulated locomotion reveal that combining these uncertainties causes failures far worse than either challenge alone, a 77% observed degradation against the 46% additive prediction, demonstrating that compounding failure modes can emerge and, when they do, far exceed what additive reasoning would predict. Conventional approaches typically adopt a passive epistemic stance that cannot resolve this coupled uncertainty. We propose reframing safety as an information problem. We introduce an Adaptive Safety Architecture built around three contributions. First, the Compound Uncertainty Coefficient ($\kappa$), a mutual-information based metric that quantifies how tightly state and dynamics uncertainties are coupled. Second, information-seeking policies governed by a MaxInfoRL objective that actively probe system dynamics rather than waiting for the environment to reveal itself passively. Third, regime adaptive safety constraints that tighten automatically as epistemic coupling rises. Together, these constitute a paradigm shift from passive robustness to active perception, offering a principled path toward decision making systems that operate under uncertainty, recognize their own ignorance, and act strategically to resolve it.


[132] 2606.01714

Scalable GNN-Based Power Allocation for Rate-Splitting Cell-Free Massive MIMO Systems

Cell-free massive multiple-input multiple-output (CF-mMIMO) systems provide enhanced coverage and capacity for next-generation wireless networks. However, CF-mMIMO systems face significant challenges in downlink power allocation (PA) due to imperfect channel state information (CSI), severe multi-user interference (MUI), and high computational complexity. To address these issues, rate-splitting multiple access (RSMA) is adopted as a robust interference management strategy. Accordingly, this paper proposes an unsupervised and scalable graph neural network (GNN) framework for PA in rate-splitting CF-mMIMO (RS-CF-mMIMO) systems, relying exclusively on large-scale fading (LSF) coefficients without instantaneous CSI. To resolve the dimensionality mismatch in dynamic networks, we introduce a slice-based adaptive layer that projects variable-dimension features into a fixed latent space. This mechanism enables a unified model to generalize across diverse topologies without retraining. Within this architecture, the sum spectral efficiency (SE) is maximized under per-AP power constraints, assuming maximum-ratio precoding for common streams and regularized zero-forcing precoding for private streams. We also derive a weighted minimum mean-square error-alternating direction method of multipliers (WMMSE-ADMM) algorithm as a performance upper bound. Extensive simulations verify that the proposed GNN framework achieves near-optimal SE and outperforms unsupervised deep neural networks (DNNs) across diverse system sizes and pilot assignment schemes. Furthermore, the scalable variant maintains robust performance while reducing the trainable parameter count by over 57% relative to DNNs and decreasing inference latency by up to three orders of magnitude compared with WMMSE-ADMM.


[133] 2606.03283

SpeakerCard-1M: An Evidence-Grounded Corpus for In-the-Wild Speaker Verification

Modern speaker verification (SV) systems rely on speaker embeddings that are effective but difficult to interpret or query in natural language. Most existing speech-text corpora target controllable synthesis or utterance-level captioning, offering limited speaker-level supervision for in-the-wild speaker recognition. This paper introduces SpeakerCard-1M, a bilingual speaker resource for evidence-grounded SV, derived from VoxCeleb1/2 and CN-Celeb1/2, where the ``-1M'' suffix refers to the 1.78M utterance-level captions contained in the release. We adopt a tool-first, LLM-last approach in which ten acoustic probes produce field-level evidence, the evidence is aggregated into speaker profiles under a schema that separates relatively stable traits from utterance-level states, and bilingual Speaker Cards are rendered by a constrained LLM that sees only the structured fields. The release includes 56.7k Speaker Card records over 10.2k speakers, 1.78M utterance-level captions, and speaker-ID-disjoint hard-negative triplets. We further define two SV-oriented cross-modal protocols, bidirectional Speaker-Text Retrieval (T2S-R / S2T-R) and Attribute-Conditioned Verification (AC-Verify), and compare a dual-encoder baseline against recent audio language models under a zero-shot forced-choice setting. Joint audio-text training costs only 0.31% absolute EER on VoxCeleb1-O relative to the audio-only baseline. Under a style-symmetric LLM-generated counterfactual protocol, eight recent audio language models (7B-30B+ parameters, both open- and closed-source) score 49-77% on pitch-level AC-Verify in a 2-way forced-choice setting, compared with 88.66% for our dual encoder.


[134] 2606.04210

Representation Matters in Randomized Smoothing for Audio Classification

Randomized smoothing (RS) certifies robustness in the vector space where Gaussian noise is added. In audio classification, this space is often not uniquely defined as standard pipelines normalize, range-control, and transform waveforms into log-mel or other spectral features. We show that direct RS is therefore under-specified unless the certified object and preprocessing policy are explicit. On two audio benchmarks, keyword spotting and environmental-sound classification, we study waveform, feature-space, and post-processed smoothing. Our diagnostics show why representation-aware reporting is necessary: at the same smoothing level $\sigma=0.0025$, the two datasets share the same median raw radius $0.007996$, but different waveform energies yield different SNR-equivalent scales ($83.98$ vs. $90.97$ dB); log-mel smoothing gives higher positive-radius certified accuracy on environmental sounds ($68.42\%$ vs. $65.53\%$), certifying more examples with nonzero radius but over features rather than waveforms; and clipping or peak normalization changes the effective perturbation norm by roughly $230$--$351\times$. We therefore recommend that audio RS studies choose and report the task-specific certified object and perturbation model, including the perturbation location, gain policy, raw radius, and any post-noise geometry changes.
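
The relation between a raw radius and an SNR-style scale can be sketched as follows. This is only an illustration: it assumes the widely used Gaussian-smoothing radius R = sigma * Phi^{-1}(p_lower) and a plain power-ratio SNR, neither of which the abstract confirms as the authors' definitions. The function names `certified_radius` and `snr_db` are hypothetical.

```python
import math
from statistics import NormalDist


def certified_radius(sigma: float, p_lower: float) -> float:
    """Assumed L2 radius sigma * Phi^{-1}(p_lower), for 0 < p_lower < 1.

    Returns 0.0 when the lower bound on the top-class probability
    does not exceed 1/2, i.e. no positive radius is certified.
    """
    if p_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_lower)


def snr_db(waveform: list[float], sigma: float) -> float:
    """Assumed SNR: mean signal power over noise variance, in decibels."""
    power = sum(x * x for x in waveform) / len(waveform)
    return 10.0 * math.log10(power / sigma**2)
```

Under these assumptions the raw radius depends only on `sigma` and `p_lower`, while the SNR figure also depends on the waveform energy, so two datasets can share a radius and still differ in SNR-equivalent scale.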


[135] 2606.04361

When Mean Age Is Not Enough: Distribution-Aware Scheduling for Networked LQR Control

Age of Information (AoI) has become a central metric for the design of wireless update systems, especially in applications where fresh measurements support tracking, estimation, and control. Despite its popularity, the use of mean AoI or peak AoI as a surrogate for closed-loop performance is often motivated by intuition rather than by a control-theoretic derivation. This paper examines whether minimizing the mean AoI is in fact optimal for networked control systems. For scalar linear time-invariant systems with delayed intermittent updates, we show that, under state-independent scheduling policies, the infinite-horizon LQR tracking problem reduces to an optimization over the distribution of inter-scheduling intervals. The resulting objective depends on higher-order statistical moments, and in unstable or correlated regimes on exponential moments, of the inter-scheduling process rather than only on its mean. Consequently, policies with identical mean AoI can induce substantially different tracking costs. We further extend the analysis to disturbances with exponentially decaying autocorrelation and derive equivalent cost formulations that expose the role of the full interval distribution. Finally, we evaluate the theory using real vehicle trajectories from the NGSIM US-101 dataset. The empirical results match the predicted performance trends, demonstrating that mean AoI alone is insufficient for control-oriented network design.


[136] 2606.08255

Exactness Certificates for Closed-Form CBF Safety-Filter Projections

For control-affine systems, standard and high-order control barrier function conditions are affine in the control input and are commonly enforced through quadratic-program-based safety filters. Although convex, these optimization problems may be undesirable in embedded, high-rate, or resource-limited implementations. This letter characterizes when the corresponding Euclidean projection can be recovered from the affine inequalities violated by a nominal control input. Given a nominal input, we form the violated set and compute the minimum-norm correction that enforces the violated inequalities with equality. This violated-set correction is closed form, but it need not equal the exact Euclidean projection onto the full feasible set. The main result gives a necessary and sufficient exactness certificate based on primal and dual feasibility, followed by structural sufficient conditions involving interactions among affine-inequality normals. An online certification algorithm is then presented to determine when the closed-form update is exact. When the certificate fails, a finite active-set search can be used to recover the exact projection. Numerical simulations illustrate that the violated-set correction can remain feasible while failing to be the exact projection due to dual infeasibility, and demonstrate computational speedup relative to a standard CBF-QP solver.
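
A minimal sketch of the violated-set correction and a primal/dual check is given below. It assumes constraints of the form A u <= b with linearly independent violated normals, and it is not the letter's algorithm: the names `solve` and `violated_set_correction` are hypothetical, and the flag only reports that the KKT conditions of the projection problem hold, which is sufficient for exactness.

```python
def dot(p, q):
    """Inner product of two equal-length vectors."""
    return sum(pi * qi for pi, qi in zip(p, q))


def solve(M, r):
    """Solve M x = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    aug = [row[:] + [r[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def violated_set_correction(A, b, u_nom, tol=1e-9):
    """Return (u, certified) for constraints A u <= b.

    u enforces every inequality violated by u_nom with equality using
    the minimum-norm correction. certified is True when u is feasible
    for all rows (primal) and all multipliers are nonnegative (dual).
    """
    violated = [i for i in range(len(b)) if dot(A[i], u_nom) > b[i] + tol]
    if not violated:
        return list(u_nom), True  # nominal input is already safe
    gram = [[dot(A[i], A[j]) for j in violated] for i in violated]
    resid = [dot(A[i], u_nom) - b[i] for i in violated]
    lam = solve(gram, resid)  # multipliers of the equality-constrained problem
    u = [
        u_nom[k] - sum(l * A[i][k] for l, i in zip(lam, violated))
        for k in range(len(u_nom))
    ]
    primal = all(dot(A[i], u) <= b[i] + tol for i in range(len(b)))
    dual = all(l >= -tol for l in lam)
    return u, primal and dual
```

For example, with constraints u1 <= 0 and u1 + u2 <= 0 and nominal input (0.1, 1), both inequalities are violated and the correction lands at the origin, which is feasible. One multiplier is negative, however, so the check fails; the exact projection in this case activates only the second constraint.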


[137] 2606.17570

Fine-UNETR for PSMA PET/CT Lesion Segmentation: Automated Tumor Quantification and Overall Survival Stratification in Prostate Cancer

Introduction: To develop and evaluate Fine-UNETR, a Vision Transformer-based architecture for automated segmentation of PSMA-avid lesions on whole-body PET/CT, and to assess clinical utility of AI-derived tumor burden biomarkers for overall survival stratification in radioligand therapy. Methods: In this retrospective study, 373 PSMA PET/CT scans (mean age, 71 ± 8 years) from patients with prostate cancer were analyzed. Fine-UNETR, a modified UNETR with 8x8x8 voxel patch embedding and axial sliding window training, was trained on 299 scans and validated on 74 scans. Overall survival stratification was assessed in an independent cohort of 67 pre-radioligand therapy patients using Kaplan-Meier analysis and log-rank testing. External validation was performed on 192 cases from the AutoPET IV PSMA PET/CT dataset. Results: Fine-UNETR achieved a Dice similarity coefficient (DSC) of 66.63%, sensitivity of 70.27%, precision of 67.77%, and a lesion detection rate of 79.53% (96.05% for lesions with SUVmax >= 5). On the external validation dataset, the model achieved a DSC of 44.11% and a lesion detection rate of 87.18%, indicating that lesion detection performance was preserved despite reduced voxel-level overlap. AI-derived biomarkers showed excellent agreement with ground truth (total tumor volume: r=0.984; total lesion uptake: r=0.989; lesion count: r=0.960). In the clinical cohort, total tumor volume (p=0.0019), SUVmax (p=0.014), and SUVmean (p=0.016) significantly stratified overall survival. Conclusion: Fine-UNETR enables accurate automated whole-body PSMA lesion segmentation and tumor burden quantification. Performance on an external dataset demonstrates robustness despite evidence of domain shift. AI-derived biomarkers significantly stratified overall survival in a pre-radioligand therapy cohort, supporting the clinical utility of automated PSMA PET/CT quantification for prognostication.
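
For reference, the voxel-level overlap metrics reported above follow standard definitions, sketched here on flattened binary masks. The function name `overlap_metrics` and the convention of returning 1.0 for empty denominators are assumptions; the abstract does not say how the authors aggregate scores across scans.

```python
def overlap_metrics(pred: list[int], truth: list[int]) -> dict[str, float]:
    """Dice, sensitivity and precision for two flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return {
        # DSC = 2|A and B| / (|A| + |B|) = 2TP / (2TP + FP + FN)
        "dice": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0,
        # Fraction of true lesion voxels that were recovered
        "sensitivity": tp / (tp + fn) if (tp + fn) else 1.0,
        # Fraction of predicted lesion voxels that are correct
        "precision": tp / (tp + fp) if (tp + fp) else 1.0,
    }
```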


[138] 2606.23452

Industrial electrification in the era of data centers: A Bayesian Optimization approach for grid-aware large load allocation

Large loads from industrial electrification and data centers are reshaping the planning and operation of the power grid. Identifying optimal large load siting decisions while accounting for transmission congestion is key to reducing expansion cost and operational risks. In this paper, we propose a leader-follower bilevel optimization framework to identify optimal large load allocation strategies. The leader determines the allocation of large loads, while the followers determine grid expansion cost and transmission utilization. This modeling approach explicitly integrates strategic planning with detailed short-term operational decisions. Moreover, we develop a Bayesian Optimization approach to efficiently solve the bilevel optimization problem by treating the followers as a black box. We use the framework to study large-scale load allocation from electrified oil refineries and data centers on a synthetic power grid that resembles key characteristics of the Texas (ERCOT) system. The results show that these large loads compete for electricity, and under high-load scenarios, data center demand is distributed across the entire grid, avoiding regions with high demand from industrial electrification.


[139] 2606.24147

Progressive Alignment Objectives for Aligner-Encoder based ASR

Aligner-Encoders are recently proposed seq2seq end-to-end ASR models that replace decoder attention by predicting the u-th token directly from the u-th encoder position, so the encoder must learn the alignment internally without cross-attention or a transducer lattice. In practice, this alignment often forms abruptly in the upper layers, making training sensitive and brittle on long utterances. We propose InterAligner, which adds an intermediate Aligner objective so alignment can form progressively across depth, together with an intermediate CTC loss (InterCTC) to stabilize optimization. On LibriSpeech with a 17-layer Conformer, a final-only Aligner reaches 5.0/7.8 WER (test-clean/other). InterCTC improves to 3.4/6.0, and InterAligner further reduces WER to 3.1/5.6 with the largest gains on long utterances.


[140] 2606.25672

Joint Residual Reweighting for Classifier Free Guidance in Flow-Matching Zero-Shot TTS

Classifier-free guidance (CFG) is widely used in flow-matching-based zero-shot text-to-speech (TTS), where generation is typically controlled by two conditions: the target text and a prompt speech signal. Standard CFG strengthens these conditions jointly, while recent branch-selective guidance methods attempt to enhance text or speaker conditioning separately, often leading to a trade-off between text correctness and speaker similarity. In this paper, we revisit the CFG under independently masked text and speech-prompt conditions, and decompose the guidance field into text, speaker, and joint residuals. We show that conventional speaker-selective guidance entangles the speaker residual with the joint residual, which may disturb text-related generation. Based on this observation, we propose joint residual reweighting, which independently controls the speaker and joint residuals within the standard CFG framework. Experiments on F5-TTS and CosyVoice2 show that the proposed method improves speaker similarity while maintaining competitive text correctness, demonstrating the usefulness of the joint residual for balancing speaker fidelity and text accuracy in zero-shot TTS.
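
One way to picture a three-way residual decomposition is the inclusion-exclusion split sketched below. This is an assumed form, not the paper's stated equations: the abstract names text, speaker, and joint residuals but gives no formulas. The function `guided_velocity` and the weights `w_text`, `w_spk`, and `w_joint` are hypothetical.

```python
def guided_velocity(v_null, v_text, v_spk, v_both, w_text, w_spk, w_joint):
    """Recombine four model outputs with separate residual weights.

    v_null: output with text and speech prompt both masked
    v_text: output with only the text condition
    v_spk:  output with only the speech-prompt condition
    v_both: output with both conditions
    """
    out = []
    for v0, vt, vs, vts in zip(v_null, v_text, v_spk, v_both):
        r_text = vt - v0            # effect of text alone
        r_spk = vs - v0             # effect of speaker prompt alone
        r_joint = vts - vt - vs + v0  # interaction not explained by either alone
        out.append(v0 + w_text * r_text + w_spk * r_spk + w_joint * r_joint)
    return out
```

With all three weights set to 1 the sum collapses to the fully conditioned output `v_both`, which is a quick sanity check on the split. Raising `w_spk` alone then scales only the speaker term and leaves the joint term untouched.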


[141] 2411.15490

Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation

Acute ischemic stroke (AIS) requires time-critical decision-making, where inaccurate interpretation of neuroimaging findings can lead to irreversible disability. Diffusion-weighted imaging (DWI) and apparent diffusion coefficient (ADC) maps from magnetic resonance imaging (MRI) are central to detecting acute infarction, yet generating factually reliable radiology reports directly from 3D MRI remains challenging due to the difficulty of learning robust cross-modal alignments between volumetric images and clinical text. We propose paired image-domain retrieval and text-domain augmentation (PIRTA), a retrieval-augmented generation framework that improves report factuality by avoiding explicit image-text alignment. PIRTA retrieves clinically similar 3D DWI/ADC volumes using a pretrained 3D vision encoder and leverages their paired clinician-authored reports to ground large language model (LLM)-based report generation. Experiments on multi-institutional in-house data, a held-out external privacy-preserving cohort, and the public ISLES benchmark demonstrate that PIRTA achieves strong image-domain retrieval performance and consistently improves ischemic-territory accuracy, a clinically grounded surrogate for report factuality, compared to direct image-to-text baselines. These results indicate that retrieval-grounded generation provides a scalable and reliable paradigm for producing factually consistent radiology reports from complex 3D brain MRI. Source code is available at this https URL.


[142] 2504.20383

Neural Stereo Video Compression with Hybrid Disparity Compensation

Disparity compensation represents the primary strategy in stereo video compression (SVC) for exploiting cross-view redundancy. These mechanisms can be broadly categorized into two types: one that employs explicit horizontal shifting, and another that utilizes an implicit cross-attention mechanism to reduce cross-view disparity redundancy. In this work, we propose a hybrid disparity compensation (HDC) strategy that leverages explicit pixel displacement as a robust prior feature to simplify optimization and perform implicit cross-attention mechanisms for subsequent warping operations, thereby capturing a broader range of disparity information. Specifically, HDC first computes a similarity map by fusing the horizontally shifted cross-view features to capture pixel displacement information. This similarity map is then normalized into an "explicit pixel-wise attention score" to perform the cross-attention mechanism, implicitly aligning features from one view to another. Building upon HDC, we introduce a novel end-to-end optimized neural stereo video compression framework, which integrates HDC-based modules into key coding operations, including cross-view feature extraction and reconstruction (HDC-FER) and cross-view entropy modeling (HDC-EM). Extensive experiments on SVC benchmarks, including KITTI 2012, KITTI 2015, and Nagoya, which cover both autonomous driving and general scenes, demonstrate that our framework outperforms both neural and traditional SVC methodologies.


[143] 2505.11688

On the Sharp Input-Output Analysis of Nonlinear Systems under Adversarial Attacks

This paper is concerned with learning the input-output mapping of general nonlinear dynamical systems. While the existing literature focuses on Gaussian inputs and benign disturbances, we significantly broaden the scope of admissible control inputs and allow correlated, nonzero-mean, adversarial disturbances. With our reformulation as a linear combination of basis functions, we prove that the $\ell_2$-norm estimator overcomes the challenges posed by an adversary with access to the full information history, provided that the attack times are sparse, i.e., the probability that the system is under adversarial attack at a given time is smaller than a certain threshold. We provide an estimation error bound that decays with the input memory length and prove its optimality by constructing a problem instance that suffers from the same bound under probabilistic adversarial attacks. Our work provides a sharp input-output analysis for a generic nonlinear and partially observed system under significantly generalized assumptions compared to existing works.


[144] 2506.05121

The NTNU System at the S&I Challenge 2025 SLA Open Track

A recent line of research on spoken language assessment (SLA) employs neural models such as BERT and wav2vec 2.0 (W2V) to evaluate speaking proficiency across linguistic and acoustic modalities. Although both models effectively capture features relevant to oral competence, each exhibits modality-specific limitations. BERT-based methods rely on ASR transcripts, which often fail to capture prosodic and phonetic cues for SLA. In contrast, W2V-based methods excel at modeling acoustic features but lack semantic interpretability. To overcome these limitations, we propose a system that integrates W2V with the Phi-4 multimodal large language model (MLLM) through a score fusion strategy. The proposed system achieves a root mean square error (RMSE) of 0.375 on the official test set of the Speak & Improve Challenge 2025, securing second place in the competition. For comparison, the RMSEs of the top-ranked, third-ranked, and official baseline systems are 0.364, 0.384, and 0.444, respectively.


[145] 2510.14511

Stability Boundaries and Motor Performance in Delayed Robot-Mediated Dyadic Interactions

This paper establishes analytical stability boundaries for robot-mediated human-human (dyadic) interaction systems, subject to haptic communication under network-induced time delays. Bypassing conservative approximations, we employ a frequency-domain zero-crossing methodology to extract explicit stability limits based on the robotic hardware dynamics and coupling stiffness. To demonstrate the scalability of this mathematical framework, we extend the analysis from an elastic coupling to a highly complex, asymmetric virtual proxy topology. The theoretical analysis reveals how interaction stiffness non-linearly constrains the system's stability margin, heightening its vulnerability to delay. Furthermore, we validate these theoretical boundaries through experimental trials, highlighting the correlation between analytical stability margins and empirical motor performance. The proposed framework provides rigorous design guidelines for stable remote dyadic systems and suggests the prerequisites for effective delay-compensation strategies.


[146] 2511.00870

A Distributed Plug-and-Play MCMC Algorithm for High-Dimensional Inverse Problems

Markov Chain Monte Carlo (MCMC) algorithms are standard approaches to solve imaging inverse problems and quantify estimation uncertainties, a key requirement in the absence of ground-truth data. To improve estimation quality, Plug-and-Play MCMC algorithms, such as PnP-ULA, have been recently developed to accommodate priors encoded by a denoising neural network. Designing scalable samplers for high-dimensional imaging inverse problems remains a challenge: drawing and storing high-dimensional samples can be prohibitive, especially for high-resolution images. To address this issue, this work proposes a distributed sampler based on approximate data augmentation and PnP-ULA to solve very large problems. The proposed sampler uses a lightweight denoising convolutional neural network to efficiently exploit multiple GPUs on a Single Program Multiple Data architecture. Reconstruction performance and scalability are evaluated on several imaging problems. Communication and computation overheads due to the denoiser are carefully discussed. The proposed distributed approach notably combines three valuable qualities: it is scalable, enables uncertainty quantification, and achieves reconstruction performance comparable to other PnP methods.


[147] 2511.05715

Policy Stability for Measuring Operational Performance in Task Assignment with Time-Windows Under Internal Adversarial Influence

We study autonomous pickup-and-delivery routing problems in which internal adversarial agents spoof their locations to attract request assignments and then intentionally leave those requests unserviced. Such attacks disrupt the centralized scheduler, causing delays, cancellations, and routing instability. A routing policy is stable if its cost remains uniformly bounded over time. Existing policy-cost formulations typically characterize cost through the work required to service outstanding requests. Such a formulation requires analyzing agent-specific route execution and is therefore not well suited to adversarial settings, where non-cooperative agents may arbitrarily deviate from assigned routes or fail to service requests altogether. We introduce a new policy-cost formulation based only on observable system signals, namely the numbers of outstanding and canceled requests. Under bounded arrivals and finite request time windows, we show that stability under this formulation is equivalent to keeping the expected cumulative number of canceled requests uniformly bounded over time, an important operational metric in both cooperative and adversarial settings. We also extend cooperative fleet-sizing guarantees to finite time-window settings and highlight that request time windows are not merely a modeling detail, but are essential for ruling out \emph{degenerate stability}, a regime in which policies are certified as stable despite undesirably large request backlogs.


[148] 2511.17038

DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing

From a Bayesian perspective, score-based diffusion solves inverse problems through joint inference, embedding the likelihood with the prior to guide the sampling process. However, this formulation fails to explain its practical behavior: the prior offers limited guidance, while reconstruction is largely driven by the measurement-consistency term, leading to an inference process that is effectively decoupled from the diffusion dynamics. We show that the diffusion prior in these solvers functions primarily as a warm initializer that places estimates near the data manifold, while reconstruction is driven almost entirely by measurement consistency. Based on this observation, we introduce \textbf{DAPS++}, which fully decouples diffusion-based initialization from likelihood-driven refinement, allowing the likelihood term to guide inference more directly while maintaining numerical stability and providing insight into why unified diffusion trajectories remain effective in practice. By requiring fewer function evaluations (NFEs) and measurement-optimization steps, \textbf{DAPS++} achieves high computational efficiency and robust reconstruction performance across diverse image restoration tasks.


[149] 2511.21041

Data-driven control of continuous-time systems: A synthesis-operator approach

This paper addresses data-driven control of continuous-time systems. We develop a framework based on synthesis operators associated with state and input trajectories. A key advantage of the proposed method is that it does not require the state derivative and uses continuous-time data directly without sampling or filtering. First, systems consistent with the data are represented in terms of synthesis operators, into which the data trajectories are embedded. Next, we characterize data informativity properties for system identification and for stabilization in the noise-free case. Finally, we establish a necessary and sufficient condition for noisy data to be informative for quadratic stabilization. All these informativity characterizations are formulated in terms of finite-dimensional matrices, by leveraging the finite-rank structure of the synthesis operators.


[150] 2512.14022

Symbol Distributions in Semantic Communications: A Source-Channel Equilibrium Perspective

Semantic communication systems often use end-to-end neural networks to map input data into continuous symbols. These symbols, which are essentially neural network features, have fixed dimensions and often exhibit heavy-tailed distributions. However, the mechanism behind this distributional shape remains underexplored due to the end-to-end nature of encoder training, hindering systematic analysis and design. In this paper, we propose a parametric model for semantic symbol distributions. We model end-to-end training as inducing two coupled pressures on the symbol distribution: a source pressure that favors power allocation minimizing the average description cost, and a channel pressure that favors distributions with higher channel utilization. Under surrogate objectives that capture these effects, we obtain a Student's t-distribution as a model for the semantic symbols. Experiments on image-based semantic systems show that the model closely predicts how the shape parameter varies with (i) explicit symbol rate control and (ii) dataset entropy variability. Furthermore, enforcing a target symbol distribution via regularization (e.g., a Gaussian prior) improves training convergence, which is consistent with our hypothesis.


[151] 2512.19612

MauBERT: Universal Phonetic Inductive Biases for Few-Shot Acoustic Units Discovery

This paper introduces MauBERT, a multilingual extension of HuBERT that leverages articulatory features for robust cross-lingual phonetic representation learning. We continue HuBERT pre-training with supervision based on a phonetic-to-articulatory feature mapping in 55 languages. Our models learn from multilingual data to predict articulatory features or phones, resulting in language-independent representations that capture multilingual phonetic properties. Through comprehensive ABX discriminability testing, we show MauBERT models produce more context-invariant representations than state-of-the-art multilingual self-supervised learning models. Additionally, the models effectively adapt to unseen languages and casual speech with minimal self-supervised fine-tuning (10 hours of speech). This establishes an effective approach for instilling linguistic inductive biases in self-supervised speech models.


[152] 2512.20117

Delayed Bidirectional Alignment via Disentangled Audio Semantics for Audio-Visual Segmentation

Audio-Visual Segmentation (AVS) aims to localize sound-producing objects at the pixel level by integrating auditory and visual cues. However, existing methods often struggle with multi-source entanglement and audio-visual misalignment, leading to a dominance bias toward acoustically or visually salient objects (i.e., louder or larger ones) at the expense of subtler or co-occurring sources. To address these challenges, we propose DDAVS: Delayed Bidirectional Alignment via Disentangled Audio Semantics for Audio-Visual Segmentation. To mitigate multi-source entanglement, DDAVS employs learnable queries to extract audio semantics and anchor them within a structured semantic space derived from an audio prototype memory bank. This process is further optimized through contrastive learning to enhance discriminability and robustness. To alleviate audio-visual misalignment, DDAVS introduces dual cross attention with delayed modality interaction, improving the robustness of multimodal alignment. Extensive experiments on the AVS-Objects and VPO benchmarks demonstrate that DDAVS achieves state-of-the-art performance across single-source, multi-source, and multi-class multi-instance scenarios. These results validate the effectiveness and generalization ability of our framework under challenging real-world audio-visual segmentation conditions. Project page: this https URL


[153] 2512.20211

Aliasing-Free Neural Audio Synthesis

In neural audio synthesis, neural vocoders and codecs are models that reconstruct waveforms from acoustic and latent representations, which are essential to the resulting audio quality. While current models are capable of generating perceptually natural speech, they still struggle with high-fidelity music and singing voice synthesis, as severe aliasing artifacts are introduced by non-linear activation functions and upsampling layers in existing architectures. Although various anti-aliasing techniques have been proposed in digital signal processing, their integration into neural vocoders and codecs remains under-explored. This paper incorporates differentiable anti-aliasing techniques into the activation and upsampling modules to bridge this gap, and thus presents Pupu-Vocoder and Pupu-Codec. We build a test signal benchmark to evaluate the anti-aliased modules, and validate our proposed models on speech, singing voice, music, and audio. Experimental results show that Pupu-Vocoder and Pupu-Codec outperform existing systems on singing voice, music, and audio, while achieving comparable performance on speech. Demos, code, and checkpoints are available at: this http URL.


[154] 2601.05329

CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models

Automatic speech editing aims to modify spoken content based on textual instructions, yet traditional cascade systems rely on explicit temporal alignment and complex preprocessing. To address these limitations, we propose CosyEdit, an end-to-end speech editing model adapted from CosyVoice through task-specific post-training and a complementary training paradigm, which internalizes text--speech alignment while ensuring high consistency between the speech before and after editing. Trained on only 250 hours of supervised data from our curated GigaEdit dataset, our 400M-parameter model achieves reliable speech editing performance. Extensive evaluations show that CosyEdit not only outperforms several billion-parameter language model baselines but also approaches state-of-the-art cascade systems. These results show that robust and efficient speech editing can be unlocked from a zero-shot TTS model through post-training, offering a cost-effective end-to-end solution for high-quality speech editing. Code and audio samples are available at this https URL.


[155] 2602.02992

Data-driven stabilization of continuous-time systems with noisy input-output data

We study data-driven stabilization of continuous-time systems in autoregressive form when only noisy input-output data are available. First, we provide an operator-based characterization of the set of systems consistent with the data. Next, combining this characterization with behavioral theory, we establish a necessary and sufficient condition for the noisy data to be informative for quadratic stabilization. This condition is formulated in terms of linear matrix inequalities, whose solutions yield a stabilizing controller. Finally, we characterize data informativity for system identification in the noise-free setting.


[156] 2602.15335

The Corrected Inverse-Gaussian: A Tractable First-Hitting-Time Channel Model for Nonstationary Molecular Communication

This paper develops a tractable analytical channel model for first-hitting-time molecular communication (MC) systems under time-varying drift. While existing studies of nonstationary transport rely primarily on numerical solutions of advection-diffusion equations or parametric impulse-response fitting, they do not provide an explicit analytical description of trajectory-level arrival dynamics at absorbing boundaries. By adopting a change-of-measure formulation, we reveal a structural decomposition of the first-hitting-time density into a cumulative-drift displacement term and a stochastic boundary-flux modulation factor. This leads to a closed-form analytical approximation, termed the calibrated Corrected-Inverse-Gaussian (C-IG) density, that advances the stationary-drift IG channel law to deterministic nonstationary drift while preserving O(1) evaluation complexity. Monte Carlo simulations under both smooth pulsatile and abrupt switching drift profiles confirm that the proposed C-IG model accurately captures complex transport phenomena, including phase modulation, multi-pulse dispersion, and transient backflow--effects that traditionally complicate symbol synchronization and induce severe inter-symbol interference. The resulting framework provides a physics-informed, computationally efficient MC channel law suitable for system-level analysis and advanced receiver design, such as real-time maximum likelihood detection, in dynamic biological and MC environments.
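
As background for the stationary baseline that the C-IG density extends, the classical inverse-Gaussian first-hitting-time law for constant drift can be evaluated as below. This sketch covers only the textbook constant-drift case (one-dimensional Brownian motion with positive drift toward an absorbing boundary). It does not implement the paper's correction for time-varying drift, and the function name `ig_first_hitting_density` is hypothetical.

```python
import math


def ig_first_hitting_density(
    t: float, distance: float, drift: float, diffusion: float
) -> float:
    """Constant-drift first-hitting-time density at time t.

    f(t) = d / sqrt(4 pi D t^3) * exp(-(d - v t)^2 / (4 D t)),
    with boundary distance d, drift velocity v, diffusion coefficient D.
    """
    if t <= 0.0:
        return 0.0  # no arrivals at or before the release time
    norm = distance / math.sqrt(4.0 * math.pi * diffusion * t**3)
    return norm * math.exp(-((distance - drift * t) ** 2) / (4.0 * diffusion * t))
```

Each evaluation costs a fixed number of arithmetic operations, which is the O(1) property the abstract says the C-IG approximation preserves.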


[157] 2602.15727

Spanning the Visual Analogy Space with a Weight Basis of LoRAs

Visual analogy learning enables image editing via demonstration rather than textual description, allowing users to specify complex transformations difficult to articulate in words. Given a triplet $\{\mathbf{a}, \mathbf{a}', \mathbf{b}\}$, the goal is to generate $\mathbf{b}'$ such that $\mathbf{a} : \mathbf{a}' :: \mathbf{b} : \mathbf{b}'$. Recent methods adapt text-to-image models with a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed module constrains generalization. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, which specializes the model for each analogy task in a single inference pass. LoRWeB dynamically composes learned transformation primitives, informally, choosing a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRAs to span the space of different visual transformations, and (2) a lightweight encoder that dynamically weighs these basis LoRAs given the input analogy pair. Comprehensive evaluations demonstrate state-of-the-art performance and significantly improved generalization to unseen transformations. Our findings suggest LoRA basis decompositions are a promising direction for flexible visual manipulation tasks. See this https URL for code.


[158] 2602.17393

Contact-Anchored Proprioceptive Odometry for Legged and Wheel-Legged Robots

Reliable odometry for legged robots without cameras or LiDAR remains challenging due to IMU drift and noisy joint velocity sensing. This paper presents a purely proprioceptive state estimator that uses only IMU and motor measurements to estimate body pose and velocity, with a unified formulation applicable to quadruped and wheel-legged robots and extensible to other legged morphologies. The key idea is to treat each reliable contact as a kinematic anchor: joint-torque-based foot wrench estimation selects stance contacts, and the corresponding footfall records provide intermittent world-frame constraints that suppress long-term drift. To prevent elevation drift during extended traversal, we introduce a lightweight height clustering and time-decay correction that snaps newly recorded footfall heights to previously observed support planes. For wheel-legged platforms, the recorded contact is further propagated by effective wheel rolling displacement with shank-motion compensation and a slope-aware rolling direction. To improve foot velocity observations under encoder quantization, we retain an inverse-kinematics cubature Kalman filter as an optional velocity-enhancement module that filters foot-end velocities from joint angles and velocities. The implementation further mitigates yaw drift through multi-contact geometric consistency, which is injected as a soft heading prior rather than as a hard reset of the attitude state. The method is evaluated on four quadruped platforms.


[159] 2602.18452

RA-QA: A Benchmarking System for Respiratory Audio Question Answering Under Real-World Heterogeneity

As conversational multimodal AI tools are increasingly adopted to process patient data for health assessment, robust benchmarks are needed to measure progress and expose failure modes under realistic conditions. Despite the importance of respiratory audio for mobile health screening, respiratory audio question answering remains underexplored, with existing studies evaluated narrowly and lacking real-world heterogeneity across modalities, devices, and question types. We hence introduce the Respiratory-Audio Question-Answering (RA-QA) benchmark, including a standardized data generation pipeline, a comprehensive multimodal QA collection, and a unified evaluation protocol. RA-QA harmonizes public RA datasets into a collection of 9 million format-diverse QA pairs covering diagnostic and contextual attributes. We benchmark general audio-language models as well as domain-specific architectures, establishing reproducible reference points and showing how current approaches fail under heterogeneity.


[160] 2603.02149

3D Field of Junctions: A Noise-Robust, Training-Free Structural Prior for Volumetric Inverse Problems

Volume denoising is a foundational problem in computational imaging, as many 3D imaging inverse problems face high levels of measurement noise. Inspired by the strong 2D image denoising properties of Field of Junctions (ICCV 2021), we propose a novel, fully volumetric 3D Field of Junctions (3D FoJ) representation that optimizes a junction of 3D wedges to best explain each 3D patch of a full volume, while encouraging consistency between overlapping patches. In addition to direct volume denoising, we leverage our 3D FoJ representation as a structural prior that: (i) requires no training data, and thus precludes the risk of hallucination, (ii) preserves and enhances sharp edge and corner structures in 3D, even under low signal-to-noise ratio (SNR), and (iii) can be used as a drop-in denoising representation via projected or proximal gradient descent for any volumetric inverse problem with low SNR. We demonstrate successful volume reconstruction and denoising with 3D FoJ across three diverse 3D imaging tasks with low-SNR measurements: low-dose X-ray computed tomography (CT), cryogenic electron tomography (cryo-ET), and denoising point clouds such as those from lidar in adverse weather. Across these challenging low-SNR volumetric imaging problems, 3D FoJ outperforms the evaluated classical denoisers, untrained neural denoisers, and denoisers trained only on noisy examples. Code is available at this https URL.


[161] 2603.02364

When Spoof Detectors Travel: Evaluation Across 66 Languages in the Low-Resource Language Spoofing Corpus

We introduce LRLspoof, a large-scale multilingual synthetic-speech corpus for cross-lingual spoof detection, comprising 2,732 hours of audio generated with 24 open-source TTS systems across 66 languages, including 45 low-resource languages under our operational definition. To evaluate robustness without requiring target-domain bonafide speech, we benchmark 11 publicly available countermeasures using threshold transfer: for each model we calibrate an EER operating point on pooled external benchmarks and apply the resulting threshold, reporting spoof rejection rate (SRR). Results show model-dependent cross-lingual disparity, with spoof rejection varying markedly across languages even under controlled conditions, highlighting language as an independent source of domain shift in spoof detection. The dataset is publicly available at HuggingFace (this https URL) and ModelScope (this https URL).
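
The threshold-transfer protocol described above can be sketched as two small functions: one that calibrates an equal-error-rate (EER) operating point on a labeled external set, and one that applies the frozen threshold to spoof-only target data. This is a minimal sketch under stated assumptions: higher scores are taken to mean "more bonafide-like", the EER point is approximated by the candidate threshold with the smallest gap between false-accept and false-reject rates, and the function names and toy scores are illustrative rather than taken from the paper.

```python
import numpy as np


def eer_threshold(bonafide_scores, spoof_scores):
    """Pick the score threshold where false-accept and false-reject rates are closest.

    Assumption: a trial is accepted as bonafide when score >= threshold.
    """
    bonafide_scores = np.asarray(bonafide_scores, dtype=float)
    spoof_scores = np.asarray(spoof_scores, dtype=float)
    candidates = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    best_threshold, best_gap = candidates[0], np.inf
    for threshold in candidates:
        false_reject = np.mean(bonafide_scores < threshold)  # bonafide rejected
        false_accept = np.mean(spoof_scores >= threshold)    # spoof accepted
        gap = abs(false_accept - false_reject)
        if gap < best_gap:
            best_threshold, best_gap = threshold, gap
    return float(best_threshold)


def spoof_rejection_rate(spoof_scores, threshold):
    """Fraction of spoofed trials scored below the transferred threshold."""
    return float(np.mean(np.asarray(spoof_scores, dtype=float) < threshold))


if __name__ == "__main__":
    # Toy calibration scores standing in for pooled external benchmarks.
    threshold = eer_threshold([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
    # Toy spoof-only scores standing in for one target language.
    print(threshold, spoof_rejection_rate([0.2, 0.5, 0.7, 0.9], threshold))
```

Because the target side needs only spoofed trials, the same threshold can be reused per language, and differences in SRR then reflect how far each language shifts the score distribution.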


[162] 2603.13082

InterEdit: Navigating Text-Guided 3D Dyadic Human Motion Editing

Text-guided 3D motion editing has seen success in single-person scenarios, but its extension to multi-person settings is less explored due to limited paired data and the complexity of inter-person interactions. We introduce the task of multi-person 3D motion editing, where a target motion is generated from a source and a text instruction. To support this, we propose InterEdit3D, a new dataset with manual two-person motion change annotations, and a Text-guided Multi-human Motion Editing (TMME) benchmark. We present InterEdit, a synchronized classifier-free conditional diffusion model for TMME. It introduces Semantic-Aware Plan Token Alignment with learnable tokens to capture high-level interaction cues and an Interaction-Aware Frequency Token Alignment strategy using DCT and energy pooling to model periodic motion dynamics. Experiments show that InterEdit improves text-to-motion consistency and edit fidelity, achieving state-of-the-art TMME performance. The dataset and code will be released at this https URL.


[163] 2603.16016

FlatLands: Generative Floormap Completion From a Single Egocentric View

A single egocentric image typically captures only a small portion of the floor, yet a complete metric traversability map of the surroundings would better serve applications such as indoor navigation. We introduce FlatLands, a dataset and benchmark for single-view bird's-eye view (BEV) floor completion. The dataset contains 270,575 observations from 17,656 real metric indoor scenes drawn from six existing datasets, with aligned observation, visibility, validity, and ground-truth BEV maps, and the benchmark includes both in- and out-of-distribution evaluation protocols. We compare training-free approaches, deterministic models, ensembles, and stochastic generative models. Finally, we instantiate the task as an end-to-end monocular RGB-to-floormaps pipeline. FlatLands provides a rigorous testbed for uncertainty-aware indoor mapping and generative completion for embodied navigation.


[164] 2603.27833

Separation is Optimal for LQR under Intermittent Feedback

We study finite-horizon linear-quadratic regulation of a scalar linear system with intermittent state feedback under an average communication-rate constraint. In this setting, the scheduling policy and controller are generally coupled through the dual effect: transmission decisions shape future estimation errors, while control actions influence the information available for scheduling. Existing treatments often recover tractability by restricting attention to symmetric scheduling policies, but the optimality of this restriction has remained unclear. We show that, for i.i.d. zero-mean disturbances, symmetric policies are optimal. Consequently, the communication-constrained LQR problem admits a separation structure. The optimal controller is a linear feedback law independent of the scheduling policy, while the optimal scheduler is obtained from a dynamic program. We further show that the optimal scheduling rule is a symmetric threshold policy in the accumulated disturbance since the most recent update.


[165] 2604.04280

Resilient Decentralized Ergodic Coverage for Scalable Multi-Robot Systems in Unknown Time-Varying Environments

Maintaining situational awareness in disaster response, environmental monitoring, and search and rescue requires balancing exploration of unobserved regions with sustained monitoring of changing Regions of Interest (ROIs), often under unknown and time-varying distributions, partial observability, and limited communication. We propose a decentralized multi-agent coverage framework that serves as a high-level planning strategy, in which each agent computes an adaptive ergodic policy, implemented via a Markov-chain transition model, that tracks a continuously updated belief over the underlying importance map. Beliefs are maintained online via Gaussian Process (GP) regression from local noisy observations exchanged with neighbors. The resulting policy drives agents to spend time in ROIs in proportion to their estimated importance, while preserving sufficient exploration to detect and adapt to time-varying environmental changes. Unlike existing approaches that assume known importance maps, centralized coordination, or a static environment, our framework addresses the combined challenges of unknown, time-varying distributions under a decentralized, partially observable setting. We further show that our framework is robust to communication and memory degradation and to robot loss, and that it can scale up to hundreds of robots.
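
One way to make "spend time in ROIs in proportion to their estimated importance" concrete is a Markov chain whose stationary distribution equals the normalized importance map. The abstract does not say how its transition model is constructed, so the sketch below substitutes a standard Metropolis-Hastings construction on a cell-adjacency graph; the function name, the toy four-cell ring, and the fixed importance values (standing in for a GP belief) are all assumptions.

```python
import numpy as np


def metropolis_transition_matrix(importance, neighbors):
    """Build a transition matrix whose stationary distribution is proportional to `importance`.

    importance: (n,) strictly positive importance estimate per cell
    neighbors:  list of n lists; neighbors[i] holds the cells reachable from cell i
    """
    target = np.asarray(importance, dtype=float)
    target = target / target.sum()
    n = len(target)
    transition = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            propose_ij = 1.0 / len(neighbors[i])  # uniform proposal over neighbors
            propose_ji = 1.0 / len(neighbors[j])
            accept = min(1.0, (target[j] * propose_ji) / (target[i] * propose_ij))
            transition[i, j] = propose_ij * accept
        transition[i, i] = 1.0 - transition[i].sum()  # rejected proposals: stay put
    return transition, target


if __name__ == "__main__":
    ring = [[1, 3], [0, 2], [1, 3], [2, 0]]  # four cells arranged in a ring
    transition, target = metropolis_transition_matrix([1.0, 2.0, 3.0, 4.0], ring)
    print(np.round(target @ transition, 4), np.round(target, 4))
```

Recomputing the matrix whenever the belief changes would give a policy that tracks a time-varying importance map, while the nonzero transition probabilities to low-importance cells retain some exploration.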


[166] 2604.14603

A Synonymous Variational Perspective on the Rate-Distortion-Perception Tradeoff

The fundamental limit of natural signal compression has traditionally been characterized by classical rate-distortion (RD) theory through the tradeoff between coding rate and reconstruction distortion, while the rate-distortion-perception (RDP) framework introduces a divergence-based measure of perceptual quality as a modeling principle, leaving its theoretical origin unclear. In this paper, motivated by a synonymity-based semantic information perspective, we reformulate perceptual reconstruction as recovering any admissible sample within an ideal synonymous set (synset) associated with the source, rather than the source sample itself, and establish a synonymous source coding architecture. On this basis, we develop a synonymous variational inference (SVI) analysis framework with a synonymous variational lower bound (SVLBO) for tractable analysis of synset-oriented compression. Within this framework, we establish a synonymity-perception consistency principle, showing that optimal identification of semantic information is theoretically consistent with perceptual optimization. Based on this result, we further derive a tight-bound synonymous source coding rate characterization and show that its Jensen-limit relaxation leads to a synonymous rate-distortion-perception form for practical optimization. These analytical results show that the distributional divergence term arises naturally from the synset-based reconstruction objective, clarify its compatibility with existing RDP formulations and classical RD theory, and suggest the potential advantages of synonymous source coding.
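
For reference, the divergence-constrained RDP formulation that the abstract contrasts with is conventionally written as below. This is the standard form from the RDP literature, not the synonymous variant derived in the paper; the symbols ($X$ for the source, $\hat{X}$ for the reconstruction, $\Delta$ for the distortion measure, $d$ for the divergence between distributions) are notational assumptions.

```latex
R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P
```

Dropping the second constraint (letting $P \to \infty$) recovers the classical rate-distortion function, which is the compatibility the abstract refers to.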


[167] 2604.27290

Boundedness of solutions in feedback systems with antithetic controllers

Antithetic feedback controllers have become a key experimental and theoretical tool in synthetic biology. Introduced by Khammash and collaborators about 10 years ago, they are employed in order to achieve the practical regulation of protein expression, including tracking and robust disturbance rejection. In closed-loop, there are unique equilibria which, depending on parameter values, can be unstable. It had been shown, however, that this instability is not arbitrary: any bounded trajectory that stays away from the equilibrium must converge to a periodic orbit. This motivated a long-standing open question: is every trajectory bounded? In other words, even if the equilibrium is unstable, can nonlinear effects prevent unbounded excursions in the state space? This paper provides an affirmative answer, establishing the boundedness of all solutions. Previous attempts to prove this fact using Lyapunov functions had no success. Instead, this paper takes a completely different approach, specific to antithetic configurations, in which the key idea is to think of the controller as providing a "persistently negative feedback" which acts far away from the equilibrium in such a way as to keep trajectories from diverging. This new approach, although tailored to the antithetic controller, might be useful in other applications as well.


[168] 2606.09620

Motion planning for hundreds of floating robots

Planning collision-free motion for large robot fleets is difficult because collision avoidance induces strong inter-agent coupling that grows rapidly with team size. We consider omnidirectional floating robots on water, where choreographies are specified by sparse keyframes and an interactive tool must generate trajectories within seconds, even when transitions span minutes and thousands of time steps. We propose a scalable pipeline that builds a collision graph from an initialization, decomposes the coupled problem into interaction clusters, and solves clusters independently (and in parallel) with robustness mechanisms for common decomposition pathologies. We validate the approach in simulations up to 500 robots. The synthesized trajectories have also been deployed in two real-world demonstrations, on Lake Zürich with a fleet of 24 Way of Water crafts and at the Time Space Existence 2025 Venice Biennale.
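
The decomposition step described above (collision graph, then interaction clusters) can be sketched with a pairwise proximity check and a union-find pass. This is an illustrative reading only: the abstract does not define how edges are detected or how clusters are formed, so the per-time-step distance test, the use of connected components as clusters, and all names below are assumptions, and the paper's robustness mechanisms are not modeled.

```python
from itertools import combinations


def collision_clusters(trajectories, safety_radius):
    """Group robots whose initial trajectories come within `safety_radius` of each other.

    trajectories: list of per-robot lists of (x, y) positions sampled at the same time steps
    Returns a sorted list of clusters, each a sorted list of robot indices.
    """
    n = len(trajectories)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        for (xi, yi), (xj, yj) in zip(trajectories[i], trajectories[j]):
            if (xi - xj) ** 2 + (yi - yj) ** 2 < safety_radius**2:
                parent[find(i)] = find(j)  # edge in the collision graph: merge clusters
                break

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    initial = [
        [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],     # robot 0 moves right
        [(2.0, 0.0), (1.0, 0.1), (0.0, 0.0)],     # robot 1 moves left and crosses robot 0
        [(0.0, 10.0), (1.0, 10.0), (2.0, 10.0)],  # robot 2 is far from everyone
        [(0.0, 20.0), (1.0, 20.0), (2.0, 20.0)],  # robot 3 is far from everyone
    ]
    print(collision_clusters(initial, safety_radius=0.5))
```

Each resulting cluster can then be handed to its own solver, which is what makes the independent, parallel solves mentioned in the abstract possible. The all-pairs check here is quadratic in fleet size; a spatial index would be the natural replacement at the scale of hundreds of robots.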


[169] 2606.15083

REGRID-QAOA: A Resource-Efficient Hybrid QAOA Framework for Physics-Constrained Power System Islanding

Quantum computing has rapidly emerged as a powerful paradigm for tackling computationally demanding problems. In particular, quantum optimization shows strong promise for hard combinatorial problems in power systems, where increasing distributed energy penetration heightens the need for intentional islanding to maintain grid reliability and resilience. However, power system islanding is an NP-hard combinatorial optimization problem that becomes computationally prohibitive for classical solvers as network size grows, motivating the use of quantum computing as a promising alternative pipeline. This study develops a resource-efficient hybrid QAOA islanding framework that brings physics-constrained power-system partitioning into the quantum optimization workflow. The framework combines coherency-informed graph reduction, physics-aware constraint modeling, and structured post-processing to efficiently convert shallow-circuit QAOA samples into high-quality feasible islanding decisions without deep circuits or large shot budgets. The proposed framework is validated on the standard IEEE benchmark systems (9-, 14-, 24-, 30-, 39-, and 57-bus), demonstrating that the hybrid workflow achieves Gurobi-optimal solution quality with a clear quantum resource advantage over vanilla QAOA, while the resulting islanding solutions satisfy all physical feasibility requirements after network separation. This study establishes QAOA-based islanding as a viable quantum approach for critical infrastructure, with structured post-processing as the key enabler of quantum resource efficiency.