New articles on Electrical Engineering and Systems Science


[1] 2606.26187

Tracking the Turn: Mamba-Powered Human Orientation Detection using UWB

User orientation is crucial for many context-aware applications, including interactive museum experiences, smart door access, and intuitive human-environment interaction. However, most existing indoor localization systems focus on estimating position, while body orientation is typically delegated to secondary devices such as inertial measurement units. In this paper, we propose a purely UWB-based approach that predicts yaw orientation directly from UWB Channel Impulse Response (CIR) measurements recorded at fixed anchors as they receive transmissions from a single wearable tag. We use a bidirectional Mamba architecture that captures dependencies across the anchor observations through forward and backward recurrent scans. The model takes per-anchor CIRs as input and uses a body-part conditioning module to adapt the representation to different tag placements on the body. Two Kalman filters are used as post-processing stages to exploit temporal continuity: an orientation-based filter that smooths the neural network predictions, and a location-based filter that additionally incorporates position-derived heading corrections. We evaluated the model in different scenarios to assess its generalizability. In its raw form, the proposed Mamba model achieves a mean absolute error of 38.6 degrees, outperforming a rule-based baseline (49.5 degrees). With the location-based Kalman filter, the error is further reduced to 18.9 degrees, a 51% reduction relative to the raw model.
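
The orientation-based post-processing stage can be pictured with a minimal sketch, assuming a scalar random-walk Kalman filter over the network's per-frame yaw predictions in degrees. The function names and the noise variances (`process_var`, `meas_var`) are hypothetical illustration choices, not values from the paper. The key detail is wrapping the innovation, so that a jump from 170 to -170 degrees is treated as a 20-degree turn rather than a 340-degree one.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def smooth_yaw(measurements, process_var=25.0, meas_var=900.0):
    """Scalar random-walk Kalman filter over per-frame yaw predictions (degrees).

    process_var and meas_var are hypothetical tuning values in deg^2.
    """
    yaw = wrap_deg(measurements[0])  # initialise from the first prediction
    var = meas_var
    smoothed = [yaw]
    for z in measurements[1:]:
        var += process_var                  # predict: yaw held constant, uncertainty grows
        innovation = wrap_deg(z - yaw)      # shortest signed angular difference
        gain = var / (var + meas_var)       # Kalman gain
        yaw = wrap_deg(yaw + gain * innovation)
        var *= 1.0 - gain                   # posterior variance
        smoothed.append(yaw)
    return smoothed


print(smooth_yaw([170.0, -170.0, 175.0, -178.0]))
```

A location-based variant would add a second measurement (a heading derived from successive position estimates) in the update step; that part is not sketched here.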


[2] 2606.26236

Rendering Novel Views of MRI Using 3D Gaussian Splatting

The objective of this paper is to improve radiological gradings of spinal MRIs by resampling scans so that the new view planes are better aligned with the target anatomy than the original sparse images. To this end, we adapt 3D Gaussian Splatting to form a volumetric reconstruction from sparse anisotropic MRIs; imaging planes aligned with the anatomy relevant for clinical evaluation are then sampled and rendered. Unlike the original MRI planes, these novel view planes are optimal for diagnostic radiological grading of the target anatomy. The resampled scans are used to predict ordinal severity grades of localised stenosis conditions in spinal MRIs. We compare our method against Voxel Interpolation resampling, which, for each target coordinate, averages the intensities of the nearest neighbouring voxels weighted by inverse distance. Experiments show that, across all stenosis conditions, scans resampled using Gaussian Splatting yield more accurate stenosis gradings than both the raw scans, which do not include the complete anatomy in-plane, and images resampled using Voxel Interpolation.
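
The Voxel Interpolation baseline described above can be sketched as follows, assuming voxel centres are given as physical coordinates (e.g., millimetres) and using a brute-force nearest-neighbour search. The neighbour count `k` and the distance exponent `power` are hypothetical parameters; the abstract does not state the values used.

```python
import numpy as np


def idw_resample(src_coords, src_values, tgt_coords, k=8, power=1.0, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest source voxels per target point."""
    src_coords = np.asarray(src_coords, dtype=float)
    src_values = np.asarray(src_values, dtype=float)
    tgt_coords = np.atleast_2d(np.asarray(tgt_coords, dtype=float))
    k = min(k, src_values.size)
    out = np.empty(len(tgt_coords))
    for i, p in enumerate(tgt_coords):
        d = np.linalg.norm(src_coords - p, axis=1)     # distance to every voxel centre
        nn = np.argpartition(d, k - 1)[:k]             # indices of the k nearest voxels
        if d[nn].min() < eps:                          # target coincides with a voxel centre
            out[i] = src_values[nn[np.argmin(d[nn])]]
            continue
        w = 1.0 / d[nn] ** power                       # inverse-distance weights
        out[i] = np.sum(w * src_values[nn]) / np.sum(w)
    return out
```

For full volumes a k-d tree would replace the brute-force distance computation; the weighting logic stays the same.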


[3] 2606.26317

Parametric Generalized Adaptive Moment Features (PG-AMF) for Bearing Fault Diagnosis and Machine Health Monitoring

Accurate fault diagnosis of rolling element bearings in rotating machinery is essential for ensuring industrial safety and enabling predictive maintenance. Conventional statistical feature-based methods rely on predefined descriptors whose diagnostic sensitivity is constrained by fixed configurations and limited adaptability across varying fault conditions. Although deep learning approaches offer strong representational capacity, their effectiveness is often limited by high data requirements and reduced interpretability. In this work, a parametric adaptive feature extraction framework is proposed in which feature characteristics are learned directly from data rather than specified manually. Multiple complementary representations are extracted from vibration signals: absolute features capturing the signal energy distribution, signed moment features reflecting waveform asymmetry, and AC-coupled moment features emphasizing dynamic fluctuations. Interactions between sensor channels are modeled through a structured fusion mechanism to enhance fault representation. The proposed approach is evaluated on a benchmark gearbox bearing dataset comprising five health conditions, including normal operation and multiple fault types. Classification performance improves over conventional methods, and results remain consistent under cross-validation, indicating strong generalization capability. Feature separability is also enhanced, as shown by clearer clustering patterns in low-dimensional projections. The learned representations capture a wide range of signal characteristics, supporting both improved diagnostic performance and practical applicability in industrial monitoring systems.
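
A minimal sketch of the three moment families, assuming each is computed as a plain sample average for a fixed exponent `p`. In the framework described above the feature characteristics are learned from data, so treat `p`, the function names, and the per-channel stacking in `feature_vector` as hypothetical. The structured cross-channel fusion is not reproduced here.

```python
import numpy as np


def generalized_moment_features(x, p):
    """Absolute, signed, and AC-coupled generalized moments of order p."""
    x = np.asarray(x, dtype=float)
    absolute = np.mean(np.abs(x) ** p)               # energy-like magnitude statistic
    signed = np.mean(np.sign(x) * np.abs(x) ** p)    # keeps the sign, so it reflects asymmetry
    xc = x - x.mean()                                # AC coupling: remove the DC component
    ac_coupled = np.mean(np.abs(xc) ** p)            # fluctuation around the mean
    return {"absolute": absolute, "signed": signed, "ac_coupled": ac_coupled}


def feature_vector(signals, exponents=(1.0, 2.0, 3.0, 4.0)):
    """Stack the three moment types for every channel and exponent (hypothetical layout)."""
    feats = []
    for channel in np.atleast_2d(signals):
        for p in exponents:
            f = generalized_moment_features(channel, p)
            feats.extend([f["absolute"], f["signed"], f["ac_coupled"]])
    return np.array(feats)
```

Making `p` a trainable parameter (for example in an autodiff framework) is one way the exponent could be learned rather than fixed.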


[4] 2606.26328

A Bilevel Framework for Data Center-Grid Coordination with DLMPs in Unbalanced Three-Phase Distribution Systems

This paper proposes a grid-aware coordination framework between data centers and distribution grids using a DLMP-based bilevel optimization model. The data center aggregator (DCA) determines active power demand in response to distribution locational marginal prices (DLMPs), while the distribution system operator (DSO) solves a network-constrained optimal power flow problem to determine DLMPs in an unbalanced three-phase system. The model incorporates both the active and reactive power consumption of data centers to evaluate their impacts on voltage regulation and phase imbalance. To mitigate adverse network effects, two operating cases are analyzed: without reactive power compensation and with static var generator (SVG)-based compensation. The proposed approach is validated on the IEEE 37-bus unbalanced distribution test system. Simulation results show that DLMP-based coordination captures both economically efficient data center operation and phase- and location-dependent network conditions, while SVG-based compensation improves voltage profiles and reduces phase imbalance.


[5] 2606.26342

A Large-Scale Database and Predictive Model of Listener-Rated Ease of Speech Understanding in Commercial Hearing Aids

HearAdvisor aims to provide hearing-aid consumers with audio-performance metrics and recordings that reflect real listening experience. For speech-related metrics, HearAdvisor has historically used HASPIv2, a metric designed to predict objective intelligibility and validated primarily under simulated distortions. Its relationship to consumer-rated ease of understanding for commercial hearing aids is uncertain. Here we introduce a large-scale perceptual dataset and a learned metric of listener-rated perceived benefit for speech understanding. Website visitors with self-reported hearing loss completed a blind, MUSHRA-inspired listening test in which they rated recordings of commercial hearing aids on a five-point "Ease of Understanding" scale. The dataset contains 151,608 ratings (104,298 after quality screening) spanning 10,394 binaural acoustic-manikin recordings from 83 commercial products across 72 realistic acoustic scenes. To predict these ratings, we pass aided audio and a matched clean-speech reference through a frozen Whisper encoder, subtract their internal representations, and train a small MLP head on the resulting difference embedding. On devices held out from training, the learned metric substantially outperforms HASPIv2 at the scene level (overall r = 0.92 vs. 0.83; loud = 0.89 vs. 0.75; quiet = 0.79 vs. 0.58). In loud scenes, performance reaches the split-half reliability of the listener ratings; in quiet scenes, it approaches that ceiling. The model also responds sensibly to controlled gain and SNR manipulations. Together, the dataset and model provide a new way to predict listener-rated ease of speech understanding for real commercial hearing-aid recordings.
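
The difference-embedding idea can be sketched with stand-in features, assuming the frame-level encoder outputs for the aided and reference signals are already available as arrays and are mean-pooled over time before subtraction (the abstract does not state the pooling). The random arrays only stand in for Whisper encoder outputs; the layer sizes and weights are hypothetical, not a trained head.

```python
import numpy as np


def difference_embedding(aided_feats, ref_feats):
    """Mean-pool frame-level features (frames x dim) and subtract reference from aided."""
    return aided_feats.mean(axis=0) - ref_feats.mean(axis=0)


def mlp_head(z, W1, b1, W2, b2):
    """One hidden ReLU layer mapping the difference embedding to a scalar rating."""
    h = np.maximum(0.0, z @ W1 + b1)
    return float(h @ W2 + b2)


rng = np.random.default_rng(0)
dim, hidden = 512, 64                      # hypothetical sizes
W1 = rng.normal(scale=0.05, size=(dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.05, size=hidden)
b2 = 3.0                                   # hypothetical bias near the middle of a 1-5 scale
aided = rng.normal(size=(150, dim))        # stand-in for encoder output on aided audio
ref = rng.normal(size=(150, dim))          # stand-in for encoder output on clean reference
score = mlp_head(difference_embedding(aided, ref), W1, b1, W2, b2)
```

Because the encoder stays frozen, only the small head has to be fitted to the listener ratings, which keeps training cheap.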


[6] 2606.26345

Feasibility-Aware Security-Constrained Unit Commitment via Hybrid Soft Actor-Critic with Quantum-Sampled Features

Security-constrained unit commitment (SCUC) couples binary commitment, economic dispatch, reserves, and network security over a multiperiod horizon, which makes exact solution expensive at realistic system sizes. This paper proposes a three-layer hybrid framework in which a Bernoulli hybrid soft actor-critic (HSAC) policy proposes hourly commitments, a quantum-sampled auxiliary channel augments the state, and a native SCUC mixed-integer linear program recovers dispatch and security variables after only a limited subset of commitment binaries is enforced. The method is therefore solver-compatible rather than an end-to-end replacement for exact optimization. We formalize the SCUC-to-reinforcement-learning interface, derive the temporal coverage induced by the fixed enforcement cap, and conduct representative experiments on the 14-, 57-, and 118-bus cases. The results show stable, low-cost recovery in the 14-bus case; a very low screen-rejection rate in the 57-bus case, consistent with learned feasibility generalization under fixed intertemporal SCUC constraints; and a clear coverage bottleneck in the 118-bus case once the enforcement cap no longer spans a complete commitment period. The 118-bus runtime traces nevertheless remain tightly clustered for accepted episodes, indicating that the policy still captures a repeatable recovery pattern across most episodes. The study therefore identifies the dominant limitation of the current implementation as the amount of useful commitment information that reaches the recovery model under an exploratory Bernoulli actor and a small enforcement cap, and shows how this limitation governs scalability.


[7] 2606.26347

Health feature extraction from battery energy storage system field fault data

Health monitoring methods are critical for grid-connected lithium-ion battery modules to prevent faults that can lead to catastrophic events. However, assessing the health of cells in modules from their operational data is challenging: variable operating conditions directly confound health features, and sparse sensing within modules, particularly of parallel-connected cells, prevents observation of critical states of individual cells. Here, we present a framework for extracting and calibrating health features from module operational data to identify features that discriminate faulty parallel-connected cell groups within the modules. We applied this framework to operational data from 25 commercial grid-connected lithium-ion Battery Energy Storage System (BESS) modules. Each module consisted of 14 series-connected parallel groups, one of which was confirmed faulty via post-mortem investigation; in total, the dataset included 25 faulty and 325 non-faulty cell groups. A statistical evaluation of these calibrated features showed that group-level capacity, capacity degradation rate, and dV/dQ peak heights separate faulty from non-faulty parallel-connected cell groups within the modules with statistical significance (p<0.05). Conversely, group internal resistance did not (p>0.05), indicating that increased resistance was not a primary characteristic of the faults in this dataset. These findings challenge exclusive reliance on resistance features for fault detection. The observed feature signatures suggest potential failure mechanisms, furthering the understanding of fault behavior in lithium-ion battery modules during field operation. More importantly, this work demonstrates a framework for robustly monitoring the health of cells in lithium-ion battery modules under real-world operation.
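
dV/dQ peak heights, one of the discriminative features reported above, can be extracted from a charge curve with a sketch like the following, assuming cumulative charge in Ah and voltage in V sampled on a monotonic charge axis. The function name and the `min_height` threshold are hypothetical; field data would normally need filtering before differentiation, and the calibration step described in the abstract is not reproduced.

```python
import numpy as np


def dvdq_peaks(charge_ah, voltage_v, min_height=0.0):
    """Return (charge, height) of local maxima of dV/dQ above min_height (V/Ah)."""
    q = np.asarray(charge_ah, dtype=float)
    v = np.asarray(voltage_v, dtype=float)
    dvdq = np.gradient(v, q)               # central differences; supports uneven spacing
    inner = dvdq[1:-1]
    is_peak = (inner > dvdq[:-2]) & (inner > dvdq[2:]) & (inner > min_height)
    idx = np.flatnonzero(is_peak) + 1      # shift back to full-array indices
    return q[idx], dvdq[idx]


# Synthetic curve with one steep transition at 0.5 Ah (illustrative only)
q = np.linspace(0.0, 1.0, 201)
v = 3.5 + 0.1 * q + 0.2 * np.tanh((q - 0.5) / 0.02)
peak_q, peak_h = dvdq_peaks(q, v, min_height=1.0)
```

The `min_height` threshold suppresses tiny ripples caused by floating-point or measurement noise on flat parts of the curve.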


[8] 2606.26352

Scalable Reachability Analysis of Linear Continuous Systems with Property-Driven Time-Step Adaptation

We study safety verification for linear time-invariant systems with bounded inputs in continuous time. The standard approach reduces this problem to reachability analysis in two steps: first discretize time, then apply a forward analysis to the discretized system. Existing algorithms use either a fixed time step or an adaptive time step that changes based on the approximation error relative to the underlying continuous system. In this paper, we present an efficient reachability algorithm that adapts the time step based on a given safety property. Essentially, our algorithm takes the largest time step with which it can still prove safety. For this approach to scale in practice, we discuss several optimizations, such as avoiding repeated expensive computation of the matrix exponential during discretization and carefully balancing how we tame the approximation errors stemming from the states and from the inputs. This allows our algorithm to keep the approximation error moderate even with a large time step, thus requiring far fewer steps than prior algorithms. We demonstrate its effectiveness and scalability on the large-scale SLICOT benchmark suite, where our algorithm consistently outperforms other state-of-the-art approaches.
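
One generic way to avoid recomputing the matrix exponential when the time step grows by doubling is to reuse the identity e^(A·2δ) = (e^(Aδ))^2. The sketch below illustrates only that identity, assuming step sizes δ, 2δ, 4δ, and so on; it is not necessarily the optimization used in the paper. In practice one would call `scipy.linalg.expm` instead of the small Taylor-based `expm_taylor` written here to keep the example dependency-free.

```python
import numpy as np


def expm_taylor(M, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**s                          # scale so the series converges quickly
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(s):                      # undo the scaling by repeated squaring
        E = E @ E
    return E


def doubled_propagators(A, delta, levels):
    """e^(A*delta*2^j) for j = 0..levels-1 using one exponential and repeated squaring."""
    P = expm_taylor(A * delta)
    out = [P]
    for _ in range(1, levels):
        P = P @ P                           # e^(A*2t) = (e^(A*t))^2, no new exponential
        out.append(P)
    return out
```

Each doubling costs one matrix product instead of a fresh exponential, which matters when the state dimension is large.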


[9] 2606.26363

Bayesian Changepoint Detection for Smart Sensing of Battery Degradation: Cycle-Level Health Indicators and PyMC Implementation

Reliable detection of the onset of accelerated degradation is central to the safe and cost-efficient operation of lithium-ion batteries. This paper presents a Bayesian single-changepoint model applied to a simple but physically meaningful cycle-level health indicator (HI), defined as the ratio of charge time to discharge time. The indicator is computed directly from voltage-current telemetry typically available in battery management systems (BMS), without requiring access to raw waveforms. The changepoint model is implemented in PyMC using Hamiltonian Monte Carlo and produces posterior distributions for the onset time and the pre- and post-degradation slopes, together with posterior predictive checks. Experiments on an open 18650-cell remaining useful life (RUL) dataset show consistent midlife changepoints with narrow highest-density intervals. The formulation is lightweight, interpretable, and amenable to smart-sensing deployment on embedded BMS platforms.
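
The single-changepoint idea can be illustrated without PyMC, assuming a continuous piecewise-linear trend in the health indicator, Gaussian noise with known `sigma`, flat priors on the intercept and slopes, and a uniform prior on the changepoint cycle. Under those assumptions the coefficients integrate out analytically, so the posterior over the changepoint can be evaluated exactly on a grid. This is a swapped-in, dependency-light substitute for the Hamiltonian Monte Carlo model described above, and the synthetic data are illustrative only.

```python
import numpy as np


def health_indicator(charge_time_s, discharge_time_s):
    """Cycle-level HI: ratio of charge time to discharge time."""
    return np.asarray(charge_time_s, dtype=float) / np.asarray(discharge_time_s, dtype=float)


def changepoint_posterior(y, sigma, min_seg=3):
    """Exact grid posterior over one changepoint in a continuous piecewise-linear trend."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)
    taus = np.arange(min_seg, y.size - min_seg)
    log_post = np.empty(taus.size)
    for i, tau in enumerate(taus):
        # Columns: intercept, pre-change slope, extra slope after the changepoint
        X = np.column_stack([np.ones_like(t), t, np.maximum(t - tau, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        _, logdet = np.linalg.slogdet(X.T @ X)
        # Log marginal likelihood with the coefficients integrated out (constants dropped)
        log_post[i] = -rss / (2.0 * sigma**2) - 0.5 * logdet
    post = np.exp(log_post - log_post.max())
    return taus, post / post.sum()


rng = np.random.default_rng(0)
cycles = np.arange(100)
hi = 1.0 + 0.001 * cycles + 0.01 * np.maximum(cycles - 60, 0)   # synthetic HI, knee at 60
hi_noisy = hi + rng.normal(0.0, 0.002, size=cycles.size)
taus, post = changepoint_posterior(hi_noisy, sigma=0.002)
tau_map = taus[np.argmax(post)]
```

For a given changepoint, the pre-degradation slope is `beta[1]` and the post-degradation slope is `beta[1] + beta[2]`; an MCMC model such as the PyMC one would additionally treat `sigma` as unknown.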


[10] 2606.26368

An Evaluation of ABR Switching for Time-Shifted Clients in MoQ

Media over QUIC (MoQ) enables ultra-low-latency video streaming over QUIC, but its default quality-switching semantics risk introducing playback gaps during periods of network congestion. The in-progress SWITCH specification for MoQ Transport aims to streamline rate adaptation for MoQ. In this work, we characterize the performance of SWITCH-style Adaptive Bitrate (ABR) streaming for both live and time-shifted clients in a Mininet-emulated topology. We validate that standard ABR algorithms can be applied directly to time-shifted playback without modification, yielding substantially higher throughput. We demonstrate that a subscriber can experience increased overall throughput after a rebuffering event, and we identify focal points for further optimization of MoQ ABR switching.
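
As an illustration of the kind of standard ABR rule that could drive SWITCH-style quality changes, here is a minimal throughput-based selector, assuming a bitrate ladder in kbit/s and a harmonic mean over recent throughput samples. The `safety` factor and the function name are hypothetical, and nothing here reflects the SWITCH message format itself.

```python
def select_bitrate(recent_kbps, ladder_kbps, safety=0.8):
    """Pick the highest rung not exceeding safety x harmonic-mean recent throughput."""
    if not recent_kbps:
        return min(ladder_kbps)                 # no history yet: start conservatively
    harmonic_mean = len(recent_kbps) / sum(1.0 / x for x in recent_kbps)
    budget = safety * harmonic_mean
    feasible = [r for r in sorted(ladder_kbps) if r <= budget]
    return feasible[-1] if feasible else min(ladder_kbps)


ladder = [500, 1500, 3000, 6000]
print(select_bitrate([3000, 3000, 3000], ladder))  # prints 1500
```

The harmonic mean is dominated by the slowest samples, which makes the rule cautious after throughput dips; a time-shifted client with a deep buffer could afford a larger `safety` factor.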


[11] 2606.26376

pysib: An Open-Source Python Toolbox for Linear System Identification

Discrete-time polynomial input-output models (ARX, ARMAX, OE, and Box-Jenkins) are usually estimated by prediction-error methods, but for OE, ARMAX, and BJ the finite-sample criterion is nonconvex: the estimate a user actually obtains is determined by the initialization and the optimization procedure, not only by the asymptotic theory. This article documents the dedicated optimization strategy behind pysib, an open-source Python toolbox for SISO polynomial system identification. The strategy consists of an ARX-based initialization, a smoothed-gradient phase, an incremental Gauss-Newton refinement, and filtered continuation interpreted as cost-function shaping. This strategy produced the filtered-continuation results reported by the author in earlier work but had not previously been described or released; it is given here in full and as open Python software, with a common five-polynomial representation, shared prediction and simulation routines, and the scripts and archived release needed to reproduce the experiments. On a moderate-noise OE benchmark, the strategy returns estimates far more concentrated around the true parameters than a general-purpose nonlinear-programming solver, and on a harder nonconvex benchmark, filtered continuation raises the success rate from 60% to 100%.
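
The ARX-based initialization step relies on the fact that ARX estimation is a linear least-squares problem and hence convex. A minimal sketch, assuming the convention A(q)y(t) = B(q)u(t) + e(t) with A = 1 + a1 q^-1 + ... and B = b1 q^-nk + ...; the function is illustrative only and is not pysib's API.

```python
import numpy as np


def arx_least_squares(y, u, na, nb, nk=1):
    """Least-squares ARX fit of A(q) y(t) = B(q) u(t) + e(t).

    A(q) = 1 + a1 q^-1 + ... + a_na q^-na,  B(q) = b1 q^-nk + ... + b_nb q^-(nk+nb-1).
    Returns (a, b) coefficient arrays.
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(na, nk + nb - 1)                # first t with a complete regressor
    Phi = np.array([
        [-y[t - i] for i in range(1, na + 1)] + [u[t - nk - j] for j in range(nb)]
        for t in range(start, y.size)
    ])
    theta, *_ = np.linalg.lstsq(Phi, y[start:], rcond=None)
    return theta[:na], theta[na:]


# Noise-free simulation of y(t) = 1.5 y(t-1) - 0.7 y(t-2) + 0.5 u(t-1) + 0.2 u(t-2)
rng = np.random.default_rng(1)
u = rng.standard_normal(300)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + 0.5 * u[t - 1] + 0.2 * u[t - 2]
a_hat, b_hat = arx_least_squares(y, u, na=2, nb=2, nk=1)   # a = [-1.5, 0.7], b = [0.5, 0.2]
```

The ARX estimate then seeds the nonconvex OE, ARMAX, or BJ search, for example by taking its A and B polynomials as starting values.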


[12] 2606.26413

PRISM: Efficient and Locally Optimal Probabilistic Planning with Reachability Guarantees

Belief-space planning under motion uncertainty and state and control constraints remains a fundamental challenge, largely due to the difficulty of establishing reachability guarantees in constrained belief spaces. Existing constrained belief-space planners rely on sampling to construct multi-query belief roadmaps and explicitly find feasible trajectories between sampled nodes to establish reachability. These methods often struggle to cover the belief space, or they use robust control techniques that improve coverage at the cost of indirect, high-cost trajectories; they also lack finite-time or finite-memory completeness guarantees. We propose PRISM, a multi-query motion planning algorithm for belief spaces with state and control constraints that targets both high coverage and low cost. We present a new result on the controllability of the state covariance under constraints, which PRISM uses to decompose belief-space planning into deterministic mean planning and covariance shrinking. PRISM further includes an online local optimization method that reduces the cost of feasible belief-space trajectories. Under mild assumptions on the start and goal distributions, we prove that PRISM guarantees full coverage (i.e., completeness) despite actuator and obstacle constraints. In challenging simulated scenarios, PRISM achieves substantially higher roadmap coverage than state-of-the-art belief-space planning methods while producing trajectories with lower mean cost and cost variance. For example, PRISM achieves 100% coverage in the easy and medium-difficulty scenarios and, in the hardest scenario, which violates PRISM's coverage assumptions, still achieves 97-100% coverage, whereas all other methods achieve less than 45%.
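
The split into mean planning and covariance shrinking is easiest to see in a linear-Gaussian setting, assuming dynamics x+ = A x + B u + w with a fixed feedback gain K acting on the deviation from the planned mean. The covariance then evolves independently of the mean trajectory, as the sketch below shows. The functions and the gain are hypothetical and ignore the state and control constraints that PRISM's controllability result handles.

```python
import numpy as np


def propagate_covariance(Sigma, A, B, K, W):
    """Closed-loop step for x+ = A x + B u + w with u = u_mean - K (x - x_mean)."""
    Acl = A - B @ K
    return Acl @ Sigma @ Acl.T + W


def steps_to_shrink(Sigma0, A, B, K, W, target_trace, max_steps=1000):
    """Count steps until trace(Sigma) <= target_trace; None if not reached."""
    Sigma = Sigma0
    for k in range(max_steps):
        if np.trace(Sigma) <= target_trace:
            return k, Sigma
        Sigma = propagate_covariance(Sigma, A, B, K, W)
    return None, Sigma


# Scalar example: the mean can follow any planned path; the variance shrinks regardless
one = np.array([[1.0]])
k, Sigma = steps_to_shrink(one, one, one, np.array([[0.5]]), np.array([[0.01]]), target_trace=0.02)
```

Here the variance follows Sigma+ = 0.25 Sigma + 0.01 and settles near 0.0133, so any target above that value is reached in finitely many steps.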


[13] 2606.26420

MIMO Zak-OTFS: Channel Estimation, Detection, and Throughput Analysis

Zak-Orthogonal Time Frequency Space (Zak-OTFS) modulation has demonstrated substantial performance gains over cyclic-prefix orthogonal frequency-division multiplexing (CP-OFDM) in highly time- and frequency-selective channels. In this paper, we extend Zak-OTFS to a multiple-input multiple-output (MIMO) framework. We first derive a complete system model for MIMO Zak-OTFS based directly on the physical multipath channel; ours is the first work to do so. We then propose an efficient channel estimation method using structured pilot placement in the delay-Doppler (DD) domain. The proposed approach is evaluated under the standardized CDL-C channel model, demonstrating that the advantages of Zak-OTFS observed in SISO scenarios extend to MIMO systems, particularly its robustness to Doppler and inter-carrier interference (ICI). We identify a fundamental crossover behavior: CP-OFDM performs slightly better at low SNR and low Doppler, while Zak-OTFS excels at higher SNR or under severe Doppler dispersion. Furthermore, we show that the SNR and Doppler crossover points vary inversely with each other. We also observe that Zak-OTFS, particularly with MIMO, is more sensitive to high pilot-to-data power ratios (PDR) but has an optimal PDR similar to that of CP-OFDM.


[14] 2606.26430

Design Guidelines for In-line X-ray Inspection in Advanced Packaging Technology: A CoWoS Case Study

The shift towards advanced packaging technologies, including 2.5D and 3D integration, addresses the limitations of traditional methods while meeting increasing demands for performance, miniaturization, and efficiency. These methods enhance functionality and support heterogeneous integration but also introduce metrology challenges due to complex, three-dimensional structures. X-ray imaging, crucial for nondestructive inspection, faces compatibility issues such as similar material densities and scattering noise. To address these challenges, we propose a framework based on AI-integrated Design of Experiments (DoE) for developing design guidelines that optimize X-ray compatibility at the design stage. This framework, demonstrated through a case study on Chip-on-Wafer-on-Substrate (CoWoS) packaging, systematically analyzes design parameters and material properties to develop guidelines for improved inspection accuracy. Our method integrates AI to predict outcomes and optimize processes, ensuring high-quality X-ray images and enhancing defect detection. Implementing these guidelines can significantly improve inspection accuracy and reliability, reduce production costs, and support the efficiency and scalability of advanced semiconductor technologies.
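
The Design of Experiments step starts from an experimental plan; a minimal full-factorial generator is sketched below. The factor names and levels are hypothetical placeholders rather than the CoWoS parameters studied in the paper, and the AI model that would be trained on the resulting runs is not shown.

```python
from itertools import product


def full_factorial(factors):
    """Every combination of factor levels, as a list of {factor: level} runs."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]


design = full_factorial({
    "interposer_material": ["Si", "glass"],        # hypothetical factors and levels
    "bump_pitch_um": [40, 55],
    "underfill_filler_fraction": [0.5, 0.6, 0.7],
})
print(len(design))  # prints 12
```

With many factors a fractional-factorial or space-filling design would keep the run count manageable; the full factorial is shown only because it is the simplest correct plan.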


[15] 2606.26431

Revealing Mammographic Phenotypes in Deep Learning Breast Cancer Risk Models

Mammogram-based deep learning models have improved breast cancer risk prediction, but the imaging patterns they learn remain underexplored. Existing interpretability methods rely on single-image saliency maps and fail to identify recurring mammographic phenotypes across large patient cohorts. By clustering patch embeddings from a pre-trained model, Mirai, we isolate recurring phenotypes linked to 5-year cancer risk. Analyses show that risk-increasing phenotypes capture complex structures (e.g., dense tissue, microcalcifications) as well as shortcut artifacts (e.g., clips). These phenotypes correlate strongly with older age and higher BI-RADS density. Our framework connects tissue patterns to AI risk scores, revealing clinical signatures and potential latent model confounders.
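
Phenotype discovery by clustering patch embeddings can be sketched with k-means, assuming the embeddings have already been extracted into a NumPy array. The abstract does not name the clustering algorithm, so k-means, the deterministic farthest-point initialization, and the synthetic two-blob data below are all assumptions standing in for Mirai patch embeddings.

```python
import numpy as np


def kmeans(X, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                      # assign each patch to nearest centre
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):                  # converged
            break
        centers = new
    return labels, centers


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 16)), rng.normal(10.0, 1.0, (50, 16))])
labels, centers = kmeans(X, k=2)
```

Each cluster can then be characterised by, for example, the mean model risk score of the patients whose patches fall into it.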


[16] 2606.26567

Multi-Modal Environment-Aware Beam Management for Massive MIMO: A Geometry-Driven Virtual Base Station Framework

High-frequency massive multiple-input multiple-output (MIMO) systems promise ultra-high data rates. However, efficient beam management remains challenging due to the prohibitive beam training overhead and intricate coordination required in multi-user MIMO (MU-MIMO) scenarios. To address these bottlenecks, environment-aware communications have emerged as a promising paradigm, leveraging site-specific knowledge to circumvent exhaustive pilot-based beam training and streamline multi-user communications. In this paper, we propose an interpretable and geometry-driven framework that utilizes multi-modal environmental data, specifically regional 3D light detection and ranging (LiDAR) point clouds and location information, to construct an offline virtual base station (VBS) database. By modeling dominant reflection paths via mirror symmetry across building facades reconstructed from the point clouds, the VBS database provides a compact and sparse description of the wireless propagation environment. To bridge the semantic gap between geometric information and wireless channels, we develop a coarse channel reconstruction mechanism that estimates channel parameters directly from VBS-derived geometric relationships. Based on the resulting coarse beamspace representation, we design a VBS-assisted orthogonal-pilot (VOP)-based partial beam training scheme to refine the coarse estimates with minimal online training overhead. Finally, to tackle the combinatorial beam selection problem and manage inter-user interference, we propose a hierarchical deep reinforcement learning framework, namely a dual-agent dueling double deep Q-network, for coordinated beam selection (DD3QN-CBS). Simulation results demonstrate consistent gains in both beam training efficiency and beam selection performance over heuristic and learning-based baselines.
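
The mirror-symmetry modelling of a dominant reflection path corresponds to the classical image method: the virtual base station is the base station mirrored across the facade plane, and the single-bounce path length equals the straight-line distance from that virtual base station to the user. A minimal sketch, assuming a planar facade given by a point and a normal vector; the function names are hypothetical, and a real implementation would also check that the reflection point lies on the finite facade.

```python
import numpy as np


def virtual_base_station(bs_pos, plane_point, plane_normal):
    """Mirror the base-station position across a facade plane (image method)."""
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    p = np.asarray(bs_pos, dtype=float)
    dist = np.dot(p - np.asarray(plane_point, dtype=float), n)   # signed distance to plane
    return p - 2.0 * dist * n


def reflected_path_length(bs_pos, ue_pos, plane_point, plane_normal):
    """Single-bounce path length, equal to the straight VBS-to-user distance."""
    vbs = virtual_base_station(bs_pos, plane_point, plane_normal)
    return float(np.linalg.norm(np.asarray(ue_pos, dtype=float) - vbs))
```

Angles of departure toward the reflected path follow from the same geometry, which is what lets a VBS database replace part of the exhaustive beam sweep.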


[17] 2606.26569

ISAC for Sea-Air Networks: Predictive Beam Tracking under Sea Induced Disturbances

In sea-air communication networks composed of an uncrewed aerial vehicle (UAV) and an uncrewed surface vehicle (USV), the extended-target characteristics and three-degree-of-freedom motion of the USV under sea-induced disturbances cause beam misalignment when the UAV tracks the USV. To address these issues, this paper proposes a predictive beam tracking scheme based on integrated sensing and communication (ISAC) for sea-air networks. We develop a wide/narrow beam switching scheme based on sub-array selection, in which a time allocation factor is optimized to balance robust state sensing in the wide beam mode against high-rate communication in the narrow beam mode. Specifically, the wide beam mode provides full USV coverage and state sensing, while the narrow beam mode exploits the estimated state for high-gain communication with the communication receiver (CR) mounted on the USV. To characterize the CR motion, a sea-air state evolution model is derived by jointly considering the surge, sway, and yaw of the USV and the sea-induced disturbances. For the extended-target USV, the measurement equation is constructed from multiple scatterer observations with the sea-clutter-induced measurement noise modeled, and an extended Kalman filter (EKF)-based CR state prediction and estimation method is developed. In addition, the effect of sea clutter on sensing accuracy is incorporated into the time allocation optimization problem to adjust the duration of the wide beam mode. Simulation results demonstrate that the proposed scheme achieves higher tracking accuracy than state-of-the-art benchmark schemes.


[18] 2606.26712

MLFFM-SegDiff: A Multi-Level Feature Fusion Diffusion Model for Skin Lesion Segmentation

Skin lesion segmentation is a key task in computer-aided dermatological diagnosis, where accuracy directly impacts downstream analysis and disease classification. However, dermoscopic images are challenging due to blurred boundaries, low contrast, large shape variations, and artifacts such as hair and shadows. Recently, diffusion models have shown strong performance in medical image segmentation thanks to their progressive denoising and distribution modeling capabilities. Nevertheless, existing diffusion-based methods still suffer from limited cross-level feature interaction and insufficient boundary detail recovery. To address these issues, we propose MLFFM-SegDiff, a multi-level feature fusion diffusion model for skin lesion segmentation. Built on a diffusion framework, the method introduces a dual-path U-Net encoder, a Multi-Level Feature Fusion Module (MLFFM), and a boundary-sensitive loss function. The dual-path encoder enhances interaction between noisy mask features and dermoscopic image features. MLFFM improves skip connections via attention, scale alignment, and adaptive cross-level fusion. These designs enable the decoder to jointly leverage shallow boundary cues and deep semantic representations, improving mask reconstruction quality. Experiments on ISIC2018, PH2, and HAM10000 demonstrate that MLFFM-SegDiff outperforms representative methods including DermoSegDiff, U-Net, and SwinUNETR across Accuracy, F1-score, Jaccard index, Recall, and Dice. In particular, it achieves an average Jaccard index of 0.8546 and Dice coefficient of 0.9207. These results validate the effectiveness of the proposed multi-level feature fusion strategy for improving lesion segmentation performance. The code will be released at this https URL after publication.
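
For reference, the Dice coefficient and Jaccard index reported above can be computed for binary masks as follows. This is a generic sketch; the smoothing constant `eps` is an assumption that makes two empty masks score 1.0. For binary pixel labels, the pixel-level F1-score coincides with Dice.

```python
import numpy as np


def dice_and_jaccard(pred, target, eps=1e-7):
    """Dice = 2|P & T| / (|P| + |T|), Jaccard = |P & T| / |P | T| for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (total + eps)
    jaccard = (inter + eps) / (union + eps)
    return float(dice), float(jaccard)
```

For a single mask the two scores are linked by Dice = 2J / (1 + J); averages over a test set need not satisfy this exactly.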


[19] 2606.26716

Dual-Prior Guided Null-Space Learning with Mixture-of-Splines for Arbitrary Medical Slice Super-Resolution

Arbitrary slice super-resolution reconstructs isotropic volumes from anisotropic clinical acquisitions by synthesizing intermediate slices at arbitrary scales. However, treating this ill-posed inverse problem as unconstrained residual-based regression risks hallucinating anatomically implausible structures or altering the originally observed data. To address both concerns, this paper presents the Dual-Prior Null-space Learning (DP-NSL) framework, which reformulates the task as a constrained recovery process guided by two complementary priors. A Measurement-Consistent Projection (MCP) enforces a Deterministic Observation Prior: the reconstruction undergoes an exact orthogonal projection that reproduces every acquired slice with zero error, confining all learned details to the unobservable null space. Within this null space, a Mixture-of-Splines (MoS) module imposes a Geometric Continuity Prior by dynamically mixing B-spline experts of different analytic orders, allowing each anatomical region to be modeled with a content-aware level of continuity. To promote spatial coherence, a Local Spatial Consistency Decoder (LSCD) further injects a local inductive bias. Experiments on three CT benchmarks and one MRI benchmark show that DP-NSL outperforms existing approaches while strictly preserving measurement consistency. Code is available at this https URL.
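
The Measurement-Consistent Projection can be illustrated for the case where the forward operator A simply selects the acquired slices, assuming x_hat = pinv(A) y + (I - pinv(A) A) x_net. For slice selection, pinv(A) equals the transpose of A, so the range-space term writes the acquired slices back exactly and the null-space term keeps the network output only at unobserved positions. The sketch checks the explicit matrix form against the equivalent indexing shortcut; the names are hypothetical, and the MoS and LSCD modules are not modelled.

```python
import numpy as np


def selection_matrix(observed_idx, n):
    """Forward operator A that picks the acquired slices out of n slice positions."""
    A = np.zeros((len(observed_idx), n))
    A[np.arange(len(observed_idx)), observed_idx] = 1.0
    return A


def project_explicit(x_net, y_obs, A):
    """x_hat = pinv(A) y + (I - pinv(A) A) x_net."""
    A_pinv = np.linalg.pinv(A)
    return A_pinv @ y_obs + (np.eye(A.shape[1]) - A_pinv @ A) @ x_net


def project_fast(x_net, y_obs, observed_idx):
    """Same projection for slice selection: overwrite acquired slices, keep the rest."""
    x_hat = np.array(x_net, dtype=float)   # copy of the network output
    x_hat[observed_idx] = y_obs            # range-space part: exact measurements
    return x_hat


rng = np.random.default_rng(0)
n, idx = 9, np.array([0, 4, 8])            # 9 slice positions, 3 acquired (hypothetical)
x_net = rng.normal(size=n)                 # stand-in for the network's dense estimate
y_obs = rng.normal(size=idx.size)          # stand-in for the acquired slices
x_hat = project_fast(x_net, y_obs, idx)
```

`project_fast` also works unchanged on a volume of shape (slices, H, W), since indexing acts on the first axis.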


[20] 2606.26723

State-Specific Respiratory Signatures for Affective and Stress Recognition: Interpretable Respiratory Markers, Autocorrelation Lags, and Compact CNN Models

Respiratory activity is a direct and interpretable physiological channel for wearable stress and affective-state recognition, yet many studies emphasize classification accuracy without identifying which respiratory properties separate different states. This work reframes RESP-based recognition as a joint predictive and explanatory problem. Using the chest respiratory channel of the WESAD dataset, we analyze 60 s windows under leave-one-subject-out validation and combine two complementary branches: compact raw-signal one-dimensional convolutional neural networks (1D-CNNs) and physically grouped handcrafted respiratory signatures. The primary application task is binary stress versus non-stress detection, while baseline, stress, amusement, and meditation are additionally analyzed in a one-vs-rest setting to reveal state-specific respiratory markers. The feature space is organized into respiratory timing, breath-to-breath variability, waveform statistics, spectral/time-frequency descriptors, and autocorrelation/nonlinear predictability descriptors, with the raw 60 s signal treated as a sixth representation for the CNN branch. We introduce autocorrelation transition lags (Zpm/Zmp) as interpretable markers of respiratory correlation scale and separately evaluate exploratory FEG-Pro/Lyapunov-like descriptors. In the final CNN refit setting, the raw-signal model achieved the strongest stress-vs-rest performance, with an accuracy of 96.72 percent, a macro-F1 of 95.30 percent, and an MCC of 90.61 percent. In contrast, compact feature models were stronger for baseline (MCC 65.34 percent), amusement (MCC 35.69 percent), and especially meditation (MCC 88.65 percent). These results show that CNNs are most useful for the practical stress detector, whereas interpretable respiratory signatures provide stronger and more physiologically transparent state-specific markers for several non-stress conditions.
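
The abstract does not define Zpm and Zmp; one plausible reading, used here purely as an assumption, is the first lag at which the autocorrelation turns from positive to negative and the next lag at which it returns to positive. A minimal sketch on a synthetic breathing-like sinusoid with a 100-sample period:

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r(0..max_lag) of a mean-removed signal."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom for k in range(max_lag + 1)])


def transition_lags(x, max_lag):
    """Hypothetical Zpm/Zmp: first lag with r < 0, then first later lag with r > 0."""
    r = autocorrelation(x, max_lag)
    negative = np.flatnonzero(r < 0)
    if negative.size == 0:
        return None, None
    zpm = int(negative[0])
    positive = np.flatnonzero(r[zpm:] > 0)
    zmp = zpm + int(positive[0]) if positive.size else None
    return zpm, zmp


x = np.sin(2 * np.pi * np.arange(1000) / 100)   # synthetic breathing, 100-sample period
zpm, zmp = transition_lags(x, max_lag=150)       # near a quarter and three quarters of a period
```

Dividing the lags by the sampling rate converts them to seconds, which makes them directly comparable across recordings.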


[21] 2606.26724

Distribution Network Congestion Management via Strategic Aggregator Intervention in Local Energy Markets

High penetration of distributed energy resources increasingly creates congestion in low-voltage distribution networks, while local energy markets (LEMs) optimise community welfare without explicitly internalising network constraints. This paper investigates whether a profit-seeking aggregator embedded within a welfare-oriented LEM can partially internalise distribution-level congestion through market participation. We develop a post-clearing, price-protected intervention in which the aggregator injects additional supply and triggers re-clearing, with network feasibility validated using nonlinear AC power flow subject to a non-deterioration constraint on maximum line loading. The mechanism is benchmarked against Distribution System Operator (DSO)-only corrective control and a hybrid regime with residual DSO action following aggregator intervention. Results on a UK LV feeder show that aggregator participation reduces thermal loading and preserves community welfare relative to DSO-only control, though it does not fully restore compliance under severe stress. The hybrid regime achieves the strongest technical performance while maintaining lower welfare loss. Overall, aggregator intervention remains privately profitable, indicating partial incentive alignment.
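
A toy uniform-price, merit-order clearing illustrates the re-clearing step in which the aggregator injects additional supply. Everything here is a simplifying assumption: bids and offers are (price, quantity) pairs, the clearing price is taken as the midpoint of the last matched bid and offer, and neither network constraints, the AC power-flow check, nor the price-protection rule are modelled.

```python
def clear_market(bids, offers, tol=1e-12):
    """Merit-order clearing of (price, quantity) bids and offers.

    Returns (traded_quantity, clearing_price, welfare); the price is the
    midpoint of the last matched bid and offer (a hypothetical convention).
    """
    bids = [list(b) for b in sorted(bids, key=lambda b: -b[0])]      # highest bid first
    offers = [list(o) for o in sorted(offers, key=lambda o: o[0])]   # cheapest offer first
    i = j = 0
    traded = welfare = 0.0
    price = None
    while i < len(bids) and j < len(offers) and bids[i][0] >= offers[j][0]:
        q = min(bids[i][1], offers[j][1])
        traded += q
        welfare += (bids[i][0] - offers[j][0]) * q
        price = (bids[i][0] + offers[j][0]) / 2.0
        bids[i][1] -= q
        offers[j][1] -= q
        if bids[i][1] <= tol:
            i += 1
        if offers[j][1] <= tol:
            j += 1
    return traded, price, welfare


bids = [(0.30, 5), (0.25, 5), (0.15, 5)]          # buyers: price per kWh, kWh (hypothetical)
offers = [(0.10, 4), (0.20, 4), (0.28, 4)]        # community sellers
before = clear_market(bids, offers)
after = clear_market(bids, offers + [(0.12, 3)])  # aggregator injects 3 kWh and re-clears
```

In the mechanism described above, such a re-cleared outcome would only be accepted if the AC power-flow check shows that maximum line loading does not deteriorate.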


[22] 2606.26842

voxmap-studio: An open-source speaker diarization annotation tool with built-in cost instrumentation

Labeling speaker diarization data is costly, yet annotation tools rarely measure that cost. We present voxmap-studio, an open-source, React-based diarization annotation tool integrated with the pyannote-based diarization ecosystem. Its canvas is initialized by a fast stride-accelerated diarization engine so that the annotator corrects a hypothesis rather than drawing every speaker turn by hand, and the tool records annotation cost (typed edit-operation counts and time) as a first-class output, enabling quantitative comparison of how much different forms of assistance actually help. Export is gated on per-segment human confirmation and guarded by injected "phantom" attention checks, which prevent unverified automatic output from being released as ground truth. In a preliminary study on nine AMI audio files, unassisted manual annotation was the costliest and least accurate, and automatic initialization shifted the work from creating turns to correcting them; highlighting uncertain segments gave the lowest cost in our small sample. The tool and its instrumentation are open source.


[23] 2606.26896

Collision-resistant multi-channel M-ASPM configurations with shared single detection channel

M-ary Aggregate Spread Pulse Modulation (M-ASPM) is a physical layer (PHY) modulation technique that offers several advantages for low-power wide-area networks (LPWANs). In conventional LPWAN modulations, increasing receiver sensitivity by extending the symbol duration proportionally increases the time-on-air (ToA) and therefore exacerbates collision exposure. In contrast, the M-ASPM payload processing gain can vary over a wide range without impacting the effective packet collision rate. In this work, we demonstrate how short front portions of M-ASPM packets can serve as a separate collision-resistant detection channel that, in addition to performing asynchronous packet detection and synchronization, obtains the carrier frequency offset (CFO) for each packet within a desired range and with the required precision. The subsequent payload information can then be extracted at a higher processing gain without expanding the sample window per symbol. Consequently, the receiver sensitivity can be significantly increased without exacerbating packet collisions and thus without reducing network throughput under collision-limited operation. We further establish a multi-channel configuration in which numerous quasi-orthogonal payload channels share a single detection channel that additionally performs payload channel identification and selection. Such sharing is especially useful for scaling and economizing LPWAN deployments under diverse technical requirements and constraints. The presented analysis is validated via extensive simulations under high packet collision rates over wide ranges of payload sizes and processing gains, and for varying noise and interference power levels. The results indicate that M-ASPM provides a structurally distinct scaling behavior compared to conventional LPWAN modulations, decoupling range extension from collision-induced throughput degradation.
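
To make the conventional trade-off concrete, the sketch below uses the textbook pure-ALOHA model (Poisson packet arrivals, no carrier sensing). This model is an assumption chosen for illustration and is not the M-ASPM analysis. A packet survives only if no other packet starts within a vulnerable window of twice its time-on-air, so stretching symbols to gain sensitivity directly lowers the success probability.

```python
import math

def pure_aloha_success(arrival_rate_hz, time_on_air_s):
    """P(no overlap) for an unslotted packet when other packets arrive as a Poisson process.
    The vulnerable window is 2 * ToA, so P = exp(-2 * rate * ToA)."""
    return math.exp(-2.0 * arrival_rate_hz * time_on_air_s)

rate = 1.0  # hypothetical network-wide arrival rate, packets per second
for toa in (0.1, 0.4):  # 4x longer ToA, e.g. from 4x longer symbols
    print(toa, round(pure_aloha_success(rate, toa), 3))
# prints "0.1 0.819" then "0.4 0.449"
```

In this model the sensitivity gained from longer symbols is paid for in collisions; the M-ASPM scheme described above instead aims to raise payload processing gain without lengthening the collision-relevant window.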


[24] 2606.26903

DNSMOS-C: Improving End-to-end Speech Quality Models via Contrastive Learning

We introduce DNSMOS-C, a compact end-to-end speech quality assessment model that extends the DNSMOS Pro framework by integrating a MOS-guided triplet-based contrastive loss. Applied directly to the intermediate embeddings, this contrastive supervision encourages the latent space to be better organized with respect to perceptual quality while preserving the simplicity and efficiency of DNSMOS Pro. Unlike prior methods that depend on large pre-trained self-supervised learning (SSL) encoders and multi-stage training, DNSMOS-C jointly learns speech representations and MOS regression within a single, unified framework. Experiments on multiple datasets show that DNSMOS-C consistently improves correlation metrics over DNSMOS Pro and achieves better generalization on challenging out-of-domain test sets. Furthermore, latent space analyses indicate that our approach learns representations that exhibit an emergent low-dimensional quality ordering, which enhances interpretability and improves training stability. These findings demonstrate that MOS-guided contrastive learning enables more robust and accurate quality predictions without incurring additional computational overhead.
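
A minimal sketch of a MOS-guided triplet loss on fixed embeddings follows, written in plain Python. The mining rule (positive = clip with the closest MOS to the anchor, negative = clip with the most distant MOS), the Euclidean distance, and the margin are all assumptions; DNSMOS-C's actual triplet construction and loss weighting are not specified here.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mos_triplet_loss(embeddings, mos, margin=0.2):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin) over every anchor a in a batch.
    Assumed mining rule: p = clip whose MOS is closest to a's, n = clip whose MOS is farthest."""
    total = 0.0
    for a in range(len(embeddings)):
        others = [j for j in range(len(embeddings)) if j != a]
        p = min(others, key=lambda j: abs(mos[j] - mos[a]))
        n = max(others, key=lambda j: abs(mos[j] - mos[a]))
        d_pos = euclidean(embeddings[a], embeddings[p])
        d_neg = euclidean(embeddings[a], embeddings[n])
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(embeddings)

mos = [1.0, 2.0, 3.0, 4.0]
ordered = [[1.0], [2.0], [3.0], [4.0]]    # embedding order follows MOS order
scrambled = [[0.0], [3.0], [0.1], [3.1]]  # low- and high-MOS clips interleaved
print(mos_triplet_loss(ordered, mos), mos_triplet_loss(scrambled, mos) > 0)  # prints 0.0 True
```

In training, such a term would presumably be added to the MOS regression loss with some weighting factor, so that the embedding space is pushed toward a quality ordering while the regression head is learned jointly.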


[25] 2606.26920

When the Timetable Breaks: Physics-Anchored Scientific Machine Learning for Cold-Wave-Robust Battery-Electric Bus Operations

Cold-climate transit agencies are electrifying fixed-timetable fleets, but winter exposes a block-level failure mode hidden by seasonal energy margins: cabin heating can deplete batteries faster than layovers recharge them, causing later trips to start undercharged and making one cold day cascade into timetable infeasibility. We present WeatherRobustBus, an open-data framework that converts this risk into block-level failure probability by injecting real hourly weather into real transit duties and propagating cold-weather energy uncertainty. The framework couples a transparent traction and cabin-thermal backbone with a bounded monotone residual ensemble, and validates cabin heating against an independent EnergyPlus bus-cabin simulation driven by the same Toronto weather record. Against this first-principles reference, it achieves the lowest all-year error (0.213 kWh RMSE over 8760 hours) and remains reliable in the out-of-support cold tail ($T \le -12^\circ$C), where pure machine-learning baselines degrade by 1.5--4x and the best competitor reaches only 1.055 kWh. Embedded in a Monte Carlo block-feasibility simulator over 60 real Toronto TTC vehicle blocks, the model reveals a sharp weather-induced failure envelope. A forecast-triggered robust policy combining opportunity charging, a fuel-fired cabin-heating bridge, and modest buffering reduces mean cold-wave failure probability from 0.759 to 0.112 across eight cold-wave days; a deconfounded ablation shows opportunity charging is the dominant lever and the heater is a low-cost complement. WeatherRobustBus provides a reproducible pathway from weather data to winter-resilience decisions for electric-bus fleets.
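
The block-level cascade can be illustrated with a small Monte Carlo sketch. All numbers (battery capacity, reserve, trip energies, the Normal energy-uncertainty model, layover length, and charger power) are hypothetical placeholders, and the sketch omits WeatherRobustBus's traction/cabin-thermal backbone and residual ensemble.

```python
import random

def block_failure_probability(trips, capacity_kwh, reserve_kwh, charger_kw,
                              energy_cv=0.15, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(battery energy falls below a reserve at some point in a block).

    trips: list of (mean_trip_energy_kwh, layover_hours) in service order.
    Each trip's energy is drawn as Normal(mean, energy_cv * mean), clipped at zero, and the
    layover after a trip adds charger_kw * layover_hours, capped at capacity_kwh.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        energy = capacity_kwh  # every sampled block starts fully charged
        for mean_kwh, layover_h in trips:
            energy -= max(0.0, rng.gauss(mean_kwh, energy_cv * mean_kwh))
            if energy < reserve_kwh:  # trip ended below reserve: the block fails
                failures += 1
                break
            energy = min(capacity_kwh, energy + charger_kw * layover_h)  # opportunity charging
    return failures / n_samples

mild = [(40.0, 0.25)] * 8  # hypothetical: 8 trips of 40 kWh with 15 min layovers
cold = [(55.0, 0.25)] * 8  # same block with cabin heating raising trip energy
print(block_failure_probability(mild, 400.0, 60.0, charger_kw=0.0))
print(block_failure_probability(cold, 400.0, 60.0, charger_kw=0.0))
print(block_failure_probability(cold, 400.0, 60.0, charger_kw=150.0))
```

Carrying the state of charge from trip to trip is what lets one cold day cascade: a deficit on early trips is only partly recovered during layovers, so later trips start undercharged, and layover charging is the lever that breaks the chain.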


[26] 2606.26932

Battery thermal-safety reserve erosion by mandatory cabin ventilation in shared-cooling electric vehicles

Hot-weather electric-vehicle thermal management is no longer a separate cabin and battery problem. A single climate system must cool the traction battery, maintain passenger comfort, and admit outdoor air for cabin air quality, while high ambient temperature and solar load derate the compressor serving all three demands. We identify fresh-air ventilation as a hidden battery-safety load: on a derated shared cooling loop, the compliant fresh-air floor consumes finite cabin-side cooling capacity and removes residual cooling reserve from the battery. In a 40 $^\circ$C, 800 W m$^{-2}$, 150 kW event, raising the fresh-air floor from 0.30 to 0.43 lowers peak cabin CO$_2$ from 1219 to 978 ppm, but raises peak battery temperature from 39.96 to 40.02 $^\circ$C and reduces the battery cooling bus from 575 to 529 W. We develop a reserve-aware predictive controller combining a physics-guided scientific-machine-learning surrogate, grid-connected departure thermal reserve, air-quality-priced ventilation allocation, and dual control-barrier-function projections for battery temperature and operative comfort. The controller holds the pack at 39.73 $^\circ$C, caps peak CO$_2$ at 895 ppm, keeps operative-temperature RMSE at 0.82 $^\circ$C, and uses 20.0\% less drive cooling energy than fixed maximum-compressor operation; ablations show that removing either barrier, under-ventilating, or removing departure reserve breaks joint feasibility. Evidence comes from NASA POWER records, KU Leuven BEV BMS data merged with NASA POWER weather, 45 $^\circ$C GOTION aging data, 40 $^\circ$C high-power NMC thermal identification, EnergyPlus cabin cross-checks, and OpenModelica/FMI replay. Treating fresh air as a battery thermal-reserve variable creates an actionable path toward EV thermal management that protects battery life, occupant health, comfort, and efficiency in one shared loop.
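
The barrier-function projection can be sketched for a single battery-temperature barrier. The first-order thermal model, its parameters, the decay rate `gamma`, and the unit-normalised cooling command are assumptions; the controller described above uses dual barriers (battery temperature and operative comfort) on a learned surrogate, which this sketch does not reproduce.

```python
def cbf_filter(u_nominal, temp_c, temp_max_c, heat_rate, cool_gain, dt, gamma=0.2, u_max=1.0):
    """Minimally modify a nominal cooling command u in [0, u_max] so a discrete-time CBF holds.

    Assumed model: T_next = T + dt * (heat_rate - cool_gain * u)   [degC per step]
    Barrier:       h(T) = temp_max_c - T, enforced as h(T_next) >= (1 - gamma) * h(T).
    """
    # Solve the barrier inequality for u under the linear model above
    u_required = (temp_c + dt * heat_rate - temp_max_c
                  + (1.0 - gamma) * (temp_max_c - temp_c)) / (dt * cool_gain)
    # In one dimension the projection onto {u >= u_required} is a max; then saturate
    return min(u_max, max(0.0, u_nominal, u_required))

print(cbf_filter(0.1, 39.9, 40.0, heat_rate=0.5, cool_gain=1.0, dt=1.0))  # raised to about 0.48
print(cbf_filter(0.1, 30.0, 40.0, heat_rate=0.5, cool_gain=1.0, dt=1.0))  # far from limit: 0.1 kept
```

If `u_required` exceeds `u_max`, the saturated command can no longer guarantee the barrier; this is the situation the text describes when mandatory ventilation consumes the shared loop's cooling capacity.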


[27] 2606.26939

Distributed Massive MIMO with 1-Bit Radio-over-Fiber Fronthaul: Uplink Spectral Efficiency and Power Control

We analyze the uplink spectral efficiency achievable in a distributed multiple-input multiple-output (D-MIMO) architecture employing a 1-bit radio-over-fiber fronthaul. This architecture eliminates the need for local oscillators at the access points, hence enabling coherent-phase transmission without costly over-the-air synchronization. With this fronthaul architecture, the uplink signal at the central processing unit is a dithered, oversampled, and 1-bit quantized version of the passband signal received at the access points. This makes some of the conventional spectral-efficiency expressions used in the D-MIMO literature not directly applicable for two key reasons: the nonlinearity of the input-output relation and the practical unavailability of minimum mean square error (MMSE) channel estimates. To address this issue, we propose novel achievable-rate expressions that do not require MMSE channel estimates and rely on the Bussgang decomposition to linearize the input-output relation. We use these expressions to determine the optimal signal-to-dither ratio (SDR) that maximizes the achievable rates in both single- and multiuser scenarios and to assess the impact of oversampling. We then use one of the proposed achievable-rate expressions to investigate the max-min fairness problem when the access points cannot maintain the optimal SDR because of limitations in their dynamic range.
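
The Bussgang decomposition can be checked numerically in its simplest setting: a zero-mean real Gaussian input to an ideal sign quantizer, for which y = B x + d with d uncorrelated with x and B = sqrt(2/pi)/sigma. This sketch ignores the dither, oversampling, and passband structure of the 1-bit radio-over-fiber link, so it is only a sanity check of the linearization idea.

```python
import math
import random

def bussgang_gain_mc(sigma, n=100_000, seed=1):
    """Monte Carlo estimate of B = E[x q(x)] / E[x^2] for q(x) = sign(x), x ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        q = 1.0 if x >= 0.0 else -1.0
        num += x * q
        den += x * x
    return num / den

def bussgang_gain_theory(sigma):
    """Closed form for the sign quantizer with zero-mean real Gaussian input: sqrt(2/pi) / sigma."""
    return math.sqrt(2.0 / math.pi) / sigma

for s in (0.5, 2.0):
    print(s, round(bussgang_gain_mc(s), 3), round(bussgang_gain_theory(s), 3))
```

Choosing B this way makes the distortion d = y - B x uncorrelated with x by construction, which is what allows achievable-rate bounds to treat d as an additional noise term.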


[28] 2606.26957

Angle Estimation via WFRFT Spatial-Domain Basis Decomposition: Breaking the Rayleigh Resolution Limit with Structured Waveform Diversity

We propose a MIMO radar angle estimation framework that uses the four-component weighted-type fractional Fourier transform (4-WFRFT) as a spatial-domain waveform diversity mechanism. Unlike conventional fractional Fourier transform (FrFT) MIMO radar, where the FrFT serves as a receiver-side time-frequency processing tool, our approach decomposes a data sequence into four WFRFT basis functions (the original signal, its Fourier transform, its time reversal, and its inverse Fourier transform) and transmits them simultaneously from a four-element uniform linear array. The spatial superposition of these basis functions at each far-field angle creates a unique angle-dependent waveform structure, enabling angle estimation through time-domain matched filtering with known waveforms. We demonstrate that this spatial-domain mixing achieves angular resolution surpassing the Rayleigh diffraction limit by a factor of 1.4$\times$ to 12.8$\times$, with the advantage most pronounced at low SNR where conventional beamforming fails completely. The Cramér-Rao bound is derived with a full 3-parameter Fisher information matrix, and the Fisher information is decomposed into geometry and waveform contributions, revealing that the WFRFT waveform structure contributes approximately 3$\times$ more information than array geometry alone. Extension to $M$-element arrays with $M$-component WFRFT demonstrates resolution gain scaling with array size. Simulations with linear chirp base sequences achieve 0\,dB PAPR and validate sub-Rayleigh resolution with a four-element array.
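
The four 4-WFRFT basis components can be generated with a unitary DFT F: x, F x, F^2 x (which for a unitary DFT equals circular time reversal, x[-n mod N]), and F^3 x (which equals the inverse DFT because F^4 is the identity). The plain-Python sketch below builds them and forms the far-field superposition for a four-element array. The half-wavelength spacing and the one-basis-function-per-element mapping are assumptions, and the matched-filter estimator itself is not shown.

```python
import cmath
import math

def dft(x, inverse=False):
    """Unitary DFT of a complex sequence (inverse=True gives the unitary inverse DFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n)) / math.sqrt(n)
            for k in range(n)]

def wfrft_basis(x):
    """Return [x, F x, F^2 x, F^3 x] for the unitary DFT F."""
    f1 = dft(x)
    f2 = dft(f1)
    f3 = dft(f2)
    return [list(x), f1, f2, f3]

def far_field_waveform(basis, theta_rad):
    """Sum seen at angle theta when element m (half-wavelength ULA, assumed) transmits basis[m]."""
    steer = [cmath.exp(1j * math.pi * m * math.sin(theta_rad)) for m in range(len(basis))]
    return [sum(steer[m] * basis[m][t] for m in range(len(basis))) for t in range(len(basis[0]))]

x = [1.0, 2 + 1j, -0.5, 3j, 0.25]
b = wfrft_basis(x)
print(max(abs(b[2][n] - x[-n % len(x)]) for n in range(len(x))))   # ~0: F^2 is time reversal
print(max(abs(u - v) for u, v in zip(b[3], dft(x, inverse=True))))  # ~0: F^3 is the inverse DFT
```

Evaluating `far_field_waveform(b, theta)` over a grid of angles yields the angle-dependent waveform library against which a received signal would be matched.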


[29] 2606.26991

Enabling self-supervised learned primal dual with Noise2Inverse

X-ray computed tomography reconstruction is an ill-posed inverse problem, particularly in low-dose and sparse-angle settings where measurements are noisy and incomplete. While learned reconstruction methods such as the Learned Primal-Dual algorithm achieve strong performance, they typically rely on supervised training with access to ground-truth data, which is often unavailable in practice. In this work, we propose a self-supervised reconstruction method by extending the Noise2Inverse framework to the Learned Primal-Dual algorithm. The resulting approach, called Noise2Inverse Learned Primal-Dual (N2I-LPD), enables training of a learned iterative reconstruction operator without ground-truth images by exploiting the statistical independence of the noise in measurements taken at different projection angles of the CT scan. We compare the proposed method with classical reconstruction methods, as well as neural network-based approaches such as a U-Net trained within the same N2I framework. The results demonstrate that N2I-LPD achieves improved reconstruction quality, highlighting the potential of combining learned reconstruction operators with self-supervised training strategies for practical CT imaging scenarios where ground-truth data is unavailable.
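
The Noise2Inverse data split can be sketched independently of the reconstruction operator. Projection angles are partitioned into K disjoint subsets, and each training pair uses data from some subsets as input and a held-out subset as target, so the target's noise is independent of the input's. The interleaved split and the "all others to one" pairing below are assumptions (one common choice); how N2I-LPD feeds sub-sinograms into the unrolled primal-dual network is not shown.

```python
def split_angles(num_angles, k):
    """Partition projection-angle indices into k interleaved subsets: angle i goes to subset i % k."""
    return [list(range(j, num_angles, k)) for j in range(k)]

def n2i_pairs(k):
    """For each held-out subset j: input built from the other k - 1 subsets, target from subset j."""
    return [([i for i in range(k) if i != j], j) for j in range(k)]

print(split_angles(8, 4))  # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(n2i_pairs(3))        # [([1, 2], 0), ([0, 2], 1), ([0, 1], 2)]
```

Because the target subset shares no projection angles with the input subsets, its zero-mean noise is independent of the input, so a squared-error loss against these noisy targets has approximately the same minimiser in expectation as one against clean targets.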


[30] 2606.27002

Inverse Design of Compact and Wideband Inverted Doherty Power Amplifiers Using Deep Learning

This paper presents a deep learning-assisted methodology for the inverse synthesis of a compact, wideband inverted Doherty power amplifier (PA). Convolutional neural networks (CNNs) and genetic algorithms (GAs) are jointly employed to generate pixelated Doherty combiner networks that integrate load modulation, impedance matching, power combining, and phase compensation into a single structure. As a proof of concept, we design and fabricate a GaN HEMT Doherty PA with a pixelated output combiner. The prototype achieves a measured peak drain efficiency of 51%-63% and a 6-dB back-off efficiency of 48%-54% over 1.9-2.5 GHz. Within the same frequency range, the measured output power is 44 ± 0.3 dBm. Furthermore, with digital predistortion (DPD) applied, the prototype circuit demonstrates an adjacent channel leakage ratio (ACLR) better than -53.2 dBc.
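
The genetic-algorithm half of such an inverse-design loop can be sketched as an elitist GA over binary pixel masks. Here the fitness is a hypothetical stand-in (agreement with an arbitrary striped 8x8 target pattern) in place of the CNN-predicted combiner performance, and the population size, tournament size, and mutation rate are illustrative choices rather than the paper's settings.

```python
import random

def genetic_search(fitness, n_bits, pop_size=40, generations=60, p_mut=0.02, seed=0):
    """Elitist GA over binary pixel masks (1 = metal, 0 = no metal, by assumption).
    `fitness` is a stand-in for a CNN-predicted figure of merit; higher is better."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:2]  # elitism: carry the two best masks over unchanged
        while len(nxt) < pop_size:
            pa = max(rng.sample(ranked, 3), key=fitness)  # tournament selection (size 3)
            pb = max(rng.sample(ranked, 3), key=fitness)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(pa, pb)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]        # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Hypothetical stand-in objective: agreement with a striped 8x8 target pattern
target = [1, 0] * 32

def agreement(mask):
    return sum(m == t for m, t in zip(mask, target))

best = genetic_search(agreement, n_bits=64)
print(agreement(best), "of 64 pixels match")
```

In a surrogate-assisted flow, replacing `agreement` with a trained network's predicted score is what keeps each of the many fitness evaluations cheap compared with a full electromagnetic simulation.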


[31] 2606.27039

Scalability of Morality: A Particle-Based Numerical Study on the Decoupling of Law and Ethics in Large-Scale Populations

This study introduces a particle-based computational framework to investigate the scalability of morality and the systemic decoupling of formal law from decentralized social ethics in expanding populations. While micro-societies reinforce ethical conduct through local reciprocity, macroscale systems introduce anonymity that strains agents' limited memory. We model individual agents as discrete particles with finite memory capacities ($L$) and dynamically evolving, stochastic choice profiles ($\mu$) regulated by non-linear social pressure switches. Monte Carlo ensemble simulations demonstrate a distinct, non-linear phase transition as the population grows ($N \to \infty$). When the population size greatly exceeds the memory capacity ($N \gg L$), the local re-encounter probability drops as $\mathcal{O}(L/N)$. This structural dilution neutralizes decentralized peer-to-peer accountability, causing global behavioral norms to decouple from moral baselines and drift toward a minimalist legal floor. Furthermore, cyclic scale experiments expose a prominent, path-dependent hysteresis loop, mathematically formalizing the non-Markovian inertia and irreversible nature of moral decay in self-organizing social systems.
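
The O(L/N) scaling of the re-encounter probability can be checked with a stripped-down Monte Carlo sketch in which a focal agent meets uniformly random partners and remembers only its last L distinct partners. Uniform random mixing and least-recently-met forgetting are assumptions; the choice profiles and social-pressure switches of the full model are not represented. In steady state the memory holds L of the N - 1 other agents, so the hit rate approaches L/(N - 1).

```python
import random
from collections import deque

def reencounter_rate(n_agents, memory, steps=20_000, seed=0):
    """Fraction of random encounters in which the partner is already in the focal agent's
    memory of its last `memory` distinct partners (uniform random mixing assumed)."""
    rng = random.Random(seed)
    recent = deque()
    hits = 0
    for _ in range(steps):
        partner = rng.randrange(1, n_agents)  # agents 1..N-1; agent 0 is the focal agent
        if partner in recent:
            hits += 1
            recent.remove(partner)  # refresh: move to most-recent position below
        elif len(recent) == memory:
            recent.popleft()        # forget the least recently met partner
        recent.append(partner)
    return hits / steps

for n in (101, 1001):
    print(n, reencounter_rate(n, memory=10), 10 / (n - 1))
```

A tenfold larger population with the same memory cuts the re-encounter rate roughly tenfold, which is the dilution of peer accountability the study describes.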


[32] 2606.27042

Low Complexity Kolmogorov-Arnold Network-based DPD for Analog RoF Fronthaul

This paper proposes and demonstrates experimentally for the first time a Kolmogorov-Arnold Network (KAN)-based digital predistortion (DPD) model, named envelope time-delay KAN (ETDKAN), for mitigating nonlinear distortions in analog radio-over-fiber (A-RoF) systems. The ETDKAN model incorporates physical constraints of radio-frequency (RF) nonlinear devices and, through KAN symbolization, achieves a significant reduction in computational complexity while improving interpretability. The proposed model is numerically implemented and optimized alongside multilayer perceptron (MLP) and memory-polynomial-based DPDs. Results show that the resulting symbolic ETDKAN (symbETDKAN) attains ACLR and EVM performance comparable to neural network-based models, while maintaining a computational complexity close to that of memory polynomials. Experimental validation using an A-RoF system confirms the practical feasibility of the proposed approach, which resulted in a 4-5 dB reduction in ACLR in the analyzed scenario.
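
For reference, the memory-polynomial baseline models y(n) = sum over k = 1..K and m = 0..M of a[k,m] x(n-m) |x(n-m)|^(k-1). The plain-Python sketch below builds that regressor matrix and fits the coefficients by least squares; in indirect-learning DPD, the same fit is run from the gain-normalised PA output back to the PA input to obtain a predistorter. The order K, memory depth M, and synthetic coefficients are hypothetical, and ETDKAN itself is not shown.

```python
import random

def mp_regressors(x, K, M):
    """Rows phi(n) = [x(n-m) |x(n-m)|^(k-1) for k in 1..K for m in 0..M]; samples before n=0 are 0."""
    rows = []
    for n in range(len(x)):
        row = []
        for k in range(1, K + 1):
            for m in range(M + 1):
                v = x[n - m] if n - m >= 0 else 0j
                row.append(v * abs(v) ** (k - 1))
        rows.append(row)
    return rows

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense complex system A a = b."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    a = [0j] * n
    for r in range(n - 1, -1, -1):
        a[r] = (aug[r][n] - sum(aug[r][j] * a[j] for j in range(r + 1, n))) / aug[r][r]
    return a

def fit_memory_polynomial(x, y, K, M):
    """Least-squares coefficients from the normal equations (Phi^H Phi) a = Phi^H y."""
    phi = mp_regressors(x, K, M)
    P = len(phi[0])
    A = [[sum(row[i].conjugate() * row[j] for row in phi) for j in range(P)] for i in range(P)]
    b = [sum(row[i].conjugate() * yn for row, yn in zip(phi, y)) for i in range(P)]
    return solve(A, b)

# Synthetic check: generate data from known coefficients (K=3, M=1) and recover them
rng = random.Random(0)
x = [complex(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
true = [1.0, 0.1j, -0.05, 0.02, 0.01j, -0.005]
y = [sum(c * v for c, v in zip(true, row)) for row in mp_regressors(x, 3, 1)]
est = fit_memory_polynomial(x, y, 3, 1)
print(max(abs(e - t) for e, t in zip(est, true)))  # close to 0 for noise-free data
```

The cost per output sample of this baseline is one multiply-accumulate per coefficient (plus the envelope powers), which is the complexity reference the symbolic ETDKAN is compared against.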


[33] 2606.27157

Single-Base-Station Indoor Localization via Super-Resolved Relative Power Delay Profiles

Indoor multipath is shaped by surrounding reflectors, scatterers, and blockages, so a relative power-delay profile (PDP) can serve as a location fingerprint without an identifiable LoS path, angle information, or absolute time-of-arrival ranging. However, a communication receiver observes finitely many noisy pilot-frequency samples rather than an ideal PDP. This paper models the resulting Dirichlet blur, delay folding, and off-grid mismatch, and reconstructs a posterior-power profile using expectation-maximization sparse Bayesian learning. In spatially consistent QuaDRiGa simulations at 20 dB SNR, twofold SBL raises Top-1 accuracy from 75.79\% (native PDP) and 87.24\% (threefold zero-padding) to 93.27\%, with 0.392~m mean error.
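
The Dirichlet blur and the limited benefit of zero-padding can be reproduced in a few lines of plain Python. A single noise-free path whose delay lies halfway between native delay bins produces a peak of only about 0.41 of its true power on the native grid, whereas a twofold zero-padded grid happens to contain that delay exactly. The single-path, noise-free setup is an assumption for illustration; noise, delay folding, and the paper's EM-SBL reconstruction are not modelled.

```python
import cmath
import math

def pilots_single_path(n, delay_bins, gain=1.0):
    """Noise-free frequency samples H[k] = gain * exp(-j 2 pi k delay / N), delay in native bins."""
    return [gain * cmath.exp(-2j * math.pi * k * delay_bins / n) for k in range(n)]

def pdp(h, pad=1):
    """|sum_k H[k] exp(j 2 pi k i / (N pad))|^2 / N^2 on a delay grid `pad` times finer than the
    native 1/(N df) grid; an on-grid unit-gain path gives a peak of exactly 1."""
    n = len(h)
    return [abs(sum(h[k] * cmath.exp(2j * math.pi * k * i / (n * pad)) for k in range(n)) / n) ** 2
            for i in range(n * pad)]

h = pilots_single_path(64, delay_bins=3.5)  # path halfway between native bins 3 and 4
native, padded = pdp(h, pad=1), pdp(h, pad=2)
print(round(max(native), 3))  # prints 0.405: Dirichlet scalloping loss
print(max(range(len(padded)), key=padded.__getitem__), round(max(padded), 3))  # prints 7 1.0
```

Zero-padding only interpolates the same blurred response, so closely spaced paths still merge; a sparse Bayesian reconstruction instead models the blur explicitly, which is why it can outperform the threefold zero-padded baseline.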


[34] 2606.27257

Resilient Output Containment under Undisclosed Leader Dynamics and Actuator Attacks

This work studies resilient output containment for heterogeneous linear multi-agent systems with actuator cyber-attacks over directed network topologies. The leaders generate bounded locally absolutely continuous trajectories; however, their dynamics, velocity bounds, and motion envelopes are undisclosed to the followers. The cyber-attack model includes state- and input-correlated, as well as bounded exogenous actuator false-data terms. A continuous two-layer adaptive control architecture is proposed. The first layer is a virtual-actuator reconfiguration layer that uses partial state measurements to compensate for actuator attacks in the local tracking-error dynamics. The second layer is a network interface that generates task-space commands via an adaptive interaction protocol. This protocol uses only neighbor-exchanged network-interface states whose dimensions match those of the plant output, and it does not require global graph knowledge for parameter tuning. For directed graphs, under a leader-rooted united spanning-tree condition, a nonsmooth Lyapunov analysis yields asymptotic containment at the command level. The physical outputs then converge to the leader convex hull up to a residual determined by the command-tracking local controllers. Simulation results using a network of quadrotors with damped suspended loads illustrate the performance of attack recovery and containment tracking.
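
The containment objective itself (followers ending inside the leaders' convex hull) can be illustrated with the classical first-order containment protocol on a small directed graph. This sketch uses static leaders, single-integrator followers, and unit edge weights; it has no attacks, adaptive gains, or virtual-actuator layer, so it shows only the target behaviour, not the proposed controller.

```python
def containment(leaders, followers0, in_neighbors, dt=0.01, steps=5000):
    """Euler simulation of first-order containment with unit edge weights:
    dx_i/dt = sum over in-neighbors j of (x_j - x_i); leaders stay fixed."""
    pos = dict(leaders)
    pos.update(followers0)
    for _ in range(steps):
        vel = {f: sum(pos[j] - pos[f] for j in in_neighbors[f]) for f in followers0}
        for f, v in vel.items():
            pos[f] += dt * v
    return {f: pos[f] for f in followers0}

leaders = {"L1": 0.0, "L2": 10.0}
followers = {"F1": -5.0, "F2": 20.0, "F3": 3.0}  # two followers start outside the hull [0, 10]
graph = {"F1": ["L1", "F3"], "F2": ["L2", "F1"], "F3": ["F2", "L1"]}  # directed, leader-rooted
print(containment(leaders, followers, graph))
```

Every follower is reachable from a leader in this graph, so the followers settle at fixed points inside [0, 10] (about F1 = 1.43, F2 = 5.71, F3 = 2.86); this reachability plays the role of the leader-rooted spanning-tree condition in the text.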


[35] 2606.27338

Jet impingement cooling with multi-stage ducted electroaerodynamic actuators

Modern high-performance mobile electronics impose extreme constraints on thermal management, and traditional cooling methods often fail to meet requirements for power density, form factor, and durability. Jet impingement cooling offers a compelling solution but is typically hindered by the need for bulky ancillary hardware. Here, we demonstrate that compact arrays of reduced-scale electroaerodynamic (EAD) plasma actuators, which are silent, solid-state devices with no moving parts, can be used for direct jet impingement cooling of electronics. The main contribution is the first rigorous experimental demonstration and system-level validation of multi-stage, ducted electroaerodynamic jet arrays as a compact, fan-replacement impingement cooling solution for mobile electronics. We characterize the performance of both single- and multi-stage ducted actuators, including thermographic analysis of heat transfer coefficients and spatial cooling profiles. We also quantify the relationship between actuator stage count and cooling efficiency, showing that increasing the number of ion acceleration stages enhances jet velocity and heat transfer performance at a reduced efficiency. The actuators are then assembled into an array and directly compared to a conventional fan with similar coverage area, showing competitive performance at a fraction of the volume, weight, and power. Finally, we integrate the array onto a commercial edge AI system and show that thermal regulation during extended inference workloads matches that of a stock fan, without any moving mechanical components or noise. These results confirm that multi-stage EAD jet arrays are not only viable but advantageous for thermal management in mobile and high-performance systems, paving the way toward silent and miniaturized solid-state cooling solutions.


[36] 2605.06582

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

Modern learning systems represent perceptual signals with continuous vectors, but comparison, retrieval, memory, alignment, and reasoning are often naturally symbolic. In language, this interface is given by tokens; for speech and audio, it must be learned. Existing audio tokenizers use local quantization, clustering, or reconstruction, leaving sequence consistency, compactness, length control, termination, and edit geometry indirectly optimized. We introduce PairAlign, a framework for compact audio tokenization through sequence-level self-alignment. PairAlign treats tokenization as conditional sequence generation: an encoder maps speech to a condition, and an autoregressive decoder emits tokens from BOS to EOS, learning identity, order, length, and termination. Given two content-preserving views, each token string is trained to be likely under the other's representation, while unrelated examples provide competing sequences. This yields a surrogate for edit-distance preservation while discouraging collapse. Starting from a VQ tokenizer, PairAlign extends a frame-synchronous prior into an autoregressive tokenizer using VQ-derived and EMA-teacher targets, cross-paired teacher forcing, anti-bypass regularization, likelihood contrast, length control, and timing recovery. On 3 s speech, PairAlign learns compact token strings with strong cross-view consistency. In retrieval, it operates at 12.71 tokens/s and reduces archive tokens by 55% versus VQ while preserving edit-distance search. The results expose a compactness--locality trade-off: PairAlign does not aim to dominate dense geometric or SSL tokenizers on every local metric, but provides a lower-rate symbolic interface for comparison, retrieval, and analysis. More broadly, PairAlign is a sequence-symbolic analogue of JEPA-style predictive learning, predicting a learned variable-length symbolic sequence rather than a continuous latent.
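
Retrieval over token strings reduces to edit-distance search. The sketch below implements the unit-cost Levenshtein distance and a brute-force nearest-neighbour lookup; the integer token IDs are hypothetical, and a real archive would need an index rather than a linear scan. At 12.71 tokens/s, a 3 s clip corresponds to roughly 38 tokens, so each distance computation is a small dynamic program.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (unit insert/delete/substitute costs)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        cur = [i]
        for j, tb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ta
                           cur[j - 1] + 1,             # insert tb
                           prev[j - 1] + (ta != tb)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]

def nearest(query, archive):
    """Index of the archived token string closest to `query` under edit distance."""
    return min(range(len(archive)), key=lambda i: edit_distance(query, archive[i]))

archive = [[1, 1], [5, 2, 8], [9, 9, 9, 9]]  # hypothetical archived token strings
print(edit_distance([3, 7, 7, 1], [3, 7, 1]))  # prints 1
print(nearest([5, 2, 9], archive))             # prints 1
```

Shorter token strings shrink both the archive and the O(len(a) * len(b)) cost of every comparison, which is the practical payoff of a lower token rate.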


[37] 2606.25369

Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis

While large language model (LLM)-based text-to-speech (TTS) systems have achieved high-quality speech synthesis, most existing systems focus on English and Chinese. Japanese, however, remains under-explored, and its unique linguistic challenges, such as widespread context-dependent kanji polyphony, have yet to be adequately tackled. Here we introduce Sarashina2.2-TTS (this https URL), a Japanese-centric LLM-TTS system that tackles these challenges through a dual approach: data strategy and evaluation methodology. First, we scale training to approximately 361k hours of speech, incorporating a balanced mix of Japanese and English data. Furthermore, we design a targeted data augmentation pipeline covering all 2,136 Joyo (regular-use) kanji designated by Japan's Agency for Cultural Affairs to efficiently address kanji polyphony disambiguation. Second, we introduce the Joyo Kanji Yomi Benchmark (this https URL), covering all 2,136 Joyo kanji and their 4,378 readings. Alongside this benchmark, we propose Kana-CER, a metric that compares synthesized speech against reference readings in the kana space, eliminating orthographic variations to directly measure pronunciation correctness. Experiments demonstrate that our targeted data augmentation significantly improves reading accuracy. Overall, Sarashina2.2-TTS achieves state-of-the-art kanji-level reading accuracy and matches top baselines on general sentence-level pronunciation, while delivering the highest speaker similarity in zero-shot Japanese speech synthesis. Furthermore, cross-lingual evaluation reveals that Sarashina2.2-TTS is the only system that maintains stable Japanese pronunciation regardless of the prompt language, confirming that our balanced training approach improves cross-lingual robustness.
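
A minimal sketch of a kana-space CER follows, assuming that both the ASR transcript of the synthesized speech and the reference reading have already been converted to kana (that conversion step is not shown). Folding katakana onto hiragana via the fixed Unicode offset of 0x60 is one simple way to remove script variation; the function names are hypothetical, and the benchmark's exact normalisation may differ.

```python
def kata_to_hira(s):
    """Fold katakana U+30A1..U+30F6 onto hiragana (fixed offset 0x60); other characters unchanged."""
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in s)

def levenshtein(a, b):
    """Unit-cost edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def kana_cer(hyp_kana, ref_kana):
    """Character error rate after normalising both kana strings to hiragana."""
    hyp, ref = kata_to_hira(hyp_kana), kata_to_hira(ref_kana)
    return levenshtein(hyp, ref) / max(1, len(ref))

print(kana_cer("カンジ", "かんじ"))    # 0.0: same reading, different script
print(kana_cer("にほん", "にっぽん"))  # 0.5: two edits against a 4-character reference
```

Scoring in kana means a synthesized reading is judged only on pronunciation: a correct reading written in a different script costs nothing, while a wrong reading of a polyphonic kanji shows up as character edits.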


[38] 2606.25629

Event-Adaptive Motion Planning with Distilled Vision-Language Model in Safety-Critical Situations

Robot navigation in safety-critical scenarios faces significant challenges from unforeseen semantic events, where collisions arise primarily from the unpredictable behaviors of dynamic agents rather than unseen objects. While large vision-language models (VLMs) offer remarkable capabilities in commonsense reasoning, frequently invoking them within the continuous control loop introduces severe computational latency, fundamentally destabilizing physical execution. To address these challenges, we propose event-adaptive motion planning (EAMP), an efficient framework for VLM-based robot navigation. Specifically, a prompt-configurable semantic event trigger (PC-SET) selectively activates semantic intervention by continuously monitoring short temporal clips for behavioral anomalies. Upon triggering, an event-triggered distilled SemNav-VLM, fine-tuned via physically verified semantic distillation, maps detected anomalies into discrete strategy-level decisions. Subsequently, a semantic model predictive control (SMPC) module translates these strategies into dynamic reconfigurations of optimization objectives and geometric references. Extensive experiments in safety-critical logistics scenarios demonstrate that EAMP effectively aligns high-level reasoning with low-level control, significantly improving dynamic safety margins over existing baselines while preserving real-time efficiency.


[39] 2606.26194

Self-Supervised Tree-level Biomass Estimation in Urban Environments From Airborne LiDAR and Optical Observations

Urban tree biomass remains less explicitly quantified in space than biomass in managed forests because many estimates rely on inventories or coarse products that cannot resolve individual crowns or fine-scale heterogeneity. We present a crown-level above-ground biomass (AGB) framework for an 810~km$^2$ landscape in Ontario, Canada, using leaf-off airborne LiDAR (8--10~pulses~m$^{-2}$) and RGB plus near-infrared orthophotography (0.16--0.20~m) from 2018 and 2023. A dual-stream cross-attention network trained on rule-based pseudo-labels produced semantic masks for buildings, needleleaf trees, and deciduous trees, supporting crown delineation and functional-type assignment. On independently annotated withheld tiles, global/mean precision, recall, and Dice scores were 0.86, 0.83, and 0.84. Crowns were delineated with multiscale watershed segmentation in mapped tree areas, and AGB was estimated from a crown area--height power-law proxy calibrated to species-specific allometry (Lambert et al., 2005) for 21,921 inventory trees. For 18,713 inventory--segment matched pairs from a 90,726-tree held-out test set, AGB prediction achieved $R^2=0.609$ using inventory crown geometry and $R^2=0.570$ under operational segmentation, identifying crown delineation as the remaining uncertainty source. Aggregated to 30~m, estimates yielded total AGB stocks of 1.73~Tg in 2018 and 1.81~Tg in 2023 (811--850~Gg~C), local densities up to ${\sim}140$~Mg~ha$^{-1}$ along the Niagara Escarpment, and a net carbon gain of 39~Gg~C over five years. Deep-ensemble uncertainty maps highlighted high-epistemic-uncertainty areas linked to underrepresented land covers and guided assignment of uncertain crowns to a pooled allometric equation. The framework uses standard provincial data, requires no manual annotation, and produces a public bitemporal crown-level AGB database for trees outside forests at management-relevant resolution.
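
The calibration of a crown area-height power-law proxy can be sketched as a log-log least-squares fit. The functional form AGB = a (A H)^b, with A the crown area and H the height, is an assumption (the paper's exact proxy may use separate exponents), and the crowns and coefficients below are synthetic placeholders rather than inventory data.

```python
import math

def fit_power_law(x, y):
    """OLS fit of y = a * x**b on log y = log a + b log x (all x, y > 0)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    return math.exp(my - b * mx), b

def crown_agb(area_m2, height_m, a, b):
    """Hypothetical crown-level proxy: AGB = a * (crown area * height) ** b."""
    return a * (area_m2 * height_m) ** b

crowns = [(12.0, 8.0), (30.0, 14.0), (55.0, 18.0), (80.0, 22.0)]  # hypothetical (m^2, m)
agb = [crown_agb(A, H, 0.5, 1.2) for A, H in crowns]               # synthetic allometric targets
print(fit_power_law([A * H for A, H in crowns], agb))              # recovers about (0.5, 1.2)
```

With real, noisy targets, a log-space fit estimates the conditional median rather than the mean, so a back-transformation bias-correction factor is commonly applied before summing crowns into area totals.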


[40] 2606.26306

Fiber Bragg grating-based acoustic sensing system enabled by ML-trained, sub-picometer-tunable hybrid III-V/SiN lasers

Distributed acoustic emission (AE) sensing is critical for early detection of structural degradation, yet conventional electrical sensors are difficult to scale and fiber-based approaches are limited by interrogation complexity and resolution. Here, we report an intelligent fiber Bragg grating (FBG) sensing system enabled by machine learning (ML)-trained hybrid III-V/SiN tunable lasers that achieve uniform, mode-hop-free, sub-picometer wavelength control. A supervised gradient-descent algorithm is used to learn the nonlinear electro-thermal tuning space of Vernier-based external-cavity lasers, enabling continuous tuning with <0.1 pm resolution and <0.5 dB power variation. This capability allows precise alignment to FBG reflection slopes for high-sensitivity acoustic detection. We demonstrate a four-laser interrogation system monitoring 16 FBG sensors distributed across multiple metallic structures, operating over a 35 nm wavelength span. The system autonomously identifies sensor resonances, dynamically tracks spectral shifts, and reconfigures interrogation wavelengths in response to localized acoustic events. Using pencil-lead break tests as calibrated AE sources, we show simultaneous multi-channel detection and adaptive spatial localization. The combination of narrow linewidth (<10 kHz), wide tunability, and ML-driven calibration enables robust, scalable, and high-resolution sensing. This approach establishes a pathway toward fully autonomous, distributed photonic sensing networks for real-time structural health monitoring.
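
Slope-based FBG interrogation parks the laser where the reflection spectrum is steepest, so a small Bragg shift produces the largest change in reflected power. The sketch below assumes a Gaussian reflection line shape (for which the steepest point lies one standard deviation from the peak) and scans in 0.1 pm steps to find it; the 1550 nm centre and 0.2 nm FWHM are hypothetical values.

```python
import math

def fbg_reflectance(wl_nm, center_nm, fwhm_nm, peak=1.0):
    """Gaussian approximation of an FBG reflection peak (assumed line shape)."""
    sigma = fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return peak * math.exp(-((wl_nm - center_nm) ** 2) / (2.0 * sigma ** 2))

def steepest_edge(center_nm, fwhm_nm, span_nm=1.0, step_pm=0.1):
    """Step a laser across the short-wavelength edge in step_pm increments and return
    (wavelength, slope) where the finite-difference slope dR/dlambda (per nm) is largest."""
    step_nm = step_pm * 1e-3
    best_wl, best_slope = None, float("-inf")
    for i in range(int(round(span_nm / step_nm))):
        wl = center_nm - span_nm + i * step_nm
        slope = (fbg_reflectance(wl + step_nm, center_nm, fwhm_nm)
                 - fbg_reflectance(wl, center_nm, fwhm_nm)) / step_nm
        if slope > best_slope:
            best_wl, best_slope = wl + step_nm / 2.0, slope
    return best_wl, best_slope

wl, slope = steepest_edge(1550.0, fwhm_nm=0.2)
sigma = 0.2 / (2.0 * math.sqrt(2.0 * math.log(2.0)))
print((1550.0 - wl - sigma) * 1e3, slope)  # offset from 1550 nm - sigma in pm, then dR/dlambda per nm
```

Near this operating point the reflected power changes approximately linearly with Bragg shift (about `slope` times the shift), which is why sub-picometer tuning resolution matters for aligning each laser to its grating edge.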


[41] 2606.26313

Racing a Wheeled Quadruped: Active Load Transfer Mitigation via Model Predictive Control

This paper presents a hierarchical control framework using model predictive control (MPC) and reinforcement learning (RL) for active roll control to manage lateral load transfer during autonomous racing of a wheeled quadruped. The framework integrates offline time-optimal raceline generation, an online MPC planner that actively minimizes the lateral Load Transfer Ratio (LTR), and a low-level, whole-body RL policy deployed directly onto the robot's 16 actuators. The MPC is based on a vehicle dynamics bicycle model of the Unitree Go2-W platform. The robot's leg actuators act as an active suspension in which the knee joints generate anti-roll torque to bank into turns. Physical track experiments demonstrate that active roll control reduces mean LTR by up to 44%, improves the fastest lap time by 8.7%, and boosts peak lateral acceleration capability by 21.3% to 1.98 m/s$^2$, maintaining robust high-speed stability beyond the range of a non-tilting baseline controller. Supplementary code and video can be found at this https URL.
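
Why banking reduces load transfer can be seen from a quasi-static rigid-body estimate. For a body with centre-of-gravity height h and track width t that leans inward by an angle phi, taking moments about the centre of the contact patch gives LTR = 2 h (a_y cos(phi) - g sin(phi)) / (g t), which vanishes when tan(phi) = a_y / g. This ignores suspension compliance, dynamics, and the RL policy, and the geometry below is hypothetical rather than Go2-W data.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def load_transfer_ratio(a_y, cg_height_m, track_m, lean_rad=0.0):
    """Quasi-static LTR = (F_outer - F_inner) / (F_outer + F_inner) for a rigid body that leans
    into the turn by lean_rad: LTR = 2 h (a_y cos(lean) - g sin(lean)) / (g t)."""
    return 2.0 * cg_height_m * (a_y * math.cos(lean_rad) - G * math.sin(lean_rad)) / (G * track_m)

h, t = 0.3, 0.3  # hypothetical CG height and track width, not measured Go2-W values
print(round(load_transfer_ratio(1.98, h, t), 3))                     # upright: 0.404
print(round(load_transfer_ratio(1.98, h, t, math.radians(5.0)), 3))  # 5 deg inward lean: 0.228
```

Because LTR grows linearly with a_y for an upright body, lowering it through lean leaves more margin before the inner wheels unload, which is consistent with the higher lateral acceleration reported above.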


[42] 2606.26400

When Agents Meet Electric Bus Fleet Operations: Pricing Behavior, Trade-offs, and Policy Implications in an Aggregator Framework

Agentic systems are changing how complex operational tasks are coordinated, introducing a new paradigm for connecting heterogeneous data sources and automating processes. Electric bus fleets provide a relevant test case. Their operation requires continuous coordination between service reliability, battery state-of-charge, charger availability, electricity prices, route-energy uncertainty, and vehicle-to-grid (V2G) opportunities. This paper proposes an agentic aggregator framework that streamlines this decision environment by coupling an optimization-based electric bus scheduling model with supervisory agents for disturbance detection, tariff adaptation, and schedule evaluation. The optimization core enforces physical feasibility across routes, chargers, batteries, and V2G exchanges, while the agentic layer interprets changing operating conditions, triggers real-time re-optimization when needed, and defines how flexibility value is allocated between the aggregator and the public transport operator (PTO). A realistic depot case study evaluates day-ahead and real-time operations under profit-based and operation-based coordination modes, considering service delays, route-energy deviations, electricity price shocks, and combined disturbances. The results show that agentic aggregation can support adaptive fleet-grid coordination by maintaining feasible schedules, activating re-optimization selectively, and improving the use of charging and V2G flexibility. However, they also reveal a critical trade-off: the same agentic capability that reduces operational complexity can extract value from the PTO when configured around profit-oriented pricing. These findings suggest that agentic aggregators can become useful for managing electric bus V2G operations, but their deployment in public-fleet contexts requires transparent coordination modes, auditable tariff-setting, and explicit value-sharing rules.


[43] 2606.26426

Nanoelectromechanical Systems (NEMS) for Hardware Security in Advanced Packaging

As hardware security threats escalate across semiconductor manufacturing and advanced packaging, there is a growing need for novel physical mechanisms to counter sophisticated attacks such as tampering, counterfeiting, and supply chain infiltration. This paper presents Nanoelectromechanical Systems (NEMS) as an emerging class of hardware security primitives that enable physical assurance, tamper detection, and authentication at the device level. Leveraging mechanisms such as NEMS-based Physically Unclonable Functions (PUFs), shape memory materials, resonance-based fingerprints, and physical unlocking architectures, these systems offer enhanced resilience to reverse engineering, side-channel attacks, and environmental degradation. By harnessing mechanical unpredictability and fabrication-induced nanoscale variability, NEMS technologies introduce a physically robust and low-power alternative to conventional digital security methods. Their seamless integration into standard semiconductor workflows paves the way for scalable, verifiable, and secure solutions across defense, aerospace, critical infrastructure, and consumer electronics.


[44] 2606.26446

Input Convex Neural Network as a Surrogate in Stability-Constrained Optimization for IBR-dominated Power Systems

Input convex neural networks (ICNNs) are increasingly used as surrogates for stability indices and embedded as constraints in power-system optimization. This letter clarifies two recurring formulation limitations that can negate ICNN convexity benefits: (i) applying generic Big-$M$ mixed-integer reformulations introduces auxiliary binaries that are unnecessary for enforcing ICNN sublevel constraints; and (ii) reversing the stability inequality transforms a convex sublevel set into a generally nonconvex superlevel set, invalidating global-convergence guarantees of cut-based methods. After clarifying the limitations, we provide (i) an exact LP-based epigraph reformulation for ReLU-ICNNs, (ii) an outer-approximation scheme with global guarantees under the sublevel convention, and (iii) a feasibility-preserving inner-approximation scheme for the superlevel convention, with simulations on IEEE 14- and 118-bus unit commitment instances.


[45] 2606.26464

A Low-PAPR, Synchronization-Robust Non-Coherent Grassmannian Modulation for Optical Communications

Non-coherent Grassmannian (unitary space-time) signaling detects on the received subspace, which is invariant to a branch-side (polarization or mode-coupling) rotation and to a phase that is constant over the coherence block. It therefore needs no carrier-phase or polarization recovery within the block and is robust to phase noise when the per-block phase drift is small, while a multi-branch (polarization or spatial) front end harvests diversity without channel estimation or pilots. However, the Grassmannian-constellation literature usually assumes both a distortion-free, linear channel and transmitter and already-acquired symbol timing. This paper closes both gaps while reusing off-the-shelf Grassmannian packings. First, we impose a constant-modulus (low peak-to-average-power-ratio, PAPR) constraint on the constellation and quantify the PAPR/chordal-distance trade-off: a constant-modulus design lowers the 0.1% PAPR from 6.1 dB (unconstrained) to 3.6 dB, which is 1.6 dB below 16-QAM (5.2 dB). This relaxes the linear-range requirement on the optical modulator and reduces the fiber Kerr-nonlinearity penalty, at a cost of about 1.8 dB in high-SNR coding gain. Second, we derive a phase-blind subspace timing-error detector (TED) that exploits the invariance of the GLRT projection energy to the unknown carrier phase, together with a feedforward acquisition metric, providing clock recovery without prior carrier or polarization recovery. The TED yields a clean S-curve with a stable lock point for roll-off factors down to β = 0.1. Under block fading, the proposed estimator attains the genie-timing SER to within a fraction of a dB and recovers full diversity, whereas an uncorrected 0.35-symbol timing offset floors the error rate near 0.4. The results use a symbol-rate block-fading abstraction; full fiber, modulator, and phase-noise modeling is left to future work. The scheme combines low PAPR with the diversity and phase-recovery-free operation of non-coherent reception.
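
The phase and rotation invariance that the scheme relies on can be checked numerically. The sketch below assumes a hypothetical random codebook in place of an off-the-shelf Grassmannian packing and a symbol-rate model with ideal timing. It implements the standard non-coherent GLRT rule (pick the codeword whose subspace captures the most received energy, $\|X_k^H Y\|_F^2$) and confirms that rotating the received block by an unknown phase and a branch-side unitary leaves every decision metric unchanged. It does not model PAPR, timing recovery, or the constant-modulus constraint.

```python
import numpy as np


def random_codebook(num_points, T, M, rng):
    """Hypothetical codebook: random T x M matrices with orthonormal columns.
    A real system would reuse a designed Grassmannian packing instead."""
    codebook = []
    for _ in range(num_points):
        A = rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M))
        Q, _ = np.linalg.qr(A)  # reduced QR: Q is T x M with orthonormal columns
        codebook.append(Q)
    return codebook


def glrt_detect(Y, codebook):
    """Non-coherent GLRT: choose the subspace that captures the most received energy."""
    energies = np.array([np.linalg.norm(X.conj().T @ Y, "fro") ** 2 for X in codebook])
    return int(np.argmax(energies)), energies


rng = np.random.default_rng(0)
T, M, N = 4, 2, 2  # block length, signal dimension, receive branches
codebook = random_codebook(8, T, M, rng)
k_true = 3
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
noise = 1e-3 * (rng.standard_normal((T, N)) + 1j * rng.standard_normal((T, N)))
Y = codebook[k_true] @ H + noise

# Unknown carrier phase plus a branch-side (e.g. polarization) unitary rotation.
U, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
Y_rot = (Y @ U) * np.exp(1j * 0.7)

k_hat, e = glrt_detect(Y, codebook)
k_hat_rot, e_rot = glrt_detect(Y_rot, codebook)
print(k_hat, k_hat_rot, np.allclose(e, e_rot))
```

Because $\|X^H Y U e^{j\phi}\|_F = \|X^H Y\|_F$ for any unitary $U$, no phase or polarization estimate is needed before the decision.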


[46] 2606.26473

When Does Quality-Aware Multimodal Fusion Matter? A Leakage-Safe Diagnostic for Decision-Level Dependence

Many multimodal systems estimate the reliability of each modality and weight their contributions to the final prediction. However, it remains unclear whether these scores influence model decisions or merely correlate with performance. We propose a simple diagnostic to test whether reliability information is used during inference. After training, the model and inputs are fixed while reliability scores are permuted across test examples. If predictions depend on these scores, performance should degrade. Experiments on StressID for stress recognition and CMU-MOSEI for sentiment analysis show that permuting reliability scores leaves performance unchanged despite substantial potential gains from selecting the best modality per example. In positive controls where reliability signals identify the correct modality, the same frozen fusion rules yield significant improvements, indicating that reliability signals influence fused decisions only when they reliably predict unimodal correctness.
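
The permutation test is straightforward to reproduce. The sketch below assumes a hypothetical frozen fusion rule (a reliability-weighted average of per-modality class probabilities) and synthetic two-modality data; it is not the StressID or CMU-MOSEI setup. It reports the accuracy drop when reliability scores are shuffled across test examples: an oracle reliability signal produces a large drop, while a constant, uninformative signal produces none.

```python
import numpy as np


def weighted_fusion(probs, reliab):
    """Frozen fusion rule: reliability-weighted average of per-modality class probabilities.
    probs: (n_examples, n_modalities, n_classes); reliab: (n_examples, n_modalities)."""
    w = reliab / reliab.sum(axis=1, keepdims=True)
    return np.einsum("nm,nmc->nc", w, probs).argmax(axis=1)


def permutation_diagnostic(probs, reliab, labels, n_perm=200, seed=0):
    """Return (accuracy with true scores, accuracy drop when scores are shuffled across examples)."""
    rng = np.random.default_rng(seed)
    base = float(np.mean(weighted_fusion(probs, reliab) == labels))
    perm_accs = [
        np.mean(weighted_fusion(probs, reliab[rng.permutation(len(labels))]) == labels)
        for _ in range(n_perm)
    ]
    return base, base - float(np.mean(perm_accs))


# Synthetic data: for each example exactly one of two modalities is (confidently) correct.
rng = np.random.default_rng(1)
n = 200
labels = rng.integers(0, 2, n)
good = rng.integers(0, 2, n)  # index of the correct modality per example
probs = np.full((n, 2, 2), 0.1)
for i in range(n):
    for m in range(2):
        pred = labels[i] if m == good[i] else 1 - labels[i]
        probs[i, m, pred] = 0.9

oracle = np.where(np.arange(2)[None, :] == good[:, None], 0.9, 0.1)  # identifies the correct modality
constant = np.full((n, 2), 0.5)  # carries no per-example information

base_oracle, drop_oracle = permutation_diagnostic(probs, oracle, labels)
base_const, drop_const = permutation_diagnostic(probs, constant, labels)
print(f"oracle: acc={base_oracle:.2f}, drop={drop_oracle:.2f}")
print(f"constant: acc={base_const:.2f}, drop={drop_const:.2f}")
```

Only the informative signal shows a drop, which is the positive-control logic: a permutation can only hurt if the fusion rule actually uses the scores.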


[47] 2606.26556

WQ-Fusion: Dynamic Gated Attention for Cross-Domain Audio Representation

While pre-trained models excel in specialized tasks, learning universal representations across diverse acoustic domains remains challenging. To address this, we propose WQ-Fusion, a robust dual-encoder framework for cross-domain audio representation learning. Overcoming the limitations of static concatenation, WQ-Fusion integrates Whisper and Qwen via an Adaptive Feature Modulation module and a novel element-wise gated attention mechanism. This design enables dynamic feature selection, allowing the model to selectively emphasize relevant acoustic and semantic dimensions. Extensive experiments on the Interspeech 2026 Audio Encoder Capability Challenge (Track A) benchmark demonstrate that by effectively routing heterogeneous information, WQ-Fusion achieves a superior overall score of 0.836, significantly outperforming the strongest single-encoder baseline.


[48] 2606.26580

Graph-Based ECG Synthesis with Activation-Consistency Certification and Diagnostics-Aware Morphology Curation

Synthetic electrocardiogram (ECG) generation can support algorithm development and robustness evaluation, but simulated signals must preserve interpretable activation, recovery, and morphology properties. We present a graph-based ECG synthesis framework that combines activation-consistency certification with diagnostics-aware morphology curation. A unified heart graph supports an eikonal-template backend (ET) and a pseudo-diffusion reaction--eikonal backend (RE). We formulate graph Eikonal activation as a Bellman fixed-point problem and use the Bellman residual as a computable certificate for activation-time consistency. Each simulated ECG is evaluated by a two-stage diagnostics pipeline that separates metric computation from experiment-specific acceptance policies. On the cardiac graph, RE-derived activation times showed near-millisecond agreement with the Eikonal backbone and achieved $R^2=0.99876$ after causal predecessor filtering. Recovery experiments showed that endo-epicardial APD gradients determined the main T-wave morphology window, whereas the diffusion strength $\kappa$ provided secondary repolarization smoothing. In final balanced multi-lead curation, RE accepted 658/2000 samples versus 578/2000 for ET and increased per-model morphology coverage from 0.09248 to 0.09888. The framework provides a conservative basis for controllable and curated synthetic ECG generation.
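
To make the certificate idea concrete: on a graph with edge travel times $w_{uv}$, Eikonal activation times satisfy the Bellman fixed-point equation $T(v) = \min_{u \in \mathcal{N}(v)} [T(u) + w_{uv}]$ at every non-source node, so the largest violation of that equation is a computable consistency check. The sketch below assumes a toy undirected graph with hypothetical travel times and solves it with Dijkstra's algorithm (a standard solver for this fixed point, not necessarily the backend used in the paper). The residual is zero at the solution and nonzero after one activation time is perturbed.

```python
import heapq

import numpy as np


def eikonal_dijkstra(n, edges, sources):
    """Graph-Eikonal activation times via Dijkstra. edges: (u, v, travel_time), undirected."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    T = np.full(n, np.inf)
    heap = []
    for s in sources:
        T[s] = 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        t, u = heapq.heappop(heap)
        if t > T[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if t + w < T[v]:
                T[v] = t + w
                heapq.heappush(heap, (T[v], v))
    return T, adj


def bellman_residual(T, adj, sources):
    """Max over non-source nodes of |T(v) - min_u (T(u) + w_uv)|; zero at the fixed point."""
    src = set(sources)
    res = 0.0
    for v in range(len(T)):
        if v in src:
            continue
        best = min(T[u] + w for u, w in adj[v])
        res = max(res, abs(T[v] - best))
    return float(res)


edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 3, 4.0), (3, 2, 1.0), (2, 4, 1.5)]
T, adj = eikonal_dijkstra(5, edges, sources=[0])
print(bellman_residual(T, adj, [0]))      # 0.0

T_bad = T.copy()
T_bad[2] += 0.5                           # inconsistent activation time at node 2
print(bellman_residual(T_bad, adj, [0]))  # 0.5
```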


[49] 2606.26615

TaskTok: Delving into Task Tokens for Task-driven Image Restoration

While traditional image restoration focuses on perceptual quality, Task-Driven Image Restoration (TDIR) aims to maximize the performance of downstream high-level vision tasks. Recent approaches leveraging generative priors have shown promise for TDIR; however, they typically suffer from computational inefficiency and potential semantic alteration by indiscriminately updating all latent tokens. In this paper, we posit that not all visual information is equally important for machine perception. Through an analysis of the latent token space, we observe that task-relevant cues are unevenly distributed across the token sequence, exhibiting index-wise specialization. This suggests that selectively refining a subset of tokens can be sufficient for task-driven objectives. Leveraging this insight, we propose TaskTok, a novel framework that selectively restores only task-relevant tokens via a learnable token switch and a lightweight token refinement module. Extensive experiments across image classification, semantic segmentation, and object detection demonstrate that TaskTok significantly enhances task performance with high computational efficiency. The source code is available at this https URL


[50] 2606.26636

FracEvent: Event-Camera Simulation via Fractional-Relaxation Pixel Dynamics

Event cameras asynchronously report brightness changes with microsecond-level temporal resolution, but real event data remain difficult to collect at scale because specialized sensors, careful synchronization, and task-specific annotations are required. Event-camera simulation is therefore important for event-based vision tasks. Most practical simulators build on contrast-threshold event generation, some with additional filtering, stochastic noise, or hand-tuned sensor parameters. While effective, such formulations often simplify the temporal structure produced by the lifecycle of each pixel, which can distort event timing and weaken downstream transfer. We introduce FracEvent, an event simulator that models this pixel-level lifecycle with fractional-relaxation voltage dynamics. Given a log-intensity trajectory, FracEvent drives a compact stack of relaxation modes, combines their responses into a voltage state, emits ON/OFF events by localizing threshold crossings on the continuous voltage trajectory, and updates the reference while retaining the underlying memory modes. This retained state links residual voltage response to later event timing. We evaluate FracEvent through event-stream comparison and downstream transfer on image reconstruction and optical flow estimation. Across multiple datasets, FracEvent improves the temporal structure of generated events and achieves stronger downstream-transfer results than competing simulator baselines, showing its practical value for event-camera simulation.
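
For reference, the contrast-threshold rule that most simulators build on can be sketched in a few lines. This is the generic baseline idea, not FracEvent: an event fires each time the log intensity moves by a contrast threshold C away from a per-pixel reference, with event times linearly interpolated between samples. The threshold value is a hypothetical choice, and noise, refractory periods, and any retained pixel state are deliberately omitted; that missing pixel memory is the simplification the abstract argues can distort event timing.

```python
import numpy as np


def contrast_threshold_events(t, log_I, C=0.25):
    """Single-pixel contrast-threshold event generation (baseline, not FracEvent).

    Emits (time, polarity) each time log intensity crosses ref +/- C, then moves the
    reference to the crossed level. Crossing times are linearly interpolated."""
    events = []
    ref = log_I[0]
    for k in range(1, len(t)):
        l0, l1 = log_I[k - 1], log_I[k]
        while True:  # several events may fire within one sampling interval
            if l1 >= ref + C:
                pol, level = 1, ref + C
            elif l1 <= ref - C:
                pol, level = -1, ref - C
            else:
                break
            frac = (level - l0) / (l1 - l0)
            events.append((t[k - 1] + frac * (t[k] - t[k - 1]), pol))
            ref = level
    return events


ramp = contrast_threshold_events(np.linspace(0, 1, 11), np.linspace(0, 1, 11))
print([(round(time, 3), pol) for time, pol in ramp])  # four ON events near 0.25, 0.5, 0.75, 1.0

up_down = contrast_threshold_events(np.array([0.0, 1.0, 2.0]), np.array([0.0, 0.6, 0.0]))
print([pol for _, pol in up_down])                    # [1, 1, -1, -1]
```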


[51] 2606.26780

Event-based Gaze Control System for Accurate Real-time Spin Estimation in Professional Ball Games

Spin plays a crucial role in many ball sports due to its effect on the trajectory of the ball. Vision-based estimation of the ball's spin during a game with conventional cameras is challenging due to the ball's small size, high speed, and fast rotation. To address these challenges, we propose an event-based active vision system that can track unmodified balls and measure their spin in real-time. The system consists of an event camera for its high temporal resolution and minimal motion blur, high-speed pan/tilt galvanometer mirrors to keep the ball in the field of view, and a low-latency focus-tunable telephoto lens to increase the spatial resolution on the ball and keep it in focus. To track the ball, we use a hybrid approach that combines 2D event-based detection for centering and 3D positions from a ball localization system for re-initialization. For high-accuracy spin estimation, we propose an offline method that performs contrast maximization on the sphere (s-CMax). This method achieves state-of-the-art accuracy on static balls across multiple sports (table tennis, baseball, tennis, and golf), with mean magnitude and axis errors of 2.1% and 4.0 degrees, respectively. We then develop a low-latency online method for table tennis as a case study in real-time applications. This method uses an uncertainty-aware convolutional neural network trained on pseudo-ground-truth spin labels from the offline approach, combined with a GPU-accelerated batch implementation of contrast maximization for refinement. We demonstrate reliable tracking and spin estimation with a three-view setup during professional table tennis matches, with high accuracy (8.8% magnitude and 6.4 degrees axis mismatch), 3 ms latency, and 750 Hz throughput.


[52] 2606.26824

wav2tok 2.0: Scalable Audio Tokenization Maintaining Explicit Pairwise Token Alignment for Efficient Audio Retrieval

Learning discrete speech representations that preserve similarity across variable-length utterances is central to query-by-example spoken term detection (QbE-STD). While wav2tok introduced CTC-based sequence alignment to enforce token consistency, its tightly coupled clustering and alignment training recipe limits scalability. We propose wav2tok 2.0, a scalable alignment-aware speech tokenizer built on the BEST-STD backbone. wav2tok 2.0 employs staged training, first learning discriminative, speaker-invariant representations via contrastive learning and vector quantization, and then enforcing pairwise token consistency using a CTC alignment loss and a novel DTW-aligned framewise prediction objective with adaptive weighting. Experiments show that wav2tok 2.0 consistently outperforms BEST-STD and general-purpose tokenizers on QbE-STD while remaining efficient and scalable.
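
A DTW-aligned framewise objective needs a frame-to-frame alignment between two utterances of the same term. The sketch below is textbook dynamic time warping with backtracking over frame features (Euclidean local cost). How wav2tok 2.0 turns the aligned pairs into prediction targets and weights them adaptively is not reproduced, and the 1-D toy "features" are purely illustrative.

```python
import numpy as np


def dtw_align(X, Y):
    """Dynamic time warping between frame sequences X (n, d) and Y (m, d).
    Returns the total alignment cost and the list of aligned (i, j) frame pairs."""
    n, m = len(X), len(Y)
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # pairwise Euclidean costs
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1) along the cheapest predecessors.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(D[n, m]), path[::-1]


X = np.array([[1.0], [2.0], [3.0]])         # frames of one utterance
Y = np.array([[1.0], [2.0], [2.0], [3.0]])  # same term spoken more slowly
total, path = dtw_align(X, Y)
print(total, path)  # 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```

Each aligned pair (i, j) could then serve as a framewise target (predict the token of frame j from frame i), which is one plausible reading of a DTW-aligned prediction objective.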


[53] 2606.26846

Construction of Lyapunov density for nonautonomous dynamical systems on hypertorus

We present a semidefinite programming framework for constructing time-varying Lyapunov densities for nonautonomous dynamical systems on a hypertorus. The formulation leverages Gram matrix representations of hybrid (real-trigonometric) polynomials. In addition, we introduce a novel block decomposition of these Gram representations to confine the blow-up of the resulting density to a prescribed set. The results are then applied to establish the almost global synchronization of a time-varying Kuramoto model and the robust almost-global stability of a parameter-varying nonautonomous system. These examples demonstrate the applicability of the proposed method and validate the theoretical results. All computational results are obtained using an open-source MATLAB implementation, as referenced in the text, thereby facilitating reproducibility of the reported examples.
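
The Gram-matrix idea can be seen on a one-angle toy case, assuming only the trigonometric part of the basis (the hybrid real-trigonometric basis and the block decomposition are not reproduced). With the monomial vector $z(\theta) = [1,\ e^{j\theta}]^\top$, a trigonometric polynomial is certified nonnegative on the circle if it can be written as $z(\theta)^H Q z(\theta)$ with $Q \succeq 0$; a semidefinite program searches for such a $Q$:

```latex
z(\theta)^{H} Q\, z(\theta)
  = \begin{bmatrix} 1 & e^{-j\theta} \end{bmatrix}
    \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
    \begin{bmatrix} 1 \\ e^{j\theta} \end{bmatrix}
  = 2 - e^{j\theta} - e^{-j\theta}
  = 2 - 2\cos\theta ,
\qquad
Q = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \succeq 0 \ (\text{eigenvalues } 0,\ 2).
```

Since $Q$ is positive semidefinite, $2 - 2\cos\theta \ge 0$ holds for every $\theta$ without sampling the circle; on a hypertorus the monomial vector simply ranges over products of $e^{jk_i\theta_i}$ in several angles.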


[54] 2606.26975

XMSE-Aware Adaptive Empirical Bayes Estimation

Empirical Bayes (EB) estimators can match the first-order asymptotic risk of maximum likelihood (ML) while behaving very differently at second order: recent excess mean squared error (XMSE) analysis shows that kernel-based EB estimation may be worse than ML when the kernel is poorly aligned with the true parameter. This paper turns that diagnostic into a design principle. We propose an XMSE-aware mixed estimator that interpolates between ML and EB shrinkage. Its fixed-weight XMSE is a scalar quadratic, yielding a closed-form oracle mixing weight that is no worse than either ML or the base EB estimator at the XMSE scale. A plug-in implementation based on finite-sample XMSE approximations is proved consistent, with a second-order oracle regret rate for an interior oracle weight. We further establish a transfer of the regret bound to the fixed-weight risk curve evaluated at the selected weight, a thresholded boundary rule, and extensions to compact kernel families and to finite and growing kernel dictionaries with high-probability oracle bounds. Finite impulse response simulations with SURE-tuned, hard-selection, and trace-corrected baselines, together with the public Silverbox and Cascaded Tanks benchmarks, show that the proposed estimator retains most of the benefit of regularization when it is helpful and retreats toward ML under kernel misspecification, with an identified finite-de analyzed on the benchmarks.
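
The closed-form weight comes from minimizing a scalar quadratic. As a simplified stand-in, the sketch below uses ordinary mean squared error rather than the second-order XMSE, and mixes an ML estimate with a hypothetical fixed shrinkage estimate $\hat\theta_{EB} = 0.5\,\hat\theta_{ML}$. Writing $A$, $B$, and $C$ for the ML risk, the EB risk, and the cross term $E\langle \hat\theta_{ML}-\theta,\ \hat\theta_{EB}-\theta\rangle$, the risk of $(1-w)\hat\theta_{ML} + w\hat\theta_{EB}$ is $(1-w)^2A + w^2B + 2w(1-w)C$, minimized at $w^\star = (A-C)/(A+B-2C)$. Because $w=0$ and $w=1$ are both feasible, the minimum over $[0,1]$ cannot exceed $\min(A,B)$.

```python
import numpy as np


def oracle_mixing_weight(A, B, C):
    """Minimizer over [0, 1] of R(w) = (1-w)^2 A + w^2 B + 2 w (1-w) C."""
    denom = A + B - 2.0 * C  # equals E||theta_ML - theta_EB||^2, hence >= 0
    if denom <= 0.0:
        return 0.0  # the two estimators coincide; any weight gives the same risk
    return float(np.clip((A - C) / denom, 0.0, 1.0))


def mixed_risk(w, A, B, C):
    return (1 - w) ** 2 * A + w ** 2 * B + 2 * w * (1 - w) * C


# Closed-form check with hand-picked risks: w = 2/3 and risk = 2/3, below both 2.0 and 1.0.
w = oracle_mixing_weight(2.0, 1.0, 0.0)
print(w, mixed_risk(w, 2.0, 1.0, 0.0))

# Monte Carlo: ML = theta + noise, hypothetical EB = 0.5 * ML (fixed shrinkage toward zero).
rng = np.random.default_rng(0)
theta = np.array([0.5, -0.3, 0.2])
ml = theta + rng.standard_normal((10_000, 3))
eb = 0.5 * ml
e_ml, e_eb = ml - theta, eb - theta
A = np.mean(np.sum(e_ml**2, axis=1))
B = np.mean(np.sum(e_eb**2, axis=1))
C = np.mean(np.sum(e_ml * e_eb, axis=1))
w_star = oracle_mixing_weight(A, B, C)
risk_mix = np.mean(np.sum(((1 - w_star) * ml + w_star * eb - theta) ** 2, axis=1))
print(f"ML {A:.3f}  EB {B:.3f}  mix(w={w_star:.2f}) {risk_mix:.3f}")
```

In practice $A$, $B$, and $C$ are unknown and must be replaced by estimates, which is where the plug-in approximation and its regret analysis come in.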


[55] 2606.27044

Physical Layer Authentication With Channel Knowledge Maps in Indoor Environments

Physical layer authentication (PLA) authenticates a user by comparing channel measurements over time, either assuming that they remain consistent or modeling their evolution. However, these assumptions become problematic for moving devices in indoor environments because of multipath propagation and obstructions. In this paper, we propose a PLA mechanism for moving devices in indoor environments, where multiple access points (APs) estimate the dominant channel tap path loss (PL) and angle of arrival (AoA) from the received signals and compare them with previously collected channel knowledge maps (CKMs). Specifically, the measurements are compared with the CKM entries in the neighborhood of the previously known position. A comprehensive security analysis is conducted under both random and optimal attacks. Numerical results in a representative indoor scenario, with CKMs obtained via ray tracing, validate the effectiveness of the proposed PLA approach.
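
A minimal version of the CKM comparison step might look as follows. Everything here is a hypothetical toy: two APs, a log-distance path-loss model and line-of-sight angles standing in for ray tracing, a circular neighborhood of fixed radius, and fixed tolerances on PL and AoA in place of a proper test statistic. The point is only the structure: accept a transmission if its per-AP (PL, AoA) measurements match some CKM entry near the previously known position.

```python
import numpy as np

APS = np.array([[0.0, 0.0], [10.0, 0.0]])  # hypothetical access-point positions (m)


def channel_features(pos):
    """Toy stand-in for ray tracing: log-distance path loss (dB) and LoS AoA (deg) at each AP."""
    vec = np.asarray(pos, dtype=float) - APS
    dist = np.linalg.norm(vec, axis=1)
    pl_db = 40.0 + 20.0 * np.log10(dist)
    aoa_deg = np.degrees(np.arctan2(vec[:, 1], vec[:, 0]))
    return np.stack([pl_db, aoa_deg], axis=1)  # shape (n_aps, 2)


def authenticate(meas, last_pos, ckm, radius=1.0, tol_pl_db=3.0, tol_aoa_deg=10.0):
    """Accept if measurements match a CKM entry within `radius` of the last known position."""
    for pos, ref in ckm.items():
        if np.hypot(pos[0] - last_pos[0], pos[1] - last_pos[1]) > radius:
            continue
        pl_ok = np.all(np.abs(meas[:, 0] - ref[:, 0]) <= tol_pl_db)
        d_aoa = (meas[:, 1] - ref[:, 1] + 180.0) % 360.0 - 180.0  # wrapped angle difference
        aoa_ok = np.all(np.abs(d_aoa) <= tol_aoa_deg)
        if pl_ok and aoa_ok:
            return True
    return False


# CKM on a 1 m grid (y >= 1 keeps grid points away from the APs themselves).
ckm = {(x, y): channel_features((x, y)) for x in range(11) for y in range(1, 11)}

legit = channel_features((3.2, 2.1))    # device moved slightly from last known position (3, 2)
spoofer = channel_features((8.0, 7.0))  # transmitter elsewhere in the room
print(authenticate(legit, (3, 2), ckm), authenticate(spoofer, (3, 2), ckm))  # True False
```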


[56] 2606.27084

Pseudo-Text-Conditioned 3D Grounding DINO for Organ Localization in Abdominal CT

Reliable organ localization in abdominal CT can provide spatial priors for downstream trauma analysis. We propose CT-3GDINO, a lightweight 3D detector that adapts a Grounding-DINO-style query-based architecture to fixed organ localization using frozen pseudo-text class tokens instead of a real text encoder. The model combines a Swin3D visual backbone, bidirectional feature enhancement, pseudo-text-guided query selection, and a cross-modality decoder to predict normalized 3D boxes for liver, spleen, left kidney, right kidney, and bowel. We train and evaluate on 193 matched RSNA/RATIC CT volumes with segmentation-derived boxes. The best multi-scale model, trained from scratch, achieves 0.5830 overall top-1 class-wise mAP over 3D IoU thresholds from 0.1 to 0.7, outperforming fixed- and trainable-backbone classification-pretrained variants with 0.5570 and 0.4657 mAP. Performance is strong for coarse localization, with 0.9649 AP at IoU 0.1, but remains limited for strict box alignment, with 0.1552 AP at IoU 0.7. These results establish CT-3GDINO as an open-source baseline for pseudo-text-conditioned 3D organ localization and motivate future work on localization-aware pretraining, richer multimodal conditioning, and injury-focused detection.
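
The reported AP values hinge on 3D intersection-over-union (IoU) between predicted and reference boxes. A minimal IoU for axis-aligned boxes is sketched below, assuming corner-format boxes (x1, y1, z1, x2, y2, z2); the exact normalized box parameterization used by the model is not given here. The threshold sweep assumes a hypothetical step of 0.1 between 0.1 and 0.7 and only counts hits for a single prediction, so it illustrates why coarse thresholds are easy and strict ones are hard rather than how the mAP is computed.

```python
import numpy as np


def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes in (x1, y1, z1, x2, y2, z2) corner format."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))  # zero if the boxes do not overlap

    def volume(box):
        return np.prod(box[3:] - box[:3])

    union = volume(a) + volume(b) - inter
    return float(inter / union)


gt = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0]
pred = [1.0, 0.0, 0.0, 3.0, 2.0, 2.0]  # shifted by half a box along x
iou = iou_3d(gt, pred)
thresholds = np.round(np.arange(0.1, 0.71, 0.1), 1)
print(round(iou, 4))                                    # 0.3333
print({float(t): bool(iou >= t) for t in thresholds})  # True for 0.1 to 0.3, False for 0.4 to 0.7
```

A half-box shift already drops IoU to 1/3, which is consistent with the large gap between AP at IoU 0.1 and AP at IoU 0.7.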


[57] 2306.14634

Generalized Graph Signal Sampling by Difference-of-Convex Optimization

We propose a comprehensive framework for the generalized sampling and recovery of graph signals by leveraging difference-of-convex (DC) optimization. A fundamental challenge in graph signal processing is sampling, especially for graph signals that are not bandlimited. To accurately capture complex real-world phenomena, it is essential to move past traditional bandlimited assumptions and handle graph signals beyond the bandlimited class. Consequently, the generalized sampling theory has been extended to graph signals, enabling the best possible recovery for a wide range of signals by assuming signal priors. However, achieving the best possible recovery requires handling inherently non-convex and computationally intractable constraints such as a full-rank constraint. As a result, existing methods have relied either on aggressive convex relaxations that sacrifice accuracy or on greedy algorithms that risk converging to poor suboptimal solutions, facing a fundamental dilemma between modeling accuracy and optimization tractability. To overcome this dilemma, we propose a DC optimization-based method for designing an aggregation sampling operator for beyond-bandlimited graph signals that handles arbitrary signal priors assumed in the generalized sampling theory. Specifically, the intractable full-rank constraint is tightly relaxed using the nuclear norm, which reformulates the design problem as a DC optimization problem. We develop a solver based on the general double-proximal gradient DC algorithm, which theoretically guarantees convergence to a critical point. Experimental results on synthetic and real-world data demonstrate that our method outperforms existing approaches in sampling and recovering beyond-bandlimited graph signals.


[58] 2503.07414

Cost-Effective Design of Grid-tied Community Microgrid

This study aims to develop a cost-effective microgrid (MG) design that optimally balances economic feasibility, reliability, efficiency, and environmental impact in a grid-tied community MG. A multi-objective optimization framework is first employed to generate feasible MG configurations considering economic, reliability, efficiency, and environmental objectives. Subsequently, a preference-based deep reinforcement learning (DRL) framework is used to evaluate and select preferred configurations using a scalarized reward function. This combined approach enables systematic exploration of trade-offs among conflicting objectives and supports informed decision-making for community MG planning. Sensitivity analyses are conducted to evaluate system performance under varying load demand and renewable energy fluctuations. In addition, an economic sensitivity assessment examines the impact of electricity prices and capital costs on the levelized cost of energy (LCOE). The proposed MG configuration achieves high reliability, satisfying 100% of the load, even under adverse weather conditions. The proposed framework attains an efficiency of 91.99% while maintaining a carbon footprint of 302,747 kg/year, approximately 95% lower than the annual emissions associated with a conventional grid-supplied energy system. The economic analysis indicates a net present cost of $4.83M with a competitive LCOE of $0.208/kWh. The operating cost is $201,473 per year with a capital investment of $1.42M, making the design a financially viable alternative to conventional grid-dependent systems. This work can be valuable in identifying effective solutions for supplying reliable and cost-effective power to regional and remote areas.


[59] 2508.17774

Linear Power System Modeling and Analysis Across Wide Operating Ranges: A Hierarchical Neural State-Space Equation Approach

As modern power systems exhibit increasingly high-dimensional, nonlinear, and uncertain characteristics, the applicability of classical linear state-space methods is severely challenged. Existing paradigms struggle to reconcile the analytical transparency of physics-based models with the continuous nonlinear generalization of AI. To address this, the Hierarchical Neural State-Space Equation (HNSSE) framework is proposed. At the component level, the formulated Neural State-Space Equation (NSSE) extends neural ordinary differential equations (NODEs) to learn continuous dynamic manifolds across varying conditions while strictly preserving local analytical transparency. At the system level, a hierarchical architecture analytically fuses components via network constraints, constructing an interaction-consistent global NODE while circumventing the curse of dimensionality. To ensure robust convergence under noisy measurements, a training strategy synergizing spatiotemporal slicing, physics-informed curriculum learning, and Expectation-Maximization-based refinement is established. Validation on the large-scale Guangdong Power Grid demonstrates the framework's remarkable performance in interpretable state-space reconstruction, high-fidelity trajectory prediction, continuous stability perception, and noise robustness. Comprehensive comparisons substantiate HNSSE's superiority as a unified, interpretable paradigm for complex power system modeling.


[60] 2510.15347

Symmetric Entropy-Constrained Video Coding for Machines

As video transmission increasingly serves machine vision systems (MVS) instead of human vision systems (HVS), video coding for machines (VCM) has become a critical research topic. Existing VCM methods often bind codecs to specific downstream models, requiring retraining or supervised data, thus limiting generalization in multi-task scenarios. Recently, unified VCM frameworks have employed visual backbones (VB) and visual foundation models (VFM) to support multiple video understanding tasks with a single codec. They mainly utilize VB/VFM to maintain semantic consistency or suppress non-semantic information, but seldom explore how to directly link video coding with understanding under VB/VFM guidance. Hence, we propose a Symmetric Entropy-Constrained Video Coding framework for Machines (SEC-VCM). It establishes a symmetric alignment between the video codec and VB, allowing the codec to leverage VB's representation capabilities to preserve semantics and discard MVS-irrelevant information. Specifically, a bi-directional entropy-constraint (BiEC) mechanism ensures symmetry between the process of video decoding and VB encoding by suppressing conditional entropy. This helps the codec to explicitly handle semantic information beneficial to MVS while squeezing useless information. Furthermore, a semantic-pixel dual-path fusion (SPDF) module injects pixel-level priors into the final reconstruction. Through semantic-pixel fusion, it suppresses artifacts harmful to MVS and improves machine-oriented reconstruction quality. Experimental results on classical video understanding tasks and MLLM-based tasks show SOTA rate-task performance. It achieves significant bitrate savings over H.266/VVC reference software VTM on video instance segmentation (37.4%), video object segmentation (29.8%), object detection (46.2%), multiple object tracking (44.9%), and MLLM-based video grounding (97.6%).


[61] 2601.13962

Optimal Calibration of the Endpoint-corrected Hilbert Transform

Accurate, low-latency estimates of the instantaneous phase of oscillations are essential for closed-loop sensing and actuation, including (but not limited to) phase-locked neurostimulation and other real-time applications. The endpoint-corrected Hilbert transform (ecHT) reduces boundary artefacts of the Hilbert transform by applying a causal narrow-band filter to the analytic spectrum. This improves the phase estimate at the most recent sample. Despite its widespread empirical use, the systematic endpoint distortions of ecHT have lacked a principled, closed-form analysis. In this study, we derive the ecHT endpoint operator analytically and demonstrate that its output can be decomposed into a desired positive-frequency term (a deterministic complex gain that induces a calibratable amplitude/phase bias) and a residual leakage term that sets an irreducible variance floor. This yields (i) an explicit characterisation and bounds for endpoint phase/amplitude error, (ii) a mean-squared-error-optimal scalar calibration, and (iii) practical design rules relating window length, filter bandwidth and order, and centre-frequency mismatch to residual bias via an endpoint group delay. The resulting calibrated ecHT achieves near-zero mean phase error and remains computationally compatible with real-time pipelines. Code and analyses are provided at this https URL.
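
The split into a deterministic complex gain plus leakage can be illustrated with a compact ecHT sketch: FFT of the most recent window, construction of the analytic spectrum, multiplication by the frequency response of a causal filter, inverse FFT, and readout of the last sample. As an assumption for brevity, a first-order complex resonator centred at f0 stands in for the causal band-pass filter, and the calibration simply divides by the filter gain at a known oscillation frequency rather than applying an MSE-optimal scalar. With a whole number of cycles in the window there is no leakage, so the calibrated endpoint phase is exact while the raw one carries the bias induced by centre-frequency mismatch.

```python
import numpy as np


def resonator_response(freq_hz, fs, f0, r=0.9):
    """Frequency response of the causal filter H(z) = 1 / (1 - r e^{j w0} z^-1) (assumed stand-in)."""
    w = 2 * np.pi * np.asarray(freq_hz) / fs
    w0 = 2 * np.pi * f0 / fs
    return 1.0 / (1.0 - r * np.exp(1j * w0) * np.exp(-1j * w))


def echt_endpoint(x, fs, f0, r=0.9):
    """Endpoint-corrected Hilbert transform sketch: complex estimate at the last sample of x."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)  # analytic-signal weights: keep DC/Nyquist, double positive, zero negative bins
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    H = resonator_response(np.fft.fftfreq(N, d=1.0 / fs), fs, f0, r)
    return np.fft.ifft(X * h * H)[-1]


fs, N, f0 = 1000.0, 500, 10.0  # filter centred at 10 Hz
f_sig, phi = 12.0, 0.3         # actual oscillation at 12 Hz (6 whole cycles in the window)
n = np.arange(N)
x = np.cos(2 * np.pi * f_sig * n / fs + phi)
true_phase = 2 * np.pi * f_sig * (N - 1) / fs + phi

z_raw = echt_endpoint(x, fs, f0)
z_cal = z_raw / resonator_response(f_sig, fs, f0)  # divide out the deterministic gain
err_raw = np.angle(z_raw * np.exp(-1j * true_phase))
err_cal = np.angle(z_cal * np.exp(-1j * true_phase))
print(f"raw phase error {err_raw:+.3f} rad, calibrated {err_cal:+.1e} rad")
```

With a non-integer number of cycles, negative-frequency content leaks into positive bins and the calibrated error no longer vanishes; that residual is the leakage term that sets the variance floor.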


[62] 2602.03762

Conditional Flow Matching for Visually-Guided Acoustic Highlighting

Visually-guided acoustic highlighting seeks to rebalance audio in alignment with the accompanying video, creating a coherent audio-visual experience. While visual saliency and enhancement have been widely studied, acoustic highlighting remains underexplored, often leading to misalignment between visual and auditory focus. Existing approaches use discriminative models, which struggle with the inherent ambiguity in audio remixing, where no natural one-to-one mapping exists between poorly-balanced and well-balanced audio mixes. To address this limitation, we reframe this task as a generative problem and introduce a Conditional Flow Matching (CFM) framework. A key challenge in iterative flow-based generation is that early prediction errors -- in selecting the correct source to enhance -- compound over steps and push trajectories off-manifold. To address this, we introduce a rollout loss that penalizes drift at the final step, encouraging self-correcting trajectories and stabilizing long-range flow integration. We further propose a conditioning module that fuses audio and visual cues before vector field regression, enabling explicit cross-modal source selection. Extensive quantitative and qualitative evaluations show that our method consistently surpasses the previous state-of-the-art discriminative approach, establishing that visually-guided audio remixing is best addressed through generative modeling.


[63] 2602.24150

Channel Estimation for Beyond Diagonal RIS Exploiting Core Tensor Sparsity

Beyond diagonal reconfigurable intelligent surfaces (BD-RISs) enhance wave manipulation through inter-element couplings but pose significant channel estimation challenges due to cascaded channels and block-Kronecker structures. This paper proposes a compressive sensing framework exploiting the sparse Tucker decomposition of the measurement tensor and the Kronecker rank-one structure of channel components. Two algorithms are developed: Sparse Tensor Orthogonal Recovery Method (STORM), which uses orthogonal matching pursuit (OMP) for greedy support recovery, and Sparse Tensor Subspace-Aided Recovery (STAR), which leverages subspace-based projection to enhance robustness to noise. Both perform joint sparse support identification, followed by a Kronecker rank-one factorization via singular value decomposition (SVD) to recover the channel parameters. Simulations show that STAR achieves oracle-assisted least squares (LS) performance at moderate-to-high signal-to-noise ratio (SNR) with significantly fewer measurements than baseline methods, enabling practical BD-RIS deployment in next-generation millimeter wave (mmWave)/sub-terahertz (sub-THz) networks.
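
The final step shared by STORM and STAR, splitting a recovered Kronecker-structured vector into its two factors, reduces to a rank-one SVD because $\mathbf{a} \otimes \mathbf{b}$ reshaped row-major into an $m \times n$ matrix equals $\mathbf{a}\mathbf{b}^{\top}$. The sketch below shows only this step on random complex vectors (sizes and noise level are hypothetical); the sparse Tucker support recovery that precedes it is not reproduced. The factors are identifiable only up to a complex scale shared between them, so the check compares the re-formed Kronecker product rather than the factors themselves.

```python
import numpy as np


def kron_rank_one_factor(v, m, n):
    """Best rank-one Kronecker approximation v ~= kron(a, b) with len(a) = m, len(b) = n.
    Row-major reshape gives V[i, j] = a[i] * b[j], i.e. V = outer(a, b)."""
    V = v.reshape(m, n)
    U, s, Vh = np.linalg.svd(V)
    a_hat = np.sqrt(s[0]) * U[:, 0]
    b_hat = np.sqrt(s[0]) * Vh[0, :]  # rows of Vh are already conjugated right singular vectors
    return a_hat, b_hat


rng = np.random.default_rng(0)
m, n = 4, 6
a = rng.standard_normal(m) + 1j * rng.standard_normal(m)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = np.kron(a, b)

a_hat, b_hat = kron_rank_one_factor(v, m, n)
print(np.allclose(np.kron(a_hat, b_hat), v))  # True: exact for a noiseless Kronecker vector

noisy = v + 0.01 * (rng.standard_normal(m * n) + 1j * rng.standard_normal(m * n))
a_n, b_n = kron_rank_one_factor(noisy, m, n)
rel_err = np.linalg.norm(np.kron(a_n, b_n) - v) / np.linalg.norm(v)
print(f"relative error from noisy input: {rel_err:.3f}")
```

Truncating to the dominant singular pair also discards the noise energy outside the rank-one subspace, which is why the factorization doubles as a mild denoiser.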


[64] 2603.22107

Sample-based detectability and moving horizon state estimation of continuous-time systems

In this paper we propose a detectability condition for nonlinear continuous-time systems with irregular/infrequent output measurements, namely a sample-based version of incremental integral input/output-to-state stability (i-iIOSS). We provide a sufficient condition for an i-iIOSS system to be sample-based i-iIOSS. This condition is also exploited to analyze the relationship between sample-based i-iIOSS and sample-based observability for linear systems, such that previously established sampling strategies for linear systems can be used to guarantee sample-based i-iIOSS. Furthermore, we present a sample-based moving horizon estimation scheme, for which robust stability can be shown. Finally, we illustrate the applicability of the proposed estimation scheme through a biomedical simulation example.


[65] 2604.08330

Group-invariant moments under tomographic projections

Let $f:\mathbb{R}^n\to\mathbb{R}$ be an unknown object, and suppose the observations are tomographic projections of randomly rotated copies of $f$ of the form $Y = P(R\cdot f)$, where $R$ is Haar-uniform in $\mathrm{SO}(n)$ and $P$ is the projection onto an $m$-dimensional subspace, so that $Y:\mathbb{R}^m\to\mathbb{R}$. We prove that, whenever $d\le m$, the $d$-th order moment of the projected data determines the full $d$-th order Haar-orbit moment of $f$, independently of the ambient dimension $n$. We further provide an explicit algorithmic procedure for recovering the latter from the former. As a consequence, any identifiability result for the unprojected model based on the $d$-th order group-invariant moment extends directly to the tomographic setting at the same moment order. In particular, for $n=3$, $m=2$, and $d=2$, our result recovers a classical result in the cryo-EM literature: the covariance of the 2D projection images determines the second order rotationally invariant moment of the underlying 3D object.


[66] 2604.26136

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientific communication. In this paper, we address this challenge through our system submission to the International Conference on Spoken Language Translation (IWSLT 2026), the Cross-Lingual Voice Cloning shared task. First, we evaluate several state-of-the-art voice cloning models for cross-lingual speech generation of scientific texts in Arabic, Chinese, and French. Then, we build voice cloning systems based on the OmniVoice foundation model. We employ data augmentation via multi-model ensemble distillation from the ACL 60/60 corpus. We investigate the effect of using this synthetic data for fine-tuning, demonstrating improvements in intelligibility (WER & CER) and speaker similarity (SIM), with gains varying across languages.
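WER and CER are both normalised edit distances. WER is computed over words and CER over characters. Counting characters makes CER the natural metric for unsegmented scripts such as Chinese. The minimal sketch below applies no text normalisation (casing, punctuation, tokenisation). Practical evaluations usually perform such normalisation, and it affects the reported numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,               # deletion
                          curr[j - 1] + 1,           # insertion
                          prev[j - 1] + (r != h))    # substitution or match
        prev = curr
    return prev[-1]

def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance divided by the reference length."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edit distance (spaces included) over reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

print(wer("the model learns fast", "the model learn fast"))  # one substitution in four words
print(cer("kitten", "sitting"))                              # three edits over six characters
```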


[67] 2605.10152

Online Learning-Based Control with Guaranteed Error Bounds for a Class of Nonlinear Systems

In this paper, we present a learning-based control for a class of nonlinear systems that guarantees exponential stability as well as bounded output errors. The control is based on the Gaussian Process Submodel Online Learning (GPSOL) algorithm and the Disturbance Error Rate Limiting (DERL) algorithm, both of which were developed in previous work. The GPSOL algorithm provides a method to learn Gaussian Process (GP) models for subsystems online, whereas the DERL algorithm makes it possible to limit the rate of the prediction error of these GP models. The focus of this paper is the utilization of the GP model within an adaptive controller and the derivation of corresponding stability conditions and system peak-to-peak gains by means of linear matrix inequalities (LMIs). These peak-to-peak gains are then used to prescribe a desired prediction error rate for the DERL algorithm to achieve user-defined output error bounds. The gains and the related bounds were successfully verified using a simulation model. Furthermore, results from a successful experimental validation of the bounds and the overall control structure on a pneumatic test rig are presented. While the control scheme and error bounds proposed in this paper are limited to first-order single-input-single-output systems, an extension to certain classes of higher-order and multiple-input-multiple-output systems is expected to be forthcoming.
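The abstract does not specify GPSOL and DERL. The sketch below therefore shows only the standard GP regression step that GP-based online learners build on: the posterior mean and variance of a zero-mean GP with a squared-exponential kernel, using hypothetical training data. It has no submodel structure, online update, or error-rate limiting.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-4, lengthscale=1.0, variance=1.0):
    """Posterior mean and variance of a zero-mean GP at the points x_test."""
    K = rbf_kernel(x_train, x_train, lengthscale, variance) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, lengthscale, variance)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)                    # k(x*, x*) - k_*^T K^{-1} k_*
    return mean, var

x_train = np.linspace(-2.0, 2.0, 9)
y_train = np.sin(x_train)
mean, var = gp_posterior(x_train, y_train, np.array([0.25, 5.0]))
```

The predictive variance is small inside the training range and returns to the prior variance far from the data (here at $x = 5$).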


[68] 2605.23354

Physics-informed sparse identification-based tube model predictive control for aerial vehicles

Autonomous aerial vehicles necessitate control strategies that balance computational efficiency with robust performance in dynamic operational environments. This paper proposes a model predictive control (MPC) framework for aerial platforms that leverages physics-informed machine learning (PIML) to achieve an optimal balance between computational tractability and robust performance. At the core of the proposed approach lies a sparse, control-affine model identified via the PIML method, which provides a parsimonious yet interpretable representation of the system dynamics by embedding first-principles knowledge and learning residual uncertainties from operational data. This model is incorporated within a robust MPC scheme that adopts a high-order Runge-Kutta discretization to ensure prediction accuracy and an adaptive tube-based mechanism to guarantee constraint satisfaction under uncertainty. The online adaptation of the tube, directly informed by the residual error of the PIML model, ensures robust stability without introducing excessive conservatism. Rigorous theoretical proofs are provided to establish recursive feasibility and stability. Numerical simulations and experiments on a quadrotor demonstrate that our method significantly reduces computational load compared to nonlinear MPC and robust MPC using a first-principles high-fidelity model, while outperforming PID, nonlinear MPC, neural-network-based MPC, and fixed-tube robust MPC in tracking performance and robustness, showcasing the practical efficiency of the proposed PIML-based control synthesis for resource-constrained aerial systems.
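The abstract does not detail the PIML identification step. As a stand-in, the sketch below implements sequentially thresholded least squares, the sparse-regression core shared by SINDy-style methods. It assumes a hypothetical scalar control-affine system $\dot x = -2x + 3u$ and a small candidate library. A physics-informed variant would fix the known first-principles terms and regress only the residual; that step is omitted here.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: sparse xi with dxdt ≈ theta @ xi."""
    xi, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
    for _ in range(n_iter):
        small = np.abs(xi) < threshold           # prune small coefficients
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):           # refit each state on the surviving terms
            big = ~small[:, k]
            if big.any():
                xi[big, k], *_ = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)
    return xi

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
u = rng.uniform(-1.0, 1.0, 200)
theta = np.column_stack([np.ones_like(x), x, x**2, u, x * u])   # candidate terms [1, x, x^2, u, xu]
dxdt = (-2.0 * x + 3.0 * u)[:, None]                             # hypothetical true dynamics
xi = stlsq(theta, dxdt, threshold=0.1)
```

The recovered coefficient vector keeps only the $x$ and $u$ terms, which is the control-affine structure an MPC prediction model can exploit.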


[69] 2606.14223

Toward Deeper Environmental Understanding: Event-Level Sensing for Intelligent 6G ISAC

The intelligent evolution of mission-critical networks, such as the Internet of vehicles (IoV) and the low-altitude economy (LAE), requires sixth-generation (6G) networks to move beyond discrete physical parameter estimation toward deeper environmental understanding. However, existing integrated sensing and communications (ISAC) studies mainly focus on target-level sensing, which provides fragmented snapshots of the physical world and lacks the behavioral semantic capability to interpret intent. This limitation hinders the intelligent evolution of such networks and prevents 6G from acquiring the essential sensing foundation to evolve into an "intelligent service engine". To bridge this gap, ISAC must advance toward event-level sensing, which models continuous-time states to enable persistent recognition and prediction of target intent and behavioral semantics. This article presents a comprehensive overview of event-level sensing in 6G ISAC networks. We first introduce its fundamental concepts, sensing types, and representative scenarios. We then review key enabling techniques across waveform design, target state estimation and tracking, and event recognition. Furthermore, focusing on IoV and LAE scenarios, we discuss representative applications of ISAC event-level sensing and the intelligent enhancement of downstream operational functions enabled by event-level information. Finally, we highlight future research trends and potential directions to further advance ISAC event-level sensing toward intelligent and proactive 6G networks.


[70] 2606.17594

Low-Thrust Orbital Differential Games with Speed Constraint Enforcement Using Cost Weighting

This paper considers the problem of a low-thrust spacecraft pursuit-evasion differential game with an arbitrary terminal relative speed constraint. It addresses the terminal phase of the engagement for two relatively close spacecraft near a circular orbit. The problem is formulated as a linear-quadratic zero-sum differential game, with soft constraints on the terminal relative position and velocity, and running costs on the players' control efforts. An analytical, closed-loop, minimum-fuel-consumption optimal guidance law is derived for each player, forming a saddle-point solution. It is proven that any terminal speed can be achieved by properly choosing the weighting parameters of the cost function. To verify the optimality of the solution, a conjugate point analysis is performed when the cost function velocity weighting matrix is either positive or negative definite. The negative-definite case arises at high terminal speeds and is seldom seen in the literature. The performance of the derived guidance law is evaluated in simulations for different target maneuvers and compared to a state-of-the-art optimal-control-based guidance law. The simulations show that the derived guidance law satisfies the constraints and offers a substantial advantage over the optimal-control-based guidance law when the target is optimally evading.


[71] 2606.25452

Control Barrier Function only Formation Tracking in Multi-Agent Systems

This paper presents a real-time control framework for formation tracking of heterogeneous multi-agent systems with non-linear dynamics. The proposed method formulates a single Control Barrier Function-like constraint within a quadratic optimization setting that addresses formation tracking. Relying on the relative information of neighboring agents, the controller is designed to operate without the need for manual parameter tuning or a separate nominal formation controller. The leader-follower framework is validated through simulations of moving formations.
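The abstract does not give the paper's constraint. The sketch below only illustrates how a single CBF-like constraint can drive formation tracking, under several assumptions:

- a hypothetical single-integrator follower ($\dot x = u$);
- relative information from its leader (position $x_L$ and velocity);
- a desired offset $d$;
- a quadratic barrier $h = r^2 - \|x - (x_L + d)\|^2$.

Minimising $\|u\|^2$ subject to $\dot h \ge -\alpha h$ is a QP with one affine constraint. It therefore has a closed-form solution, and no nominal formation controller appears in the objective.

```python
import numpy as np

def cbf_formation_input(x, leader_pos, leader_vel, offset, radius=0.5, alpha=2.0):
    """Minimum-norm input u for a single-integrator follower (x_dot = u) subject to
    h_dot >= -alpha * h, with h = radius^2 - ||x - (leader_pos + offset)||^2."""
    e = x - (leader_pos + offset)            # formation error (relative information only)
    h = radius ** 2 - e @ e                  # barrier-like function, >= 0 inside the tube
    # h_dot = -2 e^T (u - leader_vel), so the constraint reads a^T u >= b
    a = -2.0 * e
    b = -alpha * h - 2.0 * e @ leader_vel
    if b <= 0.0:
        return np.zeros_like(x)              # u = 0 already satisfies the constraint
    return (b / (a @ a)) * a                 # projection of u = 0 onto the half-space

u = cbf_formation_input(np.array([2.0, 0.0]), leader_pos=np.zeros(2),
                        leader_vel=np.zeros(2), offset=np.array([1.0, 0.0]))
```

A follower outside its tube receives the smallest input that pushes it back toward its slot. Inside the tube, the constraint is inactive.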


[72] 2507.00853

Ranking Quantilized Mean-Field Games with an Application to Early-Stage Venture Investments

Quantilized mean-field game models involve quantiles of the population's distribution. We study a class of such games that can model ranking competitions, in which the performance of each agent is evaluated based on its terminal state relative to the population's $\alpha$-quantile value, $\alpha \in (0,1)$. This evaluation criterion is designed to select the top $100(1-\alpha)\%$ performing agents. We provide two formulations for this competition: a target-based formulation and a threshold-based formulation. To satisfy the selection condition, each agent aims for its terminal state to be \textit{exactly} equal to the population's $\alpha$-quantile value in the target-based formulation, and \textit{at least} equal to it in the threshold-based formulation. For the target-based formulation, we obtain an analytic solution and demonstrate the $\epsilon$-Nash property for the asymptotic best-response strategies in the $N$-player game. Specifically, the quantilized mean-field consistency condition is expressed as a set of forward-backward ordinary differential equations, characterizing the $\alpha$-quantile value at equilibrium. For the threshold-based formulation, we obtain a semi-explicit solution and numerically solve the resulting quantilized mean-field consistency condition. Subsequently, we propose a new application in the context of early-stage venture investments, where a venture capital firm financially supports a group of start-up companies engaged in a competition over a finite time horizon, with the goal of selecting a percentage of the top-ranking companies to receive the next round of funding at the end of the time horizon. We present the results and interpretations of a set of numerical experiments for both formulations discussed in this context, which illustrate that the target-based formulation closely approximates the threshold-based formulation in the scenarios considered.
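In an $N$-player simulation, the population's $\alpha$-quantile can be replaced by its empirical counterpart. The minimal sketch below implements the threshold-based selection rule on hypothetical terminal states. NumPy's default quantile uses linear interpolation, which is only one of several empirical quantile definitions.

```python
import numpy as np

def select_top(terminal_states, alpha):
    """Empirical alpha-quantile of the terminal states and a mask of the agents
    whose terminal state is at least that value (threshold-based selection)."""
    q_alpha = np.quantile(terminal_states, alpha)
    return q_alpha, terminal_states >= q_alpha

states = np.arange(1.0, 11.0)             # ten hypothetical start-ups
q, selected = select_top(states, alpha=0.8)
```

With $\alpha = 0.8$, the rule selects the top $20\%$ of the ten agents.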


[73] 2509.15622

Deep Regularized RNNs for Virtual Analog Modeling

Virtual analog (VA) modeling methods seek to emulate analog audio hardware using digital signal processing (DSP). Modeling approaches fall into three broad categories: white-box methods, which use detailed device knowledge for accurate simulation; gray-box methods, which use generic DSP blocks to model the system; and black-box methods, which rely solely on opaque models learned from input-output data. Recurrent neural networks (RNNs) are a category of architectures widely used in black-box modeling. To model device controls, the control values can be provided as conditioning input to the network. However, when the conditioning is time-varying, the models are susceptible to producing noise artifacts. Regularization of the RNN dynamics significantly reduces these artifacts, though at a loss in modeling accuracy. This paper closes the dynamics regularization quality gap by introducing deep control-conditioned LSTMs and a gammatone filterbank (GFB) loss. Experiments indicate that the proposed method achieves modeling performance comparable to unregularized baselines while avoiding the noise artifacts caused by time-varying control inputs.
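The abstract does not define the GFB loss. The sketch below is a hypothetical version, not the authors' implementation. Both signals pass through a bank of FIR-truncated 4th-order gammatone filters whose bandwidths follow the Glasberg and Moore equivalent rectangular bandwidth (ERB). The loss then averages the absolute difference of the band outputs. For training, the same computation would be written in an automatic-differentiation framework.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore), fc in Hz."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, order=4, duration=0.05):
    """FIR truncation of a gammatone impulse response centred at fc, unit energy."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def gfb_loss(target, pred, fs, centre_freqs):
    """Mean absolute difference between gammatone band outputs, averaged over bands."""
    loss = 0.0
    for fc in centre_freqs:
        g = gammatone_ir(fc, fs)
        diff = np.convolve(target, g, mode="full") - np.convolve(pred, g, mode="full")
        loss += np.mean(np.abs(diff))
    return loss / len(centre_freqs)

fs = 16000
t = np.arange(1600) / fs
target = np.sin(2 * np.pi * 440.0 * t)
pred = target + 0.1 * np.sin(2 * np.pi * 3000.0 * t)   # hypothetical model output with an artifact
centres = np.geomspace(100.0, 6000.0, 16)
loss_same = gfb_loss(target, target, fs, centres)       # 0.0 for identical signals
loss_diff = gfb_loss(target, pred, fs, centres)
```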


[74] 2510.22517

Data-driven Sensor Placement for Predictive Applications: A Correlation-Assisted Attribution Framework (CAAF)

Optimal sensor placement (OSP) is critical for efficient, accurate monitoring, control, and inference in complex physical systems. We propose a machine-learning-based feature attribution (FA) framework to identify OSP for target predictions. FA quantifies input contributions to a model output; however, it struggles with highly correlated input data often encountered in practical applications for OSP. To address this, we propose a Correlation-Assisted Attribution Framework (CAAF), which introduces a clustering step on the candidate sensor locations before performing FA to reduce redundancy and enhance generalizability. We first illustrate the core principles of the proposed framework through a series of validation cases, then demonstrate its effectiveness in realistic dynamical systems such as structural health monitoring, airfoil lift prediction, and wall-normal velocity estimation for turbulent channel flow. The results show that the CAAF outperforms alternative approaches that typically struggle due to the presence of nonlinear dynamics, chaotic behavior, and multi-scale interactions, and enables the effective application of FA for identifying OSP in real-world environments.
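The abstract does not specify CAAF's clustering method. The sketch below substitutes a simple greedy rule. Candidate sensor signals (columns of a hypothetical snapshot matrix) are grouped whenever their absolute Pearson correlation with a cluster representative exceeds a threshold. Feature attribution would then run only on the representatives.

```python
import numpy as np

def cluster_correlated_sensors(X, threshold=0.9):
    """Greedily group columns of X whose |Pearson correlation| with a cluster
    representative is at least `threshold`; the first index of each cluster is
    its representative."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(X.shape[1]))
    clusters = []
    while unassigned:
        rep = unassigned.pop(0)
        members = [rep] + [j for j in unassigned if corr[rep, j] >= threshold]
        unassigned = [j for j in unassigned if j not in members]
        clusters.append(members)
    return clusters

rng = np.random.default_rng(0)
s = rng.standard_normal((500, 2))
X = np.column_stack([s[:, 0], 2.0 * s[:, 0] + 0.01 * rng.standard_normal(500),
                     s[:, 1], -s[:, 1]])               # two redundant pairs of sensors
clusters = cluster_correlated_sensors(X)
representatives = [c[0] for c in clusters]            # columns passed on to attribution
```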


[75] 2601.01084

A UAV-Based Multispectral and RGB Dataset for Multi-Stage Paddy Crop Monitoring in Indian Agricultural Fields

We present a large-scale unmanned aerial vehicle (UAV)-based RGB and multispectral image dataset collected over paddy fields in the Vijayawada region, Andhra Pradesh, India, covering nursery to harvesting stages. We used a 20-megapixel RGB camera and a 5-megapixel four-band multispectral camera capturing red, green, red-edge, and near-infrared bands. A standardised operating procedure (SOP) and checklists were developed to ensure repeatable data acquisition. Our dataset comprises 42,430 raw images (415 GB) captured over 5 acres with 1 cm/pixel ground sampling distance (GSD), with associated metadata such as GPS coordinates, flight altitude, and environmental conditions. Captured images were validated using Pix4D Fields to generate orthomosaic maps and vegetation index maps, such as normalised difference vegetation index (NDVI) and normalised difference red-edge (NDRE) index maps. Our dataset is one of the few datasets that provide high-resolution images with rich metadata covering all growth stages of Indian paddy crops. The dataset is available on IEEE DataPort with DOI, . It can support studies on targeted spraying, disease analysis, and yield estimation.
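NDVI and NDRE are normalised differences of band reflectances: $\mathrm{NDVI} = (\mathrm{NIR}-\mathrm{Red})/(\mathrm{NIR}+\mathrm{Red})$ and $\mathrm{NDRE} = (\mathrm{NIR}-\mathrm{RedEdge})/(\mathrm{NIR}+\mathrm{RedEdge})$. The per-pixel sketch below uses hypothetical reflectance values, for example from calibrated multispectral bands. The small epsilon that guards against division by zero is an implementation choice.

```python
import numpy as np

def normalised_difference(a, b, eps=1e-9):
    """Per-pixel (a - b) / (a + b); eps guards against division by zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b + eps)

def ndvi(nir, red):
    return normalised_difference(nir, red)

def ndre(nir, red_edge):
    return normalised_difference(nir, red_edge)

nir = np.array([[0.50, 0.40], [0.30, 0.20]])        # hypothetical reflectance rasters
red = np.array([[0.10, 0.10], [0.10, 0.20]])
red_edge = np.array([[0.30, 0.25], [0.20, 0.15]])
ndvi_map = ndvi(nir, red)
ndre_map = ndre(nir, red_edge)
```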


[76] 2601.08987

ABE-VVS: Attribute-Based Encrypted Volumetric Video Streaming

This work introduces ABE-VVS, a framework that performs attribute-based selective coordinate encryption for point-cloud-based volumetric video streaming, enabling lightweight yet effective digital rights management (DRM). Rather than encrypting entire point cloud frames, our approach encrypts only selected subsets of coordinates ($X, Y, Z$, or combinations), lowering computational overhead and latency while still producing strong visual distortion that prevents meaningful unauthorized viewing. Our experiments show that encrypting only the $X$ coordinates achieves effective obfuscation while reducing encryption and decryption times by up to 50% and 80%, respectively, compared to full-frame encryption. To our knowledge, this is the first end-to-end evaluation of a DRM-enabled secure point cloud streaming system. We deployed a point cloud video streaming setup on the CloudLab testbed and evaluated three HTTP-based Attribute-Based Encryption (ABE) granularities, namely ABE-XYZ (encrypting all $X,Y,Z$ coordinates), ABE-XY, and ABE-X, against conventional HTTPS/TLS secure streaming as well as an HTTP-only baseline without any security. Our streaming evaluation demonstrates that ABE-based schemes reduce server-side CPU load by up to 80% and cache CPU load by up to 63%, bringing them to levels comparable to HTTP-only, while maintaining similar cache hit rates. Moreover, ABE-XYZ and ABE-XY exhibit lower client-side rebuffering than HTTPS, and ABE-X achieves zero rebuffering, comparable to HTTP-only. Although ABE-VVS increases client-side CPU usage, the overhead is not large enough to affect streaming quality and is offset by its broader benefits, including simplified key revocation, elimination of per-client encryption, and reduced server and cache load.
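In ABE-VVS, the selected coordinates are protected with attribute-based encryption. ABE is not available in the Python standard library, so the sketch below substitutes a toy XOR keystream derived from SHAKE-256. The substitute is insecure (for instance, the keystream repeats for every frame). It only illustrates *which* data is transformed when only $X$ is encrypted. The helper names are hypothetical. The code works on the raw 32-bit patterns of float32 coordinates, so encryption followed by decryption restores every bit exactly.

```python
import hashlib
import numpy as np

def _keystream_u32(key: bytes, label: bytes, n: int) -> np.ndarray:
    """n pseudo-random 32-bit words from SHAKE-256 (toy stand-in, NOT ABE, not a vetted cipher)."""
    return np.frombuffer(hashlib.shake_256(key + label).digest(4 * n), dtype=np.uint32)

def xor_selected_axes(points, axes, key: bytes) -> np.ndarray:
    """XOR the bit patterns of the chosen coordinate columns (0 = X, 1 = Y, 2 = Z) of a
    float32 (N, 3) point array; applying it twice with the same key restores the input."""
    bits = np.ascontiguousarray(points, dtype=np.float32).view(np.uint32).copy()
    for axis in axes:
        bits[:, axis] ^= _keystream_u32(key, bytes([axis]), bits.shape[0])
    return bits.view(np.float32)

rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, (100, 3)).astype(np.float32)   # hypothetical point cloud frame
enc = xor_selected_axes(pts, axes=(0,), key=b"demo-key")      # scramble X only
dec = xor_selected_axes(enc, axes=(0,), key=b"demo-key")      # same key restores X
```

Because Y and Z pass through untouched, the work scales with the number of encrypted coordinates. This is the intuition behind the reported savings of the ABE-X granularity.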


[77] 2603.17376

A Cycle-Based Solvability Condition for Real Power Flow Equations

Certifying power flow solvability is important for reliable power system operations under volatile operating conditions, but solving power flow equations repeatedly can be costly and may encounter convergence issues. In this paper, we develop an explicit cycle-based solvability condition for the lossless real power flow equations on meshed networks. We decompose every feasible nodal balance solution into a particular flow plus a cycle flow correction vector. The power flow problem is then reduced to enforcing edge-wise feasibility and cycle consistency. We show that the cycle consistency function is strongly monotone and is the gradient of a strongly convex energy function. By exploiting these properties, we derive an explicit condition for the existence and uniqueness of a power flow solution with bounded angle difference. The resulting condition is invariant under the choice of cycle basis and can be verified through simple algebraic computations. Numerical results on standard test systems show that the proposed condition is significantly less conservative than existing sufficient conditions and closely approximates true loading limits.
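Plain linear algebra can illustrate how a nodal balance solution splits into a particular flow plus a cycle-flow correction. The minimal sketch below assumes a hypothetical 4-bus meshed network and injections $p$ that sum to zero. The columns of $C$ span the null space of the node-edge incidence matrix, so every solution of the nodal balance equations has the form $f_0 + Cc$. In the lossless model, the flow on line $(i,j)$ is $b_{ij}\sin(\theta_i-\theta_j)$. The paper's edge-wise feasibility and cycle-consistency conditions, which determine $c$, are not reproduced here.

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """Node-edge incidence matrix A (n_nodes x n_edges): +1 at the tail, -1 at the head."""
    A = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        A[i, k], A[j, k] = 1.0, -1.0
    return A

def flow_decomposition(A, p):
    """Particular flow f0 with A f0 = p, and a basis C of cycle flows (A C = 0)."""
    f0, *_ = np.linalg.lstsq(A, p, rcond=None)
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10 * s.max()))
    C = Vt[rank:].T                  # columns span the null space of A (the cycle space)
    return f0, C

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # hypothetical meshed 4-bus network
A = incidence_matrix(4, edges)
p = np.array([1.0, -0.5, 0.3, -0.8])              # nodal injections summing to zero
f0, C = flow_decomposition(A, p)
c = np.array([0.2, -0.4])                          # arbitrary cycle-flow coordinates
f = f0 + C @ c                                     # still balances every node
```

For a connected network with $n$ buses and $m$ lines, the cycle space has dimension $m - n + 1$, which is 2 in this example.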