New articles on Electrical Engineering and Systems Science


[1] 2604.25937

SongBench: A Fine-Grained Multi-Aspect Benchmark for Song Quality Assessment

Recent advancements in Text-to-Song generation have enabled realistic musical content production, yet existing evaluation benchmarks lack the professional granularity to capture multi-dimensional aesthetic nuances. In this paper, we propose SongBench, a specialized framework for fine-grained song assessment across seven key dimensions: Vocal, Instrument, Melody, Structure, Arrangement, Mixing, and Musicality. Utilizing this framework, we construct an expert-annotated database comprising 11,717 samples from state-of-the-art models, labeled by music professionals. Extensive experimental results demonstrate that SongBench achieves high correlation with expert ratings. By revealing fine-grained performance gaps in current state-of-the-art models, SongBench serves as a diagnostic benchmark to steer the development toward more professional and musically coherent song generation.


[2] 2604.25945

Planar Gaussian Splatting with Bilinear Spatial Transformer for Wireless Radiance Field Reconstruction

Wireless radiance field (WRF) reconstruction aims to learn a continuous, queryable representation of radio frequency characteristics over 3D space and direction, from which specific quantities, such as the spatial power spectrum (SPS) at a receiver given a transmitter position, can be predicted. While Gaussian splatting (GS)-based methods have surpassed Neural Radiance Fields (NeRF)-based methods for this task, existing adaptations largely transplant vision pipelines, limiting physical interpretability and accuracy. We introduce BiSplat-WRF, a planar GS framework that retains the expressiveness of 3D GS while removing unnecessary projections and incorporating global EM coupling and mutual scattering among primitives. Each primitive is a 2D planar Gaussian with 3D coordinates, rendered directly on the angular domain of the SPS. A bilinear spatial transformer (BST) aggregates inter-primitive relations on an angular grid and, via attention, captures long-range electromagnetic dependencies, thereby enforcing globally aware EM interactions that reflect the complex physics of the wireless environment. On the spatial spectrum synthesis task, BiSplat-WRF surpasses NeRF-based and prior GS-based baselines with respect to the Structural Similarity Index (SSIM); comprehensive ablation studies validate the contribution of the BST. We also provide a larger BiSplat-WRF+ variant that further increases SSIM at a higher computation cost, serving as a strong reference for future studies.


[3] 2604.26017

Optimal-Control Suggestion for Congestion on Freeways using Data Assimilation of Distributed Fiber-Optic Sensing

This paper presents an optimal-control suggestion method for freeway congestion using data assimilation (DA) of distributed fiber-optic sensing (DFOS). To simultaneously maximize throughput and avoid or mitigate congestion, active transportation and demand management (ATDM) must execute the control that is optimal for the current traffic state, as determined by multi-objective optimization with real-time monitoring data. However, the optimal control cannot be estimated from the intermittent data obtained by conventional sensors. To solve this issue, this paper proposes ATDM optimal-control estimation with DA of DFOS, which can monitor traffic flow in real time without dead zones. Our real-time DA method enables us to estimate the effectiveness of control scenarios by simulation. This paper also provides a method to uniquely determine the optimal-control solution among the Pareto solutions of the multi-objective optimization. Throughput and mean speed across the entire road are considered as the objective functions. Variable speed limit (VSL) and inflow control are taken as ATDM examples. Validation results on a Japanese freeway show that (i) the optimal control scenario varies depending on the traffic state, especially the congestion level; (ii) optimal control considering VSL alone improves throughput by 5-14%, while the improvement rate for mean speed is 0-8%; (iii) throughput and mean speed are improved by 10-15% and 20-30%, respectively, when both VSL and inflow control are considered. The results also imply the importance of balanced management of lane occupancy and of proactive optimal control before congestion occurs.


[4] 2604.26050

Risk Assessments for Evasive Emergency Maneuvers in Autonomous Vehicles

This paper presents a systematic verification and validation (V&V) framework for the Evasive Minimum Risk Maneuver (EMRM) feature in autonomous vehicles, addressing a critical gap in existing safety assessment methods. We introduce the first formally integrated pipeline that unifies Hazard Analysis and Risk Assessment (HARA), System-Theoretic Process Analysis (STPA), and Finite State Machine (FSM) modeling into a single traceable workflow specifically designed for EMRM V&V. HARA and STPA are combined through a structured hazard-loss mapping to identify hazards and unsafe control actions; an FSM layer captures hazard-to-loss state transitions that neither method models individually; and the unified framework drives automated scenario generation with measurable parameter-space coverage. Applied to a T-junction EMRM case study, the framework guides 1,880 RRT-based simulations spanning ego speed, time-to-collision (TTC), and road friction, uncovering a key physical result: the T-junction geometry gives nearly equal difficulty to stopping and to navigating, so the intermediate mitigation mode occupies only 1.9% of the feasible parameter space. EMRM steering strategies achieve an 81% collision-avoidance rate and reduce mean residual impact speed from 18.9 km/h to 9.0 km/h compared with emergency braking alone, while the framework attains 100% hazard, UCA, and parameter-space coverage versus ≤1% for traditional methods. These results demonstrate that the integrated HARA-STPA-FSM framework enables high-resolution, traceable EMRM V&V that is not achievable with any single method in isolation.


[5] 2604.26057

Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection

Supervised contrastive learning (SupCon) is widely used to shape representations, but has seen limited targeted study for audio deepfake detection. Existing work typically combines contrastive terms with broader pipelines; however, the focus on SupCon itself is missing. In this work, we run a controlled study on wav2vec2 XLS-R (300M) that varies (i) similarity in SupCon (cosine vs angular similarity derived from the hyperspherical angle) and (ii) negative scaling using a warm-started global cross-batch queue. Stage 1 fine-tunes the encoder and projection head with SupCon; Stage 2 freezes them and trains a linear classifier with BCE. Trained on ASVspoof 2019 LA and evaluated on ASV19 eval plus ITW and ASVspoof 2021 DF/LA, Cosine SupCon with a delayed queue achieves the best ITW EER (8.29%) and pooled EER (4.44), while angular similarity performs strongly without queued negatives (ITW 8.70), indicating reduced reliance on large negative sets.
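As an illustration of the two similarity choices named in the abstract above, the following minimal sketch contrasts cosine similarity with an angle-based similarity. The specific mapping `1 - theta / pi` is an assumption made here for illustration; the abstract does not state the exact form the authors use, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def angular_similarity(u, v):
    """Similarity that is linear in the angle between u and v.

    Assumed mapping (not taken from the paper): 1 - theta / pi, so that
    identical directions give 1, orthogonal 0.5, and opposite 0.
    """
    # Clip guards against values slightly outside [-1, 1] from rounding.
    cos = np.clip(cosine_similarity(u, v), -1.0, 1.0)
    theta = np.arccos(cos)
    return float(1.0 - theta / np.pi)
```

Unlike the cosine, the angular form changes at a constant rate with the angle, so its gradient does not vanish as two embeddings approach the same direction.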


[6] 2604.26126

Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems

This paper proposes a deep reinforcement learning (DRL)-based event-triggered controller design for networked artificial pancreas (AP) systems. Although existing DRL-based AP controllers typically assume periodic control updates, networked control systems (NCSs) require a reduction in communication frequency to achieve energy-efficient operation, which is directly tied to control updates. However, jointly learning both insulin dosing and update timing significantly increases the complexity of the learning problem. To alleviate this complexity, we develop a practical DRL-based controller design that avoids explicitly learning update timing by introducing a rule-based criterion defined by changes in blood glucose. As a result, decision-making occurs at irregular intervals, and the problem is naturally formulated as a semi-Markov decision process (SMDP), for which we extend a standard DRL algorithm. Numerical experiments demonstrate that the proposed method improves communication efficiency while maintaining control performance.
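To make the idea of a rule-based trigger concrete, here is a minimal sketch in which a control update is issued only when blood glucose has moved by at least a threshold since the last update. The threshold value, units, and function name are hypothetical; the abstract does not specify the authors' actual criterion.

```python
def event_triggered_updates(glucose, threshold=10.0):
    """Return the sample indices at which a control update would be sent.

    Assumed rule (illustrative only): update when the absolute change in
    blood glucose since the last update reaches `threshold`.
    """
    if not glucose:
        return []
    updates = [0]          # always act on the first measurement
    last = glucose[0]      # glucose value at the most recent update
    for k, g in enumerate(glucose[1:], start=1):
        if abs(g - last) >= threshold:
            updates.append(k)
            last = g
    return updates


readings = [100, 104, 109, 111, 112, 125, 126]
update_steps = event_triggered_updates(readings)
# Gaps between consecutive updates are irregular, which is why a
# semi-Markov formulation is a natural fit.
sojourn_times = [b - a for a, b in zip(update_steps, update_steps[1:])]
```

In this toy trace only three of the seven samples trigger an update, and the time between decisions varies from one decision to the next.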


[7] 2604.26132

Sparse Graph Learning from Sparse Data via Fiedler Number Maximization

We aim to learn a sparse and connected graph from sparse data, where the number of observations K can be substantially smaller than the signal dimension N for signals x in R^N, and the underlying distribution is unknown. In this severely ill-posed setting, we incorporate the Fiedler number (the second-smallest eigenvalue of the graph Laplacian matrix, which quantifies connectedness) as a robust regularization term in the sparse graph learning objective. We first develop a greedy algorithm that iteratively selects one edge globally for weakening/removal to reduce the objective, leveraging eigenvalue perturbation theorems that bound the adverse effect of an edge change on the Fiedler number. Next, we design a parallel variant, based on Cheeger's inequality, that recursively partitions an input graph into two sub-graphs using an approximate Cheeger cut to find an optimal edge in a distributed manner. Simulation experiments show that Fiedler number maximization robustifies sparse graph estimates, outperforming previous sparse graph learning algorithms.
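For readers unfamiliar with the Fiedler number, the short sketch below computes it for a small weighted graph using the combinatorial Laplacian L = D - W. It only illustrates the quantity itself, not the greedy or parallel algorithms described in the abstract; the function name is hypothetical.

```python
import numpy as np


def fiedler_number(W):
    """Second-smallest eigenvalue of the Laplacian L = D - W.

    W is a symmetric, non-negative adjacency (weight) matrix. The value
    is positive exactly when the graph is connected.
    """
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)  # returned in ascending order
    return float(eigvals[1])


# Path graph on 3 nodes: connected, Laplacian spectrum {0, 1, 3}.
path3 = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]

# Two disjoint edges on 4 nodes: disconnected, so the value is 0.
two_edges = [[0, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 1],
             [0, 0, 1, 0]]
```

Removing or weakening an edge can only lower this value or leave it unchanged, which is why a regularizer that rewards a large Fiedler number discourages sparsification from disconnecting the graph.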


[8] 2604.26136

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientific communication. In this paper, we address this challenge through our system submission to the Cross-Lingual Voice Cloning shared task of the International Conference on Spoken Language Translation (IWSLT 2026). First, we evaluate several state-of-the-art voice cloning models for cross-lingual speech generation of scientific texts in Arabic, Chinese, and French. Then, we build voice cloning systems based on the OmniVoice foundation model. We employ data augmentation via multi-model ensemble distillation from the ACL 60/60 corpus. We investigate the effect of using this synthetic data for fine-tuning, demonstrating consistent improvements in intelligibility (WER and CER) across languages while preserving speaker similarity.


[9] 2604.26172

Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control

We develop a physics-informed learning framework for energy-shaping control of port-Hamiltonian (pH) systems from trajectory data. The proposed approach co-learns a pH system model and an optimal energy-balancing passivity-based controller (EB-PBC) through alternating optimization with policy-aware data collection. At each iteration, the system model is refined using trajectory data collected under the current control policy, and the controller is re-optimized on the updated model. Both components are parameterized by neural networks that embed the pH dynamics and EB-PBC structure, ensuring interpretability in terms of energy interactions. The learned controller renders the closed-loop system inherently passive and provably stable, and exploits passive plant dynamics without canceling the natural potential. A dissipation regularization enforces strict energy decay during training, thereby enhancing robustness to sim-to-real gaps. The proposed framework is validated on state-regulation and swing-up tasks for planar and torsional pendulum systems.


[10] 2604.26200

Blind OFDM-ISAC Relying on Asymmetric Modem Constellations

Integrated sensing and communication (ISAC) is increasingly expected to operate under aggressive spectrum reuse, where co-channel orthogonal frequency division multiplexing (OFDM) interference can be catastrophic for data recovery on the time-frequency (TF) grid. We show that supporting blind ISAC is feasible by exploiting a fundamental asymmetry in the impact of co-channel OFDM interference: while communication is fragile on the TF grid, sensing depends on structured physical parameters whose signatures remain identifiable by relying on higher-order statistics. Based on this observation, we construct a fourth-order measurement tensor from the received OFDM signal whose coherent component preserves the delay-, Doppler-, and angle-dependent phase evolution of each source. We then develop a three-dimensional higher-order-statistics (HOS) based periodogram for iterative peak search and refinement to jointly estimate range, velocity, and angle in the presence of unknown co-channel interferers. We further exploit constellation asymmetry to resolve the remaining phase ambiguities of blind recovery, enabling blind coherent demodulation via minimum constellation fitting. We also benchmark the performance against matched data-aided and stochastic Cramér-Rao lower bounds, and thereby quantify the cost of signal blindness. Simulations and experimental validations demonstrate reliable radar parameter estimation together with effective communication demodulation even when the TF-domain link is severely interfered with.


[11] 2604.26281

DiffAnon: Diffusion-based Prosody Control for Voice Anonymization

To preserve or not to preserve prosody is a central question in voice anonymization. Prosody conveys meaning and affect, yet is tightly coupled with speaker identity. Existing methods either discard prosody for privacy or lack a principled mechanism to control the utility-privacy trade-off, operating at fixed design points. We propose DiffAnon, a diffusion-based anonymization method with classifier-free guidance (CFG) that provides explicit, continuous inference-time control over prosody preservation. DiffAnon refines acoustic detail over semantic embeddings of an RVQ codec, enabling smooth interpolation between anonymization strength and prosodic fidelity within a single model. To the best of our knowledge, it is the first voice anonymization framework to provide structured, interpolatable inference-time prosody control. Experiments demonstrate structured trade-off behavior, achieving strong utility while maintaining competitive privacy across controllable operating points.


[12] 2604.26296

SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding

Conventional neural speech codecs suffer from severe intelligibility degradation at ultra-low bitrates, where the bottleneck transitions from acoustic distortion to semantic loss. To address this issue, this paper conducts a systematic investigation into the role and fundamental limits of integrating frozen semantic priors, specifically HuBERT and Whisper, into neural speech coding. We introduce and quantitatively validate a novel Semantic Retirement phenomenon: while semantic constraints reduce the Word Error Rate (WER) by up to ~10% in relative terms at 1.5 kbps, their benefits rapidly diminish beyond 6 kbps, indicating a practical capacity boundary. We further uncover a clear trade-off between different prior types: acoustic-rich priors (HuBERT) better preserve prosodic and timbral details, whereas high-level linguistic priors (Whisper) effectively suppress phonetic hallucinations in noisy environments (reducing hallucination rates by 26%) and substantially narrow the generalization gap for unseen speakers. Building on these findings, we propose a bitrate-aware regulation strategy that dynamically adjusts prior strength to optimize the trade-off between semantic consistency and perceptual naturalness. Extensive experimental evaluations confirm that our approach achieves competitive intelligibility and noise robustness compared to existing baselines, offering a principled pathway toward ultra-low-bitrate generative speech coding.


[13] 2604.26327

Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification

Cross-lingual speaker verification suffers from severe language-speaker entanglement. This causes systematic degradation in the hardest scenario: correctly accepting utterances from the same speaker across different languages while rejecting those from different speakers sharing the same language. Standard adversarial disentanglement degrades speaker discriminability; blind discriminators inadvertently penalize speaker-discriminative traits that merely correlate with language. To address this, we propose Dual-LoRA, injecting trainable task-factorized LoRA adapters into a frozen pre-trained backbone. Our core innovation is a Language-Anchored Adversary: by grounding the discriminator with an explicit language branch, adversarial gradients target true linguistic cues rather than arbitrary correlations, preserving essential speaker characteristics. Evaluated on the TidyVoice benchmark, our system achieves a 0.91% validation EER and places 3rd in the official challenge.


[14] 2604.26330

Optimizing Tracking Accuracy in Energy-Constrained Multimodal ISAC via Lyapunov-Driven Heterogeneous Mixture-of-Experts

The integration of multimodal sensing and millimeter-wave (mmWave) communications is a key enabler for highly mobile vehicle-to-infrastructure (V2I) networks. However, continuous high-resolution visual sensing incurs prohibitive computational energy, while delayed sensing information causes severe beam misalignment. This paper establishes a physics-aware multimodal integrated sensing and communication (M-ISAC) framework that mathematically bridges network-layer queuing delays with physical-layer spatial uncertainty via the semantic age of information (AoI). Guided by this relationship, and aiming to strike an optimal trade-off between the tracking posterior Cramér-Rao bound (PCRB) and system energy budgets, we formulate a stochastic mixed-integer non-linear programming (MINLP) problem. Addressing the coupled challenges of temporal computing congestion and non-convex constant modulus constraints, we propose a reinforcement learning (RL) framework empowered by a Lyapunov-driven heterogeneous mixture-of-experts (LD-H-MoE) architecture. By strictly decoupling temporal scheduling and spatial phase mapping into specialized subnetworks, the LD-H-MoE circumvents gradient conflicts prevalent in monolithic multi-task learning. Simulations demonstrate that the proposed LD-H-MoE achieves a highly effective event-triggered sensing policy, yielding superior tracking accuracy and radio-frequency (RF) resilience while guaranteeing edge computing queue stability and long-term energy budgets.


[15] 2604.26335

Real-Time Minimum-Energy Operating-Point Tracking for Battery-Powered Micro DC Motors Under Dynamically Variable Loading

Micro DC brushed motors are widely deployed in battery-powered biomedical systems, where limited energy budgets and variable physiological loading impose stringent efficiency and safety constraints. However, conventional actuation strategies rely on conservative voltage margins to avoid stalling, leading to systematic energy inefficiency. Furthermore, existing methods primarily optimize steady-state performance, neglecting the energy required to complete individual actuation cycles under dynamic conditions. This paper reveals that the energy consumption per mechanical cycle of a DC motor exhibits a non-monotonic dependence on driving voltage, with a load-dependent minimum that shifts with external loading. Based on this insight, we propose a real-time operating-point tracking method that enables the motor to autonomously converge to its minimum-energy condition. A lightweight load metric derived from current waveform features is introduced to detect load variation, and a two-phase adaptive voltage strategy is developed to track the optimal operating point online. Experimental results demonstrate that the proposed method can track the new minimum-energy operating region under both low-to-high and high-to-low loading transitions. With 3-cycle averaging, the mean response time is 11.55 s for the low-to-high case and 11.16 s for the high-to-low case, while the mean convergence voltages are 2.73 V and 2.0 V, respectively.


[16] 2604.26347

The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation

Objective metrics for emotional expressiveness are vital for speech generation, particularly in expressive synthesis and voice conversion requiring emotional prosody transfer. To quantify this, the field widely relies on emotion similarity between reference and generated samples. This approach computes cosine similarity of embeddings from encoders like emotion2vec, assuming they capture affective cues despite linguistic and speaker variations. We challenge this assumption through controlled adversarial tasks and human alignment tests. Despite high classification accuracy, these latent spaces are unsuitable for zero-shot similarity evaluation. Representational limitations cause linguistic and speaker interference to overshadow emotional features, degrading discriminative ability. Consequently, the metric misaligns with human perception. This vulnerability reveals that the metric rewards acoustic mimicry over genuine emotional synthesis.


[17] 2604.26405

A 21-24 GHz Low-Phase-Noise mmWave VCO with Third-Harmonic Expansion using a Triple-Coupled Transformer based Tank

This work presents the design and analysis of a sixth-order triple-coupled transformer-based tank, enabling third-harmonic expansion for mmWave VCOs. Unlike conventional fourth-order tanks, the proposed tank inherently supports three resonance modes, enabling wideband third-harmonic expansion without additional low-Q switched-capacitor tuning elements. In contrast to conventional class-F23 designs, the proposed VCO removes the head resonator and adopts a noise circulating core to maintain low phase noise with reduced area. Implemented in TSMC 65-nm CMOS, post-layout simulation results demonstrate a 21.03-23.99 GHz (13.5%) tuning range, minimum phase noise of -116.25 dBc/Hz at 1 MHz offset, and peak FoM/FoMT/FoMA of 195.86/198.24/212.31 dBc/Hz while consuming 5.4 mW and occupying 0.02268 mm².


[18] 2604.26408

A Comparative study on THz Communication Systems: Photonics versus Electronics Approaches

Terahertz (THz) communication has emerged as a key enabler for sixth-generation (6G) networks, offering ultrawide bandwidths to support data-intensive applications such as holographic telepresence and immersive extended reality. Recent advances have enabled both electronics-based and photonics-based THz front-ends, each with distinct advantages and hardware limitations. While electronics-based solutions leverage mature semiconductor platforms, they suffer from amplified oscillator phase noise, frequency offsets, and nonlinearities introduced by multiplier and amplifier chains. Photonics-based systems, in turn, enable highly tunable and spectrally pure carriers but are subject to laser intensity noise, amplified spontaneous emission, shot noise in photomixers, and thermal noise in RF mixers. This article provides a comprehensive review of experimental demonstrations in electronics-, photonics-, and hybrid-based THz links, highlighting their hardware architectures, performance metrics, and implementation trade-offs. We then survey theoretical modeling efforts, emphasizing how hardware impairments affect system reliability and identifying limitations in existing studies. Building on this, we develop comprehensive signal models for both approaches, derive analytical expressions for signal-to-noise ratio (SNR), and evaluate bit error rate (BER) performance under realistic system parameters. Comparative results demonstrate how distinct impairment mechanisms shape the overall link performance of electronics- versus photonics-based THz systems. The insights offered aim to guide the design of robust transceiver architectures and accelerate the integration of THz technologies into future 6G deployments.


[19] 2604.26414

A Novel Reinforcement Learning Based Framework for Scalable MIMO Interference Alignment

Interference alignment (IA) is a widely recognized approach for mitigating inter-cell interference in multi-user multiple-input multiple-output (MIMO) networks. Despite its effectiveness, practical deployment remains constrained by two major challenges: the need for global channel state information (CSI) at each transmitter and the complexity of deriving closed-form solutions for intricate MIMO systems. This work aims to maximize network throughput by effectively mitigating interference using an IA-inspired learning algorithm that addresses both challenges. First, we propose a predictive, transformer-based IA framework that estimates CSI to reduce signaling overhead in small-scale MIMO systems. Next, we formulate the IA problem as a multi-objective optimization problem based on subspace coordination and develop two reinforcement learning-based algorithms to enhance the scalability of IA in large-scale MIMO systems. Simulation results demonstrate that the proposed methods significantly outperform conventional baselines, with up to 30% average user throughput gains over the best-performing baseline.


[20] 2604.26424

Risk-Aware Multi-Market Scheduling of Virtual Power Plants with Dynamic Network Tariffs

As the penetration of distributed energy resources (DERs) increases, harnessing their flexibility becomes critical for power system operations. Virtual power plants (VPPs) offer a promising solution. However, most existing scheduling tools rely on simplified DER or grid models and largely overlook local flexibility procurement mechanisms such as dynamic network tariffs. This paper proposes a two-stage stochastic optimization framework for VPP multi-market scheduling that integrates detailed device-level constraints, network limitations, and operational and market uncertainties. Conditional value-at-risk is incorporated to represent risk preferences, and Benders decomposition ensures tractability with extensive scenario sets. The model jointly optimizes bidding across energy and reserve markets while explicitly accounting for local flexibility procurement through dynamic network tariffs. The results from a realistic case study show that both risk-neutral and risk-averse strategies exploit arbitrage opportunities. However, risk aversion reduces profit volatility through closer alignment with physical dispatch. Dynamic tariffs unlock local flexibility by shifting demand across the day, though strong tariff signals reduce expected profitability by up to 65% with limited additional flexibility gains.
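The risk measure mentioned in the abstract above can be illustrated with a small sample-based estimate. The sketch below evaluates the Rockafellar-Uryasev expression for conditional value-at-risk on a set of scenario losses; it is a generic textbook estimator, not the paper's two-stage stochastic program, and the function name and sample data are hypothetical.

```python
import numpy as np


def cvar(losses, alpha=0.95):
    """Sample estimate of CVaR at confidence level alpha.

    Uses the Rockafellar-Uryasev form
        CVaR = VaR + E[max(loss - VaR, 0)] / (1 - alpha),
    with VaR taken as the empirical alpha-quantile of the losses.
    """
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)
    excess = np.maximum(losses - var, 0.0)
    return float(var + excess.mean() / (1.0 - alpha))


# Ten equally likely scenario losses (illustrative numbers only).
scenario_losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tail_risk = cvar(scenario_losses, alpha=0.8)
```

With ten equally likely scenarios and alpha = 0.8, the estimate equals the average of the two worst losses. A risk-averse scheduler would typically weigh such a tail term against expected profit.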


[21] 2604.26438

CONCERTO : Optimization of readout electronics

The CONCERTO millimeter-wave spectral-imaging instrument was deployed on the Atacama Pathfinder EXperiment (APEX), where it acquired science data between April 2021 and May 2023. The instrument features two focal-plane arrays, each composed of 2400 Microwave Kinetic Inductance Detectors (MKIDs). Each array is divided into six feedlines containing 400 MKIDs each, with each feedline read out by a dedicated FPGA-based board, KID_READOUT. The next-generation instrument aims to double the detector count per feedline, increasing it from 400 to 800 MKIDs. Achieving this requires a substantial scaling of the readout architecture and poses two key challenges for KID_READOUT: maintaining readout signal integrity and constraining firmware resource usage, as a direct upscaling of the existing design would exceed the available FPGA capacity. To overcome these limitations, we developed a Python-based, cycle- and bit-accurate digital twin of the full FPGA digital signal processing chain. This model enabled a detailed investigation of internal signal behavior and provided quantitative guidance for firmware optimization. Leveraging these insights, we identified the source of two spurs present in CONCERTO data and significantly reduced their amplitudes. At the same time, we achieved substantial reductions in firmware resource usage (39.0 percentage points in LUTs, 20.3 percentage points in flip-flops, and 28.98 percentage points in DSP slices) without degrading readout performance. The resulting architecture supports more than 800 MKIDs per feedline on the same hardware platform while preserving readout signal quality, offering a scalable and resource-efficient solution for future high-resolution millimeter-wave astronomical instruments.


[22] 2604.26476

Fuelling fusion plasmas with pellets: Can neuromorphic control outperform Sigma-Delta modulation?

Nuclear fusion is a promising clean energy source in which deuterium and tritium fuse inside a magnetically confined plasma in a tokamak, releasing energy. A key challenge on the route to practical nuclear fusion is the control of the plasma density, which requires adding fuel in the form of deuterium and tritium to the plasma. Pellet injection, that is, firing frozen fuel into the plasma, is used to accomplish this. Since the injection of a pellet causes an almost instantaneous increase in particle density compared to the time scales of the plasma dynamics, the problem is of a hybrid nature in which continuous plasma dynamics are interrupted by discrete bursts of particles. In this paper, we propose a formal hybrid model for this fuelling process and a new, neuron-inspired control method that treats pellets much like spikes in a brain-like system. The neuromorphic controller offers a lightweight solution that naturally fits the hybrid character of pellet fuelling. For comparison, we also develop a hybrid model of sigma-delta modulation, which is used in current tokamaks. For both the neuromorphic controller and the sigma-delta modulation, we present formal analysis results for this control problem in nuclear fusion. We derive explicit actuator and controller parameter constraints, key for controller tuning, that lead to practical stability guarantees. Numerical simulations compare the different controller variants and validate the theoretical results.
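As background for the comparison in the abstract above, the following minimal sketch shows a generic first-order, discrete-time sigma-delta modulator that converts a continuous fuelling demand into discrete pellet events. It is a textbook form with hypothetical names and units, not the hybrid model developed in the paper.

```python
def sigma_delta_pellets(demand, pellet_size=1.0):
    """First-order sigma-delta modulation of a fuelling demand.

    `demand` is the requested amount of fuel per time step. The running
    accumulator integrates the unmet demand; a pellet is fired whenever
    it reaches `pellet_size`, and the delivered amount is subtracted.
    Returns a list of 0/1 flags, one per time step.
    """
    accumulator = 0.0
    fired = []
    for d in demand:
        accumulator += d
        if accumulator >= pellet_size:
            fired.append(1)
            accumulator -= pellet_size
        else:
            fired.append(0)
    return fired


# A constant demand of a quarter pellet per step fires every fourth step.
pellet_train = sigma_delta_pellets([0.25] * 100)
```

The long-run average of the pellet train matches the demand even though each injection is a fixed-size burst. This is the same discrete-actuation constraint that the neuromorphic controller addresses with spike-like events.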


[23] 2604.26492

Adaptive Transform Coding for Semantic Compression

Visual data compression is shifting from human-centered reconstruction to machine-oriented representation coding. In this setting, an image is often mapped to a compact semantic embedding, which is then compressed and transmitted for downstream inference. We propose an adaptive transform-coding method for semantic-feature compression motivated by the conditional rate-distortion function of a Gaussian mixture model. The scheme uses mode-dependent transforms and quantizers selected according to the inferred source component, enabling more efficient coding of heterogeneous feature distributions. Evaluations on features from widely used vision backbones and foundation models show that the proposed method outperforms or is competitive with state-of-the-art neural compression methods while preserving flexibility and interpretability.


[24] 2604.26532

Hybrid Digital and Microwave Linear Analog Computer (MiLAC)-aided Beamforming for Multiuser MIMO-OFDM Systems

Microwave linear analog computing (MiLAC) has recently emerged as a promising architecture for analog-domain beamforming. In particular, a hybrid digital-MiLAC architecture was proposed and was shown to achieve fully-digital beamforming flexibility in narrowband systems when the number of RF chains equals the number of data streams. However, its performance in wideband systems remains unexplored. This paper presents the first study of hybrid digital-MiLAC beamforming for wideband multi-user multiple-input single-output (MU-MISO) systems. We first characterize the minimum number of radio-frequency (RF) chains required for hybrid digital-MiLAC beamforming to realize an arbitrary set of fully-digital beamforming matrices across all subcarriers. It turns out that, unlike in the narrowband case, a larger number of RF chains is generally required in frequency-selective channels to achieve fully-digital beamforming flexibility, which may be unfavorable in practice. To study the performance of hybrid digital-MiLAC beamforming with a limited number of RF chains, we then formulate the average sum-rate maximization problem and develop an efficient weighted minimum mean-square error (WMMSE)-based algorithm for beamforming design. Simulation results show that hybrid digital-MiLAC beamforming consistently outperforms conventional hybrid digital-analog beamforming, and achieves 89.93% of the fully-digital sum-rate while using only 12.5% of the RF chains in highly frequency-selective channels.


[25] 2604.26552

Cooperative OFDM-ISAC Networks: Performance Analysis and Resource Allocation

Cooperative integrated sensing and communication (ISAC) based on orthogonal frequency-division multiplexing (OFDM) enables network-wide sensing by exploiting the spatial diversity of multiple base stations (BSs). This paper studies performance analysis and time-frequency resource allocation for a multi-BS cooperative OFDM-ISAC network with fine-grained resource-element (RE)-level orthogonal coordination. Two fusion architectures are considered: signal-level fusion (SLF), which forwards raw echoes to a fusion center, and parameter-level fusion (PLF), which reports only local delay/Doppler estimates and their uncertainty information. For SLF, we derive the Cramér--Rao bound (CRB) for joint target position and velocity estimation. For PLF, we develop a two-stage CRB-like metric by combining local delay/Doppler uncertainty characterization with first-order geometric error propagation, and show that only an oracle ML-based PLF benchmark can asymptotically attain the SLF CRB under restrictive conditions. Based on these results, we formulate a joint RE-selection and power-allocation problem under network-wide RE exclusivity, per-BS power budgets, a communication sum-rate constraint, and a sidelobe-amplitude constraint on the delay-Doppler ambiguity function. An efficient solution is developed via Schur-complement reformulations and penalty-based alternating optimization. Numerical results validate the analysis and demonstrate effective ambiguity-sidelobe suppression and consistent localization/velocity gains over representative baselines, while revealing geometry-dependent SLF-PLF performance gaps.


[26] 2604.26566

Learning to Route Electric Trucks Under Operational Uncertainty

Electric truck operations require routing decisions that remain feasible under limited battery range, long charging times, stochastic travel and energy consumption, and competition for shared charging infrastructure. These features make electric truck routing a coupled logistics and energy problem, limiting the practicality of heuristics-based methods and rendering them computationally infeasible at scale. This paper proposes a learning-based framework for stochastic electric truck routing under charging constraints and operational uncertainty. The problem, solved by reinforcement learning, is formulated as an event-driven semi-Markov decision process with shared charging resources, stochastic travel and energy requirements, and realistic nonlinear fast-charging behavior. To support learning in this setting, a graph-based representation of system state and feasible decisions is introduced, together with a rule-based action mask that restricts policies to operationally admissible actions, thus improving training efficiency. Building on this formulation, an event-driven simulation environment is developed that supports both reinforcement learning and benchmarking against heuristic and mathematical programming baselines. Computational experiments across a range of fleet sizes show that the proposed learning-based algorithm consistently outperforms baselines and attains performance close to optimization benchmarks in many settings, while preserving high success rates under charging congestion and uncertainty.


[27] 2604.26595

Exploring Converter Control Duality in Microgrids: AC Grid-Forming vs DC Droop Control

Power electronic converters are fundamental building blocks of both AC and DC microgrids, enabling the integration of renewable energy sources, energy storage systems, electronic loads, and electric vehicles. In contrast, converter control in DC microgrids has developed along the path of droop control, which is widely adopted for decentralized DC-bus voltage regulation and power sharing. Although these control strategies share certain characteristics, their similarities remain largely unexplored due to the distinct physical domains in which they operate. To bridge this gap, we introduce a novel perspective based on the concept of duality to reveal the underlying isomorphism between the two control approaches. We show that AC grid-forming and DC I--V droop control are duals of each other in several aspects, including: (i) the small-signal model of the converter; (ii) the inner current control structure; (iii) power-sharing mechanisms based on the AC swing equation and DC capacitor power balance; and (iv) disturbance signals and dynamic response. Theoretical analysis, validated through simulations on simple converter setups, illustrates these dualities and provides new insights towards a unified control design.


[28] 2604.26612

CRLB and Parameter Estimation for OFDM-ISAC with Non-Uniform Sparse Resource Allocation

Integrated sensing and communication (ISAC) holds great promise in expanding the applications of wireless communication networks. However, in current communication-centric systems, the time-frequency resources available for sensing may be limited, and are also usually non-uniformly and sparsely distributed across the time-frequency domain. Such non-uniformity destroys the "thumbtack-shaped" ambiguity function of the orthogonal frequency division multiplexing (OFDM) waveform, leading to degraded sensing performance. To this end, this paper explores the parameter estimation algorithm for OFDM-ISAC systems with non-uniform sparse resource allocation. Specifically, for the single-target case, we derive the closed-form Cramér-Rao lower bound (CRLB) for parameter estimation as a function of resource indices. Furthermore, we show that simply filling unused resource locations with zeros and applying the classic periodogram estimation is equivalent to maximum likelihood (ML) estimation, which is asymptotically optimal. For the multi-target case, we generate a virtual resource using the autocorrelation function of the original signal, which exhibits a significantly larger virtual bandwidth compared to the original signal, at the cost of a higher peak-to-sidelobe ratio (PSLR). Simulation results demonstrate that the proposed approach outperforms the conventional periodogram method for non-uniform sparse resource allocation.
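
The zero-filling idea in the single-target case can be illustrated with a small sketch. The subcarrier indices, grid size, and noiseless single-tone signal below are hypothetical choices for illustration, not values from the paper; the point is only that summing over the used resource locations is the same as taking a DFT of the zero-filled grid.

```python
import cmath

def zero_filled_periodogram(samples, indices, grid_size):
    """Periodogram of sparse samples; unused grid locations count as zeros.

    Zeros add nothing to the DFT sum, so summing over the used indices
    equals the DFT of the zero-filled length-`grid_size` vector.
    """
    spectrum = []
    for k in range(grid_size):
        acc = 0j
        for y, n in zip(samples, indices):
            acc += y * cmath.exp(2j * cmath.pi * n * k / grid_size)
        spectrum.append(abs(acc) ** 2)
    return spectrum

# Hypothetical setup: 12 of 64 subcarriers carry sensing pilots.
used = [0, 3, 4, 9, 15, 16, 22, 31, 40, 47, 55, 63]
grid = 64
true_bin = 10  # assumed delay bin of the single target

# Noiseless echo: a linear phase ramp across the used subcarriers.
echo = [cmath.exp(-2j * cmath.pi * n * true_bin / grid) for n in used]

power = zero_filled_periodogram(echo, used, grid)
peak_bin = max(range(grid), key=power.__getitem__)
```

In this noiseless toy case the periodogram peak falls on the assumed delay bin; with noise, the peak location would serve as the estimate.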


[29] 2604.26618

SEP Analysis of Quantized SIMO Systems with M-PSK over Correlated Fading Channels

The average symbol error probability (SEP) of a phase-quantized single-input multiple-output system with M-ary phase-shift keying modulation and maximum ratio combining (MRC) is analyzed under correlated Rayleigh fading and additive white Gaussian noise. Building on our prior framework for the independent and identically distributed case, we extend the analysis to spatially correlated channels by introducing an asymptotically equivalent MRC combiner that enables tractable SEP characterization. Using this approach, we derive closed-form expressions at high signal-to-noise ratio (SNR) that explicitly characterize the diversity and coding gains as functions of the receive correlation structure, phase-quantization resolution, and modulation order, up to a scaling factor bounded between 1 and 2. The results show that channel correlation primarily degrades the coding gain, leading to an SNR penalty, while the diversity gain is preserved when the channel covariance matrix is full-rank. The analytical findings are validated through Monte Carlo simulations, demonstrating a tight match across a wide SNR range.


[30] 2604.26635

Pinching Antenna-Aided Spatial Multiplexing: Transceiver Design and Performance Analysis

In this paper, a novel pinching antenna-aided spatial multiplexing (PASM) architecture is conceived, which intrinsically amalgamates the benefits of flexible radiating element placement with radio-frequency (RF) chain transmission. Specifically, we leverage the deterministic phase variation along dielectric waveguides as a zero-power phase-control mechanism, where each waveguide fed by a single RF chain drives multiple pinching antennas (PAs) acquiring position-dependent phase shifts. Then, the PASM propagation environment is characterized by a realistic channel model encompassing Rician small-scale fading, correlated shadowing, and large-scale path loss. Based on this, a low-complexity vector approximate message passing (VAMP) detector is conceived, which exploits a waveguide-structured prior for jointly processing the signals associated with all PAs. Moreover, we derive an analytical upper bound on the bit error rate (BER) for the maximum likelihood (ML) detector to quantify the achievable performance limits. Finally, our simulation results demonstrate that the proposed PASM architecture achieves substantial signal-to-noise ratio (SNR) gain over the conventional phase-shifter-aided spatial multiplexing (PSSM), while the VAMP detector strikes an attractive trade-off between the system performance and computational complexity.


[31] 2604.26664

Circular Phase Representation and Geometry-Aware Optimization for Ptychographic Image Reconstruction

Traditional iterative reconstruction methods are accurate but computationally expensive, limiting their use in high-throughput and real-time ptychography. Recent deep learning approaches improve speed, but often predict phase as a Euclidean scalar despite its $2\pi$ periodicity, which can introduce wrapping artifacts, discontinuities at $\pm\pi$, and a mismatch between the loss and the underlying signal geometry. We present a deep learning framework for ptychographic reconstruction that models phase on the unit circle using cosine and sine components. Phase error is optimized with a differentiable geodesic loss, which avoids branch-cut discontinuities and provides bounded gradients. The network further incorporates saturation-aware dual-gain input scaling, parallel encoder branches, and three decoders for amplitude, cosine, and sine prediction, together with a composite loss that promotes circular consistency and structural fidelity. Experiments on synthetic and experimental datasets show consistent improvements in both amplitude and phase reconstruction over existing deep learning methods. Frequency-domain analysis further shows better preservation of mid- and high-frequency phase content. The proposed method also provides substantial speedup over iterative solvers while maintaining physically consistent reconstructions.
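
The mismatch between a Euclidean phase error and the circular geometry of phase can be seen in a short sketch. The function name and the test values are illustrative assumptions, and the abstract does not give the exact loss used in the paper; the sketch only shows a shortest-arc (geodesic) distance computed from cosine and sine components.

```python
import math

def geodesic_phase_error(cos_pred, sin_pred, phase_true):
    """Shortest-arc angle, in [0, pi], between a predicted (cos, sin) pair
    and a true phase given in radians."""
    # Project the prediction onto the unit circle.
    norm = math.hypot(cos_pred, sin_pred)
    if norm == 0.0:
        raise ValueError("predicted (cos, sin) pair has zero length")
    c, s = cos_pred / norm, sin_pred / norm
    ct, st = math.cos(phase_true), math.sin(phase_true)
    cos_diff = c * ct + s * st  # cos(pred - true)
    sin_diff = s * ct - c * st  # sin(pred - true)
    return abs(math.atan2(sin_diff, cos_diff))

# Two phases on opposite sides of the +/- pi branch cut.
pred = math.pi - 0.01
true = -math.pi + 0.01

naive = abs(pred - true)  # Euclidean error, close to 2*pi
geodesic = geodesic_phase_error(math.cos(pred), math.sin(pred), true)
```

Here the two phases are only 0.02 rad apart on the circle, while the scalar difference is nearly $2\pi$. A smooth surrogate such as `1 - cos_diff` is another common choice when gradients at zero error matter.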


[32] 2604.26682

Model-Free Dynamic Mode Adaptive Control for Data-Driven Control Synthesis

This paper presents a model-free, data-driven control synthesis method called dynamic mode adaptive control (DMAC) for systems whose mathematical models are unavailable or unsuitable for classical control design. The proposed approach combines data-driven dynamics approximation with adaptive control synthesis to enable online controller design using measured system data. DMAC comprises two main components: a dynamics-approximation module and a controller-synthesis module. The dynamics-approximation module estimates a local linear representation of the system dynamics directly from measurements using a matrix recursive least-squares algorithm with a forgetting factor. The estimated dynamics are then used to compute an online stabilizing controller with full-state feedback and integral action. Theoretical analysis establishes convergence properties of the recursive dynamics approximation and boundedness of the closed-loop system under the DMAC controller. The performance of the proposed method is demonstrated through numerical examples involving representative dynamical systems, including an unstable linear system, the Van der Pol oscillator, and Burgers' equation. Sensitivity studies further demonstrate the robustness of DMAC with respect to both algorithm hyperparameters and variations in system parameters.
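
The dynamics-approximation step can be sketched with a textbook matrix recursive least-squares update with a forgetting factor. This is a minimal illustration under assumed values (the system matrices, forgetting factor 0.99, initial covariance, and random excitation are all hypothetical), not the DMAC algorithm itself, and it omits the controller-synthesis module entirely.

```python
import random

def rls_update(theta, P, phi, y, lam=0.99):
    """One matrix RLS step for the model y = theta @ phi.

    theta: n x m estimate, P: m x m symmetric covariance,
    phi: length-m regressor, y: length-n measurement, lam: forgetting factor.
    """
    m = len(phi)
    P_phi = [sum(P[i][j] * phi[j] for j in range(m)) for i in range(m)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(m))
    gain = [v / denom for v in P_phi]
    # Prediction error of the current estimate.
    err = [y[r] - sum(theta[r][j] * phi[j] for j in range(m))
           for r in range(len(y))]
    theta_new = [[theta[r][j] + err[r] * gain[j] for j in range(m)]
                 for r in range(len(y))]
    # P is symmetric, so phi^T P equals P_phi read as a row vector.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(m)]
             for i in range(m)]
    return theta_new, P_new

def identify(steps=200, seed=0):
    """Estimate [A B] of x_next = A x + B u from simulated data."""
    rng = random.Random(seed)
    A = [[0.9, 0.2], [-0.1, 0.8]]  # hypothetical stable test system
    B = [0.0, 1.0]
    theta = [[0.0] * 3 for _ in range(2)]
    P = [[1e3 if i == j else 0.0 for j in range(3)] for i in range(3)]
    x = [0.0, 0.0]
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)  # random input keeps the data exciting
        x_next = [A[r][0] * x[0] + A[r][1] * x[1] + B[r] * u
                  for r in range(2)]
        theta, P = rls_update(theta, P, x + [u], x_next)
        x = x_next
    return theta

theta_hat = identify()  # rows approximate [A | B]
```

The forgetting factor discounts old data, which is what would let such an estimator track a local linear model of a nonlinear system as the operating point moves.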


[33] 2604.26759

A New Location Estimator for Mixed LOS & NLOS scenarios

Time-of-arrival (TOA)-based localization in mixed line-of-sight (LOS) and non-line-of-sight (NLOS) environments is challenging because conventional Euclidean range models do not capture diffraction-dominated propagation. We show that the diffraction path-length model smoothly transitions between LOS and diffraction-dominated NLOS conditions, eliminating the need for explicit path classification. Although this model provides a unified geometric description of mixed LOS/NLOS propagation, the resulting 3D maximum-likelihood problem is nonconvex, and a direct Gauss--Newton estimator based on this model can converge to suboptimal local minima. This motivates the development of a class of structure-exploiting estimators. For known target height, the model induces a virtual-anchor representation of the reduced 2D problem, enabling estimators that exhibit a clear complexity--performance tradeoff: surrogate formulations provide structure and computational efficiency, while a semidefinite-relaxation formulation more faithfully preserves the original likelihood at higher cost. Building on this same structure, we develop 3D sample--polish--select estimators that reduce the global search to one dimension, solve the associated fixed-height 2D subproblems, and then apply local nonlinear refinement in 3D. The proposed estimators achieve near-Cramér--Rao lower bound (CRLB) performance with substantially lower complexity than multistart Gauss--Newton, while also being far more robust to initialization than a direct single-start Gauss--Newton estimator.


[34] 2604.26797

Multi-Modal Fiber Sensing for Offshore Environmental and Infrastructure Monitoring

Monitoring a 118 km subsea cable using distributed acoustic, state-of-polarization, and Brillouin sensing captured storm-induced strain up to $\approx 0.003 \mu\epsilon$ (dynamic) and $\approx 180 \mu\epsilon$ (static), demonstrating consistent yet distinct modal responses to environmental loading.


[35] 2604.26802

A Control Framework for Induced Seismicity Mitigation in Groningen Gas Reservoir

Induced seismicity associated with gas production poses major operational and societal challenges, as illustrated by the Groningen field in the Netherlands. While many studies have focused on forecasting seismicity under prescribed production scenarios, fewer works address the inverse problem: designing operational strategies that minimize seismicity while maintaining production objectives. In this paper, we propose a control-oriented methodology for operating Groningen under induced-seismicity mitigation constraints. We employ a cascade model coupling pore-pressure diffusion with seismicity rate (SR) dynamics, and complement it with a stochastic event-generation procedure to convert the continuous SR field into a synthetic earthquake catalog with event times, locations, and magnitudes. From this catalog, we estimate regional SR measurements and design a robust feedback controller that computes well-rate commands to regulate the SR toward a desired reference while satisfying operational requirements, including prescribed production constraints. The proposed control architecture explicitly accounts for injection and extraction flux limits (actuator saturation). The well fluxes generated by the controller are updated at discrete-time intervals (digital control). We validate the modeling components against Groningen data and illustrate the approach through numerical experiments under different scenarios, including various control update periods and gain selections, as well as combined production with compensating injection (e.g., reinjection of nitrogen). The results illustrate how the proposed framework can reduce seismicity levels in a controlled manner while maximizing production targets.


[36] 2604.26803

PM-EKF: A Physiological Model-Based Extended Kalman Filter for Daily-Life Physical Activity Energy Expenditure Estimation

Monitoring physical activity energy expenditure (PAEE) in daily life is essential for characterizing individual health and metabolic status. Although indirect calorimetry provides gold-standard PAEE measurements, it is impractical for continuous daily-life monitoring. Consequently, wearable sensing approaches using inertial measurement units (IMUs) and heart rate (HR) sensors have attracted substantial interest. However, most existing IMU- and HR-based methods are purely data-driven and offer limited physiological interpretability. In this work, we propose a simplified physiological model that explicitly links body movement during activities of daily living to the underlying metabolic gas-exchange processes governing PAEE. The model is formulated as a nonlinear state-space system and embedded within an Extended Kalman Filter (EKF), enabling principled handling of measurement noise, model uncertainty, and system nonlinearities. The proposed framework provides personalized, interpretable PAEE estimates without employing black-box models. We validated the model on a dataset of 9 subjects, with around 50 minutes of measurements per subject, collected in our lab under simulated free-living conditions. Using the respiratory data measured by COSMED K5 as reference and explained variance ($R^2$) as the evaluation metric, our model's predicted PAEE yielded median (min-max) $R^2$ = 0.72 (0.60--0.87), using three IMUs (pelvis and two thighs) for capturing the body-center-of-mass motion and measured HR for the time-varying cardiac output. Our model outperformed a linear regression (LR) model ($R^2$ = 0.52 (0.23--0.92)) and a CNN-LSTM model ($R^2$ = 0.65 (0.46--0.78)) on the same dataset. Notably, excluding the sensory HR measurement did not significantly degrade PAEE estimation for any of the three models, indicating that IMU-captured mechanical workload dominated PAEE estimation performance in our protocol.


[37] 2604.26863

Spectral Boundary Observer for Counter-Flow Heat Exchangers

We consider a system of two coupled first-order linear hyperbolic partial differential equations modeling heat transport in a counter-flow heat exchanger: one equation describes the transport of a hot fluid, and the other the transport of a cold fluid in the opposite direction. For this system, we design a boundary observer that uses only the temperature of the cold fluid measured at one boundary. Our approach is spectral: by assigning the spectrum of the operator governing the observation error dynamics to a prescribed region within the open left-half complex plane, we can freely tune the convergence rate of the observation error to zero in the $L^2$ norm. The main technical contribution is the proof that spectral stability, that is, the location of the spectrum in the open left-half plane, is equivalent to $L^2$ exponential stability of the origin for the observation error dynamics. This equivalence is established by showing that the operator governing the observation error dynamics satisfies the so-called spectral mapping property.


[38] 2604.26899

Safe Navigation using Neural Radiance Fields via Reachable Sets

Safe navigation in cluttered environments is an important challenge for autonomous systems. Robots navigating through obstacle-ridden scenarios need to be able to navigate safely in the presence of obstacles, goals, and ego objects of varying geometries. In this work, reachable-set representations of the robot's real-time capabilities in the state space are utilized to capture safe navigation requirements, while neural radiance fields (NeRFs) are utilized to compute, store, and manipulate the volumetric representations of the obstacles or the ego vehicle, as needed. Constrained optimal control is employed to represent the resulting path planning problem, involving linear matrix inequality constraints. We present simulation results for path planning in the presence of numerous obstacles in two different scenarios. Safe navigation is demonstrated by using reachable sets in the corresponding constrained optimal control problems.


[39] 2604.26903

Recent Advances in mm-Wave and Sub-THz/THz Oscillators for FutureG Technologies

This paper provides a concise yet comprehensive review of recent advancements in millimeter-wave (mm-wave) oscillators below 100 GHz and sub-terahertz (sub-THz/THz) oscillators above 100 GHz for next-generation computing and communication systems, including 5G, 6G, and beyond. Various design approaches, including CMOS, SiGe, and III-V semiconductor technologies, are explored in terms of performance metrics such as phase noise, output power, efficiency, frequency tunability, and stability. The review highlights key challenges in achieving high-performance and reliable oscillator designs while discussing emerging techniques for performance enhancement. By evaluating recent design trends, this work aims to offer valuable insights and design guidelines that facilitate the development of robust mm-wave and sub-THz/THz oscillators for future communication, computing, and sensing applications.


[40] 2604.26924

High Coupling Tunable Acoustic Resonators in Monolithic Barium Titanate

The growing number of wireless communication bands has driven demand for compact, low-loss, and frequency-adjustable RF filtering. Tunable acoustic resonators are well suited to address these needs, offering a path toward reconfigurable front ends with reduced component count. In this work, we extend previous conference results to investigate epitaxial barium titanate (BTO) grown on silicon as a platform for tunable acoustic resonators. We demonstrate lateral excitation of symmetric Lamb (S0) modes in 120 nm X-cut BTO membranes using a multi-cell electrode architecture that simultaneously achieves high electromechanical coupling and practical impedance levels. Devices are fabricated with laterally patterned electrodes on released BTO membranes. Under applied DC bias, ferroelectric domains align, allowing electrical excitation, frequency tuning, and quality-factor enhancement of acoustic modes. The primary resonance near 700 MHz exhibits a Bode quality factor of 175, electromechanical coupling up to 25.1%, and series and parallel resonance tunability of 2.3% and 5.6%, respectively. Voltage-dependent material parameters, including permittivity, stiffness, and piezoelectric coefficients, are extracted through a combination of modified Butterworth-Van Dyke modeling and finite-element simulation to explain the observed trends. These results highlight monolithic BTO on silicon as a promising material system for laterally excited, tunable acoustic resonators for reconfigurable RF applications.


[41] 2604.26948

Optimizing Dynamic Metasurface Antenna Configurations for Direction-of-Arrival and Polarization Estimation Using an Experimentally Calibrated Multiport-Network Model

Sensing the direction of arrival and polarization of impinging signals is a key prerequisite for beamforming and interference mitigation in modern wireless communication systems. Dynamic metasurface antennas (DMAs) can multiplex direction- and polarization-dependent field information onto a single detector by sequentially switching between programmable configurations. This makes DMAs attractive for joint direction-of-arrival and polarization (DoA-P) estimation with a single radio-frequency chain. Experimental demonstrations have so far relied on random pre-measured configuration sequences because optimizing the configurations requires an accurate forward model of the fabricated DMA. Here, we use an experimentally calibrated model based on multiport-network theory (MNT) to optimize DMA configuration sequences for DoA-P estimation. Our experimentally calibrated MNT model predicts the dual-polarized far-field response of our 96-element DMA for arbitrary admissible configurations, enabling model-based optimization without additional radiation-pattern measurements. We optimize sequences using effective-rank-based surrogate objectives and compare them with random sequences as a function of the sequence length and the noise level. The optimized sequences yield the largest gains in the intermediate-SNR and intermediate-sequence-length regime, where the inverse problem is neither noise-limited nor already solved by random diversity. We also tackle a dual-source scenario involving a jammer and a desired transmitter. Our results illustrate some of the potential in the context of jamming-resilient communications that is unlocked by experimentally calibrated MNT models for fabricated DMAs.


[42] 2604.25936

SAND: Spatially Adaptive Network Depth for Fast Sampling of Neural Implicit Surfaces

Implicit neural representations are powerful for geometric modeling, but their practical use is often limited by the high computational cost of network evaluations. We observe that implicit representations require progressively lower accuracy as query points move farther from the target surface, and that even within the same iso-surface, representation difficulty varies spatially with local geometric complexity. However, conventional neural implicit models evaluate all query points with the same network depth and computational cost, ignoring this spatial variation and thereby incurring substantial computational waste. Motivated by this observation, we propose an efficient neural implicit geometry representation framework with spatially adaptive network depth (SAND). SAND leverages a volumetric network-depth map together with a tailed multi-layer perceptron (T-MLP) to model implicit representation. The volumetric depth map records, for each spatial region, the network depth required to achieve sufficient accuracy, while the T-MLP is a modified MLP designed to learn implicit functions such as signed distance functions, where an output branch, referred to as a tail, is attached to each hidden layer. This design allows network evaluation to terminate adaptively without traversing the full network and directs computational resources to geometrically important and complex regions, improving efficiency while preserving high-fidelity representations. Extensive experimental results demonstrate that our approach can significantly improve the inference-time query speed of implicit neural representations.


[43] 2604.25938

Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning Model

Speech Emotion Recognition (SER) is the use of machines to detect the emotional state of humans from their speech, and it is gaining importance in natural human-computer interaction. Speech is a valuable source of information because emotions modify speech patterns such as pitch, energy, and even timing. Nonetheless, SER is not an easy task because speakers vary, recording conditions differ, and certain emotions sound alike. In this work, the author introduces a speech emotion recognition system that uses Mel-Frequency Cepstral Coefficients (MFCCs) for feature extraction and a Long Short-Term Memory (LSTM) neural network for classification. Speech signals from the Toronto Emotional Speech Set (TESS) were pre-processed and transformed into MFCC features to capture the important temporal aspects of the signal. The resulting features were then fed to an LSTM model, which can learn long-term dependencies in sequential audio data. The trained model was evaluated on several emotion classes occurring in the dataset. The experimental results show that the proposed MFCC-LSTM approach captures the emotional patterns in speech and classifies all the chosen emotion classes accurately. A Support Vector Machine (SVM) with an RBF kernel served as a classical baseline, achieving 98% accuracy, against which the proposed LSTM model, achieving 99% accuracy, was validated. Overall, the results confirm that LSTM-based architectures can address the task of speech emotion recognition. Potential applications of the proposed system include virtual assistants and mental health monitoring.


[44] 2604.26073

Privacy-Preserving Federated Learning Framework for Distributed Chemical Process Optimization

Industrial chemical plants often operate under strict data confidentiality constraints, making centralized data-driven process modeling difficult. Federated learning (FL) provides a promising solution by enabling collaborative model training across distributed facilities without sharing raw operational data. This paper proposes a privacy-preserving federated learning framework for distributed chemical process optimization using data collected from multiple geographically separated plants. Each plant locally trains a neural-network-based process model using its own time-series sensor data, while only model parameters are transmitted to a central aggregation server through secure aggregation mechanisms. This design allows cross-plant knowledge sharing while maintaining strict data locality and industrial confidentiality. Experimental evaluation was conducted using process datasets from three independent chemical plants operating under heterogeneous conditions. The results demonstrate rapid convergence of the federated model, with the global mean squared error decreasing from approximately 2369 to below 50 within the first five communication rounds and stabilizing around 35 after 40 rounds. In comparison with local-only training, the proposed federated framework significantly improves prediction accuracy across all plants, while achieving performance comparable to centralized training. The findings indicate that federated learning provides an effective and scalable solution for collaborative industrial analytics, enabling privacy-preserving predictive modeling and process optimization across distributed chemical production facilities.
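
The server-side aggregation of locally trained parameters can be sketched as a sample-weighted average in the style of federated averaging. The abstract does not state which aggregation rule the paper uses, so this rule, the flat parameter lists, and the plant sample counts below are assumptions for illustration; the secure-aggregation layer is omitted.

```python
def federated_average(client_weights, client_sizes):
    """Sample-weighted average of per-client parameter vectors.

    client_weights: one flat list of parameters per client, equal lengths.
    client_sizes: number of local training samples per client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical round with three plants; only parameters leave each site.
plant_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
plant_samples = [100, 100, 200]
global_params = federated_average(plant_params, plant_samples)
```

Weighting by sample count lets a plant with more data pull the global model further, which is one simple way to handle plants of different sizes.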


[45] 2604.26223

StreamGuard: Exploring a 5G Architecture for Efficient, Quality of Experience-Aware Video Conferencing

Video conferencing over 5G is increasingly prevalent, yet its Quality of Experience (QoE) often degrades under limited radio resources. This has two causes: 5G networks must serve many users, while interactive traffic requires careful handling. Motivated by the insight that different subflows within an interactive session have a disproportionate effect on QoE, we present the design and implementation of StreamGuard, a practical 5G architecture for subflow-level, QoE-aware prioritization. StreamGuard forms a closed control loop with three components: (1) a monitor in the Radio Access Network (RAN) that uses deep packet inspection to infer QoE and RAN state, (2) a controller that selects prioritization actions to balance QoE and fairness, and (3) a marking module that applies these decisions by marking packets to steer subflows into appropriate priority queues. StreamGuard further shapes application behaviors via mechanisms including selective subflow dropping and probe-based rate control, to align application behavior with radio constraints. Implemented in a real 5G testbed, StreamGuard achieves a superior QoE-fairness tradeoff compared to vanilla 5G and prior state-of-the-art approaches, improving QoE by up to 70% at comparable background throughput or preserving up to 2x higher background throughput at similar QoE.


[46] 2604.26242

Recurrence-Based Nonlinear Vocal Dynamics as Digital Biomarkers for Depression Detection from Conversational Speech

Digital biomarkers for depression have largely relied on static acoustic descriptors, pooled summary statistics, or conventional machine learning representations. Such approaches may miss nonlinear temporal organization embedded in conversational vocal dynamics. We hypothesized that depression is associated with altered recurrence structure in vocal state trajectories, reflecting changes in how the vocal system revisits acoustic states over time. Using the depression subset of the DAIC-WOZ corpus with 142 labeled participants, we modeled frame-level COVAREP trajectories as nonlinear dynamical systems and derived recurrence-based biomarkers from 74 vocal channels. Logistic regression with feature selection and stratified cross-validation evaluated classification performance. Recurrence-based biomarkers achieved a mean cross-validated AUC of 0.689, exceeding static acoustic baselines, entropy-dynamics features, Hurst exponent features, determinism features, and Lyapunov-like instability proxies. Permutation testing indicated statistical significance with $p=0.004$. Pooled cross-validated predictions yielded AUC 0.665 with a 95% bootstrap confidence interval of [0.568, 0.758]. These findings suggest that depression may be characterized by altered recurrence structure in conversational vocal dynamics and support nonlinear state-space analysis as a promising direction for digital psychiatric biomarkers.
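
As a rough illustration of what a recurrence-based feature measures, the sketch below computes the recurrence rate of a one-dimensional trajectory: the fraction of distinct time pairs whose states lie within a chosen radius. The abstract does not specify which recurrence measures, embedding, or thresholds the paper uses, so the function and the toy series are assumptions.

```python
def recurrence_rate(series, radius):
    """Fraction of ordered pairs (i, j), i != j, with |x_i - x_j| <= radius."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two samples")
    hits = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and abs(series[i] - series[j]) <= radius
    )
    return hits / (n * (n - 1))

# Toy trajectory that alternates between two states.
toy_channel = [0.0, 1.0, 0.0, 1.0]
rate = recurrence_rate(toy_channel, radius=0.1)
```

A trajectory that revisits the same states often yields a high rate; computing such a statistic per vocal channel would give one feature per channel for a downstream classifier.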


[47] 2604.26282

Rethinking Mutual Coupling in Movable Antenna MIMO Systems: Modeling and Optimization

Movable antennas (MAs) have attracted growing interest for their ability to improve channel conditions via adaptive antenna movement. Nevertheless, such movement inevitably introduces mutual coupling (MC), whose impact has been largely overlooked in existing MA literature. In this paper, we show that MC is not merely an unavoidable electromagnetic effect, but also a new source of capacity gains in MA-enabled multiple-input multiple-output (MIMO) systems. To leverage MC effects, we develop an optimization framework for both narrowband and wideband systems based on a rigorous circuit-theoretic model. For narrowband systems, capacity maximization is formulated as a non-convex optimization problem, which is solved via a block coordinate ascent (BCA) framework. Because optimizing MA positions is challenging due to analytically intractable MC matrices, we develop a trust region method (TRM)-based algorithm that utilizes Sylvester equations to compute the derivatives of the inverse square roots of the MC matrices. We further consider wideband systems and formulate a sum-rate maximization problem. To find a unified set of MA positions that balances varying subcarrier conditions, the BCA framework and the TRM-based MA position optimization algorithm are extended to wideband systems. Simulation results demonstrate that exploiting MC effects in MA-MIMO systems yields significant performance gains in both narrowband and wideband systems under various channel conditions. These gains highlight the benefits of MC-induced superdirectivity and designable MC matrices.


[48] 2604.26339

Can Cross-Layer Design Bridge Security and Efficiency? A Robust Authentication Framework for Healthcare Information Exchange Systems

As healthcare systems become increasingly interconnected, ensuring secure and continuous device authentication in health information exchange (HIE) networks is critical to safeguarding patient data and clinical operations. In this context, this paper proposes a novel cross-layer authentication scheme for HIE networks that integrates cryptographic mechanisms with physical (PHY) layer-based authentication to ensure reliable communication while minimizing computational and communication overheads. The initial authentication phase leverages a traditional public key infrastructure (PKI)-based approach, employing elliptic curve cryptography (ECC) and digital certificates to verify the legitimacy of communicating devices. Simultaneously, it extracts unique hardware-level features such as carrier frequency offset (CFO) and quadrature skewness from the devices. These features are then used to train a machine learning (ML) model during an offline phase managed by a regional centralized authority (RCA). For re-authentication, the system re-extracts these PHY-layer features from incoming orthogonal frequency division multiplexing (OFDM) symbols and verifies the device identity in real-time using the trained ML classifier. This cross-layer strategy enables continuous, lightweight identity verification without the need to exchange and validate cryptographic signatures for each message, thereby reducing system overhead. The proposed scheme further enhances privacy through the use of encrypted, frequently refreshed pseudo-identities, ensuring unlinkability and resistance to identity tracking. A formal security analysis using Burrows-Abadi-Needham (BAN) logic demonstrates the scheme's robustness against various threats, including impersonation, man-in-the-middle (MitM), replay, and Sybil attacks.


[49] 2604.26384

Asset Administration Shell-Based OCL Validation Framework for Model-Based System Engineering

Increasing complexity of modern enterprise systems and the demand for automation and interoperability require consistent and semantically validated models in Model-Based Systems Engineering (MBSE). The Object Constraint Language (OCL) supports formal definition of such constraint validations. However, MBSE models and OCL constraints are typically managed in separate tools, causing manual effort during model constraint application and result interpretation. To address this gap, this paper proposes an approach to managing OCL constraints and their validation results through Asset Administration Shells (a well-established technology for interoperability in enterprise systems). The methodology is demonstrated through a fictional industrial scenario, and to support reproducibility, all artifacts are publicly available in a GitHub repository.


[50] 2604.26527

Persona-Based Process Design for Assistive Human-Robot Workplaces for Persons with Disabilities

Human-robot interaction is emerging as an important paradigm for integrating persons with disabilities into the workplace. While these systems can enable individuals to work, their design is mostly personalized, hindering widespread use beyond the individual user. The universal design paradigm is a central pillar of inclusive design, describing usability of systems by all. To incorporate universal design into process design for human-robot workplaces, expert knowledge is required that is often not available. To simplify process design of human-robot workplaces, we propose a persona-based design approach. First, typical impairments prevalent in the workforce or particularly relevant for the processes are abstracted into personas with disabilities. The work process is subdivided into sequential actions. For each action and persona, strategies are developed to reach the action goal by a design thinking approach. The resulting actions are ordered by level of robot assistance, i.e., robot involvement, and implemented in a behavior tree. As a result, the macro-behavior of the workplace may adapt to individual personas online. We demonstrate the method in a collaborative box folding process with a total of seven personas with disabilities. The persona-based process design shows promising results by generating more comprehensive process strategies while enabling adaptive behavior in the sense of universal design.


[51] 2604.26741

Analytically Characterized Optimal Power Control for Signal-Level-Integrated Sensing, Computing and Communication in Federated Learning

In the Internet-of-Things (IoT) era, efficient functionality integration is essential to address the growing demands of communication, computation, and sensing. Signal-level integrated sensing, computing, and communication (Sig-ISCC) is envisioned, where a single waveform simultaneously supports sensing, computing and communication via over-the-air computation (AirComp). Meanwhile, federated learning (FL) is widely regarded as a promising distributed machine learning framework that enables network intelligence in a privacy-preserving and secure manner, and exhibits strong synergy with AirComp, which alleviates the communication bottleneck of FL. In this paper, we study uplink Sig-ISCC design for AirComp-FL with joint target detection. We formulate the joint power and receive-scaling control problem, where edge devices' transmitted signals should serve both sensing and AirComp purposes. The goal is to minimize the AirComp aggregation distortion subject to a joint target-detection requirement. Although the resulting problem is non-convex in the original variables, we show that it admits an equivalent convex reformulation after a suitable variable transformation. By exploiting analytical optimality properties, we develop a robust, optimal, and polynomial-time-complexity algorithm that efficiently achieves the optimal transmit powers and receive scaling factor. Simulation results validate the optimality and numerical robustness of the proposed algorithm and show its superior FL performance compared to baseline methods.


[52] 2604.26778

Input Distribution Design for Ranging-Oriented OFDM-ISAC Systems Under Frequency-Selective Fading

The implementation of the integrated sensing and communication (ISAC) feature in sixth-generation (6G) networks is most likely to be based on the framework of orthogonal frequency-division multiplexing (OFDM). Input distribution design, or constellation design, is a crucial technique in OFDM-ISAC systems enabling a favorable balance between communication rate and sensing performance. In this treatise, we propose a computationally efficient input distribution design approach for OFDM-ISAC under frequency-selective channels, following the theoretical framework of capacity distortion. We highlight that under practical sensing constraints, the optimal strategy is to treat the kurtosis of constellations as a resource, and allocate it appropriately over subcarriers.


[53] 2604.26787

Hankel and Toeplitz Rank-1 Decomposition of Arbitrary Matrices with Applications to Signal Direction-of-Arrival Estimation

We consider the problems of computing the optimal rank-$1$ Hankel and Toeplitz-structured approximation of arbitrary matrices under $L_2$ and $L_1$-norm error. Such problems arise naturally in engineered systems, including the basic few-shot signal Direction-of-Arrival (DoA) estimation problem that is of importance to modern autonomous systems applications. We develop accurate and computationally efficient structured matrix decomposition algorithms for both formulations and then derive analytically grounded small-sample-support DoA estimators for practical sensing system deployments. The resulting estimators under the $L_2$ and $L_1$ norms are formally shown to be maximum-likelihood optimal under white Gaussian and Laplace noise, respectively. The estimators are further validated through extensive simulation studies and real-world data experiments in few-shot DoA inference.


[54] 2604.26793

Super-resolution Multi-signal Direction-of-Arrival Estimation by Hankel-structured Sensing and Decomposition

Motivated by sensing modalities in modern autonomous systems that involve hardware-constrained spatial sampling over large arrays with limited coherence time, we develop a novel framework for rapid super-resolution multi-signal direction-of-arrival (DoA) estimation based on Hankel-structured sensing and data matrix decomposition of arbitrary rank, under both the $L_2$ and $L_1$-norm formulation. The resulting $L_2$-norm estimator is shown to be maximum-likelihood optimal in white Gaussian noise. The $L_1$-norm estimator is shown to be maximum-likelihood optimal in independent, identically distributed (i.i.d.) isotropic Laplace noise, offering broad robustness to impulsive interference and corrupted measurements commonly encountered in practice. Extensive simulations demonstrate that the proposed methods exhibit powerful super-resolution capabilities, requiring significantly lower SNR and achieving substantially higher resolution probability than recent competing approaches.


[55] 2604.26836

Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics

Predictive safety filters (PSFs) leverage model predictive control to enforce constraint satisfaction during deep reinforcement learning (RL) exploration, yet their reliance on first-principles models or Gaussian processes limits scalability and broader applicability. Meanwhile, model-based RL (MBRL) methods routinely employ probabilistic ensemble (PE) neural networks to capture complex, high-dimensional dynamics from data with minimal prior knowledge. However, existing attempts to integrate PEs into PSFs lack rigorous uncertainty quantification. We introduce the Uncertainty-Aware Predictive Safety Filter (UPSi), a PSF that provides rigorous safety predictions using PE dynamics models by formulating future outcomes as reachable sets. UPSi introduces an explicit certainty constraint that prevents model exploitation and integrates seamlessly into common MBRL frameworks. We evaluate UPSi within Dyna-style MBRL on standard safe RL benchmarks and report substantial improvements in exploration safety over prior neural network PSFs while maintaining performance on par with standard MBRL. UPSi bridges the gap between the scalability and generality of modern MBRL and the safety guarantees of predictive safety filters.


[56] 2604.26857

Edge AI for Automotive Vulnerable Road User Safety: Deployable Detection via Knowledge Distillation

Deploying accurate object detection for Vulnerable Road User (VRU) safety on edge hardware requires balancing model capacity against computational constraints. Large models achieve high accuracy but fail under INT8 quantization required for edge deployment, while small models sacrifice detection performance. This paper presents a knowledge distillation (KD) framework that trains a compact YOLOv8-S student (11.2M parameters) to mimic a YOLOv8-L teacher (43.7M parameters), achieving 3.9x compression while preserving quantization robustness. We evaluate on full-scale BDD100K (70K training images) with Post-Training Quantization to INT8. The teacher suffers catastrophic degradation under INT8 (-23% mAP), while the KD student retains accuracy (-5.6% mAP). Analysis reveals that KD transfers precision calibration rather than raw detection capacity: the KD student achieves 0.748 precision versus 0.653 for direct training at INT8, a 14.5% gain at equivalent recall, reducing false alarms by 44% versus the collapsed teacher. At INT8, the KD student exceeds the teacher's FP32 precision (0.748 vs. 0.718) in a model 3.9x smaller. These findings establish knowledge distillation as a requirement for deploying accurate, safety-critical VRU detection on edge hardware.
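
The headline ratios in this abstract can be checked directly from the figures it quotes. The short sketch below does only that arithmetic; the variable names are illustrative, and reading the "14.5% gain" as a relative (not absolute) precision gain is an assumption that happens to match the quoted numbers.

```python
# Figures quoted in the abstract (parameters in millions, precision at INT8).
teacher_params_m = 43.7
student_params_m = 11.2
kd_precision = 0.748
direct_precision = 0.653

# Model-size compression of the student relative to the teacher.
compression = teacher_params_m / student_params_m

# Relative precision gain of the KD student over direct training.
relative_gain = (kd_precision - direct_precision) / direct_precision

print(f"compression: {compression:.1f}x")
print(f"relative precision gain: {relative_gain * 100:.1f}%")
```

Both printed values agree with the 3.9x compression and 14.5% gain stated above.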


[57] 2604.26897

Stochastic Entanglement of Deterministic Origami Tentacles For Universal Robotic Gripping

Origami-inspired robotic grippers have shown promising potential for object manipulation tasks due to their compact volume and mechanical flexibility. However, robust capture of objects with random shapes in dynamic working environments often comes at the cost of additional actuation channels and control complexity. Here, we introduce a tendon-driven origami tentacle gripper capable of universal object gripping by exploiting a synergy between local, deterministic deformation programming and global, stochastic entanglements. Each origami tentacle is made by cutting thin Mylar sheets; it features carefully placed holes for routing an actuation tendon, origami creases for controlling the deformation, and a tapered shape. By tailoring these design features, one can prescribe the shrinking, bending, and twisting deformation, eventually creating deterministic coiling with a simple tendon pull. Then, when multiple coiling tentacles are placed in proximity, stochastic entanglement emerges, allowing the tentacles to braid, knot, and grip objects with random shapes. We derived a simulation model by integrating origami mechanics with Cosserat rods to correlate origami design, tendon deformation, and their collective gripping performance. Then, we experimentally tested how these coiling and entangling origami tentacles can grasp objects under gravity and in water. A stow-and-release deployment mechanism was also tested to simulate in-orbit grasping. Overall, the entertaining origami tentacle gripper presents a new strategy for robust object grasping with simple design and actuation.


[58] 2503.02332

COMMA: Coordinate-aware Modulated Mamba Network for 3D Dispersed Vessel Segmentation

Accurate segmentation of 3D vascular structures is essential for various medical imaging applications. The dispersed nature of vascular structures leads to inherent spatial uncertainty and necessitates location awareness, yet most current 3D medical segmentation models rely on the patch-wise training strategy that usually loses this spatial context. In this study, we introduce the Coordinate-aware Modulated Mamba Network (COMMA) and contribute a manually labeled dataset of 570 cases, the largest publicly available 3D vessel dataset to date. COMMA leverages both entire and cropped patch data through global and local branches, ensuring robust and efficient spatial location awareness. Specifically, COMMA employs a channel-compressed Mamba (ccMamba) block to encode entire image data, capturing long-range dependencies while optimizing computational costs. Additionally, we propose a coordinate-aware modulated (CaM) block to enhance interactions between the global and local branches, allowing the local branch to better perceive spatial information. We evaluate COMMA on six datasets, covering two imaging modalities and five types of vascular tissues. The results demonstrate COMMA's superior performance compared to state-of-the-art methods with computational efficiency, especially in segmenting small vessels. Ablation studies further highlight the importance of our proposed modules and spatial information. The code and data will be open source at this https URL.


[59] 2503.23818

L2RU: a Structured State Space Model with prescribed L2-bound

Structured state-space models (SSMs) have recently emerged as a powerful architecture at the intersection of machine learning and control, featuring layers composed of discrete-time linear time-invariant (LTI) systems followed by pointwise nonlinearities. These models combine the expressiveness of deep neural networks with the interpretability and inductive bias of dynamical systems, offering strong performance on long-sequence tasks with favorable computational complexity. However, their adoption in applications such as system identification and optimal control remains limited by the difficulty of enforcing stability and robustness in a principled and tractable manner. We introduce L2RU, a class of SSMs endowed with a prescribed $\mathcal{L}_2$-gain bound, guaranteeing input--output stability and robustness for all parameter values. The L2RU architecture is derived from free parametrizations of LTI systems satisfying an $\mathcal{L}_2$ constraint, enabling unconstrained optimization via standard gradient-based methods while preserving rigorous stability guarantees. Specifically, we develop two complementary parametrizations: a non-conservative formulation that provides a complete characterization of square LTI systems with a given $\mathcal{L}_2$-bound, and a conservative formulation that extends the approach to general (possibly non-square) systems while improving computational efficiency through a structured representation of the system matrices. Both parametrizations admit efficient initialization schemes that facilitate training long-memory models. We demonstrate the effectiveness of the proposed framework on a nonlinear system identification benchmark, where L2RU achieves improved performance and training stability compared to existing SSM architectures, highlighting its potential as a principled and robust building block for learning and control.


[60] 2504.03963

FMCW Radar Interference Mitigation based on the Fractional Fourier Transform

In this paper, we propose a novel method for frequency modulated continuous wave (FMCW) radar mutual interference mitigation (IM) based on the discrete fractional Fourier transform (DFrFT). Interference chirps are detected and mitigated by compression and zeroing in the fractional domain. We provide an efficient implementation that can deal with multiple interferers, where we perform consecutive DFrFTs utilizing its angle-additivity property. For that purpose, we generalize and reduce the computational complexity of the multi-angle centered discrete fractional Fourier transform [1]. Our algorithm is designed to be simple and fast such that it can be implemented in hardware. We evaluate our algorithm on a synthetic I/Q-modulated dataset and outperform reference methods in terms of the mean squared error, signal-to-interference-plus-noise ratio, error vector magnitude, true positive rate, false alarm rate and F1-score.


[61] 2504.08352

Fast Reconfiguration of Liquid Crystal-RISs: Modeling and Algorithm Design

Liquid crystal (LC) technology is a promising hardware solution for realizing extremely large reconfigurable intelligent surfaces (RISs) due to its advantages in cost-effectiveness, scalability, energy efficiency, and continuous phase shift tunability. However, the slow response time of the LC cells, especially in comparison to the silicon-based alternatives like radio frequency switches and PIN diodes, limits the performance. This limitation becomes particularly relevant in time-division multiple access (TDMA) applications where the RIS must sequentially serve users in different locations, as the phase-shifting response time of LC cells can constrain system performance. This paper addresses the slow phase-shifting limitation of LC by developing a physics-based model for the time response of an LC unit cell and proposing a novel phase-shift design framework to reduce the transition time. Specifically, exploiting the fact that LC-RISs at millimeter-wave bands have a large electric aperture, we optimize the LC phase shifts based on user locations, eliminating the need for full channel state information and minimizing reconfiguration overhead. Moreover, instead of focusing on a single point, the RIS phase shifters are designed to optimize coverage over an area. This enhances communication reliability for mobile users and mitigates performance degradation due to user location estimation errors. The proposed design minimizes the transition time between configurations, a critical requirement for TDMA schemes. Our analysis reveals that the impact of RIS reconfiguration time on system throughput becomes particularly significant when TDMA intervals are comparable to the reconfiguration time. In such scenarios, optimizing the phase-shift design helps mitigate performance degradation while ensuring specific quality-of-service (QoS) requirements. Moreover, the proposed algorithm has been tested through experimental evaluations, which demonstrate that it also performs effectively in practice.


[62] 2507.02262

Localized kernel method for separation of linear chirps

The task of separating a superposition of signals into its individual components is a common challenge encountered in various signal processing applications, especially in domains such as audio and radar signals. A previous paper by Chui and Mhaskar proposes a method called Signal Separation Operator (SSO) to find the instantaneous frequencies and amplitudes of such superpositions where both of these change continuously and slowly over time. In this paper, we amplify and modify this method in order to separate chirp signals in the presence of crossovers, a very low SNR, and discontinuities. We give a theoretical analysis of the behavior of SSO in the presence of noise to examine the relationship between the minimal separation, minimal amplitude, SNR, and sampling frequency. Our method is illustrated with a few examples, and numerical results are reported on a simulated dataset comprising 7 simulated signals.


[63] 2509.21382

Multi-Speaker DOA Estimation in Binaural Hearing Aids using Deep Learning and Speaker Count Fusion

For extracting a target speaker's voice, direction-of-arrival (DOA) estimation is crucial for binaural hearing aids operating in noisy, multi-speaker environments. Among the solutions developed for this task, a deep learning convolutional recurrent neural network (CRNN) model leveraging spectral phase differences and magnitude ratios between microphone signals is a popular option. In this paper, we explore adding source-count information for multi-source DOA estimation. The use of dual-task training with joint multi-source DOA estimation and source counting is first considered. We then consider using the source count as an auxiliary feature in a standalone DOA estimation system, where the number of active sources (0, 1, or 2+) is integrated into the CRNN architecture through early, mid, and late fusion strategies. Experiments using real binaural recordings are performed. Results show that the dual-task training does not improve DOA estimation performance, although it benefits source-count prediction. However, a ground-truth (oracle) source count used as an auxiliary feature significantly enhances standalone DOA estimation performance, with late fusion yielding up to 14% higher average F1-scores over the baseline CRNN. This highlights the potential of using source-count estimation for robust DOA estimation in binaural hearing aids.


[64] 2509.24959

Coordinated vs. Sequential Transmission Planning

Coordinated planning of generation, storage, and transmission more accurately captures the interactions among these three capacity types necessary to meet electricity demand, at least in theory. However, in practice, U.S. system operators typically follow a sequential planning approach: They first determine future generation and storage additions based on an assumed unconstrained ('copper plate') system. Next, they perform dispatch simulations of this projected generation and storage capacity mix on the existing transmission grid to identify transmission constraint violations. These violations indicate the need for transmission upgrades. We describe a multistage, multi-locational planning model that co-optimizes generation, storage, and transmission investments. The model respects reliability constraints as well as state energy and climate policies. We test the two planning approaches using a current stakeholder-informed 20-zone model of the PJM region, developed for the current FERC Order No. 1920 compliance filing process. In our most conservative model specification, we find that the co-optimized approach estimates 67% lower transmission upgrade needs than the sequential model, leading to total system costs that are 0.6% lower and similar reliability and climate outcomes. Our sensitivities show larger transmission and cost savings and reliability and climate benefits from co-optimized planning.


[65] 2511.01780

On Systematic Performance of 3-D Holographic MIMO: Clarke, Kronecker, and 3GPP Models

Holographic multiple-input multiple-output (MIMO) has emerged as a key enabler for 6G networks, yet conventional planar implementations suffer from spatial correlation and mutual coupling at sub-wavelength spacing, which fundamentally limit the effective degrees of freedom (EDOF) and channel capacity. Three-dimensional (3-D) holographic MIMO offers a pathway to overcome these constraints by exploiting volumetric array configurations that enlarge the effective aperture and unlock additional spatial modes. This work presents the first systematic evaluation that jointly incorporates electromagnetic (EM) characteristics, such as mutual coupling and radiation efficiency, into the analysis of 3-D arrays under Clarke, Kronecker, and standardized 3rd Generation Partnership Project (3GPP) channel models. Analytical derivations and full-wave simulations demonstrate that 3-D architectures achieve higher EDOF, narrower beamwidths, and notable capacity improvements compared with planar baselines. In 3GPP urban macro channels with horizontal element spacing of $0.3\lambda$, 3-D configurations yield approximately 20% capacity improvement over conventional 2-D arrays, confirming the robustness and scalability of volumetric designs under realistic conditions. These findings bridge the gap between theoretical feasibility and practical deployment, offering design guidance for next-generation 6G base station arrays.


[66] 2511.03678

A Constant-Gain Equation-Error Framework for Airliner Aerodynamic Monitoring Using QAR Data

Monitoring the in-service aerodynamic performance of airliners is critical for operational efficiency and safety, but using operational Quick Access Recorder (QAR) data for this purpose presents significant challenges. This paper first establishes that the absence of key parameters, particularly aircraft moments of inertia, makes conventional state-propagation filters fundamentally unsuitable for this application. This limitation necessitates a decoupled Equation-Error Method (EEM). However, we then demonstrate through a comparative analysis that standard recursive estimators with time-varying gains, such as Recursive Least Squares (RLS), also fail within an EEM framework, exhibiting premature convergence or instability when applied to low-excitation cruise data. To overcome these dual challenges, we propose and validate the Constant-Gain Equation-Error Method (CG-EEM). This framework employs a custom estimator with a constant, Kalman-like gain, which is perfectly suited to the stationary, low-signal-to-noise characteristics of cruise flight. The CG-EEM is extensively validated on a large, multi-fleet dataset of over 200 flights, where it produces highly consistent, physically plausible aerodynamic parameters and correctly identifies known performance differences between aircraft types. The result is a robust, scalable, and computationally efficient tool for fleet-wide performance monitoring and the early detection of performance degradation.
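
The contrast between a decaying RLS gain and a constant gain can be illustrated with a deliberately simple, noise-free scalar example. This is a sketch under stated assumptions, not the CG-EEM estimator of the abstract: the regressor is fixed to 1, the gain value 0.05, the initial covariance, and the step change in the true parameter are all hypothetical choices made for illustration.

```python
def rls_scalar(measurements, p0=1000.0):
    """Scalar recursive least squares with regressor fixed to 1."""
    theta, p = 0.0, p0
    for y in measurements:
        gain = p / (1.0 + p)         # time-varying gain, decays roughly as 1/k
        theta += gain * (y - theta)  # innovation update
        p -= gain * p                # covariance shrinks with every sample
    return theta

def constant_gain_scalar(measurements, gain=0.05):
    """Same innovation update, but with a fixed (hypothetical) gain."""
    theta = 0.0
    for y in measurements:
        theta += gain * (y - theta)
    return theta

# Toy data: the true parameter steps from 1.0 to 1.2 halfway through.
measurements = [1.0] * 200 + [1.2] * 200

theta_rls = rls_scalar(measurements)
theta_cg = constant_gain_scalar(measurements)
```

Here the RLS estimate ends near the average of the two levels (about 1.10) because its gain has already decayed, while the constant-gain estimate settles close to the new value 1.2. A constant gain trades some noise averaging for this tracking ability, which is one plausible reading of the "premature convergence" behavior described above.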


[67] 2602.01537

LMI Optimization Based Multirate Steady-State Kalman Filter Design

This paper presents an LMI-based design framework for multirate steady-state Kalman filters in systems with sensors operating at different sampling rates. The multirate system is formulated as a periodic time-varying system, where the Kalman gains converge to periodic steady-state values that repeat every frame period. Cyclic reformulation transforms this into a time-invariant problem; however, the resulting measurement noise covariance becomes semidefinite rather than positive definite, preventing direct application of standard Riccati equation methods. I address this through a dual LQR formulation with LMI optimization that naturally handles semidefinite covariances. The framework enables multi-objective design, supporting pole placement for guaranteed convergence rates and $l_2$-induced norm constraints for balancing average and worst-case performance. Numerical validation using an automotive navigation system with GPS and wheel speed sensors, including Monte Carlo simulation with 500 independent noise realizations, demonstrates that the proposed filter achieves a position RMSE well below the GPS noise level through effective multirate sensor fusion, and that the LMI solution provides valid upper bounds on the estimation error covariance.
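
The claim that multirate Kalman gains settle into values repeating every frame period can be illustrated with a scalar covariance recursion. The sketch below is not the cyclic reformulation or the LMI design of the abstract; it assumes a scalar random-walk state, a fast sensor available at every step, a slow sensor available every `period` steps, and positive noise variances, all of which are hypothetical choices for illustration.

```python
def periodic_kalman_gains(q, r_fast, r_slow, period, steps, p0=1.0):
    """Return the fast-sensor Kalman gain at each step of a scalar multirate filter."""
    p = p0
    gains = []
    for k in range(steps):
        p = p + q                        # predict: x[k+1] = x[k] + w, Var(w) = q
        info = 1.0 / p + 1.0 / r_fast    # fast sensor measures x at every step
        if k % period == 0:
            info += 1.0 / r_slow         # slow sensor contributes once per frame
        p = 1.0 / info                   # posterior variance (information form)
        gains.append(p / r_fast)         # gain applied to the fast sensor
    return gains

gains = periodic_kalman_gains(q=0.1, r_fast=1.0, r_slow=0.2, period=5, steps=200)
last_frame = gains[-5:]
prev_frame = gains[-10:-5]
```

After the transient, `last_frame` and `prev_frame` coincide to numerical precision while the gains within a frame differ from one another, which is the periodic steady state the abstract refers to.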


[68] 2603.03664

Principled Learning-to-Communicate with Quasi-Classical Information Structures

Learning-to-communicate (LTC) in partially observable environments has received increasing attention in deep multi-agent reinforcement learning, where the control and communication strategies are jointly learned. Meanwhile, the impact of communication on decision-making has been extensively studied in control theory. In this paper, we seek to formalize and better understand LTC by bridging these two lines of work, through the lens of information structures (ISs). To this end, we formalize LTC in decentralized partially observable Markov decision processes (Dec-POMDPs) under the common-information-based framework from decentralized stochastic control, and classify LTC problems based on the ISs before (additional) information sharing. We first show that non-classical LTCs are computationally intractable in general, and thus focus on quasi-classical (QC) LTCs. We then propose a series of conditions for QC LTCs, under which LTC preserves the QC IS after information sharing, whereas violating them can cause computational hardness in general. Further, we develop provable planning and learning algorithms for QC LTCs, and establish quasi-polynomial time and sample complexities for several QC LTC examples that satisfy the above conditions. Along the way, we also establish new results on a relationship between (strictly) QC IS and the condition of having strategy-independent common-information-based beliefs (SI-CIBs), as well as on solving Dec-POMDPs without computationally intractable oracles but beyond those with SI-CIBs, which may be of independent interest.


[69] 2604.11380

End-to-end differentiable network traffic simulation with dynamic route choice

Optimization using network traffic models requires computing gradients of objective functions with respect to model parameters. However, derivation of such gradients has often been considered difficult or impractical due to their complexity and size. Conventional approaches rely on numerical differentiation or derivative-free methods that do not scale well with the parameter dimension, or on adjoint methods that require manual derivation for each specific model. This study proposes a novel end-to-end differentiable network traffic flow simulator based on automatic differentiation (AD), employing the Link Transmission Model (LTM) and a Dynamic User Optimum (DUO) route choice model. The LTM operates on continuous aggregate state variables through piecewise-linear min/max operations, which admit subgradients almost everywhere and thus require no smooth relaxation for AD. The DUO is also suitable for AD: although the shortest path search is itself discrete, the resulting diverge ratios at each node are continuous functions of per-destination vehicle counts and are thus differentiable. In order to demonstrate the capability of the proposed model, we solved a dynamic congestion toll optimization problem on the Chicago-Sketch dataset with approximately 2500 links, 1 million vehicles, a 3-hour duration, and 15000 decision variables. The proposed model successfully derived a high-quality solution in 3000 iterations, taking about 40 minutes. The simulator, implemented in Python and JAX, is released as open-source software named UNsim (this https URL).
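The claim that piecewise-linear min operations need no smooth relaxation for AD can be illustrated with a minimal sketch. It is not UNsim and does not use JAX: it is a hand-written forward-mode differentiation over a hypothetical single-link flow rule (`transfer_flow`), with `capacity` and `storage` as assumed parameters. The point is only that `min` passes through the derivative of whichever branch is active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value and its derivative with respect to one chosen input."""
    val: float
    dot: float = 0.0

    def __sub__(self, other: "Dual") -> "Dual":
        return Dual(self.val - other.val, self.dot - other.dot)


def dual_min(a: Dual, b: Dual) -> Dual:
    # min is piecewise linear: the derivative is that of the smaller branch.
    # At an exact tie either branch is a valid subgradient; this picks `a`.
    return a if a.val <= b.val else b


def transfer_flow(upstream: Dual, downstream: Dual,
                  capacity: float, storage: float) -> Dual:
    """Hypothetical flow rule: min(sending, receiving), both capped by capacity."""
    cap = Dual(capacity)
    sending = dual_min(upstream, cap)
    receiving = dual_min(Dual(storage) - downstream, cap)
    return dual_min(sending, receiving)


# Seed dot=1.0 on the upstream vehicle count to differentiate with respect to it.
free = transfer_flow(Dual(3.0, 1.0), Dual(0.0), capacity=10.0, storage=20.0)
jam = transfer_flow(Dual(30.0, 1.0), Dual(15.0), capacity=10.0, storage=20.0)
print(free.val, free.dot)  # demand-limited: flow follows the upstream count
print(jam.val, jam.dot)    # supply-limited: upstream count has no local effect
```

The derivative is 1 when the link is demand-limited and 0 when it is limited by downstream space, with no smoothing of the `min` required.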


[70] 2604.12804

Grid-Forming Characterization in DC Microgrids

DC microgrids are converter-based electrical networks that are increasingly being used in various applications, including data centers and industrial distribution systems. A central challenge in their operation is maintaining the DC-bus voltage within predefined limits while ensuring overall system stability. Although a wide variety of converter control algorithms has been proposed to achieve these objectives, the literature lacks a clear and physically interpretable framework for evaluating their effectiveness and for classifying and comparing them. Moreover, the grid-forming versus grid-following distinction that exists in AC systems has largely been unexplored in DC microgrids. To address this gap, this paper introduces three novel impedance-based indices that can be used to quantify the voltage-forming and current-forming behavior of a converter. The indices also provide a basis for defining the desired converter behavior that yields superior DC-bus voltage regulation performance. Simulation results illustrate the application of the framework to several representative control strategies and highlight the strengths and limitations of these control algorithms.


[71] 2604.15238

A Nonlinear Separation Principle via Contraction Theory: Applications to Neural Networks, Control, and Learning

This paper establishes a nonlinear separation principle based on contraction theory and derives sharp stability conditions for recurrent neural networks (RNNs). First, we introduce a nonlinear separation principle that guarantees global exponential stability for the interconnection of a contracting state-feedback controller and a contracting observer, alongside parametric extensions for robustness and equilibrium tracking. Second, we derive sharp linear matrix inequality (LMI) conditions that guarantee the contractivity of both firing rate and Hopfield neural network architectures. We establish structural relationships among these certificates, demonstrating that continuous-time models with monotone non-decreasing activations maximize the admissible weight space, and extend these stability guarantees to interconnected systems and Graph RNNs. Third, we combine our separation principle and LMI framework to solve the output reference tracking problem for RNN-modeled plants. We provide LMI synthesis methods for feedback controllers and observers, and rigorously design a low-gain integral controller to eliminate steady-state error. Finally, we derive an exact, unconstrained algebraic parameterization of our contraction LMIs to design highly expressive implicit neural networks, achieving competitive accuracy and parameter efficiency on standard image classification benchmarks.


[72] 2604.19330

Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation

Recent advances in Text-To-Speech (TTS) synthesis have popularized multi-stage approaches that first predict semantic tokens and then generate acoustic tokens. In this paper, we extend the coarse-to-fine generation paradigm to the temporal domain and introduce Chain-of-Details (CoD), a novel framework that explicitly models temporal coarse-to-fine dynamics in speech generation using a cascaded architecture. Our method progressively refines temporal details across multiple stages, with each stage targeting a specific temporal granularity. All temporal detail predictions are performed using a shared decoder, enabling efficient parameter utilization across different temporal resolutions. Notably, we observe that the lowest detail level naturally performs phonetic planning without the need for an explicit phoneme duration predictor. We evaluate our method on several datasets and compare it against several baselines. Experimental results show that CoD achieves competitive performance with significantly fewer parameters than existing approaches. Our findings demonstrate that explicit modeling of temporal dynamics with the CoD framework leads to more natural speech synthesis.


[73] 2604.20610

Model Predictive Communication for Timely Status Updates in Low-Altitude Networks

Timely information delivery in low-altitude networks is critical for many time-sensitive applications, such as unmanned aerial vehicle (UAV) navigation, inspection, and surveillance. The key challenge lies in balancing three competing factors: stringent data freshness requirements, UAV onboard energy consumption, and interference with terrestrial services. Addressing this challenge requires not only efficient power and channel allocation strategies but also effective communication timing over the entire operation horizon. In this work, we propose a model predictive communication (MPComm) framework, enabled by advanced channel sensing techniques, in which the channel conditions that the UAV will experience are largely predictable. Within this framework, we formulate a constrained bi-objective optimization problem to achieve a desired trade-off between energy consumption and terrestrial channel occupation, subject to a strict timeliness constraint. We solve this problem using Pareto analysis and show that the original non-convex, mixed-integer problem can be decomposed into a two-layer structure: the outer layer determines the optimal communication timing, while the inner layer determines the optimal power and channel allocation for each communication interval. An efficient algorithm for the inner problem is developed using non-convex analysis, with asymptotic optimality guarantees, while the outer problem is solved optimally via a simple graph search, with edges characterized by inner solutions. The proposed approach applies to a broad class of problem variants, including objective transformations and single-objective specializations. Numerical results demonstrate the efficiency of the proposed solution, achieving up to a six-fold reduction in terrestrial channel occupation and a 6 dB energy saving compared to benchmark schemes.
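The two-layer structure described above can be sketched as a shortest-path search over candidate update instants. This is an illustrative reading, not the MPComm algorithm itself: `edge_cost(i, j)` is a hypothetical stand-in for the inner-layer optimum of the interval between updates at slots `i` and `j`, `max_gap` is an assumed encoding of the timeliness constraint, and forcing updates at slot 0 and at the final slot is also an assumption.

```python
def plan_update_times(horizon, max_gap, edge_cost):
    """Cheapest sequence of update slots from 0 to `horizon`.

    Consecutive updates may be at most `max_gap` slots apart.
    `edge_cost(i, j)` is a placeholder for the inner-layer optimal cost
    of the interval between an update at slot i and the next at slot j.
    """
    best = [float("inf")] * (horizon + 1)
    parent = [None] * (horizon + 1)
    best[0] = 0.0
    # Slots are topologically ordered, so one forward pass is sufficient.
    for j in range(1, horizon + 1):
        for i in range(max(0, j - max_gap), j):
            cost = best[i] + edge_cost(i, j)
            if cost < best[j]:
                best[j], parent[j] = cost, i
    # Backtrack from the final slot to recover the chosen update instants.
    path, node = [], horizon
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[horizon], path[::-1]


# Toy inner cost: transmitting at slot j costs price[j] (made-up numbers).
price = [0, 5, 1, 4, 6, 1, 7, 2]
total, slots = plan_update_times(horizon=7, max_gap=3,
                                 edge_cost=lambda i, j: price[j])
print(total, slots)
```

With these made-up prices the search transmits at the cheap slots 2 and 5 before the final update, for a total cost of 4.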


[74] 2604.21262

Frequency Security Assessment in Power Systems With High Penetration of Renewables Considering Spatio-Temporal Frequency Distribution

The increasing integration of renewable energy sources exacerbates the spatial and temporal differences in frequency across the power system, posing a serious challenge to the accurate and efficient assessment of system frequency security. To address this issue, a generic effective nodal frequency (ENF) model is first established to concisely characterize nodal frequency dynamics. The model is characterized by the effective nodal inertia (ENI), damping, and primary regulation parameters, which retain only the dominant constant component governing nodal frequency dynamic performance. It enables a tractable analytical formulation of the nodal frequency trajectory and the key frequency security indicators. Quantitative analysis under a temporary power disturbance reveals that the ENI is the most influential parameter governing frequency security. Consequently, the critical nodal inertia for ensuring nodal frequency security is analytically derived. A system-level frequency security index based on the actual ENI and the critical nodal inertia is proposed. On the basis of the proposed index, the system frequency security assessment is carried out with an "offline calculation and online evaluation" procedure, which is achieved using a lookup table approach and an interpolation method. Simulations on the modified IEEE 39-bus system verify the effectiveness of the proposed assessment method.


[75] 2604.21484

HyperCEUNet: Parameter-Aware Hypernetwork-Driven UNet for Channel Estimation

Deep learning-based channel estimation has been recognized as a promising technique for sixth-generation wireless systems. However, most existing approaches rely solely on least-squares estimates obtained from demodulation reference signals, which fail to explicitly exploit channel time-frequency correlation parameters. Inspired by the independent channel parameter estimation enabled by semi-static reference signals in modern wireless systems, this letter presents a parameter-aware deep learning-based channel estimation framework termed HyperCEUNet. Specifically, the proposed hypernetwork generates an adaptive front-end convolutional layer based on estimated channel parameters, serving as a pre-filtering stage before the UNet-based estimator. In addition, the Wiener-filtered channel estimates are adopted to provide a correlation-aware initialization for data resources. Simulation results demonstrate that our proposed HyperCEUNet effectively improves channel estimation accuracy compared with its conventional counterparts.


[76] 2604.22904

Triple-Phase Sequential Fusion Network for Hepatobiliary Phase Liver MRI Synthesis

Gadoxetate disodium-enhanced MRI is essential for the detection and characterization of hepatocellular carcinoma. However, acquisition of the hepatobiliary phase (HBP) requires a prolonged post-contrast delay, which reduces workflow efficiency and increases the risk of motion artifacts. In this study, we propose a Triple-Phase Sequential Fusion Network (TriPF-Net) to synthesize HBP images by leveraging the sequential information from pre-HBP sequences: while T1-weighted imaging serves as the indispensable baseline, the model adaptively integrates arterial-phase (AP) and venous-phase (VP) features when available. By modeling the tissue-specific contrast uptake and excretion dynamics across these three phases, TriPF-Net ensures robust HBP synthesis even under the stochastic absence of one or both dynamic contrast-enhanced sequences. The framework comprises an Enhanced Region-Guided Encoder and a Dynamic Feature Unification Module, optimized with a Region-Guided Sequential Fusion Loss to maintain physiological consistency. In addition, clinical variables, including age, sex, total bilirubin, and albumin, are incorporated to enhance physiological consistency. Compared with conventional methods, TriPF-Net achieved superior performance on datasets from two centers. On the internal dataset, the model achieved an MAE of 10.65, a PSNR of 23.27, and an SSIM of 0.76. On the external validation dataset, the corresponding values were 12.41, 23.11, and 0.78, respectively. This flexible solution enhances clinical workflow and lesion depiction, potentially eliminating the need for delayed HBP acquisition in HCC imaging.


[77] 2411.13365

Explainable Representation of Finite-Memory Policies for POMDPs using Decision Trees

Partially Observable Markov Decision Processes (POMDPs) are a fundamental framework for decision-making under uncertainty and partial observability. Since in general optimal policies may require infinite memory, they are hard to implement and often render most problems undecidable. Consequently, finite-memory policies are mostly considered instead. However, the algorithms for computing them are typically very complex, and so are the resulting policies. Facing the need for their explainability, we provide a representation of such policies, both (i) in an interpretable formalism and (ii) typically of smaller size, together yielding higher explainability. To that end, we combine models of Mealy machines and decision trees; the latter describing simple, stationary parts of the policies and the former describing how to switch among them. We design a translation for policies of the finite-state-controller (FSC) form from standard literature and show how our method smoothly generalizes to other variants of finite-memory policies. Further, we identify specific properties of recently used "attractor-based" policies, which allow us to construct yet simpler and smaller representations. Finally, we illustrate the higher explainability in a few case studies.


[78] 2412.10679

U-FaceBP: Uncertainty-aware Bayesian Ensemble Deep Learning for Face Video-based Blood Pressure Estimation

Blood pressure (BP) measurement is crucial for daily health assessment. Remote photoplethysmography (rPPG), which extracts pulse waves from face videos captured by a camera, has the potential to enable convenient BP measurement without specialized medical devices. However, there are various uncertainties in BP estimation using rPPG, leading to limited estimation performance and reliability. In this paper, we propose U-FaceBP, an uncertainty-aware Bayesian ensemble deep learning method for face video-based BP estimation. U-FaceBP models aleatoric and epistemic uncertainties in face video-based BP estimation with a Bayesian neural network (BNN). Additionally, we design U-FaceBP as an ensemble method, estimating BP from rPPG signals, PPG signals derived from face videos, and face images using multiple BNNs. Large-scale experiments on two datasets involving 1197 subjects from diverse racial groups demonstrate that U-FaceBP outperforms state-of-the-art BP estimation methods. Furthermore, we show that the uncertainty estimates provided by U-FaceBP are informative and useful for guiding modality fusion, assessing prediction reliability, and analyzing performance across racial groups.
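The distinction between aleatoric and epistemic uncertainty can be made concrete with the standard variance decomposition for an equally weighted ensemble. This is a generic sketch, not the U-FaceBP fusion rule: each ensemble member is assumed to output a predictive mean and variance, and the numbers below are made up.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split ensemble predictive variance into two parts.

    aleatoric: average of the variances predicted by the members
    epistemic: spread (population variance) of the member means
    For an equally weighted mixture their sum is the total variance
    (law of total variance).
    """
    prediction = fmean(member_means)
    aleatoric = fmean(member_variances)
    epistemic = pvariance(member_means)
    return prediction, aleatoric, epistemic


# Hypothetical systolic BP outputs (mmHg and mmHg^2) from three members.
means = [120.0, 124.0, 118.0]
variances = [16.0, 25.0, 9.0]
prediction, aleatoric, epistemic = decompose_uncertainty(means, variances)
print(round(prediction, 2), round(aleatoric, 2), round(epistemic, 2))
```

A large epistemic term means the members disagree with one another, while a large aleatoric term means each member individually reports noisy data.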


[79] 2412.11399

Quantifying Climate Change Impacts on Renewable Energy Generation: A Super-Resolution Recurrent Diffusion Model

Driven by global climate change and the ongoing energy transition, the coupling between power supply capabilities and meteorological factors has become increasingly significant. Over the long term, accurately quantifying the power generation of renewable energy under the influence of climate change is essential for the development of sustainable power systems. However, due to interdisciplinary differences in data requirements, climate data often lacks the necessary hourly resolution to capture the short-term variability and uncertainties of renewable energy resources. To address this limitation, a super-resolution recurrent diffusion model (SRDM) has been developed to enhance the temporal resolution of climate data and model the short-term uncertainty. The SRDM incorporates a pre-trained decoder and a denoising network, which generate long-term, high-resolution climate data through a recurrent coupling mechanism. The high-resolution climate data is then converted into power values using the mechanism model, enabling the simulation of wind and photovoltaic (PV) power generation on future long-term scales. Case studies were conducted in the Ejina region of Inner Mongolia, China, using fifth-generation reanalysis (ERA5) and coupled model intercomparison project (CMIP6) data under two climate pathways: SSP126 and SSP585. The results demonstrate that the SRDM outperforms existing generative models in generating super-resolution climate data. Furthermore, the research highlights the estimation biases introduced when low-resolution climate data is used for power conversion.


[80] 2412.13421

Explainable Detection of Machine Generated Music and Early Systematic Evaluation

Machine-generated music (MGM) has become a groundbreaking innovation with wide-ranging applications, such as music therapy, personalised editing, and creative inspiration within the music industry. However, the unregulated proliferation of MGM presents considerable challenges to the entertainment, education, and arts sectors by potentially undermining the value of high-quality human compositions. Consequently, MGM detection (MGMD) is crucial for preserving the integrity of these fields. Despite its significance, the MGMD domain lacks the comprehensive systematic evaluation results necessary to drive meaningful progress. To address this gap, we conduct experiments on existing large-scale datasets using a range of foundational models for audio processing, establishing systematic evaluation results tailored to the MGMD task. Our selection includes traditional machine learning models, deep neural networks, Transformer-based architectures, and state space models (SSMs). Recognising the inherently multimodal nature of music, which integrates both melody and lyrics, we also explore fundamental multimodal models in our experiments. Beyond providing basic binary classification outcomes, we delve deeper into model behaviour using multiple explainable Artificial Intelligence (XAI) tools, offering insights into their decision-making processes. Our analysis reveals that ResNet18 performs the best according to in-domain and out-of-domain tests. By providing a comprehensive comparison of systematic evaluation results and their interpretability, we propose several directions to inspire future research to develop more robust and effective detection methods for MGM. We provide our code and some samples in a GitHub repository.


[81] 2504.13529

Improving Bayesian Optimization for Portfolio Management with an Adaptive Scheduling

Existing black-box portfolio management systems are prevalent in the financial industry due to commercial and safety constraints, though their performance can fluctuate dramatically with changing market regimes. Evaluating these non-transparent systems is computationally expensive, as fixed budgets limit the number of possible observations. Therefore, achieving stable and sample-efficient optimization for these systems has become a critical challenge. This work presents a novel Bayesian optimization framework (TPE-AS) that improves search stability and efficiency for black-box portfolio models under these limited observation budgets. Standard Bayesian optimization, which solely maximizes expected return, can yield erratic search trajectories and misalign the surrogate model with the true objective, thereby wasting the limited evaluation budget. To mitigate these issues, we propose a weighted Lagrangian estimator that leverages an adaptive schedule and importance sampling. This estimator dynamically balances exploration and exploitation by incorporating both the maximization of model performance and the minimization of the variance of model observations. It guides the search from broad, performance-seeking exploration towards stable and desirable regions as the optimization progresses. Extensive experiments and ablation studies, which establish our proposed method as the primary approach and other configurations as baselines, demonstrate its effectiveness across four backtest settings with three distinct black-box portfolio management models.


[82] 2509.25236

Networks of Causal Abstractions: A Sheaf-theoretic Framework

A core challenge in causal artificial intelligence is the principled coordination of multiple, imperfect, and subjective causal perspectives arising from distributed agents with limited and heterogeneous access to the environment. This problem has received little formal treatment, as the existing framework assumes a single shared global causal model. This work introduces the causal abstraction network (CAN), a general sheaf-theoretic framework for representing, learning, and reasoning across collections of mixture of causal models (MCMs), a class that unifies several existing models of context-dependent causal mechanisms. Sheaf theory provides a natural foundation for this task, offering a rigorous framework to coherently align distributed causal knowledge without requiring explicit causal graphs, functional mechanisms, interventional data, or jointly sampled observations. At the theoretical level, we provide a categorical formulation of MCMs and characterize key properties of CANs, including consistency and smoothness. Under consistency, we establish necessary and sufficient conditions: (i) for the existence of global sections, linked to spectral properties of an associated connection Laplacian; and (ii) for the convergence of causal knowledge diffusion over the CAN to the space of global sections. At the methodological level, we exploit the compositionality of causal abstractions to decompose the learning of consistent CANs into local problems on network edges, extending our prior work on Gaussian variables to Gaussian mixtures via the proposed MIXTURE-CALSEP algorithm. We validate the framework on synthetic data and through a financial application involving a multi-agent trading system, demonstrating CAN recovery, CAN-based portfolio optimization, and counterfactual reasoning.


[83] 2602.17166

Geometric Inverse Flight Dynamics on SO(3) and Application to Tethered Fixed-Wing Aircraft

We present a robotics-oriented, coordinate-free formulation of inverse flight dynamics for fixed-wing aircraft on SO(3). Translational force balance is written in the world frame and rotational dynamics in the body frame; aerodynamic directions (drag, lift, side) are defined geometrically, avoiding local attitude coordinates. Enforcing coordinated flight (no sideslip), we derive a closed-form trajectory-to-input map yielding the attitude, angular velocity, and thrust-angle-of-attack pair, and we recover the aerodynamic moment coefficients component-wise. Applying such a map to tethered flight on spherical parallels, we obtain analytic expressions for the required bank angle and identify a specific zero-bank locus where the tether tension exactly balances centrifugal effects, highlighting the decoupling between aerodynamic coordination and the apparent gravity vector. Under a simple lift/drag law, the minimal-thrust angle of attack admits a closed form. These pointwise quasi-steady inversion solutions become steady-flight trim when the trajectory and rotational dynamics are time-invariant. The framework bridges inverse simulation in aeronautics with geometric modeling in robotics, providing a rigorous building block for trajectory design and feasibility checks.


[84] 2602.19179

Distributional Stability of Tangent-Linearized Gaussian Inference on Smooth Manifolds

Gaussian inference on smooth manifolds is central to robotics, but exact marginalization and conditioning are generally non-Gaussian and geometry-dependent. We study tangent-linearized Gaussian inference and derive explicit non-asymptotic $W_2$ stability bounds for projection marginalization and surface-measure conditioning. The bounds separate local second-order geometric distortion from nonlocal tail leakage and, for Gaussian inputs, yield closed-form diagnostics from $(\mu,\Sigma)$ and curvature/reach surrogates. Circle and planar-pushing experiments validate the predicted calibration transition near $\sqrt{\|\Sigma\|_{\mathrm{op}}}/R\approx 1/6$ and indicate that normal-direction uncertainty is the dominant failure mode when locality breaks. These diagnostics provide practical triggers for switching from single-chart linearization to multi-chart or sample-based manifold inference. Code and Jupyter notebooks are available at this https URL.
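The diagnostic ratio $\sqrt{\|\Sigma\|_{\mathrm{op}}}/R$ quoted above is simple to evaluate. The sketch below does so for a symmetric 2x2 covariance using the closed-form largest eigenvalue. The function names are hypothetical, and treating the reported transition value 1/6 as a hard switching threshold is an assumption made here for illustration, not a rule stated in the abstract.

```python
import math


def op_norm_sym2x2(sigma):
    """Largest eigenvalue of a symmetric PSD 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (b_check, c) = sigma
    if abs(b - b_check) > 1e-12:
        raise ValueError("covariance must be symmetric")
    return (a + c) / 2.0 + math.hypot((a - c) / 2.0, b)


def locality_ratio(sigma, reach):
    """sqrt(||Sigma||_op) / R, with R a curvature radius or reach surrogate."""
    return math.sqrt(op_norm_sym2x2(sigma)) / reach


def should_leave_single_chart(sigma, reach, threshold=1.0 / 6.0):
    # Assumed usage: flag when the ratio exceeds the reported transition value.
    return locality_ratio(sigma, reach) > threshold


tight = [[0.01, 0.0], [0.0, 0.01]]   # standard deviation 0.1 on a unit circle
loose = [[0.04, 0.0], [0.0, 0.01]]   # standard deviation 0.2 along one axis
print(locality_ratio(tight, 1.0), should_leave_single_chart(tight, 1.0))
print(locality_ratio(loose, 1.0), should_leave_single_chart(loose, 1.0))
```

For higher-dimensional covariances the same ratio applies with the operator norm computed by any symmetric eigenvalue routine.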


[85] 2603.16961

Impacts of Electric Vehicle Charging Regimes and Infrastructure Deployments on System Performance: An Agent-Based Study

The rapid growth of electric vehicles (EVs) requires more effective charging infrastructure planning. Infrastructure layout not only determines deployment cost, but also reshapes charging behavior and influences overall system performance. In addition, destination charging and en-route charging represent distinct charging regimes associated with different power requirements, which may lead to substantially different infrastructure deployment outcomes. This study applies an agent-based modeling framework to generate trajectory-level latent public charging demand under three charging regimes based on a synthetic representation of the Melbourne (Australia) metropolitan area. Two deployment strategies, an optimization-based approach and a utilization-refined approach, are evaluated across different infrastructure layouts. Results show that utilization-refined deployments reduce total system cost, accounting for both infrastructure deployment cost and user generalized charging cost, with the most significant improvement observed under the combined charging regime. In particular, a more effective allocation of AC slow chargers reshapes destination charging behavior, which in turn reduces unnecessary reliance on en-route charging and lowers detour costs associated with en-route charging. This interaction highlights the behavioral linkage between destination and en-route charging regimes and demonstrates the importance of accounting for user response and multiple charging regimes in charging infrastructure planning.


[86] 2604.07225

A Trajectory-Based Approach to Controlled Invariance and Recursively Feasible MPC

In this paper, we revisit the computation of controlled invariant sets for linear discrete-time systems through a trajectory-based viewpoint. We begin by introducing the notion of convex feasible points, which provides a new characterization of controlled invariance using finitely long state trajectories. We further show that combining this notion with the classical backward fixed-point algorithm allows for the computation of the maximal controlled invariant set. Building on these results, we propose a model predictive control (MPC) scheme that guarantees recursive feasibility without relying on precomputed terminal sets. Finally, we formulate the search for convex feasible points as an optimization problem, yielding a practical computational method for constructing controlled invariant sets. The effectiveness of the approach is illustrated through numerical examples.
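For reference, the classical backward fixed-point algorithm that the paper builds on can be sketched in one dimension. The example covers only that classical iteration, not the convex-feasible-point characterization: it assumes a scalar system x+ = a*x + u with |u| <= u_max and state constraint |x| <= x_max, so that every iterate is a symmetric interval [-s, s]. All names and values are assumptions.

```python
def max_controlled_invariant_halfwidth(a, u_max, x_max, tol=1e-9, max_iter=1000):
    """Half-width s of the maximal controlled invariant interval [-s, s].

    Backward iteration: S_{k+1} = X intersect Pre(S_k), starting from S_0 = X.
    For S = [-s, s], Pre(S) = {x : |a*x + u| <= s for some |u| <= u_max},
    which is the interval |x| <= (s + u_max) / |a| when a != 0.
    """
    s = x_max
    for _ in range(max_iter):
        pre = float("inf") if a == 0 else (s + u_max) / abs(a)
        s_next = min(x_max, pre)
        if abs(s - s_next) <= tol:
            return s_next
        s = s_next
    return s


# Unstable plant: the input can only hold states with |x| <= u_max / (|a| - 1).
print(max_controlled_invariant_halfwidth(a=2.0, u_max=1.0, x_max=5.0))
# Stable plant: the whole constraint set is already invariant.
print(max_controlled_invariant_halfwidth(a=0.5, u_max=1.0, x_max=5.0))
```

In the unstable case the iterates shrink geometrically toward half-width 1 without reaching it in finitely many steps, which is why the loop stops on a tolerance.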


[87] 2604.24036

Robust Grounding with MLLMs Against Occlusion and Small Objects via Language-Guided Semantic Cues

While Multimodal Large Language Models (MLLMs) have enhanced grounding capabilities in general scenes, their robustness in crowded scenes remains underexplored. Crowded scenes entail visual challenges (i.e., occlusion and small objects), which impair object semantics and degrade grounding performance. In contrast, language expressions are immune to such degradation and preserve object semantics. In light of these observations, we propose a novel method that overcomes such constraints by leveraging Language-Guided Semantic Cues (LGSCs). Specifically, our approach introduces a Semantic Cue Extractor (SCE) to derive semantic cues of objects from the visual pipeline of an MLLM. We then guide these cues using corresponding text embeddings to produce LGSCs as linguistic semantic priors. Subsequently, they are reintegrated into the original visual pipeline to refine object semantics. Extensive experiments and analyses demonstrate that incorporating LGSCs into an MLLM effectively improves grounding accuracy in crowded scenes.