Damage caused by bushfires and volcanic eruptions escalates rapidly when detection is delayed, making fast and reliable early warning capabilities essential. Recent Earth Observation (EO) approaches have shown that thermal anomaly detection can be performed directly on decompressed Level-0 (L0) sensor data, avoiding computationally expensive preprocessing chains. However, direct exploitation of raw data remains challenging due to domain shift, sensor drift, radiometric inconsistencies, and the scarcity of labelled training samples. To address these challenges, this work proposes a Physics-Aware Neuromorphic Network (PANN) framework for onboard thermal anomaly detection. The proposed lightweight architecture, inspired by physical neural network principles and neuromorphic computing paradigms, is evaluated using two Sentinel-2 datasets: decompressed L0 with additional metadata (i.e. raw) and Level-1C (L1C). The PANN achieves a Matthews Correlation Coefficient (MCC) of $0.809$ on raw measurements, compared to $0.875$ when using ground-processed L1C products. The mean processing latency per L0 granule is $2.44 \pm 0.09~\mathrm{s}$, which is below the Sentinel-2 acquisition time of $3.6~\mathrm{s}$, demonstrating the feasibility of real-time, onboard processing. Furthermore, the projected execution time for the corresponding neuromorphic hardware instantiation is substantially lower at $0.1290 \pm 0.0002~\mathrm{s}$. Memory usage, including all necessary programs and packages, remains within realistic onboard constraints, with requirements of $0.673 \pm 0.007~\mathrm{Gb}$ for the software PANN and $0.393 \pm 0.004~\mathrm{Gb}$ for the estimated hardware realisation. Overall, these results indicate that PANN offers a promising pathway toward low-latency and resource-efficient onboard EO processing for thermal event detection.
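For reference, the MCC figures quoted above are the standard binary Matthews Correlation Coefficient computed from a confusion matrix. The sketch below is a minimal illustration on toy data, assuming a pixel-wise binary anomaly/background labelling (an assumption, since the abstract does not specify the labelling granularity).

```python
import numpy as np

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels in {0, 1}; returns 0.0 when any
    confusion-matrix margin is zero (degenerate case)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else ((tp * tn) - (fp * fn)) / denom

# Toy example: 1 = thermal anomaly pixel, 0 = background.
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 0, 1, 0])
print(f"MCC = {matthews_corrcoef(y_true, y_pred):.3f}")
```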
Visual state-space models (SSMs) are increasingly promoted as efficient alternatives to Vision Transformers, yet their practical advantages remain unclear under fair comparison because existing studies rarely isolate encoder effects from decoder and training choices. We present a strictly controlled benchmark of representative visual SSM families, including VMamba, MambaVision, and Spatial-Mamba, for remote-sensing semantic segmentation, in which only the encoder varies across experiments. Evaluated on LoveDA and ISPRS Potsdam under a unified 4-stage feature interface and a fixed lightweight decoder, the benchmark reveals three main findings: intra-family scaling yields only modest gains, cross-domain generalization is strongly asymmetric, and boundary delineation is the dominant failure mode under distribution shift. Although visual SSMs achieve favorable accuracy-efficiency trade-offs relative to the controlled CNN and Transformer baselines considered here, the results suggest that future improvements are more likely to come from robustness-oriented design and boundary-aware decoding than from encoder scaling alone. By isolating encoder behavior under a unified and reproducible protocol, this study establishes a practical reference benchmark for the design and evaluation of future Mamba-based segmentation backbones.
Dynamic state estimation (DSE) is becoming increasingly important for monitoring inverter-dominated power systems. Due to their cascading control structures, inverter-based resources (IBRs) exhibit multi-timescale dynamics, leading to stiff system models that pose significant challenges for conventional DSE methods. In particular, explicit discretization schemes often require impractically small sampling intervals to maintain numerical stability, increasing computational and communication burdens. To address this issue, this paper proposes a stiffness-aware decentralized DSE method for inverter-dominated power systems. The statistical linearization is used to construct a local linear surrogate model for the nonlinear dynamics, which allows matrix-exponential discretization to enable analytical uncertainty propagation in discrete time, rather than relying on explicit integration schemes. This enables stable DSE at lower sampling rates. Numerical results reveal the mechanism by which stiff dynamics destabilize conventional DSE and demonstrate that the proposed method achieves efficient and accurate estimation under coarse sampling conditions.
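The contrast between explicit discretization and matrix-exponential discretization can be made concrete with the standard Van Loan construction for a linear(ized) stiff model. The two-timescale system matrix below is a hypothetical stand-in for an IBR's cascaded fast/slow control loops, not the paper's model; the point is only that the matrix-exponential discretization remains stable at a coarse sampling interval where explicit Euler diverges.

```python
import numpy as np
from scipy.linalg import expm

def van_loan_discretize(A, Q, dt):
    """Exact discretization of dx = A x dt + dw, E[dw dw^T] = Q dt:
    returns (Ad, Qd) with x_{k+1} = Ad x_k + w_k, Cov(w_k) = Qd."""
    n = A.shape[0]
    M = np.zeros((2 * n, 2 * n))
    M[:n, :n] = -A
    M[:n, n:] = Q
    M[n:, n:] = A.T
    E = expm(M * dt)
    Ad = E[n:, n:].T
    Qd = Ad @ E[:n, n:]
    return Ad, Qd

# Hypothetical stiff two-timescale model (fast inner loop, slow outer loop).
A = np.array([[-1000.0, 0.0],
              [   10.0, -1.0]])
Q = np.diag([1e-2, 1e-4])
dt = 0.01                      # coarse sampling interval (s)
Ad, Qd = van_loan_discretize(A, Q, dt)
print("matrix-exponential spectral radius:", max(abs(np.linalg.eigvals(Ad))))        # < 1: stable
print("explicit Euler spectral radius:    ", max(abs(np.linalg.eigvals(np.eye(2) + dt * A))))  # > 1: unstable
```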
Hybrid beamforming architectures reduce hardware complexity but restrict access to full array observations, rendering direct implementation of classical covariance based methods such as minimum variance distortionless response (MVDR) and sample matrix inversion (SMI) infeasible. This work introduces a structured covariance completion framework, termed Rock Road to Dublin (RR2D), which estimates the unobservable analytical covariance matrix (ACM) from a partially observed sample covariance matrix (SCM). RR2D exploits signal stationarity across the array and enforces physical measurement consistency using Dykstra's alternating projection algorithm with positive semidefinite, Toeplitz, and block constraints. The reconstructed virtual ACM enables a realizable hybrid SMI (HSMI) formulation that remains fully compatible with existing hybrid MVDR optimization frameworks. Empirical results for a 32 element hybrid array demonstrate both the expected degradation of HSMI implemented directly under prior HMVDR formulations and the performance gains achieved through RR2D. The proposed HSMI consistently outperforms previous hybrid SMI and partial digital baselines, achieving performance close to the HMVDR reference. Overall, RR2D bridges the gap between theoretical HMVDR formulations and practical hybrid hardware by enabling structured covariance reconstruction from incomplete observations.
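Dykstra's alternating projection step named in the abstract can be sketched for the two structural constraints that are simplest to state, the PSD cone and the Hermitian Toeplitz subspace; the block and measurement-consistency constraints of RR2D, and the hybrid-array observation model, are omitted, so this is a minimal illustration rather than the paper's algorithm.

```python
import numpy as np

def proj_psd(M):
    """Project a matrix onto the PSD cone (symmetrize, clip negative eigenvalues)."""
    H = (M + M.conj().T) / 2
    w, V = np.linalg.eigh(H)
    return (V * np.maximum(w, 0)) @ V.conj().T

def proj_toeplitz(M):
    """Project onto Hermitian Toeplitz matrices (symmetrize, then average diagonals)."""
    H = (M + M.conj().T) / 2
    n = H.shape[0]
    T = np.zeros_like(H)
    for k in range(n):
        d = np.mean(np.diagonal(H, offset=k))
        T += np.diag(np.full(n - k, d), k)
        if k > 0:
            T += np.diag(np.full(n - k, np.conj(d)), -k)
    return T

def dykstra_psd_toeplitz(S, n_iter=200):
    """Dykstra's alternating projections onto PSD ∩ Toeplitz, started from S."""
    X, P, Q = S.copy(), np.zeros_like(S), np.zeros_like(S)
    for _ in range(n_iter):
        Y = proj_psd(X + P)
        P = X + P - Y
        X = proj_toeplitz(Y + Q)
        Q = Y + Q - X
    return X

# Toy example: noisy sample covariance of a stationary array signal (hypothetical sizes).
rng = np.random.default_rng(0)
n, snapshots = 8, 20
x = rng.standard_normal((n, snapshots)) + 1j * rng.standard_normal((n, snapshots))
S = x @ x.conj().T / snapshots
R = dykstra_psd_toeplitz(S)
print("smallest eigenvalue of the structured estimate:", np.linalg.eigvalsh(R).min().round(6))
```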
Three-dimensional (3D) wide-field fluorescence microscopy is a widely used modality for volumetric imaging, but suffers from characteristic out-of-focus blur. Existing reconstruction methods either struggle to operate on high-dimensional volumes or fail to provide credibility characterization of the reconstruction. In this work, we introduce Volumetric Transport (VOLT), a 3D-native probabilistic framework for wide-field fluorescence microscopy reconstruction. VOLT combines a transport-based formulation that maps degraded measurements to clean volumes via stochastic interpolants with a 3D-native anisotropic network that separates lateral and axial processing. This design operates directly in voxel space and achieves improved scalability to large volumes without relying on slice-wise approximations. We develop both stochastic (SDE) and deterministic (ODE) variants within the same framework. We validate VOLT on simulated wide-field microscopy datasets. Our results show that VOLT significantly improves reconstruction quality in both lateral and axial directions while providing voxel-wise credibility estimates.
An algorithm for the simulation of switching converters is proposed in this paper. The algorithm is based on simulation of an averaged circuit model applying the "switching cell" concept, and on construction of instantaneous waveform values using the quasi-steady-state and linear-ripple approximations. The simulation covers converters operating in both the continuous and the discontinuous conduction mode. Application of the algorithm is demonstrated by simulation results for all three basic converters: buck, boost, and buck-boost, as well as a flyback converter, which required a slight generalization of the switching cell concept.
We investigate the performance of beyond-diagonal reconfigurable intelligent surfaces (BD-RIS) for bistatic MIMO multi-target sensing using a two-stage tensor Doppler-delay-angle estimation (TenDAE). The first stage solves a Kronecker sum approximation (KSA) with a rank equal to the number of targets. The second stage employs a nested tensor factorization estimation (NTFE) that exploits the inherent multidimensional structure via two tensor decompositions that are solved in parallel. The first employs a PARAFAC decomposition to extract the targets' angles, and the second uses a nested PARAFAC decomposition to find the targets' delay and Doppler parameters. This two-stage approach decouples acquisition of the angles and delays/Dopplers using either alternating least squares or a higher-order singular value decomposition, followed by a high-resolution subspace technique, such as ESPRIT. We further compare the performance of a BD-RIS with a classical diagonal RIS. For the latter, we solve a Khatri-Rao sum approximation problem rather than the KSA due to the specific structure of the received signal. Notably, our NTFE framework remains blind to the underlying RIS architecture while simultaneously estimating all targets with minimal sensing resources. Additionally, we show that employing a nested-PARAFAC decomposition enables the decoupling of the delay-Doppler and angle domains. We also derive the Cramér-Rao lower bound to further assess the performance of the TenDAE framework. Finally, we numerically evaluate the solutions presented in this paper and demonstrate their efficiency in terms of RMSE compared with state-of-the-art approaches.
This paper proposes a control architecture integrating adaptation with Lyapunov-based Reference Governors (LRGs) to ensure state constraint satisfaction for first-order systems with parametric uncertainties. Adaptation combined with LRGs guarantees stability, ensures good control performance, and remains safe even with parametric uncertainties. Simulations of the fuel cell temperature regulation problem demonstrate that the proposed control architecture successfully meets all control and safety objectives, whereas the standard adaptation fails to achieve the latter.
Computing Fourier transforms of k-sparse signals, where only k of N frequencies are non-zero, is fundamental in compressed sensing, radar, and medical imaging. While the Fast Fourier Transform (FFT) evaluates all N frequencies in $O(N \log N)$ time, sufficiently sparse signals should admit sub-linear complexity in N. Existing sparse FFT algorithms using Chinese Remainder Theorem (CRT) reconstruction rely on moduli selection choices whose worst-case implications have not been fully characterized. This paper makes two contributions. First, we establish an $\Omega(k^2)$ adversarial lower bound on candidate growth for CRT-based sparse FFT when moduli are not pairwise coprime (specifically when $m_3 \mid m_1 m_2$), implying an $O(k^2 N)$ worst-case validation cost that can exceed dense FFT time. This vulnerability is practically relevant, since moduli must often divide N to avoid spectral leakage, in which case non-pairwise-coprime configurations can be unavoidable. Pairwise coprime moduli avoid the proven attack; whether analogous constructions exist for such moduli remains an open question. Second, we present a robustness framework that wraps a 3-view CRT sparse front end with lightweight certificates (bucket occupancy, candidate count) and an adaptive dense FFT fallback. For signals passing the certificates, the sparse path achieves $O(\sqrt{N} \log N + k N)$ complexity; when certificates detect collision risk, the algorithm reverts to $O(N \log N)$ dense FFT, guaranteeing worst-case performance matching the classical bound.
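The CRT reconstruction that the sparse front end relies on is easy to state for pairwise-coprime moduli; the sketch below (illustrative moduli, not the paper's parameters) recovers a frequency index from its residues modulo several short DFT lengths, which is exactly the step that becomes ambiguous when a modulus divides the product of the others.

```python
from math import gcd

def crt_pair(r1, m1, r2, m2):
    """Combine x ≡ r1 (mod m1), x ≡ r2 (mod m2) for coprime m1, m2."""
    inv = pow(m1, -1, m2)                 # modular inverse of m1 mod m2
    x = (r1 + m1 * ((r2 - r1) * inv % m2)) % (m1 * m2)
    return x, m1 * m2

def crt(residues, moduli):
    assert all(gcd(a, b) == 1 for i, a in enumerate(moduli)
               for b in moduli[i + 1:]), "moduli must be pairwise coprime"
    x, m = residues[0] % moduli[0], moduli[0]
    for r, mi in zip(residues[1:], moduli[1:]):
        x, m = crt_pair(x, m, r % mi, mi)
    return x, m

# Illustrative setup: a true frequency index f in [0, N) is observed only through
# short DFTs of length m_i, which reveal f mod m_i.
moduli = [16, 9, 25]              # pairwise coprime, product N = 3600
f_true = 2025
residues = [f_true % m for m in moduli]
f_hat, N = crt(residues, moduli)
print(f_hat == f_true, f_hat, N)  # True 2025 3600
```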
Path integral control in Gaussian belief space requires a structural matching condition between the observation-driven diffusion of the belief mean and the actuation authority, which a fixed observation matrix cannot enforce. We treat the observation matrix as a control variable and show that constraining the sensing control to a measurable selector from the resulting matching set reduces the Hamilton-Jacobi-Bellman equation for the belief mean and covariance to a linear PDE with a Feynman-Kac representation.
Next-generation wireless networks are expected to leverage multi-modal data sources to execute various wireless communication tasks such as beamforming and blockage prediction with situational awareness. To do so, multi-modal transformers have emerged as an effective tool; however, existing transformer-based approaches suffer from high inference latency and large memory footprints when processing multi-modal data. Hence, such existing solutions cannot handle wireless communication tasks that require fast inference to track a dynamically changing environment with moving vehicles and blockages. One major bottleneck is the reliance on attention mechanisms whose complexity grows quadratically with respect to the number of tokens. Hence, in this paper, a novel, fast multi-modal transformer inference framework is designed to practically support wireless communication tasks by processing only important tokens. To this end, an optimization problem is formulated to find the optimal number of tokens under a target FLOPs budget for a given wireless communication task while maintaining the task accuracy. To solve this problem, modality-specific tokenizers are first designed to project each modality into the same embedding dimension. Then, a token router is introduced to learn the importance of each token and process only important tokens. Subsequently, a trainable keep ratio is introduced to learn how many tokens to process for each layer under the target FLOPs budget. Simulation results show that, on DeepSense 6G beamforming tasks, we can reduce the inference latency, GPU memory, and FLOPs by 86.2%, 35%, and 80%, respectively, with negligible accuracy loss. To validate the feasibility for real-world deployments, a multi-modal handover dataset is developed using a real-world testbed. Emulation results on the developed dataset show that the proposed framework can proactively initiate handover before blockage.
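The keep-ratio idea can be illustrated with a minimal token-pruning sketch, assuming a precomputed router score per token and a fixed keep ratio; both are learned in the paper, so the names and sizes below are purely illustrative.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio):
    """Keep the top `keep_ratio` fraction of tokens by importance score.

    tokens: (n_tokens, d) embeddings, scores: (n_tokens,) router outputs.
    Returns the kept tokens and their original indices (order preserved).
    """
    n_keep = max(1, int(np.ceil(keep_ratio * tokens.shape[0])))
    keep_idx = np.sort(np.argsort(scores)[-n_keep:])
    return tokens[keep_idx], keep_idx

rng = np.random.default_rng(0)
tokens = rng.standard_normal((128, 64))   # e.g., fused camera/LiDAR/GPS tokens
scores = rng.random(128)                  # stand-in for learned router scores
kept, idx = prune_tokens(tokens, scores, keep_ratio=0.25)
print(kept.shape)   # (32, 64): attention cost drops roughly with (32/128)^2
```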
The self-noise of capacitive sensors, primarily caused by thermal noise from the gate-bias resistor in the preamplifier, imposes a fundamental limit on measurement sensitivity. In electret condenser microphones (ECMs), this resistor simultaneously determines the noise low-pass cutoff frequency and the signal high-pass cutoff frequency through a single RC time constant, creating a trade-off between noise reduction and signal bandwidth. This paper proposes PDS-Amp (Photoelectric DC Servo Amplifier), a circuit technique that replaces the gate-bias resistor with a photoelectric element functioning as an ultra-high-impedance current source. A DC servo loop using lag-lead compensation feeds back the preamplifier output through an LED to control the photocurrent, thereby stabilizing the gate bias while decoupling the noise and signal cutoff frequencies. A custom photosensor based on the external photoelectric effect of a zinc photocathode was fabricated to achieve sub-picoampere dark current, overcoming the limitations of commercial semiconductor photodiodes. Combined with a cascode JFET preamplifier that minimizes input capacitance through bootstrap action, PDS-Amp achieved a self-noise of 11 dBA with a 12 pF dummy microphone. Despite using a small-diameter ECM capsule, this performance is comparable to that of large-diaphragm condenser microphones costing several thousand dollars. Recording experiments with an actual ECM capsule qualitatively confirmed a significant reduction in background noise. The proposed technique is applicable not only to microphones but broadly to capacitive sensors including accelerometers, pressure sensors, and pyroelectric sensors.
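The coupling described above follows from a single time constant: with gate-bias resistor R and capsule capacitance C, the signal high-pass corner and the corner below which the resistor's thermal noise appears on the capsule both sit at $1/(2\pi RC)$. The small numeric illustration below uses illustrative resistor values; only the 12 pF load matches the abstract's dummy-microphone test condition.

```python
import numpy as np

k_B, T = 1.380649e-23, 293.0          # Boltzmann constant, room temperature (K)
C = 12e-12                            # capsule capacitance (F), matching the 12 pF dummy load
for R in (1e9, 10e9, 100e9):          # illustrative gate-bias resistor values
    f_c = 1.0 / (2 * np.pi * R * C)   # shared corner: signal high-pass = noise low-pass
    # Thermal current noise of R driving the parallel R||C impedance gives a voltage
    # density 4kTR / (1 + (2*pi*f*R*C)^2); evaluate at 1 kHz, well above f_c.
    e_n_1kHz = np.sqrt(4 * k_B * T * R / (1 + (2 * np.pi * 1e3 * R * C) ** 2))
    print(f"R = {R:.0e} ohm: corner {f_c:8.2f} Hz, noise density at 1 kHz "
          f"{e_n_1kHz * 1e9:.1f} nV/sqrt(Hz)")
```

Raising R lowers both the in-band noise density and the corner frequency, which is why an ultra-high-impedance bias element that decouples the two corners is attractive.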
This paper investigates the transient synchronization stability in power systems hybridized with virtual synchronous generators (VSGs) and synchronous generators (SGs). A relative swing equation model is established to capture the transient synchronization dynamics between the VSG and the SG. Based on this model, both static and dynamic characteristics are systematically analyzed, and a quantitative stability level index is derived to elucidate the underlying stability mechanism. Then, two fundamental inertia matching principles are identified. First, a new instability mechanism induced by improper inertia matching between the VSG and the SG is revealed. It is identified that increasing the VSG's inertia does not monotonically improve transient stability, as commonly presumed. Instead, an optimal inertia matching constant exists that maximizes stability performance. Second, the influence of the VSG share on the synchronization stability is discovered to be strongly influenced by the matching between the VSG's inertia level and its voltage strength (i.e., output impedance). To achieve reliable and robust synchronization stability, proper coordination between the VSG's inertia and virtual impedance is essential. Finally, a coordinated stabilization strategy based on inertia matching and virtual impedance adjustment is proposed to enhance transient synchronization stability performance while suppressing fault current. Simulations conducted on a two-machine system and the IEEE 39-bus system validate the theoretical findings and demonstrate the effectiveness of the proposed strategy.
Mainstream optical satellites often acquire multispectral multi-resolution images, which have limited material identifiability compared to hyperspectral images (HSIs). Thus, spectrally super-resolving the multispectral image (MSI) into its hyperspectral counterpart greatly facilitates remote material identification and the downstream tasks. However, spectrally super-resolving the MSI into an HSI is often constrained by the multi-resolution nature of the sensor. Specifically, due to the presence of some low-resolution (LR) bands in the MSI, the initial spectral super-resolution results often appear to be spatially blurry, resulting in an LR HSI. To overcome this bottleneck, we leverage some high-resolution (HR) bands inherent in the acquired MSI to spatially guide the reconstruction procedure, thereby yielding the desired HR HSI. This fusion procedure elegantly coincides with a widely known spatial super-resolution problem in satellite remote sensing. Hence, we have reformulated the tough spectral super-resolution problem into a more widely investigated spatial super-resolution problem, referred to as the spectral-spatial duality theory. Accordingly, we propose ExplainS2A, consisting of a deep unfolding network and an explainable fusion network, that unifies spectral recovery and spatial fusion into a single explainable framework. Unlike conventional black-box models, ExplainS2A offers interpretability and operates as a linear-time algorithm. Remarkably, it can process a million-scale Sentinel-2 image in less than one second, yielding a high-fidelity HSI over the same scene, and upgrades the blind source separation results. Although demonstrated on the Sentinel-2 and AVIRIS sensors, ExplainS2A also serves as a general framework applicable to various sensor pairs with different resolution configurations, and has experimentally demonstrated cross-region and cross-season generalization ability. Source codes: this https URL.
High-mobility uncrewed aerial vehicle (UAV) communications in low-altitude wireless networks (LAWN) demand reliable beamforming, while conventional feedback-based schemes suffer from excessive overhead and severe misalignment under rapid trajectory variations. To address this challenge, this paper proposes an SSB-based sensing-assisted predictive robust beamforming framework that replaces explicit channel state information (CSI) feedback with sensing-driven state estimation and uncertainty-aware optimization. Leveraging the periodic 'always-on' synchronization signal block (SSB), a hierarchical sensing algorithm tailored for hybrid digital-analog uniform planar arrays is developed, combining 2D range-velocity profiling and augmented beamspace multiple signal classification (MUSIC). By integrating a locally-focused analog receive beamformer, the proposed sensing design can ensure energy accumulates across different radio-frequency (RF) chains while resolving angular ambiguity. An extended Kalman filter (EKF) is further employed to track UAV states between sparse synchronization-signal (SS) bursts, and a covariance correction is introduced to characterize maneuver-induced prediction uncertainties. Based on the derived statistical distributions of range and angular parameters, the communication channel is modeled through predictive correlation matrices rather than instantaneous CSI, leading to a multi-user robust beamforming formulation that maximizes average network sum-rate under uncertainty. The resulting nonconvex problem is efficiently solved via successive convex approximation and alternating minimization. Simulation results demonstrate that the proposed framework significantly enhances spectral efficiency and link stability compared with feedback-based beamforming and non-robust beamforming design, particularly in high-mobility and large-SSB-interval scenarios.
This paper addresses the quantitative verification of finite-time constrained occupation time for stochastic continuous-time systems governed by stochastic differential equations (SDEs). Unlike classical reachability analysis, which focuses on single-event properties such as entering a target set, many autonomous tasks, including surveillance, wireless charging, and chemical mixing, require a system to accumulate a prescribed duration within a target region while strictly maintaining safety constraints. We propose a barrier-certificate framework to compute rigorous upper and lower bounds on the probability that such cumulative specifications are satisfied over a finite time horizon. By introducing a stopped process that freezes the system once it reaches the boundary of the safe set, we derive three classes of certificates: one for upper bounds and two for lower bounds. The proposed approaches are validated through numerical examples implemented using semidefinite programming.
Reliable surveillance and communication for unmanned aerial vehicles (UAVs) are crucial for enabling and sustaining the accelerated growth of the low-altitude economy. Integrated sensing and communications (ISAC) offers a cost-effective and scalable framework for target sensing by leveraging existing wireless communication systems. This paper investigates a bistatic downlink ISAC architecture tailored to UAV operations, in which a base station (BS) communicates with a legitimate UAV and detects a potential unauthorized intruder in the surveillance region. We assume that the BS transmits superimposed ISAC waveforms comprising both Gaussian-information-bearing and deterministic sensing components. First, we develop a Neyman-Pearson (NP)-based optimal detector that jointly exploits both deterministic sensing and stochastic signal components. Subsequently, we optimize the transmit beamforming design at the BS to maximize the minimum detection probability over the entire surveillance region, subject to a minimum signal-to-interference-plus-noise ratio (SINR) requirement at the authorized UAV and a total transmit power budget at the BS. The resulting design problem is highly non-convex, which is efficiently addressed via semi-definite relaxation (SDR) and successive convex approximation (SCA) techniques. Simulation results demonstrate the superiority of the proposed NP-based detector, which fully leverages the synergy between both types of signals, over conventional benchmark schemes that treat information-bearing signals merely as interference. Furthermore, the results reveal a fundamental sensing-communication trade-off, where increasing the communication-rate threshold directs more transmit power to Gaussian-information-bearing signals, thereby reducing the power allocated to deterministic components and consequently weakening detection performance.
In 5G and beyond networks, efficient scheduling is essential to exploit the gains of multi-user MIMO (MU-MIMO) equipped with carrier aggregation and joint transmission (JT). However, cross-cell and cross-carrier scheduling under QoS constraints is challenging due to the strong coupling across users, base stations, and carriers. In this work, we address this problem in multi-cell MU-MIMO networks to maximize system throughput for both JT and non-JT users under rate constraints. The optimization is highly complex as scheduling variables and beamforming (BF) vectors are intertwined. To tackle it, we propose an approximate but tractable surrogate by leveraging the eigen-based zero-forcing BF and massive MIMO asymptotics. The reformulated problem has a separable structure and is amenable to efficient solutions by a penalty-based block coordinate descent method. Simulations demonstrate that the proposed scheduler not only meets the QoS requirements well but also achieves remarkable throughput gains over existing schemes.
Unification of automatic speech recognition (ASR) systems reduces development and maintenance costs, but training a single model to perform well in both offline and low-latency streaming settings remains challenging. We present a Unified ASR framework for Transducer (RNNT) training that supports both offline and streaming decoding within a single model, using chunk-limited attention with right context and dynamic chunked convolutions. To further close the gap between offline and streaming performance, we introduce an efficient Triton implementation of mode-consistency regularization for RNNT (MCR-RNNT), which encourages agreement across training modes. Experiments show that the proposed approach improves streaming accuracy at low latency while preserving offline performance and scaling to larger model sizes and training datasets. The proposed Unified ASR framework and the English model checkpoint are open-sourced.
Synthetic aperture radar tomography (TomoSAR) enables 3-D imaging by exploiting multibaseline acquisitions and has become an important tool for urban mapping. To achieve super-resolution inversion, sparse reconstruction methods based on compressive sensing (CS) are widely adopted. However, most CS-based TomoSAR methods rely on grid-based formulations and therefore suffer from off-grid bias. Gridless formulations provide a principled way to alleviate this limitation, whereas classical Toeplitz-Vandermonde atomic norm minimization (ANM) is not directly applicable to spaceborne TomoSAR under nonuniform baselines. Existing gridless methods for nonuniform-baseline TomoSAR avoid the classical uniform linear array (ULA) assumption, but they are usually tightly coupled to handcrafted iterative solvers and solver-specific parameter settings, while robust inversion under limited observations and low-SNR conditions remains challenging. To address this gap, we propose DUSG-Tomo-Net, a deep unfolded gridless framework for single-look spaceborne TomoSAR under nonuniform baselines. The proposed method reformulates the inversion in a Toeplitz-compatible lag domain via a structured single-look approximation and recovers a Hermitian Toeplitz positive-semidefinite structured covariance representation through layerwise learned regularization and projection-based structural enforcement. The actual acquisition geometry is embedded analytically into the data-consistency step via a fixed, signal-independent operator, enabling operator-based adaptation to varying baseline configurations. Scatterer elevations are then estimated by a continuous-domain spectral estimator without elevation discretization.
Electric vehicle (EV) charging demands (EVCDs) are jointly determined by the EV owners' behavior (i.e., the human factor) and the electricity prices (i.e., decisions of distribution system operators (DSOs)). However, most existing studies either neglect the decision-dependent nature of EVCD uncertainty or idealistically treat EV owners as perfect decision-makers. This paper formulates the optimal operation of power distribution systems (PDS) as a distributionally robust chance-constrained (DRCC) problem considering EVCDs as endogenous uncertainty (i.e., decision-dependent uncertainty). The Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE) is introduced to capture the human factor of EV owners in the proposed ambiguity set. Case studies on IEEE test systems demonstrate that the proposed method achieves superior performance compared to deterministic and conventional DRCC approaches, thereby enhancing resilience and security in PDS operations.
This paper presents an automated software toolchain for synthesizing hardware-implementable analog circuits that solve constrained optimization problems. The proposed toolchain supports nonlinear objective functions with linear and quadratic constraints. It maps optimization variables to capacitor voltages, implementing dynamics that enforce Karush-Kuhn-Tucker conditions using operational amplifiers, resistors, capacitors, diodes, and analog multipliers. From high-level problem descriptions in AMPL or MPS, the toolchain generates a SPICE netlist for the analog circuit, simulates it, and verifies that the solutions converge. The projected settling time of the analog circuit depends on circuit parameters, the gain-bandwidth product, and the slew-rate limits of the operational amplifiers, while the approach leverages the inherent parallelism of analog circuits. The proposed toolchain successfully generates circuits with up to 10,000 variables and demonstrates large scalability improvements, achieving up to a 1,000X increase in solvable problem size over prior analog hardware demonstrations. Simulation studies further show that the automatically synthesized circuits converge to optimal solutions, achieving more than a 200X speedup compared to IPOPT, a state-of-the-art digital interior-point solver.
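The principle of KKT-enforcing dynamics can be illustrated, independently of the SPICE-level synthesis, by a discretized primal-dual gradient flow for a small equality-constrained QP. This is a conceptual sketch of the idea, not the toolchain's generated circuit, and the problem data are arbitrary.

```python
import numpy as np

# Equality-constrained QP: minimize 0.5 x^T Q x + c^T x  s.t.  A x = b.
Q = np.array([[3.0, 0.5], [0.5, 2.0]])
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Primal-dual (Arrow-Hurwicz) gradient flow whose equilibrium is the KKT point:
#   dx/dt = -(Q x + c + A^T lam),   dlam/dt = A x - b.
x, lam, dt = np.zeros(2), np.zeros(1), 1e-3
for _ in range(20_000):                       # forward-Euler integration of the flow
    x = x + dt * (-(Q @ x + c + A.T @ lam))
    lam = lam + dt * (A @ x - b)

# Compare against the closed-form KKT solution.
K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
print("flow:  ", np.round(np.concatenate([x, lam]), 4))
print("exact: ", np.round(sol, 4))
```

In the analog realization described above, the role of the integrator state is played by capacitor voltages, so the "iterations" happen in continuous time and in parallel across all variables.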
The process of calibrating instrument transformers (ITs) has been greatly simplified by using phasor measurement unit (PMU) data since this process eliminates the need for (a) additional hardware, and (b) taking ITs offline. However, such simplification comes at the cost of knowing the line parameters, whose estimation using PMU data in turn requires calibrated ITs. To solve this interdependency problem, we propose a novel framework that incorporates power system domain knowledge as constraints to perform simultaneous line parameter estimation and IT calibration. We demonstrate the effectiveness of our approach with simulated and real PMU data as well as for a power system application that uses both PMU data and line parameter information.
Tracking multiple targets in dynamic environments using distributed sensor networks is a fundamental problem in statistical signal processing. In such scenarios, the network of mobile sensors must coordinate their actions to accurately estimate the locations and trajectories of multiple targets, balancing limited computation and communication resources with multi-target tracking accuracy. Multi-sensor control methods can improve the performance of these networks by enabling efficient utilization of resources and enhancing the accuracy of the estimated target states. This paper proposes a novel multi-sensor control method that utilizes multi-agent coordinate descent to address this problem, ensuring distributed consensus of optimal sensor actions throughout the sensor network. To achieve this, a novel adaptive complementary fusion approach that prioritizes information from the most informative sensors is developed. Our method improves computational tractability and enables fully distributed control, ensuring the scalability and flexibility necessary for large-scale real-time sensing systems. Experimental results on several challenging multi-target tracking scenarios demonstrate that our approach significantly improves both multi-target tracking accuracy and computation efficiency over competing methods.
We study the deep image prior (DIP) framework applied to photoacoustic tomography (PAT) as an unsupervised reconstruction approach to mitigate limited-view artifacts and noise commonly encountered in experimental settings. Efficient implementation is achieved by employing recently published fast forward and adjoint algorithms for circular measurement geometries. Initialization via a fast inverse and total variation (TV) regularization are applied to further suppress noise and mitigate overfitting. For comparison, we compute a classical TV reconstruction. Our experiments comprise simulated PAT measurements under limited-view geometries and varying levels of added noise, as well as experimental measurements, with a digital twin used for quality assessment. Our findings suggest that the DIP framework provides an effective unsupervised strategy for robust PAT reconstruction, even in the challenging case of a limited-view geometry, providing improvements in several quantitative measures over total variation reconstructions.
This paper presents a robust path-following control method for vehicles that explicitly considers steering resistance dynamics to improve tracking accuracy. Conventional methods typically treat the steering angle as a direct control input; in contrast, this approach introduces the steering angle as a state variable and incorporates the steering resistance effect into the control model. The steering resistance is modeled as a function of vehicle speed and steering angle, but in practice it varies with road surface conditions and is difficult to determine precisely. The proposed method therefore treats the steering resistance coefficient as an uncertain parameter and introduces a Model Error Compensator (MEC) that mitigates the resulting model error and enhances the adaptability of the system to different environments. Numerical simulations under varying degrees of parameter mismatch demonstrate that the proposed method substantially reduces the maximum tracking error in representative mismatched cases compared to the conventional method, indicating that explicitly modeling steering resistance dynamics and compensating for model errors improve path-following performance.
In this letter, we propose a sparsification method for precoding codebooks that reduces the peak-to-average power ratio (PAPR) while preserving the achievable rate. By exploiting the fact that precoder matrices lie on the Grassmann manifold, we formulate a codebook design problem that enables sparsification without modifying the existing feedback mechanism. We develop two sparsification approaches, namely exact sparsification via unitary transformation and approximate sparsification via sparse principal component analysis, and integrate them into a unified design algorithm. The proposed sparsified codebooks incur negligible performance loss while reducing PAPR by more than 1 dB in uplink scenarios.
Modern UAV architectures increasingly aim to unify high-level autonomy and low-level flight control on a single General-Purpose Operating System (GPOS). However, complex multi-core System-on-Chips (SoCs) introduce significant timing indeterminism due to shared resource contention. This paper performs an architectural analysis of the PREEMPT RT Linux kernel on a Raspberry Pi 5, specifically isolating the impact of kernel activation paths (deferred execution SoftIRQs versus real-time direct activation) on a 250 Hz control loop. Results show that under heavy stress, the standard kernel is unsuitable, exhibiting worst-case latencies exceeding 9 ms. In contrast, PREEMPT RT reduced the worst-case latency by nearly 88 percent to under 225 microseconds, enforcing a direct wake-up path that mitigates OS noise. These findings demonstrate that while PREEMPT RT resolves scheduling variance, the residual jitter on modern SoCs is primarily driven by hardware memory contention.
Recent advances in Text-To-Speech (TTS) synthesis have seen the popularity of multi-stage approaches that first predict semantic tokens and then generate acoustic tokens. In this paper, we extend the coarse-to-fine generation paradigm to the temporal domain and introduce Chain-of-Details (CoD), a novel framework that explicitly models temporal coarse-to-fine dynamics in speech generation using a cascaded architecture. Our method progressively refines temporal details across multiple stages, with each stage targeting a specific temporal granularity. All temporal detail predictions are performed using a shared decoder, enabling efficient parameter utilization across different temporal resolutions. Notably, we observe that the lowest detail level naturally performs phonetic planning without the need for an explicit phoneme duration predictor. We evaluate our method on several datasets and compare it against several baselines. Experimental results show that CoD achieves competitive performance with significantly fewer parameters than existing approaches. Our findings demonstrate that explicit modeling of temporal dynamics with the CoD framework leads to more natural speech synthesis.
This study investigates subarray-level movable antenna (MA) architecture for multi-user MIMO (MU-MIMO) systems. Unlike conventional systems with fixed-position antennas (FPAs), the proposed scheme harnesses the additional positional degrees of freedom (DoFs) of movable subarrays to enhance spatial multiplexing capabilities for both multi-user and multi-stream communications. Our objective is to maximize the overall system spectral efficiency by jointly optimizing the hybrid beamforming design and the positions of all subarrays. To tackle this challenging non-convex optimization problem, we first adopt a block diagonalization (BD) based digital precoder to effectively eliminate multi-user interference. Subsequently, the joint optimization of the analog beamformer and the subarray positions is efficiently solved using a sequential interference cancellation (SIC)-based algorithm. Simulation results demonstrate that the proposed SIC-MA method significantly outperforms the benchmark SIC-FPA scheme where subarrays are fixed.
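The BD step that removes multi-user interference admits a compact sketch: each user's precoder is confined to the null space of the other users' stacked channels, obtained from an SVD, and then aligned with the strongest directions of that user's effective channel. The sizes below are hypothetical, and the hybrid analog stage and subarray-position optimization are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tx, users, n_rx, n_streams = 16, 3, 2, 2
H = [rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))
     for _ in range(users)]          # per-user channels (hypothetical sizes)

precoders = []
for k in range(users):
    # Stack the other users' channels and take the null space via SVD.
    H_bar = np.vstack([H[j] for j in range(users) if j != k])
    _, _, Vh = np.linalg.svd(H_bar)
    null = Vh.conj().T[:, np.linalg.matrix_rank(H_bar):]   # columns spanning null(H_bar)
    # Effective channel inside the null space; transmit along its top singular directions.
    _, _, Veh = np.linalg.svd(H[k] @ null)
    Wk = null @ Veh.conj().T[:, :n_streams]
    precoders.append(Wk / np.linalg.norm(Wk))

# Interference check: H_j W_k should be (numerically) zero for all j != k.
print(max(np.linalg.norm(H[j] @ precoders[k])
          for j in range(users) for k in range(users) if j != k))
```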
Reconfigurable intelligent surfaces (RISs) have recently attracted interest for non-terrestrial networks (NTNs), especially for improving satellite communication performance. However, RIS-assisted urban NTN designs that jointly support reliable communication and user positioning under blockage, while maintaining low online complexity, remain limited. This paper proposes a blockage-aware and shadowing-aware RIS-assisted framework for joint communication and positioning in an urban low-Earth-orbit (LEO) satellite downlink. A terrestrial RIS is used both to reinforce the blockage-sensitive satellite--user link and to create an additional reflected path that enhances delay-domain positioning observability. We develop a reduced two-dimensional positioning model based on the direct-path delay and the RIS-assisted excess delay, and combine the resulting position error bound (PEB) with the received signal-to-noise ratio (SNR) into a unified utility. A blockage-aware three-mode policy then adapts RIS operation among communication-oriented, balanced, and positioning-oriented modes according to the direct-link condition. To improve robustness, spatially correlated RIS--user shadowing is tracked across coherence blocks using a state-space model and a scalar Kalman filter, and the filtered estimate is used in a robust codebook-based RIS selection strategy with low online complexity. Numerical results show that the proposed framework provides a controllable SNR--PEB tradeoff, improves positioning accuracy while maintaining competitive SNR, stabilizes codeword selection under shadowing uncertainty, and increases joint success probability with RIS size and phase resolution, with diminishing returns at high hardware complexity.
In recent years, computational power and data availability breakthroughs have revolutionized our ability to analyze complex physical systems through the inverse problem approach. Data-driven techniques like system identification and machine learning play an important role in this field, allowing us to gain insights into previously inaccessible phenomena. However, a major hurdle remains: How can meaningful information from partial measurements be extracted? In the aerospace domain, the challenge of state estimation is particularly pronounced due to the limited availability of observational data and the constraints imposed by sensor capabilities for tracking resident space objects (RSOs). To address these limitations, advanced compensation methodologies are required. Currently, range and bearing measurements obtained from radar and optical systems constitute the primary observational tools in the space situational awareness (SSA) community. In this work, we propose a novel framework that integrates a simplified reference dynamics model with a data-driven surrogate measurement model. This fusion process leverages the strengths of both models to estimate complex dynamical behaviors under conditions of partial observability. Extensive numerical experiments were conducted across multiple datasets to validate the proposed framework. The results demonstrate its efficacy in accurately reconstructing system dynamics from incomplete measurement data. Furthermore, to ensure the robustness of the framework, an initial consistency analysis of the surrogate modeling approach is presented. By addressing the current challenges and refining the integration of data-driven techniques with traditional physics-based modeling, this framework aims to advance state estimation methodologies in the aerospace sector.
We propose a robust nonlinear model predictive control (MPC) scheme for trajectory-tracking control of autonomous vehicles at the limits of handling on non-planar road surfaces. We derive the dynamics from first principles and selectively omit terms with negligible dynamic influence to maintain real-time capability. The resulting MPC with a three-dimensional (3D) dynamic single-track model integrates relevant dynamic effects directly into the prediction model and leverages them to improve prediction accuracy and therefore control performance. Although the influence of terrain-induced vertical loads on the total acceleration potential is modeled, tire-road interactions remain subject to uncertainty and disturbance. An uncertainty-aware constraint-tightening scheme therefore introduces a margin to the constraint bounds to keep the vehicle controllable and stable in this environment. To validate our proposed approach, we perform high-fidelity dynamic double-track vehicle dynamics simulations on a model of a real circuit. We find that our algorithm can improve trajectory-tracking accuracy while maintaining low computation times.
We present a physics-driven framework for accurate evaluation of discrete spectral bands using a low-cost multispectral setup built from off-the-shelf RGB cameras and narrow multi-band optical filters. The approach starts by explicitly formulating a linear measurement model. The camera responses are expressed as linear mixtures of unknown spectral components, with mixing coefficients determined by the overlap between the camera spectral sensitivities and the filter transmittances. For a multi-camera configuration, the per-camera models are stacked into a single global system whose structure is fully determined by the allocation of target wavelengths across the camera--filter units. We pose wavelength allocation as a deterministic design problem and select the configuration that minimizes the spectral condition number of the resulting system matrix. Guided by a frame-theoretic interpretation, this criterion promotes numerical stability, maximizes worst-case output signal-to-noise ratio, and improves the robustness of spectral reconstruction. The design space is finite, enabling the evaluation of all feasible configurations under practical constraints. We demonstrate the method on a representative example with 12 target wavelengths and four triband filters, and identify the wavelength allocation that yields the most stable and noise-robust recovery. The proposed framework includes redundant configurations, in which individual wavelengths are measured by multiple cameras, thereby providing additional degrees of freedom that further improve noise robustness.
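The design criterion can be illustrated with a toy exhaustive search: build the stacked mixing matrix for each feasible allocation and keep the one with the smallest spectral condition number. The candidate rows below are random stand-ins for measured sensitivity-filter overlaps, and the problem size is much smaller than the 12-wavelength, four-filter example in the abstract.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n_wavelengths = 6                  # unknown spectral bands (illustrative)
n_channels = 10                    # candidate camera-filter measurement channels
# Row i = overlap of channel i's camera sensitivity with the filter passbands,
# sampled at the target wavelengths (random stand-in for measured curves).
candidate_rows = np.abs(rng.standard_normal((n_channels, n_wavelengths)))

best = (np.inf, None)
for cfg in combinations(range(n_channels), n_wavelengths + 2):   # redundant configurations
    M = candidate_rows[list(cfg)]
    kappa = np.linalg.cond(M)      # spectral condition number of the stacked system
    if kappa < best[0]:
        best = (kappa, cfg)

print(f"best configuration {best[1]} with condition number {best[0]:.2f}")
```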
Reliable harmonization of heterogeneous magnetic resonance~(MR) image datasets, especially those acquired in pragmatic clinical trials, is critical to advance multi-center neuroimaging studies and translational machine learning in healthcare. We present an enhanced and rigorously validated version of the HACA3 harmonization algorithm, which we refer to as HACA3$^+$, incorporating key methodological enhancements: (1)~an improved artifact encoder to better isolate and mitigate image artifacts, (2)~background and foreground-sensitive attention mechanisms to increase harmonization specificity, and (3)~extensive training using data spanning 100+ scanners from 64 independent sites, providing a broader diversity of scanners than other harmonization methods. Our study focuses on four commonly acquired MR image contrasts (T1-weighted, T2-weighted, proton density, \& fluid-attenuated inversion recovery), reflecting realistic clinical protocols. We perform inter-site harmonization experiments using traveling subjects to assess the generalization and robustness of the harmonization model. We compare the results of the publicly available version of HACA3 and our implementation, HACA3$^+$. Downstream relevance is further established through whole brain segmentation and image imputation. Finally, we justify each enhancement through an ablation experiment. Pre-trained weights and code for HACA3$^+$ are made publicly available at this https URL.
Clinicians lack a principled framework to quantify diagnostic utility in ultrasound reconstructions. Existing standards like PSNR and VGG-LPIPS are inadequate, failing to account for modality-specific physics or the structural nuances of acoustic imaging. We close this gap with a TinyUSFM-based evaluation framework featuring two distinct metrics: TinyUSFM-uLPIPS, a full-reference perceptual distance based on multi-layer token relations, and TinyUSFM-NRQ, a deployable no-reference quality score utilizing clean-manifold modeling and worst-region aggregation to detect localized harmful artifacts. We demonstrate that the presented metrics have four unique advantages: 1) Task-linked quality, where TinyUSFM-uLPIPS achieves superior calibration with semantic task damage, accurately reflecting Dice-score drops in segmentation where VGG-based metrics fail; 2) Cross-organ comparability, maintaining stable scoring scales and consistent severity rankings across diverse anatomical sites and domain-shifted data; 3) PSNR-consistent sensitivity, with TinyUSFM-NRQ providing a reliable quality score without ground-truth images that remains consistent with traditional fidelity benchmarks (i.e. PSNR); and 4) Clinical utility, improving the prediction of expert preference from 47.2$\%$ to 72.8$\%$ accuracy and producing super-resolution reconstructions preferred by sonographers. By integrating these advantages into a unified assessment and optimization loop, this work establishes a modality-aligned standard that finally bridges the gap between algorithmic performance and diagnostic utility. this https URL
In this paper, an algorithm for transient simulation of switching converters using prediction and correction to calculate the duty ratio is proposed. It provides large-signal simulation at the level of averaged currents and voltages in the circuit. Calculating the duty ratio using prediction and correction of the inductor current and capacitor voltage does not require their a priori knowledge. The number of circuit solutions per switching period is fixed and equal to two. Using this algorithm, various constant-frequency regulated switching converters can be simulated. Because the circuit values are predetermined, convergence problems are avoided. The algorithm results in very fast and accurate large-signal simulation.
Wireless agentic systems enable agents to autonomously perceive, reason, and act. However, existing works neglect the tight coupling between sensing and control in closed-loop integrated sensing and communication (ISAC) systems. In this paper, we propose an active inference (AIF)-driven wireless agentic system for closed-loop ISAC, which jointly optimizes control and sensing resource allocation via backward--forward message passing on a factor graph. The AIF agent maintains a generative model as a digital twin by integrating a localization model for uncertainty-aware state inference and a localization channel knowledge map (CKM) for approximating observation quality during planning. Simulation results demonstrate that the AIF-enabled agent adaptively allocates sensing resources based on spatially varying channel conditions, achieving superior balance among tracking accuracy, control effort, and sensing resource consumption over baseline strategies.
A notable difference between the ordinary and Hadamard products is that the Hadamard product of two singular positive semidefinite matrices can be nonsingular, and one of the factors can even be indefinite. We present an eigenvalue lower bound for a Hadamard product that depends on the rank, effective condition number, and diagonal entries of one factor, and the smallest eigenvalues of certain principal submatrices of the other factor. We give numerical examples and discuss its applications in array signal processing and matrix time series analysis.
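A hand-picked numerical example of the phenomenon stated above (two singular PSD factors whose Hadamard product is nonsingular) is easy to verify; the matrices below are illustrative, and the code checks only positive definiteness of the product, not the paper's eigenvalue lower bound.

```python
import numpy as np

# Two rank-2 (hence singular) 3x3 PSD matrices, built as Gram matrices.
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 2]], dtype=float)   # = x x^T + y y^T with x = (1,0,1), y = (0,1,1)
B = np.array([[2, 1, 1],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)   # = u u^T + v v^T with u = (1,1,0), v = (1,0,1)

H = A * B                                # Hadamard (entrywise) product
for name, M in [("A", A), ("B", B), ("A o B", H)]:
    w = np.linalg.eigvalsh(M)
    print(f"{name}: eigenvalues {np.round(w, 4)}, rank {np.sum(w > 1e-10)}")
# A and B each have a zero eigenvalue, yet A o B has eigenvalues {1, 1, 3},
# so the Hadamard product of two singular PSD matrices is nonsingular here.
```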
Chemical sensing in real-world environments requires resolving rapidly fluctuating and spatially heterogeneous concentration fields. However, these dynamics are strongly distorted by widely used, low-cost metal-oxide (MOx) gas sensors, whose thermal and surface-kinetic response acts as a low-pass filter on the underlying concentration signal. Quantifying and compensating for these effects remains challenging, largely due to the lack of benchmark datasets that simultaneously capture the spatiotemporal structure of turbulent odour fields and the time-resolved response of point sensors. Here, we present a dataset combining planar laser-induced fluorescence (PLIF) measurements of an acetone tracer plume with synchronised recordings from a custom, kilohertz-rate microelectromechanical (MEMS) MOx electronic nose deployed in a laboratory wind tunnel. The PLIF system provides quantitative, two-dimensional concentration fields at high spatial and temporal resolution, while the co-located e-nose records film resistance, heater currents, and environmental parameters with aligned timestamps. The dataset enables quantitative assessment of sensor dynamics, development and benchmarking of reconstruction and deconvolution algorithms, and data-driven modelling of plume structure. All recordings, metadata, calibration files, and example analysis scripts are released in open, platform-independent formats. Together, these provide a valuable reference for researchers working in odour-guided robotics, environmental monitoring, computational fluid dynamics, and neuromorphic sensing, supporting the design and evaluation of high-speed odour-sensing systems.
Modern world models are becoming too complex to admit explicit dynamical descriptions. We study safety-critical contextual control, where a Planner must optimize a task objective using only feasibility samples from a black-box Simulator, conditioned on a context signal $\xi_t$. We develop a sample-based Penalized Predictive Control (PPC) framework grounded in online Riemannian optimization, in which the Simulator compresses the feasibility manifold into a score-based density $\hat{p}(u \mid \xi_t)$ that endows the action space with a Riemannian geometry guiding the Planner's gradient descent. The barrier curvature $\kappa(\xi_t)$, the minimum curvature of the conditional log-density $-\ln\hat{p}(\cdot\mid\xi_t)$, governs both convergence rate and safety margin, replacing the Lipschitz constant of the unknown dynamics. Our main result is a contextual safety bound showing that the distance from the true feasibility manifold is controlled by the score estimation error and a ratio that depends on $\kappa(\xi_t)$, both of which improve with richer context. Simulations on a dynamic navigation task confirm that contextual PPC substantially outperforms marginal and frozen density models, with the advantage growing after environment shifts.
This paper presents a sensing management framework for integrated sensing and communications (ISAC) within cell-free massive multiple-input multiple-output (MIMO) systems to reduce pilot-based channel state information (CSI) acquisition overhead. Conventional communication systems rely on frequent channel estimation procedures that impose significant signaling overhead, consuming valuable time-frequency resources. To address this inefficiency, we propose a state-based architecture that partitions users into communication and sensing groups based on service requirements. When users are not requesting data, the system utilizes sensing capabilities to track their location. Upon receiving a communication request, the system transitions to communication mode, leveraging the tracked state for predictive beamforming to eliminate the need for uplink pilot training. We develop an extended Kalman filter (EKF) based tracking algorithm coupled with adaptive resource allocation strategies. Furthermore, we analyze the impact of inter-target interference and design a sensing management protocol that performs sensing operations only when necessary to maintain the accuracy of user location estimates. Simulation results demonstrate that the proposed EKF-based tracking and sensing management can support predictive beamforming with downlink spectral efficiency close to the perfect-CSI case, while requiring sensing only occasionally after an initial convergence period. The results also indicate that this performance is robust in a cell-free massive MIMO setup and can be achieved with practical sensing waveforms.
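The EKF tracking step underlying the sensing mode admits a compact sketch with a constant-velocity motion model and a range-bearing measurement; the noise levels, measurement model, and geometry below are illustrative rather than taken from the paper's cell-free setup.

```python
import numpy as np

def ekf_step(x, P, z, dt, q, R):
    """One EKF predict/update for state x = [px, py, vx, vy] with a
    constant-velocity model and a range-bearing measurement z = [r, theta]."""
    # Predict.
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt
    G = np.array([[0.5 * dt**2, 0], [0, 0.5 * dt**2], [dt, 0], [0, dt]])
    x = F @ x
    P = F @ P @ F.T + q * (G @ G.T)
    # Update with h(x) = [sqrt(px^2 + py^2), atan2(py, px)].
    px, py = x[0], x[1]
    r = np.hypot(px, py)
    h = np.array([r, np.arctan2(py, px)])
    H = np.array([[px / r, py / r, 0, 0],
                  [-py / r**2, px / r**2, 0, 0]])
    y = z - h
    y[1] = (y[1] + np.pi) % (2 * np.pi) - np.pi      # wrap bearing innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Illustrative run: track a user moving with constant velocity.
rng = np.random.default_rng(3)
dt, q = 0.1, 0.5
R = np.diag([1.0, (0.5 * np.pi / 180) ** 2])         # range (m^2) and bearing (rad^2) noise
truth = np.array([50.0, 30.0, 2.0, -1.0])
x, P = np.array([45.0, 35.0, 0.0, 0.0]), np.eye(4) * 10.0
for _ in range(100):
    truth[:2] += truth[2:] * dt
    z = np.array([np.hypot(*truth[:2]), np.arctan2(truth[1], truth[0])])
    z += rng.multivariate_normal(np.zeros(2), R)
    x, P = ekf_step(x, P, z, dt, q, R)
print("final position error:", np.round(np.linalg.norm(x[:2] - truth[:2]), 3))
```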
Reproducibility, traceability, and transparency in testing cyber-physical energy systems are crucial for scientific advancement and cross-laboratory collaboration. Current experimentation and test documentation practices lack formal semantics, making it difficult to reproduce experiments, share data, and apply, for example, artificial intelligence-driven analysis. A dataspace that relies on structured ontologies aims to address these gaps by providing machine-actionable descriptions. In this work, we outline an ontology-driven approach for reproducibility of cyber-physical energy systems testing and illustrate its applicability through representative cross-laboratory use cases, demonstrating feasibility while identifying remaining semantic and metadata gaps that limit reproducibility. Based on these observations, we propose an open three-viewpoint ontology framework to guide future ontology extensions.
This paper considers a networked tracking architecture in 6G integrated sensing and communication (ISAC) systems, where multiple base stations (BSs) cooperatively transmit radio signals and process received echo signals to track multiple moving targets. Compared to the single-BS counterpart, networked tracking allows the moving targets to be associated with different BSs over time such that the wireless resources can be dynamically allocated among BSs based on target locations. However, networked tracking imposes new challenges for algorithm design and resource allocation. In this paper, we first design the networked Kalman Filter (NKF) that is suitable for multi-BS based tracking, then characterize the posterior Cramér-Rao bound (PCRB) under this NKF, and finally design the beamforming vectors of all the BSs to minimize the tracking PCRB. Numerical results show that our dynamic beamforming design can properly associate the targets to the suitable BSs at various sensing blocks and reduce the tracking mean-squared error (MSE).
Challenging indoor and urban environments with severe multipath propagation and obstructed line-of-sight (OLoS) degrade classical radio frequency (RF) positioning. Multipath-based simultaneous localization and mapping (MP-SLAM) is a promising remedy, building and exploiting a map of the propagation environment to enhance the robustness. Emerging distributed multiple-input multiple-output (D-MIMO)/extremely large-scale MIMO (XL-MIMO) infrastructures, with single XL antenna arrays or distributed subarrays, offer large spatial apertures and enable high-resolution sensing, in particular when phase coherence is maintained across base stations (BSs), subarrays, or distributed arrays. In this work, we propose a scalable Bayesian direct MP-SLAM method for coherent data fusion in D-MIMO/XL-MIMO systems that jointly infers the environment while performing robust, high-accuracy localization directly from raw RF signals. The key idea is a phase-preserving nonzero-mean Type-II likelihood function in which a complex mean is shared across BSs or subarrays and enables coherent fusion, while the variance captures noncoherent signal power. The likelihood function is combined with a surface feature vector (SFV)-based model that enables map feature fusion across the distributed infrastructure and supports near-field propagation and visibility effects. A GPU-parallel implementation enables highly scalable processing across a distributed infrastructure and particles, possibly allowing real-time calculations for large antenna arrays. Simulation results demonstrate performance gains over existing noncoherent methods and approach the corresponding posterior CRLB (PCRLB), highlighting the potential of coherent distributed arrays for high-resolution sensing and localization.
Data informativity provides a theoretical foundation for determining whether collected data are sufficiently informative to achieve specific control objectives in data-driven control frameworks. In this study, we investigate the data informativity subject to noise characterized by quadratic matrix inequalities (QMIs), which describe constraints through matrix-valued quadratic functions. We introduce a generalized noise model, referred to as data perturbation, under which we derive necessary and sufficient conditions formulated as tractable linear matrix inequalities for data informativity with respect to stabilization and performance guarantees via state feedback, as well as stabilization via output feedback. Our proposed framework encompasses and extends existing analyses that consider exogenous disturbances and measurement noise, while also relaxing several restrictive assumptions commonly made in prior work. A central challenge in the data perturbation setting arises from the non-convexity of the set of systems consistent with the data, which renders standard matrix S-procedure techniques inapplicable. To resolve this issue, we develop a novel matrix S-procedure that does not rely on convexity of the system set by exploiting geometric properties of QMI solution sets. Furthermore, we derive sufficient conditions for data informativity in the presence of multiple noise sources by approximating the combined noise effect through the QMI framework. The proposed results are broadly applicable to a wide class of noise models and subsume several existing methodologies as special cases.
Matrix ellipsoids provide a standard framework for representing bounded uncertainties in data-driven control. Since noise models for sequential observations are naturally represented as the Minkowski sum of multiple matrix ellipsoids, applying existing robust control methods, which typically assume a single ellipsoidal set, requires a tight outer approximation. While techniques based on linear matrix inequalities (LMIs) are applicable, their computational cost grows quadratically with the data length, limiting their scalability. This paper investigates the optimal outer approximation problem under two criteria: the sum of squared semi-axes and the volume. We propose an LMI-free approach by introducing a parameterized family of bounding matrix ellipsoids. Specifically, we derive an exact analytical solution for the first criterion and develop an efficient majorization-minimization (MM) algorithm for the second. The proposed MM algorithm employs a first-order approximation of the log-determinant function to provide closed-form update rules, ensuring monotonic convergence to the set of stationary points. Numerical experiments demonstrate that our method offers significantly higher computational efficiency and scalability than standard interior-point solvers.
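For intuition, the sketch below shows the classical vector-ellipsoid analogue of the first criterion: a trace-optimal (sum of squared semi-axes) outer approximation of a Minkowski sum of centred ellipsoids, whose mixing weights admit the well-known closed form proportional to the square root of each trace. It is an assumption-laden illustration of the idea of an LMI-free parameterized bounding family, not the paper's matrix-ellipsoid solution.

```python
import numpy as np

# Illustrative sketch of the classical vector-ellipsoid analogue (not the paper's
# matrix-ellipsoid result): a trace-optimal outer approximation of the Minkowski
# sum of centred ellipsoids {x : x^T P_i^{-1} x <= 1} with shape matrices P_i.
# The bounding family P(a) = sum_i P_i / a_i (a_i > 0, sum a_i = 1) is standard,
# and minimizing trace(P(a)) has closed-form weights a_i proportional to sqrt(trace(P_i)).
def trace_optimal_sum(shapes):
    weights = np.array([np.sqrt(np.trace(P)) for P in shapes])
    weights = weights / weights.sum()
    return sum(P / a for P, a in zip(shapes, weights))

P1 = np.diag([4.0, 1.0])
P2 = np.diag([1.0, 9.0])
P_outer = trace_optimal_sum([P1, P2])
print(P_outer)  # shape matrix of the bounding ellipsoid
```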
This paper presents a data-driven algorithm for simultaneous system identification and parameter estimation in control-affine nonlinear systems. Parameter estimation is achieved by training a data-driven predictive model using state-action measurements and various known values of the parameters of interest. The predictive model is then used in conjunction with state-action data corresponding to unknown parameter values to estimate those values. Numerical experiments on the controlled Duffing oscillator with unknown damping, stiffness, and nonlinearity coefficients demonstrate accurate recovery of both the system trajectories and the unknown parameter values from data collected under open-loop excitation.
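A minimal sketch of the data-collection stage might look as follows, simulating the controlled Duffing oscillator under open-loop excitation and recording state-action pairs; the parameter values and forcing signal are hypothetical placeholders rather than those used in the paper's experiments.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Minimal sketch of the data-collection step only: simulate a controlled Duffing
# oscillator x'' + delta*x' + alpha*x + beta*x^3 = u(t) under open-loop excitation
# and record state-action pairs. Parameter values and the excitation are hypothetical.
alpha, beta, delta = 1.0, 5.0, 0.2           # stiffness, nonlinearity, damping
u = lambda t: 0.5 * np.cos(1.2 * t)          # open-loop forcing

def duffing(t, x):
    return [x[1], -delta * x[1] - alpha * x[0] - beta * x[0] ** 3 + u(t)]

t_eval = np.linspace(0.0, 20.0, 2001)
sol = solve_ivp(duffing, (0.0, 20.0), [0.1, 0.0], t_eval=t_eval, rtol=1e-8)
states = sol.y.T                             # (x, x_dot) trajectory
actions = u(t_eval)                          # applied inputs, paired with states
```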
Recovering latent structure from count data has received considerable attention in network inference, particularly when one seeks both cross-group interactions and within-group similarity patterns in bipartite networks, which are widely used in ecological research. Such networks are often sparse and inherently imperfect in their detection. Existing models mainly focus on interaction recovery, while the induced similarity graphs are much less studied. Moreover, sparsity is often not controlled, and scale is unbalanced, leading to oversparse or poorly rescaled estimates that degrade structural recovery. To address these issues, we propose a framework for structured sparse nonnegative low-rank factorization with detection probability estimation. We impose nonconvex $\ell_{1/2}$ regularization on the latent similarity and connectivity structures to promote sparsity in within-group similarity and cross-group connectivity with a better relative scale. The resulting optimization problem is nonconvex and nonsmooth. To solve it, we develop an ADMM-based algorithm with adaptive penalization and scale-aware initialization and establish its asymptotic feasibility and KKT stationarity of cluster points under mild regularity conditions. Experiments on synthetic and real-world ecological datasets demonstrate improved recovery of latent factors and similarity/connectivity structure relative to existing baselines.
Large language models (LLMs) have enabled natural-language-driven automation of electronic design automation (EDA) workflows, but reliable execution of generated scripts remains a fundamental challenge. In LLM-based EDA tasks, failures arise not from syntax errors but from violations of implicit structural dependencies over design objects, including invalid acquisition paths, missing prerequisites, and incompatible API usage. Existing approaches address these failures through tool-in-the-loop debugging, repeatedly executing and repairing programs using runtime feedback. While effective, this paradigm couples correctness to repeated tool invocation, leading to high latency and poor scalability in multi-step settings. We propose to eliminate tool-in-the-loop debugging by enforcing structural correctness prior to execution. Each task is represented as a structural dependency graph that serves as an explicit execution contract, and a verifier-guided synthesis framework enforces this contract through graph-conditioned retrieval, constrained generation, and staged pre-execution verification with diagnosis-driven repair. On single-step tasks, our method improves pass rate from 73.0% (LLM+RAG) and 76.0% (tool-in-loop) to 82.5%, while requiring exactly one tool call per task and reducing total tool calls by more than 2x. On multi-step tasks, pass rate improves from 30.0% to 70.0%, and further to 84.0% with trajectory-level reflection. Uncertainty-aware filtering further reduces verifier false positives from 20.0% to 6.7% and improves precision from 80.0% to 93.3%. These results show that enforcing structural consistency prior to execution decouples correctness from tool interaction, improving both reliability and efficiency in long-horizon EDA code generation.
Reduced-order models are powerful for analyzing and controlling high-dimensional dynamical systems. Yet constructing these models for complex hybrid systems such as legged robots remains challenging. Classical approaches rely on hand-designed template models (e.g., LIP, SLIP), which, though insightful, only approximate the underlying dynamics. In contrast, data-driven methods can extract more accurate low-dimensional representations, but it remains unclear when stability and safety properties observed in the latent space meaningfully transfer back to the full-order system. To bridge this gap, we introduce HALO (Hybrid Auto-encoded Locomotion), a framework for learning latent reduced-order models of periodic hybrid dynamics directly from trajectory data. HALO employs an autoencoder to identify a low-dimensional latent state together with a learned latent Poincaré map that captures step-to-step locomotion dynamics. This enables Lyapunov analysis and the construction of an associated region of attraction in the latent space, both of which can be lifted back to the full-order state space through the decoder. Experiments on a simulated hopping robot and full-body humanoid locomotion demonstrate that HALO yields low-dimensional models that retain meaningful stability structure and predict full-order region-of-attraction boundaries.
Social learning networks (SLNs) are graphical representations that capture student interactions within educational settings (e.g., a classroom), with nodes representing students and edges denoting interactions. Accurately predicting future interactions in these networks (i.e., link prediction) is crucial for enabling effective collaborative learning, supporting timely instructional interventions, and informing the design of effective group-based learning activities. However, traditional link prediction approaches are typically tuned to general online social networks (OSNs), often overlooking the complex, non-Euclidean, and dynamically evolving structure of SLNs, thus limiting their effectiveness in educational settings. In this work, we propose a graph neural network (GNN) framework that jointly considers the temporal evolution within classrooms and spatial aggregation across classrooms to perform link prediction in SLNs. Specifically, we analyze link prediction performance of GNNs over the SLNs of four distinct classrooms across their (i) temporal evolutions (varying time instances), (ii) spatial aggregations (joint SLN analysis), and (iii) varying spatial aggregations at varying temporal evolutions throughout the course. Our results indicate statistically significant performance improvements in the prediction of future links as the courses progress temporally. Aggregating SLNs from multiple classrooms generally enhances model performance as well, especially in sparser datasets. Moreover, we find that jointly leveraging both the temporal evolution and spatial aggregation of SLNs significantly outperforms conventional baseline approaches that analyze classrooms in isolation. Our findings demonstrate the efficacy of educationally meaningful link predictions, with direct implications for early-course decision-making and scalable learning analytics in and across classroom settings.
This study presents a mathematical optimization framework and preliminary analysis for long-term investment planning in Puerto Rico's electric power system. We develop a high-resolution capacity expansion model to identify least-cost generation and storage investments that improve system reliability. The model co-optimizes new investments and thermal generator retirements while representing generator dispatch, unit commitment, fuel selection, and storage operations under constraints of equipment engineering limits, fuel supply limitations, and load satisfaction. Key methodological advances relative to prior long-term planning studies for Puerto Rico include: (i) nodal transmission modeling at 38 kV and above, (ii) hourly chronological operations for representative days, (iii) explicit unit commitment for existing and new thermal units with realistic ramping, minimum up and down times, and startup costs, (iv) system-wide fuel supply constraints, and (v) stochastic operating scenarios reflecting load variation, renewable availability, and the high forced outage rates of legacy units. Using data from LUMA, PREPA, DOE, and public sources, we build present-day (2024) and future (2030) test systems, with the latter including planned generation and storage projects. We evaluate planning scenarios that vary future load, fuel supply assumptions, realization of planned expansion, and allowable new technologies. Results show that, given the recent relaxation of interim renewable goals for the near future in Puerto Rico, an optimal portfolio includes at least 1.5 GW of new H-class combined cycle capacity beyond planned projects. These additions are needed mainly to replace unreliable legacy thermal units rather than to serve new load. The new combined cycle units eliminate modeled bulk-system load shedding and restore a strong reserve margin, even under stressed load and outage conditions.
Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control. To address this, we show empirically that, despite the nonlinear structure of transformer blocks, layer-wise dynamics across multiple LLM architectures and scales are well-approximated by locally-linear models. Exploiting this property, we model LLM inference as a linear time-varying dynamical system and adapt the classical linear quadratic regulator to compute feedback controllers using layer-wise Jacobians, steering activations toward desired semantic setpoints in closed-loop with minimal computational overhead and no offline training. We also derive theoretical bounds on setpoint tracking error, enabling formal guarantees on steering performance. Using a novel adaptive semantic feature setpoint signal, our method yields robust, fine-grained behavior control across models, scales, and tasks, including state-of-the-art modulation of toxicity, truthfulness, refusal, and arbitrary concepts, surpassing baseline steering methods. Our code is available at: this https URL
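To make the control-theoretic ingredient concrete, the sketch below computes a discrete-time LQR feedback for a locally linear activation model and applies it to drive an activation toward a setpoint. The matrices stand in for layer-wise Jacobians and are randomly generated, so this is an assumption-heavy illustration rather than the paper's steering pipeline.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Minimal sketch of the control idea, not the paper's pipeline: treat the layer-wise
# Jacobian A of the (locally linear) activation dynamics as a time-invariant model,
# then compute an LQR feedback that steers activations toward a semantic setpoint.
# A, B, and the setpoint here are random placeholders standing in for Jacobians
# extracted from a transformer.
d = 8
rng = np.random.default_rng(0)
A = 0.9 * np.eye(d) + 0.05 * rng.standard_normal((d, d))   # local layer dynamics
B = np.eye(d)                                              # additive steering input
Q, R = np.eye(d), 10.0 * np.eye(d)                         # tracking vs. intervention cost

P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)          # feedback gain

x_setpoint = rng.standard_normal(d)                        # desired semantic direction
x = rng.standard_normal(d)                                 # current activation
u = -K @ (x - x_setpoint)                                  # closed-loop steering signal
x_next = A @ x + B @ u
```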
We establish finite-time last-iterate guarantees for vanilla stochastic gradient descent in co-coercive games under noisy feedback. This is a broad class of games that is more general than strongly monotone games, allows for multiple Nash equilibria, and includes examples such as quadratic games with negative semidefinite interaction matrices and potential games with smooth concave potentials. Prior work in this setting has relied on relative noise models, where the noise vanishes as iterates approach equilibrium, an assumption that is often unrealistic in practice. We work instead under a substantially more general noise model in which the second moment of the noise is allowed to scale affinely with the squared norm of the iterates, an assumption natural in learning with unbounded action spaces. Under this model, we prove a last-iterate bound of order $O(\log(t)/t^{1/3})$, the first such bound for co-coercive games under non-vanishing noise. We additionally establish almost sure convergence of the iterates to the set of Nash equilibria and derive time-average convergence guarantees.
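The setting can be illustrated with a toy simulation: vanilla SGD on a co-coercive operator F(x) = Mx with M symmetric positive semidefinite, driven by noise whose second moment grows affinely with the squared norm of the iterate. The step-size schedule and constants below are illustrative choices, not the ones analyzed in the paper.

```python
import numpy as np

# Toy sketch: vanilla stochastic gradient descent on a co-coercive operator
# F(x) = M x (M symmetric PSD), with noise whose second moment grows affinely
# with ||x||^2. The decaying step size and all constants are hypothetical,
# chosen only to illustrate last-iterate convergence toward the equilibrium.
rng = np.random.default_rng(1)
M = np.array([[2.0, 0.5], [0.5, 1.0]])        # PSD interaction matrix
x = np.array([5.0, -3.0])
for t in range(1, 20001):
    noise = (0.1 + 0.1 * np.linalg.norm(x)) * rng.standard_normal(2)  # affine-variance noise
    g = M @ x + noise                          # noisy pseudo-gradient feedback
    step = 0.5 / t ** (2 / 3)                  # illustrative decaying step size
    x = x - step * g
print(np.linalg.norm(x))                       # last-iterate distance to the equilibrium at 0
```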
The simulation of a physical system in a virtual replica, known as a digital twin, is a useful way to interrogate the system non-invasively, providing the ability to perform predictive maintenance and surveillance, and to investigate potential novel configurations without perturbing the system. This article presents the implementation of an auto-generating digital twin architecture for particle accelerators: a virtual control system is generated to mirror the physical accelerator hardware, and used to update a simulation model which then feeds back the results into virtual diagnostics. All of the information about the accelerator lattice is cascaded down from a ground source of truth, removing any ambiguity about the naming of parameters between the simulation model and the virtual hardware. This design is modular and extensible, allowing researchers from different institutions to use their own models (for example, a machine learning model) and accelerator lattices while maintaining the overall structural coherence of the digital twin. This architecture has been tested for three accelerator facilities (CLARA, the ISIS injector, and the proposed UK XFEL) and aims to provide the foundation for a collaborative community effort in the development of shared technology towards a generic digital twin solution.
Classical information theory typically assumes reliable receiver-side processing. We study remote inference when communication is noisy and the receiver itself is built from unreliable components under a finite redundancy budget. Under a committed/no-bypass receiver closure, task-relevant information can affect the final estimate only by passing through a budgeted collection of vulnerable primitives unless an explicit protected bypass is modeled. Modeling each vulnerable primitive as a memoryless noisy channel yields a baseline supply--demand converse: the task-relevant information needed to attain a target distortion cannot exceed the smaller of the total information supplied by the communication channel and the total information supplied by the vulnerable compute budget. Our main converse shows that committed intermediate interfaces create additional first-order serial cuts and receiver-internal computation-graph cuts, captured in general by a receiver-internal compute min-cut converse. In particular, the twofold loss in the symmetric two-stage hard-separation special case is not inherent to unreliable receiver computation but induced by hard-separation under the committed/no-bypass closure. This extra first-order tax is therefore closure-dependent rather than universal. On the converse side, if downstream modules retain soft visibility to the raw channel output, the converse reduces to the single-bottleneck supply, up to any explicitly reserved soft-path budget. Under a separate stronger protected-support closure with reliable decoder and control support, we establish achievability results for task-direct and serial hard-separation constructions. For the fully noisy-logic regime, we obtain only a conservative depth-dependent converse, and matched achievability remains open.
Recent work in the machine learning literature has demonstrated that deep learning can train neural networks made of discrete logic gate functions to perform simple image classification tasks at very high speeds on CPU, GPU and FPGA platforms. By virtue of being formed by discrete logic gates, these Differentiable Logic Gate Networks (DLGNs) lend themselves naturally to implementation in custom silicon. In this work, we present a method to map DLGNs in a one-to-one fashion to a digital CMOS standard cell library by converting the trained model to a gate-level netlist. We also propose a novel loss function whereby the DLGN can optimize the area, and indirectly power consumption, of the resulting circuit by minimizing the expected area per neuron based on the area of the standard cells in the target standard cell library. Finally, we also show for the first time an implementation of a DLGN as a silicon circuit in simulation, performing layout of a DLGN in the SkyWater 130nm process as a custom hard macro using a Cadence standard cell library and performing post-layout power analysis. We find that our custom macro can perform classification on MNIST with 97% accuracy, 41.8 million times per second, at a power consumption of 83.88 mW.
Ultra-low-power (ULP) IoT applications demand communication architectures with minimal energy consumption. Noise Modulation (NoiseMod) addresses this by encoding data through the statistical variance of a noise-like signal, eliminating the need for a coherent carrier. To bridge the gap between theoretical potential and practical deployment, this paper benchmarks NoiseMod against standard modulations like BPSK and NC-FSK. We analytically derive the optimal detection threshold and Bit Error Rate (BER) for AWGN and Rayleigh fading channels. Our results show that non-coherent NoiseMod suffers a catastrophic error floor in fading environments, making architectural additions like 2-antenna selection diversity mandatory. Using an ADC-aware energy model, we reveal that NoiseMod's oversampling severely bottlenecks capacity and imposes an 8 dB SNR penalty compared to NC-FSK for a $10^{-3}$ BER in AWGN. Despite its oscillator-free design drastically reducing baseline circuit power, these limitations establish a critical energy crossover distance, which decreases with frequency. Below this distance, NoiseMod offers superior energy efficiency; beyond it, the radiated power needed to overcome its SNR penalty makes coherent schemes like BPSK vastly superior.
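A small Monte-Carlo sketch conveys the comparison qualitatively: BPSK with coherent detection versus a variance-keyed, NoiseMod-like scheme detected by an energy threshold over N samples. The oversampling factor, variance pair, and midpoint threshold below are hypothetical placeholders rather than the analytically optimal values derived in the paper.

```python
import numpy as np

# Monte-Carlo sketch (illustrative, not the paper's analytical derivation): BPSK with
# coherent detection versus a variance-keyed "NoiseMod-like" scheme in AWGN, where
# each bit is conveyed by the power of N noise-like samples and detected by an energy
# threshold. N, the variance pair, and the midpoint threshold are hypothetical.
rng = np.random.default_rng(0)
n_bits, N, snr_db = 100000, 16, 6.0
snr = 10 ** (snr_db / 10)
bits = rng.integers(0, 2, n_bits)

# BPSK: antipodal symbols in Gaussian noise, decided by sign
r = (2 * bits - 1) + rng.standard_normal(n_bits) / np.sqrt(2 * snr)
ber_bpsk = np.mean((r > 0) != (bits == 1))

# NoiseMod-like: the bit selects the variance of N noise samples, an energy detector decides
sigma0, sigma1 = 1.0, 2.0
samples = rng.standard_normal((n_bits, N)) * np.where(bits[:, None] == 1, sigma1, sigma0)
energy = (samples ** 2).sum(axis=1)
threshold = N * (sigma0 ** 2 + sigma1 ** 2) / 2   # simple midpoint rule, not the optimal one
ber_noisemod = np.mean((energy > threshold) != (bits == 1))
print(ber_bpsk, ber_noisemod)
```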
Wireless links deployed in orchards often exhibit significant variability in the strength of the received signal that is not adequately captured by classical distance-based propagation models. In row-structured olive groves, signal attenuation differs markedly between along-row and cross-row propagation directions, leading to discrepancies when using omnidirectional propagation assumptions such as those adopted in the Free Space Path Loss (FSPL) model or ITU-R vegetation loss formulations. This paper proposes a topology-based propagation model that explicitly accounts for orchard layout and the relative positions of radio devices within the plantation structure. Experimental validation was conducted using LoRa technology operating at 868 MHz, and the results were compared with established models from the literature and with the proposed two-dimensional model. The proposed approach achieves a closer fit to measured RSSI data than conventional models, providing a more reliable basis for link budgeting and network planning in structured agricultural environments.
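The modeling idea can be sketched as free-space path loss plus a direction-dependent excess term that grows with the number of tree rows crossed between transmitter and receiver. The per-row attenuation value below is a hypothetical placeholder, not a coefficient fitted in the paper.

```python
import numpy as np

# Minimal sketch, assuming a hypothetical direction-dependent excess-loss term: the
# free-space path loss at 868 MHz plus a per-crossed-row attenuation, to illustrate
# why along-row and cross-row links see different losses. The 1.5 dB/row figure is
# a placeholder, not a value from the paper.
def fspl_db(d_m, f_mhz=868.0):
    # FSPL(dB) = 20 log10(d_km) + 20 log10(f_MHz) + 32.44
    return 20 * np.log10(d_m / 1000.0) + 20 * np.log10(f_mhz) + 32.44

def orchard_path_loss_db(d_m, rows_crossed, loss_per_row_db=1.5):
    return fspl_db(d_m) + loss_per_row_db * rows_crossed

print(orchard_path_loss_db(100.0, rows_crossed=0))   # along-row link
print(orchard_path_loss_db(100.0, rows_crossed=12))  # cross-row link through 12 tree rows
```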
Unknown payloads can strongly affect compliant robotic manipulation, especially when the payload center of mass is not aligned with the tool center point. In this case, the payload generates an offset wrench at the robot wrist. During motion, this wrench is not only related to payload weight, but also to payload inertia. If it is not modeled, the compliant controller can interpret it as an external interaction wrench, which causes unintended compliant motion, larger tracking error, and reduced transport accuracy. This paper presents a wrench-aware admittance control framework for unknown-payload pick-and-place using a UR5e robot. The method uses force-torque measurements in two different roles. First, a three-axis translational excitation term is used to reduce payload-induced force effects during transport without making the robot excessively stiff. Second, after grasping, the controller first estimates payload mass for transport compensation and then estimates the payload CoM offset relative to the TCP using wrist force-torque measurements collected during the subsequent translational motion. This helps improve object placement and stacking behavior. Experimental results show improved transport and placement performance compared with uncorrected placement while preserving compliant motion.
Decentralized optimization enables multiple devices to learn a global machine learning model while each individual device only has access to its local dataset. By avoiding the need for training data to leave individual users' devices, it enhances privacy and scalability compared to conventional centralized learning, where all data has to be aggregated to a central server. However, decentralized optimization has traditionally been viewed as a necessary compromise, used only when centralized processing is impractical due to communication constraints or data privacy concerns. In this study, we show that decentralization can paradoxically accelerate convergence, outperforming centralized methods in the number of iterations needed to reach optimal solutions. Through examples in logistic regression and neural network training, we demonstrate that distributing data and computation across multiple agents can lead to faster learning than centralized approaches, even when each iteration is assumed to take the same amount of time, whether performed centrally on the full dataset or decentrally on local subsets. This finding challenges longstanding assumptions and reveals decentralization as a strategic advantage, offering new opportunities for more efficient optimization and machine learning.
Q-learning is one of the most fundamental algorithms in reinforcement learning. We analyze constant-stepsize Q-learning through a direct stochastic switching system representation. The key observation is that the Bellman maximization error can be represented exactly by a stochastic policy. Therefore, the Q-learning error admits a switched linear conditional-mean recursion with martingale-difference noise. The intrinsic drift rate is the joint spectral radius (JSR) of the direct switching family, which can be strictly smaller than the standard row-sum rate. Using this representation, we derive a finite-time final-iterate bound via a JSR-induced Lyapunov function and then give a computable quadratic-certificate version.
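For readers who want the baseline object in code, the following sketch runs constant-stepsize tabular Q-learning on a small random MDP, the recursion whose error dynamics the paper rewrites as a stochastic switched linear system; the MDP, stepsize, and exploration scheme are hypothetical.

```python
import numpy as np

# Small runnable sketch of constant-stepsize tabular Q-learning on a random MDP.
# The MDP, stepsize, and horizon are all hypothetical placeholders.
rng = np.random.default_rng(0)
nS, nA, gamma, alpha = 5, 3, 0.9, 0.1
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition kernel P[s, a, :]
R = rng.uniform(0, 1, size=(nS, nA))            # rewards
Q = np.zeros((nS, nA))
s = 0
for t in range(50000):
    a = rng.integers(nA)                        # uniform exploration
    s_next = rng.choice(nS, p=P[s, a])
    td_target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])    # constant-stepsize update
    s = s_next
```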
Edge-cloud hybrid inference offloads difficult inputs to a powerful remote model, but the uplink channel imposes hard per-request constraints on the number of bits that can be transmitted. We show that selecting transmitted content based solely on attention-based importance, the standard approach in collaborative inference, is inherently limited under hard budgets. Two findings support this claim. First, replacing high-importance units with low-importance but complementary ones improves server accuracy. This shows that what matters is not individual importance but how well the transmitted set covers diverse aspects of the input. Second, spatially uniform selection without any content information achieves competitive accuracy at moderate budgets. This confirms that spatial coverage alone carries independent value. Based on this analysis, we propose SAGE (Semantic Attention-Guided Evidence), a principled, training-free method that combines importance filtering with embedding-diversity sampling. SAGE achieves 93% of the server ceiling in offloaded accuracy while transmitting fewer than half of the available evidence units on ImageNet-1K, substantially outperforming importance-only composition.
Distribution networks with high penetration of Distributed Energy Resources (DERs) increasingly rely on communication networks to coordinate grid-interactive control. While many distributed control schemes have been proposed, they are often evaluated under idealized communication assumptions, making it difficult to assess their performance under realistic network conditions. This work presents an implementation-driven evaluation of a representative virtual power plant (VPP) dispatch algorithm using a co-simulation framework that couples a linearized distribution-system model with packet-level downlink emulation in ns-3. The study considers a modified IEEE 37-node feeder with high photovoltaic penetration and a primal-dual VPP dispatch that simultaneously targets feeder-head active power tracking and voltage regulation. Communication effects are introduced only on the downlink path carrying dual-variable updates, where per-DER packet delays and a hold-last-value strategy are modeled. Results show that, under ideal communication, the dispatch achieves close tracking of the feeder-head power reference while maintaining voltages within the prescribed limits at selected buses. When realistic downlink delay is introduced, the same controller exhibits large oscillations in feeder-head power and more frequent voltage limit violations. These findings highlight that distributed DER control performance can be strongly influenced by communication behavior and motivate evaluation frameworks that explicitly incorporate network dynamics into the assessment of grid-interactive control schemes.
Personalized Federated Learning (PFL) aims to learn multiple task-specific models rather than a single global model across heterogeneous data distributions. Existing PFL approaches typically rely on iterative optimization dynamics, such as model update trajectories, to cluster users that need to accomplish the same tasks together. However, these learning-dynamics-based methods are inherently vulnerable to low-quality data and noisy labels, as corrupted updates distort clustering decisions and degrade personalization performance. To tackle this, we propose FB-NLL, a feature-centric framework that decouples user clustering from iterative training dynamics. By exploiting the intrinsic heterogeneity of local feature spaces, FB-NLL characterizes each user through the spectral structure of the covariances of their feature representations and leverages subspace similarity to identify task-consistent user groupings. This geometry-aware clustering is label-agnostic and is performed in a one-shot manner prior to training, significantly reducing communication overhead and computational costs compared to iterative baselines. Complementing this, we introduce a feature-consistency-based detection and correction strategy to address noisy labels within clusters. By leveraging directional alignment in the learned feature space and assigning labels based on class-specific feature subspaces, our method mitigates corrupted supervision without requiring estimation of stochastic noise transition matrices. In addition, FB-NLL is model-independent and integrates seamlessly with existing noise-robust training techniques. Extensive experiments across diverse datasets and noise regimes demonstrate that our framework consistently outperforms state-of-the-art baselines in terms of average accuracy and performance stability.
This paper revisits a classical challenge in the design of stabilizing controllers for nonlinear systems with a norm-bounded input constraint. By extending Lin-Sontag's universal formula and introducing a generic (state-dependent) scaling term, a unifying controller design method is proposed. Incorporating this generic scaling term yields a unified controller and enables the derivation of alternative universal formulas with various favorable properties, making the method suitable for tailored control designs that meet specific requirements and versatile across different control scenarios. Additionally, we present a constructive approach to determine the optimal scaling term, leading to an explicit solution to an optimization problem, named the optimization-based universal formula. The resulting controller ensures asymptotic stability, satisfies a norm-bounded input constraint, and optimizes a predefined cost function. Finally, the essential properties of the unified controllers are analyzed, including smoothness, continuity at the origin, stability margin, and inverse optimality. Simulations validate the approach, showcasing its effectiveness in addressing a challenging stabilizing control problem of a nonlinear system.
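As background, the sketch below implements the baseline Lin-Sontag universal formula for norm-bounded stabilization, the formula the paper generalizes with a state-dependent scaling term; the scalar example system and control Lyapunov function are hypothetical.

```python
import numpy as np

# Baseline sketch of the Lin-Sontag universal formula for stabilization with
# norm-bounded controls (the formula the paper generalizes with a scaling term).
# Here a = L_f V(x) and b = L_g V(x) are the Lie derivatives of a control Lyapunov
# function; the example system and CLF are hypothetical.
def lin_sontag(a, b):
    b = np.atleast_1d(b).astype(float)
    nb2 = float(b @ b)
    if nb2 == 0.0:
        return np.zeros_like(b)
    k = (a + np.sqrt(a ** 2 + nb2 ** 2)) / (nb2 * (1.0 + np.sqrt(1.0 + nb2)))
    return -k * b   # ||u|| < 1 whenever a < ||b|| (CLF condition for bounded controls)

# Scalar example: x_dot = x + u, V(x) = x^2 / 2, so a = x^2 and b = x.
x = 0.8
u = lin_sontag(a=x ** 2, b=x)
print(u)
```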
Executing flow estimation using Deep Learning (DL)-based soft sensors on resource-limited IoT devices has demonstrated promise in terms of reliability and energy efficiency. However, its application in the field of wastewater flow estimation remains underexplored due to: (1) a lack of available datasets, (2) inconvenient toolchains for on-device AI model development and deployment, and (3) hardware platforms designed for general DL purposes rather than being optimized for energy-efficient soft sensor applications. This study addresses these gaps by proposing an automated, end-to-end solution for wastewater flow estimation using a prototype IoT device.
Reconfigurable intelligent surfaces (RISs) have emerged as a key technology for dynamically reshaping wireless propagation, enhancing coverage and mitigating blockages to enable more pervasive network connectivity. However, implementing RISs at high frequencies remains challenging due to the cost and power demands of semiconductor-based components. To address these critical limitations, liquid crystal (LC) technology has been identified as a promising low-cost and low-power alternative, giving rise to the LC-RIS. The central challenge of this technology, however, lies in its limited responsiveness, as the slow molecular dynamics of LCs lead to long phase-shift reconfiguration times that restrict practicality. This paper presents LiquiRIS, a novel framework that enables substantially faster phase shifting in LC-RIS. By explicitly incorporating the physical dynamics of LC molecules into the phase-shift configuration process, LiquiRIS intelligently selects phase transitions that minimize the overall reconfiguration time. As a result, LiquiRIS achieves up to $71.61\%$ reduction in overall reconfiguration time compared to conventional schemes, significantly improving the feasibility of LC-RIS deployment. The proposed framework is further validated through experiments on a mmWave LC-RIS prototype.
This paper considers constrained linear dynamic games with quadratic objective functions, which can be cast as affine variational inequalities. By leveraging the problem structure, we apply the Douglas-Rachford splitting, which generates a solution algorithm with a linear convergence rate. The fast convergence of the method enables receding-horizon control architectures. Furthermore, we demonstrate that the associated VI admits a closed-form solution within a neighborhood of the attractor, thus allowing for a further reduction in computation time. Finally, we benchmark the proposed method via numerical experiments in an automated driving application.
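To illustrate the splitting itself, the following sketch applies Douglas-Rachford iterations to a box-constrained quadratic program, arguably the simplest affine variational inequality with easy resolvents for both parts; it is a generic illustration, not the game-specific algorithm or the closed-form attractor-neighborhood solution described above.

```python
import numpy as np

# Illustrative sketch only (not the paper's game-specific splitting): Douglas-Rachford
# applied to a box-constrained quadratic program, min 0.5 x'Qx + c'x s.t. l <= x <= u,
# a simple affine-VI instance with an easy resolvent for each part.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([-1.0, -2.0])
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
gamma = 0.5

prox_quad = np.linalg.inv(Q + np.eye(2) / gamma)     # resolvent of the quadratic part
z = np.zeros(2)
for _ in range(200):
    x = prox_quad @ (z / gamma - c)                  # prox of the quadratic objective
    y = np.clip(2 * x - z, lo, hi)                   # prox of the box indicator
    z = z + (y - x)                                  # Douglas-Rachford update
print(x)                                             # approximate constrained minimizer
```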
Recent advances in text-to-speech (TTS) have been driven by large, multi-domain speech corpora, yet the expressive potential of audiobook data remains underexamined. We argue that human-narrated audiobooks, particularly fictional works, contain rich and diverse prosodic cues arising from the natural alternation between neutral narration and expressive character dialogue. Building on this observation, we introduce LibriQuote, a large-scale corpus of 5.3K hours of expressive speech drawn from character quotations. Each quote is supplemented with contextual pseudo-labels for speech verbs and adverbs that characterize the intended delivery of direct speech (e.g., "he whispered softly"). We found that fine-tuning a flow-matching model on LibriQuote yields substantial improvements in expressivity and intelligibility, while training from scratch enhances the expressiveness of an autoregressive TTS model. Benchmarking on LibriQuote-test highlights significant variability across systems in generating expressive speech. We publicly release the dataset, code, and evaluation resources to facilitate reproducibility. Audio samples can be found at this https URL.
The substantial communication resources consumed by conventional pilot-based channel sounding impose an unsustainable overhead, presenting a critical scalability challenge for the future 6G networks characterized by massive channel dimensions, ultra-wide bandwidth, and dense user deployments. As a generalization of radio map, channel knowledge map (CKM) offers a paradigm shift, enabling access to location-tagged channel information without exhaustive measurements. To fully utilize the power of CKM, this work highlights the necessity of leveraging three-dimensional (3D) environmental information, beyond conventional two-dimensional (2D) visual representations, to construct high-precision CKMs. Specifically, we present a novel framework that integrates 3D point clouds into CKM construction through a hybrid model- and data-driven approach, with extensive case studies in real-world scenarios. The experimental results demonstrate the potential for constructing precise CKMs based on 3D environments enhanced with semantic understanding, together with their applications in the next-generation wireless communications. We also release a real-world dataset of measured channels paired with high-resolution 3D environmental data to support future research and validation.
In multitemporal InSAR, phase linking (PL) refers to the estimation of a single-reference interferometric phase history for distributed scatterers (DS) from the information contained in the sample coherence matrix. Because the phase information in this matrix is typically inconsistent, DS processing needs practical reliability indicators to decide whether a pixel's PL estimate is sufficiently supported by the data for subsequent deformation analysis. For maximum-likelihood estimation, uncertainty can be quantified via Fisher-information-based covariance estimates, but no analogous, generally applicable uncertainty quantification is available for the broad range of non-ML methods. We propose three heuristic quality coefficients within a unified mathematical framework that covers common PL methods: (1) a method-specific goodness-of-fit coefficient that normalizes the achieved PL objective between a method-consistent upper bound and an empirically modeled noise floor level; (2) a closure phase coefficient computed from the sample coherence matrix in advance; and (3) an ambiguity coefficient that compares the obtained PL estimate with the best alternative in its orthogonal complement in the solution space. All coefficients are normalized to the interval $[0,1]$, where 1 indicates maximum reliability and 0 matches the behavior expected under pure noise. Simulations under exponential and seasonal decorrelation models show that the goodness-of-fit coefficient tracks the normalized absolute phase error most consistently, whereas the closure phase coefficient provides an a priori indicator for pre-screening. Experiments on a TerraSAR-X stack over Visp, Switzerland, reveal plausible spatial patterns across urban and vegetated areas and show that the ambiguity coefficient provides complementary information, especially in regions with temporally varying scattering mechanisms.
Running offers substantial health benefits, but improper gait patterns can lead to injuries, particularly without expert feedback. While prior gait analysis systems based on cameras, insoles, or body-mounted sensors have demonstrated effectiveness, they are often bulky and limited to offline, post-run analysis. Wrist-worn wearables offer a more practical and non-intrusive alternative, yet enabling real-time gait recognition on such devices remains challenging due to noisy Inertial Measurement Unit (IMU) signals, limited computing resources, and dependence on cloud connectivity. This paper introduces StrikeWatch, a compact wrist-worn system that performs entirely on-device, real-time gait recognition using IMU signals. As a case study, we target the detection of heel versus forefoot strikes to enable runners to self-correct harmful gait patterns through visual and auditory feedback during running. We propose four compact DL architectures (1D-CNN, 1D-SepCNN, LSTM, and Transformer) and optimize them for energy-efficient inference on two representative embedded Field-Programmable Gate Arrays (FPGAs): the AMD Spartan-7 XC7S15 and the Lattice iCE40UP5K. Using our custom-built hardware prototype, we collect a labeled dataset from outdoor running sessions and evaluate all models via a fully automated deployment pipeline. Our results reveal clear trade-offs between model complexity and hardware efficiency. Evaluated across 12 participants, 6-bit quantized 1D-SepCNN achieves the highest average F1 score of 0.847 while consuming just 0.350 microjoule per inference with a latency of 0.140 ms on the iCE40UP5K running at 20 MHz. This configuration supports up to 13.6 days of continuous inference on a 320 mAh battery. All datasets and code are available in the GitHub repository this https URL.
Diffusion-based generative models have greatly impacted the speech processing field in recent years, exhibiting high speech naturalness and spawning a new research direction. Their application in real-time communication is, however, still lagging behind due to their computation-heavy nature involving multiple calls of large DNNs. Here, we present Stream.FM, a frame-causal flow-based generative model with an algorithmic latency of 32 milliseconds (ms) and a total latency of 48 ms, paving the way for generative speech processing in real-time communication. We propose a buffered streaming inference scheme and an optimized DNN architecture, show how learned few-step numerical solvers can boost output quality at a fixed compute budget, explore model weight compression to find favorable points along a compute/quality tradeoff, and contribute a model variant with 24 ms total latency for the speech enhancement task. Our work looks beyond theoretical latencies, showing that high-quality streaming generative speech processing can be realized on consumer GPUs available today. Stream.FM can solve a variety of speech processing tasks in a streaming fashion: speech enhancement, dereverberation, codec post-filtering, bandwidth extension, STFT phase retrieval, and Mel vocoding. As we verify through comprehensive evaluations and a MUSHRA listening test, Stream.FM establishes a state-of-the-art for generative streaming speech restoration, exhibits only a reasonable reduction in quality compared to a non-streaming variant, and outperforms our recent work (Diffusion Buffer) on generative streaming speech enhancement while operating at a lower latency.
Inspired by the success of performing multiple local optimization steps between communication rounds in federated learning, incorporating such local updates into distributed optimization has recently attracted growing interest. However, unlike federated learning, where local updates can accelerate training by reducing gradient estimation error under minibatch settings, it remains unclear whether similar benefits persist when exact gradients are available. Moreover, existing theoretical results typically require reducing the step size when multiple local updates are employed, which can entirely offset any potential benefit of these additional local updates. In this paper, we focus on the classic DIGing algorithm and leverage the tight performance bounds provided by Performance Estimation Problems (PEP) to show that incorporating local updates can indeed accelerate distributed optimization. To the best of our knowledge, this is the first rigorous demonstration of such acceleration for a broad class of objective functions. Our analysis further reveals that, under an appropriate step size, performing only two local updates is sufficient to achieve the maximal possible improvement, and that additional local updates provide no further gains. Because more updates increase computational cost, these findings offer practical guidance for efficient implementation. We also show that these speed gains depend critically on the network structure, with sparser or less connected graphs, characterized by the spectral properties of the mixing matrix, yielding smaller improvements. Extensive experiments on both synthetic and real-world datasets corroborate the theoretical findings.
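For reference, a minimal implementation of the classic DIGing (gradient-tracking) recursion on a toy distributed least-squares problem is sketched below; the local-update variant analyzed in the paper would insert additional gradient steps between the mixing rounds shown here. The network, data, and step size are hypothetical.

```python
import numpy as np

# Minimal runnable sketch of the classic DIGing (gradient-tracking) recursion on a
# toy distributed least-squares problem. The mixing matrix, data, and step size are
# hypothetical placeholders; the paper's local-update variant adds extra gradient
# steps between the consensus rounds below.
rng = np.random.default_rng(0)
n, d, alpha = 4, 3, 0.05
W = np.full((n, n), 1.0 / n)                         # doubly stochastic mixing matrix
A = rng.standard_normal((n, 10, d))
b = rng.standard_normal((n, 10))
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])       # local gradient of 0.5||A_i x - b_i||^2

X = np.zeros((n, d))
Y = np.array([grad(i, X[i]) for i in range(n)])      # gradient trackers
G_old = Y.copy()
for _ in range(500):
    X_new = W @ X - alpha * Y                        # consensus step + tracked-gradient step
    G_new = np.array([grad(i, X_new[i]) for i in range(n)])
    Y = W @ Y + G_new - G_old                        # gradient-tracking update
    X, G_old = X_new, G_new
print(X[0])                                          # agents agree on the least-squares fit
```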
Hyperspectral single image super-resolution (HS-SISR) aims to enhance the spatial resolution of hyperspectral images to fully exploit their spectral information. While considerable progress has been made in this field, most existing methods are supervised and require ground-truth data for training, which is often unavailable in practice. To overcome this limitation, we propose a novel unsupervised training framework for HS-SISR, based on synthetic abundance data, where no high-resolution ground-truth reference is required for training. The approach begins by unmixing the hyperspectral image into endmembers and abundances. A neural network is then trained to perform abundance super-resolution using synthetic abundances only. These synthetic abundance maps are generated from a dead leaves model whose characteristics are inherited from the low-resolution image to be super-resolved and from the known point spread function (PSF) of the hyperspectral sensor. This trained network is subsequently used to enhance the spatial resolution of the original image's abundances, and the final super-resolution hyperspectral image is reconstructed by combining them with the endmembers. Experimental results demonstrate both the training value of the synthetic data and the effectiveness of the proposed method across 3 datasets, 3 scaling factors, and several evaluation metrics. The code is available at this https URL
Purpose: To investigate whether an AI-based method can detect subtle inter-fraction changes in MR-Linac images acquired during radiotherapy and explore the broader potential of MR-Linac imaging. Methods: This retrospective study included longitudinal 0.35T MR-Linac images from 761 patients. To identify temporal changes, we employed a deep learning model using temporal ordering via pairwise comparison, previously shown effective for longitudinal imaging studies. The model was trained using first-to-last fraction pairs (F1-FL) and all pairs (All-pairs). Performance was assessed using quantitative metrics (accuracy and AUC) and compared against a radiologist's performance. Qualitative evaluation was performed using saliency maps, which identify anatomical regions associated with temporal imaging changes. Results: The F1-FL model demonstrated high performance (AUC=0.99, accuracy=0.95) and outperformed the radiologist in the temporal ordering task. The All-pairs model also showed high performance (AUC=0.97, accuracy=0.91). Regions contributing to predictions included the prostate, bladder, and pubic symphysis. Performance was correlated with the interval between fractions and was reduced for non-radiation-exposed timepoints (Sim and F1), suggesting that observed changes may reflect both temporal variation and radiation exposure. Conclusion: MR-Linac imaging appears capable of capturing subtle changes during prostate radiotherapy that can be detected by AI models, even over approximately two-day intervals. The model's high performance, together with quantitative and qualitative analyses, supports a potential role for MR-Linac in clinical applications beyond image guidance.
The computation of positioning, navigation and timing (PNT) via signal of opportunity (SOP), where signals originally transmitted for communication, such as 5G, Wi-Fi, or DVB-S, are exploited due to their ubiquity and spectral characteristics, is an emerging research field. However, relying on these signals presents challenges, including limited knowledge of the signal modulation and the need to identify recurring sequences for correlation. We offer a guide to implement a receiver capable of capturing broadband downlink Ku-band signals from low Earth orbit (LEO) satellites (e.g., Starlink and OneWeb) and estimating the recurring symbols for SOP measurements. The methodology integrates recent approaches in the literature, highlighting the most effective aspects while guiding the replication of experiments even under limitations on the front-end gain and bandwidth. Using the proposed model, we can identify recurring symbols transmitted by Starlink satellites, which are then used to collect Doppler shift measurements over a 600 s interval. A position, velocity, and time (PVT) solution is also computed via least squares (LS), which achieves a positioning error of approximately 268 m after a post-fit refinement.
We propose a model-agnostic trustworthiness layer that equips any foundation model (FM) for power systems with statistically valid prediction intervals. The layer offers two calibration approaches: (i) stratified conformal prediction (SCP), which partitions residuals by contingency severity and grid element, and (ii) kernel-weighted conformal prediction (KCP), which localizes the calibration to each test scenario via scenario representations, yielding tighter, approximately conditional bounds. Using GridFM as a guiding example, we demonstrate the framework on N-k contingency screening for IEEE 24- and 118-bus systems. The trustworthiness layer ensures that over 90% of all critical violations are captured across N-k levels, minimizing missed detections while maintaining up to 5 times fewer false alarms than DC Power Flow. With negligible computational overhead over the underlying FM, this approach enables reliable large-scale security assessment beyond routine N-1 screening.
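The calibration layer builds on split conformal prediction, which the sketch below illustrates on a synthetic regression problem; the predictor and data are placeholders rather than a power-system foundation model, and the stratified and kernel-weighted variants described above would replace the single global quantile with per-stratum or locally weighted ones.

```python
import numpy as np

# Minimal sketch of split conformal prediction, the building block behind the
# stratified and kernel-weighted variants described above. The predictor and data
# here are synthetic placeholders rather than a power-system foundation model.
rng = np.random.default_rng(0)
predict = lambda x: 2.0 * x                          # stand-in for the frozen FM
x_cal = rng.uniform(0, 1, 500)
y_cal = 2.0 * x_cal + rng.normal(0, 0.1, 500)        # calibration pairs

alpha = 0.1                                          # 90% target coverage
scores = np.abs(y_cal - predict(x_cal))              # absolute residuals
q_level = np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores)
q_hat = np.quantile(scores, q_level, method="higher")

x_test = 0.7
lower, upper = predict(x_test) - q_hat, predict(x_test) + q_hat
print(lower, upper)                                  # statistically valid interval
```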
This paper proposes an adaptive modular geometric control framework for robotic manipulators. The proposed methodology decomposes the overall manipulator dynamics into individual modules, enabling the design of local geometric control laws at the module level. To address parametric uncertainties, a geometric adaptation law is incorporated into the control structure, requiring only a single adaptation gain for the entire system while ensuring physically consistent and drift-free parameter estimates. Exponential stability of the proposed controller is established in the nominal case. Numerical simulations on a complex redundant robotic manipulator are conducted to evaluate the proposed approach against existing modular and geometric control methods. The results show that the proposed method reduces the RMS position error by at least 12.2% compared with state-of-the-art controllers under almost the same control effort. In addition, the adaptive extension demonstrates strong capability in compensating for parametric uncertainties and preserving high tracking performance.
Integrating grid-forming converters (GFMCs) into grid-following converter (GFLC)-dominated power systems enhances the grid strength, but GFMCs' current-limiting characteristic triggers dynamic switching between constant voltage control (CVC) and current limit control (CLC). This switching feature poses critical transient stability risks to GFLCs, requiring urgent investigation. This paper first develops a mathematical model for this switched system. Then, it derives switching conditions for droop-controlled GFMCs, which are separately GFMC angle-dependent and GFLC angle-dependent. On this basis, the stability boundaries of GFLC within each subsystem are analyzed, and the impact of GFMC switching arising from GFLC angle oscillation is investigated. The findings reveal that the switched system's stability boundary coincides with that of the CLC subsystem. To enhance GFLC's transient stability and ensure GFMC converges to the CVC mode, this paper introduces a virtual fixed d-axis control (VFDC) strategy. Compared with existing methods, this method achieves decoupling and self-stabilization using only local state variables from individual converters. The conclusions are validated through simulations and Controller Hardware-in-the-Loop tests.
We consider a verification problem for opinion dynamics based on binary observations. The opinion dynamics is governed by a Friedkin-Johnsen (FJ) model, where only a sequence of binary outputs is available instead of the agents' continuous opinions. At every time-step we observe a binarized output for each agent depending on whether the opinion exceeds a fixed threshold. The objective is to verify whether an FJ model with a given set of stubbornness parameters and initial opinions can generate the observed binary outputs up to a small error. The FJ model is formulated as a transition system, and an approximate simulation relation of two transition systems is defined in terms of the proximity of their opinion trajectories and output sequences. We then construct a finite set of abstract FJ models by simplifying the influence matrix and discretizing the stubbornness parameters and the initial opinions. It is shown that the abstraction approximately simulates any concrete FJ model with continuous parameters and initial opinions, and is itself approximately simulated by some concrete FJ model. These results ensure that consistency verification can be performed over the finite abstraction. Specifically, by checking whether an abstract model satisfies the observation constraints, we can conclude whether the corresponding family of concrete FJ models is consistent with the binary observations. Finally, numerical experiments are presented to illustrate the proposed verification framework.
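The object being verified is the Friedkin-Johnsen recursion observed through a fixed threshold, which the short sketch below simulates; the influence matrix, stubbornness parameters, and threshold are hypothetical values chosen for illustration.

```python
import numpy as np

# Minimal sketch of the Friedkin-Johnsen recursion with thresholded (binary) outputs,
# the object whose consistency with observed outputs the paper verifies. The influence
# matrix, susceptibility values, and threshold are hypothetical.
n, T, tau = 4, 10, 0.5
W = np.full((n, n), 1.0 / n)                 # row-stochastic influence matrix
lam = np.array([0.8, 0.6, 0.9, 0.7])         # susceptibility (1 - stubbornness)
x0 = np.array([0.9, 0.2, 0.6, 0.4])          # initial (prejudice) opinions

x = x0.copy()
outputs = []
for _ in range(T):
    x = lam * (W @ x) + (1.0 - lam) * x0     # FJ update
    outputs.append((x > tau).astype(int))    # binary observation per agent
print(np.array(outputs))
```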
Liquid crystal (LC) is a promising hardware solution for implementing large RISs, as it is cost-effective, energy efficient, scalable, and capable of providing continuous phase shifts with low power consumption. However, the phase shift response of LC-based RISs is inherently frequency dependent. If unaddressed, this characteristic leads to performance degradation, particularly in wideband scenarios. This issue is especially critical in secure communication applications, where minor phase shift variations across elements can result in considerable information leakage. This paper addresses these frequency-induced variations by developing a physics-based model for an LC unit cell across varying frequencies and proposing a novel phase shift design framework that maximizes secure communication across all subcarriers. Given the large number of elements in millimeter wave (mmWave) LC-RISs, acquiring full channel state information (CSI) is often impractical. Therefore, we optimize the phase shifts based solely on the locations of the legitimate mobile users (MUs) and potential eavesdroppers. Rather than targeting a single user point, the RIS is designed to illuminate a broader area. This approach enhances communication reliability for the MUs and mitigates performance degradation caused by location estimation errors. To solve the problem, we introduce both a semi-definite programming (SDP)-based solution and a low complexity heuristic method. While the SDP-based approach yields superior performance, it incurs higher computational complexity. Conversely, the scalable method exhibits a much slower scaling of complexity, which makes it highly suitable for extremely large RISs. Simulation results demonstrate that both algorithms improve the secrecy rate compared to baseline methods. Finally, the proposed design is validated through experimental evaluations on an LC RIS setup.
Optimization using network traffic models requires computing gradients of objective functions with respect to model parameters. However, deriving gradients of network traffic models has been considered very difficult or impractical due to their complexity and size. Conventional approaches rely on numerical differentiation or derivative-free methods that do not scale well with the parameter dimension, or on adjoint methods that require manual derivation for each specific model. This study proposes a novel end-to-end differentiable network traffic flow simulator based on the Link Transmission Model (LTM) and a dynamic user optimum (DUO) route choice model. We observe that the LTM operates on continuous aggregate state variables (cumulative vehicle counts) through piecewise-linear min/max operations, which admit subgradients almost everywhere and are well suited to automatic differentiation (AD). We incorporate the DUO route choice model and its logit extension to explicitly consider endogenous dynamic route choice of travelers while preserving differentiability, by leveraging the fact that the diverge ratios are continuous functions of per-destination vehicle counts. The resulting simulator is differentiable almost everywhere and computes exact gradients via reverse-mode AD in a single backward pass regardless of the parameter dimension. In order to demonstrate the capability of the proposed model, we solved a dynamic congestion toll optimization problem on the Chicago-Sketch dataset with around 2500 links, 1 million vehicles, a 3-hour duration, and 15000 decision variables. The proposed model successfully derived a high-quality solution in 3000 iterations in about 40 minutes. On average, one simulation run and gradient derivation took 0.8 seconds. The simulator, implemented in Python and JAX, is released as open-source software named UNsim (this https URL).
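The core idea, differentiating a piecewise-linear min/max recursion on cumulative counts with reverse-mode AD, can be sketched in a few lines of Python with JAX (the framework the paper itself uses); the single-link recursion and the capacity parameter below are toy placeholders, not the released UNsim implementation.

```python
import jax
import jax.numpy as jnp

# Toy sketch of the key idea, not the released UNsim code: a piecewise-linear,
# link-transmission-style recursion built from min operations on cumulative counts is
# differentiated end-to-end with reverse-mode AD, so the gradient of a scalar objective
# with respect to a control parameter (here a hypothetical capacity) comes from one
# backward pass.
def total_outflow(capacity, demand):
    cum_in, cum_out = 0.0, 0.0
    for d in demand:                                        # unrolled time loop
        cum_in = cum_in + d                                 # cumulative arrivals
        outflow = jnp.minimum(cum_in - cum_out, capacity)   # sending/receiving min rule
        cum_out = cum_out + outflow                         # cumulative departures
    return cum_out

demand = jnp.array([3.0, 5.0, 2.0, 4.0, 1.0])
grad_fn = jax.grad(total_outflow)                           # reverse-mode AD in one pass
print(total_outflow(2.5, demand), grad_fn(2.5, demand))
```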
Discrete audio tokens have recently gained considerable attention for their potential to bridge audio and language processing, enabling multimodal language models that can both generate and understand audio. However, preserving key information such as phonetic content, speaker identity, and paralinguistic cues remains a major challenge. Identifying the optimal tokenizer and configuration is further complicated by inconsistent evaluation settings across existing studies. To address this, we introduce the Discrete Audio and Speech Benchmark (DASB), a comprehensive framework for benchmarking discrete audio tokens across speech, general audio, and music domains on a range of discriminative and generative tasks. Our results show that discrete representations are less robust than continuous ones and require careful tuning of factors such as model architecture, data size, learning rate, and capacity. Semantic tokens generally outperform acoustic tokens, but a gap remains between discrete tokens and continuous features, highlighting the need for further research. DASB codes, evaluation setup, and leaderboards are publicly available at this https URL.
Spoken dialogue systems powered by large language models have demonstrated remarkable abilities in understanding human speech and generating appropriate spoken responses. However, these systems struggle with end-turn detection (ETD) -- the ability to distinguish between user turn completion and hesitation. This limitation often leads to premature or delayed responses, disrupting the flow of spoken conversations. In this paper, we introduce the ETD Dataset, the first public dataset for end-turn detection. The ETD dataset consists of both synthetic speech data generated with text-to-speech models and real-world speech data collected from web sources. We also propose SpeculativeETD, a novel collaborative inference framework that balances efficiency and accuracy to improve real-time ETD in resource-constrained environments. Our approach jointly employs a lightweight GRU-based model, which rapidly detects the non-speaking units in real-time on local devices, and a high-performance Wav2vec-based model running on the server to make a more challenging classification of distinguishing turn ends from mere pauses. Experiments demonstrate that the proposed SpeculativeETD significantly improves ETD accuracy while keeping the required computations low. Datasets and code will be available after the review.
This paper presents a new method for jointly calibrating a magnetometer and an inertial measurement unit (IMU), focusing on balancing calibration accuracy and computational efficiency. The proposed method is based on a maximum a posteriori (MAP) estimation framework, treating both the calibration parameters and the orientation trajectory of the sensors as unknowns. This formulation enables efficient optimization of the calibration parameters using analytically derived derivatives. The performance of the proposed method is compared against that of two state-of-the-art methods. Simulation results demonstrate that the proposed method achieves the lowest root mean square error in the calibration parameters, improving calibration accuracy by 20-30% while maintaining competitive computational efficiency. Further validation through real-world experiments confirms the practical benefits of the proposed method: it calibrated 30 magnetometer-IMU pairs in under two minutes on a consumer-grade laptop, one order of magnitude faster than the most accurate state-of-the-art algorithm as implemented in this work. Moreover, when calibrated with the proposed method, a magnetic-field-aided inertial navigation system achieved positioning performance comparable to that obtained with the state-of-the-art calibration. These results demonstrate that the proposed method is a reliable and effective choice for jointly calibrating magnetometer-IMU pairs.
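As a generic sketch of the kind of MAP objective involved (the notation here is ours and not necessarily that of the paper), the joint estimate of the calibration parameters $\theta$ and the orientation trajectory $x_{1:N}$ can be written as

$$
\hat{\theta},\,\hat{x}_{1:N} \;=\; \arg\min_{\theta,\,x_{1:N}} \;\sum_{k=1}^{N} \big\| y_k^{\mathrm{mag}} - h_{\mathrm{mag}}(x_k;\theta) \big\|_{R^{-1}}^{2} \;+\; \sum_{k=1}^{N-1} \big\| x_{k+1} \ominus f(x_k, u_k;\theta) \big\|_{Q^{-1}}^{2} \;+\; \big\| \theta - \theta_0 \big\|_{P_0^{-1}}^{2},
$$

where $h_{\mathrm{mag}}$ is the magnetometer measurement model, $f$ is the orientation propagation model driven by the IMU readings $u_k$, $\ominus$ denotes the orientation error, and $\theta_0$ encodes the prior on the calibration parameters; analytic derivatives of these residuals with respect to $\theta$ and $x_{1:N}$ are what make the optimization efficient.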
In narrow, unstructured underwater environments, such as those encountered in environmental monitoring and minimally invasive medical procedures, micro soft robots offer unique advantages due to their flexible movement capabilities and small size. At the same time, applying bionic design principles to the structure of micro soft robots can significantly improve their swimming performance. However, owing to their miniaturization, these robots are difficult to power internally and are usually driven wirelessly. This study designs and fabricates a magnetically responsive micro soft robot inspired by the swimming principle of the cownose ray. The robot is fabricated from NdFeB particles mixed with PDMS in a fixed proportion. A three-dimensional Helmholtz coil is then used to generate an oscillating harmonic magnetic field for swimming experiments, in which the influence of the magnetic field parameters on the robot's swimming performance is explored. The experimental results show that the swimming speed peaks at B = 5 mT and f = 11 Hz, reaching 5.25 mm/s, which is about 0.5 body lengths per second. In addition, by adjusting the current direction and frequency of the coil, the robot can perform different swimming modes, such as straight-line, turning, and directional swimming. By employing a stepwise adjustment method, the impact of response errors on the robot's trajectory can be effectively reduced. This study demonstrates a drive method for magnetically actuated micro soft robots, laying a foundation for the application of wirelessly driven robots in confined underwater spaces.
This note presents a concise mathematical formulation of tightly-coupled LiDAR-Inertial Odometry within an iterated error-state Kalman filter framework using a VoxelMap representation. Rather than proposing a new algorithm, it provides a clear and self-contained derivation that unifies the geometric modeling and probabilistic state estimation through consistent notation and explicit formulations. The document is intended to serve both as a technical reference and as an accessible entry point for a foundational understanding of the system architecture and estimation principles.
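The core of such iterated error-state updates can be sketched generically as follows (generic notation, not necessarily that of the note): given a prior $\hat{x}$ with covariance $P$ and a LiDAR point-to-plane measurement model $z = h(x) + v$, $v \sim \mathcal{N}(0, R)$, one iterates from $x_0 = \hat{x}$:

$$
H_j = \left.\frac{\partial h}{\partial x}\right|_{x_j}, \qquad K_j = P H_j^{\top}\big(H_j P H_j^{\top} + R\big)^{-1}, \qquad x_{j+1} = \hat{x} \boxplus K_j\big(z - h(x_j) - H_j(\hat{x} \boxminus x_j)\big),
$$

where $\boxplus/\boxminus$ denote the retraction onto and lifting from the error-state manifold; on convergence the covariance is updated as $P^{+} = (I - K_j H_j)P$. In a VoxelMap-style representation, $h$ and $R$ typically come from the plane parameters and plane uncertainties stored per voxel.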
We present OmniVoice, a massively multilingual zero-shot text-to-speech (TTS) model that scales to over 600 languages. At its core is a novel discrete non-autoregressive (NAR) architecture in the style of diffusion language models. Unlike conventional discrete NAR models that suffer from performance bottlenecks in complex two-stage (text-to-semantic-to-acoustic) pipelines, OmniVoice directly maps text to multi-codebook acoustic tokens. This simplified approach is enabled by two key technical innovations: (1) a full-codebook random masking strategy for efficient training, and (2) initialization from a pre-trained LLM to ensure superior intelligibility. By leveraging a 581k-hour multilingual dataset curated entirely from open-source data, OmniVoice achieves the broadest language coverage to date and delivers state-of-the-art performance across Chinese, English, and diverse multilingual benchmarks. Our code and pre-trained models are publicly available at this https URL.
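As a rough illustration of what a full-codebook random masking step for training such a discrete NAR model could look like (our reading; the identifiers, mask id, and masking schedule are illustrative, not OmniVoice's actual code):

```python
# Hypothetical sketch: mask positions in every codebook of a multi-codebook
# acoustic token grid, so the model learns to predict masked tokens directly.
import numpy as np

MASK_ID = -1  # placeholder id for the mask token

def mask_full_codebooks(tokens: np.ndarray, rng: np.random.Generator,
                        min_ratio: float = 0.1, max_ratio: float = 1.0):
    """tokens: (num_codebooks, seq_len) integer acoustic token ids."""
    ratio = rng.uniform(min_ratio, max_ratio)              # per-sample masking ratio
    mask = rng.random(tokens.shape) < ratio                # mask all codebooks, not just the first
    corrupted = np.where(mask, MASK_ID, tokens)
    return corrupted, mask                                  # predict tokens at masked positions

rng = np.random.default_rng(0)
toks = rng.integers(0, 1024, size=(8, 200))                 # e.g. 8 codebooks, 200 frames
corrupted, mask = mask_full_codebooks(toks, rng)
```

Masking positions in every codebook, rather than only a first semantic layer, is consistent with the single-stage text-to-acoustic mapping described above: the model is trained to reconstruct complete multi-codebook acoustic frames in one non-autoregressive pass.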
In this work, we present Qwen3.5-Omni, the latest advancement in the Qwen-Omni model family. Representing a significant evolution over its predecessor, Qwen3.5-Omni scales to hundreds of billions of parameters and supports a 256k context length. By leveraging a massive dataset comprising heterogeneous text-vision pairs and over 100 million hours of audio-visual content, the model demonstrates robust omni-modality capabilities. Qwen3.5-Omni-plus achieves SOTA results across 215 audio and audio-visual understanding, reasoning, and interaction subtasks and benchmarks, surpassing Gemini-3.1 Pro in key audio tasks and matching it in comprehensive audio-visual understanding. Architecturally, Qwen3.5-Omni employs a Hybrid Attention Mixture-of-Experts (MoE) framework for both Thinker and Talker, enabling efficient long-sequence inference. The model facilitates sophisticated interaction, supporting over 10 hours of audio understanding and 400 seconds of 720P video (at 1 FPS). To address the inherent instability and unnaturalness in streaming speech synthesis, often caused by encoding efficiency discrepancies between text and speech tokenizers, we introduce ARIA. ARIA dynamically aligns text and speech units, significantly enhancing the stability and prosody of conversational speech with minimal latency impact. Furthermore, Qwen3.5-Omni expands linguistic boundaries, supporting multilingual understanding and speech generation across 10 languages with human-like emotional nuance. Finally, Qwen3.5-Omni exhibits superior audio-visual grounding capabilities, generating script-level structured captions with precise temporal synchronization and automated scene segmentation. Remarkably, we observed the emergence of a new capability in omnimodal models: directly performing coding based on audio-visual instructions, which we call Audio-Visual Vibe Coding.