Vertical farming is a controlled-environment agriculture (CEA) approach in which crops are grown in stacked layers under regulated climate and lighting, enabling predictable production but requiring high electricity input. This study quantifies the techno-economic impact of roof-mounted daylighting in a three-tier container vertical farm using a light-pipe (LP) system that delivers sunlight to the upper tier. The optical chain, comprising a straight duct and a tilting aluminum-coated mirror within a rotating dome, was modelled in Tonatiuh to estimate crop-level photon delivery and solar gains. These outputs were coupled with a transient AGRI-Energy model to perform year-round simulations for Dubai. Tier-3 strategies were compared against a fully LED benchmark, including daylight-only operation, on/off supplementation, PWM dimming, UV-IR filtering, variable-transmittance control, and simple glazing. Ray tracing predicted an overall LP optical efficiency of 45-75%, depending on solar position, quantifying the fraction of incident daylight at the collector aperture delivered to the target growing zone. Daylight-only operation reduced the total three-tier yield by 17% and was not economically viable despite 27-29% electricity savings. Hybrid daylight-LED strategies preserved the benchmark yield while reducing electricity use. PWM dimming combined with UV-IR filtering achieved the lowest specific electricity consumption (6.32 kWh/kg), 14% below the benchmark. Overall, viability remains CAPEX-limited: the achievable electricity savings are insufficient to offset the added investment, so the economics improve mainly under high electricity and carbon prices, although the LP system delivers a 15-38% lower light cost than an optical-fiber reference under identical incident daylight.
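As an illustrative check of the specific-consumption comparison above, the arithmetic can be sketched as follows. Only the 6.32 kWh/kg figure and the 14% saving come from the abstract; the absolute electricity and yield values are hypothetical placeholders.

```python
# Specific electricity consumption (SEC): electricity used per kg of crop.
def specific_consumption(electricity_kwh, yield_kg):
    """SEC in kWh per kg of produced crop."""
    return electricity_kwh / yield_kg

# Hypothetical totals chosen to reproduce the abstract's SEC values.
benchmark_sec = 7.35   # assumed fully-LED benchmark, kWh/kg (illustrative)
hybrid_sec = 6.32      # PWM dimming + UV-IR filtering (from the abstract)

saving = 1.0 - hybrid_sec / benchmark_sec
print(f"SEC reduction vs. benchmark: {saving:.0%}")  # ≈ 14%
```

This is only a consistency check of the reported percentage, not a reproduction of the study's energy model.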
Mid-band spectrum between 2 and 8 GHz is a critical resource for sixth-generation (6G) systems as it uniquely balances favorable propagation characteristics with scalable bandwidth. Recent U.S. policy highlights candidate bands near 2.7, 4.4, and 7.1 GHz, all of which host substantial federal and non-federal incumbency, including high-power radiolocation and aeronautical telemetry systems. Although these segments are being considered for potential relocation of federal incumbents to enable commercial use, their long-term viability depends on the structural integrity of the spectrum: in such environments, the practical value of a band is set by the reliability and contiguity of the spectrum opportunities it offers. This paper presents a measurement-driven feasibility analysis of two representative segments, 2.69-2.9 GHz and 4.4-4.94 GHz, using Software-Defined Radio (SDR) measurements collected during Packapalooza campaigns from 2022 to 2025. Deployment-oriented metrics are introduced to quantify scan-window reliability (SWR), the altitude-dependent usable spectrum availability ratio (USAR), the largest contiguous clean bandwidth (LCCB), spectral fragmentation, and extreme interference excursions. The results reveal significant year-to-year structural variability. In the 2.69-2.9 GHz band, USAR remains near unity in 2022 and 2023 but drops to approximately 0.65 in 2024 and 0.80 in 2025, accompanied by fragmentation and limited contiguous bandwidth across altitudes. The 4.4-4.94 GHz band exhibits a similar temporal pattern, but with smaller reliability degradation and larger contiguous support, often exceeding several hundred megahertz even during incumbent-dominant periods. The results highlight that wideband feasibility in these candidate bands depends strongly on spectral contiguity and structural stability rather than nominal bandwidth alone.
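A minimal sketch of how two of the metrics named above could be computed from a binary clean/occupied mask over frequency bins. The mask, bin width, and implicit thresholding are illustrative, not the study's actual scan-window processing.

```python
import numpy as np

def usar(clean_mask):
    """Usable spectrum availability ratio: fraction of bins judged clean."""
    return float(np.mean(clean_mask))

def lccb_mhz(clean_mask, bin_width_mhz):
    """Largest contiguous clean bandwidth: longest run of clean bins, in MHz."""
    best = run = 0
    for clean in clean_mask:
        run = run + 1 if clean else 0
        best = max(best, run)
    return best * bin_width_mhz

# Hypothetical 12-bin occupancy mask for one scan window (True = clean).
mask = np.array([1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1], dtype=bool)
print(usar(mask))            # 0.75
print(lccb_mhz(mask, 10.0))  # 40.0 MHz
```

With this toy mask, a high USAR (0.75) coexists with modest contiguity (40 MHz), illustrating the abstract's point that contiguity, not nominal availability, limits wideband feasibility.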
This paper proposes a two-phase framework to improve sustainability in vertical heterogeneous networks that integrate various types of base stations~(BSs), including terrestrial macro BSs~(MBSs), small BSs~(SBSs), and a high-altitude platform station super MBS (HAPS SMBS). In Phase I, we address the critical and often overlooked challenge of estimating the traffic load of sleeping SBSs, a prerequisite for practical cell switching, by introducing three methods with varying data dependencies: (i) a distance-based estimator (no historical data), (ii) a multi-level clustering (MLC) estimator (limited historical data), and (iii) a long short-term memory~(LSTM) based temporal predictor (full historical data). In Phase II, we incorporate the most accurate estimation results from Phase I into a renewable-energy-aware cell switching strategy, explicitly modeling solar-powered SBSs in three operational scenarios that reflect realistic hybrid grid-renewable deployments. This flexible design allows the framework to adapt switching strategies to renewable availability and storage conditions, making it more practical and robust for real-world networks. Using a real call detail record dataset from Milan, simulation results show that the LSTM method achieves a mean absolute percentage error (MAPE) below 1% in Phase I, while in Phase II, the threshold-based solar integration scenario achieves up to 23% network energy saving (NES) relative to conventional cell switching. Overall, the proposed framework bridges the gap between theoretical cell switching models and practical, sustainable 6G radio access network~(RAN) operation, enabling significant energy savings without compromising quality of service.
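For reference, the MAPE criterion used to score the Phase I estimators can be sketched as below; the traffic-load values are hypothetical, chosen only to illustrate a sub-1% error.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no zero loads)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

# Hypothetical normalized hourly traffic loads for one sleeping SBS.
true_load = [0.42, 0.55, 0.61, 0.48]
lstm_est  = [0.42, 0.55, 0.60, 0.48]
print(f"MAPE = {mape(true_load, lstm_est):.2f}%")  # well below 1%
```

The guard against zero loads in the comment matters in practice: MAPE is undefined for hours with zero actual traffic, which real call-detail-record data can contain.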
Frequency control in power systems is implemented in a hierarchical structure traditionally composed of primary frequency control (PFC), secondary frequency control (SFC), and tertiary control reserve (TCR); some jurisdictions include time error control (TEC) as well. This hierarchical structure was designed around a century ago based on timescale separation, that is, approximately an order of magnitude difference between each control layer. This paper argues, based on real-world observations as well as detailed dynamic simulations on a model of the All-Island power system (AIPS) of Ireland, that this frequency control structure is not necessary in current and future converter-dominated power grids. The paper proposes to redesign this structure by removing the SFC and TCR and relying on PFC and a real-time energy market. The PFC is responsible for addressing fast power imbalances on timescales of tens of milliseconds to a few minutes (e.g., 100 ms to 5 minutes), while the real-time energy market is responsible for addressing longer imbalances on timescales of minutes to hours (e.g., 5 minutes to 1 hour). TEC, on the other hand, is considered optional.
Positioning using Global Navigation Satellite Systems (GNSS) typically requires several seconds of continuous signal reception from satellites in Medium Earth Orbit (MEO). This requirement poses challenges for applications where receivers can only capture signals intermittently or operate under constrained power and visibility conditions. In such scenarios, maintaining continuous tracking or reliable line-of-sight to GNSS satellites may be difficult, and conventional GNSS frequencies may also be vulnerable to interference or jamming. Low Earth Orbit (LEO) satellite constellations provide an attractive alternative due to their lower orbital altitudes, which result in higher received signal strengths, as well as their operation across a wide range of spectrum including Mobile-Satellite Service (MSS) and terrestrial L and S bands. These characteristics make LEO signals promising for navigation in challenging environments. This work presents a snapshot-based differential positioning framework that leverages signals from LEO satellites. In the proposed approach, a receiver collects signals for short durations (5-10 seconds) before entering a low-power state, enabling positioning with intermittent observations. Doppler measurements from multiple satellites are combined with a differential measurement model using a fixed reference receiver to mitigate common errors such as satellite clock bias and ephemeris uncertainty. Experimental results demonstrate that the proposed differential Doppler framework operates effectively within the constraints of snapshot-based reception. The method achieves a position error reduction of approximately 47% even when only three satellites are simultaneously visible to both the rover and the reference station.
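A toy single-difference sketch of the differential idea above: subtracting the reference receiver's Doppler measurement for the same satellite cancels errors common to both receivers, such as satellite clock drift. This is an assumed simplification; the paper's full measurement model also treats ephemeris uncertainty and receiver clock terms.

```python
import numpy as np

# Common error seen identically by rover and reference (e.g., satellite
# clock drift expressed in Hz); values are illustrative.
sat_clock_drift_hz = 37.0
rover_geom_hz = np.array([1200.0, -850.0, 430.0])  # true geometric Doppler
ref_geom_hz   = np.array([1185.0, -860.0, 445.0])

rover_meas = rover_geom_hz + sat_clock_drift_hz
ref_meas   = ref_geom_hz + sat_clock_drift_hz

# Single difference: the common clock-drift term cancels exactly.
single_diff = rover_meas - ref_meas
print(single_diff)  # [15., 10., -15.]
```

The remaining differential Doppler depends only on the rover-reference geometry, which is what the snapshot positioning solver then inverts.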
This paper proposes an adaptive tube framework for model predictive control (MPC) of discrete-time linear time-invariant systems subject to parametric uncertainty and additive disturbances. In contrast to conventional tube-based MPC schemes that employ fixed tube geometry and constraint tightening designed for worst-case uncertainty, the proposed approach incorporates online parameter learning to progressively refine the parametric uncertainty set and update the parameter estimates. These updates are used to adapt the components of the MPC optimization problem, including the prediction model, feedback gain, terminal set, and tube cross-sections. As the uncertainty set contracts, the required constraint tightening decreases and the tube shrinks accordingly, yielding less conservative control actions. Recursive feasibility, robust constraint satisfaction, and closed-loop stability are formally established. Furthermore, the framework does not require the existence of a common quadratically stabilizing linear feedback gain for the entire parametric uncertainty set, thereby relaxing a standard assumption in existing tube-based MPC formulations. Numerical examples illustrate the effectiveness of the proposed approach.
It is difficult to analyze the stability of systems with time-varying delays. One approach is to construct a time-transformation that converts the system into a form with a constant delay but with a time-varying scalar appearing in the system matrices. The stability of this transformed system can then be analyzed using methods that bound the effect of the time-varying scalar. One issue is that this transformation is non-unique and requires the solution of an Abel equation, so a specific time-transformation typically must be computed numerically. We address this issue by computing an explicit, although approximate, time-transformation for systems where the delay is a constant plus a small periodic term. We use a perturbative expansion to construct our explicit solutions. We provide a simple numerical example to illustrate the approach. We also demonstrate the use of this time-transformation to analyze the stability of systems with this class of periodic delays.
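As a hedged sketch of the perturbative construction (our notation, not necessarily the paper's): suppose the delay is $\tau(t) = \tau_0 + \varepsilon\sin(\Omega t)$ and the time-transformation $\theta$ must satisfy the Abel-type equation $\theta(t - \tau(t)) = \theta(t) - \tau_0$. Expanding $\theta(t) = t + \varepsilon h(t)$ to first order in $\varepsilon$ gives

```latex
% First-order perturbative construction (illustrative notation)
\theta(t-\tau(t)) \approx t - \tau_0 - \varepsilon\sin(\Omega t) + \varepsilon h(t-\tau_0)
\;\stackrel{!}{=}\; t + \varepsilon h(t) - \tau_0
\quad\Longrightarrow\quad
h(t-\tau_0) - h(t) = \sin(\Omega t).
```

A harmonic ansatz $h(t) = A\sin(\Omega t) + B\cos(\Omega t)$ then solves this difference equation in closed form whenever $\Omega\tau_0$ is not an integer multiple of $2\pi$, which is one plausible route to the explicit approximate transformation described above.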
Deep learning has achieved remarkable success in medical image analysis, yet its performance remains highly sensitive to the heterogeneity of clinical data. Differences in imaging hardware, staining protocols, and acquisition conditions produce substantial domain shifts that degrade model generalization across institutions. Here we present a physics-based data preprocessing framework based on the PhyCV (Physics-Inspired Computer Vision) family of algorithms, which standardizes medical images through deterministic transformations derived from optical physics. The framework models images as spatially varying optical fields that undergo a virtual diffractive propagation followed by coherent phase detection. This process suppresses non-semantic variability such as color and illumination differences while preserving diagnostically relevant texture and structural features. When applied to histopathological images from the Camelyon17-WILDS benchmark, PhyCV preprocessing improves out-of-distribution breast-cancer classification accuracy from 70.8% (Empirical Risk Minimization baseline) to 90.9%, matching or exceeding data-augmentation and domain-generalization approaches at negligible computational cost. Because the transform is physically interpretable, parameterizable, and differentiable, it can be deployed as a fixed preprocessing stage or integrated into end-to-end learning. These results establish PhyCV as a generalizable data refinery for medical imaging: one that harmonizes heterogeneous datasets through first-principles physics, improving robustness, interpretability, and reproducibility in clinical AI systems.
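A minimal numpy sketch of the "virtual diffractive propagation followed by coherent phase detection" idea described above. The quadratic phase kernel and its strength are illustrative choices, not the PhyCV library's actual parameterization.

```python
import numpy as np

def phase_detect(image, strength=0.5):
    """Toy diffractive-propagation-plus-phase-detection transform."""
    img = image.astype(float)
    img = img / (img.max() + 1e-12)          # normalize illumination scale
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    kernel = np.exp(-1j * strength * (fx**2 + fy**2))  # virtual diffraction
    propagated = np.fft.ifft2(F * kernel)
    return np.angle(propagated)              # coherent phase detection

rng = np.random.default_rng(0)
out = phase_detect(rng.random((32, 32)))
print(out.shape, out.dtype)  # (32, 32) float64
```

The output is a bounded phase map (values in $[-\pi, \pi]$), which is one way the transform can suppress absolute intensity differences while retaining edge and texture structure.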
This paper focuses on price-based residential demand response implemented through dynamic adjustments of electricity prices during DR events. It extends existing DR models to a stochastic framework in which customer response is represented by price-dependent random variables, leveraging models and tools from the theory of stochastic optimization with decision-dependent distributions. The inherent epistemic uncertainty in the customers' responses renders open-loop, model-based DR strategies impractical. To address this challenge, the paper proposes to employ stochastic, feedback-based pricing strategies to compensate for estimation errors and uncertainty in customer response. The paper then establishes theoretical results demonstrating the stability and near-optimality of the proposed approach and validates its effectiveness through numerical simulations.
Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.
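The SRCC evaluation metric reported above can be sketched directly with numpy; the MOS values below are hypothetical, and ties are not handled (real evaluations typically use a tie-aware implementation such as `scipy.stats.spearmanr`).

```python
import numpy as np

def srcc(x, y):
    """Spearman rank correlation for tie-free score lists."""
    rx = np.argsort(np.argsort(x)).astype(float)  # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)  # ranks of y
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical human ratings and model predictions for five utterances.
human_mos = [1.2, 3.4, 2.8, 4.6, 3.9]
predicted = [1.0, 3.1, 3.3, 4.8, 3.0]
print(round(srcc(human_mos, predicted), 3))  # 0.6
```

Because SRCC compares rank orderings rather than raw values, it is robust to monotone calibration differences between a predictor's score scale and the human MOS scale.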
This letter presents a geometric input-output analysis of distance-based formation control, focusing on the phenomenon of steady-state signal blocking between actuator and sensor pairs. We characterize steady-state multivariable transmission zeros, where fully excited rigid-body and deformational modes destructively interfere at the measured output. By analyzing the DC gain transfer matrix of the linearized closed-loop dynamics, we prove that for connected, flexible frameworks, structural transmission zeros are strictly non-generic; the configuration-dependent cross-coupling required to induce them occupies a proper algebraic set of measure zero. However, because extracting actionable sensor-placement rules from these complex algebraic varieties is analytically intractable, we restrict our focus to infinitesimally rigid formations. For these baselines, we prove that the absence of internal flexes forces the zero-transmission condition to collapse into an explicit affine hyperplane defined by the actuator and the global formation geometry, which we term the spatial locus of transmission zeros. Finally, we introduce the global transmission polygon, a convex polytope constructed from the intersection of these loci. This construct provides a direct geometric synthesis rule for robust sensor allocation, guaranteeing full-rank steady-state transmission against arbitrary single-node excitations.
In this work, we present a deep learning-based automatic multitrack music mixing system catered towards live performances. In a live performance, channels are often corrupted with acoustic bleed from co-located instruments. Moreover, audio-visual synchronization is of critical importance, which puts a tight constraint on the audio latency. In this work we primarily tackle these two challenges: handling bleed in the input channels and producing the music mix with zero latency. Although there have been several developments in the field of automatic music mixing in recent times, most, if not all, previous works focus on offline production for isolated instrument signals and, to the best of our knowledge, this is the first end-to-end deep learning system developed for live music performances. Our proposed system currently predicts mono gains for a multitrack input, but its design, along with the precedent set in past works, allows for easy adaptation to future work on predicting other relevant music mixing parameters.
Implicit neural representations (INRs) can parameterize continuous beamforming functions in continuous aperture arrays (CAPAs) and thus enable efficient online inference. Existing INR-based beamforming methods for CAPAs, however, typically suffer from high training complexity and limited generalizability. To address these issues, we first derive a closed-form expression for the achievable sum rate in multiuser multi-CAPA systems where both the base station and the users are equipped with CAPAs. For sum-rate maximization, we then develop a functional weighted minimum mean-squared error (WMMSE) algorithm by using orthonormal basis expansion to convert the functional optimization into an equivalent parameter optimization problem. Based on this functional WMMSE algorithm, we further propose BeamINR, an INR-based beamforming method implemented with a graph neural network to exploit the permutation-equivariant structure of the optimal beamforming policy; its update equation is designed from the structure of the functional WMMSE iterations. Simulation results show that the functional WMMSE algorithm achieves the highest sum rate at the cost of high online complexity. Compared with baseline INRs, BeamINR substantially reduces inference latency, lowers training complexity, and generalizes better across the number of users and carrier frequency.
Control barrier functions enforce safety by guaranteeing forward invariance of an admissible set. Under standard (non-strict) barrier conditions, however, forward invariance alone does not prevent trajectories from remaining on the boundary of the safe set for arbitrarily long time intervals, potentially leading to boundary sticking or deadlock phenomena. This paper studies the elimination of persistent boundary residence under forward-invariant barrier conditions. Inspired by Matrosov-type arguments, we introduce an auxiliary function framework that preserves forward invariance while excluding infinite-time residence within boundary layers. Sufficient conditions are established under which any trajectory can only remain in a prescribed neighborhood of the boundary for finite time, thereby restoring boundary-level liveness without altering forward invariance. The proposed construction does not rely on singular barrier formulations or controller-specific modifications, and can be incorporated into standard safety-critical control architectures. Numerical examples illustrate the removal of boundary sticking behaviors while maintaining safety across representative systems.
Reconfigurable antennas (RAs) utilize the electromagnetic (EM) domain to provide dynamic control over antenna radiation patterns, which offers an effective way to enhance power efficiency in wireless links. Unlike conventional arrays with fixed element patterns, RAs enable on-demand beam-pattern synthesis by directly controlling each antenna's EM characteristics. While existing research on RAs has primarily focused on improving spectral efficiency, this paper explores their application to downlink localization. Moreover, the majority of existing works focus on far-field scenarios, with little attention to the near-field (NF) regime. Motivated by these gaps, we consider a synthesis model in which each antenna generates desired beampatterns from a finite set of EM basis functions. We then formulate a joint optimization problem for the baseband (BB) and EM precoders with the objective of minimizing the user equipment (UE) position error bound (PEB) under NF conditions. Our analytical derivations and extensive simulation results demonstrate that the proposed hybrid precoder design for RAs significantly improves UE positioning accuracy compared to traditional non-reconfigurable arrays.
Integrated sensing, communication, and powering (ISCAP) has emerged as a promising solution for enabling multi-functionality in 6G networks. However, it poses a significant challenge in the design of multi-functional waveforms that must jointly consider communication, sensing, and powering performance. In this paper, we propose a novel rate-splitting multiple access (RSMA)-enabled multi-functional ISCAP network, where RSMA facilitates the use of communication signals to simultaneously achieve all three functionalities. Based on the proposed system model, we investigate the beamforming optimization problem to explore the performance trade-offs among communication, sensing, and power transfer. To efficiently solve this problem, we develop a novel ISCAP-extragradient (ISCAP-EG) algorithm, which transforms the original problem into a sequence of convex subproblems, reformulates the dual problem as a variational inequality, and solves it using the EG method. Numerical results show that the proposed ISCAP-EG algorithm achieves performance equivalent to that of the conventional successive convex approximation (SCA)-based method, while significantly reducing simulation time. Moreover, the RSMA-enabled multi-functional ISCAP network enhances the performance trade-off compared with the conventional space-division multiple access (SDMA)-based scheme, highlighting RSMA as a promising technique for advancing multi-functional ISCAP development in 6G.
This paper presents a two-stage framework for constrained near-optimal feedback control of input-affine nonlinear systems. An approximate value function for the unconstrained control problem is computed offline by solving the Hamilton--Jacobi--Bellman equation. Online, a quadratic program is solved that minimizes the associated approximate Hamiltonian subject to safety constraints imposed via control barrier functions. Our proposed architecture decouples performance from constraint enforcement, allowing constraints to be modified online without recomputing the value function. Validation on a linear 2-state 1D hovercraft and a nonlinear 9-state spacecraft attitude control problem demonstrates near-optimal performance relative to open-loop optimal control benchmarks and superior performance compared to control Lyapunov function-based controllers.
The rise of sixth generation (6G) wireless networks promises to deliver ultra-reliable, low-latency, and energy-efficient communications, sensing, and computing. However, traditional centralized artificial intelligence (AI) paradigms are ill-suited to the decentralized, resource-constrained, and dynamic nature of 6G ecosystems. This paper explores knowledge distillation (KD) and collaborative learning as promising techniques that enable the efficient and scalable deployment of lightweight AI models across distributed communications and sensing (C&S) nodes. We begin by providing an overview of KD and highlight the key strengths that make it particularly effective in distributed scenarios characterized by device heterogeneity, task diversity, and constrained resources. We then examine its role in fostering collective intelligence through collaborative learning between the central and distributed nodes via various knowledge distilling and deployment strategies. Finally, we present a systematic numerical study demonstrating that KD-empowered collaborative learning can effectively support lightweight AI models for multi-modal sensing-assisted beam tracking applications with substantial performance gains and complexity reduction.
Wireless digital twins can be leveraged to provide site-specific synthetic channel information through precise physical modeling and signal propagation simulations. This can help reduce the overhead of channel state information (CSI) acquisition, particularly needed for large-scale MIMO systems. For high-quality digital twin channels, the classical approach is to increase the digital twin fidelity via more accurate modeling of the environment, propagation, and hardware. This, however, comes with high computational cost, making it unsuitable for real-time applications. In this paper, we propose a new framework that, instead of calibrating the digital twin model itself, calibrates the DFT-domain channel information to reduce the gap between the low-fidelity digital twin and its high-fidelity counterpart or the real world. This allows systems to leverage a low-complexity digital twin for generating real-time channel information without compromising quality. To evaluate the effectiveness of the proposed approach, we adopt codebook-based CSI feedback as a case study, where refined synthetic channel information is used to identify the most relevant DFT codewords for each user. Simulation results demonstrate the effectiveness of the proposed digital twin calibration approach in achieving high CSI acquisition accuracy while reducing the computational overhead of the digital twin. This paves the way for realizing digital twin assisted wireless systems.
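A toy version of the codebook case study above: given a (calibrated) synthetic channel estimate, select the DFT codeword with the largest beamforming gain. The array size, user direction, and single-path channel are illustrative assumptions.

```python
import numpy as np

N = 32
# Columns of the unitary DFT matrix serve as the beam codebook.
codebook = np.fft.fft(np.eye(N)) / np.sqrt(N)

# Assumed single-path channel: ULA steering vector at half-wavelength spacing.
angle = 0.3  # hypothetical user direction (radians)
h = np.exp(1j * np.pi * np.arange(N) * np.sin(angle))

# Pick the codeword maximizing |w^H h| (beamforming gain).
gains = np.abs(codebook.conj().T @ h)
best = int(np.argmax(gains))
print("best codeword index:", best)
```

In the paper's framework, the refined digital-twin channel would play the role of `h`, so the codeword ranking can be produced without over-the-air CSI feedback.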
In near-field extremely large-scale multiple-input multiple-output (XL-MIMO) systems, spherical wavefront propagation expands the traditional beam codebook into the joint angular-distance domain, rendering conventional beam training prohibitively inefficient, especially in complex 3-dimensional (3D) low-altitude environments. Furthermore, since near-field beam variations are deeply coupled not only with user positions but also with the physical surroundings, precise beam alignment demands profound environmental understanding capabilities. To address this, we propose a large language model (LLM)-driven multimodal framework that fuses historical GPS data, RGB image, LiDAR data, and strategically designed task-specific textual prompts. By utilizing the powerful emergent reasoning and generalization capabilities of the LLM, our approach learns complex spatial dynamics to achieve superior environmental comprehension...
The rapid proliferation of AI-Generated Content (AIGC) has necessitated robust metrics for perceptual quality assessment. However, automatic Mean Opinion Score (MOS) prediction models are often compromised by data scarcity, predisposing them to learn spurious correlations, such as dataset-specific acoustic signatures, rather than generalized quality features. To address this, we leverage domain adversarial training (DAT) to disentangle true quality perception from these nuisance factors. Unlike prior works that rely on static domain priors, we systematically investigate domain definition strategies ranging from explicit metadata-driven labels to implicit data-driven clusters. Our findings reveal that there is no "one-size-fits-all" domain definition; instead, the optimal strategy is highly dependent on the specific MOS aspect being evaluated. Experimental results demonstrate that our aspect-specific domain strategy effectively mitigates acoustic biases, significantly improving correlation with human ratings and achieving superior generalization on unseen generative scenarios.
This paper investigates energy-efficient inter-satellite communication in Low Earth Orbit (LEO) networks, where satellites exchange both buffered and newly generated data through half-duplex inter-satellite links (ISLs). Due to orbital motion and interference-prone directional asymmetry, the achievable ISL capacities in opposite directions vary dynamically, leading to inefficient utilization under conventional fixed or alternating duplex modes. To address this, we propose a Flexible Duplex (FlexD) scheme that adaptively selects the ISL transmission direction in each slot to maximize instantaneous end-to-end sky-to-ground throughput, jointly accounting for ISL quality, downlink conditions, and queue backlogs. A unified analytical framework is developed that transforms the bottleneck rate structure into an equivalent SINR domain, enabling closed-form derivations of throughput outage probability and energy efficiency under deterministic ISLs and Rician satellite-to-ground fading. The analysis reveals distinct operating regions governed by ISL and backlog constraints and provides tractable bounds for ergodic rate and energy efficiency. Numerical results confirm that FlexD achieves higher reliability and up to 30% improvement in energy efficiency compared with conventional half- and full-duplex schemes under realistic inter-satellite interference conditions.
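The per-slot direction choice in FlexD can be sketched as a bottleneck-rate comparison: the end-to-end sky-to-ground rate in each direction is the minimum of the ISL rate and the downlink rate at the receiving satellite. The rates below are illustrative, and queue backlogs (which the full scheme also weighs) are omitted.

```python
def flexd_direction(isl_ab, isl_ba, downlink_a, downlink_b):
    """Pick the ISL direction with the larger end-to-end bottleneck rate."""
    rate_ab = min(isl_ab, downlink_b)  # A -> B -> ground
    rate_ba = min(isl_ba, downlink_a)  # B -> A -> ground
    return ("A->B", rate_ab) if rate_ab >= rate_ba else ("B->A", rate_ba)

# Illustrative rates (e.g., Gbps): the stronger B->A ISL would be wasted
# on satellite A's weak downlink, so A->B wins the slot.
print(flexd_direction(isl_ab=1.8, isl_ba=2.5, downlink_a=1.0, downlink_b=2.0))
# ('A->B', 1.8)
```

This min-structure is what the paper's analysis maps into an equivalent SINR domain to obtain closed-form outage and energy-efficiency expressions.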
We analyze signal recovery when samples are taken concomitantly from a signal and its Fourier transform. This two-sided sampling framework extends classical one-sided reconstruction and is particularly useful when measurements in either domain alone are insufficient because of sensing, storage, or bandwidth constraints. We formulate the resulting recovery problem in finite-dimensional spaces and reproducing kernel Hilbert spaces, and illustrate the infinite-dimensional setting in a Fourier-symmetric Sobolev space. Numerical experiments with sinc- and Hermite-based schemes indicate that, under a fixed sampling budget, two-sided sampling often yields better conditioned systems than one-sided approaches. A simplified spectrum-monitoring example further demonstrates improved reconstruction when limited time samples are supplemented with frequency-domain information.
Terahertz (THz) ultra-massive multiple-input multiple-output (UM-MIMO) promises ultra-high throughput, while its highly directional beams demand rapid and accurate beam tracking driven by precise user-state estimation. Moreover, large array apertures at high frequencies induce near-field propagation effects, where far-field modeling becomes inaccurate and near-field parametric channel estimation is costly. Bypassing near-field codebooks, PAST-TT is proposed to bridge near-field tracking with low-overhead far-field codebook probing by exploiting parallax amplified by widely spaced subarrays. With comb-type frequency-division multiplexing pilots, each subarray yields frequency-affine phase signatures whose frequency and temporal increments encode the propagation delay and its variation between frames. Building on these signatures, a Parallax-Aware Spatial Transformer (PAST) compresses them and outputs per-frame position estimates with token reliability scores to downweight bad frames, regularized by a physics-in-the-loop consistency loss. A causal Temporal Transformer (TT) then performs reliability-aware filtering and prediction over a sliding window to initialize the beam of the next frame. Acting on short token sequences, PAST-TT avoids a monolithic spatial-temporal network over raw pilots, which keeps the model lightweight with a critical-path latency of 0.61 ms. Simulations show that at 15 dB signal-to-noise ratio, PAST achieves 7.81 mm distance RMSE and 0.0588° angle RMSE. Even with a bad-frame rate of 0.1, TT reduces the distance and angle prediction RMSE by 23.1% and 32.8% compared with the best competing tracker.
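A toy illustration of the "frequency-affine phase signature": for a single path, the pilot phase is linear in subcarrier frequency with slope $-2\pi\tau$, so the delay can be read off a linear fit. The comb spacing and delay are illustrative values, and phase wrapping and noise are ignored.

```python
import numpy as np

delta_f = 480e3                 # comb pilot subcarrier spacing (Hz), assumed
tau = 33.4e-9                   # hypothetical true propagation delay (s)
k = np.arange(16)               # pilot subcarrier indices

# Single-path phase signature: affine (here purely linear) in frequency.
phase = -2 * np.pi * k * delta_f * tau

# Recover the delay from the phase-vs-frequency slope.
slope = np.polyfit(k * delta_f, phase, 1)[0]
tau_hat = -slope / (2 * np.pi)
print(f"estimated delay: {tau_hat * 1e9:.1f} ns")  # 33.4 ns
```

In the paper, the analogous slope across frames (the temporal increment) additionally encodes how the delay evolves, which is what feeds the tracker.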
Model training for Device-Free Localization (DFL) and Radio-Frequency (RF) sensing heavily relies on large-scale datasets, which are difficult, expensive, and time-consuming to obtain through measurements. This paper proposes a fast 2.5-dimensional Finite Element Method (2.5-D FEM) for computing the scattering fields of a Body of Revolution (BoR) human model under the excitation of a z-directed dipole. The proposed method can evaluate the effect of human micro-movements through the statistical characteristics of the Received Signal Strength Indicator (RSSI). The numerical accuracy and the practical applicability of the proposed method are validated through comparisons with full-wave simulations and indoor RF sensing experiments. The simulation results show agreement with the experimental measurements, demonstrating that the method is a reliable tool for evaluating micro-movement-induced statistical variations. The proposed method provides a practical and efficient means for generating large-scale, labeled RF training datasets, thereby accelerating the development of indoor localization tools as well as the calibration and tuning of tomographic reconstruction methods.
We propose an interpretable Batch-EM Unfolded Network for robust speaker localization. By embedding the iterative EM procedure within an encoder-EM-decoder architecture, the method mitigates initialization sensitivity and improves convergence. Experiments show superior accuracy and robustness over the classical Batch-EM in reverberant conditions.
Maintaining robust and stable communication links in high-mobility scenarios is challenging for time-division duplex (TDD) reciprocity-based gigantic MIMO systems due to rapid channel variations, especially in non-line-of-sight (NLOS) conditions. This paper proposes a user equipment (UE) beamforming strategy that enables reliable links in high mobility without additional pilot overhead. The proposed strategy aligns the UE beamforming direction with the travel axis. Our analysis shows that this choice minimizes the Doppler spread of the channel, resulting in improved temporal stability. We evaluate this approach through simulations in scattering-rich environments representative of gigantic MIMO deployments. Numerical results confirm that movement-aligned UE beamforming enhances link robustness, increases achievable data rates, and reduces pilot signaling requirements, thereby lowering UE power consumption. These findings indicate that travel-axis-aligned UE beamforming is a promising method for improving reliability in future high-mobility wireless systems.
Identifiability is a central issue in blind source separation (BSS), determining whether latent sources can be uniquely recovered from observed mixtures. Classical approaches address identifiability either by exploiting source non-Gaussianity via higher-order statistics (HOS) or by enriching the observation structure through temporal, spatial, or multi-channel diversity using second-order statistics (SOS), and these routes are often regarded as fundamentally different. In this paper, we revisit identifiability in BSS from a structural perspective, interpreting it as constraint-induced reduction of residual ambiguity in the mixing model. Within this framework, the observation mechanism is viewed broadly to include both input-side statistical constraints and output-side observation structures. HOS-based and SOS-based approaches are then unified as mechanisms of stabilizer shrinkage, in which observation-induced constraints reduce an initially continuous ambiguity to a finite residual one. To connect this structural viewpoint with finite-sample regimes, we introduce a Jacobian-based sensitivity probe as a numerical diagnostic of local identifiability. Numerical experiments show that increasing non-Gaussianity or observation diversity suppresses the same residual symmetry, revealing a structural trade-off between source statistics and observation design. These results provide a unified interpretation of classical BSS methods and clarify how observation constraints govern identifiability.
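The Jacobian-based sensitivity probe described in the abstract above can be illustrated on a minimal example; the choice of statistics (second plus fourth moments of a rotated two-source mixture) and all names below are our own, not the paper's. Since second moments of a rotated pair of unit-variance sources are rotation-invariant, local identifiability of the mixing angle must come from higher-order statistics, and the probe detects exactly this:

```python
import numpy as np

def mixture_stats(theta, m4a, m4b):
    """Observable fourth-moment statistic of x = R(theta) @ s for
    independent, unit-variance sources with fourth moments m4a, m4b.
    (Second moments are rotation-invariant, so they carry no angle info.)"""
    c, s = np.cos(theta), np.sin(theta)
    # E[x1^4] for x1 = c*s1 - s*s2; odd cross terms vanish by independence
    return c**4 * m4a + 6 * c**2 * s**2 + s**4 * m4b

def sensitivity_probe(theta, m4a, m4b, h=1e-5):
    """Central-difference Jacobian of the statistics map w.r.t. the mixing
    angle; a (near-)zero value flags local non-identifiability."""
    return (mixture_stats(theta + h, m4a, m4b)
            - mixture_stats(theta - h, m4a, m4b)) / (2 * h)

# Gaussian sources (m4 = 3): the fourth moment is constant in theta,
# so the rotation ambiguity is invisible to these observations.
print(abs(sensitivity_probe(0.3, 3.0, 3.0)))   # ~0: not locally identifiable
# Heavier-tailed sources (m4 = 6): nonzero sensitivity; the same residual
# rotation symmetry is suppressed by non-Gaussianity.
print(abs(sensitivity_probe(0.3, 6.0, 6.0)))   # > 1: locally identifiable
```

Increasing the sources' excess kurtosis increases the probe's magnitude, mirroring the paper's observation that non-Gaussianity and observation diversity suppress the same residual symmetry.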
The synergy between extremely large-scale antenna arrays and terahertz technology in sixth-generation networks establishes a near-field wideband transmission environment, enabling the generation of highly focused beams. To leverage this capability for multi-source localization, we propose a direct localization method based on the curvature-of-arrival of spherical wavefronts for estimating the positions of multiple near-field users from wideband signals. Furthermore, to overcome the spatial-wideband effect, we introduce a hybrid analog/digital array architecture with true-time delayers (TTDs). We derive a closed-form position error bound to characterize the fundamental estimation performance and optimize the analog coefficients of the array by maximizing the trace of the Fisher information matrix to minimize this bound. In addition, we extend this method to a sub-optimal iterative method that jointly optimizes beam focusing and localization, without requiring prior knowledge of the source positions for array design. Simulation results show that the proposed array configuration design significantly enhances the performance of near-field wideband localization, while the presence of TTDs effectively mitigates the localization performance degradation caused by spatial-wideband effects.
A Gaussian error assumption is commonly adopted in the pseudorange measurement model for global navigation satellite system (GNSS) positioning, which leads to the conventional least squares (LS) estimator. In urban environments, however, multipath and non-line-of-sight (NLOS) receptions produce heavy-tailed pseudorange errors that are not well represented by the Gaussian model. This study models urban GNSS pseudorange errors using a logistic distribution and derives the corresponding maximum likelihood estimator, termed the Least Quasi-Log-Cosh (LQLC) estimator. The resulting estimation problem is solved efficiently using an iteratively reweighted least squares (IRLS) algorithm. Experiments in light, medium, and deep urban environments show that LQLC consistently outperforms LS, reducing the three-dimensional (3D) root mean square error (RMSE) by approximately 11%-31% and the 3D error standard deviation (STD) by approximately 27%-61%. A controlled scale-mismatch analysis further shows that LQLC is more sensitive to severe underestimation than to overestimation of the logistic scale, indicating that the practical tuning requirement is to avoid overly small scale values rather than to achieve exact scale matching. In addition, the computational cost remains compatible with real-time positioning. These results indicate that logistic modeling provides a simple and practical alternative to Gaussian-based urban GNSS positioning.
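The IRLS solution sketched in the abstract above follows directly from the logistic log-likelihood, whose influence function is psi(r) = tanh(r/(2s))/s. The snippet below is a generic robust linear fit with these weights; the toy data, variable names, and the 10% outlier pattern are our illustrative choices, not the paper's experiments:

```python
import numpy as np

def irls_logistic(H, y, s, iters=50):
    """Maximum-likelihood fit under logistic-distributed residuals,
    solved by iteratively reweighted least squares (IRLS).
    Weights w(r) = psi(r)/r with psi(r) = tanh(r/(2s))/s down-weight
    heavy-tailed (multipath/NLOS-like) outliers."""
    x = np.linalg.lstsq(H, y, rcond=None)[0]        # LS initialization
    for _ in range(iters):
        r = y - H @ x
        u = r / (2.0 * s)
        # tanh(u)/u -> 1 as u -> 0, so use the analytic limit near r = 0
        w = np.where(np.abs(u) < 1e-8,
                     1.0 / (2.0 * s**2),
                     np.tanh(u) / (s * r))
        Hw = H * w[:, None]
        x = np.linalg.solve(H.T @ Hw, Hw.T @ y)     # weighted normal equations
    return x

# Toy comparison: line fit with heavy-tailed positive outliers.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
H = np.column_stack([np.ones_like(t), t])
x_true = np.array([1.0, 2.0])
y = H @ x_true + 0.1 * rng.logistic(size=t.size)
y[::10] += 20.0                                     # 10% large outliers
x_ls = np.linalg.lstsq(H, y, rcond=None)[0]
x_rob = irls_logistic(H, y, s=0.1)
print(np.abs(x_ls - x_true), np.abs(x_rob - x_true))
```

Because tanh saturates, large residuals receive bounded influence, which is the mechanism behind the reported RMSE reductions over plain LS.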
In this correspondence, we investigate networked sensing in perceptive mobile networks under a bistatic multi-transmitter single-receiver uplink topology, where multiple user equipments (UEs) transmit signals over orthogonal frequency-division multiple access (OFDMA) resources and a single base station performs joint sensing. Uplink clock asynchronism introduces offsets that destroy inter-packet coherence and hinder high-resolution sensing, while multi-user observations exhibit exploitable cross-user correlation. We therefore formulate an asynchronous multi-user uplink OFDMA sensing model and exploit common delay-cluster sparsity across UEs. A line-of-sight (LoS)-referenced calibration first suppresses the offsets, after which a shared-private delay-domain sparse Bayesian learning (SBL) model is used for delay support recovery and user grouping. Doppler and angle of arrival are then estimated from temporal and spatial phase differences. Simulation results show that the proposed scheme outperforms per-user processing, particularly under limited subcarrier budgets and in low signal-to-noise ratio (SNR) regimes.
The recently emerged movable antenna (MA) and fluid antenna technologies offer promising solutions to enhance the spatial degrees of freedom in wireless systems by dynamically adjusting the positions of transmit or receive antennas within given regions. In this paper, we aim to address the joint optimization problem of antenna positioning and beamforming in MA-aided multi-user downlink transmission systems. This problem involves mixed discrete antenna position and continuous beamforming weight variables, along with coupled distance constraints on antenna positions, which pose significant challenges for optimization algorithm design. To overcome these challenges, we propose an end-to-end deep learning framework, consisting of a positioning model that handles the discrete variables and the coupled constraints, and a beamforming model that handles the continuous variables. Simulation results demonstrate that the proposed framework achieves superior sum rate performance with substantially reduced computation time compared to existing methods.
Spiking Neural Networks (SNNs) offer an energy-efficient alternative to conventional Artificial Neural Networks (ANNs) but typically still require a large number of parameters. This work introduces Linearized Bregman Iterations (LBI) as an optimizer for training SNNs, enforcing sparsity through iterative minimization of the Bregman distance and proximal soft-thresholding updates. To improve convergence and generalization, we employ the AdaBreg optimizer, a momentum- and bias-corrected Bregman variant of Adam. Experiments on three established neuromorphic benchmarks, i.e., the Spiking Heidelberg Digits (SHD), the Spiking Speech Commands (SSC), and the Permuted Sequential MNIST (PSMNIST) datasets, show that LBI-based optimization reduces the number of active parameters by about 50% while maintaining accuracy comparable to models trained with the Adam optimizer, demonstrating the potential of convex, sparsity-inducing methods for efficient neuromorphic learning.
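The core mechanism of the linearized Bregman iteration above, a dense optimizer state that is soft-thresholded into sparse weights, can be sketched on a plain sparse least-squares problem. AdaBreg's momentum and bias correction are omitted, and all problem data below are synthetic; this is an illustration of the update rule, not the paper's SNN training code:

```python
import numpy as np

def shrink(v, lam):
    """Proximal soft-thresholding: the source of exact zeros (sparsity)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def linearized_bregman(A, b, lam=0.1, iters=2000):
    """Linearized Bregman iteration for sparse least squares: a dense
    subgradient state v takes plain gradient steps, while the actual
    weights w = shrink(v, lam) stay sparse throughout training."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2           # step size from spectral norm
    v = np.zeros(A.shape[1])
    for _ in range(iters):
        w = shrink(v, lam)
        v -= tau * A.T @ (A @ w - b)                # gradient step on the dual state
    return shrink(v, lam)

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 40)) / np.sqrt(20)
w_true = np.zeros(40); w_true[[3, 17, 29]] = [1.0, -2.0, 1.5]
b = A @ w_true
w = linearized_bregman(A, b)
print(np.count_nonzero(w), np.linalg.norm(A @ w - b))
```

The split between a dense state v and thresholded weights w is what lets Bregman-type optimizers prune parameters during training rather than after it.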
Empirical path loss models are defined for a specific antenna system used during measurements and characterized by a particular radiation pattern and main lobe beam width. In this paper, we propose a novel approach to modifying such a model to estimate path loss for antenna systems with different radiation patterns and beam widths. This method is based on a multi-elliptical propagation model, enabling a more flexible adaptation of the path loss model. The paper presents the general concept of the proposed method and numerical study results demonstrating the influence of the antenna pattern shape and its beam width on path loss estimation.
This paper analyzes consensus in multi-agent systems under uniform and nonuniform communication delays, a key challenge in distributed coordination with applications to robotic swarms. It investigates the convergence of a consensus algorithm accounting for delays across communication links in a connected, undirected graph. Novel convergence results are derived using Rouché's theorem and Lyapunov-based stability analysis. The system is shown to reach consensus at a steady-state value given by a weighted average determined by the delay distribution, with stability ensured under explicit parameter bounds. Both uniform and nonuniform delay scenarios are analyzed, and the corresponding convergence values are explicitly derived. The theoretical results are validated through simulations, which explore the impact of delay heterogeneity on consensus outcomes. Furthermore, the algorithm is implemented and experimentally tested on a swarm of QBOT3 ground robots to solve the rendezvous problem, demonstrating the agents' ability to converge to a common location despite realistic communication constraints, thus confirming the algorithm's robustness and practical applicability. The results provide guidelines for designing consensus protocols that tolerate communication delays, offer insights into the relationship between network delays and coordination performance, and demonstrate their applicability to distributed robotic systems.
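A quick simulation of delayed consensus illustrates the behavior claimed in the abstract above; the graph, gain, and delay values below are illustrative choices of ours, not the paper's QBOT3 experimental setup:

```python
import numpy as np

def delayed_consensus(adj, delays, x0, eps=0.05, steps=4000):
    """Discrete-time consensus where each agent uses a delayed copy of its
    neighbors' states: x_i += eps * sum_j a_ij (x_j(t - tau_ij) - x_i(t)).
    For a connected undirected graph and small eps, the states converge;
    the limit depends on the delay pattern, not only on the mean of x0."""
    n = len(x0)
    tmax = int(delays.max())
    hist = np.tile(np.asarray(x0, float), (tmax + 1, 1))  # buffer of past states
    x = hist[-1].copy()
    for _ in range(steps):
        upd = np.zeros(n)
        for i in range(n):
            for j in range(n):
                if adj[i, j]:
                    upd[i] += hist[-1 - delays[i, j], j] - x[i]
        x = x + eps * upd
        hist = np.vstack([hist[1:], x])                   # slide the delay window
    return x

# 4-agent cycle with heterogeneous (nonuniform) link delays.
adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
delays = np.array([[0, 2, 0, 1], [3, 0, 1, 0], [0, 2, 0, 3], [1, 0, 2, 0]])
x = delayed_consensus(adj, delays, x0=[0.0, 1.0, 2.0, 3.0])
print(x, np.ptp(x))   # all entries (nearly) equal: consensus reached
```

Rerunning with a different delay matrix shifts the agreed-upon value, consistent with the paper's result that the steady state is a delay-distribution-weighted average.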
This article presents a deep learning-driven inverse design methodology for Doherty power amplifiers (PAs) with multi-port pixelated output combiner networks. A deep convolutional neural network (CNN) is developed and trained as an electromagnetic (EM) surrogate model to accurately and rapidly predict the S-parameters of pixelated passive networks. By leveraging the CNN-based surrogate model within a black-box Doherty framework and a genetic algorithm (GA)-based optimizer, we effectively synthesize complex Doherty combiners that enable an extended back-off efficiency range using fully symmetrical devices. As a proof of concept, we designed and fabricated two Doherty PA prototypes incorporating three-port pixelated combiners, implemented with GaN HEMT transistors. In measurements, both prototypes demonstrate a maximum drain efficiency exceeding 74% and deliver an output power surpassing 44.1 dBm at 2.75 GHz. Furthermore, a measured drain efficiency above 52% is maintained at the 9-dB back-off power level for both prototypes at the same frequency. To evaluate linearity and efficiency under realistic signal conditions, both prototypes are tested using a 20-MHz 5G new radio (NR)-like waveform exhibiting a peak-to-average power ratio (PAPR) of 9.0 dB. After applying digital predistortion (DPD), each design achieves an average power added efficiency (PAE) above 51%, while maintaining an adjacent channel leakage ratio (ACLR) better than -60.8 dBc.
This paper develops a self-contained framework for studying a mobility-aware intelligent reflecting surface (IRS)-assisted multi-node uplink under simplified but explicit modeling assumptions. The considered system combines direct and IRS-assisted narrowband propagation, geometric IRS phase control with finite-bit phase quantization, adaptive IRS-user focusing based on inverse-rate priority weights, and sequential channel allocation guided by energy detection. The analytical development is restricted to a physics-based two-hop cascaded path-loss formulation with appropriate scaling, an expectation-level reflected-power characterization under the stated independence assumptions, and the exact chi-square threshold for energy detection, together with its large-sample Gaussian approximation. A MATLAB implementation is used to generate a sample run, which is interpreted as a numerical example. This work is intended as a consistent, practically-aligned baseline to support future extensions involving richer mobility models or more advanced scheduling policies.
Urban traffic congestion is a key challenge for the development of modern cities, requiring advanced control techniques to optimize the usage of existing infrastructure. Despite the extensive availability of data, modeling such complex systems remains an expensive and time-consuming step when designing model-based control approaches. Machine learning approaches, on the other hand, require simulations to bootstrap models, or are unable to deal with the sparse nature of traffic data and to enforce hard constraints. We propose a novel formulation of traffic dynamics based on behavioral systems theory and apply data-enabled predictive control to steer traffic dynamics via dynamic traffic light control. A high-fidelity simulation of the city of Zürich, to the best of our knowledge the largest closed-loop microscopic simulation of urban traffic in the literature, is used to validate the performance of the proposed method in terms of total travel time and CO2 emissions.
Automated process control systems (APCS) are widely used in modern industrial enterprises. They address three key objectives: ensuring the required quality of manufactured products, ensuring process safety for people and the environment, and reducing capital and operating costs. At large industrial enterprises, APCSs are typically geographically distributed and characterized by a large number of monitored parameters. Such systems often consist of several subsystems built using various technical means and serving different functional purposes. APCSs usually have a hierarchical structure consisting of several levels, where each level hosts commercially available technical devices with predetermined characteristics. This article examines the engineering problem of selecting an optimal software and hardware structure for a distributed process control system applied to a continuous process in the chemical industry. A formal formulation of the optimization problem is presented, in which the hierarchical structure of the system is represented as an acyclic graph. Optimization criteria and constraints are defined. A solution method based on a metaheuristic ant colony optimization algorithm, widely used for this class of problems, is proposed. A brief overview of the developed software tool used to solve a number of numerical examples is provided. The experimental results are discussed, along with parameter selection and possible algorithm modifications aimed at improving solution quality. Information on the verification of the control system implemented using the selected software and hardware structure is presented, and directions for further research are outlined.
This work investigates the radio resource management (RRM) design for downlink integrated sensing and communications (ISAC) systems, jointly optimizing timeslot allocation, beam adaptation, functionality selection, and user-target pairing, with the goal of economizing resource consumption under imperfect information. Timeslot allocation assigns a number of discrete channel uses to targets and users, while beam adaptation selects transmit and receive beams with suitable directions, power levels, and beamwidths. Functionality selection determines whether each timeslot is used for sensing, communication, or their simultaneous operation, while user-target pairing specifies which users and targets are jointly served within the same timeslot. To ensure reliable operation, information imperfections arising from motion, quantization, feedback delays, and hardware limitations are considered. Resource economization is achieved by minimizing energy and time consumption through a multi-objective function, with strict prioritization of time savings. The resulting RRM problem is formulated as a semi-infinite, nonconvex mixed-integer nonlinear program (MINLP). Given the lack of generic methods for solving such problems, we propose a tailor-made approach that exploits the underlying structure of the problem to uncover hidden convexities. This enables an exact reformulation as a mixed-integer semidefinite program (MISDP), which can be solved to global optimality. Simulations reveal important interdependencies among the considered RRM components and show that the proposed approach achieves substantial performance improvements over baseline schemes, with gains up to 88%.
This paper presents a Head-Related Transfer Function (HRTF)-guided framework for binaural Target Speaker Extraction (TSE) from mixtures of concurrent sources. Unlike conventional TSE methods based on Direction of Arrival (DOA) estimation or enrollment signals, which often distort perceived spatial location, the proposed approach leverages the listener's HRTF as an explicit spatial prior. The proposed framework is built upon a multi-channel deep blind source separation backbone, adapted to the binaural TSE setting. It is trained on measured HRTFs from a diverse population, enabling cross-listener generalization rather than subject-specific tuning. By conditioning the extraction on HRTF-derived spatial information, the method preserves binaural cues while enhancing speech quality and intelligibility. The performance of the proposed framework is validated through simulations and real recordings obtained from a head and torso simulator (HATS).
In many multi-agent systems of practical interest, such as traffic networks or crowd evacuation, control actions cannot be exerted on all agents. Instead, controllable leaders must indirectly steer uncontrolled followers through local interactions. Existing results address either leader-follower density control of simple, unperturbed multi-agent systems or robust density control of a single directly actuated population, but not their combination. We bridge this gap by deriving a coupled continuum description for leaders and followers subject to unknown bounded perturbations, and designing a macroscopic feedback law that guarantees global asymptotic convergence of the followers' density to a desired distribution. The coupled stability of the leader-follower system is analyzed via singular perturbation theory, and an explicit lower bound on the leader-to-follower mass ratio required for feasibility is derived. Numerical simulations on heterogeneous biased random walkers validate our theoretical findings.
Nonlinear filtering with standard particle filter (PF) methods requires mitigative techniques, such as resampling, to quell weight degeneracy. This is especially true in high-dimensional systems with sparse observations. Unfortunately, such techniques are also fragile when applied to systems with exceedingly rare events. Nonlinear systems with these properties can be assimilated effectively with a control-based PF method known as the nudged particle filter (nPF), but this method carries a high computational cost. In this work, we aim to retain this strength of the nudged method while reducing the computational cost by introducing a variational method into the algorithm that acts as a continuous pseudo-observation path. By maintaining a PF representation, the resulting algorithm continues to capture an approximation of the filtering distribution, while reducing computational runtime and improving robustness to the "rare" event of switching phases. Preliminary testing of the new approach is demonstrated on a stochastic variant of the nonlinear and chaotic Lorenz-63 (L63) model, which is used as a surrogate for mimicking "rare" events. The new approach helps to overcome difficulties in applying the nPF to realistic problems and performs favorably with respect to a standard PF with a higher number of particles.
Emerging large-scale engineering systems rely on distributed fusion for situational awareness, where agents combine noisy local sensor measurements with exchanged information to obtain fused estimates. However, at the sheer scale of these systems, tracking cross-correlations becomes infeasible, preventing the use of optimal filters. Covariance intersection (CI) methods address fusion problems with unknown correlations by minimizing worst-case uncertainty based on available information. Existing CI extensions exploit limited correlation knowledge but cannot incorporate structural knowledge of correlation from multiple sources, which naturally arises in distributed fusion problems. This paper introduces Overlapping Covariance Intersection (OCI), a generalized CI framework that accommodates this novel information structure. We formalize the OCI problem and establish necessary and sufficient conditions for feasibility. We show that a family-optimal solution can be computed efficiently via semidefinite programming, enabling real-time implementation. The proposed tools enable improved fusion performance for large-scale systems while retaining robustness to unknown correlations.
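Classical two-estimate covariance intersection, on which OCI generalizes, fits in a few lines; the grid search over the weight omega stands in for the paper's semidefinite program, and all matrices below are toy values of ours:

```python
import numpy as np

def covariance_intersection(a, Pa, b, Pb, n_grid=201):
    """Fuse two estimates (a, Pa) and (b, Pb) with unknown cross-correlation.
    CI takes a convex combination of the information matrices and picks the
    weight omega that minimizes the fused trace; the result is consistent
    for ANY true cross-covariance between the two estimates."""
    best = None
    for omega in np.linspace(0.0, 1.0, n_grid):
        info = omega * np.linalg.inv(Pa) + (1 - omega) * np.linalg.inv(Pb)
        P = np.linalg.inv(info)
        if best is None or np.trace(P) < best[0]:
            x = P @ (omega * np.linalg.inv(Pa) @ a
                     + (1 - omega) * np.linalg.inv(Pb) @ b)
            best = (np.trace(P), x, P)
    return best[1], best[2]

# Two estimates that are confident about complementary components.
a = np.array([1.0, 0.0]); Pa = np.diag([1.0, 4.0])
b = np.array([0.8, 0.3]); Pb = np.diag([4.0, 1.0])
x, P = covariance_intersection(a, Pa, b, Pb)
print(x, np.trace(P))
```

The fused trace is never worse than the better of the two inputs (the endpoints omega = 0, 1 recover them), which is the worst-case-uncertainty guarantee OCI extends to structured, multi-source correlation knowledge.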
Forecasting permafrost thaw from aerial lidar requires projecting 3D point cloud features onto 2D prediction grids, yet naive aggregation methods destroy the vertical structure critical in forest environments where ground, understory, and canopy carry distinct information about subsurface conditions. We propose a projection decoder with learned height embeddings that enable height-dependent feature transformations, allowing the network to differentiate ground-level signals from canopy returns. Combined with stratified sampling that ensures all forest strata remain represented, our approach preserves the vertical information critical for predicting subsurface conditions. Our approach pairs this decoder with a Point Transformer V3 encoder to predict dense thaw depth maps from drone-collected lidar over boreal forest in interior Alaska. Experiments demonstrate that z-stratified projection outperforms standard averaging-based methods, particularly in areas with complex vertical vegetation structure. Our method enables scalable, high-resolution monitoring of permafrost degradation from readily deployable UAV platforms.
We develop LENORI, a Large Event Number of Outages Resilience Index measuring distribution system resilience with the number of forced line outages observed in large extreme events. LENORI is calculated from standard utility outage data. The statistical accuracy of LENORI is ensured by taking the logarithm of the outage data. A related Average Large Event Number of Outages metric ALENO is also developed, and both metrics are applied to a distribution system to quantify the power grid strength relative to the extreme events stressing the grid. The metrics can be used to track resilience and quantify the contributions of various types of hazards to the overall resilience.
Accurate probabilistic modeling of the power system restoration process is essential for resilience planning, operational decision-making, and realistic simulation of resilience events. In this work, we develop data-driven probabilistic models of the restoration process using outage data from four distribution utilities. We decompose restoration into three components: normalized restore time progression, total restoration duration, and the time to first restore. The Beta distribution provides the best pooled fit for restore time progression, and the Uniform distribution is a defensible, parsimonious approximation for many events. Total duration is modeled as a heteroskedastic Lognormal process that scales superlinearly with event size. The time to first restore is well described by a Gamma model for moderate and large events. Together, these models provide an end-to-end stochastic model for Monte Carlo simulation, probabilistic duration forecasting, and resilience planning that moves beyond summary statistics, enabling uncertainty-aware decision support grounded in utility data.
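An end-to-end Monte Carlo draw from the three fitted components described above might look as follows. The distribution parameters and the superlinear scaling exponent are placeholders for illustration, not the values fitted to the four utilities' data:

```python
import numpy as np

def sample_restoration(n_outages, rng):
    """One Monte Carlo realization of a restoration process:
    total duration ~ Lognormal with superlinear event-size scaling,
    time to first restore ~ Gamma, and the remaining restore times
    spread over the event as sorted Beta fractions."""
    # Heteroskedastic Lognormal total duration; ~n^1.2 scaling is a placeholder
    duration = rng.lognormal(mean=np.log(2.0 * n_outages**1.2), sigma=0.4)
    # Gamma-distributed time to first restore, capped below the total duration
    first = min(rng.gamma(shape=2.0, scale=1.0), 0.5 * duration)
    # Normalized restore-time progression: sorted Beta(2, 2) fractions
    fracs = np.sort(rng.beta(2.0, 2.0, size=n_outages))
    times = first + fracs * (duration - first)
    return first, duration, times

rng = np.random.default_rng(42)
first, duration, times = sample_restoration(200, rng)
print(first, duration, times[0], times[-1])
```

Repeating the draw many times yields full predictive distributions of restoration curves, which is the uncertainty-aware use case the abstract targets.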
We propose a data-driven linear modeling framework for controlled nonlinear hereditary systems that combines Koopman lifting with a truncated Grunwald-Letnikov memory term. The key idea is to model nonlinear state dependence through a lifted observable representation while imposing history dependence directly in the lifted coordinates through fixed fractional-difference weights. This preserves linearity in the lifted state-transition and input matrices, yielding a memory-compensated regression that can be identified from input-state data by least squares and extending standard Koopman-based identification beyond the Markovian setting. We further derive an equivalent augmented Markovian realization by stacking a finite window of lifted states, thereby rewriting the finite-memory recursion as a standard discrete-time linear state-space model. Numerical experiments on a nonlinear hereditary benchmark with a non-Grunwald-Letnikov Prony-series ground-truth kernel demonstrate improved multi-step open-loop prediction accuracy relative to memoryless Koopman and non-lifted state-space baselines.
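The truncated Grunwald-Letnikov memory term above reduces to a short convolution with fixed weights; the sketch below (our notation) shows the standard weight recursion and the resulting fractional difference:

```python
import numpy as np

def gl_weights(alpha, K):
    """Grunwald-Letnikov weights w_k = (-1)^k * C(alpha, k), generated by
    the standard recursion w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1)/k)."""
    w = np.empty(K + 1)
    w[0] = 1.0
    for k in range(1, K + 1):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w

def frac_diff(x, alpha, K):
    """Truncated fractional difference (Delta^alpha x)[t] =
    sum_{k=0..K} w_k * x[t-k]: a linear, fixed-weight memory term,
    which is why it preserves linearity in lifted coordinates."""
    w = gl_weights(alpha, K)
    return np.array([sum(w[k] * x[t - k] for k in range(min(K, t) + 1))
                     for t in range(len(x))])

# Sanity checks: alpha = 1 gives the first difference, alpha = 0 the identity.
x = np.array([1.0, 3.0, 6.0, 10.0])
print(gl_weights(1.0, 3))          # [ 1., -1.,  0.,  0.]
print(frac_diff(x, 1.0, 3))        # [ 1.,  2.,  3.,  4.]
print(frac_diff(x, 0.0, 3))        # [ 1.,  3.,  6., 10.]
```

Because the weights are fixed once alpha is chosen, the history dependence enters the regression linearly, and stacking the last K lifted states recovers the augmented Markovian realization described in the abstract.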
Real-time object detection in AR/VR systems faces critical computational constraints, requiring sub-10\,ms latency within tight power budgets. Inspired by biological foveal vision, we propose a two-stage pipeline that combines differentiable weightless neural networks for ultra-efficient gaze estimation with attention-guided region-of-interest object detection. Our approach eliminates arithmetic-intensive operations by performing gaze tracking through memory lookups rather than multiply-accumulate computations, achieving an angular error of $8.32^{\circ}$ with only 393 MACs and 2.2 KiB of memory per frame. Gaze predictions guide selective object detection on attended regions, reducing computational burden by 40-50\% and energy consumption by 65\%. Deployed on the Arduino Nano 33 BLE, our system achieves 48.1\% mAP on COCO (51.8\% on attended objects) while maintaining sub-10\,ms latency, meeting stringent AR/VR requirements by improving the communication time by a factor of 177. Compared to the global YOLOv12n baseline, which achieves 39.2\%, 63.4\%, and 83.1\% accuracy for small, medium, and large objects, respectively, the ROI-based method yields 51.3\%, 72.1\%, and 88.1\% under the same settings. This work shows that memory-centric architectures with explicit attention modeling offer better efficiency and accuracy for resource-constrained wearable platforms than uniform processing.
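The "memory lookups rather than multiply-accumulates" idea above can be seen in a tiny WiSARD-style weightless network. This toy classifier is our own illustration of the mechanism, far simpler than the paper's differentiable gaze estimator, and the "left/right gaze" task is invented:

```python
import random
from collections import defaultdict

class WiSARD:
    """Weightless neural network: each class owns a set of RAM nodes
    addressed by tuples of input bits. Training writes 1s into RAM;
    inference just counts lookup hits, with no multiply-accumulates."""
    def __init__(self, n_bits, tuple_size, classes, seed=0):
        rng = random.Random(seed)
        self.mapping = list(range(n_bits))
        rng.shuffle(self.mapping)                  # random bit-to-RAM mapping
        self.tuple_size = tuple_size
        self.rams = {c: defaultdict(int) for c in classes}

    def _addresses(self, bits):
        m = self.mapping
        return [tuple(bits[i] for i in m[k:k + self.tuple_size])
                for k in range(0, len(m), self.tuple_size)]

    def train(self, bits, label):
        for ram_id, addr in enumerate(self._addresses(bits)):
            self.rams[label][(ram_id, addr)] = 1   # one memory write per RAM

    def predict(self, bits):
        addrs = list(enumerate(self._addresses(bits)))
        return max(self.rams, key=lambda c:
                   sum(self.rams[c].get((i, a), 0) for i, a in addrs))

# Toy "gaze" task: left-heavy vs right-heavy binary patterns.
net = WiSARD(n_bits=16, tuple_size=4, classes=["left", "right"])
net.train([1] * 8 + [0] * 8, "left")
net.train([0] * 8 + [1] * 8, "right")
print(net.predict([1] * 7 + [0] * 9))   # "left"
print(net.predict([0] * 9 + [1] * 7))   # "right"
```

Inference is a handful of table lookups and integer increments per frame, which is why such architectures fit microcontroller-class budgets like the Arduino Nano 33 BLE.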
We demonstrate continuous distributed acoustic sensing over a 4400 km long undersea cable. Bi-directional operation improves the strain signal-to-noise ratio by >20 dB, enabling 88,000 50-m-spaced measurement points at a nominal telecom launch power.
We present an approach to approximate reachable sets for linear systems with bounded L-infinity controls in finite time. Our first approach investigates the boundaries of these sets and reveals an exact characterization for single-input, planar systems with real, distinct eigenvalues. The second approach leverages convergence of the Lp-norms to L-infinity and uses Lp-norm reachable sets as an approximation of the L-infinity-norm reachable sets. Our optimal control results yield insights that make computational approximations of the Lp-norm reachable sets more tractable and provide exact characterizations for L-infinity under the previous assumptions on the system. As an example, we incorporate our reachability analysis into the design optimization of a highly maneuverable aircraft. Introducing reachability-based constraints allows us to factor physical limitations on desired flight maneuvers into the design process.
Game theory provides the gold standard for analyzing adversarial engagements, offering strong optimality guarantees. However, these guarantees often become brittle when assumptions such as perfect information are violated. Reinforcement learning (RL), by contrast, is adaptive but can be sample-inefficient in large, complex domains. This paper introduces a hybrid approach that leverages game-theoretic insights to improve RL training efficiency. We study a border defense game with limited perceptual range, where defender performance depends on both search and pursuit strategies, making classical differential game solutions inapplicable. Our method employs the Apollonius Circle (AC) to compute equilibrium in the post-detection phase, enabling early termination of RL episodes without learning pursuit dynamics. This allows RL to concentrate on learning search strategies while guaranteeing optimal continuation after detection. Across single- and multi-defender settings, this early termination method yields 10-20% higher rewards, faster convergence, and more efficient search trajectories. Extensive experiments validate these findings and demonstrate the overall effectiveness of our approach.
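The Apollonius Circle used in the abstract above has a simple closed form: for an evader/pursuer speed ratio k, it is the locus of points X with |X - evader| / |X - pursuer| = k. A minimal sketch (our own helper, not the paper's code) computes it and can be checked against that defining property:

```python
import numpy as np

def apollonius_circle(evader, pursuer, k):
    """Apollonius circle for speed ratio k = v_evader / v_pursuer (k != 1):
    the set of points X with |X - evader| / |X - pursuer| = k, i.e. the
    boundary of the region the evader can reach no later than the pursuer."""
    e, p = np.asarray(evader, float), np.asarray(pursuer, float)
    center = (e - k**2 * p) / (1 - k**2)
    radius = k * np.linalg.norm(e - p) / abs(1 - k**2)
    return center, radius

center, radius = apollonius_circle([0.0, 0.0], [4.0, 0.0], k=0.5)
print(center, radius)   # center on the evader's side, radius = 8/3
```

In the hybrid scheme, this closed form supplies the equilibrium outcome of the post-detection pursuit phase, so RL episodes can terminate at detection instead of simulating the chase.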
We expand our novel computational method for unit commitment (UC) to include long-horizon planning. We introduce a fast novel algorithm to commit hydro-generators with provable accuracy. We solve problems with thousands of generators at 5-minute market intervals. We show that our method can solve interconnect-size UC problems in approximately 1 minute on commodity hardware and that an increased planning horizon leads to sizable operational cost savings (our objective). This scale is infeasible for current state-of-the-art tools. We attain this runtime improvement by introducing a heuristic tailored to UC problems. Our method can be implemented using existing continuous optimization solvers and adapted for different applications. Combined, the two algorithms allow an operator of large systems with hydro units to make horizon-aware economic decisions.
Abel's classic transformation shows that any well-posed system with time-varying delay is equivalent to a parameter-varying system with fixed delay. The existence of such a parameter-varying constant delay representation then simplifies the problems of stability analysis and optimal control. Unfortunately, the method for constructing such transformations has been ad hoc -- requiring an iterative time-stepping approach to constructing the transformation, beginning with a seed function subject to boundary-value constraints. Moreover, a poor choice of seed function often results in a constant delay representation with large time-variations in system parameters -- obviating the benefits of such a representation. In this paper, we show how the set of all feasible seed functions can be parameterized using a basis for $L_2$. This parameterization is then used to search for seed functions for which the corresponding time-transformation results in smaller parameter variation. The parameterization of admissible seed functions is illustrated with numerical examples that contrast how well-chosen and poorly chosen seed functions affect the boundedness of a time transformation.
A single egocentric image typically captures only a small portion of the floor, yet a complete metric traversability map of the surroundings would better serve applications such as indoor navigation. We introduce FlatLands, a dataset and benchmark for single-view bird's-eye view (BEV) floor completion. The dataset contains 270,575 observations from 17,656 real metric indoor scenes drawn from six existing datasets, with aligned observation, visibility, validity, and ground-truth BEV maps, and the benchmark includes both in- and out-of-distribution evaluation protocols. We compare training-free approaches, deterministic models, ensembles, and stochastic generative models. Finally, we instantiate the task as an end-to-end monocular RGB-to-floormaps pipeline. FlatLands provides a rigorous testbed for uncertainty-aware indoor mapping and generative completion for embodied navigation.
Autonomous driving is undergoing a shift from modular, rule-based pipelines toward end-to-end (E2E) learning systems. This paper examines this transition by tracing the evolution from classical sense-perceive-plan-control architectures to large driving models (LDMs) capable of mapping raw sensor input directly to driving actions. We analyze recent developments including Tesla's Full Self-Driving (FSD) V12-V14, Rivian's Unified Intelligence platform, NVIDIA Cosmos, and emerging commercial robotaxi deployments, focusing on architectural design, deployment strategies, safety considerations, and industry implications. A key emerging product category is supervised E2E driving, often referred to as FSD (Supervised) or L2++, which several manufacturers plan to deploy from 2026 onwards. These systems can perform most of the Dynamic Driving Task (DDT) in complex environments while requiring human supervision, shifting the driver's role to safety oversight. Early operational evidence suggests E2E learning handles the long-tail distribution of real-world driving scenarios and is becoming a dominant commercial strategy. We also discuss how similar architectural advances may extend beyond autonomous vehicles (AVs) to other embodied AI systems, including humanoid robotics.
Model-based design of experiments (MBDOE) is essential for efficient parameter estimation in nonlinear dynamical systems. However, conventional adaptive MBDOE requires costly posterior inference and design optimization between each experimental step, precluding real-time applications. We address this by combining Deep Adaptive Design (DAD), which amortizes sequential design into a neural network policy trained offline, with differentiable mechanistic models. For dynamical systems with known governing equations but uncertain parameters, we extend sequential contrastive training objectives to handle nuisance parameters and propose a transformer-based policy architecture that respects the temporal structure of dynamical systems. We demonstrate the approach on four systems of increasing complexity: a fed-batch bioreactor with Monod kinetics, a Haldane bioreactor with uncertain substrate inhibition, a two-compartment pharmacokinetic model with nuisance clearance parameters, and a DC motor for real-time deployment.
This paper proposes a decentralized design approach for consensus protocols of multi-agent systems via a directed-spanning-tree (DST)-based linear transformation and the corresponding minimal communication links. First, the consensus problem of multi-agent systems is transformed into a decentralized output stabilization problem by constructing a linear transformation based on a DST of the communication topology, and thus a necessary and sufficient consensus criterion in terms of decentralized fixed modes is derived. Next, a new distributed protocol is designed using only the neighbors' information on the DST, which yields a fully decentralized design approach. Finally, some numerical examples are given to verify the results obtained.
This paper investigates a decentralized design approach for leader-following consensus protocols of heterogeneous multiagent systems under a fixed communication topology with a directed spanning tree (DST) and an asymmetric weight matrix. First, a control protocol is designed that uses only the information from each agent's neighbor on the DST, which we call the consensus protocol with minimal communication links. In particular, the DST-based linear transformation method is used to transform the consensus problem into a partial variable stability problem of a corresponding system, and a decentralized design method is proposed to find the gain matrices in the protocols. Next, the decentralized design approach is extended to protocols using all neighbor information in the original communication topology with the help of the diagonally dominant matrix method. Some numerical simulations are given to illustrate the theoretical results.
Numerical optimal control is commonly divided between globally structured but dimensionally intractable Hamilton-Jacobi-Bellman (HJB) methods and scalable but local trajectory optimization. We introduce the Featurized Occupation Measure (FOM), a finite-dimensional primal-dual interface for the occupation-measure formulation that unifies trajectory search and global HJB-type certification. FOM is broad yet numerically tractable, covering both explicit weak-form schemes and implicit simulator- or rollout-based sampling methods. Within this framework, approximate HJB subsolutions serve as intrinsic numerical certificates to directly evaluate and guide the primal search. We prove asymptotic consistency with the exact infinite-dimensional occupation-measure problem, and show that for block-organized feasible certificates, finite-dimensional approximation preserves certified lower bounds with blockwise error and complexity control. We also establish persistence of these lower bounds under time shifts and bounded model perturbations. Consequently, these structural properties turn global certificates into flexible, reusable computational objects, establishing a systematic basis for certificate-guided optimization in nonlinear control.
Current Text-to-Speech (TTS) systems typically use separate models for speech-prompted and text-prompted timbre control. While unifying both control signals into a single model is desirable, the challenge of cross-modal alignment often results in overly complex architectures and training objectives. To address this challenge, we propose CAST-TTS, a simple yet effective framework for unified timbre control. Features are extracted from speech prompts and text prompts using pre-trained encoders. The multi-stage training strategy efficiently aligns the speech and projected text representations within a shared embedding space. A single cross-attention mechanism then allows the model to use either of these representations to control the timbre. Extensive experiments validate that the unified cross-attention mechanism is critical for achieving high-quality synthesis. CAST-TTS achieves performance comparable to specialized single-input models while operating within a unified architecture. The demo page can be accessed at this https URL.
Entity recognition in Automatic Speech Recognition (ASR) is challenging for rare and domain-specific terms. In domains such as finance, medicine, and air traffic control, these errors are costly. If the entities are entirely absent from the ASR output, post-ASR correction becomes difficult. To address this, we introduce RECOVER, an agentic correction framework in which a tool-using agent leverages multiple ASR hypotheses as evidence, retrieves relevant entities, and applies Large Language Model (LLM) correction under constraints. The hypotheses are exploited via different strategies, namely, 1-Best, Entity-Aware Select, Recognizer Output Voting Error Reduction (ROVER) Ensemble, and LLM-Select. Evaluated across five diverse datasets, RECOVER achieves 8-46% relative reductions in entity-phrase word error rate (E-WER) and increases recall by up to 22 percentage points. The LLM-Select strategy achieves the best overall performance in entity correction while maintaining overall WER.
Parallel simulation and control of large-scale robotic systems often rely on partitioned time stepping, yet finite-iteration coupling can inject spurious energy by violating power consistency--even when each subsystem is passive. This letter proposes a novel energy-safe, early-terminable iterative coupling for port-Hamiltonian subsystems by embedding a Douglas--Rachford (DR) splitting scheme in scattering (wave) coordinates. The lossless interconnection is enforced as an orthogonal constraint in the wave domain, while each subsystem contributes a discrete-time scattering port map induced by its one-step integrator. Under a discrete passivity condition on the subsystem time steps and a mild impedance-tuning condition, we prove an augmented-storage inequality certifying discrete passivity of the coupled macro-step for any finite inner-iteration budget, with the remaining mismatch captured by an explicit residual. As the inner budget increases, the partitioned update converges to the monolithic discrete-time update induced by the same integrators, yielding a principled, adaptive accuracy--compute trade-off, supporting energy-consistent real-time parallel simulation under varying computational budgets. Experiments on a coupled-oscillator benchmark validate the passivity certificates at numerical roundoff (on the order of 10^-14 in double precision) and show that the reported RMS state error decays monotonically with increasing inner-iteration budgets, consistent with the hard-coupling limit.
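As a minimal numerical sketch of the scattering change of coordinates used above: effort/flow port variables map to wave variables in which port power becomes a difference of squared wave norms, so a lossless interconnection is exactly an orthogonal map on waves. The impedance value and the particular normalization below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = 2.0  # impedance (tuning) parameter; an illustrative choice

def to_wave(f, v):
    # effort/flow (f, v) -> forward/backward wave variables at impedance Z
    return (f + Z * v) / np.sqrt(2 * Z), (f - Z * v) / np.sqrt(2 * Z)

f, v = rng.normal(size=4), rng.normal(size=4)
wp, wm = to_wave(f, v)

# Port power identity: f.v = (|w+|^2 - |w-|^2) / 2, so any power-preserving
# (lossless) interconnection acts as a norm-preserving, i.e. orthogonal,
# map on wave variables -- the constraint enforced at every inner iterate.
print(float(f @ v), float((wp @ wp - wm @ wm) / 2))
```

Because the interconnection is orthogonal in wave coordinates, no spurious energy can be created no matter how early the inner iteration is terminated, which is the mechanism behind the passivity certificate.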
Space-air-ground integrated networks (SAGIN) promise ubiquitous 6G connectivity but face significant resource management challenges due to heterogeneous infrastructure, dynamic topologies, and stringent quality-of-service (QoS) requirements. Conventional model-driven approaches struggle with scalability and adaptability in such complex environments. This paper presents an agentic artificial intelligence (AI) framework for autonomous SAGIN resource management by embedding large language model (LLM)-based agents into a Monitor-Analyze-Plan-Execute-Knowledge (MAPE-K) control plane. The framework incorporates three specialized agents, namely semantic resource perceivers, intent-driven orchestrators, and adaptive learners, that collaborate through natural language reasoning to bridge the gap between operator intents and network execution. A key innovation is the hierarchical agent-reinforcement learning (RL) collaboration mechanism, wherein LLM-based orchestrators dynamically shape reward functions for RL agents based on semantic network conditions. Validation through UAV-assisted AIGC service orchestration in energy-constrained scenarios demonstrates that LLM-driven reward shaping achieves 14% energy reduction and the lowest average service latency among all compared methods. This agentic paradigm offers a scalable pathway toward adaptive, AI-native 6G networks, capable of autonomously interpreting intents and adapting to dynamic environments.
The integration of satellite communication networks with next-generation (NG) technologies is a promising approach towards global connectivity. However, the quality of service is highly dependent on the availability of accurate channel state information (CSI). Channel estimation in satellite communications is challenging due to the high propagation delay between terrestrial users and satellites, which results in outdated CSI observations on the satellite side. In this paper, we study the downlink transmission of multiple satellites acting as distributed base stations (BS) to mobile terrestrial users. We propose a multi-agent reinforcement learning (MARL) algorithm which aims to maximise the sum-rate of the users while coping with the outdated CSI. We design a novel bi-level optimisation procedure, termed dual-stage proximal policy optimisation (DS-PPO), to tackle the problem of large continuous action spaces as well as of independent and non-identically distributed (non-IID) environments in MARL. Specifically, the first stage of DS-PPO maximises the sum-rate for an individual satellite and the second stage maximises the sum-rate when all the satellites cooperate to form a distributed multi-antenna BS. Our numerical results demonstrate the robustness of DS-PPO to CSI imperfections as well as the sum-rate improvement attained by the use of DS-PPO. In addition, we provide a convergence analysis for DS-PPO along with its computational complexity.
Non-conservative uncertainty bounds are essential for making reliable predictions about latent functions from noisy data--and thus, a key enabler for safe learning-based control. In this domain, kernel methods such as Gaussian process regression are established techniques, thanks to their inherent uncertainty quantification mechanism. Still, existing bounds either pose strong assumptions on the underlying noise distribution, are conservative, do not scale well in the multi-output case, or are difficult to integrate into downstream tasks. This paper addresses these limitations by presenting a tight, distribution-free bound for multi-output kernel-based estimates. It is obtained through an unconstrained, duality-based formulation, which shares the same structure as classic Gaussian process confidence bounds and can thus be straightforwardly integrated into downstream optimization pipelines. We show that the proposed bound generalizes many existing results and illustrate its application using an example inspired by quadrotor dynamics learning.
Conventional mobile tensegrity robots constructed with straight links offer mobility at the cost of locomotion speed. While spherical robots provide highly effective rolling behavior, they often lack the stability required for navigating unstructured terrain common in many space exploration environments. This research presents a solution with a semi-circular, curved-link tensegrity robot that strikes a balance between efficient rolling locomotion and controlled stability, enabled by discontinuities present at the arc endpoints. Building upon an existing geometric static modeling framework [1], this work presents the system design of an improved Tensegrity eXploratory Robot 2 (TeXploR2). Internal shifting masses instantaneously roll along each curved-link, dynamically altering the two points of contact with the ground plane. Simulations of quasistatic, piecewise continuous locomotion sequences reveal new insights into the positional displacement between inertial and body frames. Non-intuitive rolling behaviors are identified and experimentally validated using a tetherless prototype, demonstrating successful dynamic locomotion. A preliminary impact test highlights the tensegrity structure's inherent shock absorption capabilities and conformability. Future work will focus on finalizing a dynamic model that is experimentally validated with extended testing in real-world environments as well as further refinement of the prototype to incorporate additional curved-links and subsequent ground contact points for increased controllability.
This article presents an optimal-transport (OT)-driven, distributionally robust attack detection algorithm, OT-DETECT, for cyber-physical systems (CPS) modeled as partially observed linear stochastic systems. The underlying detection problem is formulated as a minmax optimization problem using 1-Wasserstein ambiguity sets constructed from observer residuals under both the nominal (attack-free) and attacked regimes. We show that the minmax detection problem can be reduced to a finite-dimensional linear program for computing the worst-case distribution (WCD). Off-support residuals are handled via a kernel-smoothed score function that drives a CUSUM procedure for sequential detection. We also establish a non-asymptotic tail bound on the false-positive error of the CUSUM statistic under the nominal (attack-free) condition, under mild assumptions. Numerical illustrations are provided to evaluate the robustness properties of OT-DETECT.
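As a rough illustration of the CUSUM recursion that drives OT-DETECT's sequential decision, the sketch below runs a one-sided CUSUM on observer-residual streams. The drift-corrected score, the Gaussian residual models, and all thresholds here are simplified assumptions for illustration, not the paper's kernel-smoothed score function or worst-case-distribution construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def cusum_detect(scores, threshold):
    """One-sided CUSUM: W_t = max(0, W_{t-1} + s_t).
    Returns the first alarm index, or -1 if no alarm is raised."""
    w = 0.0
    for t, s in enumerate(scores):
        w = max(0.0, w + s)
        if w > threshold:
            return t
    return -1

# Nominal residuals ~ N(0,1); attacked residuals have mean shifted to 1.5.
# A simple drift-corrected score s_t = r_t - k (k = half the mean shift)
# stands in for the paper's kernel-smoothed score function.
nominal = rng.normal(0.0, 1.0, 200)
attacked = rng.normal(1.5, 1.0, 200)
k = 0.75
print(cusum_detect(nominal - k, threshold=15.0))   # typically no alarm (-1)
print(cusum_detect(attacked - k, threshold=15.0))  # typically a quick alarm
```

Under the nominal regime the score has negative drift, so the statistic stays near zero and false positives obey an exponential tail bound in the threshold; under attack the positive drift makes the statistic cross the threshold quickly.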
We consider nonlinear model predictive control (MPC) schemes using surrogate models in the optimization step based on input-output data only. We establish exponential stability for sufficiently long prediction horizons assuming exponential stabilizability and a proportional error bound. Moreover, we verify the imposed condition on the approximation using kernel interpolation and demonstrate the practical applicability to nonlinear systems with a numerical example.
Stochastic resetting, where a dynamical process is intermittently returned to a fixed reference state, has emerged as a powerful mechanism for optimizing first-passage properties. Existing theory largely treats static, non-learning processes. Here we ask how stochastic resetting interacts with reinforcement learning, where the underlying dynamics adapt through experience. In tabular grid environments, we find that resetting accelerates policy convergence even when it does not reduce the search time of a purely diffusive agent, indicating a novel mechanism beyond classical first-passage optimization. In a continuous control task with neural-network-based value approximation, we show that random resetting improves deep reinforcement learning when exploration is difficult and rewards are sparse. Unlike temporal discounting, resetting preserves the optimal policy while accelerating convergence by truncating long, uninformative trajectories to enhance value propagation. Our results establish stochastic resetting as a simple, tunable mechanism for accelerating learning, translating a canonical phenomenon of statistical mechanics into an optimization principle for reinforcement learning.
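A minimal tabular sketch of the resetting mechanism described above: Q-learning on a one-dimensional corridor where, with small probability per step, the agent is teleported back to the start. The corridor size, reset rate, and learning hyperparameters are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10                    # corridor states 0..N-1, reward only at state N-1
Q = np.zeros((N, 2))      # actions: 0 = step left, 1 = step right
alpha, gamma, r_reset = 0.2, 0.8, 0.05

for episode in range(500):
    s = 0
    for step in range(200):
        if rng.random() < r_reset:      # stochastic resetting to the start
            s = 0
        a = int(rng.integers(2))        # uniform-random behavior (off-policy)
        s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)
        reward = 1.0 if s2 == N - 1 else 0.0
        Q[s, a] += alpha * (reward + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
        if s == N - 1:                  # episode ends at the rewarded state
            break

# Resetting truncates long, uninformative leftward excursions; the learned
# greedy policy should point right along the corridor.
print([int(np.argmax(Q[s])) for s in range(N - 1)])
```

Note that, unlike shrinking the discount factor, the reset leaves the reward structure and hence the optimal greedy policy unchanged; it only reshapes which trajectories generate updates.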
This paper proposes the first fully distributed algorithm for finding the Generalized Nash Equilibrium (GNE) of a non-cooperative game with shared coupling constraints and general cost coupling at a user-prescribed finite time T. As a foundation, a centralized gradient-based prescribed-time convergence result is established for the GNE problem, extending the optimization Lyapunov function framework to gradient dynamics -- the only known realization among existing alternatives that naturally decomposes into per-agent computations. Building on this, a fully distributed architecture is designed in which each agent concurrently runs three coupled dynamics: a prescribed-time distributed state observer, a gradient-based optimization law, and a dual consensus mechanism that enforces the shared-multiplier requirement of the variational GNE, thus guaranteeing convergence to the same solution as the centralized case. The simultaneous operation of these layers creates bidirectional perturbations between consensus and optimization, which are resolved through gain synchronization that matches the temporal singularities of the optimization and consensus layers, ensuring all error components vanish exactly at T. The Fischer-Burmeister reformulation renders the algorithm projection-free and guarantees constraint satisfaction at the deadline. Numerical simulations on a Nash-Cournot game and a time-critical sensor coverage problem validate the approach.
An integrate-and-fire time-encoding machine (IF-TEM) is an effective asynchronous sampler that translates amplitude information into non-uniform time sequences. In this work, we propose a novel Adaptive IF-TEM (AIF-TEM) approach. This design dynamically adjusts the TEM's sensitivity to changes in the input signal's amplitude and frequency in real-time. We provide a comprehensive analysis of AIF-TEM's oversampling and distortion properties. Through these adaptive adjustments, we show that AIF-TEM can achieve significant performance improvements in terms of sampling rate-distortion in a practical finite regime. We demonstrate empirically that, in the scenarios tested, AIF-TEM outperforms classical IF-TEM and traditional Nyquist (i.e., periodic) sampling methods for band-limited signals. In terms of Mean Square Error (MSE), the reduction reaches at least 12 dB (fixing the oversampling rate). Additionally, we investigate the quantization process for AIF-TEM and analyze the quantization MSE bound. Empirical results show that classic quantization for AIF-TEM improves performance by at least 14 dB compared to IF-TEM. We introduce a dynamic quantization technique for AIF-TEM, which further improves performance compared to classic quantization. Empirically, this reduction reaches at least 10 dB compared to classic quantization for AIF-TEM.
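A minimal sketch of the classical (non-adaptive) IF-TEM encoding that AIF-TEM builds on: the integrator accumulates the biased input until a threshold is reached, emits a firing time, and resets. The bias, threshold, and test signal below are illustrative assumptions; the adaptive sensitivity adjustment of AIF-TEM is not modeled here.

```python
import numpy as np

def if_tem_encode(x, dt, bias, kappa, delta):
    """Integrate (x(t)+bias)/kappa; emit a firing time each time the
    integral reaches delta, then reset by subtraction (classic IF-TEM)."""
    y, times = 0.0, []
    for n, xn in enumerate(x):
        y += (xn + bias) * dt / kappa
        if y >= delta:
            times.append(n * dt)
            y -= delta
    return np.array(times)

# Band-limited test input; bias > max|x| keeps the integrator increasing.
dt = 1e-4
t = np.arange(0, 1, dt)
x = 0.8 * np.sin(2 * np.pi * 3 * t)
times = if_tem_encode(x, dt, bias=1.0, kappa=1.0, delta=0.005)
# The k-th firing interval satisfies T_k = kappa*delta / (bias + avg x over
# the interval): intervals shrink where x is large and stretch where it is
# small, which is exactly how amplitude is encoded in time.
print(len(times), times[:3])
```

With these fixed parameters the firing rate is tied to the bias and threshold regardless of local signal behavior; adapting them on the fly to the signal's amplitude and frequency is the gap AIF-TEM targets.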
Deep learning models for pulmonary disease screening from Computed Tomography (CT) scans promise to alleviate the immense workload on radiologists. Still, their high computational cost, stemming from processing entire 3D volumes, remains a major barrier to widespread clinical adoption. Current sub-sampling techniques often compromise diagnostic integrity by introducing artifacts or discarding critical information. To overcome these limitations, we propose an Efficient and Reliable Framework (ERF) that fundamentally improves the practicality of automated CT analysis. Our framework introduces two core innovations: (1) A Cluster-based Sub-Sampling (CSS) method that efficiently selects a compact yet comprehensive subset of CT slices by optimizing for both representativeness and diversity. By integrating an efficient k-nearest neighbor search with an iterative refinement process, CSS bypasses the computational bottlenecks of previous methods while preserving vital diagnostic features. (2) An Ambiguity-aware Uncertainty Quantification (AUQ) mechanism, which enhances reliability by specifically targeting data ambiguity arising from subtle lesions and artifacts. Unlike standard uncertainty measures, AUQ leverages the predictive discrepancy between auxiliary classifiers to construct a specialized ambiguity score. By maximizing this discrepancy during training, the system effectively flags ambiguous samples where the model lacks confidence due to visual noise or intricate pathologies. Validated on two public datasets with 2,654 CT volumes across diagnostic tasks for 3 pulmonary diseases, ERF achieves diagnostic performance comparable to the full-volume analysis (over 90% accuracy and recall) while reducing processing time by more than 60%. This work represents a significant step towards deploying fast, accurate, and trustworthy AI-powered screening tools in time-sensitive clinical settings.
This paper introduces the weighted-sum energy efficiency (WSEE) as an advanced performance metric designed to represent the uplink energy efficiency (EE) of individual user equipment (UE) in a user-centric Cell-Free massive MIMO (CF-mMIMO) system more accurately. In a realistic user-centric CF-mMIMO context, each UE may exhibit distinct characteristics, such as maximum transmit power limits or specific minimum data rate requirements. By computing the EE of each UE independently and adjusting the weights accordingly, the system can accommodate these unique attributes, thus promoting energy-efficient operation. The uplink WSEE is formulated as a multiple-ratio fractional programming (FP) problem, representing a weighted sum of the EE of individual UEs, which depends on each UE's transmit power and the combining vector at the central processing unit (CPU). To effectively maximize WSEE, we develop optimization algorithms based on the quadratic transform (QT), which is effective for multiple-ratio FP. By applying QT sequentially to each user's EE and the uplink SINR, the method converts the nonconvex WSEE objective into tractable subproblems and ensures stable, monotone convergence. We further introduce an approximate variant that alleviates QT's inherent nonlinearities to accelerate convergence. Compared with global energy efficiency (GEE)-oriented baselines, the proposed algorithms yield simultaneous improvements in user power consumption and spectral efficiency, while also reducing optimization time. Overall, the framework provides a foundation for designing operational strategies tailored to specific system requirements.
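The quadratic-transform step underlying the algorithm above can be sketched on a toy two-user weighted-sum-EE problem. The rate model, channel gains, and circuit-power constant below are illustrative assumptions; only the alternation pattern (closed-form auxiliary update, then a per-ratio subproblem) mirrors the QT approach described in the abstract.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy WSEE: maximize sum_i A_i(p)/B_i(p), A_i = w_i*log(1+g_i*p_i) (rate),
# B_i = p_i + pc (power), over 0 <= p_i <= pmax, via the quadratic transform:
#   surrogate sum_i 2*y_i*sqrt(A_i) - y_i^2*B_i,  with y_i = sqrt(A_i)/B_i.
g, w, pc, pmax = np.array([4.0, 1.5]), np.array([1.0, 2.0]), 0.5, 2.0

def A(p): return w * np.log1p(g * p)
def B(p): return p + pc
def wsee(p): return float(np.sum(A(p) / B(p)))

p = np.full(2, pmax)                      # feasible starting point
history = [wsee(p)]
for _ in range(20):
    y = np.sqrt(A(p)) / B(p)              # closed-form auxiliary update
    # the p-subproblem is separable and concave: one bounded scalar max/user
    p = np.array([
        minimize_scalar(lambda pi, i=i: -(2 * y[i] * np.sqrt(w[i] * np.log1p(g[i] * pi))
                                          - y[i] ** 2 * (pi + pc)),
                        bounds=(0.0, pmax), method="bounded").x
        for i in range(2)
    ])
    history.append(wsee(p))
print(history[0], history[-1])
```

The QT surrogate lower-bounds each ratio and touches it at the current auxiliary value, which is what yields the stable, monotone convergence claimed in the abstract: the WSEE value never decreases across iterations.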
Three-Dimensional Gaussian Splatting (3DGS) has shown substantial promise in the field of computer vision, but remains unexplored in the field of magnetic resonance imaging (MRI). This study explores its potential for the reconstruction of isotropic resolution 3D MRI from undersampled k-space data. We introduce a novel framework termed 3D Gaussian MRI (3DGSMR), which employs 3D Gaussian distributions as an explicit representation for MR volumes. Experimental evaluations indicate that this method can effectively reconstruct voxelized MR images, achieving a quality on par with that of well-established 3D MRI reconstruction techniques found in the literature. Notably, the 3DGSMR scheme operates under a self-supervised framework, obviating the need for extensive training datasets or prior model training. This approach introduces significant innovations to the domain, notably the adaptation of 3DGS to MRI reconstruction and the novel application of the existing 3DGS methodology to decompose complex-valued MR signals.
Control Lyapunov Functions (CLFs) and Control Barrier Functions (CBFs) can be combined, typically by means of Quadratic Programs (QPs), to design controllers that achieve performance and safety objectives. However, a significant limitation of this framework is the introduction of asymptotically stable equilibrium points besides the minimizer of the CLF, leading to deadlock situations even for simple systems and bounded convex unsafe sets. To address this problem, we propose a hybrid CLF-CBF control framework with global asymptotic stabilization and safety guarantees, offering a more flexible and systematic design methodology compared to current alternatives available in the literature. We further extend this framework to higher-order systems via a recursive procedure based on a joint CLF-CBF backstepping approach. The proposed solution is assessed through several simulation examples.
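The deadlock phenomenon motivating the hybrid framework can be reproduced on a scalar toy example with the standard (non-hybrid) CLF-CBF QP. The system, gains, and relaxation weight below are illustrative assumptions; the point is that the closed loop stalls on the barrier boundary instead of reaching the CLF minimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Scalar system xdot = u. CLF V = x^2 drives x -> 0; CBF h = x - 0.5
# keeps x >= 0.5. Standard relaxed CLF-CBF QP in decision variables (u, d):
#   min  u^2 + p*d^2
#   s.t. 2*x*u <= -gamma*V(x) + d   (CLF decrease, relaxed by slack d)
#        u >= -alpha*h(x)           (CBF condition, since hdot = u)
gamma, alpha, p_relax = 1.0, 1.0, 100.0

def qp_control(x):
    cons = [
        {"type": "ineq", "fun": lambda z: -gamma * x**2 + z[1] - 2 * x * z[0]},
        {"type": "ineq", "fun": lambda z: z[0] + alpha * (x - 0.5)},
    ]
    res = minimize(lambda z: z[0]**2 + p_relax * z[1]**2, x0=[0.0, 0.0],
                   method="SLSQP", constraints=cons)
    return res.x[0]

# Forward-Euler rollout: the state settles at the barrier boundary x = 0.5,
# an undesired stable equilibrium besides the CLF minimizer at x = 0 --
# exactly the artifact the hybrid CLF-CBF framework is designed to remove.
x, dt = 2.0, 0.01
for _ in range(2000):
    x += dt * qp_control(x)
print(round(x, 3))
```

Here the CLF minimizer x = 0 happens to lie in the unsafe set, so stalling at the boundary is expected; the abstract's concern is that the same mechanism creates spurious equilibria even when the minimizer itself is safe and reachable around a bounded obstacle.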
Multimodal physiological signals, such as EEG, ECG, EOG, and EMG, are crucial for healthcare and brain-computer interfaces. While existing methods rely on specialized architectures and dataset-specific fusion strategies, they struggle to learn universal representations that generalize across datasets and handle missing modalities at inference time. To address these issues, we propose PhysioOmni, a foundation model for multimodal physiological signal analysis that models both homogeneous and heterogeneous features to decouple multimodal signals and extract generic representations while maintaining compatibility with arbitrary missing modalities. PhysioOmni trains a decoupled multimodal tokenizer, enabling masked signal pre-training via modality-invariant and modality-specific objectives. To ensure adaptability to diverse and incomplete modality combinations, the pre-trained encoders undergo resilient fine-tuning with prototype alignment on downstream datasets. Extensive experiments on four downstream tasks, emotion recognition, sleep stage classification, motor prediction, and mental workload detection, demonstrate that PhysioOmni achieves state-of-the-art performance while maintaining strong robustness to missing modalities. Our code and model weights will be released.
In this paper, a learning-based approach is proposed for optimizing downlink beamforming in multiple-input multiple-output (MIMO) systems that employ continuous aperture arrays (CAPAs) at both the base station (BS) and the user. Beamforming in such systems is a spatially continuous function that maps a coordinate on the CAPA to a corresponding beamforming weight. We first propose an implicit neural representation (INR), termed BeaINR, to parameterize this function directly. Further, noting that the optimal beamforming function can be expressed as a weighted integral of the channel response function, we propose a second INR, CoefINR, to represent the weighting coefficient function, which indirectly optimizes the beamforming function. Simulation results show that the proposed INR-based methods achieve comparable or higher spectral efficiency (SE) than the considered baselines, while requiring substantially lower inference latency. Moreover, CoefINR reduces training complexity and improves frequency generalizability relative to BeaINR by leveraging the optimal beamforming structure.
Legacy and advanced receiver autonomous integrity monitoring (RAIM/ARAIM) rely on Gaussian error models that can be overly conservative for real-world non-Gaussian errors. This paper proposes an extended jackknife detector capable of detecting multiple simultaneous faults with non-Gaussian nominal errors. Furthermore, an integrity monitoring algorithm, jackknife ARAIM, is developed by systematically exploiting the properties of the jackknife detector in the range domain. We prove that the proposed method has equivalent monitoring performance to the solution separation (SS) ARAIM, but is significantly more computationally efficient for single-fault cases with non-Gaussian nominal errors, while maintaining similar efficiency to SS ARAIM for multiple-fault cases. The proposed method is examined in worldwide simulations, with the nominal measurement error simulated based on authentic experimental data, which reveals findings that differ from existing research. In a single Global Positioning System (GPS) constellation setting, the proposed method can reduce the 99.5 percentile vertical protection level (VPL) below 45 m, outperforming the 50 m VPL produced by the ARAIM algorithm using Gaussian nominal error models. In a GPS-Galileo dual-constellation setting, while these Gaussian-based ARAIM algorithms suffer VPL inflation over 60 m due to Galileo's heavy-tailed errors, the proposed method maintains VPL below 40 m, achieving over 92% normal operations for a 35 m Vertical Alert Limit. Moreover, we tentatively implement SS ARAIM using non-Gaussian overbounds and compare it with the proposed jackknife ARAIM method regarding computational efficiency. The proposed method achieves up to 59.4% reduction in median processing time compared to SS ARAIM in single-constellation scenarios.
Model order reduction techniques simplify high-dimensional dynamical systems by deriving lower-dimensional models that retain essential system characteristics. These techniques are crucial for the controller design of complex systems while significantly reducing computational costs. Nevertheless, constructing effective reduced-order models (ROMs) poses considerable challenges, particularly for nonlinear dynamical systems. These challenges are further exacerbated when the actual system model is unavailable, a scenario frequently encountered in real-world applications. In this work, we propose a data-driven framework for constructing ROMs of nonlinear dynamical systems with unknown mathematical models, enabling controller synthesis directly from the resulting ROMs. We establish similarity relations between the output trajectories of the original systems and those of their ROMs by employing the notion of simulation functions (SFs), thereby enabling a formal characterization of their closeness. To achieve this, we collect one set of noise-corrupted input-state data from the system during a finite-time experiment, upon which we propose conditions to construct both ROMs and SFs simultaneously. These conditions are formulated as data-dependent semidefinite programs. We demonstrate that the data-driven ROMs obtained can be employed to synthesize controllers for the original unknown systems, ensuring that they satisfy high-level logic specifications. This is accomplished by first designing controllers for the data-driven ROMs and then translating the results back to the original systems via interface functions, designed directly from the proposed data-dependent conditions. We evaluate the efficacy of our data-driven framework through two case studies, including a challenging benchmark from the model reduction literature: a circuit of chained inverter gates with 20 state variables.
Chance-constrained optimization has emerged as a promising framework for managing uncertainties in power systems. This work advances its application to the DC Optimal Power Flow (DC-OPF) model, developing a novel approach to uncertainty modeling and estimation. Current methods typically tackle these problems by first modeling random nodal injections using high-dimensional statistical distributions that scale with the number of buses, followed by deriving deterministic reformulations of the probabilistic constraints. We propose an alternative methodology that exploits the constraint structure to inform the uncertainties to be estimated, enabling significant dimensionality reduction. Rather than learning joint distributions of net-load forecast errors across units, we instead directly model the one-dimensional aggregate system forecast error and two-dimensional line errors weighted by power transfer distribution factors. We evaluate our approach under both Gaussian and non-Gaussian distributions on synthetic and real-world datasets, demonstrating significant improvements in statistical accuracy and optimization performance compared to existing methods.
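As a concrete sketch of the dimensionality reduction described above (the bus count, error variances, and PTDF row here are illustrative assumptions, not values from the paper), one can replace the joint nodal error distribution with two low-dimensional quantities: the scalar aggregate system error and a one-dimensional PTDF-weighted error per line.

```python
import numpy as np

# Hypothetical 3-bus example: rather than estimating the joint 3-D
# distribution of nodal forecast errors, reduce to (i) the scalar
# aggregate error and (ii) a PTDF-weighted scalar error per line.
rng = np.random.default_rng(0)
n_buses, n_samples = 3, 10_000
omega = rng.normal(0.0, [0.02, 0.05, 0.03], size=(n_samples, n_buses))  # nodal errors

# Aggregate system forecast error (drives the reserve/balance constraint)
omega_sys = omega.sum(axis=1)

# PTDF row for one line (illustrative values, not from a real network)
ptdf_line = np.array([0.4, -0.3, 0.1])
omega_line = omega @ ptdf_line  # one-dimensional line flow error

print(round(float(omega_sys.std()), 3), round(float(omega_line.std()), 3))
```

Each probabilistic constraint then only needs the distribution of its own scalar (or low-dimensional) error, which can be estimated far more reliably than an n-bus joint distribution.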
Generative models of 3D cardiovascular anatomy can synthesize informative structures for clinical research and medical device evaluation, but face a trade-off between geometric controllability and realism. We propose CardioComposer: a programmable, inference-time framework for generating multi-class anatomical label maps from interpretable ellipsoidal primitives. These primitives represent geometric attributes such as the size, shape, and position of discrete substructures. We specifically develop differentiable measurement functions based on voxel-wise geometric moments, enabling loss-based gradient guidance during diffusion model sampling. We demonstrate that these losses can constrain individual geometric attributes in a disentangled manner and provide compositional control over multiple substructures. Finally, we show that our method is compatible with a broad range of anatomical systems containing non-convex substructures, spanning cardiac, vascular, and skeletal organs. We release our code at this https URL.
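The voxel-wise geometric moments mentioned above can be sketched in plain NumPy (a guidance implementation would compute them in an autodiff framework so gradients can steer diffusion sampling; the label map below is a made-up box, not a cardiac structure):

```python
import numpy as np

# Zeroth, first, and second geometric moments of a binary label map:
# volume, centroid, and a covariance capturing size/shape/orientation.
def moments(mask):
    coords = np.stack(np.meshgrid(*map(np.arange, mask.shape),
                                  indexing="ij"), axis=-1).astype(float)
    m0 = mask.sum()                                    # volume (0th moment)
    centroid = (mask[..., None] * coords).sum(axis=(0, 1, 2)) / m0
    centered = coords - centroid
    cov = np.einsum("xyz,xyzi,xyzj->ij", mask.astype(float),
                    centered, centered) / m0           # 2nd central moments
    return m0, centroid, cov

mask = np.zeros((8, 8, 8))
mask[2:6, 3:5, 4:7] = 1.0                              # an axis-aligned box
vol, c, cov = moments(mask)
print(vol, np.round(c, 2))
```

Because every operation is a differentiable function of the (soft) mask values, a loss such as a squared deviation between these moments and target attributes yields gradients usable during sampling.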
Code-switching automatic speech recognition (CS-ASR) presents unique challenges due to language confusion introduced by spontaneous intra-sentence switching and accent bias that blurs the phonetic boundaries. Although the constituent languages may be individually high-resource, the scarcity of annotated code-switching data further compounds these challenges. In this paper, we systematically analyze CS-ASR from both model-centric and data-centric perspectives. By comparing state-of-the-art algorithmic methods, including language-specific processing and auxiliary language-aware multi-task learning, we discuss their varying effectiveness across datasets with different linguistic characteristics. On the data side, we first investigate TTS as a data augmentation method. By varying the textual characteristics and speaker accents, we analyze the impact of language confusion and accent bias on CS-ASR. To further mitigate data scarcity and enhance textual diversity, we propose a prompting strategy by simplifying the equivalence constraint theory (SECT) to guide large language models (LLMs) in generating linguistically valid code-switching text. The proposed SECT outperforms existing methods in ASR performance and linguistic quality assessments, generating code-switching text that more closely resembles real-world code-switching text. When used to generate speech-text pairs via TTS, SECT proves effective in improving CS-ASR performance. Our analysis of both model- and data-centric methods underscores that effective CS-ASR requires strategies to be carefully aligned with the specific linguistic characteristics of the code-switching data.
In this work, we present and investigate the novel blind inverse problem of position-blind ptychography, i.e., ptychographic phase retrieval without any knowledge of scan positions, which then must be recovered jointly with the image. The motivation for this problem comes from single-particle diffractive X-ray imaging, where particles in random orientations are illuminated and a set of diffraction patterns is collected. If one uses a highly focused X-ray beam, the measurements would also become sensitive to the beam positions relative to each particle and therefore ptychographic, but these positions are also unknown. We investigate the viability of image reconstruction in a simulated, simplified 2-D variant of this difficult problem, using variational inference with modern data-driven image priors in the form of score-based diffusion models. We find that, with the right illumination structure and a strong prior, one can achieve reliable and successful image reconstructions even under measurement noise, in all but the most difficult imaging scenario evaluated.
This paper investigates a lightweight deep reinforcement learning (DRL)-assisted weighting framework for CSI-free multi-satellite positioning in LEO constellations, where each visible satellite provides one serving beam (one pilot response) per epoch. A discrete-action Deep Q-Network (DQN) learns satellite weights directly from received pilot measurements and geometric features, while an augmented weighted least squares (WLS) estimator provides physics-consistent localization and jointly estimates the receiver clock bias. The proposed hybrid design targets an accuracy-runtime trade-off rather than absolute supervised optimality. In a representative 2-D setting with 10 visible satellites, the proposed approach achieves sub-meter accuracy (0.395 m RMSE) with low computational overhead, supporting practical deployment for resource-constrained LEO payloads.
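A minimal sketch of such an augmented WLS step, assuming a toy 2-D geometry with made-up satellite positions and weights (e.g., weights that a DRL agent might output): the clock bias enters as a third unknown whose Jacobian column is all ones, and the fix is obtained by weighted Gauss-Newton iterations.

```python
import numpy as np

# Toy 2-D weighted least squares (WLS) position fix with a receiver
# clock-bias state; satellite positions and weights are illustrative only.
sats = np.array([[0., 500.], [400., 450.], [-350., 480.], [100., 600.]])
x_true, b_true = np.array([10., 20.]), 5.0   # position (m) and clock bias (m)
rho = np.linalg.norm(sats - x_true, axis=1) + b_true  # noiseless pseudoranges

w = np.array([1.0, 0.8, 0.9, 0.5])           # per-satellite weights
x, b = np.zeros(2), 0.0
for _ in range(10):                          # Gauss-Newton iterations
    r = np.linalg.norm(sats - x, axis=1)
    # Jacobian: unit line-of-sight vectors plus a ones column for the bias
    H = np.hstack([(x - sats) / r[:, None], np.ones((len(sats), 1))])
    dz = rho - (r + b)                       # pseudorange residuals
    W = np.diag(w)
    dx = np.linalg.solve(H.T @ W @ H, H.T @ W @ dz)
    x, b = x + dx[:2], b + dx[2]

print(np.round(x, 3), round(float(b), 3))
```

With noisy pseudoranges, the learned weights down-weight unreliable satellites instead of discarding them, which is what couples the DQN to the physics-consistent estimator.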
Flow-Matching (FM)-based zero-shot text-to-speech (TTS) systems exhibit high-quality speech synthesis and robust generalization capabilities. However, the speaker representation ability of such systems remains underexplored, primarily due to the lack of explicit speaker-specific supervision in the FM framework. To this end, we conduct an empirical analysis of speaker information distribution and reveal its non-uniform allocation across time steps and network layers, underscoring the need for adaptive speaker alignment. Accordingly, we propose Time-Layer Adaptive Speaker Alignment (TLA-SA), a strategy that enhances speaker consistency by jointly leveraging temporal and hierarchical variations. Experimental results show that TLA-SA substantially improves speaker similarity over baseline systems on both research- and industrial-scale datasets and generalizes well across diverse model architectures, including decoder-only language model (LM)-based and LM-free TTS systems. A demo is provided.
We propose analytical mean square error (MSE) expressions for the Kalman filter (KF) and the Kalman smoother (KS) for benchmark studies, where the true system dynamics are unknown or unavailable to the estimator. In such cases, as in benchmark evaluations for target tracking, the analysis relies on deterministic state trajectories. This setting introduces a model mismatch between the estimator and the true system, causing the covariance estimates to no longer reflect the actual estimation errors. To enable accurate performance prediction for deterministic state trajectories without relying on computationally intensive Monte Carlo simulations, we derive recursive MSE expressions with linear time complexity. The proposed framework also accounts for measurement model mismatch and provides an efficient tool for performance evaluation in benchmark studies involving long trajectories. Simulation results confirm the accuracy and computational efficiency of the proposed method.
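The flavor of such a recursion can be shown on a scalar toy problem: a Kalman filter with a mismatched random-walk model tracking a deterministic sinusoid (a simplified stand-in for the paper's framework, with made-up noise values, not its actual equations). The bias and the filtered-noise variance each obey a linear-time recursion, and their sum reproduces the Monte Carlo MSE without the Monte Carlo cost.

```python
import numpy as np

# Scalar MSE recursion for a KF whose assumed model (random walk) is
# mismatched to the *deterministic* true trajectory. Illustrative values.
R, Q = 0.04, 0.01                       # assumed noise variances
T = 60
x = np.sin(0.2 * np.arange(T))          # deterministic true trajectory
d = np.diff(x, prepend=x[0])            # per-step model mismatch x_k - x_{k-1}

# Kalman gains are data-independent, so compute them once.
P, gains = 1.0, []
for _ in range(T):
    Pp = P + Q
    K = Pp / (Pp + R)
    gains.append(K)
    P = (1 - K) * Pp

# Analytical MSE recursion: deterministic bias + filtered-noise variance.
b, s, mse = 0.0, 0.0, []
for k in range(T):
    K = gains[k]
    b = (1 - K) * (b - d[k])            # lag-induced bias
    s = (1 - K) ** 2 * s + K ** 2 * R   # measurement-noise contribution
    mse.append(s + b ** 2)

# Monte Carlo check of the final-step MSE.
rng = np.random.default_rng(1)
err = []
for _ in range(4000):
    xe = 0.0
    for k in range(T):
        z = x[k] + rng.normal(0, np.sqrt(R))
        xe = xe + gains[k] * (z - xe)
    err.append((xe - x[-1]) ** 2)
print(round(mse[-1], 4), round(float(np.mean(err)), 4))
```

The analytic pass costs O(T), versus O(T x runs) for Monte Carlo, which is the point of the paper's recursions for long benchmark trajectories.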
This paper presents a safe output regulation control strategy for a class of systems modeled by a coupled $2\times 2$ hyperbolic PDE-ODE structure, subject to fully distributed disturbances throughout the system. A state-feedback controller is developed by the nonovershooting backstepping method to simultaneously achieve exponential output regulation and enforce safety constraints on the regulated output, which is the state furthest from the control input. To handle unmeasurable states and external disturbances, a state observer and a disturbance estimator are designed. Explicit bounds on the estimation errors are derived and used to construct a robust safe regulator that accounts for the uncertainties. The proposed control scheme guarantees that: 1) If the regulated output is initially within the safe region, it remains there; otherwise, it is driven back to the safe region within a prescribed time; 2) The output tracking error converges to zero exponentially; 3) The observer accurately estimates both the distributed states and external disturbances, with estimation errors converging to zero exponentially; 4) All signals in the closed-loop system remain bounded. The effectiveness of the proposed method is demonstrated through a UAV delivery scenario with a cable-suspended payload, where the payload is regulated to track a desired reference while avoiding collisions with barriers.
Signal Temporal Logic (STL) provides a powerful framework to describe complex tasks involving temporal and logical behavior in dynamical systems. This work addresses controller synthesis for continuous-time systems subject to STL specifications and input constraints. We propose a neural network-based framework for synthesizing time-varying control barrier functions (TVCBF) and their corresponding controllers for systems to fulfill a fragment of STL specifications while respecting input constraints. We formulate barrier conditions incorporating the spatial and temporal logic of the given STL specification. We also incorporate a method to refine the time-varying set that satisfies the STL specification for the given input constraints. Additionally, we introduce a validity condition to provide formal safety guarantees across the entire state space. Finally, we demonstrate the effectiveness of the proposed approach through several simulation studies considering different STL tasks for various dynamical systems (including affine and non-affine systems).
Conventional robust H2/H-infinity control minimizes the worst-case performance, often leading to a conservative design driven by very rare parametric configurations. To reduce this conservatism while taking advantage of the stochastic properties of Monte Carlo sampling and its compatibility with parallel computing, we introduce an alternative paradigm that optimizes the controller with respect to a stochastic criterion, namely the conditional value at risk. We present the problem formulation and discuss several open challenges toward a general synthesis framework. The potential of this approach is illustrated on a mechanical system, where it significantly improves overall performance by tolerating some degradation in very rare worst-case scenarios.
Multimodal fusion is the default approach for combining heterogeneous sensor streams in industrial monitoring, yet no systematic method exists for determining \textit{when fusion degrades rather than improves} detection performance. We present an \textbf{Asymmetry-Aware Routing Framework} -- a three-step diagnostic procedure (unimodal performance gap, gate weight attribution, modality corruption testing) with formal decision criteria -- that routes multimodal systems toward the appropriate fusion strategy before deployment. We validate the framework on three datasets spanning two routing outcomes: (1)~the OHT/AGV industrial dataset (thermal + sensors, 13{,}121 samples), where the framework correctly identifies severe asymmetry (gap ratio 3.1$\times$) and recommends \textsc{cascade}; (2)~a chain conveyor fault detection scenario (audio + vibration), where moderate asymmetry leads to a \textsc{fuse} recommendation with positive fusion benefit; and (3)~the CWRU bearing dataset, providing controlled validation in both directions. Threshold sensitivity analysis across all three datasets shows that the framework's recommendations are robust to threshold perturbation, with correct routing maintained over a wide parameter plateau. Comparison against simpler diagnostics (gap ratio alone) reveals that Step~1 alone is ambiguous for moderate-asymmetry cases, demonstrating the necessity of the full protocol for reliable routing decisions.
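A hypothetical sketch of the three-step routing decision follows; the threshold values, the gap-ratio definition, and the exact step logic below are assumptions for illustration, not the paper's formal criteria.

```python
# Three-step diagnostic: (1) unimodal performance gap, (2) gate weight
# attribution, (3) corruption sensitivity of the weak modality.
def route(f1_strong, f1_weak, gate_weight_strong, corruption_drop_weak,
          gap_thresh=2.0, gate_thresh=0.8, corr_thresh=0.05):
    # Step 1: error-rate ratio between the weak and strong modality
    gap_ratio = (1 - f1_weak) / (1 - f1_strong)
    if gap_ratio < gap_thresh:
        return "fuse"                      # mild asymmetry: fusion can help
    # Step 2: does the fusion gate already ignore the weak modality?
    if gate_weight_strong > gate_thresh:
        return "cascade"
    # Step 3: does corrupting the weak modality barely hurt the fused model?
    return "cascade" if corruption_drop_weak < corr_thresh else "fuse"

# Severe asymmetry with a dominant gate -> cascade (OHT/AGV-like case)
print(route(0.97, 0.88, 0.9, 0.01))
# Moderate asymmetry -> fuse (conveyor-like case)
print(route(0.95, 0.92, 0.6, 0.10))
```

Step 1 alone would flag both cases differently only by the ratio; Steps 2-3 are what disambiguate moderate-asymmetry cases, mirroring the paper's finding that the full protocol is needed.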
To alleviate the pilot and CSI-feedback burden in 6G, channel knowledge map (CKM) has emerged as a promising approach that predicts CSI solely from user locations. Nevertheless, accurate location information is rarely available in current systems. Moreover, the uncertainty inherent to highly dynamic scenes further degrades the performance of existing schemes that typically assume quasi-static scenarios. In this paper, we propose a novel framework named location-agnostic dynamic CKM (LAD-CKM). Specifically, LAD-CKM is constructed through dynamic radio frequency (RF) radiance field rendering, which takes instantaneous uplink CSI and partial downlink CSI as inputs. To enable effective rendering, a dedicated radiator representation network (RARE-Net) is designed to capture the spatial-spectral correlations within the inputs. Furthermore, an adaptive deformation module is devised to deform the uplink CSI-based queries of RARE-Net according to instantaneous channel dynamics, thereby enhancing CSI prediction accuracy under mobility. In addition, a novel synthetic channel dataset is created in outdoor dynamic scenes via ray-tracing. Simulation results demonstrate that LAD-CKM yields significant performance gains compared with existing baselines in terms of effective data rate.
Publicly available full-field digital mammography (FFDM) datasets remain limited in size, clinical annotations, and vendor diversity, hindering the development of robust models. We introduce LUMINA, a curated, multi-vendor FFDM dataset that explicitly encodes acquisition energy and vendor metadata to capture clinically relevant appearance variations often overlooked in existing benchmarks. This dataset contains 1824 images from 468 patients (960 benign, 864 malignant), with pathology-confirmed labels, BI-RADS assessments, and breast-density annotations. LUMINA spans six acquisition systems and includes both high- and low-energy imaging styles, enabling systematic analysis of vendor- and energy-induced domain shifts. To address these variations, we propose a foreground-only pixel-space alignment method (``energy harmonization'') that maps images to a low-energy reference while preserving lesion morphology. We benchmark CNN and transformer models on three clinically relevant tasks: diagnosis (benign vs. malignant), BI-RADS classification, and density estimation. Two-view models consistently outperform single-view models. EfficientNet-B0 achieves an AUC of 93.54% for diagnosis, while Swin-T achieves the best macro-AUC of 89.43% for density prediction. Harmonization improves performance across architectures and produces more localized Grad-CAM responses. Overall, LUMINA provides (1) a vendor-diverse benchmark and (2) a model-agnostic harmonization framework for reliable and deployable mammography AI.
Accurate classification of lung diseases from chest CT scans plays an important role in computer-aided diagnosis systems. However, medical imaging datasets often suffer from severe class imbalance, which may significantly degrade the performance of deep learning models, especially for minority disease categories. To address this issue, we propose a gender-aware two-stage lung disease classification framework. The proposed approach explicitly incorporates gender information into the disease recognition pipeline. In the first stage, a gender classifier is trained to predict the patient's gender from CT scans. In the second stage, the input CT image is routed to a corresponding gender-specific disease classifier to perform final disease prediction. This design enables the model to better capture gender-related imaging characteristics and alleviate the influence of imbalanced data distribution. Experimental results demonstrate that the proposed method improves the recognition performance for minority disease categories, particularly squamous cell carcinoma, while maintaining competitive performance on other classes.
Images degraded by geometric distortions pose a significant challenge to imaging and computer vision tasks such as object recognition. Deep learning-based imaging models usually fail to perform accurately on geometrically distorted images. In this paper, we propose the deformation-invariant neural network (DINN), a framework to address the problem of imaging tasks for geometrically distorted images. The DINN outputs consistent latent features for images that are geometrically distorted but represent the same underlying object or scene. The idea of DINN is to incorporate a simple component, called the quasiconformal transformer network (QCTN), into other existing deep networks for imaging tasks. The QCTN is a deep neural network that outputs a quasiconformal map, which can be used to transform a geometrically distorted image into an improved version that is closer to the distribution of natural or good images. It first outputs a Beltrami coefficient, which measures the quasiconformality of the output deformation map. By controlling the Beltrami coefficient, the local geometric distortion under the quasiconformal mapping can be controlled. The QCTN is lightweight and simple, which can be readily integrated into other existing deep neural networks to enhance their performance. Leveraging our framework, we have developed an image classification network that achieves accurate classification of distorted images. Our proposed framework has been applied to restore geometrically distorted images by atmospheric turbulence and water turbulence. DINN outperforms existing GAN-based restoration methods under these scenarios, demonstrating the effectiveness of the proposed framework. Additionally, we apply our proposed framework to the 1:1 verification of human face images under atmospheric turbulence and achieve satisfactory performance, further demonstrating the efficacy of our approach.
Language models play a central role in automatic speech recognition (ASR), yet most methods rely on text-only models unaware of ASR error patterns. Recently, large language models (LLMs) have been applied to ASR correction, but introduce latency and hallucination concerns. We revisit ASR error correction with compact seq2seq models, trained on ASR errors from real and synthetic audio. To scale training, we construct synthetic corpora via cascaded TTS and ASR, finding that matching the diversity of realistic error distributions is key. We propose correction-first decoding, where the correction model generates candidates rescored using ASR acoustic scores. With 15x fewer parameters than LLMs, our model achieves 1.5/3.3% WER on LibriSpeech test-clean/other, outperforms LLMs, generalizes across ASR architectures (CTC, Seq2seq, Transducer) and diverse domains, and provides precise corrections in the low-error regime where LLMs struggle.
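The correction-first decoding idea can be caricatured as interpolating the correction model's score with the ASR acoustic score when re-ranking candidates; the interpolation form, the weight, and the scores below are assumptions for illustration, not the paper's method.

```python
# Correction-first rescoring sketch: the compact correction model proposes
# candidates, which are re-ranked against the ASR acoustic evidence so the
# corrector cannot hallucinate away an acoustically well-supported word.
def rescore(candidates, lam=0.3):
    # candidates: list of (hypothesis, correction_logprob, acoustic_logprob)
    return max(candidates,
               key=lambda c: (1 - lam) * c[1] + lam * c[2])[0]

cands = [("the cat sat", -1.2, -4.0),
         ("the cab sat", -1.5, -2.0),   # acoustically likelier hypothesis
         ("a cat sat",   -0.9, -7.5)]   # fluent but acoustically implausible
print(rescore(cands))
```

Anchoring the final choice to acoustic scores is one way a small corrector can stay precise in the low-error regime where unconstrained LLM rewriting tends to over-correct.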
Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction. However, some applications involve heterogeneous data that vary in quality due to noise characteristics associated with each data sample. Heteroscedastic methods aim to deal with such mixed data quality. This paper develops a subspace learning method, named ALPCAH, that can estimate the sample-wise noise variances and use this information to improve the estimate of the subspace basis associated with the low-rank structure of the data. Our method makes no distributional assumptions of the low-rank component and does not assume that the noise variances are known. Further, this method uses a soft rank constraint that does not require subspace dimension to be known. Additionally, this paper develops a matrix factorized version of ALPCAH, named LR-ALPCAH, that is much faster and more memory efficient at the cost of requiring subspace dimension to be known or estimated. Simulations and real data experiments show the effectiveness of accounting for data heteroscedasticity compared to existing algorithms. Code available at this https URL.
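Why accounting for heteroscedasticity helps can be seen in a toy comparison: plain PCA versus inverse-variance-weighted PCA on mixed-quality samples with a planted 1-D subspace. Unlike ALPCAH, this sketch assumes the two noise variances are known; all numbers are illustrative.

```python
import numpy as np

# Mixed-quality data: a few clean samples and many very noisy ones,
# all lying (up to noise) on a planted 1-D subspace spanned by u.
rng = np.random.default_rng(0)
u = np.array([3.0, 4.0]) / 5.0                    # planted unit basis
n_clean, n_noisy = 50, 500
coeffs = rng.normal(0, 2.0, n_clean + n_noisy)
X = np.outer(coeffs, u)
X[:n_clean] += rng.normal(0, 0.1, (n_clean, 2))   # good samples
X[n_clean:] += rng.normal(0, 3.0, (n_noisy, 2))   # poor samples

def top_dir(A):
    # Dominant right singular vector = leading principal direction.
    return np.linalg.svd(A, full_matrices=False)[2][0]

err_plain = 1 - abs(top_dir(X) @ u)               # plain PCA alignment error
w = np.r_[np.full(n_clean, 1 / 0.1**2), np.full(n_noisy, 1 / 3.0**2)]
err_wtd = 1 - abs(top_dir(np.sqrt(w)[:, None] * X) @ u)
print(err_wtd < err_plain)
```

Scaling each row by its inverse noise standard deviation whitens the noise, so the clean samples dominate the subspace estimate; ALPCAH's contribution is achieving this effect when the variances must themselves be estimated.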
Elliptically symmetric distributions are a classic example of a semiparametric model where the location vector and the scatter matrix (or a parameterization of them) are the two finite-dimensional parameters of interest, while the density generator represents an \textit{infinite-dimensional nuisance} term. This basic representation of the elliptic model can be made more accurate, rich, and flexible by considering additional \textit{finite-dimensional nuisance} parameters. Our aim is therefore to investigate the deep and counter-intuitive links between statistical efficiency in estimating the parameters of interest and the presence of both finite- and infinite-dimensional nuisance parameters. Previous seminal works have addressed this problem by leveraging a general result: if the statistical model has a specific group invariance, then the projection operator onto the semiparametric nuisance tangent space can be asymptotically expressed as a conditional expectation with respect to the maximal invariant sub-$\sigma$ algebra. In this article, we show that, for the statistical model of elliptical distributions, the projection operator can be explicitly computed without relying on the above-mentioned asymptotic approximation. This allows us to obtain original results also for the case in which the location vector and the scatter matrix are parameterized by a finite-dimensional vector that can be partitioned into two sub-vectors: one containing the parameters of interest and the other containing the nuisance parameters. As an example, we illustrate how the obtained results can be applied to the well-known ``low-rank'' parameterization. Furthermore, while the theoretical analysis will be developed for Real Elliptically Symmetric (RES) distributions, we show how to extend our results to the case of Circular and Non-Circular Complex Elliptically Symmetric (C-CES and NC-CES) distributions.
Shared control combines human intention with autonomous decision-making. At the low level, the primary goal is to maintain safety regardless of the user's input to the system. However, existing shared control methods, based on, e.g., Model Predictive Control, Control Barrier Functions, or learning-based control, often face challenges with feasibility, scalability, and mixed constraints. To address these challenges, we propose a Constraint-Aware Assistive Controller that computes control actions online while ensuring recursive feasibility, strict constraint satisfaction, and minimal deviation from the user's intent. It also accommodates a structured class of non-convex constraints common in real-world settings. We leverage Robust Controlled Invariant Sets for recursive feasibility and a Mixed-Integer Quadratic Programming formulation to handle non-convex constraints. We validate the approach through a large-scale user study with 66 participants, one of the most extensive in shared control research, using a simulated environment to assess task load, trust, and perceived control, in addition to performance. The results show consistent improvements across all these aspects without compromising safety and user intent. Additionally, a real-world experiment on a robotic manipulator demonstrates the framework's applicability under bounded disturbances, ensuring safety and collision-free operation.
As one of the key usage scenarios for the sixth generation (6G) wireless networks, integrated sensing and communication (ISAC) provides an efficient framework to achieve simultaneous wireless sensing and communication. However, traditional wireless sensing techniques mainly rely on the line-of-sight (LoS) assumption, i.e., that the sensing targets are directly visible to both the sensing transmitter and receiver. This hinders the application of ISAC systems in complex environments such as the urban low-altitude airspace, which usually suffers from signal blockage and non-line-of-sight (NLoS) multi-path propagation. To address this challenge, in this paper, we propose a novel approach to enable environment-aware NLoS ISAC by leveraging the new technique called channel knowledge map (CKM), which was originally proposed for environment-aware wireless communications. One major novelty of our proposed method is that the same CKM built for wireless communication can be directly used to enable NLoS wireless sensing, thus enjoying the benefits of ``killing two birds with one stone''. To this end, the sensing targets are treated as virtual user equipment (UE), and the wireless communication channel priors are transformed into the sensing channel priors, allowing one single CKM to serve dual purposes. We illustrate our proposed framework by a specific CKM called \emph{channel angle-delay map} (CADM). Specifically, the proposed framework utilizes CADM to derive angle-delay priors of the sensing channel by exploiting the relationship between communication and sensing angle-delay distributions, enabling sensing target localization in the challenging NLoS environment. Extensive simulation results demonstrate significant performance improvements over classic geometry-based sensing methods, which is further validated by Cramér-Rao Lower Bound (CRLB) analysis.
This paper introduces and solves a structural controllability problem for ensembles of switched linear systems. All individual systems in the ensemble are sparse and governed by the same sparsity pattern, and undergo switching among subsystems by following the same switching sequence. The controllability of an ensemble system describes the ability to use a common control input to simultaneously steer every individual system. A sparsity pattern is called structurally controllable for a pair \((k,q)\) if it admits a controllable ensemble of \(q\) individual systems with at most \(k\) subsystems. We derive a necessary and sufficient condition for a sparsity pattern to be structurally controllable for a given \((k,q)\), and characterize when a sparsity pattern admits a finite \(k\) that guarantees structural controllability for \((k,q)\) with arbitrary \(q\). Compared with the linear time-invariant ensemble case, this second condition is strictly weaker. We further show that these conditions have natural connections with maximum flow, and hence can be checked by polynomial algorithms. Specifically, the time complexity of deciding structural controllability is \(O(n^3)\) and the complexity of computing the smallest number of subsystems needed is \(O(n^3 \log n)\), with \(n\) the dimension of each individual system.
This monograph introduces a novel approach to polyphonic music generation by addressing the "Missing Middle" problem through structural inductive bias. Focusing on Beethoven's piano sonatas as a case study, we empirically verify the independence of pitch and hand attributes using normalized mutual information (NMI=0.167) and propose the Smart Embedding architecture, achieving a 48.30% reduction in parameters. We provide rigorous mathematical proofs using information theory (negligible loss bounded at 0.153 bits), Rademacher complexity (28.09% tighter generalization bound), and category theory to demonstrate improved stability and generalization. Empirical results show a 9.47% reduction in validation loss, confirmed by SVD analysis and an expert listening study (N=53). This dual theoretical and applied framework bridges gaps in AI music generation, offering verifiable insights for mathematically grounded deep learning.
Dynamic spectrum sharing (DSS) among multi-operator low Earth orbit (LEO) mega-constellations is essential for coexistence, yet prevailing policies focus almost exclusively on interference mitigation, leaving geographic equity largely unaddressed. This work investigates whether conventional DSS approaches inadvertently exacerbate the rural digital divide. Incorporating Keplerian orbital dynamics, inter-beam co-channel interference, and three real-world constellation geometries (Starlink, OneWeb, Kuiper), we conduct large-scale, 3GPP-compliant non-terrestrial network (NTN) simulations across 20 orbital snapshots spanning 10~minutes of satellite motion. The results uncover a stark and persistent structural bias: SNR-priority scheduling induces a $1.84\times$ mean urban--rural access disparity, with temporal fluctuations reaching $3.9\times$ during favorable interference conditions. Counter-intuitively, increasing system bandwidth amplifies rather than alleviates this gap. To remedy this, we propose FairShare, a lightweight, quota-based framework that enforces geographic fairness. FairShare not only reverses the bias, achieving an affirmative disparity ratio of $\Delta_{\text{geo}} = 0.68\times$ with zero variance across all orbital snapshots and interference conditions, but also reduces scheduler runtime by 3.3\%. This demonstrates that algorithmic fairness can be achieved without trading off efficiency or complexity, and that it remains invariant to physical-layer dynamics. Our work provides regulators with both a diagnostic metric for auditing fairness and a practical, enforceable mechanism for equitable spectrum governance in next-generation satellite networks.
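A toy sketch of quota-based fair scheduling in the spirit of FairShare (the quota fraction, SNR values, and tie-breaking below are illustrative assumptions, not the paper's parameters): reserve a fixed share of beam-slots for rural users before filling the remainder by SNR priority.

```python
# Quota-based scheduler: rural users get a guaranteed slot share first,
# then the remaining slots fall back to plain SNR-priority scheduling.
def schedule(users, n_slots, rural_quota=0.4):
    """users: list of (user_id, snr_db, is_rural); returns scheduled ids."""
    rural = sorted([u for u in users if u[2]], key=lambda u: -u[1])
    urban = sorted([u for u in users if not u[2]], key=lambda u: -u[1])
    k = min(len(rural), int(round(rural_quota * n_slots)))
    chosen = rural[:k]                        # quota slots go to rural users
    rest = sorted(rural[k:] + urban, key=lambda u: -u[1])
    chosen += rest[: n_slots - k]             # remaining slots by SNR
    return [u[0] for u in chosen]

users = [("u1", 18, False), ("u2", 16, False), ("u3", 15, False),
         ("r1", 9, True), ("r2", 7, True), ("u4", 14, False)]
print(schedule(users, 5))
```

Under pure SNR priority the low-SNR rural user r2 would be starved whenever enough urban users are present; the quota makes rural admission independent of instantaneous interference conditions, which is consistent with the zero-variance disparity ratio reported above.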
Background: Pleuroparenchymal fibroelastosis (PPFE) is an upper lobe predominant fibrotic lung abnormality associated with increased mortality in established interstitial lung disease. However, the clinical significance of radiologic PPFE progression in lung cancer screening (LCS) populations remains unclear. Methods: We analysed longitudinal low-dose CT scans and clinical data from two LCS studies: National Lung Screening Trial (NLST; n=7,980); SUMMIT study (n=8,561). An automated algorithm quantified PPFE volume on baseline and follow-up scans. Annualised change in PPFE was derived and dichotomised using a distribution-based threshold to define progressive PPFE. Associations between progressive PPFE and mortality were evaluated using Cox proportional hazards models adjusted for demographic and clinical variables. In the SUMMIT cohort, associations between progressive PPFE and clinical outcomes were assessed using incidence rate ratios (IRR) and odds ratios (OR). Findings: Progressive PPFE was independently associated with mortality in both LCS cohorts (NLST: Hazard Ratio (HR)=1.25, 95% Confidence Interval (CI): 1.01--1.56, p=0.042; SUMMIT: HR=3.14, 95% CI: 1.66--5.97, p<0.001). Within SUMMIT, progressive PPFE was strongly associated with higher respiratory admissions (IRR=2.79, p<0.001), increased antibiotic and steroid use (IRR=1.55, p=0.010), and showed a trend towards higher modified medical research council scores (OR=1.40, p=0.055). Interpretation: Radiologic PPFE progression is independently associated with mortality across two large LCS cohorts, and with adverse clinical outcomes. Quantitative assessment of PPFE progression may provide a clinically relevant imaging biomarker to identify individuals at increased risk of respiratory morbidity within LCS programmes.
Objectively verifying the generative mechanism of consciousness is extremely difficult because of its subjective nature. As long as theories of consciousness focus solely on its generative mechanism, developing a theory remains challenging. We believe that broadening the theoretical scope and enhancing theoretical unification are necessary to establish a theory of consciousness. This study proposes seven questions that theories of consciousness should address: phenomena, self, causation, state, function, contents, and universality. The questions were designed to examine the functional aspects of consciousness and its applicability to system design. Next, we examine how our proposed Dual-Laws Model (DLM) can address these questions. Based on our theory, we anticipate two unique features of a conscious system: autonomy in constructing its own goals and cognitive decoupling from external stimuli. We contend that systems with these capabilities differ fundamentally from machines that merely follow human instructions. This makes a design theory that enables highly moral behavior indispensable.
In existing Audio-Visual Speech Enhancement (AVSE) methods, objectives such as Scale-Invariant Signal-to-Noise Ratio (SI-SNR) and Mean Squared Error (MSE) are widely used; however, they often correlate poorly with perceptual quality and provide limited interpretability for optimization. This work proposes a reinforcement learning-based AVSE framework with a Large Language Model (LLM)-based interpretable reward model. An audio LLM generates natural language descriptions of enhanced speech, which a sentiment analysis model converts into a 1-5 rating score serving as the Proximal Policy Optimization (PPO) reward for fine-tuning a pretrained AVSE model. Compared with scalar metrics, LLM-generated feedback is semantically rich and explicitly describes improvements in speech quality. Experiments on the 4th COG-MHEAR AVSE Challenge (AVSEC-4) dataset show that the proposed method outperforms a supervised baseline and a DNSMOS-based RL baseline in PESQ, STOI, neural quality metrics, and subjective listening tests.
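The description-to-reward conversion can be sketched as below. The keyword-based scorer is a deliberately simple stand-in for the paper's sentiment analysis model, and the normalisation of the 1-5 rating to a [-1, 1] PPO reward is an illustrative assumption.

```python
def rate_description(text: str) -> int:
    """Map a natural-language quality description to a 1-5 rating
    (toy keyword scorer standing in for a sentiment model)."""
    scores = {"excellent": 5, "clear": 4, "acceptable": 3,
              "muffled": 2, "unintelligible": 1}
    hits = [v for k, v in scores.items() if k in text.lower()]
    return max(hits) if hits else 3  # neutral default when no cue found

def ppo_reward(rating: int) -> float:
    """Centre and scale the 1-5 rating into a reward in [-1, 1]."""
    return (rating - 3) / 2.0
```

A real system would replace `rate_description` with the audio LLM plus sentiment model; only the scalar `ppo_reward` output feeds the policy update.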
Large-scale MIMO detection remains challenging because exact or near-maximum-likelihood search is difficult to scale, while available quantum resources are insufficient for directly solving full-size detection instances with the Quantum Approximate Optimization Algorithm (QAOA). This paper therefore proposes a Block-QAOA-Aware MIMO Detector (BQA-MD), whose primary purpose is to reorganize the detection chain so that it becomes compatible with limited-qubit local quantum subproblems. Specifically, BQA-MD combines block-QAOA-aware preprocessing in the QR domain, a standards-consistent blockwise 5G NR Gray-HUBO interface, an MMSE-induced dynamically regularized blockwise objective, and K-best candidate propagation. Within this framework, fixed-size block construction gives every local subproblem a uniform circuit width and parameter dimension, which in turn enables parameter-transfer QAOA as a practical realization strategy for structurally matched local subproblems. Experiments are conducted on a 16x16 Rayleigh MIMO system with 16-QAM using classical simulation of the quantum subroutine. The results show that the regularized blockwise detector improves upon its unregularized counterpart, validating the adopted blockwise objective and the block-QAOA-aware design rationale. They also show that the parameter-transfer QAOA detector nearly matches the regularized blockwise exhaustive reference and clearly outperforms direct-training QAOA in BER, thereby supporting parameter reuse as the preferred QAOA realization strategy within the proposed framework. In the tested setting, MMSE remains slightly better in the low-SNR region, whereas the parameter-transfer QAOA detector becomes highly competitive from the medium-SNR regime onward.
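The fixed-size block construction can be illustrated as follows. This sketch only shows the bookkeeping: how symbol indices are partitioned into uniform blocks so that every local QAOA subproblem has the same circuit width. The 4 bits/symbol figure follows from 16-QAM; the reverse (back-substitution) processing order in the QR domain and the function names are assumptions for illustration.

```python
def make_blocks(n_symbols: int, block_size: int, bits_per_symbol: int = 4):
    """Partition symbol indices into fixed-size blocks of uniform
    qubit width (block_size * bits_per_symbol qubits each)."""
    assert n_symbols % block_size == 0, "fixed-size blocks need divisibility"
    blocks = [list(range(i, i + block_size))
              for i in range(0, n_symbols, block_size)]
    qubits_per_block = block_size * bits_per_symbol
    # In the QR domain, blocks would be detected last-to-first
    # (back-substitution order), each feeding K-best candidates onward.
    processing_order = list(reversed(range(len(blocks))))
    return blocks, qubits_per_block, processing_order
```

Uniform block width is exactly what makes parameter-transfer QAOA practical: optimized circuit parameters from one block's subproblem can be reused on every structurally matched block.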
All classifiers, including state-of-the-art vision models, possess invariants, partially rooted in the geometry of their linear mappings. These invariants, which reside in the null space of the classifier, induce equivalent sets of inputs that map to identical outputs. The semantic content of these invariants remains vague, as existing approaches struggle to provide human-interpretable information. To address this gap, we present Semantic Interpretation of the Null-space Geometry (SING), a method that constructs images equivalent with respect to the network and assigns semantic interpretations to the available variations. We use a mapping from network features to multi-modal vision-language models, which allows us to obtain natural language descriptions and visual examples of the induced semantic shifts. SING can be applied to a single image, uncovering local invariants, or to sets of images, enabling a breadth of statistical analysis at the class and model levels. For example, our method reveals that ResNet50 leaks relevant semantic attributes to the null space, whereas DinoViT, a ViT pretrained with self-supervised DINO, is superior in maintaining class semantics across the invariant space.
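The underlying null-space invariance is a simple linear-algebra fact, illustrated below with a toy 1x2 map standing in for a model's final linear layer (the map and the shift magnitude are hypothetical): shifting an input along a null-space direction leaves the output unchanged, so the shifted inputs form an equivalent set. SING's contribution is interpreting such directions semantically, which this sketch does not attempt.

```python
def apply_linear(W, x):
    """Apply a linear map given as a list of rows to a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Toy map R^2 -> R^1; its null space is spanned by (1, -1).
W = [[1, 1]]
x = [3, 7]
null_dir = [1, -1]
# Shift x along the null-space direction: output is unchanged,
# so x and x_shifted are "equivalent" inputs for this map.
x_shifted = [a + 2 * b for a, b in zip(x, null_dir)]
```

For a real classifier, the analogous directions come from the null space of a high-dimensional feature-to-logit mapping rather than a hand-written 1x2 matrix.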