As electric vehicles (EVs) are increasingly adopted as platforms for connected and automated vehicles (CAVs), enhancing their energy efficiency becomes critical. With the emergence of vehicle-to-vehicle (V2V) communication, cooperative adaptive cruise control (CACC) offers improved traffic flow, safety, and energy efficiency by enabling real-time coordination among EVs. However, conventional CACC algorithms neglect acceleration and regenerative braking dynamics in their implementation. To address this gap, this paper proposes a third-order dynamic model for EVs derived from real-world experimental data. We also propose a novel, practical, and energy-efficient Lyapunov-based CACC controller explicitly designed for EV platoons. The proposed controller requires lower control gains while ensuring string stability and energy efficiency. To validate its effectiveness, we conduct evaluations in both simulation and experimental environments, demonstrating that our approach reduces velocity fluctuations, maintains string stability at lower headway times, and improves the energy efficiency of the CACC platoon by up to 38.5% compared to a baseline CACC.
Conversational AI has made significant progress, yet generating expressive and controllable text-to-speech (TTS) remains challenging. Specifically, controlling fine-grained voice styles and emotions is notoriously difficult and typically requires massive amounts of heavily annotated training data. To overcome this data bottleneck, we present a scalable, data-efficient cascaded framework that pairs textual style tokens with human-curated, high-quality audio prompts. This approach enables single-shot adaptation to fine-grained speaking styles and character voices. In the context of TTS, this audio prompting acts as In-Context Learning (ICL), guiding the model's prosody and timbre without requiring massive parameter updates or large-scale retraining. To further enhance generation quality and mitigate hallucinations, we introduce a novel ICL-based online reinforcement learning (RL) strategy. This strategy directly optimizes the autoregressive prosody model using subjective aesthetic rewards while being constrained by Connectionist Temporal Classification (CTC) alignment to preserve intelligibility. Comprehensive human perception evaluations demonstrate significant improvements in both the naturalness and expressivity of the synthesized speech, establishing the efficacy of our ICL-based online RL approach.
This paper studies stabilization and its corresponding closed-loop region-of-attraction (ROA) for homogeneous polynomial dynamical systems whose nonlinear term admits an orthogonally decomposable (ODECO) tensor representation. While recent tensor-based results provide explicit solutions and sharp global characterizations for open-loop ODECO systems, closed-loop synthesis and computable ROA estimates are still often dominated by local linearization or Lyapunov/SOS (sum of squares) methods, which can be conservative and computationally demanding. We propose a structure-preserving linear feedback design that shares the ODECO eigenbasis of the system's tensor, thereby enabling closed-form trajectory expressions, explicit convergence/escape thresholds, and sharp ROA characterizations. Under mild conditions, we further derive robustness/ISS-type bounds for bounded disturbances. Numerical examples validate the theoretical results.
Winner-take-all (WTA)-type selection is a fundamental mechanism in networked competition, yet its dependence on higher-order interactions remains insufficiently understood. We study Lotka--Volterra competitive dynamics on higher-order networks, where classical pairwise inhibition is augmented by multi-way interaction terms induced by hyperedges of uniform hypergraphs. The proposed model exhibits multiple competitive outcomes, including WTA, winner-share-all (WSA), and variant winner-take-all (VWTA). The existence, uniqueness, and stability of equilibria are rigorously proved through mathematical analysis, which relies on classical stability theory and recent advances in tensor algebra. We show that the eventual selection outcome is relatively insensitive to the hyperedge order and the specific higher-order coupling structure, and is instead determined by a small set of interpretable scalar parameters, such as the ratio between self-inhibition and lateral inhibition and the external inputs. Numerical experiments support the theory by showing that higher-order interactions affect convergence and steady states, yet yield a similar outcome taxonomy (WTA/WSA/VWTA) as in standard graphs. These results provide a network-scientific explanation of the robustness of WTA-type outcomes under complex group interactions and offer principled guidance for designing selection mechanisms on higher-order networks.
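As a toy illustration of the WTA regime described above, the sketch below Euler-integrates a small Lotka--Volterra competition in which a cubic term stands in for hyperedge-induced higher-order coupling. The dynamics, parameter values, and the specific coupling form are illustrative assumptions, not the paper's model.

```python
import numpy as np

def lv_higher_order(x0, inputs, alpha=1.0, beta=2.0, gamma=0.5,
                    dt=0.01, steps=20000):
    """Euler-integrate a toy LV competition with a cubic higher-order term.

    dx_i/dt = x_i * (I_i - alpha*x_i - beta*sum_{j != i} x_j
                         - gamma*sum_{j != i} x_j^2)

    The quadratic 'others' term is a stand-in for hyperedge-induced
    coupling; beta > alpha puts the system in a WTA regime.
    """
    x = np.asarray(x0, dtype=float)
    I = np.asarray(inputs, dtype=float)
    for _ in range(steps):
        others = x.sum() - x                  # pairwise lateral inhibition
        others_sq = (x ** 2).sum() - x ** 2   # higher-order (cubic) inhibition
        x = x + dt * x * (I - alpha * x - beta * others - gamma * others_sq)
        x = np.maximum(x, 0.0)                # populations stay nonnegative
    return x
```

With strong lateral inhibition (beta > alpha), only the unit with the largest external input survives, mirroring the WTA outcome; the higher-order term alters the transient but not which unit wins, consistent with the taxonomy robustness reported above.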
Reliable positioning is essential for Uncrewed Aerial Vehicles (UAVs) in safety-critical urban operations, yet achieving sub-meter accuracy under stringent latency constraints remains challenging. While the 3rd Generation Partnership Project (3GPP) specifies repeated Positioning Reference Signal (PRS) transmissions for accurate Time Difference of Arrival (TDoA) measurements, denoising techniques specifically tailored for extremely limited measurement sequences within 3GPP frameworks remain underexplored. We propose the Adaptive Gain Exponential Smoother (AGES), a lightweight filter combining exponentially weighted averaging with adaptive gains informed by 3GPP measurement quality reports. Simulations demonstrate that AGES achieves a 30-40% reduction in positioning error with only 3-5 repeated measurements while maintaining Fifth Generation New Radio (5G-NR) infrastructure compatibility.
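As a rough illustration of the idea, the following sketch implements an exponentially weighted smoother whose gain adapts to a per-measurement quality score. The 0-to-1 quality scale and the gain bounds are assumptions standing in for the 3GPP quality reports; this is not the paper's exact filter.

```python
def ages_filter(measurements, qualities, g_min=0.2, g_max=0.9):
    """Exponential smoother whose gain adapts to a measurement quality score.

    `qualities` are assumed to lie in [0, 1] (a stand-in for normalized
    3GPP measurement quality reports); a higher score makes the filter
    trust the new sample more. Gain bounds g_min/g_max are illustrative.
    """
    estimate = float(measurements[0])
    for z, q in zip(measurements[1:], qualities[1:]):
        gain = g_min + (g_max - g_min) * q            # adaptive blending weight
        estimate = gain * z + (1.0 - gain) * estimate  # exponential update
    return estimate
```

With only a handful of repeated measurements, such a filter down-weights low-quality samples instead of averaging them uniformly, which is the mechanism behind the reported error reduction.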
Integrated Sensing and Communications (ISAC) enables trajectory sharing that enhances beamforming, resource allocation, and cooperative perception, yet raises fundamental privacy concerns under the General Data Protection Regulation (GDPR) data minimisation principle. This paper proposes a Fisher Information Density (FID)-constrained trajectory sharing framework that enforces a local lower bound on estimation uncertainty, providing hard, quantifiable privacy guarantees by construction. Unlike fixed-noise approaches, the proposed method bounds the Privacy Leak Ratio (PLR) regardless of sensing power or adversarial post-processing, ensuring that no trajectory segment can be reconstructed beyond a prescribed accuracy threshold. Simulations on the OpenTraj dataset demonstrate that the framework keeps the average PLR below 20-25% and the maximum leakage segment duration under 2-2.5 s, while preserving data utility for downstream tasks such as movement prediction. The resulting criterion is interpretable, model-agnostic, and compatible with GDPR-compliant ISAC system design.
This paper presents the design and implementation of an asynchronous delta modulator as a spike encoder for event-driven neural recording in a 65 nm CMOS process. The proposed neuromorphic front-end converts analog signals into discrete, asynchronous ON and OFF spikes, effectively compressing continuous biopotentials into spike trains compatible with spiking neural networks (SNNs). Its asynchronous operation enables seamless integration with neuromorphic architectures for real-time decoding in closed-loop brain-machine interfaces (BMIs). Measurement results from silicon demonstrate an energy consumption of 60.73 nJ/spike, an F1-score of 80% compared to a behavioral model of the asynchronous delta modulator, and a compact pixel area of 73.45 $\mu$m $\times$ 73.64 $\mu$m.
Purpose: To develop and evaluate a deep learning (DL) method for free-breathing phase-sensitive inversion recovery (PSIR) late gadolinium enhancement (LGE) cardiac MRI that produces diagnostic-quality images from a single acquisition over two heartbeats, eliminating the need for 8 to 24 motion-corrected (MOCO) signal averages. Materials and Methods: Raw data comprising 800,653 slices from 55,917 patients, acquired on 1.5T and 3T scanners across multiple sites from 2016 to 2024, were used in this retrospective study. Data were split by patient: 640,000 slices (42,822 patients) for training and the remainder for validation and testing, without overlap. The training and testing data were from different institutions. PSIRNet, a physics-guided DL network with 845 million parameters, was trained end-to-end to reconstruct PSIR images with surface coil correction from a single interleaved IR/PD acquisition over two heartbeats. Reconstruction quality was evaluated using SSIM, PSNR, and NRMSE against MOCO PSIR references. Two expert cardiologists performed an independent qualitative assessment, scoring image quality on a 5-point Likert scale across bright blood, dark blood, and wideband LGE variants. Paired superiority and equivalence (margin = 0.25 Likert points) were tested using exact Wilcoxon signed-rank tests at a significance level of 0.05 using R version 4.5.2. Results: Both readers rated single-average PSIRNet reconstructions superior to MOCO PSIR for dark blood LGE (conservative P = .002); for bright blood and wideband, one reader rated it superior and the other confirmed equivalence (all P < .001). Inference required approximately 100 msec per slice versus more than 5 sec for MOCO PSIR. Conclusion: PSIRNet produces diagnostic-quality free-breathing PSIR LGE images from a single acquisition, enabling 8- to 24-fold reduction in acquisition time.
Convolutional neural networks (CNNs) have emerged as a powerful tool for automatic modulation classification (AMC) by directly extracting discriminative features from raw in-phase and quadrature (I/Q) signals. However, deploying CNN-based AMC models on IoT devices remains challenging because of limited computational resources, energy constraints, and real-time processing requirements. Early-exit (EE) strategies alleviate this burden by allowing qualified samples to terminate inference at an EE branch. However, our empirical analysis reveals a critical limitation of existing confidence-based EE strategies: they predominantly select samples whose early and final predictions are correct and consistent, while failing to capture whether deeper inference can provide a tangible accuracy gain. To address this limitation, we propose BEACON, a Benefit-Aware Early-Exit framework for AMC via recoverability prediction. BEACON introduces a benefit-aware EE criterion that explicitly predicts recoverable errors, defined as instances where the final-exit branch corrects an initial early-branch misclassification. Using only short-branch observables, we design a lightweight benefit-aware predictor (LBAP) to implement this criterion, estimating the likelihood of such recoverable cases and triggering deeper inference only when an accuracy gain is expected. Extensive experiments on ResNet-18-based AMC models demonstrate that the proposed approach consistently outperforms state-of-the-art baselines, achieving a superior accuracy-computation tradeoff across diverse EE threshold settings and signal-to-noise ratio regimes. These findings validate the effectiveness of the benefit-aware criterion and its practicality for energy-efficient on-device AMC under stringent resource constraints.
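The benefit-aware exit rule can be contrasted with a pure confidence rule in a few lines. The sketch below is a schematic of the decision logic only: `recoverability` stands in for the output of the paper's lightweight benefit-aware predictor (LBAP), and both thresholds are hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def benefit_aware_exit(early_logits, recoverability,
                       conf_thresh=0.9, rec_thresh=0.5):
    """Return True to exit at the early branch, False to run the full model.

    Unlike a pure confidence rule, the sample continues to the final exit
    whenever the predictor deems an early misclassification likely to be
    corrected by deeper inference (a 'recoverable error').
    """
    confident = max(softmax(early_logits)) >= conf_thresh
    likely_recoverable = recoverability >= rec_thresh
    return confident and not likely_recoverable
```

A confidence-only rule would exit on every high-confidence sample; gating on the recoverability estimate is what lets deeper inference run exactly when an accuracy gain is expected.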
We leverage historical outage data to quantify the resilience benefits of undergrounding a circuit. The historical performance of the overhead circuit is compared to the performance if the circuit had been undergrounded in the past. The number of outages, customers affected, outage duration, and customer hours lost are used as metrics to quantify the benefits of undergrounding. Results show 75% and 78% reductions in customer hours lost per year for two selected circuits, as well as a significant reduction in the average number of outages and customers affected per year, highlighting the advantages of undergrounding. The benefits of investments that result in 10% faster outage restoration are also calculated by rerunning history with the faster restoration included.
Safety-critical control systems, such as spacecraft performing proximity operations, must provide formal safety guarantees despite stochastic uncertainties from state estimation and unmodeled dynamics. Although Control Barrier Functions (CBFs) have been extended to stochastic systems, existing approaches typically face a trade-off between the tightness of probabilistic guarantees and computational tractability. This paper presents a particle-based probabilistic CBF framework that overcomes this limitation by exploiting the sub-Gaussian structure of the barrier function increment under Gaussian uncertainties. We establish that Gaussian uncertainties propagating through Lipschitz-continuous control-affine dynamics preserve sub-Gaussianity of the barrier function increment, with explicit tail bounds. Leveraging this structure, we derive finite-sample bounds on the approximation error between particle-based Conditional Value at Risk (CVaR) estimates and ground-truth probabilistic constraints; applying this yields a tractable optimization problem formulation with finite-sample safety certificates. We show through numerical experiments how the proposed approach provides tight yet provably valid probabilistic safety guarantees.
In this paper we propose a new criterion for the Blind Source Separation (BSS) of antisparse bounded sources, based on the sum of the $\ell_\infty$-norms of the sources. Based on the observation that mixing bounded sources with any mixing matrix of unitary Frobenius norm increases the $\ell_\infty$-norm of the sources unless the matrix is the identity, the minimization of the sum of the $\ell_\infty$-norms of the sources can be used to estimate a separation matrix. To that end, a Principal Component Analysis step followed by a Givens-rotation-based optimization method can be used for the separation of independent bounded sources. The Givens-rotation-based optimization method can also be used for the separation of correlated bounded sources mixed by a rotation matrix. We theoretically analyze the proposed criterion and assess its performance through numerical simulations involving three distinct types of bounded signals. Our theoretical and experimental findings underscore the efficacy of the $\ell_\infty$-norm as a suitable contrast function for antisparse bounded sources, showcasing its superior performance relative to a state-of-the-art algorithm.
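For two whitened mixtures, the criterion and its Givens-rotation search can be illustrated by brute force. The grid search below replaces the paper's optimization method and is only meant to show that the summed $\ell_\infty$-norm is minimized when the rotation undoes the mixing.

```python
import numpy as np

def linf_contrast(sources):
    """Sum of per-row l_inf norms: the contrast function to minimize."""
    return float(sum(np.max(np.abs(row)) for row in sources))

def givens(theta):
    """2x2 Givens rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def best_givens_angle(x, n_grid=720):
    """Brute-force the Givens angle minimizing the contrast (2-source case)."""
    thetas = np.linspace(0.0, np.pi / 2, n_grid)
    return min(thetas, key=lambda t: linf_contrast(givens(t) @ x))
```

Mixing corner-touching bounded sources by a rotation strictly increases the contrast, so the grid-search minimizer rotates the mixtures back, up to the usual sign and permutation ambiguity of BSS.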
Rapid growth in AI-driven data center loads is creating significant challenges for transmission grid interconnection. This paper proposes robust and risk-aware frameworks to quantify transmission capacity as firm and flexible capacities. We efficiently solve the robust optimization problem to determine firm capacity when minimizing unserved data center demand. Building upon this, we introduce a risk-aware allocation for flexible capacity, showing that tolerating a minimal probability of service interruption and blackout can unlock substantial flexible capacity of transmission networks and accelerate data center interconnection. To efficiently allocate scarce transmission capacities among competing data centers, we adopt the simultaneous ascending auction, characterizing products by capacity, risk level, and location. Under additive or symmetric concave valuation functions, the auction converges to a competitive equilibrium and achieves efficient allocation.
This paper investigates the optimal privacy-aware networked control problem, in which the dynamical system affected by a private input process sends its measurement to a remote controller after stochastic quantization. An adversary seeks to infer private system inputs from quantization results and control outputs. The optimal privacy-aware quantizer and controller are obtained by solving a stochastic control problem with mutual information regularization, where the mutual information measures the privacy leakage through the quantizer and controller. We first derive the coupled Bellman equations for the optimal quantizer and controller using the dynamic programming decomposition method. We then analyze the structural properties of the solution, showing that the optimal controller is deterministic, while the optimal quantizer regulates the adversary's belief in a closed-loop manner to enhance privacy. To enable numerical optimization, the quantizer and controller are jointly parameterized and then updated via policy gradient methods, and a binary classification approach is used to approximate privacy leakage. Finally, we validate the effectiveness of the proposed approach through numerical experiments on a building control system.
To ensure safe clinical integration, deep learning models must provide more than just high accuracy; they require dependable uncertainty quantification. While current Medical Vision Transformers perform well, they frequently struggle with overconfident predictions and a lack of transparency, issues that are magnified by the noisy and imbalanced nature of clinical data. To address this, we enhance the modified Medical Transformer (MedFormer), which incorporates prototype-based learning and uncertainty-guided routing. By utilizing a Dirichlet distribution for per-token evidential uncertainty, our framework can quantify and localize ambiguity in real time. This uncertainty is not just an output but an active participant in the training process, filtering out unreliable feature updates. Furthermore, the use of class-specific prototypes ensures the embedding space remains structured, allowing for decisions based on visual similarity. Testing across four modalities (mammography, ultrasound, MRI, and histopathology) confirms that our approach significantly enhances model calibration, reducing expected calibration error (ECE) by up to 35%, and improves selective prediction, even when accuracy gains are modest.
This paper investigates multi-stream downlink precoding for massive multiple-input multiple-output low-Earth-orbit satellite (SAT) communication systems. We adopt a delay and Doppler precompensation approach to achieve coherent transmission. Under this setting, we formulate a signal transmission model that incorporates the near-independent properties of inter-SAT interference and compensation errors. We then demonstrate that moving beyond single-stream transmission requires both multi-SAT cooperation and multi-antenna user terminals (UTs). Based on this configuration and the established signal transmission model, we derive the first- and second-order statistical channel characteristics and utilize them to design locally optimal precoding algorithms for both total power constraint (TPC) and per-antenna power constraint (PAPC) conditions, which rely only on statistical channel state information (sCSI). In particular, the designed PAPC algorithm achieves linear complexity with respect to the number of antennas on the cooperative SATs. To reduce the computational complexity of the locally optimal precoder under TPC, we propose a low-complexity and robust precoding scheme optimized for both minimum mean squared error and sum-rate maximization objectives. Using majorization theory, we also provide a rigorous theoretical analysis of the optimal precoding structure under TPC. Moreover, the Lanczos algorithm is adopted to further reduce the complexity of the proposed robust designs. Simulation results show that when each SAT is equipped with a sufficiently large number of antennas, the proposed sCSI-based designs achieve performance comparable to that of instantaneous CSI-based designs.
In this paper, we develop a representation-theoretic formulation of discrete-time linear systems. We show that such systems are naturally viewed as representations of time groups acting on vector spaces, thereby endowing the state space with a canonical algebraic structure. This perspective provides a unified framework for linear systems over different fields, in which familiar structural properties arise from the underlying representation. In particular, invariant decompositions of the state space correspond to invariant subrepresentations, while the distinctions between real, complex, and finite-field systems emerge from the algebraic properties of the base field and the time group. We further show that linear systems over finite fields naturally correspond to representations of finite cyclic time groups, leading to module structures over polynomial quotient rings. This provides a systematic alternative to spectral analysis in settings where eigenvalue-based methods are not the most natural organizing language.
Integrated sensing and communications (ISAC) has emerged as an intrinsic service of upcoming 6G wireless systems, enabling the reuse of communication signals for environmental sensing and supporting context-aware network functionalities. Meanwhile, the evolution of the wireless infrastructure toward distributed systems creates new opportunities for collaborative sensing from spatially separated nodes. Motivated by this trend, this work investigates a radio stripe aided ISAC system as a low-complexity implementation of a distributed system. We study the trade-off between achievable sum rate and sensing precision when downlink signals are used for target localization within the service area. By exploiting the architectural homogeneity of the radio stripe transceivers, each unit can be dynamically configured to operate in either communication or sensing mode. We formulate a target localization problem considering the measurements of multiple sensing-communication configurations. Due to the large number of measurements and the continuity of the search space, we propose discretizing the service area and then solving the estimation problem in batches. The targets are finally estimated using a fusion strategy. Our results show that increasing the number of devices and sensing antenna processing units (APUs) boosts sensing precision at the expense of degrading the sum rate. The latter remains constant for a given number of communication APUs regardless of their positions. Moreover, changing the number of antennas reveals a non-monotonic impact on sensing performance due to the trade-off between array gain and illumination uniformity.
Optimal stabilization of safety-critical nonlinear systems requires balancing long-term performance and strict safety constraints. Existing quadratic-programming-based control barrier function (CBF) safety filters are point-wise and may exhibit myopic behavior and local trapping when the safeguarding action conflicts with the nominal optimal control. This paper develops a safety-aware infinite-horizon optimal control framework by embedding a barrier-Lyapunov function (BLF)-based safeguarding action into the system dynamics and introducing a barrier-regulating auxiliary variable, thereby reformulating the original constrained problem as an unconstrained one on an extended state space. To mitigate local trapping, we introduce an adaptive alignment-conditioned tangential excitation orthogonal to the safety direction, with activation adaptively modulated by the degree of directional alignment between the nominal and safeguarding controllers, and incorporate it as an admissible $\mathcal{L}_2$ disturbance in an $H_\infty$ formulation. For high-relative-degree systems under disturbances, we further augment the recursive high-order safe-set construction with barrier compensation terms to obtain a high-order BLF and formulate an adversarial disturbance attenuation problem, which is approximately solved via safe-exploration-enhanced online critic learning. Simulations demonstrate reduced local trapping, improved safety--performance trade-offs, and safe operation under disturbances.
Acquiring the channel state information from limited and noisy observations at pilot positions is critical for wireless multiple-input multiple-output (MIMO)-orthogonal frequency division multiplexing (OFDM) systems. In this paper, we view this process as a conditional generative task in which the partial noisy channel estimates at the pilots are utilized as a ``prompt'' to guide the diffusion ``inpainting'' of the underlying channel. To this end, we resort to a general Conditional Diffusion Transformer (CDiT) framework with a well-designed network architecture and update rule. In particular, we design a dedicated embedding strategy to encode and adapt to different pilot patterns and noise levels, and utilize a special cross-attention mechanism to align the partial raw channel observations with the denoised channel at each time step of the generation process. This architecture effectively anchors the diffusion process, enabling the model to accurately recover full channel details from limited noisy observations. Comprehensive experimental results show that the proposed approach achieves a performance gain of over 5 dB compared to the baselines under varying noise conditions, and provides robust channel acquisition even under a sparse pilot density of 1/32 without significant performance loss compared to denser pilot cases. Moreover, it is capable of generating high-quality channel matrices within just 10 inference steps, effectively balancing estimation accuracy with computational efficiency and inference speed. Ablation studies demonstrate the rationality of the model design and the necessity of its modules.
Bistatic backscatter communication requires strong illumination of a backscatter device (BD), while a spatially separated reader detects the weak modulated reflection. In practice, the resulting direct link interference (DLI) at the reader can dominate the received backscattered signal and limit detection performance. This paper experimentally investigates transmit beamforming that jointly maximizes BD illumination and suppresses DLI at the reader in a distributed multiple-input multiple-output setup. We compare phase-only maximum ratio transmission (PO-MRT) with the proposed direct-link suppression (DLS) scheme, which enforces a spatial null at the reader under per-antenna power constraints. Measurements using a phase-coherent 42-element ceiling array at 920 MHz show that DLS reduces the DLI at the target reader and improves the signal-to-interference ratio by up to 31 dB compared to PO-MRT.
A circular pursuit guidance problem involving pursuer-target engagement is studied in this paper using a bifurcation-theory-based numerical approach. While the target is modeled as a point mass moving in a circle at a certain velocity, the pursuer dynamics are driven by the relative position and orientation with respect to the target. A planar case is considered. A mathematical model representing the engagement scenario is derived, and two cases are presented: one without and the other with a basic model for pursuer speed dynamics accounting for limitations imposed by the available force. Analytical and simulation results are presented to elucidate the novel approach. Advantages of using this approach for arriving at laws for pursuer-target engagement are highlighted.
We propose an adaptive control protocol for identifying the topology of dynamical networks interconnected over undirected graphs with cooperative and antagonistic interactions. The signed network is modeled using a repelling Laplacian. Topology identification relies on an edge-based formulation of the network and adaptive control protocols through the design of a persistently excited auxiliary network. Our approach guarantees the simultaneous identification and synchronization of the unknown signed network and establishes uniform semiglobal practical asymptotic stability of the estimation errors. Numerical simulations validate our theoretical results.
In real-time systems, both individual task execution and data propagation must meet strict timing constraints. Cause-effect (CE) chains are widely used to analyze such behavior via end-to-end latency. However, timing anomalies (TAs), in which a local reduction in execution times leads to an increase in the overall end-to-end latency, can distort this analysis. As a result, precisely bounding the latency from above becomes challenging, and such systems typically exhibit larger upper bounds than TA-eliminated systems. Existing studies either eliminate TAs by completely sacrificing average latency to simplify analysis or, despite adopting complex safe analysis methods, do not eliminate TAs effectively and still incur high latencies. To address this issue, we identify two basic causes of TAs in end-to-end latency. Based on these causes, we propose the first treatment that eliminates TAs in the latency with negligible average-latency loss using Deterministic Data Flow (DDF). We further formally prove its TA-free property. Consequently, we obtain a precise upper bound on the latency when all jobs execute with their worst-case execution times. Experimental results show that our approach effectively reduces the maximum end-to-end latency, the average latency, and latency jitter compared with the state-of-the-art (SOTA) method.
Recently, artificial intelligence-based dubbing technology has advanced, enabling automated dubbing (AD) to convert the source speech of a video into target speech in different languages. However, natural AD still faces synchronization challenges such as duration and lip-synchronization (lip-sync), which are crucial for preserving the viewer experience. Therefore, this paper proposes a synchronization method for AD processes that paraphrases translated text, comprising two steps: isochrony for timing constraints and phonetic synchronization (PS) to preserve lip-sync. First, we achieve isochrony by paraphrasing the translated text with a language model, ensuring the target speech duration matches that of the source speech. Second, we introduce PS, which employs dynamic time warping (DTW) with local costs of vowel distances measured from training data, so that the target text is composed of vowels with pronunciations similar to the source vowels. Third, we extend this approach to PS-Comet, which jointly considers semantic and phonetic similarity to better preserve meaning. The proposed methods are incorporated into text-to-speech systems, PS-TTS and PS-Comet TTS. The performance evaluation using Korean and English lip-reading datasets and a voice-actor dubbing dataset demonstrates that both systems outperform TTS without PS on several objective metrics and outperform voice actors in Korean-to-English and English-to-Korean dubbing. We extend the experiments to French, testing all pairs among these languages to evaluate cross-linguistic applicability. Across all language pairs, PS-Comet performed best, balancing lip-sync accuracy with semantic preservation, confirming that PS-Comet achieves more accurate lip-sync with semantic preservation than PS alone.
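The phonetic-synchronization step can be pictured as plain DTW with a pluggable local cost. The vowel-distance table below is invented for illustration; the paper measures these distances from training data.

```python
def dtw_cost(seq_a, seq_b, local_cost):
    """Dynamic time warping with a user-supplied local cost function."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = local_cost(seq_a[i - 1], seq_b[j - 1])
            # Standard DTW recursion: match, insertion, or deletion.
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Hypothetical symmetric vowel distances (NOT the paper's measured values).
VOWEL_DIST = {frozenset(p): d for p, d in
              [(("a", "o"), 0.3), (("a", "i"), 1.0), (("o", "i"), 0.8)]}

def vowel_cost(u, v):
    return 0.0 if u == v else VOWEL_DIST.get(frozenset((u, v)), 1.0)
```

Scoring candidate paraphrases by `dtw_cost` of their vowel sequences against the source then prefers target text whose vowels line up with visually similar source vowels, which is what preserves lip-sync.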
Model Predictive Control (MPC) offers safe and near-optimal control but suffers from high computational costs. Approximate MPC (AMPC) mitigates this by learning a cheaper surrogate policy, typically by training a neural network on state-MPC input pairs. Generating training data is a major bottleneck, requiring solving the MPC for numerous states sampled from its feasible set. Since this feasible set is implicitly defined and unknown, efficient sampling is nontrivial but crucial. We propose the linear MPC Hit-and-Run (LMPC-HR) sampler for linear MPC with polyhedral constraints. We identify the feasible set boundaries along search directions, a crucial step within HR, by formulating the problem as a convex linear program, replacing expensive iterative searches with a single optimization step. A numerical study demonstrates that LMPC-HR achieves an order of magnitude reduction in computation time for generating uniformly distributed samples from the feasible set compared to naive baselines.
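For intuition, a Hit-and-Run step on a polytope given explicitly as $\{x : Ax \le b\}$ only needs a ratio test to find the feasible chord; LMPC-HR's contribution is computing the analogous boundary points for the implicitly defined MPC feasible set via a linear program. The sketch below is the generic explicit-polytope version, not the paper's method.

```python
import numpy as np

def chord_interval(A, b, x, d, eps=1e-12):
    """Step range [t_lo, t_hi] such that A(x + t*d) <= b (ratio test)."""
    Ad, slack = A @ d, b - A @ x
    t_lo, t_hi = -np.inf, np.inf
    for ad, s in zip(Ad, slack):
        if ad > eps:
            t_hi = min(t_hi, s / ad)   # constraint hit moving forward
        elif ad < -eps:
            t_lo = max(t_lo, s / ad)   # constraint hit moving backward
    return t_lo, t_hi

def hit_and_run(A, b, x0, n_samples, rng):
    """Hit-and-Run sampler over the polytope {x : Ax <= b}."""
    x, out = np.asarray(x0, dtype=float), []
    for _ in range(n_samples):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)                 # random chord direction
        t_lo, t_hi = chord_interval(A, b, x, d)
        x = x + rng.uniform(t_lo, t_hi) * d    # uniform point on the chord
        out.append(x.copy())
    return np.array(out)
```

Replacing `chord_interval` with a single linear program over the MPC constraints is the step that makes the same random walk applicable when the feasible set is only known implicitly.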
Flexible-geometry arrays based on movable antennas have shown considerable potential for improving wireless communication performance. In this letter, we investigate a multiuser multiple-input single-output (MU-MISO) downlink secure communication system aided by a flexible cylindrical array (FCLA) and artificial noise (AN), where each antenna element rotates along circular tracks while the circular slices move along a vertical axis. To guarantee transmission security, we aim to maximize the achievable sum rate at multiple legitimate information receivers by jointly optimizing the transmit beamforming, the AN covariance matrix, and the antenna placement under secrecy constraints for an eavesdropper. Since the resulting problem is intractable to solve directly, we develop a block coordinate descent (BCD)-based framework that combines the Lagrangian dual transform, tight semidefinite relaxation (SDR), and Nesterov-accelerated projected gradient descent (PGD). Numerical results show that the proposed algorithm converges rapidly and achieves significant sum-rate gains over benchmark schemes by exploiting the geometric flexibility of the array.
This industry-oriented paper originates from the observation that current frequency quality metrics utilized by transmission system operators (TSOs) fail to fully capture the dynamic behavior of the grid frequency. Motivated by this gap, the paper proposes novel frequency quality metrics based on second-order dynamics and stochastic autocorrelation. Using real-world data from the Irish, Great Britain, and Nordic systems and running dynamic stochastic simulations, the paper shows that the proposed metrics offer new and counterintuitive insights into the frequency quality of power grids beyond current well-known metrics. In particular, the paper shows that a power system may exhibit good frequency quality according to standard metrics yet poor frequency quality according to the proposed ones. Overall, the paper contributes to improving the understanding of frequency quality.
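As one concrete illustration of the autocorrelation ingredient, the sample autocorrelation of a frequency-deviation series can be computed as below; this is a generic sketch, and the paper's exact metric definition may differ:

```python
def autocorrelation(signal, lag):
    """Sample autocorrelation of a time series at a given lag.

    Computes the lag-`lag` autocovariance of the series normalized by its
    variance, both estimated from the sample itself.
    """
    n = len(signal)
    mean = sum(signal) / n
    var = sum((s - mean) ** 2 for s in signal) / n
    cov = sum((signal[t] - mean) * (signal[t + lag] - mean)
              for t in range(n - lag)) / n
    return cov / var
```

Applied to measured grid-frequency deviations, a slowly decaying autocorrelation indicates persistent excursions that a simple standard-deviation metric would not distinguish from fast, benign fluctuations.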
In this paper, a new discrete-time approach to model clutch engagement/disengagement in a two-speed powershift is proposed. The core idea is the development of a model that computes the exact torque needed to achieve clutch engagement, covering both the case of a single clutch engaging and that of simultaneous engagement of both clutches (full lock condition). Based on this, the control logic for the clutch engagement and disengagement phases is also developed. The advantages in terms of real-time applicability with respect to the continuous-time version are shown through extensive simulation results.
Liquid crystal (LC) is a promising hardware solution for implementing large RISs, as it is cost-effective, energy efficient, scalable, and capable of providing continuous phase shifts with low power consumption. However, the phase shift response of LC-based RISs is inherently frequency dependent. If unaddressed, this characteristic leads to performance degradation, particularly in wideband scenarios. This issue is especially critical in secure communication applications, where minor phase shift variations across elements can result in considerable information leakage. This paper addresses these frequency-induced variations by developing a physics-based model for an LC unit cell across varying frequencies and proposing a novel phase shift design framework that maximizes secure communication across all subcarriers. Given the large number of elements in millimeter wave (mmWave) LC-RISs, acquiring full channel state information (CSI) is often impractical. Therefore, we optimize the phase shifts based solely on the locations of the legitimate mobile users (MUs) and potential eavesdroppers. Rather than targeting a single user point, the RIS is designed to illuminate a broader area. This approach enhances communication reliability for the MUs and mitigates performance degradation caused by location estimation errors. To solve the problem, we introduce both a semi-definite programming (SDP)-based solution and a low-complexity heuristic method. While the SDP-based approach yields superior performance, it incurs higher computational complexity. Conversely, the heuristic method's complexity scales far more slowly, making it highly suitable for extremely large RISs. Simulation results demonstrate that both algorithms improve the secrecy rate compared to baseline methods. Finally, the proposed design is validated through experimental evaluations on an LC-RIS setup.
Image generative models have become indispensable tools to yield exquisite high-resolution (HR) images for everyone, ranging from general users to professional designers. However, a desired outcome often requires generating a large number of HR images with different prompts and seeds, resulting in high computational cost for both users and service providers. Generating low-resolution (LR) images first could alleviate the computational burden, but it is not straightforward how to generate LR images that are perceptually consistent with their HR counterparts. Here, we consider the task of generating high-fidelity LR images, called Previews, that preserve perceptual similarity to their HR counterparts for an efficient workflow, allowing users to identify promising candidates before generating the final HR image. We propose the commutator-zero condition to ensure the LR-HR perceptual consistency for flow matching models, leading to the proposed training-free solution with downsampling matrix selection and commutator-zero guidance. Extensive experiments show that our method can generate LR images with up to 33\% computation reduction while maintaining HR perceptual consistency. When combined with existing acceleration techniques, our method achieves up to 3$\times$ speedup. Moreover, our formulation can be extended to image manipulations, such as warping and translation, demonstrating its generalizability.
Purpose: Image reconstruction in challenging scenarios requires accurate characterisations of coil sensitivity profiles, local off-resonances (B0) and effective encoding fields. Reconstruction methods utilising all of this information rely on signal models that are not compatible with the classical Fourier/k-space interpretation of the coil data. Hence, the FFT and related techniques are no longer applicable, rendering image reconstruction computationally demanding. Methods: This article presents a workflow for accurate sensitivity and B0 mapping as well as other required processing steps. An implementation of non-Fourier SENSE reconstruction is provided that is well suited for execution on a GPU using the FFT. Important practical aspects like stopping criteria and sources of image artifacts are analyzed and documented. Results: Highly performant image reconstruction was demonstrated on 2D and 3D spiral datasets. These datasets contain trajectories featuring readout durations up to 71.5 ms and undersampling factors up to R = 7. Running the reconstruction on a GPU greatly boosts reconstruction speed. Stopping the reconstruction at the right moment is crucial for image quality. All methods included in this article are available in a public code repository. Conclusion: The provided implementation of non-Fourier SENSE reconstruction is highly performant. When executed on a GPU, runtimes reach durations feasible in practice. The presented workflow ensures robust and accurate computation of coil sensitivity profiles and off-resonance maps.
Integrated learning and communication (ILAC) unifies learned transceivers with radio resource management, where semantic feature multiple access (SFMA) enables paired users to superpose their learned representations over shared time-frequency resources. Unlike conventional multiple access schemes, SFMA interference arises in the learned feature space and depends jointly on the user pair, the transmit power, and the compression ratio. This coupling ties binary pairing decisions to continuous resource variables, yielding a mixed-integer non-convex optimization problem. To address this problem, we first propose similarity-conditioned SFMA (SC-SFMA), a Swin Transformer-based transceiver whose dual-conditioned similarity modulator (DC-SimM) gates cross-user feature fusion according to the inter-user semantic similarity. We then characterize the resulting pair-dependent interference by a bivariate logistic function parameterized by transmit power and compression ratio, thereby bridging the learned transceiver with network-level optimization. On this basis, we formulate a sum-rate maximization problem subject to per-user distortion, latency, energy, power, and bandwidth constraints. To solve this problem, we develop a three-block alternating optimization algorithm that integrates dual-decomposition-assisted compression ratio allocation, trust-region successive convex approximation (SCA) for joint power-bandwidth optimization, and dynamic feasible graph-based user pairing. Simulation results show that SC-SFMA achieves considerable peak signal-to-noise ratio (PSNR) and multi-scale structural similarity index measure (MS-SSIM) gains over deep joint source-channel coding (JSCC) and separation-based baselines. The proposed optimization framework attains significant sum rate improvements over conventional multiple access baselines.
This paper presents a Semantic Feature Multiple Access (SFMA) framework for multi-user semantic communication in downlink wireless systems. By extending SwinJSCC to a two-user superimposition paradigm, SFMA enables simultaneous semantic transmission to multiple users over shared time-frequency resources. A key innovation is the Cross-User Attention (CUA) module, which facilitates controlled semantic feature exchange between paired users by leveraging inter-image similarity while mitigating interference. We formulate a joint user pairing and resource allocation problem to minimize global semantic distortion under constraints on bandwidth, end-to-end latency, and energy. This mixed-integer non-convex problem is decomposed into a Minimum-Weight Perfect Matching (MWPM) sub-problem and a convex bandwidth allocation feasibility check, with semi-closed-form bandwidth bounds derived from a strictly concave rate expression. A polynomial-time algorithm based on Blossom matching and bisection search is proposed. Extensive simulations on ImageNet-100 show that SFMA significantly improves reconstruction quality across pairing modes, and the proposed optimization effectively reduces overall distortion while satisfying physical-layer constraints.
Koopman operator-based methods enable data-driven bilinear representations of unknown nonlinear control systems. Accurate representations often demand significantly higher dimensions than the original system, making control design challenging. Control Lyapunov Functions (CLFs) are widely used for controller synthesis, with quadratic CLF candidates being the most common due to their simplicity. Yet, we show that this class is highly restrictive, especially when the state dimension is large: under mild conditions, their existence implies stabilizability of the bilinear system by a constant input -- that is, the control remains fixed over time. We establish this result by formulating a quadratically constrained quadratic program (QCQP) that exactly characterizes valid CLFs. Since QCQPs are NP-hard, we propose a convex semidefinite relaxation that offers a sufficient validity condition. For single-input systems, we prove that a quadratic CLF requires constant control stabilizability, and empirically demonstrate that this extends to high-dimensional multi-input systems in many cases.
Extranodal extension (ENE) is an emerging prognostic factor in human papillomavirus (HPV)-associated oropharyngeal cancer (OPC), although it is currently omitted as a clinical staging criterion. Recent works have advocated for the inclusion of imaging-detected ENE (iENE) as a prognostic marker in HPV-positive OPC staging. However, several practical limitations continue to hinder its clinical integration, including inconsistencies in segmentation, low contrast in the periphery of metastatic lymph nodes on CT imaging, and laborious manual annotations. To address these limitations, we propose a fully automated end-to-end pipeline that uses computed tomography (CT) images with clinical data to assess nodal ENE status and predict treatment outcomes. Our approach includes a hierarchical 3D semi-supervised segmentation model designed to detect and delineate relevant iENE from radiotherapy planning CT scans. From these segmentations, a set of radiomics and deep features are extracted to train an iENE grading classifier. The predicted ENE status is then evaluated for its prognostic value and compared with existing staging criteria. Furthermore, we integrate these nodal features with primary tumor characteristics in a multimodal, attention-based outcome prediction model, providing a dynamic framework for outcome prediction. Our method is validated on an internal cohort of 397 HPV-positive OPC patients treated with radiation therapy or chemoradiotherapy between 2009 and 2020. For outcome prediction at the 2-year mark, our pipeline surpassed baseline models with an AUC of 88.2% (4.8) for metastatic recurrence, 79.2% (7.4) for overall survival, and 78.1% (8.6) for disease-free survival. We also obtain a concordance index of 83.3% (6.5) for metastatic recurrence, 71.3% (8.9) for overall survival, and 70.0% (8.1) for disease-free survival, making the pipeline feasible for clinical decision making.
UAV images are critical for applications such as large-area mapping, infrastructure inspection, and emergency response. However, in real-world flight environments, a single image is often affected by multiple degradation factors, including rain, haze, and noise, undermining downstream task performance. Current unified restoration approaches typically rely on implicit degradation representations that entangle multiple factors into a single condition, causing mutual interference among heterogeneous corrections. To address this, we propose DAME-Net, a Degradation-Aware Mixture-of-Experts Network that decouples explicit degradation perception from degradation-conditioned reconstruction for compositional UAV image restoration. Specifically, we design a Factor-wise Degradation Perception Module (FDPM) to provide explicit per-factor degradation cues for the restoration stage through multi-label prediction with label-similarity-guided soft alignment, replacing implicit entangled conditions with interpretable and generalizable degradation descriptions. Moreover, we develop a Conditioned Decoupled MoE Module (CDMM) that leverages these cues for stage-wise conditioning, spatial-frequency hybrid processing, and mask-constrained decoupled expert routing, enabling selective factor-specific correction while suppressing irrelevant interference. In addition, we construct the Multi-Degradation UAV Restoration benchmark (MDUR), the first large-scale UAV benchmark for compositional UAV image restoration, with 43 degradation configurations from single degradations to four-factor composites and standardized seen/unseen splits. Extensive experiments on MDUR demonstrate consistent improvements over representative unified restoration methods, with greater gains on unseen and higher-order composite degradations. Downstream experiments further validate benefits for UAV object detection.
Considering efficiency, ultra-high-definition (UHD) low-light image restoration is extremely challenging. Existing methods based on Transformer architectures or high-dimensional complex convolutional neural networks often suffer from the "memory wall" bottleneck, failing to achieve millisecond-level inference on edge devices. To address this issue, we propose a novel real-time UHD low-light enhancement network based on geometric feature fusion using Clifford algebra in 2D Euclidean space. First, we construct a four-layer feature pyramid with gradually increasing resolution, which decomposes input images into low-frequency and high-frequency structural components via a Gaussian blur kernel, and adopts a lightweight U-Net based on depthwise separable convolution for dual-branch feature extraction. Second, to resolve structural information loss and artifacts from traditional high-low frequency feature fusion, we introduce spatially aware Clifford algebra, which maps feature tensors to a multivector space (scalars, vectors, bivectors) and uses Clifford similarity to aggregate features while suppressing noise and preserving textures. In the reconstruction stage, the network outputs adaptive Gamma and Gain maps, which perform physically constrained non-linear brightness adjustment via Retinex theory. Integrated with FP16 mixed-precision computation and dynamic operator fusion, our method achieves millisecond-level inference for 4K/8K images on a single consumer-grade device, while outperforming state-of-the-art (SOTA) models on several restoration metrics.
Integrating pretrained speech encoders with large language models (LLMs) is promising for ASR, but performance and data efficiency depend on the speech-language interface. A common choice is a learned projector that maps encoder features into the LLM embedding space, whereas an alternative is to expose discrete phoneme sequences to the LLM. Using the same encoder and LLM backbones, we compare phoneme-based and vanilla projector-based interfaces in high-resource English and low-resource Tatar. We also propose a BPE-phoneme interface that groups frequent local phoneme patterns while preserving explicit word-boundary cues for phoneme-to-grapheme generation. On LibriSpeech, the phoneme-based interface is competitive with the vanilla projector, and the BPE-phoneme interface yields further gains. On Tatar, the phoneme-based interface substantially outperforms the vanilla projector. We further find that phoneme supervision yields a phoneme-informed hybrid interface that is stronger than the vanilla projector.
Beyond-diagonal reconfigurable intelligent surfaces (BD-RISs) significantly improve wireless performance by allowing tunable interconnections among elements, but their design in multiple-input multiple-output (MIMO) systems has so far relied on complex iterative algorithms or suboptimal approximations. This work introduces a simple yet powerful approach: instead of directly maximizing the achievable rate, we maximize the absolute value of the determinant of the equivalent MIMO channel. We derive a closed-form symmetric unitary scattering matrix whose rank is exactly twice the channel's degrees of freedom ($2r$). Remarkably, this low-rank solution achieves the same determinant value as the optimal unitary BD-RIS. Using log-majorization theory, we prove that the rate loss relative to the optimal unitary BD-RIS vanishes at high signal-to-noise ratio (SNR) or when the number of BD-RIS elements becomes large. Moreover, the proposed solution can be perfectly implemented using a $q$-stem BD-RIS architecture with only $q=2r-1$ stems, requiring a minimum number of reconfigurable circuits. The resulting Max-Det solution is orders of magnitude faster to compute than existing iterative methods while achieving near-optimal rates in practical scenarios. This makes high-performance BD-RIS deployment feasible even with large surfaces and limited computational resources.
Multiple access techniques are vital for 5G and beyond. While Orthogonal Frequency Division Multiple Access (OFDMA) is standard, its high peak-to-average power ratio (PAPR) reduces energy efficiency in uplink transmissions. This paper presents Periodic OFDMA (P-OFDMA), a novel multiple access scheme with reduced PAPR and computational complexity. By assigning subcarriers in a periodic pattern across the entire frequency band, P-OFDMA enhances frequency diversity and simplifies allocation. We also introduce two precoded variants: P-OFDMA-DCT and P-OFDMA-DFT. Comprehensive simulations comparing P-OFDMA with OFDMA and SC-FDMA show that P-OFDMA-DFT consistently achieves the lowest PAPR. Furthermore, the standard P-OFDMA scheme outperforms SC-FDMA in PAPR for low subcarrier-per-user scenarios and achieves better bit error rate (BER) performance under high delay-spread conditions. Notably, P-OFDMA and its variants reduce transmitter-side processing by up to an eightfold factor compared to SC-FDMA, greatly benefiting low-complexity uplink devices. Although receiver complexity increases, the overall system processing load decreases, yielding improved energy efficiency. Thus, P-OFDMA offers a robust, energy-efficient uplink solution for future wireless networks.
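The periodic assignment described above can be sketched as a simple interleaved subcarrier map. This minimal illustration assumes the band divides evenly among users; the paper's exact allocation and precoding details may differ:

```python
def periodic_allocation(n_subcarriers, n_users):
    """Periodic (interleaved) subcarrier map across the whole band.

    User u receives subcarriers u, u + U, u + 2U, ... so that each user's
    subcarriers are spread uniformly over the band, maximizing frequency
    diversity while keeping the allocation rule trivial.
    """
    assert n_subcarriers % n_users == 0, "band assumed to divide evenly"
    return {u: list(range(u, n_subcarriers, n_users)) for u in range(n_users)}
```

For example, with 12 subcarriers and 4 users, user 0 gets {0, 4, 8} and user 3 gets {3, 7, 11}; every subcarrier is assigned exactly once, so no inter-user collision occurs.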
In this letter, we consider the problem of decentralized decision making among connected autonomous vehicles at unsignalized intersections, where existing centralized approaches do not scale gracefully under mixed maneuver intentions and coordinator failure. We propose a closed-loop opinion-dynamics decision model for intersection coordination, in which vehicles exchange intent through dual signed networks: a conflict-topology-based communication network and a commitment-driven belief network that together enable cooperation without a centralized coordinator. Continuous opinion states modulate velocity-optimizer weights prior to commitment; a closed-form predictive feasibility gate then freezes each vehicle's decision into a GO or YIELD commitment, which propagates back through the belief network to pre-condition neighbor behavior ahead of physical conflicts. Crossing order emerges from geometric feasibility and arrival priority without joint optimization or a solver. The approach is validated across three scenarios spanning fully competitive, merge, and mixed conflict topologies. The results demonstrate collision-free coordination and lower last-vehicle exit times compared to first-come-first-served (FCFS) in all non-trivial conflict configurations.
Soil moisture is a critical variable for managing irrigation, improving crop yield, and understanding field-scale hydrology. Radars mounted on unmanned aerial vehicles (UAVs) offer a promising means to monitor soil moisture over large fields with flexible, high-resolution coverage. However, during the growing season, canopy scattering and soil reflections become strongly coupled in the radar measurement. These coupled effects vary with crop structure or flight altitude, complicating the retrieval of soil moisture. To overcome this challenge, we present GreenScatter, a physics-based soil moisture retrieval framework for nadir-looking wideband UAV radars. GreenScatter introduces a microwave radiative transfer model that explicitly captures the dominant electromagnetic interactions between vegetation and soil, enabling accurate modeling of coherent ground backscatter through canopy. In parallel, it develops a radar cross-section (RCS) estimation method that transforms time-domain radar signals into calibrated wideband RCS spectra, isolating soil reflections while compensating for hardware and waveform effects. Together, these components enable robust soil moisture estimation through vegetation across varying canopy conditions and UAV configurations. Field experiments across multiple corn and soybean sites demonstrate consistent retrieval with an average volumetric water content (VWC) error of 4.49%.
We propose a generative framework for multi-track music source separation (MSS) that reformulates the task as conditional discrete token generation. Unlike conventional approaches that directly estimate continuous signals in the time or frequency domain, our method combines a Conformer-based conditional encoder, a dual-path neural audio codec (HCodec), and a decoder-only language model to autoregressively generate audio tokens for four target tracks. The generated tokens are decoded back to waveforms through the codec decoder. Evaluation on the MUSDB18-HQ benchmark shows that our generative approach achieves perceptual quality approaching state-of-the-art discriminative methods, while attaining the highest NISQA score on the vocals track. Ablation studies confirm the effectiveness of the learnable Conformer encoder and the benefit of sequential cross-track generation.
Just Recognizable Difference (JRD) boosts coding efficiency for machine vision through visibility threshold modeling, but is currently limited to single-task scenarios. To address this issue, we propose a Multi-Task JRD (MT-JRD) dataset and an Attribute-assisted MT-JRD (AMT-JRD) model for Video Coding for Machines (VCM), enhancing both prediction accuracy and coding efficiency. First, we construct a dataset comprising 27,264 JRD annotations from machines, supporting three representative tasks: object detection, instance segmentation, and keypoint detection. Second, we propose the AMT-JRD prediction model, which integrates a Generalized Feature Extraction Module (GFEM) and a Specialized Feature Extraction Module (SFEM) to facilitate joint learning across multiple tasks. Third, we incorporate object attribute information into object-wise JRD prediction through the Attribute Feature Fusion Module (AFFM), which introduces prior knowledge about object size and location. This design effectively compensates for the limitations of relying solely on image features and enhances the model's capacity to represent the perceptual mechanisms of machine vision. Finally, we apply the AMT-JRD model to VCM, where the accurately predicted JRDs reduce the coding bit rate while preserving accuracy across multiple machine vision tasks. Extensive experimental results demonstrate that AMT-JRD achieves precise and robust multi-task prediction with a mean absolute error of 3.781 and an error variance of 5.332 across the three tasks, outperforming the state-of-the-art single-task prediction model by 6.7% and 6.3%, respectively. Coding experiments further reveal that, compared to the VVC and JPEG baselines, AMT-JRD-based VCM achieves average Bjontegaard Delta-mean Average Precision (BD-mAP) gains of 3.861% and 7.886%, respectively.
The Tactile Internet demands sub-millisecond latency and ultra-high reliability, as high latency or packet loss could lead to haptic control instability. To address this, we propose the Mode-Domain Architecture (MDA), a bilateral predictive neural network architecture designed to restore missing signals on both the human and robot sides. Unlike conventional models that extract features implicitly from raw data, MDA utilizes a novel Continuous-Orthogonal Mode Decomposition framework. By integrating an orthogonality constraint, we overcome the pervasive issue of "mode overlapping" found in state-of-the-art decomposition methods. Experimental results demonstrate that this structured feature extraction achieves high prediction accuracies of 98.6% (human) and 97.3% (robot). Furthermore, the model achieves ultra-low inference latency of 0.065 ms, significantly outperforming existing benchmarks and meeting the stringent real-time requirements of haptic teleoperation.
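One standard way to impose an orthogonality constraint between decomposed modes, and thus avoid mode overlapping, is Gram-Schmidt orthogonalization. The sketch below is illustrative; the paper's Continuous-Orthogonal Mode Decomposition may enforce orthogonality differently:

```python
def orthogonalize_modes(modes):
    """Gram-Schmidt orthonormalization of a list of mode vectors.

    Each mode has its projections onto previously accepted modes removed,
    so the returned set is mutually orthogonal (and unit-norm). Modes that
    become numerically zero, i.e. fully overlapping with earlier ones, are
    dropped.
    """
    ortho = []
    for m in modes:
        v = list(m)
        for u in ortho:
            # Subtract the component of v along the already-orthonormal u.
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-12:
            ortho.append([a / norm for a in v])
    return ortho
```

After this step, any two retained modes have zero inner product, so each carries information not already explained by the others, which is the property that prevents overlapping features.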
In this study, we propose a deep Swin Vision Transformer-based transfer learning architecture for robust multi-cancer histopathological image classification. The proposed framework integrates a hierarchical Swin Transformer with ResNet50-based convolutional feature extraction, enabling the model to capture both long-range contextual dependencies and fine-grained local morphological patterns within histopathological images. To validate the efficiency of the proposed architecture, extensive experiments were conducted on a comprehensive multi-cancer dataset covering Breast Cancer, Oral Cancer, Lung and Colon Cancer, Kidney Cancer, and Acute Lymphocytic Leukemia (ALL); both original and segmented images were analyzed to assess model robustness across heterogeneous clinical imaging conditions. Our approach is benchmarked against several state-of-the-art CNN and transformer models, including DenseNet121, DenseNet201, InceptionV3, ResNet50, EfficientNetB3, multiple ViT variants, and Swin Transformer models. All models were trained and validated using a unified pipeline incorporating balanced data preprocessing, transfer learning, and fine-tuning strategies. The experimental results demonstrate that our proposed architecture consistently achieves superior performance, reaching 100% test accuracy on the lung-colon cancer and segmented leukemia datasets, and up to 99.23% accuracy for breast cancer classification. The model also achieves near-perfect precision, recall, and F1 score, indicating highly stable results across diverse cancer types. Overall, the proposed model establishes a highly accurate, interpretable, and robust multi-cancer classification system, setting a strong benchmark for future research and providing a unified comparative assessment useful for designing reliable AI-assisted histopathological diagnosis and clinical decision-making.
Audio and speech self-supervised encoder models are now widely used for a wide range of tasks. Many of these models are trained on clean, segmented speech content such as LibriSpeech. In this paper, we examine how the pretraining datasets of such SSL (Self-Supervised Learning) models affect their downstream results. We build a large pretraining corpus of highly diverse TV and radio broadcast audio content, which we describe with automatic tools. We use these annotations to build smaller subsets, which we use to train audio SSL models. We then evaluate the models on multiple downstream tasks such as automatic speech recognition, voice activity and music detection, and speaker recognition. The results show the potential of pretraining SSL models on diverse audio content without restricting it to speech. We also perform a membership inference attack to evaluate the encoders' ability to memorize their training data, which highlights the importance of data deduplication. This unified training could bridge the speech and music machine learning communities.
Ensuring that Text-to-Speech (TTS) systems deliver human-perceived quality at scale is a central challenge for modern speech technologies. Human subjective evaluation protocols such as Mean Opinion Score (MOS) and Side-by-Side (SBS) comparisons remain the de facto gold standards, yet they are expensive, slow, and sensitive to pervasive assessor biases. This study addresses these barriers by formulating and implementing a suite of novel neural models designed to approximate expert judgments in both relative (SBS) and absolute (MOS) settings. For relative assessment, we propose NeuralSBS, a HuBERT-backed model achieving 73.7% accuracy on the SOMOS dataset. For absolute assessment, we introduce enhancements to MOSNet using custom sequence-length batching, as well as WhisperBert, a multimodal stacking ensemble that combines Whisper audio features and BERT textual embeddings via weak learners. Our best MOS models achieve a Root Mean Square Error (RMSE) of ~0.40, significantly outperforming the human inter-rater RMSE baseline of 0.62. Furthermore, our ablation studies reveal that naively fusing text via cross-attention can degrade performance, highlighting the effectiveness of ensemble-based stacking over direct latent fusion. We additionally report negative results with SpeechLM-based architectures and zero-shot LLM evaluators (Qwen2-Audio, Gemini 2.5 flash preview), reinforcing the necessity of dedicated metric learning frameworks.
Existing deep learning methods for radiology report generation enhance diagnostic efficiency but often overlook physician-informed medical priors. This leads to suboptimal alignment between the structured explanations and disease manifestations. Eye gaze data provides critical insights into a radiologist's visual attention, enhancing the relevance and interpretability of extracted features while aligning with human decision-making processes. However, despite its promising potential, the integration of eye gaze information into AI-driven medical imaging workflows is impeded by challenges such as the complexity of multimodal data fusion and the high cost of gaze acquisition, and in particular its absence during inference, which limits its practical applicability in real-world clinical settings. To address these issues, we introduce Gaze2Report, a framework that leverages a scanpath prediction module and a Graph Neural Network (GNN) to generate joint visual-gaze tokens. Combined with instruction and report tokens, these form a multimodal prompt used to fine-tune LoRA layers of large language models (LLMs) for autoregressive report generation. Gaze2Report enhances report quality through eye-gaze-guided visual learning and incorporates on-the-fly scanpath prediction, enabling the model to operate without gaze input during inference.
Plant-level control is an emerging wind energy technology that presents opportunities and challenges. By controlling turbines in a coordinated manner via a central controller, it is possible to achieve greater wind power plant efficiency. However, there is a risk that measurement errors will confound the process, or even that hackers will alter the telemetry signals received by the central controller. This paper presents a framework for developing a safe plant controller by training it with an adversarial agent designed to confound it. This necessitates training the adversary to confound the controller, creating a sort of circular logic or "Arms Race." This paper examines three broad training approaches for co-training the protagonist and adversary, finding that an Arms Race approach yields the best results. These initial results indicate that the Arms Race adversarial training reduced worst-case performance degradation from 39% power loss to 7.9% power gain relative to a baseline operational strategy.
Word error rate (WER) is the dominant metric for automatic speech recognition, yet it cannot detect a systematic failure mode: models that produce fluent output in the wrong writing system. We define Script Fidelity Rate (SFR), the fraction of hypothesis characters in the target script block, computable without reference transcriptions, and report the first systematic measurement of script collapse across six languages spanning four writing systems (Pashto, Urdu, Hindi, Bengali, Malayalam, Somali) and nine ASR models on FLEURS test sets. Across 53 evaluated model-language pairs, 18 (34%; 95% Wilson CI: 23-47%) exhibit script collapse (SFR < 10%); MMS-1B and SeamlessM4T-v2 maintain SFR above 99% on every language evaluated, confirming that SFR correctly identifies high fidelity where it is present. We identify three distinct collapse patterns: Latin phonetic substitution (smaller Whisper on Indic languages), Arabic substitution for Somali's Latin-script orthography, and Devanagari substitution where larger Whisper models treat all Indic audio as Hindi, a failure present even in Whisper large-v3.
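Since SFR is defined as the fraction of hypothesis characters in the target script block and needs no reference transcription, it can be computed in a few lines. The sketch below is an illustrative stand-in, not the paper's implementation: it approximates script-block membership by checking whether a character's Unicode name contains the script name, which suffices for the scripts discussed but is not a full Unicode script-property lookup.

```python
import unicodedata

def script_fidelity_rate(hypothesis: str, script: str) -> float:
    """Fraction of alphabetic characters whose Unicode character name
    mentions the target script (a rough proxy for script-block membership)."""
    letters = [ch for ch in hypothesis if ch.isalpha()]
    if not letters:
        return 0.0
    in_script = sum(1 for ch in letters
                    if script.upper() in unicodedata.name(ch, ""))
    return in_script / len(letters)

# A Devanagari hypothesis scored against the Devanagari script:
print(script_fidelity_rate("नमस्ते दुनिया", "DEVANAGARI"))  # → 1.0
# A Latin-script hypothesis scored against Devanagari (script collapse):
print(script_fidelity_rate("namaste duniya", "DEVANAGARI"))  # → 0.0
```

Under the paper's threshold, a pair with SFR < 0.10 would be flagged as script collapse.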
During disasters, cascading failures across power grids, communication networks, and social behavior amplify community fear and undermine cooperation. Existing cyber-physical-social (CPS) models simulate these coupled dynamics but lack mechanisms for active intervention. We extend the CPS resilience model of Valinejad and Mili (2023) with control channels for three agencies (communication, power, and emergency management) and formulate the resulting system as a three-player non-zero-sum differential game solved via online actor-critic reinforcement learning. Simulations based on Hurricane Harvey data show a 70% mean fear reduction with improved infrastructure recovery; cross-validation on Hurricane Irma data (without refitting) achieves a 50% fear reduction, confirming generalizability.
This paper proposes a mathematical model for the coevolution of actions and opinions for a population facing a social dilemma. In particular, we assume each person participates in a Public Goods Game (PGG), with their action being to cooperate or defect, and holds an opinion about which action they prefer. We propose a payoff function that combines the PGG with the Friedkin--Johnsen model from opinion dynamics to form a coevolutionary game. According to a discrete-time process, players asynchronously update their actions and opinions, aiming to maximise their individual payoff for the coevolutionary game using myopic best-response. We study the equilibria and provide conditions for the existence of the all-defection and all-cooperation consensus equilibria. We also establish conditions for global convergence to the all-defection equilibrium.
Determinantal point processes (DPPs) are probability models over subsets of a ground set that favor diverse selections while suppressing redundancy. That is, they tend to assign higher likelihood to collections whose elements complement one another instead of repeating the same information. For example, in recommendation systems, a DPP prefers showing users several relevant items that differ in content or style, rather than many near-duplicates of essentially the same item. Although DPPs have been studied extensively in machine learning and random matrix theory, and have been popularized through components of YouTube's search recommendation system, they have not been considered in the context of dynamic systems; time-domain analysis is not a feature of DPPs. This paper establishes connections between DPPs and control theory. By showing that the observability (controllability) Gramian parameterized by sensor (control) node subsets is a DPP, we provide a probabilistic and spectral perspective on sensor (actuator) selection for linear dynamic systems. The notion of probability here does not represent stochastic uncertainty in the system dynamics; it instead represents a likelihood measure over sensor (actuator) configurations induced by the Gramian. To that end, we derive an effective observable rank condition, characterize the balance between individual node contributions and diversity, and establish node inclusion monotonicity and negative dependence properties. Finally, we show that this formulation recovers classical greedy optimization guarantees and admits a maximum a posteriori interpretation of the sensor/actuator node selection problem. Numerical case studies on three network topologies corroborate the theoretical results.
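The L-ensemble view can be illustrated numerically: the (unnormalized) DPP likelihood of a sensor subset is the determinant of the corresponding principal submatrix of the Gramian. The kernel below is a random positive semidefinite stand-in, not a Gramian from the paper's case studies.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random PSD "Gramian-like" kernel over 4 candidate sensor nodes.
B = rng.standard_normal((4, 4))
W = B @ B.T

def dpp_likelihood(W, subset):
    """Unnormalized L-ensemble likelihood: det of the principal submatrix."""
    idx = np.ix_(subset, subset)
    return np.linalg.det(W[idx])

# For a pair {i, j}: det = W_ii W_jj - W_ij^2, so strongly correlated
# (redundant) nodes lower the likelihood -- the diversity-favoring property.
print(dpp_likelihood(W, [0, 1]))
```

Principal minors of a PSD kernel are nonnegative, so every subset receives a valid (unnormalized) probability, and redundancy between selected nodes pushes that probability toward zero.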
The data-driven linear quadratic regulator (ddLQR) is a widely studied control method for unknown dynamical systems subject to disturbances. Existing approaches, both indirect (identify a model, then apply model-based design) and direct (bypass the identification step), often rely on the certainty-equivalence principle and therefore do not explicitly account for model uncertainty. In this paper, we propose a Bayesian formulation for both indirect and direct ddLQR that incorporates posterior uncertainty into the control design. The resulting expected cost decomposes into a certainty-equivalence term and a variance-dependent term, providing a principled interpretation of regularization. We further show that the indirect and direct formulations are equivalent under this perspective. The resulting direct method admits a tractable semidefinite program whose size is independent of the data length. Numerical simulations demonstrate improved optimality gap and closed-loop stability, particularly in low-data regimes.
Image manipulation localization (IML) and general vision tasks are typically treated as two separate research directions due to the fundamental differences between manipulation-specific and semantic features. In this paper, however, we bridge this gap by introducing a fresh perspective: these two directions are intrinsically connected, and general semantic priors can benefit IML. Building on this insight, we propose a novel trainable adapter (named ReVi) that repurposes existing off-the-shelf general-purpose vision models (e.g., image generation and segmentation networks) for IML. Inspired by robust principal component analysis, the adapter disentangles semantic redundancy from manipulation-specific information embedded in these models and selectively enhances the latter. Unlike existing IML methods that require extensive model redesign and full retraining, our method relies on the off-the-shelf vision models with frozen parameters and only fine-tunes the proposed adapter. The experimental results demonstrate the superiority of our method, showing the potential for scalable IML frameworks.
This paper studies equality-constrained minimization problems through the lens of feedback control. We introduce a unified control-theoretic framework by showing that a PID feedback law acting on the dual variable induces the PID saddle-point flow (PID-SPF), a broad class of saddle-point dynamics associated with the augmented Lagrangian. This framework recovers several classical primal-dual flows as special cases. We prove that the equilibria of the proposed flow coincide with the stationary points of the original problem. Our analysis reveals how the feedback gains affect the optimization: integral action enforces constraint satisfaction, proportional action introduces the augmented Lagrangian structure, and derivative action modifies the geometry of the primal dynamics by inducing a state-dependent Riemannian metric. Moreover, for convex problems with affine constraints, we establish global exponential convergence by leveraging contraction theory for all admissible PID gains, providing in the process explicit bounds on the convergence rate. Finally, we validate our theoretical results on numerical examples including an application to bilevel optimization.
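As a rough illustration of the proportional and integral actions described above (no derivative term), the sketch below Euler-integrates the augmented-Lagrangian saddle-point dynamics for a toy equality-constrained QP; the gains, step size, and problem data are illustrative assumptions, not the paper's examples.

```python
import numpy as np

# Toy equality-constrained QP: min 0.5 x'Qx - c'x  s.t.  Ax = b
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

rho = 1.0   # proportional gain -> augmented-Lagrangian penalty weight
dt = 0.01   # explicit Euler step

x = np.zeros(2)
lam = np.zeros(1)
for _ in range(5000):
    r = A @ x - b                                   # constraint residual
    # primal gradient descent on the augmented Lagrangian
    x = x - dt * (Q @ x - c + A.T @ lam + rho * A.T @ r)
    # integral action on the dual enforces Ax = b at equilibrium
    lam = lam + dt * r

print(x)  # ≈ [0.5, 0.5], the KKT solution of this QP
```

At equilibrium the dual update forces the residual to zero (integral action enforcing feasibility), while the rho-term is the augmented-Lagrangian structure contributed by the proportional action.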
This paper presents a generalized circuit framework for constructing Shih-type fractionalizations of unitary operators of dyadic order, i.e., operators $U$ satisfying $U^{2^n}=I$. Building upon the architecture of the quantum fractional Fourier transform (QFrFT), we show that fractionalization can be implemented coherently as a weighted superposition of integer powers, $\sum_k c_k(\alpha)U^k$, where the coefficients are generated through an ancilla-domain quantum Fourier transform and a diagonal phase modulation. Under the assumption that controlled implementations of the required powers of $U$ are available, the resulting circuit yields a parameterized family of operators that interpolates the integer powers of $U$ and satisfies the additive property of fractional transforms. As concrete applications, we derive explicit quantum circuit realizations of the quantum fractional Hartley transform (QFrHT) and of the fractional cosine-transform families associated with Types~I and~IV. These constructions demonstrate the versatility of the proposed dyadic-order fractionalization framework for structured operators arising in quantum signal processing.
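The weighted-superposition formula can be checked classically. The sketch below assumes Fourier-generated coefficients $c_k(\alpha) = \frac{1}{N}\sum_m e^{2\pi i m(\alpha-k)/N}$ with $N = 2^n$ and verifies the additive property on the Pauli-X gate ($X^2 = I$); it emulates the circuit's effect with dense matrices rather than implementing the ancilla-based circuit itself.

```python
import numpy as np

def dyadic_fraction(U, alpha, N):
    """Fractional power of U (with U**N = I) as a weighted superposition
    of integer powers: sum_k c_k(alpha) U^k."""
    k = np.arange(N)
    m = np.arange(N)
    # c_k(alpha) = (1/N) sum_m exp(2*pi*i * m * (alpha - k) / N)
    C = np.exp(2j * np.pi * np.outer(m, alpha - k) / N).sum(axis=0) / N
    Uk = np.eye(U.shape[0], dtype=complex)   # running power U^k
    out = np.zeros_like(Uk)
    for ck in C:
        out = out + ck * Uk
        Uk = Uk @ U
    return out

X = np.array([[0, 1], [1, 0]], dtype=complex)  # Pauli X, X**2 = I (n = 1)
half = dyadic_fraction(X, 0.5, 2)
# Additive property of the fractional family: U^{1/2} U^{1/2} = U
print(np.allclose(half @ half, X))  # → True
```

For $\alpha = 1/2$ this reproduces the familiar square-root-of-NOT gate, and for integer $\alpha$ the coefficients collapse to a single nonzero $c_\alpha = 1$, recovering the integer power exactly.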
This paper presents an online intention prediction framework for estimating the goal state of autonomous systems in real time, even when intention is time-varying, and system dynamics or objectives include unknown parameters. The problem is formulated as an inverse optimal control / inverse reinforcement learning task, with the intention treated as a parameter in the objective. A shifting horizon strategy discounts outdated information, while online control-informed learning enables efficient gradient computation and online parameter updates. Simulations under varying noise levels and hardware experiments on a quadrotor drone demonstrate that the proposed approach achieves accurate, adaptive intention prediction in complex environments.
A novel stability-enhanced Gaussian process variational autoencoder (SEGP-VAE) is proposed for indirectly training a low-dimensional linear time-invariant (LTI) system using high-dimensional video data. The mean and covariance function of the novel SEGP prior are derived from the definition of an LTI system, enabling the SEGP to capture the indirectly observed latent process using a combined probabilistic and interpretable physical model. The search space of LTI parameters is restricted to the set of semi-contracting systems via a complete and unconstrained parametrisation. As a result, the SEGP-VAE can be trained using unconstrained optimisation algorithms. Furthermore, this parametrisation prevents numerical issues caused by the presence of a non-Hurwitz state matrix. A case study applies the SEGP-VAE to a dataset containing videos of spiralling particles, highlighting the benefits of the approach and the application-specific design choices that enabled accurate latent state predictions.
Full-duplex dialogue audio, in which each speaker is recorded on a separate track, is an important resource for spoken dialogue research, but is difficult to collect at scale. Most in-the-wild two-speaker dialogue is available only as degraded monaural mixtures, making it unsuitable for systems requiring clean speaker-wise signals. We propose DialogueSidon, a model for joint restoration and separation of degraded monaural two-speaker dialogue audio. DialogueSidon combines a variational autoencoder (VAE) that compresses speech self-supervised learning (SSL) model features into a compact latent space with a diffusion-based latent predictor that recovers speaker-wise latent representations from the degraded mixture. Experiments on English, multilingual, and in-the-wild dialogue datasets show that DialogueSidon substantially improves intelligibility and separation quality over a baseline, while also achieving much faster inference.
Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision--language models (VLMs) suffer from high inference latency due to sequential token decoding. Diffusion-based models offer a promising alternative through parallel generation, but they still require multiple denoising iterations. Compressing multi-step denoising to a single step could further reduce latency, but often degrades textual coherence due to the mean-field bias introduced by token-factorized denoisers. To address this challenge, we propose \textbf{ECHO}, an efficient diffusion-based VLM (dVLM) for chest X-ray report generation. ECHO enables stable one-step-per-block inference via a novel Direct Conditional Distillation (DCD) framework, which mitigates the mean-field limitation by constructing unfactorized supervision from on-policy diffusion trajectories to encode joint token dependencies. In addition, we introduce a Response-Asymmetric Diffusion (RAD) training strategy that further improves training efficiency while maintaining model effectiveness. Extensive experiments demonstrate that ECHO surpasses state-of-the-art autoregressive methods, improving RaTE and SemScore by \textbf{64.33\%} and \textbf{60.58\%} respectively, while achieving an \textbf{$8\times$} inference speedup without compromising clinical accuracy.
In this paper, we investigate a data-driven framework to solve Linear Quadratic Regulator (LQR) problems when the dynamics is unknown, with the additional challenge of providing stability certificates for the overall learning and control scheme. Specifically, in the proposed on-policy learning framework, the control input is applied to the actual (unknown) linear system while iteratively optimized. We propose a learning and control procedure, termed Relearn LQR, that combines a recursive least squares method with a direct policy search based on the gradient method. The resulting scheme is analyzed by modeling it as a feedback-interconnected nonlinear dynamical system. A Lyapunov-based approach, exploiting averaging and timescale separation theories for nonlinear systems, allows us to provide formal stability guarantees for the whole interconnected scheme. The effectiveness of the proposed strategy is corroborated by numerical simulations, where Relearn LQR is deployed on an aircraft control problem, with both static and drifting parameters.
Wildfires and other extreme weather conditions driven by climate change are stressing the aging electrical infrastructure. Power utilities have implemented public safety power shutoffs to mitigate wildfire risk by proactively de-energizing some power lines, which leaves customers without power. System operators must therefore balance de-energizing power lines to avoid wildfire risk against keeping those lines energized to serve demand. In this work, given a quantified wildfire ignition risk for each line, a resilient operation problem is presented for power systems with a high penetration of renewable generation resources. A two-stage robust optimization problem is formulated and solved using a column-and-constraint generation algorithm to find an improved balance between the de-energization of power lines and the customers served. The impact of different penetration levels of renewable generation on the energization of customers during extreme fire hazard situations is assessed. The validity of the presented robust optimization algorithm is demonstrated on various test cases.
A two-layer control architecture is proposed to enable scalable implementations for constraint-based decision strategies, such as model predictive controllers. The bottom layer is based upon a distributed feedback-feedforward scheme that directs the controlled network's information flow according to a pre-specified communication infrastructure. Explicit expressions for the resulting closed-loop maps are obtained, and an offline model-matching procedure is proposed for designing the first layer. The obtained control laws are deployed via distributed state-space-based implementations, and the resulting closed-loop models enable predictive control design for the constraint management procedure described in our companion paper.
Autonomous Sensory Meridian Response (ASMR) has become remarkably popular in the past decade, yet whether its effects can be deliberately engineered remains an open question. While ASMR effects have been validated through behavioral studies and neuro-physiological measurements such as electroencephalography (EEG) and related bio-signals, the acoustic mechanisms that trigger them remain poorly understood. We investigate whether ASMR responses can be systematically induced through controlled acoustic design, hypothesizing that cyclic patterns, in which predictability drives relaxation and variation sustains intrigue, are key engineerable parameters. Specifically, we design cyclic sound patterns with varying predictability and randomness, and evaluate their effects via a structured user study. Signal processing-based feature extraction and regression analysis are used to establish an interpretable mapping between acoustic structure and perceived ASMR effects. Results show that relaxing effects accumulate progressively, are independent of spatial orientation, and remain stable across time. Crucially, smoothly spread, energy-dense cyclic patterns most effectively trigger ASMR, suggesting that signal-level engineering of ASMR experiences is achievable.
In this paper, we show that an eXtremely Large (XL) Multiple-Input Multiple-Output (MIMO) wireless system with appropriate analog combining components exhibits the properties of a universal function approximator, similar to a feedforward neural network. By treating the channel coefficients as the random nodes of a hidden layer and the receiver's analog combiner as a trainable output layer, we cast the XL MIMO system to the Extreme Learning Machine (ELM) framework, leading to a novel formulation for Over-The-Air (OTA) edge inference without requiring traditional digital processing nor pre-processing at the transmitter. Through theoretical analysis and numerical evaluation, we showcase that XL-MIMO-ELM enables near-instantaneous training and efficient classification, even in varying fading conditions, suggesting the paradigm shift of beyond massive MIMO systems as OTA artificial neural networks alongside their profound communications role. Compared to conventional ELMs and deep learning approaches, whose training takes seconds to minutes, the proposed framework achieves on par performance (above $90\%$ classification accuracy across multiple data sets) with optimization latency of few milliseconds under the same number of trainable parameters, considering rich fading, low noise channels with XL receive antennas, making it highly attractive for inference tasks with ultra-low-power devices.
This paper proposes a neural stochastic optimization method for efficiently solving the two-stage stochastic unit commitment (2S-SUC) problem under high-dimensional uncertainty scenarios. The proposed method approximates the second-stage recourse problem using a deep neural network trained to map commitment decisions and uncertainty features to recourse costs. The trained network is subsequently embedded into the first-stage UC problem as a mixed-integer linear program (MILP), allowing for explicit enforcement of operational constraints while preserving the key uncertainty characteristics. A scenario-embedding network is employed to enable dimensionality reduction and feature aggregation across arbitrary scenario sets, serving as a data-driven scenario reduction mechanism. Numerical experiments on IEEE 5-bus, 30-bus, and 118-bus systems demonstrate that the proposed neural two-stage stochastic optimization method achieves solutions with an optimality gap of less than 1%, while enabling orders-of-magnitude speedup compared to conventional MILP solvers and decomposition-based methods. Moreover, the model's size remains constant regardless of the number of scenarios, offering significant scalability for large-scale stochastic unit commitment problems.
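Embedding a trained ReLU network into a MILP is typically done with a big-M encoding of each unit; the formulation below is the standard encoding, stated as an assumption about the form such an embedding takes (the paper's exact constraints may differ).

```latex
% Big-M encoding of one ReLU unit z = \max(0,\, w^\top x + b),
% with binary indicator \delta and a valid bound M on |w^\top x + b|:
\begin{align*}
  z &\ge w^\top x + b, &  z &\ge 0,\\
  z &\le w^\top x + b + M(1-\delta), &  z &\le M\delta, \qquad \delta \in \{0,1\}.
\end{align*}
```

Stacking these constraints layer by layer turns the trained recourse-cost network into linear constraints and binaries inside the first-stage MILP, so the commitment variables and the learned cost surrogate are optimized jointly.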
Two-stage stochastic unit commitment (2S-SUC) problems have been widely adopted to manage the uncertainties introduced by high penetrations of intermittent renewable energy resources. While decomposition-based algorithms such as column-and-constraint generation have been proposed to solve these problems, they remain computationally prohibitive for large-scale, real-time applications. In this paper, we introduce a Neural Column-and-Constraint Generation (Neural CCG) method to significantly accelerate the solution of 2S-SUC problems. The proposed approach integrates a neural network that approximates the second-stage recourse problem by learning from high-level features of operational scenarios and the first-stage commitment decisions. This neural estimator is embedded within the CCG framework, replacing repeated subproblem solving with rapid neural evaluations. We validate the effectiveness of the proposed method on the IEEE 118-bus system. Compared to the original CCG and a state-of-the-art commercial solver, Neural CCG achieves up to 130.1$\times$ speedup while maintaining a mean optimality gap below 0.096\%, demonstrating its strong potential for scalable stochastic optimization in power systems.
The increasing integration of intermittent distributed energy resources (DERs) has introduced significant variability in distribution networks, posing challenges to voltage regulation and reactive power management. This paper presents a novel neural two-stage stochastic Volt-VAR optimization (2S-VVO) method for three-phase unbalanced distribution systems considering network reconfiguration under uncertainty. To address the computational intractability associated with solving large-scale scenario-based 2S-VVO problems, a learning-based acceleration strategy is introduced, wherein the second-stage recourse model is approximated by a neural network. This neural approximation is embedded into the optimization model as a mixed-integer linear program (MILP), enabling effective enforcement of operational constraints related to the first-stage decisions. Numerical simulations on a 123-bus unbalanced distribution system demonstrate that the proposed approach achieves over 50 times speedup compared to conventional solvers and decomposition methods, while maintaining a typical optimality gap below 0.30%. These results underscore the method's efficacy and scalability in addressing large-scale stochastic VVO problems under practical operating conditions.
The integration of distributed energy resources (DERs) into wholesale electricity markets, as mandated by FERC Order 2222, imposes new challenges on system operations. To remain consistent with existing market structures, regional transmission organizations (RTOs) have advanced the aggregation of transmission-node-level DERs (T-DERs), where a nodal virtual power plant (VPP) represents the mapping of all distribution-level DERs to their respective transmission nodes. This paper develops a real-time economic dispatch (RTED) framework that enables multi-transmission-node DER aggregation while addressing computational efficiency. To this end, we introduce a spatio-temporal graph convolutional network (ST-GCN) for adaptive prediction of distribution factors (DFs), thereby capturing the dynamic influence of individual T-DERs across the transmission system. Furthermore, an iterative constraint identification strategy is incorporated to alleviate transmission security constraints without compromising system reliability. Together, these innovations accelerate the market clearing process and support the effective participation of T-DER aggregators under current market paradigms. The proposed approach is validated on large-scale test systems, including modified 118-, 2383-, and 3012-bus networks under a rolling RTED setting with real demand data. Numerical results demonstrate significant improvements in reducing operational costs and maintaining transmission network feasibility, underscoring the scalability and practicality of the proposed framework.
This contribution develops an algebraic approach to obtain a controller form for a class of linear hyperbolic MIMO systems, bidirectionally coupled with a linear ODE system at the unactuated boundary. After a short summary of established controller forms for SISO and MIMO ODE as well as SISO hyperbolic PDE systems, it is shown that the approach to state a controller form for SISO systems cannot easily be transferred to the MIMO case as it already fails for a very simple example. Next, a generalised hyperbolic controller form with different variants is proposed and a new flatness-based scheme to compute said form is presented. Therein, the system is treated in an algebraic setting where quasipolynomials are used to express the predictions and delays in the system. The proposed algorithm is then applied to the motivating example.
This paper derives conditions under which Model Predictive Control (MPC) with terminal conditions, using a data-driven surrogate model as a prediction model, asymptotically stabilizes the plant despite approximation errors. In particular, we prove recursive feasibility and asymptotic stability if a proportional error bound holds, where proportional means that the bound is linear in the norm of the state and the input. For a broad class of nonlinear systems, this condition can be satisfied using data-driven surrogate models generated by kernel Extended Dynamic Mode Decomposition (kEDMD) using the Koopman operator. Finally, the applicability of the proposed framework is demonstrated in a numerical case study.
The emergence of commercial satellite communications networks, such as Starlink and OneWeb, has significantly transformed the communications landscape in recent years. As a complement to terrestrial cellular networks, non-terrestrial systems enable coverage extension and reliability enhancement beyond the limits of conventional infrastructure. Currently, the high reliance on terrestrial networks exposes communications to vulnerabilities in the event of terrestrial infrastructure failures, e.g., due to natural disasters. Therefore, this work proposes the joint evaluation of Key Performance Indicators (KPIs) for two non-terrestrial satellite networks (Starlink and OneWeb) and two terrestrial cellular networks to assess the current performance of these technologies across three different environments: (i) urban, (ii) suburban, and (iii) forest scenarios. Additionally, multi-connectivity techniques are explored to determine the benefits in connectivity when two technologies are used simultaneously. For instance, the outage probability of Starlink and OneWeb in urban areas is reduced from approximately 12-21% to 2% when both solutions are employed together. Finally, the joint analysis of KPIs in both terrestrial and non-terrestrial networks demonstrates that their integration enhances coverage, improves performance, and increases reliability, highlighting the benefits of combining satellite and terrestrial systems in the analyzed environments.
In this paper, we consider power allocation and antenna activation in cell-free massive multiple-input multiple-output (CFmMIMO) systems. We first derive closed-form expressions for the system spectral efficiency (SE) and energy efficiency (EE) as functions of the power allocation coefficients and the number of active antennas at the access points (APs). Then, we aim to enhance the EE by jointly optimizing antenna activation and power control. This task leads to a non-convex, mixed-integer design problem with high-dimensional design variables. To address this, we propose a novel DRL-based framework, in which the agent learns to map large-scale fading coefficients to AP activation ratio, antenna coefficient, and power coefficient. These coefficients are then employed to determine the number of active antennas per AP and the power factors assigned to users based on closed-form expressions. By optimizing these parameters instead of directly controlling antenna selection and power allocation, the proposed method transforms the intractable optimization into a low-dimensional learning task. Our extensive simulations demonstrate the efficiency and scalability of the proposed scheme. Specifically, in a CFmMIMO system with 40 APs and 20 users, it achieves a 50% EE improvement and a 3350-fold runtime reduction compared to the conventional sequential convex approximation method.
Hybrid reconfigurable intelligent surfaces (HRIS) enhance wireless systems by combining passive reflection with active signal amplification. However, jointly optimizing the transmit beamforming with the HRIS reflection and amplification coefficients to maximize spectral efficiency (SE) is a non-convex problem, and conventional iterative solutions are computationally intensive. To address this, we propose a deep reinforcement learning (DRL) framework that learns a direct mapping from channel state information to the near-optimal transmit beamforming and HRIS configurations. The DRL model is trained offline, after which it can compute the beamforming and HRIS configurations with low complexity and latency. Simulation results demonstrate that our DRL-based method achieves 95% of the SE obtained by the alternating optimization benchmark, while significantly lowering the computational complexity.
We study the problem of implementing a fully-connected layer of a neural network using wireless over-the-air computing. We consider a system with a multi-antenna transmitter and receiver, connected through a number of multi-hop amplify-and-forward relay devices. We formulate an optimization problem over the transmitter precoder, receiver combiner, and amplify-and-forward gains, subject to relay and transmitter power constraints, and propose an alternating optimization framework that optimizes the imitation accuracy. Simulation results reveal that multi-hop relaying achieves almost perfect classification accuracy when used in a neural network.
This paper is the first to consider pursuit-evasion (PE) differential games in which both pursuer and evader hold irrational perceptions of the probabilistic characteristics of environmental uncertainty. First, the irrational perceptions of risk aversion and probability sensitivity are modeled and incorporated within a Bayesian PE differential game framework using the Cumulative Prospect Theory (CPT) approach; second, several sufficient conditions for capturability are established in terms of the system dynamics and irrationality parameters; finally, the existence of CPT-Nash equilibria is rigorously analyzed by invoking Brouwer's fixed-point theorem. The new results reveal that irrational behaviors benefit the pursuer in some cases and the evader in others: certain captures that are unachievable under rational behavior become achievable under irrational behavior. By bridging irrational behavioral theory with game-theoretic control, this framework establishes a rigorous theoretical foundation for practical control engineering within complex human-machine systems.
This work presents a hybrid physics-informed and data-driven modeling framework for predictive control of autonomous off-road vehicles operating on deformable terrain. Traditional high-fidelity terramechanics models are often too computationally demanding to be directly used in control design. Modern Koopman operator methods can be used to represent the complex terramechanics and vehicle dynamics in a linear form. We develop a framework whereby a Koopman linear system can be constructed using data from simulations of a vehicle moving on deformable terrain. For vehicle simulations, the deformable-terrain terramechanics are modeled using Bekker-Wong theory, and the vehicle is represented as a simplified five-degree-of-freedom (5-DOF) system. The Koopman operators are identified from large simulation datasets for sandy loam and clay using a recursive subspace identification method, where Grassmannian distance is used to prioritize informative data segments during training. The advantage of this approach is that the Koopman operator learned from simulations can be updated with data from the physical system in a seamless manner, making this a hybrid physics-informed and data-driven approach. Prediction results demonstrate stable short-horizon accuracy and robustness under mild terrain-height variations. When embedded in a constrained MPC, the learned predictor enables stable closed-loop tracking of aggressive maneuvers while satisfying steering and torque limits.
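The Koopman identification step can be sketched in its simplest batch (EDMD) form: lift states through a dictionary of observables and fit a linear operator by least squares. The toy scalar system and dictionary below are assumptions for illustration; the paper itself uses a recursive subspace method with Grassmannian-distance weighting of data segments and a 5-DOF terramechanics simulator.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy weakly nonlinear system: x+ = 0.9 x + 0.1 x^2
def step(x):
    return 0.9 * x + 0.1 * x**2

def lift(x):
    # dictionary of observables: [x, x^2]
    return np.array([x, x**2])

# Sample one-step transition pairs and lift them.
X0 = rng.uniform(-0.5, 0.5, 500)
Phi = np.stack([lift(x) for x in X0])                # lifted states
Phi_next = np.stack([lift(step(x)) for x in X0])     # lifted successors

# EDMD: least-squares fit of a linear operator on the lifted space,
# i.e. Phi @ K ≈ Phi_next.
K, *_ = np.linalg.lstsq(Phi, Phi_next, rcond=None)

# One-step prediction of x is the first lifted coordinate.
x_test = 0.3
pred = lift(x_test) @ K
print(abs(pred[0] - step(x_test)) < 1e-2)  # → True (tiny one-step error)
```

Because the first observable's dynamics lie exactly in the span of the dictionary here, the linear model predicts the nonlinear step almost exactly; in practice the dictionary only approximates the Koopman-invariant subspace, and new data from the physical system can refine K recursively.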
High-performance piezoelectric resonators are promising energy storage elements for piezoelectric power conversion due to their compact footprint and low loss at frequencies where conventional magnetic components become bulky and inefficient. However, their practical use is often limited by the trade-off between a high electromechanical coupling coefficient (k^2) for wide-band operation and the emergence of spurious acoustic modes that limit the resonators' inductive bandwidth. This work reports a spurious-free thickness-extensional (TE)-mode bulk acoustic wave (BAW) resonator in single-crystal lithium niobate (LN) based on a grounded-ring electrode architecture. The proposed structure is analyzed through simulation and experimentally validated using electrical characterization and laser Doppler vibrometry (LDV). The results show that the grounded ring modifies the effective boundary conditions of the acoustic device, enabling a piston-like modal response that suppresses lateral spurious modes across the inductive band. The demonstrated device operates at 10.14 MHz and achieves an electromechanical coupling coefficient of 29.6%, a maximum in-band Bode quality factor (Q_Bode) of 5230, and a figure of merit (FoM, Q*k^2) of 1548. These results establish the grounded-ring TE-mode LN BAW resonator as a practical platform for piezoelectric power conversion and a broader design approach for realizing high-performance spurious-free acoustic resonators.
This paper presents a detailed measurement campaign and a comprehensive analysis of 15 GHz ultra-massive multiple-input multiple-output (UM-MIMO) channels tailored for the urban microcell (UMi) environment. Channel sounding is performed over 14.875-15.125 GHz using a time-domain platform comprising a 128-element L-shaped transmit array and a 64-element square receive array. Four representative scenarios are investigated, namely near-field line-of-sight (LoS), near-field foliage-shaded, far-field foliage-shaded, and far-field LoS street canyon scenarios, resulting in 81 distinct transmit-receive links. Based on the measured data, conventional channel characteristics, including path loss, power delay angle profiles, delay spread, and angular spread, are characterized, while UM-MIMO-specific phenomena associated with near-field effects, spatial non-stationarity (SNS), and channel hardening (CHD) are quantitatively analyzed. Channel capacity is further evaluated to reveal the effects of different UMi propagation conditions on system performance. The reported results provide empirical support for the new mid-band spectrum (6-24 GHz, including Frequency Range 3 (FR3)) UM-MIMO channel modeling and offer practical guidance for the design and deployment of future sixth-generation (6G) microcell networks.
The recent development of connected and automated vehicle (CAV) technologies has spurred investigations into optimizing dense urban traffic to maximize vehicle speed and throughput. This paper explores advisory autonomy, in which real-time driving advisories are issued to human drivers, thus approaching the near-term performance of automated vehicles. Due to the complexity of traffic systems, recent studies of coordinating CAVs have resorted to deep reinforcement learning (RL). We formalize coarse-grained advisory as a zero-order hold and consider hold durations ranging from 0.1 to 40 seconds. However, despite the similarity to higher-frequency CAV control tasks, a direct application of deep RL fails to generalize to advisory autonomy tasks. To overcome this, we use zero-shot transfer, training policies on a set of source tasks--specific traffic scenarios with designated hold durations--and then evaluating these policies on different target tasks. We introduce Temporal Transfer Learning (TTL) algorithms that systematically leverage this temporal structure to select source tasks for zero-shot transfer, solving the full range of tasks. TTL selects the source tasks most suitable for maximizing performance across the range of tasks. We validate our algorithms on diverse mixed-traffic scenarios, demonstrating that TTL solves the tasks more reliably than baselines. This paper underscores the potential of coarse-grained advisory autonomy with TTL for traffic flow optimization.
This paper investigates a class of distributed Variational Generalized Nash Equilibrium (VGNE) seeking problems for both online noncooperative games and online aggregative games with time-varying coupling inequality constraints. Two novel continuous-time distributed VGNE seeking algorithms are proposed, which achieve a constant regret bound and a sublinear fit bound, superior to the existing criteria for online optimization problems and online games. Furthermore, to reduce unnecessary communication among players, a dynamic event-triggered mechanism involving internal variables is introduced into the distributed VGNE seeking algorithm, while the constant regret bound and sublinear fit bound are still maintained and Zeno behavior is strictly excluded. Moreover, we investigate the impact of communication noise on each player's measurement of its neighbors' relative states. Both the regret and fit bounds are shown to remain valid as long as the noise level is not excessively large, which reveals, to some extent, the proposed algorithm's resilience to noise. Finally, an online Uncrewed Aerial Vehicle (UAV) swarm game and an online Nash-Cournot game are given to demonstrate the validity of the theoretical results.
This paper studies equality-constrained composite minimization problems. This class of problems, capturing regularization terms and inequality constraints, naturally arises in a wide range of engineering and machine learning applications. To tackle these optimization problems, inspired by recent results, we introduce the \emph{proportional--integral proximal gradient dynamics} (PI--PGD): a closed-loop system in which the Lagrange multipliers are control inputs and the states are the problem's decision variables. First, we establish the equivalence between the stationary points of the minimization problem and the equilibria of the PI--PGD. Then, for the case of affine constraints, by leveraging tools from contraction theory, we give a comprehensive convergence analysis for the dynamics, showing linear--exponential convergence towards the equilibrium: the distance between each solution and the equilibrium is upper bounded by a function that first decreases linearly and then exponentially. Our findings are illustrated numerically on a set of representative examples, which include an exploratory application to nonlinear equality constraints.
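The unconstrained core of this setting can be illustrated with a basic discrete-time proximal gradient iteration for a composite objective f(x) + g(x). This is only a building-block sketch under illustrative assumptions (an L1 regularizer and a quadratic smooth term); the PI--PGD itself is a continuous-time closed-loop system that additionally drives the equality constraints to zero via the multiplier inputs.

```python
import numpy as np

def prox_l1(v, t):
    # proximal operator of t * ||.||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(grad_f, prox_g, x0, step, iters=1000):
    """Basic proximal gradient for min f(x) + g(x):
    x <- prox_{step*g}(x - step * grad_f(x))."""
    x = x0
    for _ in range(iters):
        x = prox_g(x - step * grad_f(x), step)
    return x

# toy lasso-type problem: min 0.5*(x - 2)^2 + |x|
# optimality: x - 2 + sign(x) = 0 with x > 0, so x* = 1
x = proximal_gradient(lambda x: x - 2.0, prox_l1, np.zeros(1), step=0.5)
assert np.isclose(x[0], 1.0)
```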
This paper introduces an effective framework for designing memoryless dissipative full-state feedback for general linear delay systems via the Krasovskiĭ functional (KF) approach, where an arbitrary finite number of pointwise and general distributed delays (DDs) exists in the state, input and output. To handle the infinite dimensionality of the DDs, we employ the Kronecker-Seuret Decomposition (KSD), which we recently proposed for analyzing matrix-valued functions in the context of delay systems. The KSD enables factorization or least-squares approximation of any number of $L^2$ DD kernels from any number of DDs without introducing conservatism. This also facilitates the construction of a complete-type KF with flexible integral kernels by means of a novel integral inequality derived from the least-squares principle. Our solution includes two theorems and an iterative algorithm to compute controller gains without relying on nonlinear solvers. A numerical example is presented to demonstrate the effectiveness of the proposed approach.
While recent Multimodal Large Language Models exhibit impressive capabilities for general multimodal tasks, specialized domains like music necessitate tailored approaches. Music Audio-Visual Question Answering (Music AVQA) particularly underscores this, presenting unique challenges with its continuous, densely layered audio-visual content, intricate temporal dynamics, and the critical need for domain-specific knowledge. Through a systematic analysis of Music AVQA datasets and methods, this paper identifies that specialized input processing, architectures incorporating dedicated spatial-temporal designs, and music-specific modeling strategies are critical for success in this domain. Our study provides valuable insights for researchers by highlighting effective design patterns empirically linked to strong performance, proposing concrete future directions for incorporating musical priors, and aiming to establish a robust foundation for advancing multimodal musical understanding. We aim to encourage further research in this area and provide a GitHub repository of relevant works: this https URL.
We study the factor model problem, which aims to uncover low-dimensional structures in high-dimensional datasets. Adopting a robust data-driven approach, we formulate the problem as a saddle-point optimization. Our primary contribution is a first-order algorithm that solves this reformulation by leveraging a linear minimization oracle (LMO). We further develop semi-closed form solutions (up to a scalar) for three specific LMOs, corresponding to the Frobenius norm, Kullback-Leibler divergence, and Gelbrich (aka Wasserstein) distance. The analysis includes explicit quantification of these LMOs' regularity conditions, notably the Lipschitz constants of the dual function, which govern the algorithm's convergence performance. Numerical experiments confirm our method's effectiveness in high-dimensional settings, outperforming standard off-the-shelf optimization solvers.
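Of the three LMOs mentioned above, the Frobenius-norm case admits the simplest closed form, sketched below for illustration. The radius variable and the example matrix are illustrative assumptions; the paper's saddle-point reformulation and the other two LMOs are not reproduced here.

```python
import numpy as np

def lmo_frobenius(G, r):
    """Linear minimization oracle over the Frobenius-norm ball of radius r:
    argmin_{||S||_F <= r} <G, S> = -r * G / ||G||_F
    (any feasible point is optimal when G = 0)."""
    nrm = np.linalg.norm(G, "fro")
    if nrm == 0.0:
        return np.zeros_like(G)
    return -r * G / nrm

# sanity check: the attained value is -r * ||G||_F
G = np.array([[3.0, 0.0], [0.0, 4.0]])  # ||G||_F = 5
S = lmo_frobenius(G, r=1.0)
assert np.isclose(np.sum(G * S), -5.0)
```

First-order methods of the conditional-gradient family call exactly such an oracle once per iteration, which is what makes the semi-closed forms above computationally attractive.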
Traditional neural networks struggle to capture the spectral structure of complex signals. Fourier neural networks (FNNs) attempt to address this by embedding Fourier series components, yet many real-world signals are almost-periodic with non-commensurate frequencies, posing additional challenges. Building on prior work showing that ARIMA outperforms large language models (LLMs) for time series forecasting, we extend the comparison to neural predictors and find that ARIMA still maintains a clear advantage. Inspired by this finding, we propose the Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network (AR-KAN). Based on the Universal Myopic Mapping Theorem, it integrates a pre-trained AR module for temporal memory with a KAN for nonlinear representation. We prove that the AR module preserves essential temporal features while reducing redundancy, and that the upper bound on the approximation error of AR-KAN is smaller than that of KAN in a probabilistic sense. Experimental results also demonstrate that AR-KAN delivers exceptional performance compared to existing models, both on synthetic almost-periodic functions and real-world datasets. These results highlight AR-KAN as a robust and effective framework for time series forecasting. Our code is available at this https URL.
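The AR component of such a hybrid model can be pre-trained with an ordinary least-squares fit, as in the minimal sketch below. This is a generic AR fit under illustrative assumptions (no intercept, exact AR(1) toy data), not the specific pre-training procedure used for AR-KAN.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of an order-p autoregressive model
    x_t ~= sum_{j=1..p} a_j * x_{t-j} (no intercept term)."""
    # lagged design matrix: column j-1 holds x_{t-j} for t = p..T-1
    X = np.column_stack([series[p - j: len(series) - j] for j in range(1, p + 1)])
    y = series[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

# toy check: data generated by x_t = 0.5 * x_{t-1} recovers a = [0.5]
series = 0.5 ** np.arange(30)
a = fit_ar(series, p=1)
assert np.allclose(a, [0.5])
```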
Proximal methods such as the Alternating Direction Method of Multipliers (ADMM) are effective at solving constrained quadratic programs (QPs). To handle potentially infeasible QPs, slack variables are often introduced to ensure feasibility, which changes the structure of the problem, increases its size, and slows down numerical resolution. In this letter, we propose a simple ADMM scheme that handles QPs with slack variables without increasing the size of the original problem. The only modification is a slightly different projection in the z-update, while the rest of the algorithm remains standard. We prove that the method is equivalent to applying ADMM to the QP augmented with slack variables, even though no slack variables are added. Numerical experiments demonstrate the speedups achieved by the approach.
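For context, a vanilla ADMM scheme for a box-constrained QP looks as follows; the z-update is a plain projection onto [l, u], which is exactly the step the letter modifies (the slack-aware projection itself is not reproduced here). The toy problem and parameter choices are illustrative assumptions.

```python
import numpy as np

def admm_qp(P, q, A, l, u, rho=1.0, iters=500):
    """Vanilla ADMM for  min 1/2 x'Px + q'x  s.t.  l <= A x <= u,
    via the splitting A x = z with z constrained to the box [l, u]."""
    n, m = P.shape[0], A.shape[0]
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    M = np.linalg.inv(P + rho * A.T @ A)   # fine for small dense problems
    for _ in range(iters):
        x = M @ (-q + A.T @ (rho * z - y))  # x-update: linear solve
        z = np.clip(A @ x + y / rho, l, u)  # z-update: box projection
        y = y + rho * (A @ x - z)           # dual update
    return x

# toy QP: min 0.5*x^2 - x  s.t.  0 <= x <= 0.3  ->  x* = 0.3
x = admm_qp(np.array([[1.0]]), np.array([-1.0]),
            np.array([[1.0]]), np.array([0.0]), np.array([0.3]))
assert np.isclose(x[0], 0.3, atol=1e-4)
```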
We study feasibility guarantees for safety filters developed using Control Barrier Functions (CBFs) when a safe set is defined using the pointwise minimum of continuously differentiable functions, a construction that is common for the backup CBF (BCBF) method and typically nonsmooth. We replace the minimum by its log-sum-exp (soft-min) smoothing and show that, under a strict safety condition, the smooth function becomes a CBF (or extended CBF) for a range of the smoothing parameter. For compact safe sets, we derive an explicit lower bound on the smoothing parameter that makes the smooth function a CBF and hence renders the corresponding safety constraint feasible. For unbounded sets, we introduce tail conditions under which the smooth function satisfies an extended CBF condition uniformly. Finally, we apply these results to BCBFs. We show that safety of a compact (terminal) backup set under a backup controller, together with a condition ensuring safety of the backup trajectories on the relevant boundary of the safe set, is sufficient for constraint feasibility for BCBFs. These results provide a recipe for a priori feasibility guarantees for smooth inner approximations of nonsmooth safe sets without the need for additional online certification.
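The log-sum-exp soft-min used above has a short, self-contained form; the sketch below (with illustrative values of the smoothing parameter) shows the standard sandwich bound that makes the smoothed set an inner approximation of the original safe set.

```python
import numpy as np

def soft_min(h, kappa):
    """Log-sum-exp (soft-min) smoothing of min_i h_i:
    h_k = -(1/kappa) * log(sum_i exp(-kappa * h_i)).
    Satisfies  min(h) - log(m)/kappa <= h_k <= min(h),
    so {x : h_k(x) >= 0} is a smooth inner approximation
    of the nonsmooth safe set {x : min_i h_i(x) >= 0}."""
    h = np.asarray(h, dtype=float)
    hmin = h.min()
    # shift by the min for numerical stability before exponentiating
    return hmin - np.log(np.sum(np.exp(-kappa * (h - hmin)))) / kappa

h = [1.0, 2.0, 3.0]
for kappa in (1.0, 10.0, 100.0):
    hk = soft_min(h, kappa)
    assert min(h) - np.log(len(h)) / kappa <= hk <= min(h)
```

As kappa grows, hk approaches min(h) from below, consistent with the lower bounds on the smoothing parameter derived in the paper.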
This paper develops a Finsler-based LMI for robust $\mathcal{H}_\infty$ observer design with integral quadratic constraints (IQCs) and block-structured uncertainty. By introducing a slack variable that relaxes the coupling between the Lyapunov matrix, the observer gain, and the IQC multiplier, the formulation addresses two limitations of the standard block-diagonal approach: the LMI requirement $\mathrm{He}(PA) \prec 0$ (which fails for marginally stable dynamics), and a multiplier--Lyapunov trade-off that causes infeasibility for wide uncertainty ranges. For marginally stable dynamics, artificial damping in the design model balances certified versus actual performance. The framework is demonstrated on quaternion attitude estimation with angular velocity uncertainty and mass-spring-damper state estimation with uncertain physical parameters.