Direct-to-cell (D2C) satellite communications have emerged as a crucial alternative and complement to terrestrial communication networks for autonomous operations due to their wide-area coverage capability. Unlike human-oriented communications, robot-oriented D2C communications in autonomous operations place greater emphasis on task completion rather than solely on data transmission. Such differences require us to evaluate the performance of each stage in the system and consider the integrated optimization. Motivated by this, we model the system in a sensing-communication-computing-control (SC3) closed-loop manner and analyze it from an entropy-based perspective, based on which a task-oriented system design method is developed. Furthermore, to manage the complexity of the closed-loop system, we decompose it into fine-grained functional structures and investigate the key challenges associated with collaborative sensing, collaborative computing, and collaborative control. A case study is presented to compare the proposed task-oriented scheme with conventional communication-oriented schemes, showing that the proposed method performs better in terms of task controlling. Finally, we discuss several open research challenges that need to be addressed for practical implementation.
Radar signal recognition in open electromagnetic environments is challenging due to diverse operating modes and unseen radar types. Existing methods often overlook position relations in pulse sequences, limiting their ability to capture semantic dependencies over time. We propose RadarPos, a position-aware self-supervised framework that leverages pulse-level temporal dynamics without complex augmentations or masking, providing improved position relation modeling over contrastive learning or masked reconstruction. Using this framework, we evaluate cross-mode radar signal recognition under the long-tailed setting to assess adaptability and generalization. Experimental results demonstrate enhanced discriminability and robustness, highlighting practical applicability in real-world electromagnetic environments.
Surrogate modeling of wave propagation and scattering (i.e. the wave speed and source to wave field map) in heterogeneous media has significant potential in applications such as seismic imaging and inversion. High-contrast settings, such as subsurface models with salt bodies, exhibit strong scattering and phase sensitivity that challenge existing neural operators. We propose a hybrid architecture that decomposes the scattering operator into two separate contributions: a smooth background propagation and a high-contrast scattering correction. The smooth component is learned with a Fourier Neural Operator (FNO), which produces globally coupled feature tokens encoding background wave propagation; these tokens are then passed to a vision transformer, where attention is used to model the high-contrast scattering correction dominated by strong, spatial interactions. Evaluated on high-frequency Helmholtz problems with strong contrasts, the hybrid model achieves substantially improved phase and amplitude accuracy compared to standalone FNOs or transformers, with favorable accuracy-parameter scaling.
In recent years, artificial neural networks have been increasingly studied as feedback controllers for guidance problems. While effective in complex scenarios, they lack the verification guarantees found in classical guidance policies. Their black-box nature creates significant concerns regarding trustworthiness, limiting their adoption in safety-critical spaceflight applications. This work addresses this gap by developing a method to assess the safety of a trained neural network feedback controller via automatic domain splitting and polynomial bounding. The methodology involves embedding the trained neural network into the system's dynamical equations, rendering the closed-loop system autonomous. The system flow is then approximated by high-order Taylor polynomials, which are subsequently manipulated to construct polynomial maps that project state uncertainties onto an event manifold. Automatic domain splitting ensures the polynomials are accurate over their relevant subdomains, whilst also allowing an extensive state-space to be analysed efficiently. Utilising polynomial bounding techniques, the resulting event values may be rigorously constrained and analysed within individual subdomains, thereby establishing bounds on the range of possible closed-loop outcomes from using such neural network controllers and supporting safety assessment and informed operational decision-making in real-world missions.
Using Bayesian decision theory, we modify the perfect-information, differential game-based guidance law (DGL1) to address the inevitable estimation error occurring when driving this guidance law with a separately-designed state estimator. This yields a stochastic guidance law complying with the generalized separation theorem, as opposed to the common approach, that implicitly, but unjustifiably, assumes the validity of the regular separation theorem. The required posterior probability density function of the game's state is derived from the available noisy measurements using an interacting multiple model particle filter. When the resulting optimal decision turns out to be nonunique, this feature is harnessed to appropriately shape the trajectory of the pursuer so as to enhance its estimator's performance. In addition, certain properties of the particle-based computation of the Bayesian cost are exploited to render the algorithm amenable to real-time implementation. The performance of the entire estimation-decision-guidance scheme is demonstrated using an extensive Monte Carlo simulation study.
The Internet of Bio-Nano Things (IoBNT) requires mobile nanomachines that navigate complex fluids while exchanging molecular signals under external supervision. We introduce the chemo-hydrodynamic transceiver, a unified model for catalytic Janus particles in which an external optical control simultaneously drives molecular emission and active self-propulsion. Unlike common abstractions that decouple mobility and communication, we derive a stochastic channel model that captures their physicochemical coupling and shows that actuation-induced distance jitter can dominate the received-signal variance, yielding a fundamental trade-off: stronger actuation increases emission but can sharply reduce reliability through motion-induced fading. Numerical results reveal a unimodal reliability profile with a critical actuation level beyond which the signal-to-noise ratio collapses, and an optimal control level that scales approximately linearly with link distance. Compared with Brownian-mobility baselines, the model exposes a pronounced estimation gap: neglecting active motility noise can underestimate the bit error probability by orders of magnitude. These findings provide physical-layer guidelines for mobility-aware IoBNT protocol design and closed-loop control of nanorobotic swarms.
As the growing penetration of inverter-based resources (IBRs) in modern power systems, the system strength is decreasing. Due to the inherent difference in short-circuit capacity contributions of synchronous generators and inverters, the short-circuit ratio is not a one-size-fit-all metric to assess the system strength. Following the distinct dynamic behavior of the IBR in small- and large-signal disturbance, the system strength is separated accordingly. To address the large-signal system strength assessment, a control type-dependent metric, Power Margin Ratio (PMR), is proposed in this paper. PMR is defined as the ratio between the maximum power that can be injected to the system without causing any instability and the nominal power of the IBR. It can be obtained via power flow calculation with a modified algorithm. The theoretical foundation of PMR is established from the viewpoint of dynamical systems. PMR is identical to SCR for the single-plant-infinite-bus system, while presents advancement for multi-infeed power systems. Comprehensive case studies and discussions have validated that PMR reveals the large-signal system strength from a static perspective.
This study presents a novel algorithm for identifying ghost targets in automotive radar by estimating complex valued signal strength across a two-dimensional angle grid defined by direction-of-arrival (DOA) and direction-of-departure (DOD). In real-world driving environments, radar signals often undergo multipath propagation due to reflections from surfaces such as guardrails. These indirect paths can produce ghost targets - false detections that appear at incorrect locations - posing challenges to autonomous navigation. A recent method, the Multi-Path Iterative Adaptive Approach (MP-IAA), addresses this by jointly estimating the DOA/DOD angle grid, identifying mismatches as indicators of ghost targets. However, its effectiveness declines in low signal-to-noise ratio (SNR) settings. To enhance robustness, we introduce a physics-inspired regularizer that captures structural patterns inherent to multipath propagation. This regularizer is incorporated into the estimation cost, forming a new loss function that guides our proposed algorithm, TIGRE (Target-Induced angle-Grid Regularized Estimation). TIGRE iteratively minimizes this regularized loss and we show that our proposed regularizer asymptotically enforces L0 sparsity on the DOA/DOD grid. Numerical experiments demonstrate that the proposed method substantially enhances the quality of angle-grid estimation across various multipath scenarios, particularly in low SNR environments, providing a more reliable basis for subsequent ghost target identification.
Electromagnetic wave propagation through complex inhomogeneous walls introduces significant distortions to through-wall radar signatures. Estimation of wall thickness, dielectric, and conductivity profiles may enable wall effects to be deconvolved from target scattering. We propose to use deep neural networks (DNNs) to estimate wall characteristics from broadband scattered electric fields on the same side of the wall as the transmitter. We demonstrate that both single deep artificial and convolutional neural networks and dual networks involving generative adversarial networks are capable of performing the highly nonlinear regression operation of electromagnetic inverse scattering for wall characterization. These networks are trained with simulation data generated from full wave solvers and validated on both simulated and real wall data with approximately 95% accuracy.
Around-the-corner radar (ACR) sensing of targets in non-line-of-sight (NLOS) conditions has been explored for security and surveillance applications and look-ahead warning systems in automotive scenarios. Here, the targets are detected around corners without direct line-of-sight (LOS) propagation by exploiting multipath bounces from the walls. However, the overall detection metrics are weak due to the low strength of the multipath signals. Our study presents the application of reconfigurable intelligent surface (RIS) to improve radar sensing in ACR scenarios by directing incident beams on the RIS into NLOS regions. Experimental results at 5.5 GHz demonstrate that micro-Doppler signatures of the walking motion of humans can now be captured in NLOS conditions through the strategic deployment of RIS.
Around-the-corner radar sensing offers an opportunity for the radar to exploit multipath scattering along walls to detect targets beyond blockages. However, the radar detection performance is limited to spotting uncooperative targets at specular angles. Recently, reconfigurable intelligent surfaces (RIS) involving metasurfaces with tunable unit cells have been researched for enhancing radar coverage around corners by directing beams towards non-specular angles. This article examines how practical considerations regarding the phase tuning of unit cells impact the RIS performance. Specifically, we examine the radar cross-section (RCS) obtained from two RIS configurations: In the first, each atom of the RIS is tuned based on a theoretical analog phase shift to realize idealized one-beam patterns at the desired angles. In the second configuration, each atom of the RIS is tuned based on a low-complexity, one-bit quantized element phase shift, which results in dual symmetric beams. The RIS configurations are then benchmarked with a metal plate of similar dimensions in both simulations and measurements.
Although lip-to-speech synthesis (L2S) has achieved significant progress in recent years, current state-of-the-art methods typically rely on intermediate representations such as mel-spectrograms or discrete self-supervised learning (SSL) tokens. The potential of latent diffusion models (LDMs) in this task remains largely unexplored. In this paper, we introduce SLD-L2S, a novel L2S framework built upon a hierarchical subspace latent diffusion model. Our method aims to directly map visual lip movements to the continuous latent space of a pre-trained neural audio codec, thereby avoiding the information loss inherent in traditional intermediate representations. The core of our method is a hierarchical architecture that processes visual representations through multiple parallel subspaces, initiated by a subspace decomposition module. To efficiently enhance interactions within and between these subspaces, we design the diffusion convolution block (DiCB) as our network backbone. Furthermore, we employ a reparameterized flow matching technique to directly generate the target latent vectors. This enables a principled inclusion of speech language model (SLM) and semantic losses during training, moving beyond conventional flow matching objectives and improving synthesized speech quality. Our experiments show that SLD-L2S achieves state-of-the-art generation quality on multiple benchmark datasets, surpassing existing methods in both objective and subjective evaluations.
This work investigates bidirectional Mamba (BiMamba) for unified streaming and non-streaming automatic speech recognition (ASR). Dynamic chunk size training enables a single model for offline decoding and streaming decoding with various latency settings. In contrast, existing BiMamba based streaming method is limited to fixed chunk size decoding. When dynamic chunk size training is applied, training overhead increases substantially. To tackle this issue, we propose the Trans-Chunk BiMamba (TC-BiMamba) for dynamic chunk size training. Trans-Chunk mechanism trains both bidirectional sequences in an offline style with dynamic chunk size. On the one hand, compared to traditional chunk-wise processing, TC-BiMamba simultaneously achieves 1.3 times training speedup, reduces training memory by 50%, and improves model performance since it can capture bidirectional context. On the other hand, experimental results show that TC-BiMamba outperforms U2++ and matches LC-BiMmaba with smaller model size.
Existing H.265/HEVC video steganalysis research mainly focuses on statistical feature modeling at the levels of motion vectors (MV), intra prediction modes (IPM), or transform coefficients. In contrast, studies targeting the coding-structure level - especially the analysis of block-level steganographic behaviors in Coding Units (CUs) - remain at an early stage. As a core component of H.265/HEVC coding decisions, the CU partition structure often exhibits steganographic perturbations in the form of structural changes and reorganization of prediction relationships, which are difficult to characterize effectively using traditional pixel-domain features or mode statistics. To address this issue, this paper, for the first time from the perspective of CU block-level steganalysis, proposes an H.265/HEVC video steganalysis method based on CU block-structure gradients and intra prediction mode mapping. The proposed method constructs a CU block-structure gradient map to explicitly describe changes in coding-unit partitioning, and combines it with a block-level mapping representation of IPM to jointly model the structural perturbations introduced by CU-level steganographic embedding. On this basis, we design a Transformer network, GradIPMFormer, tailored for CU-block steganalysis, thereby effectively enhancing the capability to perceive CU-level steganographic behaviors. Experimental results show that under different quantization parameters and resolution settings, the proposed method consistently achieves superior detection performance across multiple H.265/HEVC steganographic algorithms, validating the feasibility and effectiveness of conducting video steganalysis from the coding-structure perspective. This study provides a new CU block-level analysis paradigm for H.265/HEVC video steganalysis and has significant research value for covert communication security detection.
This paper presents C-POD, a cloud-native framework that automates the deployment and management of edge pods for seamless remote access and sharing of wireless testbeds. C-POD leverages public cloud resources and edge pods to lower the barrier to over-the-air (OTA) experimentation, enabling researchers to share and access distributed testbeds without extensive local infrastructure. A supporting toolkit has been developed for C-POD to enable flexible and scalable experimental workflows, including containerized edge environments, persistent Secure Shell (SSH) tunnels, and stable graphical interfaces. We prototype and deploy C-POD on the Amazon Web Services (AWS) public cloud to demonstrate its key features, including cloud-assisted edge pod deployment automation, elastic computing resource management, and experiment observability, by integrating two wireless testbeds that focus on RF signal generation and 5G(B) communications, respectively.
This paper investigates beamforming-gain maximization for a fluid reconfigurable intelligent surface (FRIS)-assisted downlink system, where each active port applies a finite-resolution unit-modulus phase selected from a discrete codebook. The resulting design couples the multi-antenna base-station (BS) beamformer with combinatorial FRIS port selection and discrete phase assignment, leading to a highly nonconvex mixed discrete optimization. To address this challenge, we develop an alternating-optimization (AO) framework that alternates between a closed-form maximum-ratio-transmission (MRT) update at the BS and an {optimal} FRIS-configuration update. The key step of the proposed FRIS configuration is a Minkowski-geometry reformulation of the FRIS codebook superposition: by convexifying the feasible reflected-sum set and exploiting support-function identities, we convert the FRIS subproblem into a one-dimensional maximization over a directional parameter. For each direction, the optimal configuration is obtained constructively via per-port directional scoring, Top-$M_o$ port selection, and optimal codeword assignment. For the practically important regular $M_p$-gon phase-shifter codebook, we further derive closed-form score expressions and establish a piecewise-smooth structure of the resulting support function, which leads to a finite critical-angle search that provably identifies the global optimum without exhaustive angular sweeping. Simulation results demonstrate that the proposed framework consistently outperforms benchmarks, achieves near-optimal beamforming gains in exhaustive-search validations, accurately identifies the optimal direction via support-function maximization, and converges rapidly within a few AO iterations.
Accurate upsampling of Head-Related Transfer Functions (HRTFs) from sparse measurements is crucial for personalized spatial audio rendering. Traditional interpolation methods, such as kernel-based weighting or basis function expansions, rely on measurements from a single subject and are limited by the spatial sampling theorem, resulting in significant performance degradation under sparse sampling. Recent learning-based methods alleviate this limitation by leveraging cross-subject information, yet most existing neural architectures primarily focus on modeling spatial relationships across directions, while spectral dependencies along the frequency dimension are often modeled implicitly or treated independently. However, HRTF magnitude responses exhibit strong local continuity and long-range structure in the frequency domain, which are not fully exploited. This work investigates frequency-domain feature modeling by examining how different architectural choices, ranging from per-frequency multilayer perceptrons to convolutional, dilated convolutional, and attention-based models, affect performance under varying sparsity levels, showing that explicit spectral modeling consistently improves reconstruction accuracy, particularly under severe sparsity. Motivated by this observation, a frequency-domain Conformer-based architecture is adopted to jointly capture local spectral continuity and long-range frequency correlations. Experimental results on the SONICOM and HUTUBS datasets demonstrate that the proposed method achieves state-of-the-art performance in terms of interaural level difference and log-spectral distortion.
Portable backscatter imaging systems (PBI) integrate an X-ray source and detector in a single unit, utilizing Compton scattering photons to rapidly acquire superficial or shallow structural information of an inspected object through single-sided imaging. The application of this technology overcomes the limitations of traditional transmission X-ray detection, offering greater flexibility and portability, making it the preferred tool for the rapid and accurate identification of potential threats in scenarios such as borders, ports, and industrial nondestructive security inspections. However, the image quality is significantly compromised due to the limited number of Compton backscattered photons. The insufficient photon counts result primarily from photon absorption in materials, the pencil-beam scanning design, and short signal sampling times. It therefore yields severe image noise and an extremely low signal-to-noise ratio, greatly reducing the accuracy and reliability of PBI systems. To address these challenges, this paper introduces BSoNet, a novel deep learning-based approach specifically designed to optimize the image quality of PBI systems. The approach significantly enhances image clarity, recognition, and contrast while meeting practical application requirements. It transforms PBI systems into more effective and reliable inspection tools, contributing significantly to strengthening security protection.
Ill-posed imaging inverse problems remain challenging due to the ambiguity in mapping degraded observations to clean images. Diffusion-based generative priors have recently shown promise, but typically rely on computationally intensive iterative sampling or per-instance optimization. Amortized variational inference frameworks address this inefficiency by learning a direct mapping from measurements to posteriors, enabling fast posterior sampling without requiring the optimization of a new posterior for every new set of measurements. However, they still struggle to reconstruct fine details and complex textures. To address this, we extend the amortized framework by injecting spatially adaptive perturbations to measurements during training, guided by uncertainty estimates, to emphasize learning in the most uncertain regions. Experiments on deblurring and super-resolution demonstrate that our method achieves superior or competitive performance to previous diffusion-based approaches, delivering more realistic reconstructions without the computational cost of iterative refinement.
The pinching-antenna system (PASS) enables wireless channel reconfiguration through optimized placement of pinching antennas along dielectric waveguides. In this article, a unified analytical framework is proposed to characterize the maintainability of PASS. Within this framework, random waveguide failures and repairs are modeled by treating the waveguide lifetime and repair time as exponentially distributed random variables, which are characterized by the failure rate and the repair rate, respectively. The operational state of the waveguide is described by a two-state continuous-time Markov chain, for which the transition probabilities and steady-state probabilities of the waveguide being working or failed are analyzed. By incorporating the randomness of the waveguide operational state into the transmission rate, system maintainability is characterized using the probability of non-zero rate (PNR) and outage probability (OP). The proposed framework is applied to both a conventional PASS employing a single long waveguide and a segmented waveguide-enabled pinching-antenna system (SWAN) composed of multiple short waveguide segments under two operational protocols: segment switching (SS) and segment aggregation (SA). Closed-form expressions for the PNR and OP are derived for both architectures, and the corresponding scaling laws are analyzed with respect to the service-region size and the number of segments. It is proven that both SS-based and SA-based SWAN achieve higher PNR and lower OP than conventional PASS, which confirms the maintainability advantage of segmentation. Numerical results demonstrate that: i) the maintainability gain of SWAN over conventional PASS increases with the number of segments, and ii) SA provides stronger maintainability than SS.
This paper investigates the non-trivial consensus problem on directed signed matrix-weighted networks\textemdash a novel convergence state that has remained largely unexplored despite prior studies on bipartite consensus and trivial consensus. Notably, we first prove that for directed signed matrix-weighted networks, every eigenvalue of the grounded Laplacians has positive real part under certain conditions. This key finding ensures the global asymptotic convergence of systems states to the null spaces of signed matrix-weighted Laplacians, providing a foundational tool for analyzing dynamics on rooted signed matrix-weighted networks. To achieve non-trivial consensus, we propose a systematic approach involving the strategic selection of informed agents, careful design of external signals, and precise determination of coupling terms. Crucially, we derive the lower bounds of the coupling coefficients. Our consensus algorithm operates under milder connectivity conditions, and does not impose restrictions on whether the network is structurally balanced or unbalanced. Moreover, the non-trivial consensus state can be preset arbitrarily as needed. We also carry out the above analysis for undirected networks, with more relaxed conditions on the coupling coefficients comparing to the directed case. This paper further studies non-trivial consensus with switching topologies, and propose the necessary condition for the convergence of switching networks. The work in this paper demonstrates that groups with both cooperative and antagonistic multi-dimensional interactions can achieve consensus, which was previously deemed exclusive to fully cooperative groups.
While machine learning (ML)-based receiver algorithms have received a great deal of attention in the recent literature, they often suffer from poor scaling with increasing spatial multiplexing order and lack of explainability and generalization. This paper presents EqDeepRx, a practical deep-learning-aided multiple-input multiple-output (MIMO) receiver, which is built by augmenting linear receiver processing with carefully engineered ML blocks. At the core of the receiver model is a shared-weight DetectorNN that operates independently on each spatial stream or layer, enabling near-linear complexity scaling with respect to multiplexing order. To ensure better explainability and generalization, EqDeepRx retains conventional channel estimation and augments it with a lightweight DenoiseNN that learns frequency-domain smoothing. To reduce the dimensionality of the DetectorNN inputs, the receiver utilizes two linear equalizers in parallel: a linear minimum mean-square error (LMMSE) equalizer with interference-plus-noise covariance estimation and a regularized zero-forcing (RZF) equalizer. The parallel equalized streams are jointly consumed by the DetectorNN, after which a compact DemapperNN produces bit log-likelihood ratios for channel decoding. 5G/6G-compliant end-to-end simulations across multiple channel scenarios, pilot patterns, and inter-cell interference conditions show improved error rate and spectral efficiency over a conventional baseline, while maintaining low-complexity inference and support for different MIMO configurations without retraining.
Power system security assessments, e.g. via cascading outage models, often use operational set-points based on optimal power flow (OPF) dispatch. However, driven by cost minimization, OPF provides an ideal, albeit unrealistic, clearing of the generating units, disregarding the complex interactions among market participants. The security of the system, therefore, may be overestimated. To address this gap, we introduce a market model with a social-welfare-based day-ahead market clearing mechanism. The security implications are analyzed via Cascades, a cascading outage analysis framework. We apply this framework to the IEEE-118 bus system with three independent control zones. The results show that market dispatch leads to an increase in demand not served of up to 80% higher than OPF, highlighting a security overestimation. Operators can use this information to properly allocate reserves and perform efficient expansion planning strategies.
Background and Objective: We propose a shape reconstruction framework to generate time-resolved, patient-specific 3D aortic geometries from a limited number of standard cine 2D magnetic resonance imaging (MRI) acquisitions. A statistical shape model of the aorta is coupled with differentiable volumetric mesh optimization to obtain personalized aortic meshes. Methods: The statistical shape model was constructed from retrospective data and optimized 2D slice placements along the aortic arch were identified. Cine 2D MRI slices were then acquired in 30 subjects (19 volunteers, 11 aortic stenosis patients). After manual segmentation, time-resolved aortic models were generated via differentiable volumetric mesh optimization to derive vessel shape features, centerline parameters, and radial wall strain. In 10 subjects, additional 4D flow MRI was acquired to compare peak-systolic shapes. Results: Anatomically accurate aortic geometries were obtained from as few as six cine 2D MRI slices, achieving a mean +/- standard deviation Dice score of (89.9 +/- 1.6) %, Intersection over Union of (81.7 +/- 2.7) %, Hausdorff distance of (7.3 +/- 3.3) mm, and Chamfer distance of (3.7 +/- 0.6) mm relative to 4D flow MRI. The mean absolute radius error was (0.8 +/- 0.6) mm. Significant age-related differences were observed for all shape features, including radial strain, which decreased progressively ((11.00 +/- 3.11) x 10-2 vs. (3.74 +/- 1.25) x 10-2 vs. (2.89 +/- 0.87) x 10-2 for young, mid-age, and elderly groups). Conclusion: The proposed method enables efficient extraction of time-resolved 3D aortic meshes from limited sets of standard cine 2D MRI acquisitions, suitable for computational shape and strain analysis.
We study the spectral efficiency (SE) of a ratesplitting multiple access (RSMA) enabled multi-clustered cell-free (CF) massive multiple-input multiple-output (mMIMO) system. The access points (APs) in each cluster serve mobile user equipments (UEs) by employing RSMA. The UEs employ successive interference cancellation to decode their data. This work emphasizes the role of downlink (DL) pilots in realizing RSMA benefits in practical CF systems with spatially-correlated Rician channels which observe random phase shifts, pilot contamination, and channel aging due to UE mobility. We numerically show that DL pilots are required for RSMA in user-centric CF mMIMO systems with channel aging to outperform spatial division multiple access. We show that the degraded channel quality due to higher UE velocity and longer resource block lengths significantly reduces the RSMA SE. Increasing the number of clusters can compensate for the SE loss.
This paper investigates gradient-based adaptive prediction and control for nonlinear stochastic dynamical systems under a weak convexity condition on the prediction-based loss. This condition accommodates a broad range of nonlinear models in control and machine learning such as saturation functions, sigmoid, ReLU and tanh activation functions, and standard classification models. Without requiring any persistent excitation of the data, we establish global convergence of the proposed adaptive predictor and derive explicit rates for its asymptotic performance. Furthermore, under a classical nonlinear minimum-phase condition and with a linear growth bound on the nonlinearities, we establish the convergence rate of the resulting closed-loop control error. Finally, we demonstrate the effectiveness of the proposed adaptive prediction algorithm on a real-world judicial sentencing dataset. The adaptive control performance will also be evaluated via a numerical simulation.
No-reference video quality assessment (NR-VQA) for gaming videos is challenging due to limited human-rated datasets and unique content characteristics including fast motion, stylized graphics, and compression artifacts. We present MTL-VQA, a multi-task learning framework that uses full-reference metrics as supervisory signals to learn perceptually meaningful features without human labels for pretraining. By jointly optimizing multiple full-reference (FR) objectives with adaptive task weighting, our approach learns shared representations that transfer effectively to NR-VQA. Experiments on gaming video datasets show MTL-VQA achieves performance competitive with state-of-the-art NR-VQA methods across both MOS-supervised and label-efficient/self-supervised settings.
Many works have investigated radio map and path loss prediction in wireless networks using deep learning, in particular using convolutional neural networks. However, most assume perfect environment information, which is unrealistic in practice due to sensor limitations, mapping errors, and temporal changes. We demonstrate that convolutional neural networks trained with task-specific perturbations of geometry, materials, and Tx positions can implicitly compensate for prediction errors caused by inaccurate environment inputs. When tested with noisy inputs on synthetic indoor scenarios, models trained with perturbed environment data reduce error by up to 25\% compared to models trained on clean data. We verify our approach on real-world measurements, achieving 2.1 dB RMSE with noisy input data and 1.3 dB with complete information, compared to 2.3-3.1 dB for classical methods such as ray-tracing and radial basis function interpolation. Additionally, we compare different ways of encoding environment information at varying levels of detail and we find that, in the considered single-room indoor scenarios, binary occupancy encoding performs at least as well as detailed material property information, simplifying practical deployment.
While no-reference point cloud quality assessment (NR-PCQA) approaches have achieved significant progress over the past decade, their performance often degrades substantially when a distribution gap exists between the training (source domain) and testing (target domain) data. However, to date, limited attention has been paid to transferring NR-PCQA models across domains. To address this challenge, we propose the first unsupervised progressive domain adaptation (UPDA) framework for NR-PCQA, which introduces a two-stage coarse-to-fine alignment paradigm to address domain shifts. At the coarse-grained stage, a discrepancy-aware coarse-grained alignment method is designed to capture relative quality relationships between cross-domain samples through a novel quality-discrepancy-aware hybrid loss, circumventing the challenges of direct absolute feature alignment. At the fine-grained stage, a perception fusion fine-grained alignment approach with symmetric feature fusion is developed to identify domain-invariant features, while a conditional discriminator selectively enhances the transfer of quality-relevant features. Extensive experiments demonstrate that the proposed UPDA effectively enhances the performance of NR-PCQA methods in cross-domain scenarios, validating its practical applicability. The code is available at this https URL.
This paper presents adaptive behavioral predictive control (ABPC), an indirect adaptive predictive control framework operating on streaming data. An LPV--ARX predictor is identified online via kernel--recursive least squares and used to compute closed-form predictive control sequences over a finite horizon, avoiding batch Hankel constructions and iterative optimization. Nonlinear kernel dictionaries extend model expressiveness within a behavioral formulation. Numerical studies on Hammerstein and NARX systems demonstrate effective performance when the dictionary aligns with the plant class and highlight conditioning and feature-selection effects. The paper emphasizes numerical simulation, computational feasibility, and reproducibility.
Stacked intelligent metasurfaces (SIMs) have recently emerged as a promising metasurface-based physical-layer paradigm for wireless communications, enabling wave-domain signal processing through multiple cascaded metasurface layers. However, conventional SIM designs rely on rigid planar layers with fixed interlayer spacing, which constrain the propagation geometry and can lead to performance saturation as the number of layers increases. This paper investigates the potential of introducing structural flexibility into SIM-enabled communication systems. Specifically, we consider two flexible SIM architectures: distance-adaptive SIM (DSIM), where interlayer distances are optimized, and stacked flexible intelligent metasurface (SFIM), where each metasurface layer is fully morphable. We jointly design the meta-atom positions and responses together with the transmit beamformer to maximize the system sum rate under per-user rate, quantization, morphing, and interlayer distance constraints. An alternating optimization framework combining gradient projection, penalty-based method, and successive convex approximation is developed to address the resulting non-convex problems. Perturbation analysis reveals that the flexibility gains of both DSIM and SFIM scale approximately linearly with the morphing range, with SFIM exhibiting a faster growth rate. Simulation results demonstrate that flexible SIM designs mitigate performance saturation with increasing layers and achieve significant transmit power savings compared to rigid SIMs.
Large MIMO systems rely on efficient downlink precoding to enhance data rates and improve connectivity through spatial multiplexing. However, currently employed linear precoding techniques, such as MMSE, significantly limit the achievable spectral efficiency. To meet practical error-rate targets, existing linear methods require an excessively high number of access point antennas relative to the number of supported users, leading to disproportionate increases in power this http URL non-linear processing frameworks for uplink MIMO transmissions, such as NL-COMM, have been proposed. However, downlink non-linear precoding methods, such as Vector Perturbation (VP), remain impractical for real-world deployment due to their exponentially increasing computational complexity with the number of supported streams. This work presents ViPer NL-COMM, the first practical algorithmic and implementation framework for VP-based precoding. ViPer NL-COMM extends the core principles of NL-COMM to the precoding problem, enabling scalable parallelization and real-time computational performance while maintaining the substantial spectral-efficiency benefits of VP precoding. ViPer NL-COMM consists of a novel mathematical framework and an FPGA prototype capable of supporting large MIMO configurations (up to 16x16), high-order modulation (256-QAM), and wide bandwidths (100 MHz) within practical power and resource budgets. System-level evaluations demonstrate that ViPer NL-COMM achieves target error rates using only half the number of transmit antennas required by linear precoding, yielding net power savings on the order of hundreds of Watts at the RF front end. Moreover, ViPer NL-COMM enables supporting more information streams than available AP antennas when the streams are of low-rate, paving the way for enhanced massive-connectivity scenarios in next-generation wireless networks.
This paper presents a unified tutorial treatment of continuous-time and discrete-time linear time-invariant systems, emphasizing their shared dynamical structure and the physical constraints that differentiate their realizations. Rather than introducing new mathematical tools, the manuscript revisits foundational concepts-transfer functions, poles and zeros, impulse responses, and stability-from an operational perspective rooted in practical signal processing and circuit implementation. First-order systems are used as a minimal yet expressive framework to illustrate how integration, differentiation, filtering, and delay manifest across the Laplace and Z domains. Particular attention is given to causality, bandwidth limitations, sampling effects, and the approximation errors inherent in discrete-time representations. The goal is to bridge the gap between formal mathematical descriptions and the intuition required for reliable system design in mixed analog-digital environments.
The widely accepted definition of grid-forming (GFM) inverter states that it should behave as a (nearly) constant voltage source behind an impedance by maintaining a (nearly) constant internal voltage phasor in the sub-transient to transient time frame. Some system operators further mandate permissible ranges for this effective impedance. However, these specifications do not clearly define the location of the internal voltage source, and no systematic method exists to quantify its effective impedance for a black-box GFM model. To address this, we first compare the transient responses of an ideal voltage source and a GFM to show that an idealistic GFM maintains a (nearly) constant voltage across the filter capacitor, rather than at the inverter switches. Then we propose a systematic method to quantify the effective impedance of a GFM from its black-box model using frequency-domain admittance plots. Using standard PSCAD GFM models developed by NREL, we demonstrate that the GFM's equivalent impedance model captures the sub-transient response and static voltage stability limit reasonably accurately.
Continuous navigation in complex environments is critical for Unmanned Aerial Vehicle (UAV). However, the existing Vision-Language Navigation (VLN) models follow the dead-reckoning, which iteratively updates its position for the next waypoint prediction, and subsequently construct the complete trajectory. Then, such stepwise manner will inevitably lead to accumulated errors of position over time, resulting in misalignment between internal belief and objective coordinates, which is known as "state drift" and ultimately compromises the full trajectory prediction. Drawing inspiration from classical control theory, we propose to correct for errors by formulating such sequential prediction as a recursive Bayesian state estimation problem. In this paper, we design NeuroKalman, a novel framework that decouples navigation into two complementary processes: a Prior Prediction, based on motion dynamics and a Likelihood Correction, from historical observation. We first mathematically associate Kernel Density Estimation of the measurement likelihood with the attention-based retrieval mechanism, which then allows the system to rectify the latent representation using retrieved historical anchors without gradient updates. Comprehensive experiments on TravelUAV benchmark demonstrate that, with only 10% of the training data fine-tuning, our method clearly outperforms strong baselines and regulates drift accumulation.
Ambient intelligence, continuously understanding human presence, activity, and physiology in physical spaces, is fundamental to smart environments, health monitoring, and human-computer interaction. WiFi infrastructure provides a ubiquitous, always-on, privacy-preserving substrate for this capability across billions of IoT devices. Yet this potential remains largely untapped, as wireless sensing has typically relied on task-specific models that require substantial labeled data and limit practical deployment. We present AM-FM, the first foundation model for ambient intelligence and sensing through WiFi. AM-FM is pre-trained on 9.2 million unlabeled Channel State Information (CSI) samples collected over 439 days from 20 commercial device types deployed worldwide, learning general-purpose representations via contrastive learning, masked reconstruction, and physics-informed objectives tailored to wireless signals. Evaluated on public benchmarks spanning nine downstream tasks, AM-FM shows strong cross-task performance with improved data efficiency, demonstrating that foundation models can enable scalable ambient intelligence using existing wireless infrastructure.
In this paper, we propose a distributed flexible coupler (FC) array to enhance communication performance with low hardware cost. At each FC antenna, there is one fixed-position active antenna and multiple passive couplers that can move within a designated region around the active antenna. Moreover, each FC antenna is equipped with a local processing unit (LPU). All LPUs exchange signals with a central processing unit (CPU) for joint signal processing. We study an FC-aided multiuser multiple-input multiple-output (MIMO) system, where an FC array base station (BS) is deployed to enhance the downlink communication between the BS and multiple single-antenna users. We formulate optimization problems to maximize the achievable sum rate of users by jointly optimizing the coupler positions and digital beamforming, subject to movement constraints on the coupler positions and the transmit power constraint. To address the resulting nonconvex optimization problem, the digital beamforming is expressed as a function of the FC position vectors, which are then optimized using the proposed distributed coupler position optimization algorithm. Considering a structured time domain pattern of pilots and coupler positions, pilot-assisted centralized and distributed channel estimation algorithms are designed under the FC array architecture. Simulation results demonstrate that the distributed FC array achieves substantial rate gains over conventional benchmarks in multiuser systems without moving active antennas, and approaches the performance of fully active arrays while significantly reducing hardware cost and power consumption. Moreover, the proposed channel estimation algorithms outperform the benchmark schemes in terms of both pilot overhead and channel reconstruction accuracy.
While Mamba2's expanded state dimension enhances temporal modeling, it incurs substantial inference overhead that saturates bandwidth during autoregressive generation. Standard pruning methods fail to address this bottleneck: unstructured sparsity leaves activations dense, magnitude-based selection ignores runtime dynamics, and gradient-based methods impose prohibitive costs. We introduce GHOST (Grouped Hidden-state Output-aware Selection and Truncation), a structured pruning framework that approximates control-theoretic balanced truncation using only forward-pass statistics. By jointly measuring controllability and observability, GHOST rivals the fidelity of gradient-based methods without requiring backpropagation. As a highlight, on models ranging from 130M to 2.7B parameters, our approach achieves a 50\% state-dimension reduction with approximately 1 perplexity point increase on WikiText-2. Code is available at this https URL.
This paper presents an ML-driven framework for automated RF physical synthesis that transforms circuit netlists into manufacturable GDSII layouts. While recent ML approaches demonstrate success in topology selection and parameter optimization, they fail to produce manufacturable layouts due to oversimplified component models and lack of routing capabilities. Our framework addresses these limitations through three key innovations: (1) a neural network framework trained on 18,210 inductor geometries with frequency sweeps from 1-100 GHz, generating 7.5 million training samples, that predicts inductor Q-factor with less than 2% error and enables fast gradient-based layout optimization with a 93.77% success rate in producing high-Q layouts; (2) an intelligent P-Cell optimizer that reduces layout area while maintaining design-rule-check (DRC) compliance; and (3) a complete placement and routing engine with frequency-dependent EM spacing rules and DRC-aware synthesis. The neural inductor model demonstrates superior accuracy across 1-100 GHz, enabling EM-accurate component synthesis with real-time inference. The framework successfully generates DRC-aware GDSII layouts for RF circuits, representing a significant step toward automated RF physical design.
Mental events are considered to supervene on physical events. A supervenient event does not change without a corresponding change in the underlying subvenient physical events. Since wholes and their parts exhibit the same supervenience-subvenience relations, inter-level causation has been expected to serve as a model for mental causation. We proposed an inter-level causation mechanism to construct a model of consciousness and an agent's self-determination. However, a significant gap exists between this mechanism and cognitive functions. Here, we demonstrate how to integrate the inter-level causation mechanism with the widely known dual-process theories. We assume that the supervenience level is composed of multiple supervenient functions (i.e., neural networks), and we argue that inter-level causation can be achieved by controlling the feedback error defined through changing algebraic expressions combining these functions. Using inter-level causation allows for a dual laws model in which each level possesses its own distinct dynamics. In this framework, the feedback error is determined independently by two processes: (1) the selection of equations combining supervenient functions, and (2) the negative feedback error reduction to satisfy the equations through adjustments of neurons and synapses. We interpret these two independent feedback controls as Type 1 and Type 2 processes in the dual process theories. As a result, theories of consciousness, agency, and dual process theory are unified into a single framework, and the characteristic features of Type 1 and Type 2 processes are naturally derived.
When audio and text conflict, speech-enabled language models follow the text 10 times more often than when arbitrating between two text sources, even when explicitly instructed to trust the audio. Using ALME, a benchmark of 57,602 controlled audio-text conflict stimuli across 8 languages, we find that Gemini 2.0 Flash exhibits 16.6\% text dominance under audio-text conflict versus 1.6\% under text-text conflict with identical reliability cues. This gap is not explained by audio quality: audio-only accuracy (97.2\%) exceeds cascade accuracy (93.9\%), indicating audio embeddings preserve more information than text transcripts. We propose that text dominance reflects an asymmetry not in information content but in arbitration accessibility: how easily the model can reason over competing representations. This framework explains otherwise puzzling findings. Forcing transcription before answering increases text dominance (19\% to 33\%), sacrificing audio's information advantage without improving accessibility. Framing text as ``deliberately corrupted'' reduces text dominance by 80\%. A fine-tuning ablation provides interventional evidence: training only the audio projection layer increases text dominance (+26.5\%), while LoRA on the language model halves it ($-$23.9\%), localizing text dominance to the LLM's reasoning rather than the audio encoder. Experiments across four state-of-the-art audio-LLMs and 8 languages show consistent trends with substantial cross-linguistic and cross-model variation, establishing modality arbitration as a distinct reliability dimension not captured by standard speech benchmarks.
Quantitative imaging is an important feature of spectral X-ray and CT systems, especially photon-counting CT (PCCT) imaging systems, which is achieved through material decomposition (MD) using spectral measurements. In this work, we present a novel framework that makes the PCCT imaging chain end-to-end differentiable (differentiable PCCT), with which we can leverage quantitative information in the image domain to enable cross-domain learning and optimization for upstream models. Specifically, the material decomposition from maximum-likelihood estimation (MLE) was made differentiable based on the Implicit Function Theorem and inserted as a layer into the imaging chain for end-to-end optimization. This framework allows for an automatic and adaptive solution of a wide range of imaging tasks, ultimately achieving quantitative imaging through computation rather than manual intervention. The end-to-end training mechanism effectively avoids the need for direct-domain training or supervision from intermediate references as models are trained using quantitative images. We demonstrate its applicability in two representative tasks: correcting detector energy bin drift and training an object scatter correction network using cross-domain reference from quantitative material images.
Dispersion-based photonics-assisted microwave measurement systems provide immense potential for real-time analysis of wideband and dynamic signals. However, they face two critical challenges: a difficulty in achieving high frequency resolution over a wideband analysis bandwidth, and a reliance on large-bandwidth-and-high-sampling-rate oscilloscopes to capture the resulting ultra-narrow pulses. We introduce a dynamic dispersion accumulation technique to overcome these limitations. By circulating the optical signal in fiber loops containing a dispersion-compensating fiber, we achieve a high accumulated dispersion of -215700 ps/nm. This high dispersion relaxes the required chirp rate of the chirped optical signal, enabling two distinct advantages: When the analysis bandwidth is fixed, a lower chirp rate enables a longer temporal period, yielding a record-high frequency resolution of 27.9 MHz; When the temporal period is fixed, a lower chirp rate enables a smaller bandwidth, generating a wider pulse and thus relaxing pulse sampling requirements at the expense of analysis bandwidth. This sacrifice in analysis bandwidth can be compensated by a duty-cycle-enabling technique, which holds the potential to extend the analysis bandwidth beyond 100 GHz. This work breaks the performance and hardware limitations in dispersion-based systems, paving the way for high frequency resolution, wideband microwave measurement systems that are both real-time and cost-effective.
Biological neural networks (BNNs) are increasingly explored for their rich dynamics, parallelism, and adaptive behavior. Beyond understanding their function as a scientific endeavour, a key focus has been using these biological systems as a novel computing substrate. However, BNNs can only function as reliable information-processing systems if inputs are delivered in a temporally and structurally consistent manner. In practice, this requires stimulation with precisely controlled structure, microsecond-scale timing, multi-channel synchronization, and the ability to observe and respond to neural activity in real-time. Existing approaches to interacting with BNNs face a fundamental trade-off: they either depend on low-level hardware mechanisms, imposing prohibitive complexity for rapid iteration, or they sacrifice temporal and structural control, undermining consistency and reproducibility - particularly in closed-loop experiments. The Cortical Labs Application Programming Interface (CL API) enables real-time, sub-millisecond closed-loop interactions with BNNs. Taking a contract-based API design approach, the CL API provides users with precise stimulation semantics, transactional admission, deterministic ordering, and explicit synchronization guarantees. This contract is presented through a declarative Python interface, enabling non-expert programmers to express complex stimulation and closed-loop behavior without managing low-level scheduling or hardware details. Ultimately, the CL API provides an accessible and reproducible foundation for real-time experimentation with BNNs, supporting both fundamental biological research and emerging neurocomputing applications.
Segment Anything Models (SAM) achieve impressive universal segmentation performance but require massive datasets (e.g., 11M images) and rely solely on RGB inputs. Recent efficient variants reduce computation but still depend on large-scale training. We propose a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors. Depth maps are generated with a pretrained estimator and fused mid-level with RGB features through a dedicated depth encoder. Trained on only 11.2k samples (less than 0.1\% of SA-1B), our method achieves higher accuracy than EfficientViT-SAM, showing that depth cues provide strong geometric priors for segmentation.
The Open Radio Access Network (O-RAN) offers flexibility and innovation but introduces unique security vulnerabilities, particularly from cryptographically relevant quantum computers. While Post-Quantum Cryptography (PQC) is the primary scalable defence, its computationally intensive handshakes create a significant bottleneck for the RAN control plane, posing sustainability challenges. This paper proposes an energy-aware framework to solve this PQC bottleneck, ensuring quantum resilience without sacrificing operational energy efficiency. The system employs an O-RAN aligned split: a Crypto Policy rApp residing in the Non-Real-Time (Non-RT) RIC defines the strategic security envelope (including PQC suites), while a Security Operations Scheduling (SOS) xApp in the Near-RT RIC converts these into tactical timing and placement intents. Cryptographic enforcement remains at standards-compliant endpoints: the Open Fronthaul utilizes Media Access Control Security (MACsec) at the O-DU/O-RU, while the xhaul (midhaul and backhaul) utilizes IP Security (IPsec) at tunnel terminators. The SOS xApp reduces PQC overhead by batching non-urgent handshakes, prioritizing session resumption, and selecting parameters that meet slice SLAs while minimizing joules per secure connection. We evaluate the architecture via a Discrete-Event Simulation (DES) using 3GPP-aligned traffic profiles and verified hardware benchmarks from literature. Results show that intelligent scheduling can reduce per-handshake energy by approximately 60 percent without violating slice latency targets.
Modeling vessel activity at sea is critical for a wide range of applications, including route planning, transportation logistics, maritime safety, and environmental monitoring. Over the past two decades, the Automatic Identification System (AIS) has enabled real-time monitoring of hundreds of thousands of vessels, generating huge amounts of data daily. One major challenge in using AIS data is the presence of large gaps in vessel trajectories, often caused by coverage limitations or intentional transmission interruptions. These gaps can significantly degrade data quality, resulting in inaccurate or incomplete analysis. State-of-the-art imputation approaches have mainly been devised to tackle gaps in vehicle trajectories, even when the underlying road network is not considered. But the motion patterns of sailing vessels differ substantially, e.g., smooth turns, maneuvering near ports, or navigating in adverse weather conditions. In this application paper, we propose HABIT, a lightweight, configurable H3 Aggregation-Based Imputation framework for vessel Trajectories. This data-driven framework provides a valuable means to impute missing trajectory segments by extracting, analyzing, and indexing motion patterns from historical AIS data. Our empirical study over AIS data across various timeframes, densities, and vessel types reveals that HABIT produces maritime trajectory imputations performing comparably to baseline methods in terms of accuracy, while performing better in terms of latency while accounting for vessel characteristics and their motion patterns.
The concept of metamerism originates from colorimetry, where it describes a sensation of visual similarity between two colored lights despite significant differences in spectral content. Likewise, we propose to call ``musical metamerism'' the sensation of auditory similarity which is elicited by two music fragments which differ in terms of underlying waveforms. In this technical report, we describe a method to generate musical metamers from any audio recording. Our method is based on joint time--frequency scattering in Kymatio, an open-source software in Python which enables GPU computing and automatic differentiation. The advantage of our method is that it does not require any manual preprocessing, such as transcription, beat tracking, or source separation. We provide a mathematical description of JTFS as well as some excerpts from the Kymatio source code. Lastly, we review the prior work on JTFS and draw connections with closely related algorithms, such as spectrotemporal receptive fields (STRF), modulation power spectra (MPS), and Gabor filterbank (GBFB).
We present a novel framework for robust out-of-distribution planning and control using conformal prediction (CP) and system level synthesis (SLS), addressing the challenge of ensuring safety and robustness when using learned dynamics models beyond the training data distribution. We first derive high-confidence model error bounds using weighted CP with a learned, state-control-dependent covariance model. These bounds are integrated into an SLS-based robust nonlinear model predictive control (MPC) formulation, which performs constraint tightening over the prediction horizon via volume-optimized forward reachable sets. We provide theoretical guarantees on coverage and robustness under distributional drift, and analyze the impact of data density and trajectory tube size on prediction coverage. Empirically, we demonstrate our method on nonlinear systems of increasing complexity, including a 4D car and a {12D} quadcopter, improving safety and robustness compared to fixed-bound and non-robust baselines, especially outside of the data distribution.
Tomographic Volumetric Additive Manufacturing(TVAM) is a novel manufacturing method that allows for the fast creation of objects of complex geometry in layerless fashion. The process is based on the solidification of photopolymer that occurs when a sufficient threshold dose of light-energy is absorbed. In order to create complex shapes, an illumination plan must be designed to force solidification in some desired areas while leaving other regions liquid. Determining an illumination plan can be considered as an optimisation problem where a variety of objective functionals (penalties) can be used. This work considers a selection of penalty functions and their impact on selected printing metrics; linking the shape of penalty functions to ranges of light-energy dose levels in in-part regions that should be printed and out-of-part regions that should remain liquid. Further, the threshold parameters that are typically used to demarcate minimum light-energy for in-part regions and maximum light-energy for out-of-part regions are investigated systematically as design parameters on both existing and new methods. This enables the characterisation of their effects on some selected printing metrics as well as informed selection for default values. This work is underpinned by a reproducible and extensible framework, TVAM Adaptive Illumination Design(TVAM AID), which makes use of the open-source Core Imaging Library(CIL) that is designed for tomographic imaging with an emphasis on reconstruction. The foundation of TVAM AID which is presented here can hence be easily enhanced by existing functionality in CIL thus lowering the barrier to entry and encouraging use of strategies that already exist for reconstruction optimisation.
We study the optimal transmission and scheduling policy for a transmitter (source) communicating with two gossiping receivers aiming at tracking the source's status over time using the age of information (AoI) metric. Gossiping enables local information exchange in a decentralized manner without relying solely on the transmitter's direct communication, which we assume incurs a transmission cost. On the other hand, gossiping may be communicating stale information, necessitating the transmitter's intervention. With communication links having specific success probabilities, we formulate an average-cost Markov Decision Process (MDP) to jointly minimize the sum AoI and transmission cost for such a system in a time-slotted setting. We employ the Relative Value Iteration (RVI) algorithm to evaluate the optimal policy for the transmitter and then prove several structural properties showing that it has an age-difference threshold structure with minimum age activation in the case where gossiping is relatively more reliable. Specifically, direct transmission is optimal only if the minimum AoI of the receivers is large enough and their age difference is below a certain threshold. Otherwise, the transmitter idles to effectively take advantage of gossiping and reduce direct transmission costs. Numerical evaluations demonstrate the significance of our optimal policy compared to multiple baselines. Our result is a first step towards characterizing optimal freshness and transmission cost trade-offs in gossiping networks.
The long-standing vision of general-purpose robots hinges on their ability to understand and act upon natural language instructions. Vision-Language-Action (VLA) models have made remarkable progress toward this goal, yet their generated actions can still misalign with the given instructions. In this paper, we investigate test-time verification as a means to shrink the "intention-action gap.'' We first characterize the test-time scaling law for embodied instruction following and demonstrate that jointly scaling the number of rephrased instructions and generated actions greatly increases test-time sample diversity, often recovering correct actions more efficiently than scaling each dimension independently. To capitalize on these scaling laws, we present CoVer, a contrastive verifier for vision-language-action alignment, and show that our architecture scales gracefully with additional computational resources and data. We then introduce "boot-time compute" and a hierarchical verification inference pipeline for VLAs. At deployment, our framework precomputes a diverse set of rephrased instructions from a Vision-Language-Model (VLM), repeatedly generates action candidates for each instruction, and then uses a verifier to select the optimal high-level prompt and low-level action chunks. Compared to scaling policy pre-training on the same data, our verification approach yields 22% gains in-distribution and 13% out-of-distribution on the SIMPLER benchmark, with a further 45% improvement in real-world experiments. On the PolaRiS benchmark, CoVer achieves 14% gains in task progress and 9% in success rate.
Bayes' rule has enabled innumerable powerful algorithms of statistical signal processing and statistical machine learning. However, when model misspecifications exist in prior and/or data distributions, the direct application of Bayes' rule is questionable. Philosophically, the key is to balance the relative importance between the prior information and the data evidence when calculating posterior distributions: If prior distributions are overly conservative (i.e., exceedingly spread), we upweight the prior belief; if prior distributions are overly aggressive (i.e., exceedingly concentrated), we downweight the prior belief. The same operation also applies to likelihood distributions, which are defined as normalized likelihoods if the normalization exists. This paper studies a generalized Bayes' rule, called uncertainty-aware (UA) Bayes' rule, to technically realize the above philosophy, thus combating model uncertainties in prior and/or data distributions. In particular, the advantage of the proposed UA Bayes' rule over the existing power posterior (i.e., $\alpha$-posterior) is investigated. Applications of the UA Bayes' rule on classification and estimation are discussed: Specifically, the UA naive Bayes classifier, the UA Kalman filter, the UA particle filter, and the UA interactive-multiple-model filter are suggested and experimentally validated.
Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node generation and the nnU-Net model for lymph node segmentation to improve the segmentation performance of abdominal lymph nodes through synthesizing a diversity of realistic abdominal lymph node data. We propose LN-DDPM, a conditional denoising diffusion probabilistic model (DDPM) for lymph node (LN) generation. LN-DDPM utilizes lymph node masks and anatomical structure masks as model conditions. These conditions work in two conditioning mechanisms: global structure conditioning and local detail conditioning, to distinguish between lymph nodes and their surroundings and better capture lymph node characteristics. The obtained paired abdominal lymph node images and masks are used for the downstream segmentation task. Experimental results on the abdominal lymph node datasets demonstrate that LN-DDPM outperforms other generative methods in the abdominal lymph node image synthesis and better assists the downstream abdominal lymph node segmentation task.
Unmanned Aerial Vehicles (UAVs) have significantly enhanced fog computing by acting as both flexible computation platforms and communication mobile relays. In this paper, we consider four important and interdependent modules: attitude control, trajectory planning, resource allocation, and task assignment, and propose a holistic framework that jointly optimizes the total latency and energy consumption for UAV-assisted fog computing in a three-dimensional spatial domain with varying terrain elevations and dynamic task generations. We first establish a fuzzy-enhanced adaptive reinforcement proportional-integral-derivative control model to control the attitude. Then, we propose an enhanced Ant Colony System (ACS) based algorithm, that includes a safety value and a decoupling mechanism to overcome the convergence issue in classical ACS, to compute the optimal UAV trajectory. Finally, we design an algorithm based on the Particle Swarm Optimization technique, to determine where each offloaded task should be executed. Under our proposed framework, the outcome of one module would affect the decision-making in another, providing a holistic perspective of the system and thus leading to improved solutions. We demonstrate by extensive simulation results that our proposed framework can significantly improve the overall performance, measured by latency and energy consumption, compared to existing mainstream approaches.
Decoding gait dynamics from EEG signals presents significant challenges due to the complex spatial dependencies of motor processes, the need for accurate temporal and spectral feature extraction, and the scarcity of high-quality gait EEG datasets. To address these issues, we propose EEG2GAIT, a novel hierarchical graph-based model that captures multi-level spatial embeddings of EEG channels using a Hierarchical Graph Convolutional Network (GCN) Pyramid. To further improve decoding performance, we introduce a Hybrid Temporal-Spectral Reward (HTSR) loss function, which integrates time-domain, frequency-domain, and reward-based loss components. In addition, we contribute a new Gait-EEG Dataset (GED), consisting of synchronized EEG and lower-limb joint angle data collected from 50 participants across two laboratory visits. Extensive experiments demonstrate that EEG2GAIT with HTSR achieves superior performance on the GED dataset, reaching a Pearson correlation coefficient (r) of 0.959, a coefficient of determination of 0.914, and a Mean Absolute Error (MAE) of 0.193. On the MoBI dataset, EEG2GAIT likewise consistently outperforms existing methods, achieving an r of 0.779, a coefficient of determination of 0.597, and an MAE of 4.384. Statistical analyses confirm that these improvements are significant compared to all prior models. Ablation studies further validate the contributions of the hierarchical GCN modules and the proposed HTSR loss, while saliency analysis highlights the involvement of motor-related brain regions in decoding tasks. Collectively, these findings underscore EEG2GAIT's potential for advancing brain-computer interface applications, particularly in lower-limb rehabilitation and assistive technologies.
In recent years, the evolution of modern power grids has been driven by the growing integration of remotely controlled grid assets. Although Distributed Energy Resources (DERs) and Inverter-Based Resources (IBRs) enhance operational efficiency, they also introduce cybersecurity risks. The remote accessibility of such critical grid components creates entry points for attacks that adversaries could exploit, posing threats to the stability of the system. To evaluate the resilience of energy systems under such threats, this study employs real-time simulation and a modified version of the IEEE 39-bus system that incorporates a Microgrid (MG) with solar-based IBR. The study assesses the impact of remote attacks impacting the MG stability under different levels of IBR penetration through hardware-in-the-loop (HIL) simulations. Namely, we analyze voltage, current, and frequency profiles before, during, and after cyberattack-induced disruptions. The results demonstrate that real-time HIL testing is a practical approach to uncover potential risks and develop robust mitigation strategies for resilient MG operations.
Complex mechanical systems such as vehicle powertrains are inherently subject to multiple nonlinearities and uncertainties arising from parametric variations. Modeling errors are therefore unavoidable, making the transfer of control systems from simulation to real-world systems a critical challenge. Traditional robust controls have limitations in handling certain types of nonlinearities and uncertainties, requiring a more practical approach capable of comprehensively compensating for these various constraints. This study proposes a new robust control approach using the framework of deep reinforcement learning (DRL). The key strategy lies in the synergy among domain randomization-based DRL, long short-term memory (LSTM)-based actor and critic networks, and model-based control (MBC). The problem setup is modeled via the latent Markov decision process (LMDP), a set of vanilla MDPs, for a controlled system subject to uncertainties and nonlinearities. In LMDP, the dynamics of an environment simulator is randomized during training to improve the robustness of the control system to real testing environments. The randomization increases training difficulties as well as conservativeness of the resultant control system; therefore, progress is assisted by concurrent use of a model-based controller based on a physics-based system model. Compared to traditional DRL-based controls, the proposed approach is smarter in that we can achieve a high level of generalization ability with a more compact neural network architecture and a smaller amount of training data. The controller is verified via practical application to active damping for a complex powertrain system with nonlinearities and parametric variations. Comparative tests demonstrate the high robustness of the proposed approach.
In model predictive control (MPC), the choice of cost-weighting matrices and designing the Hessian matrix directly affects the trade-off between rapid state regulation and minimizing the control effort. However, traditional MPC in quadratic programming relies on fixed design matrices across the entire horizon, which can lead to suboptimal performance. This study presents a Riccati equation-based method for adjusting the design matrix within the MPC framework, which enhances real-time performance. We employ a penalized least-squares (PLS) approach to derive a quadratic cost function for a discrete-time linear system over a finite prediction horizon. Using the method of weighting and enforcing the equality constraint by introducing a large penalty parameter, we solve the constrained optimization problem and generate control inputs for forward-shifted horizons. This process yields a recursive PLS-based Riccati equation that updates the design matrix as a regularization term in each shift, forming the foundation of the regularized MPC (Re-MPC) algorithm. To accomplish this, we provide a convergence and stability analysis of the developed algorithm. Numerical analysis demonstrates its superiority over traditional methods by allowing Riccati equation-based adjustments.
Accurate decoding of lower-limb motion from EEG signals is essential for advancing brain-computer interface (BCI) applications in movement intent recognition and control. This study presents NeuroDyGait, a two-stage, phase-aware EEG-to-gait decoding framework that explicitly models temporal continuity and domain relationships. To address challenges of causal, phase-consistent prediction and cross-subject variability, Stage I learns semantically aligned EEG-motion embeddings via relative contrastive learning with a cross-attention-based metric, while Stage II performs domain relation-aware decoding through dynamic fusion of session-specific heads. Comprehensive experiments on two benchmark datasets (GED and FMD) show substantial gains over baselines, including a recent 2025 model EEG2GAIT. The framework generalizes to unseen subjects and maintains inference latency below 5 ms per window, satisfying real-time BCI requirements. Visualization of learned attention and phase-specific cortical saliency maps further reveals interpretable neural correlates of gait phases. Future extensions will target rehabilitation populations and multimodal integration.
This paper revisits the classical formulation of the Z-transform and its relationship to the inverse Laplace transform (L-1), originally developed by Ragazzini in sampled-data theory. It identifies a longstanding mathematical oversight in standard derivations, which typically neglect the contribution from the infinite arc in the complex plane during inverse Laplace evaluation. This omission leads to inconsistencies, especially at discontinuities such as t = 0. By incorporating the full Bromwich contour, including all boundary contributions, we restore internal consistency between L-1 and the Z-transform, aligning the corrected L-1 with results from Discrete-Time Fourier Transform (DTFT) aliasing theory. Consequently, this necessitates a structural revision of the Z-transform, inverse Laplace transform, and the behavior of the Heaviside step function at discontinuities, providing a more accurate foundation for modeling and analysis of sampled-data systems.
Multi-agent reinforcement learning (MARL) optimizes strategic interactions in non-cooperative dynamic games, where agents have misaligned objectives. However, data-driven methods such as multi-agent policy gradients (MA-PG) often suffer from instability and limit-cycle behaviors. Prior stabilization techniques typically rely on entropy-based exploration, which slows learning and increases variance. We propose a model-based approach that incorporates approximate priors into the reward function as regularization. In linear quadratic (LQ) games, we prove that such priors stabilize policy gradients and guarantee local exponential convergence to an approximate Nash equilibrium. We then extend this idea to infinite-horizon nonlinear games by introducing Multi-agent Guided Policy Search (MA-GPS), which constructs short-horizon local LQ approximations from trajectories of current policies to guide training. Experiments on nonlinear vehicle platooning and a six-player strategic basketball formation show that MA-GPS achieves faster convergence and more stable learning than existing MARL methods.
In this paper, we study a simple linear model of the cochlea as a set of vibrating strings. We make hypothesis that the information sent to the auditory cortex is the energy stored in the strings and consider all oscillation modes of the strings. We show the emergence of the sub-harmonic series whose existence was hypothesized in the XVI century to explain the consonance of the minor chord. We additionally show how the nonlinearity of the energy can be used to study the emergence of the combination tone (Tartini's third sound) shedding new light on this long debated subject.
In this work, we propose an event-triggered moving horizon estimation (ET-MHE) scheme for the remote state estimation of general nonlinear systems. In the presented method, whenever an event is triggered, a single measurement is transmitted and the nonlinear MHE optimization problem is subsequently solved. If no event is triggered, the current state estimate is updated using an open-loop prediction based on the system dynamics. Moreover, we introduce a novel event-triggering rule under which we demonstrate robust global exponential stability of the ET-MHE scheme, assuming a suitable detectability condition is met. In addition, we show that with the adoption of a varying horizon length, a tighter bound on the estimation error can be achieved. Finally, we validate the effectiveness of the proposed method through two illustrative examples.
Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated-lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.
In this paper, we propose a sample-based moving horizon estimation (MHE) scheme for general nonlinear systems to estimate the current system state using irregularly and/or infrequently available measurements. The cost function of the MHE optimization problem is suitably designed to accommodate these irregular output sequences. We also establish that, under a suitable sample-based detectability condition known as sample-based incremental input/output-to-state stability (i-IOSS), the proposed sample-based MHE achieves robust global exponential stability (RGES). Additionally, for the case of linear systems, we draw connections between sample-based observability and sample-based i-IOSS. This demonstrates that previously established conditions for linear systems to be sample-based observable can be utilized to verify or design sampling strategies that satisfy the conditions to guarantee RGES of the sample-based MHE. Finally, the effectiveness of the proposed sample-based MHE is illustrated through a simulation example.
The transition towards clean energy and the introduction of Inverter-Based Resources (IBRs) are leading to the formation of Microgrids (MGs) and Networks of MGs (NMGs). MGs and NMGs can operate autonomously in islanded mode, which requires Grid-Forming (GFM) IBRs that can perform black start, synchronization, restoration and regulation. However, such IBRs can face synchronization instability issues, which might be worsened by inadequate secondary level frequency and voltage regulation. Accordingly, we propose an autonomous and distributed synchronization and restoration scheme using Distributed-Averaging Proportional-Integral (DAPI) control. To validate the proposed method, we model and simulate a high-fidelity islanded and modified IEEE 123 bus system, modeled as an NMG consisting of 7 MGs. The MATLAB/Simulink simulation results demonstrate an effective autonomous soft-start, synchronization, connection and regulation procedure using DAPI control and distributed breaker operation logic.
Tracking maneuvering targets requires estimators that are both responsive and robust. Interacting Multiple Model (IMM) filters are a standard tracking approach, but fusing models via Gaussian mixtures can lag during maneuvers. Recent winnertakes-all (WTA) approaches react quickly but may produce discontinuities. We propose SAFE-IMM, a lightweight IMM variant for tracking on mobile and resource-limited platforms with a safe covariance-aware gate that permits WTA only when the implied jump from the mixture to the winner is provably bounded. In simulations and on nuScenes front-radar data, SAFE-IMM achieves high accuracy at real-time rates, reducing ID switches while maintaining competitive performance. The method is simple to integrate, numerically stable, and clutter-robust, offering a practical balance between responsiveness and smoothness.
Box Decoding is a sort-free tree-search MIMO detector whose complexity is independent of the QAM order, achieved by searching a fixed candidate box around a zero-forcing (ZF) estimate. However, without pruning, the number of visited nodes grows exponentially with the MIMO dimension, limiting scalability. This work proposes two deterministic, low-complexity, sort-free pruning strategies to control node growth. By exploiting the geometric symmetry of the QAM grid and the relative displacement between the ZF estimate and nearby constellation points, the proposed methods eliminate unnecessary metric evaluations while preserving QAM-order independence. The resulting detector achieves substantial complexity reduction with negligible error-rate degradation and enables fully parallel, hardware-efficient implementations for large-scale MIMO and higher-order QAM systems.
Monitoring contact pressure in hospital beds is essential for preventing pressure ulcers and enabling real-time patient assessment. Current methods can predict pressure maps but often lack physical plausibility, limiting clinical reliability. This work proposes a framework that enhances plausibility via Informed Latent Space (ILS) and Weight Optimization Loss (WOL) with conditional generative modeling to produce high-fidelity, physically consistent pressure estimates. This study also applies diffusion based conditional Brownian Bridge Diffusion Model (BBDM) and proposes training strategy for its latent counterpart Latent Brownian Bridge Diffusion Model (LBBDM) tailored for pressure synthesis in lying postures. Experiment results shows proposed method improves physical plausibility and performance over baselines: BBDM with ILS delivers highly detailed maps at higher computational cost and large inference time, whereas LBBDM provides faster inference with competitive performance. Overall, the approach supports non-invasive, vision-based, real-time patient monitoring in clinical environments.
Magnetic Resonance Fingerprinting (MRF) enables fast quantitative imaging, yet reconstructing high-resolution 3D data remains computationally demanding. Non-Cartesian reconstructions require repeated non-uniform FFTs, and the commonly used Locally Low Rank (LLR) prior adds computational overhead and becomes insufficient at high accelerations. Learned 3D priors could address these limitations, but training them at scale is challenging due to memory and runtime demands. We propose SPUR-iG, a fully 3D deep unrolled subspace reconstruction framework that integrates efficient data consistency with a progressive training strategy. Data consistency leverages implicit GROG, which grids non-Cartesian data onto a Cartesian grid with an implicitly learned kernel, enabling FFT-based updates with minimal artifacts. Training proceeds in three stages: (1) pretraining a denoiser with extensive data augmentation, (2) greedy per-iteration unrolled training, and (3) final fine-tuning with gradient checkpointing. Together, these stages make large-scale 3D unrolled learning feasible within a reasonable compute budget. On a large in vivo dataset with retrospective undersampling, SPUR-iG improves subspace coefficient maps quality and quantitative accuracy at 1-mm isotropic resolution compared with LLR and a hybrid 2D/3D unrolled baseline. Whole-brain reconstructions complete in under 15-seconds, with up to $\times$111 speedup for 2-minute acquisitions. Notably, $T_1$ maps with our method from 30-second scans achieve accuracy on par with or exceeding LLR reconstructions from 2-minute scans. Overall, the framework improves both accuracy and speed in large-scale 3D MRF reconstruction, enabling efficient and reliable accelerated quantitative imaging.
Walking is a key movement of interest in biomechanics, yet gold-standard data collection methods are time- and cost-expensive. This paper presents a real-time, multimodal, high sample rate lower-limb motion capture framework, based on wireless wearable sensors and machine learning algorithms. Random Forests are used to estimate joint angles from IMU data, and ground reaction force (GRF) is predicted from instrumented insoles, while joint moments are predicted from angles and GRF using deep learning based on the ResNet-16 architecture. All three models achieve good accuracy compared to literature, and the predictions are logged at 1 kHz with a minimal delay of 23 ms for 20s worth of input data. The present work fully relies on wearable sensors, covers all five major lower limb joints, and provides multimodal comprehensive estimations of GRF, joint angles, and moments with minimal delay suitable for biofeedback applications.
This paper introduces a hybrid two-stage registration framework for reconstructing three-dimensional (3D) kidney anatomy from macroscopic slices, using CT-derived models as the geometric reference standard. The approach addresses the data-scarcity and high-distortion challenges typical of macroscopic imaging, where fully learning-based registration (e.g., VoxelMorph) often fails to generalize due to limited training diversity and large nonrigid deformations that exceed the capture range of unconstrained convolutional filters. In the proposed pipeline, the Optimal Cross-section Matching (OCM) algorithm first performs constrained global alignment: translation, rotation, and uniform scaling to establish anatomically consistent slice initialization. Next, a lightweight deep-learning refinement network, inspired by VoxelMorph, predicts residual local deformations between consecutive slices. The core novelty of this architecture lies in its hierarchical decomposition of the registration manifold. This hybrid OCM+DL design integrates explicit geometric priors with the flexible learning capacity of neural networks, ensuring stable optimization and plausible deformation fields even with few training examples. Experiments on an original dataset of 40 kidneys demonstrated better results compared to single-stage baselines. The pipeline maintains physical calibration via Hough-based grid detection and employs Bezier-based contour smoothing for robust meshing and volume estimation. Although validated on kidney data, the proposed framework generalizes to other soft-tissue organs reconstructed from optical or photographic cross-sections. By decoupling interpretable global optimization from data-efficient deep refinement, the method advances the precision, reproducibility, and anatomical realism of multimodal 3D reconstructions for surgical planning, morphological assessment, and medical education.
This paper proposes distributed omniscient observers for both heterogeneous and homogeneous linear multi-agent systems, such that each agent can correctly estimate the states of all agents. The observer design is based on local input-output information available to each agent, and knowledge of the global communication graph among agents is not necessarily required. The proposed observers can contribute to distributed Nash equilibrium seeking in multi-player games and the emergence of self-organized social behaviors in artificial swarms. Simulation results demonstrate that artificial swarms can emulate animal social behaviors, including sheepdog herding and honeybee dance-based navigation.
Active reconfigurable intelligent surfaces (RISs) can mitigate the double-fading loss of passive reflection in satellite downlinks. However, their gains are limited by random co-channel interference, gain-dependent amplifier noise, and regulatory emission constraints. This paper develops a stochastic reliability framework for active RIS-assisted satellite downlinks by modeling the desired and interfering channels, receiver noise, and RIS amplifier noise as random variables. The resulting instantaneous signal-to-interference-plus-noise ratio (SINR) model explicitly captures folded cascaded amplifier noise and reveals a finite high-gain SINR ceiling. To guarantee a target outage level, we formulate a chance-constrained max-SINR design that jointly optimizes the binary RIS configuration and a common amplification gain. The chance constraint is handled using a sample-average approximation (SAA) with a violation budget. The resulting feasibility problem is solved as a mixed-integer second-order cone program (MISOCP) within a bisection search over the SINR threshold. Practical implementation is enforced by restricting the gain to an admissible range determined by small-signal stability and effective isotropic radiated power (EIRP) limits. We also derive realization-wise SINR envelopes based on eigenvalue and l1-norm bounds, which provide interpretable performance limits and fast diagnostics. Monte Carlo results show that these envelopes tightly bound the simulated SINR, reproduce the predicted saturation behavior, and quantify performance degradation as interference increases. Overall, the paper provides a solver-ready, reliability-targeting design methodology whose achieved reliability is validated through out-of-sample Monte Carlo testing under realistic randomness and hardware constraints.
To ensure frequency security in power systems, both the rate of change of frequency (RoCoF) and the frequency nadir (FN) must be explicitly accounted for in real-time frequency-constrained optimal power flow (FCOPF). However, accurately modeling sys-tem frequency dynamics through analytical formulations is chal-lenging due to their inherent nonlinearity and complexity. To address this issue, deep neural networks (DNNs) are utilized to capture the nonlinear mapping between system operating condi-tions and key frequency performance metrics. In this paper, a DNN-based frequency prediction model is developed and trained using the high-fidelity time-domain simulation data generated in PSCAD/EMTDC. The trained DNN is subsequently transformed into an equivalent mixed-integer linear programming (MILP) form and embedded into the FCOPF problem as additional con-straints to explicitly enforce frequency security, leading to the proposed DNN-FCOPF formulation. For benchmarking, two alternative models are considered: a conventional optimal power flow without frequency constraints and a linearized FCOPF in-corporating system-level RoCoF and FN constraints. The effec-tiveness of the proposed method is demonstrated by comparing the solutions of these three models through extensive PSCAD/EMTDC time-domain simulations under various loading scenarios.
Magnetic resonance imaging (MRI) is a powerful medical imaging modality, but long acquisition times limit throughput, patient comfort, and clinical accessibility. Diffusion-based generative models serve as strong image priors for reducing scan-time with accelerated MRI reconstruction and offer robustness across variations in the acquisition model. However, most existing diffusion-based approaches do not exploit the unique ability in MRI to jointly design both the sampling pattern and the reconstruction method. While prior learning-based approaches have optimized sampling patterns for end-to-end unrolled networks, analogous methods for diffusion-based reconstruction have not been established due to the computational burden of posterior sampling. In this work, we propose a method to optimize k-space sampling patterns for accelerated multi-coil MRI reconstruction using diffusion models as priors. We introduce a training objective based on a single-step posterior mean estimate that avoids backpropagation through an expensive iterative reconstruction process. Then we present a greedy strategy for learning Cartesian sampling patterns that selects informative k-space locations using gradient information from a pre-trained diffusion model while enforcing spatial diversity among samples. Experimental results across multiple anatomies and acceleration factors demonstrate that diffusion models using the optimized sampling patterns achieve higher-quality reconstructions in comparison to using fixed and learned baseline patterns.
Mobile edge computing (MEC) provides low-latency offloading solutions for computationally intensive tasks, effectively improving the computing efficiency and battery life of mobile devices. However, for data-intensive tasks or scenarios with limited uplink bandwidth, network congestion might occur due to massive simultaneous offloading nodes, increasing transmission latency and affecting task performance. In this paper, we propose a semantic-aware multi-modal task offloading framework to address the challenges posed by limited uplink bandwidth. By introducing a semantic extraction factor, we balance the relationship among transmission latency, computation energy consumption, and task performance. To measure the offloading performance of multi-modal tasks, we design a unified and fair quality of experience (QoE) metric that includes execution latency, energy consumption, and task performance. Lastly, we formulate the optimization problem as a Markov decision process (MDP) and exploit the multi-agent proximal policy optimization (MAPPO) reinforcement learning algorithm to jointly optimize the semantic extraction factor, communication resources, and computing resources to maximize overall QoE. Experimental results show that the proposed method achieves a reduction in execution latency and energy consumption of 18.1% and 12.9%, respectively compared with the semantic-unaware approach. Moreover, the proposed approach can be easily extended to models with different user preferences.
We present a novel, model-free, and data-driven methodology for controlling complex dynamical systems into previously unseen target states, including those with significantly different and complex dynamics. Leveraging a parameter-aware realization of next-generation reservoir computing (NGRC), our approach accurately predicts system behavior in unobserved parameter regimes, enabling control over transitions to arbitrary target states utilizing a new prediction evaluation and selection scheme. Crucially, this includes states with dynamics that differ fundamentally from known regimes, such as shifts from periodic to intermittent or chaotic behavior. The method's parameter awareness facilitates non-stationary control with which control scenarios are generated and evaluated on the basis of predefined control objective. In addition to proving the method for transient-free control to extrapolated chaotic target states over transition times, we demonstrate the method's effectiveness on a nonlinear power system model. Our method successfully navigates transitions even in scenarios where system collapse is observed frequently, while ensuring fast transitions and avoiding prolonged transient behavior. By extending the applicability of machine learning-based control mechanisms to previously inaccessible target dynamics, the methodology opens the door to new control applications while maintaining exceptional efficiency.
Vascular remodelling is inherent to the pathogenesis of many diseases including cancer, neurodegeneration, fibrosis, hypertension, and diabetes. In this paper, a new susceptibility-contrast based MRI approach is established to analyse intravoxel vessel size distribution (VSD) enabling more comprehensive and quantitative assessment of vascular remodelling than existing clinical imaging modalities. We use segmented vascular structures from light-sheet fluorescence microscopy images of whole rodent brain to simulate gradient echo sampling of free induction decay and spin echo sequence (GESFIDE) and train a deep learning model to predict cerebral blood volume (CBV) and VSD from the simulated GESFIDE signal. The results from ex vivo experiments showed strong correlation (r=0.96) between the true and predicted CBV. Also, high similarity between true and predicted VSDs was observed with mean Bhattacharya Coefficient being 0.92. With further in vivo validation, intravoxel VSD imaging could become a transformative clinical tool for interrogating disease and treatment induced vascular remodelling.
Imagine placing your smartphone on a table in a noisy restaurant and clearly capturing the voices of friends seated around you, or recording a lecturer's voice with clarity in a reverberant auditorium. We introduce SonicSieve, the first intelligent directional speech extraction system for smartphones using a bio-inspired acoustic microstructure. Our passive design embeds directional cues onto incoming speech without any additional electronics. It attaches to the in-line mic of low-cost wired earphones which can be attached to smartphones. We present an end-to-end neural network that processes the raw audio mixtures in real-time on mobile devices. Our results show that SonicSieve achieves a signal quality improvement of 5.0 dB when focusing on a 30° angular region. Additionally, the performance of our system based on only two microphones exceeds that of conventional 5-microphone arrays.
Integrating sensing and communication (ISAC) can help overcome the challenges of limited spectrum and expensive hardware, leading to improved energy and cost efficiency. While full cooperation between sensing and communication can result in significant performance gains, achieving optimal performance requires efficient designs of unified waveforms and beamformers for joint sensing and communication. Sophisticated statistical signal processing and multi-objective optimization techniques are necessary to balance the competing design requirements of joint sensing and communication tasks. As model-based approaches can be suboptimal or too complex, deep learning offers a powerful data-driven alternative, especially when optimal algorithms are unknown or impractical for real-time use. Unified waveform and beamformer design problems for ISAC fall into this category, where fundamental design trade-offs exist between sensing and communication performance metrics, and the underlying models may be inadequate or incomplete. This tutorial paper explores the application of artificial intelligence (AI) to enhance efficiency or reduce complexity in ISAC designs. We emphasize the integration benefits through AI-driven ISAC designs, prioritizing the development of unified waveforms, constellations, and beamforming strategies for both sensing and communication. To illustrate the practical potential of AI-driven ISAC, we present three case studies on waveform, beamforming, and constellation design, demonstrating how unsupervised learning and neural network-based optimization can effectively balance performance, complexity, and implementation constraints.
Due to the increasing threat of attacks on satellite systems, novel countermeasures have been developed to provide additional security. Among these, there has been a particular interest in transmitter fingerprinting, which authenticates transmitters by looking at characteristics expressed in the physical layer signal. These systems rely heavily upon statistical methods and machine learning, and are therefore vulnerable to a range of attacks. The severity of this threat in a fingerprinting context is currently not well understood. In this paper we evaluate a range of attacks against satellite fingerprinting, building on previous works by looking at attacks optimized to target the fingerprinting system for maximal impact. We design optimized jamming, dataset poisoning, and spoofing attacks, evaluating them in the real world against the SatIQ fingerprinting system designed to authenticate Iridium transmitters, and using a wireless channel emulator to achieve realistic channel conditions. We show that an optimized jamming signal can cause a 50% error rate with attacker-to-victim ratios as low as -30dB (far less power than traditional jamming techniques), and demonstrate successful spoofing attacks, with an attacker successfully removing their own transmitter's fingerprint from messages. We also present a viable dataset poisoning attack, enabling persistent message spoofing by altering stored data to include the fingerprint of the attacker's transmitter. Finally, we show that a model trained to optimize spoofing attacks can also be used to detect spoofing and replay attacks, even when it has never seen the attacker's transmitter before. This technique works even when the training dataset includes only a single transmitter, enabling fingerprinting to be used to protect small constellations and even individual satellites, providing additional protection where it is needed the most.
Mask-based lensless imaging uses an optical encoder (e.g. a phase or amplitude mask) to capture measurements, then a computational decoding algorithm to reconstruct images. In this work, we evaluate and design lensless encoders based on the information content of their measurements using mutual information estimation. Our approach formalizes the object-dependent nature of lensless imaging and quantifies the interdependence between object sparsity, encoder multiplexing, and noise. Our analysis reveals that optimal encoder designs should tailor encoder multiplexing to object sparsity for maximum information capture, and that all optimally-encoded measurements share the same level of sparsity. Using mutual information-based optimization, we design information-optimal encoders for compressive imaging of fixed object distributions. Our designs demonstrate improved downstream reconstruction performance for objects in the distribution, without requiring joint optimization with a specific reconstruction algorithm. We validate our approach experimentally by evaluating lensless imaging systems directly from captured measurements, without the need for image formation models, reconstruction algorithms, or ground truth data. Our comprehensive analysis establishes design and engineering principles for lensless imaging systems, and offers a model for the study of general multiplexing systems, especially those with object-dependent performance.
Integrated sensing and communications (ISAC) is considered a key enabler to support application scenarios such as the Internet-of-Things (IoT) in which both communications and sensing play significant roles. Multi-carrier waveforms, such as orthogonal frequency division multiplexing (OFDM), have been considered as good candidates for ISAC due to their high communications data rate and good time bandwidth property for sensing. Nevertheless, their high peak-to-average-power-ratio (PAPR) values lead to either performance degradation or an increase in system complexity. This can make OFDM unsuitable for IoT applications with insufficient resources in terms of power, system complexity, hardware size or cost. This article provides IoT-centric constant modulus waveform designs that leverage the advantage of unit PAPR and thus are more suitable in resource-limited scenarios. More specifically, several single-carrier frequency and/or phase-modulated waveforms are considered. A comprehensive discussion on their radar sensing and communications performance is conducted based on performance metrics, including the radar ambiguity function, the bandwidth property, the data rate, and the communications receiver complexity.
The training of high-quality, robust machine learning models for speech-driven 3D facial animation requires a large, diverse dataset of high-quality audio-animation pairs. To overcome the lack of such a dataset, recent work has introduced large pre-trained speech encoders that are robust to variations in the input audio and, therefore, enable the facial animation model to generalize across speakers, audio quality, and languages. However, the resulting facial animation models are prohibitively large and lend themselves only to offline inference on a dedicated machine. In this work, we explore on-device, real-time facial animation models in the context of game development. We overcome the lack of large datasets by using hybrid knowledge distillation with pseudo-labeling. Given a large audio dataset, we employ a high-performing teacher model to train very small student models. In contrast to the pre-trained speech encoders, our student models only consist of convolutional and fully-connected layers, removing the need for attention context or recurrent updates. In our experiments, we demonstrate that we can reduce the memory footprint to up to 3.4 MB and required future audio context to up to 81 ms while maintaining high-quality animations. This paves the way for on-device inference, an important step towards realistic, model-driven digital characters.
Despite their success in speech processing, neural networks often operate as black boxes, prompting the question: what informs their decisions, and how can we interpret them? This work examines this issue in the context of lexical stress. A dataset of English disyllabic words was automatically constructed from read and spontaneous speech. Several Convolutional Neural Network (CNN) architectures were trained to predict stress position from a spectrographic representation of disyllabic words lacking minimal stress pairs (e.g., initial stress WAllet, final stress exTEND), achieving up to 92% accuracy on held-out test data. Layerwise Relevance Propagation (LRP), a technique for neural network interpretability analysis, revealed that predictions for held-out minimal pairs (PROtest vs. proTEST ) were most strongly influenced by information in stressed versus unstressed syllables, particularly the spectral properties of stressed vowels. However, the classifiers also attended to information throughout the word. A feature-specific relevance analysis is proposed, and its results suggest that our best-performing classifier is strongly influenced by the stressed vowel's first and second formants, with some evidence that its pitch and third formant also contribute. These results reveal deep learning's ability to acquire distributed cues to stress from naturally occurring data, extending traditional phonetic work based around highly controlled stimuli.
Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions, lacking the transferability needed for real-world applications that require adaptation to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation to transfer this structured knowledge to compact student models. Our evaluation shows that ContrastASC demonstrates improved few-shot adaptation to unseen categories while maintaining strong closed-set performance.
Quadrupedal robots exhibit a wide range of viable gaits, but generating specific footfall sequences often requires laborious expert tuning of numerous variables, such as touch-down and lift-off events and holonomic constraints for each leg. This paper presents a unified reinforcement learning framework for generating versatile quadrupedal gaits by leveraging the intrinsic symmetries and velocity-period relationship of dynamic legged systems. We propose a symmetry-guided reward function design that incorporates temporal, morphological, and time-reversal symmetries. By focusing on preserved symmetries and natural dynamics, our approach eliminates the need for predefined trajectories, enabling smooth transitions between diverse locomotion patterns such as trotting, bounding, half-bounding, and galloping. Implemented on the Unitree Go2 robot, our method demonstrates robust performance across a range of speeds in both simulations and hardware tests, significantly improving gait adaptability without extensive reward tuning or explicit foot placement control. This work provides insights into dynamic locomotion strategies and underscores the crucial role of symmetries in robotic gait design.
We present ETHOS (Encountered-Type Haptics for On-demand Social Interaction), a dynamic encountered-type haptic display (ETHD) that enables natural physical contact in virtual reality (VR) during social interactions such as handovers, fist bumps, and high-fives. The system integrates a torque-controlled robotic manipulator with interchangeable passive props (silicone hand replicas and a baton), marker-based physical-virtual registration via a ChArUco board, and a safety monitor that gates motion based on the user's head and hand pose. We introduce two control strategies: (i) a static mode that presents a stationary prop aligned with its virtual counterpart, consistent with prior ETHD baselines, and (ii) a dynamic mode that continuously updates prop position by exponentially blending an initial mid-point trajectory with real-time hand tracking, generating a unique contact point for each interaction. Bench tests show static colocation accuracy of 5.09 +/- 0.94 mm, while user interactions achieved temporal alignment with an average contact latency of 28.53 +/- 31.21 ms across all interaction and control conditions. These results demonstrate the feasibility of recreating socially meaningful haptics in VR. By incorporating essential safety and control mechanisms, ETHOS establishes a practical foundation for high-fidelity, dynamic interpersonal interactions in virtual environments.
This paper presents a tutorial and survey on probabilistic inference-based model predictive control (PI-MPC) for robotics. PI-MPC reformulates finite-horizon optimal control as inference over an optimal control distribution expressed as a Boltzmann distribution weighted by a control prior, and generates actions through variational inference. In the tutorial part, we derive this formulation and explain action generation via variational inference, highlighting Model Predictive Path Integral (MPPI) control as a representative algorithm with a closed-form sampling update. In the survey part, we organize existing PI-MPC research around key design dimensions, including prior design, multi-modality, constraint handling, scalability, hardware acceleration, and theoretical analysis. This paper provides a unified conceptual perspective on PI-MPC and a practical entry point for robotics researchers and practitioners.
Immersive formats such as 360° and 6DoF point cloud videos require high bandwidth and low latency, posing challenges for real-time AR/VR streaming. This work focuses on reducing bandwidth consumption and encryption/decryption delay, two key contributors to overall latency. We design a system that downsamples point cloud content at the origin server and applies partial encryption. At the client, the content is decrypted and upscaled using an ML-based super-resolution model. Our evaluation demonstrates a nearly linear reduction in bandwidth/latency, and encryption/decryption overhead with lower downsampling resolutions, while the super-resolution model effectively reconstructs the original full-resolution point clouds with minimal error and modest inference time.
The growth of Electric Vehicles (EVs) creates a conflict in vehicle-to-building (V2B) settings between building operators, who face high energy costs from uncoordinated charging, and drivers, who prioritize convenience and a full charge. To resolve this, we propose a negotiation-based framework that, by design, guarantees voluntary participation, strategy-proofness, and budget feasibility. It transforms EV charging into a strategic resource by offering drivers a range of incentive-backed options for modest flexibility in their departure time or requested state of charge (SoC). Our framework is calibrated with user survey data and validated using real operational data from a commercial building and an EV manufacturer. Simulations show that our negotiation protocol creates a mutually beneficial outcome: lowering the building operator's costs by over 3.5\% compared to an optimized, non-negotiating smart charging policy, while simultaneously reducing user charging expenses by 22\% below the utility's retail energy rate. By aligning operator and EV user objectives, our framework provides a strategic bridge between energy and mobility systems, transforming EV charging from a source of operational friction into a platform for collaboration and shared savings.
Modal methods are a long-standing approach to physical modelling synthesis. Extensions to nonlinear problems are possible, leading to coupled nonlinear systems of ordinary differential equations. Recent work in scalar auxiliary variable techniques has enabled construction of explicit and stable numerical solvers for such systems. On the other hand, neural ordinary differential equations have been successful in modelling nonlinear systems from data. In this work, we examine how scalar auxiliary variable techniques can be combined with neural ordinary differential equations to yield a stable differentiable model capable of learning nonlinear dynamics. The proposed approach leverages the analytical solution for linear vibration of the system's modes so that physical parameters of a system remain easily accessible after the training without the need for a parameter encoder in the model architecture. Compared to our previous work that used multilayer perceptrons to parametrise nonlinear dynamics, we employ gradient networks that allow an interpretation in terms of a closed-form and non-negative potential required by scalar auxiliary variable techniques. As a proof of concept, we generate synthetic data for the nonlinear transverse vibration of a string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
As industrial manufacturing scales, automating fine-grained product image analysis has become critical for quality control. However, existing approaches are hindered by limited dataset coverage and poor model generalization across diverse and complex anomaly patterns. To address these challenges, we introduce MAU-Set, a comprehensive dataset for Multi-type industrial Anomaly Understanding. It spans multiple industrial domains and features a hierarchical task structure, ranging from binary classification to complex reasoning. Alongside this dataset, we establish a rigorous evaluation protocol to facilitate fair and comprehensive model assessment. Building upon this foundation, we further present MAU-GPT, a domain-adapted multimodal large model specifically designed for industrial anomaly understanding. It incorporates a novel AMoE-LoRA mechanism that unifies anomaly-aware and generalist experts adaptation, enhancing both understanding and reasoning across diverse defect classes. Extensive experiments show that MAU-GPT consistently outperforms prior state-of-the-art methods across all domains, demonstrating strong potential for scalable and automated industrial inspection.
Synthesizing coherent soundtracks for long-form videos remains a formidable challenge, currently stalled by three critical impediments: computational scalability, temporal coherence, and, most critically, a pervasive semantic blindness to evolving narrative logic. To bridge these gaps, we propose NarraScore, a hierarchical framework predicated on the core insight that emotion serves as a high-density compression of narrative logic. Uniquely, we repurpose frozen Vision-Language Models (VLMs) as continuous affective sensors, distilling high-dimensional visual streams into dense, narrative-aware Valence-Arousal trajectories. Mechanistically, NarraScore employs a Dual-Branch Injection strategy to reconcile global structure with local dynamism: a \textit{Global Semantic Anchor} ensures stylistic stability, while a surgical \textit{Token-Level Affective Adapter} modulates local tension via direct element-wise residual injection. This minimalist design bypasses the bottlenecks of dense attention and architectural cloning, effectively mitigating the overfitting risks associated with data scarcity. Experiments demonstrate that NarraScore achieves state-of-the-art consistency and narrative alignment with negligible computational overhead, establishing a fully autonomous paradigm for long-video soundtrack generation.