Digital Video Broadcasting, Satellite, Second Generation (DVB-S2) and its extension DVB-S2X are widely used in modern satellite communications, where synchronization relies on physical layer headers, pilot symbols, and optional superframe structures but lacks defined implementation methods. This work explores the use of external synchronization to enhance DVB-S2 performance by using GPS-disciplined oscillators and a hardware/software-in-the-loop satellite channel model emulating Low Earth Orbit propagation. We evaluate scenarios with and without Doppler shifts and radio frequency interference, comparing synchronized and unsynchronized cases. Results show that external synchronization significantly improves bit error rate, frame error rate, and signal-to-noise ratio, thereby reducing the frames required for reliable synchronization and enabling higher throughput in future satellite communication systems.
Electrodermal activity (EDA) is a widely used physiological signal for assessing sympathetic nervous system activity related to arousal, stress, and pain. However, reliable decomposition into tonic and phasic components remains challenging, particularly in noisy environments and across individuals with varying signal morphologies and stimulus responses. We propose ospEDA, a novel Orthogonal Subspace Projection (OSP)-based method for EDA decomposition. The method integrates (1) tonic estimation via physiologically motivated valley detection for noise robustness; (2) phasic extraction using OSP to accommodate inter-subject variability; and (3) phasic driver estimation through non-negative least squares (NNLS) deconvolution with ridge regularization. We evaluated ospEDA on five real-world datasets and one simulated EDA dataset with ground-truth components, comparing its performance against six existing methods. In simulations with a 20 dB signal-to-noise ratio (SNR), ospEDA achieved the lowest root mean square error (RMSE) for the estimated tonic (0.131) and phasic (0.132) components. Under noisier conditions (10 dB SNR), it maintained superior phasic RMSE (0.293), Pearson correlation (0.782), and $R^2$ (0.979) values. Furthermore, ospEDA consistently provided the highest F1 scores (0.573, 0.617, 0.638) for sympathetic nerve activity detection across 10, 20, and 30 dB SNR levels, respectively, compared with existing methods. On the real-world datasets, ospEDA achieved a stimulus classification AUROC of 0.766 and consistently maintained strong effect sizes ($\omega^2 > 0.14$) across all five datasets. Overall, ospEDA represents a promising framework for EDA decomposition, showing consistent performance and reliable phasic driver estimation under varying noise conditions, with potential utility for real-world physiological monitoring applications.
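The ridge-regularized NNLS deconvolution step can be sketched compactly. This is a minimal illustration, not the paper's implementation: the Bateman kernel time constants, sampling rate, and regularization weight below are assumed for demonstration only.

```python
import numpy as np
from scipy.optimize import nnls

def bateman(t, tau0=0.75, tau1=2.0):
    """Biexponential impulse response commonly used for skin-conductance responses."""
    h = np.exp(-t / tau1) - np.exp(-t / tau0)
    return h / h.max()

def phasic_driver(phasic, fs=4.0, lam=0.1):
    """Ridge-regularized NNLS deconvolution:
    min_d ||A d - y||^2 + lam ||d||^2  subject to  d >= 0,
    solved by stacking sqrt(lam)*I under the convolution matrix A."""
    n = len(phasic)
    h = bateman(np.arange(n) / fs)
    # Lower-triangular Toeplitz convolution matrix: (A d)[i] = sum_j h[i-j] d[j] / fs
    idx = np.subtract.outer(np.arange(n), np.arange(n))
    A = np.where(idx >= 0, h[np.clip(idx, 0, n - 1)], 0.0) / fs
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    y_aug = np.concatenate([phasic, np.zeros(n)])
    d, _ = nnls(A_aug, y_aug)
    return d

# Demo (hypothetical): a single skin-conductance response starting at sample 10
# should yield a non-negative driver whose mass concentrates near sample 10.
y = np.concatenate([np.zeros(10), bateman(np.arange(50) / 4.0)])
d = phasic_driver(y, fs=4.0, lam=1e-3)
```

Folding the ridge term into the design matrix keeps the problem in standard NNLS form, so an off-the-shelf active-set solver handles the non-negativity constraint directly.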
Controlled Environment Agriculture (CEA) demands precise, adaptive climate management across distributed infrastructure. This paper presents IOGRUCloud, a scalable three-tier IoT platform that integrates AI-driven control with edge computing for automated greenhouse climate regulation. The system architecture separates field-level sensing and actuation (L1), facility-level coordination (L2), and cloud-level optimization (L3-L4), enabling progressive autonomy from rule-based to fully autonomous operation. A Vapor Pressure Deficit (VPD) cascading control loop governs temperature and humidity with GRU-enhanced PID tuning, reducing manual calibration effort by 73%. Deployed across 14 production greenhouses totaling 47,000 m², the platform demonstrates a 23% reduction in energy consumption and a 31% improvement in climate stability versus baseline. The system handles 2.3M daily sensor events with 99.7% uptime. We release the architecture specification and deployment results to support reproducibility in smart agriculture research.
We revisit Brockett's attention in the context of bilinear gradient flow of an ensemble, and explore an alternative formalism that aims to reduce shear by minimizing the condition number of the dynamics; equivalently, we minimize the range of the eigenvalues of the dynamics. Remarkably, the evolution is isospectral, and this property is inherited by the coupled nonlinear dynamics of the control problem from a Lax isospectral flow.
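The isospectral property invoked here can be demonstrated numerically with the classical double-bracket flow $\dot{L} = [L, [L, N]]$, a generic example of a Lax flow (not the paper's specific coupled dynamics). Because $B = [L, N]$ is skew-symmetric for symmetric $L$ and $N$, each step below conjugates $L$ by the orthogonal matrix $e^{\Delta t\, B}$, so the spectrum is preserved to machine precision:

```python
import numpy as np
from scipy.linalg import expm

def lax_step(L, N, dt):
    """One step of the double-bracket Lax flow dL/dt = [L, B] with B = [L, N].
    For symmetric L and N, B is skew-symmetric, so expm(dt*B) is orthogonal
    and conjugation by it preserves the spectrum of L exactly."""
    B = L @ N - N @ L
    Q = expm(dt * B)
    return Q.T @ L @ Q

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
L = (A + A.T) / 2                      # symmetric initial condition
N = np.diag(np.arange(5.0))            # fixed symmetric target
ev0 = np.sort(np.linalg.eigvalsh(L))
for _ in range(500):
    L = lax_step(L, N, dt=0.01)
ev1 = np.sort(np.linalg.eigvalsh(L))   # same spectrum, up to round-off
```

Using the exponential-conjugation integrator rather than explicit Euler keeps the iterates exactly on the isospectral orbit, which is the structural property the abstract highlights.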
In this paper, we consider the data-driven discovery of stable dynamical models with a single equilibrium. The proposed approach uses a basis-function parameterization of the differential equations and the associated Lyapunov function. This modeling approach enables the discovery of both the dynamical model and a Lyapunov function in an interpretable form. The Lyapunov conditions for stability are enforced as constraints on the training data. The resulting learning task is a mixed-integer quadratically constrained optimization problem that can be solved to optimality using current state-of-the-art global optimization solvers. Application to two case studies shows that the proposed approach can discover the true model of the system and the associated Lyapunov function. Moreover, in the presence of noise, the model learned with the proposed approach achieves higher predictive accuracy than models learned with baselines that do not consider Lyapunov-related constraints.
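The Lyapunov decrease conditions imposed on training data can be illustrated with a simple feasibility check. This is a sketch of the verification idea only, under an assumed quadratic Lyapunov candidate $V(x) = x^\top P x$; it does not reproduce the paper's mixed-integer quadratically constrained formulation:

```python
import numpy as np

def lyapunov_violations(P, X, Xdot, margin=1e-6):
    """Count data points where the decrease condition fails, i.e. where
    Vdot(x) = 2 x^T P xdot is not strictly negative (within a margin).
    In the paper such conditions enter as constraints of the training
    problem; here we only check them on sampled data."""
    vdot = 2.0 * np.einsum("ni,ij,nj->n", X, P, Xdot)
    return int(np.sum(vdot > -margin))

# Stable linear system xdot = A x with a valid quadratic Lyapunov function:
# V(x) = ||x||^2 works because A + A^T = -2I is negative definite.
A = np.array([[-1.0, 2.0], [-2.0, -1.0]])
P = np.eye(2)
X = np.random.default_rng(1).standard_normal((200, 2))
Xdot = X @ A.T
```

On this data every sampled decrease constraint is satisfied, while flipping the sign of the candidate (e.g. passing `-P`) violates it at every point.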
Smartphone cameras face fundamental form-factor constraints that limit their optical magnification, primarily due to the difficulty of reducing a lens assembly's telephoto ratio, the ratio between total track length (TTL) and effective focal length (EFL). Currently, conventional refractive optics struggle to achieve a telephoto ratio below 0.5 without requiring multiple bulky elements to correct optical aberrations. In this paper, we introduce MetaTele, a novel optics-algorithm co-design that breaks this bottleneck. MetaTele explicitly decouples the acquisition of scene structure and color information. First, it utilizes a compact refractive-metasurface optical assembly to capture a fine-detail structure image under a narrow wavelength band, inherently avoiding severe chromatic aberrations. Second, it captures a broadband color cue using the same optics; although this cue is heavily corrupted by chromatic aberrations, it retains sufficient spectral information to guide post-processing. We then employ a custom one-step diffusion model to computationally fuse these two raw measurements, successfully colorizing the structure image while correcting for system aberrations. We demonstrate a MetaTele prototype, achieving an unprecedented telephoto ratio of 0.44 with a TTL of just 13 mm for RGB imaging, paving the way for DSLR-level telephoto capabilities within smartphone form factors.
Affine frequency division multiplexing (AFDM) has emerged as a promising waveform for high-mobility communications. However, its equalization remains a practical challenge under general physical channels with off-grid delay and Doppler effects. In this paper, we investigate frequency domain equalization for AFDM by considering a practical filtered-AFDM waveform. We analyze the input-output relations of filtered-AFDM across various domains and show that off-grid effects lead to severe inter-symbol interference in the discrete affine Fourier transform (DAFT) domain, limiting the effectiveness of DAFT domain equalization. Motivated by the compactness of the frequency domain channel matrix in wideband systems, we propose a low-complexity two-stage frequency domain equalization scheme. Numerical results demonstrate that the proposed approach achieves performance close to full-block linear minimum mean square error (LMMSE) equalization with significantly reduced computational complexity, and offers clear advantages over time domain equalization in wideband scenarios.
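As a point of reference for the full-block LMMSE baseline mentioned above, a minimal sketch follows. The banded channel matrix, modulation, and noise level are illustrative assumptions standing in for the off-diagonal leakage that off-grid Doppler induces; the paper's two-stage low-complexity scheme is not reproduced here:

```python
import numpy as np

def lmmse_equalize(y, H, noise_var):
    """Full-block LMMSE: x_hat = H^H (H H^H + sigma^2 I)^{-1} y."""
    n = H.shape[0]
    G = H.conj().T @ np.linalg.inv(H @ H.conj().T + noise_var * np.eye(n))
    return G @ y

# Toy banded frequency-domain channel: main diagonal plus weak leakage
# into adjacent bins (a stand-in for off-grid delay/Doppler spreading).
rng = np.random.default_rng(2)
N = 64
H = np.eye(N, dtype=complex) + 0.2 * np.eye(N, k=1) + 0.2 * np.eye(N, k=-1)
x = (rng.integers(0, 2, N) * 2 - 1).astype(complex)   # BPSK symbols
y = H @ x + 0.05 * rng.standard_normal(N)             # mild additive noise
x_hat = lmmse_equalize(y, H, 0.05 ** 2)
```

The full-block form costs a dense $N \times N$ inverse; exploiting the bandedness of $H$, as the proposed two-stage scheme does, is what reduces complexity.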
Adaptive impedance matching between antennas and radio frequency front-end modules is critical for maximizing power transmission efficiency in mobile communication systems. Conventional numerical and analytical methods struggle with a trade-off between accuracy and efficiency, while deep neural network (DNN)-based supervised learning approaches rely heavily on large labeled datasets and lack flexibility for dynamic environments. To address these limitations, this paper proposes a deep reinforcement learning (DRL)-based approach for adaptive impedance matching. First, we model the impedance tuning problem as an optimal control problem, proving the feasibility of solving the optimal control law via reinforcement learning. Then, we design a tailored DRL framework for impedance tuning, which employs a compact state representation that integrates key frequency characteristics and matching quality metrics. Additionally, this framework incorporates a piecewise reward function that accounts for both matching accuracy and tuning speed. Furthermore, a test-phase exploration mechanism is introduced to enhance tuning stability, which effectively reduces trapping in local optima and high-frequency tuning variance. Experimental results demonstrate that the proposed method achieves superior performance in terms of tuning accuracy, efficiency, and stability compared with conventional heuristic and gradient-based methods, making it promising for practical impedance tuning systems.
Network coordination games are widely used to model collaboration among interconnected agents, with applications across diverse domains including economics, robotics, and cyber-security. We consider networks of bounded-rational agents who interact through binary stag hunt games, a canonical game-theoretic model for distributed collaborative tasks. Herein, the agents update their actions using logit response functions, yielding the Log-Linear Learning (LLL) algorithm. While convergence of LLL to a risk-dominant Nash equilibrium requires unbounded rationality, we consider regimes in which rationality is strictly bounded. We first show that the stationary probability of states corresponding to perfect coordination is monotone increasing in the rationality parameter $\beta$. For $K$-regular networks, we prove that the stationary probability of a perfectly coordinated action profile is monotone in the connectivity degree $K$, and we provide an upper bound on the minimum rationality required to achieve a desired level of coordination. For irregular networks, we show that the stationary probability of perfectly coordinated action profiles increases with the number of edges in the graph. We show that, for a large class of networks, the partition function of the Gibbs measure is well approximated by the moment generating function of a Gaussian random variable. This approximation allows us to optimize degree distributions and establishes that the optimal network - i.e., the one that maximizes the stationary probability of coordinated action profiles - is $K$-regular. Consequently, our results indicate that networks of uniformly bounded-rational agents achieve the most reliable coordination when connectivity is evenly distributed among agents.
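The logit-response update underlying log-linear learning can be sketched in a few lines. The stag hunt payoff matrix, ring network, and rationality value below are illustrative choices, not taken from the paper:

```python
import numpy as np

# Binary stag hunt: action 0 = hare (safe), 1 = stag (payoff-dominant, risky).
# With these illustrative payoffs, hare is risk-dominant (3 + 3 > 5 + 0).
PAYOFF = [[3.0, 3.0],
          [0.0, 5.0]]

def lll_step(actions, adj, beta, rng):
    """One round of log-linear learning: a uniformly chosen agent revises its
    action with logit (softmax) probabilities at rationality parameter beta."""
    i = int(rng.integers(len(actions)))
    nbrs = np.flatnonzero(adj[i])
    # Utility of each candidate action = sum of pairwise payoffs with neighbors
    u = np.array([sum(PAYOFF[a][actions[j]] for j in nbrs) for a in (0, 1)])
    p = np.exp(beta * (u - u.max()))
    p /= p.sum()
    actions[i] = int(rng.choice(2, p=p))

# 10-agent ring (a 2-regular network)
n = 10
adj = np.roll(np.eye(n, dtype=int), 1, axis=1) + np.roll(np.eye(n, dtype=int), -1, axis=1)
rng = np.random.default_rng(0)
actions = np.zeros(n, dtype=int)   # start at the all-hare (risk-dominant) profile
for _ in range(500):
    lll_step(actions, adj, beta=10.0, rng=rng)
```

At large $\beta$ the logit probabilities concentrate on best responses, so the chain lingers near equilibria; at small $\beta$ revisions become nearly uniform, which is the bounded-rationality regime the paper analyzes.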
Broadband oscillations in wind farms have been widely reported in recent years. Past studies have examined various types of oscillations in wind farms, relating small-signal stability to control settings, operating conditions, and electrical parameters. However, most analyses are performed on aggregated single-unit models, which may deviate from the true behavior, leading to misleading stability assessments. To investigate how aggregation affects stability conclusions, this paper develops detailed single-, two-, and three-unit doubly-fed induction generator (DFIG) models and their aggregated counterparts. Then, a D-decomposition-related ray-extrapolation method is proposed to characterize the small-signal stability region of nonlinear DFIG models in the parameter space, delineating stability boundaries under numerous parameter combinations. The study reveals that the stability regions of aggregated models within the parameter planes of control settings and operating conditions differ from those of granular models in terms of basic shape, critical modes, and evolution patterns, posing a risk of misjudging stability margins.
Objective: To develop a robust and compact deep learning model for automated knee cartilage segmentation on point-of-care ultrasound (POCUS) devices. Methods: We propose MonoUNet, an ultra-compact U-Net consisting of (i) an aggressively reduced backbone with an asymmetric decoder, (ii) a trainable monogenic block that extracts multi-scale local phase features, and (iii) a gated feature injection mechanism that integrates these features into the encoder stages to reduce sensitivity to variations in ultrasound image appearance and improve robustness across devices. MonoUNet was evaluated on a multi-site, multi-device knee cartilage ultrasound dataset acquired using cart-based, portable, and handheld POCUS devices. Results: Overall, MonoUNet outperformed existing lightweight segmentation models, with average Dice scores ranging from 92.62% to 94.82% and mean average surface distance (MASD) values between 0.133 mm and 0.254 mm. MonoUNet reduces the number of parameters by 10x--700x and computational cost by 14x--2000x relative to existing lightweight models. MonoUNet cartilage outcomes showed excellent reliability and agreement with the manual outcomes: intraclass correlation coefficient ICC$_{2,k}$ = 0.96 and bias = 2.00% (0.047 mm) for average thickness, and ICC$_{2,k}$ = 0.99 and bias = 0.80% (0.328 a.u.) for echo intensity. Conclusion: Incorporating trainable local phase features improves the robustness of highly compact neural networks for knee cartilage segmentation across varying acquisition settings and could support scalable ultrasound-based assessment and monitoring of knee osteoarthritis using POCUS devices. The code is publicly available at this https URL.
Automotive engineering development increasingly relies on heterogeneous 3D data, including finite element (FE) models, body-in-white (BiW) representations, CAD geometry, and CFD meshes. At the same time, engineering teams face growing pressure to shorten development cycles, improve performance, and accelerate innovation. Although artificial intelligence (AI) is increasingly explored in this domain, many current methods remain task-specific, difficult to interpret, and hard to reuse across development stages. This paper presents a practical graph learning framework for 3D engineering AI, in which heterogeneous engineering assets are converted into physics-aware graph representations and processed by Graph Neural Networks (GNNs). The framework is designed to support both classification and prediction tasks and is validated on two automotive applications: CAE vibration mode shape classification and CFD aerodynamic field prediction. For CAE vibration mode classification, a region-aware BiW graph supports explainable mode classification across vehicle and FE variants under label scarcity. For CFD aerodynamic field prediction, a physics-informed surrogate predicts pressure and wall shear stress (WSS) across aerodynamic body shape variants, while symmetry-preserving downsampling retains accuracy at lower computational cost. The framework also outlines data generation guidance that can help engineers identify which additional simulations or labels are valuable to collect next. These results demonstrate a practical and reusable engineering AI workflow for more trustworthy CAE and CFD decision support.
Harnessing the demand-side flexibility in building and mobility sectors can help to better integrate renewable energy into power systems and reduce global CO2 emissions. Enabling this sector coupling can be achieved with advances in energy management, business models, control technologies, and power grids. The study of demand-side flexibility extends beyond engineering, spanning social science, economics, and power and control systems, which present both challenges and opportunities to researchers and engineers in these fields. This Review outlines recent trends and studies in social, economic, and technological advancements in power systems that leverage demand-side flexibility. We first provide a concept of a socio-techno-economic system with an abstraction of end-users, building and mobility sectors, control systems, electricity markets, and power grids. We discuss the interconnections between these elements, highlighting the importance of bidirectional flows of information and coordinated decision-making. We then emphasize that fully realizing demand-side flexibility necessitates deep integration across stakeholders and systems, moving beyond siloed approaches. Finally, we discuss the future directions in renewable-based power systems and control engineering to address key challenges from both research and practitioners' perspectives. A holistic approach for identifying, measuring, and utilizing demand-side flexibility is key to successfully maximizing its multi-stakeholder benefits but requires further transdisciplinary collaboration and commercially viable solutions for broader implementation.
Perimeter control is an effective urban traffic management strategy that regulates inflow to congested urban regions using aggregate network dynamics. While existing approaches primarily optimize system-level efficiency, such as total travel time or network throughput, they often overlook equity considerations, leading to uneven delay distributions across entry points. This work integrates fairness objectives into perimeter control design through explicit queue balancing mechanisms. A large-scale, microscopic case study of the Financial District in the San Francisco urban network is used to evaluate both performance and implementation challenges. The results demonstrate that conventional perimeter control not only reduces total and internal delays but can also improve fairness metrics (Harsanyian, Rawlsian, Utilitarian, Egalitarian). Building on this observation, queue balancing strategies match conventional performance while yielding measurable fairness improvements, especially in heterogeneous demand scenarios where congestion is unevenly distributed across entry points. The proposed framework contributes toward equitable control design for emerging intelligent transportation systems and can foster higher user acceptance.
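The fairness criteria named in parentheses admit several formalizations; one minimal, illustrative set of aggregates over per-entry-point delays (the paper's exact definitions may differ) can be written as:

```python
import numpy as np

def fairness_metrics(delays):
    """Illustrative fairness aggregates over per-entry-point delays.
    Lower is better for the first three; Jain's index closer to 1 is fairer.
    These are generic textbook forms, assumed here for demonstration."""
    d = np.asarray(delays, dtype=float)
    return {
        "utilitarian": float(d.sum()),              # total delay across entries
        "egalitarian": float(d.max()),              # worst-off entry (Rawlsian max-min view)
        "range": float(d.max() - d.min()),          # spread between entries
        "jain": float(d.sum() ** 2 / (len(d) * (d ** 2).sum())),  # balance in [1/n, 1]
    }
```

Reporting such aggregates alongside total delay is what makes the equity effect of a queue balancing strategy measurable rather than anecdotal.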
The rapid emergence of Large Language Models (LLMs) has catalyzed Agentic artificial intelligence (AI): autonomous systems that integrate perception, reasoning, and action into closed-loop pipelines for continuous adaptation. While unlocking transformative applications in mobile edge computing, autonomous systems, and next-generation wireless networks, this paradigm creates fundamental energy challenges through iterative inference and persistent data exchange. Unlike traditional AI, where the bottleneck is computational cost measured in Floating Point Operations (FLOPs), Agentic AI faces compounding computational and communication energy costs. In this survey, we propose an energy accounting framework identifying computational and communication costs across the Perception-Reasoning-Action cycle. We establish a unified taxonomy spanning model simplification, computation control, input and attention optimization, and hardware-aware inference. We explore cross-layer co-design strategies jointly optimizing model parameters, wireless transmissions, and edge resources. Finally, we identify open challenges in federated green learning, carbon-aware agency, sixth-generation (6G)-native Agentic AI, and self-sustaining systems, providing a roadmap for scalable autonomous intelligence.
Traditional video coding (VVC, HEVC) prioritizes human visual perception, transmitting substantial texture redundancy that severely hinders machine decision-making under constrained bandwidths. In dynamic channels, this redundancy causes severe "cliff effects" and prohibitive latency. To address this, we propose a robust multimodal semantic communication framework based on an adaptive Object-Attribute-Relation (O-A-R) hierarchy. Bypassing pixel-level reconstruction entirely, our framework directly fuses visual, textual, and audio streams to construct a decision-oriented topological graph. A bandwidth-adaptive strategy dynamically allocates resources by semantic priority, while a cross-modal mechanism leverages text and audio priors to compensate for severe visual degradation. Experimental results demonstrate that under extremely low bandwidths (1-3 kbps), our method achieves over a 90% bandwidth saving (an approximately 10-fold reduction) compared to state-of-the-art digital schemes, while maintaining superior scene-graph accuracy. In deep fading channels (SNR <= 4 dB), it completely eliminates the cliff effect, ensuring graceful degradation by strictly preserving foundational object anchors even when traditional codecs suffer 100% decoding failure. Coupled with an 89% reduction in end-to-end latency, our framework comprehensively fulfills the real-time survival requirements of embodied agents.
This paper proposes a safe reinforcement learning (RL) framework based on forward-invariance-induced action-space design. The control problem is cast as a Markov decision process, but instead of relying on runtime shielding or penalty-based constraints, safety is embedded directly into the action representation. Specifically, we construct a finite admissible action set in which each discrete action corresponds to a stabilizing feedback law that preserves forward invariance of a prescribed safe state set. Consequently, the RL agent optimizes policies over a safe-by-construction policy class. We validate the framework on a quadcopter hover-regulation problem under disturbance. Simulation results show that the learned policy improves closed-loop performance and switching efficiency, while all evaluated policies remain safety-preserving. The proposed formulation decouples safety assurance from performance optimization and provides a promising foundation for safe learning in nonlinear systems.
A large-scale Electric Vehicle (EV) Charging Station (CS) may be too complex to dispatch in real time via a centralized approach. While a decentralized approach may be a viable solution, the lack of incentives could impair the alignment of EVs' individual objectives with the controller's optimum. In this work, we integrate a decentralized algorithm into a hierarchical three-layer Energy Management System (EMS), where it operates as the real-time control layer and incorporates an incentive design mechanism. A centralized approach is proposed for the dispatch plan definition and for the intra-day refinement, while a decentralized game-theoretic approach is proposed for the real-time control. We employ a Stackelberg Game-based Alternating Direction Method of Multipliers (SG-ADMM) to design an incentive mechanism while managing the EV control in a distributed manner, framing the leader-follower relation between the EVCS and the EVs as a non-cooperative game in which the leader has commitment power. Part I of this two-part paper presents the SG-ADMM approach, reviews the related literature, and describes its integration into the aforementioned hierarchical EMS, focusing on the modifications needed for the proposed application.
In the first part of this two-part paper, a game-theoretic decentralized real-time control is proposed in the context of an Electric Vehicle (EV) Charging Station (CS). This method, relying on a Stackelberg Game-based Alternating Direction Method of Multipliers (SG-ADMM), steers the EVs' individual objectives towards the CS optimum by means of an incentive design mechanism, while controlling the EV power dispatch in a distributed manner. We integrate SG-ADMM into a hierarchical multi-layered Energy Management System (EMS) as the real-time control algorithm, formulating the two-layer approach so that the SG leader (i.e., the CS), holding commitment power, trades off the available power against the incentives offered to the EVs, and the SG followers (i.e., the EVs) optimize their charging curves in response to the leader's decision. In this second part, we demonstrate the applicability of SG-ADMM as an incentive design mechanism inside an EVCS EMS by testing it in a large-scale EVCS. We benchmark this method against a decentralized (ADMM-based), a centralized, and an uncontrolled approach, showing that our method exploits EV-level flexibility in a cost-effective, fair, and computationally efficient manner.
We propose a Physics-Informed Learning framework for reconstructing traffic density from sparse trajectory data. The approach combines a second-order Aw-Rascle-Zhang (ARZ) model with a first-order training stage to estimate the equilibrium velocity. The method is evaluated in both equilibrium and transient traffic regimes using SUMO simulations. Results show that while learning the equilibrium velocity improves reconstruction under steady-state conditions, it becomes unstable in transient regimes due to the breakdown of the equilibrium assumption. In contrast, the second-order model consistently provides more accurate and robust reconstructions than first-order approaches, particularly in nonequilibrium conditions.
Multi-Agent Path Finding (MAPF) is a fundamental coordination problem in large-scale robotic and cyber-physical systems, where multiple agents must compute conflict-free trajectories with limited computational and communication resources. While centralised optimal solvers provide guarantees on solution optimality, their exponential computational complexity limits scalability to large-scale systems and real-time applicability. Existing decentralised heuristics are faster, but result in suboptimal outcomes and high cost disparities. This paper proposes a decentralised coordination framework for cooperative MAPF based on Karma mechanisms - artificial, non-tradeable credits that account for agents' past cooperative behaviour and regulate future conflict resolution decisions. The approach formulates conflict resolution as a bilateral negotiation process that enables agents to resolve conflicts through pairwise replanning while promoting long-term fairness under limited communication and without global priority structures. The mechanism is evaluated in a lifelong robotic warehouse multi-agent pickup-and-delivery scenario with kinematic orientation constraints. The results highlight that the Karma mechanism balances replanning effort across agents, reducing disparity in service times without sacrificing overall efficiency. Code: this https URL
Integrating large language models (LLMs) into automatic speech recognition (ASR) has become a dominant paradigm. Although recent LLM-based ASR models have shown promising performance on public benchmarks, it remains challenging to balance recognition quality with latency and overhead, while hallucinations further limit real-world deployment. In this study, we revisit LLM-based ASR from an entropy allocation perspective and introduce three metrics to characterize how training paradigms allocate entropy reduction between the speech encoder and the LLM. To remedy entropy-allocation inefficiencies in prevailing approaches, we propose a principled multi-stage training strategy grounded in capability-boundary awareness, optimizing parameter efficiency and hallucination robustness. Specifically, we redesign the pretraining strategy to alleviate the speech-text modality gap, and further introduce an iterative asynchronous SFT stage between alignment and joint SFT to preserve functional decoupling and constrain encoder representation drift. Experiments on Mandarin and English benchmarks show that our method achieves competitive performance with state-of-the-art models using only 2.3B parameters, while also effectively mitigating hallucinations through our decoupling-oriented design.
This paper presents a detailed measurement campaign and a comprehensive analysis of 15 GHz ultra-massive multiple-input multiple-output (UM-MIMO) channels tailored for the urban microcell (UMi) environment. Channel sounding is performed over 14.875-15.125 GHz using a time-domain platform comprising a 128-element L-shaped transmit array and a 64-element square receive array. Four representative scenarios are investigated, namely near-field line-of-sight (LoS), near-field foliage-shaded, far-field foliage-shaded, and far-field LoS street canyon scenarios, resulting in 81 distinct transmit-receive links. Based on the measured data, conventional channel characteristics, including path loss, power delay angle profiles, delay spread, and angular spread, are characterized, while UM-MIMO-specific phenomena associated with near-field effects, spatial non-stationarity (SNS), and channel hardening (CHD) are quantitatively analyzed. Channel capacity is further evaluated to reveal the effects of different UMi propagation conditions on system performance. The reported results provide empirical support for the new mid-band spectrum (6-24 GHz, including Frequency Range 3 (FR3)) UM-MIMO channel modeling and offer practical guidance for the design and deployment of future sixth-generation (6G) microcell networks.
In this paper, we consider data-driven reconstruction of unknown inputs to linear time-invariant (LTI) multiple-input multiple-output (MIMO) systems. We propose a novel autoregressive estimator based on a constrained least-squares formulation over Hankel matrices, splitting the problem into an output-consistency constraint and an input-history-matching objective. Our method relies on previously recorded input-output data to represent the system, but does not require knowledge of the true input to initialize the algorithm. We show that the proposed estimator is strictly stable if and only if all the invariant zeros of the trajectory-generating system lie strictly inside the unit circle, which can be verified purely from input and output data. This mirrors existing results from model-based input reconstruction and closes the gap between model-based and data-driven settings. Lastly, we provide numerical examples to demonstrate the theoretical results.
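A central object in such data-driven methods is the block-Hankel matrix built from recorded data. The following sketch (illustrative, not the paper's estimator) constructs one and checks a persistency-of-excitation condition via the row rank of the input Hankel matrix:

```python
import numpy as np

def block_hankel(w, L):
    """Depth-L block-Hankel matrix of a signal w with shape (T, m):
    column k stacks w[k], w[k+1], ..., w[k+L-1]."""
    T, m = w.shape
    cols = T - L + 1
    H = np.zeros((L * m, cols))
    for i in range(L):
        H[i * m:(i + 1) * m, :] = w[i:i + cols].T
    return H

# A generic random input of length 50 is persistently exciting of order 8:
# its depth-8 Hankel matrix has full row rank.
u = np.random.default_rng(3).standard_normal((50, 1))
Hu = block_hankel(u, 8)
rank = np.linalg.matrix_rank(Hu)
```

Under such rank conditions, the columns of input-output Hankel matrices span the system's finite-length trajectories, which is what lets a least-squares program over Hankel columns replace an explicit model.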
ROI (Region of Interest) video selective encryption based on H.265/HEVC is a technology that protects the sensitive regions of videos by perturbing the syntax elements associated with target areas. However, existing methods typically adopt the Tile (with a relatively large size) as the minimum encryption unit, which suffers from problems such as inaccurate encryption regions and low encryption precision. This low-precision encryption makes them difficult to apply in sensitive fields such as medicine, military, and remote sensing. To address this problem, this paper proposes a fine-grained ROI video selective encryption algorithm based on Coding Units (CUs) and prompt segmentation. First, to achieve a more precise ROI acquisition, we present a novel ROI mapping approach based on prompt segmentation. This approach enables precise mapping of ROIs to small $8\times8$ CU levels, significantly enhancing the precision of encrypted regions. Second, we propose a selective encryption scheme based on multiple syntax elements, which distorts syntax elements within the high-precision ROI to effectively safeguard ROI security. Finally, we design a diffusion-isolation mechanism based on Pulse Code Modulation (PCM) mode and motion vector (MV) restriction, applying the PCM mode and MV restriction strategy to the affected CUs to prevent encryption diffusion during prediction. The above three strategies break the inherent reliance on Tiles in existing ROI encryption and push the granularity of ROI video encryption down to the minimum $8\times8$ CU precision. The experimental results demonstrate that the proposed algorithm can accurately segment ROI regions, effectively perturb pixels within these regions, and eliminate the diffusion artifacts introduced by encryption. The method exhibits great potential for application in medical imaging, military surveillance, and remote sensing.
A key task in embedded vision is visual odometry (VO), which estimates camera motion from visual sensors, and it is a core component in many embedded power-constrained systems, from autonomous robots to augmented and virtual reality wearable devices. The newest class of VO systems combines deep learning models with bio-inspired event-based cameras, which are robust to motion blur and lighting conditions. However, state-of-the-art (SoA) event-based VO algorithms require significant memory and computation. For example, the leading approach DEVO requires 733 MB of memory and 155 billion multiply-accumulate (MAC) operations per frame. We present TinyDEVO, an event-based VO deep learning model designed for resource-constrained microcontroller units (MCUs). We deploy TinyDEVO on an ultra-low-power (ULP) 9-core RISC-V-based MCU, achieving a throughput of approximately 1.2 frames per second with an average power consumption of only 86 mW. Thanks to our neural network architectural optimizations and hyperparameter tuning, TinyDEVO reduces the memory footprint by 11.5x (to 63.8 MB) and the number of operations per frame by 29.7x (to 5.2 billion MACs per frame) compared to DEVO, while maintaining an average trajectory error of 27 cm, i.e., only 19 cm higher than DEVO, on three state-of-the-art datasets. Our work demonstrates, for the first time, the feasibility of an event-based VO pipeline on ultra-low-power devices.
Attitude estimation using scalar measurements, corresponding to partial vectorial observations, arises naturally when inertial vectors are not fully observed but only measured along specific body-frame vectors. Such measurements arise in problems involving incomplete vector measurements or attitude constraints derived from heterogeneous sensor information. Building on the classical complementary filter on SO(3), we propose an observer with a modified innovation term tailored to this scalar-output structure. The main result shows that almost-global asymptotic stability is recovered, under suitable persistence of excitation conditions, when at least three inertial vectors are measured along a common body-frame vector, which is consistent with the three-dimensional structure of SO(3). For two-scalar configurations - corresponding either to one inertial vector measured along two body-frame vectors, or to two inertial vectors measured along a common body-frame vector - we further derive sufficient conditions guaranteeing convergence within a reduced basin of attraction. Different examples and numerical results demonstrate the effectiveness of the proposed scalar-based complementary filter for attitude estimation in challenging scenarios involving reduced sensing and/or novel sensing modalities.
Resilience in cyber-physical systems of systems (CPSoS) is often assessed using static indices or point-in-time metrics that do not adequately account for the temporal evolution of risk following a disruption. This paper formalizes resilience as a functional of the risk trajectory by modelling risk as a dynamic state variable. It is analytically shown that key resilience properties are structurally determined by maximum deviation (peak) and effective damping, and that cumulative risk exposure depends on their ratio. A simplified energy-dependent system illustrates the resulting differences in peak magnitude, recovery dynamics, and cumulative impact. The proposed approach links resilience assessment to stability properties of dynamic systems and provides a system-theoretically consistent foundation for the analysis of time-dependent resilience in CPSoS.
Deep stochastic state-space models enable Bayesian filtering in nonlinear, partially observed systems but typically assume a fixed latent structure. When this assumption is violated, parameter adaptation alone may result in persistent belief inconsistency. We introduce \emph{Cognitive Flexibility} (CF) as a representation-level operator that selects latent structures online via an innovation-based predictive score, while preserving the Bayesian filtering recursion. Structural mismatch is formalized as irreducible predictive inconsistency under fixed structure. The resulting belief--structure recursion is shown to be well posed, to exhibit a structural descent property, and to admit finite switching, with reduction to standard Bayesian filtering under correct specification. Experiments on latent-dynamics mismatch, observation-structure shifts, and well-specified regimes confirm that CF improves predictive accuracy under a mismatch while remaining non-intrusive when the model is correctly specified.
This paper studies joint range-angle estimation and communication in near-field (NF) integrated sensing and communication (ISAC) systems, where the base station (BS) serves a single user equipment (UE) whose position is simultaneously estimated via monostatic sensing. Unlike a uniform linear array (ULA), a uniform circular array (UCA) provides an angle-invariant NF region due to its rotational symmetry. To capture the full wideband NF propagation environment, we develop a continuous-time channel model incorporating per-element delay, Doppler shifts, and spherical wavefront geometry under OFDM signaling. Building on this model, we derive the closed-form Cramér-Rao lower bound (CRLB) for joint range-angle estimation of the UE position, design an optimal transmit beamformer via Riemannian gradient descent, and formulate a joint range-angle maximum likelihood (ML) estimator. Monte Carlo simulations confirm a fundamental aperture-versus-SNR trade-off in NF-ISAC: while a larger UCA radius tightens the CRLB, it simultaneously reduces the received SNR at any given distance, pushing the ML estimator below its convergence threshold and degrading practical performance. Among the evaluated configurations, R = 0.5 m achieves the best joint estimation and communication performance at the BS by sustaining the highest received SNR throughout the evaluated range.
Ordinary differential equations (ODEs) are a cornerstone of systems and control theory. Accordingly, they are standard material in undergraduate engineering programs, and there is abundant didactic literature on the topic. Yet the solution methods and formulas prescribed in this literature are unclear about the assumptions behind their derivation, and thus about the limits of their applicability. Specifically, smoothness of the input is rarely discussed, even though it is a critical property in defining the character of the solutions and the validity of the prescribed methods and formulas. Moreover, the relationship with the state space representation (SSR) of linear systems is absent from this same literature and only marginally discussed in more advanced texts. In this paper we detail these gaps in the didactic literature, and then provide a formal delimitation of the boundaries of the standard solutions and methods for linear ODEs. Our analysis relies on some key properties of state space representations, so we establish the formal connections between ODEs and SSRs, defining an equivalence between the two that is absent in the literature and is of conceptual interest in itself.
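As a minimal illustration of the ODE-SSR connection discussed above (an arbitrary second-order example, not one taken from the paper), a linear ODE can be realized in controllable canonical form:

```python
import numpy as np

# Controllable-canonical state-space realization of the second-order ODE
#   y'' + a1*y' + a0*y = b0*u
# with state x1 = y, x2 = y'. The coefficient values are arbitrary examples.
a0, a1, b0 = 2.0, 3.0, 1.0

A = np.array([[0.0,  1.0],
              [-a0, -a1]])      # companion matrix of s^2 + a1*s + a0
B = np.array([[0.0],
              [b0]])
C = np.array([[1.0, 0.0]])      # output y = x1

# The eigenvalues of A are the roots of the characteristic polynomial
# s^2 + 3s + 2 = (s + 1)(s + 2), i.e. -1 and -2.
poles = np.sort(np.linalg.eigvals(A).real)
```

The same companion-matrix construction extends to any order, which is precisely the equivalence between an n-th order linear ODE and an n-dimensional SSR.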
In current MIMO mobile communication systems, phase noise can significantly impair performance. To allow for compensation of these impairments, accurate phase noise modeling is necessary. Numerical modeling of the phase noise process at a phase-locked loop (PLL) output is established in the literature and commonly represented by an Ornstein-Uhlenbeck (OU) process. The corresponding spectrum can be represented by a multi-pole/zero model. This work presents a least squares (LS) method for estimating the PLL parameters such as oscillator constants or PLL bandwidth from a measured phase noise spectrum. The method is applied on the MAX2870 and MAX2871 PLL chips and parameter estimates such as oscillator constants and PLL bandwidths are provided. The resulting parameter set enables both time- and frequency-domain numerical simulations.
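A generic least-squares fit of a measured phase-noise spectrum can be sketched as follows; note this uses a simple sum of power-law terms rather than the paper's full multi-pole/zero PLL model, and all constants below are illustrative assumptions:

```python
import numpy as np

def fit_powerlaw_spectrum(f, S_dBc):
    """Least-squares fit of a phase-noise spectrum with power-law terms
        S(f) = h0 + h2/f^2 + h3/f^3
    (white-phase, white-FM, and flicker-FM contributions). This is a
    generic illustration of the LS idea, not the paper's multi-pole/zero
    PLL model; the basis choice is an assumption."""
    S_lin = 10.0 ** (S_dBc / 10.0)                 # dBc/Hz -> linear power
    A = np.column_stack([np.ones_like(f), f**-2.0, f**-3.0])
    scale = np.linalg.norm(A, axis=0)              # equilibrate columns
    coeffs, *_ = np.linalg.lstsq(A / scale, S_lin, rcond=None)
    return coeffs / scale                          # [h0, h2, h3]

# synthetic spectrum with known (illustrative) constants
f = np.logspace(3, 7, 200)                         # 1 kHz .. 10 MHz offsets
h_true = np.array([1e-15, 1e-7, 1e-3])
S_dBc = 10 * np.log10(h_true[0] + h_true[1] / f**2 + h_true[2] / f**3)
h_est = fit_powerlaw_spectrum(f, S_dBc)
```

Column equilibration keeps the problem well conditioned despite the huge dynamic range of the basis terms, a practical detail any spectrum-fitting LS method needs.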
This work explores the potential of integrating an Intelligent Transmissive Surface (ITS) into an antenna array to improve beamforming performance. We show that integrating a moderate number of passive refractive elements into a small antenna array can significantly improve the Weighted Sum Rate (WSR). We investigate the optimization of the WSR under two distinct operational constraints: a Radiated Power (RP) constraint and a Transmitted Power (TP) constraint. Our analysis reveals that the choice between these constraints significantly impacts the design parameters of the ITS-aided array. By contrasting these approaches, we explore critical design and material parameters, including the array geometry, surface loss, and illumination strategies.
Millimeter-wave (mmWave) communication enables high data rates through large bandwidths and highly directional beamforming, but its sensitivity to blockage and mobility makes reliable beam alignment a central challenge. Limited-probing beam management is a fundamental problem in codebook-based mmWave systems, where only a small subset of beams can be evaluated simultaneously, and the serving decision is restricted to the probed set. Under mobility and noisy feedback, this leads to a sequential and partially observable decision problem in which performance depends critically on the quality of the proposed beam candidates. In this paper, we consider limited-probing beam management and develop a history-conditioned discrete denoising diffusion probabilistic model for beam candidate generation. The proposed method learns from logged probing histories a conditional distribution over promising beam indices, which is then used to construct probing candidates online. Numerical analysis shows that the proposed approach consistently achieves better signal-to-noise ratio, beam-miss probability, and conditional probe regret under tight probing budgets compared with strong learning-based and discriminative baselines. The gains are especially pronounced in low-probing regimes, where accurate candidate generation is most critical.
This paper describes a multi-region control framework for floating offshore wind farms. Specifically, we propose a novel generator torque controller that regulates rotor speed in Region 2, corresponding to wind speeds between the cut-in and rated values. In Region 3 (wind speeds at or above rated but below cut-out speed) we employ a PI-LQR for collective blade pitch. Control blending across the transitional wind speeds (Region 2.5) employs a sigmoid weighting function applied to the control variables. Two modeling paradigms are proposed for farm-level power tracking with rotor speed regularization: a nonlinear model predictive controller (NL-MPC) with a dynamic wake model, and a reduced order model predictive controller based on linear parameter varying turbine models with a time delay representation of wake advection (LPVTD-MPC). These approaches are evaluated over three wind inlet conditions using the PJM ancillary service certification criteria for participation in a secondary frequency regulation market. Results show that both approaches achieve scores of at least 89.9\% for the three different testing scenarios, which are well above the qualification threshold of 75\%. However, the LPVTD-MPC approach solves the problem in under half the time versus NL-MPC but with slightly larger fluctuations in farm-level power output, highlighting the trade-off between performance and computational tractability. The control framework is among the first to address multi-region wind turbine dynamics together with market driven power tracking objectives for floating offshore wind farms. Such multi-region control becomes increasingly necessary in the floating turbine setting where large (region spanning) wind speed variations are common due to wave induced platform pitching.
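The Region 2.5 blending can be sketched as below; the rated wind speed and steepness values are illustrative assumptions, not the paper's tuning:

```python
import numpy as np

def blend_weight(v, v_rated=11.4, k=2.0):
    """Sigmoid weight for Region 2.5 blending: ~0 below rated wind speed
    (Region 2, generator-torque control dominates) and ~1 above it
    (Region 3, collective blade-pitch control dominates).
    v_rated [m/s] and steepness k are illustrative values."""
    return 1.0 / (1.0 + np.exp(-k * (v - v_rated)))

def blended_commands(v, torque_cmd, pitch_cmd):
    """Apply the sigmoid weight to the two controllers' outputs
    across the transition region."""
    w = blend_weight(v)
    return (1.0 - w) * torque_cmd, w * pitch_cmd
```

Far from rated speed the weight saturates, so each regional controller acts alone outside the transition band.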
We study the design of an offloaded model predictive control (MPC) operating over a lossy communication channel. We introduce a controller design that utilizes two complementary bandwidth-reduction methods. The first method is a multi-horizon MPC formulation that decreases the number of optimization variables, and therefore the size of transmitted input trajectories. The second method is a communication-rate reduction mechanism that lowers the frequency of packet transmissions. We derive theoretical guarantees on recursive feasibility and constraint satisfaction under minimal assumptions on packet loss, and we establish reference-tracking performance for the rate-reduction strategy. The proposed methods are validated using a hardware-in-the-loop setup with a real 5G network, demonstrating simultaneous improvements in bandwidth efficiency and computational load.
Model-based multi-agent control requires agents to possess a model of the behavior of others to make strategic decisions. Solution concepts from game theory are often used to model the emergent collective behavior of self-interested agents and have found active use in multi-agent control design. Model predictive games are a class of controllers in which an agent iteratively solves a finite-horizon game to predict the behavior of a multi-agent system and synthesize their own control action. When multiple agents implement these types of controllers, there may exist misspecifications in the respective game models embedded in their controllers, stemming from inaccurate estimates or conjectures of other agents' objectives. This paper analyzes the resulting prediction misalignments and their effects on the system's behavior. We provide criteria for the stability of multi-agent dynamic systems with heterogeneous model predictive game controllers, and quantify the sensitivity of the equilibria to individual agents' game parameters.
Immunohistochemistry (IHC) is essential for assessing specific immune biomarkers such as Human Epidermal growth-factor Receptor 2 (HER2) in breast cancer. However, traditional protocols for obtaining IHC stains are resource-intensive, time-consuming, and prone to structural damage. Virtual staining has emerged as a scalable alternative, but it faces significant challenges in preserving fine-grained cellular structures while accurately translating biochemical expressions. Current state-of-the-art methods still rely on Generative Adversarial Networks (GANs) or standard convolutional U-Net diffusion models, which often struggle with a structure-versus-staining trade-off: the generated samples are either structurally faithful but blurry, or texturally realistic but marred by artifacts that compromise their diagnostic use. In this paper, we introduce HistDiT, a novel latent conditional Diffusion Transformer (DiT) architecture that establishes a new benchmark for visual fidelity in virtual histological staining. The novelty of this work is threefold: a) a Dual-Stream Conditioning strategy that explicitly balances spatial constraints (via VAE-encoded latents) against semantic phenotype guidance (via UNI embeddings); b) a multi-objective loss function that yields sharper images with clear morphological structure; and c) a Structural Correlation Metric (SCM) that focuses on core morphological structure for precise assessment of sample quality. Consequently, our model outperforms existing baselines, as demonstrated through rigorous quantitative and qualitative evaluations.
Integrated sensing and communication (ISAC) is a key enabler of 6G, supporting environment-aware services. A fundamental sensing task in this setting is reliable multi-target detection and tracking. This paper proposes a temporal graph neural network (TGNN)-based tracking method that exploits delay and Doppler information from the wireless channel. The delay-Doppler map is modeled as a sequence of graphs, and tracking is formulated as a temporal node classification problem, enabling joint clustering and data association of dynamic targets. Using ray-tracing-based channel outputs as ground truth, the method is evaluated across multiple scenes with varying target positions, velocities, and trajectories and is compared with a Kalman filter baseline. Results demonstrate reduced normalized mean squared error (NMSE) in delay and Doppler, leading to more accurate multi-target tracking.
Estimating generation costs from observed electricity market data is essential for market simulation, strategic bidding, and system planning. To that end, we model the relationship between generation costs and production schedules with a latent variable model. Estimating generation costs from observed schedules is then formulated as Bayesian inference. A prior distribution encodes an initial belief on parameters, and the inference consists of updating the belief with the posterior distribution given observations. We use balanced neural posterior estimation (BNPE) to learn this posterior. Validation on the IEEE RTS-96 test system shows that marginal costs are recovered with narrow credible intervals, while start-up costs remain largely unidentifiable from schedules alone. The method is benchmarked against an inverse-optimization algorithm that exhibits larger parameter errors without uncertainty quantification.
This paper investigates the state estimation problem for linear systems subject to Gaussian noise, where the model parameters are unknown. By formulating and solving an optimization problem that incorporates both offline and online system data, a novel data-driven moving horizon estimator (DDMHE) is designed. We prove that the expected 2-norm of the estimation error of the proposed DDMHE is ultimately bounded. Further, we establish an explicit relationship between the system noise covariances and the estimation error of the proposed DDMHE. Moreover, through a sample complexity analysis, we show how the length of the offline data affects the estimation error of the proposed DDMHE. We also quantify the performance gap between the proposed DDMHE using noisy data and the traditional moving horizon estimator with known system matrices. Finally, the theoretical results are validated through numerical simulations.
We present a perceptually-driven video compression framework integrating implicit neural representations (INRs) and pre-trained video diffusion models to address the extremely low bitrate regime (<0.05 bpp). Our approach exploits the complementary strengths of INRs, which provide a compact video representation, and diffusion models, which offer rich generative priors learned from large-scale datasets. The INR-based conditioning replaces traditional intra-coded keyframes with bit-efficient neural representations trained to estimate latent features and guide the diffusion process. Our joint optimization of INR weights and parameter-efficient adapters for diffusion models allows the model to learn reliable conditioning signals while encoding video-specific information with minimal parameter overhead. Our experiments on UVG, MCL-JCV, and JVET Class-B benchmarks demonstrate substantial improvements in perceptual metrics (LPIPS, DISTS, and FID) at extremely low bitrates, including improvements on BD-LPIPS up to 0.214 and BD-FID up to 91.14 relative to HEVC, while also outperforming VVC and previous strong state-of-the-art neural and INR-only video codecs. Moreover, our analysis shows that INR-conditioned diffusion-based video compression first composes the scene layout and object identities before refining textural accuracy, exposing the semantic-to-visual hierarchy that enables perceptually faithful compression at extremely low bitrates.
Let $f:\mathbb{R}^n\to\mathbb{R}$ be an unknown object, and suppose the observations are tomographic projections of randomly rotated copies of $f$ of the form $Y = P(R\cdot f)$, where $R$ is Haar-uniform in $\mathrm{SO}(n)$ and $P$ is the projection onto an $m$-dimensional subspace, so that $Y:\mathbb{R}^m\to\mathbb{R}$. We prove that, whenever $d\le m$, the $d$-th order moment of the projected data determines the full $d$-th order Haar-orbit moment of $f$, independently of the ambient dimension $n$. We further provide an explicit algorithmic procedure for recovering the latter from the former. As a consequence, any identifiability result for the unprojected model based on $d$-th order group-invariant moment extends directly to the tomographic setting at the same moment order. In particular, for $n=3$, $m=2$, and $d=2$, our result recovers a classical result in the cryo-EM literature: the covariance of the 2D projection images determines the second order rotationally invariant moment of the underlying 3D object.
This paper presents a Gaze-Guided Audio-Visual Speech Enhancement (GG-AVSE) framework to address the cocktail party problem. A major challenge in conventional AVSE is identifying the listener's intended speaker in multi-talker environments. GG-AVSE addresses this issue by exploiting gaze direction as a supervisory cue for target-speaker selection. Specifically, we propose the GG-VM module, which combines gaze signals with a YOLO5Face detector to extract the target speaker's facial features and integrates them with the pretrained AVSEMamba model through two strategies: zero-shot merging and partial visual fine-tuning. For evaluation, we introduce the AVSEC2-Gaze dataset. Experimental results show that GG-AVSE achieves substantial performance gains over gaze-free baselines: a 10.08% improvement in PESQ (2.370 to 2.609), a 5.18% improvement in STOI (0.8802 to 0.9258), and a 23.69% improvement in SI-SDR (9.16 to 11.33). These results confirm that gaze provides an effective cue for resolving target-speaker ambiguity and highlight the scalability of GG-AVSE for real-world applications.
Speech LLM post-training increasingly relies on efficient cross-modal alignment and robust low-resource adaptation, yet collecting large-scale audio-text pairs remains costly. Text-only alignment methods such as TASU reduce this burden by simulating CTC posteriors from transcripts, but they provide limited control over uncertainty and error rate, making curriculum design largely heuristic. We propose \textbf{TASU2}, a controllable framework that generates simulated CTC posterior distributions within a specified WER range, producing text-derived supervision that better matches the acoustic decoding interface. This enables principled post-training curricula that smoothly vary supervision difficulty without TTS. Across multiple source-to-target adaptation settings, TASU2 improves in-domain and out-of-domain recognition over TASU, and consistently outperforms strong baselines including text-only fine-tuning and TTS-based augmentation, while mitigating source-domain performance degradation.
Real-time control of distribution networks requires accurate information about the system state. In practice, however, such information is difficult to obtain because real-time measurements are available only at a limited number of locations. This paper proposes a novel data-driven power flow (DDPF) framework for balanced radial distribution networks. The proposed algorithm combines the behavioral approach with the DistFlow model and leverages offline historical data to solve power flow problems using only a limited set of real-time measurements. To design DDPF under sparse measurement conditions, we develop a sensor placement problem based on optimal network reductions. This allows us to determine sensor locations subject to a predefined sensor budget and to explicitly account for the radial nature of distribution networks. Unlike approaches that rely on full observability, the proposed framework is designed for practical distribution grids with sparse measurement availability, enabling data-driven power flow for real-time operation while reducing the number of required sensors. On several test cases, the proposed DDPF algorithm demonstrates accurate voltage magnitude predictions, with a maximum error below 0.001 p.u., using as few as 25% of locations equipped with sensors.
Noisy speech separation systems are typically trained on fully-synthetic mixtures, limiting generalization to real-world scenarios. Though training on mixtures of in-domain (thus often noisy) speech is possible, we show that this leads to undesirable optima where mixture noise is retained in the estimates, due to the inseparability of the background noises and the loss function's symmetry. To address this, we propose ring mixing, a batch strategy that uses each source in two mixtures, alongside a new Signal-to-Consistency-Error Ratio (SCER) auxiliary loss that penalizes inconsistent estimates of the same source from different mixtures, breaking the symmetry and incentivizing denoising. On a WHAM!-based benchmark, our method reduces residual noise by more than half, effectively learning to denoise from only noisy recordings. This opens the door to training more generalizable systems on in-the-wild data, which we demonstrate with systems trained on naturally-noisy speech from VoxCeleb.
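A sketch of the ring-mixing batch construction, together with a consistency penalty in the spirit of SCER (a simplified illustrative form, not necessarily the paper's exact loss):

```python
import numpy as np

def ring_mix(sources):
    """Ring mixing: from N single-speaker recordings (rows), build N
    two-speaker mixtures such that source i appears in mixtures i and
    i-1, i.e. every source is used in exactly two mixtures."""
    return sources + np.roll(sources, -1, axis=0)

def consistency_penalty(est_a, est_b, eps=1e-8):
    """Penalty on inconsistent estimates of the same source recovered
    from two different mixtures: negative SNR (in dB) of their
    difference. An illustrative stand-in for the paper's SCER loss."""
    err = est_a - est_b
    ratio = np.sum(est_a**2, axis=-1) / (np.sum(err**2, axis=-1) + eps)
    return float(np.mean(-10.0 * np.log10(ratio + eps)))

rng = np.random.default_rng(0)
s = rng.standard_normal((4, 16000))   # 4 one-second sources at 16 kHz
mixes = ring_mix(s)                   # 4 mixtures, each source used twice
```

Because the penalty is asymmetric in the retained noise (any residual noise differs between the two mixtures containing a source), minimizing it pushes the network toward denoised estimates.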
Wideband orthogonal frequency-division multiplexing (OFDM) over extremely large-scale MIMO (XL-MIMO) arrays in the near-field Fresnel regime suffers from a coupled beam-squint and wavefront-curvature effect that renders single-frequency covariance models severely biased: the per-subcarrier compressed covariance diverges from the center-frequency model by 64\% at $B = 100$~MHz and by 177\% at $B = 400$~MHz. We derive the wideband compressed-domain Cramér--Rao bound (CRB) for hybrid analog--digital architectures and decompose the Fisher information gain into a dominant data-diversity term that scales as $10\log_{10}K_s$~dB and a secondary geometric-diversity term arising from frequency-dependent curvature. At 28~GHz with $M = 256$ antennas, $N_\mathrm{RF} = 16$ RF chains, and $K_s = 512$ subcarriers, wideband processing yields $+27.8$~dB of CRB improvement at $B = 400$~MHz, of which $+0.7$~dB is attributable to geometric diversity.
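The decomposition of the reported wideband gain can be checked with the abstract's own numbers: the data-diversity term is $10\log_{10}K_s$ dB, and the remainder is the geometric (curvature) term.

```python
import math

K_s = 512                                  # number of subcarriers
total_gain_dB = 27.8                       # reported CRB improvement at B = 400 MHz
geometric_dB = 0.7                         # reported geometric-diversity term

data_diversity_dB = 10 * math.log10(K_s)   # dominant data-diversity term
# 10*log10(512) ~ 27.1 dB; adding the 0.7 dB curvature term recovers ~ 27.8 dB
```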
Autoregressive models have shown superior performance and efficiency in image generation, but remain constrained by high computational costs and prolonged training times in video generation. In this study, we explore methods to accelerate training for autoregressive video generation models through empirical analyses. Our results reveal that while training on fewer video frames significantly reduces training time, it also exacerbates error accumulation and introduces inconsistencies in the generated videos. To address these issues, we propose a Local Optimization (Local Opt.) method, which optimizes tokens within localized windows while leveraging contextual information to reduce error propagation. Inspired by Lipschitz continuity, we propose a Representation Continuity (ReCo) strategy to improve the consistency of generated videos. ReCo utilizes continuity loss to constrain representation changes, improving model robustness and reducing error accumulation. Extensive experiments on class- and text-to-video datasets demonstrate that our approach achieves superior performance to the baseline while halving the training cost without sacrificing quality.
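A minimal sketch of a continuity-style penalty in the spirit of ReCo (a simplified hinge form; the margin and norm choice here are assumptions, not the paper's exact loss):

```python
import numpy as np

def continuity_penalty(reps, tau=1.0):
    """Penalize frame-to-frame representation jumps larger than a margin
    tau, a Lipschitz-inspired constraint on how fast representations may
    change along the generated sequence. reps has shape (T, D)."""
    diffs = np.linalg.norm(reps[1:] - reps[:-1], axis=-1)
    return float(np.mean(np.maximum(diffs - tau, 0.0)))

# smoothly varying representations incur no penalty; an abrupt jump does
smooth = np.linspace(0.0, 1.0, 8)[:, None] * np.ones((1, 4))
jumpy = np.vstack([np.zeros((4, 4)), 10.0 * np.ones((4, 4))])
```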
Layout plays a crucial role in graphic design and poster generation. Recently, the application of deep learning models to layout generation has gained significant attention. This paper focuses on using a GAN-based model conditioned on images to generate advertising poster graphic layouts, which requires a dataset of paired product images and layouts. To address this task, we introduce the Content-aware Graphic Layout Dataset (CGL-Dataset), consisting of 60,548 paired inpainted posters with annotations and 121,000 clean product images. The inpainting artifacts introduce a domain gap between the inpainted posters and clean images. To bridge this gap, we design two GAN-based models. The first model, CGL-GAN, applies Gaussian blur to the inpainted regions to generate layouts. The second model, PDA-GAN, additionally performs unsupervised domain adaptation through a GAN with a pixel-level discriminator (PD), generating image-aware layouts based on the visual texture of input images. The PD is connected to shallow-level feature maps and computes the GAN loss for each input-image pixel. Additionally, we propose three novel content-aware metrics to assess the model's ability to capture the intricate relationships between graphic elements and image content. Quantitative and qualitative evaluations demonstrate that PDA-GAN achieves state-of-the-art performance and generates high-quality image-aware layouts.
Ensuring reliable performance in situations outside the Operational Design Domain (ODD) remains a primary challenge in devising resilient autonomous systems. We explore this challenge by introducing an approach for adapting probabilistic system models to handle out-of-ODD scenarios while, in parallel, providing quantitative guarantees. Our approach dynamically extends the coverage of the system's existing situational capabilities, supporting the verification and adaptation of the system's behaviour under unanticipated situations. Preliminary results demonstrate that our approach effectively increases system reliability by adapting its behaviour and providing formal guarantees even under unforeseen out-of-ODD situations.
Cross-lingual Speech Emotion Recognition (CLSER) aims to identify emotional states in unseen languages. However, existing methods heavily rely on the semantic synchrony of complete labels and static feature stability, hindering low-resource languages from reaching high-resource performance. To address this, we propose a semi-supervised framework based on Semantic-Emotional Resonance Embedding (SERE), a cross-lingual dynamic feature paradigm that requires neither target language labels nor translation alignment. Specifically, SERE constructs an emotion-semantic structure using a small number of labeled samples. It learns human emotional experiences through an Instantaneous Resonance Field (IRF), enabling unlabeled samples to self-organize into this structure. This achieves semi-supervised semantic guidance and structural discovery. Additionally, we design a Triple-Resonance Interaction Chain (TRIC) loss to enable the model to reinforce the interaction and embedding capabilities between labeled and unlabeled samples during emotional highlights. Extensive experiments across multiple languages demonstrate the effectiveness of our method, requiring only 5-shot labeling in the source language.
Facial image deblurring is an essential task in computer vision, enabling the restoration of high-quality images from blurry inputs for applications including facial identification, forensic analysis, photographic enhancement, and medical imaging diagnostics. Traditional deblurring techniques, often based on generic image priors, struggle to capture the structural and identity-specific features of human faces. To address these difficulties, we present SMFD-UNet (Semantic Mask Fusion Deblurring UNet), a new lightweight framework that uses semantic face masks to guide the deblurring process, removing the need for high-quality reference photos. Our two-stage method first uses a UNet-based semantic mask generator to extract detailed facial component masks (e.g., eyes, nose, mouth) directly from blurry inputs. These masks are then integrated with the blurry input via a multi-stage feature fusion strategy within a computationally efficient UNet, producing sharp, high-fidelity facial images. To ensure robustness, we built a randomized blurring pipeline that approximates real-world conditions by sampling from around 1.74 trillion degradation scenarios. Evaluated on the CelebA dataset, SMFD-UNet outperforms state-of-the-art models, attaining higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) while preserving satisfactory naturalness measures, including NIQE, LPIPS, and FID. Built on Residual Dense Convolution Blocks (RDC), multi-stage feature fusion, efficient upsampling, attention mechanisms such as CBAM, and post-processing, the lightweight design ensures scalability and efficiency, making SMFD-UNet a flexible solution for facial image restoration research and practical applications.
This paper introduces a class of continuous-time, finite-player stochastic general-sum differential games that admit solutions through an exact linear PDE system. We formulate a distribution planning game utilizing the cross-log-likelihood ratio to naturally model multi-agent spatial conflicts, such as congestion avoidance. By applying a generalized multivariate Cole-Hopf transformation, we decouple the associated non-linear Hamilton-Jacobi-Bellman (HJB) equations into a system of linear partial differential equations. This reduction enables the efficient, grid-free computation of feedback Nash equilibrium strategies via the Feynman-Kac path integral method, effectively overcoming the curse of dimensionality.
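For intuition, the scalar (single-agent) version of this linearization works as follows; the paper's generalized multivariate Cole-Hopf transformation for the game setting is more involved, so this is only the textbook sketch:

```latex
% With the exponential substitution
\[ \psi(x,t) = \exp\bigl(-V(x,t)/\lambda\bigr), \]
% the nonlinear HJB equation (control cost tied to the noise
% covariance \Sigma through the temperature \lambda)
\[ -\partial_t V = q + f^{\top}\nabla V
   - \tfrac{1}{2\lambda}\,\nabla V^{\top}\Sigma\,\nabla V
   + \tfrac{1}{2}\,\operatorname{tr}\!\bigl(\Sigma\,\nabla^2 V\bigr) \]
% loses its quadratic term and becomes linear in \psi:
\[ \partial_t \psi = \tfrac{q}{\lambda}\,\psi - f^{\top}\nabla\psi
   - \tfrac{1}{2}\,\operatorname{tr}\!\bigl(\Sigma\,\nabla^2\psi\bigr), \]
% which admits the Feynman-Kac representation
\[ \psi(x,t) = \mathbb{E}\Bigl[\exp\Bigl(-\tfrac{1}{\lambda}
   \int_t^T q(X_s)\,ds\Bigr)\,\psi(X_T,T)\Bigr], \qquad
   dX_s = f\,ds + \Sigma^{1/2}\,dW_s,\; X_t = x. \]
```

The Feynman-Kac form is what enables the grid-free computation: the value function is obtained by sampling uncontrolled diffusion paths rather than solving a PDE on a grid.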
Harnessing the predictive capability of Markov process models requires propagating probability density functions (beliefs) through the model. For many existing models, however, belief propagation is analytically intractable, requiring approximation or sampling to generate predictions. This paper proposes a functional modeling framework leveraging sparse Sum-of-Squares (SoS) forms for valid (conditional) density estimation. We study the theoretical restrictions of modeling conditional densities using the SoS form, and propose a novel functional form that addresses these limitations. The proposed architecture enables generalized simultaneous learning of basis functions and coefficients, while preserving analytical belief propagation. In addition, we propose a training method that allows for exact adherence to the normalization and non-negativity constraints. Our results show that the proposed method achieves accuracy comparable to state-of-the-art approaches while requiring significantly less memory in low-dimensional spaces, and that it further scales to 12D systems, where existing methods fail beyond 2D.
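A one-dimensional illustration of the SoS idea (a toy construction with a fixed Gaussian basis and numerical normalization; the paper's sparse form, learned bases, and exact constraint handling are more general):

```python
import numpy as np

def sos_density(x, coeffs, centers, width=1.0):
    """Toy 1-D Sum-of-Squares density: the square of a linear combination
    of Gaussian basis functions, so non-negativity holds by construction.
    Normalization is done numerically on a grid here for simplicity."""
    def basis(pts):
        return np.exp(-0.5 * ((pts[:, None] - centers[None, :]) / width) ** 2)
    grid = np.linspace(-10.0, 10.0, 4001)
    vals = (basis(grid) @ coeffs) ** 2
    Z = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))  # trapezoid rule
    return (basis(x) @ coeffs) ** 2 / Z

centers = np.array([-2.0, 0.0, 2.0])
coeffs = np.array([1.0, -0.5, 1.0])   # negative weights are fine: the square stays >= 0
xs = np.linspace(-10.0, 10.0, 4001)
p = sos_density(xs, coeffs, centers)
```

Squaring a linear combination is what makes validity (non-negativity) structural rather than something to enforce by projection after training.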
We present GPU-SLS, a GPU-parallelized framework for safe, robust nonlinear model predictive control (MPC) that scales to high-dimensional uncertain robotic systems and long planning horizons. Our method jointly optimizes an inequality-constrained, dynamically-feasible nominal trajectory, a tracking controller, and a closed-loop reachable set under disturbance, all in real-time. To efficiently compute nominal trajectories, we develop a sequential quadratic programming procedure with a novel GPU-accelerated quadratic program (QP) solver that uses parallel associative scans and adaptive caching within an alternating direction method of multipliers (ADMM) framework. The same GPU QP backend is used to optimize robust tracking controllers and closed-loop reachable sets via system level synthesis (SLS), enabling reachability-constrained control in both fixed- and receding-horizon settings. We achieve substantial performance gains, reducing nominal trajectory solve times by 97.7% relative to state-of-the-art CPU solvers and 71.8% compared to GPU solvers, while accelerating SLS-based control and reachability by 237x. Despite large problem scales, our method achieves 100% empirical safety, unlike high-dimensional learning-based reachability baselines. We validate our approach on complex nonlinear systems, including whole-body quadrupeds (61D) and humanoids (75D), synthesizing robust control policies online on the GPU in 20 milliseconds on average and scaling to problems with 2 x 10^5 decision variables and 8 x 10^4 constraints. The implementation of our method is available at this https URL.
Monocular Depth Estimation (MDE) is a fundamental computer vision task with important applications in 3D vision. The current mainstream MDE methods employ an encoder-decoder architecture with multi-level/scale feature processing. However, the limitations of the current architecture and the effects of different-level features on the prediction accuracy have not been evaluated. In this paper, we first investigate the above problem and show that there is still substantial potential in the current framework if encoder features can be improved. Therefore, we propose to formulate the depth estimation problem from the feature restoration perspective, by treating pretrained encoder features as degraded features of an assumed ground-truth feature that yields the ground-truth depth map. Then an Invertible Transform-enhanced Indirect Diffusion (InvT-IndDiffusion) module is developed for feature restoration. Due to the absence of direct supervision on features, only indirect supervision from the final sparse depth map is used. During the iterative procedure of diffusion, this results in feature deviations among steps. The proposed InvT-IndDiffusion solves this problem by using an invertible transform-based decoder under the bi-Lipschitz condition. Finally, a plug-and-play Auxiliary Viewpoint-based Low-level Feature Enhancement module (AV-LFE) is developed to enhance local details with auxiliary viewpoints when available. Experiments demonstrate that the proposed method achieves better performance than the state-of-the-art methods on various datasets. Specifically, on the KITTI benchmark, compared with the baseline, the performance is improved by 4.09% and 37.77% under different training settings in terms of RMSE. Code is available at this https URL.
Verification and validation of cyber-physical systems (CPS) via large-scale simulation often surface failures that are hard to interpret, especially when triggered by interactions between continuous and discrete behaviors at specific events or times. Existing debugging techniques can localize anomalies to specific model components, but they provide little insight into the input-signal values and timing conditions that trigger violations, or the minimal, precisely timed changes that could have prevented the failure. In this article, we introduce DeCaF, a counterfactual-guided explanation and assertion-based characterization framework for CPS debugging. Given a failing test input, DeCaF generates counterfactual changes to the input signals that transform the test from failing to passing. These changes are designed to be minimal, necessary, and sufficient to precisely restore correctness. Then, it infers assertions as logical predicates over inputs that generalize recovery conditions in an interpretable form engineers can reason about, without requiring access to internal model details. Our approach combines three counterfactual generators with two causal models, and infers success assertions. Across three CPS case studies, DeCaF achieves its best success rate with KD-Tree Nearest Neighbors combined with M5 model tree, while Genetic Algorithm combined with Random Forest provides the strongest balance between success and causal precision.
When an optimal control problem is solved for all possible initial conditions at once, the initial-state space splits into critical regions, each carrying a closed-form control law that can be evaluated online without solving any optimization. This is the multiparametric approach to explicit control. In the continuous-time setting, the boundaries between these regions are determined by extrema of Lagrange multipliers and constraint functions along the optimal trajectory. Whether a boundary is a hyperplane, computable analytically, or a curved manifold that requires numerical methods has a direct effect on how the partition is built. We show that a boundary is a hyperplane if and only if the relevant extremum is attained at either the initial time or the terminal time, regardless of the initial condition. The reason is that the costate is a linear function of the initial state at any fixed time, so when the extremum is tied to a fixed endpoint, the boundary condition is linear and the boundary normal follows directly from two matrix exponentials and a linear solve. When the extremum occurs at a time that shifts with the initial condition, such as a switching time or an interior stationary point, the boundary is generally curved. We demonstrate the result on a third-order system, obtaining the complete three-dimensional critical-region partition analytically for the first time in this problem class. A comparison with a discrete-time formulation shows how sharply the region count grows under discretization, while the continuous-time partition remains unchanged.
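In the linear-quadratic case the costate linearity invoked above can be written explicitly (a generic LQ sketch in our notation, not the paper's third-order example): with dynamics $\dot{x} = Ax + Bu$ and cost $\tfrac12\int (x^\top Q x + u^\top R u)\,dt$, the optimal state-costate pair obeys linear Hamiltonian dynamics,

```latex
\frac{d}{dt}\begin{bmatrix} x \\ \lambda \end{bmatrix}
= \underbrace{\begin{bmatrix} A & -BR^{-1}B^\top \\ -Q & -A^\top \end{bmatrix}}_{H}
  \begin{bmatrix} x \\ \lambda \end{bmatrix} ,
\qquad
\begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix}
= e^{Ht} \begin{bmatrix} x_0 \\ M x_0 \end{bmatrix} ,
```

where $M$ maps the initial state to the initial costate through the boundary conditions. At any fixed time $t$ the costate is therefore linear in $x_0$, so a condition pinned to $t = 0$ or $t = T$ is linear in the initial state and defines a hyperplane boundary, consistent with the characterization above.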
Engineering workflows such as design optimization, simulation-based diagnosis, control tuning, and model-based systems engineering (MBSE) are iterative, constraint-driven, and shaped by prior decisions. Yet many AI methods still treat these activities as isolated tasks rather than as parts of a broader workflow. This paper presents Agentic Engineering Intelligence (AEI), an industrial vision framework that models engineering workflows as constrained, history-aware sequential decision processes in which AI agents support engineer-supervised interventions over engineering toolchains. AEI links an offline phase for engineering data processing and workflow-memory construction with an online phase for workflow-state estimation, retrieval, and decision support. A control-theoretic interpretation is also possible, in which engineering objectives act as reference signals, agents act as workflow controllers, and toolchains provide feedback for intervention selection. Representative automotive use cases in suspension design, reinforcement learning tuning, multimodal engineering knowledge reuse, aerodynamic exploration, and MBSE show how diverse workflows can be expressed within a common formulation. Overall, the paper positions engineering AI as a problem of process-level intelligence and outlines a practical roadmap for future empirical validation in industrial settings.
Large language models (LLMs) have recently demonstrated strong potential for autonomous vehicle motion planning by reformulating trajectory prediction as a language generation problem. However, deploying capable LLMs in resource-constrained onboard systems remains a fundamental challenge. In this paper, we study how to effectively transfer motion planning knowledge from a large teacher LLM to a smaller, more deployable student model. We build on the GPT-Driver framework, which represents driving scenes as language prompts and generates waypoint trajectories with chain-of-thought reasoning, and investigate two student training paradigms: (i) on-policy generalized knowledge distillation (GKD), which trains the student on its own self-generated outputs using dense token-level feedback from the teacher, and (ii) a dense-feedback reinforcement learning (RL) baseline that uses the teacher's log-probabilities as per-token reward signals in a policy gradient framework. Experiments on the nuScenes benchmark show that GKD substantially outperforms the RL baseline and closely approaches teacher-level performance despite a 5$\times$ reduction in model size. These results highlight the practical value of on-policy distillation as a principled and effective approach to deploying LLM-based planners in autonomous driving systems.
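The dense token-level feedback at the core of GKD can be sketched as a per-token KL between teacher and student distributions over student-generated tokens (a minimal numpy illustration with a toy vocabulary; the actual method operates on LLM logits):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gkd_token_loss(student_logits, teacher_logits):
    """Mean per-token KL(teacher || student) over a student-generated
    sequence: the dense token-level feedback used by on-policy GKD,
    in contrast to a single scalar reward per sequence."""
    p_t = softmax(teacher_logits)                 # (T, V) teacher dist.
    log_ratio = np.log(p_t) - np.log(softmax(student_logits))
    return (p_t * log_ratio).sum(axis=-1).mean()  # average over tokens

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6))                  # 4 tokens, 6-word toy vocab
print(round(gkd_token_loss(logits, logits), 6))   # 0.0 -- matched student
```

Because the loss is evaluated on the student's own rollouts, it addresses the train-inference distribution mismatch that plain sequence-level distillation suffers from.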
Over the last decades, energy-based models (EBMs) have become an important class of probabilistic models in which a component of the likelihood is intractable and therefore cannot be evaluated explicitly. Consequently, parameter estimation in EBMs is challenging for conventional inference methods. In this work, we provide a unified framework that connects noise contrastive estimation (NCE), reverse logistic regression (RLR), multiple importance sampling (MIS), and bridge sampling within the context of EBMs. We further show that these methods are equivalent under specific conditions. This unified perspective clarifies relationships among existing methods and enables the development of new estimators, with the potential to improve statistical and computational efficiency. Furthermore, this study helps elucidate the success of NCE in terms of its flexibility and robustness, while also identifying scenarios in which its performance can be further improved. Hence, rather than being a purely descriptive review, this work offers a unifying perspective and additional methodological contributions. The MATLAB code used in the numerical experiments is also made freely available to support the reproducibility of the results.
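The logistic-regression view that underlies these equivalences can be sketched for a toy one-dimensional EBM (the model family, noise distribution, and all names here are our illustrative assumptions, and the snippet is in Python rather than the paper's MATLAB):

```python
import numpy as np

def nce_loss(theta, x_data, x_noise, log_pn):
    """NCE objective: logistic regression that discriminates data from
    noise using log-odds log(phi(x; theta)) - log(pn(x)). The unknown
    normalizing constant is absorbed into the learned offset theta[1]."""
    def log_phi(x):  # unnormalized Gaussian log-density, variance theta[0]
        return -0.5 * x ** 2 / theta[0] + theta[1]
    s_data = log_phi(x_data) - log_pn(x_data)
    s_noise = log_phi(x_noise) - log_pn(x_noise)
    h_data = 1.0 / (1.0 + np.exp(-s_data))   # P(label = data | x)
    h_noise = 1.0 / (1.0 + np.exp(s_noise))  # P(label = noise | x)
    return -(np.log(h_data).mean() + np.log(h_noise).mean())

rng = np.random.default_rng(1)
x_data = rng.normal(0.0, 1.0, size=5000)            # data ~ N(0, 1)
x_noise = rng.normal(0.0, 2.0, size=5000)           # noise ~ N(0, 4)
log_pn = lambda x: -0.5 * x ** 2 / 4.0 - 0.5 * np.log(2 * np.pi * 4.0)
true = np.array([1.0, -0.5 * np.log(2 * np.pi)])    # exact variance + log Z
off = np.array([4.0, 0.0])                          # misspecified parameters
print(nce_loss(true, x_data, x_noise, log_pn) <
      nce_loss(off, x_data, x_noise, log_pn))       # True
```

Swapping the roles of data and noise in this classifier recovers the RLR view, which is one of the equivalences the unified framework formalizes.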
The competency of any intelligent agent is bounded by its formal account of the world in which it operates. Clinical AI lacks such an account. Existing frameworks address evaluation, regulation, or system design in isolation, without a shared model of the clinical world to connect them. We introduce the Clinical World Model, a framework that formalizes care as a tripartite interaction among Patient, Provider, and Ecosystem. To formalize how any agent, whether human or artificial, transforms information into clinical action, we develop parallel decision-making architectures for providers, patients, and AI agents, grounded in validated principles of clinical cognition. The Clinical AI Skill-Mix operationalizes competency through eight dimensions. Five define the clinical competency space (condition, phase, care setting, provider role, and task) and three specify how AI engages human reasoning (assigned authority, agent facing, and anchoring layer). The combinatorial product of these dimensions yields a space of billions of distinct competency coordinates. A central structural implication is that validation within one coordinate provides minimal evidence for performance in another, rendering the competency space irreducible. The framework supplies a common grammar through which clinical AI can be specified, evaluated, and bounded across stakeholders. By making this structure explicit, the Clinical World Model reframes the field's central question from whether AI works to in which competency coordinates reliability has been demonstrated, and for whom.
Network slicing is a modern 5G technology that provides an efficient network experience for diverse use cases. It is a technique for partitioning a single physical network infrastructure into multiple virtual networks, called slices, each equipped for specific services and requirements. In this work, we deal particularly with radio access network (RAN) slicing and resource allocation to RAN slices. Since physical resource blocks (PRBs) are the fundamental units of radio resources in 5G, our main focus is to allocate PRBs to the slices efficiently. While addressing a spectrum of needs for multiple services, or the same services with multiple priorities, we need to ensure two vital system properties: i) fairness to every service type (i.e., providing the required resources and a desired range of throughput) even after prioritizing a particular service type, and ii) PRB-optimality, i.e., minimizing the unused PRBs in slices. These serve as the core performance evaluation metrics for PRB allocation in our work. We adopt the 3-layered hierarchical PRB-partitioning technique for allocating PRBs to network slices. The case-specific, AI-based solution of the state-of-the-art method lacks the correctness guarantees needed for consistent system performance. To achieve guaranteed correctness and completeness, we leverage formal methods and propose the first approach for a fair and optimal PRB distribution to RAN slices. We formally model the PRB-allocation problem as a 3-layered framework, FORSLICE, specifically by employing satisfiability modulo theories. Next, we apply formal verification to ensure that the desired system properties, fairness and PRB-optimality, are satisfied by the model. The proposed method offers an efficient, versatile, and automated approach compatible with all 3-layered hierarchical network structure configurations, yielding significant system-property improvements compared to the baseline.
Deep image prior (DIP) is an unsupervised deep learning framework that has been successfully applied to a variety of inverse imaging problems. However, DIP-based methods are inherently prone to overfitting, which leads to performance degradation and necessitates early stopping. In this paper, we propose a method to mitigate overfitting in DIP-based hyperspectral image (HSI) denoising by jointly combining robust data fidelity and explicit sensitivity regularization. The proposed approach employs a Smooth $\ell_1$ data term together with a divergence-based regularization and input optimization during training. Experimental results on real HSIs corrupted by Gaussian, sparse, and stripe noise demonstrate that the proposed method effectively prevents overfitting and achieves superior denoising performance compared to state-of-the-art DIP-based HSI denoising methods.
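The robust data-fidelity term can be illustrated in a few lines (a sketch of the standard Huber-style Smooth $\ell_1$ penalty, not the authors' implementation):

```python
import numpy as np

def smooth_l1(residual, beta=1.0):
    """Smooth L1 data term: quadratic for |r| < beta (well-behaved for
    Gaussian noise) and linear beyond it, so sparse and stripe outliers
    contribute bounded gradients instead of dominating the fit."""
    r = np.abs(np.asarray(residual, dtype=float))
    return np.where(r < beta, 0.5 * r ** 2 / beta, r - 0.5 * beta)

print(float(smooth_l1(0.5)))  # 0.125  (quadratic branch)
print(float(smooth_l1(3.0)))  # 2.5    (linear branch, vs 4.5 for 0.5*r^2)
```

Down-weighting large residuals in this way is one reason the network is less inclined to fit the noise itself, which complements the sensitivity regularization in delaying overfitting.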
This paper presents a technique to drive the state of a constrained nonlinear system to a specified target state in finite time, when the system suffers a partial loss in control authority. Our technique builds on a recent method to control constrained nonlinear systems by building a simple, linear driftless approximation at the initial state. We construct a partition of the finite time horizon into successively smaller intervals, and design controlled inputs based on the approximate dynamics in each partition. Under conditions that bound the length of the time horizon, we prove that these inputs result in bounded error from the target state in the original nonlinear system. As successive partitions of the time horizon become shorter, the error reduces to zero despite the effect of uncontrolled inputs. A simulation example on the model of a fighter jet demonstrates that the designed sequence of controlled inputs achieves the target state despite the system suffering a loss of control authority over one of its inputs.
We study device-addressed speech detection under pre-ASR edge deployment constraints, where systems must decide whether to forward audio before transcription under strict latency and compute limits. We show that, in multi-speaker environments with temporally ambiguous utterances, this task is more effectively modelled as a sequential routing problem over interaction history than as an utterance-local classification task. We formalize this as Sequential Device-Addressed Routing (SDAR) and present the Selective Attention System (SAS), an on-device implementation that instantiates this formulation. On a held-out 60-hour multi-speaker English test set, the primary audio-only configuration achieves F1=0.86 (precision=0.89, recall=0.83); with an optional camera, audio+video fusion raises F1 to 0.95 (precision=0.97, recall=0.93). Removing causal interaction history (Stage 3) reduced F1 from 0.95 to 0.57+/-0.03 in the audio+video configuration under our evaluation protocol. Among the tested components, this was the largest observed ablation effect, indicating that short-horizon interaction history carries substantial decision-relevant information in the evaluated setting. SAS runs fully on-device on ARM Cortex-A class hardware (<150 ms latency, <20 MB footprint). All results are from internal evaluation on a proprietary dataset evaluated primarily in English; a 5-hour evaluation subset may be shared for independent verification (Section 8.8).
Speech deepfake detection is a well-established research field with different models, datasets, and training strategies. However, the lack of standardized implementations and evaluation protocols limits reproducibility, benchmarking, and comparison across studies. In this work, we present DeepFense, a comprehensive, open-source PyTorch toolkit integrating the latest architectures, loss functions, and augmentation pipelines, alongside over 100 recipes. Using DeepFense, we conducted a large-scale evaluation of more than 400 models. Our findings reveal that while carefully curated training data improves cross-domain generalization, the choice of pre-trained front-end feature extractor dominates overall performance variance. Crucially, we show severe biases in high-performing models regarding audio quality, speaker gender, and language. DeepFense is expected to facilitate real-world deployment with the necessary tools to address equitable training data selection and front-end fine-tuning.
This paper addresses the decentralized non-uniform area coverage problem for multi-agent systems, a critical task in missions with high spatial priority and resource constraints. While existing density-based methods often rely on computationally heavy Eulerian PDE solvers or heuristic planning, we propose Stochastic Density-Driven Optimal Control (D$^2$OC). This is a rigorous Lagrangian framework that bridges the gap between individual agent dynamics and collective distribution matching. By formulating a stochastic MPC-like problem that minimizes the Wasserstein distance as a running cost, our approach ensures that the time-averaged empirical distribution converges to a non-parametric target density under stochastic LTI dynamics. A key contribution is the formal convergence guarantee established via reachability analysis, providing a bounded tracking error even in the presence of process and measurement noise. Numerical results verify that Stochastic D$^2$OC achieves robust, decentralized coverage while outperforming previous heuristic methods in optimality and consistency.
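The Wasserstein running cost can be illustrated in the one-dimensional empirical case, where optimal transport has a closed form (a didactic sketch, not the paper's multi-dimensional implementation):

```python
import numpy as np

def wasserstein1_1d(samples, targets):
    """Empirical 1-Wasserstein distance between two equal-size 1D sample
    sets: in one dimension the optimal transport plan matches order
    statistics, so the distance is the mean gap between sorted samples."""
    a = np.sort(np.asarray(samples, dtype=float))
    b = np.sort(np.asarray(targets, dtype=float))
    return float(np.abs(a - b).mean())

# Agents at 0, 1, 2 versus targets at 1, 2, 3: each agent moves by 1.
print(wasserstein1_1d([2, 0, 1], [3, 1, 2]))  # 1.0
```

Treating this distance between the empirical agent distribution and the target density as a running cost penalizes distributional mismatch at every step, which is what drives the time-averaged coverage toward the target.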
We study closed-loop stability and suboptimality for MPC and infinite-horizon optimal control solved using a surrogate model that differs from the real plant. We employ a unified framework based on quadratic costs to analyze both finite- and infinite-horizon problems, encompassing discounted and undiscounted scenarios alike. Plant-model mismatch bounds proportional to states and controls are assumed, under which the origin remains an equilibrium. Under continuity of the model and cost-controllability, exponential stability of the closed loop can be guaranteed. Furthermore, we give a suboptimality bound for the closed-loop cost recovering the optimal cost of the surrogate. The results reveal a tradeoff between horizon length, discounting and plant-model mismatch. The robustness guarantees are uniform over the horizon length, meaning that larger horizons do not require successively smaller plant-model mismatch.
With the rapid development of computer vision and the emergence of powerful network backbones and architectures, the application of deep learning in medical imaging has become increasingly significant. Unlike natural images, medical images lack huge volumes of data but feature more modalities, making it difficult to train a general model with satisfactory performance across various datasets. In practice, practitioners often must manually create and test models combining independent backbones and architectures, which is a laborious and time-consuming process. We propose Flemme, a FLExible and Modular learning platform for MEdical images. Our platform separates encoders from the model architectures so that different models can be constructed via various combinations of supported encoders and architectures. We construct encoders using building blocks based on convolution, transformer, and state-space model (SSM) operations to process both 2D and 3D image patches. A base architecture is implemented following an encoder-decoder style, with several derived architectures for image segmentation, reconstruction, and generation tasks. In addition, we propose a general hierarchical architecture incorporating a pyramid loss to optimize and fuse vertical features. Experiments demonstrate that this simple design leads to an average improvement of 5.60% in Dice score and 7.81% in mean intersection over union (mIoU) for segmentation models, as well as an enhancement of 5.57% in peak signal-to-noise ratio (PSNR) and 8.22% in structural similarity (SSIM) for reconstruction models. We further utilize Flemme as an analytical tool to assess the effectiveness and efficiency of various encoders across different tasks. Code is available at this https URL.
We optimize finite-horizon multi-agent reach-avoid Markov decision processes (MDPs) via \emph{local feedback policies}. The global feedback policy solution yields global optimality, but its communication complexity, memory usage, and computational complexity scale exponentially with the number of agents. We mitigate this exponential dependency by restricting the solution space to local feedback policies and show that local feedback policies are rank-one factorizations of global feedback policies, which provides a principled approach to reducing communication complexity and memory usage. Additionally, by demonstrating that multi-agent reach-avoid MDPs over local feedback policies have a potential game structure, we show that iterative best response is a tractable multi-agent learning scheme with guaranteed convergence to a deterministic Nash equilibrium, and derive each agent's best response via a multiplicative dynamic program (DP) over the joint state space. Numerical simulations across different MDPs and agent sets show that the peak memory usage and offline computational complexity are significantly reduced while the approximation error relative to the optimal global reach-avoid objective remains small.
The fifth generation (5G) of mobile communications relies on extremely high data transmission rates utilizing a wide range of frequency bands, including FR1 (sub-6 GHz) and FR2 (mmWave). Future mobile communication systems are envisaged to operate in the electromagnetic spectrum beyond FR2, above 100 GHz, known as the sub-THz band. These new frequencies open up challenging scenarios in which communications will have to rely heavily on the line-of-sight (LoS) component. To the best of the authors' knowledge, this work is the first in the literature to study human blockage effects over an extremely wide frequency band, from 75 GHz to 215 GHz, considering: (i) the distance between the blocker and the antennas and (ii) the body size and orientation. The obtained results are fitted to modifications of the classical path loss models and compared to 3GPP alternatives. The average attenuation increases from 42 dB to 56 dB as the frequency rises from 75 GHz to 215 GHz. On the other hand, an 18 dB increment in the received power is observed when the Tx-Rx separation is increased from 1 m to 2.5 m. Finally, variations of up to 4.6 dB are found depending on the blocker's orientation.
As a critical modality for structural biology, cryogenic electron microscopy (cryo-EM) facilitates the determination of macromolecular structures at near-atomic resolution. The core computational task in single-particle cryo-EM is to reconstruct the 3D electrostatic potential of a molecule from noisy 2D projections acquired at unknown orientations. Gaussian mixture models (GMMs) provide a continuous, compact, and physically interpretable representation for molecular density and have recently gained interest in cryo-EM reconstruction. However, existing methods rely on external consensus maps or atomic models for initialization, limiting their use in self-contained pipelines. In parallel, differentiable rendering techniques such as Gaussian splatting have demonstrated remarkable scalability and efficiency for volumetric representations, suggesting a natural fit for GMM-based cryo-EM reconstruction. However, off-the-shelf Gaussian splatting methods are designed for photorealistic view synthesis and remain incompatible with cryo-EM due to mismatches in the image formation physics, reconstruction objectives, and coordinate systems. Addressing these issues, we propose cryoSplat, a GMM-based method that integrates Gaussian splatting with the physics of cryo-EM image formation. In particular, we develop an orthogonal projection-aware Gaussian splatting, with adaptations such as a view-dependent normalization term and FFT-aligned coordinate system tailored for cryo-EM imaging. These innovations enable stable and efficient homogeneous reconstruction directly from raw cryo-EM particle images using random initialization. Experimental results on real datasets validate the effectiveness and robustness of cryoSplat over representative baselines. The code will be released at this https URL.
Intracortical brain-machine interfaces require decoders that adapt continuously to neural signal instability while operating within strict memory budgets. We introduce a dual-timescale Hebbian accumulator learning rule for spiking neural networks that enables per-timestep online supervised updates with training memory constant in sequence length, avoiding backpropagation through time. The rule combines synapse-specific fast and slow eligibility traces, error-modulated three-factor updates, and integer-friendly RMS homeostasis, operating without adaptive gradient optimizers (Adam, RMSProp) or replay buffers. On two primate intracortical datasets, the method achieves Pearson correlations of $R \geq 0.81$ on MC Maze and $R \geq 0.63$ on Zenodo Indy, with 63-86% measured memory reduction versus BPTT at sequence length $T = 1000$. Closed-loop simulations demonstrate online adaptation to neural disruptions and learning from scratch without offline calibration.
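The dual-timescale rule can be sketched per synapse (decay constants, mixing weight, and function names below are illustrative assumptions, not the paper's values):

```python
def trace_update(e_fast, e_slow, pre_spike, tau_fast=0.9, tau_slow=0.99):
    """One step of dual-timescale eligibility traces: both traces decay
    geometrically and accumulate the same presynaptic activity, but at
    different rates, so recent and longer-horizon history coexist."""
    e_fast = tau_fast * e_fast + pre_spike
    e_slow = tau_slow * e_slow + pre_spike
    return e_fast, e_slow

def three_factor_update(w, e_fast, e_slow, error, lr=1e-3, mix=0.5):
    """Error-modulated three-factor update: a global error signal gates a
    mixture of the two local traces. State is O(1) per synapse, constant
    in sequence length -- no backpropagation through time is required."""
    return w - lr * error * (mix * e_fast + (1.0 - mix) * e_slow)

e_f, e_s = trace_update(0.0, 0.0, 1.0)   # both traces load the spike
e_f, e_s = trace_update(e_f, e_s, 0.0)   # then decay at different rates
print(e_f, e_s)  # 0.9 0.99
```

Because each synapse only carries its weight and two scalars, memory stays flat as the sequence grows, which is the source of the reported reduction versus BPTT.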
Synchronous Generators (SGs) currently provide important levels of Short-Circuit Current (SCC), a critical ancillary service that ensures line protections trip during short-circuit faults. Given the ongoing replacement of SGs by power-electronics-based generation, which has a hard limit on current injection, it has become relevant to optimize the procurement of SCC provided by the remaining SGs. Pricing this service is, however, challenging due to the integrality constraints in Unit Commitment (UC). Existing methods, e.g., dispatchable pricing and restricted pricing, attempt to address this issue but exhibit limitations in handling binary variables, resulting in SCC prices that either fail to cover the operating costs of units or lack interpretability. To overcome these pitfalls, we adopt a primal-dual formulation of the SCC-constrained dispatch that preserves the binary UC while effectively computing shadow prices of SCC services. Using a modified IEEE 30-bus system, a comparison is carried out between the proposed approach and the previously developed pricing schemes. It demonstrates that, under the proposed pricing method, adequate and intuitive service prices can be computed without the need for uplift payments, an advantage that cannot be achieved by the other pricing approaches.
A model-based deep learning (DL) architecture is proposed for reconfigurable intelligent surface (RIS)-assisted multi-user communications to reduce the number of bits required for transmitting phase shift information from the access point (AP) to the RIS controller. The AP computes the phase shifts and compresses them into a binary control message that is sent to the RIS controller for element configuration. To help reduce beamformer mismatches caused by phase shift compression errors, the beamformer is updated with the actual (decompressed) RIS phase shifts. By unrolling the iterative weighted minimum mean square error (WMMSE) algorithm within the wireless communication-informed DL architecture, joint phase shift compression and WMMSE beamforming can be trained end-to-end. Simulation results demonstrate that incorporating compression-aware beamforming significantly improves sum-rate performance, even when the number of control bits is lower than the number of RIS elements.
This paper presents a decentralized control framework that incorporates social awareness into multi-agent systems with unknown dynamics to achieve prescribed-time reach-avoid-stay tasks in dynamic environments. Each agent is assigned a social awareness index that quantifies its level of cooperation or self-interest, allowing heterogeneous social behaviors within the system. Building on the spatiotemporal tube (STT) framework, we propose a real-time STT framework that synthesizes tubes online for each agent while capturing its social interactions with others. A closed-form, approximation-free control law is derived to ensure that each agent remains within its evolving STT, thereby avoiding dynamic obstacles while also preventing inter-agent collisions in a socially aware manner, and reaching the target within a prescribed time. The proposed approach provides formal guarantees on safety and timing, and is computationally lightweight, model-free, and robust to unknown disturbances. The effectiveness and scalability of the framework are validated through simulation and hardware experiments on a 2D omnidirectional robot.
Dynamic wireless charging (DWC) is an emerging technology that has the potential to reduce charging downtime and on-board battery size, particularly in heavy-duty electric vehicles (EVs). However, its spatiotemporal, dynamic, high-power demands pose challenges for power system operations. Since DWC demand depends on traffic characteristics such as speed, density, and dwell time, effective infrastructure planning must account for the coupling between traffic behavior and EV energy consumption. In this paper, we propose a novel traffic-aware microgrid planning framework for DWC. First, we use the macroscopic cell transmission model to estimate spatio-temporal EV charging demand along DWC corridors and integrate this demand into an AC optimal power flow formulation to design a supporting microgrid. Our framework explicitly links traffic patterns with energy demand and demonstrates that traffic-aware microgrid planning yields significantly lower system costs than worst-case traffic-based approaches. We demonstrate the performance of our model on a segment of I-210W in California under a wide range of traffic conditions.
Integrated Sensing and Communication (ISAC) is considered a key component of future 6G technologies, especially in the millimeter-wave (mmWave) bands. Recently, the performance of ISAC has been experimentally evaluated and demonstrated in various scenarios using purpose-built ISAC systems. These systems generally consist of coherent transmitting (Tx) and receiving (Rx) modules. However, actively transmitting radio waves for experiments is difficult due to radio regulations. Moreover, the Tx and Rx must be synchronized, and the Rx requires knowledge of the Tx signal. In this paper, a fully passive mmWave sensing system is developed with software-defined radio for blind ISAC. It consists only of a passive Rx module that does not depend on the Tx. Since the proposed system is not synchronized with the Tx and has no knowledge of the transmitted signals, a differential structure with two oppositely oriented receivers is introduced to realize the sensing function. This structure mitigates the influence of unknown source signals and other distortions. With the proposed sensing system, ambient mmWave communication signals are leveraged for sensing without interrupting the existing systems. Because it emits no signals, it can be deployed for field applications such as signal detection and dynamic human activity recognition. The efficacy of the developed system is first verified with a metallic plate moving in a known pattern. The measured Doppler spectrogram shows good agreement with the simulation results, demonstrating the correctness of the sensing results. The system is further evaluated in complex scenarios, including hand-waving and single- and multi-person motion detection. The sensing results successfully reflect the corresponding motions, demonstrating that the proposed sensing system can be utilized for blind ISAC in various applications.
The Scaled Relative Graph (SRG) is a promising tool for stability and robustness analysis of multi-input multi-output systems. In this paper, we provide tools for exact and computable constructions of the SRG for closed linear operators, based on maximum and minimum gain computations. The results are suitable for bounded and unbounded operators, and we specify how they can be used to draw SRGs for the typical operators that are used to model linear-time-invariant dynamical systems. Furthermore, for the special case of state-space models, we show how the Bounded Real Lemma can be used to construct the SRG.
This paper presents a unified framework that connects sequential quadratic programming (SQP) and the iterative linear-parameter-varying model predictive control (LPV-MPC) technique. Using the differential formulation of the LPV-MPC, we demonstrate how SQP and LPV-MPC can be unified through a specific choice of scheduling variable and the second Fundamental Theorem of Calculus (FTC) embedding technique, and we compare their convergence properties. This enables the unification of the zero-order approach of SQP with the LPV-MPC scheduling technique to enhance the computational efficiency of robust and stochastic MPC problems. To demonstrate our findings, we compare the two schemes in a simulation example. Finally, we demonstrate the real-time feasibility and performance of the zero-order LPV-MPC approach by applying it to Gaussian process (GP)-based MPC for autonomous racing in real-world experiments.
Molecular communication (MC) enables information transfer using particles, inspired by biological systems. Volatile Organic Compounds (VOCs) are one of the most abundant and diverse classes of signaling molecules, used by both living organisms and non-living sources. VOC-based MC holds great promise for developing long-range, bio-compatible communication systems capable of interfacing nano- and micro-scale devices. In this paper, we present a comprehensive end-to-end framework for VOC-based interplant MC from an ICT perspective. The communication process is divided into three stages: transmission (VOC biosynthesis and emission from leaves), channel propagation (advection-diffusion in turbulent wind, via a Gaussian puff for stress-induced VOC release and a Gaussian plume for constitutive VOC release), and reception (VOC uptake and physiological response in the receiver plant). Each stage is analyzed in terms of its attenuation and delay. Numerical results demonstrate that VOC-based channels exhibit low-pass behavior, with bandwidth and capacity heavily influenced by distance, wind velocity, and noise. Though the physical channel supports moderate frequencies, biological constraints at the transmitter restrict the end-to-end channel to slowly varying signals.
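The Gaussian plume used for constitutive VOC release has a standard closed form; the sketch below is the textbook version with a ground-reflection image term (the paper's turbulent-wind model may parameterize the dispersion widths differently):

```python
import math

def gaussian_plume(q, u, y, z, h, sigma_y, sigma_z):
    """Textbook steady-state Gaussian plume: concentration at crosswind
    offset y and height z for a continuous point source of strength q
    (mass/s) at height h, wind speed u along the downwind axis, and
    dispersion widths sigma_y, sigma_z (which grow with downwind
    distance). Includes the standard ground-reflection image term."""
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical
```

The exponential decay in the crosswind offset y is what gives the channel its low-pass, distance-sensitive character.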
In this paper, we consider power allocation and antenna activation in cell-free massive multiple-input multiple-output (CFmMIMO) systems. We first derive closed-form expressions for the system spectral efficiency (SE) and energy efficiency (EE) as functions of the power allocation coefficients and the number of active antennas at the access points (APs). Then, we aim to enhance the EE by jointly optimizing antenna activation and power control. This task leads to a non-convex, mixed-integer design problem with high-dimensional design variables. To address this, we propose a novel deep reinforcement learning (DRL)-based framework, in which the agent learns to map large-scale fading coefficients to an AP activation ratio, antenna coefficient, and power coefficient. These coefficients are then employed to determine the number of active antennas per AP and the power factors assigned to users based on the closed-form expressions. By optimizing these parameters instead of directly controlling antenna selection and power allocation, the proposed method transforms the intractable optimization into a low-dimensional learning task. Our extensive simulations demonstrate the efficiency and scalability of the proposed scheme. Specifically, in a CFmMIMO system with 40 APs and 20 users, it achieves a 50% EE improvement and a 3350-fold runtime reduction compared to the conventional sequential convex approximation method.
Reinforcement Learning (RL) has empowered Multimodal Large Language Models (MLLMs) to achieve superior human preference alignment in Image Quality Assessment (IQA). However, existing RL-based IQA models typically rely on coarse-grained global views, failing to capture subtle local degradations in high-resolution scenarios. While emerging "Thinking with Images" paradigms enable multi-scale visual perception via zoom-in mechanisms, their direct adaptation to IQA induces spurious "cropping-implies-degradation" biases and misinterprets natural depth-of-field as artifacts. To address these challenges, we propose Q-Probe, the first agentic IQA framework designed to scale IQA to high resolution via context-aware probing. First, we construct Vista-Bench, a pioneering benchmark tailored for fine-grained local degradation analysis in high-resolution IQA settings. Furthermore, we propose a three-stage training paradigm that progressively aligns the model with human preferences, while simultaneously eliminating causal bias through a novel context-aware cropping strategy. Extensive experiments demonstrate that Q-Probe achieves state-of-the-art performance in high-resolution settings while maintaining superior efficacy across resolution scales.
The recently envisioned goal-oriented communications paradigm calls for the application of inference on wirelessly transferred data via Machine Learning (ML) tools. An emerging research direction deals with the realization of inference ML models directly in the physical layer of Multiple-Input Multiple-Output (MIMO) systems, which, however, entails certain significant challenges. In this paper, leveraging the technology of programmable MetaSurfaces (MSs), we present an eXtremely Large (XL) MIMO system that acts as an Extreme Learning Machine (ELM) performing binary classification tasks completely Over-The-Air (OTA), and which can be trained in closed form. The proposed system comprises a receiver architecture consisting of densely placed parallel diffractive layers of XL MSs, also known as Stacked Intelligent Metasurfaces (SIM), followed by a single reception radio-frequency chain. The front layer facing the XL MIMO channel consists of identical unit cells with a fixed NonLinear (NL) response, whereas the remaining layers, whose elements have tunable linear responses, are utilized to approximate the trained ELM weights OTA. Our numerical investigations showcase that, in the XL regime of MS elements, the proposed XL-MIMO-ELM system achieves performance comparable to that of digital and idealized ML models across diverse datasets and wireless scenarios, thereby demonstrating the feasibility of embedding OTA learning capabilities into future wireless systems.
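Generically, the closed-form ELM training referred to above is a fixed random nonlinear hidden layer followed by a least-squares readout; the SIM realizes the layers over the air. A purely digital sketch of that generic recipe on a hypothetical toy dataset (not the metasurface system itself):

```python
import numpy as np

def train_elm(X, y, hidden=64, seed=0):
    """ELM in closed form: fixed random hidden weights W, nonlinear
    features H = tanh(XW), and a readout beta solved by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    H = np.tanh(X @ W)
    beta = np.linalg.lstsq(H, y, rcond=None)[0]  # closed-form training
    return W, beta

def elm_predict(X, W, beta):
    """Binary classification: sign of the readout output."""
    return np.sign(np.tanh(X @ W) @ beta)
```

Only `beta` is trained; the random hidden layer stays fixed, which is what makes a closed-form (one least-squares solve) fit possible.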
Hybrid reconfigurable intelligent surfaces (HRIS) enhance wireless systems by combining passive reflection with active signal amplification. However, jointly optimizing the transmit beamforming with the HRIS reflection and amplification coefficients to maximize spectral efficiency (SE) is a non-convex problem, and conventional iterative solutions are computationally intensive. To address this, we propose a deep reinforcement learning (DRL) framework that learns a direct mapping from channel state information to the near-optimal transmit beamforming and HRIS configurations. The DRL model is trained offline, after which it can compute the beamforming and HRIS configurations with low complexity and latency. Simulation results demonstrate that our DRL-based method achieves 95% of the SE obtained by the alternating optimization benchmark, while significantly lowering the computational complexity.
In this paper, we investigate a class of port-Hamiltonian systems with singular vector fields. We show that, under suitable conditions, their interconnection with passive systems ensures convergence to a prescribed non-equilibrium steady state. At first glance, this behavior appears to contradict the seemingly passive structure of port-Hamiltonian systems, since sustaining a non-equilibrium steady state requires continuous power injection. We resolve this apparent paradox by showing that the singularity in the vector field induces a sliding mode that contributes effective energy, enabling maintenance of the steady state and demonstrating that the system is not passive. Furthermore, we consider regularizations of the singular dynamics and show that the resulting systems are cyclo-passive, while still capable of supplying the required steady-state power. These results clarify the role of singularities in port-Hamiltonian systems and provide new insight into their energetic properties.
The widely accepted definition of grid-forming (GFM) inverter states that it should behave as a (nearly) constant voltage source behind an impedance by maintaining a (nearly) constant internal voltage phasor in the sub-transient to transient time frame. Some system operators further mandate permissible ranges for this effective impedance. However, these specifications do not clearly define the location of the internal voltage source, and no systematic method exists to quantify its effective impedance for a black-box GFM model. To address this, we first compare the transient responses of an ideal voltage source and a GFM to show that an idealistic GFM maintains a (nearly) constant voltage across the filter capacitor, rather than at the inverter switches. Then we propose a systematic method to quantify the effective impedance of a GFM from its black-box model using frequency-domain admittance plots. Using standard PSCAD GFM models developed by NLR (formerly NREL), we demonstrate that the GFM's equivalent impedance model captures the sub-transient response and static voltage stability limit accurately. Further, replacing the GFM with the proposed equivalent circuit model in the modified IEEE-39 bus system is shown to reproduce the small-signal stability characteristics with reasonable accuracy.
Regenerating singing voices with altered lyrics while preserving melody consistency remains challenging, as existing methods either offer limited controllability or require laborious manual alignment. We propose YingMusic-Singer-Plus, a fully diffusion-based model enabling melody-controllable singing voice synthesis with flexible lyric manipulation. The model takes three inputs: an optional timbre reference, a melody-providing singing clip, and modified lyrics, without manual alignment. Trained with curriculum learning and Group Relative Policy Optimization, YingMusic-Singer-Plus achieves stronger melody preservation and lyric adherence than Vevo2, the most comparable baseline supporting melody control without manual alignment. We also introduce LyricEditBench, the first benchmark for melody-preserving lyric modification evaluation. The code, weights, benchmark, and demos are publicly available at this https URL.
Sensor-based Human Activity Recognition (HAR) underpins many ubiquitous and wearable computing applications, yet current models remain limited by scarce labels, sensor heterogeneity, and weak generalization across users, devices, and contexts. Foundation models, which are generally pretrained at scale using self-supervised and multimodal learning, offer a unifying paradigm to address these challenges by learning reusable, adaptable representations for activity understanding. This survey synthesizes emerging foundation models for sensor-based HAR. We first clarify foundational concepts, definitions, and evaluation criteria, then organize existing work using a lifecycle-oriented taxonomy spanning input design, pretraining, adaptation, and utilization. Rather than enumerating individual models, we analyze recurring design patterns and trade-offs across nine technical axes, including modality scope, tokenization, architectures, learning paradigms, adaptation mechanisms, and deployment settings. From this synthesis, we identify three dominant development trajectories: (1) HAR-specific foundation models trained from scratch on large sensor corpora, (2) adaptation of general time-series or multimodal foundation models to sensor-based HAR, and (3) integration of large language models for reasoning, annotation, and human-AI interaction. We conclude by highlighting open challenges in data curation, multimodal alignment, personalization, privacy, and responsible deployment, and outline directions toward general-purpose, interpretable, and human-centered foundation models for activity understanding. A complete, continuously updated index of papers and models is available in our companion repository: this https URL.
Semantics are one of the primary sources of top-down preattentive information. Modern deep object detectors excel at extracting such valuable semantic cues from complex visual scenes. However, the size of the visual input to be processed by these detectors can become a bottleneck, particularly in terms of time costs, affecting an artificial attention system's biological plausibility and real-time deployability. Inspired by classical exponential density roll-off topologies, we apply a new artificial foveation module to our novel attention prediction pipeline: the Semantic-based Bayesian Attention (SemBA) framework. We aim to reduce detection-related computational costs without compromising visual task accuracy, thereby making SemBA more biologically plausible. The proposed multi-scale pyramidal field of view retains maximum acuity at the innermost level, around a focal point, while gradually increasing distortion at outer levels to mimic peripheral uncertainty via downsampling. In this work, we evaluate the performance of our novel Multi-Scale Fovea, incorporated into SemBA, on target-present visual search. We also compare it against other artificial foveal systems, and conduct ablation studies with different deep object detection models to assess the impact of the new topology in terms of computational costs. We experimentally demonstrate that including the new Multi-Scale Fovea module effectively reduces inherent processing costs while improving SemBA's scanpath prediction accuracy. Remarkably, we show that SemBA closely approximates human consistency while retaining the actual human fovea's proportions.
The integration of cellular communication with Unmanned Aerial Vehicles (UAVs) extends the range of command-and-control and payload communications for autonomous UAV applications. Accurate modeling of this air-to-ground wireless environment aids UAV mission planning. Models built on, and insights obtained from, real-life experiments intricately capture the variations in air-to-ground link quality with UAV position, offering more fidelity for simulations and system design than generic theoretical models designed for ground scenarios or ray-tracing simulations. In this work, we conduct aerial flights at the Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) Lake Wheeler testbed to study the variation in key performance indicators (KPIs) of a private 4G/5G cellular base station (BS) with the UAV's altitude, distance from the BS, and elevation and azimuth relative to the BS. Variations in 4G and 5G physical layer KPIs and application layer throughput are logged and analyzed using two Android smartphones: a Keysight Nemo device with enhanced KPI access through a rooted operating system, and a standard smartphone running a custom application that utilizes open-source Android APIs. The observed signal strength measurements are compared to theoretical predictions from free-space path loss models that incorporate the BS antenna radiation patterns. Mathematical model parameters for polynomial curve approximations are derived to fit the observed data. Lightweight machine learning approaches, namely random forests, gradient-boosting regressors, and neural networks, are used to model KPI behavior as a function of UAV position relative to the BS. The insights and models generated from the real-life experiments in this study can serve as valuable tools in the design, simulation, and deployment of cellular-communication-based UAV systems.
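The free-space baseline that such measurements are compared against follows the Friis relation. A sketch, where the gain terms would be read from the BS radiation pattern at the UAV's elevation and azimuth (the numeric values below are illustrative, not from the study):

```python
import math

def fspl_db(distance_m, freq_hz):
    """Friis free-space path loss in dB:
    20*log10(d) + 20*log10(f) + 20*log10(4*pi/c)."""
    c = 3e8  # speed of light, m/s
    return (20 * math.log10(distance_m) + 20 * math.log10(freq_hz)
            + 20 * math.log10(4 * math.pi / c))

def rx_power_dbm(tx_power_dbm, tx_gain_dbi, rx_gain_dbi, distance_m, freq_hz):
    """Received power under free-space propagation. In a measurement
    campaign like the one above, the gain terms would come from the BS
    antenna radiation pattern evaluated at the link's angular geometry."""
    return (tx_power_dbm + tx_gain_dbi + rx_gain_dbi
            - fspl_db(distance_m, freq_hz))
```

At 3.5 GHz and 1 km, the loss is about 103.3 dB, and it grows by 6 dB per doubling of distance.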
Target Speaker Extraction (TSE) aims to isolate a specific speaker's voice from a mixture, guided by a pre-recorded enrollment. While TSE bypasses the global permutation ambiguity of blind source separation, it remains vulnerable to speaker confusion, where models mistakenly extract the interfering speaker. Furthermore, conventional TSE relies on a static inference pipeline, whose performance is limited by the quality of the fixed enrollment. To overcome these limitations, we propose EvoTSE, an evolving TSE framework in which the enrollment is continuously updated through reliability-filtered retrieval over high-confidence historical estimates. This mechanism reduces speaker confusion and relaxes the quality requirements for the pre-recorded enrollment without relying on additional annotated data. Experiments across multiple benchmarks demonstrate that EvoTSE achieves consistent improvements, especially when evaluated on out-of-domain (OOD) scenarios. Our code and checkpoints are available.
Non-line-of-sight (NLOS) sensing has the potential to enable use cases like intrusion detection in occluded areas, increasing the value provided by Integrated Sensing and Communications (ISAC) in future 6G cellular networks. In this paper, we present a reliable NLOS intrusion detection system based on a millimeter-wave ISAC proof-of-concept. By leveraging reflections off a large surface, the proposed system addresses the challenge of detecting moving targets in cluttered indoor industrial scenarios where the direct line-of-sight is obstructed. A signal processing pipeline including a probability hypothesis density (PHD) filter is applied to detect targets and track movements in NLOS. Experimental validation conducted in the ARENA2036 industrial research campus demonstrates that our system can reliably detect target presence in NLOS while avoiding false alarms. Tests with synthetically generated false peaks further demonstrate the robustness of our system to false alarms. Overall, the results underline the potential of NLOS ISAC as a promising technology for enabling intrusion detection and monitoring use cases.
Network reconfiguration can significantly increase the hosting capacity (HC) for distributed generation (DG) in radially operated systems, thereby reducing the need for costly infrastructure upgrades. However, when the objective is DG maximization, jointly optimizing topology and power dispatch remains computationally challenging. Existing approaches often rely on relaxations or approximations, yet we provide counterexamples showing that interior point methods, linearized DistFlow and second-order cone relaxations all yield erroneous results. To overcome this, we propose a solution framework based on the exact DistFlow equations, formulated as a bilinear program and solved using spatial branch-and-bound (SBB). Numerical studies on standard benchmarks and a 533-bus real-world system demonstrate that our proposed method reliably performs reconfiguration and dispatch within time frames compatible with real-time operation.
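The errors the counterexamples expose trace back to the quadratic loss term that LinDistFlow drops, while the exact branch-flow equations retain it. A single-line sketch of the squared-voltage update in both forms, in per-unit with illustrative values:

```python
def distflow_voltage(v_i_sq, p, q, r, x):
    """Squared-voltage update along one line carrying power p + jq through
    impedance r + jx (all per-unit). The exact DistFlow form keeps the
    squared-current loss term l = (p^2 + q^2) / v_i^2; the LinDistFlow
    approximation drops it, always underestimating the receiving voltage."""
    l = (p ** 2 + q ** 2) / v_i_sq              # squared current magnitude
    exact = v_i_sq - 2 * (r * p + x * q) + (r ** 2 + x ** 2) * l
    lin = v_i_sq - 2 * (r * p + x * q)          # LinDistFlow approximation
    return exact, lin
```

The gap grows quadratically with line loading, which is exactly the regime where DG maximization pushes the network.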
This paper investigates distributed zeroth-order optimization for smooth nonconvex problems, targeting the trade-off between convergence rate and sampling cost per zeroth-order gradient estimation in current algorithms, which use either the $2$-point or $2d$-point gradient estimators. We propose a novel variance-reduced gradient estimator that, based on a Bernoulli distribution, either randomly refreshes a single orthogonal direction of the gradient estimate or computes the estimate across all dimensions for variance correction. Integrating this estimator with a gradient tracking mechanism allows us to address the trade-off. We show that the oracle complexity of our proposed algorithm is upper bounded by $O(d/\epsilon)$ for smooth nonconvex functions and by $O(d\kappa\ln (1/\epsilon))$ for smooth and gradient-dominated nonconvex functions, where $d$ denotes the problem dimension and $\kappa$ is the condition number. Numerical simulations comparing our algorithm with existing methods confirm the effectiveness and efficiency of the proposed gradient estimator.
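A sketch of the Bernoulli-mixed estimator idea (an illustration of the concept, not the authors' exact scheme): with probability p, recompute all d coordinate-wise central differences, which is the expensive 2d-point estimate; otherwise refresh only a single random direction of the running estimate, at 2-point cost.

```python
import numpy as np

def vr_zo_gradient(f, x, g_prev, p, mu=1e-4, rng=None):
    """Variance-reduced zeroth-order gradient sketch. With probability p,
    rebuild the full 2d-point estimate (all coordinate-wise central
    differences); otherwise refresh one random coordinate of the running
    estimate g_prev, keeping the per-step sampling cost near 2 queries."""
    rng = rng or np.random.default_rng(0)
    d = x.size
    g = g_prev.copy()
    dims = range(d) if rng.random() < p else [int(rng.integers(d))]
    for i in dims:
        e = np.zeros(d)
        e[i] = mu
        g[i] = (f(x + e) - f(x - e)) / (2 * mu)  # central difference
    return g
```

For a quadratic, central differences are exact up to rounding, so with p = 1 the sketch recovers the gradient directly.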
We propose a novel layer-wise parameterization for convolutional neural networks (CNNs) that includes built-in robustness guarantees by enforcing a prescribed Lipschitz bound. Each layer in our parameterization is designed to satisfy a linear matrix inequality (LMI), which in turn implies dissipativity with respect to a specific supply rate. Collectively, these layer-wise LMIs ensure Lipschitz boundedness for the input-output mapping of the neural network, yielding a more expressive parameterization than through spectral bounds or orthogonal layers. Our new method LipKernel directly parameterizes dissipative convolution kernels using a 2-D Roesser-type state space model. This means that the convolutional layers are given in standard form after training and can be evaluated without computational overhead. In numerical experiments, we show that the run-time using our method is orders of magnitude faster than state-of-the-art Lipschitz-bounded networks that parameterize convolutions in the Fourier domain, making our approach particularly attractive for improving the robustness of learning-based real-time perception or control in robotics, autonomous vehicles, or automation systems. We focus on CNNs, and in contrast to previous works, our approach accommodates a wide variety of layers typically used in CNNs, including 1-D and 2-D convolutional layers, maximum and average pooling layers, as well as strided and dilated convolutions and zero padding. However, our approach naturally extends beyond CNNs as we can incorporate any layer that is incrementally dissipative.
Despite significant advancements in Text-to-Audio (TTA) generation models, which achieve high-fidelity audio with fine-grained context understanding, they struggle to model the relations between audio events described in the input text. Moreover, previous TTA methods have not systematically explored audio event relation modeling, nor have they proposed frameworks to enhance this capability. In this work, we systematically study audio event relation modeling in TTA generation models. We first establish a benchmark for this task by: (1) proposing a comprehensive relation corpus covering all potential relations in real-world scenarios; (2) introducing a new audio event corpus encompassing commonly heard audio events; and (3) proposing new evaluation metrics to assess audio event relation modeling from various perspectives. Furthermore, we propose a finetuning framework to enhance existing TTA models' ability to model audio event relations. Code is available at: this https URL
Loss of voluntary foot movement after spinal cord injury (SCI) can significantly limit independent mobility and quality of life. To improve motor output after injury, functional electrical stimulation (FES) is used to deliver stimulation pulses through the skin to affected muscles. While commercial FES systems typically use motion-based triggers, prior research shows that spared movement intent can be decoded after SCI using surface electromyography (EMG). Our aim is to assess how well spared neural signals of the lower limb after SCI can be decoded and used to control electrical stimulation for restoring foot movement. We developed a wearable machine-learning-powered neuroprosthetic that records EMG from the affected lower limb using a 32-channel electrode bracelet and enables closed-loop control of an FES device for foot movement restoration. Five participants with SCI used the predicted control signal to follow trajectories on a screen with their foot and achieve distinct motor activation patterns for foot flexion, extension, and inversion or eversion. Three of these participants also achieved two proportional activation levels during foot flexion/extension with more than 70% accuracy. To validate how these neural signals can be used for closed-loop neuroprosthetic control, two participants used their decoded activity to control an FES device and stimulate their affected foot. This increased the two participants' foot flexion range by 33.6% and 40% of a functional healthy range, respectively (p < 0.001). One of the participants also achieved voluntary proportional control of up to 6 stimulation levels during foot flexion/extension. These results suggest that wearable EMG decoding coupled with FES systems provides a scalable strategy for closed-loop neuroprosthetic control supporting voluntary foot movement.
As one of the key usage scenarios for the sixth generation (6G) wireless networks, integrated sensing and communication (ISAC) provides an efficient framework to achieve simultaneous wireless sensing and communication. However, traditional wireless sensing techniques mainly rely on line-of-sight (LoS) assumptions, i.e., the sensing targets are directly visible to both the sensing transmitter and receiver. This hinders the application of ISAC systems in complex environments such as the urban low-altitude airspace, which usually suffers from signal blockage and non-line-of-sight (NLoS) multi-path propagation. To address this challenge, in this paper, we propose a novel approach to enable environment-aware NLoS ISAC by leveraging the new technique called channel knowledge map (CKM), which was originally proposed for environment-aware wireless communications. One major novelty of our proposed method is that the same CKM built for wireless communication can be directly used to enable NLoS wireless sensing, thus enjoying the benefits of ``killing two birds with one stone''. To this end, the sensing targets are treated as virtual user equipment (UE), and the wireless communication channel priors are transformed into sensing channel priors, allowing one single CKM to serve dual purposes. We illustrate our proposed framework with a specific CKM called \emph{channel angle-delay map} (CADM). Specifically, the proposed framework utilizes CADM to derive angle-delay priors of the sensing channel by exploiting the relationship between communication and sensing angle-delay distributions, enabling sensing target localization in the challenging NLoS environment. Extensive simulation results demonstrate significant performance improvements over classic geometry-based sensing methods, which is further validated by Cramér-Rao Lower Bound (CRLB) analysis.
This paper investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers over time. Each data center features on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. We propose Green-LLM, a lexicographic multi-objective optimization framework that addresses this challenge without requiring manual weight tuning. The proposed model incorporates real-world constraints, including token-dependent processing delay and energy consumption, heterogeneous hardware capabilities, dynamic renewable generation, and spatiotemporal variations in electricity prices and carbon intensity. Unlike existing approaches that optimize individual environmental metrics in isolation, Green-LLM jointly minimizes operational cost, carbon emissions, and delay penalty while enforcing water consumption constraints to ensure both sustainability and quality-of-service requirements. Numerical results demonstrate that Green-LLM achieves significant reductions in carbon emissions and water consumption while maintaining operational costs within 3% of the minimum and ensuring sub-2-second response latency. These findings show that sustainable LLM inference can be achieved without sacrificing service quality or economic efficiency.
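Lexicographic multi-objective optimization, the device that lets Green-LLM avoid manual weight tuning, minimizes the objectives in priority order, restricting each stage to the minimizers of the previous one. A toy sketch over a finite candidate set (Green-LLM itself solves continuous programs, so this only illustrates the ordering principle):

```python
def lexicographic_min(candidates, objectives, tol=1e-9):
    """Lexicographic selection: minimize the highest-priority objective
    first, then break ties on the next objective within tolerance tol,
    and so on. No scalarization weights are ever chosen."""
    feasible = list(candidates)
    for obj in objectives:
        best = min(obj(c) for c in feasible)
        feasible = [c for c in feasible if obj(c) <= best + tol]
    return feasible[0]
```

With cost ranked above carbon, a plan that ties on cost but emits less carbon wins, which a weighted sum can only reproduce with carefully hand-picked weights.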
We present CANOPI, a novel algorithmic framework for solving the Contingency-Aware Nodal Power Investments problem, a large-scale nonlinear optimization problem that jointly optimizes investments in generation, storage, and transmission upgrades, including representations of unit commitment and long-duration storage. The underlying problem is nonlinear due to the impact of transmission upgrades on impedances, and the problem's large scale arises from the confluence of spatial and temporal resolutions. We propose algorithmic approaches to address these computational challenges. We pose a linear approximation of the overall nonlinear model, and develop a fixed-point algorithm to adjust for the nonlinear impedance feedback effect. We solve the large-scale linear expansion model with a specialized level-bundle method leveraging a novel interleaved approach to contingency constraint generation. We introduce a minimal cycle basis algorithm that improves the numerical sparsity of cycle-based DC power flow formulations, accelerating solve times for the operational subproblems. CANOPI is demonstrated on a 1493-bus Western Interconnection test system built from realistic-geography network data, with hourly operations spanning 52 week-long scenarios and a total possible set of 20 billion individual transmission contingency constraints. Numerical results quantify reliability and economic benefits of incorporating transmission contingencies in integrated planning models and highlight the computational advantages of the proposed methods.
We study virtual energy storage services based on the aggregation of EV batteries in parking lots under time-varying, uncertain EV departures and state-of-charge limits. We propose a convex data-driven scheduling framework in which a parking lot manager provides storage services to a prosumer community while interacting with a retailer. The framework yields finite-sample, distribution-free guarantees on constraint violations and allows the parking lot manager to explicitly tune the trade-off between economic performance and operational safety. To enhance reliability under imperfect data, we extend the formulation to adversarial perturbations of the training samples and Wasserstein distributional shifts, obtaining robustness certificates against both corrupted data and out-of-distribution uncertainty. Numerical studies confirm the predicted profit-risk trade-off and show consistency between the theoretical certificates and the observed violation levels.
Particle filters (PFs) are often combined with swarm intelligence (SI) algorithms, such as Chicken Swarm Optimization (CSO), for particle rejuvenation. Separately, Kullback--Leibler divergence (KLD) sampling is a common strategy for adaptively sizing the particle set. However, the theoretical interaction between SI-based rejuvenation kernels and KLD-based adaptive sampling is not yet fully understood. This paper investigates this specific interaction. We analyze, under a simplified modeling framework, the effect of the CSO rejuvenation step on the particle set distribution. We propose that the fitness-driven updates inherent in CSO can be approximated as a form of mean-square contraction. This contraction tends to produce a particle distribution that is more concentrated than that of a baseline PF, or in mathematical terms, a distribution that is plausibly more ``peaked'' in a majorization sense. By applying Karamata's inequality to the concave function that governs the expected bin occupancy in KLD-sampling, our analysis suggests a connection: under the stated assumptions, the CSO-enhanced PF (CPF) is expected to require a lower \emph{expected} particle count than the standard PF to satisfy the same statistical error bound. The goal of this study is not to provide a fully general proof, but rather to offer a tractable theoretical framework that helps to interpret the computational efficiency empirically observed when combining these techniques, and to provide a starting point for designing more efficient adaptive filters.
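The KLD-sampling bound in question ties the required particle count to the number k of occupied histogram bins. A sketch of Fox's standard formula shows the mechanism the analysis relies on: the bound is increasing in k, so a more concentrated particle set, which occupies fewer bins, needs fewer particles.

```python
import math

def kld_sample_size(k, epsilon=0.05, z=2.326):
    """Fox's KLD-sampling bound: number of particles so that the KL
    divergence between the sample-based and true distributions stays
    below epsilon with probability 1 - delta, given k occupied bins.
    z is the standard normal quantile for 1 - delta (2.326 for
    delta = 0.01). Uses the Wilson-Hilferty chi-square approximation."""
    if k <= 1:
        return 1
    a = 2.0 / (9.0 * (k - 1))
    return math.ceil((k - 1) / (2.0 * epsilon)
                     * (1.0 - a + math.sqrt(a) * z) ** 3)
```

If CSO's contraction halves the number of occupied bins, this formula roughly halves the expected particle budget, which matches the efficiency argument sketched above.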
The active impedance is a fundamental parameter for characterizing the behavior of large, uniform phased array antennas. However, its conventional calculation via the mutual impedance matrix (or the scattering matrix) offers limited physical intuition and can be computationally intensive. This paper presents a novel derivation of the active impedance directly from the radiated beam pattern of such arrays. This approach maps the scan-angle variation of the active impedance directly to the intrinsic angular variation of the beam, providing a more intuitive physical interpretation. The theoretical derivation is straightforward and rigorous, and the validity of the proposed equation is confirmed through full-wave simulations of a prototype array. This work establishes a new, more intuitive framework for understanding, analyzing, and accurately measuring scan-dependent impedance variations, one of the main challenges in modern phased array design. Consequently, this formalism is expected to expedite and simplify the overall design and optimization process for next-generation, large-scale uniform phased arrays.
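For contrast, the conventional scattering-matrix route that the paper moves away from can be sketched. For a uniform linear array scanned with a progressive phase ψ per element, the active reflection coefficient of element n sums the coupled contributions of all elements; the active impedance then follows as Z_act = Z0 (1 + Γ)/(1 − Γ). This is a minimal sketch of the standard calculation, not the paper's beam-pattern formula; names are illustrative.

```python
import numpy as np

def active_reflection(S: np.ndarray, n: int, psi: float) -> complex:
    """Conventional active reflection coefficient of element n in a
    uniform linear array excited with progressive phase psi:
    Gamma_n(psi) = sum_m S[n, m] * exp(-1j * (m - n) * psi)."""
    m = np.arange(S.shape[0])
    return complex(np.sum(S[n, :] * np.exp(-1j * (m - n) * psi)))
```

Note that every scan angle requires a full row sum over the measured S-matrix, which is the computational and intuitive burden the beam-pattern derivation is meant to avoid.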
The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) units. This paper presents a dual-precision floating-point MAC processing element (PE) supporting FP8 (E4M3, E5M2) and FP4 (2 x E2M1, 2 x E1M2) formats, specifically optimized for low-power and high-throughput AI workloads. The proposed architecture employs a novel bit-partitioning technique that enables a single 4-bit unit multiplier to operate either as a standard 4 x 4 multiplier for FP8 or as two parallel 2 x 2 multipliers for 2-bit operands, achieving maximum hardware utilization without duplicating logic. Implemented in 28 nm technology, the proposed PE achieves an operating frequency of 1.94 GHz with an area of 0.00396 mm^2 and a power consumption of 2.13 mW, yielding up to 60.4% area reduction and 86.6% power savings compared to state-of-the-art designs. These results make the PE well suited for energy-constrained AI inference and mixed-precision computing applications when deployed within larger accelerator architectures.
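The dual-mode multiplier idea can be modeled behaviorally: in split mode the two packed 2-bit operands are multiplied independently, which in hardware corresponds to gating off the cross partial products of the 4 x 4 array. The sketch below is a functional model under that assumption, not the paper's RTL; the function name and packing convention are illustrative.

```python
def dual_mode_mul(a: int, b: int, split: bool):
    """Behavioral model of a 4-bit unit multiplier with a dual-precision mode.
    split=False: one 4x4 unsigned multiply (FP8 mantissa path).
    split=True:  each operand packs two 2-bit values (hi << 2 | lo);
                 cross partial products are suppressed, yielding two
                 independent 2x2 products (FP4 path)."""
    a &= 0xF
    b &= 0xF
    if not split:
        return a * b                       # full 4x4 product, up to 8 bits
    a_hi, a_lo = a >> 2, a & 0x3
    b_hi, b_lo = b >> 2, b & 0x3
    return (a_hi * b_hi, a_lo * b_lo)      # two 2x2 products, 4 bits each
```

The key design point is that both modes reuse the same partial-product array, which is how the architecture avoids duplicating multiplier logic.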
We study how intrinsic hard constraints on the decision dynamics of social agents shape collective decisions on multiple alternatives in a heterogeneous group. Such constraints may arise due to structural and behavioral limitations, such as adherence to belief systems in social networks or hardware limitations in autonomous networks. In this work, agent constraints are encoded as projections in a multi-alternative nonlinear opinion dynamics framework. We prove that projections induce an invariant subspace on which the constraints are always satisfied and study the dynamics of networked opinions on this subspace. We then show that heterogeneous pairwise alignments between individuals' constraint vectors generate an effective weighted social graph on the invariant subspace, even when agents exchange opinions over an unweighted communication graph in practice. With analysis and simulation studies, we illustrate how the effective constraint-induced weighted graph reshapes the centrality of agents in the decision process and the group's sensitivity to distributed inputs.
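The projection mechanism can be illustrated with a toy simulation: each agent projects its opinion update onto the subspace satisfying its own linear constraint, so the constrained quantity is exactly invariant along the trajectory. The flow below is a simplified linear-plus-saturation stand-in for the multi-alternative nonlinear opinion dynamics, not the paper's model; all names and parameters are illustrative.

```python
import numpy as np

def projector(c: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto the subspace {z : c^T z = 0}, an
    illustrative hard constraint on one agent's opinion vector."""
    c = c / np.linalg.norm(c)
    return np.eye(len(c)) - np.outer(c, c)

def step(Z, C, A, dt=0.05, d=1.0, u=0.3):
    """One Euler step of projected opinion dynamics: agent i applies a
    damped, saturated social feedback term and projects the update onto
    its constraint subspace, so C[i] @ Z[i] stays invariant.
    Z: (agents x options) opinions; C: constraint vectors; A: adjacency."""
    Znew = np.empty_like(Z)
    for i in range(Z.shape[0]):
        flow = -d * Z[i] + u * np.tanh(A[i] @ Z)  # toy stand-in for the NOD flow
        Znew[i] = Z[i] + dt * projector(C[i]) @ flow
    return Znew
```

Because the projected flow is blind to the component of social influence along each agent's constraint vector, the pairwise alignments of those vectors effectively reweight the communication graph, which is the mechanism the abstract highlights.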