Speech Emotion Recognition (SER) systems have growing applications in sensitive domains such as mental health and education, where biased predictions can cause harm. Traditional fairness metrics, such as Equalised Odds and Demographic Parity, often overlook the joint dependency between demographic attributes and model predictions. We propose a fairness modelling approach for SER that explicitly captures allocative bias by learning the joint relationship between demographic attributes and model error. We validate our fairness metric on synthetic data, then apply it to evaluate HuBERT and WavLM models fine-tuned on the CREMA-D dataset. Our results indicate that the proposed fairness model captures more mutual information between protected attributes and biases and quantifies the absolute contribution of individual attributes to bias in SSL-based SER models. Additionally, our analysis reveals indications of gender bias in both HuBERT and WavLM.
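The joint-dependency idea can be illustrated with a minimal sketch: estimating the empirical mutual information between a protected attribute and a per-sample error indicator. This is an illustrative assumption, not the paper's full fairness model; the toy data and the bit-based plug-in estimator are invented for the example.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical (plug-in) mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Toy case: the error indicator is fully determined by gender -> 1 bit of MI.
gender = ["f", "f", "m", "m", "f", "f", "m", "m"]
error = [1, 1, 0, 0, 1, 1, 0, 0]
print(round(mutual_information(gender, error), 3))  # → 1.0
```

A marginal metric like Demographic Parity would compare rates per group; the joint estimator above instead quantifies how much knowing the attribute reduces uncertainty about the error.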
Automatic Speech Recognition (ASR) for low-resource Dravidian languages like Telugu and Kannada faces significant challenges in specialized medical domains due to limited annotated data and morphological complexity. This work proposes a novel confidence-aware training framework that integrates real and synthetic speech data through a hybrid confidence mechanism combining static perceptual and acoustic similarity metrics with dynamic model entropy. Unlike direct fine-tuning approaches, the proposed methodology employs both fixed-weight and learnable-weight confidence aggregation strategies to guide sample weighting during training, enabling effective utilization of heterogeneous data sources. The framework is evaluated on Telugu and Kannada medical datasets containing both real recordings and TTS-generated synthetic speech. A 5-gram KenLM language model is applied for post-decoding correction. Results show that the hybrid confidence-aware approach with learnable weights substantially reduces recognition errors: Telugu Word Error Rate (WER) decreases from 24.3% to 15.8% (8.5% absolute improvement), while Kannada WER drops from 31.7% to 25.4% (6.3% absolute improvement), both significantly outperforming standard fine-tuning baselines. These findings confirm that combining adaptive confidence-aware training with statistical language modeling delivers superior performance for domain-specific ASR in morphologically complex Dravidian languages.
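The fixed-weight variant of the hybrid confidence mechanism can be sketched as a convex combination of a static similarity score and a dynamic, entropy-based model confidence. The function names, the entropy normalization, and the 0.5 weight are illustrative assumptions, not the paper's exact formulation; a learnable-weight variant would train the weight instead of fixing it.

```python
import math

def entropy_confidence(probs):
    """Dynamic confidence from posterior entropy, normalized to [0, 1]:
    1 = fully confident prediction, 0 = uniform posterior."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - h / math.log(len(probs))

def hybrid_confidence(static_score, probs, w_static=0.5):
    """Fixed-weight aggregation of a static similarity score (e.g. an
    acoustic-similarity metric in [0, 1]) with the dynamic confidence."""
    return w_static * static_score + (1.0 - w_static) * entropy_confidence(probs)

# Hypothetical TTS sample: decent acoustic similarity, confident model posterior.
weight = hybrid_confidence(0.8, [0.9, 0.05, 0.05], w_static=0.5)
```

During training, such a per-sample weight would scale each sample's loss, down-weighting low-confidence synthetic utterances.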
Automatic Speech Recognition (ASR) is increasingly used in applications involving child speech, such as language learning and literacy acquisition. However, the effectiveness of such applications is limited by high ASR error rates. The negative effects can be mitigated by identifying in advance which ASR outputs are reliable. This work develops two novel approaches for selecting reliable ASR output at the utterance level: one for read speech and one for dialogue speech material. Evaluations were done on an English and a Dutch dataset, each with a baseline and a fine-tuned model. The results show that the best utterance-level selection strategy identifies reliably transcribed speech recordings with high precision (P > 97.4) for both read speech and dialogue material, in both languages. Using the current optimal strategy, 21.0% to 55.9% of the dialogue/read speech datasets can be automatically selected with low error rates (UER < 2.6).
Guaranteeing the safety of controllers is vital for real-world applications, but is markedly difficult when the states are not perfectly known and when the control inputs are bounded. Backup control barrier functions (bCBFs) use predictions of the flow under a prescribed controller to achieve safety in the presence of bounded inputs and perfect state information. However, when only an estimate of the true state is known, this flow may not be precisely computed, as the initial condition is unknown. Furthermore, the true flow evolves using feedback from the estimated state, thus introducing coupling between known and unknown flows. To address these challenges, we propose a technique that leverages an uncertainty envelope centered around the estimated flow and show that ensuring the safety of this envelope guarantees that the true state satisfies the safety constraints. Additionally, we show that in the presence of state uncertainty, using the resulting Output Feedback Backup Control Barrier Functions (O-bCBFs), there always exists a feasible control input that can guarantee the safety of the true state, even in the presence of input constraints.
This paper provides novel insights into channel and subspace codes in nonadaptive channel sensing with a single RF chain. Observing that this problem naturally maps to a noncoherent decoding problem, we show that the sensing performance of the maximum likelihood (ML) angle estimator, which does not require knowledge of the typically unknown channel coefficient, is governed by two key terms: the minimum subspace distance and the beam gain of the beamformers used. We derive an exact expression for the subspace distance of binary linear channel codes mapped to BPSK, which illuminates the relationship between subspace and Hamming distance, used to design subspace and channel codes, respectively. Our result also reveals why good Hamming distance alone is insufficient for sensing, and shows that well-known families of channel codes, such as Reed-Muller codes, yield zero subspace distance and thereby poor sensing performance when used naively without proper codebook pruning. Finally, we introduce so-called beamspace subspace codes based on sparse antenna selection patterns (Golomb rulers), which we show provide near-optimal subspace distance. We demonstrate that this property of judiciously designed sparse arrays can be leveraged together with beamforming gain via convolutional beamspaces, enabling hardware- and sample-efficient channel sensing with theoretical guarantees in large-scale multiantenna communications.
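The defining property of a Golomb ruler, on which the sparse antenna selection patterns rest, is that all pairwise differences between marks are distinct. A generic check (not the paper's code construction) fits in a few lines:

```python
from itertools import combinations

def is_golomb_ruler(marks):
    """True if all pairwise differences between marks are distinct, i.e. the
    sparse antenna selection pattern has no repeated spatial lags."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))

print(is_golomb_ruler([0, 1, 4, 9, 11]))  # optimal order-5 ruler → True
print(is_golomb_ruler([0, 1, 2, 4]))      # lag 1 (and lag 2) repeats → False
```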
Electric power systems are rapidly evolving into deeply digital, cyber-physical infrastructures in which large fleets of distributed energy resources must be coordinated as system-level flexibility across multiple spatial and temporal scales. Despite growing distributed energy resource deployment, existing grid and market architectures lack scalable, interoperable mechanisms to reliably translate device-level flexibility into grid-aware services, creating risks to reliability, affordability, and resilience at high penetration. We propose that scalable and reliable coordination of distributed energy resource-based flexibility in future power systems is fundamentally an architectural problem that can be addressed through laminar cyber-physical design using minimal, standardized interoperability interfaces that link device autonomy with system-level objectives. To assess this claim, we present and discuss a layered cyber-physical systems architecture and explicate its implementation through standards-based interfaces, Flexibility Functions, hierarchical control, and case studies spanning U.S. and Danish regulatory, market, and operational contexts. Empirical evidence from New York's Grid of the Future proceedings, Danish Smart Energy Operating System pilots, and operational aggregator deployments demonstrates that such architecture enables predictable, grid-aware flexibility while preserving device autonomy, interoperability, reliability, and quality of service. These results support a cross-Atlantic research agenda centered on joint testbeds, harmonized interoperability mechanisms, and coordinated policy experiments to accelerate the deployment of resilient, scalable, and flexible clean energy systems.
Optical wireless communication (OWC) has emerged as a promising candidate for future high-capacity indoor wireless networks, driven by its large unregulated spectrum, high spatial reuse, and ability to support multi-gigabit data rates. However, OWC systems are highly sensitive to user mobility, as link performance depends strongly on the spatial alignment between transmitter and receiver. Accurate modelling of user position and device orientation is therefore essential for reliable channel estimation and system evaluation. To this end, this paper proposes a hybrid Gauss--Markov and long short-term memory (GM--LSTM) mobility model for indoor OWC environments. The Gauss--Markov component captures the temporal correlation of user motion, while the LSTM learns residual behaviour to model non-linear movement patterns and orientation dynamics. The proposed model jointly predicts user position and device orientation, enabling improved representation of mobility in OWC channels. Performance is evaluated using prediction accuracy and per-user data rate evolution. Results show that the proposed hybrid GM--LSTM model outperforms conventional Random Waypoint and Gauss--Markov models, providing more accurate mobility prediction and more stable communication performance in dynamic indoor environments.
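The Gauss--Markov component can be sketched as the classical first-order update for a scalar velocity; the parameter values below are illustrative, and the LSTM residual model is omitted entirely.

```python
import math
import random

def gauss_markov_step(v_prev, alpha, v_mean, sigma, rng):
    """One Gauss-Markov update of a scalar velocity: alpha in [0, 1] sets the
    temporal correlation (alpha=1 constant velocity, alpha=0 memoryless)."""
    noise = rng.gauss(0.0, sigma)
    return alpha * v_prev + (1 - alpha) * v_mean + math.sqrt(1 - alpha ** 2) * noise

rng = random.Random(0)
v, trace = 0.0, []
for _ in range(100):  # simulate 100 time steps of correlated user motion
    v = gauss_markov_step(v, alpha=0.9, v_mean=1.0, sigma=0.2, rng=rng)
    trace.append(v)
```

In the hybrid model, an LSTM would then be trained on the residual between such a GM prediction and observed motion, capturing the non-linear patterns the linear recursion misses.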
The rapid advancement of Audio Large Language Models (ALMs), driven by Neural Audio Codecs (NACs), has led to the emergence of highly realistic speech deepfakes, commonly referred to as CodecFakes (CFs). Consequently, CF detection has attracted increasing attention from the research community. However, existing studies predominantly focus on English or Chinese, leaving the vulnerability of Indic languages largely unexplored. To bridge this gap, we introduce the Indic-CodecFake (ICF) dataset, the first large-scale benchmark comprising real and NAC-synthesized speech across multiple Indic languages, diverse speaker profiles, and multiple NAC types. We use IndicSUPERB as the real speech corpus for generating the ICF dataset. Our experiments demonstrate that state-of-the-art (SOTA) CF detectors trained on English-centric datasets fail to generalize to ICF, underscoring the challenges posed by phonetic diversity and prosodic variability in Indic speech. Further, we present a systematic evaluation of SOTA ALMs in a zero-shot setting on the ICF dataset. We evaluate these ALMs because they have shown effectiveness across different speech tasks. However, our findings reveal that current ALMs exhibit consistently poor performance. To address this, we propose SATYAM, a novel hyperbolic ALM tailored for CF detection in Indic languages. SATYAM integrates semantic representations from Whisper and prosodic representations from TRILLsson via the Bhattacharyya distance in hyperbolic space, and subsequently performs the same alignment procedure between the fused speech representation and an input conditioning prompt. This dual-stage fusion framework enables SATYAM to effectively model hierarchical relationships both within speech (semantic-prosodic) and across modalities (speech-text). Extensive evaluations show that SATYAM consistently outperforms competitive end-to-end and ALM-based baselines on the ICF benchmark.
We present principles of algebraic diversity (AD), a group-theoretic approach to signal processing exploiting signal symmetry to extract more information per observation, complementing classical methods that use temporal and spatial diversity. The transformations under which a signal's statistics are invariant form a matched group; this group determines the natural transform for analysis, and averaging an estimator over the group action reduces variance without requiring additional snapshots. The viewpoint is broadened in five directions beyond the single-observation measurement of a companion paper. Rank promotion admits AD on scalar data streams and identifies the law of large numbers as the trivial-group case of a $(G, L)$ continuum combining sample-count with group-orbit averaging. An eigentensor hierarchy handles signals with nested symmetry. A blind group-matching methodology identifies the matched group from data via a polynomial-time generalized eigenvalue problem on the unitary Lie algebra, placing the DFT, DCT, and Karhunen--Loève transforms as distinguished points on a transform manifold. A cost-symmetry matching principle then extends AD from measurement to blind and adaptive signal processing generally; blind equalization is the lead detailed example, with the Constant Modulus Algorithm's residual phase ambiguity predicted analytically and matched within $1.6^\circ$ on 3GPP TDL multipath channels, and other blind problems in signal processing are mapped into the framework. Four theorems formalize a structural capacity $\kappa$, the Rényi-2 analog of Shannon and von Neumann's Rényi-1 entropies, quantifying how a signal's information is organized rather than how much information it contains. AD complements prior algebraic approaches including invariant estimation, minimax robust estimation, algebraic signal processing, and compressed sensing.
Multi-look acquisition is a widely used strategy for reducing speckle noise in coherent imaging systems such as digital holography. By acquiring multiple measurements, speckle can be suppressed through averaging or joint reconstruction, typically under the assumption that speckle realizations across looks are statistically independent. In practice, however, hardware constraints limit measurement diversity, leading to inter-look correlation that degrades the performance of conventional methods. In this work, we study the reconstruction of speckle-free reflectivity from complex-valued multi-look measurements in the presence of correlated speckle. We model the inter-look dependence as a first-order Markov process and derive the corresponding likelihood, resulting in a constrained maximum likelihood estimation problem. To solve this problem, we develop an efficient projected gradient descent framework that combines gradient-based updates with implicit regularization via deep image priors, and leverages Monte Carlo approximation and matrix-free operators for scalable computation. Simulation results demonstrate that the proposed approach remains robust under strong inter-look correlation, achieving performance close to the ideal independent-look scenario and consistently outperforming methods that ignore such dependencies. These results highlight the importance of explicitly modeling inter-look correlation and provide a practical framework for multi-look holographic reconstruction under realistic acquisition conditions. Our code is available at: this https URL.
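The first-order Markov inter-look model can be sketched as an AR(1) recursion over complex circular-Gaussian speckle fields. This is a toy generator under simplifying assumptions (scalar pixels, unit reflectivity); the paper's estimator and deep-prior regularization are not reproduced here.

```python
import math
import random

def correlated_speckle_looks(n_looks, n_pix, rho, rng):
    """Complex circular-Gaussian speckle looks with first-order Markov
    inter-look correlation rho; rho = 0 recovers independent looks."""
    def cgauss():
        s = math.sqrt(0.5)  # unit-power complex Gaussian per pixel
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
    looks = [[cgauss() for _ in range(n_pix)]]
    for _ in range(n_looks - 1):
        # AR(1): keep a rho-weighted copy of the previous look plus fresh noise,
        # scaled so each look stays unit power.
        looks.append([rho * p + math.sqrt(1 - rho ** 2) * cgauss()
                      for p in looks[-1]])
    return looks

rng = random.Random(1)
looks = correlated_speckle_looks(n_looks=4, n_pix=1000, rho=0.8, rng=rng)
```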
ReRAM-based in-memory computing (IMC) architectures are promising candidates for energy-efficient matrix-vector multiplication. While scaling the size of ReRAM arrays allows for the amortization of power-hungry peripheral circuits like DACs and ADCs, it simultaneously introduces more parasitics along the signal path. Current design methodologies often lack practical guidelines for balancing these effects at an early design stage, forcing designers to rely on time-consuming, iterative transistor-level simulations. In this work, we propose a comprehensive framework for design space exploration that enables the selection of optimal array size, ADC resolution, and system frequency without requiring exhaustive simulations. The framework utilizes a specialized testbench to extract parameters from a limited set of representative transistor-level simulations. These parameters are then used to accurately predict the performance of arbitrary architectures. We demonstrate the effectiveness of this framework through two realistic design cases aimed at maximizing energy efficiency (TOPs/s/W). The results show that the framework successfully identifies optimal architectural configurations under strict power and error constraints, providing an efficient path for high-performance IMC design.
The increasing penetration of flexible loads, such as electric vehicles and AI data centers, necessitates new methodologies for quantifying electrical load hosting capacity under operational constraints and flexible connection agreements. We propose a risk-aware hosting capacity framework that explicitly accounts for both flexibility, in the form of load curtailment, and system reliability. The proposed method incorporates a Conditional Value-at-Risk (CVaR) constraint to control the tail risk of excessive curtailment, ensuring that extreme interventions remain limited. Additionally, a weighted $\ell_1$ approach is introduced to limit the number of utility-controlled interventions, enabling control over the frequency of curtailment actions. A regularization parameter is used to tune the intervention count to a desired intervention budget. The resulting optimization formulation is convex and efficiently solvable, allowing scalable implementation. Numerical results demonstrate that the proposed method significantly increases hosting capacity while maintaining strict risk guarantees and limiting intervention frequency, providing a practical balance between flexibility and reliability in distribution systems.
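The tail-average quantity behind the CVaR constraint can be sketched as a discrete sample approximation: the mean of the worst $(1-\alpha)$ fraction of curtailment scenarios. In the optimization itself, CVaR would typically enter via the standard auxiliary-variable convex reformulation rather than being computed post hoc; the scenario values below are illustrative.

```python
def cvar(samples, alpha):
    """Empirical Conditional Value-at-Risk at level alpha: the mean of the
    worst (1 - alpha) fraction of samples (discrete tail average)."""
    xs = sorted(samples, reverse=True)
    k = max(1, int(round(len(xs) * (1 - alpha))))
    return sum(xs[:k]) / k

curtailment = [0, 0, 0, 0, 0, 0, 1, 2, 5, 12]  # MWh curtailed per scenario
print(cvar(curtailment, alpha=0.8))  # mean of the two worst scenarios → 8.5
```

A CVaR constraint caps this tail average, so rare but extreme curtailment events are bounded even when the mean curtailment is small.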
This paper presents DU-PSISTA (Deep Unfolded-Periodic Sketched Iterative Shrinkage-Thresholding Algorithm), a sparse signal recovery algorithm designed to balance computational efficiency and accuracy for high-dimensional sparse signals, together with a convergence analysis under sufficient conditions. DU-PSISTA introduces a random matrix projection known as sketching to reduce the dimensionality of gradient computations and periodically alternates between the standard ISTA and the sketched variant. This hybrid structure enables flexible control over the trade-off between accuracy and computational complexity through a pre-configurable period parameter. Because the algorithm involves many tunable parameters, such as step sizes and thresholding factors, we incorporate deep unfolding, which optimizes these parameters through data-driven training and enables the algorithm to adaptively improve convergence speed and performance. We show that the proposed method achieves a linear-type contraction to a neighborhood of the true sparse signal with properly selected parameters. The analysis provides an interpretation of why the hybrid structure improves recovery accuracy. Numerical experiments confirm that our method achieves recovery performance comparable to conventional deep unfolded ISTA while reducing computational complexity, especially when the period parameter and sketch size are properly selected. The results are also consistent with the theoretical insights.
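The underlying ISTA iteration, with which DU-PSISTA periodically alternates its sketched variant, can be sketched in plain Python. The sketching projection and the deep-unfolded parameter training are omitted; the identity-matrix demo is purely illustrative.

```python
def soft_threshold(x, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return [max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0) for v in x]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def ista(A, y, lam, step, iters):
    """Plain ISTA: x <- soft_threshold(x - step * A^T(Ax - y), step * lam)."""
    At = [list(col) for col in zip(*A)]  # A transpose
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ai - yi for ai, yi in zip(matvec(A, x), y)]  # Ax - y
        grad = matvec(At, residual)                              # A^T(Ax - y)
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)],
                           step * lam)
    return x

A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_hat = ista(A, y=[3.0, 0.0, -2.0], lam=0.1, step=1.0, iters=20)
# With A = I the fixed point is the soft-thresholded y, i.e. [2.9, 0.0, -1.9].
```

In the sketched variant, the gradient would be computed on $SA$ and $Sy$ for a random sketching matrix $S$ with far fewer rows than $A$, and deep unfolding would learn per-iteration step sizes and thresholds.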
This article proposes a Model Reference Adaptive Control (MRAC) strategy to achieve fixed-time convergence of parameter estimation and tracking errors for unknown linear time-invariant systems, without relying on the persistence of excitation condition. Instead, it employs a less restrictive initial/interval excitation condition on the regressor matrix, enhancing practicality and ease of implementation in real-world scenarios. Our primary contribution is a novel parameter update law within the indirect MRAC framework, ensuring that parameter estimates converge within a fixed time, once the initial/interval excitation condition is met. This approach simplifies the practical requirements for adaptive control while guaranteeing robust performance against parameter uncertainty and external disturbances. Simulation results provide a comparison with the current literature to validate the effectiveness of this approach.
The switching behavior of circuits, in particular DC/DC converters, is analyzed in this paper using the equivalent control modeling of the sliding mode regime of dynamic systems. The Ćuk converter is chosen as a representative example, being one of the most complex circuits among DC/DC converters. It is shown how the converter's behavior in the steady-state regime can be studied and analyzed through linear matrix inequality based stability conditions for linear dynamic systems with nonlinear sector-bounded perturbations. The maximization of the nonlinear sector bound provides a limit for applying the linear ripple approximation in the analysis of converter operation. Furthermore, our approach is validated by simulation results for two different switching surfaces of practical interest.
In this paper we design a switching control law for the Ćuk converter in the continuous conduction mode using piecewise linear Lyapunov functions. These Lyapunov functions can be constructed using different numbers of state variables, which affects the system's performance. Representative simulations covering the construction of different piecewise linear Lyapunov functions are provided.
Evaluation of musical source separation (MSS) has traditionally relied on Blind Source Separation Evaluation (BSS-Eval) metrics. However, recent work suggests that BSS-Eval metrics correlate poorly with perceptual audio quality ratings from listening tests, which are considered the gold-standard evaluation method. As an alternative approach in singing voice separation, embedding-based intrusive metrics have been introduced that leverage latent representations from large self-supervised audio models such as MERT (Music undERstanding with large-scale self-supervised Training). In this work, we analyze the correlation of perceptual audio quality ratings with two intrusive embedding-based metrics: a mean squared error (MSE) and an intrusive variant of the Fréchet Audio Distance (FAD), both calculated on MERT embeddings. Experiments on two independent datasets show that these metrics correlate more strongly with perceptual audio quality ratings than traditional BSS-Eval metrics across all analyzed stem and model types.
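For embedding sets summarized by Gaussians with diagonal covariances, the Fréchet distance underlying FAD has a simple closed form. The full FAD uses full covariance matrices and a matrix square root; the diagonal restriction here is a simplifying assumption for illustration.

```python
import math

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between Gaussians with diagonal covariances:
    ||mu1 - mu2||^2 plus, per dimension, (sqrt(var1) - sqrt(var2))^2."""
    d_mu = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    d_cov = sum((math.sqrt(v1) - math.sqrt(v2)) ** 2
                for v1, v2 in zip(var1, var2))
    return d_mu + d_cov

# Identical embedding statistics give distance 0.
print(frechet_distance_diag([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0]))  # → 0.0
```

In an intrusive variant, the two Gaussians would be fitted to the reference and separated-stem embedding sets respectively, so the distance directly compares a system's output to its target.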
Image transmission for vehicle-to-vehicle collaborative perception in autonomous driving faces challenges including limited on-board terminal resources, time-varying wireless channel fading, and poor robustness under low signal-to-noise ratio (SNR) conditions. Traditional separate source-channel coding schemes suffer from the cliff effect, while existing semantic communication models are limited by large parameter sizes and weak digital compatibility. This paper proposes a lightweight, low-SNR-robust deep joint source-channel coding (JSCC) semantic communication system. First, structured pruning is implemented based on batch normalization layer scaling factors and L1 regularization, which significantly reduces model complexity while ensuring image reconstruction quality. Second, a uniform quantization and M-QAM modulation scheme adapted to JSCC features is designed, and a training-deployment separation strategy is adopted to address the non-differentiable quantization problem, enabling compatibility with existing digital communication systems. Simulation results on the Cityscapes dataset show that the pruned model maintains comparable performance and robustness to the original one, even with over half of its parameters removed. Notably, the proposed scheme exhibits significant advantages over conventional communication methods under low SNR conditions.
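The channel-selection step of BN-scale structured pruning can be sketched as ranking channels by the magnitude of their batch-norm scaling factor. The L1 training that sparsifies the factors, and the actual removal of weights, are outside this toy; the gamma values are invented.

```python
def select_channels(bn_scales, keep_ratio):
    """Rank channels by the magnitude of their batch-norm scaling factor
    gamma and keep the top fraction; L1 regularization during training
    drives unimportant gammas toward zero, making the ranking meaningful."""
    n_keep = max(1, int(len(bn_scales) * keep_ratio))
    ranked = sorted(range(len(bn_scales)), key=lambda i: -abs(bn_scales[i]))
    return sorted(ranked[:n_keep])

gammas = [0.01, 0.8, 0.002, 0.5, 0.03, 0.9]
print(select_channels(gammas, keep_ratio=0.5))  # → [1, 3, 5]
```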
This article introduces the HYMN (HYbrid Multi-technology Navigation) dataset: a multi-system, and time synchronized dataset for localization research based on opportunistic signals collected in an indoor-outdoor scenario. HYMN comprises measurement data collected in an industrial hall setting for five different positioning systems including Ultra-Wideband (UWB), Bluetooth Low Energy (BLE), WiFi, 5G, and Global Navigation Satellite System (GNSS). Unlike existing datasets that focus on single technologies or purely indoor/outdoor scenarios, HYMN combines five positioning technologies with explicit coverage of indoor-outdoor transitions, enabling multi-sensor fusion research for seamless localization. Each instance of data is identified through a unique measurement id and it represents time-stamped observations relevant for each system respectively along with the ground truth information. HYMN is designed to support a wide range of localization tasks including multi-sensor fingerprinting, cross-technology fusion, and seamless indoor-outdoor positioning. The synchronized measurements from GNSS and other terrestrial systems enable researchers to investigate how heterogeneous signals complement each other to overcome individual technology limitations such as GNSS degradation in covered areas or terrestrial system variability in dynamic environments.
In recent years, WiFi sensing has been recognized as a promising technology to bring respiratory monitoring into everyday homes, thanks to its contactless nature and ubiquitous availability. However, existing WiFi-based respiratory monitoring systems still fall short of deployment-oriented performance: they suffer from restrained hardware scalability, limited accuracy, and are highly sensitive to user location. To overcome these limitations and push WiFi sensing towards clinically meaningful precision, we propose RespirFi, a novel system that robustly delivers high-fidelity respiratory waveforms with WiFi Channel State Information (CSI), thereby enabling accurate estimation of key physiological biomarkers. At the core of RespirFi is a theoretical human reflection model, through which we perform an in-depth characterization of how CSI variations are shaped by both subcarrier frequency and spatial user location. Guided by these insights, we develop a location-robust waveform construction method that adaptively selects high quality subcarriers and aligns their waveform trends, ensuring accurate waveform recovery. Furthermore, we propose a breathing phase identification method that leverages inter-subcarrier CSI differences to reliably distinguish inhalation from exhalation. We implement RespirFi over commodity WiFi devices, and extensive experiments demonstrate that it outperforms state-of-the-art approaches across a wide range of clinically relevant respiratory metrics.
We investigate higher-rank transmissions for multi-connected Extended Reality (XR) devices enabled through tethering group (TGr), in which a nearby tethering User Equipment (UE) cooperates with an XR UE via a short-range tethering link (TL). In contrast to prior studies that are limited to rank-1 transmission and ideal tethering assumptions, we analyze TGr performance under higher-rank point-to-multipoint (PTM) transmission and realistic TL delays. A conventional single Outer Loop Link Adaptation (OLLA) offset results in inaccurate throughput prediction across ranks, leading to suboptimal rank selection. To address this limitation, we propose a multi-offset Outer Loop Link Adaptation (MO-OLLA) framework that introduces rank-dependent signal-to-interference-plus-noise ratio (SINR) correction to improve Link Adaptation (LA) accuracy. Furthermore, a Wireless Fidelity (WiFi) based delay model is incorporated to characterize the impact of practical TL constraints, including limited bandwidth and achievable throughput, on XR capacity and cellular resource utilization, providing the first such analysis for higher-rank multi-connected XR devices. System-level simulations demonstrate that MO-OLLA provides up to 20% performance improvement over conventional OLLA for multi-connected XR UEs. Moreover, TGrs effectively exploit higher-rank transmission, achieving XR capacity gains of 180-200% over single-link XR UEs under ideal TL conditions. Critically, the gains of the TGr remain at 165-180% under realistic high-throughput TLs relative to single-link XR UEs, confirming the practical viability of TGr based cooperation for XR capacity enhancements within existing cellular resources.
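A single-offset OLLA loop can be sketched as the standard up/down step rule on ACK/NACK feedback; step sizes and sign conventions vary by implementation, and a multi-offset scheme in the spirit of MO-OLLA would maintain one such offset per transmission rank.

```python
def olla_step(offset, ack, step_up=0.1, target_bler=0.1):
    """One outer-loop link adaptation update: a NACK raises the SINR back-off
    by step_up; an ACK lowers it by step_up * target_bler / (1 - target_bler),
    so the offset converges where the block error rate equals target_bler."""
    step_down = step_up * target_bler / (1.0 - target_bler)
    return offset + step_up if not ack else offset - step_down

offset = 0.0
for ack in [True] * 9 + [False]:     # 9 ACKs and 1 NACK, matching 10% BLER
    offset = olla_step(offset, ack)  # net change is (numerically) zero
```

The asymmetric step sizes are what encode the target BLER: at exactly the target error rate, up and down moves cancel on average.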
The sixth generation (6G) communication networks are expected to provide high data rates, ultra-reliable communication, and massive connectivity, especially in challenging environments such as dense urban areas and disaster-affected regions. However, traditional terrestrial-only networks face significant challenges in these scenarios, including signal blockages from high-rise buildings, traffic congestion, and dynamic user distributions. To address these limitations, we propose the adaptive multi-UAV deployment (AMUD) framework within satellite air-ground integrated networks (SAGINs). The AMUD framework dynamically deploys multiple amplify-and-forward unmanned aerial vehicle relays (UAVr) in coordination with low Earth orbit (LEO) satellites to improve coverage, alleviate congestion, and ensure reliable communication in non-line-of-sight and high-demand conditions. We formulate an optimization problem that jointly maximizes the total network energy efficiency and total capacity while ensuring capacity fairness and satisfying the users' requirements. The simulation results demonstrate that AMUD improves the network's total capacity, energy efficiency, and capacity fairness compared to traditional LEO satellite and ground base station (LEO-GBS) only systems.
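The abstract does not name its fairness measure; assuming the commonly used Jain's index, capacity fairness across users can be computed as:

```python
def jain_fairness(rates):
    """Jain's fairness index over per-user rates: ranges from 1/n
    (one dominant user) to 1 (perfectly equal allocation)."""
    n = len(rates)
    return sum(rates) ** 2 / (n * sum(r * r for r in rates))

print(jain_fairness([2.0, 2.0, 2.0, 2.0]))  # equal rates → 1.0
```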
Extremely large-scale multiple-input multiple-output (XL-MIMO) is a key enabling technology for sixth-generation (6G) communication systems. Nevertheless, the increase in array aperture and signal bandwidth brings new challenges to wideband channel estimation in XL-MIMO systems. Motivated by recent advances in deep generative modeling, we propose a diffusion model-based method for near-field wideband channel estimation in XL-MIMO systems. We first analyze the statistical correlation of the wideband channel and show that the near-field wideband channel exhibits both spatial non-stationarity and beam split effects. Based on these observations, the channel estimation problem is formulated as a Bayesian posterior inference task, in which a diffusion model is employed to learn the prior distribution of the channel. To further enhance the representation of complex spatial-frequency channel structures, we design a denoising network with a multi-scale attention mechanism. In particular, the network extracts multi-scale spatial-frequency features via parallel convolutional branches with different receptive fields, and combines feature attention and spatial attention modules to adaptively emphasize critical channel features. This design enables more accurate modeling of near-field wideband channel distributions and consequently improves channel estimation performance. Experimental results demonstrate that the proposed method exhibits superior robustness compared to existing baseline schemes for XL-MIMO wideband channel estimation under different experimental settings.
Controlling complex dynamical systems to satisfy sophisticated specifications remains a significant challenge in modern engineering. A promising approach to this problem is the approximate simulation-based hierarchical control (ASHC) technique. In this method, a simplified representation of the complex system, called the abstract system, is first designed and controlled. An interface function is then designed to translate the control law into the input of the complex system, thereby achieving approximate control synthesis. However, most existing results in ASHC are only for linear systems. This paper proposes a constructive method for solving the ASHC problem for nonlinear systems. To this end, we propose invariance equation-based methods to achieve the two classical requirements of the ASHC technique, namely the bounded output discrepancy and the $m$-relation. We then study the solvability conditions of the problem and summarise the overall design procedures. We illustrate the results with a practical example, providing step-by-step solutions to the ASHC problem of a DC-to-DC Ćuk converter.
Physical layer (PHY) steganography conceals secrets by making subtle modifications to transmitted radio waveforms, which can be applied to establish covert communication systems. Given the widespread deployment of Wi-Fi infrastructures, hiding secrets within Wi-Fi transmissions exhibits significant covertness and has attracted increasing research attention. Recent advances in Wi-Fi steganography have focused on embedding secrets within channel state information (CSI) by applying artificial finite impulse response (FIR) filters to outgoing signals. These methods can emulate natural wireless propagation effects, thereby evading detection by eavesdroppers. However, existing CSI-based approaches suffer from two critical limitations: vulnerability to environmental variations and limited steganographic capacity. This work presents a Wi-Fi steganography system that mitigates these constraints. Specifically, we introduce a CSI division mechanism to decouple artificial CSI components from natural wireless channel responses. In essence, secrets are embedded within the quotient of two consecutive CSI measurements. Furthermore, we propose an encoder-decoder neural network framework that automatically learns optimal strategies for FIR filter generation and secret recovery, substantially enhancing steganographic capacity. We implemented a prototype using commercial off-the-shelf hardware, including a software-defined radio (SDR) transmitter and two receiver platforms: ANTSDR and ESP32. Experimental evaluations demonstrate that the system achieves robust performance under dynamic environmental conditions while significantly improving steganographic capacity.
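The CSI-division idea can be illustrated with a toy complex-number sketch: a secret phase is embedded so that it appears in the quotient of two consecutive CSI measurements, and the (static) channel response cancels in the division. This is a stand-in illustration only; the actual system uses learned FIR filters and a neural decoder, and the CSI values below are invented.

```python
import cmath

def embed(h_prev, secret_phase):
    """Embed a secret as a common phase rotation applied between two
    consecutive CSI measurements (toy stand-in for the FIR-based scheme)."""
    q = cmath.exp(1j * secret_phase)
    return [h * q for h in h_prev]

def extract(h_prev, h_curr):
    """Recover the secret from the per-subcarrier quotient; the wireless
    channel response cancels out in the division."""
    q = sum(c / p for p, c in zip(h_prev, h_curr)) / len(h_prev)
    return cmath.phase(q)

h_prev = [1 + 1j, 2 - 1j, -0.5 + 0.3j]    # CSI at time t (toy values)
h_curr = embed(h_prev, secret_phase=0.5)  # CSI at time t+1 carrying the secret
```

Because only the ratio of consecutive measurements carries the secret, slow environmental changes that affect both measurements similarly leave the embedded quantity largely intact.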
This paper outlines a pathway towards active operation of low-voltage distribution grids. In these grids, the growing deployment of distributed generation, controllable demand and storage, together with the roll-out of intelligent metering systems, creates new requirements and opportunities for distribution system operators. On the basis of the German and European regulation, and in particular of recent directives enabling grid-oriented interventions and market-based procurement of flexibility, the paper identifies three key pillars for active low-voltage operation: (a) measurement placement and observability, (b) secure and interoperable information and communication architectures and interfaces, and (c) integration of market-based and grid-oriented optimisation for controlling connected assets. A structured system overview is developed that specifies main actors and data flows, highlighting central research topics across these pillars. Building on this, a four-phase roadmap is presented, spanning requirements and use-case definition, method development and simulation, laboratory and field validation, and roll-out with system-level feedback, thus providing guidance for distribution system operators and researchers.
Reconfigurable Intelligent Surfaces (RIS) have emerged as a key enabler for programmable wireless environments in future Beyond-5G (B5G) and 6G networks. In the meantime, Integrated Sensing and Communication (ISAC) and Physical-Layer Security (PLS) are becoming essential functionalities for next-generation wireless systems, particularly in safety and mission-critical applications. However, jointly optimizing RIS-assisted systems to support communication, sensing, and security introduces complex and often conflicting design trade-offs. In this work, a multi-objective optimization framework for RIS-assisted networks is proposed, aiming to jointly analyze communication performance, sensing accuracy, and security-related channel properties in a unified system perspective. The proposed model jointly considers RIS deployment location, orientation, surface size, and an ISAC configuration weight that controls the allocation of RIS reflection gain between communication and sensing tasks. Simulation results reveal inherent trade-offs among communication reliability, sensing accuracy, and security performance. The proposed framework provides valuable insights into the interplay between communication, sensing, and security, and enables the design of efficient RIS deployment and configuration strategies for secure ISAC-enabled 6G wireless networks.
Wavefront engineering for applications in near-field wireless connectivity is gradually becoming common ground. In this landscape, beams that propagate on bent paths are ideal candidates for dynamic blockage avoidance and suppression of potential eavesdropping. In this work we study the physical layer security offered by bending beams, and we demonstrate their capabilities under line-of-sight and non-line-of-sight eavesdropping. We analyze the dependencies between the possible locations of an eavesdropper and the design parameters of such beams, and we introduce metrics to assess their physical layer security performance. Our results demonstrate their superiority with respect to beams generated with conventional beamforming.
Timely information delivery in low-altitude networks is critical for many time-sensitive applications, such as unmanned aerial vehicle (UAV) navigation, inspection, and surveillance. The key challenge lies in balancing three competing factors: stringent data freshness requirements, UAV onboard energy consumption, and interference with terrestrial services. Addressing this challenge requires not only efficient power and channel allocation strategies but also effective communication timing over the entire operation horizon. In this work, we propose a model predictive communication (MPComm) framework, enabled by advanced channel sensing techniques, in which the channel conditions that the UAV will experience are largely predictable. Within this framework, we formulate a constrained bi-objective optimization problem to achieve a desired trade-off between energy consumption and terrestrial channel occupation, subject to a strict timeliness constraint. We solve this problem using Pareto analysis and show that the original non-convex, mixed-integer problem can be decomposed into a two-layer structure: the outer layer determines the optimal communication timing, while the inner layer determines the optimal power and channel allocation for each communication interval. An efficient algorithm for the inner problem is developed using non-convex analysis, with asymptotic optimality guarantees, while the outer problem is solved optimally via a simple graph search, with edges characterized by inner solutions. The proposed approach applies to a broad class of problem variants, including objective transformations and single-objective specializations. Numerical results demonstrate the efficiency of the proposed solution, achieving up to a six-fold reduction in terrestrial channel occupation and a 6 dB energy saving compared to benchmark schemes.
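The two-layer decomposition described in this abstract lends itself to a compact illustration: the outer layer is a shortest-path search over candidate communication instants, where each edge is weighted by an inner-layer cost. A minimal sketch, with a placeholder `interval_cost` standing in for the inner power-and-channel allocation (the function name, the slot granularity, and the quadratic toy cost are illustrative assumptions, not from the paper):

```python
def min_cost_schedule(horizon, max_gap, interval_cost):
    """Outer-layer graph search: pick communication instants in 0..horizon such
    that consecutive instants are at most max_gap slots apart (timeliness),
    minimising the summed inner-layer cost of each interval."""
    INF = float("inf")
    best = [INF] * (horizon + 1)   # best[t] = min cost to have communicated at t
    best[0] = 0.0
    choice = [None] * (horizon + 1)
    for t in range(1, horizon + 1):
        for prev in range(max(0, t - max_gap), t):
            c = best[prev] + interval_cost(prev, t)
            if c < best[t]:
                best[t], choice[t] = c, prev
    # backtrack the optimal communication instants
    path, t = [horizon], horizon
    while choice[t] is not None:
        t = choice[t]
        path.append(t)
    return best[horizon], path[::-1]
```

For example, with `interval_cost = lambda i, j: (j - i) ** 2`, a horizon of 4 slots and a maximum gap of 2, the search returns cost 4 by communicating at every slot.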
We introduce a graph-signal generalisation of Sample Entropy, denoted SampEn$_{G}$, to quantify irregularity of graph signals on a continuous state space, complementing existing methods on symbolic dynamics. Our approach replaces the temporal delay embedding of classical SampEn with a multi-hop graph-based embedding: for each node, we aggregate patterns from local walk-weighted neighbourhood averages computed via powers of the graph shift operator. We show empirically that SampEn$_{G}$ reduces to classical 1D SampEn on directed path graphs, and validate its nonlinear sensitivity using the logistic map. Experiments on directed Erdős--Rényi graph signals further characterise its behaviour with connectivity and pattern length $m$, with practical runtimes on the order of thousands of nodes. We expect SampEn$_{G}$ to open up new ways to analyse graph signals, generalising SampEn and the concept of conditional entropy, and extending nonlinear analysis to a wide variety of network data.
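The construction sketched in this abstract can be made concrete: build per-node templates from powers of a row-normalised graph shift operator instead of a temporal delay embedding, then count matching template pairs exactly as in classical SampEn. A minimal sketch assuming simple out-neighbour averaging (the paper's exact walk weighting may differ); on a directed path graph the templates collapse to sliding windows, recovering 1D SampEn:

```python
import math

def apply_shift(S, v):
    """Row-normalised graph shift: each node averages its out-neighbours' values.
    Returns None for nodes with no out-neighbours or undefined inputs."""
    out = []
    for row in S:
        s = sum(row)
        if s == 0 or any(w and v[j] is None for j, w in enumerate(row)):
            out.append(None)
        else:
            out.append(sum(w * v[j] for j, w in enumerate(row) if w) / s)
    return out

def sampen_graph(S, x, m=2, r=0.2):
    """Sketch of graph Sample Entropy: per-node templates come from powers of
    the shift operator; matches are counted with the Chebyshev distance."""
    feats, cur = [list(x)], list(x)
    for _ in range(m):
        cur = apply_shift(S, cur)
        feats.append(cur)

    def templates(length):
        return [[feats[k][i] for k in range(length)]
                for i in range(len(x))
                if all(feats[k][i] is not None for k in range(length))]

    def matches(T):
        return sum(1 for i in range(len(T)) for j in range(i + 1, len(T))
                   if max(abs(a - b) for a, b in zip(T[i], T[j])) <= r)

    B, A = matches(templates(m)), matches(templates(m + 1))
    return -math.log(A / B) if A and B else float("inf")

def path_graph(n):
    """Directed path graph: node i points to node i+1."""
    S = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        S[i][i + 1] = 1
    return S
```

On `path_graph(n)` the k-th shifted feature at node i is simply `x[i+k]`, so the node templates are the classical length-m windows.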
Integrated Sensing and Communication (ISAC) systems require efficient beamforming architectures to jointly support communication and sensing functionalities. To reduce hardware overhead, Hybrid Beamforming (HBF) has been widely studied and shown to achieve performance close to fully digital beamforming under practical hardware constraints. As a promising evolution, Reconfigurable Antenna (RA) technologies have recently emerged to further enhance beamforming Degrees of Freedom (DoFs) by dynamically reconfiguring antenna Electromagnetic (EM) characteristics, yet their integration into ISAC systems remains largely unexplored. In this paper, we investigate an RA-assisted ISAC system and develop a decoupled Triple-Hybrid Beamforming (Tri-HBF) framework that alternately optimizes digital, analog, and EM beamformers to maximize the communication rate and sensing Signal-to-Clutter-plus-Noise Ratio (SCNR). For both Single-user Single-target (SUST) and Multiple-user Multiple-target (MUMT) scenarios, we first transform the original fractional objectives into fraction-free ones via methods tailored to their respective structures. The resulting problems are then solved via alternating optimization over different variable blocks. Closed-form updates are derived for all variables except the EM beamforming subproblem in the MUMT scenario. To further reduce the complexity introduced by Semidefinite Relaxation (SDR) in EM beamforming, we propose a low-complexity iterative approach across antennas with closed-form updates. Simulation results demonstrate that the proposed scheme significantly outperforms benchmark designs with conventional omnidirectional and directional antennas, achieving almost 100% improvement in spectrum efficiency and 62.5% reduction in antenna overhead.
Channel knowledge map (CKM) is a promising technique to achieve environment-aware wireless communication and sensing. Constructing the complete CKM based on channel knowledge observations at sparse locations is a fundamental problem for CKM-enabled wireless networks. However, most existing works on CKM construction only consider the special type of CKM, i.e., the channel gain map (CGM), which only records the channel gain value for each location. In this paper, we consider the channel spatial correlation map (SCM) construction, which signifies the location-specific spatial correlation matrix for multi-antenna systems. Unlike CGM construction, constructing SCM poses significant challenges due to its extremely high-dimensional structure. To address this issue, we first decompose the high-dimensional SCM into lower-dimensional path gain map (PGM) and path angle map (PAM). Then we propose a deep learning model termed E-SRResNet for constructing high-quality SCM from sparse samples, which incorporates multi-head attention (MHA) mechanisms and multi-scale feature fusion (MSFF) to accurately model both local and global spatial relationships of channel parameters and complex nonlinear mappings. Furthermore, we preprocess the dataset to provide priors including line-of-sight (LoS) map, binary building map and base station (BS) map for the model to reconstruct SCM more accurately. Simulations conducted on the CKMImageNet dataset demonstrate that the proposed E-SRResNet achieves significant performance improvements over baseline methods. Moreover, the cosine similarity between the constructed SCM and the ground truth exceeds 0.8 in most regions, validating the effectiveness of the proposed construction method.
Accurate antenna array calibrations and measurements of aspects such as active element pattern (AEP) are critical for enabling integrated sensing and communication (ISAC) technologies such as directional modulation. One reliable way of obtaining accurate and repeatable AEP measurements is to spin the antenna array on a turntable, but many turntables designed for antenna array measurements are prohibitively expensive for small labs and may not be designed with RF considerations, such as cable phase stability, in mind. This paper details the design of a motorized 3D printed turntable for use in directional modulation and in-situ measurement experiments that will allow for rotation of an antenna array around a point, such that the far field of the antenna pattern can be measured by a stationary receiver.
This paper presents a personalized Battery Electric Vehicle (BEV) energy consumption estimation framework that integrates map-based contextual features with driver-specific velocity prediction and physics-based energy consumption modeling. The system combines route selection, detailed road feature processing, a rule-based reference velocity generator, a PID controller-based vehicle dynamics simulator, and a Bidirectional LSTM model trained to reproduce individual driving behavior. The predicted individual-specific velocity profiles are coupled with a quasi-steady backward energy consumption model to compute tractive power, regenerative braking, and State-of-Charge (SOC) evolution. Evaluation across urban, freeway, and hilly routes demonstrates that the proposed approach captures key driver behavioral patterns such as deceleration at intersections, speed-limit tracking, and road grade-dependent responses, while producing accurate power and SOC trajectories. The results highlight the effectiveness of combining learned driver behavior with map-based context and physics-based energy consumption modeling to produce accurate, personalized BEV SOC depletion profiles.
The large-scale integration of inverter-based resources (IBRs), particularly distributed photovoltaics (DPVs), into distribution networks increases the need for integrated transmission and distribution (T&D) co-simulation. A key challenge in such co-simulation lies in accurately modeling system frequency across two asynchronous simulation environments. For example, the transmission system, simulated in the phasor domain, can operate with a simulation timestep of 10 ms, while the distribution system, simulated in the electromagnetic transient (EMT) domain to include IBR models, uses a much finer timestep of 100 microseconds. To ensure accurate PLL-based frequency estimation in distribution systems, it is essential to predict voltage magnitude and phase angle variations within the 10 ms transmission intervals, rather than using constant values that cause inaccurate frequency calculations. This issue becomes particularly critical when modeling primary and secondary frequency response services provided by IBRs. To address this challenge, we propose an automated Exponentially Weighted Moving Average Real-Time Threshold Adaptation (EWMA-RTTA) method, which utilizes Quadratic Extrapolation to predict voltage magnitude and phase angle trends more precisely. The proposed method is validated using two Opal-RT simulators: one simulating an IEEE 118-bus transmission system and the other simulating an IEEE 123-bus distribution network. Simulation results demonstrate that our approach improves the normalized mean absolute error (nMAE) by a factor of 25.7 compared to methods that do not account for time mismatches, offering a scalable and accurate solution for modeling IBR-based frequency response in modern power systems.
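The quadratic-extrapolation step at the heart of this method can be sketched with Newton's backward-difference formula: from the last three coarse-grained samples spaced h apart, predict the value tau ahead to fill the finer EMT timesteps. The EWMA threshold update below is an assumed illustrative form, not necessarily the paper's exact adaptation rule:

```python
def quad_extrapolate(y2, y1, y0, h, tau):
    """Predict y(t + tau) from samples y(t-2h), y(t-h), y(t) using the
    Newton backward-difference quadratic through the three points."""
    d1 = y0 - y1             # first backward difference
    d2 = y0 - 2 * y1 + y2    # second backward difference
    s = tau / h
    return y0 + s * d1 + s * (s + 1) / 2 * d2

def ewma_threshold(prev_ewma, error, alpha=0.2, k=3.0):
    """Assumed EWMA tracker of prediction error with an adaptive threshold
    set to k times the current EWMA value."""
    e = alpha * abs(error) + (1 - alpha) * prev_ewma
    return e, k * e
```

Because the predictor interpolates any quadratic exactly, a phase-angle trajectory that is locally quadratic over a 10 ms transmission interval is reproduced without error at the intermediate 100 microsecond points.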
Recent advances in large audio language models (LALMs) have enabled multilingual speech understanding. However, benchmarks for evaluating LALMs remain scarce for non-English languages, with Korean being one such underexplored case. In this paper, we introduce KoALa-Bench, a comprehensive benchmark for evaluating Korean speech understanding and speech faithfulness of LALMs. In particular, KoALa-Bench comprises six tasks. Four tasks evaluate fundamental speech understanding capabilities, including automatic speech recognition, speech translation, speech question answering, and speech instruction following, while the remaining two tasks evaluate speech faithfulness, motivated by our observation that several LALMs often fail to fully leverage the speech modality. Furthermore, to reflect Korea-specific knowledge, our benchmark incorporates listening questions from the Korean college scholastic ability test as well as content covering Korean cultural domains. We conduct extensive experiments across six models, including both white-box and black-box ones. Our benchmark, evaluation code, and leaderboard are publicly available at this https URL.
This paper presents a detailed study of how graph neural networks can be used on edge intelligent meters in a microgrid to forecast photovoltaic power generation. The problem background and the adopted technologies are introduced, including ONNX and ONNX Runtime. The hardware and software specifications of the smart meter are also briefly described. Then, the paper focuses on the training and deployment of two graph machine learning models, GCN and GraphSAGE, with particular emphasis on developing and deploying a customized ONNX operator for GCN. Finally, a case study is conducted using real datasets from a village microgrid. The performance of the two models is compared on both the PC and the smart meter, exhibiting successful deployments and executions on the smart meter.
Can scientific discovery be made arbitrarily easy by choosing the right representation, collecting enough data, and deploying sufficiently powerful algorithms? This paper argues that the answer is fundamentally negative. We introduce the Existential Theory of Research (ETR), a formal framework that models discovery as the recovery of structured explanations under constraints of representation, observation, and computation. Within this framework, we show that these three components cannot be simultaneously optimized: no method can guarantee universally simple explanations, arbitrarily compressed observations, and efficient exact inference. This limitation is not model-specific, but arises from a synthesis of uncertainty principles in sparse representation, sample complexity bounds in high-dimensional recovery, and the computational hardness of exact inference. We further show that representation mismatch alone can inflate intrinsic simplicity into apparent complexity, rendering otherwise tractable problems observationally and computationally prohibitive. To quantify these effects, we introduce an uncertainty functional that captures the joint difficulty of discovery. The results suggest that scientific difficulty is not accidental, but a structural consequence of the geometry and complexity of inference.
We compare three secrecy-coding schemes for the degraded wiretap binary symmetric channel (BSC) in the finite-blocklength regime: (i) polar wiretap coset codes, (ii) PAC codes used as wiretap coset codes, and (iii) the invertible-extractor (IE) framework of Bellare-Tessaro. Our comparison is empirical and uses a common semantic-secrecy metric (distinguishing advantage). For polar coset codes, we compute Eve's polarized bit-channel capacities (via the Tal-Vardy construction) to obtain explicit finite-length upper bounds on mutual-information leakage, yielding strong secrecy bounds. For PAC coset codes, we prove that Eve's synthesized bit-channels are equivalent to those of polar codes (up to a permutation), so the same leakage bounds apply; we then convert these strong-secrecy bounds into semantic-secrecy guarantees for symmetric wiretap channels. For the IE scheme, we use the closed-form semantic-secrecy bounds given in the reference work. Finally, we report finite-length results that jointly characterize (a) semantic-secrecy guarantees against Eve and (b) frame-error-rate performance at Bob, illustrating that PAC codes can significantly improve reliability without changing the secrecy bounds inherited from polar coding. Moreover, under the finite-length bounds considered in this work, polar/PAC secrecy codes provide tighter security guarantees than the invertible-extractor framework.
Earth-observation satellites are emerging as distributed edge platforms for time-critical tasks, yet orbital scheduling remains challenged by intermittent energy harvesting and temporal coupling where eager execution risks future battery depletion. Existing schedulers rely on static priorities and lack mechanisms to adaptively shed work. We present Equinox, a lightweight, decentralized runtime for resource-constrained orbital systems. Equinox enables adaptive scheduling by compressing time-varying constraints, including battery charge, thermal headroom, and queue backlog, into a single state-dependent marginal cost of execution. Derived from a barrier function that rises sharply near safety limits, this cost encodes both instantaneous pressure and future risk. This local signal serves as a constellation-wide coordination primitive. Tasks execute only when their value exceeds the current cost, enabling value-ordered load shedding without explicit policies. If local costs exceed a neighbor's, tasks are dynamically offloaded over inter-satellite links, achieving distributed load balancing without routing protocols or global state. We evaluate Equinox using a multi-day simulation of a 143-satellite constellation grounded in physical Jetson Orin Nano measurements. Equinox improves scientific goodput by 20% and image-processing throughput by 31% over priority-based scheduling while maintaining 2.2x higher mean battery reserves. Under high demand, Equinox achieves 5.2x the execution rate of static scheduling by gracefully shedding work rather than collapsing under contention.
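The single state-dependent marginal cost described in this abstract can be illustrated with a log-barrier over the satellite's headroom fractions: the cost stays small while resources are comfortable and diverges as any safety limit is approached, and a task runs only if its value exceeds the cost. The log-barrier form and the specific limits below are illustrative assumptions; Equinox's actual barrier function may differ:

```python
import math

def marginal_cost(battery, battery_min, temp, temp_max, queue, queue_max, kappa=1.0):
    """Barrier-style marginal execution cost (assumed log-barrier form):
    each headroom fraction lies in (0, 1]; the cost rises sharply as any
    headroom approaches zero."""
    headrooms = [
        (battery - battery_min) / (1.0 - battery_min),  # charge headroom
        (temp_max - temp) / temp_max,                    # thermal headroom
        (queue_max - queue) / queue_max,                 # backlog headroom
    ]
    return kappa * sum(-math.log(max(h, 1e-9)) for h in headrooms)

def admit(task_value, cost):
    """Value-ordered load shedding: execute only if value covers the cost."""
    return task_value >= cost
```

A comfortable state (90% charge, cool, near-empty queue) yields a low cost that admits ordinary tasks, while a stressed state (charge near the floor, hot, backed-up queue) yields a high cost that sheds all but the most valuable work.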
In a previous submission, we established a fundamental relation between tone networks and configurations. It was shown that the Eulerian tonnetz can be represented by a $\{12_3\}$ of Daublebsky von Sterneck type D222. We also constructed a tonnetz for Tristan-genus chords (dominant sevenths and half-diminished sevenths) and we showed that this tonnetz can be represented by a $\{12_3\}$ of type D228. In both of these constructions the associated Levi graphs play an important role. Here we look at the tonnetze associated with some other musical systems, thereby offering several concrete examples of an abstract view of music as combinatorial geometry. First, we look at the tonal harmonies typical of the classical period. In the case of diatonic triads, we show the existence of a bipartite graph of type $\{7_3\}$ and girth four that represents the well-known relations between the seven diatonic degrees and their pitch classes. In the case of diatonic seventh chords, we obtain a Fano configuration $\{7_3\}$ which gives a complete characterization of the voice-leading relations that hold between such chords. Next, we construct a tonnetz for pentatonic music based on the Desargues configuration $\{10_3\}$ and we construct a tonnetz for the 12-tone system based on the Cremona-Richmond configuration $\{15_3\}$. Both can be used as a resource for musical compositions. Finally, we show that the relation between the chromatic pitch class set and the major triad set is also represented by a D222. The minor triads are in one-to-one correspondence with the members of a certain class of hexacycles in the Levi graph of this configuration. In this way, the characteristic duality between major and minor triads in the tonnetz can be broken.
This paper presents a model-based reinforcement learning (RL) framework for optimal closed-loop control of nonlinear robotic systems. The proposed approach learns linear lifted dynamics through Koopman operator theory and integrates the resulting model into an actor-critic architecture for policy optimization, where the policy represents a parameterized closed-loop controller. To reduce computational cost and mitigate model rollout errors, policy gradients are estimated using one-step predictions of the learned dynamics rather than multi-step propagation. This leads to an online mini-batch policy gradient framework that enables policy improvement from streamed interaction data. The proposed framework is evaluated on several simulated nonlinear control benchmarks and two real-world hardware platforms, including a Kinova Gen3 robotic arm and a Unitree Go1 quadruped. Experimental results demonstrate improved sample efficiency over model-free RL baselines, superior control performance relative to model-based RL baselines, and control performance comparable to classical model-based methods that rely on exact system dynamics.
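The Koopman step in this framework can be illustrated with a toy EDMD fit: lift a scalar state through a small observable dictionary, solve a least-squares problem for the linear lifted dynamics, and use one-step predictions from the fitted model. The two-element dictionary and the linear toy system are illustrative assumptions; the paper's lifted coordinates and policy parameterization are richer:

```python
def lift(x):
    """Toy observable dictionary for a scalar state."""
    return [x, x * x]

def edmd_fit(xs):
    """Least-squares Koopman matrix K with K z_k ~= z_{k+1} for lifted pairs,
    solved via the 2x2 normal equations K = (Y Z^T)(Z Z^T)^{-1}."""
    G = [[0.0, 0.0], [0.0, 0.0]]   # Z Z^T
    A = [[0.0, 0.0], [0.0, 0.0]]   # Y Z^T
    for x0, x1 in zip(xs, xs[1:]):
        z, y = lift(x0), lift(x1)
        for i in range(2):
            for j in range(2):
                G[i][j] += z[i] * z[j]
                A[i][j] += y[i] * z[j]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    Ginv = [[G[1][1] / det, -G[0][1] / det],
            [-G[1][0] / det, G[0][0] / det]]
    return [[sum(A[i][k] * Ginv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def one_step_predict(K, x):
    """One-step prediction in the lifted space, the quantity the paper uses
    in place of multi-step rollouts for policy-gradient estimation."""
    z = lift(x)
    return [sum(K[i][j] * z[j] for j in range(2)) for i in range(2)]
```

For the linear map x_{k+1} = 0.9 x_k the fit recovers K = diag(0.9, 0.81), since this dictionary is closed under the dynamics.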
We study a finite-horizon covariance steering problem for discrete-time Markov jump linear systems (MJLS) with both state- and control-dependent multiplicative noise. The objective is to minimize a quadratic running cost while steering the system from given mode-conditioned initial means and covariances to a prescribed terminal mean and covariance. We first show that, without loss of generality, feasible controls may be represented by mode-dependent linear feedback together with feedforward and independent random components, and we highlight that, in contrast to the case without multiplicative noise, a purely affine state-feedback law does not in general suffice. To this end, we introduce a lifted-state formulation that embeds the mean and covariance information into a unified second-moment description, and we prove that the resulting lifted problem is equivalent to the original covariance steering problem formulation. This leads to a lossless relaxation in moment variables and an SDP reformulation for the unconstrained case. We further study chance-constrained covariance steering with ball and half-space constraints on the state and control, derive tractable sufficient convex surrogates, and establish an iterative reference-update scheme to reduce conservatism. Numerical experiments on a finance application illustrate our results.
We extend classical evolutionary game dynamics based on the momentary action choices of agents by accounting for two elements: forward-looking behavior and exploration cost. We focus on pairwise comparison protocols that cover major evolutionary game dynamics, such as replicator and logit models. In the proposed mathematical framework, agents update their actions by paying a cost so that a utility or its relative difference is maximized. We show that forward-looking behavior can be modeled as a coupling between the evolutionary game dynamic and a static Hamilton-Jacobi-Bellman equation: a mean field game. The exploration cost is naturally related to these equations through the optimal Lagrange multiplier, which serves as a relaxation parameter, and is incorporated into the game as a constraint. We show that under certain conditions, our evolutionary game dynamic admits a unique solution. Finally, we computationally investigate one- and two-dimensional problems.
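As a concrete instance of the evolutionary dynamics this abstract builds on, the classical replicator equation can be integrated with an explicit Euler step. This is a baseline sketch without the forward-looking mean-field-game coupling or the exploration cost introduced in the paper:

```python
def replicator_step(x, payoff, dt=0.01):
    """One explicit-Euler step of the replicator dynamics
    x_i' = x_i * (f_i - fbar), with f_i the expected payoff of action i."""
    n = len(x)
    f = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    fbar = sum(xi * fi for xi, fi in zip(x, f))  # population-average payoff
    return [xi + dt * xi * (fi - fbar) for xi, fi in zip(x, f)]
```

Note that the simplex is invariant even under the Euler step: the increments sum to dt * (fbar - fbar * sum(x)) = 0 whenever the state sums to one, so only the dominated strategy's share decays.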
Fundamental rate-distortion-perception (RDP) trade-offs arise in applications requiring maintained perceptual quality of reconstructed data, such as neural image compression. When compressed data is transmitted over public communication channels, security risks emerge. We therefore study secure RDP under negligible information leakage over both noiseless channels and broadcast channels (BCs) with correlated noise components. For noiseless channels, the exact secure RDP region is characterized. For BCs, an inner bound is derived and shown to be tight for a class of more-capable BCs. Separate source-channel coding is further shown to be optimal for this exact secure RDP region with unlimited common randomness available. Moreover, when both encoder and decoder have access to side information correlated with the source and the channel is noiseless, the exact RDP region is established. If only the decoder has correlated side information in the noiseless setting, an inner bound is derived along with a special case where the region is exact. Binary and Gaussian examples demonstrate that common randomness can significantly reduce the communication rate in secure RDP settings, unlike in standard rate-distortion settings. Thus, our results illustrate that random binning-based coding achieves strong secrecy, low distortion, and high perceptual quality simultaneously.
We study the rate-cost tradeoff in rate-limited control of general stochastic control systems, including nonlinear systems, over a finite horizon. At each time step, an encoder observes the state and transmits a description to a controller, which then selects the control action. For an average control-cost threshold $D$, we characterize the minimum achievable communication rate $R_n(D)$ via a nonasymptotic bound: $R_n(D)$ lies within an additive logarithmic gap of the optimal value of a directed-information minimization $F_n(D)$, namely, we show that $F_n(D) \le R_n(D) \le F_n(D)+\log \bigl(F_n(D)+3.4\bigr)+2+\frac{1}{n}$, in bits. This establishes directed information as the operationally relevant quantity governing rate-limited control, thereby broadening its utility beyond its previously established roles in causal source coding and linear quadratic Gaussian (LQG) control to general nonlinear control systems. We prove the upper bound constructively by building an encoding-and-control policy using the strong functional representation lemma at each time step. As special cases of our setting, our framework yields nonasymptotic bounds for sequential (causal) rate-distortion and LQG control.
In frequency division duplex massive multiple-input multiple-output systems, downlink channel state information must be fed back within a limited uplink budget. While transform coding with the Karhunen-Loeve transform and reverse water-filling is rate-distortion optimal for Gaussian channels, its performance is limited by basis mismatch between the user and base station. We analyze this mismatch and propose a practical architecture separating long-term basis feedback from short-term coefficient quantization. Using random vector quantization, we derive a closed-form end-to-end mean square error expression. This allows us to characterize the optimal rate split and identify a phase transition threshold for basis updates. Simulations on correlated Gaussian and COST2100 channels demonstrate near-optimal performance, robustness to update overhead, and significant complexity reduction compared to deep-learning-based autoencoders.
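The random-vector-quantization step admits a compact sketch: user and base station share a codebook of isotropically random unit vectors, and the user feeds back the index with the largest correlation to the channel direction, incurring a chordal (sin^2) distortion. This is a generic RVQ sketch, not the paper's full long-/short-term architecture:

```python
import math
import random

def rvq_codebook(n_ant, bits, rng):
    """Random vector quantization codebook: 2**bits i.i.d. isotropic unit
    vectors in C^n_ant, drawn from complex Gaussians and normalised."""
    book = []
    for _ in range(1 << bits):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_ant)]
        nrm = math.sqrt(sum(abs(x) ** 2 for x in v))
        book.append([x / nrm for x in v])
    return book

def quantize(h, book):
    """Pick the codeword maximising |<h, c>|; return its index and the
    chordal distortion 1 - |<h, c>|^2 (the sin^2 of the angle between them)."""
    nrm = math.sqrt(sum(abs(x) ** 2 for x in h))
    hbar = [x / nrm for x in h]
    corrs = [abs(sum(a.conjugate() * b for a, b in zip(hbar, c))) for c in book]
    idx = max(range(len(book)), key=lambda i: corrs[i])
    return idx, 1.0 - corrs[idx] ** 2
```

With B feedback bits and N antennas, the expected chordal distortion of RVQ scales roughly as 2^(-B/(N-1)), which is the quantity the closed-form MSE analysis builds on.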
This paper investigates the so-called reward-balancing methods, a novel class of algorithms for solving discounted-return reinforcement learning (RL) problems. These methods consist of iteratively adjusting the reward function to transform the RL problem into an equivalent one in which the optimal policies are greedy. For this procedure, referred to as the normalization process, we provide a theoretical analysis of the involved transformations, emphasizing their algebraic structure. Then, we introduce a control-theoretic reformulation, recasting the reward-balancing procedure into an optimal control framework. The approach is further extended to address model uncertainty through stochastic model sampling, yielding normalization guarantees and probabilistic bounds on stochastic fluctuations. Using the proposed optimal control framework within a scenario model predictive control (MPC) setting, we demonstrate, through simulation studies, performance improvements over the current state-of-the-art.
The ongoing shift towards decentralization of the electric energy sector, driven by the growing electrification across end-use sectors and the widespread adoption of distributed energy resources (DERs), necessitates their active participation in the electricity markets to support grid operations. Furthermore, with bi-directional energy and communication flows becoming standard, intelligent, easy-to-deploy, resource-conservative demand-side participation is expected to play a critical role in securing power grid operational flexibility and market efficiency. This work proposes a market engagement framework that leverages a hierarchical multi-agent deep reinforcement learning (MARL) approach to enable individual prosumers to participate in peer-to-peer retail auctions and further aggregate these intelligent prosumers to facilitate effective DER participation in wholesale markets. Ultimately, a Stackelberg game is proposed to coordinate this hierarchical MARL-based DER market participation framework toward enhanced market performance.
Direct satellite uplink is severely constrained by limited link budgets, which hinder the exploitation of wideband resources and ultimately limit the throughput. This paper presents a pilot-less coded modulation scheme based on sparse superposition coding (SSC) to enable efficient wideband usage in coverage-limited scenarios. This scheme leverages the structured Zadoff-Chu quasi-orthogonal (ZC-QO) dictionary to support scalable transmission. To address decoding complexity, the SSC transmitted signal embeds root index information via indicator sequences, allowing the receiver to restrict the decoding search space. In addition, a multi-codeword transmission framework with repetition and stop-feedback is developed, enabling reliable communication and better resource utilization. Simulation results show that the proposed scheme achieves throughput gains compared to a more conventional narrow-band multi-dimensional constellation-based approach.
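The structure the ZC-QO dictionary exploits can be illustrated with a plain Zadoff-Chu sequence: constant amplitude and ideal (zero) cyclic autocorrelation at all nonzero lags whenever the root index is coprime to an odd sequence length. The ZC-QO construction layers further structure on top of this basic property:

```python
import cmath

def zadoff_chu(root, length):
    """Zadoff-Chu sequence of odd length with the given root index:
    z[n] = exp(-j * pi * root * n * (n + 1) / length)."""
    return [cmath.exp(-1j * cmath.pi * root * n * (n + 1) / length)
            for n in range(length)]

def cyclic_autocorr(z, lag):
    """Normalised cyclic autocorrelation of a sequence at a given lag."""
    n = len(z)
    return sum(z[k] * z[(k + lag) % n].conjugate() for k in range(n)) / n
```

These two properties, flat envelope and impulse-like autocorrelation, are what make ZC roots attractive building blocks for a quasi-orthogonal SSC dictionary in power-limited uplinks.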
Digital twins of natural systems must remain aligned with physical systems that evolve over time, are only partially observed, and are typically modeled by mechanistic simulators whose parameters cannot be measured directly. In such settings, model adaptation is naturally posed as a simulation-based inference problem. However, sparse and indirect observations often fail to identify a unique and optimal calibration, leaving several simulator parameterizations compatible with the available evidence. This article presents a GFlowNet-based approach to model adaptation for digital twins of natural systems. We formulate adaptation as a generative modeling problem over complete simulator configurations, so that plausible parameterizations can be sampled with probability proportional to a reward derived from agreement between simulated and observed behavior. Using a controlled environment agriculture case study based on a mechanistic tomato model, we show that the learned policy recovers dominant regions of the adaptation landscape, retrieves strong calibration hypotheses, and preserves multiple plausible configurations under uncertainty.
Omnimodal Notation Processing (ONP) represents a unique frontier for omnimodal AI due to the rigorous, multi-dimensional alignment required across auditory, visual, and symbolic domains. Current research remains fragmented, focusing on isolated transcription tasks that fail to bridge the gap between superficial pattern recognition and the underlying musical logic. This landscape is further complicated by severe notation biases toward Western staff notation and the inherent unreliability of "LLM-as-a-judge" metrics, which often mask structural reasoning failures with systemic hallucinations. To establish a more rigorous standard, we introduce ONOTE, a multi-format benchmark that utilizes a deterministic pipeline--grounded in canonical pitch projection--to eliminate subjective scoring biases across diverse notation systems. Our evaluation of leading omnimodal models exposes a fundamental disconnect between perceptual accuracy and music-theoretic comprehension, providing a necessary framework for diagnosing reasoning vulnerabilities in complex, rule-constrained domains.
Autonomous systems that rely on learned perception can make unsafe decisions when sensor readings are misclassified. We study shielding for this setting: given a proposed action, a shield blocks actions that could violate safety. We consider the common case where system dynamics are known but perception uncertainty must be estimated from finite labeled data. From these data we build confidence intervals for the probabilities of perception outcomes and use them to model the system as a finite Interval Partially Observable Markov Decision Process with discrete states and actions. We then propose an algorithm to compute a conservative set of beliefs over the underlying state that is consistent with the observations seen so far. This enables us to construct a runtime shield that comes with a finite-horizon guarantee: with high probability over the training data, if the true perception uncertainty rates lie within the learned intervals, then every action admitted by the shield satisfies a stated lower bound on safety. Experiments on four case studies show that our shielding approach (and variants derived from it) improves the safety of the system over state-of-the-art baselines.
Accurate condition monitoring of industrial equipment requires inferring latent degradation parameters from indirect sensor measurements under uncertainty. While traditional Bayesian methods like Markov Chain Monte Carlo (MCMC) provide rigorous uncertainty quantification, their heavy computational bottlenecks render them impractical for real-time process control. To overcome this limitation, we propose an AI-driven framework utilizing Simulation-Based Inference (SBI) powered by amortized neural posterior estimation to diagnose complex failure modes in heat exchangers. By training neural density estimators on a simulated dataset, our approach learns a direct, likelihood-free mapping from thermal-fluid observations to the full posterior distribution of degradation parameters. We benchmark this framework against an MCMC baseline across various synthetic fouling and leakage scenarios, including challenging low-probability, sparse-event failures. The results show that SBI achieves comparable diagnostic accuracy and reliable uncertainty quantification, while accelerating inference time by a factor of 82$\times$ compared to traditional sampling. The amortized nature of the neural network enables near-instantaneous inference, establishing SBI as a highly scalable, real-time alternative for probabilistic fault diagnosis and digital twin realization in complex engineering systems.
The rapid collapse of decentralized game economies, often characterized by the \textit{death spiral}, remains the most formidable barrier to the mass adoption of Web3 gaming. This paper proposes that the sustainability of an open game economy is predicated on three necessary and sufficient conditions: Anti-Sybil Resilience, Anti-Capital Dominance, and Anti-Inflationary Saturation. The first section establishes a theoretical proof of these conditions, arguing that the absence of any single dimension leads to systemic failure. The second section explores the dialectical relationship between these dimensions, illustrating how unchecked automation and capital-driven monopolies accelerate asset hyperinflation. In the third section, we introduce the Identity-Bound Asset Integrity Model (IBAIM) as a comprehensive technical solution. IBAIM utilizes Zero-Knowledge (ZK) biometric hashing and Account Abstraction (AA) to anchor asset utility to unique human identities through a privacy-preserving and regulatory-compliant architecture. By exogenizing biometric verification to trusted local environments and utilizing Zero-Knowledge Proofs of Identity (zk-PoI), the model ensures absolute user privacy. Furthermore, by implementing an Asymmetric Utility Decay (AUD) engine, whereby assets suffer a vertical 50% utility cliff upon secondary transfer, and an entropy-driven thermodynamic degradation mechanism, the model successfully decouples financial speculation from in-game merit. Finally, we apply this framework to analyze prominent historical failures in the GameFi sector, demonstrating that their collapse was an inevitable consequence of violating these core economic constraints. Our findings suggest that trading a degree of asset liquidity for system integrity is the only viable path toward long-term economic viability in decentralized virtual worlds.
Self-excited limit-cycle oscillations (LCOs) from Hopf bifurcations are a key feature of nonlinear aeroelasticity and depend sensitively on structural and aerodynamic parameters. Classical center-manifold and normal-form theory describe this local behavior, but can be cumbersome to apply in large discretized models and standard reduced-order modeling (ROM) workflows. A renormalization-group (RG)-based reduction is developed that directly yields a Hopf-type amplitude equation on a local invariant manifold, specialized for polynomial nonlinearities in tensor-based discretizations and compatible with finite-element-type settings. The method provides explicit coefficients governing the Hopf threshold, criticality, and leading LCO amplitude/frequency trends, and admits a companion slow-manifold approximation with selected stable modes retained as static coordinates. Representative nonlinear-aeroelastic examples illustrate how the proposed framework supplies compact, parameter-aware Hopf/LCO descriptors suitable for local ROM construction near flutter.
Federated learning (FL) enables collaborative model training without sharing raw data; however, the presence of noisy labels across distributed clients can severely degrade the learning performance. In this paper, we propose FedSIR, a multi-stage framework for robust FL under noisy labels. Different from existing approaches that mainly rely on designing noise-tolerant loss functions or exploiting loss dynamics during training, our method leverages the spectral structure of client feature representations to identify and mitigate label noise. Our framework consists of three key components. First, we identify clean and noisy clients by analyzing the spectral consistency of class-wise feature subspaces with minimal communication overhead. Second, clean clients provide spectral references that enable noisy clients to relabel potentially corrupted samples using both dominant class directions and residual subspaces. Third, we employ a noise-aware training strategy that integrates logit-adjusted loss, knowledge distillation, and distance-aware aggregation to further stabilize federated optimization. Extensive experiments on standard FL benchmarks demonstrate that FedSIR consistently outperforms state-of-the-art methods for FL with noisy labels. The code is available at this https URL.
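To make "spectral consistency of class-wise feature subspaces" concrete, the sketch below estimates a per-client class subspace via SVD and scores agreement through the principal angles between subspaces. The function names and the mean-cosine score are illustrative assumptions, not FedSIR's actual criterion:

```python
import numpy as np

def class_subspace(features: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal directions of an (n_samples, dim) feature matrix."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # (dim, k) orthonormal basis

def spectral_consistency(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Mean cosine of the principal angles between two subspaces (1 = identical)."""
    s = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.clip(s, 0.0, 1.0).mean())

rng = np.random.default_rng(0)
shared = rng.normal(size=(2, 16))                 # common class structure
clean_a = rng.normal(size=(100, 2)) @ shared      # two clean clients ...
clean_b = rng.normal(size=(100, 2)) @ shared      # ... share one subspace
noisy = rng.normal(size=(100, 16))                # a corrupted client does not
ba, bb, bn = (class_subspace(f, k=2) for f in (clean_a, clean_b, noisy))
assert spectral_consistency(ba, bb) > spectral_consistency(ba, bn)
```

Clean clients whose features for a class span the same subspace score near 1, while a client whose labels (and hence class-conditional features) are corrupted scores lower, which is the kind of signal a spectral identification stage can threshold.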
Multi-agent systems, e.g., automobiles and UAVs (Unmanned Aerial Vehicles), rely on onboard sensors to accurately perceive their environment, which in turn depends on sensor precision and reliable in-field calibration. This paper introduces a novel targetless camera-LiDAR extrinsic calibration approach called Multi-FEAT (Multi-Feature Edge AlignmenT). Multi-FEAT uses the cylindrical projection model to encode the 3D LiDAR point cloud into a 2D panorama and exploits diverse LiDAR feature information in panoramic images to supplement the sparse LiDAR point cloud boundaries. Furthermore, camera edges are extracted using off-the-shelf segmentation solutions. In addition, a feature-matching function is designed to optimize the calibration parameters. The performance of the proposed Multi-FEAT algorithm is evaluated using the KITTI dataset, and our approach shows more reliable results than several existing targetless calibration methods. We conclude our analysis with directions for future work.
Objective: Cytology plays a crucial role in lung cancer diagnosis. Pulmonary cytology involves cell morphological characterization in the specimen and reporting the corresponding findings, which are extremely burdensome tasks. In this study, we propose a technique to generate cytologic findings from cytologic images to assist in the reporting of pulmonary cytology. Methods: For this study, 801 patch images were retrieved using cytology specimens collected from 206 patients; the findings were assigned to each image as a dataset for generating cytologic findings. The proposed method consists of a vision model and dual text decoders. In the former, a convolutional neural network (CNN) is used to classify a given image as benign or malignant, and the features related to the image are extracted from the intermediate layer. Independent text decoders for benign and malignant cells are prepared for text generation, and the text decoder switches according to the CNN classification results. The text decoder is configured using a Transformer that uses the features obtained from the CNN for generating findings. Results: The sensitivity and specificity were 100% and 96.4%, respectively, for automated benign and malignant case classification, and the saliency map indicated characteristic benign and malignant areas. The grammar and style of the generated texts were confirmed to be correct, achieving a BLEU-4 score of 0.828, reflecting a high degree of agreement with the gold standard and outperforming existing LLM-based image-captioning methods and a single-text-decoder ablation model. Conclusion: Experimental results indicate that the proposed method is useful for pulmonary cytology classification and generation of cytologic findings.
This paper investigates the economic impact of vehicle-home-grid integration through an online optimization algorithm that manages energy flows between an electric vehicle, a household, and the electrical grid. The algorithm exploits vehicle-to-home (V2H) for self-consumption and vehicle-to-grid (V2G) for energy trading, adapting in real-time via a hybrid long short-term memory (LSTM) network for household load prediction and a nonlinear battery degradation model including cycle and calendar aging. Simulations show annual economic benefits up to EUR 3046.81 compared to smart unidirectional charging, despite a modest 1.96% increase in battery aging. Even under unfavorable market conditions, with no V2G revenue, V2H alone provides yearly savings of EUR 425.48. Sensitivity analyses on battery capacity, household load, and price ratios confirm the consistent benefits of bidirectional energy exchange, highlighting the role of EVs as active energy nodes for sustainable management.
Millimeter-wave (mmWave) positioning has emerged as a promising technology for next-generation intelligent systems. The advent of reconfigurable intelligent surfaces (RISs) has revolutionized high-precision mmWave localization by enabling dynamic manipulation of wireless propagation environments. This paper investigates a three-dimensional (3D) multi-input single-output (MISO) mmWave positioning system assisted by multiple RISs. We introduce a measurement framework incorporating sequential RIS activation and directional beamforming to fully exploit virtual line-of-sight (VLoS) paths. The theoretical performance limits are rigorously analyzed through derivation of the Fisher information and subsequent positioning error bound (PEB). To minimize the PEB, two distinct optimization approaches are proposed for continuous and discrete phase shift configurations of RISs. For continuous phase shifts, a Riemannian manifold-based optimization algorithm is proposed. For discrete phase shifts, a heuristic algorithm incorporating the grey wolf optimizer is proposed. Extensive numerical simulations demonstrate the effectiveness of the proposed algorithms in reducing the PEB and validate the improvement in positioning accuracy achieved by multiple RISs.
This paper investigates a flexible intelligent metasurface (FIM)-enabled wireless communication system that integrates simultaneously transmitting and reflecting beyond diagonal reconfigurable intelligent surfaces (STAR-BD-RIS) with non-orthogonal multiple access (NOMA). The considered system consists of a multi-antenna FIM-assisted base station (BS) supported by dual-sector BD-RIS. The FIM is composed of low-cost radiating elements capable of independent signal transmission and dynamic vertical reconfiguration (morphing). The objective is to maximize energy efficiency (EE) by jointly optimizing the BS beamforming, STAR-BD-RIS configuration, NOMA-related variables, and the FIM surface shape under practical power constraints. Due to the highly non-convex nature of the problem, an adaptive inverse-weighted Meta-Soft Actor-Critic (AIW-Meta-SAC) algorithm is proposed. Unlike conventional Meta-SAC approaches, the proposed method employs an adaptive weighting mechanism to effectively incorporate system constraints into the reward function, thereby improving learning efficiency and convergence behavior. Simulation results demonstrate that the proposed AIW-Meta-SAC significantly outperforms the Meta-DDPG baseline. Furthermore, the FIM-assisted STAR-BD-RIS architecture achieves notable energy efficiency gains compared to conventional benchmark schemes.
Hadamard matrix-based aperture encoding is a method for producing synthetic aperture datasets with high Signal-to-Noise Ratios. Recently, the pulse inversion capabilities of bias-sensitive Top-Orthogonal to Bottom Electrode (TOBE) arrays have driven the development of multiple Hadamard-based sequences. These sequences produce high-quality static images but are sensitive to motion. This work introduces Recursive Aperture Decoded Imaging (READI) and Estimated Motion-Compensated Compounding (EMC2), which look to reduce this sensitivity. READI is a novel decoding and beamforming technique for Hadamard aperture-encoded sequences that produces multiple low-resolution images from subsets of the full sequence. These READI images are less affected by motion and sum to form the complete high-resolution image. EMC2 describes the process of comparing these low-resolution images to estimate the underlying motion, then warping them to align before compounding. This produces a high-resolution image that is resilient to motion. READI with EMC2 is applied to the TOBE-based Fast Orthogonal Row-Column Electronic Scanning (FORCES) sequence. It is shown to fully restore images corrupted by probe motion and to recover tissue speckle and boundaries in images of a beating heart phantom. READI low-resolution images by themselves are demonstrated to be a marked improvement over a sparse Hadamard scheme with the same transmit count, and are able to recover blood speckle at a flow rate of 42 cm/s.
Individualized therapy is driven forward by medical data analysis, which provides insight into the patient's context. In particular, for Type 1 Diabetes (T1D), which is an autoimmune disease, relationships between demographics, sensor data, and context can be analyzed. However, outliers, noisy data, and small data volumes cannot provide a reliable analysis. Hence, the research domain requires large volumes of high-quality data. Moreover, missing values can lead to information loss. To address this limitation, this study improves the data quality of DiaData, an integration of 15 separate datasets containing glucose values from 2510 subjects with T1D. Notably, we make the following contributions: 1) Outliers are identified with the interquartile range (IQR) approach and treated by replacing them with missing values. 2) Small gaps ($\le$ 25 min) are imputed with linear interpolation and larger gaps ($\ge$ 30 and $<$ 120 min) with Stineman interpolation. Based on a visual comparison, Stineman interpolation provides more realistic glucose estimates than linear interpolation for larger gaps. 3) After data cleaning, the correlation between glucose and heart rate is analyzed, yielding a moderate relation between 15 and 60 minutes before hypoglycemia ($\le$ 70 mg/dL). 4) Finally, a benchmark for hypoglycemia classification is provided with a state-of-the-art ResNet model. The model is trained with the Maindatabase and Subdatabase II of DiaData to classify hypoglycemia onset up to 2 hours in advance. Training with more data improves performance by 7% while using quality-refined data yields a 2-3% gain compared to raw data.
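Steps 1 and 2 of the cleaning pipeline can be sketched with pandas as below. This is a simplified illustration: the IQR multiplier and the synthetic CGM trace are assumptions, and Stineman interpolation for the larger gaps is omitted because it needs a dedicated implementation (pandas ships interpolators such as linear and PCHIP, but not Stineman):

```python
import numpy as np
import pandas as pd

def iqr_clean(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Step 1: replace IQR outliers with missing values (NaN)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.where(series.between(q1 - k * iqr, q3 + k * iqr))

# Toy 5-minute CGM grid with one sensor glitch (900 mg/dL) and one short gap.
t = pd.date_range("2024-01-01", periods=12, freq="5min")
glucose = pd.Series(
    [110, 112, 115, 900, 120, np.nan, np.nan, 128, 130, 131, 133, 134.0], index=t
)

cleaned = iqr_clean(glucose)
# Step 2 (small gaps only): limit=5 consecutive samples = 25 min on a 5-min grid.
imputed = cleaned.interpolate(method="linear", limit=5, limit_area="inside")
```

After cleaning, the glitch at the fourth sample becomes NaN and is then linearly imputed from its neighbors (115 and 120 give 117.5), matching the $\le$ 25 min rule in the abstract.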
Fully digital massive MIMO systems with large numbers (1000+) of antennas offer dramatically increased capacity gains from spatial multiplexing and beamforming. Designing digital receivers that can scale to these array dimensions presents significant challenges regarding both channel estimation overhead and digital computation. In the massive MIMO setting, long-term beamforming is widely-used since it offers significant reductions in both computation and channel estimation overhead. Long-term beamforming operates by projecting the data onto a low-dimensional subspace that can be tracked at a relatively slow time-scale from the long-term channel parameters. In this setting, we show how to optimally compute the projection matrix to maximize a capacity upper-bound using a matrix inverse square root. Computationally efficient methods are then presented to perform the matrix computation. The methods can be realized with matrix-matrix multiplies, making them amenable to systolic array implementations in hardware. Error analysis bounds on the degradation in the SINR for users are derived. Ray tracing simulations in a realistic rural uplink setting show minimal loss relative to complete instantaneous MMSE beamforming while offering significant overhead and computational gains.
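A matrix inverse square root really can be computed with matrix-matrix multiplies alone; one textbook choice (not necessarily the paper's exact method) is the coupled Newton-Schulz iteration, sketched here for a symmetric positive-definite matrix:

```python
import numpy as np

def inv_sqrt_newton_schulz(a: np.ndarray, iters: int = 30) -> np.ndarray:
    """A^{-1/2} for SPD A via coupled Newton-Schulz iteration.
    Uses only matrix-matrix multiplies, so it maps well onto systolic arrays."""
    n = a.shape[0]
    c = np.linalg.norm(a)       # Frobenius norm ensures the iteration converges
    y = a / c                   # y -> (A/c)^{1/2}
    z = np.eye(n)               # z -> (A/c)^{-1/2}
    eye3 = 3.0 * np.eye(n)
    for _ in range(iters):
        t = 0.5 * (eye3 - z @ y)
        y = y @ t
        z = t @ z
    return z / np.sqrt(c)       # undo the scaling: A^{-1/2} = (A/c)^{-1/2}/sqrt(c)

rng = np.random.default_rng(1)
m = rng.normal(size=(6, 6))
a = m @ m.T + 6 * np.eye(6)     # a well-conditioned SPD test matrix
x = inv_sqrt_newton_schulz(a)
assert np.allclose(x @ a @ x, np.eye(6), atol=1e-6)
```

The iteration converges quadratically once the scaled matrix is close to the identity, so a modest, fixed iteration count suffices for slowly varying long-term channel statistics.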
With the rapid deployments of 5G and 6G networks, accurate modeling of urban radio propagation has become critical for system design and network planning. However, conventional statistical or empirical models fail to fully capture the influence of detailed geometric features on site-specific channel variances in dense urban environments. In this paper, we propose a geometry map-based propagation channel model that directly extracts key parameters from a 3D geometry map and incorporates the Uniform Theory of Diffraction (UTD) to recursively compute multiple diffraction fields, thereby enabling accurate prediction of site-specific large-scale path loss and time-varying Doppler characteristics in urban scenarios. A well-designed identification algorithm is developed to efficiently detect buildings that significantly affect signal propagation. The proposed model is validated using urban measurement data, showing excellent agreement of path loss in both line-of-sight (LOS) and non-line-of-sight (NLOS) conditions. In particular, for NLOS scenarios with complex diffractions, it outperforms the 3GPP and simplified models, reducing the RMSE by 7.1 dB and 3.18 dB, respectively. Doppler analysis further demonstrates its accuracy in capturing time-varying propagation characteristics, confirming the scalability and generalization of the model in urban environments.
Model Predictive Control (MPC) has established itself as the primary methodology for constrained control, enabling autonomy across diverse applications. While model fidelity is crucial in MPC, solving the corresponding optimization problem in real time remains challenging when combining long horizons with high-fidelity models that capture both short-term dynamics and long-term behavior. Motivated by results on the Exponential Decay of Sensitivities (EDS), which imply that, under certain conditions, the influence of modeling inaccuracies decreases exponentially along the prediction horizon, this paper proposes a multi-timescale MPC scheme for fast-sampled control. Tailored to systems with both fast and slow dynamics, the proposed approach improves computational efficiency by i) switching to a reduced model that captures only the slow, dominant dynamics and ii) exponentially increasing integration step sizes to progressively reduce model detail along the horizon. We evaluate the method on three practically motivated robotic control problems in simulation and observe speed-ups of up to an order of magnitude.
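Ingredient (ii), exponentially increasing integration step sizes along the horizon, can be sketched in a few lines; the growth ratio and the clipping of the final step are illustrative choices, not the paper's exact schedule:

```python
def multirate_grid(dt0: float, ratio: float, horizon: float) -> list[float]:
    """Step sizes growing geometrically along the MPC prediction horizon,
    so the fast near-term dynamics get fine resolution and the tail is coarse."""
    steps, t, dt = [], 0.0, dt0
    while t < horizon:
        dt = min(dt, horizon - t)  # clip the last step to land on the horizon
        steps.append(dt)
        t += dt
        dt *= ratio

    return steps

grid = multirate_grid(dt0=0.01, ratio=1.5, horizon=2.0)
# A 2 s horizon is covered with roughly a dozen steps instead of 200 uniform
# 10 ms steps, shrinking the optimization problem accordingly.
assert abs(sum(grid) - 2.0) < 1e-9 and grid[1] > grid[0]
```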
This paper presents a safe output regulation control strategy for a class of systems modeled by a coupled $2\times 2$ hyperbolic PDE-ODE structure, subject to fully distributed disturbances throughout the system. A state-feedback controller is developed by the nonovershooting backstepping method to simultaneously achieve exponential output regulation and enforce safety constraints on the regulated output that is the state furthest from the control input. To handle unmeasurable states and external disturbances, a state observer and a disturbance estimator are designed. Explicit bounds on the estimation errors are derived and used to construct a robust safe regulator that accounts for the uncertainties. The proposed control scheme guarantees that: 1) If the regulated output is initially within the safe region, it remains there; otherwise, it will be rescued to safety within a prescribed time; 2) The output tracking error converges to zero exponentially; 3) The observer accurately estimates both the distributed states and external disturbances, with estimation errors converging to zero exponentially; 4) All signals in the closed-loop system remain bounded. The effectiveness of the proposed method is demonstrated through a UAV delivery scenario with a cable-suspended payload, where the payload is regulated to track a desired reference while avoiding collisions with barriers.
This letter seeks to clarify the different existing definitions of both instantaneous complex phase and frequency as well as their equivalence under standard modeling assumptions considered for transmission systems, i.e. balanced positive sequence operation, sole presence of electro-mechanical transient dynamics and absence of harmonics and interharmonics. To achieve this, the two fundamental definitions, i.e., those based on either the use of (i) analytic signals or (ii) space vectors, together with the premises used for their formulation, are presented and their relationship shown. Lastly, a unified notation and terminology to avoid confusion is proposed.
This paper studies the efficient implementation of safety filters that are designed using control barrier functions (CBFs), which minimally modify a nominal controller to render it safe with respect to a prescribed set of states. Although CBF-based safety filters are often implemented by solving a quadratic program (QP) in real time, the use of off-the-shelf solvers for such optimization problems poses a challenge in applications where control actions need to be computed efficiently at very high frequencies. In this paper, we introduce a closed-form expression for controllers obtained through CBF-based safety filters. This expression is obtained by partitioning the state-space into different regions, with a different closed-form solution in each region. We leverage this formula to introduce a resource-aware implementation of CBF-based safety filters that detects changes in the partition region and uses the closed-form expression between changes. We showcase the applicability of our approach in examples ranging from aerospace control to safe reinforcement learning.
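For a single affine constraint $a + b^\top u \ge 0$, the safety-filter QP has a well-known two-region closed form that matches the partition structure described above; the sketch below is this textbook special case, not the paper's general multi-region expression:

```python
import numpy as np

def cbf_safety_filter(u_nom: np.ndarray, a: float, b: np.ndarray) -> np.ndarray:
    """Closed form of  min ||u - u_nom||^2  s.t.  a + b^T u >= 0.
    Region 1: constraint inactive, nominal action passes through unchanged.
    Region 2: nominal action is projected onto the constraint boundary."""
    slack = a + b @ u_nom
    if slack >= 0:
        return u_nom
    return u_nom - (slack / (b @ b)) * b

# Toy numbers (illustrative): this nominal input violates the constraint ...
u = cbf_safety_filter(np.array([2.0]), a=-2.0, b=np.array([0.5]))
assert np.isclose(-2.0 + 0.5 * u[0], 0.0)  # ... so it lands on the boundary,
# ... while an already-safe nominal input is returned as-is.
assert np.allclose(cbf_safety_filter(np.array([1.0]), a=0.0, b=np.array([1.0])), [1.0])
```

Evaluating this expression is far cheaper than invoking a QP solver, and a resource-aware implementation only needs to re-derive the active region when the state crosses a partition boundary.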
Ultrasonic imaging methods often assume linear direct models, while in reality, many nonlinear phenomena are present, e.g., multiple reflections. A family of imaging methods called Full Waveform Inversion (FWI), which has been developed in the field of seismic imaging, uses full acoustic wave simulations as direct models, taking into account virtually all nonlinearities, which can ultimately enhance the accuracy of ultrasonic imaging. However, the problem of cycle skipping -- the existence of many local minima of the Least Squares (L2) misfit function due to the oscillatory nature of the signals -- is worsened when FWI is applied to ultrasound data because of a lack of low-frequency components. In this paper, we explore the use of the squared Wasserstein (W2) Optimal Transport Distance as the metric for the misfit between the acquired and the synthetic data, applying the method to Nondestructive Evaluation with ultrasonic phased arrays. An analytical continuous time-domain derivation of the adjoint acoustic field related to the W2 misfit is presented and used for the computation of the gradients. To cope with the computational burden of FWI, we apply a low-memory strategy that allows for the computation of the gradients without the storage of the full simulated fields. The GPU implementation of the method (in CUDA language) is detailed, and the source code is made available. Six prototypical cases are presented, and the corresponding sound speed maps are reconstructed with FWI using both the L2 and the W2 misfit functionals. In five of the six cases, the pixel-wise sum of squared errors obtained with W2 was at least one order of magnitude lower than that obtained with L2, with an increase in the gradient computation time not exceeding 2\%. The results highlight both the adequacy of the W2 misfit for ultrasonic FWI with phased arrays and its computational feasibility.
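In one dimension the squared W2 distance has a closed form through quantile functions, which is part of what makes it cheap enough to serve as an FWI misfit. The sketch below assumes the common recipe of treating each nonnegative trace as an unnormalized density; turning signed ultrasound signals into densities (squaring, shifting, etc.) is a separate modeling choice not shown here:

```python
import numpy as np

def w2_squared(f: np.ndarray, g: np.ndarray, x: np.ndarray) -> float:
    """Squared Wasserstein-2 distance between two nonnegative signals on grid x,
    each normalized to unit mass, via W2^2 = int_0^1 (F^{-1}(t) - G^{-1}(t))^2 dt."""
    cf = np.cumsum(f); cf = cf / cf[-1]           # empirical CDFs
    cg = np.cumsum(g); cg = cg / cg[-1]
    t = np.linspace(1e-6, 1 - 1e-6, 2000)         # quantile levels
    qf, qg = np.interp(t, cf, x), np.interp(t, cg, x)
    return float(np.mean((qf - qg) ** 2) * (t[-1] - t[0]))

x = np.linspace(-10.0, 10.0, 4001)
pulse = np.exp(-x**2)
shifted = np.exp(-(x - 2.0) ** 2)
# For a pure translation, W2^2 equals the squared shift (here 2^2 = 4) and grows
# convexly with the offset -- the property that mitigates cycle skipping, whereas
# the L2 misfit saturates once the pulses no longer overlap.
assert abs(w2_squared(pulse, shifted, x) - 4.0) < 0.05
assert w2_squared(pulse, pulse, x) == 0.0
```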
This paper introduces PowerDAG, an agentic AI system for automating complex distribution-grid analysis. We address the reliability challenges of state-of-the-art agentic systems in automating complex engineering workflows by introducing two innovative active mechanisms: adaptive retrieval, which uses a similarity-decay cutoff algorithm to dynamically select the most relevant annotated exemplars as context, and just-in-time (JIT) supervision, which actively intercepts and corrects tool-usage violations during execution. On a benchmark of unseen distribution grid analysis queries, PowerDAG achieves a 100% success rate with GPT-5.2 and 94.4--96.7% with smaller open-source models, outperforming base ReAct (41--88%), LangChain (30--90%), and CrewAI (9--41%) baselines by margins of 6--50 percentage points.
Zero-shot voice conversion (VC) aims to convert a source utterance into the voice of an unseen target speaker while preserving its linguistic content. Although recent systems have improved conversion quality, building zero-shot VC systems for interactive scenarios remains challenging because high-fidelity speaker transfer and low-latency streaming inference are difficult to achieve simultaneously. In this work, we present X-VC, a zero-shot streaming VC system that performs one-step conversion in the latent space of a pretrained neural codec. X-VC uses a dual-conditioning acoustic converter that jointly models source codec latents and frame-level acoustic conditions derived from target reference speech, while injecting utterance-level target speaker information through adaptive normalization. To reduce the mismatch between training and inference, we train the model with generated paired data and a role-assignment strategy that combines standard, reconstruction, and reversed modes. For streaming inference, we further adopt a chunkwise inference scheme with overlap smoothing that is aligned with the segment-based training paradigm of the codec. Experiments on Seed-TTS-Eval show that X-VC achieves the best streaming WER in both English and Chinese, strong speaker similarity in same-language and cross-lingual settings, and substantially lower offline real-time factor than the compared baselines. These results suggest that codec-space one-step conversion is a practical approach for building high-quality low-latency zero-shot VC systems. Our audio samples, code and checkpoints are released at this https URL.
Odrzywolek (2026) recently introduced the Exp-Minus-Log (EML) operator eml(x, y) = exp(x) - ln(y) and proved constructively that, paired with the constant 1, it generates the entire scientific-calculator basis of elementary functions; in this sense EML is to continuous mathematics what NAND is to Boolean logic. We investigate whether such a uniform single-operator representation can accelerate either the forward simulation or the parameter identification of a six-branch RC equivalent-circuit model (6rc ECM) of a lithium-ion battery cell. We give the analytical EML rewrite of the discretized state-space recursion, derive an exact operation count, and quantify the depth penalty of the master-formula construction used for gradient-based symbolic regression. Our analysis shows that direct EML simulation is slower than the classical exponential-Euler scheme (a ~25x instruction overhead per RC branch), but EML-based parametrization offers a structurally complete, gradient-differentiable basis that competes favourably with non-parametric DRT deconvolution and metaheuristic optimisation when the cardinality of RC branches is unknown a priori. We conclude with a concrete recommendation: use EML only on the parametrization side of the 6rc workflow, keeping the classical recursion at runtime.
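The operator itself is one line; the two identities below show why pairing it with the constant 1 is enough to start recovering elementary functions (the full constructive proof of universality is Odrzywolek's result and is not reproduced here):

```python
import math

def eml(x: float, y: float) -> float:
    """Exp-Minus-Log operator: eml(x, y) = exp(x) - ln(y)."""
    return math.exp(x) - math.log(y)

# exp falls out immediately, since ln(1) = 0:
assert eml(1.0, 1.0) == math.exp(1.0)
# and ln is recovered after cancelling the exponential term:
assert math.isclose(eml(0.5, 2.0) - eml(0.5, 1.0), -math.log(2.0))
```

Note that in the second identity the subtraction would itself have to be built from eml compositions inside the single-operator algebra; Python's minus sign is a shortcut for illustration.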
In high-noise environments such as factories, subways, and busy streets, capturing clear speech is challenging. Throat microphones can offer a solution because of their inherent noise-suppression capabilities; however, the passage of sound waves through skin and tissue attenuates high-frequency information, reducing speech clarity. Recent deep learning approaches have shown promise in enhancing throat microphone recordings, but further progress is constrained by the lack of a standard dataset. Here, we introduce the Throat and Acoustic Paired Speech (TAPS) dataset, a collection of paired utterances recorded from 60 native Korean speakers using throat and acoustic microphones. Furthermore, an optimal alignment approach was developed and applied to address the inherent signal mismatch between the two microphones. We tested three baseline deep learning models on the TAPS dataset and found mapping-based approaches to be superior for improving speech quality and restoring content. These findings demonstrate the TAPS dataset's utility for speech enhancement tasks and support its potential as a standard resource for advancing research in throat microphone-based applications.
Multichannel audio mixer and limiter designs are conventionally decoupled for content reproduction over loudspeaker arrays due to high computational complexity and run-time costs. We propose a coupled mixer-limiter-envelope design formulated as an efficient linearly constrained quadratic program that minimizes a distortion objective over multichannel gain variables subject to sample mixture constraints. Novel methods for asymmetric constant overlap-add window optimization, objective function approximation, and variable and constraint reduction are presented. Experiments demonstrate the distortion reduction of the coupled design, and the computational trade-offs required for efficient real-time processing.
Encoding static images into spike trains is a fundamental step for enabling Spiking Neural Networks (SNNs) to process visual information. However, widely used methods such as rate coding, Poisson encoding, and time-to-first-spike (TTFS) often neglect spatial correlations and produce temporally inconsistent spike patterns, limiting both efficiency and interpretability. In this work, we propose a novel cluster-based encoding framework that explicitly preserves semantic structure across both spatial and temporal domains. The method first introduces a 2D spatial clustering mechanism, which leverages connected component analysis and local density estimation to identify salient foreground regions. Building upon this, we extend the approach to a 3D spatio-temporal (ST3D) encoding scheme that incorporates temporal neighborhood information, generating spike trains with enhanced temporal coherence. Experiments on the N-MNIST dataset demonstrate that the proposed ST3D encoder achieves 98.17% classification accuracy using a simple single-layer SNN, outperforming conventional TTFS encoding (97.58%). Notably, this performance is achieved with significantly fewer spikes (3800 vs. 5000 per sample), highlighting improved efficiency without sacrificing accuracy. These results indicate that the proposed method provides an interpretable, structure-aware, and computationally efficient encoding strategy, offering strong potential for neuromorphic computing applications.
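For reference, the conventional TTFS baseline that the proposed ST3D encoder is compared against can be sketched as follows; the rounding scheme and the convention that zero-intensity pixels stay silent are implementation choices that vary across the literature:

```python
import numpy as np

def ttfs_encode(image: np.ndarray, t_max: int = 20) -> np.ndarray:
    """Time-to-first-spike encoding: brighter pixels spike earlier.
    Returns a (t_max, H, W) binary spike train with at most one spike per pixel."""
    norm = image.astype(float) / image.max()
    spike_t = np.round((1.0 - norm) * (t_max - 1)).astype(int)  # 0 = earliest
    train = np.zeros((t_max,) + image.shape, dtype=np.uint8)
    h, w = np.nonzero(norm > 0)          # zero-intensity pixels stay silent
    train[spike_t[h, w], h, w] = 1
    return train

img = np.array([[0, 128], [255, 64]], dtype=np.uint8)
spikes = ttfs_encode(img, t_max=10)
assert spikes.sum() == 3                 # one spike per nonzero pixel
assert spikes[0, 1, 0] == 1              # the brightest pixel fires first
```

Each pixel is encoded independently of its neighbors, which is exactly the lack of spatial correlation that the cluster-based and ST3D schemes above set out to fix.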
Autonomous nano-drones, powered by vision-based tiny machine learning (TinyML) models, are a novel technology gaining momentum thanks to their broad applicability, and they push scientific advancement on resource-limited embedded systems. Their small form factor, i.e., a few tens of grams, severely limits their onboard computational resources to sub-100mW microcontroller units (MCUs). The Bitcraze Crazyflie nano-drone is the de facto standard, offering a rich set of programmable MCUs for low-level control, multi-core processing, and radio transmission. However, roboticists very often underutilize these precious onboard resources due to the absence of a simple yet efficient software layer capable of time-optimal pipelining of multi-buffer image acquisition, multi-core computation, inter-MCU data exchange, and Wi-Fi streaming, leading to sub-optimal control performance. Our NanoCockpit framework aims to fill this gap, increasing throughput and minimizing system latency while simplifying the developer experience through coroutine-based multi-tasking. In-field experiments on three real-world TinyML nanorobotics applications show that our framework achieves ideal end-to-end latency, i.e., zero overhead due to serialized tasks, delivering quantifiable improvements in closed-loop control performance (-30% mean position error; mission success rate increased from 40% to 100%).
Progress in Type 1 Diabetes (T1D) algorithm development is limited by the fragmentation and lack of standardization across existing T1D management datasets. Current datasets differ substantially in structure and are time-consuming to access and process, which impedes data integration and reduces the comparability and generalizability of algorithmic developments. This work aims to establish a unified and accessible data resource for T1D algorithm development. Multiple publicly available T1D datasets were consolidated into a unified resource, termed the MetaboNet dataset. Inclusion required the availability of both continuous glucose monitoring (CGM) data and corresponding insulin pump dosing records. Additionally, auxiliary information such as reported carbohydrate intake and physical activity was retained when present. The MetaboNet dataset comprises 3135 subjects and 1228 patient-years of overlapping CGM and insulin data, making it substantially larger than existing standalone benchmark datasets. The resource is distributed as a fully public subset available for immediate download at this https URL, and a Data Use Agreement (DUA)-restricted subset accessible through the respective application processes. For the datasets in the latter subset, processing pipelines are provided to automatically convert the data into the standardized MetaboNet format. A consolidated public dataset for T1D research is presented, and the access pathways for both its unrestricted and DUA-governed components are described. The resulting dataset covers a broad range of glycemic profiles and demographics and thus can yield more generalizable algorithmic performance than individual datasets.
Multi-branch deep neural networks like AASIST3 achieve performance comparable to the state of the art in audio anti-spoofing, yet their internal decision dynamics remain opaque to traditional input-level saliency methods. While existing interpretability efforts largely focus on visualizing input artifacts, the way individual architectural branches cooperate or compete under different spoofing attacks is not well characterized. This paper develops a framework for interpreting AASIST3 at the component level. Intermediate activations from fourteen branches and global attention modules are modeled with covariance operators whose leading eigenvalues form low-dimensional spectral signatures. These signatures train a CatBoost meta-classifier to generate TreeSHAP-based branch attributions, which we convert into normalized contribution shares and confidence scores (C) to quantify the model's operational strategy. By analyzing 13 spoofing attacks from the ASVspoof 2019 benchmark, we identify four operational archetypes, ranging from Effective Specialization (e.g., A09, Equal Error Rate (EER) 0.04%, C=1.56) to Ineffective Consensus (e.g., A08, EER 3.14%, C=0.33). Crucially, our analysis exposes a Flawed Specialization mode in which the model places high confidence in an incorrect branch, leading to severe performance degradation for attacks A17 and A18 (EER 14.26% and 28.63%, respectively). These quantitative findings link internal architectural strategy directly to empirical reliability, highlighting specific structural dependencies that standard performance metrics overlook.
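The spectral-signature step can be sketched as follows. This is our reading of the described procedure, with made-up shapes: a branch's intermediate activations are summarized by the leading eigenvalues of their covariance, giving a compact per-branch descriptor for the meta-classifier.

```python
import numpy as np

# Sketch (our assumption of the described procedure): summarize one branch's
# intermediate activations by the leading eigenvalues of their covariance
# operator, yielding a low-dimensional "spectral signature" for that branch.
rng = np.random.default_rng(0)
acts = rng.standard_normal((200, 64))    # hypothetical activations: 200 frames x 64 dims

cov = np.cov(acts, rowvar=False)         # 64 x 64 empirical covariance operator
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues in descending order
signature = eigvals[:8]                  # keep the leading 8 as the signature

print(signature.shape)  # one 8-dimensional signature per branch
```

Stacking such signatures across the fourteen branches would give the feature vector on which a gradient-boosted meta-classifier (CatBoost in the paper) is trained and then attributed with TreeSHAP.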
We introduce LRLspoof, a large-scale multilingual synthetic-speech corpus for cross-lingual spoof detection, comprising 2,732 hours of audio generated with 24 open-source TTS systems across 66 languages, including 45 low-resource languages under our operational definition. To evaluate robustness without requiring target-domain bonafide speech, we benchmark 11 publicly available countermeasures using threshold transfer: for each model we calibrate an EER operating point on pooled external benchmarks and apply the resulting threshold, reporting spoof rejection rate (SRR). Results show model-dependent cross-lingual disparity, with spoof rejection varying markedly across languages even under controlled conditions, highlighting language as an independent source of domain shift in spoof detection. The dataset is publicly available at this https URL (HuggingFace) and this https URL (ModelScope).
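The threshold-transfer protocol can be sketched with synthetic scores. This is a minimal illustration under our own assumptions (Gaussian score distributions, higher score = bonafide), not the paper's implementation: an EER threshold is calibrated on an external set with both classes, then applied unchanged to target-language spoofs to compute SRR.

```python
import numpy as np

# Minimal sketch of threshold transfer (names and score model are ours):
# calibrate the EER threshold on external data, then apply it fixed to a
# shifted target domain and report the spoof rejection rate (SRR).
def eer_threshold(bona, spoof):
    # At the EER point, the false-acceptance rate on spoofs equals the
    # false-rejection rate on bonafide speech; sweep candidate thresholds.
    thresholds = np.sort(np.concatenate([bona, spoof]))
    far = np.array([(spoof >= t).mean() for t in thresholds])
    frr = np.array([(bona < t).mean() for t in thresholds])
    return thresholds[np.argmin(np.abs(far - frr))]

rng = np.random.default_rng(1)
cal_bona = rng.normal(2.0, 1.0, 1000)    # external-benchmark bonafide scores
cal_spoof = rng.normal(-2.0, 1.0, 1000)  # external-benchmark spoof scores
t = eer_threshold(cal_bona, cal_spoof)

target_spoof = rng.normal(-1.0, 1.0, 500)  # harder, shifted target language
srr = (target_spoof < t).mean()            # fraction of target spoofs rejected
print(round(t, 2), round(srr, 2))
```

Because the threshold is frozen at calibration time, any language-induced shift in the spoof score distribution shows up directly as a change in SRR, which is what makes per-language disparities visible without target-domain bonafide data.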
Large-scale three-dimensional (3D) scene reconstruction in low-altitude intelligent networks (LAIN) demands highly efficient wireless image transmission. However, existing schemes struggle to balance severe pilot overhead against the transmission accuracy required to maintain reconstruction fidelity. To strike this balance between efficiency and reliability, this paper proposes a novel deep learning-based end-to-end (E2E) transceiver design that integrates 3D Gaussian Splatting (3DGS) directly into the training process. By jointly optimizing the communication modules via the combined 3DGS rendering loss, our approach explicitly improves scene recovery quality. Furthermore, this task-driven framework enables the use of a sparse pilot scheme, significantly reducing transmission overhead while maintaining robust image recovery under low-altitude channel conditions. Extensive experiments on real-world aerial image datasets demonstrate that the proposed E2E design significantly outperforms existing baselines, delivering superior transmission performance and accurate 3D scene reconstructions.
Deep neural networks (DNNs) deliver state-of-the-art accuracy on regression and classification tasks, yet two structural deficits persistently obstruct their deployment in safety-critical, resource-constrained settings: (i) opacity of the learned function, which precludes formal verification, and (ii) reliance on heterogeneous, library-bound activation functions that inflate latency and silicon area on edge hardware. The recently introduced Exp-Minus-Log (EML) Sheffer operator, eml(x, y) = exp(x) - ln(y), was shown by Odrzywolek (2026) to be sufficient, together with the constant 1, to express every standard elementary function as a binary tree of identical nodes. We propose to embed EML primitives inside conventional DNN architectures, yielding a hybrid DNN-EML model in which the trunk learns distributed representations and the head is a depth-bounded, weight-sparse EML tree whose snapped weights collapse to closed-form symbolic sub-expressions. We derive the forward equations, prove computational-cost bounds, analyse inference and training acceleration relative to multilayer perceptrons (MLPs) and physics-informed neural networks (PINNs), and quantify the trade-offs for FPGA/analog deployment. We argue that the DNN-EML pairing closes a literature gap: prior neuro-symbolic and equation-learner approaches (EQL, KAN, AI-Feynman) work with heterogeneous primitive sets and do not exploit a single hardware-realisable Sheffer element. A balanced assessment shows that EML is unlikely to accelerate training, and on commodity CPU/GPU it is also unlikely to accelerate inference; however, on a custom EML cell (FPGA logic block or analog circuit) the asymptotic latency advantage can reach an order of magnitude with a simultaneous gain in interpretability and formal-verification tractability.
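The expressive claim about the EML operator can be illustrated with a small composition. The exact binary-tree constructions in the cited work may differ; this sketch only shows that, with the constant 1, the single primitive recovers exp, ln, and hence multiplication on positive inputs:

```python
import math

# The single Sheffer-style primitive from the abstract: eml(x, y) = exp(x) - ln(y).
def eml(x, y):
    return math.exp(x) - math.log(y)

# Illustrative compositions (tree forms in the cited work may differ):
#   exp(x) = eml(x, 1)          since ln(1) = 0
#   ln(y)  = 1 - eml(0, y)      since exp(0) = 1
#   x * y  = exp(ln x + ln y)   for x, y > 0, built entirely from eml nodes.
def mul(x, y):
    ln_x = 1.0 - eml(0.0, x)      # recover ln(x) from the primitive
    ln_y = 1.0 - eml(0.0, y)
    return eml(ln_x + ln_y, 1.0)  # exp of the sum; ln(1) contributes 0

print(mul(3.0, 4.0))  # ≈ 12.0
```

In the proposed hybrid model, such identical nodes form the depth-bounded head, which is what makes the snapped weights collapse into closed-form symbolic sub-expressions and makes a single-cell hardware realization plausible.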
Clinical ultrasound analysis demands models that generalize across heterogeneous organs, views, and devices while supporting interpretable workflow-level analysis. Existing methods often rely on task-wise adaptation, and joint learning may be unstable due to cross-task interference, making it hard to deliver workflow-level outputs in practice. To address these challenges, we present USTri, a tri-stage ultrasound intelligence pipeline for unified multi-organ, multi-task analysis. Stage I trains a universal generalist model, USGen, on diverse domains to learn broad, transferable priors that are robust to device and protocol variability. To better handle domain shifts and reach task-aligned performance while preserving shared ultrasound knowledge, Stage II builds USpec by keeping USGen frozen and finetuning dataset-specific heads. Stage III introduces USAgent, which mimics clinician workflows by orchestrating USpec specialists for multi-step inference and deterministic structured reports. On the FMC\_UIA validation set, our model achieves the best overall performance across 4 task types and 27 datasets, outperforming state-of-the-art methods. Moreover, qualitative results show that USAgent produces clinically structured reports with high accuracy and interpretability. Our study suggests a scalable path to ultrasound intelligence that generalizes across heterogeneous ultrasound tasks and supports consistent end-to-end clinical workflows. The code is publicly available at: this https URL.