New articles on Electrical Engineering and Systems Science


[1] 2505.09711

Efficient Near-Field Beam Focusing Merging Orthogonal Matching Pursuit and CVX for Large Intelligent Surface Applications

In this paper, an efficient near-field beamforming method is proposed to support the large intelligent surfaces (LIS) that are expected to be widely deployed in 6G networks. This approach avoids directly applying convex (CVX) optimization for sparse selection in large-size array matrices, as such methods often lead to excessive computational time due to blind searching to satisfy a series of objective functions. First, based on the objective function, we prioritize a key component and employ the orthogonal matching pursuit (OMP) method to pre-select potential sparse target positions. To ensure focal symmetry, a coordinate mirror symmetry approach is adopted, meaning that selection is performed only in the first quadrant, while the remaining quadrants are determined through mirror symmetry relative to the first quadrant. This significantly reduces computational complexity at an early stage. Next, CVX is applied based on the pre-selected sparse array. Once a predefined threshold is met, a solution is obtained that satisfies the constraints of the beamfocusing. The results demonstrate that, compared with conventional methods, this approach improves efficiency by 15.12 times with 121 elements and 96.73 times with 441 elements. The proposed method demonstrates not only satisfactory performance but also considerable potential as a beam focusing technique for large-scale near-field array systems.
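The coordinate mirror symmetry step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the 11 x 11 integer grid (121 elements, coordinates -5..5), the function name `mirror_to_all_quadrants`, and the pre-selected positions are all hypothetical, and the OMP and CVX stages themselves are not shown.

```python
from itertools import product


def mirror_to_all_quadrants(first_quadrant):
    """Reflect (x, y) positions with x >= 0 and y >= 0 across both axes."""
    mirrored = set()
    for x, y in first_quadrant:
        # Apply every sign combination: (+,+), (+,-), (-,+), (-,-).
        for sx, sy in product((1, -1), repeat=2):
            mirrored.add((sx * x, sy * y))
    return mirrored


# Hypothetical 11 x 11 array with integer coordinates -5..5 on each axis.
half = 5
# Only first-quadrant positions (axes included) are searched.
candidates = [(x, y) for x in range(half + 1) for y in range(half + 1)]

# Made-up pre-selection standing in for the OMP output.
selected = [(0, 0), (1, 2), (3, 0), (4, 4)]
full_selection = mirror_to_all_quadrants(selected)
```

Under these assumptions the search covers 36 candidate positions instead of 121, and points lying on an axis are mirrored without being duplicated because the result is a set.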


[2] 2505.09734

Risk-Aware Safe Reinforcement Learning for Control of Stochastic Linear Systems

This paper presents a risk-aware safe reinforcement learning (RL) control design for stochastic discrete-time linear systems. Rather than using a safety certifier to myopically intervene with the RL controller, a risk-informed safe controller is learned alongside the RL controller, and the two controllers are combined. This approach offers several advantages: 1) high-confidence safety can be certified using only the limited data available, without relying on a high-fidelity system model; 2) myopic interventions and convergence to an undesired equilibrium can be avoided by deciding on the contribution of the two stabilizing controllers; and 3) highly efficient and computationally tractable solutions can be provided by optimizing over a scalar decision variable and linear programming polyhedral sets. To learn safe controllers with a large invariant set, piecewise affine controllers are learned instead of linear controllers. To this end, the closed-loop system is first represented using collected data, a decision variable, and noise. The effect of the decision variable on the variance of the safety violation of the closed-loop system is formalized. The decision variable is then designed such that the probability of safety violation for the learned closed-loop system is minimized. It is shown that this control-oriented approach reduces the data requirements and can also reduce the variance of safety violations. Finally, to integrate the safe and RL controllers, a new data-driven interpolation technique is introduced. This method aims to maintain the RL agent's optimal implementation while ensuring its safety in noisy environments. The study concludes with a simulation example that validates the theoretical results.


[3] 2505.09751

FAS-LLM: Large Language Model-Based Channel Prediction for OTFS-Enabled Satellite-FAS Links

This paper proposes FAS-LLM, a novel large language model (LLM)-based architecture for predicting future channel states in Orthogonal Time Frequency Space (OTFS)-enabled satellite downlinks equipped with fluid antenna systems (FAS). The proposed method introduces a two-stage channel compression strategy combining reference-port selection and separable principal component analysis (PCA) to extract compact, delay-Doppler-aware representations from high-dimensional OTFS channels. These representations are then embedded into a LoRA-adapted LLM, enabling efficient time-series forecasting of channel coefficients. Performance evaluations demonstrate that FAS-LLM outperforms classical baselines including GRU, LSTM, and Transformer models, achieving up to 10 dB normalized mean squared error (NMSE) improvement and threefold root mean squared error (RMSE) reduction across prediction horizons. Furthermore, the predicted channels preserve key physical-layer characteristics, enabling near-optimal performance in ergodic capacity, spectral efficiency, and outage probability across a wide range of signal-to-noise ratios (SNRs). These results highlight the potential of LLM-based forecasting for delay-sensitive and energy-efficient link adaptation in future satellite IoT networks.


[4] 2505.09767

THz-Band Near-Field RIS Channel Modeling for Linear Channel Estimation

Reconfigurable intelligent surface (RIS)-aided terahertz (THz)-band communications are promising enablers for future wireless networks. However, array densification at high frequencies introduces significant challenges in accurate channel modeling and estimation, particularly with THz-specific fading, mutual coupling (MC), spatial correlation, and near-field effects. In this work, we model THz outdoor small-scale fading channels using the mixture gamma (MG) distribution, considering absorption losses, spherical wave propagation, MC, and spatial correlation across large base stations and RISs. We derive the distribution of the cascaded RIS-aided channel and investigate linear channel estimation techniques, analyzing the impact of various channel parameters. Numerical results based on precise THz parameters reveal that accounting for spatial correlation, MC, and near-field modeling substantially enhances estimation accuracy, especially in ultra-massive arrays and short-range scenarios. These results underscore the importance of incorporating these effects for precise, physically consistent channel modeling.


[5] 2505.09789

Implicit Neural Representation of Waveform Measurements in Power Systems Waveform Data Analysis

There is currently a paradigm shift in several power system monitoring applications, such as incipient fault detection and monitoring inverter-based resources, to transition from traditional phasor analytics to more informative waveform analytics. This paper contributes to this transition by developing a novel approach to modeling voltage and current waveform measurements using implicit neural representations (INRs). INRs are continuous function approximators that have recently been used in vision and signal processing. The proposed INR models are specifically designed to meet the requirements of waveform analytics in power systems, for example by using sinusoidal activation functions that capture the periodic nature of voltage and current waveforms. We also propose extended models that can efficiently represent correlated waveforms, such as three-phase waveforms and synchro-waveforms. Real-world case studies demonstrate the effectiveness of the proposed INR models in terms of accuracy (<1-2% MSE) and model size (4-6x compression). We also investigate the application of INR models in oscillation monitoring, for single mode oscillations and dual mode modulated oscillations.


[6] 2505.09817

Measuring Flexibility through Reduction Potential

While electric vehicles (EVs) often exhibit substantial flexibility, harnessing this flexibility requires precise characterization of its timing and magnitude. This paper introduces the reduction potential matrix, a novel approach to EV load flexibility modeling which is both straightforward to calculate and intuitive to interpret. This paper demonstrates the approach by quantifying flexibility for two distinct commercial vehicle groups--freight vehicles and transit buses--using simulated charging data from Virginia. While both groups are found to have substantial flexibility, its properties vary across the groups. Naturally, this variability manifests in differences in each group's role as a grid resource. The paper concludes with a discussion on how system planners, fleet operators, and other stakeholders can use the matrix to assess and leverage EV flexibility.


[7] 2505.09831

ImplicitStainer: Data-Efficient Medical Image Translation for Virtual Antibody-based Tissue Staining Using Local Implicit Functions

Hematoxylin and eosin (H&E) staining is a gold standard for microscopic diagnosis in pathology. However, H&E staining does not capture all the diagnostic information that may be needed. To obtain additional molecular information, immunohistochemical (IHC) stains highlight proteins that mark specific cell types, such as CD3 for T-cells or CK8/18 for epithelial cells. While IHC stains are vital for prognosis and treatment guidance, they are typically only available at specialized centers and time-consuming to acquire, leading to treatment delays for patients. Virtual staining, enabled by deep learning-based image translation models, provides a promising alternative by computationally generating IHC stains from H&E stained images. Although many GAN- and diffusion-based image-to-image (I2I) translation methods have been used for virtual staining, these models treat image patches as independent data points, which results in increased and more diverse data requirements for effective generation. We present ImplicitStainer, a novel approach that leverages local implicit functions to improve image translation, specifically virtual staining performance, by focusing on pixel-level predictions. This method enhances robustness to variations in dataset sizes, delivering high-quality results even with limited data. We validate our approach on two datasets using a comprehensive set of metrics and benchmark it against over fifteen state-of-the-art GAN- and diffusion-based models. Full code and trained models will be released publicly via GitHub upon acceptance.


[8] 2505.09870

Dynamic Beam-Stabilized, Additive-Printed Flexible Antenna Arrays with On-Chip Rapid Insight Generation

Conformal phased arrays promise shape-changing properties, multiple degrees of freedom to the scan angle, and novel applications in wearables, aerospace, defense, vehicles, and ships. However, they have suffered from two critical limitations. (1) Although most applications require on-the-move communication and sensing, prior conformal arrays have suffered from dynamic deformation-induced beam pointing errors. We introduce a Dynamic Beam-Stabilized (DBS) processor capable of beam adaptation through on-chip real-time control of fundamental gain, phase, and delay for each element. (2) Prior conformal arrays have leveraged additive printing to enhance flexibility, but conventional printable inks based on silver are expensive, and those based on copper suffer from spontaneous metal oxidation that alters trace impedance and degrades beamforming performance. We instead leverage a low-cost Copper Molecular Decomposition (CuMOD) ink with < 0.1% variation per degree C with temperature and strain and correct any residual deformity in real-time using the DBS processor. Demonstrating unified material and physical deformation correction, our CMOS DBS processor is low power, low-area, and easily scalable due to a tile architecture, thereby ideal for on-device implementations.


[9] 2505.09896

The Path Integral Bottleneck: Exploring the Control-Compute Tradeoff

Executing a control sequence requires some computation effort. Intuitively, a high-effort, fine-grained computation should result in better control (e.g. lower cost), whereas little to no computation effort would lead to worse control. To quantify and explore the tradeoff between control performance and compute effort, we present the Path Integral Bottleneck (PIB), a fusion of the Path Integral (PI) optimal control and Information Bottleneck (IB) frameworks. Both frameworks provide flexible and probabilistic descriptions of control. The PI does not limit itself to a particular control law, and the IB is not bound to any specific state encoding. Combining the generality of both frameworks enables us to produce an analytical description of the control-compute tradeoff. We provide PIB formulations for both continuous and discrete random variables. With these formulations, we can plot a tradeoff curve between performance and computation effort for any given plant description and control cost function. Simulations of a cart-pole for both the continuous and discrete variable cases reveal fundamental control-compute tradeoffs, exposing regions where the task performance-per-compute is higher than others.


[10] 2505.09897

Stability and Convergence Analysis of Multi-Agent Consensus with Communication Delays: A Lambert W Function Approach

This paper investigates the effect of constant time delay in weakly connected multi-agent systems modeled by double integrator dynamics. A novel analytical approach is proposed to establish an upper bound on the permissible time delay that ensures stability and consensus convergence. The analysis employs the Lambert W function method in higher-dimensional systems to derive explicit conditions under which consensus is achieved. The theoretical results are rigorously proven and provide insight into the allowable delay margins. The analysis applies to general leaderless undirected network topologies. The framework also accounts for complex and realistic delays, including non-commensurate communication delays. Numerical examples are provided to demonstrate the effectiveness of the proposed method.


[11] 2505.09917

Enriched K-Tier Heterogeneous Satellite Networks Model with User Association Policies

In the rapid evolution of non-terrestrial networks (NTNs), satellite communication has emerged as a focal area of research due to its critical role in enabling seamless global connectivity. In this paper, we investigate two representative user association policies (UAPs) for multi-tier heterogeneous satellite networks (HetSatNets), namely the nearest satellite UAP and the maximum signal-to-interference-plus-noise-ratio (max-SINR) satellite UAP, where each tier is characterized by a distinct constellation configuration and transmission pattern. Employing stochastic geometry, we analyze various intermediate system aspects, including the probability of a typical user accessing each satellite tier, the aggregated interference power, and their corresponding Laplace transforms (LTs) under both UAPs. Subsequently, we derive explicit expressions for coverage probability (CP), non-handover probability (NHP), and time delay outage probability (DOP) of the typical user. Furthermore, we propose a novel weighted metric (WM) that integrates CP, NHP, and DOP to explore their trade-offs in the system design. The robustness of the theoretical framework is verified through Monte Carlo simulations calibrated with the actual Starlink constellation, affirming the precision of our analytical approach. The empirical findings underscore an optimal UAP in various HetSatNet scenarios regarding CP, NHP, and DOP.


[12] 2505.09972

Who Said What WSW 2.0? Enhanced Automated Analysis of Preschool Classroom Speech

This paper introduces WSW2.0, an automated framework for analyzing vocal interactions in preschool classrooms that enhances both accuracy and scalability through the integration of wav2vec2-based speaker classification and Whisper (large-v2 and large-v3) speech transcription. A total of 235 minutes of audio recordings (160 minutes from 12 children and 75 minutes from 5 teachers) were used to compare system outputs to expert human annotations. WSW2.0 achieves a weighted F1 score of .845, accuracy of .846, and an error-corrected kappa of .672 for speaker classification (child vs. teacher). Transcription quality is moderate to high, with word error rates of .119 for teachers and .238 for children. WSW2.0 exhibits relatively high absolute agreement intraclass correlations (ICC) with expert transcriptions for a range of classroom language features. These include teacher and child mean utterance length, lexical diversity, question asking, and responses to questions and other utterances, which show absolute agreement intraclass correlations between .64 and .98. To establish scalability, we apply the framework to an extensive dataset spanning two years and over 1,592 hours of classroom audio recordings, demonstrating the framework's robustness for broad real-world applications. These findings highlight the potential of deep learning and natural language processing techniques to revolutionize educational research by providing accurate measures of key features of preschool classroom speech, ultimately guiding more effective intervention strategies and supporting early childhood language development.
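For readers unfamiliar with the word error rate figures quoted above, the following is a minimal sketch of the standard definition (substitutions + deletions + insertions, divided by the number of reference words). The abstract does not say how the authors computed or normalized their scores, so the function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made purely for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

On this reading, a word error rate of .119 corresponds to roughly 12 word-level errors per 100 reference words.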


[13] 2505.09980

Event-Triggered Synergistic Controllers with Dwell-Time Transmission

We propose novel event-triggered synergistic controllers for nonlinear continuous-time plants by incorporating event-triggered control into stabilizing synergistic controllers. We highlight that a naive application of common event-triggering conditions may not ensure dwell-time transmission due to the joint jumping dynamics of the closed-loop system. Under mild conditions, we develop a suite of event-triggered synergistic controllers that guarantee both dwell-time transmission and global asymptotic stability. Through numerical simulations, we demonstrate the effectiveness of our controller applied to the problem of rigid body attitude stabilization.


[14] 2505.09985

Ordered-subsets Multi-diffusion Model for Sparse-view CT Reconstruction

Score-based diffusion models have shown significant promise in the field of sparse-view CT reconstruction. However, the projection dataset is large and riddled with redundancy. Consequently, applying the diffusion model to unprocessed data results in lower learning effectiveness and higher learning difficulty, frequently leading to reconstructed images that lack fine details. To address these issues, we propose the ordered-subsets multi-diffusion model (OSMM) for sparse-view CT reconstruction. The OSMM innovatively divides the CT projection data into equal subsets and employs multi-subsets diffusion model (MSDM) to learn from each subset independently. This targeted learning approach reduces complexity and enhances the reconstruction of fine details. Furthermore, the integration of one-whole diffusion model (OWDM) with complete sinogram data acts as a global information constraint, which can reduce the possibility of generating erroneous or inconsistent sinogram information. Moreover, the OSMM's unsupervised learning framework provides strong robustness and generalizability, adapting seamlessly to varying sparsity levels of CT sinograms. This ensures consistent and reliable performance across different clinical scenarios. Experimental results demonstrate that OSMM outperforms traditional diffusion models in terms of image quality and noise resilience, offering a powerful and versatile solution for advanced CT imaging in sparse-view scenarios.
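The subset-splitting step can be illustrated with a short sketch. The abstract only states that the projection data are divided into equal subsets; the interleaved assignment of view indices used here is the convention familiar from ordered-subsets reconstruction and is an assumption, as are the function name `ordered_subsets` and the view counts.

```python
def ordered_subsets(num_views, num_subsets):
    """Split view indices 0..num_views-1 into equal, interleaved subsets."""
    if num_subsets <= 0 or num_views % num_subsets != 0:
        raise ValueError("num_views must be a positive multiple of num_subsets")
    # Subset k takes views k, k + num_subsets, k + 2 * num_subsets, ...
    return [list(range(start, num_views, num_subsets))
            for start in range(num_subsets)]


# Hypothetical example: 12 projection views split into 3 subsets.
subsets = ordered_subsets(12, 3)
```

With interleaving, every subset spans the full angular range at a coarser sampling, which is one plausible reason to prefer it over contiguous blocks of views.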


[15] 2505.09987

Provably safe and human-like car-following behaviors: Part 1. Analysis of phases and dynamics in standard models

Trajectory planning is essential for ensuring safe driving in the face of uncertainties related to communication, sensing, and dynamic factors such as weather, road conditions, policies, and other road users. Existing car-following models often lack rigorous safety proofs and the ability to replicate human-like driving behaviors consistently. This article applies multi-phase dynamical systems analysis to well-known car-following models to highlight the characteristics and limitations of existing approaches. We begin by formulating fundamental principles for safe and human-like car-following behaviors, which include zeroth-order principles for comfort and minimum jam spacings, first-order principles for speeds and time gaps, and second-order principles for comfort acceleration/deceleration bounds as well as braking profiles. From a set of these zeroth- and first-order principles, we derive Newell's simplified car-following model. Subsequently, we analyze phases within the speed-spacing plane for the stationary lead-vehicle problem in Newell's model and its extensions, which incorporate both bounded acceleration and deceleration. We then analyze the performance of the Intelligent Driver Model and the Gipps model. Through this analysis, we highlight the limitations of these models with respect to some of the aforementioned principles. Numerical simulations and empirical observations validate the theoretical insights. Finally, we discuss future research directions to further integrate safety, human-like behaviors, and vehicular automation in car-following models, which are addressed in Part 2 of this study \citep{jin2025WA20-02_Part2}, where we develop a novel multi-phase projection-based car-following model that addresses the limitations identified here.
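As a rough illustration of the stationary lead-vehicle problem mentioned above, the sketch below uses the commonly cited speed-spacing form of Newell's simplified model, in which speed is the smaller of the free-flow speed and (spacing - jam spacing) / time gap. The parameter values, the function names, and the forward-Euler time stepping are assumptions chosen for illustration and are not taken from the paper.

```python
def newell_speed(spacing, free_speed=30.0, jam_spacing=7.0, time_gap=1.5):
    """Speed (m/s) as a function of spacing (m) in Newell's simplified model."""
    congested_speed = (spacing - jam_spacing) / time_gap
    return max(0.0, min(free_speed, congested_speed))


def simulate_stationary_leader(lead_position=500.0, steps=2000, dt=0.1):
    """Follower starts at x = 0 and approaches a stopped leader."""
    x = 0.0
    for _ in range(steps):
        # Forward Euler step; dt is kept below the time gap so the
        # follower cannot overshoot the jam spacing.
        x += newell_speed(lead_position - x) * dt
    return lead_position - x


final_spacing = simulate_stationary_leader()
```

In this sketch the follower drives at free-flow speed, then decelerates as the spacing shrinks and settles at the jam spacing. Nothing here bounds the deceleration, which is the kind of limitation the analysis of bounded-acceleration extensions addresses.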


[16] 2505.09988

Provably safe and human-like car-following behaviors: Part 2. A parsimonious multi-phase model with projected braking

Ensuring safe and human-like trajectory planning for automated vehicles amidst real-world uncertainties remains a critical challenge. While existing car-following models often struggle to consistently provide rigorous safety proofs alongside human-like acceleration and deceleration patterns, we introduce a novel multi-phase projection-based car-following model. This model is designed to balance safety and performance by incorporating bounded acceleration and deceleration rates while emulating key human driving principles. Building upon a foundation of fundamental driving principles and a multi-phase dynamical systems analysis (detailed in Part 1 of this study \citep{jin2025WA20-02_Part1}), we first highlight the limitations of extending standard models like Newell's with simple bounded deceleration. Inspired by human drivers' anticipatory behavior, we mathematically define and analyze projected braking profiles for both leader and follower vehicles, establishing safety criteria and new phase definitions based on the projected braking lead-vehicle problem. The proposed parsimonious model combines an extended Newell's model for nominal driving with a new control law for scenarios requiring projected braking. Using speed-spacing phase plane analysis, we provide rigorous mathematical proofs of the model's adherence to defined safe and human-like driving principles, including collision-free operation, bounded deceleration, and acceptable safe stopping distance, under reasonable initial conditions. Numerical simulations validate the model's superior performance in achieving both safety and human-like braking profiles for the stationary lead-vehicle problem. Finally, we discuss the model's implications and future research directions.


[17] 2505.10015

Constrained Multimodal Sensing-Aided Communications: A Dynamic Beamforming Design

Using multimodal sensory data can enhance communications systems by reducing the overhead and latency in beam training. However, processing such data incurs high computational complexity, and continuous sensing results in significant power and bandwidth consumption. This gives rise to a tradeoff between the (multimodal) sensing data acquisition rate and communications performance. In this work, we develop a constrained multimodal sensing-aided communications framework where dynamic sensing and beamforming are performed under a sensing budget. Specifically, we formulate an optimization problem that maximizes the average received signal-to-noise ratio (SNR) of user equipment, subject to constraints on the average number of sensing actions and power budget. Using the Saleh-Valenzuela mmWave channel model, we construct the channel primarily based on position information obtained via multimodal sensing. Stricter sensing constraints reduce the availability of position data, leading to degraded channel estimation and thus lower performance. We apply Lyapunov optimization to solve the problem and derive a dynamic sensing and beamforming algorithm. Numerical evaluations on the DeepSense and Raymobtime datasets show that halving sensing times leads to only up to 7.7% loss in average SNR.
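The role of Lyapunov optimization under an average sensing budget can be sketched with a generic drift-plus-penalty rule and a single virtual queue. This is not the paper's algorithm: the constant SNR gain, the tradeoff weight, the omission of beamforming and the power constraint, and all names below are assumptions made to keep the example small.

```python
def drift_plus_penalty(snr_gain, budget, tradeoff, horizon):
    """Decide per slot whether to sense, subject to an average sensing budget.

    snr_gain(t): hypothetical SNR benefit of sensing in slot t.
    budget:      allowed long-run fraction of slots with sensing.
    tradeoff:    weight V on performance versus constraint backlog.
    """
    queue = 0.0  # virtual queue tracking accumulated budget violation
    decisions = []
    for t in range(horizon):
        # Sense only if the weighted gain outweighs the current backlog.
        sense = 1 if tradeoff * snr_gain(t) > queue else 0
        decisions.append(sense)
        # The queue grows when sensing exceeds the budget and drains otherwise.
        queue = max(queue + sense - budget, 0.0)
    return decisions, queue


decisions, final_queue = drift_plus_penalty(
    snr_gain=lambda t: 1.0, budget=0.5, tradeoff=10.0, horizon=1000
)
average_rate = sum(decisions) / len(decisions)
```

Because the virtual queue stays bounded, the average sensing rate can exceed the budget by at most the final queue length divided by the horizon. A larger tradeoff weight favors SNR early on at the cost of a longer transient.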


[18] 2505.10020

Threshold Strategy for Leaking Corner-Free Hamilton-Jacobi Reachability with Decomposed Computations

Hamilton-Jacobi (HJ) Reachability is widely used to compute value functions for states satisfying specific control objectives. However, it becomes intractable for high-dimensional problems due to the curse of dimensionality. Dimensionality reduction approaches are essential for mitigating this challenge, but they can introduce the "leaking corner issue", leading to inaccuracies in the results. In this paper, we define the "leaking corner issue" in terms of value functions, and we propose and prove a necessary condition for its occurrence. We then use these theoretical contributions to introduce a new local updating method that efficiently corrects inaccurate value functions while maintaining the computational efficiency of the dimensionality reduction approaches. We demonstrate the effectiveness of our method through numerical simulations. Although we validate our method with the self-contained subsystem decomposition (SCSD), our approach is applicable to other dimensionality reduction techniques that introduce "leaking corners".


[19] 2505.10048

Planar Herding of Multiple Evaders with a Single Herder

A planar herding problem is considered, where a superior pursuer herds a flock of non-cooperative, inferior evaders around a predefined target point. An inverse square law of repulsion is assumed between the pursuer and each evader. Two classes of pursuer trajectories are proposed: (i) a constant angular-velocity spiral, and (ii) a constant angular-velocity circle, both centered around the target point. For the spiraling pursuer, the radial velocity is dynamically adjusted based on a feedback law that depends on the instantaneous position of the evader, which is located at the farthest distance from the target at the start of the game. It is shown that, under suitable choices of the model parameters, all the evaders are herded into an arbitrarily small limit cycle around the target point. Meanwhile, the pursuer also converges onto a circular trajectory around the target. The conditions for the stability of these limit cycles are derived. For the circling pursuer, similar guarantees are provided along with explicit formulas for the radii of the limit cycles.


[20] 2505.10053

Range Resolution of Near-field MIMO Sensing

The radiative near-field and integration of sensing capabilities are seen as two key components of the next generation of wireless communication systems. In this paper, the sensing performance of a narrowband near-field system is investigated for several practical antenna array geometries and configurations, namely SIMO/MISO and MIMO. In the SIMO/MISO configuration, the antenna aperture is exploited only a single time for either transmit or receive signal processing, while the MIMO configuration exploits both TX and RX processing. Analytical derivations, supported by simulations, show that the MIMO processing improves the maximum near-field range and sensing resolution by approximately a factor of 1.4 as compared to single-aperture systems. The value of the improvement factor is consistent for all considered array geometries. Finally, using a quadratic approximation of the array factor, an analytical improvement factor of $\sqrt{2}$ is derived, clarifying the observed improvements and validating the numerical results.


[21] 2505.10059

Improving Power Systems Controllability via Edge Centrality Measures

Improving the controllability of power networks is crucial as they are highly complex networks operating in synchrony; even minor perturbations can cause desynchronization and instability. To that end, one needs to assess the criticality of key network components (buses and lines) in terms of their impact on system performance. Traditional methods to identify the key nodes/edges in power networks often rely on static centrality measures based on the network's topological structure ignoring the network's dynamic behavior. In this paper, using multi-machine power network models and a new control-theoretic edge centrality matrix (ECM) approach, we: (i) quantify the influence of edges (i.e., the line susceptances) in terms of controllability performance metrics, (ii) identify the most influential lines, and (iii) compute near-optimal edge modifications that improve the power network controllability. Employing various IEEE power network benchmarks, we validate the effectiveness of the ECM-based algorithm and demonstrate improvements in system reachability, control, and damping performance.


[22] 2505.10085

DB InfraGO's Automated Dispatching Assistant ADA-PMB

As a railway infrastructure manager, DB InfraGO AG faces the challenge of offering fluid and punctual operation despite rising demand and increased construction activity. The high capacity utilisation, especially in the core network sections, causes delays to propagate quickly and widely across the entire network. Until now, conflicts between train runs could be identified automatically, but dispatching measures have been based on past human experience. An automated dispatching assistance system is currently being piloted to support train dispatchers in their work. The aim is to offer them helpful dispatching recommendations, particularly in stressful situations with a high conflict density in the network section under consideration, in order to ensure the most efficient operation of the system. The recommendations are currently displayed separately alongside the central control system. In future, they will be integrated into the central control system, which will significantly simplify communication between the train dispatcher and signal setter. Further development steps for the integration process are also presented and discussed.


[23] 2505.10129

Angle diversity receiver as a key enabler for reliable ORIS-based Visible Light Communication

Visible Light Communication (VLC) offers a promising solution to satisfy the increasing demand for wireless data. However, link blockages remain a significant challenge. This paper addresses this issue by investigating the combined use of angle diversity receivers (ADRs) and optical reconfigurable intelligent surfaces (ORISs) in multiuser VLC systems. We consider ORIS elements as small movable mirrors. We demonstrate the complementarity of ADR and ORIS in mitigating link blockages, as well as the advantages of using a larger number of ORIS elements due to the increased field-of-view (FoV) at the receiver enabled by the ADR. An optimization algorithm is proposed to maximize the minimum signal-to-noise power ratio (SNR) to deploy a fair communication network. Numerical results show that integrating ADR and ORIS significantly enhances VLC communication performance, achieving an SNR gain of up to 30 dB compared to a system without ORIS, and mitigating communication outages produced by link blockages or out-of-FoV received signals. We also prove that an ADR with a single tier of photodiodes is sufficient to complement ORIS-assisted VLC.


[24] 2505.10134

Large Wireless Localization Model (LWLM): A Foundation Model for Positioning in 6G Networks

Accurate and robust localization is a critical enabler for emerging 5G and 6G applications, including autonomous driving, extended reality (XR), and smart manufacturing. While data-driven approaches have shown promise, most existing models require large amounts of labeled data and struggle to generalize across deployment scenarios and wireless configurations. To address these limitations, we propose a foundation-model-based solution tailored for wireless localization. We first analyze how different self-supervised learning (SSL) tasks acquire general-purpose and task-specific semantic features based on information bottleneck (IB) theory. Building on this foundation, we design a pretraining methodology for the proposed Large Wireless Localization Model (LWLM). Specifically, we propose an SSL framework that jointly optimizes three complementary objectives: (i) spatial-frequency masked channel modeling (SF-MCM), (ii) domain-transformation invariance (DTI), and (iii) position-invariant contrastive learning (PICL). These objectives jointly capture the underlying semantics of the wireless channel from multiple perspectives. We further design lightweight decoders for key downstream tasks, including time-of-arrival (ToA) estimation, angle-of-arrival (AoA) estimation, single base station (BS) localization, and multiple BS localization. Comprehensive experimental results confirm that LWLM consistently surpasses both model-based and supervised learning baselines across all localization tasks. In particular, LWLM achieves 26.0%--87.5% improvement over transformer models without pretraining, and exhibits strong generalization under label-limited fine-tuning and unseen BS configurations, confirming its potential as a foundation model for wireless localization.


[25] 2505.10150

CFARNet: Learning-Based High-Resolution Multi-Target Detection for Rainbow Beam Radar

Millimeter-wave (mmWave) OFDM radar equipped with rainbow beamforming, enabled by joint phase-time arrays (JPTAs), provides wide-angle coverage and is well-suited for fast real-time target detection and tracking. However, accurate detection of multiple closely spaced targets remains a key challenge for conventional signal processing pipelines, particularly those relying on constant false alarm rate (CFAR) detectors. This paper presents CFARNet, a learning-based processing framework that replaces CFAR with a convolutional neural network (CNN) for peak detection in the angle-Doppler domain. The network predicts target subcarrier indices, which guide angle estimation via a known frequency-angle mapping and enable high-resolution range and velocity estimation using the MUSIC algorithm. Extensive simulations demonstrate that CFARNet significantly outperforms a CFAR+MUSIC baseline, especially under low transmit power and dense multi-target conditions. The proposed method offers superior angular resolution, enhanced robustness in low-SNR scenarios, and improved computational efficiency, highlighting the potential of data-driven approaches for high-resolution mmWave radar sensing.
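
For context on the baseline this abstract compares against, the following is a minimal sketch of a one-dimensional cell-averaging CFAR detector. It is not the paper's pipeline: the function name `ca_cfar`, the parameters `num_train`, `num_guard`, and `scale`, and the toy power profile are all assumptions made for illustration.

```python
def ca_cfar(power, num_train=4, num_guard=1, scale=4.0):
    """Cell-averaging CFAR on a 1D power profile (illustrative sketch).

    For each cell under test, the noise level is estimated as the mean of
    up to `num_train` training cells on each side, skipping `num_guard`
    guard cells next to the cell under test. A detection is declared when
    the cell's power exceeds `scale` times that estimate.
    """
    detections = []
    n = len(power)
    for i in range(n):
        train = []
        for offset in range(num_guard + 1, num_guard + num_train + 1):
            for j in (i - offset, i + offset):
                if 0 <= j < n:  # ignore training cells outside the profile
                    train.append(power[j])
        if not train:
            continue
        noise = sum(train) / len(train)
        if power[i] > scale * noise:
            detections.append(i)
    return detections


# Hypothetical profile: flat noise floor with one strong cell at index 10.
profile = [1.0] * 20
profile[10] = 20.0
print(ca_cfar(profile))
```

Because the threshold is derived from neighboring cells, a strong target that falls inside another target's training window raises that target's noise estimate, which is one reason closely spaced targets are hard for this kind of detector.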


[26] 2505.10174

Subspace-Based Super-Resolution Sensing for Bi-Static ISAC with Clock Asynchronism

Bi-static sensing is an attractive configuration for integrated sensing and communications (ISAC) systems; however, clock asynchronism between widely separated transmitters and receivers introduces time-varying time offsets (TO) and phase offsets (PO), posing significant challenges. This paper introduces a signal-subspace-based framework that estimates decoupled angles, delays, and complex gain sequences (CGS), i.e., the target-reflected signals, for multiple dynamic target paths. The proposed framework begins with a novel TO alignment algorithm, leveraging signal subspace or covariance, to mitigate TO variations across temporal snapshots, enabling coherent delay-domain analysis. Subsequently, subspace-based methods are developed to compensate for TO residuals and to perform joint angle-delay estimation. Finally, leveraging the high resolution in the joint angle-delay domain, the framework compensates for the PO and estimates the CGS for each target. The framework can be applied to both single-antenna and multi-antenna systems. Extensive simulations and experiments using commercial Wi-Fi devices demonstrate that the proposed framework significantly surpasses existing solutions in parameter estimation accuracy and delay resolution. Notably, it uniquely achieves a super-resolution in the delay domain, with a probability-of-resolution curve tightly approaching that in synchronized systems.


[27] 2505.10179

Rate Region of ISAC for Pinching-Antenna Systems

The Pinching-Antenna SyStem (PASS) reconstructs wireless channels through pinching beamforming, wherein the activated positions of pinching antennas along dielectric waveguides are optimized to shape the radiation pattern. The aim of this article is to analyze the performance limits of employing PASS in integrated sensing and communications (ISAC). Specifically, a PASS-assisted ISAC system is considered, where a pinched waveguide is utilized to simultaneously communicate with a user and sense a target. Closed-form expressions for the achievable communication rate (CR) and sensing rate (SR) are derived to characterize the information-theoretic limits of this dual-functional operation. i) For the single-pinch case, closed-form solutions for the optimal pinching antenna location are derived under sensing-centric (S-C), communications-centric (C-C), and Pareto-optimal designs. On this basis, the CR-SR trade-off is characterized by deriving the full CR-SR rate region, which is shown to encompass that of conventional fixed-antenna systems. ii) For the multiple-pinch case, an antenna location refinement method is applied to obtain the optimal C-C and S-C pinching beamformers. As a further advance, inner and outer bounds on the achievable CR-SR region are derived using an element-wise alternating optimization technique and by invoking Cauchy-Schwarz and Karamata's inequalities, respectively. Numerical results demonstrate that: i) the derived bounds closely approximate the true CR-SR region; and ii) PASS can achieve a significantly larger rate region than conventional-antenna systems.


[28] 2505.10194

Passive Channel Charting: Locating Passive Targets using a UWB Mesh

Fingerprint-based passive localization enables high localization accuracy using low-cost UWB IoT radio sensors. However, fingerprinting demands extensive effort for data acquisition. The concept of channel charting reduces this effort by modeling and projecting the manifold of channel state information (CSI) onto a 2D coordinate space. So far, researchers have only applied this concept to active radio localization, where a mobile device intentionally and actively emits a specific signal. In this paper, we apply channel charting to passive localization. We use a pedestrian dead reckoning (PDR) system to estimate a target's velocity and derive a distance matrix from it. We then use this matrix to learn a distance-preserving embedding in 2D space, which serves as a fingerprinting model. In our experiments, we deploy six nodes in a fully connected ultra-wideband (UWB) mesh network to show that our method achieves high localization accuracy, with an average error of just 0.24 m, even when we train and test on different targets.
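
One plausible way to turn velocity estimates into a distance matrix is sketched below. The abstract does not say how the matrix is built, so this is an assumption: speeds are integrated into cumulative path length, and the distance between two samples is taken as the path length walked between them. The function name `path_length_distances` and the sample values are hypothetical.

```python
def path_length_distances(speeds, dt):
    """Pairwise walked-path distances from per-interval speed estimates.

    `speeds[k]` is the estimated speed (m/s) during interval k of length
    `dt` seconds. Sample 0 is the start; sample k+1 is the end of interval k.
    """
    cumulative = [0.0]
    for v in speeds:
        cumulative.append(cumulative[-1] + abs(v) * dt)
    n = len(cumulative)
    return [[abs(cumulative[i] - cumulative[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical PDR output: three 2-second intervals.
D = path_length_distances([1.0, 1.0, 0.5], dt=2.0)
for row in D:
    print(row)
```

Path length is only an upper bound on straight-line distance, so a matrix built this way is most trustworthy for samples that are close in time; a distance-preserving embedding would typically weight those short-range entries more heavily.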


[29] 2505.10220

UAV-Enabled Passive 6DMA for ISAC: Joint Location, Orientation, and Reflection Optimization

Improving the fundamental performance trade-off in integrated sensing and communication (ISAC) systems is regarded as one of the most significant challenges. To address it, we propose in this letter a novel ISAC system that leverages an unmanned aerial vehicle (UAV)-mounted intelligent reflecting surface (IRS) and the UAV's maneuverability in six-dimensional (6D) space, i.e., three-dimensional (3D) location and 3D rotation, thus referred to as passive 6D movable antenna (6DMA). We aim to maximize the signal-to-noise ratio (SNR) for sensing a single target while ensuring a minimum SNR at a communication user equipment (UE), by jointly optimizing the transmit beamforming at the ISAC base station (BS), the 3D location and orientation as well as the reflection coefficients of the IRS. To solve this challenging non-convex optimization problem, we propose a two-stage approach. In the first stage, we aim to optimize the IRS's 3D location, 3D orientation, and reflection coefficients to enhance both the channel correlations and power gains for sensing and communication. In the second stage, given these optimized parameters, the optimal transmit beamforming at the ISAC BS is derived in closed form. Simulation results demonstrate that the proposed passive 6DMA-enabled ISAC system significantly improves the sensing and communication trade-off by simultaneously enhancing channel correlations and power gains, and outperforms other baseline schemes.


[30] 2505.10275

ISAC Channel Modelling -- Perspectives from ETSI

Integrated Sensing and Communications (ISAC) is defined as one of six usage scenarios in the ITU-R International Mobile Telecommunications (IMT) 2030 framework for 6G. ISAC is envisioned to introduce the sensing capability into the cellular network, where sensing may be obtained using the cellular radio frequency (RF) signals with or without additional auxiliary sensors. To enable ISAC, specification bodies such as the European Telecommunications Standards Institute (ETSI) and the Third Generation Partnership Project (3GPP) have already started to look into detailed ISAC use cases, their requirements, and the channel models and evaluation methodologies that are necessary to design and evaluate ISAC performance. With a focus on the channel model, the current communication-centric channel models like those specified in 3GPP technical report (TR) 38.901 do not cover the RF signal interactions between the transmitter, target object, receiver and their surrounding environment. To bridge this gap, 3GPP has been looking into the basic changes that need to be made to its TR 38.901 channel model, with a focus on selected use cases from the 3GPP SA1 5G-Advanced feasibility study. In parallel, the ETSI ISAC Industry Specification Group (ISG) has been studying the more advanced ISAC channel modelling features that are needed to support the variety of ISAC use cases envisioned in 6G. In this paper, we present the baseline and advanced features developed thus far in 3GPP and ETSI ISAC ISG, respectively, towards a comprehensive view of the ISAC channel model in 6G.


[31] 2505.10306

Ray Antenna Array Achieves Uniform Angular Resolution Cost-Effectively for Low-Altitude UAV Swarm ISAC

Ray antenna array (RAA) is a novel multi-antenna architecture comprising massive low-cost antenna elements and a few radio-frequency (RF) chains. The antenna elements are arranged in a novel ray-like structure, where each ray corresponds to a simple uniform linear array (sULA) with deliberately designed orientation and all its antenna elements are directly connected. By further designing a ray selection network (RSN), appropriate sULAs are selected to connect to the RF chains for further baseband processing. RAA has three appealing advantages: (i) dramatically reduced hardware cost since no phase shifters are needed; (ii) enhanced beamforming gain as antenna elements with higher directivity can be used; (iii) uniform angular resolution across all signal directions. Such benefits make RAA especially appealing for integrated sensing and communication (ISAC), particularly for low-altitude unmanned aerial vehicle (UAV) swarm ISAC, where high-mobility aerial targets may easily move away from the boresight of conventional antenna arrays, causing severe communication and sensing performance degradation. Therefore, this paper studies RAA-based ISAC for low-altitude UAV swarm systems. First, we establish an input-output mathematical model for RAA-based UAV ISAC and rigorously show that RAA achieves uniform angular resolution for all directions. In addition, we design the RAA orientation and RSN. Furthermore, RAA-based ISAC with orthogonal frequency division multiplexing (OFDM) for UAV swarms is studied, and an efficient algorithm is proposed for sensing target parameter estimation. Extensive simulation results demonstrate the significant performance improvement of the RAA system over conventional antenna arrays in terms of sensing angular resolution and communication spectral efficiency, highlighting the great potential of the novel RAA system to meet the growing demands of low-altitude UAV ISAC.


[32] 2505.10311

Whitened Score Diffusion: A Structured Prior for Imaging Inverse Problems

Conventional score-based diffusion models (DMs) may struggle with anisotropic Gaussian diffusion processes due to the required inversion of covariance matrices in the denoising score matching training objective (Vincent, 2011). We propose Whitened Score (WS) diffusion models, a novel SDE-based framework that learns the Whitened Score function instead of the standard score. This approach circumvents covariance inversion, extending score-based DMs by enabling stable training of DMs on arbitrary Gaussian forward noising processes. WS DMs establish equivalence with flow matching (FM) for arbitrary Gaussian noise, allow for tailored spectral inductive biases, and provide strong Bayesian priors for imaging inverse problems with structured noise. We experiment with a variety of computational imaging tasks using the CIFAR and CelebA (64×64) datasets and demonstrate that WS diffusion priors trained on anisotropic Gaussian noising processes consistently outperform conventional diffusion priors based on isotropic Gaussian noise.


[33] 2505.10367

A Hybrid Strategy for Aggregated Probabilistic Forecasting and Energy Trading in HEFTCom2024

Obtaining accurate probabilistic energy forecasts and making effective decisions amid diverse uncertainties are routine challenges in future energy systems. This paper presents the solution of team GEB, which ranked 3rd in trading, 4th in forecasting, and 1st among student teams in the IEEE Hybrid Energy Forecasting and Trading Competition 2024 (HEFTCom2024). The solution provides accurate probabilistic forecasts for a wind-solar hybrid system, and achieves substantial trading revenue in the day-ahead electricity market. Key components include: (1) a stacking-based approach combining sister forecasts from various Numerical Weather Predictions (NWPs) to provide wind power forecasts, (2) an online solar post-processing model to address the distribution shift in the online test set caused by increased solar capacity, (3) a probabilistic aggregation method for accurate quantile forecasts of hybrid generation, and (4) a stochastic trading strategy to maximize expected trading revenue considering uncertainties in electricity prices. This paper also explores the potential of end-to-end learning to further enhance the trading revenue by adjusting the distribution of forecast errors. Detailed case studies are provided to validate the effectiveness of these proposed methods. Code for all mentioned methods is available for reproduction and further research in both industry and academia.
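
Quantile forecasts such as those mentioned above are commonly scored with the pinball (quantile) loss. The sketch below shows that loss for a single observation; it is a generic illustration rather than the team's code, and the function name `pinball_loss` and the example numbers are assumptions.

```python
def pinball_loss(actual, forecast, tau):
    """Pinball loss of a `tau`-quantile forecast for one observation.

    Under-forecasts are weighted by tau and over-forecasts by (1 - tau),
    so a 0.9-quantile forecast is penalized more for being too low.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    error = actual - forecast
    return tau * error if error >= 0 else (tau - 1.0) * error


# Hypothetical values in MWh: the same 2 MWh miss in both directions.
print(pinball_loss(actual=10.0, forecast=8.0, tau=0.9))  # forecast too low
print(pinball_loss(actual=8.0, forecast=10.0, tau=0.9))  # forecast too high
```

Averaging this loss over all quantile levels and time steps gives a single score in which lower is better.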


[34] 2505.10372

Spatially Selective Active Noise Control for Open-fitting Hearables with Acausal Optimization

Recent advances in active noise control have enabled the development of hearables with spatial selectivity, which actively suppress undesired noise while preserving desired sound from specific directions. In this work, we propose an improved approach to spatially selective active noise control that incorporates acausal relative impulse responses into the optimization process, resulting in significantly improved performance over the causal design. We evaluate the system through simulations using a pair of open-fitting hearables with spatially localized speech and noise sources in an anechoic environment. Performance is evaluated in terms of speech distortion, noise reduction, and signal-to-noise ratio improvement across different delays and degrees of acausality. Results show that the proposed acausal optimization consistently outperforms the causal approach across all metrics and scenarios, as acausal filters more effectively characterize the response of the desired source.


[35] 2505.10382

Unlocking Innate Computing Abilities in Electric Grids

The high energy consumption of artificial intelligence has become a growing concern worldwide, necessitating major investments in expanding efficient and carbon-neutral generation and data center infrastructure in electric power grids. Going beyond conventional ideation, this article unleashes innate computational abilities in the power grid network circuits themselves. By programming power electronic converters (PECs) to mimic biological neurons, we sustainably transform power grids into a neural network and enable them to optimize, compute, and make data-driven decisions using distributed PECs. Instead of treating the electric grid merely as an energy delivery platform, this article conceptualizes a novel application in which the grid is used as a computing asset without affecting its operation. To illustrate its computational abilities, we solve an affine transformation task in a microgrid with five PECs. By encoding the digital data into the control of PECs, our preliminary results indicate that computing using electric grids does not disturb their operation. From a scientific perspective, this work fundamentally merges energy and computing optimization theories by harnessing inherent high-dimensional computational relationships in electric grids.


[36] 2505.10405

Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding

Generative semantic communication (Gen-SemCom) with large artificial intelligence (AI) models promises a transformative paradigm for 6G networks, which reduces communication costs by transmitting low-dimensional prompts rather than raw data. However, purely prompt-driven generation loses fine-grained visual details. Additionally, there is a lack of systematic metrics to evaluate the performance of Gen-SemCom systems. To address these issues, we develop a hybrid Gen-SemCom system with a critical information embedding (CIE) framework, where both text prompts and semantically critical features are extracted for transmission. First, a novel semantic filtering approach is proposed to select and transmit the semantically critical features of images relevant to the semantic label. By integrating the text prompt and critical features, the receiver reconstructs high-fidelity images using a diffusion-based generative model. Next, we propose the generative visual information fidelity (GVIF) metric to evaluate the visual quality of the generated image. By characterizing the statistical models of image features, the GVIF metric quantifies the mutual information between the distorted features and their original counterparts. By maximizing the GVIF metric, we design a channel-adaptive Gen-SemCom system that adaptively controls the volume of features and the compression rate according to the channel state. Experimental results validate the GVIF metric's sensitivity to visual fidelity, correlating with both the PSNR and critical information volume. In addition, the optimized system achieves superior performance over benchmarking schemes in terms of higher PSNR and lower FID scores.
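
The abstract reports results in terms of PSNR. As a reminder of what that metric measures, here is a minimal sketch of the standard definition on flattened pixel lists; the function name `psnr` and the 8-bit peak value of 255 are assumptions, and GVIF itself is not reproduced here because the abstract does not give its formula.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel images: every pixel is off by 5 grey levels.
original = [100, 120, 140, 160]
received = [105, 125, 145, 165]
print(round(psnr(original, received), 2))
```

PSNR is a purely pixel-wise measure, which is part of the motivation for pairing it with perceptual or information-theoretic metrics when evaluating generated images.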


[37] 2505.10419

Analog Self-Interference Cancellation in Full-Duplex Radios: A Fundamental Limit Perspective

Analog self-interference cancellation (A-SIC) plays a crucial role in the implementation of in-band full-duplex (IBFD) radios, because the inherent transmit (Tx) noise can only be addressed in the analog domain. It is thus natural to ask what the performance limit of A-SIC is in practical systems, a question that remains underexplored. In this paper, we aim to close this gap by characterizing the fundamental performance of A-SIC that employs the common multi-tap delay (MTD) architecture, accounting for the following practical issues: 1) Nonstationarity of the Tx signal; 2) Nonlinear distortions on the Tx signal; 3) Multipath channel corresponding to the self-interference (SI); 4) Maximum amplitude constraint on the MTD tap weights. Our findings include: 1) The average approximation error for the cyclostationary Tx signals is equal to that for the stationary white Gaussian process, thus greatly simplifying the performance analysis and the optimization procedure. 2) The approximation error for the multipath SI channel can be decomposed as the sum of the approximation errors for the single-path scenarios. By leveraging these structural results, we provide the optimization framework and algorithms that characterize the fundamental limit of A-SIC while taking into account all the aforementioned practical factors.


[38] 2505.10464

HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation

Multimodal medical image segmentation faces significant challenges in the context of gastric cancer lesion analysis. This clinical context is defined by the scarcity of independent multimodal datasets and the imperative to amalgamate inherently misaligned modalities. As a result, algorithms are constrained to train on approximate data and depend on application migration, leading to substantial resource expenditure and a potential decline in analysis accuracy. To address these challenges, we make two major contributions. First, we publicly release the GCM 2025 dataset, which serves as the first large-scale, open-source collection of gastric cancer multimodal MRI scans, featuring professionally annotated FS-T2W, CE-T1W, and ADC images from 500 patients. Second, we introduce HWA-UNETR, a novel 3D segmentation framework that employs an original HWA block with learnable window aggregation layers to establish dynamic feature correspondences between different modalities' anatomical structures, and leverages the innovative tri-orientated fusion mamba mechanism for context modeling and capturing long-range spatial dependencies. Extensive experiments on our GCM 2025 dataset and the publicly available BraTS 2021 dataset validate the performance of our framework, demonstrating that the new approach surpasses existing methods by up to 1.68% in the Dice score while maintaining solid robustness. The dataset and code are public via https://github.com/JeMing-creater/HWA-UNETR.


[39] 2505.10492

Multi-contrast laser endoscopy for in vivo gastrointestinal imaging

White light endoscopy is the clinical gold standard for detecting diseases in the gastrointestinal tract. Most applications involve identifying visual abnormalities in tissue color, texture, and shape. Unfortunately, the contrast of these features is often subtle, causing many clinically relevant cases to go undetected. To overcome this challenge, we introduce Multi-contrast Laser Endoscopy (MLE): a platform for widefield clinical imaging with rapidly tunable spectral, coherent, and directional illumination. We demonstrate three capabilities of MLE: enhancing tissue chromophore contrast with multispectral diffuse reflectance, quantifying blood flow using laser speckle contrast imaging, and characterizing mucosal topography using photometric stereo. We validate MLE with benchtop models, then demonstrate MLE in vivo during clinical colonoscopies. MLE images from 31 polyps demonstrate an approximate three-fold improvement in contrast and a five-fold improvement in color difference compared to white light and narrow band imaging. With the ability to reveal multiple complementary types of tissue contrast while seamlessly integrating into the clinical environment, MLE shows promise as an investigative tool to improve gastrointestinal imaging.


[40] 2505.10500

Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio

Audio and speech data are increasingly used in machine learning applications such as speech recognition, speaker identification, and mental health monitoring. However, the passive collection of this data by audio listening devices raises significant privacy concerns. Fully homomorphic encryption (FHE) offers a promising solution by enabling computations on encrypted data and preserving user privacy. Despite its potential, prior attempts to apply FHE to audio processing have faced challenges, particularly in securely computing time-frequency representations, a critical step in many audio tasks. Here, we address this gap by introducing a fully secure pipeline that computes, with FHE and quantized neural network operations, four fundamental time-frequency representations: Short-Time Fourier Transform (STFT), Mel filterbanks, Mel-frequency cepstral coefficients (MFCCs), and gammatone filters. Our methods also support the private computation of audio descriptors and convolutional neural network (CNN) classifiers. In addition, we propose approximate STFT algorithms that lighten computation and bit use for statistical and machine learning analyses. We ran experiments on the VocalSet and OxVoc datasets demonstrating the fully private computation of our approach. We show significant performance improvements with STFT approximation in private statistical analysis of audio markers, and for vocal exercise classification with CNNs. Our results reveal that our approximations substantially reduce error rates compared to conventional STFT implementations in FHE. We also demonstrate a fully private classification based on the raw audio for gender and vocal exercise classification. Finally, we provide a practical heuristic for parameter selection, making quantized approximate signal processing accessible to researchers and practitioners aiming to protect sensitive audio data.


[41] 2505.10502

WeGA: Weakly-Supervised Global-Local Affinity Learning Framework for Lymph Node Metastasis Prediction in Rectal Cancer

Accurate lymph node metastasis (LNM) assessment in rectal cancer is essential for treatment planning, yet current MRI-based evaluation shows unsatisfactory accuracy, leading to suboptimal clinical decisions. Developing automated systems also faces significant obstacles, primarily the lack of node-level annotations. Previous methods treat lymph nodes as isolated entities rather than as an interconnected system, overlooking valuable spatial and contextual information. To solve this problem, we present WeGA, a novel weakly-supervised global-local affinity learning framework that addresses these challenges through three key innovations: 1) a dual-branch architecture with DINOv2 backbone for global context and residual encoder for local node details; 2) a global-local affinity extractor that aligns features across scales through cross-attention fusion; and 3) a regional affinity loss that enforces structural coherence between classification maps and anatomical regions. Experiments across one internal and two external test centers demonstrate that WeGA outperforms existing methods, achieving AUCs of 0.750, 0.822, and 0.802 respectively. By effectively modeling the relationships between individual lymph nodes and their collective context, WeGA provides a more accurate and generalizable approach for lymph node metastasis prediction, potentially enhancing diagnostic precision and treatment selection for rectal cancer patients.


[42] 2505.10546

Can On Body Sensing Be Spatial Adaptive?

Wearable sensors are typically affixed to specific locations on the human body, and their position remains static, only changing unintentionally due to motion artifacts. This static configuration introduces significant limitations. As a result, current systems miss the opportunity to capture dynamic physiological data from diverse body regions. This research investigates the potential of developing movable sensors that adaptively reposition themselves to sample different areas of interest on the body, addressing gaps in spatial coverage. We designed, developed, and fabricated a 3 x 3 matrix platform to support moving sensors from one location to another. We validated the feasibility through simulations on a matrix of up to 9 x 9 locations with up to 16 concurrent sensors and real-world prototype characterization.


[43] 2505.09616

SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech

This paper presents SpecWav-Attack, an adversarial model for detecting speakers in anonymized speech. It leverages Wav2Vec2 for feature extraction and incorporates spectrogram resizing and incremental training for improved performance. Evaluated on librispeech-dev and librispeech-test, SpecWav-Attack outperforms conventional attacks, revealing vulnerabilities in anonymized speech systems and emphasizing the need for stronger defenses, benchmarked against the ICASSP 2025 Attacker Challenge.


[44] 2505.09644

Joint Source-Channel Noise Adding with Adaptive Denoising for Diffusion-Based Semantic Communications

Semantic communication (SemCom) aims to convey the intended meaning of messages rather than merely transmitting bits, thereby offering greater efficiency and robustness, particularly in resource-constrained or noisy environments. In this paper, we propose a novel framework which is referred to as joint source-channel noise adding with adaptive denoising (JSCNA-AD) for SemCom based on a diffusion model (DM). Unlike conventional encoder-decoder designs, our approach intentionally incorporates the channel noise during transmission, effectively transforming the harmful channel noise into a constructive component of the diffusion-based semantic reconstruction process. Besides, we introduce an attention-based adaptive denoising mechanism, in which transmitted images are divided into multiple regions, and the number of denoising steps is dynamically allocated based on the semantic importance of each region. This design effectively balances the reception quality and the inference latency by prioritizing the critical semantic information. Extensive experiments demonstrate that our method significantly outperforms existing SemCom schemes under various noise conditions, underscoring the potential of diffusion-based models in next-generation communication systems.
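
The idea of allocating denoising steps by semantic importance can be illustrated with a simple rule. The abstract does not give the allocation scheme, so the linear mapping below, the function name `allocate_steps`, and the step bounds are all assumptions chosen for illustration.

```python
def allocate_steps(importance, min_steps=10, max_steps=50):
    """Map per-region importance scores to denoising step counts.

    The most important region receives `max_steps`; a region with zero
    (or negative) importance receives `min_steps`; others scale linearly.
    """
    top = max(importance)
    if top <= 0:
        return [min_steps] * len(importance)
    span = max_steps - min_steps
    return [min_steps + round(span * max(w, 0.0) / top) for w in importance]


# Hypothetical attention scores for three image regions.
scores = [0.0, 0.5, 1.0]
print(allocate_steps(scores))
```

Spending fewer steps on low-importance regions reduces total inference work, which is the latency-versus-quality balance the abstract describes.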


[45] 2505.09661

Introducing voice timbre attribute detection

This paper focuses on explaining the timbre conveyed by speech signals and introduces a task termed voice timbre attribute detection (vTAD). In this task, voice timbre is explained with a set of sensory attributes describing its human perception. A pair of speech utterances is processed, and their intensity is compared in a designated timbre descriptor. Moreover, a framework is proposed, which is built upon the speaker embeddings extracted from the speech utterances. The investigation is conducted on the VCTK-RVA dataset. Experimental examinations on the ECAPA-TDNN and FACodec speaker encoders demonstrated that: 1) the ECAPA-TDNN speaker encoder was more capable in the seen scenario, where the testing speakers were included in the training set; 2) the FACodec speaker encoder was superior in the unseen scenario, where the testing speakers were not part of the training, indicating enhanced generalization capability. The VCTK-RVA dataset and open-source code are available on the website https://github.com/vTAD2025-Challenge/vTAD.


[46] 2505.09748

Learning Multi-Attribute Differential Graphs with Non-Convex Penalties

We consider the problem of estimating differences in two multi-attribute Gaussian graphical models (GGMs) which are known to have similar structure, using a penalized D-trace loss function with non-convex penalties. The GGM structure is encoded in its precision (inverse covariance) matrix. Existing methods for multi-attribute differential graph estimation are based on a group lasso penalized loss function. In this paper, we consider a penalized D-trace loss function with non-convex (log-sum and smoothly clipped absolute deviation (SCAD)) penalties. Two proximal gradient descent methods are presented to optimize the objective function. Theoretical analysis establishing sufficient conditions for consistency in support recovery, convexity and estimation in high-dimensional settings is provided. We illustrate our approaches with numerical examples based on synthetic and real data.
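
For readers unfamiliar with the two non-convex penalties named above, the sketch below evaluates them on a scalar. The SCAD form is the standard one with shape parameter a > 2; the log-sum form shown is one common parameterization, and the paper may use a different scaling. Function names and default values are assumptions.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """Smoothly clipped absolute deviation (SCAD) penalty for a scalar."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # lasso-like near zero
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0  # constant: large values are not shrunk further


def log_sum_penalty(theta, lam, eps=0.01):
    """Log-sum penalty lam * eps * ln(1 + |theta| / eps) (one common form)."""
    return lam * eps * math.log(1.0 + abs(theta) / eps)


for value in (0.5, 2.0, 10.0):
    print(value, scad_penalty(value, lam=1.0), log_sum_penalty(value, lam=1.0))
```

Both penalties behave like the lasso near zero but flatten out for large arguments, which reduces the bias that a convex penalty places on large entries. For the multi-attribute (group) setting described in the abstract, such a penalty would typically be applied to a norm of each block rather than to a single entry.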


[47] 2505.09784

Theoretical Model of Acoustic Power Transfer Through Solids

Acoustic Power Transfer is a relatively new technology. It is a modern type of wireless interface in which data signals and supply voltages are transmitted through a medium using mechanical waves. The simplest application of such systems is the measurement of frequency response for audio speakers. It consists of a variable signal generator, a measuring amplifier which drives an acoustic source and the loudspeaker driver. The receiver contains a microphone circuit with a level recorder. Acoustic Power Transfer could have many applications, such as cochlear implants, sonar systems, and wireless charging. However, because it is a new technology, it needs further investigation.


[48] 2505.09799

On Signed Network Coordination Games

We study binary-action pairwise-separable network games that encompass both coordinating and anti-coordinating behaviors. Our model is grounded in an underlying directed signed graph, where each link is associated with a weight that describes the strength and nature of the interaction. The utility for each agent is an aggregation of pairwise terms determined by the weights of the signed graph in addition to an individual bias term. We consider a scenario that assumes the presence of a prominent 'cohesive' subset of players, who are either connected exclusively by positive weights, or form a structurally balanced subset that can be bipartitioned into two adversarial subcommunities with positive intra-community and negative inter-community edges. Given the properties of the game restricted to the remaining players, our results guarantee the existence of Nash equilibria characterized by a consensus or, respectively, a polarization within the first group, as well as their stability under best response transitions. Our results can be interpreted as robustness results, building on the supermodular properties of coordination games and on a novel use of the concept of graph cohesiveness.
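
A small example may help make the consensus and polarization equilibria concrete. The sketch assumes actions in {-1, +1} and a utility of the form u_i(x) = x_i (sum_j W[i][j] x_j + h[i]), which is one common way to write a pairwise-separable game with a bias term; the abstract does not state the exact form, and all names and weights below are hypothetical.

```python
def utility(i, x, W, h):
    """Assumed utility of player i under action profile x (entries are -1 or +1)."""
    influence = sum(W[i][j] * x[j] for j in range(len(x)) if j != i)
    return x[i] * (influence + h[i])


def is_nash(x, W, h):
    """True if no player can strictly gain by flipping their own action."""
    for i in range(len(x)):
        flipped = list(x)
        flipped[i] = -x[i]
        if utility(i, flipped, W, h) > utility(i, x, W, h):
            return False
    return True


no_bias = [0.0, 0.0]
positive = [[0.0, 1.0], [1.0, 0.0]]    # two players who want to coordinate
negative = [[0.0, -1.0], [-1.0, 0.0]]  # two players who want to anti-coordinate

print(is_nash([1, 1], positive, no_bias))   # consensus under positive weights
print(is_nash([1, -1], positive, no_bias))  # disagreement under positive weights
print(is_nash([1, -1], negative, no_bias))  # polarization under negative weights
```

With positive weights the agreeing profile is stable, while with negative weights the split profile is stable, mirroring the consensus and polarization cases described in the abstract on a two-player toy graph.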


[49] 2505.09819

Visual Feedback of Pattern Separability Improves Myoelectric Decoding Performance of Upper Limb Prostheses

State-of-the-art upper limb myoelectric prostheses often use pattern recognition (PR) control systems that translate electromyography (EMG) signals into desired movements. As prosthesis movement complexity increases, users often struggle to produce sufficiently distinct EMG patterns for reliable classification. Existing training typically involves heuristic, trial-and-error user adjustments to static decoder boundaries. Goal: We introduce the Reviewer, a 3D visual interface projecting EMG signals directly into the decoder's classification space, providing intuitive, real-time insight into PR algorithm behavior. This structured feedback reduces cognitive load and fosters mutual, data-driven adaptation between user-generated EMG patterns and decoder boundaries. Methods: A 10-session study with 12 able-bodied participants compared PR performance after motor-based training and updating using the Reviewer versus conventional virtual arm visualization. Performance was assessed using a Fitts' law task that involved the aperture of the cursor and the control of orientation. Results: Participants trained with the Reviewer achieved higher completion rates, reduced overshoot, and improved path efficiency and throughput compared to the standard visualization group. Significance: The Reviewer introduces decoder-informed motor training, facilitating immediate and consistent PR-based myoelectric control improvements. By iteratively refining control through real-time feedback, this approach reduces reliance on trial-and-error recalibration, enabling a more adaptive, self-correcting training framework. Conclusion: The 3D visual feedback significantly improves PR control in novice operators through structured training, enabling feedback-driven adaptation and reducing reliance on extensive heuristic adjustments.
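For readers unfamiliar with Fitts' law metrics, the sketch below shows how throughput and path efficiency are commonly computed. It uses the Shannon formulation of the index of difficulty, ID = log2(D/W + 1), and defines path efficiency as straight-line distance over travelled distance; the abstract does not say which definitions the study used, so both choices and all function names are assumptions.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of the Fitts' law index of difficulty, in bits."""
    return math.log2(distance / width + 1.0)


def throughput(distance, width, movement_time_s):
    """Throughput in bits per second for a single completed trial."""
    return index_of_difficulty(distance, width) / movement_time_s


def path_efficiency(points):
    """Ratio of straight-line distance to travelled path length (1.0 is ideal)."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0  # the cursor never moved, so there is no detour to penalize
    return math.dist(points[0], points[-1]) / travelled


# Hypothetical trial: target 7 units away, 1 unit wide, reached in 1.5 s.
trial_id = index_of_difficulty(7.0, 1.0)
trial_tp = throughput(7.0, 1.0, 1.5)
trial_pe = path_efficiency([(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)])
```

A harder target (larger distance or smaller width) raises the index of difficulty, so a participant who reaches it in the same time earns a higher throughput.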


[50] 2505.09822

Learning Kronecker-Structured Graphs from Smooth Signals

Graph learning, or network inference, is a prominent problem in graph signal processing (GSP). GSP generalizes the Fourier transform to non-Euclidean domains, and graph learning is pivotal to applying GSP when these domains are unknown. With the recent prevalence of multi-way data, there has been growing interest in product graphs that naturally factorize dependencies across different ways. However, the types of graph products that can be learned are still limited for modeling diverse dependency structures. In this paper, we study the problem of learning a Kronecker-structured product graph from smooth signals. Unlike the more commonly used Cartesian product, the Kronecker product models dependencies in a more intricate, non-separable way, but posits harder constraints on the graph learning problem. To tackle this non-convex problem, we propose an alternating scheme to optimize each factor graph and provide theoretical guarantees for its asymptotic convergence. The proposed algorithm is also modified to learn factor graphs of the strong product. We conduct experiments on synthetic and real-world graphs and demonstrate our approach's efficacy and superior performance compared to existing methods.
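The difference between the Cartesian, Kronecker, and strong graph products mentioned above can be seen directly on adjacency matrices. The following Python sketch uses the standard definitions (Cartesian: A ⊗ I + I ⊗ B; Kronecker: A ⊗ B; strong: the sum of the two). It only builds product graphs from given factors and does not implement the learning algorithm of the paper; the helper names are illustrative.

```python
def kron(A, B):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(A), len(B)
    return [
        [A[i][k] * B[j][l] for k in range(n) for l in range(m)]
        for i in range(n)
        for j in range(m)
    ]


def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def add(X, Y):
    """Entrywise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def kronecker_product_graph(A, B):
    """(i, j) ~ (k, l) iff i ~ k in A AND j ~ l in B."""
    return kron(A, B)


def cartesian_product_graph(A, B):
    """(i, j) ~ (k, l) iff one coordinate is equal and the other is adjacent."""
    return add(kron(A, identity(len(B))), kron(identity(len(A)), B))


def strong_product_graph(A, B):
    """Union of the Cartesian and Kronecker product edge sets."""
    return add(cartesian_product_graph(A, B), kronecker_product_graph(A, B))


def edge_count(A):
    """Number of undirected edges of an unweighted symmetric adjacency matrix."""
    return sum(sum(row) for row in A) // 2


# Two copies of a single edge (the complete graph on two nodes).
K2 = [[0, 1], [1, 0]]
kronecker = kronecker_product_graph(K2, K2)
cartesian = cartesian_product_graph(K2, K2)
strong = strong_product_graph(K2, K2)
```

For two single-edge factors, the Cartesian product is a 4-cycle, the Kronecker product is two disjoint edges, and the strong product is the complete graph on four nodes, which shows how differently the products couple the factor graphs.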


[51] 2505.09841

Hamilton's Rule for Enabling Altruism in Multi-Agent Systems

This paper explores the application of Hamilton's rule to altruistic decision-making in multi-agent systems. Inspired by biological altruism, we introduce a framework that evaluates when individual agents should incur costs to benefit their neighbors. By adapting Hamilton's rule, we define agent "fitness" in terms of task productivity rather than genetic survival. We formalize altruistic decision-making through a graph-based model of multi-agent interactions and propose a solution using collaborative control Lyapunov functions. The approach ensures that altruistic behaviors contribute to the collective goal-reaching efficiency of the system. We illustrate this framework on a multi-agent way-point navigation problem, where we show through simulation how agent importance levels influence altruistic decision-making, leading to improved coordination in navigation tasks.
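The classical form of Hamilton's rule states that an altruistic act is favored when r * B > C, where r is relatedness, B is the benefit to the recipient, and C is the cost to the actor. The Python sketch below applies that inequality to an agent choosing which neighbor to help. Treating an agent importance level as the weight r is an assumption made for illustration; the paper's actual formulation, based on collaborative control Lyapunov functions, is not reproduced here, and all names are hypothetical.

```python
def hamilton_favors_helping(relatedness, benefit, cost):
    """Classical Hamilton's rule: helping is favored when r * B > C."""
    return relatedness * benefit > cost


def pick_neighbor_to_help(neighbors, cost):
    """Return the neighbor with the largest positive net gain r * B - C.

    neighbors maps a neighbor name to a pair (r, B), where r is an assumed
    importance weight and B is the benefit that neighbor would receive.
    Returns None when helping no one satisfies Hamilton's rule.
    """
    best_name, best_gain = None, 0.0
    for name, (r, benefit) in neighbors.items():
        gain = r * benefit - cost
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name


# Hypothetical neighbors of one agent.
neighbors = {"a": (0.5, 4.0), "b": (0.9, 1.0)}
cheap_choice = pick_neighbor_to_help(neighbors, cost=1.0)
costly_choice = pick_neighbor_to_help(neighbors, cost=5.0)
```

With a low cost the agent helps neighbor "a", whose weighted benefit exceeds the cost; when the cost is too high for either neighbor, the function returns None and the agent does not act altruistically.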


[52] 2505.09848

Radiogenomic Bipartite Graph Representation Learning for Alzheimer's Disease Detection

Imaging and genomic data offer distinct and rich features, and their integration can unveil new insights into the complex landscape of diseases. In this study, we present a novel approach utilizing radiogenomic data, including structural MRI images and gene expression data, for Alzheimer's disease detection. Our framework introduces a novel heterogeneous bipartite graph representation learning featuring two distinct node types: genes and images. The network can effectively classify Alzheimer's disease (AD) into three distinct stages: AD, Mild Cognitive Impairment (MCI), and Cognitive Normal (CN) classes, utilizing a small dataset. Additionally, it identified which genes play a significant role in each of these classification groups. We evaluate the performance of our approach using metrics including classification accuracy, recall, precision, and F1 score. The proposed technique holds potential for extending radiogenomic-based classification to other diseases.


[53] 2505.09919

Hyper Yoshimura: How a slight tweak on a classical folding pattern unleashes meta-stability for deployable robots

Deployable structures inspired by origami offer lightweight, compact, and reconfigurable solutions for robotic and architectural applications. We present a geometric and mechanical framework for Yoshimura-Ori modules that supports a diverse set of metastable states, including newly identified asymmetric "pop-out" and "hyperfolded" configurations. These states are governed by three parameters -- tilt angle, phase shift, and slant height -- and enable discrete, programmable transformations. Using this model, we develop forward and inverse kinematic strategies to stack modules into deployable booms that approximate complex 3D shapes. We validate our approach through mechanical tests and demonstrate a tendon- and pneumatically-actuated Yoshimura Space Crane capable of object manipulation, solar tracking, and high load-bearing performance. A meter-scale solar charging station further illustrates the design's scalability. These results establish Yoshimura-Ori structures as a promising platform for adaptable, multifunctional deployable systems in both terrestrial and space environments.


[54] 2505.09920

Offline Reinforcement Learning for Microgrid Voltage Regulation

This paper presents a study on using different offline reinforcement learning algorithms for microgrid voltage regulation with solar power penetration. When environment interaction is unviable due to technical or safety reasons, the proposed approach can still obtain an applicable model through offline-style training on a previously collected dataset, lowering the negative impact of lacking online environment interactions. Experimental results on the IEEE 33-bus system demonstrate the feasibility and effectiveness of the proposed approach on different offline datasets, including the one with merely low-quality experience.


[55] 2505.09939

Non-Registration Change Detection: A Novel Change Detection Task and Benchmark Dataset

In this study, we propose a novel remote sensing change detection task, non-registration change detection, to address the increasing number of emergencies such as natural disasters, anthropogenic accidents, and military strikes. First, in light of the limited discourse on the issue of non-registration change detection, we systematically propose eight scenarios that could arise in the real world and potentially contribute to the occurrence of non-registration problems. Second, we develop distinct image transformation schemes tailored to various scenarios to convert the available registration change detection dataset into a non-registration version. Finally, we demonstrate that non-registration change detection can cause catastrophic damage to the state-of-the-art methods. Our code and dataset are available at https://github.com/ShanZard/NRCD.


[56] 2505.09940

Low-Complexity Hybrid Beamforming for Multi-Cell mmWave Massive MIMO: A Primitive Kronecker Decomposition Approach

To circumvent the high path loss of mmWave propagation and reduce the hardware cost of massive multiple-input multiple-output antenna systems, full-dimensional hybrid beamforming is critical in 5G and beyond wireless communications. Concerning an uplink multi-cell system with a large-scale uniform planar antenna array, this paper designs an efficient hybrid beamformer using primitive Kronecker decomposition and dynamic factor allocation, where the analog beamformer applies to null the inter-cell interference and simultaneously enhances the desired signals. In contrast, the digital beamformer mitigates the intra-cell interference using the minimum mean square error (MMSE) criterion. Then, due to the low accuracy of phase shifters inherent in the analog beamformer, a low-complexity hybrid beamformer is developed to slow its adjustment speed. Next, an optimality analysis from a subspace perspective is performed, and a sufficient condition for optimal antenna configuration is established. Finally, simulation results demonstrate that the achievable sum rate of the proposed beamformer approaches that of the optimal pure digital MMSE scheme, yet with much lower computational complexity and hardware cost.


[57] 2505.09978

Low-Complexity Decoding for Low-Rate Block Codes of Short Length Based on Concatenated Coding Structure

To decode a short linear block code, ordered statistics decoding (OSD) and/or the $A^*$ decoding are usually considered. Either OSD or the $A^*$ decoding utilizes the magnitudes of the received symbols to establish the most reliable and independent positions (MRIP) frame. A restricted search space can be employed to achieve near-optimum decoding with reduced decoding complexity. For a low-rate code with large minimum distance, the restricted search space is still very large. We propose to use concatenated coding to further restrict the search space by proposing an improved MRIP frame. The improved MRIP frame is formed according to the magnitudes of log likelihood ratios (LLRs) obtained by the soft-in soft-out (SISO) decoder for the inner code. We focus on the construction and decoding of several $(n,k)$ = (128,36) binary linear block codes based on concatenated coding. We use the (128,36) extended BCH (eBCH) code as a benchmark for comparison. Simulation shows that there exist constructed concatenated codes which are much more efficient than the (128,36) eBCH code. Some other codes of length 128 or close to 128 are also constructed to demonstrate the efficiency of the proposed scheme.
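The idea of an MRIP frame can be sketched in a few lines of Python: sort the code positions by decreasing reliability, then greedily keep the positions whose generator-matrix columns are linearly independent over GF(2) until k positions are found. This is the textbook selection step only, under the assumption that reliabilities are the magnitudes of received symbols or LLRs; it is not the improved frame proposed in the paper, and the function name is illustrative.

```python
def most_reliable_independent_positions(G, reliabilities):
    """Select k most reliable, linearly independent column positions of G.

    G is a k x n binary generator matrix (list of lists of 0/1) and
    reliabilities holds one non-negative value per code position.
    """
    k, n = len(G), len(G[0])
    # Visit positions from most to least reliable (stable for ties).
    order = sorted(range(n), key=lambda j: -reliabilities[j])
    basis = {}  # leading bit index -> reduced column stored as an int bitmask
    chosen = []
    for j in order:
        # Pack column j of G into an integer, one bit per row.
        v = 0
        for i in range(k):
            v |= G[i][j] << i
        # Reduce against the current basis; a nonzero remainder means
        # the column is independent of those already chosen.
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                chosen.append(j)
                break
            v ^= basis[lead]
        if len(chosen) == k:
            break
    return chosen


# Hypothetical (4, 2) code: columns 0 and 3 are identical, so the second
# most reliable position (3) must be skipped in favor of position 2.
G = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
]
reliabilities = [0.9, 0.1, 0.5, 0.8]
mrip = most_reliable_independent_positions(G, reliabilities)
```

The example shows why reliability ordering alone is not enough: a highly reliable position is discarded when its column is linearly dependent on columns already in the frame.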


[58] 2505.09986

High Quality Underwater Image Compression with Adaptive Correction and Codebook-based Augmentation

With the increasing exploration and exploitation of the underwater world, underwater images have become a critical medium for human interaction with marine environments, driving extensive research into their efficient transmission and storage. However, contemporary underwater image compression algorithms fail to fully leverage the unique characteristics distinguishing underwater scenes from terrestrial images, resulting in suboptimal performance. To address this limitation, we introduce HQUIC, designed to exploit underwater-image-specific features for enhanced compression efficiency. HQUIC employs an ALTC module to adaptively predict the attenuation coefficients and global light information of the images, which effectively mitigates the issues caused by the differences in lighting and tone existing in underwater images. Subsequently, HQUIC employs a codebook as an auxiliary branch to extract the common objects within underwater images and enhances the performance of the main branch. Furthermore, HQUIC dynamically weights multi-scale frequency components, prioritizing information critical for distortion quality while discarding redundant details. Extensive evaluations on diverse underwater datasets demonstrate that HQUIC outperforms state-of-the-art compression methods.


[59] 2505.10003

AI2MMUM: AI-AI Oriented Multi-Modal Universal Model Leveraging Telecom Domain Large Model

Designing a 6G-oriented universal model capable of processing multi-modal data and executing diverse air interface tasks has emerged as a common goal in future wireless systems. Building on our prior work in communication multi-modal alignment and telecom large language model (LLM), we propose a scalable, task-aware artificial intelligence-air interface multi-modal universal model (AI2MMUM), which flexibly and effectively performs various physical layer tasks according to subtle task instructions. The LLM backbone provides robust contextual comprehension and generalization capabilities, while a fine-tuning approach is adopted to incorporate domain-specific knowledge. To enhance task adaptability, task instructions consist of fixed task keywords and learnable, implicit prefix prompts. Frozen radio modality encoders extract universal representations and adapter layers subsequently bridge radio and language modalities. Moreover, lightweight task-specific heads are designed to directly output task objectives. Comprehensive evaluations demonstrate that AI2MMUM achieves SOTA performance across five representative physical environment/wireless channel-based downstream tasks using the WAIR-D and DeepMIMO datasets.


[60] 2505.10004

Topology-driven identification of repetitions in multi-variate time series

Many multi-variate time series obtained in the natural sciences and engineering possess a repetitive behavior, as for instance state-space trajectories of industrial machines in discrete automation. Recovering the times of recurrence from such a multi-variate time series is of fundamental importance for many monitoring and control tasks. For a periodic time series this is equivalent to determining its period length. In this work we present a persistent homology framework to estimate recurrence times in multi-variate time series with different generalizations of cyclic behavior (periodic, repetitive, and recurring). To this end, we provide three specialized methods within our framework that are provably stable and validate them using real-world data, including a new benchmark dataset from an injection molding machine.


[61] 2505.10028

Fast Heuristic Scheduling and Trajectory Planning for Robotic Fruit Harvesters with Multiple Cartesian Arms

This work proposes a fast heuristic algorithm for the coupled scheduling and trajectory planning of multiple Cartesian robotic arms harvesting fruits. Our method partitions the workspace, assigns fruit-picking sequences to arms, determines tight and feasible fruit-picking schedules and vehicle travel speed, and generates smooth, collision-free arm trajectories. The fruit-picking throughput achieved by the algorithm was assessed using synthetically generated fruit coordinates and a harvester design featuring up to 12 arms. The throughput increased monotonically as more arms were added. Adding more arms when fruit densities were low resulted in diminishing gains because it took longer to travel from one fruit to another. However, when there were enough fruits, the proposed algorithm achieved a linear speedup as the number of arms increased.


[62] 2505.10101

LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2

This paper introduces LAV (Latent Audio-Visual), a system that integrates EnCodec's neural audio compression with StyleGAN2's generative capabilities to produce visually dynamic outputs driven by pre-recorded audio. Unlike previous works that rely on explicit feature mappings, LAV uses EnCodec embeddings as latent representations, directly transformed into StyleGAN2's style latent space via a randomly initialized linear mapping. This approach preserves semantic richness in the transformation, enabling nuanced and semantically coherent audio-visual translations. The framework demonstrates the potential of using pretrained audio compression models for artistic and computational applications.


[63] 2505.10116

Discontinuous integro-differential control systems with sliding modes

The paper deals with the analysis and design of sliding mode control systems modeled by integro-differential equations. The Filippov method and the equivalent control approach are extended to a class of nonlinear discontinuous integro-differential equations. A sliding mode control algorithm is designed for a control system with distributed input delay. The obtained results are illustrated by a numerical example.


[64] 2505.10122

Energy-Efficient and Reliable Data Collection in Receiver-Initiated Wake-up Radio Enabled IoT Networks

In unmanned aerial vehicle (UAV)-assisted wake-up radio (WuR)-enabled internet of things (IoT) networks, UAVs can instantly activate the main radios (MRs) of the sensor nodes (SNs) with a wake-up call (WuC) for efficient data collection in mission-driven data collection scenarios. However, the spontaneous response of numerous SNs to the UAV's WuC can lead to significant packet loss and collisions, as WuR does not exhibit its superiority for high-traffic loads. To address this challenge, we propose an innovative receiver-initiated WuR UAV-assisted clustering (RI-WuR-UAC) medium access control (MAC) protocol to achieve low latency and high reliability in ultra-low power consumption applications. We model the proposed protocol using the $M/G/1/2$ queuing framework and derive expressions for key performance metrics, i.e., channel busyness probability, probability of successful clustering, average SN energy consumption, and average transmission delay. The RI-WuR-UAC protocol employs three distinct data flow models, tailored to different network traffic conditions, which perform three MAC mechanisms: clear channel assessment (CCA) clustering for light traffic loads, backoff plus CCA clustering for dense and heavy traffic, and adaptive clustering for variable traffic loads. Simulation results demonstrate that the RI-WuR-UAC protocol significantly outperforms the benchmark sub-carrier modulation clustering protocol. By varying the network load, we capture the trade-offs among the performance metrics, showcasing the superior efficiency and reliability of the RI-WuR-UAC protocol.


[65] 2505.10124

IMITATE: Image Registration with Context for unknown time frame recovery

In this paper, we formulate a novel image registration formalism dedicated to the estimation of unknown condition-related images, based on two or more known images and their associated conditions. We show how to practically model this formalism by using a new conditional U-Net architecture, which fully takes into account the conditional information and does not need any fixed image. Our formalism is then applied to image moving tumors for radiotherapy treatment at different breathing amplitudes using 4D-CT (3D+t) scans in thoracoabdominal regions. This driving application is particularly complex as it requires stitching a collection of sequential 2D slices into several 3D volumes at different organ positions. Movement interpolation with standard methods then generates well-known reconstruction artefacts in the assembled volumes due to irregular patient breathing, hysteresis and poor correlation of breathing signal to internal motion. Results obtained on 4D-CT clinical data showcase artefact-free volumes achieved with real-time latencies. The code is publicly available at https://github.com/Kheil-Z/IMITATE.


[66] 2505.10234

Self Clocked Digital LDO for Cryogenic Power Management in 22nm FDSOI with 98 Percent Efficiency

A universal quantum computer (QC), though promising groundbreaking solutions to complex problems, still faces several challenges with respect to scalability. Current state-of-the-art QCs use a great quantity of cables to connect the physical qubits, situated at cryogenic temperatures, to room-temperature electronics. Integrating cryogenic electronics together with semiconductor spin qubits is one step closer to scalability. Such a scalable quantum computer can have qubits and the control electronics at the 4 K stage. At 4 K, more thermal dissipation is allowed without overloading the cooling capability of the fridge. Still, control and power circuitry is expected to be highly efficient. While commercial CMOS technologies are found to be operable at mK temperatures, the lack of reliable cryogenic models during design and the increased mismatch at cryogenic temperatures make the design challenging and risky. Using an FDSOI technology with backgate biasing to compensate for the threshold voltage drift at cryogenic temperatures (compensating around 200 mV), together with digital circuitry, is a way to address this challenge. In this work, a self-clocked digital low-dropout regulator (DLDO) is designed in FDSOI as a highly power-efficient, variation-tolerant regulator to supply cryogenic circuits for quantum computing. The proposed digital LDO is more resilient to mismatch, and its self-clocking and close and fine loops address power efficiency and provide a faster transient response.


[67] 2505.10314

Country-wide Shared Fibre-Based Infrastructure for Dissemination of Precise Time, Coherent Optical Frequency with Vibration Sensing

With the increasing demand for ultra-precise time synchronization and frequency dissemination across various scientific, industrial, and communication fields, the Czech Republic has developed an innovative, non-commercial fiber-based infrastructure. This infrastructure serves as a shared platform, utilizing optical fibers to enable high-precision timing, coherent frequency transfer, and a newly implemented vibrational sensing capability. The project also addresses challenges posed by classical communication noise, particularly from Raman scattering, on quantum channels, especially for Quantum Key Distribution (QKD). By strategically separating classical and quantum channels into distinct wavelength bands, such as the C-band and O-band, the infrastructure achieves minimal interference while enabling multiple concurrent applications over shared fiber lines.


[68] 2505.10328

A Comparative Study of SMT and MILP for the Nurse Rostering Problem

The effects of personnel scheduling on the quality of care and working conditions for healthcare personnel have been thoroughly documented. However, the ever-present demand and large variation of constraints make healthcare scheduling particularly challenging. This problem has been studied for decades, with limited research aimed at applying Satisfiability Modulo Theories (SMT). SMT has gained momentum within the formal verification community in the last decades, leading to the advancement of SMT solvers that have been shown to outperform standard mathematical programming techniques. In this work, we propose generic constraint formulations that can model a wide range of real-world scheduling constraints. Then, the generic constraints are formulated as SMT and MILP problems and used to compare the respective state-of-the-art solvers, Z3 and Gurobi, on academic and real-world inspired rostering problems. Experimental results show how each solver excels for certain types of problems; the MILP solver generally performs better when the problem is highly constrained or infeasible, while the SMT solver performs better otherwise. On real-world inspired problems containing a more varied set of shifts and personnel, the SMT solver excels. Additionally, it was noted during experimentation that the SMT solver was more sensitive to the way the generic constraints were formulated, requiring careful consideration and experimentation to achieve better performance. We conclude that SMT-based methods present a promising avenue for future research within the domain of personnel scheduling.


[69] 2505.10348

ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

Auditory attention detection (AAD) aims to identify the direction of the attended speaker in multi-speaker environments from brain signals, such as Electroencephalography (EEG) signals. However, existing EEG-based AAD methods overlook the spatio-temporal dependencies of EEG signals, limiting their decoding and generalization abilities. To address these issues, this paper proposes a Lightweight Spatio-Temporal Enhancement Nested Network (ListenNet) for AAD. The ListenNet has three key components: Spatio-temporal Dependency Encoder (STDE), Multi-scale Temporal Enhancement (MSTE), and Cross-Nested Attention (CNA). The STDE reconstructs dependencies between consecutive time windows across channels, improving the robustness of dynamic pattern extraction. The MSTE captures temporal features at multiple scales to represent both fine-grained and long-range temporal patterns. In addition, the CNA integrates hierarchical features more effectively through novel dynamic attention mechanisms to capture deep spatio-temporal correlations. Experimental results on three public datasets demonstrate the superiority of ListenNet over state-of-the-art methods in both subject-dependent and challenging subject-independent settings, while reducing the trainable parameter count by approximately 7 times. Code is available at: https://github.com/fchest/ListenNet.


[70] 2505.10355

pc-dbCBS: Kinodynamic Motion Planning of Physically-Coupled Robot Teams

Motion planning problems for physically-coupled multi-robot systems in cluttered environments are challenging due to their high dimensionality. Existing methods combining sampling-based planners with trajectory optimization produce suboptimal results and lack theoretical guarantees. We propose Physically-coupled discontinuity-bounded Conflict-Based Search (pc-dbCBS), an anytime kinodynamic motion planner, that extends discontinuity-bounded CBS to rigidly-coupled systems. Our approach proposes a tri-level conflict detection and resolution framework that includes the physical coupling between the robots. Moreover, pc-dbCBS alternates iteratively between state space representations, thereby preserving probabilistic completeness and asymptotic optimality while relying only on single-robot motion primitives. Across 25 simulated and six real-world problems involving multirotors carrying a cable-suspended payload and differential-drive robots linked by rigid rods, pc-dbCBS solves up to 92% more instances than a state-of-the-art baseline and plans trajectories that are 50-60% faster while reducing planning time by an order of magnitude.


[71] 2505.10398

AutoCam: Hierarchical Path Planning for an Autonomous Auxiliary Camera in Surgical Robotics

Incorporating an autonomous auxiliary camera into robot-assisted minimally invasive surgery (RAMIS) enhances spatial awareness and eliminates manual viewpoint control. Existing path planning methods for auxiliary cameras track two-dimensional surgical features but do not simultaneously account for camera orientation, workspace constraints, and robot joint limits. This study presents AutoCam: an automatic auxiliary camera placement method to improve visualization in RAMIS. Implemented on the da Vinci Research Kit, the system uses a priority-based, workspace-constrained control algorithm that combines heuristic geometric placement with nonlinear optimization to ensure robust camera tracking. A user study (N=6) demonstrated that the system maintained 99.84% visibility of a salient feature and achieved a pose error of 4.36 $\pm$ 2.11 degrees and 1.95 $\pm$ 5.66 mm. The controller was computationally efficient, with a loop time of 6.8 $\pm$ 12.8 ms. An additional pilot study (N=6), where novices completed a Fundamentals of Laparoscopic Surgery training task, suggests that users can teleoperate just as effectively from AutoCam's viewpoint as from the endoscope's while still benefiting from AutoCam's improved visual coverage of the scene. These results indicate that an auxiliary camera can be autonomously controlled using the da Vinci patient-side manipulators to track a salient feature, laying the groundwork for new multi-camera visualization methods in RAMIS.


[72] 2505.10438

Identification and Optimal Nonlinear Control of Turbojet Engine Using Koopman Eigenfunction Model

Gas turbine engines represent complex, highly nonlinear dynamical systems. Deriving their physics-based models can be challenging as it requires performance characteristics that are not always available, and one often has to make many simplifying assumptions. In this paper, the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models are discussed and addressed by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics were estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics was mapped into an optimally constructed Koopman eigenfunction space. The process included eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model was validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator were then designed in the eigenfunction space and compared to the classical and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure allowed targeting individual modes during the optimization process, resulting in better performance tuning. The results showed that the Koopman-based controller outperformed the other benchmark controllers in both reference tracking and disturbance rejection, under sea-level and varying flight conditions, due to its global nature.


[73] 2505.10511

Learning Nonlinear Dynamics in Physical Modelling Synthesis using Neural Ordinary Differential Equations

Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are possible in order to handle geometric nonlinearities. One such case is the high-amplitude vibration of a string, where geometric nonlinear effects lead to perceptually important effects including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent work in applied machine learning approaches (in particular neural ordinary differential equations) has been used to model lumped dynamic systems such as electronic circuits automatically from data. In this work, we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution for linear vibration of the system's modes and employs a neural network to account for nonlinear dynamic behaviour. Physical parameters of a system remain easily accessible after the training without the need for a parameter encoder in the network architecture. As an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.


[74] 2505.10561

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback

Text-to-audio (T2A) generation has achieved remarkable progress in generating a variety of audio outputs from language prompts. However, current state-of-the-art T2A models still struggle to satisfy human preferences for prompt-following and acoustic quality when generating complex multi-event audio. To improve the performance of the model in these high-level applications, we propose to enhance the basic capabilities of the model with AI feedback learning. First, we introduce fine-grained AI audio scoring pipelines to: 1) verify whether each event in the text prompt is present in the audio (Event Occurrence Score), 2) detect deviations in event sequences from the language description (Event Sequence Score), and 3) assess the overall acoustic and harmonic quality of the generated audio (Acoustic&Harmonic Quality). We evaluate these three automatic scoring pipelines and find that they correlate significantly better with human preferences than other evaluation metrics. This highlights their value as both feedback signals and evaluation metrics. Utilizing our robust scoring pipelines, we construct a large audio preference dataset, T2A-FeedBack, which contains 41k prompts and 249k audios, each accompanied by detailed scores. Moreover, we introduce T2A-EpicBench, a benchmark that focuses on long captions, multi-events, and story-telling scenarios, aiming to evaluate the advanced capabilities of T2A models. Finally, we demonstrate how T2A-FeedBack can enhance current state-of-the-art audio models. With simple preference tuning, the audio generation model exhibits significant improvements in both simple (AudioCaps test set) and complex (T2A-EpicBench) scenarios.